101
Zheng Y, Zhang L, He S, Xie Z, Zhang J, Ge C, Sun G, Huang J, Li H. Integrated Module of Multidimensional Omics for Peripheral Biomarkers (iMORE) in patients with major depressive disorder: rationale and design of a prospective multicentre cohort study. BMJ Open 2022; 12:e067447. [PMID: 36418119 PMCID: PMC9685190 DOI: 10.1136/bmjopen-2022-067447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
INTRODUCTION Major depressive disorder (MDD) represents a worldwide burden on healthcare, and the response to antidepressants remains limited. Systems biology approaches have been used to explore precision therapy. However, no clinically reliable biomarker for prognostic prediction exists at present. The objectives of the Integrated Module of Multidimensional Omics for Peripheral Biomarkers (iMORE) study are to predict the efficacy of antidepressants by integrating multidimensional omics and to validate the predictions in a real-world setting. As a secondary aim, a series of potential biomarkers is explored for biological subtypes. METHODS AND ANALYSIS iMORE is an observational cohort study of patients with MDD in China with a multistage design. The study is conducted at three mental health centres and comprises an observation phase and a validation phase. A total of 200 patients with MDD and 100 healthy controls were enrolled. The protocol-specified antidepressants are selective serotonin reuptake inhibitors and serotonin-norepinephrine reuptake inhibitors. Clinical visits (baseline, 4 and 8 weeks) include psychiatric rating scales for symptom assessment and biospecimen collection for multiomics analysis. Participants are divided into responders and non-responders based on treatment response (>50% reduction in Montgomery-Asberg Depression Rating Scale score). Antidepressant responses are predicted and biomarkers are explored using a supervised learning approach that integrates metabolites, cytokines, gut microbiomes and immunophenotypic cells. The accuracy of the resulting prediction models is verified in an independent validation phase. ETHICS AND DISSEMINATION The study was approved by the ethics committee of Shanghai Mental Health Center (approval number 2020-87). All participants must provide written consent before study entry. Study findings will be published in peer-reviewed journals. TRIAL REGISTRATION NUMBER NCT04518592.
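The responder criterion in this abstract (a reduction of more than 50% in the Montgomery-Asberg Depression Rating Scale score) can be illustrated with a minimal sketch. The function names, the patient identifiers, and the scores are hypothetical; the abstract does not say which follow-up visit is used for the comparison, so the second score is simply labelled `follow_up`.

```python
def percent_reduction(baseline: float, follow_up: float) -> float:
    """Percentage reduction in total score relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def is_responder(baseline: float, follow_up: float, threshold: float = 50.0) -> bool:
    """True when the reduction strictly exceeds the threshold ('>50%' in the abstract)."""
    return percent_reduction(baseline, follow_up) > threshold


# Hypothetical (baseline, follow-up) scores, for illustration only.
patients = {"P01": (32, 12), "P02": (28, 20), "P03": (30, 15)}
labels = {pid: is_responder(b, f) for pid, (b, f) in patients.items()}
print(labels)
```

Note that with a strict inequality a patient whose score falls by exactly 50% (P03 here) is counted as a non-responder; a protocol using "at least 50%" would classify that patient differently.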
Affiliation(s)
- Yuzhen Zheng
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Linna Zhang
  - Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shen He
  - Department of Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zuoquan Xie
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Jing Zhang
  - Shanghai Green Valley Pharmaceutical Co Ltd, Shanghai, China
- Changrong Ge
  - Shanghai Green Valley Pharmaceutical Co Ltd, Shanghai, China
- Guangqiang Sun
  - Shanghai Green Valley Pharmaceutical Co Ltd, Shanghai, China
- Jingjing Huang
  - Department of Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Clinical Research Center for Mental Health, Shanghai Mental Health Center, Shanghai, China
- Huafang Li
  - Department of Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Clinical Research Center for Mental Health, Shanghai Mental Health Center, Shanghai, China
  - Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
102
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, the use of artificial intelligence (AI) in drug discovery has received much attention because it significantly reduces the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grow. Therefore, this paper presents a systematic literature review (SLR) that integrates recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug side-effect predictions. We present a review of more than 300 articles published between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
  - Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
- Enas Elgeldawi
  - Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Heba Aboul Ella
  - Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
- Mamdouh M. Gomaa
  - Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Aboul Ella Hassanien
  - Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
103
Shin J, Piao Y, Bang D, Kim S, Jo K. DRPreter: Interpretable Anticancer Drug Response Prediction Using Knowledge-Guided Graph Neural Networks and Transformer. Int J Mol Sci 2022; 23:13919. [PMID: 36430395 PMCID: PMC9699175 DOI: 10.3390/ijms232213919] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2022] [Revised: 10/27/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022]
Abstract
Some recent studies on drug sensitivity prediction have applied graph neural networks to leverage prior knowledge of the drug structure or gene network, and other studies have focused on the interpretability of the model to delineate the mechanism governing the drug response. However, it is crucial to build a prediction model that is both knowledge-guided and interpretable, so that the prediction accuracy is improved and practical use of the model can be enhanced. We propose an interpretable model called DRPreter (drug response predictor and interpreter) that predicts the anticancer drug response. DRPreter learns cell line and drug information with graph neural networks; the cell-line graph is further divided into multiple subgraphs using domain knowledge of biological pathways. A type-aware transformer in DRPreter helps detect relationships between pathways and a drug, highlighting important pathways that are involved in the drug response. Extensive experiments on the GDSC (Genomics of Drug Sensitivity in Cancer) dataset demonstrate that the proposed method outperforms state-of-the-art graph-based models for drug response prediction. In addition, DRPreter detected putative key genes and pathways for specific drug-cell-line pairs with supporting evidence in the literature, implying that our model can help interpret the mechanism of action of the drug.
Affiliation(s)
- Jihye Shin
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Yinhua Piao
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - AIGENDRUG Co., Ltd., Seoul 08826, Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
  - MOGAM Institute for Biomedical Research, Yongin-si 16924, Korea
- Kyuri Jo
  - Department of Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
104
Dickinson Q, Aufschnaiter A, Ott M, Meyer JG. Multi-omic integration by machine learning (MIMaL). Bioinformatics 2022; 38:4908-4918. [PMID: 36106996 PMCID: PMC9801967 DOI: 10.1093/bioinformatics/btac631] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/08/2022] [Revised: 08/17/2022] [Accepted: 09/14/2022] [Indexed: 01/05/2023]
Abstract
MOTIVATION Cells respond to environments by regulating gene expression to exploit resources optimally. Recent advances in technologies allow for measuring the abundances of RNA, proteins, lipids and metabolites. These highly complex datasets reflect the states of the different layers in a biological system. Multi-omics is the integration of these disparate methods and data to gain a clearer picture of the biological state. Multi-omic studies of the proteome and metabolome are becoming more common as mass spectrometry technology continues to be democratized. However, knowledge extraction through the integration of these data remains challenging. RESULTS Connections between molecules in different omic layers were discovered through a combination of machine learning and model interpretation. Discovered connections reflected protein control (ProC) over metabolites. Proteins discovered to control citrate were mapped onto known genetic and metabolic networks, revealing that these protein regulators are novel. Further, clustering the magnitudes of ProC over all metabolites enabled the prediction of five gene functions, each of which was validated experimentally. Two uncharacterized genes, YJR120W and YDL157C, were accurately predicted to modulate mitochondrial translation. Functions for three incompletely characterized genes were also predicted and validated, including SDH9, ISC1 and FMP52. A website enables results exploration and also MIMaL analysis of user-supplied multi-omic data. AVAILABILITY AND IMPLEMENTATION The website for MIMaL is at https://mimal.app. Code for the website is at https://github.com/qdickinson/mimal-website. Code to implement MIMaL is at https://github.com/jessegmeyerlab/MIMaL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Quinn Dickinson
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Andreas Aufschnaiter
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Martin Ott
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Jesse G Meyer
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
105
Identification of phenocopies improves prediction of targeted therapy response over DNA mutations alone. NPJ Genom Med 2022; 7:58. [PMID: 36253482 PMCID: PMC9576758 DOI: 10.1038/s41525-022-00328-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/26/2022] [Accepted: 09/29/2022] [Indexed: 11/09/2022]
Abstract
DNA mutations in specific genes can confer preferential benefit from drugs targeting those genes. However, other molecular perturbations can “phenocopy” pathogenic mutations, but would not be identified using standard clinical sequencing, leading to missed opportunities for other patients to benefit from targeted treatments. We hypothesized that RNA phenocopy signatures of key cancer driver gene mutations could improve our ability to predict response to targeted therapies, despite not being directly trained on drug response. To test this, we built gene expression signatures in tissue samples for specific mutations and found that phenocopy signatures broadly increased accuracy of drug response predictions in-vitro compared to DNA mutation alone, and identified additional cancer cell lines that respond well with a positive/negative predictive value on par or better than DNA mutations. We further validated our results across four clinical cohorts. Our results suggest that routine RNA sequencing of tumors to identify phenocopies in addition to standard targeted DNA sequencing would improve our ability to accurately select patients for targeted therapies in the clinic.
106
Raufaste-Cazavieille V, Santiago R, Droit A. Multi-omics analysis: Paving the path toward achieving precision medicine in cancer treatment and immuno-oncology. Front Mol Biosci 2022; 9:962743. [PMID: 36304921 PMCID: PMC9595279 DOI: 10.3389/fmolb.2022.962743] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 06/06/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
The acceleration of large-scale sequencing and the progress in high-throughput computational analyses, defined as omics, was a hallmark for the comprehension of the biological processes in human health and diseases. In cancerology, the omics approach, initiated by genomics and transcriptomics studies, has revealed an incredible complexity with unsuspected molecular diversity within the same tumor type as well as spatial and temporal heterogeneity of tumors. The integration of multiple biological layers of omics studies brought oncology to a new paradigm, from tumor site classification to pan-cancer molecular classification, offering new therapeutic opportunities for precision medicine. In this review, we provide a comprehensive overview of the latest innovations for multi-omics integration in oncology and summarize the largest multi-omics datasets available for adult and pediatric cancers. We present multi-omics techniques for characterizing cancer biology and show how multi-omics data can be combined with clinical data for the identification of prognostic and treatment-specific biomarkers, opening the way to personalized therapy. To conclude, we detail the newest strategies for dissecting the tumor immune environment and host–tumor interaction. We explore the advances in immunomics and microbiomics for biomarker identification to guide therapeutic decisions in immuno-oncology.
Affiliation(s)
- Raoul Santiago
  - CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  - Division of Pediatric Hematology-Oncology, Centre Hospitalier Universitaire de L’Université Laval, Charles Bruneau Cancer Center, Québec, QC, Canada
- Arnaud Droit
  - CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
107
Reel PS, Reel S, van Kralingen JC, Langton K, Lang K, Erlic Z, Larsen CK, Amar L, Pamporaki C, Mulatero P, Blanchard A, Kabat M, Robertson S, MacKenzie SM, Taylor AE, Peitzsch M, Ceccato F, Scaroni C, Reincke M, Kroiss M, Dennedy MC, Pecori A, Monticone S, Deinum J, Rossi GP, Lenzini L, McClure JD, Nind T, Riddell A, Stell A, Cole C, Sudano I, Prehn C, Adamski J, Gimenez-Roqueplo AP, Assié G, Arlt W, Beuschlein F, Eisenhofer G, Davies E, Zennaro MC, Jefferson E. Machine learning for classification of hypertension subtypes using multi-omics: A multi-centre, retrospective, data-driven study. EBioMedicine 2022; 84:104276. [PMID: 36179553 PMCID: PMC9520210 DOI: 10.1016/j.ebiom.2022.104276] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 03/08/2022] [Revised: 08/31/2022] [Accepted: 09/06/2022] [Indexed: 11/09/2022]
Abstract
Background Arterial hypertension is a major cardiovascular risk factor. Identification of secondary hypertension in its various forms is key to preventing and targeting treatment of cardiovascular complications. Simplified diagnostic tests are urgently required to distinguish primary and secondary hypertension to address the current underdiagnosis of the latter. Methods This study uses Machine Learning (ML) to classify subtypes of endocrine hypertension (EHT) in a large cohort of hypertensive patients using multidimensional omics analysis of plasma and urine samples. We measured 409 multi-omics (MOmics) features including plasma miRNAs (PmiRNA: 173), plasma catechol O-methylated metabolites (PMetas: 4), plasma steroids (PSteroids: 16), urinary steroid metabolites (USteroids: 27), and plasma small metabolites (PSmallMB: 189) in primary hypertension (PHT) patients, EHT patients with either primary aldosteronism (PA), pheochromocytoma/functional paraganglioma (PPGL) or Cushing syndrome (CS) and normotensive volunteers (NV). Biomarker discovery involved selection of disease combination, outlier handling, feature reduction, 8 ML classifiers, class balancing and consideration of different age- and sex-based scenarios. Classifications were evaluated using balanced accuracy, sensitivity, specificity, AUC, F1, and Kappa score. Findings Complete clinical and biological datasets were generated from 307 subjects (PA=113, PPGL=88, CS=41 and PHT=112). The random forest classifier provided ∼92% balanced accuracy (∼11% improvement on the best mono-omics classifier), with 96% specificity and 0.95 AUC to distinguish one of the four conditions in multi-class ALL-ALL comparisons (PPGL vs PA vs CS vs PHT) on an unseen test set, using 57 MOmics features. For discrimination of EHT (PA + PPGL + CS) vs PHT, the simple logistic classifier achieved 0.96 AUC with 90% sensitivity, and ∼86% specificity, using 37 MOmics features. 
One PmiRNA (hsa-miR-15a-5p) and two PSmallMB (C9 and PC ae C38:1) features were found to be most discriminating for all disease combinations. Overall, the MOmics-based classifiers were able to provide better classification performance in comparison to mono-omics classifiers. Interpretation We have developed a ML pipeline to distinguish different EHT subtypes from PHT using multi-omics data. This innovative approach to stratification is an advancement towards the development of a diagnostic tool for EHT patients, significantly increasing testing throughput and accelerating administration of appropriate treatment. Funding European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 633983, Clinical Research Priority Program of the University of Zurich for the CRPP HYRENE (to Z.E. and F.B.), and Deutsche Forschungsgemeinschaft (CRC/Transregio 205/1).
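Several of the evaluation measures named in this abstract (sensitivity, specificity, balanced accuracy, F1 and Cohen's kappa) can be derived from a two-class confusion matrix. The following is a minimal sketch using their standard definitions; the function name and the toy labels are hypothetical and are not taken from the study, and the AUC is left out because it needs continuous scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a two-class problem.

    Assumes both classes occur in y_true and that at least one positive
    prediction is made; otherwise a denominator below would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = tp + tn + fp + fn

    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
        "kappa": kappa,
    }


# Toy labels for illustration only (1 = EHT, 0 = PHT is an assumed encoding).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy averages the per-class recall, which is why it is often preferred over plain accuracy when class sizes differ, as they do across the diagnostic groups listed in the abstract.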
108
Peng W, Liu H, Dai W, Yu N, Wang J. Predicting cancer drug response using parallel heterogeneous graph convolutional networks with neighborhood interactions. Bioinformatics 2022; 38:4546-4553. [PMID: 35997568 DOI: 10.1093/bioinformatics/btac574] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/03/2022] [Revised: 07/26/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Due to cancer heterogeneity, the therapeutic effect may not be the same when a cohort of patients of the same cancer type receive the same treatment. Anticancer drug response prediction may help develop personalized therapy regimens to increase survival and reduce patients' expenses. Recently, graph neural network-based methods have aroused widespread interest and achieved impressive results on the drug response prediction task. However, most of them apply graph convolution to process cell line-drug bipartite graphs while ignoring the intrinsic differences between cell line and drug nodes. Moreover, most of these methods aggregate node-wise neighbor features but fail to consider the element-wise interaction between cell lines and drugs. RESULTS This work proposes a neighborhood interaction (NI)-based heterogeneous graph convolution network method, namely NIHGCN, for anticancer drug response prediction in an end-to-end way. Firstly, it constructs a heterogeneous network consisting of drugs, cell lines and the known drug response information. Cell line gene expression and drug molecular fingerprints are linearly transformed and input as node attributes into an interaction module. The interaction module consists of a parallel graph convolution network layer and an NI layer, which aggregates node-level features from their neighbors through the graph convolution operation and considers element-level interactions with their neighbors in the NI layer. Finally, the drug response predictions are made by calculating the linear correlation coefficients of the feature representations of cell lines and drugs. We have conducted extensive experiments to assess the effectiveness of our model on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. It has achieved the best performance compared with the state-of-the-art algorithms, especially in predicting drug responses for new cell lines, new drugs and targeted drugs.
Furthermore, our model that was well trained on the GDSC dataset can be successfully applied to predict samples of PDX and TCGA, which verified the transferability of our model from cell line in vitro to the datasets in vivo. AVAILABILITY AND IMPLEMENTATION The source code can be obtained from https://github.com/weiba/NIHGCN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
- Hancheng Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
- Ning Yu
  - Department of Computing Sciences, The College at Brockport, State University of New York, Brockport, NY 14422, USA
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, P.R. China
109
Tang YC, Powell RT, Gottlieb A. Molecular pathways enhance drug response prediction using transfer learning from cell lines to tumors and patient-derived xenografts. Sci Rep 2022; 12:16109. [PMID: 36168036 PMCID: PMC9515168 DOI: 10.1038/s41598-022-20646-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 11/24/2022]
Abstract
Computational models have been successful in predicting drug sensitivity in cancer cell line data, creating an opportunity to guide precision medicine. However, translating these models to tumors remains challenging. We propose a new transfer learning workflow that transfers drug sensitivity prediction models from large-scale cancer cell lines to both tumors and patient-derived xenografts based on molecular pathways derived from genomic features. We further compute feature importance to identify the pathways most important to drug response prediction. We obtained good performance on tumors (AUROC = 0.77) and patient-derived xenografts from triple-negative breast cancers (RMSE = 0.11). Using feature importance, we highlight the association between the ER-Golgi trafficking pathway and everolimus sensitivity in breast cancer patients, and the role of class II histone deacetylases and interleukin-12 in the response to drugs for triple-negative breast cancer. Pathway information supports the transfer of drug response prediction models from cell lines to tumors and can provide a biological interpretation of the predictions, serving as a stepping stone towards use in clinical settings.
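The two performance figures quoted in this abstract use different measures: AUROC for a binary responder label and RMSE for a continuous response value. A minimal sketch of both, using their standard definitions, is given below; the function names and the toy inputs are hypothetical and unrelated to the study's data.

```python
import math


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen positive is scored above a
    randomly chosen negative, with ties counted as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy values for illustration only.
auc_value = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
rmse_value = rmse([0.2, 0.5], [0.1, 0.7])
print(auc_value, rmse_value)
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; a rank-based implementation would be the usual choice for large cohorts.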
Affiliation(s)
- Yi-Ching Tang
  - Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Reid T. Powell
  - Center for Translational Cancer Research, Texas A&M University, Houston, TX 77030, USA
- Assaf Gottlieb
  - Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
110
Akhoundova D, Rubin MA. Clinical application of advanced multi-omics tumor profiling: Shaping precision oncology of the future. Cancer Cell 2022; 40:920-938. [PMID: 36055231 DOI: 10.1016/j.ccell.2022.08.011] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 02/21/2022] [Revised: 05/22/2022] [Accepted: 08/11/2022] [Indexed: 12/17/2022]
Abstract
Next-generation DNA sequencing technology has dramatically advanced clinical oncology through the identification of therapeutic targets and molecular biomarkers, leading to the personalization of cancer treatment with significantly improved outcomes for many common and rare tumor entities. More recent developments in advanced tumor profiling now enable dissection of tumor molecular architecture and the functional phenotype at cellular and subcellular resolution. Clinical translation of high-resolution tumor profiling and integration of multi-omics data into precision treatment, however, pose significant challenges at the level of prospective validation and clinical implementation. In this review, we summarize the latest advances in multi-omics tumor profiling, focusing on spatial genomics and chromatin organization, spatial transcriptomics and proteomics, liquid biopsy, and ex vivo modeling of drug response. We analyze the current stages of translational validation of these technologies and discuss future perspectives for their integration into precision treatment.
Affiliation(s)
- Dilara Akhoundova
  - Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland
  - Department of Medical Oncology, Inselspital, University Hospital of Bern, 3010 Bern, Switzerland
- Mark A Rubin
  - Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland
  - Bern Center for Precision Medicine, Inselspital, University Hospital of Bern, 3008 Bern, Switzerland
111
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 2022; 23:171. [PMID: 35945544 PMCID: PMC9361561 DOI: 10.1186/s13059-022-02739-2] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationship of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples. RESULTS In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. The classification performance is evaluated using three benchmarking metrics: accuracy, F1 macro, and F1 weighted. The clustering performance is evaluated using four benchmarking metrics: the Jaccard index (JI), C-index, silhouette score, and Davies-Bouldin score. For the cancer multi-omics datasets, the methods' strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks. CONCLUSIONS Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo.
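Two of the benchmarking metrics listed in this abstract are easy to confuse: F1 macro averages the per-class F1 scores equally, whereas F1 weighted averages them in proportion to class support. The sketch below computes both, together with a pair-counting Jaccard index for comparing a clustering with reference labels. The pair-counting form is one common definition and is an assumption here, since the abstract does not state which variant the benchmark uses; all names and toy labels are hypothetical.

```python
from itertools import combinations


def f1_macro_and_weighted(y_true, y_pred):
    """Return (macro F1, support-weighted F1) over the classes in y_true."""
    classes = sorted(set(y_true))
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * y_true.count(c) for c in classes) / len(y_true)
    return macro, weighted


def pair_jaccard(labels_true, labels_pred):
    """Pair-counting Jaccard index between two partitions of the same samples."""
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            only_true += 1
        elif same_pred:
            only_pred += 1
    total = both + only_true + only_pred
    return both / total if total else 1.0


# Toy labels for illustration only.
macro_f1, weighted_f1 = f1_macro_and_weighted([0, 0, 0, 1], [0, 0, 1, 1])
jaccard = pair_jaccard([0, 0, 1, 1], [0, 0, 0, 1])
print(macro_f1, weighted_f1, jaccard)
```

With imbalanced classes, as in the toy example, the weighted score sits closer to the majority-class F1, while the macro score gives the minority class equal say.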
Affiliation(s)
- Dongjin Leng
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
- Linyi Zheng
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Yuqi Wen
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
- Yunhao Zhang
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Lianlian Wu
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, People’s Republic of China
- Jing Wang
  - School of Medicine, Tsinghua University, Beijing, People’s Republic of China
- Meihong Wang
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Zhongnan Zhang
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Song He
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
- Xiaochen Bo
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
112
Pan X, Lin X, Cao D, Zeng X, Yu PS, He L, Nussinov R, Cheng F. Deep learning for drug repurposing: Methods, databases, and applications. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1597] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Indexed: 12/15/2022]
Affiliation(s)
- Xiaoqin Pan
  - School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
- Xuan Lin
  - School of Computer Science, Xiangtan University, Xiangtan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Xiangxiang Zeng
  - School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
- Philip S. Yu
  - Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA
- Lifang He
  - Department of Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
- Ruth Nussinov
  - Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
113
Paltun BG, Kaski S, Mamitsuka H. DIVERSE: Bayesian Data IntegratiVE Learning for Precise Drug ResponSE Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2197-2207. [PMID: 33705322 DOI: 10.1109/tcbb.2021.3065535] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
Detecting predictive biomarkers from multi-omics data is important for precision medicine, both to improve diagnostics of complex diseases and to enable better treatments. This requires substantial experimental effort, which is made difficult by the heterogeneity of cell lines and by high cost. An effective solution is to build a computational model over the diverse omics data, including genomic, molecular, and environmental information. However, choosing informative and reliable data sources from among the different types of data is a challenging problem. We propose DIVERSE, a framework of Bayesian importance-weighted tri- and bi-matrix factorization (DIVERSE3 or DIVERSE2) to predict drug responses from data of cell lines, drugs, and gene interactions. DIVERSE integrates the data sources systematically, in a step-wise manner, examining the importance of each added data set in turn. More specifically, we sequentially integrate five different data sets, which have not all been combined in earlier bioinformatic methods for predicting drug responses. Empirical experiments show that DIVERSE clearly outperformed five other methods, including three state-of-the-art approaches, under cross-validation, particularly in out-of-matrix prediction, which is closer to the setting of real use cases and more challenging than simpler in-matrix prediction. Additionally, case studies for discovering new drugs further confirmed the performance advantage of DIVERSE.
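The contrast between in-matrix and out-of-matrix prediction comes down to how the response matrix is split for evaluation. The sketch below assumes the common usage of the two terms: in-matrix holds out individual entries, while out-of-matrix holds out whole rows (for example, cell lines) that the model never sees during training. It is not the paper's own protocol, and both function names are hypothetical.

```python
import random


def in_matrix_split(n_rows, n_cols, test_fraction, seed=0):
    """Hold out individual (row, col) entries; any row may still occur in training."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    rng.shuffle(cells)
    n_test = int(len(cells) * test_fraction)
    return cells[n_test:], cells[:n_test]  # (train, test)


def out_of_matrix_split(n_rows, n_cols, test_fraction, seed=0):
    """Hold out entire rows, so test rows have no observed entries in training."""
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    n_test = max(1, int(n_rows * test_fraction))
    test_rows = set(rows[:n_test])
    train = [(i, j) for i in range(n_rows) if i not in test_rows for j in range(n_cols)]
    test = [(i, j) for i in sorted(test_rows) for j in range(n_cols)]
    return train, test


# 10 hypothetical cell lines x 4 hypothetical drugs, 20% held out each way.
train_in, test_in = in_matrix_split(10, 4, 0.2)
train_out, test_out = out_of_matrix_split(10, 4, 0.2)
print(len(test_in), len(test_out))
```

Both splits hold out the same number of entries here, but only the second forces the model to predict for rows with no training observations, which is why it is the harder setting.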
114
Crawford J, Christensen BC, Chikina M, Greene CS. Widespread redundancy in -omics profiles of cancer mutation states. Genome Biol 2022; 23:137. [PMID: 35761387 PMCID: PMC9238138 DOI: 10.1186/s13059-022-02705-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/15/2021] [Accepted: 06/14/2022] [Indexed: 02/04/2023] [Open Access]
Abstract
BACKGROUND In studies of cellular function in cancer, researchers are increasingly able to choose from many -omics assays as functional readouts. Choosing the correct readout for a given study can be difficult, and which layer of cellular function is most suitable to capture the relevant signal remains unclear. RESULTS We consider prediction of cancer mutation status (presence or absence) from functional -omics data as a representative problem that presents an opportunity to quantify and compare the ability of different -omics readouts to capture signals of dysregulation in cancer. From the TCGA Pan-Cancer Atlas that contains genetic alteration data, we focus on RNA sequencing, DNA methylation arrays, reverse phase protein arrays (RPPA), microRNA, and somatic mutational signatures as -omics readouts. Across a collection of genes recurrently mutated in cancer, RNA sequencing tends to be the most effective predictor of mutation state. We find that one or more other data types for many of the genes are approximately equally effective predictors. Performance is more variable between mutations than that between data types for the same mutation, and there is little difference between the top data types. We also find that combining data types into a single multi-omics model provides little or no improvement in predictive ability over the best individual data type. CONCLUSIONS Based on our results, for the design of studies focused on the functional outcomes of cancer mutations, there are often multiple -omics types that can serve as effective readouts, although gene expression seems to be a reasonable default option.
Affiliation(s)
- Jake Crawford
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Brock C Christensen
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
  - Department of Molecular and Systems Biology, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
- Maria Chikina
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Casey S Greene
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA
115
Ward B, Yombi JC, Balligand JL, Cani PD, Collet JF, de Greef J, Dewulf JP, Gatto L, Haufroid V, Jodogne S, Kabamba B, Pyr dit Ruys S, Vertommen D, Elens L, Belkhir L. HYGIEIA: HYpothesizing the Genesis of Infectious Diseases and Epidemics through an Integrated Systems Biology Approach. Viruses 2022; 14:1373. [PMID: 35891354 PMCID: PMC9318602 DOI: 10.3390/v14071373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2022] [Revised: 06/13/2022] [Accepted: 06/21/2022] [Indexed: 12/13/2022] [Open Access]
Abstract
More than two years on, the COVID-19 pandemic continues to wreak havoc around the world and has battle-tested the pandemic-situation responses of all major global governments. Two key areas of investigation that are still unclear are: the molecular mechanisms that lead to heterogenic patient outcomes, and the causes of Post COVID condition (AKA Long-COVID). In this paper, we introduce the HYGIEIA project, designed to respond to the enormous challenges of the COVID-19 pandemic through a multi-omic approach supported by network medicine. It is hoped that in addition to investigating COVID-19, the logistics deployed within this project will be applicable to other infectious agents, pandemic-type situations, and also other complex, non-infectious diseases. Here, we first look at previous research into COVID-19 in the context of the proteome, metabolome, transcriptome, microbiome, host genome, and viral genome. We then discuss a proposed methodology for a large-scale multi-omic longitudinal study to investigate the aforementioned biological strata through high-throughput sequencing (HTS) and mass-spectrometry (MS) technologies. Lastly, we discuss how a network medicine approach can be used to analyze the data and make meaningful discoveries, with the final aim being the translation of these discoveries into the clinics to improve patient care.
Affiliation(s)
- Bradley Ward
  - Integrated Pharmacometrics, Pharmacogenomics and Pharmacokinetics Group (PMGK), Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Jean Cyr Yombi
  - Department of Internal Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Jean-Luc Balligand
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), Pole of Pharmacology and Therapeutics (FATH), Institut de Recherche Expérimentale et Clinique (IREC), Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Patrice D. Cani
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), Metabolism and Nutrition Research Group, Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Jean-François Collet
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), de Duve Institute, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Julien de Greef
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Internal Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Joseph P. Dewulf
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Laboratory Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Biochemistry, de Duve Institute, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Laurent Gatto
  - Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Vincent Haufroid
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Laboratory Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Sébastien Jodogne
  - Computer Science and Engineering Department (INGI), Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), UCLouvain, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Benoît Kabamba
  - Department of Laboratory Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Pôle de Microbiologie, Institut de Recherche Expérimentale et Clinique, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Sébastien Pyr dit Ruys
  - Integrated Pharmacometrics, Pharmacogenomics and Pharmacokinetics Group (PMGK), Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Didier Vertommen
  - De Duve Institute, and MASSPROT Platform, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Laure Elens
  - Integrated Pharmacometrics, Pharmacogenomics and Pharmacokinetics Group (PMGK), Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Leïla Belkhir
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Internal Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
116
Gliozzo J, Mesiti M, Notaro M, Petrini A, Patak A, Puertas-Gallardo A, Paccanaro A, Valentini G, Casiraghi E. Heterogeneous data integration methods for patient similarity networks. Brief Bioinform 2022; 23:6604996. [PMID: 35679533 PMCID: PMC9294435 DOI: 10.1093/bib/bbac207] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/07/2021] [Revised: 04/14/2022] [Accepted: 05/04/2022] [Indexed: 12/29/2022] [Open Access]
Abstract
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
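The PSN structure described above (patients as nodes, similarities as weighted edges, one network per data view) can be sketched in a few lines. This is an illustration under stated assumptions: cosine similarity and plain edge-weight averaging are a naive baseline chosen for brevity, not a method attributed to the review, and the function names and toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_network(view):
    """view: {patient_id: feature vector}. Returns {(p, q): weight} with p < q."""
    ids = sorted(view)
    return {
        (p, q): cosine(view[p], view[q])
        for i, p in enumerate(ids)
        for q in ids[i + 1:]
    }


def fuse_networks(networks):
    """Naive fusion: average each edge weight over views sharing the same patients."""
    edges = networks[0].keys()
    return {e: sum(net[e] for net in networks) / len(networks) for e in edges}


# Two invented views of the same three patients.
expression = {"p1": [1, 0], "p2": [1, 0], "p3": [0, 1]}
labs = {"p1": [1, 1], "p2": [0, 1], "p3": [1, 1]}
fused = fuse_networks([similarity_network(expression), similarity_network(labs)])
for edge, weight in sorted(fused.items()):
    print(edge, round(weight, 3))
```

Averaging treats every view as equally informative; handling views of very different scale, noise level, or completeness is the kind of problem that motivates more elaborate data fusion techniques.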
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Mesiti
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Notaro
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alessandro Petrini
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alex Patak
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
- Alberto Paccanaro
  - Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK
  - School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Giorgio Valentini
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
  - DSRC UNIMI, Data Science Research Center, Milano, 20135, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Berlin, Germany
- Elena Casiraghi
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
117
Hostallero DE, Li Y, Emad A. Looking at the BiG Picture: Incorporating bipartite graphs in drug response prediction. Bioinformatics 2022; 38:3609-3620. [PMID: 35674359 DOI: 10.1093/bioinformatics/btac383] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/11/2021] [Revised: 04/17/2022] [Accepted: 06/01/2022] [Indexed: 12/15/2022] [Open Access]
Abstract
MOTIVATION The increasing number of publicly available databases containing drugs' chemical structures, their response in cell lines, and molecular profiles of the cell lines has drawn attention to the problem of drug response prediction. However, many existing methods do not fully leverage the information that is shared among cell lines and drugs with similar structure. As such, drug similarities in terms of cell line responses and chemical structures could prove useful in forming drug representations that improve drug response prediction accuracy. RESULTS We present two deep learning approaches, BiG-DRP and BiG-DRP+, for drug response prediction. Our models take advantage of the drugs' chemical structure and the underlying relationships of drugs and cell lines through a bipartite graph and a heterogeneous graph convolutional network that incorporate sensitive and resistant cell line information in forming drug representations. Evaluation of our methods and other state-of-the-art models in different scenarios shows that incorporating this bipartite graph significantly improves the prediction performance. Additionally, genes that contribute significantly to the performance of our models also point to important biological processes and signaling pathways. Analysis of predicted drug response of patients' tumors using our model revealed important associations between mutations and drug sensitivity, illustrating the utility of our model in pharmacogenomics studies. AVAILABILITY AND IMPLEMENTATION An implementation of the algorithms in Python is provided at https://github.com/ddhostallero/BiG-DRP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
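A drug and cell-line bipartite graph with separate sensitive and resistant edge types might be built along the following lines. This is only a sketch: the top-k and bottom-k ranking rule, the function name `build_bipartite_edges`, and the toy IC50 values are assumptions made for illustration, not the edge-selection procedure used by BiG-DRP, which the abstract does not specify.

```python
def build_bipartite_edges(response, k=1):
    """Build typed drug-to-cell-line edges from response data.

    response: {drug: {cell_line: ic50}}; a lower IC50 means a more sensitive cell line.
    Returns (sensitive_edges, resistant_edges) as sets of (drug, cell_line) pairs.
    The top-k / bottom-k rule is a hypothetical choice for illustration.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    sensitive, resistant = set(), set()
    for drug, by_cell in response.items():
        # Rank cell lines by IC50; ties are broken by name for determinism.
        ranked = sorted(by_cell, key=lambda c: (by_cell[c], c))
        sensitive.update((drug, c) for c in ranked[:k])
        resistant.update((drug, c) for c in ranked[-k:])
    return sensitive, resistant


# Invented IC50 values for two drugs across three cell lines.
response = {
    "drugA": {"c1": 0.1, "c2": 5.0, "c3": 1.0},
    "drugB": {"c1": 9.0, "c2": 0.5, "c3": 2.0},
}
sensitive, resistant = build_bipartite_edges(response, k=1)
print(sorted(sensitive), sorted(resistant))
```

Keeping the two edge types separate lets a downstream graph model aggregate over a drug's sensitive neighbours and resistant neighbours independently. Note that with a large k relative to the number of cell lines, the two sets overlap.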
Affiliation(s)
- David Earl Hostallero
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada
  - Mila, Quebec AI Institute, Montreal, QC H2S 3H1, Canada
- Yihui Li
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada
  - Mila, Quebec AI Institute, Montreal, QC H2S 3H1, Canada
  - The Rosalind and Morris Goodman Cancer Institute, Montreal, QC H3A 1A3, Canada
118
Mo H, Breitling R, Francavilla C, Schwartz JM. Data integration and mechanistic modelling for breast cancer biology: Current state and future directions. Current Opinion in Endocrine and Metabolic Research 2022; 24:100350. [PMID: 36034741 PMCID: PMC9402443 DOI: 10.1016/j.coemr.2022.100350] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Abstract
Breast cancer is one of the most common cancers threatening women worldwide. A limited number of available treatment options, frequent recurrence, and drug resistance exacerbate the prognosis of breast cancer patients. Thus, there is an urgent need for methods to investigate novel treatment options, while taking into account the vast molecular heterogeneity of breast cancer. Recent advances in molecular profiling technologies, including genomics, epigenomics, transcriptomics, proteomics and metabolomics data, enable approaching breast cancer biology at multiple levels of omics interaction networks. Systems biology approaches, including computational inference of ‘big data’ and mechanistic modelling of specific pathways, are emerging to identify potential novel combinations of breast cancer subtype signatures and more diverse targeted therapies.
119
Wang XS, Lee S, Zhang H, Tang G, Wang Y. An integral genomic signature approach for tailored cancer therapy using genome-wide sequencing data. Nat Commun 2022; 13:2936. [PMID: 35618721 PMCID: PMC9135729 DOI: 10.1038/s41467-022-30449-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/03/2021] [Accepted: 04/29/2022] [Indexed: 11/19/2022] [Open Access]
Abstract
Low-cost multi-omics sequencing is expected to become clinical routine and transform precision oncology. Viable computational methods that can facilitate tailored intervention while tolerating sequencing biases are in high demand. Here we propose a class of transparent and interpretable computational methods called integral genomic signature (iGenSig) analyses that address the challenges of cross-dataset modeling by leveraging information redundancies within high-dimensional genomic features, averaging feature weights to prevent overweighting, and extracting unbiased genomic information from large tumor cohorts. Using genomic datasets of chemical perturbations, we develop a battery of iGenSig models for predicting cancer drug responses, and validate the models using independent cell-line and clinical datasets. The iGenSig models for five drugs demonstrate predictive values in six clinical studies, among which the Erlotinib and 5-FU models significantly predict therapeutic responses in three studies, offering clinically relevant insights into their inverse predictive signature pathways. Together, iGenSig provides a computational framework to facilitate tailored cancer therapy based on multi-omics data.
Affiliation(s)
- Xiao-Song Wang
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Sanghoon Lee
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, USA
- Han Zhang
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, USA
- Gong Tang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- Yue Wang
  - UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 15213, USA
120
Park H, Yamaguchi R, Imoto S, Miyano S. Xprediction: Explainable EGFR-TKIs response prediction based on drug sensitivity specific gene networks. PLoS One 2022; 17:e0261630. [PMID: 35584089 PMCID: PMC9116684 DOI: 10.1371/journal.pone.0261630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2021] [Accepted: 12/06/2021] [Indexed: 12/03/2022] [Open Access]
Abstract
In recent years, drug sensitivity prediction has garnered a great deal of attention due to the growing interest in precision medicine. Several computational methods have been developed for drug sensitivity prediction and the identification of related markers. However, most previous studies have ignored genetic interaction, although complex diseases (e.g., cancer) involve many genes intricately connected in a molecular network rather than the abnormality of a single gene. To effectively predict drug sensitivity and understand its mechanism, we propose a novel strategy for explainable drug sensitivity prediction based on sample-specific gene regulatory networks, designated Xprediction. Our strategy first estimates sample-specific gene regulatory networks that enable us to identify the molecular interplay underlying varying clinical characteristics of cell lines. We then predict drug sensitivity based on the estimated sample-specific gene regulatory networks. The predictive models are based on machine learning approaches, i.e., random forest, kernel support vector machine, and deep neural network. Although the machine learning models provide remarkable results for prediction and classification, we cannot understand how the models reach their decisions. In other words, the methods suffer from the black box problem, and thus we cannot identify crucial molecular interactions that involve drug sensitivity-related mechanisms. To address this issue, we propose a method that describes the importance of each molecular interaction for the drug sensitivity prediction result. The proposed method enables us to identify crucial gene-gene interactions and thereby interpret the prediction results based on the identified markers. To evaluate our strategy, we applied Xprediction to EGFR-TKIs prediction based on drug sensitivity specific gene regulatory networks and identified important molecular interactions for EGFR-TKIs prediction. Our strategy performed drug sensitivity prediction effectively compared with prediction based on the expression levels of genes. We also verified, through the literature, the EGFR-TKIs-related mechanisms of a majority of the identified markers. We expect our strategy to be a useful tool for prediction tasks and for uncovering complex mechanisms related to pharmacological profiles, such as mechanisms of acquired drug resistance or sensitivity of cancer cells.
Affiliation(s)
- Heewon Park
  - M&D Data Science Center, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
- Rui Yamaguchi
  - Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Chikusa-ku, Nagoya, Aichi, Japan
  - Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Showa-ku, Nagoya, Aichi, Japan
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Seiya Imoto
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Satoru Miyano
  - M&D Data Science Center, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
121
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, ML methods require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction.
KEY POINTS:
- The key challenges and solutions related to the big data derived from plant system biology have been highlighted.
- Different methods of data integration have been discussed.
- Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
  - Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
  - Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
  - Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada
122
Park A, Joo M, Kim K, Son WJ, Lim G, Lee J, Kim JH, Lee DH, Nam S. A comprehensive evaluation of regression-based drug responsiveness prediction models, using cell viability inhibitory concentrations (IC50 values). Bioinformatics 2022; 38:2810-2817. [PMID: 35561188 DOI: 10.1093/bioinformatics/btac177] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/02/2021] [Revised: 03/06/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
MOTIVATION Predicting drug response is critical for precision medicine. Diverse methods have predicted drug responsiveness, as measured by the half-maximal drug inhibitory concentration (IC50), in cultured cells. Although IC50s are continuous, traditional prediction models have dealt mainly with binary classification of responsiveness. However, since there are few regression-based IC50 predictions, comprehensive evaluations of regression-based IC50 prediction models, including machine learning (ML) and deep learning (DL), for diverse data types and dataset sizes, have not been addressed. RESULTS Here, we constructed 11 input data settings, including multi-omics settings, with varying dataset sizes, then evaluated the performance of regression-based ML and DL models to predict IC50s. DL models considered two convolutional neural network architectures: CDRScan and residual neural network (ResNet). ResNet was introduced in regression-based DL models for predicting drug response for the first time. As a result, DL models performed better than ML models in all the settings. Also, ResNet performed better than or comparable to CDRScan and ML models in all settings. AVAILABILITY AND IMPLEMENTATION The data underlying this article are available in GitHub at https://github.com/labnams/IC50evaluation. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
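The distinction the abstract draws between binary classification of responsiveness and regression on continuous IC50 values can be made concrete with a small sketch. The metrics shown (RMSE and Pearson correlation) are common choices for regression evaluation and are an assumption here, since the abstract does not name the metrics used. The threshold, function names, and values are likewise invented for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def binarize(ic50s, threshold):
    """Collapse continuous IC50s into responsive (True) / non-responsive (False)."""
    return [v < threshold for v in ic50s]


# Invented measured and predicted IC50 values for four cell lines.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(measured, predicted), pearson(measured, predicted))
print(binarize(measured, threshold=2.5))
```

Binarizing discards how far each value lies from the threshold, whereas a regression target keeps that information, which is the motivation for evaluating regression-based models directly.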
Affiliation(s)
- Aron Park
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
- Minjae Joo
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
- Won-Joon Son
  - Samsung Advanced Institute of Technology, Samsung Electronics, Suwon, Gyeonggi-do 16678, Korea
- GyuTae Lim
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
- Jinhyuk Lee
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
  - Department of Bioinformatics, University of Sciences and Technology, Daejeon 34113, Korea
- Jung Ho Kim
  - Department of Internal Medicine, Gachon University Gil Medical Center, Gachon University School of Medicine, Incheon 21565, Korea
- Dae Ho Lee
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
  - Department of Internal Medicine, Gachon University Gil Medical Center, Gachon University School of Medicine, Incheon 21565, Korea
- Seungyoon Nam
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon 21999, Korea
  - AI Convergence Center for Medical Science, Department of Genome Medicine and Science, Gachon University Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Korea
  - Department of Life Sciences, Gachon University, Seongnam, Gyeonggi-do 13120, Korea
123
Lee D, Kim S. Knowledge-guided artificial intelligence technologies for decoding complex multiomics interactions in cells. Clin Exp Pediatr 2022; 65:239-249. [PMID: 34844399 PMCID: PMC9082244 DOI: 10.3345/cep.2021.01438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 11/27/2022]
Abstract
Cells survive and proliferate through complex interactions among diverse molecules across multiomics layers. Conventional experimental approaches for identifying these interactions have built a firm foundation for molecular biology, but their scalability is gradually becoming inadequate compared to the rapid accumulation of multiomics data measured by high-throughput technologies. Therefore, the need for data-driven computational modeling of interactions within cells has been highlighted in recent years. The complexity of multiomics interactions is primarily due to their nonlinearity. That is, their accurate modeling requires intricate conditional dependencies, synergies, or antagonisms between considered genes or proteins, which retard experimental validations. Artificial intelligence (AI) technologies, including deep learning models, are optimal choices for handling complex nonlinear relationships between features that are scalable and produce large amounts of data. Thus, they have great potential for modeling multiomics interactions. Although there exist many AI-driven models for computational biology applications, relatively few explicitly incorporate the prior knowledge within model architectures or training procedures. Such guidance of models by domain knowledge will greatly reduce the amount of data needed to train models and constrain their vast expressive powers to focus on the biologically relevant space. Therefore, it can enhance a model's interpretability, reduce spurious interactions, and prove its validity and utility. Thus, to facilitate further development of knowledge-guided AI technologies for the modeling of multiomics interactions, here we review representative bioinformatics applications of deep learning models for multiomics interactions developed to date by categorizing them by guidance mode.
Affiliation(s)
- Dohoon Lee
  - Bioinformatics Institute, Seoul National University, Seoul, Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
  - Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
  - Institute of Engineering Research, Seoul National University, Seoul, Korea
  - AIGENDRUG Co., Ltd., Seoul, Korea
124
Kowald A, Barrantes I, Möller S, Palmer D, Murua Escobar H, Schwerk A, Fuellen G. Transfer learning of clinical outcomes from preclinical molecular data, principles and perspectives. Brief Bioinform 2022; 23:6572661. [PMID: 35453145 PMCID: PMC9116218 DOI: 10.1093/bib/bbac133] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/19/2021] [Revised: 02/16/2022] [Accepted: 03/21/2022] [Indexed: 01/14/2023]
Abstract
Accurate transfer learning of clinical outcomes from one cellular context to another, between cell types, developmental stages, omics modalities or species, is considered tremendously useful. When transferring a prediction task from a source domain to a target domain, what counts is the high quality of the predictions in the target domain, requiring states or processes common to both the source and the target that can be learned by the predictor reflected by shared denominators. These may form a compendium of knowledge that is learned in the source to enable predictions in the target, usually with few, if any, labeled target training samples to learn from. Transductive transfer learning refers to the learning of the predictor in the source domain, transferring its outcome label calculations to the target domain, considering the same task. Inductive transfer learning considers cases where the target predictor is performing a different yet related task as compared with the source predictor. Often, there is also a need to first map the variables in the input/feature spaces and/or the variables in the output/outcome spaces. We here discuss and juxtapose various recently published transfer learning approaches, specifically designed (or at least adaptable) to predict clinical (human in vivo) outcomes based on preclinical (mostly animal-based) molecular data, towards finding the right tool for a given task, and paving the way for a comprehensive and systematic comparison of the suitability and accuracy of transfer learning of clinical outcomes.
Affiliation(s)
- Axel Kowald
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany
- Israel Barrantes
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany
- Steffen Möller
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany
- Daniel Palmer
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany
- Hugo Murua Escobar
  - Department of Medicine, Clinic III, Hematology, Oncology, Palliative Medicine, Rostock University Medical Center, Rostock, Germany
- Georg Fuellen
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany
  - Centre for Transdisciplinary Neurosciences Rostock, Research Focus Oncology and Ageing of Individuals and Society, Interdisciplinary Faculty, Rostock, Germany
125
Moon S, Lee H. MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics 2022; 38:2287-2296. [PMID: 35157023 PMCID: PMC10060719 DOI: 10.1093/bioinformatics/btac080] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/29/2021] [Revised: 01/01/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Accurate diagnostic classification and biological interpretation are important in biology and medicine, which are data-rich sciences. Thus, integration of different data types is necessary for the high predictive accuracy of clinical phenotypes, and more comprehensive analyses for predicting the prognosis of complex diseases are required. RESULTS Here, we propose a novel multi-task attention learning algorithm for multi-omics data, termed MOMA, which captures important biological processes for high diagnostic performance and interpretability. MOMA vectorizes features and modules using a geometric approach and focuses on important modules in multi-omics data via an attention mechanism. Experiments using public data on Alzheimer's disease and cancer with various classification tasks demonstrated the superior performance of this approach. The utility of MOMA was also verified using a comparison experiment with an attention mechanism that was turned on or off and biological analysis. AVAILABILITY AND IMPLEMENTATION The source codes are available at https://github.com/dmcb-gist/MOMA. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Sehwan Moon
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
- Hyunju Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
126
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
  - Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
  - Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
  - Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
  - Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
  - Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
  - Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
  - Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
  - Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
127
Jiang L, Jiang C, Yu X, Fu R, Jin S, Liu X. DeepTTA: a transformer-based model for predicting cancer drug response. Brief Bioinform 2022; 23:6554594. [PMID: 35348595 DOI: 10.1093/bib/bbac100] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/23/2021] [Revised: 02/08/2022] [Accepted: 02/27/2022] [Indexed: 12/27/2022]
Abstract
Identifying new lead molecules to treat cancer requires more than a decade of dedicated effort. Before selected drug candidates are used in the clinic, their anti-cancer activity is generally validated by in vitro cellular experiments. Therefore, accurate prediction of cancer drug response is a critical and challenging task for anti-cancer drug design and precision medicine. With the development of pharmacogenomics, the combination of efficient drug feature extraction methods and omics data has made it possible to use computational models to assist in drug response prediction. In this study, we propose DeepTTA, a novel end-to-end deep learning model that utilizes a transformer for drug representation learning and a multilayer neural network for transcriptomic data prediction of the anti-cancer drug responses. Specifically, DeepTTA uses transcriptomic gene expression data and chemical substructures of drugs for drug response prediction. Compared to existing methods, DeepTTA achieved higher performance in terms of root mean square error, Pearson correlation coefficient and Spearman's rank correlation coefficient on multiple test sets. Moreover, we discovered that anti-cancer drugs bortezomib and dactinomycin provide a potential therapeutic option with multiple clinical indications. With its excellent performance, DeepTTA is expected to be an effective method in cancer drug design.
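The three evaluation metrics named in this abstract (root mean square error, Pearson correlation and Spearman's rank correlation) can be illustrated with a minimal standard-library sketch. This is not the DeepTTA code; the function names and the `observed`/`predicted` response values below are hypothetical and chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted responses."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical drug response values (for example, log-transformed IC50s).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.8, 3.2, 5.0]

print("RMSE:", rmse(observed, predicted))
print("Pearson:", pearson(observed, predicted))
print("Spearman:", spearman(observed, predicted))
```

RMSE penalizes absolute errors, while the two correlations reward getting the trend (Pearson) or only the ordering (Spearman) right, which is why drug response studies often report all three.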
Affiliation(s)
- Likun Jiang
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Changzhi Jiang
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xinyu Yu
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Rao Fu
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Shuting Jin
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Xiangrong Liu
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
128
Wang Z, Wang Z, Huang Y, Lu L, Fu Y. A multi-view multi-omics model for cancer drug response prediction. Appl Intell 2022. [DOI: 10.1007/s10489-022-03294-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
129
Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022; 23:bbab569. [PMID: 35089332 PMCID: PMC8921642 DOI: 10.1093/bib/bbab569] [Citation(s) in RCA: 163] [Impact Index Per Article: 54.3] [Received: 10/21/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 02/06/2023]
Abstract
Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
Affiliation(s)
- Jane Synnergren
  - Systems Biology Research Center, University of Skövde, Sweden
130
Nguyen GTT, Vu HD, Le DH. Integrating Molecular Graph Data of Drugs and Multiple -Omic Data of Cell Lines for Drug Response Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:710-717. [PMID: 34260355 DOI: 10.1109/tcbb.2021.3096960] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Previous studies have either learned drug's features from their string or numeric representations, which are not natural forms of drugs, or only used genomic data of cell lines for the drug response prediction problem. Here, we proposed a deep learning model, GraOmicDRP, to learn drug's features from their graph representation and integrate multiple -omic data of cell lines. In GraOmicDRP, drugs are represented as graphs of bindings among atoms; meanwhile, cell lines are depicted by not only genomic but also transcriptomic and epigenomic data. Graph convolutional and convolutional neural networks were used to learn the representation of drugs and cell lines, respectively. A combination of the two representations was then used to be representative of each pair of drug-cell line. Finally, the response value of each pair was predicted by a fully connected network. Experimental results indicate that transcriptomic data shows the best among single -omic data; meanwhile, the combinations of transcriptomic and other -omic data achieved the best performance overall in terms of both Root Mean Square Error and Pearson correlation coefficient. In addition, we also show that GraOmicDRP outperforms some state-of-the-art methods, including ones integrating -omic data with drug information such as GraphDRP, and ones using -omic data without drug information such as DeepDR and MOLI.
131
Abstract
Multi-omics data analysis is an important aspect of cancer molecular biology studies and has led to ground-breaking discoveries. Many efforts have been made to develop machine learning methods that automatically integrate omics data. Here, we review machine learning tools categorized as either general-purpose or task-specific, covering both supervised and unsupervised learning for integrative analysis of multi-omics data. We benchmark the performance of five machine learning approaches using data from the Cancer Cell Line Encyclopedia, reporting accuracy on cancer type classification and mean absolute error on drug response prediction, and evaluating runtime efficiency. This review provides recommendations to researchers regarding suitable machine learning method selection for their specific applications. It should also promote the development of novel machine learning methodologies for data integration, which will be essential for drug discovery, clinical trial design, and personalized treatments.
Affiliation(s)
- Zhaoxiang Cai
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
- Rebecca C. Poulos
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
- Jia Liu
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
  - Faculty of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Qing Zhong
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
132
Su R, Huang Y, Zhang DG, Xiao G, Wei L. SRDFM: Siamese Response Deep Factorization Machine to improve anti-cancer drug recommendation. Brief Bioinform 2022; 23:6501725. [DOI: 10.1093/bib/bbab534] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/24/2021] [Revised: 10/31/2021] [Accepted: 11/17/2021] [Indexed: 01/09/2023]
Abstract
Predicting the response of cancer patients to a particular treatment is a major goal of modern oncology and an important step toward personalized treatment. In clinical practice, clinicians prefer to obtain the most suitable drugs for a particular patient rather than the exact values of drug sensitivity. Instead of predicting the exact value of drug response, we proposed a deep learning-based method, named Siamese Response Deep Factorization Machines (SRDFM) Network, for personalized anti-cancer drug recommendation, which directly ranks the drugs and provides the most effective drugs. A Siamese network (SN), a type of deep learning network that is composed of identical subnetworks that share the same architecture, parameters and weights, was used to measure the relative position (RP) between drugs for each cell line. By minimizing the difference between the real RP and the predicted RP, an optimal SN model was established to provide the rank for all the candidate drugs. Specifically, the subnetwork on each side of the SN consists of a feature generation level and a predictor construction level. On the feature generation level, both drug properties and gene expression were adopted to build a concatenated feature vector, which even enables recommendation for newly designed drugs for which only chemical properties are known. In particular, we developed a response unit to generate a weighted genetic feature vector that simulates the biological interaction mechanism between a specific drug and the genes. The predictor construction level integrates a factorization machine (FM) component with a deep neural network component. The FM handles the discrete chemical information well, and both low-order and high-order feature interactions can be sufficiently learned. The SRDFM works well on both single-drug recommendation and synergistic drug combinations.
Experimental results on both single-drug and synergistic drug data sets have shown the efficiency of the SRDFM. The Python implementation of the proposed SRDFM is available at https://github.com/RanSuLab/SRDFM. Contact: ran.su@tju.edu.cn, gbx@mju.edu.cn and weileyi@sdu.edu.cn.
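The idea of learning a relative position (RP) between two drugs with shared weights, then ranking all candidates, can be sketched in a few lines. This is a loose illustration and not the SRDFM implementation: the linear `score` function stands in for the Siamese subnetwork, the sigmoid form of `predicted_rp` and the squared-error `pairwise_loss` are assumptions, and the drug names, features and IC50 values are hypothetical.

```python
import math
from itertools import combinations

def score(weights, features):
    """Stand-in for one Siamese branch; both drugs pass through the same weights."""
    return sum(w * f for w, f in zip(weights, features))

def predicted_rp(weights, feat_a, feat_b):
    """Predicted relative position: close to 1 when drug a is scored above drug b."""
    diff = score(weights, feat_a) - score(weights, feat_b)
    return 1.0 / (1.0 + math.exp(-diff))

def real_rp(ic50_a, ic50_b):
    """Observed relative position: 1 if a is more potent (lower IC50), 0 if less, 0.5 on ties."""
    if ic50_a < ic50_b:
        return 1.0
    if ic50_a > ic50_b:
        return 0.0
    return 0.5

def pairwise_loss(weights, drugs):
    """Mean squared difference between real and predicted RP over all drug pairs."""
    pairs = list(combinations(sorted(drugs), 2))
    total = 0.0
    for a, b in pairs:
        pred = predicted_rp(weights, drugs[a]["features"], drugs[b]["features"])
        total += (real_rp(drugs[a]["ic50"], drugs[b]["ic50"]) - pred) ** 2
    return total / len(pairs)

def rank_drugs(weights, drugs):
    """Order candidate drugs from most to least recommended for one cell line."""
    return sorted(drugs, key=lambda name: score(weights, drugs[name]["features"]), reverse=True)

# Hypothetical feature vectors and IC50 values for a single cell line.
drugs = {
    "drug_A": {"features": [1.0, 0.0], "ic50": 0.5},
    "drug_B": {"features": [0.0, 1.0], "ic50": 2.0},
    "drug_C": {"features": [0.5, 0.5], "ic50": 1.0},
}
weights = [2.0, -1.0]  # hand-picked for the example, not trained

print(rank_drugs(weights, drugs))
print(pairwise_loss(weights, drugs))
```

Because the loss depends only on pairwise order, a model trained this way can recommend the top drugs without predicting exact sensitivity values, which matches the motivation stated in the abstract.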
133
Kang M, Ko E, Mersha TB. A roadmap for multi-omics data integration using deep learning. Brief Bioinform 2022; 23:bbab454. [PMID: 34791014 PMCID: PMC8769688 DOI: 10.1093/bib/bbab454] [Citation(s) in RCA: 142] [Impact Index Per Article: 47.3] [Received: 07/27/2021] [Revised: 09/30/2021] [Accepted: 10/05/2021] [Indexed: 12/18/2022]
Abstract
High-throughput next-generation sequencing now makes it possible to generate a vast amount of multi-omics data for various applications. These data have revolutionized biomedical research by providing a more comprehensive understanding of the biological systems and molecular mechanisms of disease development. Recently, deep learning (DL) algorithms have become one of the most promising methods in multi-omics data analysis, due to their predictive performance and capability of capturing nonlinear and hierarchical features. While integrating and translating multi-omics data into useful functional insights remain the biggest bottleneck, there is a clear trend towards incorporating multi-omics analysis in biomedical research to help explain the complex relationships between molecular layers. Multi-omics data have a role to improve prevention, early detection and prediction; monitor progression; interpret patterns and endotyping; and design personalized treatments. In this review, we outline a roadmap of multi-omics integration using DL and offer a practical perspective into the advantages, challenges and barriers to the implementation of DL in multi-omics data.
Affiliation(s)
- Mingon Kang
  - Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
- Euiseong Ko
  - Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
- Tesfaye B Mersha
  - Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
134
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022]
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
  - Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
135
Zhu Y, Ouyang Z, Chen W, Feng R, Chen DZ, Cao J, Wu J. TGSA: protein-protein association-based twin graph neural networks for drug response prediction with similarity augmentation. Bioinformatics 2022; 38:461-468. [PMID: 34559177 DOI: 10.1093/bioinformatics/btab650] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/17/2021] [Revised: 08/16/2021] [Accepted: 09/24/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Drug response prediction (DRP) plays an important role in precision medicine (e.g. for cancer analysis and treatment). Recent advances in deep learning algorithms make it possible to predict drug responses accurately based on genetic profiles. However, existing methods ignore the potential relationships among genes. In addition, similarity among cell lines/drugs was rarely considered explicitly. RESULTS We propose a novel DRP framework, called TGSA, to make better use of prior domain knowledge. TGSA consists of Twin Graph neural networks for Drug Response Prediction (TGDRP) and a Similarity Augmentation (SA) module to fuse fine-grained and coarse-grained information. Specifically, TGDRP abstracts cell lines as graphs based on STRING protein-protein association networks and uses Graph Neural Networks (GNNs) for representation learning. SA views DRP as an edge regression problem on a heterogeneous graph and utilizes GNNs to smooth the representations of similar cell lines/drugs. Besides, we introduce an auxiliary pre-training strategy to remedy the identified limitations of scarce data and poor out-of-distribution generalization. Extensive experiments on the GDSC2 dataset demonstrate that our TGSA consistently outperforms all the state-of-the-art baselines under various experimental settings. We further evaluate the effectiveness and contributions of each component of TGSA via ablation experiments. The promising performance of TGSA shows enormous potential for clinical applications in precision medicine. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/violet-sto/TGSA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yiheng Zhu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China
- Zhenqiu Ouyang
  - Polytechnic Institute, Zhejiang University, Hangzhou 310000, China
- Wenbo Chen
  - Polytechnic Institute, Zhejiang University, Hangzhou 310000, China
- Ruiwei Feng
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Ji Cao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310000, China
- Jian Wu
  - Department of Ophthalmology of the Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310000, China
136
Viaud G, Mayilvahanan P, Cournede PH. Representation Learning for the Clustering of Multi-Omics Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:135-145. [PMID: 33600320 DOI: 10.1109/tcbb.2021.3060340] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
The integration of several sources of data for the identification of subtypes of diseases has gained attention over the past few years. The heterogeneity and the high dimensions of the data sets calls for an adequate representation of the data. We summarize the field of representation learning for the multi-omics clustering problem and we investigate several techniques to learn relevant combined representations, using methods from group factor analysis (PCA, MFA and extensions) and from machine learning with autoencoders. We highlight the importance of appropriately designing and training the latter, notably with a novel combination of a disjointed deep autoencoder (DDAE) architecture and a layer-wise reconstruction loss. These different representations can then be clustered to identify biologically meaningful clusters of patients. We provide a unifying framework for model comparison between statistical and deep learning approaches with the introduction of a new weighted internal clustering index that evaluates how well the clustering information is retained from each source, favoring contributions from all data sets. We apply our methodology to two case studies for which previous works of integrative clustering exist, TCGA Breast Cancer and TARGET Neuroblastoma, and show how our method can yield good and well-balanced clusters across the different data sources.
|
137
|
AIM in Genomic Basis of Medicine: Applications. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
138
|
Correa R, Alonso-Pupo N, Hernández Rodríguez EW. Multi-omics data integration approaches for precision oncology. Mol Omics 2022; 18:469-479. [DOI: 10.1039/d1mo00411e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
Next-generation sequencing (NGS) has been pivotal to enhance the molecular characterization of human malignancies, allowing multiple omics data types to be available for cancer researchers and practitioners. In this context,...
|
139
|
Vijayakumar S, Magazzù G, Moon P, Occhipinti A, Angione C. A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling. Methods Mol Biol 2022; 2399:87-122. [PMID: 35604554 DOI: 10.1007/978-1-0716-1831-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative systems biology led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM .
Affiliation(s)
- Supreeta Vijayakumar
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
| | - Giuseppe Magazzù
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
| | - Pradip Moon
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
| | - Annalisa Occhipinti
- Computational Systems Biology and Data Analytics Research Group, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
| | - Claudio Angione
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK.
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK.
- Healthcare Innovation Centre, Teesside University, Middlesbrough, UK.
| |
|
140
|
Ahmed KT, Sun J, Cheng S, Yong J, Zhang W. Multi-omics data integration by generative adversarial network. Bioinformatics 2021; 38:179-186. [PMID: 34415323 PMCID: PMC10060730 DOI: 10.1093/bioinformatics/btab608] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 06/08/2021] [Revised: 07/27/2021] [Accepted: 08/18/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high throughput technologies, more comprehensive multi-omics data is now available that can effectively link the genotype to phenotype. However, the interactive relation of multi-omics datasets makes it particularly challenging to incorporate different biological layers to discover the coherent biological signatures and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network model to integrate two omics data and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuses them to generate synthetic data with better predictive signals. RESULTS Large-scale experiments on The Cancer Genome Atlas breast cancer, lung cancer and ovarian cancer datasets validate that (i) the model can effectively integrate two omics data (e.g. mRNA and microRNA expression data) and their interaction network (e.g. microRNA-mRNA interaction network). The synthetic omics data generated by the proposed model has a better performance on cancer outcome classification and patient survival prediction compared to original omics datasets. (ii) The integrity of the interaction network plays a vital role in the generation of synthetic data with higher predictive quality. Using a random interaction network does not allow the framework to learn meaningful information from the omics datasets and therefore results in synthetic data with weaker predictive signals. AVAILABILITY AND IMPLEMENTATION Source code is available at: https://github.com/CompbioLabUCF/omicsGAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Khandakar Tanvir Ahmed
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Jiao Sun
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Sze Cheng
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| | - Jeongsik Yong
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| | - Wei Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| |
|
141
|
Mourragui SMC, Loog M, Vis DJ, Moore K, Manjon AG, van de Wiel MA, Reinders MJT, Wessels LFA. Predicting patient response with models trained on cell lines and patient-derived xenografts by nonlinear transfer learning. Proc Natl Acad Sci U S A 2021; 118:e2106682118. [PMID: 34873056 PMCID: PMC8670522 DOI: 10.1073/pnas.2106682118] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Accepted: 10/18/2021] [Indexed: 12/13/2022] Open
Abstract
Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors. TRANSACT performs favorably compared to four competing approaches, including two deep learning approaches, on a set of 23 drug prediction challenges on The Cancer Genome Atlas and 226 metastatic tumors from the Hartwig Medical Foundation. We demonstrate that response predictions deliver a robust performance for a number of therapies of high clinical importance: platinum-based chemotherapies, gemcitabine, and paclitaxel. In contrast to other approaches, we demonstrate the interpretability of the TRANSACT predictors by correctly identifying known biomarkers of targeted therapies, and we propose potential mechanisms that mediate the resistance to two chemotherapeutic agents.
Affiliation(s)
- Soufiane M C Mourragui
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands
| | - Marco Loog
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
| | - Daniel J Vis
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Kat Moore
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Anna G Manjon
- Division of Cell Biology, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Mark A van de Wiel
- Epidemiology and Biostatistics, Amsterdam University Medical Center, 1105 AZ Amsterdam, The Netherlands
- Medical Research Council Biostatistics Unit, Cambridge University, Cambridge CB2 0SR, United Kingdom
| | - Marcel J T Reinders
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
| | - Lodewyk F A Wessels
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 XE Delft, The Netherlands
| |
|
142
|
Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00408-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
|
143
|
Demirel HC, Arici MK, Tuncbag N. Computational approaches leveraging integrated connections of multi-omic data toward clinical applications. Mol Omics 2021; 18:7-18. [PMID: 34734935 DOI: 10.1039/d1mo00158b] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]
Abstract
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the neighborhood of the altered gene or protein; rather, the impact diffuses in the network and changes the functionality of multiple signaling pathways and regulation of the gene expression. Additionally, multi-omic data is high-dimensional and has background noise. Several integrative approaches have been developed to accurately interpret the multi-omic datasets, including machine learning, network-based methods, and their combination. In this review, we overview the most recent integrative approaches and tools with a focus on network-based methods. We then discuss these approaches according to their specific applications, from disease-network and biomarker identification to patient stratification, drug discovery, and repurposing.
Affiliation(s)
- Habibe Cansu Demirel
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
| | - Muslum Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, 06044, Turkey
| | - Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, 34450, Turkey
- School of Medicine, Koc University, Istanbul, 34450, Turkey
- Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey
| |
|
144
|
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022] Open
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
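As an illustration of the PU setting described in this abstract, the sketch below applies one well-known correction, the Elkan and Noto (2008) label-frequency adjustment. It is chosen here only as an example and is not necessarily one of the methods the review covers. It assumes a separate classifier has already produced scores estimating P(labeled | x); the function names and the toy scores are hypothetical.

```python
def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = P(labeled | positive) as the mean score on known positives."""
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def pu_posterior(score, label_frequency):
    """Convert P(labeled | x) into an estimate of P(positive | x).

    Under the 'selected completely at random' assumption,
    P(positive | x) = P(labeled | x) / c. The result is clipped to 1.0
    because estimation noise can push the ratio above one.
    """
    return min(1.0, score / label_frequency)


# Toy scores (hypothetical) from a labeled-vs-unlabeled classifier.
held_out_positive_scores = [0.4, 0.6, 0.5]
c = estimate_label_frequency(held_out_positive_scores)

unlabeled_scores = [0.25, 0.9, 0.05]
posteriors = [pu_posterior(s, c) for s in unlabeled_scores]
```

The correction only rescales scores, so the ranking of unlabeled samples is unchanged; what changes is the decision threshold and the calibrated probability.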
Affiliation(s)
- Fuyi Li
- Monash University, Australia
| | | | - André Leier
- Department of Genetics, UAB School of Medicine, USA
| | - Meiya Han
- Department of Biochemistry and Molecular Biology, Monash University, Australia
| | | | - Jing Xu
- Computer Science and Technology from Nankai University, China
| | - Xiaoyu Wang
- Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
| | - Shirui Pan
- University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Cangzhi Jia
- College of Science, Dalian Maritime University, China
| | - Yang Zhang
- Northwestern Polytechnical University, China
| | - Geoffrey I Webb
- Faculty of Information Technology at Monash University, Australia
| | - Lachlan J M Coin
- Department of Clinical Pathology, University of Melbourne, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
| |
|
145
|
Liu X, Song C, Huang F, Fu H, Xiao W, Zhang W. GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform 2021; 23:6415314. [PMID: 34727569 DOI: 10.1093/bib/bbab457] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 07/10/2021] [Revised: 09/25/2021] [Accepted: 10/07/2021] [Indexed: 12/29/2022] Open
Abstract
Predicting the response of a cancer cell line to a therapeutic drug is an important topic in modern oncology that can help personalized treatment for cancers. Although numerous machine learning methods have been developed for cancer drug response (CDR) prediction, integrating diverse information about cancer cell lines, drugs and their known responses still remains a great challenge. In this paper, we propose a graph neural network method with contrastive learning for CDR prediction. GraphCDR constructs a graph neural network based on multi-omics profiles of cancer cell lines, the chemical structure of drugs and known cancer cell line-drug responses for CDR prediction, while a contrastive learning task is presented as a regularizer within a multi-task learning paradigm to enhance the generalization ability. In the computational experiments, GraphCDR outperforms state-of-the-art methods under different experimental configurations, and the ablation study reveals the key components of GraphCDR: biological features, known cancer cell line-drug responses and contrastive learning are important for the high-accuracy CDR prediction. The experimental analyses imply the predictive power of GraphCDR and its potential value in guiding anti-cancer drug selection.
Affiliation(s)
- Xuan Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Congzhi Song
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Feng Huang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Haitao Fu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wenjie Xiao
- Information School, University of Washington, Washington, 98105, USA
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| |
|
146
|
Rydzewski NR, Peterson E, Lang JM, Yu M, Laura Chang S, Sjöström M, Bakhtiar H, Song G, Helzer KT, Bootsma ML, Chen WS, Shrestha RM, Zhang M, Quigley DA, Aggarwal R, Small EJ, Wahl DR, Feng FY, Zhao SG. Predicting cancer drug TARGETS - TreAtment Response Generalized Elastic-neT Signatures. NPJ Genom Med 2021; 6:76. [PMID: 34548481 PMCID: PMC8455625 DOI: 10.1038/s41525-021-00239-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/24/2021] [Accepted: 08/23/2021] [Indexed: 12/14/2022] Open
Abstract
We are now in an era of molecular medicine, where specific DNA alterations can be used to identify patients who will respond to specific drugs. However, there are only a handful of clinically used predictive biomarkers in oncology. Herein, we describe an approach utilizing in vitro DNA and RNA sequencing and drug response data to create TreAtment Response Generalized Elastic-neT Signatures (TARGETS). We trained TARGETS drug response models using Elastic-Net regression in the publicly available Genomics of Drug Sensitivity in Cancer (GDSC) database. Models were then validated on additional in-vitro data from the Cancer Cell Line Encyclopedia (CCLE), and on clinical samples from The Cancer Genome Atlas (TCGA) and Stand Up to Cancer/Prostate Cancer Foundation West Coast Prostate Cancer Dream Team (WCDT). First, we demonstrated that all TARGETS models successfully predicted treatment response in the separate in-vitro CCLE treatment response dataset. Next, we evaluated all FDA-approved biomarker-based cancer drug indications in TCGA and demonstrated that TARGETS predictions were concordant with established clinical indications. Finally, we performed independent clinical validation in the WCDT and found that the TARGETS AR signaling inhibitors (ARSI) signature successfully predicted clinical treatment response in metastatic castration-resistant prostate cancer with a statistically significant interaction between the TARGETS score and PSA response (p = 0.0252). TARGETS represents a pan-cancer, platform-independent approach to predict response to oncologic therapies and could be used as a tool to better select patients for existing therapies as well as identify new indications for testing in prospective clinical trials.
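The abstract above names Elastic-Net regression as the model family behind the TARGETS signatures. The following is a minimal pure-Python sketch of elastic-net fitting by coordinate descent, using the common objective (1/2n)||y - Xw||² + α·ρ·||w||₁ + ½·α·(1 - ρ)·||w||². It is not the authors' implementation: the function names, the absence of an intercept, and the toy data are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, alpha=1.0, l1_ratio=0.5, n_iter=200):
    """Fit elastic-net weights (no intercept) by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            denom = z + alpha * (1.0 - l1_ratio)
            # Closed-form minimiser of the one-dimensional subproblem.
            w[j] = soft_threshold(rho, alpha * l1_ratio) / denom if denom > 0 else 0.0
    return w


# Toy data (hypothetical): y = 1*x0 + 2*x1 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

w_unpenalised = elastic_net_fit(X, y, alpha=0.0)   # reduces to least squares
w_heavy = elastic_net_fit(X, y, alpha=100.0)       # strong penalty zeroes all weights
```

The L1 part of the penalty is what drives weights to exactly zero and yields a sparse signature, while the L2 part stabilises the fit when features are correlated, as gene-level features often are.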
Affiliation(s)
| | - Erik Peterson
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
| | - Joshua M Lang
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
| | - Menggang Yu
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
| | - S Laura Chang
- Department of Radiation Oncology, UCSF, San Francisco, CA, USA
| | - Martin Sjöström
- Department of Radiation Oncology, UCSF, San Francisco, CA, USA
| | - Hamza Bakhtiar
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
| | - Gefei Song
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
| | - Kyle T Helzer
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
| | - Matthew L Bootsma
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
| | - William S Chen
- Department of Radiation Oncology, UCSF, San Francisco, CA, USA
| | | | - Meng Zhang
- Department of Radiation Oncology, UCSF, San Francisco, CA, USA
| | - David A Quigley
- Helen Diller Family Comprehensive Cancer Center, UCSF, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, UCSF, San Francisco, CA, USA
| | - Rahul Aggarwal
- Helen Diller Family Comprehensive Cancer Center, UCSF, San Francisco, CA, USA
- Division of Hematology and Oncology, Department of Medicine, UCSF, San Francisco, CA, USA
| | - Eric J Small
- Helen Diller Family Comprehensive Cancer Center, UCSF, San Francisco, CA, USA
- Division of Hematology and Oncology, Department of Medicine, UCSF, San Francisco, CA, USA
| | - Daniel R Wahl
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
| | - Felix Y Feng
- Department of Radiation Oncology, UCSF, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, UCSF, San Francisco, CA, USA
- Division of Hematology and Oncology, Department of Medicine, UCSF, San Francisco, CA, USA
- Department of Urology, UCSF, San Francisco, CA, USA
| | - Shuang G Zhao
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA.
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA.
- William S. Middleton Memorial Veterans Hospital, Madison, WI, USA.
| |
|
147
|
Chen Y, Zhang L. How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Brief Bioinform 2021; 23:6370847. [PMID: 34529029 DOI: 10.1093/bib/bbab378] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/11/2021] [Revised: 08/21/2021] [Accepted: 08/24/2021] [Indexed: 12/24/2022] Open
Abstract
The drug response prediction problem arises from personalized medicine and drug discovery. Deep neural networks have been applied to the multi-omics data available for over 1000 cancer cell lines and tissues for better drug response prediction. We summarize and examine state-of-the-art deep learning methods that have been published recently. Although significant progress has been made in deep learning approaches to drug response prediction, deep learning methods show their weakness in predicting the response of a drug that does not appear in the training dataset. In particular, all five evaluated deep learning methods performed worse than the similarity-regularized matrix factorization (SRMF) method in our drug-blind test. We outline the challenges in applying deep learning approaches to drug response prediction and suggest unique opportunities for deep learning integrated with established bioinformatics analyses to overcome some of these challenges.
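The drug-blind evaluation mentioned in this abstract holds out whole drugs, so that no test drug appears in training. A minimal sketch of such a split is given below; the record layout `(cell_line, drug, response)`, the function name, and the toy identifiers are assumptions for illustration and do not reproduce the authors' benchmark.

```python
import random


def drug_blind_split(records, test_fraction=0.2, seed=0):
    """Split (cell_line, drug, response) records so test drugs never occur in training."""
    drugs = sorted({drug for _, drug, _ in records})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    # Hold out at least one drug, even for very small drug panels.
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [r for r in records if r[1] not in test_drugs]
    test = [r for r in records if r[1] in test_drugs]
    return train, test


# Toy panel (hypothetical): 3 cell lines x 5 drugs, responses set to 0.0.
records = [
    (cell, drug, 0.0)
    for cell in ("cellA", "cellB", "cellC")
    for drug in ("drug1", "drug2", "drug3", "drug4", "drug5")
]
train, test = drug_blind_split(records)
```

A random split over individual (cell line, drug) pairs would leak every drug into training, which is why it tends to give more optimistic scores than a drug-blind split.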
Affiliation(s)
- Yurui Chen
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| | - Louxin Zhang
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| |
|
148
|
Gillenwater LA, Helmi S, Stene E, Pratte KA, Zhuang Y, Schuyler RP, Lange L, Castaldi PJ, Hersh CP, Banaei-Kashani F, Bowler RP, Kechris KJ. Multi-omics subtyping pipeline for chronic obstructive pulmonary disease. PLoS One 2021; 16:e0255337. [PMID: 34432807 PMCID: PMC8386883 DOI: 10.1371/journal.pone.0255337] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/16/2020] [Accepted: 07/14/2021] [Indexed: 11/25/2022] Open
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is the third leading cause of mortality in the United States; however, COPD has heterogeneous clinical phenotypes. This is the first large scale attempt which uses transcriptomics, proteomics, and metabolomics (multi-omics) to determine whether there are molecularly defined clusters with distinct clinical phenotypes that may underlie the clinical heterogeneity. The study included 3,278 subjects from the COPDGene cohort with at least one of the following profiles: whole blood transcriptomes (2,650 subjects); plasma proteomes (1,013 subjects); and plasma metabolomes (1,136 subjects). 489 subjects had all three contemporaneous -omics profiles. Autoencoder embeddings were performed individually for each -omics dataset. Embeddings underwent subspace clustering using MineClus, either individually by -omics or combined, followed by recursive feature selection based on Support Vector Machines. Clusters were tested for associations with clinical variables. Optimal single -omics clustering typically resulted in two clusters. Although there was overlap for individual -omics cluster membership, each -omics cluster tended to be defined by unique molecular pathways. For example, prominent molecular features of the metabolome-based clustering included sphingomyelin, while key molecular features of the transcriptome-based clusters were related to immune and bacterial responses. We also found that when we integrated the -omics data at a later stage, we identified subtypes that varied based on age, severity of disease, diffusing capacity of the lungs for carbon monoxide, and percentage with atrial fibrillation. In contrast, when we integrated the -omics data at an earlier stage by treating all data sets equally, there were no clinical differences between subtypes. Similar to clinical clustering, which has revealed multiple heterogeneous clinical phenotypes, we show that transcriptomics, proteomics, and metabolomics tend to define clusters of COPD patients with different clinical characteristics. Thus, integrating these different -omics data sets affords additional insight into the molecular nature of COPD and its heterogeneity.
Affiliation(s)
| | - Shahab Helmi
- Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States of America
| | - Evan Stene
- Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States of America
| | | | - Yonghua Zhuang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| | - Ronald P. Schuyler
- Department of Immunology & Microbiology, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States of America
| | - Leslie Lange
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| | - Peter J. Castaldi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
| | - Craig P. Hersh
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
| | - Farnoush Banaei-Kashani
- Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, United States of America
| | | | - Katerina J. Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| |
|
149
|
He D, Xie L. A cross-level information transmission network for hierarchical omics data integration and phenotype prediction from a new genotype. Bioinformatics 2021; 38:204-210. [PMID: 34390577 PMCID: PMC8696111 DOI: 10.1093/bioinformatics/btab580] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/31/2021] [Revised: 07/19/2021] [Accepted: 08/12/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION An unsolved fundamental problem in biology is to predict phenotypes from a new genotype under environmental perturbations. The emergence of multiple omics data provides new opportunities but imposes great challenges in the predictive modeling of genotype-phenotype associations. Firstly, the high-dimensionality of genomics data and the lack of coherent labeled data often make the existing supervised learning techniques less successful. Secondly, it is challenging to integrate heterogeneous omics data from different resources. Finally, few works have explicitly modeled the information transmission from DNA to phenotype, which involves multiple intermediate molecular types. Higher-level features (e.g. gene expression) usually have stronger discriminative and interpretable power than lower-level features (e.g. somatic mutation). RESULTS We propose a novel Cross-LEvel Information Transmission (CLEIT) network framework to address the above issues. CLEIT aims to represent the asymmetrical multi-level organization of the biological system by integrating multiple incoherent omics data and to improve the prediction power of low-level features. CLEIT first learns the latent representation of the high-level domain then uses it as ground-truth embedding to improve the representation learning of the low-level domain in the form of contrastive loss. Besides, CLEIT can leverage the unlabeled heterogeneous omics data to improve the generalizability of the predictive model. We demonstrate the effectiveness and significant performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations via the assistance of gene expressions when compared with state-of-the-art methods. CLEIT provides a general framework to model information transmissions and integrate multi-modal data in a multi-level system. AVAILABILITY AND IMPLEMENTATION The source code is freely available at https://github.com/XieResearchGroup/CLEIT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Di He
  - PhD Program in Computer Science, Graduate Center, City University of New York, New York, NY 10016, USA
- Lei Xie
  - To whom correspondence should be addressed.
|
150
|
Sharifi-Noghabi H, Jahangiri-Tazehkand S, Smirnov P, Hon C, Mammoliti A, Nair SK, Mer AS, Ester M, Haibe-Kains B. Drug sensitivity prediction from cell line-based pharmacogenomics data: guidelines for developing machine learning models. Brief Bioinform 2021; 22:6348324. [PMID: 34382071 DOI: 10.1093/bib/bbab294] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 04/12/2021] [Revised: 06/29/2021] [Accepted: 07/10/2021] [Indexed: 11/13/2022]
Abstract
The goal of precision oncology is to tailor treatment for patients individually using the genomic profile of their tumors. Pharmacogenomics datasets such as cancer cell lines are among the most valuable resources for drug sensitivity prediction, a crucial task of precision oncology. Machine learning methods have been employed to predict drug sensitivity based on the multiple omics data available for large panels of cancer cell lines. However, there are no comprehensive guidelines on how to properly train and validate such machine learning models for drug sensitivity prediction. In this paper, we introduce a set of guidelines for different aspects of training gene expression-based predictors using cell line datasets. These guidelines provide extensive analysis of the generalization of drug sensitivity predictors and challenge many current practices in the community including the choice of training dataset and measure of drug sensitivity. The application of these guidelines in future studies will enable the development of more robust preclinical biomarkers.
Affiliation(s)
- Hossein Sharifi-Noghabi
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada; Vancouver Prostate Center, Vancouver, British Columbia, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Soheil Jahangiri-Tazehkand
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Petr Smirnov
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Casey Hon
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Anthony Mammoliti
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Arvind Singh Mer
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Martin Ester
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada; Vancouver Prostate Center, Vancouver, British Columbia, Canada
- Benjamin Haibe-Kains
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, Toronto, Ontario, Canada; Ontario Institute for Cancer Research, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
|