1
Huang K, Liu H. Identification of drug-resistant individual cells within tumors by semi-supervised transfer learning from bulk to single-cell transcriptome. Commun Biol 2025; 8:530. [PMID: 40164749 PMCID: PMC11958800 DOI: 10.1038/s42003-025-07959-3] [Received: 09/25/2024] [Accepted: 03/19/2025] [Indexed: 04/02/2025]
Abstract
The presence of pre-existing or acquired drug-resistant cells within the tumor often leads to tumor relapse and metastasis. Single-cell RNA sequencing (scRNA-seq) enables elucidation of the subtle differences in drug responsiveness among distinct cell subpopulations within tumors. A few methods have employed scRNA-seq data to predict the drug response of individual cells to date, but their performance is far from satisfactory. In this study, we propose SSDA4Drug, a semi-supervised few-shot transfer learning method for inferring drug-resistant cancer cells. SSDA4Drug extracts pharmacogenomic features from both bulk and single-cell transcriptomic data using semi-supervised adversarial domain adaptation. This allows us to transfer knowledge of drug sensitivity from bulk-level cell lines to single cells. We conduct extensive performance evaluation experiments across multiple independent scRNA-seq datasets, demonstrating SSDA4Drug's superior performance over current state-of-the-art methods. Remarkably, with only one or two labeled target-domain samples, SSDA4Drug significantly boosts the predictive performance of single-cell drug responses. Moreover, SSDA4Drug accurately recapitulates the temporally dynamic changes of drug responses during continuous drug exposure of tumor cells, and successfully identifies reversible drug-responsive states in lung cancer cells, which initially acquire resistance through drug exposure but later restore sensitivity during drug holidays. Also, our predicted drug responses consistently align with the developmental patterns of drug sensitivity observed along the evolutionary trajectory of oral squamous cell carcinoma cells. In addition, our derived SHAP values and integrated gradients effectively pinpoint the key genes involved in drug resistance in prostate cancer cells. These findings highlight the exceptional performance of our method in determining single-cell drug responses. 
This powerful tool holds the potential for identifying drug-resistant tumor cell subpopulations, paving the way for advancements in precision medicine and novel drug development.
Affiliation(s)
- Kaishun Huang
  - College of Computer and Information Engineering, Nanjing Tech University, Nanjing, 211800, Jiangsu, China
- Hui Liu
  - College of Computer and Information Engineering, Nanjing Tech University, Nanjing, 211800, Jiangsu, China
2
Codicè F, Pancotti C, Rollo C, Moreau Y, Fariselli P, Raimondi D. The specification game: rethinking the evaluation of drug response prediction for precision oncology. J Cheminform 2025; 17:33. [PMID: 40087708 PMCID: PMC11907791 DOI: 10.1186/s13321-025-00972-y] [Received: 10/28/2024] [Accepted: 02/13/2025] [Indexed: 03/17/2025]
Abstract
Precision oncology plays a pivotal role in contemporary healthcare, aiming to optimize treatments for each patient based on their unique characteristics. This objective has spurred the emergence of various cancer cell line drug response datasets, driven by the need to facilitate pre-clinical studies by exploring the impact of multi-omics data on drug response. Despite the proliferation of machine learning models for Drug Response Prediction (DRP), their validation remains critical to reliably assess their usefulness for drug discovery and precision oncology, and their actual ability to generalize over the immense space of cancer cells and chemical compounds. Scientific contribution: In this paper we show that the commonly used evaluation strategies for DRP methods can be easily fooled by commonly occurring dataset biases, and that they are therefore unable to truly measure the ability of DRP methods to generalize over drugs and cell lines ("specification gaming"). This problem hinders the development of reliable DRP methods and their application to experimental pipelines. Here we propose a new validation protocol composed of three Aggregation Strategies (Global, Fixed-Drug, and Fixed-Cell Line), integrated with three of the most commonly used train-test evaluation settings, to ensure a truly realistic assessment of prediction performance. We also scrutinize the challenges associated with using IC50 as a prediction label, showing how its close correlation with the drug concentration ranges worsens the risk of misleading performance assessment, and we indicate an additional reason to replace it with the Area Under the Dose-Response Curve.
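To make the aggregation idea concrete, here is a minimal Python sketch. It rests on assumptions that the abstract does not spell out: "Global" is taken to mean one correlation over all pooled drug-cell line pairs, "Fixed-Drug" the mean of per-drug correlations, and "Fixed-Cell Line" the mean of per-cell-line correlations. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def aggregate_scores(records):
    """records: iterable of (drug, cell_line, y_true, y_pred) tuples."""
    records = list(records)
    by_drug, by_cell = defaultdict(list), defaultdict(list)
    for drug, cell, y_true, y_pred in records:
        by_drug[drug].append((y_true, y_pred))
        by_cell[cell].append((y_true, y_pred))

    def group_mean(groups):
        # Score each group separately, then average the group scores.
        scores = [pearson(*zip(*pairs)) for pairs in groups.values() if len(pairs) > 1]
        return mean(scores) if scores else float("nan")

    return {
        "global": pearson([r[2] for r in records], [r[3] for r in records]),
        "fixed_drug": group_mean(by_drug),
        "fixed_cell_line": group_mean(by_cell),
    }


# Toy data (hypothetical): the "model" predicts only the mean response of each drug.
toy = [
    ("A", "c1", 1.0, 1.0), ("A", "c2", 1.2, 1.0), ("A", "c3", 0.8, 1.0),
    ("B", "c1", 5.0, 5.0), ("B", "c2", 5.2, 5.0), ("B", "c3", 4.8, 5.0),
]
scores = aggregate_scores(toy)
```

On this toy data the pooled score is high even though the predictor cannot rank cell lines within a drug, and the per-drug score exposes that. This is the kind of bias the abstract describes, shown here only as an illustration.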
Affiliation(s)
- Francesco Codicè
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Corrado Pancotti
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Cesare Rollo
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Yves Moreau
  - ESAT-STADIUS, KU Leuven, Leuven, 3001, Belgium
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Daniele Raimondi
  - Institut de Génétique Moléculaire de Montpellier, Université de Montpellier, 34293, Montpellier, France
3
Chowdhury S, Rajaganapathy S, Sun L, Wang L, Yang P, Cerhan JR, Zong N. SensitiveCancerGPT: Leveraging Generative Large Language Model on Structured Omics Data to Optimize Drug Sensitivity Prediction. bioRxiv 2025:2025.02.27.640661. [PMID: 40060567 PMCID: PMC11888479 DOI: 10.1101/2025.02.27.640661] [Indexed: 03/15/2025]
Abstract
Objective: The fast accumulation of vast pharmacogenomics data on cancer cell lines provides unprecedented opportunities for drug sensitivity prediction (DSP), a crucial prerequisite for the advancement of precision oncology. Recently, generative Large Language Models (LLMs) have demonstrated strong performance and generalization across diverse tasks in the field of natural language processing (NLP). However, the structured format of pharmacogenomics data poses a challenge for the use of LLMs in DSP. Therefore, the objective of this study is multi-fold: to adapt prompt engineering for structured pharmacogenomics data toward optimizing LLMs' DSP performance, to evaluate LLMs' generalization in real-world DSP scenarios, and to compare LLMs' DSP performance against that of state-of-the-science baselines. Methods: We systematically investigated the capability of the Generative Pre-trained Transformer (GPT) as a DSP model on four publicly available benchmark pharmacogenomics datasets, which are stratified by five cancer tissue types of cell lines and encompass both oncology and non-oncology drugs. The predictive capability of GPT on the DSP task is assessed via four learning paradigms: zero-shot learning, few-shot learning, fine-tuning, and clustering pretrained embeddings. To help GPT process the structured pharmacogenomics data, domain-specific prompt engineering is employed by implementing three prompt templates (i.e., Instruction, Instruction-Prefix, Cloze) and integrating pharmacogenomics-related features into the prompt. We validated GPT's performance in diverse real-world DSP scenarios: cross-tissue generalization, blind tests, and analyses of drug-pathway associations and top sensitive/resistant cell lines. Furthermore, we conducted a comparative evaluation of GPT against multiple Transformer-based pretrained models and existing DSP baselines. Results: Extensive experiments on the pharmacogenomics datasets across the five tissue cohorts demonstrate that fine-tuning GPT yields the best DSP performance (28% F1 increase, p-value = 0.0003), followed by clustering pretrained GPT embeddings (26% F1 increase, p-value = 0.0005), both outperforming GPT in-context learning (i.e., few-shot). GPT in the zero-shot setting showed a large F1 gap, resulting in the worst performance. Within the scope of prompt engineering, performance was improved by directly instructing GPT about the DSP task and using a concise context format (i.e., instruction-prefix), leading to an F1 gain of 22% (p-value = 0.02), while incorporating drug-cell line prompt context derived from genomics and/or molecular features further boosted the F1 score by 2%. Compared to state-of-the-science DSP baselines, GPT achieved significantly superior mean F1 performance (16% gain, p-value < 0.05) on the GDSC dataset. In the cross-tissue analysis, GPT showed generalizability comparable to the within-tissue performance on the GDSC and PRISM datasets, and statistically significant F1 improvements on the CCLE (8%, p-value = 0.001) and DrugComb (19%, p-value = 0.009) datasets. Evaluation on the challenging blind tests suggests GPT's competitiveness on the CCLE and DrugComb datasets compared to random splitting. Furthermore, analyses of the drug-pathway associations and log probabilities provided valuable insights that align with previous DSP findings. Conclusion: The diverse experimental setups and in-depth analysis underscore the importance of generative LLMs, such as GPT, as a viable in silico approach to guide precision oncology. Availability: https://github.com/bioIKEA/SensitiveCancerGPT.
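As an illustration of turning a structured record into text, the sketch below renders one drug-cell line row with three templates. Only the template names (Instruction, Instruction-Prefix, Cloze) come from the abstract. The prompt wording, the field names, and the `build_prompt` helper are hypothetical assumptions and are not the prompts used in the study.

```python
# Hypothetical prompt wording; only the three template names come from the abstract.
TEMPLATES = {
    "instruction": (
        "Decide whether the cell line is sensitive or resistant to the drug.\n"
        "Drug: {drug}\nCell line: {cell_line}\nTissue: {tissue}\nAnswer:"
    ),
    "instruction_prefix": (
        "Task: drug sensitivity prediction. "
        "drug={drug}; cell_line={cell_line}; tissue={tissue} ->"
    ),
    "cloze": "The {tissue} cell line {cell_line} is ____ to {drug}.",
}


def build_prompt(record, template="instruction_prefix"):
    """Render one structured drug-cell line record as a text prompt."""
    return TEMPLATES[template].format(**record)


# Placeholder identifiers, not real drugs or cell lines.
record = {"drug": "drug_X", "cell_line": "cell_line_Y", "tissue": "lung"}
prompts = {name: build_prompt(record, name) for name in TEMPLATES}
```

Extra features, such as mutation status or a molecular descriptor, could be added as further fields in the record and the template strings. How the study encodes them is not stated in the abstract.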
Affiliation(s)
- Shaika Chowdhury
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Lichao Sun
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
  - Lehigh University, Bethlehem, PA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN
- Ping Yang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN
- James R Cerhan
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN
- Nansu Zong
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
4
Wu Y, Chen M, Qin Y. Anticancer drug response prediction integrating multi-omics pathway-based difference features and multiple deep learning techniques. PLoS Comput Biol 2025; 21:e1012905. [PMID: 40163555 PMCID: PMC11978092 DOI: 10.1371/journal.pcbi.1012905] [Received: 06/30/2024] [Revised: 04/08/2025] [Accepted: 02/24/2025] [Indexed: 04/02/2025]
Abstract
Individualized prediction of cancer drug sensitivity is of vital importance in precision medicine. While numerous predictive methodologies for cancer drug response have been proposed, the precise prediction of an individual patient's response to a drug and a thorough understanding of differences in drug responses among individuals continue to pose significant challenges. This study introduces a deep learning model, PASO, which integrates a transformer encoder, multi-scale convolutional networks and attention mechanisms to predict the sensitivity of cell lines to anticancer drugs, based on the omics data of cell lines and the SMILES representations of drug molecules. First, we use statistical methods to compute the differences in gene expression, gene mutation, and gene copy number variation between genes inside and outside each biological pathway, and use these pathway difference values as cell line features, combined with the drugs' SMILES chemical structure information, as inputs to the model. The model then combines multi-scale convolutional networks and a transformer encoder to extract the properties of drug molecules from different perspectives, while an attention network learns the complex interactions between the omics features of cell lines and these drug properties. Finally, a multilayer perceptron (MLP) outputs the final predictions of drug response. Our model exhibits higher accuracy in predicting the sensitivity to anticancer drugs compared with other recently proposed methods. Analysis of the drug response predictions for lung cancer cell lines shows that SCLC cell lines are particularly sensitive to PARP inhibitors and topoisomerase I inhibitors. Additionally, the model is capable of highlighting biological pathways related to cancer and accurately capturing critical parts of the drug's chemical structure. We also validated the model's clinical utility using clinical data from The Cancer Genome Atlas. In summary, the PASO model shows potential as a robust support tool for individualized cancer treatment. Our methods are implemented in Python and are freely available from GitHub (https://github.com/queryang/PASO).
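The pathway difference feature can be pictured with a small sketch. The abstract says only that "statistical methods" are used, so the Welch-style t-statistic below is an assumption chosen for illustration, and `pathway_difference`, the gene names, and the values are hypothetical. It contrasts the per-gene values of one cell line for genes inside a pathway against genes outside it.

```python
from statistics import mean, variance


def pathway_difference(gene_values, pathway_genes):
    """Welch-style t-statistic: genes inside the pathway vs. genes outside it.

    gene_values: dict mapping gene name -> value for one cell line
    pathway_genes: set of gene names belonging to the pathway
    Returns 0.0 when either group has fewer than two genes or no variance is left.
    """
    inside = [v for g, v in gene_values.items() if g in pathway_genes]
    outside = [v for g, v in gene_values.items() if g not in pathway_genes]
    if len(inside) < 2 or len(outside) < 2:
        return 0.0
    std_err = (variance(inside) / len(inside) + variance(outside) / len(outside)) ** 0.5
    if std_err == 0:
        return 0.0
    return (mean(inside) - mean(outside)) / std_err


# Toy expression profile for one cell line (made-up gene names and values).
expression = {"G1": 5.0, "G2": 6.0, "G3": 1.0, "G4": 2.0, "G5": 1.5}
pathways = {"pathway_up": {"G1", "G2"}, "pathway_down": {"G3", "G4"}}
features = {name: pathway_difference(expression, genes) for name, genes in pathways.items()}
```

Repeating this per pathway and per omics layer (expression, mutation, copy number) would give one feature vector per cell line, which is the role these values play as model inputs in the abstract.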
Affiliation(s)
- Yang Wu
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Ming Chen
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Yufang Qin
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
5
Niarakis A, Laubenbacher R, An G, Ilan Y, Fisher J, Flobak Å, Reiche K, Rodríguez Martínez M, Geris L, Ladeira L, Veschini L, Blinov ML, Messina F, Fonseca LL, Ferreira S, Montagud A, Noël V, Marku M, Tsirvouli E, Torres MM, Harris LA, Sego TJ, Cockrell C, Shick AE, Balci H, Salazar A, Rian K, Hemedan AA, Esteban-Medina M, Staumont B, Hernandez-Vargas E, Martis B S, Madrid-Valiente A, Karampelesis P, Sordo Vieira L, Harlapur P, Kulesza A, Nikaein N, Garira W, Malik Sheriff RS, Thakar J, Tran VDT, Carbonell-Caballero J, Safaei S, Valencia A, Zinovyev A, Glazier JA. Immune digital twins for complex human pathologies: applications, limitations, and challenges. NPJ Syst Biol Appl 2024; 10:141. [PMID: 39616158 PMCID: PMC11608242 DOI: 10.1038/s41540-024-00450-5] [Received: 05/27/2024] [Accepted: 09/27/2024] [Indexed: 12/06/2024]
Abstract
Digital twins represent a key technology for precision health. Medical digital twins consist of computational models that represent the health state of individual patients over time, enabling optimal therapeutics and forecasting patient prognosis. Many health conditions involve the immune system, so it is crucial to include its key features when designing medical digital twins. The immune response is complex and varies across diseases and patients, and its modelling requires the collective expertise of the clinical, immunology, and computational modelling communities. This review outlines the initial progress on immune digital twins and the various initiatives to facilitate communication between interdisciplinary communities. We also outline the crucial aspects of an immune digital twin design and the prerequisites for its implementation in the clinic. We propose some initial use cases that could serve as "proof of concept" regarding the utility of immune digital technology, focusing on diseases with a very different immune response across spatial and temporal scales (minutes, days, months, years). Lastly, we discuss the use of digital twins in drug discovery and point out emerging challenges that the scientific community needs to collectively overcome to make immune digital twins a reality.
Affiliation(s)
- Anna Niarakis
  - Molecular, Cellular and Developmental Biology Unit (MCD), Centre de Biologie Integrative (CBI), University of Toulouse, UPS, CNRS, Toulouse, France
  - Lifeware Group, Inria, Saclay-Île-de-France, Palaiseau, France
- Gary An
  - Department of Surgery, University of Vermont Larner College of Medicine, Vermont, USA
- Yaron Ilan
  - Faculty of Medicine Hebrew University, Hadassah Medical Center, Jerusalem, Israel
- Jasmin Fisher
  - UCL Cancer Institute, University College London, Paul O'Gorman Building, 72 Huntley Street, London, WC1E 6BT, UK
- Åsmund Flobak
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - The Cancer Clinic, St Olav's University Hospital, Trondheim, Norway
  - Department of Biotechnology and Nanomedicine, SINTEF Industry, Trondheim, Norway
- Kristin Reiche
  - Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
  - Institute of Clinical Immunology, Medical Faculty, University Hospital, University of Leipzig, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany
- María Rodríguez Martínez
  - Department of Biomedical Informatics & Data Science, Yale School of Medicine, New Haven, CT, USA
- Liesbet Geris
  - Prometheus Division of Skeletal Tissue Engineering, KU Leuven, Leuven, Belgium
  - Skeletal Biology and Engineering Research Center, Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Biomechanics Research Unit, GIGA Molecular and Computational Biology, University of Liège, Liège, Belgium
- Luiz Ladeira
  - Biomechanics Research Unit, GIGA Molecular and Computational Biology, University of Liège, Liège, Belgium
- Lorenzo Veschini
  - Faculty of Dentistry Oral & Craniofacial Sciences, King's College London, London, UK
  - Biocomplexity Institute and Department of Intelligent Systems Engineering, Indiana University, Bloomington, Indiana, 47408, USA
- Michael L Blinov
  - Center for Cell Analysis and Modeling, UConn Health, Farmington, CT, 06030, USA
- Francesco Messina
  - Department of Epidemiology, Preclinical Research and Advanced Diagnostic, National Institute for Infectious Diseases 'Lazzaro Spallanzani' - I.R.C.C.S., Rome, Italy
- Luis L Fonseca
  - Department of Medicine, University of Florida, Gainesville, FL, USA
- Sandra Ferreira
  - Mathematics Department and Center of Mathematics, University of Beira Interior, Covilhã, Portugal
- Arnau Montagud
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Institute for Integrative Systems Biology (I2SysBio), CSIC-UV, Valencia, Spain
- Vincent Noël
  - Institut Curie, Université PSL, F-75005, Paris, France
  - INSERM, U900, F-75005, Paris, France
  - Mines ParisTech, Université PSL, F-75005, Paris, France
- Malvina Marku
  - Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Eirini Tsirvouli
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Marcella M Torres
  - Department of Mathematics and Statistics, University of Richmond, Richmond, VA, USA
- Leonard A Harris
  - Department of Biomedical Engineering, University of Arkansas, Fayetteville, AR, USA
  - Interdisciplinary Graduate Program in Cell and Molecular Biology, University of Arkansas, Fayetteville, AR, USA
  - Cancer Biology Program, Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- T J Sego
  - Department of Medicine, University of Florida, Gainesville, FL, USA
- Chase Cockrell
  - Department of Surgery, University of Vermont Larner College of Medicine, Vermont, USA
- Amanda E Shick
  - Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL, USA
- Hasan Balci
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Albin Salazar
  - INRIA Paris/CNRS/École Normale Supérieure/PSL Research University, Paris, France
- Kinza Rian
  - Andalusian Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
- Ahmed Abdelmonem Hemedan
  - Bioinformatics Core Unit, Luxembourg Centre of Systems Biomedicine LCSB, Luxembourg University, Esch-sur-Alzette, Luxembourg
- Marina Esteban-Medina
  - Andalusian Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
- Bernard Staumont
  - Biomechanics Research Unit, GIGA Molecular and Computational Biology, University of Liège, Liège, Belgium
- Esteban Hernandez-Vargas
  - Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, 83844-1103, USA
- Pradyumna Harlapur
  - Department of Bioengineering, Indian Institute of Science, Bengaluru, India
- Niloofar Nikaein
  - School of Medical Sciences, Faculty of Medicine and Health, Örebro University, SE-70182, Örebro, Sweden
  - X-HiDE - Exploring Inflammation in Health and Disease Consortium, Örebro University, Örebro, Sweden
- Winston Garira
  - Multiscale Mathematical Modelling of Living Systems program (M3-LSP), Kimberley, South Africa
  - Department of Mathematical Sciences, Sol Plaatje University, Kimberley, South Africa
  - Private Bag X5008, Kimberley, 8300, South Africa
- Rahuman S Malik Sheriff
  - European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Hinxton, Cambridge, UK
  - Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
- Juilee Thakar
  - Department of Microbiology & Immunology and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, 14642, USA
- Van Du T Tran
  - Vital-IT Group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Soroush Safaei
  - Institute of Biomedical Engineering and Technology, Ghent University, Gent, Belgium
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - ICREA, 23 Passeig Lluís Companys, 08010, Barcelona, Spain
- James A Glazier
  - Biocomplexity Institute and Department of Intelligent Systems Engineering, Indiana University, Bloomington, Indiana, 47408, USA
6
Jiang Z, Li P. DeepDR: a deep learning library for drug response prediction. Bioinformatics 2024; 40:btae688. [PMID: 39558584 PMCID: PMC11629690 DOI: 10.1093/bioinformatics/btae688] [Received: 09/09/2024] [Revised: 10/29/2024] [Accepted: 11/13/2024] [Indexed: 11/20/2024]
Abstract
SUMMARY Accurate drug response prediction is critical to advancing precision medicine and drug discovery. Recent advances in deep learning (DL) have shown promise in predicting drug response; however, the lack of convenient tools to support such modeling limits their widespread application. To address this, we introduce DeepDR, the first DL library specifically developed for drug response prediction. DeepDR simplifies the process by automating drug and cell featurization, model construction, training, and inference, all achievable with brief programming. The library incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, and two fusion modules, enabling the implementation of up to 135 DL models for drug response prediction. We also explored benchmarking performance with DeepDR, and the optimal models are available on a user-friendly visual interface. AVAILABILITY AND IMPLEMENTATION DeepDR can be installed from PyPI (https://pypi.org/project/deepdr). The source code and experimental data are available on GitHub (https://github.com/user15632/DeepDR).
Affiliation(s)
- Zhengxiang Jiang
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
  - School of Electronic Engineering, Xidian University, Xi’an, Shaanxi 710126, China
- Pengyong Li
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
7
Huang S, Yao D, Shan C, Du X, Pan L, Wang N, Wang Y, Duan X, Peng D. The protective mechanism of Tao Hong Si Wu decoction against breast cancer through regulation of EGFR/ERK1/2 signaling. Journal of Ethnopharmacology 2024; 332:118339. [PMID: 38777083 DOI: 10.1016/j.jep.2024.118339] [Received: 02/01/2024] [Revised: 05/02/2024] [Accepted: 05/11/2024] [Indexed: 05/25/2024]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE Tao Hong Si Wu Decoction (THSWD), a traditional Chinese herbal medicine, is widely utilized in clinical settings, either alone or in combination with other medications, for the treatment of breast cancer. AIM OF THE STUDY The specific targeting molecule(s) of THSWD and its associated molecular mechanisms remain unclear. This research aims to elucidate the underlying molecular mechanisms of THSWD in the treatment of breast cancer. MATERIALS AND METHODS The pharmacological properties of THSWD were investigated in breast cancer cells and tumor tissues using a range of methods including Acridine Orange/Ethidium Bromide (AO/EB) staining, Transwell assay, flow cytometry, immunofluorescence assay, and breast cancer mice models. RESULTS Our findings demonstrate that THSWD induces necrosis and/or apoptosis in breast cancer cells, while significantly inhibiting cell migration. Target proteins of THSWD in anticancer activity include EGFR, RAS, and others. THSWD treatment for breast cancer is associated with the EGFR/ERK1/2 signaling pathway. CONCLUSION Our findings offer initial insights into the primary mechanism of action of THSWD in breast cancer treatment, indicating its potential as a complementary therapy deserving further investigation.
Affiliation(s)
- Shi Huang
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, PR China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, PR China
- Dan Yao
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, PR China
  - College of Pharmacy, Anhui University of Chinese Medicine, Hefei, PR China
- Chun Shan
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, PR China
- Xiuli Du
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, PR China
- Linyu Pan
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, PR China
- Ni Wang
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, PR China
  - College of Pharmacy, Anhui University of Chinese Medicine, Hefei, PR China
- Yongzhong Wang
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, PR China
- Xianchun Duan
  - The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei, PR China
- Daiyin Peng
  - College of Pharmacy, Anhui University of Chinese Medicine, Hefei, PR China
8
Li X, Shen B, Feng F, Li K, Tang Z, Ma L, Li H. Dual-view jointly learning improves personalized drug synergy prediction. Bioinformatics 2024; 40:btae604. [PMID: 39423102 PMCID: PMC11524890 DOI: 10.1093/bioinformatics/btae604] [Received: 04/03/2024] [Revised: 08/23/2024] [Accepted: 10/17/2024] [Indexed: 10/21/2024]
Abstract
MOTIVATION Accurate and robust estimation of synergistic drug combinations is important for precision medicine. Although some computational methods have been developed, some predictions are still unreliable, especially cross-dataset predictions, due to the complex mechanisms of drug combinations and the heterogeneity of cancer samples. RESULTS We propose JointSyn, which uses dual-view joint learning to predict sample-specific effects of drug combinations from drug and cell features. JointSyn outperforms existing state-of-the-art methods in predictive accuracy and robustness across various benchmarks. Each view of JointSyn captures drug synergy-related characteristics and makes complementary contributions to the final prediction of the drug combination. Moreover, fine-tuning JointSyn improves its ability to generalize to a novel drug combination or cancer sample using a small number of experimental measurements. We also used JointSyn to generate an estimated pan-cancer atlas of drug synergy and explored the differential patterns among cancers. These results demonstrate the potential of JointSyn to predict drug synergy, supporting the development of personalized combinatorial therapies. AVAILABILITY AND IMPLEMENTATION Source code and data are available at https://github.com/LiHongCSBLab/JointSyn.
Affiliation(s)
- Xueliang Li
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Bihan Shen
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Fangyoumin Feng
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Kunshi Li
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Zhixuan Tang
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Liangxiao Ma
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Science, Shanghai 200031, China
- Hong Li
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
9
Connell W, Garcia K, Goodarzi H, Keiser MJ. Learning chemical sensitivity reveals mechanisms of cellular response. Commun Biol 2024; 7:1149. [PMID: 39278951 PMCID: PMC11402971 DOI: 10.1038/s42003-024-06865-4] [Received: 12/11/2023] [Accepted: 09/06/2024] [Indexed: 09/18/2024]
Abstract
Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we develop ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we infer the chemical sensitivity of cancer cell lines and tumor samples and analyze how the model makes predictions. We retrospectively evaluate drug response predictions for precision breast cancer treatment and prospectively validate chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identifies transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.
Affiliation(s)
- William Connell
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Kristle Garcia
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
| | - Hani Goodarzi
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
| | - Michael J Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA.
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA.
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.
|
10
|
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 PMCID: PMC11638972 DOI: 10.1016/j.artmed.2024.102900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgeries under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | | | - Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
| | - Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
| | - Jessica Sena
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
| | - Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
| | - Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
| | - Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States.
|
11
|
Campana PA, Prasse P, Lienhard M, Thedinga K, Herwig R, Scheffer T. Cancer drug sensitivity estimation using modular deep Graph Neural Networks. NAR Genom Bioinform 2024; 6:lqae043. [PMID: 38680251 PMCID: PMC11055499 DOI: 10.1093/nargab/lqae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/01/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024] Open
Abstract
Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are tailored to the transcriptomic profile of a given primary tumor. The SMILES representation of molecules that is used by state-of-the-art drug-sensitivity models is not conducive for neural networks to generalize to new drugs, in part because the distance between atoms does not generally correspond to the distance between their representations in the SMILES strings. Graph-attention networks, on the other hand, are high-capacity models that require large volumes of training data, which are not available for drug-sensitivity estimation. We develop a modular drug-sensitivity graph-attentional neural network. The modular architecture allows us to separately pre-train the graph encoder and graph-attentional pooling layer on related tasks for which more data are available. We observe that this model outperforms reference models for the use cases of precision oncology and drug discovery; in particular, it is better able to predict the specific interaction between drug and cell line that is not explained by the general cytotoxicity of the drug and the overall survivability of the cell line. The complete source code is available at https://zenodo.org/doi/10.5281/zenodo.8020945. All experiments are based on the publicly available GDSC data.
Affiliation(s)
- Pedro A Campana
- University of Potsdam, Department of Computer Science, Potsdam, Germany
| | - Paul Prasse
- University of Potsdam, Department of Computer Science, Potsdam, Germany
| | - Matthias Lienhard
- Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
| | - Kristina Thedinga
- Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
| | - Ralf Herwig
- Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
| | - Tobias Scheffer
- University of Potsdam, Department of Computer Science, Potsdam, Germany
|
12
|
Ovchinnikova K, Born J, Chouvardas P, Rapsomaniki M, Kruithof-de Julio M. Overcoming limitations in current measures of drug response may enable AI-driven precision oncology. NPJ Precis Oncol 2024; 8:95. [PMID: 38658785 PMCID: PMC11043358 DOI: 10.1038/s41698-024-00583-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/22/2024] [Indexed: 04/26/2024] Open
Abstract
Machine learning (ML) models of drug sensitivity prediction are becoming increasingly popular in precision oncology. Here, we identify a fundamental limitation in standard measures of drug sensitivity that hinders the development of personalized prediction models: they focus on absolute effects but do not capture relative differences between cancer subtypes. Our work suggests that using z-scored drug response measures mitigates these limitations and leads to meaningful predictions, opening the door for sophisticated ML precision oncology models.
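As a rough illustration of what a z-scored drug response measure might look like, the sketch below standardizes each drug's responses across cell lines, so that a value reflects how a cell line responds relative to other cell lines for the same drug rather than the drug's absolute potency. The per-drug axis, the use of the population standard deviation, the function name `zscore_per_drug`, and the toy values are all assumptions made for this example; the abstract does not specify them.

```python
from statistics import mean, pstdev


def zscore_per_drug(response):
    """Standardize raw responses within each drug.

    response: dict mapping drug -> dict mapping cell line -> raw response
    (hypothetical layout; the raw measure could be an AUC or log IC50).
    """
    z = {}
    for drug, by_line in response.items():
        values = list(by_line.values())
        mu = mean(values)
        sigma = pstdev(values)
        # A drug with identical responses everywhere carries no relative signal.
        z[drug] = {
            line: (0.0 if sigma == 0 else (value - mu) / sigma)
            for line, value in by_line.items()
        }
    return z


# Toy data (made up): drug_A is broadly weak, drug_B broadly potent,
# but both rank the three cell lines in the same way.
raw = {
    "drug_A": {"line_1": 0.90, "line_2": 0.80, "line_3": 0.70},
    "drug_B": {"line_1": 0.30, "line_2": 0.20, "line_3": 0.10},
}
scores = zscore_per_drug(raw)
```

In this toy case the two drugs differ strongly in absolute response, yet their z-scores per cell line coincide, which is the kind of relative difference the abstract argues absolute measures fail to capture.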
Affiliation(s)
- Katja Ovchinnikova
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland
| | | | - Panagiotis Chouvardas
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland
- Department of Urology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | | | - Marianna Kruithof-de Julio
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland.
- Department of Urology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
|
13
|
Abbasi M, Carvalho FG, Ribeiro B, Arrais JP. Predicting drug activity against cancer through genomic profiles and SMILES. Artif Intell Med 2024; 150:102820. [PMID: 38553160 DOI: 10.1016/j.artmed.2024.102820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/09/2024] [Accepted: 02/21/2024] [Indexed: 04/02/2024]
Abstract
Due to the constant increase in cancer rates, the disease has become a leading cause of death worldwide, heightening the need for its detection and treatment. In the era of personalized medicine, the main goal is to incorporate individual variability in order to choose more precisely which therapy and prevention strategies suit each person. However, predicting the sensitivity of tumors to anticancer treatments remains a challenge. In this work, we propose two deep neural network models to predict the impact of anticancer drugs on tumors through the half-maximal inhibitory concentration (IC50). These models combine biological and chemical data to capture relevant features of the genetic profile and the drug compounds, respectively. To predict the drug response in cancer cell lines, this study employed different deep learning (DL) methods, namely Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). In the first stage, two autoencoders were pre-trained with high-dimensional gene expression and mutation data of tumors. Afterward, this genetic background is transferred to the prediction models, which return the IC50 value that reflects the potency of a substance in inhibiting a cancer cell line. When comparing RSEM expected counts and TPM as representations of gene expression data, RSEM performed better in deep models, and the CNN model obtained better insight from these types of data. Moreover, the results reflect the effectiveness of the extracted deep representations in predicting the IC50 value, achieving a mean squared error of 1.06 and surpassing previous state-of-the-art models.
Affiliation(s)
- Maryam Abbasi
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal; Polytechnic Institute of Coimbra, Applied Research Institute, Coimbra, Portugal; Research Centre for Natural Resources Environment and Society (CERNAS), Polytechnic Institute of Coimbra, Coimbra, Portugal.
| | - Filipa G Carvalho
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
| | - Bernardete Ribeiro
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
| | - Joel P Arrais
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
|
14
|
Li P, Jiang Z, Liu T, Liu X, Qiao H, Yao X. Improving drug response prediction via integrating gene relationships with deep learning. Brief Bioinform 2024; 25:bbae153. [PMID: 38600666 PMCID: PMC11006795 DOI: 10.1093/bib/bbae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/05/2024] [Accepted: 03/18/2024] [Indexed: 04/12/2024] Open
Abstract
Predicting the drug response of cancer cell lines is crucial for advancing personalized cancer treatment, yet remains challenging due to tumor heterogeneity and individual diversity. In this study, we present a deep learning-based framework named Deep neural network Integrating Prior Knowledge (DIPK), which adopts self-supervised techniques to integrate multiple valuable sources of information, including gene interaction relationships, gene expression profiles and molecular topologies, to enhance prediction accuracy and robustness. We demonstrated the superior performance of DIPK compared to existing methods on both known and novel cells and drugs, underscoring the importance of gene interaction relationships in drug response prediction. In addition, DIPK extends its applicability to single-cell RNA sequencing data, showcasing its capability for single-cell-level response prediction and cell identification. Further, we assess the applicability of DIPK on clinical data. DIPK accurately predicted a higher response to paclitaxel in the pathological complete response (pCR) group compared to the residual disease group, affirming the better response of the pCR group to the chemotherapy compound. We believe that the integration of DIPK into clinical decision-making processes has the potential to enhance individualized treatment strategies for cancer patients.
Affiliation(s)
- Pengyong Li
- School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, 519020 Macau, China
| | - Zhengxiang Jiang
- School of Electronic Engineering, Xidian University, 710126 Xi’an, Shaanxi, China
| | - Tianxiao Liu
- School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
| | - Xinyu Liu
- Beijing Laboratory of Biomedical Materials, Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, 100081 Beijing, China
| | - Hui Qiao
- Department of Oncology, Tai’an Municipal Hospital, 271021 Tai’an, Shandong, China
| | - Xiaojun Yao
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078 Macao, China
|
15
|
Wang LJ, Ning M, Nayak T, Kasper MJ, Monga SP, Huang Y, Chen Y, Chiu YC. shinyDeepDR: A user-friendly R Shiny app for predicting anti-cancer drug response using deep learning. Patterns (N Y) 2024; 5:100894. [PMID: 38370127 PMCID: PMC10873157 DOI: 10.1016/j.patter.2023.100894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 11/10/2023] [Accepted: 11/14/2023] [Indexed: 02/20/2024]
Abstract
Advancing precision oncology requires accurate prediction of treatment response and accessible prediction models. To this end, we present shinyDeepDR, a user-friendly implementation of our innovative deep learning model, DeepDR, for predicting anti-cancer drug sensitivity. The web tool makes DeepDR more accessible to researchers without extensive programming experience. Using shinyDeepDR, users can upload mutation and/or gene expression data from a cancer sample (cell line or tumor) and perform two main functions: "Find Drug," which predicts the sample's response to 265 approved and investigational anti-cancer compounds, and "Find Sample," which searches for cell lines in the Cancer Cell Line Encyclopedia (CCLE) and tumors in The Cancer Genome Atlas (TCGA) with genomics profiles similar to those of the query sample to study potential effective treatments. shinyDeepDR provides an interactive interface to interpret prediction results and to investigate individual compounds. In conclusion, shinyDeepDR is an intuitive and free-to-use web tool for in silico anti-cancer drug screening.
Affiliation(s)
- Li-Ju Wang
- Cancer Therapeutics Program, University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA 15232, USA
| | - Michael Ning
- Cancer Therapeutics Program, University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA 15232, USA
| | - Tapsya Nayak
- Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
| | - Michael J. Kasper
- Cancer Therapeutics Program, University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA 15232, USA
| | - Satdarshan P. Monga
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Pittsburgh Liver Research Center, University of Pittsburgh Medical Center and University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Yufei Huang
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Cancer Virology Program, University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA 15232, USA
- Department of Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Yidong Chen
- Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
| | - Yu-Chiao Chiu
- Cancer Therapeutics Program, University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA 15232, USA
- Pittsburgh Liver Research Center, University of Pittsburgh Medical Center and University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
|
16
|
Liu H, Peng W, Dai W, Lin J, Fu X, Liu L, Liu L, Yu N. Improving anti-cancer drug response prediction using multi-task learning on graph convolutional networks. Methods 2024; 222:41-50. [PMID: 38157919 DOI: 10.1016/j.ymeth.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/19/2023] [Accepted: 11/19/2023] [Indexed: 01/03/2024] Open
Abstract
Predicting the therapeutic effect of anti-cancer drugs on tumors based on the characteristics of tumors and patients is one of the central tasks of precision oncology. Existing computational methods treat drug response prediction as either a classification or a regression task. However, few of them consider leveraging the relationship between the two tasks. In this work, we propose a Multi-task Interaction Graph Convolutional Network (MTIGCN) for anti-cancer drug response prediction. MTIGCN first utilizes a graph convolutional network-based model to produce embeddings for both cell lines and drugs. After that, the model employs multi-task learning to predict anti-cancer drug response, which involves training the model on three different tasks simultaneously: the main task of classifying drug response as sensitive or resistant, and two auxiliary tasks of regression prediction and similarity network reconstruction. By sharing parameters and optimizing the losses of different tasks simultaneously, MTIGCN enhances the feature representation and reduces overfitting. The results of the experiments on two in vitro datasets demonstrated that MTIGCN outperformed seven state-of-the-art baseline methods. Moreover, the model trained on the in vitro dataset GDSC exhibited good performance when applied to predict drug responses in the in vivo datasets PDX and TCGA. The case study confirmed the model's ability to discover unknown drug responses in cell lines.
Affiliation(s)
- Hancheng Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
| | - Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
| | - Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
| | - Jiangzhen Lin
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
| | - Xiaodong Fu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
| | - Li Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
| | - Lijun Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
| | - Ning Yu
- State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport NY 14422.
|
17
|
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024] Open
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
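To make the contrast between two of the named strategies concrete, here is a minimal sketch of how greedy and uncertainty selection might rank unscreened cell lines for one drug. It assumes predictions from a small model ensemble, that a higher predicted value means a more responsive cell line, and that uncertainty is measured as the spread of the ensemble's predictions; the function name `rank_candidates` and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def rank_candidates(ensemble_preds, strategy, n_select):
    """Pick cell lines to screen next for a single drug.

    ensemble_preds: dict mapping cell line -> list of predicted responses,
    one per ensemble member (assumed convention: higher = more responsive).
    strategy: "greedy" ranks by mean prediction, "uncertainty" ranks by
    disagreement between ensemble members.
    """
    if strategy == "greedy":
        score = {line: mean(preds) for line, preds in ensemble_preds.items()}
    elif strategy == "uncertainty":
        score = {line: pstdev(preds) for line, preds in ensemble_preds.items()}
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Highest score first; ties are broken by name to keep the order stable.
    ranked = sorted(score, key=lambda line: (-score[line], line))
    return ranked[:n_select]


# Toy predictions from a three-member ensemble (made-up numbers).
preds = {
    "line_1": [0.90, 0.92, 0.91],  # confidently responsive
    "line_2": [0.20, 0.80, 0.50],  # members disagree strongly
    "line_3": [0.60, 0.62, 0.58],  # confidently moderate
}
greedy_pick = rank_candidates(preds, "greedy", 1)
uncertain_pick = rank_candidates(preds, "uncertainty", 1)
```

Greedy selection favors likely hits, while uncertainty selection favors the experiments expected to be most informative for the model, which mirrors the two evaluation criteria described in the abstract.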
Affiliation(s)
- Priyanka Vasanthakumari
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Maulik Shukla
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Fangfang Xia
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
| | - Michael Ryan Weil
- Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
| | - Rick L. Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
|
18
|
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, attention mechanisms and their derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Affiliation(s)
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Caiqi Liu
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
| | - Mujiexin Liu
- Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Tianyuan Liu
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
| | - Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
| | - Lin Ning
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
|
19
|
Siddalingappa R, Kanagaraj S. K-nearest-neighbor algorithm to predict the survival time and classification of various stages of oral cancer: a machine learning approach. F1000Res 2023; 11:70. [PMID: 38046542 PMCID: PMC10690040 DOI: 10.12688/f1000research.75469.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/16/2023] [Indexed: 12/05/2023] Open
Abstract
Background: For years, cancer treatment has relied on tried-and-true methods: oncologists and clinicians recommend a series of surgeries, chemotherapy, and radiation therapy. Yet, even with these treatments, the number of deaths due to cancer is increasing at an alarming rate. The prognosis of cancer patients is influenced by mutations, age, and cancer stage. However, the association between these variables is unclear. Methods: The present work adopts a machine learning technique, k-nearest neighbor, for both regression and classification tasks: regression for predicting the survival time of oral cancer patients, and classification for assigning patients to one of the predefined oral cancer stages. Two cross-validation approaches, the hold-out and k-fold methods, were used to examine the prediction results. Results: The experimental results show that the k-fold method performs better than the hold-out method, yielding the lowest mean absolute error of 0.015. Additionally, the model classifies patients into valid groups. Of the 429 records, 97 (out of 106), 99 (out of 119), 95 (out of 113), and 77 (out of 91) were assigned their correct labels for stages 1, 2, 3, and 4, respectively. The accuracy, recall, precision, and F-measure obtained for the classification are 0.84, 0.85, 0.85, and 0.84, respectively. Conclusions: The study showed that older patients with more mutations than young patients have a higher risk of short survival. Senior patients with a larger number of mutations have an increased risk of reaching the final cancer stage.
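The evaluation setup described in the Methods can be sketched roughly as follows: a k-nearest-neighbor regressor scored by mean absolute error under a single hold-out split and under k-fold cross-validation. This is a toy illustration in plain Python, not the authors' implementation; the feature pair (age, mutation count), the survival values, the Euclidean distance, and all function names are assumptions made for the example.

```python
def knn_predict(train, query, k):
    """Average the targets of the k training rows closest to `query`.

    train: list of (features, target) pairs; query: a feature tuple.
    Squared Euclidean distance is an assumption of this sketch.
    """
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mean_absolute_error(train, test, k):
    """MAE of k-NN predictions for `test` when fitted on `train`."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)


def hold_out_mae(data, k, test_fraction=0.25):
    """Single split: the last `test_fraction` of rows is held out.

    Real data should be shuffled first; that step is omitted here so the
    example stays deterministic.
    """
    split = int(len(data) * (1 - test_fraction))
    return mean_absolute_error(data[:split], data[split:], k)


def k_fold_mae(data, k, n_folds=4):
    """Average MAE over n_folds folds; every row is tested exactly once."""
    errors = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        errors.append(mean_absolute_error(train, test, k))
    return sum(errors) / n_folds


# Made-up records: (age, mutation count) -> survival time in months.
data = [
    ((30, 2), 60.0), ((35, 3), 55.0), ((40, 4), 50.0), ((45, 5), 45.0),
    ((50, 6), 40.0), ((55, 7), 35.0), ((60, 8), 30.0), ((65, 9), 25.0),
]
hold_out_error = hold_out_mae(data, k=2)
k_fold_error = k_fold_mae(data, k=2)
```

The toy numbers are not meant to reproduce the reported error of 0.015. They only show why the two schemes can disagree: a single hold-out split depends on which rows happen to be held out, whereas k-fold averages the error over several train/test partitions.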
Affiliation(s)
- Rashmi Siddalingappa
- Computational and Data Sciences, Indian Institute of Science, Bangalore, Karnataka, 560012, India
| | - Sekar Kanagaraj
- Computational and Data Sciences, Indian Institute of Science, Bangalore, Karnataka, 560012, India
|
20
|
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 107] [Impact Index Per Article: 53.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
- Katherine R Duncan
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Somayah S Elsayed
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Neha Garg
- School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
- Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Nathaniel I Martin
- Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
- David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Barbara R Terlouw
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Friederike Biermann
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Marina Gorostiola González
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- ONCODE institute, Leiden, The Netherlands
- Eric J N Helfrich
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Florian Huber
- Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
- Stefan Leopold-Messer
- Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
- Tristan de Rond
- School of Chemical Sciences, University of Auckland, Auckland, New Zealand
- Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
- Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
- Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
- Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
- Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
- Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
- Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
- Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
- Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for infection research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
- Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for infection research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
- Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Biology, Leiden University, Leiden, The Netherlands
|
21
|
Connell W, Garcia K, Goodarzi H, Keiser MJ. Learning chemical sensitivity reveals mechanisms of cellular response. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.26.554851. [PMID: 37693536 PMCID: PMC10491110 DOI: 10.1101/2023.08.26.554851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/12/2023]
Abstract
Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we developed ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we inferred the chemical sensitivity of cancer cell lines and tumor samples and analyzed how the model makes predictions. We retrospectively evaluated drug response predictions for precision breast cancer treatment and prospectively validated chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identified transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.
Affiliation(s)
- William Connell
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Kristle Garcia
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Hani Goodarzi
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Michael J. Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
|
22
|
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. BIOLOGY 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. The analogous nature of genome sequences to language texts has enabled the application of techniques that have exhibited success in fields ranging from natural language processing to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The focus of this review is on the critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. With the swift pace of development in deep learning methodologies, it becomes vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of the recent advancements and elucidating the state-of-the-art applications in the field. Furthermore, this review paper serves to highlight potential areas of future investigation by critically evaluating studies from 2019 to 2023, thereby acting as a stepping-stone for further research endeavors.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
|
23
|
Moshkov N, Becker T, Yang K, Horvath P, Dancik V, Wagner BK, Clemons PA, Singh S, Carpenter AE, Caicedo JC. Predicting compound activity from phenotypic profiles and chemical structures. Nat Commun 2023; 14:1967. [PMID: 37031208 PMCID: PMC10082762 DOI: 10.1038/s41467-023-37570-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Accepted: 03/23/2023] [Indexed: 04/10/2023] Open
Abstract
Predicting assay results for compounds virtually using chemical structures and phenotypic profiles has the potential to reduce the time and resources of screens for drug discovery. Here, we evaluate the relative strength of three high-throughput data sources, namely chemical structures, imaging (Cell Painting), and gene-expression profiles (L1000), to predict compound bioactivity using a historical collection of 16,170 compounds tested in 270 assays for a total of 585,439 readouts. All three data modalities can predict compound activity for 6-10% of assays, and in combination they predict 21% of assays with high accuracy, which is a 2 to 3 times higher success rate than using a single modality alone. In practice, the accuracy of predictors could be lower and still be useful, increasing the assays that can be predicted from 37% with chemical structures alone up to 64% when combined with phenotypic data. Our study shows that unbiased phenotypic profiling can be leveraged to enhance compound bioactivity prediction to accelerate the early stages of the drug-discovery process.
Affiliation(s)
- Nikita Moshkov
- Broad Institute of MIT and Harvard, Cambridge, USA
- Biological Research Centre, Szeged, Hungary
- Tim Becker
- Broad Institute of MIT and Harvard, Cambridge, USA
- Vlado Dancik
- Broad Institute of MIT and Harvard, Cambridge, USA
|
24
|
Zheng Z, Chen J, Chen X, Huang L, Xie W, Lin Q, Li X, Wong K. Enabling Single-Cell Drug Response Annotations from Bulk RNA-Seq Using SCAD. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2204113. [PMID: 36762572 PMCID: PMC10104628 DOI: 10.1002/advs.202204113] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Revised: 12/09/2022] [Indexed: 06/18/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) quantifies the gene expression of individual cells, while bulk RNA sequencing (bulk RNA-seq) characterizes the mixed transcriptome of cells. The inference of drug sensitivities for individual cells can provide new insights into the mechanisms of anti-cancer response heterogeneity and drug resistance at cellular resolution. However, pharmacogenomic information paired with the corresponding scRNA-seq data is often limited. Therefore, a transfer learning model is proposed to infer drug sensitivities at the single-cell level. This framework learns bulk transcriptome profiles and pharmacogenomic information from population cell lines in a large public dataset and transfers the knowledge to infer the drug efficacy of individual cells. The results suggest that it is suitable to learn knowledge from pre-clinical cell lines to infer pre-existing cell subpopulations with different drug sensitivities prior to drug exposure. In addition, the model offers a new perspective on drug combinations. It is observed that a drug-resistant subpopulation can be sensitive to other drugs (e.g., a subset of JHU006 is Vorinostat-resistant while Gefitinib-sensitive); this finding corroborates the previously reported drug combination (Gefitinib + Vorinostat) strategy in several cancer types. The identified drug sensitivity biomarkers reveal insights into tumor heterogeneity and treatment at cellular resolution.
Affiliation(s)
- Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Junyi Chen
- The Laboratory of Data Discovery for Health (D²4H), Hong Kong Science Park, New Territories, Hong Kong
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Lei Huang
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Weidun Xie
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Qiuzhen Lin
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Jilin, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon, Hong Kong
|
25
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
26
|
Zhao X, Yang T, Li B, Zhang X. SwinGAN: A dual-domain Swin Transformer-based generative adversarial network for MRI reconstruction. Comput Biol Med 2023; 153:106513. [PMID: 36603439 DOI: 10.1016/j.compbiomed.2022.106513] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 12/09/2022] [Accepted: 12/31/2022] [Indexed: 01/02/2023]
Abstract
Magnetic resonance imaging (MRI) is one of the most important modalities for clinical diagnosis. However, the main disadvantages of MRI are the long scanning time and the motion artifacts caused by patient movement during prolonged imaging. Long scans can also lead to patient anxiety and discomfort, so accelerated imaging is indispensable for MRI. Convolutional neural network (CNN) based methods have become the de facto standard for medical image reconstruction, and generative adversarial networks (GANs) have also been widely used. Nevertheless, the limited ability of CNNs to capture long-distance information may lead to defects in the structure of the reconstructed images, such as blurry contours. In this paper, we propose a novel Swin Transformer-based dual-domain generative adversarial network (SwinGAN) for accelerated MRI reconstruction. The SwinGAN consists of two generators: a frequency-domain generator and an image-domain generator. Both generators use a Swin Transformer as the backbone to effectively capture long-distance dependencies. A contextual image relative position encoder (ciRPE) is designed to enhance the ability to capture local information. We extensively evaluate the method on the IXI brain dataset, the MICCAI 2013 dataset, and the MRNet knee dataset. Compared with KIGAN, the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are improved by 6.1% and 1.49%, to 37.64 dB and 0.98 respectively, on the IXI dataset, which demonstrates that our model can sufficiently utilize the local and global information of the image. The model shows promising performance and robustness under different undersampling masks, different acceleration rates, and different datasets. However, its hardware requirements grow as the number of network parameters increases. The code is available at: https://github.com/learnerzx/SwinGAN.
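For readers unfamiliar with the PSNR figure quoted in this abstract, the standard definition, PSNR = 10 log10(MAX^2 / MSE), can be sketched as follows. This is a generic illustration, not code from the SwinGAN repository; the function name, the tiny 2x2 arrays, and the assumption that intensities are normalized to [0, 1] are all hypothetical.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_errors = [
        (ref_px - rec_px) ** 2
        for ref_row, rec_row in zip(reference, reconstruction)
        for ref_px, rec_px in zip(ref_row, rec_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no error to measure
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 "images" with intensities assumed to lie in [0, 1].
reference = [[0.0, 0.5], [0.5, 1.0]]
reconstruction = [[0.1, 0.4], [0.6, 0.9]]
print(f"PSNR: {psnr(reference, reconstruction):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a higher value means a reconstruction closer to the reference; the choice of `max_value` must match the intensity range of the images being compared.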
Affiliation(s)
- Xiang Zhao
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Tiejun Yang
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, China; Key Laboratory of Grain Information Processing and Control (HAUT), Ministry of Education, Zhengzhou, China; Henan Key Laboratory of Grain Photoelectric Detection and Control (HAUT), Zhengzhou, Henan, China
- Bingjie Li
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
- Xin Zhang
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, China
|
27
|
Shen B, Feng F, Li K, Lin P, Ma L, Li H. A systematic assessment of deep learning methods for drug response prediction: from in vitro to clinical applications. Brief Bioinform 2023; 24:6961794. [PMID: 36575826 DOI: 10.1093/bib/bbac605] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Revised: 10/30/2022] [Accepted: 12/09/2022] [Indexed: 12/29/2022] Open
Abstract
Drug response prediction is an important problem in personalized cancer therapy. Among various newly developed models, significant improvement in prediction performance has been reported using deep learning methods. However, systematic comparisons of deep learning methods, especially of the transferability from preclinical models to clinical cohorts, are currently lacking. To provide a more rigorous assessment, we benchmarked six representative deep learning methods for drug response prediction in multiple application scenarios using nine evaluation metrics, covering overall prediction accuracy, predictability of each drug, potential associated factors, and transferability to clinical cohorts. Most methods show promising prediction within cell line datasets, and TGSA, with its lower time cost and better performance, is recommended. Although the performance metrics decrease when applying models trained on cell lines to patients, a certain amount of power to distinguish clinical response on some drugs can be maintained using CRDNN and TGSA. With these assessments, we provide guidance for researchers to choose appropriate methods, as well as insights into future directions for the development of more effective methods in clinical scenarios.
Affiliation(s)
- Bihan Shen
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Fangyoumin Feng
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Kunshi Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Ping Lin
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Liangxiao Ma
- Bio-Med Big Data Center at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Hong Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
|
28
|
Revealing Prognostic and Immunotherapy-Sensitive Characteristics of a Novel Cuproptosis-Related LncRNA Model in Hepatocellular Carcinoma Patients by Genomic Analysis. Cancers (Basel) 2023; 15:cancers15020544. [PMID: 36672493 PMCID: PMC9857215 DOI: 10.3390/cancers15020544] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 01/05/2023] [Accepted: 01/10/2023] [Indexed: 01/18/2023] Open
Abstract
Immunotherapy has shown strong anti-tumor activity in a subset of patients. However, many patients do not benefit from the treatment, and there is no effective method to identify immunotherapy-sensitive patients. Cuproptosis is a non-apoptotic form of programmed cell death caused by excess copper, and whether it is related to tumor immunity attracted our attention. In this study, we constructed a prognostic model of 9 cuproptosis-related LncRNAs (crLncRNAs), assessed its predictive capability, and preliminarily explored the potential mechanism underlying the difference in treatment sensitivity between the high- and low-risk groups. Our results revealed that the risk score was more effective than traditional clinical features in predicting the survival of HCC patients (AUC = 0.828). The low-risk group had more infiltration of immune cells (B cells, CD8+ T cells, CD4+ T cells), mainly with anti-tumor immune function (p < 0.05). It also showed higher sensitivity to immune checkpoint inhibitor (ICI) treatment (p < 0.001), which may exert its effect through the AL365361.1/hsa-miR-17-5p/NLRP3 axis. In addition, NLRP3 mutation-sensitive drugs (VNLG/124, sunitinib, linifanib) may have better clinical benefits in the high-risk group. Overall, the crLncRNAs model has excellent specificity and sensitivity and can be used for classifying the therapy-sensitive population and predicting the prognosis of HCC patients.
|
29
|
Singh DP, Kaushik B. A systematic literature review for the prediction of anticancer drug response using various machine-learning and deep-learning techniques. Chem Biol Drug Des 2023; 101:175-194. [PMID: 36303299 DOI: 10.1111/cbdd.14164] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2022] [Revised: 10/13/2022] [Accepted: 10/24/2022] [Indexed: 12/24/2022]
Abstract
Computational methods have gained prominence in healthcare research. The accessibility of healthcare data has greatly encouraged academics and researchers to develop tools that help in the prognosis of cancer drug response. Among various computational methods, machine-learning (ML) and deep-learning (DL) methods provide the most consistent and effective approaches to handling the serious consequences of the disease and of the drugs administered to patients. Hence, this systematic literature review covers studies that have investigated drug discovery and the prognosis of anticancer drug response using ML and DL algorithms. For this purpose, PRISMA guidelines were followed to select research papers from the Google Scholar, PubMed, and ScienceDirect websites. A total of 105 papers that align with the context of this review were chosen. Further, the review also presents the accuracy of existing ML and DL methods in the prediction of anticancer drug response. The review finds that, despite the availability of various studies, there are certain challenges associated with each method. Thus, future researchers can consider these limitations and challenges to develop an improved anticancer drug response prediction method, which would greatly benefit medical professionals in administering non-invasive treatment to patients.
Affiliation(s)
- Davinder Paul Singh
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
- Baijnath Kaushik
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
|
30
|
Zeng X, Wang F, Luo Y, Kang SG, Tang J, Lightstone FC, Fang EF, Cornell W, Nussinov R, Cheng F. Deep generative molecular design reshapes drug discovery. Cell Rep Med 2022; 3:100794. [PMID: 36306797 PMCID: PMC9797947 DOI: 10.1016/j.xcrm.2022.100794] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 05/05/2022] [Revised: 08/05/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022]
Abstract
Recent advances and accomplishments of artificial intelligence (AI) and deep generative models have established their usefulness in medicinal applications, especially in drug discovery and development. To correctly apply AI, the developer and user face questions such as which protocols to consider, which factors to scrutinize, and how the deep generative models can integrate the relevant disciplines. This review summarizes classical and newly developed AI approaches, providing an updated and accessible guide to the broad computational drug discovery and development community. We introduce deep generative models from different standpoints and describe the theoretical frameworks for representing chemical and biological structures and their applications. We discuss the data and technical challenges and highlight future directions of multimodal deep generative models for accelerating drug discovery.
Affiliation(s)
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P.R. China
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Yuan Luo
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Seung-Gu Kang
- Healthcare & Life Sciences Research, IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
- Jian Tang
- Mila-Quebec Institute for Learning Algorithms and CIFAR AI Research Chair, HEC Montreal, Montréal, QC H3T 2A7, Canada
- Felice C Lightstone
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Lab, Livermore, CA 94550, USA
- Evandro F Fang
- Department of Clinical Molecular Biology, University of Oslo and Akershus University Hospital, 1478 Lørenskog, Oslo, Norway; The Norwegian Centre on Healthy Ageing (NO-Age), Oslo, Norway
- Wendy Cornell
- Healthcare & Life Sciences Research, IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.
|
31
|
Lee M, Kim PJ, Joe H, Kim HG. Gene-centric multi-omics integration with convolutional encoders for cancer drug response prediction. Comput Biol Med 2022; 151:106192. [PMID: 36327883 DOI: 10.1016/j.compbiomed.2022.106192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/26/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION: Tumor heterogeneity, including genetic and transcriptomic characteristics, can reduce the efficacy of anticancer pharmacological therapy, resulting in clinical variability in patient response to therapeutic medications. Multi-omics integration can allow in silico models to provide an additional perspective on a biological system. METHODS: In this study, we propose a gene-centric multi-channel (GCMC) architecture to integrate multi-omics for predicting cancer drug response. GCMC transforms multi-omics profiles into a three-dimensional tensor with an additional dimension for omics types. GCMC's convolutional encoders capture multi-omics profiles for each gene and yield gene-centric features to predict drug responses. RESULTS: We evaluated GCMC on various datasets, including The Cancer Genome Atlas (TCGA) patients, patient-derived xenograft (PDX) mouse models, and the Genomics of Drug Sensitivity in Cancer (GDSC) cell line datasets. GCMC achieved better performance than baseline models, including single-omics models, in more than 75% of 265 drugs from GDSC cell line datasets. Furthermore, as for the clinical applicability of GCMC, it achieved the best performance on TCGA and PDX datasets in terms of both AUPR and AUC. We also analyzed the models' capability of integrating multi-omics profiles by measuring the contribution ratio of omics types. GCMC can incorporate multi-omics profiles in various manners to enhance performance for each drug type. These results suggest that GCMC can improve performance and feature extraction capability by integrating multi-omics profiles in a gene-centric manner.
Affiliation(s)
- Munhwan Lee
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
- Pil-Jong Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
- Hyunwhan Joe
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
- Hong-Gee Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
|
32
|
Samal BR, Loers JU, Vermeirssen V, De Preter K. Opportunities and challenges in interpretable deep learning for drug sensitivity prediction of cancer cells. Front Bioinform 2022; 2:1036963. [PMID: 36466148 PMCID: PMC9714662 DOI: 10.3389/fbinf.2022.1036963] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/05/2022] [Accepted: 11/03/2022] [Indexed: 01/02/2024] Open
Abstract
In precision oncology, therapy stratification is done based on the patients' tumor molecular profile. Modeling and prediction of the drug response for a given tumor molecular type will further improve therapeutic decision-making for cancer patients. Indeed, deep learning methods hold great potential for drug sensitivity prediction, but a major problem is that these models are black box algorithms and do not clarify the mechanisms of action. This puts a limitation on their clinical implementation. To address this concern, many recent studies attempt to overcome these issues by developing interpretable deep learning methods that facilitate the understanding of the logic behind the drug response prediction. In this review, we discuss strengths and limitations of recent approaches, and suggest future directions that could guide further improvement of interpretable deep learning in drug sensitivity prediction in cancer research.
Affiliation(s)
- Bikash Ranjan Samal
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Jens Uwe Loers
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Vanessa Vermeirssen
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Katleen De Preter
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
|
33
|
Sidak D, Schwarzerová J, Weckwerth W, Waldherr S. Interpretable machine learning methods for predictions in systems biology from omics data. Front Mol Biosci 2022; 9:926623. [PMID: 36387282 PMCID: PMC9650551 DOI: 10.3389/fmolb.2022.926623] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/22/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] Open
Abstract
Machine learning has become a powerful tool for systems biologists, from diagnosing cancer to optimizing kinetic models and predicting the state, growth dynamics, or type of a cell. Potential predictions from complex biological data sets obtained by “omics” experiments seem endless, but are often not the main objective of biological research. Often we want to understand the molecular mechanisms of a disease to develop new therapies, or we need to justify a crucial decision that is derived from a prediction. In order to gain such knowledge from data, machine learning models need to be extended. A recent trend to achieve this is to design “interpretable” models. However, the notions around interpretability are sometimes ambiguous, and a universal recipe for building well-interpretable models is missing. With this work, we want to familiarize systems biologists with the concept of model interpretability in machine learning. We consider data sets, data preparation, machine learning methods, and software tools relevant to omics research in systems biology. Finally, we try to answer the question: “What is interpretability?” We introduce views from the interpretable machine learning community and propose a scheme for categorizing studies on omics data. We then apply these tools to review and categorize recent studies where predictive machine learning models have been constructed from non-sequential omics data.
Affiliation(s)
- David Sidak
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- Jana Schwarzerová
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czech Republic
- Wolfram Weckwerth
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- Vienna Metabolomics Center (VIME), Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Steffen Waldherr
- Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- *Correspondence: Steffen Waldherr
|
34
|
He D, Liu Q, Wu Y, Xie L. A context-aware deconfounding autoencoder for robust prediction of personalized clinical drug response from cell-line compound screening. Nat Mach Intell 2022; 4:879-892. [PMID: 38895093 PMCID: PMC11185412 DOI: 10.1038/s42256-022-00541-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/24/2021] [Accepted: 09/08/2022] [Indexed: 11/09/2022]
Abstract
Accurate and robust prediction of patient-specific responses to a new compound is critical to personalized drug discovery and development. However, patient data are often too scarce to train a generalized machine learning model. Although many methods have been developed to utilize cell-line screens for predicting clinical responses, their performances are unreliable owing to data heterogeneity and distribution shift. Here we have developed a novel context-aware deconfounding autoencoder (CODE-AE) that can extract intrinsic biological signals masked by context-specific patterns and confounding factors. Extensive comparative studies demonstrated that CODE-AE effectively alleviated the out-of-distribution problem for the model generalization and significantly improved accuracy and robustness over state-of-the-art methods in predicting patient-specific clinical drug responses purely from cell-line compound screens. Using CODE-AE, we screened 59 drugs for 9,808 patients with cancer. Our results are consistent with existing clinical observations, suggesting the potential of CODE-AE in developing personalized therapies and drug response biomarkers.
Affiliation(s)
- Di He
- PhD program in Computer Science, Graduate Center, City University of New York, New York, NY, USA
- Qiao Liu
- Department of Computer Science, Hunter College, City University of New York, New York, NY, USA
- You Wu
- PhD program in Computer Science, Graduate Center, City University of New York, New York, NY, USA
- Lei Xie
- PhD program in Computer Science, Graduate Center, City University of New York, New York, NY, USA
- Department of Computer Science, Hunter College, City University of New York, New York, NY, USA
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, NY, USA
|
35
|
Park H, Imoto S, Miyano S. PredictiveNetwork: predictive gene network estimation with application to gastric cancer drug response-predictive network analysis. BMC Bioinformatics 2022; 23:342. [PMID: 35974335 PMCID: PMC9380306 DOI: 10.1186/s12859-022-04871-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 08/02/2022] [Indexed: 11/22/2022] Open
Abstract
Background: Gene regulatory networks have garnered a large amount of attention to understand disease mechanisms caused by complex molecular network interactions. These networks have been applied to predict specific clinical characteristics, e.g., cancer, pathogenicity, and anti-cancer drug sensitivity. However, in most previous studies using network-based prediction, the gene networks were estimated first, and clinical characteristics were then predicted based on the pre-estimated networks. Thus, the estimated networks cannot describe clinical characteristic-specific gene regulatory systems. Furthermore, existing computational methods were developed from algorithmic and mathematical viewpoints, without considering network biology. Results: To effectively predict clinical characteristics and estimate gene networks that provide critical insights into understanding the biological mechanisms involved in a clinical characteristic, we propose a novel strategy for predictive gene network estimation. The proposed strategy simultaneously performs gene network estimation and prediction of the clinical characteristic. In this strategy, the gene network is estimated with minimal network estimation and prediction errors. We incorporate network biology by assuming that neighboring genes in a network have similar biological functions, while hub genes play key roles in biological processes. Thus, the proposed method provides interpretable prediction results and enables biologically reliable marker identification. Monte Carlo simulations show the effectiveness of our method for feature selection in gene estimation and prediction with excellent prediction accuracy. We applied the proposed strategy to construct gastric cancer drug-responsive networks. Conclusion: We identified gastric drug response predictive markers and drug sensitivity/resistance-specific markers, AKR1B10, AKR1C3, ANXA10, and ZNF165, based on GDSC data analysis. Our results for identifying drug-sensitive and drug-resistant specific molecular interplay are strongly supported by previous studies. We expect that the proposed strategy will be a useful tool for uncovering crucial molecular interactions involved in a specific biological mechanism, such as cancer progression or acquired drug resistance. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04871-z.
Affiliation(s)
- Heewon Park
- M&D Data Science Center, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, Japan.
- Seiya Imoto
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo, Japan
- Satoru Miyano
- M&D Data Science Center, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, Japan; Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo, Japan
|
36
|
Towards computational solutions for precision medicine based big data healthcare system using deep learning models: A review. Comput Biol Med 2022; 149:106020. [DOI: 10.1016/j.compbiomed.2022.106020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2022] [Revised: 08/16/2022] [Accepted: 08/20/2022] [Indexed: 12/14/2022]
|
37
|
Research on Image Segmentation Algorithm Based on Multimodal Hierarchical Attention Mechanism and Genetic Neural Network. Comput Intell Neurosci 2022; 2022:9980928. [PMID: 35707183 PMCID: PMC9192265 DOI: 10.1155/2022/9980928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 04/11/2022] [Accepted: 05/05/2022] [Indexed: 11/24/2022]
Abstract
Multimodal tasks based on attention mechanism and language face numerous problems. Based on multimodal hierarchical attention mechanism and genetic neural network, this paper studies the application of image segmentation algorithm in data completion and 3D scene reconstruction. The algorithm refers to the process of concentrating attention that humans subjectively pay attention to and calculates the difference between each pixel in the genetic neural network test image in the color space and the average value of the target image, which solves the problem of static feature maps and dynamic feature maps of image sequences. In addition, in view of the problem that the number of attention enhancement feature extraction modules is too large and the parameters are too large, the recursive mechanism is used as the feature extraction branch, and new model parameters are not added when the network depth is increased. The simulation results show that the accuracy of the improved image saliency detection algorithm based on the attention mechanism reaches 89.7%, and the difference between the average value of the single-point pixel and the target image is reduced to 0.132, which further promotes the practicability and reliability of the image segmentation model.
|
38
|
Farnoud A, Ohnmacht AJ, Meinel M, Menden MP. Can artificial intelligence accelerate preclinical drug discovery and precision medicine? Expert Opin Drug Discov 2022; 17:661-665. [PMID: 35708267 DOI: 10.1080/17460441.2022.2090540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/04/2022]
Abstract
Precision medicine leverages molecular biomarkers for selecting optimal treatment strategies. Accordingly, stratifying patients into responder and non-responder has strongly accelerated drug discovery and drug approvals in the last two decades. Recently, the applications of artificial intelligence (AI) in healthcare have been promoting these processes, and continue to improve patient care through systematically analysing large-scale molecular data and electronic health records. In particular, preclinical pharmacogenomics data empowered AI to unfold its full potential. Areas covered: Here, we discuss the opportunities of AI in pharmacogenomics, drug discovery and precision medicine. In particular, we shed some light on the advancements in computational biomedicine from statistical, machine learning (ML) to complex deep learning (DL) models. Expert opinion: AI has already strongly impacted drug discovery, and will continue to revolutionise academic research and the pharmaceutical industry. Its algorithms aid the identification of novel treatment options through molecular signatures and thus pave the way for the next generation of precision medicine.
Affiliation(s)
- Ali Farnoud
- Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
- Alexander J Ohnmacht
- Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany; Department of Biology, Ludwig-Maximilians University Munich, Munich, Germany
- Martin Meinel
- Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany; Department of Dermatology and Allergy, Technical University Munich, Munich, Germany
- Michael P Menden
- Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany; Department of Biology, Ludwig-Maximilians University Munich, Munich, Germany; German Center for Diabetes Research (DZD e.V.), Munich, Germany
|
39
|
Park H, Yamaguchi R, Imoto S, Miyano S. Xprediction: Explainable EGFR-TKIs response prediction based on drug sensitivity specific gene networks. PLoS One 2022; 17:e0261630. [PMID: 35584089 PMCID: PMC9116684 DOI: 10.1371/journal.pone.0261630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2021] [Accepted: 12/06/2021] [Indexed: 12/03/2022] Open
Abstract
In recent years, drug sensitivity prediction has garnered a great deal of attention due to the growing interest in precision medicine. Several computational methods have been developed for drug sensitivity prediction and the identification of related markers. However, most previous studies have ignored genetic interaction, although complex diseases (e.g., cancer) involve many genes intricately connected in a molecular network rather than the abnormality of a single gene. To effectively predict drug sensitivity and understand its mechanism, we propose a novel strategy for explainable drug sensitivity prediction based on sample-specific gene regulatory networks, designated Xprediction. Our strategy first estimates sample-specific gene regulatory networks that enable us to identify the molecular interplay underlying varying clinical characteristics of cell lines. We then predict drug sensitivity based on the estimated sample-specific gene regulatory networks. The predictive models are based on machine learning approaches, i.e., random forest, kernel support vector machine, and deep neural network. Although the machine learning models provide remarkable results for prediction and classification, we cannot understand how the models reach their decisions. In other words, the methods suffer from the black box problem, and thus we cannot identify crucial molecular interactions that involve drug sensitivity-related mechanisms. To address this issue, we propose a method that describes the importance of each molecular interaction for the drug sensitivity prediction result. The proposed method enables us to identify crucial gene-gene interactions and thereby interpret the prediction results based on the identified markers. To evaluate our strategy, we applied Xprediction to EGFR-TKIs prediction based on drug sensitivity specific gene regulatory networks and identified important molecular interactions for EGFR-TKIs prediction. Our strategy performed drug sensitivity prediction effectively compared with prediction based on the expression levels of genes. We also verified, through the literature, the EGFR-TKIs-related mechanisms of a majority of the identified markers. We expect our strategy to be a useful tool for prediction tasks and for uncovering complex mechanisms related to pharmacological profiles, such as mechanisms of acquired drug resistance or sensitivity of cancer cells.
Affiliation(s)
- Heewon Park
- M&D Data Science Center, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
- Rui Yamaguchi
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Chikusa-ku, Nagoya, Aichi, Japan
- Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Showa-ku, Nagoya, Aichi, Japan
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Seiya Imoto
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Satoru Miyano
- M&D Data Science Center, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
|
40
|
Jiang L, Jiang C, Yu X, Fu R, Jin S, Liu X. DeepTTA: a transformer-based model for predicting cancer drug response. Brief Bioinform 2022; 23:6554594. [PMID: 35348595 DOI: 10.1093/bib/bbac100] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/23/2021] [Revised: 02/08/2022] [Accepted: 02/27/2022] [Indexed: 12/27/2022] Open
Abstract
Identifying new lead molecules to treat cancer requires more than a decade of dedicated effort. Before selected drug candidates are used in the clinic, their anti-cancer activity is generally validated by in vitro cellular experiments. Therefore, accurate prediction of cancer drug response is a critical and challenging task for anti-cancer drug design and precision medicine. With the development of pharmacogenomics, the combination of efficient drug feature extraction methods and omics data has made it possible to use computational models to assist in drug response prediction. In this study, we propose DeepTTA, a novel end-to-end deep learning model that utilizes a transformer for drug representation learning and a multilayer neural network for predicting anti-cancer drug responses from transcriptomic data. Specifically, DeepTTA uses transcriptomic gene expression data and chemical substructures of drugs for drug response prediction. Compared to existing methods, DeepTTA achieved higher performance in terms of root mean square error, Pearson correlation coefficient and Spearman's rank correlation coefficient on multiple test sets. Moreover, we discovered that the anti-cancer drugs bortezomib and dactinomycin provide a potential therapeutic option with multiple clinical indications. With its excellent performance, DeepTTA is expected to be an effective method in cancer drug design.
Affiliation(s)
- Likun Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Changzhi Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xinyu Yu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Rao Fu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Shuting Jin
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Xiangrong Liu
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
|
41
|
Ramesh P, Veerappapillai S. Designing Novel Compounds for the Treatment and Management of RET-Positive Non-Small Cell Lung Cancer-Fragment Based Drug Design Strategy. Molecules 2022; 27:1590. [PMID: 35268691 PMCID: PMC8911629 DOI: 10.3390/molecules27051590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 02/17/2022] [Accepted: 02/20/2022] [Indexed: 11/29/2022] Open
Abstract
Rearranged during transfection (RET) is an oncogenic driver receptor that is overexpressed in several cancer types, including non-small cell lung cancer. To date, only multiple kinase inhibitors are widely used to treat RET-positive cancer patients. These inhibitors exhibit high toxicity and low efficacy and specificity against RET. The development of drug-resistant mutations in the RET protein further worsens this situation. Hence, in the present study, we aimed to design novel drug-like compounds using a fragment-based drug design strategy to overcome these issues. About 18 known inhibitors from diverse chemical classes were fragmented and bred to form novel compounds against RET proteins. The inhibitory activity of the resultant 115 hybrid molecules was evaluated using molecular docking and RF-Score analysis. The binding free energy and chemical reactivity of the compounds were computed using MM-GBSA and density functional theory analysis, respectively. The results from our study revealed that the developed hybrid molecules, except for LF21 and LF27, showed higher reactivity and stability than Pralsetinib. Ultimately, the process resulted in three hybrid molecules, namely LF1, LF2, and LF88, having potent inhibitory activity against RET proteins. The selected molecules were then subjected to molecular dynamics simulation for 200 ns and MM-PBSA analysis to eliminate false-positive designs. The results from our analysis suggest that the designed compounds exhibit significant inhibitory activity against multiple RET variants. Thus, these could be considered as potential leads for further experimental studies.
Affiliation(s)
- Shanthi Veerappapillai
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, India
|
42
|
Kang SG, Morrone JA, Weber JK, Cornell WD. Analysis of Training and Seed Bias in Small Molecules Generated with a Conditional Graph-Based Variational Autoencoder: Insights for Practical AI-Driven Molecule Generation. J Chem Inf Model 2022; 62:801-816. [PMID: 35130440 DOI: 10.1021/acs.jcim.1c01545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
The application of deep learning to generative molecule design has shown early promise for accelerating lead series development. However, questions remain concerning how factors like training, data set, and seed bias impact the technology's utility to medicinal and computational chemists. In this work, we analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE). Leveraging a massive, labeled data set corresponding to the dopamine D2 receptor, our graph-based generative model is shown to excel in producing desired conditioned activities and favorable unconditioned physical properties in generated molecules. We implement an activity-swapping method that allows for the activation, deactivation, or retention of activity of molecular seeds, and we apply independent deep learning classifiers to verify the generative results. Overall, we uncover relationships between noise, molecular seeds, and training set selection across a range of latent-space sampling procedures, providing important insights for practical AI-driven molecule generation.
Affiliation(s)
- Seung-Gu Kang
- Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
- Joseph A Morrone
- Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
- Jeffrey K Weber
- Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
- Wendy D Cornell
- Computational Biology Center, IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10594, United States
|
43
|
Prasse P, Iversen P, Lienhard M, Thedinga K, Bauer C, Herwig R, Scheffer T. Matching anticancer compounds and tumor cell lines by neural networks with ranking loss. NAR Genom Bioinform 2022; 4:lqab128. [PMID: 35047818 PMCID: PMC8759564 DOI: 10.1093/nargab/lqab128] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/09/2021] [Revised: 12/03/2021] [Accepted: 12/29/2021] [Indexed: 12/24/2022] Open
Abstract
Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are likely to achieve the highest efficacy for a cancer cell line at hand at a therapeutic dose. State-of-the-art drug sensitivity models use regression techniques to predict the inhibitory concentration of a drug for a tumor cell line. This regression objective is not directly aligned with either of these principal goals of drug sensitivity models. We argue that drug sensitivity modeling should be seen as a ranking problem with an optimization criterion that quantifies a drug's inhibitory capacity for the cancer cell line at hand relative to its toxicity for healthy cells. We derive an extension to the well-established drug sensitivity regression model PaccMann that employs a ranking loss and focuses on the ratio of inhibitory concentration and therapeutic dosage range. We find that the ranking extension significantly enhances the model's capability to identify the most effective anticancer drugs for unseen tumor cell profiles based on in vitro data.
Affiliation(s)
- Paul Prasse
- To whom correspondence should be addressed. Tel: +49 331 977 3829;
- Matthias Lienhard
- Dep. Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Kristina Thedinga
- Dep. Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Ralf Herwig
- Dep. Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Tobias Scheffer
- University of Potsdam, Department of Computer Science, Potsdam, Germany
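The ranking formulation described in the Prasse et al. abstract above can be illustrated with a small sketch. This is not the loss used in the paper or in PaccMann; it is a generic pairwise hinge ranking loss, and the names `relative_potency`, `pairwise_hinge_ranking_loss`, `max_dose`, and `margin` are hypothetical choices made here for illustration. It assumes that each drug has an inhibitory concentration and an upper therapeutic dose, and that a lower ratio of the two is more desirable.

```python
import math
from itertools import combinations


def relative_potency(ic50, max_dose):
    """Log ratio of inhibitory concentration to a therapeutic dose bound.

    Assumption for this sketch: lower values mean the drug inhibits the
    cell line well within its tolerated dose range.
    """
    return math.log(ic50 / max_dose)


def pairwise_hinge_ranking_loss(predicted, target, margin=1.0):
    """Mean hinge loss over all drug pairs whose target values differ.

    For a pair where one drug has the lower target value, the model is
    penalised unless its predicted score for that drug is lower than the
    other drug's score by at least `margin`.
    """
    losses = []
    for i, j in combinations(range(len(target)), 2):
        if target[i] == target[j]:
            continue  # ties carry no ordering information
        lo, hi = (i, j) if target[i] < target[j] else (j, i)
        losses.append(max(0.0, margin - (predicted[hi] - predicted[lo])))
    return sum(losses) / len(losses) if losses else 0.0


# Three hypothetical drugs for one cell line (made-up numbers).
ic50 = [0.5, 10.0, 2.0]
max_dose = [1.0, 1.0, 4.0]
targets = [relative_potency(c, d) for c, d in zip(ic50, max_dose)]

well_ordered = pairwise_hinge_ranking_loss([-1.0, 2.0, -0.5], targets)
mis_ordered = pairwise_hinge_ranking_loss([2.0, -1.0, 2.0], targets)
```

Unlike a regression objective, this loss depends only on the relative order of the predicted scores (up to the margin), which is the property the abstract argues for.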
|
44
|
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer-term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of data sources commonly used for training and evaluating methods.
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
|
45
|
A gradient tree boosting and network propagation derived pan-cancer survival network of the tumor microenvironment. iScience 2022; 25:103617. [PMID: 35106465 PMCID: PMC8786644 DOI: 10.1016/j.isci.2021.103617] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 12/09/2021] [Indexed: 12/22/2022] Open
Abstract
Predicting cancer survival from molecular data is an important aspect of biomedical research because it allows quantifying patient risks and thus individualizing therapy. We introduce XGBoost tree ensemble learning to predict survival from transcriptome data of 8,024 patients from 25 different cancer types and show highly competitive performance with state-of-the-art methods. To further improve plausibility of the machine learning approach we conducted two additional steps. In the first step, we applied pan-cancer training and showed that it substantially improves prognosis compared with cancer subtype-specific training. In the second step, we applied network propagation and inferred a pan-cancer survival network consisting of 103 genes. This network highlights cross-cohort features and is predictive for the tumor microenvironment and immune status of the patients. Our work demonstrates that pan-cancer learning combined with network propagation generalizes over multiple cancer types and identifies biologically plausible features that can serve as biomarkers for monitoring cancer survival.
Highlights:
- Highly performing cancer survival prediction with XGBoost
- Pan-cancer training outperforms single-cohort training
- Combined approach consisting of machine learning and network propagation
- Tumor microenvironment is most strongly involved in cancer survival prediction
|
46
|
Born J, Huynh T, Stroobants A, Cornell WD, Manica M. Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model. J Chem Inf Model 2021; 62:240-257. [PMID: 34905358 DOI: 10.1021/acs.jcim.1c00889] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 discontiguous residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but consistently yields significantly higher performance than the full primary structure. This trend persists across different models, data sets, and performance metrics and holds true when predicting pIC50 for both unseen ligands and kinases. Our interpretability analysis reveals a potential explanation for the superiority of the active site models: whereas only mild statistical effects about the extraction of three-dimensional (3D) interaction sites take place in the full sequence models, the active site models are equipped with an implicit but strong inductive bias about the 3D structure stemming from the discontiguity of the active sites. Moreover, in direct comparisons, our models perform similarly to or better than previous state-of-the-art approaches in affinity prediction. We then investigate a de novo molecular design task and find that the active site provides benefits in computational efficiency, but otherwise, both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases.
Affiliation(s)
- Jannis Born
- IBM Research Europe, 8804 Rüschlikon, Switzerland; Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Tien Huynh
- IBM Research, Yorktown Heights, New York 10598, United States
- Astrid Stroobants
- Department of Chemistry, Imperial College London, SW7 2AZ London, United Kingdom
- Wendy D Cornell
- IBM Research, Yorktown Heights, New York 10598, United States
|
47
|
|
48
|
Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00408-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
|
49
|
Zagidullin B, Wang Z, Guan Y, Pitkänen E, Tang J. Comparative analysis of molecular fingerprints in prediction of drug combination effects. Brief Bioinform 2021; 22:bbab291. [PMID: 34401895 PMCID: PMC8574997 DOI: 10.1093/bib/bbab291] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 04/12/2021] [Revised: 06/01/2021] [Accepted: 07/07/2021] [Indexed: 12/18/2022] Open
Abstract
Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in recent years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64,200 unique combinations of 4,153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.
Affiliation(s)
- B Zagidullin
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
- Z Wang
- Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, USA
- Y Guan
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, USA
- E Pitkänen
- Institute for Molecular Medicine Finland (FIMM) & Applied Tumor Genomics Research Program, Research Programs Unit, University of Helsinki, Finland
- J Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Finland
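The Zagidullin et al. abstract above mentions comparing molecular representations with the Centered Kernel Alignment (CKA) metric. As a rough illustration only, the sketch below computes the standard linear form of CKA between two representations of the same molecules; it does not reproduce the adaptation used in the paper. The function names and the toy matrices are hypothetical, with rows standing for molecules and columns for representation features.

```python
import math


def center_columns(X):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def cross_gram_sq_norm(A, B):
    """Squared Frobenius norm of A^T B for row-aligned matrices A and B."""
    total = 0.0
    for a_col in zip(*A):
        for b_col in zip(*B):
            dot = sum(a * b for a, b in zip(a_col, b_col))
            total += dot * dot
    return total


def linear_cka(X, Y):
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F).

    Raises ZeroDivisionError if either representation is constant
    across all rows, since its centered form is then all zeros.
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    hsic_xy = cross_gram_sq_norm(Xc, Yc)
    hsic_xx = cross_gram_sq_norm(Xc, Xc)
    hsic_yy = cross_gram_sq_norm(Yc, Yc)
    return hsic_xy / math.sqrt(hsic_xx * hsic_yy)


# Toy data: four molecules described by two hypothetical representations.
fingerprint_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
scaled_copy = [[3.0 * v for v in row] for row in fingerprint_a]
embedding_b = [[1.0], [-1.0], [1.0], [-1.0]]

same = linear_cka(fingerprint_a, scaled_copy)      # identical up to scaling
partial = linear_cka(fingerprint_a, embedding_b)   # shares one direction
```

Linear CKA is invariant to isotropic scaling and to orthogonal transformations of the feature space, which makes it usable for comparing representations of different dimensionality.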
|
50
|
An X, Chen X, Yi D, Li H, Guan Y. Representation of molecules for drug response prediction. Brief Bioinform 2021; 23:6375515. [PMID: 34571534 DOI: 10.1093/bib/bbab393] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/08/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid development of machine learning and deep learning algorithms in the past decade has spurred an outburst of their applications in many research fields. In the chemistry domain, machine learning has been widely used to aid in drug screening, drug toxicity prediction, quantitative structure-activity relationship prediction, anti-cancer synergy score prediction, etc. This review is dedicated to the application of machine learning in drug response prediction. Specifically, we focus on molecular representations, which are crucial to the success of drug response prediction and other chemistry-related prediction tasks. We introduce three types of commonly used molecular representation methods, together with their implementation and application examples. This review will serve as a brief introduction to the broad field of molecular representations.
Affiliation(s)
- Xin An
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Xi Chen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Daiyao Yi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
|