1. Ramon A, Ni M, Predeina O, Gaffey R, Kunz P, Onuoha S, Sormanni P. Prediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt. MAbs 2025; 17:2442750. [PMID: 39772905 PMCID: PMC11730357 DOI: 10.1080/19420862.2024.2442750] [Received: 10/04/2024] [Revised: 12/10/2024] [Accepted: 12/11/2024] [Indexed: 01/11/2025] Open
Abstract
In-silico prediction of protein biophysical traits is often hindered by the limited availability of experimental data and their heterogeneity. Training on limited data can lead to overfitting and poor generalizability to sequences distant from those in the training set. Additionally, inadequate use of scarce and disparate data can introduce biases during evaluation, leading to unreliable model performance being reported. Here, we present a comprehensive study exploring various approaches for protein fitness prediction from limited data, leveraging pre-trained embeddings, repeated stratified nested cross-validation, and ensemble learning to ensure an unbiased assessment of performance. We applied our framework to introduce NanoMelt, a predictor of nanobody thermostability trained with a dataset of 640 measurements of apparent melting temperature, obtained by integrating data from the literature with 129 new measurements from this study. We find that an ensemble model stacking multiple regression models that use diverse sequence embeddings achieves state-of-the-art accuracy in predicting nanobody thermostability. We further demonstrate NanoMelt's potential to streamline nanobody development by guiding the selection of highly stable nanobodies. We make the curated dataset of nanobody thermostability freely available and NanoMelt accessible as downloadable software and a web server.
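The repeated nested cross-validation mentioned in this abstract can be illustrated with a minimal standard-library sketch. It is not the NanoMelt implementation: the 1-D ridge model, the toy data, and every function name below are assumptions made for illustration, and stratification of the folds is omitted for brevity. The point it shows is that the hyperparameter is chosen on inner folds only, so each outer test fold stays untouched until the final score is computed.

```python
import random
from statistics import mean


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression through the origin."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return mean((y - w * x) ** 2 for x, y in zip(xs, ys))


def fit_and_score(xs, ys, train_idx, test_idx, lam):
    """Fit on the training indices and return the MSE on the test indices."""
    w = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
    return mse(w, [xs[i] for i in test_idx], [ys[i] for i in test_idx])


def inner_cv_error(xs, ys, folds, lam):
    """Mean validation MSE of one lambda across the given inner folds."""
    errors = []
    for n, held_out in enumerate(folds):
        train = [i for m, fold in enumerate(folds) if m != n for i in fold]
        errors.append(fit_and_score(xs, ys, train, held_out, lam))
    return mean(errors)


def nested_cv(xs, ys, lambdas, outer_k=5, inner_k=3, repeats=3, seed=0):
    """Repeated nested CV: the inner loop picks lambda, the outer loop scores it."""
    rng = random.Random(seed)
    all_idx = list(range(len(xs)))
    scores = []
    for _ in range(repeats):
        for test_idx in k_folds(all_idx, outer_k, rng):
            held = set(test_idx)
            dev_idx = [i for i in all_idx if i not in held]
            # Same inner folds for every candidate lambda, built from dev data only.
            inner_folds = k_folds(dev_idx, inner_k, rng)
            best_lam = min(
                lambdas, key=lambda lam: inner_cv_error(xs, ys, inner_folds, lam)
            )
            scores.append(fit_and_score(xs, ys, dev_idx, test_idx, best_lam))
    return mean(scores), scores


# Toy data (hypothetical): y = 2x plus Gaussian noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(1, 41)]
ys = [2 * x + data_rng.gauss(0, 0.2) for x in xs]
estimate, scores = nested_cv(xs, ys, lambdas=[0.0, 0.1, 1.0, 10.0])
print(f"nested-CV MSE estimate: {estimate:.4f} over {len(scores)} outer folds")
```

Repeating the outer split with different shuffles, as done here with `repeats=3`, reduces the dependence of the estimate on any single partition, which matters most when the dataset is small.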
Affiliation(s)
- Aubin Ramon
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Mingyang Ni
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Olga Predeina
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Rebecca Gaffey
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Patrick Kunz
  - Division of Functional Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Pietro Sormanni
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK

2. Dai W, Chen G, Peng W, Chen C, Fu X, Liu L, Liu L, Yu N. Domain alignment method based on masked variational autoencoder for predicting patient anticancer drug response. Methods 2025; 238:61-73. [PMID: 40090506 DOI: 10.1016/j.ymeth.2025.03.012] [Received: 08/19/2024] [Revised: 02/03/2025] [Accepted: 03/14/2025] [Indexed: 03/18/2025] Open
Abstract
Predicting a patient's response to anticancer drugs is essential for personalized treatment plans. However, due to significant distribution differences between cell line data and patient data, models trained well on cell line data may perform poorly on patient anticancer drug response predictions. Some existing methods use transfer learning strategies to implement domain feature alignment between cell lines and patient data and leverage knowledge from cell lines to predict patient anticancer drug responses. This study proposes a domain alignment method based on masked variational autoencoders, MVAEDA, to predict patient anticancer drug responses. The model constructs multiple variational autoencoders (VAEs) and mask predictors to extract specific and domain-invariant features of cell lines and patients. Then, it masks and reconstructs the gene expression matrix, using generative adversarial training to learn domain-invariant features from the cell line and patient domains. These domain-invariant features are then used to train a classifier. Finally, the trained model predicts the anticancer drug response in the target domain. Our model is experimentally evaluated on the clinical dataset and the preclinical dataset. The results show that our method performs better than other state-of-the-art methods.
Affiliation(s)
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Gong Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Chuyue Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Ning Yu
  - State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport, NY 14422, United States

3. Carli F, De Oliveira Rosa N, Blotas S, Di Chiaro P, Bisceglia L, Morelli M, Lessi F, Di Stefano AL, Mazzanti CM, Natoli G, Raimondi F. CellHit: a web server to predict and analyze cancer patients' drug responsiveness. Nucleic Acids Res 2025:gkaf414. [PMID: 40377071 DOI: 10.1093/nar/gkaf414] [Received: 03/24/2025] [Revised: 04/17/2025] [Accepted: 05/02/2025] [Indexed: 05/18/2025] Open
Abstract
We present the CellHit web server (https://cellhit.bioinfolab.sns.it/), a web-based platform designed to predict and analyze cancer patients' responsiveness to drugs using transcriptomic data. By leveraging extensive pharmacogenomics datasets from the Genomics of Drug Sensitivity in Cancer v1 and v2 (GDSC) and Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) and transcriptomic data from the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas Program (TCGA), CellHit integrates a computational pipeline for preprocessing, gene imputation, and robust alignment between patient and cell line transcriptomic data with pre-trained state-of-the-art (SOTA) models for drug sensitivity prediction. The pipeline employs batch correction, enhanced Celligner methodology, and Parametric UMAP for stable and actionable alignment. The intuitive interface requires no programming expertise, offering interactive visualizations, including low-dimensional embeddings and drug sensitivity heatmaps for the input transcriptomic samples. Results feature contextual metadata, SHAP-based feature importance, and transcriptomic neighbors from reference datasets, simplifying interpretation and hypothesis generation. CellHit provides precomputed predictions across TCGA samples and offers the ability to run custom analyses online on input samples, democratizing precision oncology by enabling rapid, interpretable predictions accessible to the research community.
Affiliation(s)
- Francesco Carli
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy
- Natalia De Oliveira Rosa
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Simon Blotas
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Pierluigi Di Chiaro
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Via Ripamonti 435, 20141 Milano, Italy
- Luisa Bisceglia
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Mariangela Morelli
  - Fondazione Pisana per la Scienza, Pisa, Via F. Giovannini 13, 56017 Pisa, Italy
- Francesca Lessi
  - Fondazione Pisana per la Scienza, Pisa, Via F. Giovannini 13, 56017 Pisa, Italy
- Anna Luisa Di Stefano
  - Neurosurgical Department of Spedali Riuniti di Livorno, Via V. Alfieri 36, 57124 Livorno, Italy
- Gioacchino Natoli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Via Ripamonti 435, 20141 Milano, Italy
- Francesco Raimondi
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy

4. Huang K, Liu H. Identification of drug-resistant individual cells within tumors by semi-supervised transfer learning from bulk to single-cell transcriptome. Commun Biol 2025; 8:530. [PMID: 40164749 PMCID: PMC11958800 DOI: 10.1038/s42003-025-07959-3] [Received: 09/25/2024] [Accepted: 03/19/2025] [Indexed: 04/02/2025] Open
Abstract
The presence of pre-existing or acquired drug-resistant cells within the tumor often leads to tumor relapse and metastasis. Single-cell RNA sequencing (scRNA-seq) enables elucidation of the subtle differences in drug responsiveness among distinct cell subpopulations within tumors. A few methods have employed scRNA-seq data to predict the drug response of individual cells to date, but their performance is far from satisfactory. In this study, we propose SSDA4Drug, a semi-supervised few-shot transfer learning method for inferring drug-resistant cancer cells. SSDA4Drug extracts pharmacogenomic features from both bulk and single-cell transcriptomic data using semi-supervised adversarial domain adaptation. This allows us to transfer knowledge of drug sensitivity from bulk-level cell lines to single cells. We conduct extensive performance evaluation experiments across multiple independent scRNA-seq datasets, demonstrating SSDA4Drug's superior performance over current state-of-the-art methods. Remarkably, with only one or two labeled target-domain samples, SSDA4Drug significantly boosts the predictive performance of single-cell drug responses. Moreover, SSDA4Drug accurately recapitulates the temporally dynamic changes of drug responses during continuous drug exposure of tumor cells, and successfully identifies reversible drug-responsive states in lung cancer cells, which initially acquire resistance through drug exposure but later restore sensitivity during drug holidays. Also, our predicted drug responses consistently align with the developmental patterns of drug sensitivity observed along the evolutionary trajectory of oral squamous cell carcinoma cells. In addition, our derived SHAP values and integrated gradients effectively pinpoint the key genes involved in drug resistance in prostate cancer cells. These findings highlight the exceptional performance of our method in determining single-cell drug responses. This powerful tool holds the potential for identifying drug-resistant tumor cell subpopulations, paving the way for advancements in precision medicine and novel drug development.
Affiliation(s)
- Kaishun Huang
  - College of Computer and Information Engineering, Nanjing Tech University, Nanjing, 211800, Jiangsu, China
- Hui Liu
  - College of Computer and Information Engineering, Nanjing Tech University, Nanjing, 211800, Jiangsu, China

5. Jayagopal A, Walsh RJ, Hariprasannan KK, Mariappan R, Mahapatra D, Jaynes PW, Lim D, Peng Tan DS, Tan TZ, Pitt JJ, Jeyasekharan AD, Rajan V. A multi-task domain-adapted model to predict chemotherapy response from mutations in recurrently altered cancer genes. iScience 2025; 28:111992. [PMID: 40160429 PMCID: PMC11952854 DOI: 10.1016/j.isci.2025.111992] [Received: 01/03/2024] [Revised: 08/23/2024] [Accepted: 02/06/2025] [Indexed: 04/02/2025] Open
Abstract
Next-generation sequencing (NGS) is increasingly utilized in oncological practice; however, only a minority of patients benefit from targeted therapy. Developing drug response prediction (DRP) models is important for the "untargetable" majority. Prior DRP models typically use whole-transcriptome and whole-exome sequencing data, which are clinically unavailable. We aim to develop a DRP model toward the repurposing of chemotherapy, requiring only information from clinical-grade NGS (cNGS) panels of restricted gene sets. Data sparsity and limited patient drug response information make this challenging. We first show that existing DRPs perform equally with whole-exome versus cNGS (∼300 genes) data. Drug IDentifier (DruID) is then described, a DRP model for restricted gene sets using transfer learning, variant annotations, domain-invariant representation learning, and multi-task learning. DruID outperformed state-of-the-art DRP methods on pan-cancer data and showed robust response classification on two real-world clinical datasets, representing a step toward a clinically applicable DRP tool.
Affiliation(s)
- Aishwarya Jayagopal
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Robert J. Walsh
  - Department of Haematology-Oncology, National University Cancer Institute, NUHS Tower Block, Level 7, 1E Kent Ridge Road, Singapore 119228, Singapore
- Krishna Kumar Hariprasannan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Ragunathan Mariappan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Debabrata Mahapatra
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Patrick William Jaynes
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Diana Lim
  - Department of Pathology, National University Health System, 1E Kent Ridge Road, Singapore 119228, Singapore
- David Shao Peng Tan
  - Department of Haematology-Oncology, National University Cancer Institute, NUHS Tower Block, Level 7, 1E Kent Ridge Road, Singapore 119228, Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, 1E Kent Ridge Road, NUHS Tower Block, Level 10, Singapore 119228, Singapore
- Tuan Zea Tan
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Jason J. Pitt
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Anand D. Jeyasekharan
  - Department of Haematology-Oncology, National University Cancer Institute, NUHS Tower Block, Level 7, 1E Kent Ridge Road, Singapore 119228, Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Vaibhav Rajan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore

6. Chang TG, Park S, Schäffer AA, Jiang P, Ruppin E. Hallmarks of artificial intelligence contributions to precision oncology. Nature Cancer 2025; 6:417-431. [PMID: 40055572 PMCID: PMC11957836 DOI: 10.1038/s43018-025-00917-2] [Received: 09/04/2024] [Accepted: 01/21/2025] [Indexed: 03/29/2025]
Abstract
The integration of artificial intelligence (AI) into oncology promises to revolutionize cancer care. In this Review, we discuss ten AI hallmarks in precision oncology, organized into three groups: (1) cancer prevention and diagnosis, encompassing cancer screening, detection and profiling; (2) optimizing current treatments, including patient outcome prediction, treatment planning and monitoring, clinical trial design and matching, and developing response biomarkers; and (3) advancing new treatments by identifying treatment combinations, discovering cancer vulnerabilities and designing drugs. We also survey AI applications in interventional clinical trials and address key challenges to broader clinical adoption of AI: data quality and quantity, model accuracy, clinical relevance and patient benefit, proposing actionable solutions for each.
Affiliation(s)
- Tian-Gen Chang
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Seongyong Park
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Alejandro A Schäffer
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Peng Jiang
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Eytan Ruppin
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA

7. Carli F, Di Chiaro P, Morelli M, Arora C, Bisceglia L, De Oliveira Rosa N, Cortesi A, Franceschi S, Lessi F, Di Stefano AL, Santonocito OS, Pasqualetti F, Aretini P, Miglionico P, Diaferia GR, Giannotti F, Liò P, Duran-Frigola M, Mazzanti CM, Natoli G, Raimondi F. Learning and actioning general principles of cancer cell drug sensitivity. Nat Commun 2025; 16:1654. [PMID: 39952993 PMCID: PMC11828915 DOI: 10.1038/s41467-025-56827-5] [Received: 04/22/2024] [Accepted: 02/03/2025] [Indexed: 02/17/2025] Open
Abstract
High-throughput screening of drug sensitivity of cancer cell lines (CCLs) holds the potential to unlock anti-tumor therapies. In this study, we leverage such datasets to predict drug response using cell line transcriptomics, focusing on models' interpretability and deployment on patients' data. We use large language models (LLMs) to match drugs to mechanism of action (MOA)-related pathways. Genes crucial for prediction are enriched in drug MOAs, suggesting that our models learn the molecular determinants of response. Furthermore, by using only LLM-curated MOA genes, we enhance the predictive accuracy of our models. To enhance translatability, we align RNAseq data from CCLs, used for training, to those from patient samples, used for inference. We validated our approach on TCGA samples, where patients' best scoring drugs match those prescribed for their cancer type. We further predict and experimentally validate effective drugs for the patients of two highly lethal solid tumors, i.e., pancreatic cancer and glioblastoma.
Affiliation(s)
- Francesco Carli
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
  - Department of Computer Science, University of Pisa, Pisa, Italy
- Pierluigi Di Chiaro
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Chakit Arora
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
- Luisa Bisceglia
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
- Alice Cortesi
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Giuseppe R Diaferia
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
  - Botton-Champalimaud Pancreatic Cancer Center, Champalimaud Foundation, Lisbon, Portugal
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Gioacchino Natoli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy

8. Nilsson A, Meimetis N, Lauffenburger DA. Towards an interpretable deep learning model of cancer. NPJ Precis Oncol 2025; 9:46. [PMID: 39948231 PMCID: PMC11825879 DOI: 10.1038/s41698-025-00822-y] [Received: 09/26/2024] [Accepted: 01/27/2025] [Indexed: 02/16/2025] Open
Abstract
Cancer is a manifestation of dysfunctional cell states. It emerges from an interplay of intrinsic and extrinsic factors that disrupt cellular dynamics, including genetic and epigenetic alterations, as well as the tumor microenvironment. This complexity can make it challenging to infer molecular causes for treating the disease. This may be addressed by system-wide computer models of cells, as they allow rapid generation and testing of hypotheses that would be too slow or impossible to perform in the laboratory and clinic. However, so far, such models have been impeded by both experimental and computational limitations. In this perspective, we argue that they can now be achieved using deep learning algorithms to integrate omics data and prior knowledge of molecular networks. Such models would have many applications in precision oncology, e.g., for identifying drug targets and biomarkers, predicting resistance mechanisms and toxicity effects of drugs, or simulating cell-cell interactions in the microenvironment.
Affiliation(s)
- Avlant Nilsson
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden
  - Department of Cell and Molecular Biology, SciLifeLab, Karolinska Institutet, Stockholm, Sweden
- Nikolaos Meimetis
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Douglas A Lauffenburger
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

9. Singhal A, Zhao X, Wall P, So E, Calderini G, Partin A, Koussa N, Vasanthakumari P, Narykov O, Zhu Y, Jones SE, Abbas-Aghababazadeh F, Nair SK, Bélisle-Pipon JC, Jayaram A, Parker BA, Yeung KT, Griffiths JI, Weil R, Nath A, Haibe-Kains B, Ideker T. The Hallmarks of Predictive Oncology. Cancer Discov 2025; 15:271-285. [PMID: 39760657 PMCID: PMC11969157 DOI: 10.1158/2159-8290.cd-24-0760] [Received: 05/25/2024] [Revised: 08/30/2024] [Accepted: 10/16/2024] [Indexed: 01/07/2025]
Abstract
SIGNIFICANCE: As the field of artificial intelligence evolves rapidly, these hallmarks are intended to capture fundamental, complementary concepts necessary for the progress and timely adoption of predictive modeling in precision oncology. Through these hallmarks, we hope to establish standards and guidelines that enable the symbiotic development of artificial intelligence and precision oncology.
Affiliation(s)
- Akshat Singhal
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Xiaoyu Zhao
  - Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Patrick Wall
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Emily So
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Guido Calderini
  - Faculty of Health Science, Simon Fraser University, Burnaby, BC, Canada
  - École de santé publique, Université de Montréal, Montréal, QC, Canada
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
- Natasha Koussa
  - Cancer Data Science Initiatives, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, USA
- Sara E. Jones
  - Cancer Data Science Initiatives, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Barbara A. Parker
  - Moores Cancer Center, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Kay T. Yeung
  - Moores Cancer Center, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Jason I. Griffiths
  - Department of Medical Oncology and Therapeutics Research, Beckman Research Institute, City of Hope National Medical Center, Monrovia, CA, USA
- Ryan Weil
  - Cancer Data Science Initiatives, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Aritro Nath
  - Department of Medical Oncology and Therapeutics Research, Beckman Research Institute, City of Hope National Medical Center, Monrovia, CA, USA
- Benjamin Haibe-Kains
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Medical Biophysics, University of Toronto, Toronto, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Canada
  - Department of Biostatistics, Dalla Lana School of Public Health, Toronto, Canada
- Trey Ideker
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
  - Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, Department of Medicine, University of California, San Diego, La Jolla, CA, USA

10. Peng W, Chen C, Dai W, Yu N, Wang J. Predicting Clinical Anticancer Drug Response of Patients by Using Domain Alignment and Prototypical Learning. IEEE J Biomed Health Inform 2025; 29:1534-1545. [PMID: 39292588 DOI: 10.1109/jbhi.2024.3462811] [Indexed: 09/20/2024]
Abstract
Anticancer drug response prediction is crucial in developing personalized treatment plans for cancer patients. However, because high-quality patient anticancer drug response data are scarce and cell line data and patient data have different distributions, models trained solely on cell line data perform poorly. Some existing methods predict anticancer drug response by transferring knowledge from the cell line domain to the patient domain using transfer learning. However, the robustness of these classifiers is affected by anomalies in the cell line data, and they do not utilize the knowledge in the unlabeled target domain data. To this end, we proposed a model called DAPL to predict patient responses to anticancer drugs. The model extracts domain-invariant features from cell lines and patients by constructing multiple VAEs and extracts drug features using GNNs. These features are then combined for prototypical learning to train a classifier, resulting in better predictions of patient anticancer drug response. We used the cell line datasets CCLE and GDSC as source domains and the patient datasets TCGA and PDTC as target domains and conducted experiments. The results indicate that DAPL shows excellent performance in predicting patient anticancer drug response compared to other state-of-the-art methods.

11. Firoozbakht F, Yousefi B, Tsoy O, Baumbach J, Schwikowski B. Comparative evaluation of feature reduction methods for drug response prediction. Sci Rep 2024; 14:30885. [PMID: 39730699 DOI: 10.1038/s41598-024-81866-1] [Received: 08/02/2024] [Accepted: 11/29/2024] [Indexed: 12/29/2024] Open
Abstract
Personalized medicine aims to tailor medical treatments to individual patients, and predicting drug responses from molecular profiles using machine learning is crucial for this goal. However, the high dimensionality of the molecular profiles compared to the limited number of samples presents significant challenges. Knowledge-based feature selection methods are particularly suitable for drug response prediction, as they leverage biological insights to reduce dimensionality and improve model interpretability. This study presents the first comparative evaluation of nine different knowledge-based and data-driven feature reduction methods on cell line and tumor data. Our analysis employs six distinct machine learning models, with a total of more than 6,000 runs to ensure a robust evaluation. Our findings indicate that transcription factor activities outperform other methods in predicting drug responses, effectively distinguishing between sensitive and resistant tumors for seven of the 20 drugs evaluated.
Collapse
Affiliation(s)
- Farzaneh Firoozbakht
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Behnam Yousefi
- Computational Systems Biomedicine Lab, Institut Pasteur, Université Paris Cité, Paris, France
- École Doctorale Complexite du vivant, Sorbonne Université, Paris, France
- Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf, 20251, Hamburg, Germany
- Olga Tsoy
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Odense, Denmark
- Benno Schwikowski
- Computational Systems Biomedicine Lab, Institut Pasteur, Université Paris Cité, Paris, France.

12
Crawford J, Chikina M, Greene CS. Best holdout assessment is sufficient for cancer transcriptomic model selection. Patterns (N Y) 2024; 5:101115. [PMID: 39776849 PMCID: PMC11701843 DOI: 10.1016/j.patter.2024.101115] [Received: 01/02/2024] [Revised: 08/01/2024] [Accepted: 11/13/2024] [Indexed: 01/11/2025]
Abstract
Guidelines in statistical modeling for genomics hold that simpler models have advantages over more complex ones. Potential advantages include cost, interpretability, and improved generalization across datasets or biological contexts. We directly tested the assumption that small gene signatures generalize better by examining the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice versa) and biological contexts (holding out entire cancer types from pan-cancer data). We compared model selection between solely cross-validation performance and combining cross-validation performance with regularization strength. We did not observe that more regularized signatures generalized better. This result held across both generalization problems and for both linear models (LASSO logistic regression) and non-linear ones (neural networks). When the goal of an analysis is to produce generalizable predictive models, we recommend choosing the ones that perform best on held-out data or in cross-validation instead of those that are smaller or more regularized.
Affiliation(s)
- Jake Crawford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Maria Chikina
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Casey S. Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA

13
Bailey RL, MacFarlane AJ, Field MS, Tagkopoulos I, Baranzini SE, Edwards KM, Rose CJ, Schork NJ, Singhal A, Wallace BC, Fisher KP, Markakis K, Stover PJ. Artificial intelligence in food and nutrition evidence: The challenges and opportunities. PNAS Nexus 2024; 3:pgae461. [PMID: 39677367 PMCID: PMC11638775 DOI: 10.1093/pnasnexus/pgae461] [Received: 06/12/2024] [Accepted: 10/02/2024] [Indexed: 12/17/2024]
Abstract
Science-informed decisions are best guided by the objective synthesis of the totality of evidence around a particular question and assessing its trustworthiness through systematic processes. However, there are major barriers and challenges that limit science-informed food and nutrition policy, practice, and guidance. First, evidence is insufficient, primarily because of the cost of generating high-quality data and the complexity of the diet-disease relationship. Furthermore, the sheer number of systematic reviews needed across the entire agriculture and food value chain, and the cost and time required to conduct them, can delay the translation of science to policy. Artificial intelligence offers the opportunity to (i) better understand the complex etiology of diet-related chronic diseases, (ii) bring more precision to our understanding of the variation among individuals in the diet-chronic disease relationship, (iii) provide new types of computed data related to the efficacy and effectiveness of nutrition/food interventions in health promotion, and (iv) automate the generation of systematic reviews that support timely decisions. These advances include the acquisition and synthesis of heterogeneous and multimodal datasets. This perspective summarizes a meeting convened at the National Academies of Sciences, Engineering, and Medicine. The purpose of the meeting was to examine the current state and future potential of artificial intelligence in generating new types of computed data as well as automating the generation of systematic reviews to support evidence-based food and nutrition policy, practice, and guidance.
Affiliation(s)
- Regan L Bailey
- Department of Nutrition, Texas A&M University, Cater-Mattil Hall, 373 Olsen Blvd Room 130, College Station, TX 77843, USA
- Institute for Advancing Health Through Agriculture, Texas A&M University, Borlaug Building, College Station, TX 77843, USA
- Amanda J MacFarlane
- Department of Nutrition, Texas A&M University, Cater-Mattil Hall, 373 Olsen Blvd Room 130, College Station, TX 77843, USA
- Texas A&M Agriculture, Food, and Nutrition Evidence Center, 801 Cherry Street, Fort Worth, TX 76102, USA
- Martha S Field
- Division of Nutritional Sciences, Cornell University, Savage Hall, Ithaca, NY 14850, USA
- Ilias Tagkopoulos
- Department of Computer Science and Genome Center, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
- USDA/NSF AI Institute for Next Generation Food Systems, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
- Sergio E Baranzini
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, 1651 4th St, San Francisco, CA 94158, USA
- Kristen M Edwards
- Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Christopher J Rose
- Cluster for Reviews and Health Technology Assessments, Norwegian Institute of Public Health, PO Box 222 Skøyen, 0213 Oslo, Norway
- Centre for Epidemic Interventions Research, Norwegian Institute of Public Health, Lovisenberggata 8 0456, 0213 Oslo, Norway
- Nicholas J Schork
- Translational Genomics Research Institute, City of Hope National Medical Center, 445 N. Fifth Street, Phoenix, AZ 85004, USA
- Akshat Singhal
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, San Diego, CA 92093, USA
- Byron C Wallace
- Khoury College of Computer Sciences, Northeastern University, #202, West Village Residence Complex H, 440 Huntington Ave, Boston, MA 02115, USA
- Kelly P Fisher
- Institute for Advancing Health Through Agriculture, Texas A&M University, Borlaug Building, College Station, TX 77843, USA
- Konstantinos Markakis
- Department of Computer Science and Genome Center, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
- Patrick J Stover
- Department of Nutrition, Texas A&M University, Cater-Mattil Hall, 373 Olsen Blvd Room 130, College Station, TX 77843, USA

14
Wang H, Ren Z, Sun J, Chen Y, Bo X, Xue J, Gao J, Ni M. DeepPFP: a multi-task-aware architecture for protein function prediction. Brief Bioinform 2024; 26:bbae579. [PMID: 39905954 PMCID: PMC11794456 DOI: 10.1093/bib/bbae579] [Received: 04/24/2024] [Revised: 09/14/2024] [Accepted: 01/31/2025] [Indexed: 02/06/2025]
Abstract
Deriving protein function from protein sequences poses a significant challenge due to the intricate relationship between sequence and function. Deep learning has made remarkable strides in predicting sequence-function relationships. However, models tailored for specific tasks or protein types encounter difficulties when using transfer learning across domains. This is attributed to the fact that protein function relies heavily on structural characteristics rather than mere sequence information. Consequently, there is a pressing need for a model capable of capturing shared features among diverse sequence-function mapping tasks to address the generalization issue. In this study, we explore the potential of Model-Agnostic Meta-Learning combined with a protein language model called Evolutionary Scale Modeling to tackle this challenge. Our approach involves training the architecture on five out-of-domain deep mutational scanning (DMS) datasets and evaluating its performance across four key dimensions. Our findings demonstrate that the proposed architecture exhibits satisfactory performance in terms of generalization and employs an effective few-shot learning strategy. Specifically, compared with the best results, Pearson's correlation coefficient (PCC) in the final stage increased by ~0.31%. Furthermore, we leverage the trained architecture to predict binding affinity scores of the DMS dataset of SARS-CoV-2 using transfer learning. Notably, training on a subset of the Ube4b dataset with 500 samples resulted in an improvement of 0.11 in the PCC. These results underscore the potential of our conceptual architecture as a promising methodology for multi-task protein function prediction.
Affiliation(s)
- Han Wang
- College of Information Science and Technology, Beijing University of Chemical Technology, No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029, China
- Zilin Ren
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control, Changchun 130122, China
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jinghong Sun
- College of Information Science and Technology, Beijing University of Chemical Technology, No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029, China
- Yongbing Chen
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control, Changchun 130122, China
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Xiaochen Bo
- Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
- JiGuo Xue
- Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
- Jingyang Gao
- College of Information Science and Technology, Beijing University of Chemical Technology, No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029, China
- Ming Ni
- Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China

15
Artsi Y, Sorin V, Glicksberg BS, Nadkarni GN, Klang E. Advancing Clinical Practice: The Potential of Multimodal Technology in Modern Medicine. J Clin Med 2024; 13:6246. [PMID: 39458196 PMCID: PMC11508674 DOI: 10.3390/jcm13206246] [Received: 09/11/2024] [Revised: 10/15/2024] [Accepted: 10/17/2024] [Indexed: 10/28/2024]
Abstract
Multimodal technology is poised to revolutionize clinical practice by integrating artificial intelligence with traditional diagnostic modalities. This evolution traces its roots from Hippocrates' humoral theory to the use of sophisticated AI-driven platforms that synthesize data across multiple sensory channels. The interplay between historical medical practices and modern technology challenges conventional patient-clinician interactions and redefines diagnostic accuracy. Highlighting applications from neurology to radiology, the potential of multimodal technology emerges, suggesting a future where AI not only supports but enhances human sensory inputs in medical diagnostics. This shift invites the medical community to navigate the ethical, practical, and technological changes reshaping the landscape of clinical medicine.
Affiliation(s)
- Yaara Artsi
- Azrieli Faculty of Medicine, Bar-Ilan University, Zefat 1311502, Israel
- Vera Sorin
- Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA
- Benjamin S. Glicksberg
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Girish N. Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

16
Luo X, Ding Y, Cao Y, Liu Z, Zhang W, Zeng S, Cheng SH, Li H, Haggarty SJ, Wang X, Zhang J, Shi P. Few-shot meta-learning applied to whole brain activity maps improves systems neuropharmacology and drug discovery. iScience 2024; 27:110875. [PMID: 39319265 PMCID: PMC11419810 DOI: 10.1016/j.isci.2024.110875] [Received: 03/11/2024] [Revised: 06/10/2024] [Accepted: 08/30/2024] [Indexed: 09/26/2024]
Abstract
In this study, we present an approach to neuropharmacological research by integrating few-shot meta-learning algorithms with brain activity mapping (BAMing) to enhance the discovery of central nervous system (CNS) therapeutics. By utilizing patterns from previously validated CNS drugs, our approach facilitates the rapid identification and prediction of potential drug candidates from limited datasets, thereby accelerating the drug discovery process. The application of few-shot meta-learning algorithms allows us to adeptly navigate the challenges of limited sample sizes prevalent in neuropharmacology. The study reveals that our meta-learning-based convolutional neural network (Meta-CNN) models demonstrate enhanced stability and improved prediction accuracy over traditional machine-learning methods. Moreover, our BAM library proves instrumental in classifying CNS drugs and aiding in pharmaceutical repurposing and repositioning. Overall, this research not only demonstrates the effectiveness in overcoming data limitations but also highlights the significant potential of combining BAM with advanced meta-learning techniques in CNS drug discovery.
Affiliation(s)
- Xuan Luo
- Department of Biomedical Engineering, City University of Hong Kong, Kowloon 999077, Hong Kong SAR, China
- National Center for Applied Mathematics Shenzhen, Shenzhen 518000, China
- Department of Mathematics, Southern University of Science and Technology, Shenzhen 518055, China
- Yanyun Ding
- National Center for Applied Mathematics Shenzhen, Shenzhen 518000, China
- Institute of Applied Mathematics, Shenzhen Polytechnic University, Shenzhen 518055, China
- Yi Cao
- Department of Biomedical Sciences, City University of Hong Kong, Kowloon 999077, Hong Kong SAR, China
- Zhen Liu
- Department of Biomedical Engineering, City University of Hong Kong, Kowloon 999077, Hong Kong SAR, China
- Wenchong Zhang
- Department of Biomedical Engineering, City University of Hong Kong, Kowloon 999077, Hong Kong SAR, China
- Shangzhi Zeng
- National Center for Applied Mathematics Shenzhen, Shenzhen 518000, China
- Shuk Han Cheng
- Department of Biomedical Sciences, City University of Hong Kong, Kowloon 999077, Hong Kong SAR, China
- Honglin Li
- Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China
- Stephen J. Haggarty
- Chemical Neurobiology Laboratory, Precision Therapeutics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Xin Wang
- Department of Surgery, Chinese University of Hong Kong, Kowloon 999077, Hong Kong SAR, China
- Jin Zhang
- National Center for Applied Mathematics Shenzhen, Shenzhen 518000, China
- Department of Mathematics, Southern University of Science and Technology, Shenzhen 518055, China
- Peng Shi
- Department of Biomedical Engineering, City University of Hong Kong, Kowloon 999077, Hong Kong SAR, China
- National Center for Applied Mathematics Shenzhen, Shenzhen 518000, China
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen 518057, China

17
Gao Y, Wei Z, Dong K, Chen K, Yang J, Chuai G, Liu Q. Toward subtask-decomposition-based learning and benchmarking for predicting genetic perturbation outcomes and beyond. Nat Comput Sci 2024; 4:773-785. [PMID: 39333790 DOI: 10.1038/s43588-024-00698-1] [Received: 03/10/2024] [Accepted: 08/29/2024] [Indexed: 09/30/2024]
Abstract
Deciphering cellular responses to genetic perturbations is fundamental for a wide array of biomedical applications. However, there are three main challenges: predicting single-genetic-perturbation outcomes, predicting multiple-genetic-perturbation outcomes and predicting genetic outcomes across cell lines. Here we introduce Subtask Decomposition Modeling for Genetic Perturbation Prediction (STAMP), a flexible artificial intelligence strategy for genetic perturbation outcome prediction and downstream applications. STAMP formulates genetic perturbation prediction as a subtask decomposition problem by resolving three progressive subtasks in a problem decomposition manner, that is, identifying postperturbation differentially expressed genes, determining the expression change directions of differentially expressed genes and finally estimating the magnitudes of gene expression changes. STAMP exhibits a substantial improvement over the existing approaches on three subtasks and beyond, including the ability to identify key regulatory genes and pathways on small samples and to reveal precise genetic interactions of diverse types.
Affiliation(s)
- Yicheng Gao
- State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Zhiting Wei
- State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Kejing Dong
- State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Ke Chen
- State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Jingya Yang
- State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, China
- Guohui Chuai
- State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, China
- Qi Liu
- State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China.
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, China.

18
Torres LHM, Arrais JP, Ribeiro B. Combining graph neural networks and transformers for few-shot nuclear receptor binding activity prediction. J Cheminform 2024; 16:109. [PMID: 39334272 PMCID: PMC11429188 DOI: 10.1186/s13321-024-00902-4] [Received: 04/30/2024] [Accepted: 09/05/2024] [Indexed: 09/30/2024]
Abstract
Nuclear receptors (NRs) play a crucial role as biological targets in drug discovery. However, determining which compounds can act as endocrine disruptors and modulate the function of NRs with a reduced amount of candidate drugs is a challenging task. Moreover, the computational methods for NR-binding activity prediction mostly focus on a single receptor at a time, which may limit their effectiveness. Hence, the transfer of learned knowledge among multiple NRs can improve the performance of molecular predictors and lead to the development of more effective drugs. In this research, we integrate graph neural networks (GNNs) and Transformers to introduce a few-shot GNN-Transformer, Meta-GTNRP, to predict the binding activity of compounds using the combined information of different NRs and identify potential NR-modulators with limited data. The Meta-GTNRP model captures the local information in graph-structured data and preserves the global-semantic structure of molecular graph embeddings for NR-binding activity prediction. Furthermore, a few-shot meta-learning approach is proposed to optimize model parameters for different NR-binding tasks and leverage the complementarity among multiple NR-specific tasks to predict binding activity of compounds for each NR with just a few labeled molecules. Experiments with a compound database containing annotations on the binding activity for 11 NRs show that Meta-GTNRP outperforms other graph-based approaches. The data and code are available at: https://github.com/ltorres97/Meta-GTNRP.
Scientific contribution: The proposed few-shot GNN-Transformer model, Meta-GTNRP, captures the local structure of molecular graphs and preserves the global-semantic information of graph embeddings to predict the NR-binding activity of compounds with limited available data; a few-shot meta-learning framework adapts model parameters across NR-specific tasks for different NRs in a joint learning procedure to predict the binding activity of compounds for each NR with just a few labeled molecules in highly imbalanced data scenarios; Meta-GTNRP is a data-efficient approach that combines the strengths of GNNs and Transformers to predict the NR-binding properties of compounds through an optimized meta-learning procedure and deliver robust results valuable for identifying potential NR-based drug candidates.
Affiliation(s)
- Luis H M Torres
- Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, 3030-790, Portugal.
- Joel P Arrais
- Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, 3030-790, Portugal
- Bernardete Ribeiro
- Department of Informatics Engineering, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Coimbra, 3030-790, Portugal

19
Rood JE, Hupalowska A, Regev A. Toward a foundation model of causal cell and tissue biology with a Perturbation Cell and Tissue Atlas. Cell 2024; 187:4520-4545. [PMID: 39178831 DOI: 10.1016/j.cell.2024.07.035] [Received: 05/03/2024] [Revised: 07/15/2024] [Accepted: 07/21/2024] [Indexed: 08/26/2024]
Abstract
Comprehensively charting the biologically causal circuits that govern the phenotypic space of human cells has often been viewed as an insurmountable challenge. However, in the last decade, a suite of interleaved experimental and computational technologies has arisen that is making this fundamental goal increasingly tractable. Pooled CRISPR-based perturbation screens with high-content molecular and/or image-based readouts are now enabling researchers to probe, map, and decipher genetically causal circuits at increasing scale. This scale is now eminently suitable for the deployment of artificial intelligence and machine learning (AI/ML) both to direct further experiments and to predict or generate information that was not (and sometimes cannot be) gathered experimentally. By combining and iterating those through experiments that are designed for inference, we now envision a Perturbation Cell Atlas as a generative causal foundation model to unify human cell biology.
Affiliation(s)
- Aviv Regev
- Genentech, South San Francisco, CA, USA.

20
Cai YQ, Gong DX, Tang LY, Cai Y, Li HJ, Jing TC, Gong M, Hu W, Zhang ZW, Zhang X, Zhang GW. Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions. J Med Internet Res 2024; 26:e47645. [PMID: 38869157 PMCID: PMC11316160 DOI: 10.2196/47645] [Received: 03/29/2023] [Revised: 10/30/2023] [Accepted: 06/12/2024] [Indexed: 06/14/2024]
Abstract
In recent years, there has been explosive development in artificial intelligence (AI), which has been widely applied in the health care field. As a typical AI technology, machine learning models have emerged with great potential in predicting cardiovascular diseases by leveraging large amounts of medical data for training and optimization, which are expected to play a crucial role in reducing the incidence and mortality rates of cardiovascular diseases. Although the field has become a research hot spot, there are still many pitfalls that researchers need to pay close attention to. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the studied models, ultimately reducing the value of the research and affecting the prospects for clinical application. Therefore, identifying and avoiding these pitfalls is a crucial task before implementing the research. However, there is currently a lack of a comprehensive summary on this topic. This viewpoint aims to analyze the existing problems in terms of data quality, data set characteristics, model design, and statistical methods, as well as clinical implications, and provide possible solutions to these problems, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting using statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability, with the goal of offering reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.
Affiliation(s)
- Yu-Qing Cai
- The First Hospital of China Medical University, Shenyang, China
- Da-Xin Gong
- Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Li-Ying Tang
- The First Hospital of China Medical University, Shenyang, China
- Yue Cai
- The First Hospital of China Medical University, Shenyang, China
- Hui-Jun Li
- Shenyang Medical & Film Science and Technology Co, Ltd, Shenyang, China
- Tian-Ci Jing
- Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Wei Hu
- Bayi Orthopedic Hospital, Chengdu, China
- Zhen-Wei Zhang
- China Rongtong Medical & Healthcare Co, Ltd, Chengdu, China
- Xingang Zhang
- Department of Cardiology, The First Hospital of China Medical University, Shenyang, China
- Guang-Wei Zhang
- Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China

21
Gross B, Dauvin A, Cabeli V, Kmetzsch V, El Khoury J, Dissez G, Ouardini K, Grouard S, Davi A, Loeb R, Esposito C, Hulot L, Ghermi R, Blum M, Darhi Y, Durand EY, Romagnoni A. Robust evaluation of deep learning-based representation methods for survival and gene essentiality prediction on bulk RNA-seq data. Sci Rep 2024; 14:17064. [PMID: 39048590 PMCID: PMC11269749 DOI: 10.1038/s41598-024-67023-8] [Received: 04/15/2024] [Accepted: 07/08/2024] [Indexed: 07/27/2024]
Abstract
Deep learning (DL) has shown potential to provide powerful representations of bulk RNA-seq data in cancer research. However, there is no consensus regarding the impact of design choices of DL approaches on the performance of the learned representation, including the model architecture, the training methodology and the various hyperparameters. To address this problem, we evaluate the performance of various design choices of DL representation learning methods using TCGA and DepMap pan-cancer datasets and assess their predictive power for survival and gene essentiality predictions. We demonstrate that baseline methods achieve comparable or superior performance compared to more complex models on survival prediction tasks. DL representation methods, however, are the most efficient at predicting the gene essentiality of cell lines. We show that auto-encoders (AE) are consistently improved by techniques such as masking and multi-head training. Our results suggest that the impact of DL representations and of pretraining is highly task- and architecture-dependent, highlighting the need for adopting rigorous evaluation guidelines. These guidelines for robust evaluation are implemented in a pipeline made available to the research community.
|
22
|
Zhai J, Liu H. Cross-Domain Feature Disentanglement for Interpretable Modeling of Tumor Microenvironment Impact on Drug Response. IEEE J Biomed Health Inform 2024; 28:4382-4392. [PMID: 38607708 DOI: 10.1109/jbhi.2024.3387930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2024]
Abstract
High-throughput screening technology has enabled the generation of large-scale drug responses across hundreds of cancer cell lines. Yet there remains a significant gap between in vitro cell lines and actual tumors in vivo in terms of their response to drug treatments. This is because tumors consist of a complex cellular composition and histopathology structure, known as the tumor microenvironment (TME), which greatly impacts the drug cytotoxicity against tumor cells. To date, no study has focused on modeling the impact of the TME on clinical drug response. In this study, we postulated that the intricate complexity of an actual tumor can be conceptually simplified into two separable components: cancerous cells and the tumor microenvironment. This assumption allowed us to model the influence of these two constituent parts on drug response through feature disentanglement. We employed a domain adaptation network to decouple and extract features from tumor transcriptional profiles. Specifically, two denoising autoencoders were separately used to extract features from cell lines (source domain) and tumors (target domain) for partial domain alignment and feature decoupling. The private encoder was enforced to extract information only about the TME. Moreover, to ensure generalizability to novel drugs, we employed a graph attention network to learn the latent representation of drugs, enabling us to linearly model the drug perturbation on cellular state in latent space. We validated our model on a benchmark dataset and demonstrated its superior performance in predicting clinical drug response and dissecting the influence of the TME on drug efficacy.
|
23
|
Işık G, Paçal İ. Few-shot classification of ultrasound breast cancer images using meta-learning algorithms. Neural Comput Appl 2024; 36:12047-12059. [DOI: 10.1007/s00521-024-09767-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/08/2023] [Accepted: 03/25/2024] [Indexed: 05/14/2025]
Abstract
Medical datasets often have a skewed class distribution and a lack of high-quality annotated images. However, deep learning methods require a large amount of labeled data for classification. In this study, we present a few-shot learning approach for the classification of ultrasound breast cancer images using meta-learning methods. We used prototypical networks and model agnostic meta-learning (MAML) algorithms as meta-learning methods. The breast ultrasound images (BUSI) dataset, which has three classes and is difficult to use in meta-learning, was used for meta-testing in a cross-domain approach along with other datasets for meta-training. Our proposed approach yielded an accuracy range of 0.882–0.889, achieved by implementing the ResNet50 backbone with ProtoNet in a 10-shot setting. These results represent a significant improvement ranging from 6.27 to 7.10% over the baseline accuracy of 0.831. The results showed that ProtoNet outperformed the MAML method for all k-shot settings. In addition, the use of ResNet models as the backbone network for feature extraction was found to be more successful than the use of a four-layer convolutional model. Our proposed method is the first attempt to apply meta-learning for few-shot classification in the BUSI dataset while providing higher accuracy compared to deep learning methods for medical images with small-scale datasets and few classes. The methodology used in this study can be adapted to other datasets with similar problems.
|
24
|
Cadavid JL, Li NT, McGuigan AP. Bridging systems biology and tissue engineering: Unleashing the full potential of complex 3D in vitro tissue models of disease. BIOPHYSICS REVIEWS 2024; 5:021301. [PMID: 38617201 PMCID: PMC11008916 DOI: 10.1063/5.0179125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 03/12/2024] [Indexed: 04/16/2024]
Abstract
Rapid advances in tissue engineering have resulted in more complex and physiologically relevant 3D in vitro tissue models with applications in fundamental biology and therapeutic development. However, the complexity provided by these models is often not leveraged fully due to the reductionist methods used to analyze them. Computational and mathematical models developed in the field of systems biology can address this issue. Yet, traditional systems biology has been mostly applied to simpler in vitro models with little physiological relevance and limited cellular complexity. Therefore, integrating these two inherently interdisciplinary fields can result in new insights and move both disciplines forward. In this review, we provide a systematic overview of how systems biology has been integrated with 3D in vitro tissue models and discuss key application areas where the synergies between both fields have led to important advances with potential translational impact. We then outline key directions for future research and discuss a framework for further integration between fields.
|
25
|
Clark T, Mohan J, Schaffer L, Obernier K, Al Manir S, Churas CP, Dailamy A, Doctor Y, Forget A, Hansen JN, Hu M, Lenkiewicz J, Levinson MA, Marquez C, Nourreddine S, Niestroy J, Pratt D, Qian G, Thaker S, Bélisle-Pipon JC, Brandt C, Chen J, Ding Y, Fodeh S, Krogan N, Lundberg E, Mali P, Payne-Foster P, Ratcliffe S, Ravitsky V, Sali A, Schulz W, Ideker T. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from Disease-Relevant Cell Lines. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.21.589311. [PMID: 38826258 PMCID: PMC11142054 DOI: 10.1101/2024.05.21.589311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
This article describes the Cell Maps for Artificial Intelligence (CM4AI) project and its goals, methods, standards, current datasets, software tools, status, and future directions. CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge2AI program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research.
|
26
|
Omelchenko AA, Siwek JC, Chhibbar P, Arshad S, Nazarali I, Nazarali K, Rosengart A, Rahimikollu J, Tilstra J, Shlomchik MJ, Koes DR, Joglekar AV, Das J. Sliding Window INteraction Grammar (SWING): a generalized interaction language model for peptide and protein interactions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.01.592062. [PMID: 38746274 PMCID: PMC11092674 DOI: 10.1101/2024.05.01.592062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
The explosion of sequence data has allowed the rapid growth of protein language models (pLMs). pLMs have now been employed in many frameworks including variant-effect and peptide-specificity prediction. Traditionally, for protein-protein or peptide-protein interactions (PPIs), corresponding sequences are either co-embedded followed by post-hoc integration or the sequences are concatenated prior to embedding. Interestingly, no method utilizes a language representation of the interaction itself. We developed an interaction LM (iLM), which uses a novel language to represent interactions between protein/peptide sequences. Sliding Window Interaction Grammar (SWING) leverages differences in amino acid properties to generate an interaction vocabulary. This vocabulary is the input into an LM followed by a supervised prediction step where the LM's representations are used as features. SWING was first applied to predicting peptide:MHC (pMHC) interactions. SWING was not only successful at generating Class I and Class II models that have comparable prediction to state-of-the-art approaches, but the unique Mixed Class model was also successful at jointly predicting both classes. Further, the SWING model trained only on Class I alleles was predictive for Class II, a complex prediction task not attempted by any existing approach. For de novo data, using only Class I or Class II data, SWING also accurately predicted Class II pMHC interactions in murine models of SLE (MRL/lpr model) and T1D (NOD model), which were validated experimentally. To further evaluate SWING's generalizability, we tested its ability to predict the disruption of specific protein-protein interactions by missense mutations. Although modern methods like AlphaMissense and ESM1b can predict interfaces and variant effects/pathogenicity per mutation, they are unable to predict interaction-specific disruptions. SWING was successful at accurately predicting the impact of both Mendelian mutations and population variants on PPIs. This is the first generalizable approach that can accurately predict interaction-specific disruptions by missense mutations with only sequence information. Overall, SWING is a first-in-class generalizable zero-shot iLM that learns the language of PPIs.
Affiliation(s)
- Alisa A. Omelchenko
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
  - The joint CMU-Pitt PhD program in computational biology, School of Medicine, University of Pittsburgh, PA, USA
- Jane C. Siwek
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
  - The joint CMU-Pitt PhD program in computational biology, School of Medicine, University of Pittsburgh, PA, USA
- Prabal Chhibbar
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Integrative systems biology PhD program, School of Medicine, University of Pittsburgh, PA, USA
- Sanya Arshad
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Iliyan Nazarali
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Kiran Nazarali
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- AnnaElaine Rosengart
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Javad Rahimikollu
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
  - The joint CMU-Pitt PhD program in computational biology, School of Medicine, University of Pittsburgh, PA, USA
- Jeremy Tilstra
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, School of Medicine, University of Pittsburgh, PA, USA
- Mark J. Shlomchik
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- David R. Koes
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
- Alok V. Joglekar
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
- Jishnu Das
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
|
27
|
Yanik E, Schwaitzberg S, Yang G, Intes X, Norfleet J, Hackett M, De S. One-shot skill assessment in high-stakes domains with limited data via meta learning. Comput Biol Med 2024; 174:108470. [PMID: 38636326 DOI: 10.1016/j.compbiomed.2024.108470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 04/08/2024] [Accepted: 04/09/2024] [Indexed: 04/20/2024]
Abstract
Deep Learning (DL) has achieved robust competency assessment in various high-stakes fields. However, the applicability of DL models is often hampered by their substantial data requirements and confinement to specific training domains. This prevents them from transitioning to new tasks where data is scarce. Therefore, domain adaptation emerges as a critical element for the practical implementation of DL in real-world scenarios. Herein, we introduce A-VBANet, a novel meta-learning model capable of delivering domain-agnostic skill assessment via one-shot learning. Our methodology has been tested by assessing surgical skills on five laparoscopic and robotic simulators and real-life laparoscopic cholecystectomy. Our model successfully adapted with accuracies up to 99.5 % in one-shot and 99.9 % in few-shot settings for simulated tasks and 89.7 % for laparoscopic cholecystectomy. This study marks the first instance of a domain-agnostic methodology for skill assessment in critical fields setting a precedent for the broad application of DL across diverse real-life domains with limited data.
Affiliation(s)
- Erim Yanik
  - College of Engineering, Florida A&M University and the Florida State University, USA.
- Gene Yang
  - School of Medicine and Biomedical Sciences, University at Buffalo, USA
- Xavier Intes
  - Biomedical Engineering Department, Rensselaer Polytechnic Institute, USA
- Jack Norfleet
  - U.S. Army Combat Capabilities Development Command Soldier Center STTC, USA
- Matthew Hackett
  - U.S. Army Combat Capabilities Development Command Soldier Center STTC, USA
- Suvranu De
  - College of Engineering, Florida A&M University and the Florida State University, USA
|
28
|
Du M, Xie X, Luo J, Li J. Meta-learning-based Inductive logistic matrix completion for prediction of kinase inhibitors. J Cheminform 2024; 16:44. [PMID: 38627866 PMCID: PMC11301988 DOI: 10.1186/s13321-024-00838-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/17/2023] [Accepted: 03/31/2024] [Indexed: 08/09/2024] Open
Abstract
Protein kinases have become an important source of potential drug targets. Developing new, efficient, and safe small-molecule kinase inhibitors has become an important topic in the field of drug research and development. In contrast with traditional wet experiments, which are time-consuming and expensive, machine learning-based approaches for predicting small-molecule inhibitors for protein kinases are time-saving and cost-effective, and are therefore highly desirable. However, the issue of sample scarcity (known active and inactive compounds are usually limited for most kinases) poses a challenge to the research and development of machine learning-based methods for predicting the activity of kinase inhibitors. To alleviate the data scarcity problem in the prediction of kinase inhibitors, in this study, we present a novel Meta-learning-based inductive logistic matrix completion method for the Prediction of Kinase Inhibitors (MetaILMC). MetaILMC adopts a meta-learning framework to learn a well-generalized model from tasks with sufficient samples, which can quickly adapt to new tasks with limited samples. As MetaILMC allows the effective transfer of the prior knowledge learned from kinases with sufficient samples to kinases with a small number of samples, the proposed model can produce accurate predictions for kinases with limited data. Experimental results show that MetaILMC has excellent performance for prediction tasks of kinases with few-shot samples and is significantly superior to the state-of-the-art multi-task learning in terms of various performance metrics such as AUC and AUPR. Case studies are also provided for two drugs to predict kinase inhibitory scores, further validating the proposed method's effectiveness and feasibility.
SCIENTIFIC CONTRIBUTION: Considering the potential correlation between activity prediction tasks for different kinases, we propose a novel meta learning algorithm MetaILMC, which learns a prior of strong generalization capacity during meta-training from the tasks with sufficient training samples, such that it can be easily and quickly adapted to the new tasks of the kinase with scarce data during meta-testing. Thus, MetaILMC can effectively alleviate the data scarcity problem in the prediction of kinase inhibitors.
Affiliation(s)
- Ming Du
  - School of Software, Yunnan University, Kunming, 650091, China
- XingRan Xie
  - School of Software, Yunnan University, Kunming, 650091, China
- Jing Luo
  - State Key Laboratory for Conservation and Utilization of Bio-Resource, School of Ecology and Environment and School of Life Sciences, Yunnan University, Kunming, 650091, Yunnan, China
- Jin Li
  - School of Software, Yunnan University, Kunming, 650091, China.
  - The Key Laboratory of Software Engineering of Yunnan Province, Kunming, 650091, China.
  - The Cloud Computing Engineering Research Center of Yunnan Province, Kunming, 650091, China.
|
29
|
Hajim WI, Zainudin S, Mohd Daud K, Alheeti K. Optimized models and deep learning methods for drug response prediction in cancer treatments: a review. PeerJ Comput Sci 2024; 10:e1903. [PMID: 38660174 PMCID: PMC11042005 DOI: 10.7717/peerj-cs.1903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 01/31/2024] [Indexed: 04/26/2024]
Abstract
Recent advancements in deep learning (DL) have played a crucial role in aiding experts to develop personalized healthcare services, particularly in drug response prediction (DRP) for cancer patients. The contribution of DL techniques to this field is significant, and they have proven indispensable in the medical field. This review aims to analyze the diverse effectiveness of various DL models in making these predictions, drawing on research published from 2017 to 2023. We utilized the VOS-Viewer 1.6.18 software to create a word cloud from the titles and abstracts of the selected studies. This study offers insights into the focus areas within DL models used for drug response. The word cloud revealed a strong link between certain keywords and grouped themes, highlighting terms such as deep learning, machine learning, precision medicine, precision oncology, drug response prediction, and personalized medicine. To advance DRP using DL, researchers need to work on enhancing the models' generalizability and interoperability. It is also crucial to develop models that not only accurately represent various architectures but also simplify these architectures, balancing the complexity with the predictive capabilities. In the future, researchers should try to combine methods that make DL models easier to understand; this will make DRP reviews more open and help doctors trust the decisions made by DL models in cancer DRP.
Affiliation(s)
- Wesam Ibrahim Hajim
  - Department of Applied Geology, College of Sciences, Tikrit University, Tikrit, Salah ad Din, Iraq
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Suhaila Zainudin
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Kauthar Mohd Daud
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Khattab Alheeti
  - Department of Computer Networking Systems, College of Computer Sciences and Information Technology, University of Anbar, Al Anbar, Ramadi, Iraq
|
30
|
Cai Y, Cai YQ, Tang LY, Wang YH, Gong M, Jing TC, Li HJ, Li-Ling J, Hu W, Yin Z, Gong DX, Zhang GW. Artificial intelligence in the risk prediction models of cardiovascular disease and development of an independent validation screening tool: a systematic review. BMC Med 2024; 22:56. [PMID: 38317226 PMCID: PMC10845808 DOI: 10.1186/s12916-024-03273-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Accepted: 01/23/2024] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND: A comprehensive overview of artificial intelligence (AI) for cardiovascular disease (CVD) prediction and a screening tool of AI models (AI-Ms) for independent external validation are lacking. This systematic review aims to identify, describe, and appraise AI-Ms of CVD prediction in the general and special populations and develop a new independent validation score (IVS) for AI-Ms replicability evaluation.
METHODS: PubMed, Web of Science, Embase, and IEEE library were searched up to July 2021. Data extraction and analysis were performed for the populations, distribution, predictors, algorithms, etc. The risk of bias was evaluated with the prediction risk of bias assessment tool (PROBAST). Subsequently, we designed IVS for model replicability evaluation with five steps in five items, including transparency of algorithms, performance of models, feasibility of reproduction, risk of reproduction, and clinical implication, respectively. The review is registered in PROSPERO (No. CRD42021271789).
RESULTS: In 20,887 screened references, 79 articles (82.5% in 2017-2021) were included, which contained 114 datasets (67 in Europe and North America, but 0 in Africa). We identified 486 AI-Ms, of which the majority were in development (n = 380), but none of them had undergone independent external validation. A total of 66 idiographic algorithms were found; however, 36.4% were used only once and only 39.4% over three times. A large number of different predictors (range 5-52,000, median 21) and large-span sample size (range 80-3,660,000, median 4466) were observed. All models were at high risk of bias according to PROBAST, primarily due to the incorrect use of statistical methods. IVS analysis confirmed only 10 models as "recommended"; however, 281 and 187 were "not recommended" and "warning," respectively.
CONCLUSION: AI has led the digital revolution in the field of CVD prediction, but is still in the early stage of development owing to defects in research design, reporting, and evaluation systems. The IVS we developed may contribute to independent external validation and the development of this field.
Affiliation(s)
- Yue Cai
  - China Medical University, Shenyang, 110122, China
- Yu-Qing Cai
  - China Medical University, Shenyang, 110122, China
- Li-Ying Tang
  - China Medical University, Shenyang, 110122, China
- Yi-Han Wang
  - China Medical University, Shenyang, 110122, China
- Mengchun Gong
  - Digital Health China Co. Ltd, Beijing, 100089, China
- Tian-Ci Jing
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
- Hui-Jun Li
  - Shenyang Medical & Film Science and Technology Co. Ltd., Shenyang, 110001, China
  - Enduring Medicine Smart Innovation Research Institute, Shenyang, 110001, China
- Jesse Li-Ling
  - Institute of Genetic Medicine, School of Life Science, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610065, China
- Wei Hu
  - Bayi Orthopedic Hospital, Chengdu, 610017, China
- Zhihua Yin
  - Department of Epidemiology, School of Public Health, China Medical University, Shenyang, 110122, China.
- Da-Xin Gong
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China.
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China.
- Guang-Wei Zhang
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China.
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China.
|
31
|
Liu H, Peng W, Dai W, Lin J, Fu X, Liu L, Liu L, Yu N. Improving anti-cancer drug response prediction using multi-task learning on graph convolutional networks. Methods 2024; 222:41-50. [PMID: 38157919 DOI: 10.1016/j.ymeth.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/19/2023] [Accepted: 11/19/2023] [Indexed: 01/03/2024] Open
Abstract
Predicting the therapeutic effect of anti-cancer drugs on tumors based on the characteristics of tumors and patients is an important part of precision oncology. Existing computational methods regard the drug response prediction problem as a classification or regression task. However, few of them consider leveraging the relationship between the two tasks. In this work, we propose a Multi-task Interaction Graph Convolutional Network (MTIGCN) for anti-cancer drug response prediction. MTIGCN first utilizes a graph convolutional network-based model to produce embeddings for both cell lines and drugs. After that, the model employs multi-task learning to predict anti-cancer drug response, which involves training the model on three different tasks simultaneously: the main task of classifying drug response as sensitive or resistant, and the two auxiliary tasks of regression prediction and similarity network reconstruction. By sharing parameters and optimizing the losses of different tasks simultaneously, MTIGCN enhances the feature representation and reduces overfitting. The results of the experiments on two in vitro datasets demonstrated that MTIGCN outperformed seven state-of-the-art baseline methods. Moreover, the well-trained model on the in vitro dataset GDSC exhibited good performance when applied to predict drug responses in in vivo datasets PDX and TCGA. The case study confirmed the model's ability to discover unknown drug responses in cell lines.
Affiliation(s)
- Hancheng Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
- Jiangzhen Lin
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Ning Yu
  - State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport NY 14422.
|
32
|
Du W, Ma F, Zhang B, Zhang J, Wu D, Sharman E, Jiang J, Wang Y. Spectroscopy-Guided Deep Learning Predicts Solid-Liquid Surface Adsorbate Properties in Unseen Solvents. J Am Chem Soc 2024; 146:811-823. [PMID: 38157302 PMCID: PMC10785802 DOI: 10.1021/jacs.3c10921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 01/03/2024]
Abstract
Accurately and rapidly acquiring the microscopic properties of a material is crucial for catalysis and electrochemistry. Characterization techniques such as spectroscopy can be valuable for inferring these properties, and when combined with machine learning tools, they can theoretically achieve fast and accurate prediction results. However, on the path to practical applications, training a reliable machine learning model is faced with the challenge of uneven data distribution in a vast array of non-negligible solvent types. Herein, we employ a combination of the first-principles-based approach and data-driven model. Specifically, we utilize density functional theory (DFT) to calculate theoretical spectral data of CO-Ag adsorption in 23 different solvent systems as a data source. Subsequently, we propose a hierarchical knowledge extraction multiexpert neural network (HMNN) to bridge the knowledge gaps among different solvent systems. HMNN undergoes two training tiers: in tier I, it learns fundamental quantitative spectra-property relationships (QSPRs), and in tier II, it inherits the fundamental QSPR knowledge from previous steps through a dynamic integration of expert modules and subsequently captures the solvent differences. The results demonstrate HMNN's superiority in estimating a range of molecular adsorbate properties, with an error range of less than 0.008 eV for zero-shot predictions on unseen solvents. The findings underscore the usability, reliability, and convenience of HMNN and could pave the way for real-time access to microscopic properties by exploiting QSPR.
Affiliation(s)
- Wenjie Du
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui 230026, China
- Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
| | - Fenfen Ma
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Gusu Laboratory of Materials, Suzhou, Jiangsu 215123, China
| | - Baicheng Zhang
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
| | - Jiahui Zhang
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui 230026, China
- Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
| | - Di Wu
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui 230026, China
- Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
| | - Edward Sharman
- Department of Neurology, University of California, Irvine, California 92697, United States
| | - Jun Jiang
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
| | - Yang Wang
- Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui 230026, China
- Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
| |
|
33
|
Liu Z, Chen R, Yang L, Jiang J, Ma S, Chen L, He M, Mao Y, Guo C, Kong X, Zhang X, Qi Y, Liu F, He F, Li D. CDS-DB, an omnibus for patient-derived gene expression signatures induced by cancer treatment. Nucleic Acids Res 2024; 52:D1163-D1179. [PMID: 37889038 PMCID: PMC10767794 DOI: 10.1093/nar/gkad888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/25/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023] Open
Abstract
Patient-derived gene expression signatures induced by cancer treatment, obtained from paired pre- and post-treatment clinical transcriptomes, can help reveal drug mechanisms of action (MOAs) in cancer patients and understand the molecular response mechanism of tumor sensitivity or resistance. Their integration and reuse may bring new insights. Paired pre- and post-treatment clinical transcriptomic data are rapidly accumulating. However, a lack of systematic collection makes data access, integration, and reuse challenging. We therefore present the Cancer Drug-induced gene expression Signature DataBase (CDS-DB). CDS-DB has collected 78 patient-derived, paired pre- and post-treatment transcriptomic source datasets with uniformly reprocessed expression profiles and manually curated metadata such as drug administration dosage, sampling time and location, and intrinsic drug response status. From these source datasets, 2012 patient-level gene perturbation signatures were obtained, covering 85 therapeutic regimens, 39 cancer subtypes and 3628 patient samples. Besides data browsing, download and search, CDS-DB also supports single signature analysis (including differential gene expression, functional enrichment, tumor microenvironment and correlation analyses), signature comparative analysis and signature connectivity analysis. This provides insights into drug MOA and its heterogeneity in patients, drug resistance mechanisms, drug repositioning and drug (combination) discovery, etc. CDS-DB is available at http://cdsdb.ncpsb.org.cn/.
Affiliation(s)
- Zhongyang Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- College of Chemistry and Materials Science, Key Laboratory of Medicinal Chemistry and Molecular Diagnosis (Hebei University), Hebei University, Baoding 071002, China
- College of Life Sciences, Hebei University, Baoding 071002, China
| | - Ruzhen Chen
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Lele Yang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- College of Chemistry and Materials Science, Key Laboratory of Medicinal Chemistry and Molecular Diagnosis (Hebei University), Hebei University, Baoding 071002, China
| | - Jianzhou Jiang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- College of Life Sciences, Hebei University, Baoding 071002, China
| | - Shurui Ma
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- School of Basic Medicine, Anhui Medical University, Hefei 230032, China
| | - Lanhui Chen
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Mengqi He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Yichao Mao
- College of Life Sciences, Hebei University, Baoding 071002, China
| | - Congcong Guo
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Xiangya Kong
- Beijing Cloudna Technology Company, Limited, Beijing 100029, China
| | - Xinlei Zhang
- Beijing Cloudna Technology Company, Limited, Beijing 100029, China
| | - Yaning Qi
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- College of Chemistry and Materials Science, Key Laboratory of Medicinal Chemistry and Molecular Diagnosis (Hebei University), Hebei University, Baoding 071002, China
| | - Fengsong Liu
- College of Life Sciences, Hebei University, Baoding 071002, China
| | - Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Dong Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| |
|
34
|
Wang Y, Yu X, Gu Y, Li W, Zhu K, Chen L, Tang Y, Liu G. XGraphCDS: An explainable deep learning model for predicting drug sensitivity from gene pathways and chemical structures. Comput Biol Med 2024; 168:107746. [PMID: 38039896 DOI: 10.1016/j.compbiomed.2023.107746] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/12/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
Cancer is a highly complex disease characterized by genetic and phenotypic heterogeneity among individuals. In the era of precision medicine, understanding the genetic basis of these individual differences is crucial for developing new drugs and achieving personalized treatment. Despite the increasing abundance of cancer genomics data, predicting the relationship between cancer samples and drug sensitivity remains challenging. In this study, we developed an explainable graph neural network framework for predicting cancer drug sensitivity (XGraphCDS) based on comparative learning by integrating cancer gene expression information and drug chemical structure knowledge. Specifically, XGraphCDS consists of a unified heterogeneous network and multiple sub-networks, with molecular graphs representing drugs and gene enrichment scores representing cell lines. Experimental results showed that XGraphCDS consistently outperformed most state-of-the-art baselines (R² = 0.863, AUC = 0.858). We also constructed a separate in vivo prediction model by using transfer learning strategies with in vitro experimental data and achieved good predictive power (AUC = 0.808). Simultaneously, our framework is interpretable, providing insights into resistance mechanisms alongside accurate predictions. The excellent performance of XGraphCDS highlights its immense potential in aiding the development of selective anti-tumor drugs and personalized dosing strategies in the field of precision medicine.
Affiliation(s)
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Yaxin Gu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Keyun Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Long Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| | - Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
| |
|
35
|
Xu X, Xiao Z, Zhang F, Wang C, Wei B, Wang Y, Cheng B, Jia Y, Li Y, Li B, Guo H, Xu F. CellVisioner: A Generalizable Cell Virtual Staining Toolbox based on Few-Shot Transfer Learning for Mechanobiological Analysis. Research (Wash D C) 2023; 6:0285. [PMID: 38434246 PMCID: PMC10907024 DOI: 10.34133/research.0285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 11/16/2023] [Indexed: 03/05/2024]
Abstract
Visualizing cellular structures, especially the cytoskeleton and the nucleus, is crucial for understanding mechanobiology, but traditional fluorescence staining has inherent limitations such as phototoxicity and photobleaching. Virtual staining techniques provide an alternative approach to addressing these issues but often require a substantial amount of user training data. In this study, we develop a generalizable cell virtual staining toolbox (termed CellVisioner) based on few-shot transfer learning that requires substantially less user training data. CellVisioner can virtually stain F-actin and nuclei for various types of cells and extract single-cell parameters relevant to mechanobiology research. Taking label-free single-cell images as input, CellVisioner can predict cell mechanobiological status (e.g., Yes-associated protein nuclear/cytoplasmic ratio) and perform long-term monitoring of living cells. We envision that CellVisioner would be a powerful tool to facilitate on-site mechanobiological research.
Affiliation(s)
- Xiayu Xu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Zhanfeng Xiao
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Fan Zhang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Changxiang Wang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Bo Wei
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Yaohui Wang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Bo Cheng
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Yuanbo Jia
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Yuan Li
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Bin Li
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| | - Hui Guo
- Department of Medical Oncology, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, P.R. China
| | - Feng Xu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Xi’an Jiaotong University, Xi’an 710049, P.R. China
- Bioinspired Engineering and Biomechanics Center (BEBC), Xi’an Jiaotong University, Xi’an 710049, P.R. China
| |
|
36
|
Ren Q, Qu N, Sun J, Zhou J, Liu J, Ni L, Tong X, Zhang Z, Kong X, Wen Y, Wang Y, Wang D, Luo X, Zhang S, Zheng M, Li X. KinomeMETA: meta-learning enhanced kinome-wide polypharmacology profiling. Brief Bioinform 2023; 25:bbad461. [PMID: 38113075 PMCID: PMC10729787 DOI: 10.1093/bib/bbad461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 11/08/2023] [Accepted: 11/22/2023] [Indexed: 12/21/2023] Open
Abstract
Kinase inhibitors are crucial in cancer treatment, but drug resistance and side effects hinder the development of effective drugs. To address these challenges, it is essential to analyze the polypharmacology of kinase inhibitors and identify compounds with high selectivity profiles. This study presents KinomeMETA, a framework for profiling the activity of small-molecule kinase inhibitors across a panel of 661 kinases. By training a meta-learner based on a graph neural network and fine-tuning it to create kinase-specific learners, KinomeMETA outperforms benchmark multi-task models and other kinase profiling models. It provides higher accuracy for understudied kinases with limited known data and broader coverage of kinase types, including important mutant kinases. Case studies on the discovery of new scaffold inhibitors for membrane-associated tyrosine- and threonine-specific cdc2-inhibitory kinase and selective inhibitors for fibroblast growth factor receptors demonstrate the role of KinomeMETA in virtual screening and kinome-wide activity profiling. Overall, KinomeMETA has the potential to accelerate kinase drug discovery by more effectively exploring the kinase polypharmacology landscape.
Affiliation(s)
- Qun Ren
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Jingjing Sun
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Jingyi Zhou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lingang Laboratory, Shanghai 200031, China
| | - Jin Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Lin Ni
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Zimei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Yiming Wen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Dingyan Wang
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, Hangzhou 330106, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| |
|
37
|
Hassan AZ, Ward HN, Rahman M, Billmann M, Lee Y, Myers CL. Dimensionality reduction methods for extracting functional networks from large-scale CRISPR screens. Mol Syst Biol 2023; 19:e11657. [PMID: 37750448 PMCID: PMC10632734 DOI: 10.15252/msb.202311657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 08/28/2023] [Accepted: 09/05/2023] [Indexed: 09/27/2023] Open
Abstract
CRISPR-Cas9 screens facilitate the discovery of gene functional relationships and phenotype-specific dependencies. The Cancer Dependency Map (DepMap) is the largest compendium of whole-genome CRISPR screens aimed at identifying cancer-specific genetic dependencies across human cell lines. A mitochondria-associated bias has been previously reported to mask signals for genes involved in other functions, and thus, methods for normalizing this dominant signal to improve co-essentiality networks are of interest. In this study, we explore three unsupervised dimensionality reduction methods (autoencoders, robust principal component analysis (PCA), and classical PCA) for normalizing the DepMap to improve functional networks extracted from these data. We propose a novel "onion" normalization technique to combine several normalized data layers into a single network. Benchmarking analyses reveal that robust PCA combined with onion normalization outperforms existing methods for normalizing the DepMap. Our work demonstrates the value of removing low-dimensional signals from the DepMap before constructing functional gene networks and provides generalizable dimensionality reduction-based normalization tools.
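The idea of removing a dominant low-dimensional signal before building a gene network can be illustrated with a minimal sketch. It uses classical PCA only (not the robust PCA or onion normalization that the abstract reports as best), runs on synthetic data, and is not the authors' implementation; the function name `remove_top_components`, the genes-by-cell-lines layout, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def remove_top_components(scores: np.ndarray, n_components: int) -> np.ndarray:
    """Subtract the leading principal components from a genes x cell-lines matrix.

    Hypothetical helper: classical PCA via SVD on the column-centered matrix.
    """
    # Center each column so the decomposition captures variance, not the mean.
    centered = scores - scores.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Low-rank reconstruction from the first n_components components.
    low_rank = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # The residual is the "normalized" data with the dominant signal removed.
    return centered - low_rank


# Synthetic stand-in for dependency scores: one strong shared signal plus noise.
rng = np.random.default_rng(0)
dominant = np.outer(rng.normal(size=200), rng.normal(size=30))
noise = 0.1 * rng.normal(size=(200, 30))
raw = 5.0 * dominant + noise

normalized = remove_top_components(raw, n_components=1)
# Gene-by-gene correlation of the residual profiles as a simple similarity network.
gene_network = np.corrcoef(normalized)
```

In this sketch the number of removed components is a free choice; how many to remove, and how to combine several such layers, is what the paper's benchmarking addresses.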
Affiliation(s)
- Arshia Zernab Hassan
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
| | - Henry N Ward
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota – Twin Cities, Minneapolis, MN, USA
| | - Mahfuzur Rahman
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
| | - Maximilian Billmann
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn, Germany
| | - Yoonkyu Lee
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota – Twin Cities, Minneapolis, MN, USA
| | - Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota – Twin Cities, Minneapolis, MN, USA
| |
|
38
|
Bazgir O, Lu J. REFINED-CNN framework for survival prediction with high-dimensional features. iScience 2023; 26:107627. [PMID: 37664631 PMCID: PMC10474067 DOI: 10.1016/j.isci.2023.107627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 06/21/2023] [Accepted: 08/10/2023] [Indexed: 09/05/2023] Open
Abstract
Robust and accurate survival prediction of clinical trials using high-throughput genomics data is a fundamental challenge in pharmacogenomics. Current machine learning tools often provide limited predictive performance and model interpretation in these settings. In the present study, we extend the application of REFINED-CNN from regression tasks to survival prediction by mapping high-dimensional RNA sequencing data into REFINED images, which are conducive to CNN modeling. We show that the REFINED-CNN survival model can be easily adapted to new tasks of a similar nature (e.g., predicting on new cancer types) using transfer learning with a low number of patients. Furthermore, the model can be interpreted both locally and globally through risk score back propagation, which quantifies the importance of each feature (e.g., gene) in the survival prediction task for the patient or cancer type of interest.
Affiliation(s)
- Omid Bazgir
- Modeling & Simulation/Clinical Pharmacology, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
| | - James Lu
- Modeling & Simulation/Clinical Pharmacology, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
| |
|
39
|
Cheng KP, Shen WX, Jiang YY, Chen Y, Chen YZ, Tan Y. Deep learning of 2D-Restructured gene expression representations for improved low-sample therapeutic response prediction. Comput Biol Med 2023; 164:107245. [PMID: 37480677 DOI: 10.1016/j.compbiomed.2023.107245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
Clinical outcome prediction is important for stratified therapeutics. Machine learning (ML) and deep learning (DL) methods facilitate therapeutic response prediction from transcriptomic profiles of cells and clinical samples. Clinical transcriptomic DL is challenged by the low sample sizes (34-286 subjects), high dimensionality (up to 21,653 genes) and unordered nature of clinical transcriptomic data. The established methods rely on ML algorithms at accuracy levels of 0.6-0.8 AUC/ACC values. Low-sample DL algorithms are needed for enhanced prediction capability. Here, an unsupervised manifold-guided algorithm was employed for restructuring transcriptomic data into ordered image-like 2D-representations, followed by efficient DL of these 2D-representations with deep ConvNets. Our DL models significantly outperformed the state-of-the-art (SOTA) ML models on 82% of 17 low-sample benchmark datasets (53% with >0.05 AUC/ACC improvement). They are more robust than the SOTA models in cross-cohort prediction tasks, and in identifying robust biomarkers and response-dependent variational patterns consistent with experimental indications.
Affiliation(s)
- Kai Ping Cheng
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
| | - Wan Xiang Shen
- Bioinformatics and Drug Design Group, Department of Pharmacy, Center for Computational Science and Engineering, National University of Singapore, 117543, Singapore
| | - Yu Yang Jiang
- School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, PR China
| | - Yan Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
| | - Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
| | - Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; The Institute of Drug Discovery Technology, Ningbo University, Ningbo, 315211, PR China; Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen, 518110, PR China
| |
|
40
|
Zhang QQ, Zhang SW, Feng YH, Shi JY. Few-Shot Drug Synergy Prediction With a Prior-Guided Hypernetwork Architecture. IEEE Trans Pattern Anal Mach Intell 2023; 45:9709-9725. [PMID: 37027608 DOI: 10.1109/tpami.2023.3248041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
Predicting drug synergy is critical to tailoring feasible drug combination treatment regimens for cancer patients. However, most existing computational methods focus only on data-rich cell lines and hardly work on data-poor cell lines. To this end, we propose a novel few-shot drug synergy prediction method (called HyperSynergy) for data-poor cell lines by designing a prior-guided Hypernetwork architecture, in which a meta-generative network, based on the task embedding of each cell line, generates cell-line-dependent parameters for the drug synergy prediction network. In the HyperSynergy model, we designed a deep Bayesian variational inference model to infer the prior distribution over the task embedding, so that the task embedding can be updated quickly with a few labeled drug synergy samples, and presented a three-stage learning strategy to train HyperSynergy for quickly updating the prior distribution with a few labeled drug synergy samples of each data-poor cell line. Moreover, we proved theoretically that HyperSynergy aims to maximize the lower bound of the log-likelihood of the marginal distribution over each data-poor cell line. The experimental results show that HyperSynergy outperforms other state-of-the-art methods not only on data-poor cell lines with a few samples (e.g., 10, 5, 0), but also on data-rich cell lines.
|
41
|
Ge Y, Guo Y, Das S, Al-Garadi MA, Sarker A. Few-shot learning for medical text: A review of advances, trends, and opportunities. J Biomed Inform 2023; 144:104458. [PMID: 37488023 PMCID: PMC10940971 DOI: 10.1016/j.jbi.2023.104458] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 06/19/2023] [Accepted: 07/19/2023] [Indexed: 07/26/2023]
Abstract
BACKGROUND Few-shot learning (FSL) is a class of machine learning methods that require small numbers of labeled instances for training. With many medical topics having limited annotated text-based data in practical settings, FSL-based natural language processing (NLP) holds substantial promise. We aimed to conduct a review to explore the current state of FSL methods for medical NLP. METHODS We searched for articles published between January 2016 and October 2022 using PubMed/Medline, Embase, ACL Anthology, and IEEE Xplore Digital Library. We also searched the preprint servers (e.g., arXiv, medRxiv, and bioRxiv) via Google Scholar to identify the latest relevant methods. We included all articles that involved FSL and any form of medical text. We abstracted articles based on the data source, target task, training set size, primary method(s)/approach(es), and evaluation metric(s). RESULTS Fifty-one articles met our inclusion criteria, all published after 2018, and most since 2020 (42/51; 82%). Concept extraction/named entity recognition was the most frequently addressed task (21/51; 41%), followed by text classification (16/51; 31%). Thirty-two (61%) articles reconstructed existing datasets to fit few-shot scenarios, and MIMIC-III was the most frequently used dataset (10/51; 20%). 77% of the articles attempted to incorporate prior knowledge to augment the small datasets available for training. Common methods included FSL with attention mechanisms (20/51; 39%), prototypical networks (11/51; 22%), meta-learning (7/51; 14%), and prompt-based learning methods, the latter being particularly popular since 2021. Benchmarking experiments demonstrated relative underperformance of FSL methods on biomedical NLP tasks. CONCLUSION Despite the potential for FSL in biomedical NLP, progress has been limited. This may be attributed to the rarity of specialized data, lack of standardized evaluation criteria, and the underperformance of FSL methods on biomedical topics. The creation of publicly-available specialized datasets for biomedical FSL may aid method development by facilitating comparative analyses.
Affiliation(s)
- Yao Ge
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America
| | - Yuting Guo
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America
| | - Sudeshna Das
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America
| | - Mohammed Ali Al-Garadi
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, TN, United States of America
| | - Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America; Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States of America.
| |
|
42
|
Yuan S, Chen YC, Tsai CH, Chen HW, Shieh GS. Feature selection translates drug response predictors from cell lines to patients. Front Genet 2023; 14:1217414. [PMID: 37519889 PMCID: PMC10382684 DOI: 10.3389/fgene.2023.1217414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 06/26/2023] [Indexed: 08/01/2023] Open
Abstract
Targeted therapies and chemotherapies are prevalent in cancer treatment. Identification of predictive markers to stratify cancer patients who will respond to these therapies remains challenging because patient drug response data are limited. As large amounts of drug response data have been generated by cell lines, methods to efficiently translate cell-line-trained predictors to human tumors will be useful in clinical practice. Here, we propose versatile feature selection procedures that can be combined with any classifier. For demonstration, we combined the feature selection procedures with a (linear) logit model and a (non-linear) K-nearest neighbor and trained these on cell lines to result in LogitDA and KNNDA, respectively. We show that LogitDA/KNNDA significantly outperforms existing methods, e.g., a logistic model and a deep learning method trained by thousands of genes, in prediction AUC (0.70-1.00 for seven of the ten drugs tested) and is interpretable. This may be due to the fact that sample sizes are often limited in the area of drug response prediction. We further derive a novel adjustment on the prediction cutoff for LogitDA to yield a prediction accuracy of 0.70-0.93 for seven drugs, including erlotinib and cetuximab, whose pathways relevant to anti-cancer therapies are also uncovered. These results indicate that our methods can efficiently translate cell-line-trained predictors into tumors.
Affiliation(s)
- Shinsheng Yuan
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
| | - Yen-Chou Chen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Chi-Hsuan Tsai
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Huei-Wen Chen
- College of Medicine, Graduate Institute of Toxicology, National Taiwan University, Taipei, Taiwan
| | - Grace S. Shieh
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
- Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
- Data Science Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
| |
|
43
|
Zhou K, Wang W, Tang J. Editorial: Functional screening for cancer drug discovery: from experimental approaches to data integration. Front Genet 2023; 14:1201454. [PMID: 37485338 PMCID: PMC10359426 DOI: 10.3389/fgene.2023.1201454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Accepted: 06/30/2023] [Indexed: 07/25/2023] Open
Affiliation(s)
- Kecheng Zhou
- School of Life Sciences, Anhui Medical University, Hefei, China
| | - Wenyu Wang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Jing Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| |
|
44
|
Metzger P, Hess ME, Blaumeiser A, Pauli T, Schipperges V, Mertes R, Christoph J, Unberath P, Reimer N, Scheible R, Illert AL, Busch H, Andrieux G, Boerries M. MIRACUM-Pipe: An Adaptable Pipeline for Next-Generation Sequencing Analysis, Reporting, and Visualization for Clinical Decision Making. Cancers (Basel) 2023; 15:3456. [PMID: 37444566 DOI: 10.3390/cancers15133456] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Revised: 06/09/2023] [Accepted: 06/29/2023] [Indexed: 07/15/2023] Open
Abstract
(1) Background: Next-generation sequencing (NGS) of patients with advanced tumors is becoming an established method in Molecular Tumor Boards. However, somatic variant detection, interpretation, and report generation require in-depth knowledge of both bioinformatics and oncology. (2) Methods: MIRACUM-Pipe combines many individual tools into a seamless workflow for comprehensive analyses and annotation of NGS data including quality control, alignment, variant calling, copy number variation estimation, evaluation of complex biomarkers, and RNA fusion detection. (3) Results: MIRACUM-Pipe offers an easy-to-use, one-prompt standardized solution to analyze NGS data, including quality control, variant calling, copy number estimation, annotation, visualization, and report generation. (4) Conclusions: MIRACUM-Pipe, a versatile pipeline for NGS, can be customized according to bioinformatics and clinical needs to support clinical decision-making with visual processing and interactive reporting.
Affiliation(s)
- Patrick Metzger
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Maria Elena Hess
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Andreas Blaumeiser
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Partner Site Freiburg, 79110 Freiburg, Germany
| | - Thomas Pauli
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Vincent Schipperges
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Ralf Mertes
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Jan Christoph
- Junior Research Group (Bio-)Medical Data Science, Faculty of Medicine, Martin-Luther-University Halle-Wittenberg, 06122 Halle, Germany
- Medical Informatics, Friedrich-Alexander University Erlangen-Nuremberg, 91058 Erlangen, Germany
| | - Philipp Unberath
- Medical Informatics, Friedrich-Alexander University Erlangen-Nuremberg, 91058 Erlangen, Germany
| | - Niklas Reimer
- Medical Systems Biology Group, Lübeck Institute Für Experimental Dermatology, University of Lübeck, Ratzeburger Alle 160, 23538 Lübeck, Germany
| | - Raphael Scheible
- Institute for AI and Informatics in Medicine, University Hospital Rechts der Isar, Technical University Munich, 81675 Munich, Germany
- Institute for Immunodeficiency, Center for Chronic Immunodeficiency, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
| | - Anna L Illert
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Partner Site Freiburg, 79110 Freiburg, Germany
- Department of Medicine III, Klinikum Rechts der Isar, Faculty of Medicine, Technical University of Munich, 81675 Munich, Germany
- Department of Medicine I, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- TranslaTUM, Center for Translational Cancer Research, Technical University of Munich, 81675 Munich, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Partner Site Munich, 81675 Munich, Germany
- Center for Personalized Medicine, Klinikum Rechts der Isar, Faculty of Medicine, Technical University of Munich, 81675 Munich, Germany
| | - Hauke Busch
- Medical Systems Biology Group, Lübeck Institute Für Experimental Dermatology, University of Lübeck, Ratzeburger Alle 160, 23538 Lübeck, Germany
| | - Geoffroy Andrieux
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Melanie Boerries
- Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Partner Site Freiburg, 79110 Freiburg, Germany
| |
|
45
|
He W, Demas DM, Shajahan-Haq AN, Baumann WT. Modeling breast cancer proliferation, drug synergies, and alternating therapies. iScience 2023; 26:106714. [PMID: 37234088 PMCID: PMC10206440 DOI: 10.1016/j.isci.2023.106714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Revised: 02/12/2023] [Accepted: 04/18/2023] [Indexed: 05/27/2023] Open
Abstract
Estrogen receptor positive (ER+) breast cancer is responsive to a number of targeted therapies used clinically. Unfortunately, the continuous application of targeted therapy often results in resistance, driving the consideration of combination and alternating therapies. Toward this end, we developed a mathematical model that can simulate various mono, combination, and alternating therapies for ER+ breast cancer cells at different doses over long time scales. The model is used to look for optimal drug combinations and predicts a significant synergism between Cdk4/6 inhibitors in combination with the anti-estrogen fulvestrant, which may help explain the clinical success of adding Cdk4/6 inhibitors to anti-estrogen therapy. Furthermore, the model is used to optimize an alternating treatment protocol so it works as well as monotherapy while using less total drug dose.
Affiliation(s)
- Wei He
- Program in Genetics, Bioinformatics, and Computational Biology, VT BIOTRANS, Virginia Tech, Blacksburg, VA 24061, USA
| | - Diane M. Demas
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
| | - Ayesha N. Shajahan-Haq
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
| | - William T. Baumann
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA
| |
|
46
|
Liu Y, Wu W, Cai C, Zhang H, Shen H, Han Y. Patient-derived xenograft models in cancer therapy: technologies and applications. Signal Transduct Target Ther 2023; 8:160. [PMID: 37045827 PMCID: PMC10097874 DOI: 10.1038/s41392-023-01419-2] [Citation(s) in RCA: 140] [Impact Index Per Article: 70.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2022] [Accepted: 03/21/2023] [Indexed: 04/14/2023] Open
Abstract
Patient-derived xenograft (PDX) models, in which tumor tissues from patients are implanted into immunocompromised or humanized mice, have shown superiority in recapitulating the characteristics of cancer, such as the spatial structure of cancer and the intratumor heterogeneity of cancer. Moreover, PDX models retain the genomic features of patients across different stages, subtypes, and diversified treatment backgrounds. Optimized PDX engraftment procedures and modern technologies such as multi-omics and deep learning have enabled a more comprehensive depiction of the PDX molecular landscape and boosted the utilization of PDX models. These irreplaceable advantages make PDX models an ideal choice in cancer treatment studies, such as preclinical trials of novel drugs, validating novel drug combinations, screening drug-sensitive patients, and exploring drug resistance mechanisms. In this review, we gave an overview of the history of PDX models and the process of PDX model establishment. Subsequently, the review presents the strengths and weaknesses of PDX models and highlights the integration of novel technologies in PDX model research. Finally, we delineated the broad application of PDX models in chemotherapy, targeted therapy, immunotherapy, and other novel therapies.
Affiliation(s)
- Yihan Liu
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, P.R. China
| | - Wantao Wu
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, P.R. China
| | - Changjing Cai
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, P.R. China
| | - Hao Zhang
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, China
| | - Hong Shen
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, P.R. China.
| | - Ying Han
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, P.R. China.
| |
|
47
|
Zhang H, Wang Z, Nan Y, Zagidullin B, Yi D, Tang J, Guan Y. Harmonizing across datasets to improve the transferability of drug combination prediction. Commun Biol 2023; 6:397. [PMID: 37041243 PMCID: PMC10090076 DOI: 10.1038/s42003-023-04783-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 03/30/2023] [Indexed: 04/13/2023] Open
Abstract
Combination treatment has multiple advantages over traditional monotherapy in clinics, thus becoming a target of interest for many high-throughput screening (HTS) studies, which enables the development of machine learning models predicting the response of new drug combinations. However, most existing models have been tested only within a single study, and these models cannot generalize across different datasets due to significantly variable experimental settings. Here, we thoroughly assessed the transferability issue of single-study-derived models on new datasets. More importantly, we propose a method to overcome the experimental variability by harmonizing dose-response curves of different studies. Our method improves the prediction performance of machine learning models by 184% and 1367% compared to the baseline models in intra-study and inter-study predictions, respectively, and shows consistent improvement in multiple cross-validation settings. Our study addresses the crucial question of the transferability in drug combination predictions, which is fundamental for such models to be extrapolated to new drug combination discovery and clinical applications that are de facto different datasets.
Affiliation(s)
- Hanrui Zhang
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Ziyan Wang
- Department of Electrical Engineering and Computer Science (EECS) - CSE Division, University of Michigan, Ann Arbor, MI, USA
| | - Yiyang Nan
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Bulat Zagidullin
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Daiyao Yi
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Jing Tang
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA.
- Department of Internal medicine, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA.
| |
Collapse
|
48
|
Singh DP, Kaushik B. CTDN (Convolutional Temporal Based Deep- Neural Network): An Improvised Stacked Hybrid Computational Approach for Anticancer Drug Response Prediction. Comput Biol Chem 2023; 105:107868. [PMID: 37257399 DOI: 10.1016/j.compbiolchem.2023.107868] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Revised: 03/31/2023] [Accepted: 04/04/2023] [Indexed: 06/02/2023]
Abstract
The characterization of drug-metabolizing enzymes is a significant problem for customized therapy. It is important to choose the right drugs for cancer patients, and the ability to forecast how those drugs will react is usually based on the available information, genetic sequence, and structural properties. To the best of our knowledge, this is the first study to evaluate optimization algorithms for feature selection and pharmacogenetics categorization using classification methods based on a successful evolutionary algorithm, using datasets from the Cancer Cell Line Encyclopaedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC). The study proposes the use of Firefly and Grey Wolf Optimization techniques for feature extraction, while comparing traditional Machine Learning (ML), ensemble ML, and the Stacking Algorithm with the proposed Convolutional Temporal Deep Neural Network, or CTDN. With the potential to increase the efficiency of the suggested interpretable classifier model for chemotherapeutic drug response prediction, our study is important in particular for selecting an acceptable feature selection method. The comparison analysis demonstrates that the proposed model not only surpasses the prior state-of-the-art methods, but also uses Grey Wolf and Firefly Optimization to lessen multicollinearity and overfitting.
Affiliation(s)
- Davinder Paul Singh
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra 182320, Jammu and Kashmir, India.
| | - Baijnath Kaushik
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra 182320, Jammu and Kashmir, India
| |
Collapse
|
49
|
Yoo J, Kim TY, Joung I, Song SO. Industrializing AI/ML during the end-to-end drug discovery process. Curr Opin Struct Biol 2023; 79:102528. [PMID: 36736243 DOI: 10.1016/j.sbi.2023.102528] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 02/04/2023]
Abstract
Drug discovery aims to select proper targets and drug candidates to address unmet clinical needs. The end-to-end drug discovery process includes all stages of drug discovery from target identification to drug candidate selection. Recently, several artificial intelligence and machine learning (AI/ML)-based drug discovery companies have attempted to build data-driven platforms spanning the end-to-end drug discovery process. The ability to identify elusive targets essentially leads to the diversification of discovery pipelines, thereby increasing the ability to address unmet needs. Modern ML technologies are complementing traditional computer-aided drug discovery by accelerating candidate optimization in innovative ways. This review summarizes recent developments in AI/ML methods from target identification to molecule optimization, and concludes with an overview of current industrial trends in end-to-end AI/ML platforms.
Affiliation(s)
- Jiho Yoo
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
| | - Tae Yong Kim
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
| | - InSuk Joung
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
| | - Sang Ok Song
- Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118.
| |
Collapse
|
50
|
Torres L, Arrais JP, Ribeiro B. Few-shot learning via graph embeddings with convolutional networks for low-data molecular property prediction. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08403-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/12/2023]
Abstract
Graph neural networks and convolutional architectures have proven to be pivotal in improving the prediction of molecular properties in drug discovery. However, this is fundamentally a low data problem that is incompatible with regular deep learning approaches. Contemporary deep networks require large amounts of training data, which severely limits the prediction of new molecular entities from limited available data. In this paper, we address the challenge of low data in molecular property prediction by: (1) defining a set of deep learning architectures that accept compound chemical structures in the form of molecular graphs, (2) creating a few-shot learning strategy across graph neural networks and convolutional neural networks to leverage the rich information of graph embeddings, and (3) proposing a two-module meta-learning framework to learn from task-transferable knowledge and predict molecular properties on few-shot data. Furthermore, we conduct multiple experiments on two benchmark multiproperty datasets to demonstrate a superior performance over conventional graph-based baselines. ROC-AUC results for 10-shot experiments show an average improvement of +11.37% on Tox21 and +0.53% on SIDER, which are representative small-sized biological datasets for molecular property prediction.
|