1. Fu M, Lin Y, Yang J, Cheng J, Lin L, Wang G, Long C, Xu S, Lu J, Li G, Yan J, Chen G, Zhuo S, Chen D. Multitask machine learning-based tumor-associated collagen signatures predict peritoneal recurrence and disease-free survival in gastric cancer. Gastric Cancer 2024; 27:1242-1257. [PMID: 39271552; DOI: 10.1007/s10120-024-01551-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Accepted: 09/02/2024] [Indexed: 09/15/2024]
Abstract
BACKGROUND: Accurate prediction of peritoneal recurrence in gastric cancer (GC) is crucial in clinical practice. Collagen alterations in the tumor microenvironment affect the migration and treatment response of cancer cells. Herein, we proposed multitask machine learning-based tumor-associated collagen signatures (TACS), composed of quantitative collagen features derived from multiphoton imaging, to simultaneously predict peritoneal recurrence (TACSPR) and disease-free survival (TACSDFS).
METHODS: Among 713 consecutive patients (275 in the training cohort, 222 in the internal validation cohort, and 216 in the external validation cohort), we developed and validated a multitask machine learning model for simultaneously predicting peritoneal recurrence (TACSPR) and disease-free survival (TACSDFS). The accuracy of the model for predicting peritoneal recurrence and prognosis, as well as its association with adjuvant chemotherapy, was evaluated.
RESULTS: TACSPR and TACSDFS were independently associated with peritoneal recurrence and disease-free survival, respectively, in all three cohorts (all P < 0.001). TACSPR demonstrated favorable performance for predicting peritoneal recurrence in all three cohorts. In addition, TACSDFS showed satisfactory accuracy for disease-free survival among the included patients. For stage II and III disease, adjuvant chemotherapy improved the survival of patients with low TACSPR and low TACSDFS, high TACSPR and low TACSDFS, or low TACSPR and high TACSDFS, but had no impact on patients with high TACSPR and high TACSDFS.
CONCLUSIONS: The multitask machine learning model allows accurate prediction of peritoneal recurrence and survival in GC and could distinguish patients who might benefit from adjuvant chemotherapy.
Affiliation(s)
- Meiting Fu
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Department of Gastroenterology, Guangdong Provincial Key Laboratory of Gastroenterology, Nanfang Hospital, Guangzhou, 510515, People's Republic of China
- School of Science, Jimei University, Xiamen, 361021, People's Republic of China
- Yuyu Lin
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Junyao Yang
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Jiaxin Cheng
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Liyan Lin
- Department of Pathology, Fujian Key Laboratory of Translational Cancer Medicine, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, People's Republic of China
- Guangxing Wang
- School of Science, Jimei University, Xiamen, 361021, People's Republic of China
- Chenyan Long
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Shuoyu Xu
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Jianping Lu
- Department of Pathology, Fujian Key Laboratory of Translational Cancer Medicine, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, People's Republic of China
- Guoxin Li
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Jun Yan
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Gang Chen
- Department of Pathology, Fujian Key Laboratory of Translational Cancer Medicine, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, People's Republic of China
- Shuangmu Zhuo
- School of Science, Jimei University, Xiamen, 361021, People's Republic of China
- Key Laboratory of OptoElectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Fuzhou, 350007, People's Republic of China
- Dexin Chen
- Department of General Surgery, Guangdong Provincial Key Laboratory of Precision Medicine for Gastrointestinal Tumor, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, People's Republic of China.

2. Wang L, Wang H, D’Angelo F, Curtin L, Sereduk CP, Leon GD, Singleton KW, Urcuyo J, Hawkins-Daarud A, Jackson PR, Krishna C, Zimmerman RS, Patra DP, Bendok BR, Smith KA, Nakaji P, Donev K, Baxter LC, Mrugała MM, Ceccarelli M, Iavarone A, Swanson KR, Tran NL, Hu LS, Li J. Quantifying intra-tumoral genetic heterogeneity of glioblastoma toward precision medicine using MRI and a data-inclusive machine learning algorithm. PLoS One 2024; 19:e0299267. [PMID: 38568950; PMCID: PMC10990246; DOI: 10.1371/journal.pone.0299267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 02/06/2024] [Indexed: 04/05/2024] Open
Abstract
BACKGROUND AND OBJECTIVE: Glioblastoma (GBM) is one of the most aggressive and lethal human cancers. Intra-tumoral genetic heterogeneity poses a significant challenge for treatment. Biopsy is invasive, which motivates the development of non-invasive, MRI-based machine learning (ML) models to quantify intra-tumoral genetic heterogeneity for each patient. This capability holds great promise for enabling better therapeutic selection to improve patient outcomes.
METHODS: We proposed a novel Weakly Supervised Ordinal Support Vector Machine (WSO-SVM) to predict regional genetic alteration status within each GBM tumor using MRI. WSO-SVM was applied to a unique dataset of 318 image-localized biopsies with spatially matched multiparametric MRI from 74 GBM patients. The model was trained to predict the regional genetic alteration of three GBM driver genes (EGFR, PDGFRA and PTEN) based on features extracted from the corresponding region of five MRI contrast images. For comparison, a variety of existing ML algorithms were also applied. The classification accuracy for each gene was compared between the different algorithms. The SHapley Additive exPlanations (SHAP) method was further applied to compute contribution scores of the different contrast images. Finally, the trained WSO-SVM was used to generate prediction maps within the tumoral area of each patient to help visualize the intra-tumoral genetic heterogeneity.
RESULTS: WSO-SVM achieved 0.80 accuracy, 0.79 sensitivity, and 0.81 specificity for classifying EGFR; 0.71 accuracy, 0.70 sensitivity, and 0.72 specificity for classifying PDGFRA; and 0.80 accuracy, 0.78 sensitivity, and 0.83 specificity for classifying PTEN; these results significantly outperformed the existing ML algorithms. Using SHAP, we found that the relative contributions of the five contrast images differ between genes, which is consistent with findings in the literature. The prediction maps revealed extensive intra-tumoral region-to-region heterogeneity within each individual tumor in terms of the alteration status of the three genes.
CONCLUSIONS: This study demonstrated the feasibility of using MRI and WSO-SVM to enable non-invasive prediction of intra-tumoral regional genetic alteration for each GBM patient, which can inform future adaptive therapies for individualized oncology.
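The accuracy, sensitivity, and specificity values quoted in this abstract are standard summaries of a binary confusion matrix. The following is a minimal Python sketch of how such figures are computed; it is not the authors' code, the helper name `binary_metrics` is hypothetical, and the labels are made-up illustrative values rather than data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: fraction of truly altered samples that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of unaltered samples correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Illustrative labels only: 1 = gene altered in the biopsy region, 0 = not altered.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

acc, sens, spec = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting all three together matters when classes are imbalanced, because accuracy alone can look good while one of the two error types is high.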
Affiliation(s)
- Lujia Wang
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Hairong Wang
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Fulvio D’Angelo
- Institute for Cancer Genetics, Columbia University Medical Center, New York City, New York, United States of America
- Lee Curtin
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Christopher P. Sereduk
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Gustavo De Leon
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Kyle W. Singleton
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Javier Urcuyo
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Andrea Hawkins-Daarud
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Pamela R. Jackson
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Chandan Krishna
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Richard S. Zimmerman
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Devi P. Patra
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Bernard R. Bendok
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Kris A. Smith
- Department of Neurosurgery, Barrow Neurological Institute—St. Joseph’s Hospital and Medical Center, Phoenix, Arizona, United States of America
- Peter Nakaji
- Department of Neurosurgery, Barrow Neurological Institute—St. Joseph’s Hospital and Medical Center, Phoenix, Arizona, United States of America
- Kliment Donev
- Department of Pathology, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Leslie C. Baxter
- Department of Neuropsychology, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Maciej M. Mrugała
- Department of Neuro-Oncology, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Michele Ceccarelli
- Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, Naples, Italy
- Antonio Iavarone
- Institute for Cancer Genetics, Columbia University Medical Center, New York City, New York, United States of America
- Kristin R. Swanson
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Nhan L. Tran
- Department of Neurosurgery, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Department of Cancer Biology, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Leland S. Hu
- Department of Radiology, Mayo Clinic Arizona, Phoenix, Arizona, United States of America
- Jing Li
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America

3. Anguita-Ruiz A, Amine I, Stratakis N, Maitre L, Julvez J, Urquiza J, Luo C, Nieuwenhuijsen M, Thomsen C, Grazuleviciene R, Heude B, McEachan R, Vafeiadi M, Chatzi L, Wright J, Yang TC, Slama R, Siroux V, Vrijheid M, Basagaña X. Beyond the single-outcome approach: A comparison of outcome-wide analysis methods for exposome research. Environment International 2023; 182:108344. [PMID: 38016387; DOI: 10.1016/j.envint.2023.108344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/16/2023] [Accepted: 11/20/2023] [Indexed: 11/30/2023]
Abstract
Outcome-wide analysis can offer several benefits, including increased power to detect weak signals and the ability to identify exposures with multiple effects on health, which may be good targets for preventive measures. Recently, advanced multivariate statistical techniques for outcome-wide analysis have been developed, but they have rarely been applied to exposome analysis. In this work, we provide an overview of a selection of methods that are well suited for outcome-wide exposome analysis and are implemented in the R statistical software. Our work brings together six different methods presenting innovative solutions for typical problems arising from outcome-wide approaches in the context of the exposome, including dependencies among outcomes, high dimensionality, mixed-type outcomes, missing data records, and confounding effects. The identified methods can be grouped into four main categories: regularized multivariate regression techniques, multi-task learning approaches, dimensionality reduction approaches, and Bayesian extensions of the multivariate regression framework. Here, we compare the techniques, presenting the main rationale, strengths, and limitations of each, and provide code and guidelines for their application to exposome data. Additionally, we apply all selected methods to a real exposome dataset from the Human Early-Life Exposome (HELIX) project, demonstrating their suitability for exposome research. Although the choice of the best method will always depend on the challenges to be faced in each application, for an exposome-like analysis we find dimensionality reduction and Bayesian methods such as reduced rank regression (RRR) or multivariate Bayesian shrinkage priors (MBSP) particularly useful, given their ability to deal with critical issues such as collinearity, high dimensionality, missing data, or quantification of uncertainty.
Affiliation(s)
- Augusto Anguita-Ruiz
- ISGlobal, 08003 Barcelona, Spain; CIBEROBN (CIBER Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Ines Amine
- University Grenoble Alpes, Inserm U 1209, CNRS UMR 5309, Team of Environmental Epidemiology Applied to the Development and Respiratory Health, Institute for Advanced Biosciences, 38000 Grenoble, France
- Lea Maitre
- ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Jordi Julvez
- ISGlobal, 08003 Barcelona, Spain; CIBEROBN (CIBER Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III, 28029 Madrid, Spain; Epidemiology and Environmental Health Joint Research Unit, Foundation for the Promotion of Health and Biomedical Research in the Valencian Region, FISABIO-Public Health, FISABIO-Universitat Jaume I-Universitat de València, Av. Catalunya 21, 46020 Valencia, Spain; Institut d'Investigació Sanitària Pere Virgili (IISPV), Clinical and Epidemiological Neuroscience Group (NeuroÈpia), 43204 Reus (Tarragona), Catalonia, Spain
- Chongliang Luo
- Division of Public Health Sciences, Washington University School of Medicine in St. Louis, 600 S Taylor Ave, St. Louis, MO 63110, USA
- Mark Nieuwenhuijsen
- ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Cathrine Thomsen
- Department of Food Safety, Norwegian Institute of Public Health (NIPH), Oslo, Norway
- Regina Grazuleviciene
- Department of Environmental Science, Vytautas Magnus University, 44248 Kaunas, Lithuania
- Barbara Heude
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAE, Center for Research in Epidemiology and StatisticS (CRESS), F-75004 Paris, France
- Rosemary McEachan
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Marina Vafeiadi
- Department of Social Medicine, School of Medicine, University of Crete, Heraklion, Crete, Greece
- Leda Chatzi
- Department of Social Medicine, School of Medicine, University of Crete, Heraklion, Crete, Greece
- John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Tiffany C Yang
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Rémy Slama
- University Grenoble Alpes, Inserm U 1209, CNRS UMR 5309, Team of Environmental Epidemiology Applied to the Development and Respiratory Health, Institute for Advanced Biosciences, 38000 Grenoble, France
- Valérie Siroux
- University Grenoble Alpes, Inserm U 1209, CNRS UMR 5309, Team of Environmental Epidemiology Applied to the Development and Respiratory Health, Institute for Advanced Biosciences, 38000 Grenoble, France
- Martine Vrijheid
- ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Xavier Basagaña
- ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain.

4. Cao H, Hong X, Tost H, Meyer-Lindenberg A, Schwarz E. Advancing translational research in neuroscience through multi-task learning. Front Psychiatry 2022; 13:993289. [PMID: 36465289; PMCID: PMC9714033; DOI: 10.3389/fpsyt.2022.993289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 10/24/2022] [Indexed: 11/18/2022] Open
Abstract
Translational research in neuroscience is increasingly focusing on the analysis of multi-modal data in order to account for the biological complexity of suspected disease mechanisms. Recent advances in machine learning have the potential to substantially advance such translational research through the simultaneous analysis of different data modalities. This review focuses on one such approach, so-called "multi-task learning" (MTL), and describes its potential utility for multi-modal data analyses in neuroscience. We summarize the methodological development of MTL starting from conventional machine learning, and present several scenarios that appear particularly suitable for its application. For these scenarios, we highlight different types of MTL algorithms, discuss emerging technological adaptations, and provide a step-by-step guide for readers to apply the MTL approach in their own studies. With its ability to simultaneously analyze multiple data modalities, MTL may become an important element of the analytics repertoire used in future neuroscience research and beyond.
Affiliation(s)
- Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Xudong Hong
- Department of Computer Vision and Machine Learning, Max Planck Institute for Informatics, Saarbrücken, Germany
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Heike Tost
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Andreas Meyer-Lindenberg
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

5. Cao H, Zhang Y, Baumbach J, Burton PR, Dwyer D, Koutsouleris N, Matschinske J, Marcon Y, Rajan S, Rieg T, Ryser-Welch P, Späth J, Herrmann C, Schwarz E. dsMTL - a computational framework for privacy-preserving, distributed multi-task machine learning. Bioinformatics 2022; 38:4919-4926. [PMID: 36073911; DOI: 10.1093/bioinformatics/btac616] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2022] [Revised: 09/06/2022] [Accepted: 09/07/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data often cannot be combined into a single storage solution, an MTL application for geographically distributed data sources would be of substantial utility.
RESULTS: Here, we describe the development of "dsMTL", a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency.
AVAILABILITY: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
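To make the idea of separating shared from cohort-specific effects concrete, here is a minimal Python sketch of one classic MTL formulation, mean-regularized multi-task least squares. This is an illustrative stand-in and not dsMTL itself: the abstract does not name the algorithms dsMTL implements, dsMTL is an R library, and this toy version is centralized with no privacy protection. The function name `fit_mean_regularized_mtl`, the hyperparameters, and the data are all assumptions made for the example.

```python
def fit_mean_regularized_mtl(tasks, lam=1.0, lr=0.05, n_iter=2000):
    """Fit one linear model per task by gradient descent.

    Minimizes, over all task weight vectors w_t:
        sum_t mean_i (w_t . x_i - y_i)^2  +  lam * sum_t ||w_t - mean(w)||^2
    tasks: list of (X, y), X a list of feature rows, y a list of targets.
    Returns a list of weight vectors, one per task.
    """
    n_tasks = len(tasks)
    n_feat = len(tasks[0][0][0])
    W = [[0.0] * n_feat for _ in range(n_tasks)]
    for _ in range(n_iter):
        # Average weight vector across tasks (the "shared" part).
        mean_w = [sum(W[t][j] for t in range(n_tasks)) / n_tasks
                  for j in range(n_feat)]
        new_W = []
        for t, (X, y) in enumerate(tasks):
            grad = [0.0] * n_feat
            # Gradient of the task's mean squared error.
            for row, target in zip(X, y):
                err = sum(w * x for w, x in zip(W[t], row)) - target
                for j in range(n_feat):
                    grad[j] += 2.0 * err * row[j] / len(X)
            # Gradient of the penalty pulling w_t toward the task average.
            for j in range(n_feat):
                grad[j] += 2.0 * lam * (W[t][j] - mean_w[j])
            new_W.append([W[t][j] - lr * grad[j] for j in range(n_feat)])
        W = new_W
    return W


# Made-up data for two cohorts: feature 1 has the same effect (+2) in both,
# feature 2 has opposite effects (+1 in cohort A, -1 in cohort B).
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
tasks = [(X, [2, 1, 3, 5]), (X, [2, -1, 1, 3])]

free = fit_mean_regularized_mtl(tasks, lam=0.0)   # independent fits
tied = fit_mean_regularized_mtl(tasks, lam=10.0)  # weights pulled together
print("independent:", free)
print("regularized:", tied)
```

With `lam=0.0` each cohort is fitted on its own. Increasing `lam` shrinks the cohort weight vectors toward their average, so the disagreement on feature 2 narrows while the average effect of feature 1 stays at its shared value.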
Affiliation(s)
- Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Youcheng Zhang
- Health Data Science Unit, Medical Faculty Heidelberg & BioQuant, Heidelberg, 69120, Germany
- Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany; Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Paul R Burton
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Dominic Dwyer
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
- Nikolaos Koutsouleris
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
- Julian Matschinske
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Sivanesan Rajan
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Thilo Rieg
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Patricia Ryser-Welch
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Julian Späth
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Carl Herrmann
- Health Data Science Unit, Medical Faculty Heidelberg & BioQuant, Heidelberg, 69120, Germany
- Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

6. De Francesco D, Blumenfeld YJ, Marić I, Mayo JA, Chang AL, Fallahzadeh R, Phongpreecha T, Butwick AJ, Xenochristou M, Phibbs CS, Bidoki NH, Becker M, Culos A, Espinosa C, Liu Q, Sylvester KG, Gaudilliere B, Angst MS, Stevenson DK, Shaw GM, Aghaeepour N. A data-driven health index for neonatal morbidities. iScience 2022; 25:104143. [PMID: 35402862; PMCID: PMC8990172; DOI: 10.1016/j.isci.2022.104143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Revised: 01/14/2022] [Accepted: 03/20/2022] [Indexed: 11/16/2022] Open
Abstract
Whereas prematurity is a major cause of neonatal mortality, morbidity, and lifelong impairment, the degree of prematurity is usually defined by the gestational age (GA) at delivery rather than by neonatal morbidity. Here we propose a multi-task deep neural network model that simultaneously predicts twelve neonatal morbidities, as the basis for a new data-driven approach to define prematurity. Maternal demographics, medical history, obstetrical complications, and prenatal fetal findings were obtained from linked birth certificates and maternal/infant hospitalization records for 11,594,786 livebirths in California from 1991 to 2012. Overall, our model outperformed traditional models for assessing prematurity, which are based on GA and/or birthweight (area under the precision-recall curve was 0.326 for our model, 0.229 for GA, and 0.156 for small for GA). These findings highlight the potential of using machine learning techniques to predict multiple prematurity phenotypes and inform clinical decisions to prevent, diagnose and treat neonatal morbidities.
- Traditional definitions of prematurity based on gestational age need to be updated
- Deep learning of maternal clinical data improves predictions of neonatal morbidity
- Data-driven model leverages birthweight, type of delivery and maternal race
- Accurate risk prediction can inform clinical decisions
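The comparison in this abstract rests on the area under the precision-recall curve. A common step-wise estimator of that area is average precision, sketched below in Python. This is a minimal illustration and not the authors' evaluation code: the abstract does not say which estimator was used, the helper name `average_precision` is hypothetical, the labels and scores are made up, and ties between scores are not handled specially.

```python
def average_precision(y_true, scores):
    """Average precision for 0/1 labels and real-valued risk scores.

    Samples are ranked by decreasing score; precision is evaluated at the
    rank of each positive sample and averaged over all positives.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / n_pos


# Illustrative values only: 1 = morbidity occurred, scores = predicted risk.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]

ap = average_precision(y_true, scores)
print(f"average precision = {ap:.3f}")
```

Unlike the ROC curve, the precision-recall curve has a chance-level baseline equal to the prevalence of the outcome, so values should be read relative to how rare the outcome is.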
Affiliation(s)
- Davide De Francesco
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Yair J Blumenfeld
- Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ivana Marić
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA
- Jonathan A Mayo
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Alan L Chang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ramin Fallahzadeh
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Thanaphong Phongpreecha
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Alex J Butwick
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA
- Maria Xenochristou
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ciaran S Phibbs
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA; Health Economics Resource Center, VA Palo Alto Health Care System, Stanford, CA 94305, USA
- Neda H Bidoki
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Martin Becker
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Anthony Culos
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Camilo Espinosa
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Qun Liu
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Karl G Sylvester
- Department of Surgery, Stanford University School of Medicine, Stanford, CA 94305, USA
- Brice Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Martin S Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA
| | - David K Stevenson
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Gary M Shaw
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA.,Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305, USA.,Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
7
Golchi S, Fu J, Liu X, Yu E, Forghani R, Bhatnagar S. Sparse Bayesian Predictive Modelling of Tumor Response Using Radiomic Features. Stat (Int Stat Inst) 2022. [DOI: 10.1002/sta4.450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Shirin Golchi
- Department of Epidemiology and Biostatistics, McGill University, QC, Canada
- Jingyan Fu
- Department of Epidemiology and Biostatistics, McGill University, QC, Canada
- Xiaoyang Liu
- Princess Margaret Hospital, University of Toronto, ON, Canada
- Department of Radiology, Brigham and Women’s Hospital, Harvard University, MA, USA
- Department of Medical Imaging, University of Toronto, ON, Canada
- Eugene Yu
- Princess Margaret Hospital, University of Toronto, ON, Canada
- Department of Medical Imaging, University of Toronto, ON, Canada
- Reza Forghani
- Department of Diagnostic Radiology, McGill University, QC, Canada
- Sahir Bhatnagar
- Department of Epidemiology and Biostatistics, McGill University, QC, Canada
- Department of Diagnostic Radiology, McGill University, QC, Canada
8
Hossain SMM, Khatun L, Ray S, Mukhopadhyay A. Pan-cancer classification by regularized multi-task learning. Sci Rep 2021; 11:24252. [PMID: 34930937 PMCID: PMC8688544 DOI: 10.1038/s41598-021-03554-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/21/2021] [Accepted: 12/06/2021] [Indexed: 01/16/2023]
Abstract
Classifying pan-cancer samples using gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms are considered proven tools for downstream analysis and for capturing deviations in gene expression patterns across diverse diseases. In the present work, we developed PC-RMTL, a pan-cancer classification model that uses regularized multi-task learning (RMTL) to classify 21 cancer types and adjacent normal samples from RNASeq data obtained from TCGA. PC-RMTL outperforms five state-of-the-art classification algorithms, viz. SVM with the linear kernel (SVM-Lin), SVM with radial basis function kernel (SVM-RBF), random forest (RF), k-nearest neighbours (kNN), and decision trees (DT). PC-RMTL achieves 96.07% accuracy and a 95.80% MCC score on a completely unknown independent test set. The only real competitor is SVM-Lin, which nearly matches the prediction accuracy of PC-RMTL, but only when complete feature sets are provided for training; otherwise, PC-RMTL outperforms all other classification models. To the best of our knowledge, this is a significant improvement over all existing works in pan-cancer classification, which have failed to distinguish many cancer types from one another reliably. We also compared the gene expression patterns of the top discriminating genes across the cancers and performed a functional enrichment analysis of these genes, which uncovers several interesting facts in distinguishing pan-cancer samples.
Affiliation(s)
- Lutfunnesa Khatun
- Computer Science and Engineering, University of Kalyani, Kalyani, 741235, India
- Sumanta Ray
- Computer Science and Engineering, Aliah University, Kolkata, 700160, India
- Anirban Mukhopadhyay
- Computer Science and Engineering, University of Kalyani, Kalyani, 741235, India
9
Rauschenberger A, Glaab E. Predicting correlated outcomes from molecular data. Bioinformatics 2021; 37:3889-3895. [PMID: 34358294 DOI: 10.1093/bioinformatics/btab576] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/01/2021] [Revised: 07/14/2021] [Accepted: 08/05/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Multivariate (multi-target) regression has the potential to outperform univariate (single-target) regression at predicting correlated outcomes, which frequently occur in biomedical and clinical research. Here we implement multivariate lasso and ridge regression using stacked generalisation. RESULTS Our flexible approach leads to predictive and interpretable models in high-dimensional settings, with a single estimate for each input-output effect. In the simulation, we compare the predictive performance of several state-of-the-art methods for multivariate regression. In the application, we use clinical and genomic data to predict multiple motor and non-motor symptoms in Parkinson's disease patients. We conclude that stacked multivariate regression, with our adaptations, is a competitive method for predicting correlated outcomes. AVAILABILITY AND IMPLEMENTATION The R package joinet is available on GitHub (https://github.com/rauschenberger/joinet) and CRAN (https://cran.r-project.org/package=joinet). SUPPLEMENTARY INFORMATION Supplementary tables and figures are available at Bioinformatics online.
Affiliation(s)
- Armin Rauschenberger
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, 4362, Luxembourg
- Enrico Glaab
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, 4362, Luxembourg
10
Ülgen E, Sezerman OU. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics 2021; 22:263. [PMID: 34030627 PMCID: PMC8142487 DOI: 10.1186/s12859-021-04203-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/28/2020] [Accepted: 05/17/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. RESULTS Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. CONCLUSIONS This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s12859-021-04203-7.
Affiliation(s)
- Ege Ülgen
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- O Uğur Sezerman
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
11
Xie L, He S, Zhang Z, Lin K, Bo X, Yang S, Feng B, Wan K, Yang K, Yang J, Ding Y. Domain-adversarial multi-task framework for novel therapeutic property prediction of compounds. Bioinformatics 2020; 36:2848-2855. [PMID: 31999334 DOI: 10.1093/bioinformatics/btaa063] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/15/2019] [Revised: 12/23/2019] [Accepted: 01/23/2020] [Indexed: 12/11/2022]
Abstract
MOTIVATION With the rapid development of high-throughput technologies, parallel acquisition of large-scale drug-informatics data provides significant opportunities to improve pharmaceutical research and development. One important application is the purpose prediction of small-molecule compounds with the objective of specifying the therapeutic properties of extensive purpose-unknown compounds and repurposing the novel therapeutic properties of FDA-approved drugs. Such a problem is extremely challenging because compound attributes include heterogeneous data with various feature patterns, such as drug fingerprints, drug physicochemical properties and drug perturbation gene expressions. Moreover, there is a complex non-linear dependency among heterogeneous data. In this study, we propose a novel domain-adversarial multi-task framework for integrating shared knowledge from multiple domains. The framework first uses an adversarial strategy to learn target representations and then models non-linear dependency among several domains. RESULTS Experiments on two real-world datasets illustrate that our approach achieves an obvious improvement over competitive baselines. The novel therapeutic properties of purpose-unknown compounds that we predicted have been widely reported or brought to clinics. Furthermore, our framework can integrate various attributes beyond the three domains examined herein and can be applied in industry for screening significant numbers of small-molecule drug candidates. AVAILABILITY AND IMPLEMENTATION The source code and datasets are available at https://github.com/JohnnyY8/DAMT-Model. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lingwei Xie
- School of Informatics, Xiamen University, Xiamen 361005, China
- Song He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen 361005, China
- Kunhui Lin
- School of Informatics, Xiamen University, Xiamen 361005, China
- Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Shu Yang
- Department of Computer Science, UC Santa Barbara, Santa Barbara, CA 93106, USA
- Boyuan Feng
- Department of Computer Science, UC Santa Barbara, Santa Barbara, CA 93106, USA
- Kun Wan
- Department of Computer Science, UC Santa Barbara, Santa Barbara, CA 93106, USA
- Kang Yang
- School of Informatics, Xiamen University, Xiamen 361005, China
- Jie Yang
- School of Informatics, Xiamen University, Xiamen 361005, China
- Yufei Ding
- Department of Computer Science, UC Santa Barbara, Santa Barbara, CA 93106, USA