1. Gliozzo J, Soto Gomez MA, Bonometti A, Patak A, Casiraghi E, Valentini G. miss-SNF: a multimodal patient similarity network integration approach to handle completely missing data sources. Bioinformatics 2025; 41:btaf150. [PMID: 40184204 PMCID: PMC12011365 DOI: 10.1093/bioinformatics/btaf150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Revised: 03/06/2025] [Accepted: 04/02/2025] [Indexed: 04/05/2025]
Abstract
MOTIVATION: Precision medicine leverages patient-specific multimodal data to improve prevention, diagnosis, prognosis, and treatment of diseases. Advancing precision medicine requires the non-trivial integration of complex, heterogeneous, and potentially high-dimensional data sources, such as multi-omics and clinical data. In the literature, several approaches have been proposed to manage missing data, but they are usually limited to the recovery of subsets of features for a subset of patients. A largely overlooked problem is the integration of multiple sources of data when one or more of them are completely missing for a subset of patients, a relatively common condition in clinical practice. RESULTS: We propose miss-Similarity Network Fusion (miss-SNF), a novel general-purpose data integration approach designed to manage completely missing data in the context of patient similarity networks. miss-SNF integrates incomplete unimodal patient similarity networks by leveraging a non-linear message-passing strategy borrowed from the SNF algorithm. miss-SNF is able to recover missing patient similarities and is "task agnostic", in the sense that it can integrate partial data for both unsupervised and supervised prediction tasks. Experimental analyses on nine cancer datasets from The Cancer Genome Atlas (TCGA) demonstrate that miss-SNF achieves state-of-the-art results in recovering similarities and in identifying patient subgroups enriched in clinically relevant variables and having differential survival. Moreover, amputation experiments show that, with completely missing data, miss-SNF supervised prediction of cancer clinical outcomes and Alzheimer's disease diagnosis achieves results comparable to those obtained when all the data are available. AVAILABILITY AND IMPLEMENTATION: miss-SNF code, implemented in R, is available at https://github.com/AnacletoLAB/missSNF.
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
  - European Commission, Joint Research Centre (JRC), Ispra, 21027, Italy
- Mauricio A Soto Gomez
  - AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
- Arturo Bonometti
  - Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele (MI), 20072, Italy
  - Department of Pathology, IRCCS Humanitas Clinical and Research Hospital, Via Alessandro Manzoni 56, Rozzano (MI), 20089, Italy
- Alex Patak
  - European Commission, Joint Research Centre (JRC), Ispra, 21027, Italy
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, United States
  - Milan Unit, ELLIS, European Laboratory for Learning and Intelligent Systems, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
  - Milan Unit, ELLIS, European Laboratory for Learning and Intelligent Systems, Italy

2. Scala G, Ferraro L, Brandi A, Guo Y, Majello B, Ceccarelli M. MoNETA: MultiOmics Network Embedding for SubType Analysis. NAR Genom Bioinform 2024; 6:lqae141. [PMID: 39416887 PMCID: PMC11482636 DOI: 10.1093/nargab/lqae141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 07/19/2024] [Accepted: 10/04/2024] [Indexed: 10/19/2024]
Abstract
Cells are complex systems whose behavior emerges from a huge number of reactions taking place within and among different molecular districts. The availability of bulk and single-cell omics data has fueled the creation of multi-omics systems biology models capturing the dynamics within and between omics layers. Powerful modeling strategies are needed to cope with the increased amount of data to be interrogated and the related research questions. Here, we present MultiOmics Network Embedding for SubType Analysis (MoNETA) for fast and scalable identification of relevant multi-omics relationships between biological entities at the bulk and single-cell levels. We apply MoNETA to show how previously described glioma subtypes emerge naturally with our approach. We also show how MoNETA can be used to identify cell types in five multi-omic single-cell datasets.
Affiliation(s)
- Giovanni Scala
  - Department of Biology, University of Naples ‘Federico II’, 80128 Naples, Italy
- Luigi Ferraro
  - Sylvester Comprehensive Cancer Center, University of Miami, 33136, Miami, USA
- Aurora Brandi
  - Department of Biology, University of Naples ‘Federico II’, 80128 Naples, Italy
- Yan Guo
  - Sylvester Comprehensive Cancer Center, University of Miami, 33136, Miami, USA
- Barbara Majello
  - Department of Biology, University of Naples ‘Federico II’, 80128 Naples, Italy
- Michele Ceccarelli
  - Sylvester Comprehensive Cancer Center, University of Miami, 33136, Miami, USA

3. Vieira FG, Bispo R, Lopes MB. Integration of Multi-Omics Data for the Classification of Glioma Types and Identification of Novel Biomarkers. Bioinform Biol Insights 2024; 18:11779322241249563. [PMID: 38812741 PMCID: PMC11135104 DOI: 10.1177/11779322241249563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 04/09/2024] [Indexed: 05/31/2024]
Abstract
Glioma is currently one of the most prevalent types of primary brain cancer. Given its high level of heterogeneity along with the complex biological molecular markers, many efforts have been made to accurately classify the type of glioma in each patient, which, in turn, is critical to improve early diagnosis and increase survival. Nonetheless, as a result of the fast-growing technological advances in high-throughput sequencing and evolving molecular understanding of glioma biology, its classification has recently been subject to significant alterations. In this study, we integrate multiple glioma omics modalities (including mRNA, DNA methylation, and miRNA) from The Cancer Genome Atlas (TCGA), while using the revised glioma classification labels, with a supervised method based on sparse canonical correlation analysis (DIABLO) to discriminate between glioma types. We were able to find a set of highly correlated features distinguishing glioblastoma from lower-grade gliomas (LGGs) that were mainly associated with the disruption of receptor tyrosine kinase signaling pathways and extracellular matrix organization and remodeling. Concurrently, the discrimination of the LGG types was characterized primarily by features involved in ubiquitination and DNA transcription processes. Furthermore, we could identify several novel glioma biomarkers likely helpful in both diagnosis and prognosis of the patients, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES, EXD3, CD300A, and HEPN1. Collectively, this comprehensive approach not only allowed a highly accurate discrimination of the different TCGA glioma patients but also presented a step forward in advancing our comprehension of the underlying molecular mechanisms driving glioma heterogeneity. Ultimately, our study also revealed novel candidate biomarkers that might constitute potential therapeutic targets, marking a significant stride toward personalized and more effective treatment strategies for patients with glioma.
Affiliation(s)
- Francisca G Vieira
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
- Regina Bispo
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Department of Mathematics, NOVA School of Science and Technology, Caparica, Portugal
- Marta B Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Department of Mathematics, NOVA School of Science and Technology, Caparica, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal

4. Unger M, Kather JN. Deep learning in cancer genomics and histopathology. Genome Med 2024; 16:44. [PMID: 38539231 PMCID: PMC10976780 DOI: 10.1186/s13073-024-01315-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/15/2023] [Accepted: 03/13/2024] [Indexed: 07/08/2024]
Abstract
Histopathology and genomic profiling are cornerstones of precision oncology and are routinely obtained for patients with cancer. Traditionally, histopathology slides are manually reviewed by highly trained pathologists. Genomic data, on the other hand, is evaluated by engineered computational pipelines. In both applications, the advent of modern artificial intelligence methods, specifically machine learning (ML) and deep learning (DL), has opened up a fundamentally new way of extracting actionable insights from raw data, which could augment and potentially replace some aspects of traditional evaluation workflows. In this review, we summarize current and emerging applications of DL in histopathology and genomics, including basic diagnostic as well as advanced prognostic tasks. Based on a growing body of evidence, we suggest that DL could be the groundwork for a new kind of workflow in oncology and cancer research. However, we also point out that DL models can have biases and other flaws that users in healthcare and research need to know about, and we propose ways to address them.
Affiliation(s)
- Michaela Unger
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Jakob Nikolas Kather
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - Department of Medicine I, University Hospital Dresden, Dresden, Germany
  - Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany

5. Tobiasz J, Polanska J. Proteomic Profile Distinguishes New Subpopulations of Breast Cancer Patients with Different Survival Outcomes. Cancers (Basel) 2023; 15:4230. [PMID: 37686507 PMCID: PMC10486506 DOI: 10.3390/cancers15174230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 08/17/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023]
Abstract
As a highly heterogeneous disease, breast cancer (BRCA) demonstrates a diverse molecular portrait. The well-established molecular classification (PAM50) relies on gene expression profiling, but it insufficiently explains the observed clinical and histopathological diversity of BRCAs. This study aims to demographically and clinically characterize the six BRCA subpopulations (basal, HER2-enriched, and four luminal ones) revealed by their proteomic portraits. GMM-based high variate protein selection combined with PCA/UMAP was used for dimensionality reduction, while the k-means algorithm allowed patient clustering. The statistical analysis (log-rank and Gehan-Wilcoxon tests, hazard ratio HR as the effect size ES) showed significant differences across identified subpopulations in Disease-Specific Survival (p = 0.0160) and Progression-Free Interval (p = 0.0264). Luminal subpopulations vary in prognosis (Disease-Free Interval, p = 0.0277). The A2 subpopulation has the poorest prognosis, comparable to that of the HER2-enriched subpopulation (HR = 1.748, referenced to Luminal B, small ES), while A3 has the best (HR = 0.250, large ES). As with PAM50 subtypes, no substantial dependency on demographic and clinical factors was detected across Luminal subpopulations, as measured by the χ² test with Cramér's V as ES, and by ANOVA with appropriate post hoc tests combined with η² or Cohen's d-type ES, respectively. Progesterone receptors can serve as a potential A2 biomarker among Luminal patients. Further investigation of molecular differences is required to examine the potential prognostic or clinical applications.
Affiliation(s)
- Joanna Tobiasz
  - Department of Data Science and Engineering, Silesian University of Technology, 44-100 Gliwice, Poland
  - Department of Computer Graphics, Vision and Digital Systems, Silesian University of Technology, 44-100 Gliwice, Poland
- Joanna Polanska
  - Department of Data Science and Engineering, Silesian University of Technology, 44-100 Gliwice, Poland

6. He W, Huang Y, Shi X, Wang Q, Wu M, Li H, Liu Q, Zhang X, Huang C, Li X. Identifying a distinct fibrosis subset of NAFLD via molecular profiling and the involvement of profibrotic macrophages. J Transl Med 2023; 21:448. [PMID: 37415134 PMCID: PMC10326954 DOI: 10.1186/s12967-023-04300-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/25/2023] [Accepted: 06/23/2023] [Indexed: 07/08/2023]
Abstract
BACKGROUND: Emerging studies suggest that non-alcoholic fatty liver disease (NAFLD) is a heterogeneous disease with multiple etiologies and molecular phenotypes. Fibrosis is the key process in NAFLD progression. In this study, we aimed to explore molecular phenotypes of NAFLD with a particular focus on the fibrosis phenotype and also aimed to explore the changes of macrophage subsets in the fibrosis subset of NAFLD. METHODS: To assess the transcriptomic alterations of key factors in NAFLD and fibrosis progression, we included 14 different transcriptomic datasets of liver tissues. In addition, two single-cell RNA sequencing (scRNA-seq) datasets were included to construct transcriptomic signatures that could represent specific cells. To explore the molecular subsets of fibrosis in NAFLD based on the transcriptomic features, we used a high-quality RNA-sequencing (RNA-seq) dataset of liver tissues from patients with NAFLD. Non-negative matrix factorization (NMF) was used to analyze the molecular subsets of NAFLD based on the gene set variation analysis (GSVA) enrichment scores of key molecule features in liver tissues. RESULTS: The key transcriptomic signatures of NAFLD, including the non-alcoholic steatohepatitis (NASH) signature, fibrosis signature, non-alcoholic fatty liver (NAFL) signature, liver aging signature and TGF-β signature, were constructed from liver transcriptome datasets. We analyzed two liver scRNA-seq datasets and constructed cell type-specific transcriptomic signatures based on the genes that were highly expressed in each cell subset. We analyzed the molecular subsets of NAFLD by NMF and categorized four main subsets of NAFLD. The Cluster 4 subset is mainly characterized by liver fibrosis. Patients in the Cluster 4 subset have more advanced liver fibrosis than patients in other subsets, or may have a high risk of liver fibrosis progression. Furthermore, we identified two key monocyte-macrophage subsets which were both significantly correlated with the progression of liver fibrosis in NAFLD patients. CONCLUSION: Our study revealed the molecular subtypes of NAFLD by integrating key information from transcriptomic expression profiling and liver microenvironment, and identified a novel and distinct fibrosis subset of NAFLD. The fibrosis subset is significantly correlated with the profibrotic macrophages and M2 macrophage subset. These two liver macrophage subsets may be important players in the progression of liver fibrosis of NAFLD patients.
Affiliation(s)
- Weiwei He
  - Department of Endocrinology and Diabetes, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Yinxiang Huang
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Xiulin Shi
  - Department of Endocrinology and Diabetes, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Qingxuan Wang
  - Department of Endocrinology and Diabetes, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Menghua Wu
  - Department of Endocrinology and Diabetes, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Han Li
  - Department of Endocrinology and Diabetes, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Qiuhong Liu
  - Department of Endocrinology and Diabetes, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Xiaofang Zhang
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Caoxin Huang
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China
- Xuejun Li
  - Department of Endocrinology and Diabetes, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
  - Xiamen Diabetes Institute, The First Affiliated Hospital of Xiamen University, Xiamen, China
  - Fujian Provincial Key Laboratory of Translational Medicine for Diabetes, Xiamen, China

7. Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023]
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has created the need for integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Owing to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analysis of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporates mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States

8. Gliozzo J, Mesiti M, Notaro M, Petrini A, Patak A, Puertas-Gallardo A, Paccanaro A, Valentini G, Casiraghi E. Heterogeneous data integration methods for patient similarity networks. Brief Bioinform 2022; 23:6604996. [PMID: 35679533 PMCID: PMC9294435 DOI: 10.1093/bib/bbac207] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/07/2021] [Revised: 04/14/2022] [Accepted: 05/04/2022] [Indexed: 12/29/2022]
Abstract
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
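To make the PSN representation described in this abstract concrete, the following is a minimal sketch, assuming a Gaussian kernel on Euclidean distance as the similarity measure and plain averaging as the fusion step. Both choices are illustrative assumptions, as are the function names `patient_similarity_network` and `average_fusion` and the toy data; averaging is a deliberately naive stand-in for the integration methods the review surveys, not a method taken from it.

```python
import numpy as np


def patient_similarity_network(features: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Build a weighted adjacency matrix: patients are nodes (rows of
    `features`), edge weights are Gaussian-kernel similarities.
    The kernel and `sigma` are assumptions made for this sketch."""
    # Pairwise squared Euclidean distances between patient feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Map distances to similarities in (0, 1]; closer patients score higher.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Remove self-loops so each patient is not its own neighbour.
    np.fill_diagonal(weights, 0.0)
    return weights


def average_fusion(networks: list) -> np.ndarray:
    """Naive integration baseline (assumption): element-wise mean of the
    per-view networks, which must share the same patient ordering."""
    return np.mean(np.stack(networks), axis=0)


# Toy example: two hypothetical data views for the same four patients.
omics_view = np.array([[0.1, 0.2], [0.0, 0.3], [2.0, 2.1], [2.2, 1.9]])
clinical_view = np.array([[1.0], [1.1], [3.0], [2.9]])

fused = average_fusion([
    patient_similarity_network(omics_view),
    patient_similarity_network(clinical_view),
])
```

The fused matrix can then be passed to any graph-based clustering or label-propagation routine; the point of the sketch is only that each data view yields its own network over the same set of patients, and integration operates on those networks rather than on the raw features.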
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Mesiti
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Notaro
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alessandro Petrini
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alex Patak
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
- Alberto Paccanaro
  - Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK
  - School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Giorgio Valentini
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
  - DSRC UNIMI, Data Science Research Center, Milano, 20135, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Berlin, Germany
- Elena Casiraghi
  - AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy

9.
Abstract
Grouping patients into subtypes with homogeneous molecular features can guide diagnosis and therapeutic interventions. SUMO is a computational pipeline that uses nonnegative matrix factorization of patient-similarity networks to integrate continuous multi-omic datasets for molecular subtyping of a disease. Here, we present a detailed protocol to demonstrate its use in determining subtypes of lower-grade gliomas by integrating gene expression, DNA methylation, and miRNA expression data from the TCGA-LGG cohort. For complete details on the use and execution of this protocol, please refer to Sienkiewicz et al. (2022).
Affiliation(s)
- Karolina Sienkiewicz
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Aakrosh Ratan
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
  - University of Virginia Cancer Center, Charlottesville, VA 22908, USA