1
Ellington CN, Lengerich BJ, Watkins TBK, Yang J, Adduri AK, Mahbub S, Xiao H, Kellis M, Xing EP. Learning to estimate sample-specific transcriptional networks for 7,000 tumors. Proc Natl Acad Sci U S A 2025; 122:e2411930122. [PMID: 40408406 DOI: 10.1073/pnas.2411930122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 04/06/2025] [Indexed: 05/25/2025]
Abstract
Cancers are shaped by somatic mutations, microenvironment, and patient background, each altering gene expression and regulation in complex ways, resulting in heterogeneous cellular states and dynamics. Inferring gene regulatory networks (GRNs) from expression data can help characterize this regulation-driven heterogeneity, but network inference requires many statistical samples, limiting GRNs to cluster-level analyses that ignore intracluster heterogeneity. We propose to move beyond coarse analyses of predefined subgroups by using contextualized learning, a multitask learning paradigm that uses multiview contexts including phenotypic, molecular, and environmental information to infer personalized models. With sample-specific contexts, contextualization enables sample-specific models and even generalizes at test time to predict network models for entirely unseen contexts. We unify three network model classes (Correlation, Markov, and Neighborhood Selection) and estimate context-specific GRNs for 7,997 tumors across 25 tumor types, using copy number and driver mutation profiles, tumor microenvironment, and patient demographics as model context. Our generative modeling approach allows us to predict GRNs for unseen tumor types based on a pan-cancer model of how somatic mutations affect gene regulation. Finally, contextualized networks enable GRN-based precision oncology by providing a structured view of expression dynamics at sample-specific resolution, explaining known biomarkers in terms of network-mediated effects and leading to subtypings that improve survival prognosis. We provide a SKLearn-style Python package https://contextualized.ml for learning and analyzing contextualized models, as well as interactive plotting tools for pan-cancer data exploration at https://github.com/cnellington/CancerContextualized.
Affiliation(s)
- Caleb N Ellington
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Benjamin J Lengerich
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Thomas B K Watkins
  - Cancer Institute, University College London, London WC1E 6DD, United Kingdom
- Jiekun Yang
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Abhinav K Adduri
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Sazan Mahbub
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Hanxi Xiao
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260
- Manolis Kellis
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Eric P Xing
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
  - Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Masdar City SE45 05, Abu Dhabi, United Arab Emirates
  - GenBio AI Inc., Palo Alto, CA 94301
2
Rao J, Kirk PDW. VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data. Bioinform Adv 2025; 5:vbaf055. [PMID: 40206332 PMCID: PMC11981716 DOI: 10.1093/bioadv/vbaf055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2025] [Accepted: 03/13/2025] [Indexed: 04/11/2025]
Abstract
Summary: Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growth in availability of high-dimensional categorical data, including 'omics data, necessitates computationally efficient clustering algorithms. We present VICatMix, a variational Bayesian finite mixture model designed for the clustering of categorical data. The use of variational inference (VI) in its training allows the model to outperform competitors in terms of computational time and scalability, while maintaining high accuracy. VICatMix furthermore performs variable selection, enhancing its performance on high-dimensional, noisy data. The proposed model incorporates summarization and model averaging to mitigate poor local optima in VI, allowing for improved estimation of the true number of clusters simultaneously with feature saliency. We demonstrate the performance of VICatMix with both simulated and real-world data, including applications to datasets from The Cancer Genome Atlas, showing its use in cancer subtyping and driver gene discovery. We demonstrate VICatMix's potential utility in integrative cluster analysis with different 'omics datasets, enabling the discovery of novel disease subtypes. Availability and implementation: VICatMix is freely available as an R package via CRAN, incorporating C++ for faster computation, at https://CRAN.R-project.org/package=VICatMix.
Affiliation(s)
- Jackie Rao
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, CB2 0SR, United Kingdom
- Paul D W Kirk
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, CB2 0SR, United Kingdom
  - CRUK Cambridge Centre Ovarian Programme, University of Cambridge, Cambridge, CB2 0RE, United Kingdom
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge, Cambridge, CB2 0AW, United Kingdom
3
Duong HT, Huynh NCN, Nguyen CTK, Le LGH, Nguyen KD, Nguyen HT, Tu LNL, Tran NHB, Giang H, Nguyen HN, Ho CQ, Hoang HT, Dang THQ, Thai TA, Cao DV. Identify characteristics of Vietnamese oral squamous cell carcinoma patients by machine learning on transcriptome and clinical-histopathological analysis. J Dent Sci 2024; 19:S81-S90. [PMID: 39807441 PMCID: PMC11725156 DOI: 10.1016/j.jds.2024.08.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 08/19/2024] [Indexed: 01/16/2025]
Abstract
Background/purpose: Oral squamous cell carcinoma (OSCC) is notorious for its low survival rates, due to the advanced stage at which it is commonly diagnosed. To enhance early detection and improve prognostic assessments, our study harnesses the power of machine learning (ML) to dissect and interpret complex patterns within mRNA-sequencing (RNA-seq) data and clinical-histopathological features. Materials and methods: A total of 206 retrospective Vietnamese OSCC formalin-fixed paraffin-embedded (FFPE) tumor samples were analyzed, of which 101 were subjected to RNA-seq for classification based on gene expression. Then, learning models were built based on clinical-histopathological data to predict OSCC subtypes and propose potential biomarkers for the remaining 105 samples. Results: Two distinct groups of OSCC were identified, with different clinical-histopathological characteristics and gene expression. Subgroup 1 was characterized by severe histopathologic features with immune response and apoptosis signatures, while subgroup 2 was denoted by more clinical/pathological features, cell division and malignant signatures. XGBoost and SVM (Support Vector Machine) models showed the best performance in predicting OSCC subtype. The study also proposed 12 candidate genes as potential biomarkers for OSCC subtypes (6 per group). Conclusion: The study identified characteristics of Vietnamese OSCC patients through a combination of mRNA sequencing and clinical-histopathological analysis. It contributes to the insight into the tumor microenvironment of OSCC and provides accurate ML models for biomarker prediction using clinical-histopathological features.
Affiliation(s)
- Huong Thu Duong
  - Faculty of Odonto-stomatology, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
- Nam Cong-Nhat Huynh
  - Faculty of Odonto-stomatology, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
- Chi Thi-Kim Nguyen
  - Faculty of Odonto-stomatology, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
- Linh Gia-Hoang Le
  - Center for Molecular Biomedicine, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
- Khoa Dang Nguyen
  - Faculty of Odonto-stomatology, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
- Hieu Trong Nguyen
  - Gene Solutions, Ho Chi Minh City, Viet Nam
  - Medical Genetics Institute, Ho Chi Minh City, Viet Nam
- Lan Ngoc-Ly Tu
  - Gene Solutions, Ho Chi Minh City, Viet Nam
  - Medical Genetics Institute, Ho Chi Minh City, Viet Nam
- Nam Huynh-Bao Tran
  - Gene Solutions, Ho Chi Minh City, Viet Nam
  - Medical Genetics Institute, Ho Chi Minh City, Viet Nam
- Hoa Giang
  - Gene Solutions, Ho Chi Minh City, Viet Nam
  - Medical Genetics Institute, Ho Chi Minh City, Viet Nam
- Hoai-Nghia Nguyen
  - Gene Solutions, Ho Chi Minh City, Viet Nam
  - Medical Genetics Institute, Ho Chi Minh City, Viet Nam
- Chuong Quoc Ho
  - Center for Molecular Biomedicine, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
- Hung Trong Hoang
  - Faculty of Odonto-stomatology, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
- Tu Anh Thai
  - Ho Chi Minh City Oncology Hospital, Ho Chi Minh City, Viet Nam
- Dong Van Cao
  - Blood Transfusion Haematology Hospital No. 2, Ho Chi Minh City, Viet Nam
4
Qiu Y, Guo D, Zhao P, Zou Q. scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization. Brief Bioinform 2024; 25:bbae228. [PMID: 38754408 PMCID: PMC11097994 DOI: 10.1093/bib/bbae228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/02/2024] [Accepted: 04/22/2024] [Indexed: 05/18/2024]
Abstract
Motivation: The technology for analyzing single-cell multi-omics data has advanced rapidly and has provided comprehensive and accurate cellular information by exploring cell heterogeneity in genomics, transcriptomics, epigenomics, metabolomics and proteomics data. However, because of the high-dimensional and sparse characteristics of single-cell multi-omics data, as well as the limitations of various analysis algorithms, the clustering performance is generally poor. Matrix factorization is an unsupervised, dimensionality reduction-based method that can cluster individuals and discover related omics variables from different blocks. Here, we present a novel algorithm that performs joint dimensionality reduction learning and cell clustering analysis on single-cell multi-omics data using non-negative matrix factorization that we named scMNMF. We formulate the objective function of joint learning as a constrained optimization problem and derive the corresponding iterative formulas through alternating iterative algorithms. The major advantage of the scMNMF algorithm remains its capability to explore hidden related features among omics data. Additionally, the feature selection for dimensionality reduction and cell clustering mutually influence each other iteratively, leading to a more effective discovery of cell types. We validated the performance of the scMNMF algorithm using two simulated and five real datasets. The results show that scMNMF outperformed seven other state-of-the-art algorithms in various measurements. Availability and implementation: scMNMF code can be found at https://github.com/yushanqiu/scMNMF.
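The general idea of factorizing several omics matrices with a shared non-negative cell factor can be illustrated with a minimal sketch. This is not the scMNMF objective or code (the abstract does not give the formulas); it assumes a plain sum of squared reconstruction errors and uses standard Lee-Seung multiplicative updates. The names `joint_nmf` and `reconstruction_error` and all parameters are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np


def joint_nmf(views, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate each view X_v (features_v x cells) as W_v @ H with one shared H >= 0.

    Assumption: objective is sum_v ||X_v - W_v H||_F^2, minimized with
    multiplicative updates. This is a generic joint NMF, not scMNMF itself.
    """
    rng = np.random.default_rng(seed)
    n_cells = views[0].shape[1]
    H = rng.random((k, n_cells)) + eps                      # shared cell factor
    Ws = [rng.random((X.shape[0], k)) + eps for X in views]  # per-view feature factors
    for _ in range(n_iter):
        # Update each view-specific W_v with H held fixed.
        for v, X in enumerate(views):
            Ws[v] *= (X @ H.T) / (Ws[v] @ H @ H.T + eps)
        # Update the shared H using contributions from all views.
        numerator = sum(W.T @ X for W, X in zip(Ws, views))
        denominator = sum(W.T @ W @ H for W in Ws) + eps
        H *= numerator / denominator
    return Ws, H


def reconstruction_error(views, Ws, H):
    """Total squared Frobenius reconstruction error across views."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, W in zip(views, Ws))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two toy non-negative "omics" views measured on the same 10 cells.
    toy_views = [rng.random((6, 10)), rng.random((4, 10))]
    Ws, H = joint_nmf(toy_views, k=2)
    # One simple (assumed) clustering rule: assign each cell to its dominant factor.
    print(H.argmax(axis=0))
    print(reconstruction_error(toy_views, Ws, H))
```

Sharing `H` across views is what couples the modalities in this sketch: every view must be explained by the same per-cell loadings, so cluster assignments read off `H` reflect all views at once.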
Affiliation(s)
- Yushan Qiu
  - School of Mathematical Sciences, Shenzhen University, 518000, Guangdong, China
- Dong Guo
  - School of Mathematical Sciences, Shenzhen University, 518000, Guangdong, China
- Pu Zhao
  - College of Life and Health Sciences, Northeastern University, Shenyang, 110169, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610056, China
5
Huang C, Kuan PF. intCC: An efficient weighted integrative consensus clustering of multimodal data. Pac Symp Biocomput 2024; 29:627-640. [PMID: 38160311 PMCID: PMC10764072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
High-throughput profiling of multiomics data provides a valuable resource to better understand complex human diseases such as cancer and to potentially uncover new subtypes. Integrative clustering has emerged as a powerful unsupervised learning framework for subtype discovery. In this paper, we propose an efficient weighted integrative clustering method called intCC that combines ensemble methods, consensus clustering, and kernel learning integrative clustering. We illustrate that intCC can accurately uncover the latent cluster structures via extensive simulation studies and a case study on the TCGA pan-cancer datasets. An R package intCC implementing our proposed method is available at https://github.com/candsj/intCC.
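As a rough illustration of the weighted consensus step that such methods build on, the sketch below computes a weighted co-association (consensus) matrix from several base clusterings and then groups samples whose consensus exceeds a threshold. It is not the intCC algorithm or its R API: the weights are supplied by hand rather than learned, kernel learning is omitted, and connected-component thresholding is swapped in as a simple final step. All function names are hypothetical.

```python
def co_association(partitions, weights=None):
    """Weighted fraction of base clusterings that put samples i and j together."""
    n = len(partitions[0])
    if weights is None:
        weights = [1.0] * len(partitions)
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += weight / total
    return matrix


def consensus_clusters(matrix, threshold=0.5):
    """Label connected components of the graph linking pairs with consensus > threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Three toy base clusterings of six samples (e.g. one per data modality).
    base = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],
    ]
    print(consensus_clusters(co_association(base)))
    # Up-weighting the second clustering changes where sample 2 ends up.
    print(consensus_clusters(co_association(base, weights=[1, 3, 1])))
```

The consensus matrix depends only on which samples share a label, so the arbitrary label numbering of each base clustering does not matter; the weights control how much each modality or base method contributes.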
Affiliation(s)
- Can Huang
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA
- Pei Fen Kuan
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA
6
Wu X, Han M, Song X, He S, Bo X, Zhu Y. COMMO: a web server for the identification and analysis of consensus gene modules across multiple methods. Bioinformatics 2023; 39:btad708. [PMID: 37995293 PMCID: PMC10713113 DOI: 10.1093/bioinformatics/btad708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 11/05/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023]
Abstract
Summary: A variety of computational methods have been developed to identify functionally related gene modules from genome-wide gene expression profiles. Integrating the results of these methods to identify consensus modules is a promising approach to produce more accurate and robust results. In this application note, we introduce COMMO, the first web server to identify and analyze consensus functionally related gene modules from different module detection methods. First, COMMO implements eight state-of-the-art module detection methods and two consensus clustering algorithms. Second, COMMO provides users with mRNA and protein expression data for 33 cancer types from three public databases. Users can also upload their own data for module detection. Third, users can perform functional enrichment and two types of survival analyses on the observed gene modules. Finally, COMMO provides interactive, customizable visualizations and exportable results. With its extensive analysis and interactive capabilities, COMMO offers a user-friendly solution for conducting module-based precision medicine research. Availability and implementation: The COMMO web server is available at https://commo.ncpsb.org.cn/, with the source code available on GitHub: https://github.com/Song-xinyu/COMMO/tree/master.
Affiliation(s)
- Xiaojing Wu
  - Basic Medical School, Anhui Medical University, Hefei 230022, China
  - National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Mingfei Han
  - National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Xinyu Song
  - Center for Artificial Intelligence in Medicine, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100853, China
- Song He
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yunping Zhu
  - Basic Medical School, Anhui Medical University, Hefei 230022, China
  - National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
7
Marcos-Zambrano LJ, López-Molina VM, Bakir-Gungor B, Frohme M, Karaduzovic-Hadziabdic K, Klammsteiner T, Ibrahimi E, Lahti L, Loncar-Turukalo T, Dhamo X, Simeon A, Nechyporenko A, Pio G, Przymus P, Sampri A, Trajkovik V, Lacruz-Pleguezuelos B, Aasmets O, Araujo R, Anagnostopoulos I, Aydemir Ö, Berland M, Calle ML, Ceci M, Duman H, Gündoğdu A, Havulinna AS, Kaka Bra KHN, Kalluci E, Karav S, Lode D, Lopes MB, May P, Nap B, Nedyalkova M, Paciência I, Pasic L, Pujolassos M, Shigdel R, Susín A, Thiele I, Truică CO, Wilmes P, Yilmaz E, Yousef M, Claesson MJ, Truu J, Carrillo de Santa Pau E. A toolbox of machine learning software to support microbiome analysis. Front Microbiol 2023; 14:1250806. [PMID: 38075858 PMCID: PMC10704913 DOI: 10.3389/fmicb.2023.1250806] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/30/2023] [Accepted: 09/11/2023] [Indexed: 05/14/2025]
Abstract
The human microbiome has become an area of intense research due to its potential impact on human health. However, the analysis and interpretation of microbiome data have proven to be challenging due to their complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationships within the data, even with limited prior knowledge. Therefore, there has been a rapid growth in the development of software specifically designed for the analysis and interpretation of microbiome data using ML techniques. These software tools incorporate a wide range of ML algorithms for clustering, classification, regression, or feature selection, to identify microbial patterns and relationships within the data and generate predictive models. This rapid development, with its constant need for new developments and integration of new features, requires efforts to compile, catalog, and classify these tools in order to create infrastructures and services with easy, transparent, and trustworthy standards. Here we review the state of the art for ML tools applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on ML-based software and framework resources currently available for the analysis of microbiome data in humans. The aim is to help microbiologists and biomedical scientists go deeper into specialized resources that integrate ML techniques and to facilitate future benchmarking to create standards for the analysis of microbiome data. The software resources are organized based on the type of analysis they were developed for and the ML techniques they implement. A description of each software tool with examples of usage is provided, including comments about pitfalls and shortcomings in the usage of ML-based software in relation to microbiome data that need to be considered by developers and users. This review represents an extensive compilation to date, offering valuable insights and guidance for researchers interested in leveraging ML approaches for microbiome analysis.
Affiliation(s)
- Laura Judith Marcos-Zambrano
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Víctor Manuel López-Molina
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Abdullah Gül University, Kayseri, Türkiye
- Marcus Frohme
  - Division Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
- Thomas Klammsteiner
  - Department of Microbiology and Department of Ecology, University of Innsbruck, Innsbruck, Austria
- Eliana Ibrahimi
  - Department of Biology, University of Tirana, Tirana, Albania
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Xhilda Dhamo
  - Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Alina Nechyporenko
  - Division Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
  - Department of Systems Engineering, Kharkiv National University of Radioelectronics, Kharkiv, Ukraine
- Gianvito Pio
  - Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
  - Big Data Lab, National Interuniversity Consortium for Informatics, Rome, Italy
- Piotr Przymus
  - Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
- Alexia Sampri
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
- Vladimir Trajkovik
  - Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
- Blanca Lacruz-Pleguezuelos
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Oliver Aasmets
  - Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
  - Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Ricardo Araujo
  - Nephrology and Infectious Diseases R & D Group, i3S—Instituto de Investigação e Inovação em Saúde; INEB—Instituto de Engenharia Biomédica, Universidade do Porto, Porto, Portugal
- Ioannis Anagnostopoulos
  - Department of Informatics, University of Piraeus, Piraeus, Greece
  - Computer Science and Biomedical Informatics Department, University of Thessaly, Lamia, Greece
- Önder Aydemir
  - Department of Electrical and Electronics Engineering, Karadeniz Technical University, Trabzon, Türkiye
- Magali Berland
  - INRAE, MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas, France
- M. Luz Calle
  - Faculty of Sciences, Technology and Engineering, University of Vic – Central University of Catalonia, Vic, Barcelona, Spain
  - IRIS-CC, Fundació Institut de Recerca i Innovació en Ciències de la Vida i la Salut a la Catalunya Central, Vic, Barcelona, Spain
- Michelangelo Ceci
  - Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
  - Big Data Lab, National Interuniversity Consortium for Informatics, Rome, Italy
- Hatice Duman
  - Department of Molecular Biology and Genetics, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Aycan Gündoğdu
  - Department of Microbiology and Clinical Microbiology, Faculty of Medicine, Erciyes University, Kayseri, Türkiye
  - Metagenomics Laboratory, Genome and Stem Cell Center (GenKök), Erciyes University, Kayseri, Türkiye
- Aki S. Havulinna
  - Finnish Institute for Health and Welfare - THL, Helsinki, Finland
  - Institute for Molecular Medicine Finland, FIMM-HiLIFE, Helsinki, Finland
- Eglantina Kalluci
  - Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
- Sercan Karav
  - Department of Molecular Biology and Genetics, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Daniel Lode
  - Division Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
- Marta B. Lopes
  - Department of Mathematics, Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Patrick May
  - Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Bram Nap
  - School of Medicine, University of Galway, Galway, Ireland
- Miroslava Nedyalkova
  - Department of Inorganic Chemistry, Faculty of Chemistry and Pharmacy, University of Sofia, Sofia, Bulgaria
- Inês Paciência
  - Center for Environmental and Respiratory Health Research (CERH), Research Unit of Population Health, University of Oulu, Oulu, Finland
  - Biocenter Oulu, University of Oulu, Oulu, Finland
- Lejla Pasic
  - Sarajevo Medical School, University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
- Meritxell Pujolassos
  - Faculty of Sciences, Technology and Engineering, University of Vic – Central University of Catalonia, Vic, Barcelona, Spain
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Antonio Susín
  - Mathematical Department, UPC-Barcelona Tech, Barcelona, Spain
- Ines Thiele
  - School of Medicine, University of Galway, Galway, Ireland
  - APC Microbiome Ireland, University College Cork, Cork, Ireland
- Ciprian-Octavian Truică
  - Computer Science and Engineering Department, Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica, Bucharest, Romania
- Paul Wilmes
  - Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, Esch-sur-Alzette, Luxembourg
  - Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, Belvaux, Luxembourg
- Ercument Yilmaz
  - Department of Computer Technologies, Karadeniz Technical University, Trabzon, Türkiye
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
- Marcus Joakim Claesson
  - APC Microbiome Ireland, University College Cork, Cork, Ireland
  - School of Microbiology, University College Cork, Cork, Ireland
- Jaak Truu
  - Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
8
Chen W, Wang H, Liang C. Deep multi-view contrastive learning for cancer subtype identification. Brief Bioinform 2023; 24:bbad282. [PMID: 37539822 DOI: 10.1093/bib/bbad282] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/22/2023] [Revised: 05/29/2023] [Accepted: 07/19/2023] [Indexed: 08/05/2023]
Abstract
Cancer heterogeneity has posed great challenges in exploring precise therapeutic strategies for cancer treatment. The identification of cancer subtypes aims to detect patients with distinct molecular profiles and thus could provide new clues on effective clinical therapies. While great efforts have been made, it remains challenging to develop powerful computational methods that can efficiently integrate multi-omics datasets for the task. In this paper, we propose a novel self-supervised learning model called Deep Multi-view Contrastive Learning (DMCL) for cancer subtype identification. Specifically, by incorporating the reconstruction loss, contrastive loss and clustering loss into a unified framework, our model simultaneously encodes the sample discriminative information into the extracted feature representations and well preserves the sample cluster structures in the embedded space. Moreover, DMCL is an end-to-end framework where the cancer subtypes could be directly obtained from the model outputs. We compare DMCL with eight alternatives ranging from classic cancer subtype identification methods to recently developed state-of-the-art systems on 10 widely used cancer multi-omics datasets as well as an integrated dataset, and the experimental results validate the superior performance of our method. We further conduct a case study on liver cancer and the analysis results indicate that different subtypes might have different responses to the selected chemotherapeutic drugs.
Affiliation(s)
- Wenlan Chen
  - School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Hong Wang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
9
Cortes-Ciriano I, Steele CD, Piculell K, Al-Ibraheemi A, Eulo V, Bui MM, Chatzipli A, Dickson BC, Borcherding DC, Feber A, Galor A, Hart J, Jones KB, Jordan JT, Kim RH, Lindsay D, Miller C, Nishida Y, Proszek PZ, Serrano J, Sundby RT, Szymanski JJ, Ullrich NJ, Viskochil D, Wang X, Snuderl M, Park PJ, Flanagan AM, Hirbe AC, Pillay N, Miller DT, for the Genomics of MPNST (GeM) Consortium. Genomic Patterns of Malignant Peripheral Nerve Sheath Tumor (MPNST) Evolution Correlate with Clinical Outcome and Are Detectable in Cell-Free DNA. Cancer Discov 2023; 13:654-671. [PMID: 36598417 PMCID: PMC9983734 DOI: 10.1158/2159-8290.cd-22-0786] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/19/2022] [Revised: 11/09/2022] [Accepted: 12/16/2022] [Indexed: 01/05/2023]
Abstract
Malignant peripheral nerve sheath tumor (MPNST), an aggressive soft-tissue sarcoma, occurs in people with neurofibromatosis type 1 (NF1) and sporadically. Whole-genome and multiregional exome sequencing, transcriptomic, and methylation profiling of 95 tumor samples revealed the order of genomic events in tumor evolution. Following biallelic inactivation of NF1, loss of CDKN2A or TP53 with or without inactivation of polycomb repressive complex 2 (PRC2) leads to extensive somatic copy-number aberrations (SCNA). Distinct pathways of tumor evolution are associated with inactivation of PRC2 genes and H3K27 trimethylation (H3K27me3) status. Tumors with H3K27me3 loss evolve through extensive chromosomal losses followed by whole-genome doubling and chromosome 8 amplification, and show lower levels of immune cell infiltration. Retention of H3K27me3 leads to extensive genomic instability, but an immune cell-rich phenotype. Specific SCNAs detected in both tumor samples and cell-free DNA (cfDNA) act as a surrogate for H3K27me3 loss and immune infiltration, and predict prognosis. Significance: MPNST is the most common cause of death and morbidity for individuals with NF1, a relatively common tumor predisposition syndrome. Our results suggest that somatic copy-number and methylation profiling of tumor or cfDNA could serve as a biomarker for early diagnosis and to stratify patients into prognostic and treatment-related subgroups. This article is highlighted in the In This Issue feature, p. 517.
Affiliation(s)
- Isidro Cortes-Ciriano
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
- Christopher D. Steele
  - Research Department of Pathology, University College London Cancer Institute, Bloomsbury, London, United Kingdom
- Katherine Piculell
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts
- Alyaa Al-Ibraheemi
  - Department of Pathology, Boston Children's Hospital, Boston, Massachusetts
- Vanessa Eulo
  - Division of Oncology, Department of Internal Medicine, University of Alabama at Birmingham, Birmingham, Alabama
- Marilyn M. Bui
  - Department of Pathology, Moffitt Cancer Center & Research Institute, Tampa, Florida
- Aikaterini Chatzipli
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Brendan C. Dickson
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
  - Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada
- Dana C. Borcherding
  - Division of Oncology, Departments of Internal Medicine and Pediatrics, Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
- Andrew Feber
  - Clinical Genomics Translational Research, Institute of Cancer Research, Royal Marsden NHS Foundation Trust, London, United Kingdom
- Alon Galor
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Jesse Hart
  - Department of Pathology, Lifespan Laboratories, Rhode Island Hospital, Providence, Rhode Island
- Kevin B. Jones
  - Departments of Orthopaedics and Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Justin T. Jordan
  - Pappas Center for Neuro-oncology, Massachusetts General Hospital, Boston, Massachusetts
- Raymond H. Kim
  - Division of Medical Oncology and Hematology, Princess Margaret Cancer Centre, Sinai Health System, Toronto, Ontario, Canada
  - Hospital for Sick Children, University of Toronto, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Daniel Lindsay
  - Department of Histopathology, Royal National Orthopaedic Hospital, NHS Trust, Middlesex, United Kingdom
- Colin Miller
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
- Yoshihiro Nishida
  - Department of Rehabilitation Medicine, Nagoya University Hospital, Nagoya, Aichi, Japan
- Paula Z. Proszek
  - Clinical Genomics Translational Research, Institute of Cancer Research, Royal Marsden NHS Foundation Trust, London, United Kingdom
- Jonathan Serrano
  - Department of Pathology, New York University Langone Health, Perlmutter Cancer Center, New York City, New York
- R. Taylor Sundby
  - Pediatric Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Jeffrey J. Szymanski
  - Division of Cancer Biology, Department of Radiation Oncology, Washington University School of Medicine, St. Louis, Missouri
- Nicole J. Ullrich
  - Department of Neurology, Boston Children's Hospital, Boston, Massachusetts
- David Viskochil
  - Division of Medical Genetics, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Xia Wang
  - GeneHome, Department of Individualized Cancer Management, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
- Matija Snuderl
  - Department of Pathology, New York University Langone Health, Perlmutter Cancer Center, New York City, New York
- Peter J. Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Adrienne M. Flanagan
  - Research Department of Pathology, University College London Cancer Institute, Bloomsbury, London, United Kingdom
  - Department of Histopathology, Royal National Orthopaedic Hospital, NHS Trust, Middlesex, United Kingdom
- Angela C. Hirbe
  - Division of Oncology, Departments of Internal Medicine and Pediatrics, Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
- Nischalan Pillay
  - Research Department of Pathology, University College London Cancer Institute, Bloomsbury, London, United Kingdom
  - Department of Histopathology, Royal National Orthopaedic Hospital, NHS Trust, Middlesex, United Kingdom
- David T. Miller
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts
|
10
|
Madhumita, Dwivedi A, Paul S. Recursive integration of synergised graph representations of multi-omics data for cancer subtypes identification. Sci Rep 2022; 12:15629. [PMID: 36115864 PMCID: PMC9482647 DOI: 10.1038/s41598-022-17585-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 07/27/2022] [Indexed: 11/09/2022] Open
Abstract
Cancer subtypes identification is one of the critical steps toward advancing personalized anti-cancerous therapies. Accumulation of a massive amount of multi-platform omics data measured across the same set of samples provides an opportunity to look into this deadly disease from several views simultaneously. Few integrative clustering approaches are developed to capture shared information from all the views to identify cancer subtypes. However, they have certain limitations. The challenge here is identifying the most relevant feature space from each omic view and systematically integrating them. Both the steps should lead toward a global clustering solution with biological significance. In this respect, a novel multi-omics clustering algorithm named RISynG (Recursive Integration of Synergised Graph-representations) is presented in this study. RISynG represents each omic view as two representation matrices that are Gramian and Laplacian. A parameterised combination function is defined to obtain a synergy matrix from these representation matrices. Then a recursive multi-kernel approach is applied to integrate the most relevant, shared, and complementary information captured via the respective synergy matrices. At last, clustering is applied to the integrated subspace. RISynG is benchmarked on five multi-omics cancer datasets taken from The Cancer Genome Atlas. The experimental results demonstrate RISynG’s efficiency over the other approaches in this domain.
|
11
|
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:cancers14133215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary: The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine.
Abstract: Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques. This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
|
12
|
Brière G, Darbo É, Thébault P, Uricaru R. Consensus clustering applied to multi-omics disease subtyping. BMC Bioinformatics 2021; 22:361. [PMID: 34229612 PMCID: PMC8259015 DOI: 10.1186/s12859-021-04279-1] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/20/2020] [Accepted: 06/28/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Facing the diversity of omics data and the difficulty of selecting one result over all those produced by several methods, consensus strategies have the potential to reconcile multiple inputs and to produce robust results.
RESULTS: Here, we introduce ClustOmics, a generic consensus clustering tool that we use in the context of cancer subtyping. ClustOmics relies on a non-relational graph database, which allows for the simultaneous integration of both multiple omics data and results from various clustering methods. This new tool conciliates input clusterings, regardless of their origin, their number, their size or their shape. ClustOmics implements an intuitive and flexible strategy, based upon the idea of evidence accumulation clustering. ClustOmics computes co-occurrences of pairs of samples in input clusters and uses this score as a similarity measure to reorganize data into consensus clusters.
CONCLUSION: We applied ClustOmics to multi-omics disease subtyping on real TCGA cancer data from ten different cancer types. We showed that ClustOmics is robust to heterogeneous qualities of input partitions, smoothing and reconciling preliminary predictions into high-quality consensus clusters, both from a computational and a biological point of view. The comparison to a state-of-the-art consensus-based integration tool, COCA, further corroborated this statement. However, the main interest of ClustOmics is not to compete with other tools, but rather to make profit from their various predictions when no gold-standard metric is available to assess their significance.
AVAILABILITY: The ClustOmics source code, released under MIT license, and the results obtained on TCGA cancer data are available on GitHub: https://github.com/galadrielbriere/ClustOmics.
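The co-occurrence score described in this abstract can be illustrated with a minimal sketch of a co-association matrix, the core quantity in evidence accumulation clustering. This is not the ClustOmics implementation (which, per the abstract, relies on a graph database); the function name `co_association` and the toy partitions are hypothetical, and the sketch assumes every sample appears in every input clustering.

```python
import numpy as np

def co_association(partitions):
    """Return an (n_samples, n_samples) matrix whose entry (i, j) is the
    fraction of input clusterings that place samples i and j together.
    Hypothetical helper for illustration; not part of ClustOmics."""
    labels = np.asarray(partitions)  # shape: (n_clusterings, n_samples)
    # Compare every pair of samples within each clustering via broadcasting.
    same_cluster = labels[:, :, None] == labels[:, None, :]
    # Average the pairwise agreement over the input clusterings.
    return same_cluster.mean(axis=0)

# Toy input: three clusterings of the same five samples (made-up labels).
partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
C = co_association(partitions)
print(np.round(C, 2))
```

The resulting matrix can then be treated as a similarity measure, for example by passing `1 - C` as a distance to a clustering routine of choice, to derive consensus clusters.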
Affiliation(s)
- Galadriel Brière
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
  - INRA, Bordeaux INP, NutriNeuro, UMR 1286, Univ. Bordeaux, 33000, Bordeaux, France
- Élodie Darbo
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
  - INSERM U1218, Institut Bergonié, Univ. Bordeaux, 33076, Bordeaux, France
- Patricia Thébault
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
- Raluca Uricaru
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
|
13
|
Adossa N, Khan S, Rytkönen KT, Elo LL. Computational strategies for single-cell multi-omics integration. Comput Struct Biotechnol J 2021; 19:2588-2596. [PMID: 34025945 PMCID: PMC8114078 DOI: 10.1016/j.csbj.2021.04.060] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/19/2021] [Revised: 04/23/2021] [Accepted: 04/24/2021] [Indexed: 02/06/2023] Open
Abstract
Single-cell omics technologies are currently solving biological and medical problems that earlier have remained elusive, such as discovery of new cell types, cellular differentiation trajectories and communication networks across cells and tissues. Current advances especially in single-cell multi-omics hold high potential for breakthroughs by integration of multiple different omics layers. To pair with the recent biotechnological developments, many computational approaches to process and analyze single-cell multi-omics data have been proposed. In this review, we first introduce recent developments in single-cell multi-omics in general and then focus on the available data integration strategies. The integration approaches are divided into three categories: early, intermediate, and late data integration. For each category, we describe the underlying conceptual principles and main characteristics, as well as provide examples of currently available tools and how they have been applied to analyze single-cell multi-omics data. Finally, we explore the challenges and prospective future directions of single-cell multi-omics data integration, including examples of adopting multi-view analysis approaches used in other disciplines to single-cell multi-omics.
Affiliation(s)
- Nigatu Adossa
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Sofia Khan
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Kalle T. Rytkönen
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - Institute of Biomedicine, University of Turku, 20520 Turku, Finland
- Laura L. Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - Institute of Biomedicine, University of Turku, 20520 Turku, Finland
|