1
Jiang J, Zhao K, Li W, Zheng P, Jiang S, Ren Q, Duan Y, Yu H, Kang X, Li J, Hu K, Jiang T, Zhao M, Wang L, Yang S, Zhang H, Liu Y, Wang A, Liu Y, Xu J. Multiomics Reveals Biological Mechanisms Linking Macroscale Structural Covariance Network Dysfunction With Neuropsychiatric Symptoms Across the Alzheimer's Disease Continuum. Biol Psychiatry 2025; 97:1067-1078. [PMID: 39419461 DOI: 10.1016/j.biopsych.2024.08.027] [Received: 02/01/2024] [Revised: 07/04/2024] [Accepted: 08/28/2024] [Indexed: 10/19/2024]
Abstract
BACKGROUND The high heterogeneity of neuropsychiatric symptoms (NPSs) hinders further exploration of their role in neurobiological mechanisms and Alzheimer's disease (AD). We aimed to delineate NPS patterns based on brain macroscale connectomics to understand the biological mechanisms of NPSs on the AD continuum. METHODS We constructed regional radiomics similarity networks for 550 participants (AD with NPSs [n = 376], AD without NPSs [n = 111], and normal control participants [n = 63]) from the CIBL (Chinese Imaging, Biomarkers, and Lifestyle) study. We identified regional radiomics similarity network connections associated with NPSs and then clustered distinct subtypes of AD with NPSs. An independent dataset (n = 189) and internal validation were performed to assess the robustness of the NPS subtypes. Subsequent multiomics analysis was performed to assess the distinct clinical phenotype and biological mechanisms in each NPS subtype. RESULTS AD patients with NPSs were clustered into severe (n = 187), moderate (n = 87), and mild (n = 102) NPS subtypes, each exhibiting distinct brain network dysfunction patterns. A high level of consistency in clustering NPSs was internally and externally validated. Severe and moderate NPS subtypes were associated with significant cognitive impairment, increased plasma p-tau181 (tau phosphorylated at threonine 181) levels, extensive decreased brain volume and cortical thickness, and accelerated cognitive decline. Gene set enrichment analysis revealed enrichment of differentially expressed genes in ion transport and synaptic transmission with variations for each NPS subtype. Genome-wide association study analysis defined the specific gene loci for each subtype of AD with NPSs (e.g., logical memory), consistent with clinical manifestations and progression patterns. CONCLUSIONS This study identified and validated 3 distinct NPS subtypes, underscoring the role of NPSs in neurobiological mechanisms and progression of the AD continuum.
Affiliation(s)
- Jiwei Jiang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Kun Zhao
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Queen Mary School Hainan, Beijing University of Posts and Telecommunications, Hainan, China
- Wenyi Li
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Peiyang Zheng
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
- Shirui Jiang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Qiwei Ren
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Yunyun Duan
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Huiying Yu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Xiaopeng Kang
  - School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Junjie Li
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Ke Hu
  - School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Tianlin Jiang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Min Zhao
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Linlin Wang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Shiyi Yang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Huiying Zhang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Yaou Liu
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Anxin Wang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
- Yong Liu
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Queen Mary School Hainan, Beijing University of Posts and Telecommunications, Hainan, China
- Jun Xu
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
2
Llinas-Bertran A, Butjosa-Espín M, Barberi V, Seoane JA. Multimodal data integration in early-stage breast cancer. Breast 2025; 80:103892. [PMID: 39922065 PMCID: PMC11973824 DOI: 10.1016/j.breast.2025.103892] [Received: 10/10/2024] [Revised: 12/13/2024] [Accepted: 01/27/2025] [Indexed: 02/10/2025]
Abstract
The use of biomarkers in breast cancer has significantly improved patient outcomes through targeted therapies, such as hormone therapy, anti-HER2 therapy, and CDK4/6 or PARP inhibitors. However, existing knowledge does not fully encompass the diverse nature of breast cancer, particularly in triple-negative tumors. The integration of multi-omics and multimodal data has the potential to provide new insights into biological processes, improve breast cancer patient stratification, enhance prognosis and response prediction, and identify new biomarkers. This review presents a comprehensive overview of state-of-the-art multimodal (including molecular and image) data integration algorithms that have been developed for, or are applicable to, breast cancer stratification, prognosis, or biomarker identification. We examined the primary challenges and opportunities of these multimodal data integration algorithms, including their advantages, limitations, and critical considerations for future research. We aimed to describe models that are not only academically and preclinically relevant, but also applicable to clinical settings.
Affiliation(s)
- Arnau Llinas-Bertran
  - Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Maria Butjosa-Espín
  - Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Vittoria Barberi
  - Breast Cancer Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Jose A Seoane
  - Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
3
Zheng H, Sarkar H, Raphael BJ. Joint imputation and deconvolution of gene expression across spatial transcriptomics platforms. bioRxiv: The Preprint Server for Biology 2025:2025.02.17.638195. [PMID: 40027720 PMCID: PMC11870578 DOI: 10.1101/2025.02.17.638195] [Indexed: 03/05/2025]
Abstract
Spatially resolved transcriptomics (SRT) technologies measure gene expression across thousands of spatial locations within a tissue slice. Multiple SRT technologies are currently available and others are in active development with each technology having varying spatial resolution (subcellular, single-cell, or multicellular regions), gene coverage (targeted vs. whole-transcriptome), and sequencing depth per location. For example, the widely used 10x Genomics Visium platform measures whole transcriptomes from multiple-cell-sized spots, while the 10x Genomics Xenium platform measures a few hundred genes at subcellular resolution. A number of studies apply multiple SRT technologies to slices that originate from the same biological tissue. Integration of data from different SRT technologies can overcome limitations of the individual technologies enabling the imputation of expression from unmeasured genes in targeted technologies and/or the deconvolution of ad-mixed expression from technologies with lower spatial resolution. We introduce Spatial Integration for Imputation and Deconvolution (SIID), an algorithm to reconstruct a latent spatial gene expression matrix from a pair of observations from different SRT technologies. SIID leverages a spatial alignment and uses a joint non-negative factorization model to accurately impute missing gene expression and infer gene expression signatures of cell types from ad-mixed SRT data. In simulations involving paired SRT datasets from different technologies (e.g., Xenium and Visium), SIID shows superior performance in reconstructing spot-to-cell-type assignments, recovering cell-type-specific gene expression, and imputing missing data compared to contemporary tools. When applied to real-world 10x Xenium-Visium pairs from human breast and colon cancer tissues, SIID achieves highest performance in imputing holdout gene expression. A PyTorch implementation of SIID is available at https://github.com/raphael-group/siid .
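The non-negative factorization that the abstract builds on can be sketched in its simplest single-matrix form. This is not the SIID model (which is joint across two platforms, uses a spatial alignment, and is implemented in PyTorch); it is plain NMF with the classic multiplicative updates, written in pure Python. The function names, the toy matrix, and all parameter values are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xi - yi) ** 2 for xr, yr in zip(x, wh) for xi, yi in zip(xr, yr))

def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative X (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random initialisation keeps every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy non-negative "expression" matrix (4 locations x 3 genes), made up for illustration.
X = [[5.0, 0.0, 1.0],
     [4.0, 0.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 6.0, 4.0]]
W, H = nmf(X, rank=2)
```

In this reading, the rows of `H` play the role of gene expression signatures and the rows of `W` give per-location loadings; a joint model would tie such factors across two observed matrices instead of fitting one.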
Affiliation(s)
- Hongyu Zheng
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
- Hirak Sarkar
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
  - Ludwig Cancer Institute, Princeton Branch, Princeton University, Princeton, NJ, USA
4
Pržulj N, Malod-Dognin N. Simplicity within biological complexity. Bioinformatics Advances 2025; 5:vbae164. [PMID: 39927291 PMCID: PMC11805345 DOI: 10.1093/bioadv/vbae164] [Received: 05/15/2024] [Revised: 10/01/2024] [Accepted: 10/23/2024] [Indexed: 02/11/2025]
Abstract
Motivation Heterogeneous, interconnected, systems-level, molecular (multi-omic) data have become increasingly available and key in precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, repurpose known and discover new drugs to personalize medical treatment. Existing methodologies are limited and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs. Results In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding of multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time. Network embedding methods (also called graph representation learning) map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with the performance depending on the underlying topology-function network biology hypotheses, the biomedical applications, and evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms and compute power call for a creation and training of efficient, explainable and controllable models, having no potentially dangerous, unexpected behaviour, that make a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation, and to apply it to biomedical informatics, focusing on precision medicine and personalized drug discovery. 
It will lead to a paradigm shift in the computational and biomedical understanding of data and diseases that will open up ways to solve some of the major bottlenecks in precision medicine and other domains.
Affiliation(s)
- Nataša Pržulj
  - Computational Biology Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
  - Barcelona Supercomputing Center, Barcelona 08034, Spain
  - Department of Computer Science, University College London, London WC1E6BT, United Kingdom
  - ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
5
Mercadié A, Gravier É, Josse G, Fournier I, Viodé C, Vialaneix N, Brouard C. NMFProfiler: a multi-omics integration method for samples stratified in groups. Bioinformatics 2025; 41:btaf066. [PMID: 39921890 PMCID: PMC11855281 DOI: 10.1093/bioinformatics/btaf066] [Received: 09/13/2024] [Revised: 01/13/2025] [Accepted: 02/05/2025] [Indexed: 02/10/2025]
Abstract
MOTIVATION The development of high-throughput sequencing enabled the massive production of "omics" data for various applications in biology. By simultaneously analyzing paired datasets collected on the same samples, integrative statistical approaches allow researchers to get a global picture of such systems and to highlight existing relationships between various molecular types and levels. Here, we introduce NMFProfiler, an integrative supervised NMF that accounts for the stratification of samples into groups of biological interest. RESULTS NMFProfiler was shown to successfully extract signatures characterizing groups, with performance comparable to or better than state-of-the-art approaches. In particular, NMFProfiler was used in a clinical study on atopic dermatitis (AD) and to analyze a multi-omic cancer dataset. In the first case, it successfully identified signatures combining known AD protein biomarkers and novel transcriptomic biomarkers. In the second, it was able to extract signatures significantly associated with cancer survival. AVAILABILITY AND IMPLEMENTATION NMFProfiler is released as a Python package, NMFProfiler (v0.3.0), available on PyPI.
Affiliation(s)
- Aurélie Mercadié
  - Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
  - Université de Toulouse, INRAE, UR MIAT, Castanet-Tolosan Cedex 31326, France
- Éléonore Gravier
  - Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
- Gwendal Josse
  - Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
- Isabelle Fournier
  - Université de Lille, Inserm, CHU Lille, U1192 PRISM, Lille 59000, France
- Cécile Viodé
  - Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
- Nathalie Vialaneix
  - Université de Toulouse, INRAE, UR MIAT, Castanet-Tolosan Cedex 31326, France
- Céline Brouard
  - Université de Toulouse, INRAE, UR MIAT, Castanet-Tolosan Cedex 31326, France
6
Huang B, Chen Y, Yuan S. Application of Spatial Transcriptomics in Digestive System Tumors. Biomolecules 2024; 15:21. [PMID: 39858416 PMCID: PMC11761220 DOI: 10.3390/biom15010021] [Received: 11/09/2024] [Revised: 12/15/2024] [Accepted: 12/24/2024] [Indexed: 01/27/2025]
Abstract
In the field of digestive system tumor research, spatial transcriptomics technologies are used to delve into the spatial structure and the spatial heterogeneity of tumors and to analyze the tumor microenvironment (TME) and the inter-cellular interactions within it by revealing gene expression in tumors. These technologies are also instrumental in the diagnosis, prognosis, and treatment of digestive system tumors. This review provides a concise introduction to spatial transcriptomics and summarizes recent advances, application prospects, and technical challenges of these technologies in digestive system tumor research. This review also discusses the importance of combining spatial transcriptomics with single-cell RNA sequencing (scRNA-seq), artificial intelligence, and machine learning in digestive system cancer research.
Affiliation(s)
- Bowen Huang
  - Department of Gastric Surgery, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Guangzhou 510060, China
- Yingjia Chen
  - Health Science Center, Peking University, Beijing 100191, China
- Shuqiang Yuan
  - Department of Gastric Surgery, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Guangzhou 510060, China
7
Agrawal A, Thomann S, Basu S, Grün D. NiCo identifies extrinsic drivers of cell state modulation by niche covariation analysis. Nat Commun 2024; 15:10628. [PMID: 39639035 PMCID: PMC11621405 DOI: 10.1038/s41467-024-54973-w] [Received: 08/17/2024] [Accepted: 11/22/2024] [Indexed: 12/07/2024]
Abstract
Cell states are modulated by intrinsic driving forces such as gene expression noise and extrinsic signals from the tissue microenvironment. The distinction between intrinsic and extrinsic cell state determinants is essential for understanding the regulation of cell fate in tissues during development, homeostasis and disease. The rapidly growing availability of single-cell resolution spatial transcriptomics makes it possible to meet this challenge. However, available computational methods to infer topological tissue domains, spatially variable genes, or ligand-receptor interactions are limited in their capacity to capture cell state changes driven by crosstalk between individual cell types within the same niche. We present NiCo, a computational framework for integrating single-cell resolution spatial transcriptomics with matched single-cell RNA-sequencing reference data to infer the influence of the spatial niche on the cell state. By applying NiCo to mouse embryogenesis, adult small intestine and liver data, we demonstrate the ability to predict novel niche interactions that govern cell state variation underlying tissue development and homeostasis. In particular, NiCo predicts a feedback mechanism between Kupffer cells and neighboring stellate cells dampening stellate cell activation in the normal liver. NiCo provides a powerful tool to elucidate tissue architecture and to identify drivers of cellular states in local niches.
Affiliation(s)
- Ankit Agrawal
  - Würzburg Institute of Systems Immunology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Stefan Thomann
  - Würzburg Institute of Systems Immunology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Sukanya Basu
  - Würzburg Institute of Systems Immunology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Dominic Grün
  - Würzburg Institute of Systems Immunology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
  - CAIDAS - Center for Artificial Intelligence and Data Science, Würzburg, Germany
8
Miao Y, Xu H, Wang S. PartIES: a disease subtyping framework with Partition-level Integration using diffusion-Enhanced Similarities from multi-omics Data. Brief Bioinform 2024; 26:bbae609. [PMID: 39584699 PMCID: PMC11586768 DOI: 10.1093/bib/bbae609] [Received: 01/25/2024] [Revised: 10/16/2024] [Accepted: 11/11/2024] [Indexed: 11/26/2024]
Abstract
Integrating multi-omics data helps identify disease subtypes. Many similarity-based methods have been developed for disease subtyping using multi-omics data; many of them focus on extracting clustering structures common to multiple types of omics data without preserving data-type-specific clustering structures. Moreover, the clustering performance of similarity-based methods suffers when similarity measures are noisy. Here we propose PartIES, a Partition-level Integration using diffusion-Enhanced Similarities, to perform disease subtyping using multi-omics data. PartIES first uses diffusion to reduce noise in the similarity/kernel matrices of the individual omics data types, then extracts partition information from the diffusion-enhanced similarity matrices and iteratively integrates the partition-level similarities through a weighted average. Simulation studies showed that (1) the diffusion step enhances clustering accuracy, and (2) PartIES outperforms competing methods, particularly when omics data types provide different clustering structures. Using mRNA, long noncoding RNA, and microRNA expression data, DNA methylation data, and somatic mutation data from The Cancer Genome Atlas project, PartIES identified subtypes in bladder urothelial carcinoma, liver hepatocellular carcinoma, and thyroid carcinoma that were the most significantly associated with patient survival among all methods compared. Further investigations suggested that, among subtype-associated genes, many of those that interact strongly with other genes are known important cancer genes. The identified cancer subtypes also have different activity levels for some known cancer-related pathways. The R code can be accessed at https://github.com/yuqimiao/PartIES.git.
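The idea of integrating at the partition level through a weighted average can be sketched as follows. This is a simplified illustration and not the PartIES implementation (which is in R, includes a diffusion step, and updates the weights iteratively): each data type contributes a co-membership matrix derived from its own cluster labels, and the matrices are averaged with fixed weights. The function names, labels, and weights are all hypothetical.

```python
def partition_matrix(labels):
    """Co-membership matrix: entry (i, j) is 1.0 when samples i and j share a cluster."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]

def weighted_average(matrices, weights):
    """Element-wise weighted average of equally sized square matrices."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical cluster labels for 4 samples from two omics data types.
labels_mrna = [0, 0, 1, 1]
labels_methylation = [0, 1, 1, 1]

# Fixed, made-up weights; an iterative scheme would re-estimate them.
integrated = weighted_average(
    [partition_matrix(labels_mrna), partition_matrix(labels_methylation)],
    weights=[0.75, 0.25],
)
```

Here samples 0 and 1 are co-clustered only by the first data type, so their integrated similarity equals that data type's weight, while samples 2 and 3 are co-clustered by both and keep a similarity of 1.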
Affiliation(s)
- Yuqi Miao
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10027, United States
- Huang Xu
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10027, United States
- Shuang Wang
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10027, United States
9
Ansari MI, Ahmed KT, Zhang W. Optimizing multi-omics data imputation with NMF and GAN synergy. Bioinformatics 2024; 40:btae674. [PMID: 39546381 PMCID: PMC11639186 DOI: 10.1093/bioinformatics/btae674] [Received: 08/05/2024] [Revised: 10/21/2024] [Accepted: 11/08/2024] [Indexed: 11/17/2024]
Abstract
MOTIVATION Integrating multiple omics datasets can significantly advance our understanding of disease mechanisms, physiology, and treatment responses. However, a major challenge in multi-omics studies is the disparity in sample sizes across different datasets, which can introduce bias and reduce statistical power. To address this issue, we propose a novel framework, OmicsNMF, designed to impute missing omics data and enhance disease phenotype prediction. OmicsNMF integrates Generative Adversarial Networks (GANs) with Non-Negative Matrix Factorization (NMF). NMF is a well-established method for uncovering underlying patterns in omics data, while GANs enhance the imputation process by generating realistic data samples. This synergy aims to more effectively address sample size disparity, thereby improving data integration and prediction accuracy. RESULTS For evaluation, we focused on predicting breast cancer subtypes using the imputed data generated by our proposed framework, OmicsNMF. Our results indicate that OmicsNMF consistently outperforms baseline methods. We further assessed the quality of the imputed data through survival analysis, revealing that the imputed omics profiles provide significant prognostic power for both overall survival and disease-free status. Overall, OmicsNMF effectively leverages GANs and NMF to impute missing samples while preserving key biological features. This approach shows potential for advancing precision oncology by improving data integration and analysis. AVAILABILITY AND IMPLEMENTATION Source code is available at: https://github.com/compbiolabucf/OmicsNMF.
Affiliation(s)
- Md Istiaq Ansari
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Department of Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Khandakar Tanvir Ahmed
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Department of Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Wei Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Department of Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
10
Lock EF. Empirical Bayes Linked Matrix Decomposition. Mach Learn 2024; 113:7451-7477. [PMID: 39759800 PMCID: PMC11698509 DOI: 10.1007/s10994-024-06599-8] [Received: 02/29/2024] [Revised: 07/12/2024] [Accepted: 07/16/2024] [Indexed: 01/07/2025]
Abstract
Data for several applications in diverse fields can be represented as multiple matrices that are linked across rows or columns. This is particularly common in molecular biomedical research, in which multiple molecular "omics" technologies may capture different feature sets (e.g., corresponding to rows in a matrix) and/or different sample populations (corresponding to columns). This has motivated a large body of work on integrative matrix factorization approaches that identify and decompose low-dimensional signal that is shared across multiple matrices or specific to a given matrix. We propose an empirical variational Bayesian approach to this problem that has several advantages over existing techniques, including the flexibility to accommodate shared signal over any number of row or column sets (i.e., bidimensional integration), an intuitive model-based objective function that yields appropriate shrinkage for the inferred signals, and a relatively efficient estimation algorithm with no tuning parameters. A general result establishes conditions for the uniqueness of the underlying decomposition for a broad family of methods that includes the proposed approach. For scenarios with missing data, we describe an associated iterative imputation approach that is novel for the single-matrix context and a powerful approach for "blockwise" imputation (in which an entire row or column is missing) in various linked matrix contexts. Extensive simulations show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal, accurately decomposing shared and specific signals, and accurately imputing missing data. The approach is applied to gene expression and miRNA data from breast cancer tissue and normal breast tissue, for which it gives an informative decomposition of variation and outperforms alternative strategies for missing data imputation.
Affiliation(s)
- Eric F. Lock
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, 55455, MN, USA
11
Wu W, Zhang W, Gong M, Ma X. Noised Multi-Layer Networks Clustering With Graph Denoising and Structure Learning. IEEE Transactions on Knowledge and Data Engineering 2024; 36:5294-5307. [DOI: 10.1109/tkde.2023.3335223] [Indexed: 01/11/2025]
Affiliation(s)
- Wenming Wu
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Wensheng Zhang
  - School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, Guangdong, China
- Maoguo Gong
  - School of Electronic Engineering, Xidian University, Xi'an, Shaanxi, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
12
Samorodnitsky S, Wendt CH, Lock EF. Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data. Comput Stat Data Anal 2024; 197:107974. [PMID: 38947282 PMCID: PMC11210674 DOI: 10.1016/j.csda.2024.107974] [Indexed: 07/02/2024]
Abstract
Integrative factorization methods for multi-omic data estimate factors explaining biological variation. Factors can be treated as covariates to predict an outcome, and the factorization can be used to impute missing values. However, no available methods provide a comprehensive framework for statistical inference and uncertainty quantification for these tasks. A novel framework, Bayesian Simultaneous Factorization (BSF), is proposed to decompose multi-omics variation into joint and individual structures simultaneously within a probabilistic framework. BSF uses conjugate normal priors, and the posterior mode of this model can be estimated by solving a structured nuclear norm-penalized objective that also achieves rank selection and motivates the choice of hyperparameters. BSF is then extended to simultaneously predict a continuous or binary phenotype while estimating latent factors, termed Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation, i.e., imputation during the model-fitting process, and full posterior inference for missing data, including "blockwise" missingness. Simulations show that BSFP is competitive in recovering latent variation structure and demonstrate the importance of accounting for uncertainty in the estimated factorization within the predictive model. The imputation performance of BSF is examined via simulation under missing-at-random and missing-not-at-random assumptions. Finally, BSFP is used to predict lung function based on the bronchoalveolar lavage metabolome and proteome from a study of HIV-associated obstructive lung disease, revealing multi-omic patterns related to lung function decline and a cluster of patients with obstructive lung disease driven by shared metabolomic and proteomic abundance patterns.
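The remark that a nuclear norm penalty also performs rank selection can be illustrated on a single matrix. What follows is the standard singular value soft-thresholding result, not the structured BSF objective itself; the symbols $Y$, $X$, and $\lambda$ are introduced here only for illustration.

```latex
\hat{X} = \arg\min_{X}\ \tfrac{1}{2}\lVert Y - X\rVert_F^2 + \lambda \lVert X\rVert_*,
\qquad
Y = U \operatorname{diag}(\sigma_1,\dots,\sigma_r)\, V^\top
\;\Longrightarrow\;
\hat{X} = U \operatorname{diag}\bigl((\sigma_1-\lambda)_+,\dots,(\sigma_r-\lambda)_+\bigr)\, V^\top
```

Every singular value at or below $\lambda$ is set to zero, so the rank of $\hat{X}$ equals the number of singular values of $Y$ that exceed $\lambda$; the penalty weight therefore determines the estimated rank.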
Affiliation(s)
- Sarah Samorodnitsky
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, MN, USA
  - Fred Hutch Cancer Center, Seattle, 98109, WA, USA
- Chris H. Wendt
  - Minneapolis VA Health Care System, Minneapolis, 55417, MN, USA
- Eric F. Lock
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, MN, USA
|
13
|
Scala G, Ferraro L, Brandi A, Guo Y, Majello B, Ceccarelli M. MoNETA: MultiOmics Network Embedding for SubType Analysis. NAR Genom Bioinform 2024; 6:lqae141. [PMID: 39416887 PMCID: PMC11482636 DOI: 10.1093/nargab/lqae141] [Received: 03/25/2024] [Revised: 07/19/2024] [Accepted: 10/04/2024] [Indexed: 10/19/2024]
Abstract
Cells are complex systems whose behavior emerges from a huge number of reactions taking place within and among different molecular districts. The availability of bulk and single-cell omics data has fueled the creation of multi-omics systems biology models capturing the dynamics within and between omics layers. Powerful modeling strategies are needed to cope with the increased amount of data to be interrogated and the related research questions. Here, we present MultiOmics Network Embedding for SubType Analysis (MoNETA) for fast and scalable identification of relevant multi-omics relationships between biological entities at the bulk and single-cell levels. We apply MoNETA to show how previously described glioma subtypes naturally emerge with our approach. We also show how MoNETA can be used to identify cell types in five multi-omic single-cell datasets.
Affiliation(s)
- Giovanni Scala
  - Department of Biology, University of Naples ‘Federico II’, 80128 Naples, Italy
- Luigi Ferraro
  - Sylvester Comprehensive Cancer Center, University of Miami, 33136, Miami, USA
- Aurora Brandi
  - Department of Biology, University of Naples ‘Federico II’, 80128 Naples, Italy
- Yan Guo
  - Sylvester Comprehensive Cancer Center, University of Miami, 33136, Miami, USA
- Barbara Majello
  - Department of Biology, University of Naples ‘Federico II’, 80128 Naples, Italy
- Michele Ceccarelli
  - Sylvester Comprehensive Cancer Center, University of Miami, 33136, Miami, USA
|
14
|
Kobel CM, Merkesvik J, Burgos IMT, Lai W, Øyås O, Pope PB, Hvidsten TR, Aho VTE. Integrating host and microbiome biology using holo-omics. Mol Omics 2024; 20:438-452. [PMID: 38963125 DOI: 10.1039/d4mo00017j] [Indexed: 07/05/2024]
Abstract
Holo-omics is the use of omics data to study a host and its inherent microbiomes, a biological system known as a "holobiont". A microbiome that exists in such a space often encounters habitat stability and in return provides metabolic capacities that can benefit its host. Here we present an overview of beneficial host-microbiome systems and propose and discuss several methodological frameworks that can be used to investigate the intricacies of the many as yet undefined host-microbiome interactions that influence holobiont homeostasis. While this is an emerging field, we anticipate that ongoing methodological advancements will enhance the biological resolution that is necessary to improve our understanding of host-microbiome interplay and to enable meaningful interpretations and biotechnological applications.
Affiliation(s)
- Carl M Kobel
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
- Jenny Merkesvik
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Wanxin Lai
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Ove Øyås
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
- Phillip B Pope
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
- Torgeir R Hvidsten
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Velma T E Aho
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
|
15
|
Tooley K, Jerby L, Escobar G, Krovi SH, Mangani D, Dandekar G, Cheng H, Madi A, Goldschmidt E, Lambden C, Krishnan RK, Rozenblatt-Rosen O, Regev A, Anderson AC. Pan-cancer mapping of single CD8+ T cell profiles reveals a TCF1:CXCR6 axis regulating CD28 co-stimulation and anti-tumor immunity. Cell Rep Med 2024; 5:101640. [PMID: 38959885 PMCID: PMC11293343 DOI: 10.1016/j.xcrm.2024.101640] [Received: 05/31/2023] [Revised: 01/05/2024] [Accepted: 06/11/2024] [Indexed: 07/05/2024]
Abstract
CD8+ T cells must persist and function in diverse tumor microenvironments to exert their effects. Thus, understanding common underlying expression programs could better inform the next generation of immunotherapies. We apply a generalizable matrix factorization algorithm that recovers both shared and context-specific expression programs from diverse datasets to a single-cell RNA sequencing (scRNA-seq) compendium of 33,161 CD8+ T cells from 132 patients with seven human cancers. Our meta-single-cell analyses uncover a pan-cancer T cell dysfunction program that predicts clinical non-response to checkpoint blockade in melanoma and highlights CXCR6 as a pan-cancer marker of chronically activated T cells. Cxcr6 is trans-activated by AP-1 and repressed by TCF1. Using mouse models, we show that Cxcr6 deletion in CD8+ T cells increases apoptosis of PD1+TIM3+ cells, dampens CD28 signaling, and compromises tumor growth control. Our study uncovers a TCF1:CXCR6 axis that counterbalances PD1-mediated suppression of CD8+ cell responses and is essential for effective anti-tumor immunity.
Affiliation(s)
- Katherine Tooley
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Livnat Jerby
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Giulia Escobar
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- S Harsha Krovi
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Davide Mangani
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Gitanjali Dandekar
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Hanning Cheng
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Asaf Madi
  - Department of Pathology, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ella Goldschmidt
  - Department of Pathology, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Conner Lambden
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Rajesh K Krishnan
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Aviv Regev
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Howard Hughes Medical Institute and Koch Institute of Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Ana C Anderson
  - The Gene Lay Institute of Immunology and Inflammation of Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, USA; Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
|
16
|
Dai Y, Li J, Yamamoto K, Goyama S, Loza M, Park SJ, Nakai K. Integrative analysis of cancer multimodality data identifying COPS5 as a novel biomarker of diffuse large B-cell lymphoma. Front Genet 2024; 15:1407765. [PMID: 38974382 PMCID: PMC11224480 DOI: 10.3389/fgene.2024.1407765] [Received: 03/27/2024] [Accepted: 06/03/2024] [Indexed: 07/09/2024]
Abstract
Preventing, diagnosing, and treating diseases requires accurate clinical biomarkers, which remains challenging. Recently, advanced computational approaches have accelerated the discovery of promising biomarkers from high-dimensional multimodal data. Although machine-learning methods have greatly contributed to the research fields, handling data sparseness, which is not unusual in research settings, is still an issue as it leads to limited interpretability and performance in the presence of missing information. Here, we propose a novel pipeline integrating joint non-negative matrix factorization (JNMF), identifying key features within sparse high-dimensional heterogeneous data, and a biological pathway analysis, interpreting the functionality of features by detecting activated signaling pathways. By applying our pipeline to large-scale public cancer datasets, we identified sets of genomic features relevant to specific cancer types as common pattern modules (CPMs) of JNMF. We further detected COPS5 as a potential upstream regulator of pathways associated with diffuse large B-cell lymphoma (DLBCL). COPS5 exhibited co-overexpression with MYC, TP53, and BCL2, known DLBCL marker genes, and its high expression was correlated with a lower survival probability of DLBCL patients. Using the CRISPR-Cas9 system, we confirmed the tumor growth effect of COPS5, which suggests it as a novel prognostic biomarker for DLBCL. Our results highlight that integrating multiple high-dimensional data and effectively decomposing them to interpretable dimensions unravels hidden biological importance, which enhances the discovery of clinical biomarkers.
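To make the idea of joint non-negative matrix factorization concrete, the following is a minimal sketch in which several non-negative data matrices measured on the same samples share one factor matrix `W`, with a separate `H` per data type. It uses standard multiplicative updates on the summed squared error and is not the authors' pipeline; the function names, the number of components `k`, the iteration count, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def joint_nmf(mats, k, n_iter=200, eps=1e-9, seed=0):
    """Factor each X_i (samples x features_i) as W @ H_i with a shared W >= 0."""
    rng = np.random.default_rng(seed)
    n = mats[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in mats]
    for _ in range(n_iter):
        # Update each data-type-specific H_i with W held fixed.
        for i, X in enumerate(mats):
            Hs[i] *= (W.T @ X) / (W.T @ W @ Hs[i] + eps)
        # Update the shared W using all data types at once.
        num = sum(X @ H.T for X, H in zip(mats, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
    return W, Hs

def recon_error(mats, W, Hs):
    """Sum of squared Frobenius reconstruction errors over all data types."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(mats, Hs))

# Synthetic example: two non-negative "omics" blocks sharing 3 latent modules.
rng = np.random.default_rng(1)
W_true = rng.random((40, 3))
X1 = W_true @ rng.random((3, 20))
X2 = W_true @ rng.random((3, 15))

W0, Hs0 = joint_nmf([X1, X2], k=3, n_iter=0)   # random initialization only
W, Hs = joint_nmf([X1, X2], k=3, n_iter=200)
print(recon_error([X1, X2], W0, Hs0), recon_error([X1, X2], W, Hs))
```

Each column of `W` together with the matching rows of the `H_i` can be read as one module spanning data types, which is the sense in which a shared factor ties heterogeneous features together.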
Affiliation(s)
- Yutong Dai
  - Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan
- Jingmei Li
  - Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan
- Keita Yamamoto
  - Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan
- Susumu Goyama
  - Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan
- Martin Loza
  - The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Sung-Joon Park
  - The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Kenta Nakai
  - Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan
  - The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
17
|
Matsuoka T, Yashiro M. Bioinformatics Analysis and Validation of Potential Markers Associated with Prediction and Prognosis of Gastric Cancer. Int J Mol Sci 2024; 25:5880. [PMID: 38892067 PMCID: PMC11172243 DOI: 10.3390/ijms25115880] [Received: 04/18/2024] [Revised: 05/23/2024] [Accepted: 05/25/2024] [Indexed: 06/21/2024]
Abstract
Gastric cancer (GC) is one of the most common cancers worldwide. Most patients are diagnosed at the progressive stage of the disease, and current anticancer drug advancements are still lacking. Therefore, it is crucial to find relevant biomarkers with the accurate prediction of prognoses and good predictive accuracy to select appropriate patients with GC. Recent advances in molecular profiling technologies, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics, have enabled the approach of GC biology at multiple levels of omics interaction networks. Systemic biological analyses, such as computational inference of "big data" and advanced bioinformatic approaches, are emerging to identify the key molecular biomarkers of GC, which would benefit targeted therapies. This review summarizes the current status of how bioinformatics analysis contributes to biomarker discovery for prognosis and prediction of therapeutic efficacy in GC based on a search of the medical literature. We highlight emerging individual multi-omics datasets, such as genomics, epigenomics, transcriptomics, proteomics, and metabolomics, for validating putative markers. Finally, we discuss the current challenges and future perspectives to integrate multi-omics analysis for improving biomarker implementation. The practical integration of bioinformatics analysis and multi-omics datasets under complementary computational analysis is having a great impact on the search for predictive and prognostic biomarkers and may lead to an important revolution in treatment.
Affiliation(s)
- Tasuku Matsuoka
  - Department of Molecular Oncology and Therapeutics, Osaka Metropolitan University Graduate School of Medicine, 1-4-3 Asahi-machi, Abeno-ku, Osaka 5458585, Japan
  - Institute of Medical Genetics, Osaka Metropolitan University, 1-4-3 Asahi-machi, Abeno-ku, Osaka 5458585, Japan
- Masakazu Yashiro
  - Department of Molecular Oncology and Therapeutics, Osaka Metropolitan University Graduate School of Medicine, 1-4-3 Asahi-machi, Abeno-ku, Osaka 5458585, Japan
  - Institute of Medical Genetics, Osaka Metropolitan University, 1-4-3 Asahi-machi, Abeno-ku, Osaka 5458585, Japan
|
18
|
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024]
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As a non-integrative control, random forest was performed on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed better than or as well as their non-integrative counterpart. By contrast, DIABLO and the four random forest alternatives outperformed the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail, together with guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Camilo Broc
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Laurent Beloeil
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Wen-Han Yu
  - Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
- Jérémie Becker
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
|
19
|
Skok Gibbs C, Mahmood O, Bonneau R, Cho K. PMF-GRN: a variational inference approach to single-cell gene regulatory network inference using probabilistic matrix factorization. Genome Biol 2024; 25:88. [PMID: 38589899 PMCID: PMC11003171 DOI: 10.1186/s13059-024-03226-6] [Received: 03/30/2023] [Accepted: 03/26/2024] [Indexed: 04/10/2024]
Abstract
Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.
Affiliation(s)
- Omar Mahmood
  - Center for Data Science, New York University, New York, NY, 10011, USA
- Richard Bonneau
  - Center for Data Science, New York University, New York, NY, 10011, USA
  - Prescient Design, Genentech, New York, NY, 10010, USA
  - Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Kyunghyun Cho
  - Center for Data Science, New York University, New York, NY, 10011, USA.
  - Prescient Design, Genentech, New York, NY, 10010, USA.
|
20
|
Liang Q, Soto LS, Haymaker C, Chen K. Interpretable Spatial Gradient Analysis for Spatial Transcriptomics Data. bioRxiv 2024:2024.03.19.585725. [PMID: 38562886 PMCID: PMC10983986 DOI: 10.1101/2024.03.19.585725] [Indexed: 04/04/2024]
Abstract
Cellular anatomy and signaling vary across niches, which can induce gradated gene expression in subpopulations of cells. Such spatial transcriptomic gradients (STGs) are a significant source of intratumor heterogeneity and can influence tumor invasion, progression, and response to treatment. Here we report Local Spatial Gradient Inference (LSGI), a computational framework that systematically identifies spatial locations with prominent, interpretable STGs from spatial transcriptomic (ST) data. To do so, LSGI scrutinizes each sliding window employing non-negative matrix factorization (NMF) combined with linear regression. With LSGI, we demonstrated the identification of spatially proximal yet oppositely directed pathway gradients in a glioblastoma dataset. We further applied LSGI to 87 tumor ST datasets reported from nine published studies and identified both pan-cancer and tumor-type specific pathways with gradated expression patterns, such as epithelial mesenchymal transition, MHC complex, and hypoxia. The local gradients were further categorized according to their association with the tumor-TME (tumor microenvironment) interface, highlighting the pathways related to spatial transcriptional intratumoral heterogeneity. We conclude that LSGI enables highly interpretable STG analysis, which can reveal novel insights into tumor biology from the increasingly reported tumor ST datasets.
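The combination of factor loadings and per-window linear regression can be illustrated with a small sketch. It covers only the regression step and assumes the per-spot loading of one factor (for example from NMF) is already available; it is not the LSGI implementation. The function name `local_gradient`, the 5x5 window with stride 5, and the synthetic grid are assumptions chosen for the example.

```python
import numpy as np

def local_gradient(coords, loading):
    """Fit loading ~ b0 + bx*x + by*y by least squares.

    Returns the gradient vector (bx, by) and the R^2 of the fit, which can
    serve as a measure of how well a linear spatial gradient explains the
    factor loading inside one window.
    """
    A = np.column_stack([np.ones(len(coords)), coords])
    beta, *_ = np.linalg.lstsq(A, loading, rcond=None)
    resid = loading - A @ beta
    ss_tot = np.sum((loading - loading.mean()) ** 2)
    r2 = 1.0 - (resid @ resid) / ss_tot if ss_tot > 0 else 0.0
    return beta[1:], r2

# Synthetic 10x10 grid of spots; the loading increases along x (illustration only).
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([xs.ravel(), ys.ravel()])
rng = np.random.default_rng(1)
loading = 0.5 * coords[:, 0] + 0.05 * rng.normal(size=len(coords))

# Slide a hypothetical 5x5 window with stride 5 and fit one gradient per window.
results = []
for x0 in range(0, 10, 5):
    for y0 in range(0, 10, 5):
        mask = ((coords[:, 0] >= x0) & (coords[:, 0] < x0 + 5) &
                (coords[:, 1] >= y0) & (coords[:, 1] < y0 + 5))
        grad, r2 = local_gradient(coords[mask], loading[mask])
        results.append(((x0, y0), grad, r2))

for origin, grad, r2 in results:
    print(origin, np.round(grad, 2), round(r2, 3))
```

Windows with a high R^2 would be the candidates for a prominent local gradient, and the sign and direction of `(bx, by)` indicate which way the program's activity increases.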
Affiliation(s)
- Qingnan Liang
  - Department of Bioinformatics and Computational Biology, UT MD Anderson Cancer Center
- Luisa Solis Soto
  - Department of Translational Molecular Pathology, UT MD Anderson Cancer Center
- Cara Haymaker
  - Department of Translational Molecular Pathology, UT MD Anderson Cancer Center
- Ken Chen
  - Department of Bioinformatics and Computational Biology, UT MD Anderson Cancer Center
|
21
|
Tan H, Guo M, Chen J, Wang J, Yu G. HetFCM: functional co-module discovery by heterogeneous network co-clustering. Nucleic Acids Res 2024; 52:e16. [PMID: 38088228 PMCID: PMC10853805 DOI: 10.1093/nar/gkad1174] [Received: 09/26/2023] [Revised: 10/31/2023] [Accepted: 11/23/2023] [Indexed: 02/10/2024]
Abstract
Functional molecular module (i.e., gene-miRNA co-modules and gene-miRNA-lncRNA triple-layer modules) analysis can dissect complex regulations underlying etiology or phenotypes. However, current module detection methods lack appropriate use and effective modeling of multi-omics data and cross-layer regulations of heterogeneous molecules, causing the loss of critical genetic information and degrading detection performance. In this study, we propose a heterogeneous network co-clustering framework (HetFCM) to detect functional co-modules. HetFCM introduces an attributed heterogeneous network to jointly model interplays and multi-type attributes of different molecules, and applies multiple variational graph autoencoders on the network to generate cross-layer association matrices. It then performs adaptive weighted co-clustering on association matrices and attribute data to identify co-modules of heterogeneous molecules. An empirical study on Human and Maize datasets reveals that HetFCM can find co-modules characterized by denser topology and more significant functions, which are associated with human breast cancer (subtypes) and maize phenotypes (i.e., lipid storage, drought tolerance and oil content). HetFCM is a useful tool to detect co-modules and can be applied to multi-layer functional modules, yielding novel insights for analyzing molecular mechanisms. We also developed a user-friendly module detection and analysis tool and shared it at http://www.sdu-idea.cn/FMDTool.
Affiliation(s)
- Haojiang Tan
  - School of Software, Shandong University, Jinan 250101, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
- Jian Chen
  - College of Agronomy & Biotechnology, China Agricultural University, Beijing 100193, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan 250101, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
|
22
|
Wu W, Ma X, Wang Q, Gong M, Gao Q. Learning deep representation and discriminative features for clustering of multi-layer networks. Neural Netw 2024; 170:405-416. [PMID: 38029721 DOI: 10.1016/j.neunet.2023.11.053] [Received: 05/19/2023] [Revised: 09/29/2023] [Accepted: 11/22/2023] [Indexed: 12/01/2023]
Abstract
The multi-layer network consists of the interactions between different layers, where each layer of the network is depicted as a graph, providing a comprehensive way to model the underlying complex systems. The layer-specific modules of multi-layer networks are critical to understanding the structure and function of the system. However, existing methods fail to characterize and balance the connectivity and specificity of layer-specific modules in networks because of the complicated inter- and intra-coupling of various layers. To address the above issues, a joint learning graph clustering algorithm (DRDF) for detecting layer-specific modules in multi-layer networks is proposed, which simultaneously learns the deep representation and discriminative features. Specifically, DRDF learns the deep representation with deep nonnegative matrix factorization, where the high-order topology of the multi-layer network is gradually and precisely characterized. Moreover, it addresses the specificity of modules with discriminative feature learning, where the intra-class compactness and inter-class separation of pseudo-labels of clusters are explored as self-supervised information, thereby providing a more accurate method to explicitly model the specificity of the multi-layer network. Finally, DRDF balances the connectivity and specificity of layer-specific modules with joint learning, where the overall objective of the graph clustering algorithm and optimization rules are derived. The experiments on ten multi-layer networks showed that DRDF not only outperforms eight baselines on graph clustering but also enhances the robustness of algorithms.
Affiliation(s)
- Wenming Wu
  - School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China.
- Quan Wang
  - School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Maoguo Gong
  - School of Electronic Engineering, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Quanxue Gao
  - School of Telecommunication, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
|
23
|
Sun Z, Chung D, Neelon B, Millar-Wilson A, Ethier SP, Xiao F, Zheng Y, Wallace K, Hardiman G. A Bayesian framework for pathway-guided identification of cancer subgroups by integrating multiple types of genomic data. Stat Med 2023; 42:5266-5284. [PMID: 37715500 DOI: 10.1002/sim.9911] [Received: 03/19/2021] [Revised: 07/15/2023] [Accepted: 09/05/2023] [Indexed: 09/17/2023]
Abstract
In recent years, comprehensive cancer genomics platforms, such as The Cancer Genome Atlas (TCGA), have provided access to an enormous amount of high-throughput genomic data for each patient, including gene expression, DNA copy number alterations, DNA methylation, and somatic mutation. While the integration of these multi-omics datasets has the potential to provide novel insights that can lead to personalized medicine, most existing approaches focus only on gene-level analysis and lack the ability to facilitate biological findings at the pathway level. In this article, we propose Bayes-InGRiD (Bayesian Integrative Genomics Robust iDentification of cancer subgroups), a novel pathway-guided Bayesian sparse latent factor model for the simultaneous identification of cancer patient subgroups (clustering) and key molecular features (variable selection) within a unified framework, based on the joint analysis of continuous, binary, and count data. By utilizing pathway (gene set) information, Bayes-InGRiD not only enhances the accuracy and robustness of cancer patient subgroup and key molecular feature identification, but also promotes biological understanding and interpretation. Finally, to facilitate efficient posterior sampling, an alternative Gibbs sampler for logistic and negative binomial models is proposed using Pólya-Gamma mixtures of normals to represent latent variables for binary and count data, which yields a conditionally Gaussian representation of the posterior. The R package "INGRID" implementing the proposed approach is currently available on our research group's GitHub webpage (https://dongjunchung.github.io/INGRID/).
Affiliation(s)
- Zequn Sun
  - Department of Preventive Medicine, Northwestern University, Chicago, Illinois
- Dongjun Chung
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio
  - The Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
- Brian Neelon
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina
- Stephen P Ethier
  - Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, South Carolina
- Feifei Xiao
  - Department of Biostatistics, University of Florida, Gainesville, Florida
- Yinan Zheng
  - Department of Preventive Medicine, Northwestern University, Chicago, Illinois
- Kristin Wallace
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina
- Gary Hardiman
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina
  - Faculty of Medicine, Health and Life Sciences, School of Biological Sciences and Institute for Global Food Security, Queen's University Belfast, Belfast, UK
|
24
|
Yi S, Wong RKW, Gaynanova I. Hierarchical nuclear norm penalization for multi-view data integration. Biometrics 2023; 79:2933-2946. [PMID: 37345491 DOI: 10.1111/biom.13893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 05/18/2023] [Indexed: 06/23/2023]
Abstract
The prevalence of data collected on the same set of samples from multiple sources (i.e., multi-view data) has prompted significant development of data integration methods based on low-rank matrix factorizations. These methods decompose signal matrices from each view into the sum of shared and individual structures, which are further used for dimension reduction, exploratory analyses, and quantifying associations across views. However, existing methods have limitations in modeling partially-shared structures due to either too restrictive models, or restrictive identifiability conditions. To address these challenges, we propose a new formulation for signal structures that include partially-shared signals based on grouping the views into so-called hierarchical levels with identifiable guarantees under suitable conditions. The proposed hierarchy leads us to introduce a new penalty, hierarchical nuclear norm (HNN), for signal estimation. In contrast to existing methods, HNN penalization avoids scores and loadings factorization of the signals and leads to a convex optimization problem, which we solve using a dual forward-backward algorithm. We propose a simple refitting procedure to adjust the penalization bias and develop an adapted version of bi-cross-validation for selecting tuning parameters. Extensive simulation studies and analysis of the genotype-tissue expression data demonstrate the advantages of our method over existing alternatives.
Affiliation(s)
- Sangyoon Yi
  - Department of Statistics, Oklahoma State University, Stillwater, Oklahoma, USA
- Irina Gaynanova
  - Department of Statistics, Texas A&M University, College Station, Texas, USA
25
Zhou Y, Luo K, Liang L, Chen M, He X. A new Bayesian factor analysis method improves detection of genes and biological processes affected by perturbations in single-cell CRISPR screening. Nat Methods 2023; 20:1693-1703. [PMID: 37770710 PMCID: PMC10630124 DOI: 10.1038/s41592-023-02017-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/11/2022] [Accepted: 08/18/2023] [Indexed: 09/30/2023]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR) screening coupled with single-cell RNA sequencing has emerged as a powerful tool to characterize the effects of genetic perturbations on the whole transcriptome at a single-cell level. However, due to its sparsity and complex structure, analysis of single-cell CRISPR screening data is challenging. In particular, standard differential expression analysis methods are often underpowered to detect genes affected by CRISPR perturbations. We developed a statistical method for such data, called guided sparse factor analysis (GSFA). GSFA infers latent factors that represent coregulated genes or gene modules; by borrowing information from these factors, it infers the effects of genetic perturbations on individual genes. We demonstrated through extensive simulation studies that GSFA detects perturbation effects with much higher power than state-of-the-art methods. Using single-cell CRISPR data from human CD8+ T cells and neural progenitor cells, we showed that GSFA identified biologically relevant gene modules and specific genes affected by CRISPR perturbations, many of which were missed by existing methods, providing new insights into the functions of genes involved in T cell activation and neurodevelopment.
Affiliation(s)
- Yifan Zhou
  - Graduate Program of Biophysical Sciences, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Kaixuan Luo
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Lifan Liang
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Mengjie Chen
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Department of Medicine, University of Chicago, Chicago, IL, USA
- Xin He
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
26
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023; 24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Affiliation(s)
- Peter Carbonetto
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Research Computing Center, University of Chicago, Chicago, IL, USA
- Kaixuan Luo
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Abhishek Sarkar
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Vesalius Therapeutics, Cambridge, MA, USA
- Anthony Hung
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Karl Tayeb
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Sebastian Pott
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Matthew Stephens
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Department of Statistics, University of Chicago, Chicago, IL, USA
27
Ye X, Shang Y, Shi T, Zhang W, Sakurai T. Multi-omics clustering for cancer subtyping based on latent subspace learning. Comput Biol Med 2023; 164:107223. [PMID: 37490833 DOI: 10.1016/j.compbiomed.2023.107223] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data; however, only a few of them are applicable to partial multi-omics data, in which some samples lack data for some omics types. In this study, we propose a novel multi-omics clustering method based on latent subspace learning (MCLS), which can handle missing omics data during clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected onto the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets across three omics levels, in both full and partial cases, against several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides an important reference for a comprehensive understanding of cancer and its biological mechanisms. AVAILABILITY: The proposed method is freely accessible at https://github.com/ShangCS/MCLS.
Affiliation(s)
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Yifan Shang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tianyi Shi
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Weihang Zhang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
28
Itai Y, Rappoport N, Shamir R. Integration of gene expression and DNA methylation data across different experiments. Nucleic Acids Res 2023; 51:7762-7776. [PMID: 37395437 PMCID: PMC10450176 DOI: 10.1093/nar/gkad566] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/16/2022] [Revised: 06/04/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023] Open
Abstract
Integrative analysis of multi-omic datasets has proven to be extremely valuable in cancer research and precision medicine. However, obtaining multimodal data from the same samples is often difficult. Integrating multiple datasets of different omics remains a challenge, with only a few available algorithms developed to solve it. Here, we present INTEND (IntegratioN of Transcriptomic and EpigeNomic Data), a novel algorithm for integrating gene expression and DNA methylation datasets covering disjoint sets of samples. To enable integration, INTEND learns a predictive model between the two omics by training on multi-omic data measured on the same set of samples. In comprehensive testing on 11 TCGA (The Cancer Genome Atlas) cancer datasets spanning 4329 patients, INTEND achieves significantly superior results compared with four state-of-the-art integration algorithms. We also demonstrate INTEND's ability to uncover connections between DNA methylation and the regulation of gene expression in the joint analysis of two lung adenocarcinoma single-omic datasets from different sources. INTEND's data-driven approach makes it a valuable multi-omic data integration tool. The code for INTEND is available at https://github.com/Shamir-Lab/INTEND.
Affiliation(s)
- Yonatan Itai
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Nimrod Rappoport
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Ron Shamir
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
29
Chen Y, Wen Y, Xie C, Chen X, He S, Bo X, Zhang Z. MOCSS: Multi-omics data clustering and cancer subtyping via shared and specific representation learning. iScience 2023; 26:107378. [PMID: 37559907 PMCID: PMC10407241 DOI: 10.1016/j.isci.2023.107378] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 08/11/2023] Open
Abstract
Cancer is an extremely complex disease, and each type of cancer usually has several different subtypes. Multi-omics data can provide more comprehensive biological information for identifying and discovering cancer subtypes. However, existing unsupervised cancer subtyping methods cannot effectively learn comprehensive shared and specific information from multi-omics data. Therefore, a novel method is proposed based on shared and specific representation learning. For each omics data type, two autoencoders are applied to extract shared and specific information, respectively. To reduce redundancy and mutual interference, an orthogonality constraint is introduced to separate shared and specific information. In addition, contrastive learning is applied to align the shared information and strengthen its consistency. Finally, the obtained shared and specific information for all samples is used for clustering to achieve cancer subtyping. Experimental results demonstrate that the proposed method can effectively capture shared and specific information of multi-omics data and outperform other state-of-the-art methods on cancer subtyping.
Affiliation(s)
- Yuxin Chen
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Yuqi Wen
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Chenyang Xie
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Xinjian Chen
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Song He
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Zhongnan Zhang
  - School of Informatics, Xiamen University, Xiamen 361005, China
30
Zhang Q, Yang M, Zhang P, Wu B, Wei X, Li S. Deciphering gastric inflammation-induced tumorigenesis through multi-omics data and AI methods. Cancer Biol Med 2023; 21:j.issn.2095-3941.2023.0129. [PMID: 37589244 PMCID: PMC11033716 DOI: 10.20892/j.issn.2095-3941.2023.0129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/17/2023] [Accepted: 06/26/2023] [Indexed: 08/18/2023] Open
Abstract
Gastric cancer (GC), the fifth most common cancer globally, remains the leading cause of cancer deaths worldwide. Inflammation-induced tumorigenesis is the predominant process in GC development; therefore, systematic research in this area should improve understanding of the biological mechanisms that initiate GC development and promote cancer hallmarks. Here, we summarize biological knowledge regarding gastric inflammation-induced tumorigenesis, and characterize the multi-omics data and systems biology methods for investigating GC development. Of note, we highlight pioneering studies in multi-omics data and state-of-the-art network-based algorithms used for dissecting the features of gastric inflammation-induced tumorigenesis, and we propose translational applications in early GC warning biomarkers and precise treatment strategies. This review offers integrative insights for GC research, with the goal of paving the way to novel paradigms for GC precision oncology and prevention.
Affiliation(s)
- Qian Zhang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Mingran Yang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Peng Zhang
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Bowen Wu
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaosen Wei
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Shao Li
  - Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
31
Abstract
Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.
Affiliation(s)
- Emily Flynn
  - CoLabs, University of California, San Francisco, California, USA
- Ana Almonte-Loya
  - CoLabs, University of California, San Francisco, California, USA
  - Biomedical Informatics Program, University of California, San Francisco, California, USA
- Gabriela K Fragiadakis
  - CoLabs, University of California, San Francisco, California, USA
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
32
Can H, Chanumolu SK, Nielsen BD, Alvarez S, Naldrett MJ, Ünlü G, Otu HH. Integration of Meta-Multi-Omics Data Using Probabilistic Graphs and External Knowledge. Cells 2023; 12:1998. [PMID: 37566077 PMCID: PMC10417344 DOI: 10.3390/cells12151998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/11/2023] [Accepted: 08/02/2023] [Indexed: 08/12/2023] Open
Abstract
Multi-omics has the promise to provide a detailed molecular picture of biological systems. Although obtaining multi-omics data is relatively easy, methods that analyze such data have been lagging. In this paper, we present an algorithm that uses probabilistic graph representations and external knowledge to perform optimal structure learning and deduce a multifarious interaction network for multi-omics data from a bacterial community. Kefir grain, a microbial community that ferments milk and creates kefir, represents a self-renewing, stable, natural microbial community. Kefir has been shown to have a wide range of health benefits. We obtained a controlled bacterial community using the two most abundant and well-studied species in kefir grains: Lentilactobacillus kefiri and Lactobacillus kefiranofaciens. We applied growth temperatures of 30 °C and 37 °C and obtained transcriptomic, metabolomic, and proteomic data for the same 20 samples (10 samples per temperature). We obtained a multi-omics interaction network, which generated insights that would not have been possible with single-omics analysis. We identified interactions among transcripts, proteins, and metabolites, suggesting active toxin/antitoxin systems. We also observed multifarious interactions that involved the shikimate pathway. These observations helped explain bacterial adaptation to different stress conditions, co-aggregation, and increased activation of L. kefiranofaciens at 37 °C.
Affiliation(s)
- Handan Can
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Sree K. Chanumolu
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Barbara D. Nielsen
  - Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
- Sophie Alvarez
  - Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Michael J. Naldrett
  - Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Gülhan Ünlü
  - Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
  - Department of Chemical and Biological Engineering, University of Idaho, Moscow, ID 83844, USA
  - School of Food Science, Washington State University, Pullman, WA 99164, USA
- Hasan H. Otu
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
33
Gygi JP, Kleinstein SH, Guan L. Predictive overfitting in immunological applications: Pitfalls and solutions. Hum Vaccin Immunother 2023; 19:2251830. [PMID: 37697867 PMCID: PMC10498807 DOI: 10.1080/21645515.2023.2251830] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/01/2023] [Revised: 07/27/2023] [Accepted: 08/21/2023] [Indexed: 09/13/2023] Open
Abstract
Overfitting describes the phenomenon where a highly predictive model on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
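The definition of overfitting given in this abstract can be illustrated with a minimal sketch, which is not taken from the review itself. It assumes a toy setting in which the labels are pure coin flips, so no model can genuinely predict them; a 1-nearest-neighbour classifier nevertheless memorizes the training set perfectly. All names (`one_nn_predict`, `accuracy`, `make_noise`) are hypothetical.

```python
import random


def one_nn_predict(train, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    nearest = min(train, key=lambda item: abs(item[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN model predicts correctly."""
    hits = sum(one_nn_predict(train, x) == y for x, y in data)
    return hits / len(data)


def make_noise(n, rng):
    """Generate (feature, label) pairs where the label is an independent coin flip."""
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]


rng = random.Random(0)
train = make_noise(100, rng)
test = make_noise(100, rng)

# Evaluating on the training data rewards memorization ...
train_acc = accuracy(train, train)
# ... while held-out data exposes that nothing generalizable was learned.
test_acc = accuracy(train, test)

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The gap between the two numbers is the point of the example: the training score alone would suggest a perfect model, which is why evaluation on data not used for fitting is needed.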
Affiliation(s)
- Jeremy P. Gygi
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Steven H. Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Leying Guan
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
34
O'Connor LM, O'Connor BA, Lim SB, Zeng J, Lo CH. Integrative multi-omics and systems bioinformatics in translational neuroscience: A data mining perspective. J Pharm Anal 2023; 13:836-850. [PMID: 37719197 PMCID: PMC10499660 DOI: 10.1016/j.jpha.2023.06.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/29/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 09/19/2023] Open
Abstract
Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern day biology by providing a great depth of information, with its application to neuroscience termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally for improved diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance the understanding of the layered and interactive network of biological regulation that exchanges systemic knowledge to facilitate the development of a comprehensive human brain profile. In this review, we first summarize data mining studies utilizing datasets from the individual type of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of the biological data obtained across different omics layers. We further assess studies that integrate multi-omics in data mining which provide convoluted biological insights and offer proof-of-concept proposition towards systems bioinformatics in the reconstruction of brain networks. Finally, we recommend a combination of high dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications including biomarker discovery, therapeutic development, and elucidation of disease mechanisms. We conclude by providing future perspectives and opportunities in applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and towards personalized medicine.
Affiliation(s)
- Lance M. O'Connor
  - College of Biological Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- Blake A. O'Connor
  - School of Pharmacy, University of Wisconsin, Madison, WI, 53705, USA
- Su Bin Lim
  - Department of Biochemistry and Molecular Biology, Ajou University School of Medicine, Suwon, 16499, South Korea
- Jialiu Zeng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- Chih Hung Lo
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
35
Manganaro L, Bianco S, Bironzo P, Cipollini F, Colombi D, Corà D, Corti G, Doronzo G, Errico L, Falco P, Gandolfi L, Guerrera F, Monica V, Novello S, Papotti M, Parab S, Pittaro A, Primo L, Righi L, Sabbatini G, Sandri A, Vattakunnel S, Bussolino F, Scagliotti GV. Consensus clustering methodology to improve molecular stratification of non-small cell lung cancer. Sci Rep 2023; 13:7759. [PMID: 37173325 PMCID: PMC10182023 DOI: 10.1038/s41598-023-33954-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/14/2022] [Accepted: 04/21/2023] [Indexed: 05/15/2023] Open
Abstract
Recent advances in machine learning research, combined with the reduced sequencing costs enabled by modern next-generation sequencing, have paved the way for the implementation of precision medicine through routine multi-omics molecular profiling of tumours. Thus, there is an emerging need for reliable models that exploit such data to retrieve clinically useful information. Here, we introduce an original consensus clustering approach that overcomes the intrinsic instability of common clustering methods based on molecular data. This approach is applied to the case of non-small cell lung cancer (NSCLC), integrating data from an ongoing clinical study (PROMOLE) with those made available by The Cancer Genome Atlas, to define a molecular-based stratification of the patients beyond, but still preserving, histological subtyping. The resulting subgroups are biologically characterized by well-defined mutational and gene-expression profiles and are significantly related to disease-free survival (DFS). Interestingly, it was observed that (1) cluster B, characterized by a short DFS, is enriched in KEAP1 and SKP2 mutations, which makes it an ideal candidate for further studies with inhibitors, and (2) over- and under-representation of inflammation and immune system pathways in squamous-cell carcinoma subgroups could potentially be exploited to stratify patients treated with immunotherapy.
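As a rough illustration of the general idea behind consensus clustering, the sketch below repeatedly clusters random subsamples and records how often each pair of samples lands in the same cluster. This is the generic resampling-based scheme, not the specific methodology of this paper; the toy one-dimensional data, the use of k-means as base clusterer, and all names (`kmeans_1d`, `consensus_matrix`) are assumptions made for illustration.

```python
import random


def kmeans_1d(values, k, rng, n_iter=10):
    """Plain Lloyd's k-means on scalars; returns one cluster index per value."""
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = sum(members) / len(members)
    return labels


def consensus_matrix(values, k=2, n_runs=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        idx = rng.sample(range(n), int(frac * n))  # random subsample of samples
        labels = kmeans_1d([values[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Two well-separated toy groups: indices 0-9 near 0, indices 10-19 near 10.
values = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
consensus = consensus_matrix(values)
print("same group:     ", consensus[0][9])
print("different group:", consensus[0][10])
```

Entries near 1 or 0 indicate pairs whose co-assignment is stable across resamples; intermediate values flag samples whose cluster membership depends on the particular run, which is the instability a consensus approach is meant to expose.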
Affiliation(s)
- L Manganaro
  - aizoOn Technology Consulting S.R.L, Torino, Italy
- S Bianco
  - aizoOn Technology Consulting S.R.L, Torino, Italy
- P Bironzo
  - Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy
- F Cipollini
  - aizoOn Technology Consulting S.R.L, Torino, Italy
- D Colombi
  - aizoOn Technology Consulting S.R.L, Torino, Italy
- D Corà
  - Department of Translational Medicine, Piemonte Orientale University, Novara, Italy
  - Center for Translational Research on Autoimmune and Allergic Diseases-CAAD, Novara, Italy
- G Corti
  - Department of Oncology, University of Torino, 10060, Candiolo, Italy
  - Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- G Doronzo
  - Department of Oncology, University of Torino, 10060, Candiolo, Italy
  - Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- L Errico
  - Division of Thoracic Surgery at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
- P Falco
  - aizoOn Technology Consulting S.R.L, Torino, Italy
- L Gandolfi
  - Department of Oncology, University of Torino, 10060, Candiolo, Italy
  - Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- F Guerrera
  - Division of Thoracic Surgery at AOU Città della Salute e della Scienza, Department of Surgical Sciences, University of Torino, Torino, Italy
- V Monica
  - Department of Oncology, University of Torino, 10060, Candiolo, Italy
  - Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- S Novello
  - Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy
- M Papotti
  - Pathology Division at AOU Città della Salute e della Scienza, Department of Oncology, University of Torino, Torino, Italy
- S Parab
  - Department of Oncology, University of Torino, 10060, Candiolo, Italy
  - Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- A Pittaro
  - Pathology Division at AOU Città della Salute e della Scienza, Department of Oncology, University of Torino, Torino, Italy
- L Primo
  - Department of Oncology, University of Torino, 10060, Candiolo, Italy
  - Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- L Righi
  - Pathology Division at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
- G Sabbatini
  - aizoOn Technology Consulting S.R.L, Torino, Italy
- A Sandri
  - Division of Thoracic Surgery at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
- F Bussolino
  - Department of Oncology, University of Torino, 10060, Candiolo, Italy
  - Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
- G V Scagliotti
  - Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy
36
Ma X, Zhao W, Wu W. Layer-Specific Modules Detection in Cancer Multi-Layer Networks. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1170-1179. [PMID: 35609099 DOI: 10.1109/tcbb.2022.3176859] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 02/02/2023]
Abstract
Multi-layer networks provide an effective and efficient tool to model and characterize complex systems with multiple types of interactions, which differ greatly from traditional single-layer networks. Graph clustering in multi-layer networks is highly non-trivial since it is difficult to balance the connectivity of clusters and the connection of various layers. Current algorithms for layer-specific clusters are criticized for their low accuracy and sensitivity to perturbation of the networks. To overcome these issues, a novel algorithm for layer-specific modules in multi-layer networks based on nonnegative matrix factorization (LSNMF) is proposed by explicitly exploring the specific features of vertices. LSNMF first extracts features of vertices in multi-layer networks by using nonnegative matrix factorization (NMF) and then decomposes the features of vertices into common and specific components. An orthogonality constraint is imposed on the specific components to ensure the specificity of the features of vertices, which provides a better strategy to characterize and model the structure of layer-specific modules. Extensive experiments demonstrate that the proposed algorithm dramatically outperforms state-of-the-art baselines in terms of various measurements. Furthermore, LSNMF efficiently extracts stage-specific modules, which are more likely to be enriched in known functions and are also associated with the survival time of patients.
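Note: the NMF feature-extraction step named in this abstract can be illustrated with a minimal sketch. The code below is plain single-matrix NMF with the standard Lee-Seung multiplicative updates, written in pure Python. It is not the LSNMF algorithm: the common/specific decomposition and the orthogonality constraint are not implemented, and the function names (`nmf`, `frobenius_error`) and the toy adjacency matrix are hypothetical choices made only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


if __name__ == "__main__":
    # Toy adjacency matrix (with self-loops) of a network with two modules.
    block = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
    w, h, errors = nmf(block, rank=2)
    # Each vertex is assigned to the factor with the largest loading.
    print([max(range(2), key=lambda k: row[k]) for row in w])
```

In a multi-layer setting one such factorization per layer would be a natural starting point, with the rows of W read as vertex features. How LSNMF couples the layers is described only at the level of the abstract above.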
|
37
|
Ryu Y, Han GH, Jung E, Hwang D. Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods. Mol Cells 2023; 46:106-119. [PMID: 36859475 PMCID: PMC9982060 DOI: 10.14348/molcells.2023.0009] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 01/10/2023] [Revised: 01/19/2023] [Accepted: 01/19/2023] [Indexed: 03/03/2023]
Abstract
With the increased number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features, such as cell subpopulations and marker genes, identified from a certain dataset are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations in particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as reported in the literature.
Affiliation(s)
- Yeonjae Ryu
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Geun Hee Han
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Eunsoo Jung
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
| | - Daehee Hwang
- School of Biological Sciences, Seoul National University, Seoul 08826, Korea
|
38
|
Mallik S, Sarkar A, Nath S, Maulik U, Das S, Pati SK, Ghosh S, Zhao Z. 3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection. Front Genet 2023; 14:1095330. [PMID: 36865387 PMCID: PMC9971618 DOI: 10.3389/fgene.2023.1095330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 01/30/2023] [Indexed: 02/16/2023]
Abstract
Handling biomedical big data is a challenging task, and integrating multi-modal data and then mining significant features (gene signature detection) is particularly daunting. To address this, we propose a novel framework, three-factor penalized non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract the statistically significant features; the three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by average linkage clustering followed by dynamic tree cut. The module with the highest correlation was considered the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC score (0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. Furthermore, we included comparative studies with other related methods to enhance the acceptability of our method. Finally, our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
Affiliation(s)
- Saurav Mallik
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- *Correspondence: Saurav Mallik; Zhongming Zhao
| | - Anasua Sarkar
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
| | - Sagnik Nath
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
| | - Ujjwal Maulik
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
| | - Supantha Das
- Department of Information Technology, Academy of Technology, Hooghly, West Bengal, India
| | - Soumen Kumar Pati
- Department of Bioinformatics, Maulana Abul Kalam Azad University, Kolkata, West Bengal, India
| | - Soumadip Ghosh
- Department of Computer Science & Engineering, Sister Nivedita University, New Town, West Bengal, India
| | - Zhongming Zhao
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- *Correspondence: Saurav Mallik; Zhongming Zhao
|
39
|
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023]
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has created a need for integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and that are adapted to the challenges of combining heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analysis of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporates mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further development as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
| | - Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
| | - Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
|
40
|
Chen X, Han M, Li Y, Li X, Zhang J, Zhu Y. Identification of functional gene modules by integrating multi-omics data and known molecular interactions. Front Genet 2023; 14:1082032. [PMID: 36760999 PMCID: PMC9902936 DOI: 10.3389/fgene.2023.1082032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 01/11/2023] [Indexed: 01/25/2023]
Abstract
Multi-omics data integration has emerged as a promising approach to identify patient subgroups. However, in terms of grouping genes (or gene products) into co-expression modules, data integration methods suffer from two main drawbacks. First, most existing methods only consider genes or samples measured in all of the datasets. Second, known molecular interactions (e.g., transcriptional regulatory interactions, protein-protein interactions and biological pathways) cannot be utilized to assist in module detection. Herein, we present a novel data integration framework, Correlation-based Local Approximation of Membership (CLAM), which provides two methodological innovations to address these limitations: 1) constructing a trans-omics neighborhood matrix by integrating multi-omics datasets and known molecular interactions, and 2) using a local approximation procedure to define gene modules from the matrix. Applying CLAM to human colorectal cancer (CRC) and mouse B-cell differentiation multi-omics data obtained from The Cancer Genome Atlas (TCGA), the Clinical Proteomics Tumor Analysis Consortium (CPTAC), the Gene Expression Omnibus (GEO) and the ProteomeXchange database, we demonstrated its superior ability to recover biologically relevant modules and gene ontology (GO) terms. Further investigation of the CRC modules revealed numerous transcription factors and KEGG pathways that play crucial roles in CRC progression. Module-based survival analysis constructed four survival-related networks in which pairwise gene correlations were significantly correlated with CRC patient survival. Overall, the series of evaluations demonstrated the great potential of CLAM for identifying modular biomarkers for complex diseases. We implemented CLAM as a user-friendly application available at https://github.com/free1234hm/CLAM.
Affiliation(s)
- Xiaoqing Chen
- Basic Medical School, Anhui Medical University, Hefei, China
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
| | - Mingfei Han
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
| | - Yingxing Li
- Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Xiao Li
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
| | - Jiaqi Zhang
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
| | - Yunping Zhu
- Basic Medical School, Anhui Medical University, Hefei, China
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
- *Correspondence: Yunping Zhu
|
41
|
Bansal B, Sahoo A. Multi-omics data fusion using adaptive GTO guided Non-negative matrix factorization for cancer subtype discovery. Computer Methods and Programs in Biomedicine 2023; 228:107246. [PMID: 36434961 DOI: 10.1016/j.cmpb.2022.107246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 11/13/2022] [Accepted: 11/14/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVE Cancer subtype discovery is essential for personalized clinical treatment. With advances in cancer profiling techniques, large amounts of heterogeneous and high-dimensional transcriptomic, proteomic and genomic data are readily accumulated. Integrative clustering of such multi-omics data is crucial to recognize their latent structure and to capture the correlation within and across them. Although the integrative analysis of diverse multi-omics data is informative, it is challenging when the different data types agree poorly on the clustering structure. The objective of this work is to develop an effective integrative analysis framework that encapsulates the heterogeneity of various biological mechanisms and predicts homogeneous subgroups of cancer patients. METHOD In this paper, an improved sparse joint non-negative matrix factorization (sparse-jNMF) is devised for the problem of cancer subtype discovery. The initial points of sparse-jNMF are improved using a novel meta-heuristic algorithm, the adaptive gorilla troops optimizer (Ada-GTO). Improving the initialization of sparse-jNMF enhances its convergence behavior and further strengthens the clustering performance. In addition, a consensus clustering approach is adopted to construct a patient-patient similarity matrix for obtaining stable clusters of patient samples. RESULT The proposed framework was applied to 4 real-life multi-omics cancer datasets, namely colon adenocarcinoma, breast invasive carcinoma, kidney renal clear-cell carcinoma, and lung adenocarcinoma. The proposed method yields patient clusters with better silhouette scores and cluster purity than classical initialization and similar meta-heuristic approaches to initial point estimation. Survival probabilities estimated using Kaplan-Meier (KM) curves show a statistically significant difference (p < 0.05) for the homogeneous cancer patient clusters obtained using the proposed method as compared to iCluster. The presented approach further identified the somatic mutations for the classified subgroups, which is beneficial for providing targeted treatments. CONCLUSION This paper proposes an Ada-GTO guided sparse-jNMF framework for cancer subtype discovery that considers information from multiple omic features to provide a more comprehensive view. The proposed meta-guided framework outperforms all other state-of-the-art counterparts. It also has great potential for obtaining homogeneous subgroups in other diseases.
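Note: the Kaplan-Meier survival estimate mentioned in this abstract can be sketched in a few lines. The code below is a minimal, generic product-limit estimator in pure Python. It is not the authors' pipeline, and the function name `kaplan_meier` and the toy follow-up data are hypothetical. Comparing the curves of two patient clusters would additionally need a test such as the log-rank test, which is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed at that time,
              0 if the patient was censored
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Four hypothetical patients; the second one is censored at t = 2.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients reduce the number at risk without producing a step in the curve, which is why the estimate differs from a simple fraction of survivors.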
Affiliation(s)
- Bhavana Bansal
- Department of CSE & IT, Jaypee Institute of Information Technology, Noida, India
| | - Anita Sahoo
- Department of CSE & IT, Jaypee Institute of Information Technology, Noida, India
|
42
|
Mokou M, Narayanasamy S, Stroggilos R, Balaur IA, Vlahou A, Mischak H, Frantzi M. A Drug Repurposing Pipeline Based on Bladder Cancer Integrated Proteotranscriptomics Signatures. Methods Mol Biol 2023; 2684:59-99. [PMID: 37410228 DOI: 10.1007/978-1-0716-3291-8_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
Delivering better care for patients with bladder cancer (BC) necessitates the development of novel therapeutic strategies that address both the high disease heterogeneity and the limitations of current therapeutic modalities, such as low drug efficacy and acquired resistance. Drug repurposing is a cost-effective strategy that aims to reuse existing drugs for new therapeutic purposes. Such a strategy could open new avenues toward more effective BC treatment. BC patients' multi-omics signatures can be used to guide the investigation of existing drugs that show therapeutic potential through drug repurposing. In this book chapter, we present an integrated multilayer approach that includes cross-omics analyses of publicly available transcriptomics and proteomics data derived from BC tissues and cell lines, which were investigated to develop disease-specific signatures. These signatures are subsequently used as input for a signature-based repurposing approach using the Connectivity Map (CMap) tool. We further explain the steps that may be followed to identify and select existing drugs with increased potential for repurposing in BC patients.
Affiliation(s)
- Marika Mokou
- Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
| | - Shaman Narayanasamy
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Rafael Stroggilos
- Systems Biology Center, Biomedical Research Foundation, Academy of Athens, Athens, Greece
| | - Irina-Afrodita Balaur
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Antonia Vlahou
- Systems Biology Center, Biomedical Research Foundation, Academy of Athens, Athens, Greece
| | - Harald Mischak
- Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK
| | - Maria Frantzi
- Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
|
43
|
Metabolomics and modelling approaches for systems metabolic engineering. Metab Eng Commun 2022; 15:e00209. [PMID: 36281261 PMCID: PMC9587336 DOI: 10.1016/j.mec.2022.e00209] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/20/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022] [Indexed: 11/21/2022]
Abstract
Metabolic engineering involves the manipulation of microbes to produce desirable compounds through genetic engineering or synthetic biology approaches. Metabolomics involves the quantitation of intracellular and extracellular metabolites, often using analytical instrumentation based on mass spectrometry and nuclear magnetic resonance. Here, experimental design, sample preparation, and metabolite quenching and extraction are essential to the quantitative metabolomics workflow. The resultant metabolomics data can then be used with computational modelling approaches, such as kinetic and constraint-based modelling, to better understand underlying mechanisms and bottlenecks in the synthesis of desired compounds, thereby accelerating research through systems metabolic engineering. Constraint-based models, such as genome-scale models, have been used successfully to enhance the yield of desired compounds from engineered microbes; however, unlike kinetic or dynamic models, constraint-based models do not incorporate regulatory effects. Nevertheless, the lack of time-series metabolomic data has so far hindered the usefulness of dynamic models. In this review, we show that improvements in automation, dynamic real-time analysis and high-throughput workflows can drive the generation of more high-quality time-series metabolomics data for dynamic models. Spatial metabolomics also has the potential to be used as a complementary approach to conventional metabolomics, as it provides information on the localization of metabolites. However, more effort is needed to identify metabolites from spatial metabolomics data derived through imaging mass spectrometry, where machine learning approaches could prove useful. Single-cell metabolomics has also seen rapid growth, and understanding cell-cell heterogeneity can provide more insights into efficient metabolic engineering of microbes. Moving forward, with potential improvements in automation, dynamic real-time analysis, high-throughput workflows, and spatial metabolomics, more data can be produced and studied using machine learning algorithms, in conjunction with dynamic models, to generate qualitative and quantitative predictions to advance metabolic engineering efforts.
|
44
|
Raufaste-Cazavieille V, Santiago R, Droit A. Multi-omics analysis: Paving the path toward achieving precision medicine in cancer treatment and immuno-oncology. Front Mol Biosci 2022; 9:962743. [PMID: 36304921 PMCID: PMC9595279 DOI: 10.3389/fmolb.2022.962743] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 06/06/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
The acceleration of large-scale sequencing and the progress in high-throughput computational analyses, defined as omics, has been a hallmark in understanding the biological processes in human health and disease. In cancerology, the omics approach, initiated by genomics and transcriptomics studies, has revealed an incredible complexity, with unsuspected molecular diversity within the same tumor type as well as spatial and temporal heterogeneity of tumors. The integration of multiple biological layers of omics studies brought oncology to a new paradigm, from tumor site classification to pan-cancer molecular classification, offering new therapeutic opportunities for precision medicine. In this review, we provide a comprehensive overview of the latest innovations for multi-omics integration in oncology and summarize the largest multi-omics datasets available for adult and pediatric cancers. We present multi-omics techniques for characterizing cancer biology and show how multi-omics data can be combined with clinical data for the identification of prognostic and treatment-specific biomarkers, opening the way to personalized therapy. To conclude, we detail the newest strategies for dissecting the tumor immune environment and host–tumor interaction. We explore the advances in immunomics and microbiomics for biomarker identification to guide therapeutic decisions in immuno-oncology.
Affiliation(s)
| | - Raoul Santiago
- CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Division of Pediatric Hematology-Oncology, Centre Hospitalier Universitaire de L’Université Laval, Charles Bruneau Cancer Center, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
| | - Arnaud Droit
- CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
|
45
|
Jia M, Yuan DY, Lovelace TC, Hu M, Benos PV. Causal Discovery in High-dimensional, Multicollinear Datasets. Frontiers in Epidemiology 2022; 2:899655. [PMID: 36778756 PMCID: PMC9910507 DOI: 10.3389/fepid.2022.899655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022]
Abstract
As the cost of high-throughput genomic sequencing technology declines, its application in clinical research becomes increasingly popular. The collected datasets often contain tens or hundreds of thousands of biological features that need to be mined to extract meaningful information. One area of particular interest is discovering underlying causal mechanisms of disease outcomes. Over the past few decades, causal discovery algorithms have been developed and expanded to infer such relationships. However, these algorithms suffer from the curse of dimensionality and multicollinearity. A recently introduced, non-orthogonal, general empirical Bayes approach to matrix factorization has been demonstrated to successfully infer latent factors with interpretable structures from observed variables. We hypothesize that applying this strategy to causal discovery algorithms can solve both the high dimensionality and collinearity problems, inherent to most biomedical datasets. We evaluate this strategy on simulated data and apply it to two real-world datasets. In a breast cancer dataset, we identified important survival-associated latent factors and biologically meaningful enriched pathways within factors related to important clinical features. In a SARS-CoV-2 dataset, we were able to predict whether a patient (1) had Covid-19 and (2) would enter the ICU. Furthermore, we were able to associate factors with known Covid-19 related biological pathways.
Affiliation(s)
- Minxue Jia
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Daniel Y. Yuan
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Tyler C. Lovelace
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Mengying Hu
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Panayiotis V. Benos
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Department of Epidemiology, University of Florida, Gainesville, FL, United States
|
46
|
Pan-Cancer Analysis for Immune Cell Infiltration and Mutational Signatures Using Non-Negative Canonical Correlation Analysis. Applied Sciences-Basel 2022. [DOI: 10.3390/app12136596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Mutational signatures indicate the mutational processes and substitution patterns in cancer cell genomes. However, the functional consequences of mutational signatures remain unclear, and there have been no comprehensive systematic studies examining the relationships between mutational signatures and immune cell infiltration. Here, the relationship between mutational signatures and immune cell infiltration was investigated using non-negative canonical correlation analysis based on 8927 patients across 25 tumor types. By inspecting the mutational signatures with the maximal coefficients determined by the non-negative canonical correlation analysis, the study identified mutational signatures related to immune cell infiltration in tumor microenvironments. The analysis was validated by showing that the genes associated with the identified mutational signatures were linked to overall survival by a Kaplan–Meier curve and a log-rank test and were mainly related to immunity by gene set enrichment analysis. These results will help expand our knowledge of tumor biology and clarify the functional roles and associations of the immune system with mutational signatures.
|
47
|
Bouchard HC, Sun D, Dennis EL, Newsome MR, Disner SG, Elman J, Silva A, Velez C, Irimia A, Davenport ND, Sponheim SR, Franz CE, Kremen WS, Coleman MJ, Williams MW, Geuze E, Koerte IK, Shenton ME, Adamson MM, Coimbra R, Grant G, Shutter L, George MS, Zafonte RD, McAllister TW, Stein MB, Thompson PM, Wilde EA, Tate DF, Sotiras A, Morey RA. Age-dependent white matter disruptions after military traumatic brain injury: Multivariate analysis results from ENIGMA brain injury. Hum Brain Mapp 2022; 43:2653-2667. [PMID: 35289463 PMCID: PMC9057089 DOI: 10.1002/hbm.25811] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Revised: 12/18/2021] [Accepted: 02/10/2022] [Indexed: 01/27/2023]
Abstract
Mild traumatic brain injury (mTBI) is a signature wound in military personnel, and repetitive mTBI has been linked to age-related neurodegenerative disorders that affect white matter (WM) in the brain. However, findings of injury to specific WM tracts have been variable and inconsistent. This may be due to the heterogeneity of mechanisms, etiology, and comorbid disorders related to mTBI. Non-negative matrix factorization (NMF) is a data-driven approach that detects covarying patterns (components) within high-dimensional data. We applied NMF to diffusion imaging data from military Veterans with and without a self-reported TBI history. NMF identified 12 independent components derived from fractional anisotropy (FA) in a large dataset (n = 1,475) gathered through the ENIGMA (Enhancing Neuroimaging Genetics through Meta-Analysis) Military Brain Injury working group. Regressions were used to examine TBI- and mTBI-related associations in NMF-derived components while adjusting for age, sex, post-traumatic stress disorder, depression, and data acquisition site/scanner. We found significantly stronger age-dependent effects of lower FA in Veterans with TBI than in Veterans without TBI in four components (q < 0.05), which are spatially unconstrained by traditionally defined WM tracts. One component, occupying the most peripheral location, exhibited significantly stronger age-dependent differences in Veterans with mTBI. We found NMF to be powerful and effective in detecting covarying patterns of FA associated with mTBI by applying standard parametric regression modeling. Our results highlight patterns of WM alteration that are differentially affected by TBI and mTBI in younger compared to older military Veterans.
Affiliation(s)
- Heather C. Bouchard
  - Duke‐UNC Brain Imaging and Analysis Center, Duke University, Durham, North Carolina, USA
  - Mid‐Atlantic Mental Illness Research Education and Clinical Center, Durham VA Medical Center, Durham, North Carolina, USA
  - Center for Brain, Biology & Behavior, University of Nebraska‐Lincoln, Lincoln, Nebraska, USA
- Delin Sun
  - Duke‐UNC Brain Imaging and Analysis Center, Duke University, Durham, North Carolina, USA
  - Mid‐Atlantic Mental Illness Research Education and Clinical Center, Durham VA Medical Center, Durham, North Carolina, USA
- Emily L. Dennis
  - Department of Neurology, University of Utah, Salt Lake City, Utah, USA
  - Department of Radiology, Stanford University, Stanford, California, USA
- Mary R. Newsome
  - Michael E. DeBakey VA Medical Center, Houston, Texas, USA
  - H. Ben Taub Department of Physical Medicine and Rehabilitation, Baylor College of Medicine, Houston, Texas, USA
- Seth G. Disner
  - Minneapolis VA Health Care System, Minneapolis, Minnesota, USA
  - Department of Psychiatry, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Jeremy Elman
  - Department of Psychiatry, University of California San Diego, La Jolla, California, USA
  - Center for Behavior Genetics of Aging, University of California, San Diego, San Diego, California, USA
- Annelise Silva
  - Psychiatry Neuroimaging Laboratory, Brigham & Women's Hospital, Boston, Massachusetts, USA
- Carmen Velez
  - Department of Neurology, University of Utah, Salt Lake City, Utah, USA
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, USA
- Andrei Irimia
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, California, USA
  - Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Nicholas D. Davenport
  - Minneapolis VA Health Care System, Minneapolis, Minnesota, USA
  - Department of Psychiatry, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Scott R. Sponheim
  - Minneapolis VA Health Care System, Minneapolis, Minnesota, USA
  - Department of Psychiatry, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Carol E. Franz
  - Department of Psychiatry, University of California San Diego, La Jolla, California, USA
  - Center for Behavior Genetics of Aging, University of California, San Diego, San Diego, California, USA
- William S. Kremen
  - Department of Psychiatry, University of California San Diego, La Jolla, California, USA
  - Center for Behavior Genetics of Aging, University of California, San Diego, San Diego, California, USA
  - Center of Excellence for Stress and Mental Health, VA San Diego Healthcare System, San Diego, California, USA
- Michael J. Coleman
  - Psychiatry Neuroimaging Laboratory, Brigham & Women's Hospital, Boston, Massachusetts, USA
- M. Wright Williams
  - Michael E. DeBakey VA Medical Center, Houston, Texas, USA
  - Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, Texas, USA
- Elbert Geuze
  - Department of Psychiatry, University Medical Center, Utrecht, Netherlands
  - Brain Research & Innovation Centre, Ministry of Defence, Utrecht, Netherlands
- Inga K. Koerte
  - Psychiatry Neuroimaging Laboratory, Brigham & Women's Hospital, Boston, Massachusetts, USA
- Martha E. Shenton
  - Psychiatry Neuroimaging Laboratory, Brigham & Women's Hospital, Boston, Massachusetts, USA
- Maheen M. Adamson
  - Rehabilitation Service, VA Palo Alto, Palo Alto, California, USA
  - Neurosurgery, Stanford School of Medicine, Stanford, California, USA
- Raul Coimbra
  - Department of Surgery, University of California San Diego, La Jolla, California, USA
- Gerald Grant
  - Department of Neurosurgery, Stanford University Medical Center, Palo Alto, California, USA
- Lori Shutter
  - Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Mark S. George
  - Department of Psychiatry, Medical University of South Carolina, Charleston, South Carolina, USA
- Ross D. Zafonte
  - Spaulding Rehabilitation Hospital, Massachusetts General Hospital, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Murray B. Stein
  - Department of Psychiatry, University of California San Diego, La Jolla, California, USA
  - Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, California, USA
- Paul M. Thompson
  - Imaging Genetics Center, Stevens Neuroimaging & Informatics Institute, Keck School of Medicine of USC, Marina del Rey, California, USA
  - Department of Neurology, Pediatrics, Psychiatry, Radiology, Engineering, and Ophthalmology, University of Southern California (USC), Los Angeles, California, USA
  - Department of Pediatrics, USC, Los Angeles, California, USA
  - Department of Psychiatry, USC, Los Angeles, California, USA
  - Department of Radiology, USC, Los Angeles, California, USA
  - Department of Engineering, USC, Los Angeles, California, USA
  - Department of Ophthalmology, USC, Los Angeles, California, USA
  - Department of Radiology and Institute for Informatics, School of Medicine, Washington University St. Louis, St. Louis, Missouri, USA
- Elisabeth A. Wilde
  - Department of Neurology, University of Utah, Salt Lake City, Utah, USA
  - Michael E. DeBakey VA Medical Center, Houston, Texas, USA
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, USA
- David F. Tate
  - Department of Neurology, University of Utah, Salt Lake City, Utah, USA
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, USA
- Aristeidis Sotiras
  - Department of Radiology and Institute for Informatics, School of Medicine, Washington University St. Louis, St. Louis, Missouri, USA
- Rajendra A. Morey
  - Duke‐UNC Brain Imaging and Analysis Center, Duke University, Durham, North Carolina, USA
  - Mid‐Atlantic Mental Illness Research Education and Clinical Center, Durham VA Medical Center, Durham, North Carolina, USA
|
48
|
Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat Methods 2022; 19:662-670. [PMID: 35577954 DOI: 10.1038/s41592-022-01480-9] [Citation(s) in RCA: 197] [Impact Index Per Article: 65.7] [Received: 06/10/2021] [Accepted: 03/30/2022] [Indexed: 01/07/2023]
Abstract
Spatial transcriptomics approaches have substantially advanced our capacity to detect the spatial distribution of RNA transcripts in tissues, yet it remains challenging to characterize whole-transcriptome-level data for single cells in space. Addressing this need, researchers have developed integration methods to combine spatial transcriptomic data with single-cell RNA-seq data to predict the spatial distribution of undetected transcripts and/or perform cell type deconvolution of spots in histological sections. However, to date, no independent studies have comparatively analyzed these integration methods to benchmark their performance. Here we present benchmarking of 16 integration methods using 45 paired datasets (comprising both spatial transcriptomics and scRNA-seq data) and 32 simulated datasets. We found that Tangram, gimVI, and SpaGE outperformed other integration methods for predicting the spatial distribution of RNA transcripts, whereas Cell2location, SpatialDWLS, and RCTD are the top-performing methods for the cell type deconvolution of spots. We provide a benchmark pipeline to help researchers select optimal integration methods to process their datasets.
|
49
|
Guan J, Zhuang Y, Kang Y, Ji G. Shared and Cell-Type-Specific Gene Expression Patterns Associated With Autism Revealed by Integrative Regularized Non-Negative Matrix Factorization. Front Genet 2022; 13:865371. [PMID: 35646047 PMCID: PMC9130660 DOI: 10.3389/fgene.2022.865371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 04/11/2022] [Indexed: 11/21/2022]
Abstract
Human brain-related disorders, such as autism spectrum disorder (ASD), are often characterized by cell heterogeneity, as the cell atlas of brains consists of diverse cell types. There are commonality and specificity in gene expression among different cell types of brains; hence, there may also be commonality and specificity in dysregulated gene expression affected by ASD among brain cells. Moreover, as genes interact together, it is important to identify shared and cell-type-specific ASD-related gene modules for studying the cell heterogeneity of ASD. To this end, we propose integrative regularized non-negative matrix factorization (iRNMF) by imposing a new regularization based on integrative non-negative matrix factorization. Using iRNMF, we analyze gene expression data of multiple cell types of the human brain to obtain shared and cell-type-specific gene modules. Based on ASD risk genes, we identify shared and cell-type-specific ASD-associated gene modules. By analyzing these gene modules, we study the commonality and specificity among different cell types in dysregulated gene expression affected by ASD. The shared ASD-associated gene modules are mostly relevant to the functioning of synapses, while in different cell types, different kinds of gene functions may be specifically dysregulated in ASD, such as inhibitory extracellular ligand-gated ion channel activity in GABAergic interneurons and excitatory postsynaptic potential and ionotropic glutamate receptor signaling pathway in glutamatergic neurons. Our results provide new insights into the molecular mechanism and pathogenesis of ASD. The identification of shared and cell-type-specific ASD-related gene modules can facilitate the development of more targeted biomarkers and treatments for ASD.
Affiliation(s)
- Jinting Guan
  - Department of Automation, Xiamen University, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Yan Zhuang
  - Department of Automation, Xiamen University, Xiamen, China
- Yue Kang
  - Department of Automation, Xiamen University, Xiamen, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
|
50
|
Jerby-Arnon L, Regev A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat Biotechnol 2022; 40:1467-1477. [PMID: 35513526 PMCID: PMC9547813 DOI: 10.1038/s41587-022-01288-0] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 07/19/2020] [Accepted: 03/15/2022] [Indexed: 12/22/2022]
Abstract
Deciphering the functional interactions of cells in tissues remains a major challenge. We describe DIALOGUE, a method to systematically uncover multicellular programs (MCPs) — combinations of coordinated cellular programs in different cell types that form higher-order functional units at the tissue level — from either spatial data or single-cell data obtained without spatial information. Tested on spatial datasets from the mouse hypothalamus, cerebellum, visual cortex, and neocortex, DIALOGUE identified MCPs associated with animal behavior and recovered spatial properties when tested on unseen data, while outperforming other methods and metrics. In spatial data from human lung cancer, DIALOGUE identified MCPs marking immune activation and tissue remodeling. Applied to scRNA-seq data across individuals or regions, DIALOGUE uncovered MCPs in Alzheimer’s disease, ulcerative colitis, and treatment with cancer immunotherapy. These programs were predictive of disease outcome and predisposition in independent cohorts and included risk genes from genome-wide association studies (GWAS). DIALOGUE enables the analysis of multicellular regulation in health and disease. Coordinated gene programs spanning multiple different cell types are identified in healthy and diseased tissues.
Affiliation(s)
- Livnat Jerby-Arnon
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Aviv Regev
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Genentech, South San Francisco, CA, USA
|