1
Zhang S, Kong W, Wang S, Wei K, Liu K, Wen G, Yu Y. Effective Integration of Single-Cell Multi-Omics Data Using Improved Network-Based Integrative Clustering with Multigraph Regularization. J Comput Biol 2025. [PMID: 40401439 DOI: 10.1089/cmb.2023.0460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2025] Open
Abstract
Integrating different omics data makes it possible to study cellular heterogeneity at the level of transcriptional regulation from different gene levels, which can effectively identify cell types and reveal the pathogenesis of Alzheimer's disease (AD) from two perspectives. However, such algorithms face challenges including high data noise, increased dimensionality, and computational complexity. In this study, multigraph regularization constraints were introduced into the network-based integrative clustering algorithm (MGR-NIC) to remove redundant features and preserve the geometric structure underlying the data by fusing two types of data (snRNA-seq and snATAC-seq) from glial cells of AD samples. The effectiveness of the MGR-NIC algorithm was validated using both simulated datasets and real datasets derived from various tissues. MGR-NIC improves clustering accuracy by selecting features that better represent the dataset's structure. Its clustering results are strongly consistent with the clustering reported for the published DLPFC dataset, whereas the NIC algorithm often produces overlapping clusters on the same dataset. For a comprehensive evaluation, the proposed MGR-NIC algorithm was compared with state-of-the-art algorithms, including NIC, scAI, Multi-Omics Factor Analysis v2, and JSNMF. MGR-NIC was the most stable and reliable method, which suggests that it is robust across datasets and yields consistent and accurate results.
Affiliation(s)
- Shunqin Zhang
  - College of Information Engineering, Shanghai Maritime University, Shanghai, P.R. China
- Wei Kong
  - College of Information Engineering, Shanghai Maritime University, Shanghai, P.R. China
- Shuaiqun Wang
  - College of Information Engineering, Shanghai Maritime University, Shanghai, P.R. China
- Kai Wei
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, P.R. China
- Kun Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai, P.R. China
- Gen Wen
  - Department of Orthopedic Surgery, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, P.R. China
- Yaling Yu
  - Department of Orthopedic Surgery, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, P.R. China
  - Institute of Microsurgery on Extremities, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, P.R. China
2
Zhang B, Nan M, Wang L, Wu H, Chen X, Shi Y, Ma Y, Gao J. JSNMFuP: a unsupervised method for the integrative analysis of single-cell multi-omics data based on non-negative matrix factorization. BMC Genomics 2025; 26:274. [PMID: 40114052 PMCID: PMC11924690 DOI: 10.1186/s12864-025-11462-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 03/10/2025] [Indexed: 03/22/2025] Open
Abstract
With the rapid advancement of sequencing technology, the increasing availability of single-cell multi-omics data from the same cells has provided unprecedented opportunities to understand cellular phenotypes. Integrating multi-omics data has the potential to enhance the ability to reveal cellular heterogeneity. However, integrative analysis is extremely challenging because the molecular modalities in single-cell data differ in their characteristics and noise levels. In this paper, an unsupervised integration method (JSNMFuP) based on non-negative matrix factorization is proposed. The method integrates the information extracted from the latent variables of each omic through a consensus graph. Regularization terms capture the high-dimensional geometric structure of the original data and incorporate biologically related feature links across modalities into the model. JSNMFuP can be used for data visualization and clustering, and it facilitates marker characterization and gene ontology enrichment analysis, providing rich biological insights for downstream analysis. Applications to real datasets show that JSNMFuP has superior performance in cell clustering. The factors are interpretable, making it an effective method for analyzing cell heterogeneity using single-cell multi-omics data.
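As a rough illustration of how a graph regularization term can preserve geometric structure in a non-negative matrix factorization, the sketch below implements generic single-view graph-regularized NMF with multiplicative updates. It is not the JSNMFuP objective: the consensus graph and cross-modality feature links described in the abstract are omitted, and the function names (`knn_affinity`, `objective`, `gnmf`), the k-nearest-neighbour graph, and the parameters `lam` and `rank` are assumptions made for this example only.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Symmetric binary k-nearest-neighbour graph over the columns (cells) of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * X.T @ X   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # a cell is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]             # k closest cells per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                       # symmetrize

def objective(X, U, V, W, lam):
    """||X - U V^T||_F^2 + lam * Tr(V^T L V), with L the graph Laplacian of W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)

def gnmf(X, W, rank, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Graph-regularized NMF (hypothetical helper): X (features x cells) ~ U V^T."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank)) + eps
    V = rng.random((n, rank)) + eps
    D = np.diag(W.sum(axis=1))                      # degree matrix
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V
```

The rows of `V` give a low-dimensional embedding of the cells; the Laplacian term pulls cells that are neighbours in the input graph toward nearby embeddings, and `lam` controls how strongly.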
Affiliation(s)
- Bai Zhang
  - School of Science, Jiangnan University, Wuxi, Jiangsu, China
- Mengdi Nan
  - School of Science, Jiangnan University, Wuxi, Jiangsu, China
- Liugen Wang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China
- Hanwen Wu
  - School of Science, Jiangnan University, Wuxi, Jiangsu, China
- Xiang Chen
  - School of Science, Jiangnan University, Wuxi, Jiangsu, China
- Yongle Shi
  - School of Science, Jiangnan University, Wuxi, Jiangsu, China
- Yibing Ma
  - School of Science, Jiangnan University, Wuxi, Jiangsu, China
- Jie Gao
  - School of Science, Jiangnan University, Wuxi, Jiangsu, China
3
Ma Y, Liu L. NMFGOT: a multi-view learning framework for the microbiome and metabolome integrative analysis with optimal transport plan. NPJ Biofilms Microbiomes 2024; 10:135. [PMID: 39582023 PMCID: PMC11586431 DOI: 10.1038/s41522-024-00612-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Accepted: 11/14/2024] [Indexed: 11/26/2024] Open
Abstract
The rapid development of high-throughput sequencing techniques provides an unprecedented opportunity to generate biological insights into microbiome-related diseases. However, the relationships among microbes, metabolites and the human microenvironment are extremely complex, making data analysis challenging. Here, we present NMFGOT, a versatile toolkit for the integrative analysis of microbiome and metabolome data from the same samples. NMFGOT is an unsupervised learning framework based on nonnegative matrix factorization with graph-regularized optimal transport. It uses the optimal transport plan to measure the probability distance between microbiome samples, which better handles the nonlinear high-order interactions among microbial taxa and metabolites. It also includes a spatial regularization term to preserve the spatial consistency of samples in the embedding space across data modalities. We applied NMFGOT to several multi-omics microbiome datasets from multiple cohorts. The experimental results showed that NMFGOT consistently performed well compared with several recently published multi-omics integration methods. NMFGOT also facilitates downstream biological analysis, including pathway enrichment analysis and disease-specific metabolite-microbe association analysis. Using NMFGOT, we identified significant and stable metabolite-microbe associations in GC and ESRD, which improve our understanding of the mechanisms of complex human diseases.
Affiliation(s)
- Yuanyuan Ma
  - School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, Hubei, China
  - Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China
- Lifang Liu
  - School of Physics and Electronic Engineering, Hubei University of Arts and Science, Xiangyang, Hubei, China
4
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Constructing the GRN of epithelioma papulosum cyprini (EPC) cells of cyprinid fish following spring viremia of carp virus (SVCV) infection helps in understanding the immune regulatory mechanisms that enhance the survival of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features and neglect the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and the network structure information of known GRNs, and it introduces a matrix enhancement method to address the sparsity of known GRNs and extract richer network structure information. To exploit the strengths of convolutional neural networks (CNNs) in image processing, the gene expression data and the enhanced GRN data were each transformed into histogram images for every gene pair. These histograms were then fed into separate CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation. Experimental results on both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. Furthermore, findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
Affiliation(s)
- Pi-Jing Wei
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jin-Jin Bao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Zhen Gao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jing-Yun Tan
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
- Rui-Fen Cao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Li Deng
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
5
Ma Y, Liu L, Zhao Y, Hang B, Zhang Y. HyperGCN: an effective deep representation learning framework for the integrative analysis of spatial transcriptomics data. BMC Genomics 2024; 25:566. [PMID: 38840049 PMCID: PMC11155133 DOI: 10.1186/s12864-024-10469-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024] Open
Abstract
BACKGROUND Advances in spatial transcriptomics technologies have enabled simultaneous profiling of gene expression and spatial locations of cells from the same tissue. Computational tools and approaches for integrating transcriptomics data and spatial context information are urgently needed to comprehensively explore the underlying structural patterns. In this manuscript, we propose HyperGCN for the integrative analysis of gene expression and spatial information profiled from the same tissue. HyperGCN enables data visualization and clustering, and facilitates downstream analysis, including domain segmentation, the characterization of marker genes for specific domain structures and GO enrichment analysis. RESULTS Extensive experiments are conducted on four real datasets from different tissues (including human dorsolateral prefrontal cortex, human positive breast tumors, mouse brain, mouse olfactory bulb tissue and zebrafish melanoma) and technologies (including 10x Visium, osmFISH, seqFISH+, 10x Xenium and Stereo-seq) with different spatial resolutions. The results show that HyperGCN achieves superior clustering performance and produces good domain segmentation while identifying biologically meaningful spatial expression patterns. This study provides a flexible framework for analyzing spatial transcriptomics data with high geometric complexity. CONCLUSIONS HyperGCN is an unsupervised method based on a hypergraph-induced graph convolutional network. It assumes that there exist disjoint tissues with high geometric complexity and models the semantic relationships of cells through a hypergraph, which better handles the high-order interactions of cells and the noise levels in spatial transcriptomics data.
Affiliation(s)
- Yuanyuan Ma
  - School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China
  - Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China
- Lifang Liu
  - School of Physics and Electronic Engineering, Hubei University of Arts and Science, Xiangyang, China
- Yongbiao Zhao
  - School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China
  - School of Computer, Central China Normal University, Wuhan, China
- Bo Hang
  - School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China
- Yanduo Zhang
  - School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China
6
Wang H, Liu Z, Ma X. Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data. IEEE J Biomed Health Inform 2024; 28:3134-3145. [PMID: 38709615 DOI: 10.1109/jbhi.2024.3370868] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/08/2024]
Abstract
Advances in single-cell technologies make it possible to profile the epigenome and transcriptome concomitantly at the single-cell level, providing opportunities to explore the underlying biological mechanisms. Despite significant efforts, integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling and interpretability of the data. To address these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for integrating single-cell epigenomic profiles (DNA methylation or scATAC-seq) and transcriptomic profiles (scRNA-seq), in which the consistent and specific features of cells are explicitly extracted to facilitate cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes the heterogeneity of omic data. Then, sLMIC employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data and provide an effective strategy for modeling the structure of cell types. Feature extraction and cell clustering are jointly formulated as a single objective function, in which the latent features of the data are obtained under the guidance of cell clustering. Extensive experimental results on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC clearly outperforms state-of-the-art algorithms on various measures.
7
Bhattacharyya S, Ehsan SF, Karacosta LG. Phenotypic maps for precision medicine: a promising systems biology tool for assessing therapy response and resistance at a personalized level. Frontiers in Network Physiology 2023; 3:1256104. [PMID: 37964768 PMCID: PMC10642209 DOI: 10.3389/fnetp.2023.1256104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 09/28/2023] [Indexed: 11/16/2023]
Abstract
In this perspective we discuss how tumor heterogeneity and therapy resistance necessitate a focus on more personalized approaches, prompting a shift toward precision medicine. At the heart of this shift, omics-driven systems biology becomes a driving force as it leverages high-throughput technologies and novel bioinformatics tools. These enable the creation of systems-based maps, providing a comprehensive view of an individual tumor's functional plasticity. We highlight the innovative PHENOSTAMP program, which leverages high-dimensional data to construct a visually intuitive and user-friendly map. This map was created to encapsulate complex transitional states in cancer cells, such as Epithelial-Mesenchymal Transition (EMT) and Mesenchymal-Epithelial Transition (MET), offering a way to understand disease progression and therapeutic responses at single-cell resolution in relation to EMT-related single-cell phenotypes. Most importantly, PHENOSTAMP functions as a reference map, which allows researchers and clinicians to assess one clinical specimen at a time in relation to its phenotypic heterogeneity, laying the foundation for constructing phenotypic maps for personalized medicine. This perspective argues that such dynamic predictive maps could also catalyze the development of personalized cancer treatment. They hold the potential to transform our understanding of cancer biology, providing a foundation for a future where therapy is tailored to each patient's unique molecular and cellular tumor profile. As our knowledge of cancer expands, these maps can be continually refined, ensuring they remain a valuable tool in precision oncology.
Affiliation(s)
- Sayantan Bhattacharyya
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Shafqat F. Ehsan
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
  - Department of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Loukia G. Karacosta
  - Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
8
Zhang C, Yang Y, Tang S, Aihara K, Zhang C, Chen L. Contrastively generative self-expression model for single-cell and spatial multimodal data. Brief Bioinform 2023; 24:bbad265. [PMID: 37507114 DOI: 10.1093/bib/bbad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 05/27/2023] [Accepted: 07/03/2023] [Indexed: 07/30/2023] Open
Abstract
Advances in single-cell multi-omics technology provide an unprecedented opportunity to fully understand cellular heterogeneity. However, integrating omics data from multiple modalities is challenging because each measurement has its own characteristics. To address this problem, we propose a contrastive and generative deep self-expression model, called single-cell multimodal self-expressive integration (scMSI), which integrates heterogeneous multimodal data into a unified manifold space. Specifically, scMSI first learns an omics-specific latent representation and self-expression relationship for each modality using a deep self-expressive generative model, which accounts for the characteristics of the different omics data. Then, scMSI combines these omics-specific self-expression relations through contrastive learning. In this way, scMSI provides a paradigm for integrating multiple omics data even when they are only weakly related, combining representation learning and data integration in a unified framework. We demonstrate that scMSI provides a cohesive solution for a variety of analysis tasks, such as integration analysis, data denoising, batch correction and spatial domain detection. We applied scMSI to various single-cell and spatial multimodal datasets to validate its effectiveness and robustness across diverse data types and application scenarios.
Affiliation(s)
- Chengming Zhang
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
  - International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Yiwen Yang
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shijie Tang
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Kazuyuki Aihara
  - International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Chuanchao Zhang
  - Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
  - Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
  - Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
9
Zhang W, Lin Z. iPoLNG-An unsupervised model for the integrative analysis of single-cell multiomics data. Front Genet 2023; 14:998504. [PMID: 36865385 PMCID: PMC9972291 DOI: 10.3389/fgene.2023.998504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2022] [Accepted: 01/24/2023] [Indexed: 02/09/2023] Open
Abstract
Single-cell multiomics technologies, where the transcriptomic and epigenomic profiles are simultaneously measured in the same set of single cells, pose significant challenges for effective integrative analysis. Here, we propose an unsupervised generative model, iPoLNG, for the effective and scalable integration of single-cell multiomics data. iPoLNG models the discrete counts in single-cell multiomics data with latent factors and reconstructs low-dimensional representations of the cells and features using computationally efficient stochastic variational inference. The low-dimensional representation of cells enables the identification of distinct cell types, and the feature-by-factor loading matrices help characterize cell-type-specific markers and provide rich biological insights through functional pathway enrichment analysis. iPoLNG can also handle the partial-information setting, in which a modality is missing for some of the cells. Taking advantage of GPUs and probabilistic programming, iPoLNG scales to large datasets and takes less than 15 min to run on datasets with 20,000 cells.
Affiliation(s)
- Wenyu Zhang
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
10
Zeng P, Ma Y, Lin Z. scAWMV: an adaptively weighted multi-view learning framework for the integrative analysis of parallel scRNA-seq and scATAC-seq data. Bioinformatics 2022; 39:6831091. [PMID: 36383176 PMCID: PMC9805575 DOI: 10.1093/bioinformatics/btac739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2022] [Revised: 10/16/2022] [Accepted: 11/15/2022] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION Technological advances have enabled us to profile single-cell multi-omics data from the same cells, providing an unprecedented opportunity to understand the cellular phenotype and its links to genotype. The number of available protocols and multi-omics datasets [including parallel single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data profiled from the same cell] is growing. However, such data are highly sparse and tend to have a high level of noise, making data analysis challenging. Methods that integrate multi-omics data can potentially improve the capacity to reveal cellular heterogeneity. RESULTS We propose an adaptively weighted multi-view learning (scAWMV) method for the integrative analysis of parallel scRNA-seq and scATAC-seq data profiled from the same cell. scAWMV considers both the difference in importance across modalities in multi-omics data and the biological connection between the features in the scRNA-seq and scATAC-seq data. It generates biologically meaningful low-dimensional representations of the transcriptomic and epigenomic profiles via unsupervised learning. Application to four real datasets demonstrates that scAWMV is an efficient method for dissecting cellular heterogeneity in single-cell multi-omics data. AVAILABILITY AND IMPLEMENTATION The software and datasets are available at https://github.com/pengchengzeng/scAWMV. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengcheng Zeng
  - Institute of Mathematical Sciences, ShanghaiTech University, Shanghai 201210, China
- Yuanyuan Ma
  - School of Computer and Information Engineering, Anyang Normal University, Henan 455000, China