1
Wu Y, Shi Z, Zhou X, Zhang P, Yang X, Ding J, Wu H. scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information. Commun Biol 2024; 7:923. [PMID: 39085477 PMCID: PMC11291681 DOI: 10.1038/s42003-024-06626-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 07/24/2024] [Indexed: 08/02/2024] Open
Abstract
The emergence of single-cell Hi-C (scHi-C) technology has provided unprecedented opportunities for investigating the intricate relationship between cell cycle phases and the three-dimensional (3D) structure of chromatin. However, accurately predicting cell cycle phases based on scHi-C data remains a formidable challenge. Here, we present scHiCyclePred, a prediction model that integrates multiple feature sets to leverage scHi-C data for predicting cell cycle phases. scHiCyclePred extracts 3D chromatin structure features by incorporating multi-scale interaction information. The comparative analysis illustrates that scHiCyclePred surpasses existing methods such as Nagano_method and CIRCLET across various metrics including accuracy (ACC), F1 score, Precision, Recall, and balanced accuracy (BACC). In addition, we evaluate scHiCyclePred against the previously published CIRCLET using the dataset of complex tissues (Liu_dataset). Experimental results reveal significant improvements with scHiCyclePred exhibiting improvements of 0.39, 0.52, 0.52, and 0.39 over the CIRCLET in terms of ACC, F1 score, Precision, and Recall metrics, respectively. Furthermore, we conduct analyses on three-dimensional chromatin dynamics and gene features during the cell cycle, providing a more comprehensive understanding of cell cycle dynamics through chromatin structure. scHiCyclePred not only offers insights into cell biology but also holds promise for catalyzing breakthroughs in disease research. Access scHiCyclePred on GitHub at https://github.com/HaoWuLab-Bioinformatics/scHiCyclePred.
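The evaluation metrics named in this abstract (ACC, F1 score, Precision, Recall, BACC) can be illustrated with a minimal sketch. The abstract does not say how the per-class scores are averaged, so macro-averaging over the classes present in the true labels is an assumption here, and the function name `macro_metrics` and the phase labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, F1 and balanced accuracy.

    Averages run over the classes present in y_true (an assumption of this
    sketch); y_true must be non-empty.
    """
    labels = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative

    def ratio(num, den):
        return num / den if den else 0.0

    precision = [ratio(tp[c], tp[c] + fp[c]) for c in labels]
    recall = [ratio(tp[c], tp[c] + fn[c]) for c in labels]
    f1 = [ratio(2 * p * r, p + r) for p, r in zip(precision, recall)]
    n = len(labels)
    return {
        "ACC": ratio(sum(tp.values()), len(y_true)),
        "Precision": sum(precision) / n,
        "Recall": sum(recall) / n,
        "F1": sum(f1) / n,
        # Balanced accuracy is the mean of per-class recall.
        "BACC": sum(recall) / n,
    }


# Illustrative labels only; not taken from any dataset in the paper.
y_true = ["G1", "G1", "S", "S", "G2", "G2"]
y_pred = ["G1", "S", "S", "S", "G2", "G1"]
scores = macro_metrics(y_true, y_pred)
```

Under macro-averaging, Recall and BACC coincide by definition; they would differ under micro or weighted averaging.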
Affiliation(s)
- Yingfu Wu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Zhenqi Shi
  - School of Software, Shandong University, Jinan, Shandong, China
- Xiangfei Zhou
  - School of Software, Shandong University, Jinan, Shandong, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Xiuhui Yang
  - School of Software, Shandong University, Jinan, Shandong, China
- Jun Ding
  - Department of Medicine, Meakins-Christie Laboratories, McGill University, Montreal, QC, Canada
- Hao Wu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China
2
Wang Y, Kong X, Bi X, Cui L, Yu H, Wu H. ResDeepSurv: A Survival Model for Deep Neural Networks Based on Residual Blocks and Self-attention Mechanism. Interdiscip Sci 2024; 16:405-417. [PMID: 38489147 DOI: 10.1007/s12539-024-00617-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 01/30/2024] [Accepted: 02/01/2024] [Indexed: 03/17/2024]
Abstract
Survival analysis, as a widely used method for analyzing and predicting the timing of event occurrence, plays a crucial role in the medicine field. Medical professionals utilize survival models to gain insight into the effects of patient covariates on the disease, and the correlation with the effectiveness of different treatment strategies. This knowledge is essential for the development of treatment plans and the enhancement of treatment approaches. Conventional survival models, such as the Cox proportional hazards model, require a significant amount of feature engineering or prior knowledge to facilitate personalized modeling. To address these limitations, we propose a novel residual-based self-attention deep neural network for survival modeling, called ResDeepSurv, which combines the benefits of neural networks and the Cox proportional hazards regression model. The model proposed in our study simulates the distribution of survival time and the correlation between covariates and outcomes, but does not impose strict assumptions on the basic distribution of survival data. This approach effectively accounts for both linear and nonlinear risk functions in survival data analysis. The performance of our model in analyzing survival data with various risk functions is on par with or even superior to that of other existing survival analysis methods. Furthermore, we validate the superior performance of our model in comparison to currently existing methods by evaluating multiple publicly available clinical datasets. Through this study, we prove the effectiveness of our proposed model in survival analysis, providing a promising alternative to traditional approaches. The application of deep learning techniques and the ability to capture complex relationships between covariates and survival outcomes without relying on extensive feature engineering make our model a valuable tool for personalized medicine and decision-making in clinical practice.
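One common way to combine a neural network with the Cox proportional hazards model is to train the network's output (a risk score per patient) against the negative Cox partial log-likelihood. The abstract does not state the exact loss used by ResDeepSurv, so the following is only a sketch of that general idea; the function name is hypothetical and tied event times are not given any special treatment.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: model outputs h(x_i), higher means higher predicted hazard
    times:       observed follow-up times
    events:      1 if the event was observed, 0 if the subject was censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects only contribute through risk sets
        # Risk set: every subject still under observation at time t_i.
        denominator = sum(
            math.exp(h) for h, t in zip(risk_scores, times) if t >= t_i
        )
        total += risk_scores[i] - math.log(denominator)
        n_events += 1
    return -total / n_events if n_events else 0.0


# Toy data: three subjects, all with observed events at times 1, 2, 3.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
loss_concordant = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
loss_discordant = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

Scores that rank earlier events as higher risk give a lower loss than scores that rank them the other way, which is the signal a network would be trained on.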
Affiliation(s)
- Yuchen Wang
  - School of Software, Shandong University, Jinan, 250101, China
- Xianchun Kong
  - Department of Pediatric Surgery, Heze Municipal Hospital, Heze, 274000, China
- Xiao Bi
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Lizhen Cui
  - School of Software, Shandong University, Jinan, 250101, China
- Hong Yu
  - School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Hao Wu
  - School of Software, Shandong University, Jinan, 250101, China
3
Zhang H, Wang Y, Lian B, Wang Y, Li X, Wang T, Shang X, Yang H, Aziz A, Hu J. Scbean: a python library for single-cell multi-omics data analysis. Bioinformatics 2024; 40:btae053. [PMID: 38290765 PMCID: PMC10868338 DOI: 10.1093/bioinformatics/btae053] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/10/2023] [Revised: 01/10/2024] [Accepted: 01/25/2024] [Indexed: 02/01/2024] Open
Abstract
SUMMARY: Single-cell multi-omics technologies provide a unique platform for characterizing cell states and reconstructing developmental process by simultaneously quantifying and integrating molecular signatures across various modalities, including genome, transcriptome, epigenome, and other omics layers. However, there is still an urgent unmet need for novel computational tools in this nascent field, which are critical for both effective and efficient interrogation of functionality across different omics modalities. Scbean represents a user-friendly Python library, designed to seamlessly incorporate a diverse array of models for the examination of single-cell data, encompassing both paired and unpaired multi-omics data. The library offers uniform and straightforward interfaces for tasks, such as dimensionality reduction, batch effect elimination, cell label transfer from well-annotated scRNA-seq data to scATAC-seq data, and the identification of spatially variable genes. Moreover, Scbean's models are engineered to harness the computational power of GPU acceleration through TensorFlow, rendering them capable of effortlessly handling datasets comprising millions of cells.

AVAILABILITY AND IMPLEMENTATION: Scbean is released on the Python Package Index (PyPI) (https://pypi.org/project/scbean/) and GitHub (https://github.com/jhu99/scbean) under the MIT license. The documentation and example code can be found at https://scbean.readthedocs.io/en/latest/.
Affiliation(s)
- Haohui Zhang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Yuwei Wang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Bin Lian
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Yiran Wang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Xingyi Li
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
- Hui Yang
  - School of Life Science, Northwestern Polytechnical University, 710072 Xi'an, Shaanxi, China
- Ahmad Aziz
  - Population Health Sciences, German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany
  - Department of Neurology, Faculty of Medicine, University of Bonn, 53105 Bonn, Germany
- Jialu Hu
  - School of Computer Science, Northwestern Polytechnical University, 710129 Xi'an, Shaanxi, China
  - Population Health Sciences, German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany
4
Wu H, Zhou B, Zhou H, Zhang P, Wang M. Be-1DCNN: a neural network model for chromatin loop prediction based on bagging ensemble learning. Brief Funct Genomics 2023; 22:475-484. [PMID: 37133976 DOI: 10.1093/bfgp/elad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 03/10/2023] [Accepted: 03/29/2023] [Indexed: 05/04/2023] Open
Abstract
The chromatin loops in the three-dimensional (3D) structure of chromosomes are essential for the regulation of gene expression. Despite the fact that high-throughput chromatin capture techniques can identify the 3D structure of chromosomes, chromatin loop detection utilizing biological experiments is arduous and time-consuming. Therefore, a computational method is required to detect chromatin loops. Deep neural networks can form complex representations of Hi-C data and provide the possibility of processing biological datasets. Therefore, we propose a bagging ensemble one-dimensional convolutional neural network (Be-1DCNN) to detect chromatin loops from genome-wide Hi-C maps. First, to obtain accurate and reliable chromatin loops in genome-wide contact maps, the bagging ensemble learning method is utilized to synthesize the prediction results of multiple 1DCNN models. Second, each 1DCNN model consists of three 1D convolutional layers for extracting high-dimensional features from input samples and one dense layer for producing the prediction results. Finally, the prediction results of Be-1DCNN are compared to those of the existing models. The experimental results indicate that Be-1DCNN predicts high-quality chromatin loops and outperforms the state-of-the-art methods using the same evaluation metrics. The source code of Be-1DCNN is available for free at https://github.com/HaoWuLab-Bioinformatics/Be1DCNN.
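The bagging step described in this abstract (train several models on bootstrap resamples, then combine their predictions) can be sketched as follows. This is a toy illustration under stated assumptions: the base learner `train_stub` is a hypothetical one-feature threshold rule standing in for a 1D CNN, and majority voting is assumed as the way predictions are combined, which the abstract does not specify.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stub(sample):
    """Hypothetical stand-in for one base model (NOT a 1D CNN).

    Learns a threshold on a single feature, assuming positives score higher.
    """
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if pos and neg:
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    else:
        # A bootstrap may miss one class; fall back to the sample mean.
        threshold = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x > threshold else 0


def bagging_fit(data, n_models=11, seed=0):
    """Train n_models base learners, each on its own bootstrap resample."""
    rng = random.Random(seed)
    return [train_stub(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = sum(model(x) for model in models)
    return 1 if votes * 2 > len(models) else 0


# Toy (feature, label) pairs; label 1 plays the role of "loop".
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data)
```

An odd number of models avoids tied votes in the binary case; averaging predicted probabilities would be an equally plausible way to combine the models.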
Affiliation(s)
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
  - School of Software, Shandong University, Jinan, 250101 Shandong, China
- Bing Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Haoru Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Meili Wang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
5
Hu M, Zhu J, Peng G, Lu W, Wang H, Xie Z. IMOVNN: incomplete multi-omics data integration variational neural networks for gut microbiome disease prediction and biomarker identification. Brief Bioinform 2023; 24:bbad394. [PMID: 37930027 DOI: 10.1093/bib/bbad394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 09/03/2023] [Accepted: 10/14/2023] [Indexed: 11/07/2023] Open
Abstract
The gut microbiome has been regarded as one of the fundamental determinants regulating human health, and multi-omics data profiling has been increasingly utilized to bolster the deep understanding of this complex system. However, stemming from cost or other constraints, the integration of multi-omics often suffers from incomplete views, which poses a great challenge for the comprehensive analysis. In this work, a novel deep model named Incomplete Multi-Omics Variational Neural Networks (IMOVNN) is proposed for incomplete data integration, disease prediction application and biomarker identification. Benefiting from the information bottleneck and the marginal-to-joint distribution integration mechanism, the IMOVNN can learn the marginal latent representation of each individual omics and the joint latent representation for better disease prediction. Moreover, owing to the feature-selective layer predicated upon the concrete distribution, the model is interpretable and can identify the most relevant features. Experiments on inflammatory bowel disease multi-omics datasets demonstrate that our method outperforms several state-of-the-art methods for disease prediction. In addition, IMOVNN has identified significant biomarkers from multi-omics data sources.
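The concrete distribution mentioned in this abstract is a continuous relaxation of a one-hot categorical choice, also known as Gumbel-softmax. A minimal sampling sketch is shown below; how IMOVNN wires this into its feature-selective layer is not described in the abstract, so the function name, logits, and temperature are hypothetical and for illustration only.

```python
import math
import random


def sample_concrete(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a Concrete (Gumbel-softmax) distribution."""
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # guard against log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        perturbed.append((logit + gumbel) / temperature)
    top = max(perturbed)                  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over four candidate features.
rng = random.Random(0)
weights = sample_concrete([2.0, 0.5, 0.1, -1.0], temperature=0.5, rng=rng)
```

Lower temperatures push each sample closer to a one-hot vector, so the layer effectively picks a single feature, while the sample stays differentiable with respect to the logits.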
Affiliation(s)
- Mingyi Hu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
- Jinlin Zhu
  - School of Food Science and Technology, Jiangnan University, Wuxi, China
- Wenwei Lu
  - School of Food Science and Technology, Jiangnan University, Wuxi, China
- Hongchao Wang
  - School of Food Science and Technology, Jiangnan University, Wuxi, China
- Zhenping Xie
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
6
Syama K, Jothi JAA, Khanna N. Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE. BMC Bioinformatics 2023; 24:126. [PMID: 37003965 PMCID: PMC10067187 DOI: 10.1186/s12859-023-05251-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/23/2023] [Indexed: 04/03/2023] Open
Abstract
BACKGROUND: The human microbiome plays a critical role in maintaining human health. Due to the recent advances in high-throughput sequencing technologies, the microbiome profiles present in the human body have become publicly available. Hence, many works have been done to analyze human microbiome profiles. These works have identified that different microbiome profiles are present in healthy and sick individuals for different diseases. Recently, several computational methods have utilized the microbiome profiles to automatically diagnose and classify the host phenotype.

RESULTS: In this work, a novel deep learning framework based on boosting GraphSAGE is proposed for automatic prediction of diseases from metagenomic data. The proposed framework has two main components: (a) a Metagenomic Disease graph (MD-graph) construction module and (b) a Disease Prediction Network (DP-Net) module. The graph construction module constructs a graph by considering each metagenomic sample as a node in the graph. The graph captures the relationship between the samples using a proximity measure. The DP-Net consists of a boosting GraphSAGE model which predicts the status of a sample as sick or healthy. The effectiveness of the proposed method is verified using real and synthetic datasets corresponding to diseases like inflammatory bowel disease and colorectal cancer. The proposed model achieved a highest AUC of 93%, Accuracy of 95%, F1-score of 95%, AUPRC of 95% for the real inflammatory bowel disease dataset and a best AUC of 90%, Accuracy of 91%, F1-score of 87% and AUPRC of 93% for the real colorectal cancer dataset.

CONCLUSION: The proposed framework outperforms other machine learning and deep learning models in terms of classification accuracy, AUC, F1-score and AUPRC for both synthetic and real metagenomic data.
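The two ideas in this abstract, a graph whose nodes are samples linked by proximity and a GraphSAGE model on top of it, can be sketched in a few lines. The abstract does not name the proximity measure, so a k-nearest-neighbour graph under Euclidean distance is an assumption here. The aggregation step shows only the neighbour-mean-and-concatenate part of a GraphSAGE layer; the learned weights, nonlinearity, and boosting are omitted, and all names are hypothetical.

```python
import math


def knn_graph(samples, k):
    """Link each sample to its k nearest samples by Euclidean distance (k >= 1)."""
    neighbors = {}
    for i, a in enumerate(samples):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(samples) if j != i
        )
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


def mean_aggregate(samples, neighbors):
    """Concatenate each node's features with the mean of its neighbours' features.

    In a full GraphSAGE layer this vector would then pass through a learned
    weight matrix and a nonlinearity; both are left out of this sketch.
    """
    out = []
    for i, feats in enumerate(samples):
        nbrs = [samples[j] for j in neighbors[i]]
        mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        out.append(list(feats) + mean)
    return out


# Toy abundance profiles: two tight pairs of samples.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph = knn_graph(samples, k=1)
embeddings = mean_aggregate(samples, graph)
```

Each output vector is twice as long as the input because the node's own features are kept alongside the neighbourhood summary, which is what lets a downstream classifier use both.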
Affiliation(s)
- K Syama
  - Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai International Academic City, Dubai, UAE
- J Angel Arul Jothi
  - Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai International Academic City, Dubai, UAE