1
Gao R, Ferraro TN, Chen L, Zhang S, Chen Y. Enhancing Single-Cell and Bulk Hi-C Data Using a Generative Transformer Model. BIOLOGY 2025; 14:288. [PMID: 40136544 PMCID: PMC11940666 DOI: 10.3390/biology14030288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2025] [Revised: 03/01/2025] [Accepted: 03/10/2025] [Indexed: 03/27/2025]
Abstract
The 3D organization of chromatin in the nucleus plays a critical role in regulating gene expression and maintaining cellular functions in eukaryotic cells. High-throughput chromosome conformation capture (Hi-C) and its derivative technologies have been developed to map genome-wide chromatin interactions at the population and single-cell levels. However, insufficient sequencing depth and high noise levels in bulk Hi-C data, particularly in single-cell Hi-C (scHi-C) data, result in low-resolution contact matrices, thereby limiting diverse downstream computational analyses in identifying complex chromosomal organizations. To address these challenges, we developed a transformer-based deep learning model, HiCENT, to impute and enhance both scHi-C and Hi-C contact matrices. Validation experiments on large-scale bulk Hi-C and scHi-C datasets demonstrated that HiCENT achieves superior enhancement effects compared to five popular methods. When applied to real Hi-C data from the GM12878 cell line, HiCENT effectively enhanced 3D structural features at the scales of topologically associated domains and chromosomal loops. Furthermore, when applied to scHi-C data from five human cell lines, it significantly improved clustering performance, outperforming five widely used methods. The adaptability of HiCENT across different datasets and its capacity to improve the quality of chromatin interaction data will facilitate diverse downstream computational analyses in 3D genome research, single-cell studies and other large-scale omics investigations.
Affiliation(s)
- Ruoying Gao
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Thomas N. Ferraro
  - Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA
- Liang Chen
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
2
Dautle MA, Chen Y. Single-Cell Hi-C Technologies and Computational Data Analysis. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2025; 12:e2412232. [PMID: 39887949 PMCID: PMC11884588 DOI: 10.1002/advs.202412232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 01/14/2025] [Indexed: 02/01/2025]
Abstract
Single-cell chromatin conformation capture (scHi-C) techniques have evolved to provide significant insights into the structural organization and regulatory mechanisms in individual cells. Although many scHi-C protocols have been developed, they often involve intricate procedures and the resulting data are sparse, leading to computational challenges for systematic data analysis and limited applicability. This review provides a comprehensive overview, a quantitative evaluation of thirteen protocols, and practical guidance on computational topics. The efficiency of these protocols is first assessed based on the total number of contacts recovered per cell and the cis/trans ratio. Systematic considerations for scHi-C quality control and data imputation are then provided. Additionally, the capabilities and implementations of various analysis methods, covering cell clustering, A/B compartment calling, topologically associating domain (TAD) calling, loop calling, 3D reconstruction, scHi-C data simulation, and differential interaction analysis, are summarized. Finally, key computational challenges associated with the specific complexities of scHi-C data are highlighted and potential solutions are proposed.
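The two efficiency measures named in this abstract, contacts recovered per cell and the cis/trans ratio, can be illustrated with a minimal sketch. It assumes each contact is given as a `(chrom1, pos1, chrom2, pos2)` tuple, treats "cis" as same-chromosome and "trans" as different-chromosome, and uses a hypothetical helper name `contact_stats`; none of this is taken from the review itself, and the toy coordinates are made up.

```python
from collections import Counter
from typing import Iterable, Tuple

# A contact is a pair of loci; only the chromosome names matter for these statistics.
Contact = Tuple[str, int, str, int]  # (chrom1, pos1, chrom2, pos2)


def contact_stats(contacts: Iterable[Contact]) -> dict:
    """Count total, cis (same-chromosome) and trans (different-chromosome) contacts."""
    counts = Counter()
    for chrom1, _pos1, chrom2, _pos2 in contacts:
        counts["cis" if chrom1 == chrom2 else "trans"] += 1
    cis, trans = counts["cis"], counts["trans"]
    return {
        "total": cis + trans,
        "cis": cis,
        "trans": trans,
        # Undefined without trans contacts; report None rather than divide by zero.
        "cis_trans_ratio": cis / trans if trans else None,
    }


# Toy cell with hypothetical coordinates (not real data).
toy_cell = [
    ("chr1", 10_000, "chr1", 250_000),
    ("chr1", 40_000, "chr1", 90_000),
    ("chr2", 5_000, "chr2", 700_000),
    ("chr1", 30_000, "chr3", 12_000),
]
stats = contact_stats(toy_cell)
print(stats)
```

A higher cis/trans ratio is commonly read as a sign of cleaner proximity ligation, so a per-cell table of these two numbers is a natural first quality-control summary.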
Affiliation(s)
- Madison A Dautle
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
3
Menon R, Mohit Chowdhury H, Oluwadare O. ScHiCAtt: Enhancing single-cell Hi-C data resolution using attention-based models. Comput Struct Biotechnol J 2025; 27:978-991. [PMID: 40160860 PMCID: PMC11953966 DOI: 10.1016/j.csbj.2025.02.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Revised: 02/24/2025] [Accepted: 02/25/2025] [Indexed: 04/02/2025] Open
Abstract
The spatial organization of chromatin is fundamental to gene regulation and essential for proper cellular function. The Hi-C technique remains one of the leading methods for unraveling 3D genome structures; however, limited resolution, data sparsity, and incomplete coverage in single-cell Hi-C data pose significant challenges for comprehensive analysis. Traditional convolutional neural network-based models often suffer from blurring and loss of fine details, while generative adversarial network-based methods encounter difficulties in maintaining diversity and generalization. Moreover, existing algorithms perform poorly in cross-cell line generalization, where a model trained on one cell type is used to enhance high-resolution data in another cell type. To address these limitations, we propose ScHiCAtt (Single-cell Hi-C Attention-Based Model), which leverages attention mechanisms to capture both long-range and local dependencies in Hi-C data, significantly enhancing resolution while preserving biologically meaningful interactions. By dynamically focusing on regions of interest, attention mechanisms effectively mitigate data sparsity and enhance model performance in low-resolution contexts. Extensive experiments on human and Drosophila single-cell Hi-C data demonstrate that ScHiCAtt consistently outperforms existing methods in terms of computational and biological reproducibility metrics across various downsampling ratios. Our results also show superior generalization across different chromosomes of the same cell type, as well as across cell types, species, and from single-cell to bulk Hi-C data, highlighting the robustness and adaptability of our approach. ScHiCAtt source code is publicly available at https://github.com/OluwadareLab/ScHiCAtt.
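The abstract evaluates models "across various downsampling ratios" without saying how the low-coverage inputs are produced. One common way to simulate them is binomial thinning of a contact matrix, sketched below. This is an assumption for illustration, not the ScHiCAtt procedure: the function name `downsample`, the `ratio` parameter, and the toy matrix are all hypothetical.

```python
import random
from typing import List

Matrix = List[List[int]]


def downsample(matrix: Matrix, ratio: int, seed: int = 0) -> Matrix:
    """Keep each contact independently with probability 1/ratio (binomial thinning).

    Only the upper triangle is sampled; the result is mirrored so it stays symmetric.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    rng = random.Random(seed)
    keep_prob = 1.0 / ratio
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # One Bernoulli draw per observed contact in bin pair (i, j).
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < keep_prob)
            out[i][j] = kept
            out[j][i] = kept
    return out


# Hypothetical 3-bin symmetric contact matrix (toy counts, not real data).
high_cov = [
    [40, 12, 3],
    [12, 50, 9],
    [3, 9, 30],
]
low_cov = downsample(high_cov, ratio=4)
print(low_cov)
```

Pairs of `(low_cov, high_cov)` matrices built this way are the usual supervised training signal for resolution-enhancement models: the network sees the thinned matrix and is asked to recover the original.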
Affiliation(s)
- Rohit Menon
  - Department of Computer Science, University of Colorado at Colorado Springs, Colorado Springs, 80918, CO, USA
- H.M.A. Mohit Chowdhury
  - Department of Computer Science, University of Colorado at Colorado Springs, Colorado Springs, 80918, CO, USA
- Oluwatosin Oluwadare
  - Department of Computer Science, University of Colorado at Colorado Springs, Colorado Springs, 80918, CO, USA
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
4
Kang B, Lee H, Roh TY. Deciphering single-cell genomic architecture: insights into cellular heterogeneity and regulatory dynamics. Genomics Inform 2025; 23:5. [PMID: 39934929 DOI: 10.1186/s44342-025-00037-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Accepted: 01/19/2025] [Indexed: 02/13/2025] Open
Abstract
BACKGROUND: The genomic architecture of eukaryotes exhibits dynamic spatial and temporal changes, enabling cellular processes critical for maintaining viability and functional diversity. Recent advances in sequencing technologies have facilitated the dissection of genomic architecture and functional activity at single-cell resolution, moving beyond the averaged signals typically derived from bulk cell analyses. MAIN BODY: The advent of single-cell genomics and epigenomics has yielded transformative insights into cellular heterogeneity, behavior, and biological complexity with unparalleled genomic resolution and reproducibility. This review summarizes recent progress in the characterization of genomic architecture at the single-cell level, emphasizing the impact of structural variation and chromatin organization on gene regulatory networks and cellular identity. CONCLUSION: Future directions in single-cell genomics and high-resolution genomic and epigenomic methodologies are explored, focusing on emerging challenges and potential impacts on the understanding of cellular states, regulatory dynamics, and the intricate mechanisms driving cellular function and diversity.
Affiliation(s)
- Byunghee Kang
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- Hyeonji Lee
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- Tae-Young Roh
  - Department of Life Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea
5
Wang Y, Cheng J. HiCDiff: single-cell Hi-C data denoising with diffusion models. Brief Bioinform 2024; 25:bbae279. [PMID: 38856167 PMCID: PMC11163381 DOI: 10.1093/bib/bbae279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/21/2024] [Accepted: 05/29/2024] [Indexed: 06/11/2024] Open
Abstract
The genome-wide single-cell chromosome conformation capture technique, i.e. single-cell Hi-C (scHi-C), was recently developed to interrogate the conformation of the genome of individual cells. However, single-cell Hi-C data are much sparser than bulk Hi-C data of a population of cells, and noise in single-cell Hi-C makes it difficult to apply and analyze them in biological research. Here, we developed the first generative diffusion models (HiCDiff) to denoise single-cell Hi-C data in the form of chromosomal contact matrices. HiCDiff uses a deep residual network to remove the noise in the reverse process of diffusion and can be trained in both unsupervised and supervised learning modes. Benchmarked on several single-cell Hi-C test datasets, the diffusion models substantially remove the noise in single-cell Hi-C data. The unsupervised HiCDiff outperforms most supervised non-diffusion deep learning methods and achieves performance comparable to the state-of-the-art supervised deep learning method in terms of multiple metrics, demonstrating that diffusion models are a useful approach to denoising single-cell Hi-C data. Moreover, its good performance holds on denoising bulk Hi-C data.
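For readers unfamiliar with diffusion models, the forward (noising) half of the process that the reverse network learns to undo can be sketched as follows. The sketch assumes the standard denoising diffusion probabilistic model formulation, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with a linear beta schedule. The schedule values, function names, and toy matrix are illustrative assumptions, not details from HiCDiff, and the residual network used for the reverse process is not shown.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def alpha_bar_schedule(num_steps: int, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (an assumed choice)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0: Matrix, alpha_bar_t: float, rng: random.Random) -> Matrix:
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [[signal * v + noise * rng.gauss(0.0, 1.0) for v in row] for row in x0]


# Hypothetical normalized 2x2 contact patch (toy values, not real data).
x0 = [[1.0, 0.2], [0.2, 1.0]]
alpha_bars = alpha_bar_schedule(num_steps=1000)
rng = random.Random(0)
x_mid = forward_noise(x0, alpha_bars[499], rng)  # partially noised
x_end = forward_noise(x0, alpha_bars[-1], rng)   # close to pure Gaussian noise
print(alpha_bars[0], alpha_bars[-1])
```

In this framing, a denoiser is trained to predict the added noise (or the clean matrix) from `x_t` and the step index, and denoising a real matrix amounts to running the learned reverse steps.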
Affiliation(s)
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
6
Guo Z, Liu J, Wang Y, Chen M, Wang D, Xu D, Cheng J. Diffusion models in bioinformatics and computational biology. NATURE REVIEWS BIOENGINEERING 2024; 2:136-154. [PMID: 38576453 PMCID: PMC10994218 DOI: 10.1038/s44222-023-00114-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 08/25/2023] [Indexed: 04/06/2024]
Abstract
Denoising diffusion models embody a type of generative artificial intelligence that can be applied in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks and score stochastic differential equations). We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein-ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.
Affiliation(s)
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Mengrui Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Duolin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO, USA