1
Xie Q, Meng W, Lin S. scHiCSRS: a self-representation smoothing method with Gaussian mixture model for imputing single cell Hi-C data. BMC Bioinformatics 2025; 26:132. [PMID: 40399810 DOI: 10.1186/s12859-025-06147-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 04/23/2025] [Indexed: 05/23/2025] Open
Abstract
BACKGROUND Single cell Hi-C (scHi-C) techniques make it possible to study cell-to-cell variability, but an excess of zeros makes scHi-C matrices extremely sparse and difficult to use in downstream analyses. The observed zeros arise from two kinds of events: structural zeros, where two loci never interact because of underlying biological mechanisms, and dropouts (sampling zeros), where two loci interact but the interaction is not captured because of insufficient sequencing depth. Although data quality improvement approaches have been proposed, little has been done to differentiate these two types of zeros, even though such a distinction can greatly benefit downstream analyses such as clustering. RESULTS We propose scHiCSRS, a self-representation smoothing method that improves data quality, and a Gaussian mixture model that identifies structural zeros among observed zeros. scHiCSRS not only takes spatial dependencies of a scHi-C data matrix into account but also borrows information from similar single cells. Through an extensive set of simulation studies, we demonstrate the ability of scHiCSRS to identify structural zeros with high sensitivity and to accurately impute dropout values at sampling zeros. Downstream analyses of three experimental datasets show that data improved by scHiCSRS yield more accurate clustering of cells than the observed data alone or data improved by comparison methods. CONCLUSION In summary, scHiCSRS provides a valuable tool for identifying structural zeros and imputing dropouts. The resulting data are better suited to downstream analysis, especially for understanding cell-to-cell variation through subtype clustering.
Affiliation(s)
- Qing Xie
  - Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH, 43210, USA
- Wang Meng
  - Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, 43205, USA
- Shili Lin
  - Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH, 43210, USA
  - Department of Statistics, The Ohio State University, Columbus, OH, 43210, USA
2
Lee DI, Roy S. Examining the dynamics of three-dimensional genome organization with multitask matrix factorization. Genome Res 2025; 35:1179-1193. [PMID: 40113262 PMCID: PMC12047540 DOI: 10.1101/gr.279930.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Accepted: 02/20/2025] [Indexed: 03/22/2025]
Abstract
Three-dimensional (3D) genome organization, which determines how the DNA is packaged inside the nucleus, has emerged as a key component of the gene regulation machinery. High-throughput chromosome conformation data sets, such as Hi-C, have become available across multiple conditions and time points, offering a unique opportunity to examine changes in 3D genome organization and link them to phenotypic changes in normal and disease processes. However, systematic detection of higher-order structural changes across multiple Hi-C data sets remains a major challenge. Existing computational methods either do not model higher-order structural units or cannot model dynamics across more than two conditions of interest. We address these limitations with tree-guided integrated factorization (TGIF), a generalizable multitask nonnegative matrix factorization (NMF) approach that can be applied to time series or hierarchically related biological conditions. TGIF can identify large-scale changes at the compartment or subcompartment levels, as well as local changes at boundaries of topologically associated domains (TADs). Based on benchmarking in simulated and real Hi-C data, TGIF boundaries are more accurate and reproducible across differential levels of noise and sources of technical artifacts, and are more enriched in CTCF. Application to three multisample mammalian data sets shows that TGIF can detect differential regions at compartment, subcompartment, and boundary levels that are associated with significant changes in regulatory signals and gene expression enriched in tissue-specific processes. Finally, we leverage TGIF boundaries to prioritize sequence variants for multiple phenotypes from the NHGRI GWAS catalog. Taken together, TGIF is a flexible tool to examine 3D genome organization dynamics across disease and developmental processes.
Affiliation(s)
- Da-Inn Lee
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53715, USA
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53715, USA
  - Wisconsin Institute for Discovery, Madison, Wisconsin 53715, USA
3
Chai H, Huang X, Xiong G, Huang J, Pels KK, Meng L, Han J, Tang D, Pan G, Deng L, Xiao Q, Wang X, Zhang M, Banecki K, Plewczynski D, Wei CL, Ruan Y. Tri-omic single-cell mapping of the 3D epigenome and transcriptome in whole mouse brains throughout the lifespan. Nat Methods 2025; 22:994-1007. [PMID: 40301621 DOI: 10.1038/s41592-025-02658-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Accepted: 03/13/2025] [Indexed: 05/01/2025]
Abstract
Exploring the genomic basis of transcriptional programs has been a long-standing research focus. Here we report a single-cell method, ChAIR, to map chromatin accessibility, chromatin interactions and RNA expression simultaneously. After validating the method in cultured cells, we applied ChAIR to whole mouse brains and delineated the concerted dynamics of the epigenome, three-dimensional (3D) genome and transcriptome during maturation and aging. In particular, gene-centric chromatin interactions and open chromatin states provided a 3D epigenomic mechanism underlying cell-type-specific transcription and revealed spatially resolved specificity. Importantly, the composition of short-range and ultralong chromatin contacts in individual cells is remarkably correlated with transcriptional activity, open chromatin state and genome folding density. This genomic property, along with associated cellular properties, differs between neurons and non-neuronal cells across different anatomic regions throughout the lifespan, implying divergent nuclear mechano-genomic mechanisms at play in brain cells. Our results demonstrate ChAIR's robustness in revealing single-cell 3D epigenomic states of cell-type-specific transcription in complex tissues.
Affiliation(s)
- Haoxi Chai
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Xingyu Huang
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Guangzhou Xiong
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Jiaxiang Huang
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Katarzyna Karolina Pels
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Lingyun Meng
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Jin Han
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Dongmei Tang
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Guanjing Pan
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Liang Deng
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Qin Xiao
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
- Xiaotao Wang
  - Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Shanghai Key Laboratory of Reproduction and Development, Fudan University, Shanghai, China
- Meng Zhang
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Krzysztof Banecki
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Chia-Lin Wei
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Yijun Ruan
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, China
4
Nguyen M, Wall BPG, Harrell JC, Dozmorov MG. scHiCcompare: An R Package for Differential Analysis of Single-cell Hi-C Data. J Mol Biol 2025:169155. [PMID: 40246224 DOI: 10.1016/j.jmb.2025.169155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2024] [Revised: 02/19/2025] [Accepted: 04/09/2025] [Indexed: 04/19/2025]
Abstract
Changes in the three-dimensional (3D) structure of the human genome are associated with various conditions, such as cancer and developmental disorders. Techniques like chromatin conformation capture (Hi-C) have been developed to study these global 3D structures, typically requiring millions of cells and an extremely high sequencing depth (around 1 billion reads per sample) for bulk Hi-C. In contrast, single-cell Hi-C (scHi-C) captures 3D structures at the individual cell level but faces significant data sparsity, characterized by a high proportion of zeros. scHi-C data enable the identification of cell types with distinct 3D structures; consequently, identifying differential chromatin interactions between such groups may offer insights into cell type-specific regulation. While differential analysis methods exist for bulk Hi-C data, they are limited for scHi-C data. To address this, we developed a method for differential scHi-C analysis, extending the HiCcompare R package. Our approach optionally imputes sparse scHi-C data by considering genomic distances and creates pseudo-bulk Hi-C matrices by summing condition-specific data. The data are normalized using locally estimated scatterplot smoothing (LOESS) regression, and differential chromatin interactions are detected via Gaussian Mixture Model (GMM) clustering. Our workflow outperforms existing methods in identifying differential chromatin interactions across various genomic distances, fold changes, resolutions, and sample sizes in both simulated and experimental contexts. This enables the effective detection of cell type-specific differences in chromatin structure and shows expected associations with biological and epigenetic features. Our method is implemented in the scHiCcompare R package, available at https://bioconductor.org/packages/scHiCcompare.
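The pseudo-bulk step mentioned in this abstract (summing condition-specific single-cell matrices) can be illustrated with a minimal Python sketch. This is not the scHiCcompare R implementation; the function names `pseudo_bulk` and `zero_fraction` and the toy counts are assumptions made purely for illustration.

```python
def pseudo_bulk(cell_matrices):
    """Sum per-cell contact matrices (square lists of lists) element-wise."""
    if not cell_matrices:
        raise ValueError("need at least one cell matrix")
    n = len(cell_matrices[0])
    total = [[0] * n for _ in range(n)]
    for matrix in cell_matrices:
        for i in range(n):
            for j in range(n):
                total[i][j] += matrix[i][j]
    return total


def zero_fraction(matrix):
    """Fraction of entries equal to zero, a simple measure of sparsity."""
    entries = [value for row in matrix for value in row]
    return sum(1 for value in entries if value == 0) / len(entries)


# Three toy 3x3 single-cell contact matrices from one condition (made-up counts).
cells = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 2], [1, 2, 0]],
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
]
bulk = pseudo_bulk(cells)
print(bulk)
print(zero_fraction(cells[0]), zero_fraction(bulk))
```

In this toy case the summed matrix has fewer zero entries than any single cell, which is the motivation for pooling before normalization and differential testing.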
Affiliation(s)
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
- Brydon P G Wall
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA; Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA; Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA
5
Wang H, Yang J, Yu X, Zhang Y, Qian J, Wang J. Tensor-FLAMINGO unravels the complexity of single-cell spatial architectures of genomes at high-resolution. Nat Commun 2025; 16:3435. [PMID: 40210623 PMCID: PMC11986053 DOI: 10.1038/s41467-025-58674-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 03/26/2025] [Indexed: 04/12/2025] Open
Abstract
The dynamic three-dimensional spatial conformations of chromosomes show complex structural variation across single cells, which plays a pivotal role in modulating single-cell-specific transcription and epigenetic landscapes. The high rates of missing contacts in single-cell chromatin contact maps pose significant challenges for reconstructing high-resolution spatial chromatin configurations. We develop a data-driven algorithm, Tensor-FLAMINGO, based on a low-rank tensor completion strategy. Applied to a diverse panel of single-cell chromatin datasets, Tensor-FLAMINGO generates 10kb- and 30kb-resolution spatial chromosomal architectures across individual cells. Tensor-FLAMINGO achieves superior accuracy in reconstructing 3D chromatin structures, recovering missing contacts, and delineating cell clusters. The unprecedented high-resolution characterization of single-cell genome folding enables expanded identification of single-cell-specific long-range chromatin interactions, multi-way spatial hubs, and the mechanisms of disease-associated GWAS variants. Beyond the sparse 2D contact maps, the complete 3D chromatin conformations open an avenue to understanding the dynamics of spatially coordinated molecular processes across different cells.
Affiliation(s)
- Hao Wang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Jiaxin Yang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Xinrui Yu
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Yu Zhang
  - Department of Microbiology, Genetics, and Immunology, Michigan State University, East Lansing, MI, 48824, USA
- Jianliang Qian
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Jianrong Wang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
6
Gao R, Ferraro TN, Chen L, Zhang S, Chen Y. Enhancing Single-Cell and Bulk Hi-C Data Using a Generative Transformer Model. Biology 2025; 14:288. [PMID: 40136544 PMCID: PMC11940666 DOI: 10.3390/biology14030288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2025] [Revised: 03/01/2025] [Accepted: 03/10/2025] [Indexed: 03/27/2025]
Abstract
The 3D organization of chromatin in the nucleus plays a critical role in regulating gene expression and maintaining cellular functions in eukaryotic cells. High-throughput chromosome conformation capture (Hi-C) and its derivative technologies have been developed to map genome-wide chromatin interactions at the population and single-cell levels. However, insufficient sequencing depth and high noise levels in bulk Hi-C data, particularly in single-cell Hi-C (scHi-C) data, result in low-resolution contact matrices, thereby limiting diverse downstream computational analyses in identifying complex chromosomal organizations. To address these challenges, we developed a transformer-based deep learning model, HiCENT, to impute and enhance both scHi-C and Hi-C contact matrices. Validation experiments on large-scale bulk Hi-C and scHi-C datasets demonstrated that HiCENT achieves superior enhancement effects compared to five popular methods. When applied to real Hi-C data from the GM12878 cell line, HiCENT effectively enhanced 3D structural features at the scales of topologically associated domains and chromosomal loops. Furthermore, when applied to scHi-C data from five human cell lines, it significantly improved clustering performance, outperforming five widely used methods. The adaptability of HiCENT across different datasets and its capacity to improve the quality of chromatin interaction data will facilitate diverse downstream computational analyses in 3D genome research, single-cell studies and other large-scale omics investigations.
Affiliation(s)
- Ruoying Gao
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Thomas N. Ferraro
  - Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA
- Liang Chen
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
7
Dautle MA, Chen Y. Single-Cell Hi-C Technologies and Computational Data Analysis. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025; 12:e2412232. [PMID: 39887949 PMCID: PMC11884588 DOI: 10.1002/advs.202412232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 01/14/2025] [Indexed: 02/01/2025]
Abstract
Single-cell chromatin conformation capture (scHi-C) techniques have evolved to provide significant insights into the structural organization and regulatory mechanisms in individual cells. Although many scHi-C protocols have been developed, they often involve intricate procedures and the resulting data are sparse, leading to computational challenges for systematic data analysis and limited applicability. This review provides a comprehensive overview, a quantitative evaluation of thirteen protocols, and practical guidance on computational topics. The efficiency of these protocols is first assessed based on the total number of contacts recovered per cell and the cis/trans ratio. Systematic considerations for scHi-C quality control and data imputation are then provided. Additionally, the capabilities and implementations of various analysis methods are summarized, covering cell clustering, A/B compartment calling, topologically associating domain (TAD) calling, loop calling, 3D reconstruction, scHi-C data simulation and differential interaction analysis. Key computational challenges associated with the specific complexities of scHi-C data are further highlighted, and potential solutions are proposed.
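The two per-cell efficiency metrics named in this abstract, total contacts and the cis/trans ratio, can be sketched in a few lines of Python. The contact tuple layout `(chrom1, pos1, chrom2, pos2)`, the function name `contact_summary`, and the toy contacts are assumptions for illustration; the review's own evaluation pipeline is not described here.

```python
def contact_summary(contacts):
    """Summarize one cell's contacts given (chrom1, pos1, chrom2, pos2) tuples.

    A contact is cis when both ends fall on the same chromosome and trans otherwise.
    """
    total = len(contacts)
    cis = sum(1 for chrom1, _, chrom2, _ in contacts if chrom1 == chrom2)
    trans = total - cis
    # With no trans contacts the ratio is undefined; report infinity in that case.
    ratio = cis / trans if trans else float("inf")
    return {"total": total, "cis": cis, "trans": trans, "cis_trans_ratio": ratio}


# Made-up contacts for a single cell: four intra-chromosomal, one inter-chromosomal.
cell_contacts = [
    ("chr1", 10_000, "chr1", 250_000),
    ("chr1", 40_000, "chr1", 45_000),
    ("chr2", 5_000, "chr2", 900_000),
    ("chr3", 1_000, "chr3", 2_000),
    ("chr1", 70_000, "chr4", 30_000),
]
summary = contact_summary(cell_contacts)
print(summary)
```

Higher values of both metrics are generally read as signs of a better library, since trans contacts are enriched for random ligation events.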
Affiliation(s)
- Madison A Dautle
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Yong Chen
  - Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
8
Kang B, Lee H, Roh TY. Deciphering single-cell genomic architecture: insights into cellular heterogeneity and regulatory dynamics. Genomics Inform 2025; 23:5. [PMID: 39934929 DOI: 10.1186/s44342-025-00037-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Accepted: 01/19/2025] [Indexed: 02/13/2025] Open
Abstract
BACKGROUND The genomic architecture of eukaryotes exhibits dynamic spatial and temporal changes, enabling cellular processes critical for maintaining viability and functional diversity. Recent advances in sequencing technologies have facilitated the dissection of genomic architecture and functional activity at single-cell resolution, moving beyond the averaged signals typically derived from bulk cell analyses. MAIN BODY The advent of single-cell genomics and epigenomics has yielded transformative insights into cellular heterogeneity, behavior, and biological complexity with unparalleled genomic resolution and reproducibility. This review summarizes recent progress in the characterization of genomic architecture at the single-cell level, emphasizing the impact of structural variation and chromatin organization on gene regulatory networks and cellular identity. CONCLUSION Future directions in single-cell genomics and high-resolution epigenomic methodologies are explored, focusing on emerging challenges and potential impacts on the understanding of cellular states, regulatory dynamics, and the intricate mechanisms driving cellular function and diversity. Future perspectives on the challenges and potential implications of single-cell genomics, along with high-resolution genomic and epigenomic technologies for understanding cellular states and regulatory dynamics, are also discussed.
Affiliation(s)
- Byunghee Kang
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- Hyeonji Lee
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- Tae-Young Roh
  - Department of Life Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea
9
Gunsalus LM, Keiser MJ, Pollard KS. ChromaFactor: Deconvolution of single-molecule chromatin organization with non-negative matrix factorization. PLoS Comput Biol 2025; 21:e1012841. [PMID: 39965010 PMCID: PMC11849981 DOI: 10.1371/journal.pcbi.1012841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Revised: 02/24/2025] [Accepted: 02/02/2025] [Indexed: 02/20/2025] Open
Abstract
The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components. ChromaFactor provides the ability to identify trends accounting for the maximum variance in the dataset while simultaneously describing the contribution of individual molecules to each component. Applying our approach to two single-molecule imaging datasets across different genomic scales, we find that these primary components demonstrate significant correlation with key functional phenotypes, including active transcription, enhancer-promoter distance, and genomic compartment. Also, we find that some bulk trends exist at the single-cell level, but only in a small fraction of cells, suggesting that critical changes in genome organization may be driven by specific rare subpopulations rather than occurring uniformly across all cells. ChromaFactor offers a robust tool for understanding the complex interplay between chromatin structure and function on individual DNA molecules, pinpointing which subpopulations drive functional changes and fostering new insights into cellular heterogeneity and its implications for bulk genomic phenomena.
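As a rough illustration of the non-negative matrix factorization idea this abstract relies on, the sketch below factorizes a small non-negative matrix V into W (per-row loadings) and H (components) using the classic multiplicative update rules. It is a generic textbook sketch, not the ChromaFactor implementation; the toy matrix, the function name `nmf`, and all parameter defaults are assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Toy data: each row stands in for one molecule's flattened feature vector (made up).
v = [
    [1, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 3, 3],
    [1, 1, 1, 1],
]
w, h, errors = nmf(v, rank=2)
print(errors[0], errors[-1])
```

Rows of H play the role of shared components, and each row of W describes how strongly one molecule loads on each component, which is the kind of per-molecule contribution the abstract refers to.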
Affiliation(s)
- Laura M. Gunsalus
  - Gladstone Institute of Data Science & Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, California, United States of America
- Michael J. Keiser
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, California, United States of America
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, United States of America
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, California, United States of America
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, California, United States of America
  - Institute for Neurodegenerative Diseases, University of California, San Francisco, California, United States of America
- Katherine S. Pollard
  - Gladstone Institute of Data Science & Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, California, United States of America
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, California, United States of America
  - Investigator Program, Chan Zuckerberg Biohub SF, San Francisco, California, United States of America
10
Li M, Yang Y, Wu R, Gong H, Yuan Z, Wang J, Long E, Zhang X, Chen Y. SEE: A Method for Predicting the Dynamics of Chromatin Conformation Based on Single-Cell Gene Expression. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025; 12:e2406413. [PMID: 39778075 PMCID: PMC11848634 DOI: 10.1002/advs.202406413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 12/01/2024] [Indexed: 01/11/2025]
Abstract
The dynamics of chromatin conformation involve continuous and reversible changes within the nucleus of a cell, which participate in regulating processes such as gene expression, DNA replication, and damage repair. Here, SEE is introduced, an artificial intelligence (AI) method that utilizes autoencoder and transformer techniques to analyze chromatin dynamics using single-cell RNA sequencing data and a limited number of single-cell Hi-C maps. SEE is employed to investigate chromatin dynamics across different scales, enabling the detection of (i) rearrangements in topologically associating domains (TADs), and (ii) oscillations in chromatin interactions at gene loci. Additionally, SEE facilitates the interpretation of disease-associated single-nucleotide polymorphisms (SNPs) by leveraging the dynamic features of chromatin conformation. Overall, SEE offers a single-cell, high-resolution approach to analyzing chromatin dynamics in both developmental and disease contexts.
Affiliation(s)
- Minghong Li
  - State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Yurong Yang
  - State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Rucheng Wu
  - State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Haiyan Gong
  - Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Zan Yuan
  - State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Jixin Wang
  - School of Basic Medicine, Tsinghua University, Beijing 100084, China
- Erping Long
  - The State Key Laboratory of Respiratory Health and Multimorbidity, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Xiaotong Zhang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
  - Shunde Innovation School, University of Science and Technology Beijing, Foshan 528399, China
- Yang Chen
  - State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
11
Wang X, Zhang Y, Ray S, Jha A, Fang T, Hang S, Doulatov S, Noble WS, Wang S. A generalizable Hi-C foundation model for chromatin architecture, single-cell and multi-omics analysis across species. bioRxiv: the preprint server for biology 2024:2024.12.16.628821. [PMID: 39763871 PMCID: PMC11702576 DOI: 10.1101/2024.12.16.628821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
Nuclear DNA is organized into a compact three-dimensional (3D) structure that impacts critical cellular processes. High-throughput chromosome conformation capture (Hi-C) is the most widely used method for measuring 3D genome architecture, while linear epigenomic assays, such as ATAC-seq, DNase-seq, and ChIP-seq, are extensively employed to characterize epigenomic regulation. However, the integrative analysis of chromatin interactions and associated epigenomic regulation remains challenging due to the pairwise nature of Hi-C data, mismatched resolution between Hi-C and epigenomic assays, and inconsistencies among analysis tools. Here we propose HiCFoundation, a Hi-C-based foundation model for integrative analysis linking chromatin structure to downstream regulatory function. HiCFoundation is trained from hundreds of Hi-C assays encompassing 118 million contact matrix submatrices. The model achieves state-of-the-art performance on multiple types of 3D genome analysis, including reproducibility analysis, resolution enhancement, and loop detection. We further demonstrate the model's generalizability through genome architecture analysis of 316 species. Notably, by enhancing low-coverage experimental Hi-C data, HiCFoundation reveals genome-wide loop loss during differentiation of hematopoietic stem and progenitor cells (HSPCs) to neutrophils. Additionally, HiCFoundation is able to predict multiple types of epigenomic activity from Hi-C input and further interprets the link between Hi-C input and epigenomic output to reveal the relationship between chromatin conformation and genome function. Finally, HiCFoundation can analyze single-cell Hi-C data, shedding light on genome structure at single-cell resolution. 
HiCFoundation thus provides a unified, efficient, generalizable, and interpretable foundation for genome architecture, single-cell and multi-omics analysis across species, paving the path for systematically studying genome 3D architecture and its regulatory mechanisms.
Collapse
Affiliation(s)
- Xiao Wang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
| | - Yuanyuan Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Suhita Ray
- Division of Hematology and Oncology, University of Washington, Seattle, WA, 98105, USA
| | - Anupama Jha
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Tangqi Fang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
| | - Shengqi Hang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
| | - Sergei Doulatov
- Division of Hematology and Oncology, University of Washington, Seattle, WA, 98105, USA
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
| | - Sheng Wang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
12
Banecki K, Korsak S, Plewczynski D. Advancements and future directions in single-cell Hi-C based 3D chromatin modeling. Comput Struct Biotechnol J 2024; 23:3549-3558. [PMID: 39963420 PMCID: PMC11832020 DOI: 10.1016/j.csbj.2024.09.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 09/27/2024] [Accepted: 09/29/2024] [Indexed: 02/20/2025] Open
Abstract
Single-cell Hi-C data provides valuable insights into the three-dimensional organization of chromatin within individual cells, yet modeling this data poses significant challenges due to its inherent sparsity and variability. This review comprehensively explores the predominant approaches to reconstructing 3D chromatin structures from single-cell Hi-C data, positioning these methods within the broader contexts of single-cell Hi-C research and bulk Hi-C data modeling. We categorize the modeling strategies based on their objective functions, which are framed in terms of force fields, potentials, cost functions, or likelihood probabilities. Despite their diverse methodologies, these approaches exhibit deep underlying similarities. We further dissect the basic components of these models, such as attractive restraint forces and repulsive forces, and discuss additional terms like fluid viscosity and variation penalties. The review also critically evaluates the current state of model validation, highlighting the inconsistencies across various studies and emphasizing the need for a comprehensive validation framework. We detail common validation techniques, including the comparison of distance matrices and the assessment of contact violations. We argue that the future of single-cell Hi-C modeling lies in integrating multiple data modalities and incorporating cell cycle trajectory information. Such integration could significantly advance our understanding of chromatin conformation dynamics during cell cycle progression and cell differentiation. We also foresee the continued growth of optimization-based and molecular dynamics approaches, supported by general molecular dynamics toolkits.
Affiliation(s)
- Krzysztof Banecki
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
| | - Sevastianos Korsak
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
| | - Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
13
Ma R, Huang J, Jiang T, Ma W. A mini-review of single-cell Hi-C embedding methods. Comput Struct Biotechnol J 2024; 23:4027-4035. [PMID: 39610904 PMCID: PMC11603012 DOI: 10.1016/j.csbj.2024.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Revised: 11/01/2024] [Accepted: 11/01/2024] [Indexed: 11/30/2024] Open
Abstract
Single-cell Hi-C (scHi-C) techniques have significantly advanced our understanding of the 3D genome organization, providing crucial insights into the spatial genome architecture within individual nuclei. Numerous computational and statistical methods have been developed to analyze scHi-C data, with embedding methods playing a key role. Embedding reduces the dimensionality of complex scHi-C contact maps, making it easier to extract biologically meaningful patterns. These methods not only enhance cell clustering based on chromatin structures but also facilitate visualization and other downstream analyses. Most scHi-C embedding methods incorporate strategies such as normalization and imputation to address the inherent sparsity of scHi-C data, thereby further improving data quality and interpretability. In this review, we systematically examine the existing methods designed for scHi-C embedding, outlining their methodologies and discussing their capabilities in handling normalization and imputation. Additionally, we present a comprehensive benchmarking analysis to compare both embedding techniques and their clustering performances. This review serves as a practical guide for researchers seeking to select suitable scHi-C embedding tools, ultimately contributing to the understanding of the 3D organization of the genome.
Affiliation(s)
- Rui Ma
- Department of Statistics, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
| | - Jingong Huang
- Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
- Institute of Integrative Genome Biology, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
| | - Wenxiu Ma
- Department of Statistics, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
- Institute of Integrative Genome Biology, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
14
Park JC, Han JW, Lee W, Kim J, Lee SE, Lee D, Choi H, Han J, Kang YJ, Diep YN, Cho H, Kang R, Yu WJ, Lee J, Choi M, Im SW, Kim JI, Mook-Jung I. Microglia Gravitate toward Amyloid Plaques Surrounded by Externalized Phosphatidylserine via TREM2. Adv Sci (Weinh) 2024; 11:e2400064. [PMID: 38981007 PMCID: PMC11425970 DOI: 10.1002/advs.202400064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/08/2024] [Indexed: 07/11/2024]
Abstract
Microglia play a crucial role in synaptic elimination by engulfing dystrophic neurons via triggering receptor expressed on myeloid cells 2 (TREM2). They are also involved in the clearance of beta-amyloid (Aβ) plaques in Alzheimer's disease (AD); nonetheless, the driving force behind TREM2-mediated phagocytosis of Aβ plaques remains unknown. Here, using advanced 2D/3D/4D co-culture systems with loss-of-function mutations in TREM2 (a frameshift mutation engineered in exon 2) brain organoids/microglia/assembloids, it is identified that the clearance of Aβ via TREM2 is accelerated by externalized phosphatidylserine (ePtdSer) generated from dystrophic neurons surrounding the Aβ plaques. Moreover, it is investigated whether microglia from both sporadic (CRISPR-Cas9-based APOE4 lines) and familial (APPNL-G-F/MAPT double knock-in mice) AD models show reduced levels of TREM2 and lack of phagocytic activity toward ePtdSer-positive Aβ plaques. Herein, new insight is provided into TREM2-dependent microglial phagocytosis of Aβ plaques in the presence of ePtdSer during AD progression.
Affiliation(s)
- Jong-Chan Park
- Department of Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Department of Metabiohealth, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Jong Won Han
- Department of Biochemistry and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
| | - Woochan Lee
- Department of Biochemistry and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
- Genome Medicine Institute, Medical Research Center, Seoul National University, Seoul, 03080, Republic of Korea
| | - Jieun Kim
- Department of Biochemistry and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
| | - Sang-Eun Lee
- Department of Physiology and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
- BK21 FOUR Biomedical Science Program, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
- UK Dementia Research Institute, Institute of Neurology, University College London, Gower Street, London, WC1E 6BT, UK
- Neuroscience Research Institute, Seoul National University Medical Research Center, Seoul, 03080, Republic of Korea
| | - Dongjoon Lee
- Department of Biochemistry and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
| | - Hayoung Choi
- Department of Biochemistry and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
| | - Jihui Han
- Department of Biochemistry and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
| | - You Jung Kang
- Department of Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Yen N Diep
- Department of Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Department of Intelligent Precision Healthcare Convergence, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Hansang Cho
- Department of Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Department of Intelligent Precision Healthcare Convergence, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Rian Kang
- Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Department of Metabiohealth, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Won Jong Yu
- Institute of Quantum Biophysics, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Department of Metabiohealth, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Jean Lee
- Department of Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
| | - Murim Choi
- Department of Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
| | - Sun-Wha Im
- Department of Biochemistry and Molecular Biology, Kangwon National University School of Medicine, Gangwon, Seoul, 24341, Republic of Korea
| | - Jong-Il Kim
- Genome Medicine Institute, Medical Research Center, Seoul National University, Seoul, 03080, Republic of Korea
- Department of Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
- Cancer Research Institute, Seoul National University, Seoul, 03080, Republic of Korea
- Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, 03080, Republic of Korea
| | - Inhee Mook-Jung
- Department of Biochemistry and Biomedical Sciences, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
- Convergence Dementia Research Center, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
15
Liu W, Zhong W, Giusti-Rodríguez P, Jiang Z, Wang GW, Sun H, Hu M, Li Y. SnapHiC-G: identifying long-range enhancer-promoter interactions from single-cell Hi-C data via a global background model. Brief Bioinform 2024; 25:bbae426. [PMID: 39222061 PMCID: PMC11367764 DOI: 10.1093/bib/bbae426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Revised: 07/05/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024] Open
Abstract
Harnessing the power of single-cell genomics technologies, single-cell Hi-C (scHi-C) and its derived technologies provide powerful tools to measure spatial proximity between regulatory elements and their target genes in individual cells. Using a global background model, we propose SnapHiC-G, a computational method, to identify long-range enhancer-promoter interactions from scHi-C data. We applied SnapHiC-G to scHi-C datasets generated from mouse embryonic stem cells and human brain cortical cells. SnapHiC-G achieved high sensitivity in identifying long-range enhancer-promoter interactions. Moreover, SnapHiC-G can identify putative target genes for noncoding genome-wide association study (GWAS) variants, and the genetic heritability of neuropsychiatric diseases is enriched for single-nucleotide polymorphisms (SNPs) within SnapHiC-G-identified interactions in a cell-type-specific manner. In sum, SnapHiC-G is a powerful tool for characterizing cell-type-specific enhancer-promoter interactions from complex tissues and can facilitate the discovery of chromatin interactions important for gene regulation in biologically relevant cell types.
Affiliation(s)
- Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, United States
| | - Wujuan Zhong
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., 126 East Lincoln Ave, Rahway, New Jersey 07065, United States
| | - Paola Giusti-Rodríguez
- Department of Psychiatry, University of Florida, 1149 Newel Dr., Gainesville, FL 32611, United States
| | - Zhiyun Jiang
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
| | - Geoffery W Wang
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, United States
| | - Huaigu Sun
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, 9500 Euclid Avenue, Cleveland, OH 44196, United States
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, United States
- Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599, United States
- Department of Computer Science, University of North Carolina at Chapel Hill, 201 S. Columbia St, Chapel Hill, NC 27599, United States
16
Park K, Keleş S. Joint tensor modeling of single cell 3D genome and epigenetic data with Muscle. J Am Stat Assoc 2024; 119:2464-2477. [PMID: 39758139 PMCID: PMC11698508 DOI: 10.1080/01621459.2024.2358557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 04/29/2024] [Accepted: 05/16/2024] [Indexed: 01/07/2025]
Abstract
Emerging single cell technologies that simultaneously capture long-range interactions of genomic loci together with their DNA methylation levels are advancing our understanding of three-dimensional genome structure and its interplay with the epigenome at the single cell level. While methods to analyze data from single cell high throughput chromatin conformation capture (scHi-C) experiments are maturing, methods that can jointly analyze multiple single cell modalities with scHi-C data are lacking. Here, we introduce Muscle, a semi-nonnegative joint decomposition of Multiple single cell tensors, to jointly analyze 3D conformation and DNA methylation data at the single cell level. Muscle takes advantage of the inherent tensor structure of the scHi-C data, and integrates this modality with DNA methylation. We developed an alternating least squares algorithm for estimating Muscle parameters and established its optimality properties. Parameters estimated by Muscle directly align with the key components of the downstream analysis of scHi-C data in a cell type specific manner. Evaluations with data-driven experiments and simulations demonstrate the advantages of the joint modeling framework of Muscle over single modality modeling and a baseline multi modality modeling for cell type delineation and elucidating associations between modalities. Muscle is publicly available at https://github.com/keleslab/muscle.
Affiliation(s)
- Kwangmoon Park
- Department of Statistics, University of Wisconsin, Madison, WI, USA, 53706
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin, Madison, WI, USA, 53706
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA, 53726
17
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Lorenzo Boninsegna
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
| | - Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Tom Misteli
- Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.
| | - Frank Alber
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA.
| | - Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
18
Hua D, Gu M, Zhang X, Du Y, Xie H, Qi L, Du X, Bai Z, Zhu X, Tian D. DiffDomain enables identification of structurally reorganized topologically associating domains. Nat Commun 2024; 15:502. [PMID: 38218905 PMCID: PMC10787792 DOI: 10.1038/s41467-024-44782-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/16/2022] [Accepted: 01/02/2024] [Indexed: 01/15/2024] Open
Abstract
Topologically associating domains (TADs) are critical structural units in the three-dimensional organization of mammalian genomes. Dynamic reorganizations of TADs between health and disease states are associated with essential genome functions. However, computational methods for identifying reorganized TADs are still in the early stages of development. Here, we present DiffDomain, an algorithm leveraging high-dimensional random matrix theory to identify structurally reorganized TADs using high-throughput chromosome conformation capture (Hi-C) contact maps. Method comparison using multiple real Hi-C datasets reveals that DiffDomain outperforms alternative methods in false positive rates, in true positive rates, and in identifying a new subtype of reorganized TADs. Applying DiffDomain to Hi-C data from different cell types and disease states demonstrates its biological relevance. Identified reorganized TADs are associated with structural variations and epigenomic changes such as changes in CTCF binding sites. When applied to single-cell Hi-C data from mouse neuronal development, DiffDomain can identify reorganized TADs between cell types with reasonable reproducibility using pseudo-bulk Hi-C data from as few as 100 cells per condition. Moreover, DiffDomain reveals differential cell-to-population variability and heterogeneous cell-to-cell variability in TADs. Therefore, DiffDomain is a statistically sound method for better comparative analysis of TADs using both Hi-C and single-cell Hi-C data.
Affiliation(s)
- Dunming Hua
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, 510275, China
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
| | - Ming Gu
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, 510275, China
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
| | - Xiao Zhang
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, 510275, China
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
| | - Yanyi Du
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, 510275, China
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
| | - Hangcheng Xie
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, 510275, China
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
| | - Li Qi
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, 400042, China
| | - Xiangjun Du
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, 510275, China
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
| | - Zhidong Bai
- KLASMOE & School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, 130024, China
| | - Xiaopeng Zhu
- MyCellome LLC., Allison Park, PA, 15101, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Dechao Tian
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, 510275, China.
- Department of Biostatistics and Systems Biology, School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China.
19
Gunsalus LM, Keiser MJ, Pollard KS. ChromaFactor: deconvolution of single-molecule chromatin organization with non-negative matrix factorization. bioRxiv 2023:2023.11.22.568268. [PMID: 38045231 PMCID: PMC10690235 DOI: 10.1101/2023.11.22.568268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components. ChromaFactor provides the ability to identify trends accounting for the maximum variance in the dataset while simultaneously describing the contribution of individual molecules to each component. Applying our approach to two single-molecule imaging datasets across different genomic scales, we find that these primary components demonstrate significant correlation with key functional phenotypes, including active transcription, enhancer-promoter distance, and genomic compartment. ChromaFactor offers a robust tool for understanding the complex interplay between chromatin structure and function on individual DNA molecules, pinpointing which subpopulations drive functional changes and fostering new insights into cellular heterogeneity and its implications for bulk genomic phenomena.
Affiliation(s)
- Laura M. Gunsalus
- Gladstone Institutes, San Francisco, CA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
| | - Michael J. Keiser
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA
- Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA
| | - Katherine S. Pollard
- Gladstone Institutes, San Francisco, CA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA
- Chan Zuckerberg Biohub, San Francisco, CA
20
龚 海, 麻 付, 张 晓. [Advances in methods and applications of single-cell Hi-C data analysis]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2023; 40:1033-1039. [PMID: 37879935 PMCID: PMC10600426 DOI: 10.7507/1001-5515.202303046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 08/29/2023] [Indexed: 10/27/2023]
Abstract
The three-dimensional structure of chromatin plays a key role in cell function and gene regulation. Single-cell Hi-C techniques can capture genomic structure information at the level of individual cells, which provides an opportunity to study changes in genomic structure between different cell types. Recently, several computational methods have been developed for single-cell Hi-C data analysis. This paper first reviews the available methods for single-cell Hi-C data analysis, including preprocessing of single-cell Hi-C data, multi-scale structure recognition based on single-cell Hi-C data, bulk-like Hi-C contact matrix generation based on single-cell Hi-C data sets, pseudo-time series analysis, and cell classification. It then describes the application of single-cell Hi-C data to cell differentiation and structural variation. Finally, future directions for single-cell Hi-C data analysis are discussed.
Affiliation(s)
- 海燕 龚
- Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, P. R. China
| | - 付强 麻
- Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
| | - 晓彤 张
- Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, P. R. China
21
Liu H, Ma W. scHiCDiff: detecting differential chromatin interactions in single-cell Hi-C data. Bioinformatics 2023; 39:btad625. [PMID: 37847655 PMCID: PMC10598576 DOI: 10.1093/bioinformatics/btad625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 08/15/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023] Open
Abstract
SUMMARY Here, we presented the scHiCDiff software tool that provides both nonparametric tests and parametric models to detect differential chromatin interactions (DCIs) from single-cell Hi-C data. We thoroughly evaluated the scHiCDiff methods on both simulated and real data. Our results demonstrated that scHiCDiff, especially the zero-inflated negative binomial model option, can effectively detect reliable and consistent single-cell DCIs between two conditions, thereby facilitating the study of cell type-specific variations of chromatin structures at the single-cell level. AVAILABILITY AND IMPLEMENTATION scHiCDiff is implemented in R and freely available at GitHub (https://github.com/wmalab/scHiCDiff).
Affiliation(s)
- Huiling Liu
- Department of Statistics, University of California Riverside, Riverside, CA 92521, United States
| | - Wenxiu Ma
- Department of Statistics, University of California Riverside, Riverside, CA 92521, United States
| |
|
22
|
Lee L, Yu M, Li X, Zhu C, Zhang Y, Yu H, Chen Z, Mishra S, Ren B, Li Y, Hu M. SnapHiC-D: a computational pipeline to identify differential chromatin contacts from single-cell Hi-C data. Brief Bioinform 2023; 24:bbad315. [PMID: 37649383 PMCID: PMC10516352 DOI: 10.1093/bib/bbad315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 08/04/2023] [Accepted: 08/07/2023] [Indexed: 09/01/2023] Open
Abstract
Single-cell high-throughput chromatin conformation capture technologies (scHi-C) have been used to map chromatin spatial organization in complex tissues. However, computational tools to detect differential chromatin contacts (DCCs) from scHi-C datasets in development and through disease pathogenesis are still lacking. Here, we present SnapHiC-D, a computational pipeline to identify DCCs between two scHi-C datasets. Compared to methods designed for bulk Hi-C data, SnapHiC-D detects DCCs with high sensitivity and accuracy. We used SnapHiC-D to identify cell-type-specific chromatin contacts at 10 Kb resolution in mouse hippocampal and human prefrontal cortical tissues, demonstrating that DCCs detected in the hippocampal and cortical cell types are generally associated with cell-type-specific gene expression patterns and epigenomic features. SnapHiC-D is freely available at https://github.com/HuMingLab/SnapHiC-D.
Affiliation(s)
- Lindsay Lee
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Miao Yu
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
| | - Xiaoqi Li
- Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, USA
| | - Chenxu Zhu
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- New York Genome Center, New York, NY, USA
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - Yanxiao Zhang
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Westlake University, Hangzhou, Zhejiang, China
| | - Hongyu Yu
- Department of Statistics, University of Wisconsin Madison, Madison, WI, USA
- Department of Biochemistry, University of Wisconsin Madison, Madison, WI, USA
| | - Ziyin Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
| | - Shreya Mishra
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Center for Epigenomics & Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| |
|
23
|
Li Z, Portillo-Ledesma S, Schlick T. Techniques for and challenges in reconstructing 3D genome structures from 2D chromosome conformation capture data. Curr Opin Cell Biol 2023; 83:102209. [PMID: 37506571 PMCID: PMC10529954 DOI: 10.1016/j.ceb.2023.102209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/07/2023] [Accepted: 06/26/2023] [Indexed: 07/30/2023]
Abstract
Chromosome conformation capture technologies that provide frequency information for contacts between genomic regions have been crucial for increasing our understanding of genome folding and regulation. However, such data do not provide direct evidence of the spatial 3D organization of chromatin. In this opinion article, we discuss the development and application of computational methods to reconstruct chromatin 3D structures from experimental 2D contact data, highlighting how such modeling provides biological insights and can suggest mechanisms anchored to experimental data. By applying different reconstruction methods to the same contact data, we illustrate the state of the art of these techniques and discuss our gene resolution approach based on Brownian dynamics and Monte Carlo sampling.
Affiliation(s)
- Zilong Li
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
| | - Stephanie Portillo-Ledesma
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
| | - Tamar Schlick
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, 10012, NY, USA; New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Room 340, Geography Building, 3663 North Zhongshan Road, Shanghai, 200122, China; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA.
| |
|
24
|
Wang Y, Guo Z, Cheng J. Single-cell Hi-C data enhancement with deep residual and generative adversarial networks. Bioinformatics 2023; 39:btad458. [PMID: 37498561 PMCID: PMC10403428 DOI: 10.1093/bioinformatics/btad458] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/19/2023] [Revised: 07/19/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023] Open
Abstract
MOTIVATION The spatial genome organization of a eukaryotic cell is important for its function. The development of single-cell technologies for probing the 3D genome conformation, especially single-cell chromosome conformation capture techniques, has enabled us to understand genome function better than before. However, due to extreme sparsity and high noise associated with single-cell Hi-C data, it is still difficult to study genome structure and function using the Hi-C data of a single cell. RESULTS In this work, we developed a deep learning method ScHiCEDRN based on deep residual networks and generative adversarial networks for the imputation and enhancement of Hi-C data of a single cell. In terms of both image evaluation and Hi-C reproducibility metrics, ScHiCEDRN outperforms the four deep learning methods (DeepHiC, HiCPlus, HiCSR, and Loopenhance) on enhancing the raw single-cell Hi-C data of human and Drosophila. The experiments also show that it can generate single-cell Hi-C data more suitable for identifying topologically associating domain boundaries and reconstructing 3D chromosome structures than the existing methods. Moreover, ScHiCEDRN's performance generalizes well across different single cells and cell types, and it can be applied to improving population Hi-C data. AVAILABILITY AND IMPLEMENTATION The source code of ScHiCEDRN is available at the GitHub repository: https://github.com/BioinfoMachineLearning/ScHiCEDRN.
Affiliation(s)
- Yanli Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
| |
|
25
|
Liu Z, Chen Y, Xia Q, Liu M, Xu H, Chi Y, Deng Y, Xing D. Linking genome structures to functions by simultaneous single-cell Hi-C and RNA-seq. Science 2023; 380:1070-1076. [PMID: 37289875 DOI: 10.1126/science.adg3797] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 12/21/2022] [Accepted: 05/07/2023] [Indexed: 06/10/2023]
Abstract
Much progress has been made recently in single-cell chromosome conformation capture technologies. However, a method that allows simultaneous profiling of chromatin architecture and gene expression has not been reported. Here, we developed an assay named "Hi-C and RNA-seq employed simultaneously" (HiRES) and performed it on thousands of single cells from developing mouse embryos. Single-cell three-dimensional genome structures, despite being heavily determined by the cell cycle and developmental stages, gradually diverged in a cell type-specific manner as development progressed. By comparing the pseudotemporal dynamics of chromatin interactions with gene expression, we found a widespread chromatin rewiring that occurred before transcription activation. Our results demonstrate that the establishment of specific chromatin interactions is tightly related to transcriptional control and cell functions during lineage specification.
Affiliation(s)
- Zhiyuan Liu
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| | - Yujie Chen
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| | - Qimin Xia
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| | - Menghan Liu
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| | - Heming Xu
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| | - Yi Chi
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| | - Yujing Deng
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| | - Dong Xing
- Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
| |
|