1
Xie Q, Meng W, Lin S. scHiCSRS: a self-representation smoothing method with Gaussian mixture model for imputing single cell Hi-C data. BMC Bioinformatics 2025; 26:132. [PMID: 40399810 PMCID: PMC12093726 DOI: 10.1186/s12859-025-06147-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 04/23/2025] [Indexed: 05/23/2025] Open
Abstract
BACKGROUND Single cell Hi-C (scHi-C) techniques make it possible to study cell-to-cell variability, but an excess of zeros makes scHi-C matrices extremely sparse and difficult to use in downstream analyses. The observed zeros are a combination of two events: structural zeros, for which two loci never interact due to underlying biological mechanisms, and dropouts (sampling zeros), where two loci interact but are not captured due to insufficient sequencing depth. Although data quality improvement approaches have been proposed, little has been done to differentiate these two types of zeros, even though such a distinction can greatly benefit downstream analyses such as clustering. RESULTS We propose scHiCSRS, a self-representation smoothing method that improves data quality, and a Gaussian mixture model that identifies structural zeros among observed zeros. scHiCSRS not only takes spatial dependencies of a scHi-C data matrix into account but also borrows information from similar single cells. Through an extensive set of simulation studies, we demonstrate the ability of scHiCSRS to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses for three experimental datasets show that data improved by scHiCSRS yield more accurate clustering of cells than simply using observed data or data improved by comparison methods. CONCLUSION In summary, scHiCSRS provides a valuable tool for identifying structural zeros and imputing dropouts. The resulting data are improved for downstream analysis, especially for understanding cell-to-cell variation through subtype clustering.
Affiliation(s)
- Qing Xie
  - Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH, 43210, USA
- Wang Meng
  - Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, 43205, USA
- Shili Lin
  - Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH, 43210, USA.
  - Department of Statistics, The Ohio State University, Columbus, OH, 43210, USA.
2
Wang Y, Cheng J. Reconstructing 3D chromosome structures from single-cell Hi-C data with SO(3)-equivariant graph neural networks. NAR Genom Bioinform 2025; 7:lqaf027. [PMID: 40124711 PMCID: PMC11928942 DOI: 10.1093/nargab/lqaf027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2024] [Revised: 02/23/2025] [Accepted: 03/05/2025] [Indexed: 03/25/2025] Open
Abstract
The spatial conformation of chromosomes and genomes of single cells is relevant to cellular function and useful for elucidating the mechanism underlying gene expression and genome methylation. The chromosomal contacts (i.e. chromosomal regions in spatial proximity) entailing the three-dimensional (3D) structure of the genome of a single cell can be obtained by single-cell chromosome conformation capture techniques, such as single-cell Hi-C (ScHi-C). However, due to the sparsity of chromosomal contacts in ScHi-C data, it is still challenging for traditional 3D conformation optimization methods to reconstruct the 3D chromosome structures from ScHi-C data. Here, we present a machine learning method built on a novel SO(3)-equivariant graph neural network (HiCEGNN) to reconstruct 3D structures of chromosomes of single cells from ScHi-C data. HiCEGNN consistently outperforms both the traditional optimization methods and the only other deep learning method across diverse cells, different structural resolutions, and different noise levels of the data. Moreover, HiCEGNN is robust against the noise in the ScHi-C data.
Affiliation(s)
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
3
Kang B, Lee H, Roh TY. Deciphering single-cell genomic architecture: insights into cellular heterogeneity and regulatory dynamics. Genomics Inform 2025; 23:5. [PMID: 39934929 DOI: 10.1186/s44342-025-00037-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Accepted: 01/19/2025] [Indexed: 02/13/2025] Open
Abstract
BACKGROUND The genomic architecture of eukaryotes exhibits dynamic spatial and temporal changes, enabling cellular processes critical for maintaining viability and functional diversity. Recent advances in sequencing technologies have facilitated the dissection of genomic architecture and functional activity at single-cell resolution, moving beyond the averaged signals typically derived from bulk cell analyses. MAIN BODY The advent of single-cell genomics and epigenomics has yielded transformative insights into cellular heterogeneity, behavior, and biological complexity with unparalleled genomic resolution and reproducibility. This review summarizes recent progress in the characterization of genomic architecture at the single-cell level, emphasizing the impact of structural variation and chromatin organization on gene regulatory networks and cellular identity. CONCLUSION Future directions in single-cell genomics and high-resolution epigenomic methodologies are explored, focusing on emerging challenges and potential impacts on the understanding of cellular states, regulatory dynamics, and the intricate mechanisms driving cellular function and diversity. Future perspectives on the challenges and potential implications of single-cell genomics, along with high-resolution genomic and epigenomic technologies for understanding cellular states and regulatory dynamics, are also discussed.
Affiliation(s)
- Byunghee Kang
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- Hyeonji Lee
  - Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- Tae-Young Roh
  - Department of Life Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea.
4
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Science China. Life Sciences 2025; 68:5-102. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have advanced rapidly. These technologies have expanded horizontally to include the single-cell genome, epigenome, proteome, and metabolome, while vertically they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
  - Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
  - Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
  - National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
  - Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
  - Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
  - Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
  - Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
  - Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
5
Banecki K, Korsak S, Plewczynski D. Advancements and future directions in single-cell Hi-C based 3D chromatin modeling. Comput Struct Biotechnol J 2024; 23:3549-3558. [PMID: 39963420 PMCID: PMC11832020 DOI: 10.1016/j.csbj.2024.09.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 09/27/2024] [Accepted: 09/29/2024] [Indexed: 02/20/2025] Open
Abstract
Single-cell Hi-C data provides valuable insights into the three-dimensional organization of chromatin within individual cells, yet modeling this data poses significant challenges due to its inherent sparsity and variability. This review comprehensively explores the predominant approaches to reconstructing 3D chromatin structures from single-cell Hi-C data, positioning these methods within the broader contexts of single-cell Hi-C research and bulk Hi-C data modeling. We categorize the modeling strategies based on their objective functions, which are framed in terms of force fields, potentials, cost functions, or likelihood probabilities. Despite their diverse methodologies, these approaches exhibit deep underlying similarities. We further dissect the basic components of these models, such as attractive restraint forces and repulsive forces, and discuss additional terms like fluid viscosity and variation penalties. The review also critically evaluates the current state of model validation, highlighting the inconsistencies across various studies and emphasizing the need for a comprehensive validation framework. We detail common validation techniques, including the comparison of distance matrices and the assessment of contact violations. We argue that the future of single-cell Hi-C modeling lies in integrating multiple data modalities and incorporating cell cycle trajectory information. Such integration could significantly advance our understanding of chromatin conformation dynamics during cell cycle progression and cell differentiation. We also foresee the continued growth of optimization-based and molecular dynamics approaches, supported by general molecular dynamics toolkits.
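The two validation techniques named in this abstract, comparison of distance matrices and assessment of contact violations, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than something taken from the review: the function names, the toy coordinates, the `max_dist` cutoff of 1.5, and the use of root-mean-square error as the comparison measure.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D bead coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_violation_rate(coords, contacts, max_dist):
    """Fraction of observed contacts whose beads lie farther apart than max_dist."""
    if not contacts:
        return 0.0
    violated = sum(
        1 for i, j in contacts if math.dist(coords[i], coords[j]) > max_dist
    )
    return violated / len(contacts)

def distance_rmse(d1, d2):
    """Root-mean-square difference over the upper triangles of two distance matrices."""
    n = len(d1)
    sq = [(d1[i][j] - d2[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical toy data: a four-bead model, a reference structure, and
# three observed contacts given as (bead index, bead index) pairs.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
contacts = [(0, 1), (1, 2), (2, 3)]

rate = contact_violation_rate(model, contacts, max_dist=1.5)
rmse = distance_rmse(distance_matrix(model), distance_matrix(reference))
```

In this toy case only the contact between beads 2 and 3 exceeds the cutoff, so one of three contacts is violated. Published methods differ in their cutoffs and comparison measures, which is the inconsistency the review highlights.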
Affiliation(s)
- Krzysztof Banecki
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Sevastianos Korsak
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
6
Zhou Y, Li T, Choppavarapu L, Fang K, Lin S, Jin VX. Integration of scHi-C and scRNA-seq data defines distinct 3D-regulated and biological-context dependent cell subpopulations. Nat Commun 2024; 15:8310. [PMID: 39333113 PMCID: PMC11436782 DOI: 10.1038/s41467-024-52440-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 09/06/2024] [Indexed: 09/29/2024] Open
Abstract
The integration of 3D chromatin structure and gene expression at single-cell resolution has yet to be demonstrated. Here, we develop a computational method, a multiomic data integration (MUDI) algorithm, which integrates scHi-C and scRNA-seq data to precisely define the 3D-regulated and biological-context dependent cell subpopulations, or topologically integrated subpopulations (TISPs). We demonstrate its algorithmic utility on publicly available and newly generated scHi-C and scRNA-seq data. We then test and apply MUDI in a breast cancer cell model system to demonstrate its biological-context dependent utility. We find that the newly defined topologically conserved associating domain (CAD) is the characteristic single-cell 3D chromatin structure and better characterizes chromatin domains at single-cell resolution. We further identify 20 TISPs uniquely characterizing 3D-regulated breast cancer cellular states. We reveal that two of the TISPs remarkably resemble high-cycling breast cancer persister cells and that chromatin-modifying enzymes might be functional regulators driving the alteration of the 3D chromatin structures. Our comprehensive integration of scHi-C and scRNA-seq data in cancer cells at single-cell resolution provides mechanistic insights into 3D-regulated heterogeneity of developing drug-tolerant cancer cells.
Affiliation(s)
- Yufan Zhou
  - Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Tian Li
  - Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Lavanya Choppavarapu
  - Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
  - MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
- Kun Fang
  - Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
  - MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, OH, USA
- Victor X Jin
  - Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA.
  - MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA.
7
Zhou Y, Li T, Choppavarapu L, Jin VX. Integration of scHi-C and scRNA-seq data defines distinct 3D-regulated and biological-context dependent cell subpopulations. bioRxiv: the preprint server for biology 2023:2023.09.29.560193. [PMID: 37873257 PMCID: PMC10592853 DOI: 10.1101/2023.09.29.560193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The integration of 3D chromatin structure and gene expression at single-cell resolution has yet to be demonstrated. Here, we develop a computational method, a multiomic data integration (MUDI) algorithm, which integrates scHi-C and scRNA-seq data to precisely define the 3D-regulated and biological-context dependent cell subpopulations, or topologically integrated subpopulations (TISPs). We demonstrate its algorithmic utility on publicly available and newly generated scHi-C and scRNA-seq data. We then test and apply MUDI in a breast cancer cell model system to demonstrate its biological-context dependent utility. We found that the newly defined topologically conserved associating domain (CAD) is the characteristic single-cell 3D chromatin structure and better characterizes chromatin domains at single-cell resolution. We further identify 20 TISPs uniquely characterizing 3D-regulated breast cancer cellular states. We reveal that two of the TISPs remarkably resemble high-cycling breast cancer persister cells and that chromatin-modifying enzymes might be functional regulators driving the alteration of the 3D chromatin structures. Our comprehensive integration of scHi-C and scRNA-seq data in cancer cells at single-cell resolution provides mechanistic insights into 3D-regulated heterogeneity of developing drug-tolerant cancer cells.
8
Fan S, Dang D, Ye Y, Zhang SW, Gao L, Zhang S. scHi-CSim: a flexible simulator that generates high-fidelity single-cell Hi-C data for benchmarking. J Mol Cell Biol 2023; 15:mjad003. [PMID: 36708167 PMCID: PMC10308180 DOI: 10.1093/jmcb/mjad003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 09/18/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome the sparseness of data, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim can generate high-fidelity data by comparing the performance of single-cell clustering and detection of chromosomal high-order structures with raw data. Furthermore, scHi-CSim is flexible to change sequencing depth and the number of simulated replicates. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
Affiliation(s)
- Shichen Fan
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Dachang Dang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yusen Ye
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shao-Wu Zhang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
9
Yildirim A, Boninsegna L, Zhan Y, Alber F. Uncovering the Principles of Genome Folding by 3D Chromatin Modeling. Cold Spring Harb Perspect Biol 2022; 14:a039693. [PMID: 34400556 PMCID: PMC9248826 DOI: 10.1101/cshperspect.a039693] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]
Abstract
Our understanding of how genomic DNA is tightly packed inside the nucleus, yet is still accessible for vital cellular processes, has grown dramatically over recent years with advances in microscopy and genomics technologies. Computational methods have played a pivotal role in the structural interpretation of experimental data, which helped unravel some organizational principles of genome folding. Here, we give an overview of current computational efforts in mechanistic and data-driven 3D chromatin structure modeling. We discuss strengths and limitations of different methods and evaluate the added value and benefits of computational approaches to infer the 3D structural and dynamic properties of the genome and its underlying mechanisms at different scales and resolutions, ranging from the dynamic formation of chromatin loops and topologically associating domains to nuclear compartmentalization of chromatin and nuclear bodies.
Affiliation(s)
- Asli Yildirim
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Lorenzo Boninsegna
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Yuxiang Zhan
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA
- Frank Alber
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA
10
Liu T, Wang Z. scHiCEmbed: Bin-Specific Embeddings of Single-Cell Hi-C Data Using Graph Auto-Encoders. Genes (Basel) 2022; 13:genes13061048. [PMID: 35741810 PMCID: PMC9222580 DOI: 10.3390/genes13061048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 02/05/2023] Open
Abstract
Most publicly accessible single-cell Hi-C data are sparse and cannot reach a higher resolution. Therefore, learning latent representations (bin-specific embeddings) of sparse single-cell Hi-C matrices would provide us with a novel way of mining valuable information hidden in the limited number of single-cell Hi-C contacts. We present scHiCEmbed, an unsupervised computational method for learning bin-specific embeddings of single-cell Hi-C data, and the computational system is applied to the tasks of 3D structure reconstruction of whole genomes and detection of topologically associating domains (TAD). The only input of scHiCEmbed is a raw or scHiCluster-imputed single-cell Hi-C matrix. The main process of scHiCEmbed is to embed each node/bin in a higher dimensional space using graph auto-encoders. The learned n-by-3 bin-specific embedding/latent matrix is considered the final reconstructed 3D genome structure. For TAD detection, we use constrained hierarchical clustering on the latent matrix to classify bins: S_Dbw is used to determine the optimal number of clusters, and each cluster is considered as one potential TAD. Our reconstructed 3D structures for individual chromatins at different cell stages reveal the expanding process of chromatins during the cell cycle. We observe that the TADs called from single-cell Hi-C data are not shared across individual cells and that the TAD boundaries called from raw or imputed single-cell Hi-C are significantly different from those called from bulk Hi-C, confirming the cell-to-cell variability in terms of TAD definitions. The source code for scHiCEmbed is publicly available, and the URL can be found in the conclusion section.
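The TAD-detection step described in this abstract, constrained hierarchical clustering on an n-by-3 latent matrix, can be sketched as follows. This is a hedged illustration and not scHiCEmbed's implementation. It assumes the constraint is genomic adjacency (only neighbouring segments may merge), uses centroid distance as the merge criterion, and takes the number of clusters `k` as given instead of selecting it with S_Dbw. The function names and toy embedding are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def contiguous_clusters(embedding, k):
    """Greedily merge adjacent bins until k contiguous segments remain.

    Returns a list of (first_bin, last_bin) index pairs, one per segment.
    """
    segments = [[i] for i in range(len(embedding))]
    while len(segments) > k:
        best, best_d = 0, math.inf
        # Only neighbouring segments are merge candidates (adjacency constraint).
        for s in range(len(segments) - 1):
            ca = centroid([embedding[i] for i in segments[s]])
            cb = centroid([embedding[i] for i in segments[s + 1]])
            d = math.dist(ca, cb)
            if d < best_d:
                best, best_d = s, d
        segments[best:best + 2] = [segments[best] + segments[best + 1]]
    return [(seg[0], seg[-1]) for seg in segments]

# Hypothetical 6-bin embedding with two well-separated groups of bins.
embedding = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.2, 5.0),
]
domains = contiguous_clusters(embedding, k=2)
```

Each returned pair marks the first and last bin of one candidate domain. Restricting merges to neighbours keeps every cluster a contiguous genomic interval, which is what allows a cluster to be read as a potential TAD.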
11
Xie Q, Han C, Jin V, Lin S. HiCImpute: A Bayesian hierarchical model for identifying structural zeros and enhancing single cell Hi-C data. PLoS Comput Biol 2022; 18:e1010129. [PMID: 35696429 PMCID: PMC9232133 DOI: 10.1371/journal.pcbi.1010129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 06/24/2022] [Accepted: 04/21/2022] [Indexed: 11/19/2022] Open
Abstract
Single cell Hi-C techniques enable one to study cell to cell variability in chromatin interactions. However, single cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating the matter further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros); others are indeed due to insufficient sequencing depth (sampling zeros or dropouts), especially for loci that interact infrequently. Differentiating between structural zeros and dropouts is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, in this paper, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of scHi-C 2D data structure into account while also borrowing information from similar single cells and bulk data, when such are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute for identifying structural zeros with high sensitivity, and for accurate imputation of dropout values. Downstream analyses using data improved from HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.
Affiliation(s)
- Qing Xie: Interdisciplinary Ph.D. Program in Biostatistics, Ohio State University, Columbus, Ohio, United States of America
- Chenggong Han: Interdisciplinary Ph.D. Program in Biostatistics, Ohio State University, Columbus, Ohio, United States of America
- Victor Jin: Department of Molecular Medicine, University of Texas Health Science Center, San Antonio, Texas, United States of America
- Shili Lin: Interdisciplinary Ph.D. Program in Biostatistics, Ohio State University, Columbus, Ohio, United States of America; Department of Statistics, Ohio State University, Columbus, Ohio, United States of America; Translational Data Analytics Institute, Ohio State University, Columbus, Ohio, United States of America
|
12
|
Mapping nucleosome and chromatin architectures: A survey of computational methods. Comput Struct Biotechnol J 2022; 20:3955-3962. [PMID: 35950186 PMCID: PMC9340519 DOI: 10.1016/j.csbj.2022.07.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 07/22/2022] [Accepted: 07/22/2022] [Indexed: 11/21/2022] Open
Abstract
With ever-growing genomic sequencing data, the data variabilities and the underlying biases of the sequencing technologies pose significant computational challenges ranging from the need for accurately detecting the nucleosome positioning or chromatin interaction to the need for developing normalization methods to eliminate systematic biases. This review mainly surveys the computational methods for mapping the higher-resolution nucleosome and higher-order chromatin architectures. While a detailed discussion of the underlying algorithms is beyond the scope of our survey, we have discussed the methods and tools that can detect the nucleosomes in the genome, then demonstrated the computational methods for identifying 3D chromatin domains and interactions. We further illustrated computational approaches for integrating multi-omics data with Hi-C data and the advance of single-cell (sc)Hi-C data analysis. Our survey provides a comprehensive and valuable resource for biomedical scientists interested in studying nucleosome organization and chromatin structures as well as for computational scientists who are interested in improving upon them.
|
13
|
Galitsyna AA, Gelfand MS. Single-cell Hi-C data analysis: safety in numbers. Brief Bioinform 2021; 22:bbab316. [PMID: 34406348 PMCID: PMC8575028 DOI: 10.1093/bib/bbab316] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/31/2021] [Revised: 07/09/2021] [Accepted: 07/21/2021] [Indexed: 02/06/2023] Open
Abstract
Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
Affiliation(s)
- Aleksandra A Galitsyna: Skolkovo Institute of Science and Technology, Skolkovo, Russia; Institute for Information Transmission Problems, RAS, Moscow, Russia; Institute of Gene Biology, RAS, Moscow, Russia
- Mikhail S Gelfand: Skolkovo Institute of Science and Technology, Skolkovo, Russia; Institute for Information Transmission Problems, RAS, Moscow, Russia
|
14
|
Goel VY, Hansen AS. The macro and micro of chromosome conformation capture. Wiley Interdiscip Rev Dev Biol 2021; 10:e395. [PMID: 32987449 PMCID: PMC8236208 DOI: 10.1002/wdev.395] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/02/2020] [Revised: 07/21/2020] [Accepted: 07/31/2020] [Indexed: 12/14/2022]
Abstract
The 3D organization of the genome facilitates gene regulation, replication, and repair, making it a key feature of genomic function and one that remains to be properly understood. Over the past two decades, a variety of chromosome conformation capture (3C) methods have delineated genome folding from megabase-scale compartments and topologically associating domains (TADs) down to kilobase-scale enhancer-promoter interactions. Understanding the functional role of each layer of genome organization is a gateway to understanding cell state, development, and disease. Here, we discuss the evolution of 3C-based technologies for mapping 3D genome organization. We focus on genomics methods and provide a historical account of the development from 3C to Hi-C. We also discuss ChIP-based techniques that focus on 3D genome organization mediated by specific proteins, capture-based methods that focus on particular regions or regulatory elements, 3C-orthogonal methods that do not rely on restriction digestion and proximity ligation, and methods for mapping the DNA-RNA and RNA-RNA interactomes. We consider the biological discoveries that have come from these methods, examine the mechanistic contributions of CTCF, cohesin, and loop extrusion to genomic folding, and detail the 3D genome field's current understanding of nuclear architecture. Finally, we give special consideration to Micro-C as an emerging frontier in chromosome conformation capture and discuss recent Micro-C findings uncovering fine-scale chromatin organization in unprecedented detail. This article is categorized under: Gene Expression and Transcriptional Hierarchies > Regulatory Mechanisms; Gene Expression and Transcriptional Hierarchies > Gene Networks and Genomics.
Affiliation(s)
- Viraat Y. Goel: Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Anders S. Hansen: Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
15
|
Abstract
The spatial organization of the genome in the cell nucleus is pivotal to cell function. However, how the 3D genome organization and its dynamics influence cellular phenotypes remains poorly understood. The very recent development of single-cell technologies for probing the 3D genome, especially single-cell Hi-C (scHi-C), has ushered in a new era of unveiling cell-to-cell variability of 3D genome features at an unprecedented resolution. Here, we review recent developments in computational approaches to the analysis of scHi-C, including data processing, dimensionality reduction, imputation for enhancing data quality, and the revealing of 3D genome features at single-cell resolution. While much progress has been made in computational method development to analyze single-cell 3D genomes, substantial future work is needed to improve data interpretation and multimodal data integration, which are critical to reveal fundamental connections between genome structure and function among heterogeneous cell populations in various biological contexts.
Affiliation(s)
- Tianming Zhou: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Ruochi Zhang: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Jian Ma: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
|
16
|
Si-C is a method for inferring super-resolution intact genome structure from single-cell Hi-C data. Nat Commun 2021; 12:4369. [PMID: 34272403 PMCID: PMC8285481 DOI: 10.1038/s41467-021-24662-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/10/2020] [Accepted: 06/25/2021] [Indexed: 12/21/2022] Open
Abstract
There is a strong demand for methods that can efficiently reconstruct valid super-resolution intact genome 3D structures from sparse and noisy single-cell Hi-C data. Here, we develop Single-Cell Chromosome Conformation Calculator (Si-C) within the Bayesian theory framework and apply this approach to reconstruct intact genome 3D structures from single-cell Hi-C data of eight G1-phase haploid mouse ES cells. The inferred 100-kb and 10-kb structures consistently reproduce the known conserved features of chromatin organization revealed by independent imaging experiments. The analysis of the 10-kb resolution 3D structures reveals cell-to-cell varying domain structures in individual cells and hyperfine structures in domains, such as loops. An average of 0.2 contact reads per divided bin is sufficient for Si-C to obtain reliable structures. The valid super-resolution structures constructed by Si-C demonstrate the potential for visualizing and investigating interactions between all chromatin loci at the genome scale in individual cells. Constructing valid super-resolution intact genome 3D structures from single-cell Hi-C data is essential in investigating chromosome folding. Here the authors develop a method that makes it possible to visualize and investigate chromosome folding in individual cells at the genome scale.
|
17
|
Zha M, Wang N, Zhang C, Wang Z. Inferring Single-Cell 3D Chromosomal Structures Based on the Lennard-Jones Potential. Int J Mol Sci 2021; 22:ijms22115914. [PMID: 34072879 PMCID: PMC8199262 DOI: 10.3390/ijms22115914] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/22/2021] [Revised: 05/23/2021] [Accepted: 05/28/2021] [Indexed: 11/16/2022] Open
Abstract
Reconstructing three-dimensional (3D) chromosomal structures based on single-cell Hi-C data is a challenging scientific problem due to the extreme sparseness of the single-cell Hi-C data. In this research, we used the Lennard-Jones potential to reconstruct both 500 kb and high-resolution 50 kb chromosomal structures based on single-cell Hi-C data. A chromosome was represented by a string of 500 kb or 50 kb DNA beads and put into a 3D cubic lattice for simulations. A 2D Gaussian function was used to impute the sparse single-cell Hi-C contact matrices. We designed a novel loss function based on the Lennard-Jones potential, in which the ε value, i.e., the well depth, was used to indicate how stable the binding of every pair of beads is. For the bead pairs that have single-cell Hi-C contacts and their neighboring bead pairs, the loss function assigns them stronger binding stability. The Metropolis-Hastings algorithm was used to try different locations for the DNA beads, and simulated annealing was used to optimize the loss function. We proved the correctness and validity of the reconstructed 3D structures by evaluating the models according to multiple criteria and comparing the models with 3D-FISH data.
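For orientation, the textbook 12-6 Lennard-Jones potential and a standard Metropolis acceptance rule can be sketched as follows. This is not the loss function from the paper, whose exact form the abstract does not give; it only shows how the well depth ε controls the strength of the attraction between two beads and how a proposed bead move would be accepted or rejected at a given annealing temperature. The function names and default parameter values are assumptions made for this example.

```python
import math
import random

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Textbook 12-6 potential: 4*eps*((sigma/r)**12 - (sigma/r)**6).

    The minimum sits at r = 2**(1/6) * sigma and equals -epsilon, so a
    larger epsilon (deeper well) means a more stable bead pair.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def metropolis_accept(delta_loss, temperature, rng):
    """Accept a proposed move: always if the loss does not increase,
    otherwise with probability exp(-delta_loss / temperature)."""
    if delta_loss <= 0:
        return True
    return rng.random() < math.exp(-delta_loss / temperature)

r_min = 2 ** (1 / 6)                              # location of the minimum
weak_pair = lennard_jones(r_min, epsilon=1.0)     # about -1.0
strong_pair = lennard_jones(r_min, epsilon=3.0)   # about -3.0
```

In simulated annealing the temperature passed to `metropolis_accept` is lowered over time, so uphill moves become progressively rarer.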
Affiliation(s)
- Mengsheng Zha: School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
- Nan Wang: Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ 07305, USA
- Chaoyang Zhang: School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
- Zheng Wang (corresponding author): Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
|
18
|
Wu H, Wang X, Chu M, Li D, Cheng L, Zhou K. HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data. Comput Struct Biotechnol J 2021; 19:2637-2645. [PMID: 34025950 PMCID: PMC8120939 DOI: 10.1016/j.csbj.2021.04.064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2021] [Revised: 04/11/2021] [Accepted: 04/24/2021] [Indexed: 11/17/2022] Open
Abstract
The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool to study chromosomal interactions where one can extract meaningful biological information including P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step of downstream analyses for the elimination of systematic and technical biases from chromatin contact matrices due to different mappability, GC content, and restriction fragment lengths. Especially, the problem of high sparsity puts forward a huge challenge on the correction, indicating the urgent need for a stable and efficient method for Hi-C data normalization. Recently, some matrix balancing methods have been developed to normalize Hi-C data, such as the Knight-Ruiz (KR) algorithm, but it failed to normalize contact matrices with high sparsity. Here, we presented an algorithm, Hi-C Matrix Balancing (HCMB), based on an iterative solution of equations, combining with linear search and projection strategy to normalize the Hi-C original interaction data. Both the simulated and experimental data demonstrated that HCMB is robust and efficient in normalizing Hi-C data and preserving the biologically relevant Hi-C features even facing very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users at GitHub: https://github.com/HUST-DataMan/HCMB.
Affiliation(s)
- Honglong Wu: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Xuebin Wang: BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Mengtian Chu: BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Dongfang Li: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China; BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Lixin Cheng: Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
- Ke Zhou: Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China
|
19
|
Abstract
BACKGROUND The genome architecture mapping (GAM) technique can capture genome-wide chromatin interactions. However, besides the known systematic biases in the raw GAM data, we have found a new type of systematic bias. It is necessary to develop and evaluate effective normalization methods to remove all systematic biases in the raw GAM data. RESULTS We have detected a new type of systematic bias, the fragment length bias, in the genome architecture mapping (GAM) data, which is significantly different from the bias of window detection frequency previously mentioned in the paper introducing the GAM method but is similar to the bias of distances between restriction sites existing in raw Hi-C data. We have found that the normalization method (a normalized variant of the linkage disequilibrium) used in the GAM paper is not able to effectively eliminate the new fragment length bias at 1 Mb resolution (slightly better at 30 kb resolution). We have developed an R package named normGAM for eliminating the new fragment length bias together with the other three biases existing in raw GAM data, which are the biases related to window detection frequency, mappability, and GC content. Five normalization methods have been implemented and included in the R package including Knight-Ruiz 2-norm (KR2, newly designed by us), normalized linkage disequilibrium (NLD), vanilla coverage (VC), sequential component normalization (SCN), and iterative correction and eigenvector decomposition (ICE). CONCLUSIONS Based on our evaluations, the five normalization methods can eliminate the four biases existing in raw GAM data, with VC and KR2 performing better than the others. We have observed that the KR2-normalized GAM data have a higher correlation with the KR-normalized Hi-C data on the same cell samples indicating that the KR-related methods are better than the others for keeping the consistency between the GAM and Hi-C experiments. 
Compared with the raw GAM data, the normalized GAM data are more consistent with the normalized distances from the fluorescence in situ hybridization (FISH) experiments. The source code of normGAM can be freely downloaded from http://dna.cs.miami.edu/normGAM/.
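Two of the normalization schemes named in this abstract, vanilla coverage (VC) and iterative correction (the "IC" part of ICE), can be sketched in a few lines. This is a simplified illustration on a small dense symmetric matrix, not the normGAM implementation; the function names, the fixed iteration count, and the toy matrix are assumptions made for this example. Note that matrices dominated by their diagonal can converge slowly or oscillate under this plain update, which is one reason practical tools often mask the diagonal.

```python
def vanilla_coverage(matrix):
    """Divide each entry by the product of its row sum and column sum."""
    n = len(matrix)
    row = [sum(matrix[i]) for i in range(n)]
    col = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / (row[i] * col[j]) if row[i] and col[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def iterative_correction(matrix, iterations=50):
    """Simplified ICE-style balancing of a symmetric matrix.

    Each pass estimates a per-bin bias as the row sum divided by the mean
    nonzero row sum, then divides entry (i, j) by bias[i] * bias[j].
    """
    n = len(matrix)
    m = [[float(v) for v in row] for row in matrix]
    for _ in range(iterations):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        bias = [s / mean if s > 0 else 1.0 for s in sums]
        m = [[m[i][j] / (bias[i] * bias[j]) for j in range(n)]
             for i in range(n)]
    return m

contacts = [[4, 2, 1], [2, 4, 2], [1, 2, 4]]   # toy symmetric contact matrix
balanced = iterative_correction(contacts)
row_sums = [sum(row) for row in balanced]      # approximately equal after balancing
```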
Affiliation(s)
- Tong Liu: Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA
- Zheng Wang: Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA
|
20
|
Abstract
BACKGROUND Topologically associating domains (TADs) are genomic regions with varying lengths. The interactions within TADs are more frequent than those between different TADs. TADs or sub-TADs are considered the structural and functional units of the mammalian genomes. Although TADs are important for understanding how genomes function, we have limited knowledge about their 3D structural properties. RESULTS In this study, we designed and benchmarked three metrics for capturing the three-dimensional and two-dimensional structural signatures of TADs, which can help better understand TADs' structural properties and the relationships between structural properties and genetic and epigenetic features. The first metric for capturing 3D structural properties is radius of gyration, which in this study is used to measure the spatial compactness of TADs. The mass value of each DNA bead in a 3D structure is novelly defined as one or more genetic or epigenetic feature(s). The second metric is folding degree. The last metric is exponent parameter, which is used to capture the 2D structural properties based on TADs' Hi-C contact matrices. In general, we observed significant correlations between the three metrics and the genetic and epigenetic features. We made the same observations when using H3K4me3, transcription start sites, and RNA polymerase II to represent the mass value in the modified radius-of-gyration metric. Moreover, we have found that the TADs in the clusters of depleted chromatin states apparently correspond to smaller exponent parameters and larger radius of gyrations. In addition, a new objective function of multidimensional scaling for modelling chromatin or TADs 3D structures was designed and benchmarked, which can handle the DNA bead-pairs with zero Hi-C contact values. 
CONCLUSIONS The web server for reconstructing chromatin 3D structures using multiple different objective functions and the related source code are publicly available at http://dna.cs.miami.edu/3DChrom/.
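The first metric in this abstract, a radius of gyration in which each DNA bead carries a mass derived from a genetic or epigenetic feature, follows the standard mass-weighted definition. The sketch below implements that standard formula only; how the paper maps features such as H3K4me3 signal to bead masses is not stated in the abstract, so the `masses` argument here is a hypothetical stand-in.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of 3D beads.

    Rg = sqrt( sum_i m_i * |r_i - r_cm|^2 / sum_i m_i ),
    where r_cm is the mass-weighted centre of the beads.
    With masses=None every bead has mass 1 (the unweighted case).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    if total <= 0:
        raise ValueError("total mass must be positive")
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)

beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]               # toy two-bead structure
compactness = radius_of_gyration(beads)                  # 1.0
weighted = radius_of_gyration(beads, masses=[3.0, 1.0])  # smaller than 1.0
```

A smaller value indicates a more compact structure; weighting by a feature pulls the centre towards beads where that feature is strong.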
Affiliation(s)
- Tong Liu: Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA
- Zheng Wang: Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA
|
21
|
Liu T, Wang Z. HiCNN2: Enhancing the Resolution of Hi-C Data Using an Ensemble of Convolutional Neural Networks. Genes (Basel) 2019; 10:genes10110862. [PMID: 31671634 PMCID: PMC6896157 DOI: 10.3390/genes10110862] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/19/2019] [Accepted: 10/28/2019] [Indexed: 12/17/2022] Open
Abstract
We present a deep-learning package named HiCNN2 to learn the mapping between low-resolution and high-resolution Hi-C (a technique for capturing genome-wide chromatin interactions) data, which can enhance the resolution of Hi-C interaction matrices. The HiCNN2 package includes three methods each with a different deep learning architecture: HiCNN2-1 is based on one single convolutional neural network (ConvNet); HiCNN2-2 consists of an ensemble of two different ConvNets; and HiCNN2-3 is an ensemble of three different ConvNets. Our evaluation results indicate that HiCNN2-enhanced high-resolution Hi-C data achieve smaller mean squared error and higher Pearson’s correlation coefficients with experimental high-resolution Hi-C data compared with existing methods HiCPlus and HiCNN. Moreover, all of the three HiCNN2 methods can recover more significant interactions detected by Fit-Hi-C compared to HiCPlus and HiCNN. Based on our evaluation results, we would recommend using HiCNN2-1 and HiCNN2-3 if recovering more significant interactions from Hi-C data is of interest, and HiCNN2-2 and HiCNN if the goal is to achieve higher reproducibility scores between the enhanced Hi-C matrix and the real high-resolution Hi-C matrix.
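The two evaluation measures named in this abstract, mean squared error and Pearson's correlation coefficient between an enhanced matrix and an experimental high-resolution matrix, can be computed as below. This is a generic sketch of the two statistics, not the HiCNN2 evaluation code; the helper names and the toy matrices are assumptions made for this example.

```python
import math

def flatten(matrix):
    """Turn a 2D list into a flat list of entries, row by row."""
    return [value for row in matrix for value in row]

def mean_squared_error(a, b):
    """Average squared difference between matching entries of a and b."""
    x, y = flatten(a), flatten(b)
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)

def pearson(a, b):
    """Pearson correlation between matching entries of a and b."""
    x, y = flatten(a), flatten(b)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant matrix")
    return cov / math.sqrt(var_x * var_y)

experimental = [[1.0, 2.0], [3.0, 4.0]]   # toy high-resolution matrix
enhanced = [[3.0, 5.0], [7.0, 9.0]]       # toy prediction, here 2 * x + 1
error = mean_squared_error(experimental, enhanced)
correlation = pearson(experimental, enhanced)
```

The toy case shows why both measures are reported: a prediction can be perfectly correlated with the target while still having a large squared error.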
Affiliation(s)
- Tong Liu: Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA
- Zheng Wang: Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA
|
22
|
Zhu H, Wang N, Sun JZ, Pandey RB, Wang Z. Inferring the three-dimensional structures of the X-chromosome during X-inactivation. Math Biosci Eng 2019; 16:7384-7404. [PMID: 31698618 PMCID: PMC7772933 DOI: 10.3934/mbe.2019369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
The Hi-C experiment can capture the genome-wide spatial proximities of the DNA, based on which it is possible to computationally reconstruct the three-dimensional (3D) structures of chromosomes. The transcripts of the long non-coding RNA (lncRNA) Xist spread throughout the entire X-chromosome and alter the 3D structure of the X-chromosome, which also inactivates one copy of the two X-chromosomes in a cell. The Hi-C experiments are expensive and time-consuming to conduct, but the Hi-C data of the active and inactive X-chromosomes are available. However, the Hi-C data of the X-chromosome during the process of X-chromosome inactivation (XCI) are not available. Therefore, the 3D structure of the X-chromosome during XCI remains unknown. We have developed a new approach to reconstruct the 3D structure of the X-chromosome during XCI, in which the chain of DNA beads representing a chromosome is stored and simulated inside a 3D cubic lattice. A 2D Gaussian function is used to model the zero values in the 2D Hi-C contact matrices. By applying simulated annealing and Metropolis-Hastings simulations, we first generated the 3D structures of the X-chromosome before and after XCI. Then, we used Xist localization intensities on the X-chromosome (RAP data) to model the traveling speeds or acceleration between all bead pairs during the process of XCI. The 3D structures of the X-chromosome at 3 hours, 6 hours, and 24 hours after the start of the Xist expression, which initiates the XCI process, have been reconstructed. The source code and the reconstructed 3D structures of the X-chromosome can be downloaded from http://dna.cs.miami.edu/3D-XCI/.
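One plausible reading of "a 2D Gaussian function is used to model the zero values" is that each zero entry is filled with a Gaussian-weighted average of nearby entries. The abstract does not give the exact formulation, so the sketch below is an assumption made for illustration only; `gaussian_impute`, `sigma`, and `radius` are hypothetical names and parameters, and the input is assumed to be a square matrix.

```python
import math

def gaussian_impute(matrix, sigma=1.0, radius=2):
    """Fill zero entries with a Gaussian-weighted average of neighbours.

    Nonzero entries are left untouched. For a zero at (i, j), every other
    cell within `radius` bins contributes its original value with weight
    exp(-(di^2 + dj^2) / (2 * sigma^2)).
    """
    n = len(matrix)
    out = [list(row) for row in matrix]
    for i in range(n):
        for j in range(n):
            if matrix[i][j] != 0:
                continue
            num = den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    a, b = i + di, j + dj
                    if (di or dj) and 0 <= a < n and 0 <= b < n:
                        w = math.exp(-(di * di + dj * dj) / (2 * sigma * sigma))
                        num += w * matrix[a][b]
                        den += w
            out[i][j] = num / den if den else 0.0
    return out

sparse = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]   # toy matrix with one contact
smoothed = gaussian_impute(sparse)
```

Zero entries closer to an observed contact receive larger imputed values than those farther away, which is the behaviour a Gaussian kernel is meant to capture.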
Affiliation(s)
- Hao Zhu: Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
- Nan Wang: Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ 07305, USA
- Jonathan Z. Sun: Department of Computer Science, College of Charleston, Charleston, SC 29424, USA
- Ras B. Pandey: Department of Physics and Astronomy, University of Southern Mississippi, 118 College Drive #5046, Hattiesburg, MS 39406, USA
- Zheng Wang: Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
|