1
Zhang J, Zhang N, Mai Q, Zhou C. The frontier of precision medicine: application of single-cell multi-omics in preimplantation genetic diagnosis. Brief Funct Genomics 2024; 23:726-732. [PMID: 39486398 DOI: 10.1093/bfgp/elae041] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/16/2024] [Revised: 10/03/2024] [Indexed: 11/04/2024] Open
Abstract
The advent of single-cell multi-omics technologies has revolutionized the landscape of preimplantation genetic diagnosis (PGD), offering unprecedented insights into the genetic, transcriptomic, and proteomic profiles of individual cells in early-stage embryos. This breakthrough holds the promise of enhancing the accuracy, efficiency, and scope of PGD, thereby significantly improving outcomes in assisted reproductive technologies (ARTs) and genetic disease prevention. This review provides a comprehensive overview of the importance of PGD in the context of precision medicine and elucidates how single-cell multi-omics technologies have transformed this field. We begin with a brief history of PGD, highlighting its evolution and application in detecting genetic disorders and facilitating ART. Subsequently, we delve into the principles, methodologies, and applications of single-cell genomics, transcriptomics, and proteomics in PGD, emphasizing their role in improving diagnostic precision and efficiency. Furthermore, we review significant recent advances within this domain, including key experimental designs, findings, and their implications for PGD practices. The advantages and limitations of these studies are analyzed to assess their potential impact on the future development of PGD technologies. Looking forward, we discuss the emerging research directions and challenges, focusing on technological advancements, new application areas, and strategies to overcome existing limitations. In conclusion, this review underscores the pivotal role of single-cell multi-omics in PGD, highlighting its potential to drive the progress of precision medicine and personalized treatment strategies, thereby marking a new era in reproductive genetics and healthcare.
Affiliation(s)
- Jinglei Zhang
  - Reproductive Medical Center, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, Guangdong, China
- Nan Zhang
  - General Surgery, The First Affiliated Hospital of Henan University of CM, Zhengzhou 450052, China
- Qingyun Mai
  - Reproductive Medical Center, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, Guangdong, China
- Canquan Zhou
  - Reproductive Medical Center, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou 510080, Guangdong, China
2
Ma R, Huang J, Jiang T, Ma W. A mini-review of single-cell Hi-C embedding methods. Comput Struct Biotechnol J 2024; 23:4027-4035. [PMID: 39610904 PMCID: PMC11603012 DOI: 10.1016/j.csbj.2024.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Revised: 11/01/2024] [Accepted: 11/01/2024] [Indexed: 11/30/2024] Open
Abstract
Single-cell Hi-C (scHi-C) techniques have significantly advanced our understanding of the 3D genome organization, providing crucial insights into the spatial genome architecture within individual nuclei. Numerous computational and statistical methods have been developed to analyze scHi-C data, with embedding methods playing a key role. Embedding reduces the dimensionality of complex scHi-C contact maps, making it easier to extract biologically meaningful patterns. These methods not only enhance cell clustering based on chromatin structures but also facilitate visualization and other downstream analyses. Most scHi-C embedding methods incorporate strategies such as normalization and imputation to address the inherent sparsity of scHi-C data, thereby further improving data quality and interpretability. In this review, we systematically examine the existing methods designed for scHi-C embedding, outlining their methodologies and discussing their capabilities in handling normalization and imputation. Additionally, we present a comprehensive benchmarking analysis to compare both embedding techniques and their clustering performances. This review serves as a practical guide for researchers seeking to select suitable scHi-C embedding tools, ultimately contributing to the understanding of the 3D organization of the genome.
Affiliation(s)
- Rui Ma
  - Department of Statistics, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
- Jingong Huang
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
- Tao Jiang
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
  - Institute of Integrative Genome Biology, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
- Wenxiu Ma
  - Department of Statistics, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
  - Institute of Integrative Genome Biology, University of California Riverside, 900 University Ave., Riverside, 92521, CA, USA
3
Zhou X, Wu H. scHiClassifier: a deep learning framework for cell type prediction by fusing multiple feature sets from single-cell Hi-C data. Brief Bioinform 2024; 26:bbaf009. [PMID: 39831891 PMCID: PMC11744636 DOI: 10.1093/bib/bbaf009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2024] [Revised: 12/01/2024] [Accepted: 01/06/2025] [Indexed: 01/22/2025] Open
Abstract
Single-cell high-throughput chromosome conformation capture (Hi-C) technology enables capturing chromosomal spatial structure information at the cellular level. However, to effectively investigate changes in chromosomal structure across different cell types, methods are needed that can identify cell types from single-cell Hi-C data. Current frameworks for cell type prediction based on single-cell Hi-C data are limited, often struggling with feature interpretability and biological significance, and lacking convincing and robust validation of classification performance. In this study, we propose four new feature sets based on the contact matrix with clear interpretability and biological significance. Furthermore, we develop a novel deep learning framework named scHiClassifier based on a multi-head self-attention encoder, 1D convolution and feature fusion, which integrates information from these four feature sets to predict cell types accurately. Through comprehensive comparison experiments with benchmark frameworks on six datasets, we demonstrate the superior classification performance and the universality of the scHiClassifier framework. We further assess the robustness of scHiClassifier through data perturbation experiments and data dropout experiments. Moreover, we demonstrate that using all feature sets in the scHiClassifier framework yields optimal performance, supported by comparisons of different feature set combinations. The effectiveness and the superiority of the multiple feature set extraction are shown by comparison with four unsupervised dimensionality reduction methods. Additionally, we analyze the importance of different feature sets and chromosomes using the "SHapley Additive exPlanations" method. Furthermore, the accuracy and reliability of the scHiClassifier framework in cell classification for single-cell Hi-C data are supported through enrichment analysis. The source code of scHiClassifier is freely available at https://github.com/HaoWuLab-Bioinformatics/scHiClassifier.
Affiliation(s)
- Xiangfei Zhou
  - School of Software, Shandong University, No. 1500, Shunhua Road, Hi-Tech Industrial Development Zone, Jinan 250100, Shandong, China
- Hao Wu
  - School of Software, Shandong University, No. 1500, Shunhua Road, Hi-Tech Industrial Development Zone, Jinan 250100, Shandong, China
  - Shenzhen Research Institute of Shandong University, Shandong University, No. 19, Gaoxin South 4th Road, Nanshan District, Shenzhen 518063, Guangdong, China
4
Zhou Y, Li T, Choppavarapu L, Fang K, Lin S, Jin VX. Integration of scHi-C and scRNA-seq data defines distinct 3D-regulated and biological-context dependent cell subpopulations. Nat Commun 2024; 15:8310. [PMID: 39333113 PMCID: PMC11436782 DOI: 10.1038/s41467-024-52440-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 09/06/2024] [Indexed: 09/29/2024] Open
Abstract
An integration of 3D chromatin structure and gene expression at single-cell resolution has yet to be demonstrated. Here, we develop a computational method, a multiomic data integration (MUDI) algorithm, which integrates scHi-C and scRNA-seq data to precisely define the 3D-regulated and biological-context dependent cell subpopulations or topologically integrated subpopulations (TISPs). We demonstrate its algorithmic utility on publicly available and newly generated scHi-C and scRNA-seq data. We then test and apply MUDI in a breast cancer cell model system to demonstrate its biological-context dependent utility. We find that the newly defined topologically conserved associating domain (CAD) is the characteristic single-cell 3D chromatin structure and better characterizes chromatin domains at single-cell resolution. We further identify 20 TISPs uniquely characterizing 3D-regulated breast cancer cellular states. We reveal that two of the TISPs remarkably resemble high-cycling breast cancer persister cells and that chromatin-modifying enzymes might be functional regulators driving the alteration of the 3D chromatin structures. Our comprehensive integration of scHi-C and scRNA-seq data in cancer cells at single-cell resolution provides mechanistic insights into 3D-regulated heterogeneity of developing drug-tolerant cancer cells.
Affiliation(s)
- Yufan Zhou
  - Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Tian Li
  - Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Lavanya Choppavarapu
  - Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
  - MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
- Kun Fang
  - Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
  - MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, OH, USA
- Victor X Jin
  - Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
  - MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
5
Zhou B, Liu Q, Wang M, Wu H. Deep neural network models for cell type prediction based on single-cell Hi-C data. BMC Genomics 2024; 22:922. [PMID: 39285318 PMCID: PMC11406723 DOI: 10.1186/s12864-024-10764-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2020] [Accepted: 09/02/2024] [Indexed: 09/19/2024] Open
Abstract
BACKGROUND Cell type prediction is crucial to cell type identification in genomics, cancer diagnosis and drug development, and it can address the time-consuming and difficult problem of cell classification in biological experiments. Therefore, a computational method is urgently needed to classify and predict cell types using single-cell Hi-C data. Previous studies have lacked a convenient and accurate method to predict cell types based on single-cell Hi-C data. Deep neural networks can form complex representations of single-cell Hi-C data and make it possible to handle multidimensional and sparse biological datasets. RESULTS We compare the performance of SCANN with existing methods and analyze the model using five different evaluation metrics. When using only the ML1 and ML3 datasets, the ARI and NMI values of SCANN increase by 14% and 11% over those of scHiCluster, respectively. However, when using all six libraries of data, the ARI and NMI values of SCANN increase by 63% and 88% over those of scHiCluster, respectively. These findings show that SCANN is highly accurate in predicting the type of independent cell samples using single-cell Hi-C data. CONCLUSIONS SCANN enhances the training speed and requires fewer resources for predicting cell types. In addition, when the number of cells in different cell types is extremely unbalanced, SCANN has higher stability and flexibility in solving cell classification and cell type prediction using single-cell Hi-C data. This prediction method can help biologists study the differences in chromosome structure between different cell types.
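The abstract above reports results as ARI (adjusted Rand index) and NMI (normalized mutual information). As a rough illustration of what these two agreement scores measure, here is a minimal pure-Python sketch. The function names, the toy labels, and the choice of arithmetic-mean normalization for NMI are assumptions made for this example; they are not taken from SCANN or scHiCluster.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected pair-counting agreement between two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # zero or one sample: the labelings trivially agree
    # Contingency counts: how many samples share each (true, predicted) pair.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings have one cluster
    return (sum_joint - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in true_counts.values())
    h_pred = -sum((c / n) * log(c / n) for c in pred_counts.values())
    mean_entropy = (h_true + h_pred) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings consist of a single cluster
    return mutual_info / mean_entropy


if __name__ == "__main__":
    # Toy example: the predicted clusters match the true ones up to renaming.
    truth = [0, 0, 1, 1]
    predicted = [1, 1, 0, 0]
    print(adjusted_rand_index(truth, predicted))
    print(normalized_mutual_info(truth, predicted))
```

Both scores ignore the names of the clusters and depend only on which cells are grouped together, which is why they suit comparisons between predicted clusters and known cell types.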
Affiliation(s)
- Bing Zhou
  - School of Software, Shandong University, Jinan, Shandong, 250100, China
  - College of Information Engineering, Northwest A&F University, 712100, Yangling, Shaanxi, China
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, 712100, Yangling, Shaanxi, China
- Meili Wang
  - College of Information Engineering, Northwest A&F University, 712100, Yangling, Shaanxi, China
- Hao Wu
  - School of Software, Shandong University, Jinan, Shandong, 250100, China
6
Wu Y, Shi Z, Zhou X, Zhang P, Yang X, Ding J, Wu H. scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information. Commun Biol 2024; 7:923. [PMID: 39085477 PMCID: PMC11291681 DOI: 10.1038/s42003-024-06626-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 07/24/2024] [Indexed: 08/02/2024] Open
Abstract
The emergence of single-cell Hi-C (scHi-C) technology has provided unprecedented opportunities for investigating the intricate relationship between cell cycle phases and the three-dimensional (3D) structure of chromatin. However, accurately predicting cell cycle phases based on scHi-C data remains a formidable challenge. Here, we present scHiCyclePred, a prediction model that integrates multiple feature sets to leverage scHi-C data for predicting cell cycle phases. scHiCyclePred extracts 3D chromatin structure features by incorporating multi-scale interaction information. The comparative analysis illustrates that scHiCyclePred surpasses existing methods such as Nagano_method and CIRCLET across various metrics including accuracy (ACC), F1 score, Precision, Recall, and balanced accuracy (BACC). In addition, we evaluate scHiCyclePred against the previously published CIRCLET using the dataset of complex tissues (Liu_dataset). Experimental results reveal significant improvements with scHiCyclePred exhibiting improvements of 0.39, 0.52, 0.52, and 0.39 over the CIRCLET in terms of ACC, F1 score, Precision, and Recall metrics, respectively. Furthermore, we conduct analyses on three-dimensional chromatin dynamics and gene features during the cell cycle, providing a more comprehensive understanding of cell cycle dynamics through chromatin structure. scHiCyclePred not only offers insights into cell biology but also holds promise for catalyzing breakthroughs in disease research. Access scHiCyclePred on GitHub at https://github.com/HaoWuLab-Bioinformatics/scHiCyclePred.
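The metrics named in this abstract (ACC, F1 score, Precision, Recall, BACC) can be illustrated with a small pure-Python sketch. The abstract does not say how per-class scores are averaged, so macro averaging is an assumption here, as are the function name and the toy phase labels; this is not the scHiCyclePred evaluation code.

```python
def multiclass_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Balanced accuracy is computed as the mean recall over the classes
    that occur in y_true (an assumed, commonly used definition).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    true_class_recalls = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
        if tp + fn:  # class actually present in the ground truth
            true_class_recalls.append(recall)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
        "balanced_accuracy": sum(true_class_recalls) / len(true_class_recalls),
    }


if __name__ == "__main__":
    # Illustrative phase labels only; not data from the paper.
    truth = ["G1", "G1", "S", "S", "G2", "G2"]
    predicted = ["G1", "S", "S", "S", "G2", "G1"]
    for name, value in multiclass_metrics(truth, predicted).items():
        print(f"{name}: {value:.3f}")
```

Balanced accuracy weights every class equally, so it is less flattering than plain accuracy when one cell cycle phase dominates the dataset.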
Affiliation(s)
- Yingfu Wu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Zhenqi Shi
  - School of Software, Shandong University, Jinan, Shandong, China
- Xiangfei Zhou
  - School of Software, Shandong University, Jinan, Shandong, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Xiuhui Yang
  - School of Software, Shandong University, Jinan, Shandong, China
- Jun Ding
  - Department of Medicine, Meakins-Christie Laboratories, McGill University, Montreal, QC, Canada
- Hao Wu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China
7
Shi Z, Wu H. CTPredictor: A comprehensive and robust framework for predicting cell types by integrating multi-scale features from single-cell Hi-C data. Comput Biol Med 2024; 173:108336. [PMID: 38513390 DOI: 10.1016/j.compbiomed.2024.108336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 03/01/2024] [Accepted: 03/17/2024] [Indexed: 03/23/2024]
Abstract
Single-cell Hi-C (scHi-C) has emerged as a powerful technology for deciphering cell-to-cell variability in three-dimensional (3D) chromatin organization, providing insights into genome-wide chromatin interactions and their correlation with cellular functions. Nevertheless, the accurate identification of cell types across different datasets remains a formidable challenge, hindering comprehensive investigations into genome structure. In response, we introduce CTPredictor, an innovative computational method that integrates multi-scale features to accurately predict cell types in various datasets. CTPredictor strategically incorporates three distinct feature sets, namely, small intra-domain contact probability (SICP), smoothed small intra-domain contact probability (SSICP), and smoothed bin contact probability (SBCP). The resulting fusion classification model significantly enhances the accuracy of cell type prediction based on single-cell Hi-C data (scHi-C). Rigorous benchmarking against established methods and three conventional machine learning approaches demonstrates the robust performance of CTPredictor, positioning it as an advanced tool for cell type prediction within scHi-C data. Beyond its prediction capabilities, CTPredictor holds promise in illuminating 3D genome structures and their functional significance across a wide array of biological processes.
Affiliation(s)
- Zhenqi Shi
  - School of Software, Shandong University, 250100, Jinan, China
- Hao Wu
  - School of Software, Shandong University, 250100, Jinan, China
8
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
  - Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
9
Zheng J, Yang Y, Dai Z. Subgraph extraction and graph representation learning for single cell Hi-C imputation and clustering. Brief Bioinform 2023; 25:bbad379. [PMID: 38040494 PMCID: PMC10691963 DOI: 10.1093/bib/bbad379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/10/2023] [Accepted: 10/03/2023] [Indexed: 12/03/2023] Open
Abstract
Single-cell Hi-C (scHi-C) technology enables the investigation of 3D chromatin structure variability across individual cells. However, the analysis of scHi-C data is challenged by a large number of missing values. Here, we present a scHi-C data imputation model HiC-SGL, based on Subgraph extraction and graph representation learning. HiC-SGL can also learn informative low-dimensional embeddings of cells. We demonstrate that our method surpasses existing methods in terms of imputation accuracy and clustering performance by various metrics.
Affiliation(s)
- Jiahao Zheng
  - School of Computer Science and Engineering, Sun Yat-Sen University, 510006 Guangzhou, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-Sen University, 510006 Guangzhou, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-Sen University, 510006 Guangzhou, China
10
Liu H, Ma W. scHiCDiff: detecting differential chromatin interactions in single-cell Hi-C data. Bioinformatics 2023; 39:btad625. [PMID: 37847655 PMCID: PMC10598576 DOI: 10.1093/bioinformatics/btad625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 08/15/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023] Open
Abstract
SUMMARY Here, we presented the scHiCDiff software tool that provides both nonparametric tests and parametric models to detect differential chromatin interactions (DCIs) from single-cell Hi-C data. We thoroughly evaluated the scHiCDiff methods on both simulated and real data. Our results demonstrated that scHiCDiff, especially the zero-inflated negative binomial model option, can effectively detect reliable and consistent single-cell DCIs between two conditions, thereby facilitating the study of cell type-specific variations of chromatin structures at the single-cell level. AVAILABILITY AND IMPLEMENTATION scHiCDiff is implemented in R and freely available at GitHub (https://github.com/wmalab/scHiCDiff).
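For readers unfamiliar with the zero-inflated negative binomial (ZINB) model mentioned above, the textbook form of its probability mass function is given below. The symbols are assumptions introduced for this note: π is the probability of an excess (structural) zero, μ the negative binomial mean, and θ the dispersion (size) parameter. The exact parameterization used inside scHiCDiff is not stated in the abstract and may differ.

```latex
\[
P(Y = y) =
\begin{cases}
\pi + (1 - \pi)\left(\dfrac{\theta}{\theta + \mu}\right)^{\theta}, & y = 0, \\[2ex]
(1 - \pi)\,\dfrac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!}
\left(\dfrac{\theta}{\theta + \mu}\right)^{\theta}
\left(\dfrac{\mu}{\theta + \mu}\right)^{y}, & y = 1, 2, \dots
\end{cases}
\]
```

The extra mass π at zero is what makes this family a natural candidate for sparse contact counts, where many bin pairs record no contact in a given cell.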
Affiliation(s)
- Huiling Liu
  - Department of Statistics, University of California Riverside, Riverside, CA 92521, United States
- Wenxiu Ma
  - Department of Statistics, University of California Riverside, Riverside, CA 92521, United States
11
Li A, Zeng G, Wang H, Li X, Zhang Z. DeDoc2 Identifies and Characterizes the Hierarchy and Dynamics of Chromatin TAD-Like Domains in the Single Cells. Adv Sci (Weinh) 2023; 10:e2300366. [PMID: 37162225 PMCID: PMC10369259 DOI: 10.1002/advs.202300366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2023] [Revised: 04/18/2023] [Indexed: 05/11/2023]
Abstract
Topologically associating domains (TADs) are functional chromatin units with hierarchical structure. However, the existence, prevalence, and dynamics of such hierarchy in single cells remain unexplored. Here, a new-generation TAD-like domain (TLD) detection algorithm, named deDoc2, is reported that decodes the hierarchy of TLDs in single cells. With dynamic programming, deDoc2 seeks genome partitions with globally minimal structure entropy for both the whole and the local contact matrix. Notably, deDoc2 outperforms state-of-the-art tools and is one of only two tools able to identify the hierarchy of TLDs in single cells. By applying deDoc2, it is shown that the hierarchy of TLDs in single cells is highly dynamic during the cell cycle, as well as among human brain cortex cells, and that it is associated with cellular identity and functions. Thus, the results demonstrate the abundance of information potentially encoded by TLD hierarchy for functional regulation. deDoc2 can be freely accessed at https://github.com/zengguangjie/deDoc2.
Affiliation(s)
- Angsheng Li
  - State Key Laboratory of Software Development Environment, School of Computer Science, Beihang University, Beijing 100191, P. R. China
  - Zhongguancun Laboratory, Beijing 100094, P. R. China
- Guangjie Zeng
  - State Key Laboratory of Software Development Environment, School of Computer Science, Beihang University, Beijing 100191, P. R. China
- Haoyu Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - School of Life Science, University of Chinese Academy of Sciences, Beijing 101408, P. R. China
- Xiao Li
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - School of Life Science, University of Chinese Academy of Sciences, Beijing 101408, P. R. China
- Zhihua Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - School of Life Science, University of Chinese Academy of Sciences, Beijing 101408, P. R. China
12
Fan S, Dang D, Ye Y, Zhang SW, Gao L, Zhang S. scHi-CSim: a flexible simulator that generates high-fidelity single-cell Hi-C data for benchmarking. J Mol Cell Biol 2023; 15:mjad003. [PMID: 36708167 PMCID: PMC10308180 DOI: 10.1093/jmcb/mjad003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 09/18/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome the sparseness of data, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim can generate high-fidelity data by comparing the performance of single-cell clustering and detection of chromosomal high-order structures with raw data. Furthermore, scHi-CSim is flexible to change sequencing depth and the number of simulated replicates. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
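The idea of varying sequencing depth for benchmarking can be illustrated with a much simpler operation than the simulator described above: uniformly downsampling the read pairs of one cell's sparse contact map. The sketch below is only that. It does not merge neighboring cells, stratify by genomic distance, or model restriction fragments, so it is not the scHi-CSim procedure; the function name and the dictionary representation of a contact map are assumptions.

```python
import random
from collections import Counter


def downsample_contacts(contacts, target_total, seed=0):
    """Uniformly subsample read pairs from a sparse contact map.

    contacts maps a (bin_i, bin_j) tuple to its integer contact count.
    Returns a new dict of the same form whose counts sum to target_total.
    """
    total = sum(contacts.values())
    if not 0 <= target_total <= total:
        raise ValueError("target_total must be between 0 and the total count")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    # Expand counts into individual read pairs; this is affordable because
    # single-cell contact maps are sparse.
    read_pairs = [pair for pair, count in contacts.items() for _ in range(count)]
    sampled = rng.sample(read_pairs, target_total)  # without replacement
    return dict(Counter(sampled))


if __name__ == "__main__":
    # Toy contact map for one cell: keys are pairs of bin indices.
    cell = {(0, 1): 5, (0, 2): 1, (1, 2): 3, (2, 5): 1}
    shallow = downsample_contacts(cell, target_total=4, seed=42)
    print(shallow, sum(shallow.values()))
```

Sampling without replacement guarantees that no bin pair ends up with more contacts than it had in the original cell, which keeps each downsampled map a plausible lower-depth version of the input.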
Affiliation(s)
- Shichen Fan
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Dachang Dang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yusen Ye
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shao-Wu Zhang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
13
Ford K, Munson BP, Fong SH, Panwala R, Chu WK, Rainaldi J, Plongthongkum N, Arunachalam V, Kostrowicki J, Meluzzi D, Kreisberg JF, Jensen-Pergakes K, VanArsdale T, Paul T, Tamayo P, Zhang K, Bienkowska J, Mali P, Ideker T. Multimodal perturbation analyses of cyclin-dependent kinases reveal a network of synthetic lethalities associated with cell-cycle regulation and transcriptional regulation. Sci Rep 2023; 13:7678. [PMID: 37169829 PMCID: PMC10175263 DOI: 10.1038/s41598-023-33329-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 04/11/2023] [Indexed: 05/13/2023] Open
Abstract
Cell-cycle control is accomplished by cyclin-dependent kinases (CDKs), motivating extensive research into CDK targeting small-molecule drugs as cancer therapeutics. Here we use combinatorial CRISPR/Cas9 perturbations to uncover an extensive network of functional interdependencies among CDKs and related factors, identifying 43 synthetic-lethal and 12 synergistic interactions. We dissect CDK perturbations using single-cell RNAseq, for which we develop a novel computational framework to precisely quantify cell-cycle effects and diverse cell states orchestrated by specific CDKs. While pairwise disruption of CDK4/6 is synthetic-lethal, only CDK6 is required for normal cell-cycle progression and transcriptional activation. Multiple CDKs (CDK1/7/9/12) are synthetic-lethal in combination with PRMT5, independent of cell-cycle control. In-depth analysis of mRNA expression and splicing patterns provides multiple lines of evidence that the CDK-PRMT5 dependency is due to aberrant transcriptional regulation resulting in premature termination. These inter-dependencies translate to drug-drug synergies, with therapeutic implications in cancer and other diseases.
Affiliation(s)
- Kyle Ford
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Brenton P Munson
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Samson H Fong
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Rebecca Panwala
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Wai Keung Chu
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Joseph Rainaldi
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
  - Biomedical Sciences Program, University of California San Diego, La Jolla, CA, 92093, USA
- Nongluk Plongthongkum
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Dario Meluzzi
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Jason F Kreisberg
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Todd VanArsdale
  - Pfizer Inc, 10555 Science Center Drive, San Diego, CA, 92121, USA
- Thomas Paul
  - Pfizer Inc, 10555 Science Center Drive, San Diego, CA, 92121, USA
- Pablo Tamayo
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Kun Zhang
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Prashant Mali
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Trey Ideker
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
14
Liu Q, Zeng W, Zhang W, Wang S, Chen H, Jiang R, Zhou M, Zhang S. Deep generative modeling and clustering of single cell Hi-C data. Brief Bioinform 2023; 24:6858951. [PMID: 36458445 DOI: 10.1093/bib/bbac494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/28/2022] [Accepted: 10/18/2022] [Indexed: 12/05/2022] Open
Abstract
Deciphering 3D genome conformation is important for understanding gene regulation and cellular function at a spatial level. Recent advances in single cell Hi-C technologies have enabled the profiling of the 3D architecture of DNA within individual cells, which allows us to study the cell-to-cell variability of 3D chromatin organization. Computational approaches are urgently needed to comprehensively analyze the sparse and heterogeneous single cell Hi-C data. Here, we propose scDEC-Hi-C, a new framework for single cell Hi-C analysis with deep generative neural networks. scDEC-Hi-C outperforms existing methods in terms of single cell Hi-C data clustering and imputation. Moreover, the generative power of scDEC-Hi-C could help unveil the differences in chromatin architecture across cell types. We expect that scDEC-Hi-C could deepen our understanding of the complex mechanism underlying the formation of chromatin contacts.
Affiliation(s)
- Qiao Liu
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Wanwen Zeng
  - College of Software, Nankai University, Tianjin 300071, China
- Wei Zhang
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Sicheng Wang
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Hongyang Chen
  - The Research Center for Intelligent Network, Zhejiang Lab, Hangzhou 311121, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Mu Zhou
  - SenseBrain Research, San Jose, CA 95131, USA
- Shaoting Zhang
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200240, China
15
Lyu H, Liu E, Wu Z, Li Y, Liu Y, Yin X. scHiCPTR: unsupervised pseudotime inference through dual graph refinement for single-cell Hi-C data. Bioinformatics 2022; 38:5151-5159. [PMID: 36205615 DOI: 10.1093/bioinformatics/btac670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 08/25/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION The emerging single-cell Hi-C technology provides opportunities to study the dynamics of chromosomal organization. Constructing a pseudotime path from single-cell Hi-C contact matrices to order cells along a developmental trajectory is challenging: the matrices are inherently high dimensional and sparse, they suffer from noise and biases, and the topology of the underlying trajectory may be diverse. RESULTS We present scHiCPTR, an unsupervised graph-based pipeline to infer pseudotime from single-cell Hi-C contact matrices. It provides a workflow consisting of imputation and embedding, graph construction, dual graph refinement, pseudotime calculation and result visualization. Going beyond the few existing methods, scHiCPTR tries to optimize graph structure by two parallel procedures of graph pruning, which help reduce the spurious cell links resulting from noise and determine a global developmental directionality. It can also handle developmental trajectories with multiple topologies, including linear, bifurcated and circular ones, and is competitive with methods developed for single-cell RNA-seq data. The comparative results show that scHiCPTR can achieve higher performance in pseudotime inference, and that the inferred developmental trajectories are biologically reasonable. AVAILABILITY AND IMPLEMENTATION scHiCPTR is freely available at https://github.com/lhqxinghun/scHiCPTR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hongqiang Lyu
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Erhu Liu
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Zhifang Wu
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Yao Li
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Yuan Liu
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Xiaoran Yin
  - Department of Oncology, The Second Affiliated Hospital of Xi'an Jiaotong University, Shaanxi 710004, China
16
Zhen C, Wang Y, Geng J, Han L, Li J, Peng J, Wang T, Hao J, Shang X, Wei Z, Zhu P, Peng J. A review and performance evaluation of clustering frameworks for single-cell Hi-C data. Brief Bioinform 2022; 23:6712299. [PMID: 36151714 DOI: 10.1093/bib/bbac385] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/03/2022] [Revised: 07/31/2022] [Accepted: 08/09/2022] [Indexed: 12/14/2022] Open
Abstract
The three-dimensional genome structure plays a key role in cellular function and gene regulation. Single-cell Hi-C (high-resolution chromosome conformation capture) technology can capture genome structure information at the cell level, which provides the opportunity to study how genome structure varies among different cell types. Recently, a few methods have been designed specifically for single-cell Hi-C clustering. In this manuscript, we perform an in-depth benchmark study of available single-cell Hi-C data clustering methods, implementing an evaluation system for multiple clustering frameworks based on both human and mouse datasets. We compare eight methods in terms of visualization and clustering performance. Performance is evaluated using four benchmark metrics: adjusted Rand index, normalized mutual information, homogeneity and Fowlkes-Mallows index. Furthermore, we also evaluate the eight methods for the task of separating cells at different stages of the cell cycle based on single-cell Hi-C data.
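Two of the benchmark metrics named in this abstract, the adjusted Rand index and the Fowlkes-Mallows index, can be derived from counts of cell pairs that share a label. The following is a minimal standard-library sketch of the textbook definitions, not the benchmark's own evaluation code; the function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Count cell pairs grouped together in both, in truth, and in prediction."""
    contingency = Counter(zip(true_labels, pred_labels))
    same_both = sum(comb(c, 2) for c in contingency.values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    return same_both, same_true, same_pred, comb(len(true_labels), 2)


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance agreement (1.0 means identical partitions)."""
    same_both, same_true, same_pred, total = pair_counts(true_labels, pred_labels)
    if total == 0:
        return 1.0  # fewer than two cells: partitions trivially agree
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall."""
    same_both, same_true, same_pred, _ = pair_counts(true_labels, pred_labels)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)


# Hypothetical cell-type labels for four cells.
truth = ["A", "A", "B", "B"]
relabelled = [1, 1, 0, 0]   # same partition, different cluster names
scrambled = [0, 1, 0, 1]    # every true pair is split

ari_same = adjusted_rand_index(truth, relabelled)
ari_bad = adjusted_rand_index(truth, scrambled)
fmi_same = fowlkes_mallows(truth, relabelled)
```

Both metrics ignore the names of the clusters and depend only on which cells are grouped together, which is why the relabelled partition scores as a perfect match.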
Affiliation(s)
- Caiwei Zhen
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Yuxian Wang
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Jiaquan Geng
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Lu Han
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Jingyi Li
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Jinghao Peng
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Jianye Hao
  - School of Computer Software, Tianjin University, 300350, Tianjin, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
- Zhongyu Wei
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
- Peican Zhu
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
17
Zhang R, Zhou T, Ma J. Ultrafast and interpretable single-cell 3D genome analysis with Fast-Higashi. Cell Syst 2022; 13:798-807.e6. [PMID: 36265466 PMCID: PMC9867958 DOI: 10.1016/j.cels.2022.09.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 06/20/2022] [Revised: 09/01/2022] [Accepted: 09/13/2022] [Indexed: 01/26/2023]
Abstract
Single-cell Hi-C (scHi-C) technologies can probe three-dimensional (3D) genome structures in individual cells. However, existing scHi-C analysis methods are hindered by the data quality and complex 3D genome patterns. The lack of computational scalability and interpretability poses further challenges for large-scale analysis. Here, we introduce Fast-Higashi, an ultrafast and interpretable method based on tensor decomposition and partial random walk with restart, enabling joint identification of cell identities and chromatin meta-interactions from sparse scHi-C data. Extensive evaluations demonstrate the advantage of Fast-Higashi over existing methods, leading to improved delineation of rare cell types and continuous developmental trajectories. Fast-Higashi can directly identify 3D genome features that define distinct cell types and help elucidate cell-type-specific connections between genome structure and function. Moreover, Fast-Higashi can generalize to incorporate other single-cell omics data. Fast-Higashi provides a highly efficient and interpretable scHi-C analysis solution that is applicable to a broad range of biological contexts.
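The abstract names random walk with restart as one ingredient of the method. As a rough illustration of the generic technique only (not the "partial" variant or any Fast-Higashi code), the sketch below iterates p ← (1 − r)·W·p + r·e on a small contact graph, where W is the column-normalised adjacency matrix and e marks the seed bin. The function name, the restart value and the toy graph are assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that restarts at `seed`."""
    n = len(adjacency)
    # Column-normalise so each column of the transition matrix sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical three-bin contact graph: bin 0 touches bin 1, bin 1 touches bin 2.
contacts = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = random_walk_with_restart(contacts, seed=0)
```

On sparse contact maps, this kind of propagation spreads signal from observed contacts to nearby bins, so bins that are only indirectly connected to the seed still receive a small, distance-dependent score.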
Affiliation(s)
- Ruochi Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Tianming Zhou
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
18
Zheng Y, Shen S, Keleş S. Normalization and de-noising of single-cell Hi-C data with BandNorm and scVI-3D. Genome Biol 2022; 23:222. [PMID: 36253828 PMCID: PMC9575231 DOI: 10.1186/s13059-022-02774-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 06/16/2021] [Accepted: 09/19/2022] [Indexed: 11/10/2022] Open
Abstract
Single-cell high-throughput chromatin conformation capture methodologies (scHi-C) enable profiling of long-range genomic interactions. However, data from these technologies are prone to technical noise and biases that hinder downstream analysis. We develop a normalization approach, BandNorm, and a deep generative modeling framework, scVI-3D, to account for scHi-C specific biases. In benchmarking experiments, BandNorm yields leading performances in a time and memory efficient manner for cell-type separation, identification of interacting loci, and recovery of cell-type relationships, while scVI-3D exhibits advantages for rare cell types and under high sparsity scenarios. Application of BandNorm coupled with gene-associating domain analysis reveals scRNA-seq validated sub-cell type identification.
Affiliation(s)
- Ye Zheng
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
- Siqi Shen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, USA
- Sündüz Keleş
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, USA
  - Department of Statistics, University of Wisconsin - Madison, Madison, USA
19
Dam TV, Toft NI, Grøntved L. Cell-Type Resolved Insights into the Cis-Regulatory Genome of NAFLD. Cells 2022; 11:870. [PMID: 35269495 PMCID: PMC8909044 DOI: 10.3390/cells11050870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 02/27/2022] [Accepted: 02/28/2022] [Indexed: 11/20/2022] Open
Abstract
The prevalence of non-alcoholic fatty liver disease (NAFLD) is increasing rapidly, and unmet treatment can result in the development of hepatitis, fibrosis, and liver failure. There are difficulties involved in diagnosing NAFLD early and for this reason there are challenges involved in its treatment. Furthermore, no drugs are currently approved to alleviate complications, a fact which highlights the need for further insight into disease mechanisms. NAFLD pathogenesis is associated with complex cellular changes, including hepatocyte steatosis, immune cell infiltration, endothelial dysfunction, hepatic stellate cell activation, and epithelial ductular reaction. Many of these cellular changes are controlled by dramatic changes in gene expression orchestrated by the cis-regulatory genome and associated transcription factors. Thus, to understand disease mechanisms, we need extensive insights into the gene regulatory mechanisms associated with tissue remodeling. Mapping cis-regulatory regions genome-wide is a step towards this objective and several current and emerging technologies allow detection of accessible chromatin and specific histone modifications in enriched cell populations of the liver, as well as in single cells. Here, we discuss recent insights into the cis-regulatory genome in NAFLD both at the organ-level and in specific cell populations of the liver. Moreover, we highlight emerging technologies that enable single-cell resolved analysis of the cis-regulatory genome of the liver.
Affiliation(s)
- Lars Grøntved
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense, Denmark; (T.V.D.); (N.I.T.)
20
Zhang R, Zhou T, Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat Biotechnol 2022; 40:254-261. [PMID: 34635838 PMCID: PMC8843812 DOI: 10.1038/s41587-021-01034-y] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Received: 02/12/2021] [Accepted: 07/27/2021] [Indexed: 02/08/2023]
Abstract
Single-cell Hi-C (scHi-C) can identify cell-to-cell variability of three-dimensional (3D) chromatin organization, but the sparseness of measured interactions poses an analysis challenge. Here we report Higashi, an algorithm based on hypergraph representation learning that can incorporate the latent correlations among single cells to enhance overall imputation of contact maps. Higashi outperforms existing methods for embedding and imputation of scHi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability. Moreover, Higashi can incorporate epigenomic signals jointly profiled in the same cell into the hypergraph representation learning framework, as compared to separate analysis of two modalities, leading to improved embeddings for single-nucleus methyl-3C data. In an scHi-C dataset from human prefrontal cortex, Higashi identifies connections between 3D genome features and cell-type-specific gene regulation. Higashi can also potentially be extended to analyze single-cell multiway chromatin interactions and other multimodal single-cell omics data.
Affiliation(s)
- Ruochi Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tianming Zhou
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
21
Yang T, He X, An L, Li Q. Methods to Assess the Reproducibility and Similarity of Hi-C Data. Methods Mol Biol 2022; 2301:17-37. [PMID: 34415529 DOI: 10.1007/978-1-0716-1390-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Hi-C experiments are costly to perform and involve multiple complex experimental steps. Reproducibility of Hi-C data is essential for ensuring the validity of the scientific conclusions drawn from the data. In this chapter, we describe several recently developed computational methods for assessing reproducibility of Hi-C replicate experiments. These methods can also be used to assess the similarity between any two Hi-C samples.
Affiliation(s)
- Tao Yang
  - Bioinformatics and Genomics Program, Pennsylvania State University, University Park, PA, USA
- Xi He
  - Bioinformatics and Genomics Program, Pennsylvania State University, University Park, PA, USA
- Lin An
  - Bioinformatics and Genomics Program, Pennsylvania State University, University Park, PA, USA
- Qunhua Li
  - Department of Statistics, Pennsylvania State University, University Park, PA, USA
22
Mapping nucleosome and chromatin architectures: A survey of computational methods. Comput Struct Biotechnol J 2022; 20:3955-3962. [PMID: 35950186 PMCID: PMC9340519 DOI: 10.1016/j.csbj.2022.07.037] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Revised: 07/22/2022] [Accepted: 07/22/2022] [Indexed: 11/21/2022] Open
Abstract
With ever-growing genomic sequencing data, the data variabilities and the underlying biases of the sequencing technologies pose significant computational challenges ranging from the need for accurately detecting the nucleosome positioning or chromatin interaction to the need for developing normalization methods to eliminate systematic biases. This review mainly surveys the computational methods for mapping the higher-resolution nucleosome and higher-order chromatin architectures. While a detailed discussion of the underlying algorithms is beyond the scope of our survey, we have discussed the methods and tools that can detect the nucleosomes in the genome, then demonstrated the computational methods for identifying 3D chromatin domains and interactions. We further illustrated computational approaches for integrating multi-omics data with Hi-C data and the advance of single-cell (sc)Hi-C data analysis. Our survey provides a comprehensive and valuable resource for biomedical scientists interested in studying nucleosome organization and chromatin structures as well as for computational scientists who are interested in improving upon them.
23
Gharavi E, Gu A, Zheng G, Smith JP, Cho HJ, Zhang A, Brown DE, Sheffield NC. Embeddings of genomic region sets capture rich biological associations in lower dimensions. Bioinformatics 2021; 37:4299-4306. [PMID: 34156475 PMCID: PMC8652032 DOI: 10.1093/bioinformatics/btab439] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/18/2020] [Revised: 06/07/2021] [Accepted: 06/15/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Genomic region sets summarize functional genomics data and define locations of interest in the genome such as regulatory regions or transcription factor binding sites. The number of publicly available region sets has increased dramatically, leading to challenges in data analysis. RESULTS We propose a new method to represent genomic region sets as vectors, or embeddings, using an adapted word2vec approach. We compared our approach to two simpler methods based on interval unions or term frequency-inverse document frequency and evaluated the methods in three ways: First, by classifying the cell line, antibody or tissue type of the region set; second, by assessing whether similarity among embeddings can reflect simulated random perturbations of genomic regions; and third, by testing robustness of the proposed representations to different signal thresholds for calling peaks. Our word2vec-based region set embeddings reduce dimensionality from more than a hundred thousand to 100 without significant loss in classification performance. The vector representation could identify cell line, antibody and tissue type with over 90% accuracy. We also found that the vectors could quantitatively summarize simulated random perturbations to region sets and are more robust to subsampling the data derived from different peak calling thresholds. Our evaluations demonstrate that the vectors retain useful biological information in relatively lower-dimensional spaces. We propose that vector representation of region sets is a promising approach for efficient analysis of genomic region data. AVAILABILITY AND IMPLEMENTATION https://github.com/databio/regionset-embedding. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erfaneh Gharavi
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
  - School of Data Science, University of Virginia, Charlottesville, VA 22903, USA
- Aaron Gu
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
- Guangtao Zheng
  - Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
- Jason P Smith
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Hyun Jae Cho
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
- Aidong Zhang
  - Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
- Donald E Brown
  - School of Data Science, University of Virginia, Charlottesville, VA 22903, USA
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
  - School of Data Science, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22903, USA
24
Huang J, Sheng J, Wang D. Manifold learning analysis suggests strategies to align single-cell multimodal data of neuronal electrophysiology and transcriptomics. Commun Biol 2021; 4:1308. [PMID: 34799674 PMCID: PMC8604989 DOI: 10.1038/s42003-021-02807-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022] Open
Abstract
Recent single-cell multimodal data reveal multi-scale characteristics of single cells, such as transcriptomics, morphology, and electrophysiology. However, integrating and analyzing such multimodal data to better understand functional genomics and gene regulation across cellular characteristics remains challenging. To address this, we applied and benchmarked multiple machine learning methods to align gene expression and electrophysiological data of single neuronal cells in the mouse brain from the Brain Initiative. We found that nonlinear manifold learning outperforms other methods. After manifold alignment, the cells form clusters that correspond closely to transcriptomic and morphological cell types, suggesting a strong nonlinear relationship between gene expression and electrophysiology at the cell-type level. Also, the electrophysiological features are highly predictable from gene expression in the latent space obtained by manifold alignment. The aligned cells further show continuous changes of electrophysiological features, implying cross-cluster gene expression transitions. Functional enrichment and gene regulatory network analyses for those cell clusters revealed potential genome functions and molecular mechanisms linking gene expression to neuronal electrophysiology.
Affiliation(s)
- Jiawei Huang
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, OH, 45223, USA
- Jie Sheng
  - Waisman Center, University of Wisconsin - Madison, Madison, WI, 53705, USA
- Daifeng Wang
  - Waisman Center, University of Wisconsin - Madison, Madison, WI, 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Department of Computer Sciences, University of Wisconsin - Madison, Madison, WI, 53706, USA
25
Galitsyna AA, Gelfand MS. Single-cell Hi-C data analysis: safety in numbers. Brief Bioinform 2021; 22:bbab316. [PMID: 34406348 PMCID: PMC8575028 DOI: 10.1093/bib/bbab316] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 05/31/2021] [Revised: 07/09/2021] [Accepted: 07/21/2021] [Indexed: 02/06/2023] Open
Abstract
Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
Affiliation(s)
- Aleksandra A Galitsyna
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems, RAS, Moscow, Russia
  - Institute of Gene Biology, RAS, Moscow, Russia
- Mikhail S Gelfand
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems, RAS, Moscow, Russia
26
Ishibashi R, Taguchi YH. Identification of Enhancers and Promoters in the Genome by Multidimensional Scaling. Genes (Basel) 2021; 12:1671. [PMID: 34828279 PMCID: PMC8622094 DOI: 10.3390/genes12111671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 10/12/2021] [Accepted: 10/18/2021] [Indexed: 11/21/2022] Open
Abstract
The positions of enhancers and promoters on genomic DNA remain poorly understood. Chromosomes cannot be observed during the cell division cycle because the genome forms a chromatin structure and spreads within the nucleus. However, high-throughput chromosome conformation capture (Hi-C) measures the physical interactions of genomes. In previous studies, DNA extrusion loops were directly derived from Hi-C heat maps. Multidimensional Scaling (MDS) is used in this assessment to more precisely locate enhancers and promoters. MDS is a multivariate analysis method that reproduces the original coordinates from the distance matrix between elements. We used Hi-C data of cultured osteosarcoma cells and applied MDS as the distance matrix of the genome. In addition, we selected columns 2 and 3 of the orthogonal matrix U as the desired structure. Overall, the DNA loops from the reconstructed genome structure contained bioprocesses involved in transcription, such as the pre-transcriptional initiation complex and RNA polymerase II initiation complex, and transcription factors involved in cancer, such as Foxm1 and CREB3. Therefore, our results are consistent with the biological findings. Our method is suitable for identifying enhancers and promoters in the genome.
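The abstract describes MDS as reproducing coordinates from a matrix of pairwise distances. A minimal sketch of classical (Torgerson) MDS is shown here under stated assumptions: it double-centres the squared distances and extracts the leading eigenvectors by power iteration with deflation, using only the standard library. It is a generic illustration, not the authors' pipeline (which selects particular columns of an orthogonal matrix U); the function name and the three-point example are invented.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed n points in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double-centring turns squared distances into a Gram (inner-product) matrix.
    gram = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
        if eigval <= 0.0:
            break  # no further positive-variance axis to recover
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate: remove the recovered component before finding the next one.
        gram = [
            [gram[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)
        ]
    return coords


# Hypothetical distances for a 3-4-5 right triangle.
distances = [
    [0.0, 3.0, 5.0],
    [3.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]
points = classical_mds(distances, dims=2)
recovered = [[math.dist(p, q) for q in points] for p in points]
```

The recovered coordinates are determined only up to rotation, reflection and translation, so the check that matters is that the pairwise distances between embedded points match the input matrix.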
Affiliation(s)
- Ryo Ishibashi
- Graduate School of Science and Engineering, Chuo University, Tokyo 112-8551, Japan
| | - Y-h. Taguchi
- Department of Physics, Chuo University, Tokyo 112-8551, Japan
| |
|
27
|
Bonora G, Ramani V, Singh R, Fang H, Jackson DL, Srivatsan S, Qiu R, Lee C, Trapnell C, Shendure J, Duan Z, Deng X, Noble WS, Disteche CM. Single-cell landscape of nuclear configuration and gene expression during stem cell differentiation and X inactivation. Genome Biol 2021; 22:279. [PMID: 34579774 PMCID: PMC8474932 DOI: 10.1186/s13059-021-02432-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/03/2020] [Accepted: 07/07/2021] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. RESULTS Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a "bookmark" mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. CONCLUSIONS Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. 
Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.
Affiliation(s)
- Giancarlo Bonora
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Vijay Ramani
- Department of Biochemistry & Biophysics, University of California San Francisco, San Francisco, CA, USA
| | - Ritambhara Singh
- Department of Computer Science, Brown University, Providence, RI, USA
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
| | - He Fang
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Dana L Jackson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sanjay Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Ruolan Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Zhijun Duan
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, USA
- Division of Hematology, Department of Medicine, University of Washington, Seattle, USA
| | - Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA.
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
| | - Christine M Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Department of Medicine, University of Washington, Seattle, WA, USA.
| |
|
28
|
Wu H, Wu Y, Jiang Y, Zhou B, Zhou H, Chen Z, Xiong Y, Liu Q, Zhang H. scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding. Brief Bioinform 2021; 23:6374065. [PMID: 34553746 DOI: 10.1093/bib/bbab396] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/17/2021] [Revised: 08/25/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022] Open
Abstract
Single-cell Hi-C data are a common data source for studying the differences in the three-dimensional structure of cell chromosomes. The development of single-cell Hi-C technology makes it possible to obtain batches of single-cell Hi-C data. How to quickly and effectively discriminate cell types has become a hot research field. However, the existing computational methods to predict cell types based on Hi-C data are found to be low in accuracy. Therefore, we propose a high-accuracy cell classification algorithm, called scHiCStackL, based on single-cell Hi-C data. In our work, we first improve the existing data preprocessing method for single-cell Hi-C data, which allows the generated cell embedding to better represent cells. Then, we construct a two-layer stacking ensemble model for classifying cells. Experimental results show that the cell embedding generated by our data preprocessing method increases by 0.23, 1.22, 1.46 and 1.61% compared with the cell embedding generated by the previously published method scHiCluster, in terms of the Acc, MCC, F1 and Precision confidence intervals, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When using the two-layer stacking ensemble framework with the cell embedding, scHiCStackL improves by 13.33, 19, 19.27 and 14.5 over scHiCluster, in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types using single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.
Affiliation(s)
- Hao Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- School of Software, Shandong University, Jinan, 250101, Shandong, China
| | - Yingfu Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Yuhong Jiang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Bing Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Haoru Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Zhongli Chen
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Hongming Zhang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
| |
|
29
|
Abstract
The spatial organization of the genome in the cell nucleus is pivotal to cell function. However, how the 3D genome organization and its dynamics influence cellular phenotypes remains poorly understood. The very recent development of single-cell technologies for probing the 3D genome, especially single-cell Hi-C (scHi-C), has ushered in a new era of unveiling cell-to-cell variability of 3D genome features at an unprecedented resolution. Here, we review recent developments in computational approaches to the analysis of scHi-C, including data processing, dimensionality reduction, imputation for enhancing data quality, and the revealing of 3D genome features at single-cell resolution. While much progress has been made in computational method development to analyze single-cell 3D genomes, substantial future work is needed to improve data interpretation and multimodal data integration, which are critical to reveal fundamental connections between genome structure and function among heterogeneous cell populations in various biological contexts.
Affiliation(s)
- Tianming Zhou
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Ruochi Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| |
|
30
|
Li X, Feng F, Pu H, Leung WY, Liu J. scHiCTools: A computational toolbox for analyzing single-cell Hi-C data. PLoS Comput Biol 2021; 17:e1008978. [PMID: 34003823 PMCID: PMC8162587 DOI: 10.1371/journal.pcbi.1008978] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/17/2020] [Revised: 05/28/2021] [Accepted: 04/18/2021] [Indexed: 11/18/2022] Open
Abstract
Single-cell Hi-C (scHi-C) sequencing technologies allow us to investigate three-dimensional chromatin organization at the single-cell level. However, we still need computational tools to deal with the sparsity of the contact maps from single cells and embed single cells in a lower-dimensional Euclidean space. This embedding helps us understand relationships between the cells in different dimensions, such as cell-cycle dynamics and cell differentiation. We present an open-source computational toolbox, scHiCTools, for analyzing single-cell Hi-C data comprehensively and efficiently. The toolbox provides two methods for screening single cells, three common methods for smoothing scHi-C data, three efficient methods for calculating the pairwise similarity of cells, three methods for embedding single cells, three methods for clustering cells, and a built-in function to visualize the cell embedding in a two-dimensional or three-dimensional plot. scHiCTools, written in Python3, is compatible with different platforms, including Linux, macOS, and Windows.
Affiliation(s)
- Xinjun Li
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Fan Feng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Hongxi Pu
- College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Wai Yan Leung
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Jie Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
31
|
Lin D, Sanders J, Noble WS. HiCRep.py: fast comparison of Hi-C contact matrices in Python. Bioinformatics 2021; 37:2996-2997. [PMID: 33576390 PMCID: PMC8479650 DOI: 10.1093/bioinformatics/btab097] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/27/2020] [Revised: 12/17/2020] [Accepted: 02/08/2021] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION Hi-C is the most widely used assay for investigating genome-wide 3D organization of chromatin. When working with Hi-C data, it is often useful to calculate the similarity between contact matrices in order to assess experimental reproducibility or to quantify relationships among Hi-C data from related samples. The HiCRep algorithm has been widely adopted for this task, but the existing R implementation suffers from run time limitations on high-resolution Hi-C data or on large single-cell Hi-C datasets. RESULTS We introduce a Python implementation of HiCRep and demonstrate that it is much faster and consumes much less memory than the existing R implementation. Furthermore, we give examples of HiCRep's ability to accurately distinguish replicates from non-replicates and to reveal cell type structure among collections of Hi-C data. AVAILABILITY AND IMPLEMENTATION HiCRep.py and its documentation are available with a GPL license at https://github.com/Noble-Lab/hicrep. The software may be installed automatically using the pip package installer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dejun Lin
- Department of Genome Sciences, University of Washington, Seattle, WA 98040, USA
| | - Justin Sanders
- Department of Computer Science, Brown University, Providence, RI 02912, USA
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98040, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98040, USA
- To whom correspondence should be addressed.
| |
|
32
|
Bulathsinghalage C, Liu L. Network-based method for regions with statistically frequent interchromosomal interactions at single-cell resolution. BMC Bioinformatics 2020; 21:369. [PMID: 32998686 PMCID: PMC7526258 DOI: 10.1186/s12859-020-03689-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Chromosome conformation capture-based methods, especially Hi-C, enable scientists to detect genome-wide chromatin interactions and study the spatial organization of chromatin, which plays important roles in gene expression regulation, DNA replication and repair etc. Thus, developing computational methods to unravel patterns behind the data becomes critical. Existing computational methods focus on intrachromosomal interactions and ignore interchromosomal interactions partly because there is no prior knowledge for interchromosomal interactions and the frequency of interchromosomal interactions is much lower while the search space is much larger. With the development of single-cell technologies, the advent of single-cell Hi-C makes interrogating the spatial structure of chromatin at single-cell resolution possible. It also brings a new type of frequency information, the number of single cells with chromatin interactions between two disjoint chromosome regions. RESULTS Considering the lack of computational methods on interchromosomal interactions and the unsurprisingly frequent intrachromosomal interactions along the diagonal of a chromatin contact map, we propose a computational method dedicated to analyzing interchromosomal interactions of single-cell Hi-C with this new frequency information. To the best of our knowledge, our proposed tool is the first to identify regions with statistically frequent interchromosomal interactions at single-cell resolution. We demonstrate that the tool utilizing networks and binomial statistical tests can identify interesting structural regions through visualization, comparison and enrichment analysis and it also supports different configurations to provide users with flexibility. CONCLUSIONS It will be a useful tool for analyzing single-cell Hi-C interchromosomal interactions.
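The frequency information described above (the number of single cells showing a contact between two regions on different chromosomes) lends itself to a binomial test. The sketch below is an illustration under stated assumptions, not the published tool: the function names, the single shared background probability `background_p`, and the Bonferroni correction are all hypothetical choices. Significant pairs can be read as edges of a network whose nodes are genomic regions.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def frequent_pairs(cell_counts, n_cells, background_p, alpha=0.05):
    """Return region pairs seen in unexpectedly many cells.

    cell_counts maps (region_a, region_b) to the number of cells in which
    that interchromosomal contact was observed (hypothetical input layout).
    """
    n_tests = len(cell_counts)
    hits = {}
    for pair, k in cell_counts.items():
        pval = binomial_upper_tail(k, n_cells, background_p)
        # Bonferroni correction across all tested pairs (an assumption).
        if pval * n_tests <= alpha:
            hits[pair] = pval
    return hits

# Toy usage: 100 cells, an assumed background contact probability of 2%.
toy_counts = {("chr1:0", "chr2:5"): 30, ("chr1:1", "chr3:2"): 2}
toy_hits = frequent_pairs(toy_counts, n_cells=100, background_p=0.02)
```

In practice the background probability would probably need to vary with region size and coverage; a single constant is used here only to keep the example short.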
Affiliation(s)
| | - Lu Liu
- North Dakota State University, 1340 Administration Ave, Fargo, 58102, USA.
| |
|
33
|
Kim HJ, Yardımcı GG, Bonora G, Ramani V, Liu J, Qiu R, Lee C, Hesson J, Ware CB, Shendure J, Duan Z, Noble WS. Capturing cell type-specific chromatin compartment patterns by applying topic modeling to single-cell Hi-C data. PLoS Comput Biol 2020; 16:e1008173. [PMID: 32946435 PMCID: PMC7526900 DOI: 10.1371/journal.pcbi.1008173] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 11/20/2019] [Revised: 09/30/2020] [Accepted: 07/21/2020] [Indexed: 01/01/2023] Open
Abstract
Single-cell Hi-C (scHi-C) interrogates genome-wide chromatin interaction in individual cells, allowing us to gain insights into 3D genome organization. However, the extremely sparse nature of scHi-C data poses a significant barrier to analysis, limiting our ability to tease out hidden biological information. In this work, we approach this problem by applying topic modeling to scHi-C data. Topic modeling is well-suited for discovering latent topics in a collection of discrete data. For our analysis, we generate nine different single-cell combinatorial indexed Hi-C (sci-Hi-C) libraries from five human cell lines (GM12878, H1Esc, HFF, IMR90, and HAP1), consisting of over 19,000 cells. We demonstrate that topic modeling is able to successfully capture cell type differences from sci-Hi-C data in the form of “chromatin topics.” We further show enrichment of particular compartment structures associated with locus pairs in these topics.
The genomes of higher organisms are intricately folded and organized in a dynamic manner that has strong implications for many biological processes. Each chromosome undergoes dramatic changes to its three-dimensional conformation during the cell cycle, whereas the positioning of chromosomes within the nucleus plays an important role in controlling the activation of specific genes. Recently, it has become possible to investigate the 3D conformations of the genomes of individual cells using a high-throughput sequencing assay called single-cell Hi-C (scHi-C). However, data from these assays are sparse and noisy, making analysis and interpretation of scHi-C data challenging. In this work, we generated a scHi-C dataset of over 19,000 cells from five human cell lines and applied a natural language processing method called topic modeling to discover cell type-specific “chromatin” topics. We show that these topics can be used to distinguish between cells at different stages of the cell cycle and cells from different tissues based on the 3D conformation of their genomes, despite the sparsity of the data. We further show that the 3D conformations of single cells are linked to the expression of cell type-specific genes and to cell cycle-associated conformational patterns.
Affiliation(s)
- Hyeon-Jin Kim
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Galip Gürkan Yardımcı
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Giancarlo Bonora
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Vijay Ramani
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
| | - Jie Liu
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Ruolan Qiu
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Jennifer Hesson
- Department of Comparative Medicine, University of Washington, Seattle, Washington, United States of America
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, United States of America
| | - Carol B. Ware
- Department of Comparative Medicine, University of Washington, Seattle, Washington, United States of America
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, United States of America
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Zhijun Duan
- Division of Hematology, Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, United States of America
- * E-mail: (Z. Duan); (W.S. Noble)
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington, United States of America
- * E-mail: (Z. Duan); (W.S. Noble)
| |
|
34
|
Ma A, McDermaid A, Xu J, Chang Y, Ma Q. Integrative Methods and Practical Challenges for Single-Cell Multi-omics. Trends Biotechnol 2020; 38:1007-1022. [PMID: 32818441 PMCID: PMC7442857 DOI: 10.1016/j.tibtech.2020.02.013] [Citation(s) in RCA: 146] [Impact Index Per Article: 29.2] [Received: 12/19/2019] [Revised: 02/27/2020] [Accepted: 02/28/2020] [Indexed: 12/19/2022]
Abstract
Fast-developing single-cell multimodal omics (scMulti-omics) technologies enable the measurement of multiple modalities, such as DNA methylation, chromatin accessibility, RNA expression, protein abundance, gene perturbation, and spatial information, from the same cell. scMulti-omics can comprehensively explore and identify cell characteristics, while also presenting challenges to the development of computational methods and tools for integrative analyses. Here, we review these integrative methods and summarize the existing tools for studying a variety of scMulti-omics data. The various functionalities and practical challenges in using the available tools in the public domain are explored through several case studies. Finally, we identify remaining challenges and future trends in scMulti-omics modeling and analyses.
Affiliation(s)
- Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
| | - Adam McDermaid
- Imagenetics, Sanford Health, Sioux Falls, SD 57104, USA; Department of Internal Medicine, University of South Dakota, Vermillion, SD 57069, USA
| | - Jennifer Xu
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Yuzhou Chang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA.
| |
|
35
|
Ramani V, Deng X, Qiu R, Lee C, Disteche CM, Noble WS, Shendure J, Duan Z. Sci-Hi-C: A single-cell Hi-C method for mapping 3D genome organization in large number of single cells. Methods 2020; 170:61-68. [PMID: 31536770 PMCID: PMC6949367 DOI: 10.1016/j.ymeth.2019.09.012] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 03/15/2019] [Accepted: 09/13/2019] [Indexed: 12/31/2022] Open
Abstract
The highly dynamic nature of chromosome conformation and three-dimensional (3D) genome organization leads to cell-to-cell variability in chromatin interactions within a cell population, even if the cells of the population appear to be functionally homogeneous. Hence, although Hi-C is a powerful tool for mapping 3D genome organization, this heterogeneity of chromosome higher order structure among individual cells limits the interpretive power of population based bulk Hi-C assays. Moreover, single-cell studies have the potential to enable the identification and characterization of rare cell populations or cell subtypes in a heterogeneous population. However, it may require surveying relatively large numbers of single cells to achieve statistically meaningful observations in single-cell studies. By applying combinatorial cellular indexing to chromosome conformation capture, we developed single-cell combinatorial indexed Hi-C (sci-Hi-C), a high throughput method that enables mapping chromatin interactomes in large numbers of single cells. We demonstrated the use of sci-Hi-C data to separate cells by karyotypic and cell-cycle state differences and to identify cellular variability in mammalian chromosomal conformation. Here, we provide a detailed description of method design and step-by-step working protocols for sci-Hi-C.
Affiliation(s)
- Vijay Ramani
- Department of Genome Sciences, University of Washington, Seattle, WA, United States.
| | - Xinxian Deng
- Department of Pathology, University of Washington, Seattle, WA, United States
| | - Ruolan Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
| | - Christine M Disteche
- Department of Pathology, University of Washington, Seattle, WA, United States; Department of Medicine, University of Washington, Seattle, WA, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, United States; Howard Hughes Medical Institute, Seattle, WA, United States.
| | - Zhijun Duan
- Division of Hematology, University of Washington School of Medicine, Seattle, WA, United States; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, United States.
| |
|
36
|
Ben-Elazar S, Chor B, Yakhini Z. The Functional 3D Organization of Unicellular Genomes. Sci Rep 2019; 9:12734. [PMID: 31484964 PMCID: PMC6726614 DOI: 10.1038/s41598-019-48798-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2019] [Accepted: 08/12/2019] [Indexed: 11/09/2022] Open
Abstract
Genome conformation capture techniques permit a systematic investigation into the functional spatial organization of genomes, including functional aspects like assessing the co-localization of sets of genomic elements, for example, the co-localization of genes targeted by a transcription factor (TF) within a transcription factory. We quantify spatial co-localization using a rigorous statistical model that measures the enrichment of a subset of elements in neighbourhoods inferred from Hi-C data. We also control for co-localization that can be attributed to genomic order. We systematically apply our open-sourced framework, spatial-mHG, to search for spatial co-localization phenomena in multiple unicellular Hi-C datasets with corresponding genomic annotations. Our biological findings shed new light on the functional spatial organization of genomes, including: In C. crescentus, DNA replication genes reside in two genomic clusters that are spatially co-localized. Furthermore, these clusters contain similar gene copies and lie in genomic vicinity to the ori and ter sequences. In S. cerevisiae, Ty5 retrotransposon family elements spatially co-localize at a spatially adjacent subset of telomeres. In N. crassa, both Proteasome lid subcomplex genes and protein refolding genes jointly spatially co-localize at a shared location. An implementation of our algorithms is available online.
|
37
|
Lee DS, Luo C, Zhou J, Chandran S, Rivkin A, Bartlett A, Nery JR, Fitzpatrick C, O'Connor C, Dixon JR, Ecker JR. Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nat Methods 2019; 16:999-1006. [PMID: 31501549 PMCID: PMC6765423 DOI: 10.1038/s41592-019-0547-z] [Citation(s) in RCA: 210] [Impact Index Per Article: 35.0] [Received: 12/28/2018] [Accepted: 08/02/2019] [Indexed: 12/21/2022]
Abstract
Dynamic 3D chromatin conformation is a critical mechanism for gene regulation during development and disease. Despite this, profiling of 3D genome structure from complex tissues with cell-type specific resolution remains challenging. Recent efforts have demonstrated that cell-type specific epigenomic features can be resolved in complex tissues using single-cell assays. However, it remains unclear whether single-cell Chromatin Conformation Capture (3C) or Hi-C profiles can effectively identify cell types and reconstruct cell-type specific chromatin conformation maps. To address these challenges, we have developed single-nucleus methyl-3C sequencing (sn-m3C-seq) to capture chromatin organization and DNA methylation information and robustly separate heterogeneous cell types. Applying this method to >4,200 single human brain prefrontal cortex cells, we reconstruct cell-type specific chromatin conformation maps from 14 cortical cell types. These datasets reveal the genome-wide association between cell-type specific chromatin conformation and differential DNA methylation, suggesting pervasive interactions between epigenetic processes regulating gene expression.
Affiliation(s)
- Dong-Sung Lee
- Peptide Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Chongyuan Luo
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Jingtian Zhou
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Sahaana Chandran
- Peptide Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Angeline Rivkin
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Anna Bartlett
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Joseph R Nery
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Conor Fitzpatrick
- Flow Cytometry Core Facility, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Carolyn O'Connor
- Flow Cytometry Core Facility, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Jesse R Dixon
- Peptide Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
| | - Joseph R Ecker
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA. .,Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA, USA.
| |
|
38
|
Zhou J, Ma J, Chen Y, Cheng C, Bao B, Peng J, Sejnowski TJ, Dixon JR, Ecker JR. Robust single-cell Hi-C clustering by convolution- and random-walk-based imputation. Proc Natl Acad Sci U S A 2019; 116:14011-14018. [PMID: 31235599 PMCID: PMC6628819 DOI: 10.1073/pnas.1901423116] [Citation(s) in RCA: 106] [Impact Index Per Article: 17.7] [Indexed: 11/18/2022] Open
Abstract
Three-dimensional genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe scHiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real single-cell Hi-C data as benchmarks, scHiCluster significantly improves clustering accuracy when applied to low coverage datasets compared with existing methods. After imputation by scHiCluster, topologically associating domain (TAD)-like structures (TLSs) can be identified within single cells, and their consensus boundaries were enriched at the TAD boundaries observed in bulk cell Hi-C samples. In summary, scHiCluster facilitates visualization and comparison of single-cell 3D genomes.
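The two imputation steps named in this abstract, linear convolution followed by a random walk, can be illustrated with a minimal NumPy sketch. This is not the authors' scHiCluster implementation: the function names, the averaging window (`pad`), the restart probability (`restart`), and the convergence settings are all illustrative assumptions, and the later steps of a full clustering pipeline are omitted.

```python
import numpy as np


def smooth_contacts(mat: np.ndarray, pad: int = 1) -> np.ndarray:
    """Average each bin pair with its (2*pad+1) x (2*pad+1) neighbourhood.

    Assumption: a uniform averaging filter with zero padding at the edges.
    """
    n = mat.shape[0]
    padded = np.pad(mat.astype(float), pad, mode="constant")
    window = 2 * pad + 1
    out = np.zeros((n, n))
    # Sum shifted copies of the padded matrix, which is a box convolution.
    for di in range(window):
        for dj in range(window):
            out += padded[di:di + n, dj:dj + n]
    return out / window**2


def random_walk_with_restart(
    mat: np.ndarray, restart: float = 0.5, tol: float = 1e-6, max_iter: int = 100
) -> np.ndarray:
    """Propagate contacts along the contact graph until the update converges."""
    n = mat.shape[0]
    row_sums = mat.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave empty rows as all zeros
    transition = mat / row_sums    # row-normalised transition matrix
    identity = np.eye(n)
    q = identity.copy()
    for _ in range(max_iter):
        q_next = (1 - restart) * q @ transition + restart * identity
        if np.linalg.norm(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def impute(mat: np.ndarray, pad: int = 1, restart: float = 0.5) -> np.ndarray:
    """Convolution smoothing followed by random walk with restart."""
    return random_walk_with_restart(smooth_contacts(mat, pad), restart)


# Toy sparse, symmetric contact matrix for one chromosome in one cell.
raw = np.zeros((6, 6))
for i, j in [(0, 1), (2, 3), (4, 5)]:
    raw[i, j] = raw[j, i] = 1.0

imputed = impute(raw)
```

On the toy matrix, the smoothing step spreads each observed contact to adjacent bins, and the random walk then shares signal between bins that have common neighbours, so the imputed matrix is far denser than the raw one. Each row of the result sums to 1 because the transition matrix is row-normalised.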
Affiliation(s)
- Jingtian Zhou
  - Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093
- Jianzhu Ma
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093
- Yusi Chen
  - Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
  - Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093
- Chuankai Cheng
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093
- Bokan Bao
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093
- Jian Peng
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Terrence J Sejnowski
  - Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
  - Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093
- Jesse R Dixon
  - Peptide Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
- Joseph R Ecker
  - Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
  - Howard Hughes Medical Institute, Salk Institute for Biological Studies, La Jolla, CA 92037
|