1. Xu J, Xu X, Huang D, Luo Y, Lin L, Bai X, Zheng Y, Yang Q, Cheng Y, Huang A, Shi J, Bo X, Gu J, Chen H. A comprehensive benchmarking with interpretation and operational guidance for the hierarchy of topologically associating domains. Nat Commun 2024; 15:4376. [PMID: 38782890; PMCID: PMC11116433; DOI: 10.1038/s41467-024-48593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 05/03/2024] [Indexed: 05/25/2024] [Open access]
Abstract
Topologically associating domains (TADs), megabase-scale features of chromatin spatial architecture, are organized in a domain-within-domain TAD hierarchy. Within TADs, the inner and smaller subTADs not only manifest cell-to-cell variability, but also precisely regulate transcription and differentiation. Although over 20 TAD callers are able to detect TADs, their usability in biomedicine is limited by disagreement among their outputs and by an incomplete understanding of TAD hierarchy. We compare 13 computational tools across various conditions and develop a metric to evaluate the similarity of TAD hierarchy. Although outputs of TAD hierarchy at each level vary among callers, data resolutions, sequencing depths, and matrix normalization methods, they are more consistent when they have a higher similarity of larger TADs. We present comprehensive benchmarking of TAD hierarchy callers and operational guidance for life science researchers. Moreover, by simulating the mixing of different types of cells, we confirm that TAD hierarchy is not generated simply by stacking Hi-C heatmaps of heterogeneous cells. Finally, we propose an air conditioner model to decipher the role of TAD hierarchy in transcription.
Affiliation(s)
- Jingxuan Xu
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Xiang Xu
  - Academy of Military Medical Science, Beijing, 100850, China
- Dandan Huang
  - Department of Oncology, Peking University Shougang Hospital, Beijing, China
  - Center for Precision Diagnosis and Treatment of Colorectal Cancer and Inflammatory Diseases, Peking University Health Science Center, Beijing, China
- Yawen Luo
  - Academy of Military Medical Science, Beijing, 100850, China
- Lin Lin
  - Academy of Military Medical Science, Beijing, 100850, China
  - School of Computer Science and Information Technology & KLAS, Northeast Normal University, Changchun, China
- Xuemei Bai
  - Academy of Military Medical Science, Beijing, 100850, China
- Yang Zheng
  - Academy of Military Medical Science, Beijing, 100850, China
- Qian Yang
  - Academy of Military Medical Science, Beijing, 100850, China
- Yu Cheng
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- An Huang
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Jingyi Shi
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Xiaochen Bo
  - Academy of Military Medical Science, Beijing, 100850, China
- Jin Gu
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
  - Department of Oncology, Peking University Shougang Hospital, Beijing, China
  - Center for Precision Diagnosis and Treatment of Colorectal Cancer and Inflammatory Diseases, Peking University Health Science Center, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
  - Peking University International Cancer Institute, Beijing, China
- Hebing Chen
  - Academy of Military Medical Science, Beijing, 100850, China
2. Zhou T, Zhang R, Jia D, Doty RT, Munday AD, Gao D, Xin L, Abkowitz JL, Duan Z, Ma J. GAGE-seq concurrently profiles multiscale 3D genome organization and gene expression in single cells. Nat Genet 2024:10.1038/s41588-024-01745-3. [PMID: 38744973; DOI: 10.1038/s41588-024-01745-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/05/2024] [Indexed: 05/16/2024]
Abstract
The organization of mammalian genomes features a complex, multiscale three-dimensional (3D) architecture, whose functional significance remains elusive because of limited single-cell technologies that can concurrently profile genome organization and transcriptional activities. Here, we introduce genome architecture and gene expression by sequencing (GAGE-seq), a scalable, robust single-cell co-assay measuring 3D genome structure and transcriptome simultaneously within the same cell. Applied to mouse brain cortex and human bone marrow CD34+ cells, GAGE-seq characterized the intricate relationships between 3D genome and gene expression, showing that multiscale 3D genome features inform cell-type-specific gene expression and link regulatory elements to target genes. Integration with spatial transcriptomic data revealed in situ 3D genome variations in mouse cortex. Observations in human hematopoiesis unveiled discordant changes between 3D genome organization and gene expression, underscoring a complex, temporal interplay at the single-cell level. GAGE-seq provides a powerful, cost-effective approach for exploring genome structure and gene expression relationships at the single-cell level across diverse biological contexts.
Affiliation(s)
- Tianming Zhou
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Ruochi Zhang
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
  - Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Deyong Jia
  - Department of Urology, University of Washington, Seattle, WA, USA
- Raymond T Doty
  - Division of Hematology and Oncology, Department of Medicine/Fred Hutch Cancer Center, University of Washington, Seattle, WA, USA
- Adam D Munday
  - Division of Hematology and Oncology, Department of Medicine/Fred Hutch Cancer Center, University of Washington, Seattle, WA, USA
- Daniel Gao
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
  - Department of Chemistry, Pomona College, Claremont, CA, USA
- Li Xin
  - Department of Urology, University of Washington, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Janis L Abkowitz
  - Division of Hematology and Oncology, Department of Medicine/Fred Hutch Cancer Center, University of Washington, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Zhijun Duan
  - Division of Hematology and Oncology, Department of Medicine/Fred Hutch Cancer Center, University of Washington, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
3. Shiu SH, Lehti-Shiu MD. Assessing the evolution of research topics in a biological field using plant science as an example. PLoS Biol 2024; 22:e3002612. [PMID: 38781246; PMCID: PMC11115244; DOI: 10.1371/journal.pbio.3002612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 04/04/2024] [Indexed: 05/25/2024] [Open access]
Abstract
Scientific advances due to conceptual or technological innovations can be revealed by examining how research topics have evolved. But such topical evolution is difficult to uncover and quantify because of the large body of literature and the need for expert knowledge in a wide range of areas in a field. Using plant biology as an example, we used machine learning and language models to classify plant science citations into topics representing interconnected, evolving subfields. The changes in prevalence of topical records over the last 50 years reflect shifts in major research trends and recent radiation of new topics, as well as turnover of model species and vastly different plant science research trajectories among countries. Our approaches readily summarize the topical diversity and evolution of a scientific field with hundreds of thousands of relevant papers, and they can be applied broadly to other fields.
Affiliation(s)
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, Michigan, United States of America
  - DOE-Great Lake Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America
- Melissa D. Lehti-Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
4. Xiong K, Zhang R, Ma J. scGHOST: identifying single-cell 3D genome subcompartments. Nat Methods 2024; 21:814-822. [PMID: 38589516; PMCID: PMC11127718; DOI: 10.1038/s41592-024-02230-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 03/01/2024] [Indexed: 04/10/2024]
Abstract
Single-cell Hi-C (scHi-C) technologies allow for probing of genome-wide cell-to-cell variability in three-dimensional (3D) genome organization from individual cells. Computational methods have been developed to reveal single-cell 3D genome features based on scHi-C, including A/B compartments, topologically associating domains and chromatin loops. However, no method exists for annotating single-cell subcompartments, which is important for understanding chromosome spatial localization in single cells. Here we present scGHOST, a single-cell subcompartment annotation method using graph embedding with constrained random walk sampling. Applications of scGHOST to scHi-C data and contact maps derived from single-cell 3D genome imaging demonstrate reliable identification of single-cell subcompartments, offering insights into cell-to-cell variability of nuclear subcompartments. Using scHi-C data from complex tissues, scGHOST identifies cell-type-specific or allele-specific subcompartments linked to gene transcription across various cell types and developmental stages, suggesting functional implications of single-cell subcompartments. scGHOST is an effective method for annotating single-cell 3D genome subcompartments in a broad range of biological contexts.
Affiliation(s)
- Kyle Xiong
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Ruochi Zhang
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
  - Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
5. Shi Z, Wu H. CTPredictor: A comprehensive and robust framework for predicting cell types by integrating multi-scale features from single-cell Hi-C data. Comput Biol Med 2024; 173:108336. [PMID: 38513390; DOI: 10.1016/j.compbiomed.2024.108336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 03/01/2024] [Accepted: 03/17/2024] [Indexed: 03/23/2024]
Abstract
Single-cell Hi-C (scHi-C) has emerged as a powerful technology for deciphering cell-to-cell variability in three-dimensional (3D) chromatin organization, providing insights into genome-wide chromatin interactions and their correlation with cellular functions. Nevertheless, the accurate identification of cell types across different datasets remains a formidable challenge, hindering comprehensive investigations into genome structure. In response, we introduce CTPredictor, an innovative computational method that integrates multi-scale features to accurately predict cell types in various datasets. CTPredictor strategically incorporates three distinct feature sets, namely, small intra-domain contact probability (SICP), smoothed small intra-domain contact probability (SSICP), and smoothed bin contact probability (SBCP). The resulting fusion classification model significantly enhances the accuracy of cell type prediction based on single-cell Hi-C data (scHi-C). Rigorous benchmarking against established methods and three conventional machine learning approaches demonstrates the robust performance of CTPredictor, positioning it as an advanced tool for cell type prediction within scHi-C data. Beyond its prediction capabilities, CTPredictor holds promise in illuminating 3D genome structures and their functional significance across a wide array of biological processes.
Affiliation(s)
- Zhenqi Shi
  - School of Software, Shandong University, 250100, Jinan, China
- Hao Wu
  - School of Software, Shandong University, 250100, Jinan, China
6. Rezaie N, Rebboah E, Williams BA, Liang HY, Reese F, Balderrama-Gutierrez G, Dionne LA, Reinholdt L, Trout D, Wold BJ, Mortazavi A. Identification of robust cellular programs using reproducible LDA that impact sex-specific disease progression in different genotypes of a mouse model of AD. bioRxiv 2024:2024.02.26.582178. [PMID: 38464087; PMCID: PMC10925135; DOI: 10.1101/2024.02.26.582178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The gene expression profiles of distinct cell types reflect complex genomic interactions among multiple simultaneous biological processes within each cell that can be altered by disease progression as well as genetic background. The identification of these active cellular programs is an open challenge in the analysis of single-cell RNA-seq data. Latent Dirichlet Allocation (LDA) is a generative method used to identify recurring patterns in count data, commonly referred to as topics, that can be used to interpret the state of each cell. However, LDA's interpretability is hindered by several key factors, including the choice of the number of topics as a hyperparameter and the variability in topic definitions due to random initialization. We developed Topyfic, a Reproducible LDA (rLDA) package, to accurately infer the identity and activity of cellular programs in single-cell data, providing insights into the relative contributions of each program in individual cells. We apply Topyfic to brain single-cell and single-nucleus datasets of two 5xFAD mouse models of Alzheimer's disease crossed with C57BL/6J or CAST/EiJ mice to identify distinct cell types, as well as distinct states within cell types such as microglia. We find that 8-month 5xFAD/Cast F1 males show a higher level of microglial activation than matching 5xFAD/BL6 F1 males, whereas female mice show similar levels of microglial activation. We show that regulatory genes such as TFs, microRNA host genes, and chromatin regulatory genes alone capture cell types and cell states. Our study highlights how topic modeling with a limited vocabulary of regulatory genes can identify gene expression programs in single-cell data in order to quantify similar and divergent cell states in distinct genotypes.
Affiliation(s)
- Narges Rezaie
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
  - Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Elisabeth Rebboah
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
  - Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Brian A Williams
  - Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Heidi Yahan Liang
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
  - Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Fairlie Reese
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
  - Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Gabriela Balderrama-Gutierrez
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
  - Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Diane Trout
  - Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Barbara J Wold
  - Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
  - Center for Complex Biological Systems, University of California, Irvine, CA, USA
7. Zhang K, Zemke NR, Armand EJ, Ren B. A fast, scalable and versatile tool for analysis of single-cell omics data. Nat Methods 2024; 21:217-227. [PMID: 38191932; PMCID: PMC10864184; DOI: 10.1038/s41592-023-02139-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 11/23/2023] [Indexed: 01/10/2024]
Abstract
Single-cell omics technologies have revolutionized the study of gene regulation in complex tissues. A major computational challenge in analyzing these datasets is to project the large-scale and high-dimensional data into low-dimensional space while retaining the relative relationships between cells. This low-dimensional embedding is necessary to decompose cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Traditional dimensionality reduction techniques, however, face challenges in computational efficiency and in comprehensively addressing cellular diversity across varied molecular modalities. Here we introduce a nonlinear dimensionality reduction algorithm, embodied in the Python package SnapATAC2, which not only achieves a more precise capture of single-cell omics data heterogeneities but also ensures efficient runtime and memory usage, scaling linearly with the number of cells. Our algorithm demonstrates exceptional performance, scalability and versatility across diverse single-cell omics datasets, including single-cell assay for transposase-accessible chromatin using sequencing, single-cell RNA sequencing, single-cell Hi-C and single-cell multi-omics datasets, underscoring its utility in advancing single-cell analysis.
Affiliation(s)
- Kai Zhang
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, China
- Nathan R Zemke
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
- Ethan J Armand
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Bing Ren
  - Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA
  - Ludwig Institute for Cancer Research, La Jolla, CA, USA
  - Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
8. Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975; PMCID: PMC11127719; DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
  - Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
9. Zheng J, Yang Y, Dai Z. Subgraph extraction and graph representation learning for single cell Hi-C imputation and clustering. Brief Bioinform 2023; 25:bbad379. [PMID: 38040494; PMCID: PMC10691963; DOI: 10.1093/bib/bbad379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/10/2023] [Accepted: 10/03/2023] [Indexed: 12/03/2023] [Open access]
Abstract
Single-cell Hi-C (scHi-C) technology enables the investigation of 3D chromatin structure variability across individual cells. However, the analysis of scHi-C data is challenged by a large number of missing values. Here, we present an scHi-C data imputation model, HiC-SGL, based on subgraph extraction and graph representation learning. HiC-SGL can also learn informative low-dimensional embeddings of cells. We demonstrate that our method surpasses existing methods in imputation accuracy and clustering performance across various metrics.
Affiliation(s)
- Jiahao Zheng
  - School of Computer Science and Engineering, Sun Yat-Sen University, 510006 Guangzhou, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-Sen University, 510006 Guangzhou, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-Sen University, 510006 Guangzhou, China
10. Gunsalus LM, Keiser MJ, Pollard KS. ChromaFactor: deconvolution of single-molecule chromatin organization with non-negative matrix factorization. bioRxiv 2023:2023.11.22.568268. [PMID: 38045231; PMCID: PMC10690235; DOI: 10.1101/2023.11.22.568268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
The investigation of chromatin organization in single cells holds great promise for identifying causal relationships between genome structure and function. However, analysis of single-molecule data is hampered by extreme yet inherent heterogeneity, making it challenging to determine the contributions of individual chromatin fibers to bulk trends. To address this challenge, we propose ChromaFactor, a novel computational approach based on non-negative matrix factorization that deconvolves single-molecule chromatin organization datasets into their most salient primary components. ChromaFactor provides the ability to identify trends accounting for the maximum variance in the dataset while simultaneously describing the contribution of individual molecules to each component. Applying our approach to two single-molecule imaging datasets across different genomic scales, we find that these primary components demonstrate significant correlation with key functional phenotypes, including active transcription, enhancer-promoter distance, and genomic compartment. ChromaFactor offers a robust tool for understanding the complex interplay between chromatin structure and function on individual DNA molecules, pinpointing which subpopulations drive functional changes and fostering new insights into cellular heterogeneity and its implications for bulk genomic phenomena.
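As an illustration of the general technique named in this abstract, the following is a minimal sketch of non-negative matrix factorization using the standard Lee-Seung multiplicative updates. It is not ChromaFactor's implementation: the function names, the toy "templates", and the mixing weights are all hypothetical, and the toy data stand in for flattened per-molecule distance or contact maps. Each row of `w` gives the contribution of one molecule to each component, and each row of `h` is one component.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v - w @ h."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (samples x features) as w @ h.

    Uses Lee-Seung multiplicative updates, which keep every entry of
    w and h non-negative. eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: two structural "templates" (flattened maps) and
# five molecules that are non-negative mixtures of them.
templates = [[1.0, 0.0, 1.0, 0.0, 0.5, 0.0],
             [0.0, 1.0, 0.0, 1.0, 0.0, 0.5]]
weights = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.3, 0.7], [0.5, 0.5]]
v = matmul(weights, templates)

w, h = nmf(v, rank=2)
error = frobenius_error(v, w, h)
print("reconstruction error:", error)
```

The factorization is only unique up to scaling and ordering of the components, so the recovered rows of `h` need not match the toy templates entry for entry. In practice a library implementation would normally be preferred over this hand-written loop.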
11. Gong H, Ma F, Zhang X. [Advances in methods and applications of single-cell Hi-C data analysis]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2023; 40:1033-1039. [PMID: 37879935; PMCID: PMC10600426; DOI: 10.7507/1001-5515.202303046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 08/29/2023] [Indexed: 10/27/2023]
Abstract
The three-dimensional structure of chromatin plays a key role in cell function and gene regulation. Single-cell Hi-C techniques can capture genomic structure information at the level of individual cells, which provides an opportunity to study changes in genomic structure between different cell types. Recently, several computational methods have been developed for single-cell Hi-C data analysis. This paper first reviews the available methods for single-cell Hi-C data analysis, including preprocessing of single-cell Hi-C data, multi-scale structure recognition based on single-cell Hi-C data, bulk-like Hi-C contact matrix generation from single-cell Hi-C datasets, pseudo-time series analysis, and cell classification. It then describes applications of single-cell Hi-C data in cell differentiation and structural variation. Finally, future directions for single-cell Hi-C data analysis are discussed.
Affiliation(s)
- Haiyan Gong
  - Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, P. R. China
- Fuqiang Ma
  - Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
- Xiaotong Zhang
  - Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, P. R. China
12. Lee L, Yu M, Li X, Zhu C, Zhang Y, Yu H, Chen Z, Mishra S, Ren B, Li Y, Hu M. SnapHiC-D: a computational pipeline to identify differential chromatin contacts from single-cell Hi-C data. Brief Bioinform 2023; 24:bbad315. [PMID: 37649383; PMCID: PMC10516352; DOI: 10.1093/bib/bbad315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 08/04/2023] [Accepted: 08/07/2023] [Indexed: 09/01/2023] [Open access]
Abstract
Single-cell high-throughput chromatin conformation capture (scHi-C) technologies have been used to map chromatin spatial organization in complex tissues. However, computational tools to detect differential chromatin contacts (DCCs) from scHi-C datasets in development and through disease pathogenesis are still lacking. Here, we present SnapHiC-D, a computational pipeline to identify DCCs between two scHi-C datasets. Compared to methods designed for bulk Hi-C data, SnapHiC-D detects DCCs with high sensitivity and accuracy. We used SnapHiC-D to identify cell-type-specific chromatin contacts at 10 kb resolution in mouse hippocampal and human prefrontal cortical tissues, demonstrating that DCCs detected in the hippocampal and cortical cell types are generally associated with cell-type-specific gene expression patterns and epigenomic features. SnapHiC-D is freely available at https://github.com/HuMingLab/SnapHiC-D.
Affiliation(s)
- Lindsay Lee
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Miao Yu
  - Ludwig Institute for Cancer Research, La Jolla, CA, USA
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
- Xiaoqi Li
  - Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, USA
- Chenxu Zhu
  - Ludwig Institute for Cancer Research, La Jolla, CA, USA
  - New York Genome Center, New York, NY, USA
  - Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Yanxiao Zhang
  - Ludwig Institute for Cancer Research, La Jolla, CA, USA
  - Westlake University, Hangzhou, Zhejiang, China
- Hongyu Yu
  - Department of Statistics, University of Wisconsin Madison, Madison, WI, USA
  - Department of Biochemistry, University of Wisconsin Madison, Madison, WI, USA
- Ziyin Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
- Shreya Mishra
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Bing Ren
  - Ludwig Institute for Cancer Research, La Jolla, CA, USA
  - Center for Epigenomics & Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Yun Li
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- Ming Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
13
|
Zhang K, Zemke NR, Armand EJ, Ren B. SnapATAC2: a fast, scalable and versatile tool for analysis of single-cell omics data. bioRxiv 2023:2023.09.11.557221. [PMID: 37745443 PMCID: PMC10515871 DOI: 10.1101/2023.09.11.557221] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/26/2023]
Abstract
Single-cell omics technologies have ushered in a new era for the study of dynamic gene regulation in complex tissues during development and disease pathogenesis. A major computational challenge in analyzing these datasets is to project the large-scale and high-dimensional data into low-dimensional space while retaining the relative relationships between cells in order to decompose the cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Conventional dimensionality reduction methods suffer from computational inefficiency, difficulty capturing the full spectrum of cellular heterogeneity, or inability to be applied across diverse molecular modalities. Here, we report a fast and nonlinear dimensionality reduction algorithm that not only more accurately captures the heterogeneities of single-cell omics data, but also features runtime and memory usage that are computationally efficient and linearly proportional to cell numbers. We implement this algorithm in a Python package named SnapATAC2, and demonstrate its superior performance, remarkable scalability and general adaptability using an array of single-cell omics data types, including single-cell ATAC-seq, single-cell RNA-seq, single-cell Hi-C, and single-cell multiomics datasets.
|
14
|
Gao VR, Yang R, Das A, Luo R, Luo H, McNally DR, Karagiannidis I, Rivas MA, Wang ZM, Barisic D, Karbalayghareh A, Wong W, Zhan YA, Chin CR, Noble W, Bilmes JA, Apostolou E, Kharas MG, Béguelin W, Viny AD, Huangfu D, Rudensky AY, Melnick AM, Leslie CS. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. bioRxiv 2023:2023.07.27.550836. [PMID: 37546906 PMCID: PMC10402156 DOI: 10.1101/2023.07.27.550836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
Affiliation(s)
- Vianne R. Gao
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Rui Yang
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Arnav Das
  - University of Washington, Seattle, WA, USA
- Renhe Luo
  - Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Hanzhi Luo
  - Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dylan R. McNally
  - Caryl and Israel Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
- Ioannis Karagiannidis
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Martin A. Rivas
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Zhong-Min Wang
  - Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Darko Barisic
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Alireza Karbalayghareh
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wilfred Wong
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Yingqian A. Zhan
  - Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Christopher R. Chin
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Effie Apostolou
  - Sanford I Weill department of Medicine, Sandra and Edward Meyer Cancer center, Weill Cornell Medicine, New York, NY, USA
- Michael G. Kharas
  - Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wendy Béguelin
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Aaron D. Viny
  - Departments of Medicine, Division of Hematology & Oncology, and of Genetics & Development, Columbia Stem Cell Initiative, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Danwei Huangfu
  - Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Alexander Y. Rudensky
  - Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ari M. Melnick
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Christina S. Leslie
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
|
15
|
Zhou T, Zhang R, Jia D, Doty RT, Munday AD, Gao D, Xin L, Abkowitz JL, Duan Z, Ma J. Concurrent profiling of multiscale 3D genome organization and gene expression in single mammalian cells. bioRxiv 2023:2023.07.20.549578. [PMID: 37546900 PMCID: PMC10401946 DOI: 10.1101/2023.07.20.549578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The organization of mammalian genomes within the nucleus features a complex, multiscale three-dimensional (3D) architecture. The functional significance of these 3D genome features, however, remains largely elusive due to limited single-cell technologies that can concurrently profile genome organization and transcriptional activities. Here, we report GAGE-seq, a highly scalable, robust single-cell co-assay that simultaneously measures 3D genome structure and transcriptome within the same cell. Employing GAGE-seq on mouse brain cortex and human bone marrow CD34+ cells, we comprehensively characterized the intricate relationships between 3D genome and gene expression. We found that these multiscale 3D genome features collectively inform cell type-specific gene expressions, hence contributing to defining cell identity at the single-cell level. Integration of GAGE-seq data with spatial transcriptomic data revealed in situ variations of the 3D genome in mouse cortex. Moreover, our observations of lineage commitment in normal human hematopoiesis unveiled notable discordant changes between 3D genome organization and gene expression, underscoring a complex, temporal interplay at the single-cell level that is more nuanced than previously appreciated. Together, GAGE-seq provides a powerful, cost-effective approach for interrogating genome structure and gene expression relationships at the single-cell level across diverse biological contexts.
Affiliation(s)
- Tianming Zhou
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ruochi Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Present address: Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Deyong Jia
  - Department of Urology, University of Washington, Seattle, WA 98195, USA
- Raymond T. Doty
  - Division of Hematology, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Adam D. Munday
  - Division of Hematology, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Daniel Gao
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA 98109, USA
  - Present address: Department of Chemistry, Pomona College, Claremont, CA 91711, USA
- Li Xin
  - Department of Urology, University of Washington, Seattle, WA 98195, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA 98109, USA
- Janis L. Abkowitz
  - Division of Hematology, Department of Medicine, University of Washington, Seattle, WA 98195, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA 98109, USA
- Zhijun Duan
  - Division of Hematology, Department of Medicine, University of Washington, Seattle, WA 98195, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA 98109, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
16
|
Rappoport N, Chomsky E, Nagano T, Seibert C, Lubling Y, Baran Y, Lifshitz A, Leung W, Mukamel Z, Shamir R, Fraser P, Tanay A. Single cell Hi-C identifies plastic chromosome conformations underlying the gastrulation enhancer landscape. Nat Commun 2023; 14:3844. [PMID: 37386027 PMCID: PMC10310791 DOI: 10.1038/s41467-023-39549-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 06/19/2023] [Indexed: 07/01/2023] Open
Abstract
Embryonic development involves massive proliferation and differentiation of cell lineages. This must be supported by chromosome replication and epigenetic reprogramming, but how proliferation and cell fate acquisition are balanced in this process is not well understood. Here we use single cell Hi-C to map chromosomal conformations in post-gastrulation mouse embryo cells and study their distributions and correlations with matching embryonic transcriptional atlases. We find that embryonic chromosomes show a remarkably strong cell cycle signature. Despite that, replication timing, chromosome compartment structure, topologically associating domains (TADs) and promoter-enhancer contacts are shown to be variable between distinct epigenetic states. About 10% of the nuclei are identified as primitive erythrocytes, showing exceptionally compact and organized compartment structure. The remaining cells are broadly associated with ectoderm and mesoderm identities, showing only mild differentiation of TADs and compartment structures, but more specific localized contacts in hundreds of ectoderm and mesoderm promoter-enhancer pairs. The data suggest that while fully committed embryonic lineages can rapidly acquire specific chromosomal conformations, most embryonic cells show plastic signatures driven by complex and intermixed enhancer landscapes.
Affiliation(s)
- Nimrod Rappoport
  - Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Elad Chomsky
  - Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Takashi Nagano
  - Laboratory for Nuclear Dynamics, Institute for Protein Research, Osaka University, Osaka, Japan
  - Nuclear Dynamics Programme, The Babraham Institute, Cambridge, UK
- Charlie Seibert
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Yaniv Lubling
  - Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Yael Baran
  - Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Aviezer Lifshitz
  - Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Wing Leung
  - Laboratory for Nuclear Dynamics, Institute for Protein Research, Osaka University, Osaka, Japan
  - Nuclear Dynamics Programme, The Babraham Institute, Cambridge, UK
- Zohar Mukamel
  - Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Ron Shamir
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Peter Fraser
  - Nuclear Dynamics Programme, The Babraham Institute, Cambridge, UK
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Amos Tanay
  - Department of Computer Science and Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
|
17
|
Fan S, Dang D, Ye Y, Zhang SW, Gao L, Zhang S. scHi-CSim: a flexible simulator that generates high-fidelity single-cell Hi-C data for benchmarking. J Mol Cell Biol 2023; 15:mjad003. [PMID: 36708167 PMCID: PMC10308180 DOI: 10.1093/jmcb/mjad003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 09/18/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome the sparseness of data, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim can generate high-fidelity data by comparing the performance of single-cell clustering and detection of chromosomal high-order structures with raw data. Furthermore, scHi-CSim allows the sequencing depth and the number of simulated replicates to be changed flexibly. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
Affiliation(s)
- Shichen Fan
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Dachang Dang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yusen Ye
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shao-Wu Zhang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
|
18
|
Xiong K, Zhang R, Ma J. scGHOST: Identifying single-cell 3D genome subcompartments. bioRxiv 2023:2023.05.24.542032. [PMID: 37292994 PMCID: PMC10245874 DOI: 10.1101/2023.05.24.542032] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
New single-cell Hi-C (scHi-C) technologies enable probing of the genome-wide cell-to-cell variability in 3D genome organization from individual cells. Several computational methods have been developed to reveal single-cell 3D genome features based on scHi-C data, including A/B compartments, topologically associating domains, and chromatin loops. However, no scHi-C analysis method currently exists for annotating single-cell subcompartments, which are crucial for providing a more refined view of large-scale chromosome spatial localization in single cells. Here, we present scGHOST, a single-cell subcompartment annotation method based on graph embedding with constrained random walk sampling. Applications of scGHOST to scHi-C data and single-cell 3D genome imaging data demonstrate the reliable identification of single-cell subcompartments and offer new insights into cell-to-cell variability of nuclear subcompartments. Using scHi-C data from the human prefrontal cortex, scGHOST identifies cell type-specific subcompartments that are strongly connected to cell type-specific gene expression, suggesting the functional implications of single-cell subcompartments. Overall, scGHOST is an effective new method for single-cell 3D genome subcompartment annotation based on scHi-C data for a broad range of biological contexts.
Affiliation(s)
- Kyle Xiong
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ruochi Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
19
|
Tiukacheva EA, Ulianov SV, Karpukhina A, Razin SV, Vassetzky Y. 3D genome alterations and editing in pathology. Mol Ther 2023; 31:924-933. [PMID: 36755493 PMCID: PMC10124079 DOI: 10.1016/j.ymthe.2023.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 12/07/2022] [Accepted: 02/03/2023] [Indexed: 02/10/2023] Open
Abstract
The human genome is folded into a multi-level 3D structure that controls many nuclear functions including gene expression. Recently, alterations in 3D genome organization were associated with several genetic diseases and cancer. As a consequence, experimental approaches are now being developed to modify the global 3D genome organization and that of specific loci. Here, we discuss emerging experimental approaches of 3D genome editing that may prove useful in biomedicine.
Affiliation(s)
- Eugenia A Tiukacheva
  - CNRS UMR9018, Institut Gustave Roussy, 94805 Villejuif, France
  - Institute of Gene Biology, Moscow 119334, Russia
  - Moscow Institute of Physics and Technology, Moscow 141700, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow 119991, Russia
  - Koltzov Institute of Developmental Biology, Moscow 119334, Russia
- Sergey V Ulianov
  - Institute of Gene Biology, Moscow 119334, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow 119991, Russia
- Anna Karpukhina
  - CNRS UMR9018, Institut Gustave Roussy, 94805 Villejuif, France
  - Koltzov Institute of Developmental Biology, Moscow 119334, Russia
- Sergey V Razin
  - Institute of Gene Biology, Moscow 119334, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow 119991, Russia
- Yegor Vassetzky
  - CNRS UMR9018, Institut Gustave Roussy, 94805 Villejuif, France
  - Koltzov Institute of Developmental Biology, Moscow 119334, Russia
|
20
|
Kalluchi A, Harris HL, Reznicek TE, Rowley MJ. Considerations and caveats for analyzing chromatin compartments. Front Mol Biosci 2023; 10:1168562. [PMID: 37091873 PMCID: PMC10113542 DOI: 10.3389/fmolb.2023.1168562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 03/27/2023] [Indexed: 04/08/2023] Open
Abstract
Genomes are organized into nuclear compartments, separating active from inactive chromatin. Chromatin compartments are readily visible in a large number of species by experiments that map chromatin conformation genome-wide. When analyzing these maps, a common step is the identification of genomic intervals that interact within A (active) and B (inactive) compartments. It has also become increasingly common to identify and analyze subcompartments. We review different strategies to identify A/B and subcompartment intervals, including a discussion of various machine-learning approaches to predict these features. We then discuss the strengths and limitations of current strategies and examine how these aspects of analysis may have impacted our understanding of chromatin compartments.
Affiliation(s)
- M. Jordan Rowley
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
|
21
|
Chen M, Liu X, Liu Q, Shi D, Li H. 3D genomics and its applications in precision medicine. Cell Mol Biol Lett 2023; 28:19. [PMID: 36879202 PMCID: PMC9987123 DOI: 10.1186/s11658-023-00428-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 02/06/2023] [Indexed: 03/08/2023] Open
Abstract
Three-dimensional (3D) genomics is an emerging discipline that studies the three-dimensional structure of chromatin and the three-dimensional structure and functions of genomes. It mainly studies the three-dimensional conformation and functional regulation of intranuclear genomes, such as DNA replication, DNA recombination, genome folding, gene expression regulation, transcription factor regulation mechanisms, and the maintenance of the three-dimensional conformation of genomes. Since chromosome conformation capture (3C) technology was developed, 3D genomics and related fields have developed rapidly. In addition, chromatin interaction analysis techniques developed from 3C technologies, such as chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) and whole-genome chromosome conformation capture (Hi-C), enable scientists to further study the relationship between chromatin conformation and gene regulation in different species. Thus, the spatial conformation of plant, animal, and microbial genomes, transcriptional regulation mechanisms, interaction patterns of chromosomes, and the formation mechanism of spatiotemporal specificity of genomes are revealed. With the help of new experimental technologies, the identification of key genes and signal pathways related to life activities and diseases is sustaining the rapid development of life science, agriculture, and medicine. In this paper, the concept and development of 3D genomics and its application in agricultural science, life science, and medicine are introduced, which provides a theoretical basis for the study of biological life processes.
Affiliation(s)
- Mengjie Chen
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Animal Science and Technology, Guangxi University, Nanning, 530004, Guangxi Province, China
- Xingyu Liu
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Animal Science and Technology, Guangxi University, Nanning, 530004, Guangxi Province, China
- Qingyou Liu
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Animal Science and Technology, Guangxi University, Nanning, 530004, Guangxi Province, China
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan, 528225, China
- Deshun Shi
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Animal Science and Technology, Guangxi University, Nanning, 530004, Guangxi Province, China
- Hui Li
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Animal Science and Technology, Guangxi University, Nanning, 530004, Guangxi Province, China
|
22
|
El Hachem EJ, Sokolovska N, Soula H. Latent dirichlet allocation for double clustering (LDA-DC): discovering patients phenotypes and cell populations within a single Bayesian framework. BMC Bioinformatics 2023; 24:61. [PMID: 36823548 PMCID: PMC9948385 DOI: 10.1186/s12859-023-05177-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 02/08/2023] [Indexed: 02/25/2023] Open
Abstract
BACKGROUND: Current clinical routines rely more and more on "omics" data, such as flow cytometry data from host and microbiota. Cohort variability, in addition to patient heterogeneity and high dimensionality, makes it difficult to understand the underlying structure of the data and decipher pathologies. Patient stratification and diagnostics from such complex data are extremely challenging. There is an acute need to develop novel statistical machine learning methods that are robust with respect to data heterogeneity, computationally efficient, and understandable by human experts.
RESULTS: We propose a novel approach to stratify cell-based observations within a single probabilistic framework, i.e., to extract meaningful phenotypes from both patient and cell data simultaneously. We define this problem as a double clustering problem that we tackle with the proposed approach. Our method is a practical extension of Latent Dirichlet Allocation and is used for the Double Clustering task (LDA-DC). We first validate the method on artificial datasets, then we apply it to two real problems of patient stratification based on cytometry and microbiota data. We observe that LDA-DC returns clusters of patients and also clusters of cells related to patients' conditions. We also construct a graphical representation of the results that can be easily understood by humans and is, therefore, of great help for experts involved in pre-clinical research.
Affiliation(s)
- Elie-Julien El Hachem
  - Sorbonne University, INSERM, Nutrition and Obesities: Systemic Approaches, NutriOmique, 91 Boulevard de l'hôpital, 75013, Paris, France
- Nataliya Sokolovska
  - Sorbonne University, INSERM, Nutrition and Obesities: Systemic Approaches, NutriOmique, 91 Boulevard de l'hôpital, 75013, Paris, France
- Hedi Soula
  - Sorbonne University, INSERM, Nutrition and Obesities: Systemic Approaches, NutriOmique, 91 Boulevard de l'hôpital, 75013, Paris, France
|
23
|
Unveiling the Machinery behind Chromosome Folding by Polymer Physics Modeling. Int J Mol Sci 2023; 24:ijms24043660. [PMID: 36835064 PMCID: PMC9967178 DOI: 10.3390/ijms24043660] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/14/2023] [Revised: 02/06/2023] [Accepted: 02/09/2023] [Indexed: 02/16/2023] Open
Abstract
Understanding the mechanisms underlying the complex 3D architecture of mammalian genomes poses, at a more fundamental level, the problem of how two or multiple genomic sites can establish physical contacts in the nucleus of the cells. Beyond stochastic and fleeting encounters related to the polymeric nature of chromatin, experiments have revealed specific, privileged patterns of interactions that suggest the existence of basic organizing principles of folding. In this review, we focus on two major and recently proposed physical processes of chromatin organization: loop-extrusion and polymer phase-separation, both supported by increasing experimental evidence. We discuss their implementation into polymer physics models, which we test against available single-cell super-resolution imaging data, showing that both mechanisms can cooperate to shape chromatin structure at the single-molecule level. Next, by exploiting the comprehension of the underlying molecular mechanisms, we illustrate how such polymer models can be used as powerful tools to make predictions in silico that can complement experiments in understanding genome folding. To this aim, we focus on recent key applications, such as the prediction of chromatin structure rearrangements upon disease-associated mutations and the identification of the putative chromatin organizing factors that orchestrate the specificity of DNA regulatory contacts genome-wide.
|
24
|
Park K, Keleş S. Joint tensor modeling of single cell 3D genome and epigenetic data with Muscle. bioRxiv 2023:2023.01.27.525871. [PMID: 36747701 PMCID: PMC9900892 DOI: 10.1101/2023.01.27.525871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Emerging single cell technologies that simultaneously capture long-range interactions of genomic loci together with their DNA methylation levels are advancing our understanding of three-dimensional genome structure and its interplay with the epigenome at the single cell level. While methods to analyze data from single cell high throughput chromatin conformation capture (scHi-C) experiments are maturing, methods that can jointly analyze multiple single cell modalities with scHi-C data are lacking. Here, we introduce Muscle, a semi-nonnegative joint decomposition of Multiple single cell tensors, to jointly analyze 3D conformation and DNA methylation data at the single cell level. Muscle takes advantage of the inherent tensor structure of the scHi-C data, and integrates this modality with DNA methylation. We developed an alternating least squares algorithm for estimating Muscle parameters and established its optimality properties. Parameters estimated by Muscle directly align with the key components of the downstream analysis of scHi-C data in a cell type specific manner. Evaluations with data-driven experiments and simulations demonstrate the advantages of the joint modeling framework of Muscle over single modality modeling or a baseline multi modality modeling for cell type delineation and elucidating associations between modalities. Muscle is publicly available at https://github.com/keleslab/muscle.
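The alternating least squares idea named in the abstract can be illustrated on a much simpler problem. The sketch below fits a rank-1 factorization of a small matrix by alternately solving for one factor while holding the other fixed. It is a generic illustration only, not Muscle's semi-nonnegative joint tensor model; the function name `als_rank1` and its parameters are hypothetical.

```python
def als_rank1(matrix, n_iter=20):
    """Fit matrix ~ outer(u, v) by alternating least squares (generic sketch)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols  # arbitrary non-zero starting point (assumption)
    u = [0.0] * n_rows
    for _ in range(n_iter):
        # Hold v fixed: the least-squares optimum is u_i = (M v)_i / (v . v)
        vv = sum(x * x for x in v)
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) / vv
             for i in range(n_rows)]
        uu = sum(x * x for x in u)
        if uu == 0:
            break  # M v vanished (e.g. all-zero input); nothing left to fit
        # Hold u fixed: the least-squares optimum is v_j = (M^T u)_j / (u . u)
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) / uu
             for j in range(n_cols)]
    return u, v
```

For an exactly rank-1 input such as `[[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]`, the product `u[i] * v[j]` should reproduce each entry up to floating-point error. Each half-step is an ordinary least-squares solve, which is what makes the alternating scheme attractive for larger factor models.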
Affiliation(s)
- Kwangmoon Park
  - Department of Statistics, University of Wisconsin, Madison, WI, USA, 53706
- Sündüz Keleş
  - Department of Statistics, University of Wisconsin, Madison, WI, USA, 53706
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA, 53726
|
25
|
Liu Q, Zeng W, Zhang W, Wang S, Chen H, Jiang R, Zhou M, Zhang S. Deep generative modeling and clustering of single cell Hi-C data. Brief Bioinform 2023; 24:6858951. [PMID: 36458445 DOI: 10.1093/bib/bbac494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/28/2022] [Accepted: 10/18/2022] [Indexed: 12/05/2022] Open
Abstract
Deciphering 3D genome conformation is important for understanding gene regulation and cellular function at a spatial level. Recent advances in single cell Hi-C technologies have enabled profiling of the 3D architecture of DNA within individual cells, which allows us to study the cell-to-cell variability of 3D chromatin organization. Computational approaches are urgently needed to comprehensively analyze the sparse and heterogeneous single cell Hi-C data. Here, we propose scDEC-Hi-C, a new framework for single cell Hi-C analysis with deep generative neural networks. scDEC-Hi-C outperforms existing methods in terms of single cell Hi-C data clustering and imputation. Moreover, the generative power of scDEC-Hi-C could help unveil the differences in chromatin architecture across cell types. We expect that scDEC-Hi-C could deepen our understanding of the complex mechanism underlying the formation of chromatin contacts.
Affiliation(s)
- Qiao Liu
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Wanwen Zeng
  - College of Software, Nankai University, Tianjin 300071, China
- Wei Zhang
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Sicheng Wang
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Hongyang Chen
  - The Research Center for Intelligent Network, Zhejiang Lab, Hangzhou 311121, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Mu Zhou
  - SenseBrain Research, San Jose, CA 95131, USA
- Shaoting Zhang
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200240, China
|
26
|
Chakraborty A, Wang JG, Ay F. dcHiC detects differential compartments across multiple Hi-C datasets. Nat Commun 2022; 13:6827. [PMID: 36369226 PMCID: PMC9652325 DOI: 10.1038/s41467-022-34626-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 03/23/2022] [Accepted: 11/01/2022] [Indexed: 11/13/2022] Open
Abstract
The compartmental organization of mammalian genomes and its changes play important roles in distinct biological processes. Here, we introduce dcHiC, which utilizes a multivariate distance measure to identify significant changes in compartmentalization among multiple contact maps. Evaluating dcHiC on four collections of bulk and single-cell contact maps from in vitro mouse neural differentiation (n = 3), mouse hematopoiesis (n = 10), human LCLs (n = 20) and post-natal mouse brain development (n = 3 stages), we show its effectiveness and sensitivity in detecting biologically relevant changes, including those orthogonally validated. dcHiC reported regions with dynamically regulated genes associated with cell identity, along with correlated changes in chromatin states, subcompartments, replication timing and lamin association. With its efficient implementation, dcHiC enables high-resolution compartment analysis as well as standalone browser visualization, differential interaction identification and time-series clustering. dcHiC is an essential addition to the Hi-C analysis toolbox for the ever-growing number of bulk and single-cell contact maps. Available at: https://github.com/ay-lab/dcHiC.
Affiliation(s)
- Abhijit Chakraborty
  - Centers for Autoimmunity, Inflammation and Cancer Immunotherapy, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
- Jeffrey G Wang
  - Centers for Autoimmunity, Inflammation and Cancer Immunotherapy, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
  - The Bishop's School, La Jolla, CA, 92037, USA
  - Harvard College, Cambridge, MA, 02138, USA
- Ferhat Ay
  - Centers for Autoimmunity, Inflammation and Cancer Immunotherapy, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
|
27
|
Zhang R, Zhou T, Ma J. Ultrafast and interpretable single-cell 3D genome analysis with Fast-Higashi. Cell Syst 2022; 13:798-807.e6. [PMID: 36265466 PMCID: PMC9867958 DOI: 10.1016/j.cels.2022.09.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/20/2022] [Revised: 09/01/2022] [Accepted: 09/13/2022] [Indexed: 01/26/2023]
Abstract
Single-cell Hi-C (scHi-C) technologies can probe three-dimensional (3D) genome structures in individual cells. However, existing scHi-C analysis methods are hindered by the data quality and complex 3D genome patterns. The lack of computational scalability and interpretability poses further challenges for large-scale analysis. Here, we introduce Fast-Higashi, an ultrafast and interpretable method based on tensor decomposition and partial random walk with restart, enabling joint identification of cell identities and chromatin meta-interactions from sparse scHi-C data. Extensive evaluations demonstrate the advantage of Fast-Higashi over existing methods, leading to improved delineation of rare cell types and continuous developmental trajectories. Fast-Higashi can directly identify 3D genome features that define distinct cell types and help elucidate cell-type-specific connections between genome structure and function. Moreover, Fast-Higashi can generalize to incorporate other single-cell omics data. Fast-Higashi provides a highly efficient and interpretable scHi-C analysis solution that is applicable to a broad range of biological contexts.
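Random walk with restart, one of the two ingredients named above, is commonly used to spread signal across a sparse contact map. A minimal generic sketch follows, assuming a small symmetric contact matrix given as nested lists. It is not Fast-Higashi's partial variant, and `random_walk_with_restart`, `restart`, and `n_iter` are illustrative names rather than anything taken from the tool.

```python
def random_walk_with_restart(contacts, restart=0.5, n_iter=50):
    """Smooth a contact matrix by iterating a random walk with restart."""
    n = len(contacts)
    # Row-normalise counts into transition probabilities;
    # bins with no contacts get a self-loop so every row sums to 1.
    transition = []
    for i, row in enumerate(contacts):
        total = sum(row)
        if total == 0:
            transition.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            transition.append([value / total for value in row])
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    state = [row[:] for row in identity]
    for _ in range(n_iter):
        # state <- (1 - restart) * state @ transition + restart * identity
        state = [
            [
                (1 - restart)
                * sum(state[i][k] * transition[k][j] for k in range(n))
                + restart * identity[i][j]
                for j in range(n)
            ]
            for i in range(n)
        ]
    return state
```

On a three-bin chain where only neighbouring bins touch, the smoothed matrix assigns a non-zero weight to the unobserved pair of end bins, which is the imputation effect that makes this kind of walk useful on sparse single-cell maps.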
Affiliation(s)
- Ruochi Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Tianming Zhou
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
28
|
Zheng Y, Shen S, Keleş S. Normalization and de-noising of single-cell Hi-C data with BandNorm and scVI-3D. Genome Biol 2022; 23:222. [PMID: 36253828 PMCID: PMC9575231 DOI: 10.1186/s13059-022-02774-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/16/2021] [Accepted: 09/19/2022] [Indexed: 11/10/2022] Open
Abstract
Single-cell high-throughput chromatin conformation capture methodologies (scHi-C) enable profiling of long-range genomic interactions. However, data from these technologies are prone to technical noise and biases that hinder downstream analysis. We develop a normalization approach, BandNorm, and a deep generative modeling framework, scVI-3D, to account for scHi-C specific biases. In benchmarking experiments, BandNorm yields leading performances in a time and memory efficient manner for cell-type separation, identification of interacting loci, and recovery of cell-type relationships, while scVI-3D exhibits advantages for rare cell types and under high sparsity scenarios. Application of BandNorm coupled with gene-associating domain analysis reveals scRNA-seq validated sub-cell type identification.
Affiliation(s)
- Ye Zheng
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
- Siqi Shen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, USA
- Sündüz Keleş
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, USA
  - Department of Statistics, University of Wisconsin - Madison, Madison, USA
|
29
|
Chi Y, Shi J, Xing D, Tan L. Every gene everywhere all at once: High-precision measurement of 3D chromosome architecture with single-cell Hi-C. Front Mol Biosci 2022; 9:959688. [PMID: 36275628 PMCID: PMC9583135 DOI: 10.3389/fmolb.2022.959688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 09/06/2022] [Indexed: 11/13/2022] Open
Abstract
The three-dimensional (3D) structure of chromosomes influences essential biological processes such as gene expression, genome replication, and DNA damage repair and has been implicated in many developmental and degenerative diseases. In the past two centuries, two complementary genres of technology, microscopy (such as fluorescence in situ hybridization, FISH) and biochemistry (such as chromosome conformation capture, 3C or Hi-C), have revealed general principles of chromosome folding in the cell nucleus. However, the extraordinary complexity and cell-to-cell variability of the chromosome structure necessitate new tools with genome-wide coverage and single-cell precision. In the past decade, single-cell Hi-C has emerged as a new approach that builds upon yet conceptually differs from bulk Hi-C assays. Instead of measuring population-averaged statistical properties of chromosome folding, single-cell Hi-C works as a proximity-based "biochemical microscope" that measures actual 3D structures of individual genomes, revealing features hidden in bulk Hi-C such as radial organization, multi-way interactions, and chromosome intermingling. Single-cell Hi-C has been used to study highly dynamic processes such as the cell cycle, cell-type-specific chromosome architecture ("structure types"), and structure-expression interplay, deepening our understanding of DNA organization and function.
Affiliation(s)
- Yi Chi
  - Biomedical Pioneering Innovation Center, Peking University, Beijing, China
  - Innovation Center for Genomics, Peking University, Beijing, China
- Jenny Shi
  - Department of Neurobiology, Stanford University, Stanford, CA, United States
  - Department of Chemistry, Stanford University, Stanford, CA, United States
  - Department of Bioengineering, Stanford University, Stanford, CA, United States
- Dong Xing
  - Biomedical Pioneering Innovation Center, Peking University, Beijing, China
  - Innovation Center for Genomics, Peking University, Beijing, China
  - Correspondence: Longzhi Tan; Dong Xing
- Longzhi Tan
  - Department of Neurobiology, Stanford University, Stanford, CA, United States
  - Department of Bioengineering, Stanford University, Stanford, CA, United States
  - Correspondence: Longzhi Tan; Dong Xing
|
30
|
Zhong W, Liu W, Chen J, Sun Q, Hu M, Li Y. Understanding the function of regulatory DNA interactions in the interpretation of non-coding GWAS variants. Front Cell Dev Biol 2022; 10:957292. [PMID: 36060805 PMCID: PMC9437546 DOI: 10.3389/fcell.2022.957292] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2022] [Accepted: 07/21/2022] [Indexed: 01/11/2023] Open
Abstract
Genome-wide association studies (GWAS) have identified a vast number of variants associated with various complex human diseases and traits. However, most of these GWAS variants reside in non-coding regions producing no proteins, making the interpretation of these variants a daunting challenge. Prior evidence indicates that a subset of non-coding variants detected within or near cis-regulatory elements (e.g., promoters, enhancers, silencers, and insulators) might play a key role in disease etiology by regulating gene expression. Advanced sequencing- and imaging-based technologies, together with powerful computational methods, enabling comprehensive characterization of regulatory DNA interactions, have substantially improved our understanding of the three-dimensional (3D) genome architecture. Recent literature witnesses plenty of examples where using chromosome conformation capture (3C)-based technologies successfully links non-coding variants to their target genes and prioritizes relevant tissues or cell types. These examples illustrate the critical capability of 3D genome organization in annotating non-coding GWAS variants. This review discusses how 3D genome organization information contributes to elucidating the potential roles of non-coding GWAS variants in disease etiology.
Affiliation(s)
- Wujuan Zhong
  - Biostatistics and Research Decision Sciences, Merck & Co, Inc, Rahway, NJ, United States
- Weifang Liu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jiawen Chen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Quan Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Ming Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
|
31
|
Loop-extrusion and polymer phase-separation can co-exist at the single-molecule level to shape chromatin folding. Nat Commun 2022; 13:4070. [PMID: 35831310 PMCID: PMC9279381 DOI: 10.1038/s41467-022-31856-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/30/2021] [Accepted: 07/06/2022] [Indexed: 11/09/2022] Open
Abstract
Loop-extrusion and phase-separation have been proposed as mechanisms that shape chromosome spatial organization. It is unclear, however, how they perform relative to each other in explaining chromatin architecture data and whether they compete or co-exist at the single-molecule level. Here, we compare models of polymer physics based on loop-extrusion and phase-separation, as well as models where both mechanisms act simultaneously in a single molecule, against multiplexed FISH data available in human loci in IMR90 and HCT116 cells. We find that the different models recapitulate bulk Hi-C and average multiplexed microscopy data. Single-molecule chromatin conformations are also well captured, especially by phase-separation based models that better reflect the experimentally reported segregation in globules of the considered genomic loci and their cell-to-cell structural variability. Such a variability is consistent with two main concurrent causes: single-cell epigenetic heterogeneity and an intrinsic thermodynamic conformational degeneracy of folding. Overall, the model combining loop-extrusion and polymer phase-separation provides a very good description of the data, particularly higher-order contacts, showing that the two mechanisms can co-exist in shaping chromatin architecture in single cells.
|
32
|
Shen S, Zheng Y, Keleş S. scGAD: single-cell gene associating domain scores for exploratory analysis of scHi-C data. Bioinformatics 2022; 38:3642-3644. [PMID: 35652733 PMCID: PMC9272792 DOI: 10.1093/bioinformatics/btac372] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Revised: 03/30/2022] [Accepted: 05/26/2022] [Indexed: 11/12/2022] Open
Abstract
SUMMARY: Quantitative tools are needed to leverage the unprecedented resolution of single-cell high-throughput chromatin conformation (scHi-C) data and integrate it with other single-cell data modalities. We present single-cell gene associating domain (scGAD) scores as a dimension reduction and exploratory analysis tool for scHi-C data. scGAD enables summarization at the gene unit while accounting for inherent gene-level genomic biases. Low-dimensional projections with scGAD capture clustering of cells based on their 3D structures. Significant chromatin interactions within and between cell types can be identified with scGAD. We further show that scGAD facilitates the integration of scHi-C data with other single-cell data modalities by enabling its projection onto reference low-dimensional embeddings. This multi-modal data integration provides an automated and refined cell-type annotation for scHi-C data. AVAILABILITY AND IMPLEMENTATION: scGAD is part of the BandNorm R package at https://sshen82.github.io/BandNorm/articles/scGAD-tutorial.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Siqi Shen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, WI 53706, USA
- Ye Zheng
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Sündüz Keleş
  - Department of Biostatistics and Medical Informatics, University of Wisconsin—Madison, Madison, WI 53706, USA
  - Department of Statistics, University of Wisconsin—Madison, Madison, WI 53706, USA
|
33
|
Mora A, Huang X, Jauhari S, Jiang Q, Li X. Chromatin Hubs: A biological and computational outlook. Comput Struct Biotechnol J 2022; 20:3796-3813. [PMID: 35891791 PMCID: PMC9304431 DOI: 10.1016/j.csbj.2022.07.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2022] [Revised: 07/02/2022] [Accepted: 07/02/2022] [Indexed: 11/20/2022] Open
Abstract
This review discusses our current understanding of chromatin biology and bioinformatics under the unifying concept of “chromatin hubs.” The first part reviews the biology of chromatin hubs, including chromatin–chromatin interaction hubs, chromatin hubs at the nuclear periphery, hubs around macromolecules such as RNA polymerase or lncRNAs, and hubs around nuclear bodies such as the nucleolus or nuclear speckles. The second part reviews existing computational methods, including enhancer–promoter interaction prediction, network analysis, chromatin domain callers, transcription factory predictors, and multi-way interaction analysis. We introduce an integrated model that makes sense of the existing evidence. Understanding chromatin hubs may allow us (i) to explain long-unsolved biological questions such as interaction specificity and redundancy of mechanisms, (ii) to develop more realistic kinetic and functional predictions, and (iii) to explain the etiology of genomic disease.
Affiliation(s)
- Antonio Mora
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Guangzhou 511436, PR China
  - Corresponding authors.
- Xiaowei Huang
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Guangzhou 511436, PR China
- Shaurya Jauhari
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Guangzhou 511436, PR China
- Qin Jiang
  - Affiliated Eye Hospital of Nanjing Medical University, Nanjing 210000, PR China
- Xuri Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, and Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou 510060, PR China
  - Corresponding authors.
|
34
|
Li X, Lee L, Abnousi A, Yu M, Liu W, Huang L, Li Y, Hu M. SnapHiC2: A computationally efficient loop caller for single cell Hi-C data. Comput Struct Biotechnol J 2022; 20:2778-2783. [PMID: 35685374 PMCID: PMC9168059 DOI: 10.1016/j.csbj.2022.05.046] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/14/2022] [Revised: 05/23/2022] [Accepted: 05/23/2022] [Indexed: 01/11/2023] Open
Abstract
Single cell Hi-C (scHi-C) technologies enable the study of chromatin spatial organization directly from complex tissues at single cell resolution. However, the identification of chromatin loops from single cells is challenging, largely due to the extremely sparse data. Our recently developed SnapHiC pipeline provides the first tool to map chromatin loops from scHi-C data, but it is computationally intensive. Here we introduce SnapHiC2, which adapts a sliding window approximation when imputing missing contacts in each single cell and reduces both memory usage and computational time by 70%. SnapHiC2 can identify 5 Kb resolution chromatin loops with high sensitivity and accuracy and help to suggest target genes for GWAS variants in a cell-type-specific manner. SnapHiC2 is freely available at: https://github.com/HuMingLab/SnapHiC/releases/tag/v0.2.2.
Affiliation(s)
- Xiaoqi Li
  - Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, USA
- Lindsay Lee
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Armen Abnousi
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Miao Yu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
- Weifang Liu
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Le Huang
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA
- Yun Li
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- Ming Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
|
35
|
Pancheva A, Wheadon H, Rogers S, Otto TD. Using topic modeling to detect cellular crosstalk in scRNA-seq. PLoS Comput Biol 2022; 18:e1009975. [PMID: 35395014 PMCID: PMC9064087 DOI: 10.1371/journal.pcbi.1009975] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/06/2021] [Revised: 05/03/2022] [Accepted: 02/25/2022] [Indexed: 11/19/2022] Open
Abstract
Cell-cell interactions are vital for numerous biological processes including development, differentiation, and response to inflammation. Currently, most methods for studying interactions at the scRNA-seq level are based on curated databases of ligands and receptors. While those methods are useful, they are limited to our current biological knowledge. Recent advances in single cell protocols have allowed physically interacting cells to be captured, and as such we have the potential to study interactions in a complementary way without relying on prior knowledge. We introduce a new method based on Latent Dirichlet Allocation (LDA) for detecting genes that change as a result of interaction. We apply our method to synthetic datasets to demonstrate its ability to detect genes that change in an interacting population compared to a reference population. Next, we apply our approach to two datasets of physically interacting cells to identify the genes that change as a result of interaction; examples include adhesion and co-stimulatory molecules, which confirm physical interaction between cells. For each dataset we produce a ranking of genes that are changing in subpopulations of the interacting cells. In addition to the genes discussed in the original publications, we highlight further candidates for interaction in the top 100 and 300 ranked genes. Lastly, we apply our method to a dataset generated by a standard droplet-based protocol not designed to capture interacting cells, and discuss its suitability for analysing interactions. We present a method that streamlines detection of interactions and does not require prior clustering and generation of synthetic reference profiles to detect changes in expression.
Affiliation(s)
- Alexandrina Pancheva
  - Institute for Infection, Immunity and Inflammation, University of Glasgow, Glasgow, United Kingdom
- Helen Wheadon
  - Institute of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom
- Simon Rogers
  - School of Computing Science, University of Glasgow, Glasgow, United Kingdom
- Thomas D. Otto
  - Institute for Infection, Immunity and Inflammation, University of Glasgow, Glasgow, United Kingdom
|
36
|
Yu M, Li Y, Hu M. Mapping chromatin loops in single cells. Trends Genet 2022; 38:637-640. [DOI: 10.1016/j.tig.2022.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/07/2022] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 10/18/2022]
|
37
|
Liu W, Zhong W, Chen J, Huang B, Hu M, Li Y. Understanding Regulatory Mechanisms of Brain Function and Disease through 3D Genome Organization. Genes (Basel) 2022; 13:genes13040586. [PMID: 35456393 PMCID: PMC9027261 DOI: 10.3390/genes13040586] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/28/2022] [Revised: 03/17/2022] [Accepted: 03/23/2022] [Indexed: 02/01/2023] Open
Abstract
The human genome has a complex and dynamic three-dimensional (3D) organization, which plays a critical role for gene regulation and genome function. The importance of 3D genome organization in brain development and function has been well characterized in a region- and cell-type-specific fashion. Recent technological advances in chromosome conformation capture (3C)-based techniques, imaging approaches, and ligation-free methods, along with computational methods to analyze the data generated, have revealed 3D genome features at different scales in the brain that contribute to our understanding of genetic mechanisms underlying neuropsychiatric diseases and other brain-related traits. In this review, we discuss how these advances aid in the genetic dissection of brain-related traits.
Affiliation(s)
- Weifang Liu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Wujuan Zhong
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Jiawen Chen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Bo Huang
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143, USA
- Ming Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44195, USA
  - Corresponding author
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Corresponding author
|
38
|
Adossa NA, Rytkönen KT, Elo LL. Dirichlet process mixture models for single-cell RNA-seq clustering. Biol Open 2022; 11:274586. [PMID: 35237784 PMCID: PMC9002799 DOI: 10.1242/bio.059001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/21/2021] [Accepted: 02/17/2022] [Indexed: 11/20/2022] Open
Abstract
Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. One key challenge in cluster analysis is the unknown number of clusters and, for this issue, there is still no comprehensive solution. To enhance the process of defining meaningful cluster resolution, we compare Bayesian latent Dirichlet allocation (LDA) method to its non-parametric counterpart, hierarchical Dirichlet process (HDP) in the context of clustering scRNA-seq data. A potential main advantage of HDP is that it does not require the number of clusters as an input parameter from the user. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP using four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both intrinsic (DB-index) and extrinsic (ARI) cluster quality measures, we show that the performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering compared to the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best performing LDA cluster numbers appropriately capture the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data. Summary: Dirichlet mixture models (LDA and HDP) are applied for clustering cells in scRNA-Seq data. Here we made a comprehensive comparison of LDA and HDP model-based clustering for scRNA-seq data.
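The extrinsic measure mentioned above, the adjusted Rand index (ARI), can be computed directly from two label vectors. The sketch below is a plain-Python version of the standard pair-counting formula, included only to show why an inflated cluster count is penalized; the function name `adjusted_rand_index` and the toy labels are illustrative and do not come from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings (pair-counting form)."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of cells that share a cluster in both, in truth, and in prediction
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings are one big cluster
    return (sum_joint - expected) / (max_index - expected)


reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same partition, different label names
oversplit = [0, 1, 2, 3]    # every cell in its own cluster
print(adjusted_rand_index(reference, relabelled))  # 1.0
print(adjusted_rand_index(reference, oversplit))   # 0.0
```

Because the index is corrected for chance, a method that splits every cell into its own cluster scores 0 here even though it never merges cells from different reference groups.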
Affiliation(s)
- Nigatu A Adossa
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Kalle T Rytkönen
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
  - Institute of Biomedicine, Research Centre for Integrative Physiology and Pharmacology, University of Turku, FI-20014, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, FI-20014, Finland
|
39
|
Dam TV, Toft NI, Grøntved L. Cell-Type Resolved Insights into the Cis-Regulatory Genome of NAFLD. Cells 2022; 11:cells11050870. [PMID: 35269495 PMCID: PMC8909044 DOI: 10.3390/cells11050870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 02/27/2022] [Accepted: 02/28/2022] [Indexed: 11/20/2022] Open
Abstract
The prevalence of non-alcoholic fatty liver disease (NAFLD) is increasing rapidly, and unmet treatment can result in the development of hepatitis, fibrosis, and liver failure. There are difficulties involved in diagnosing NAFLD early and for this reason there are challenges involved in its treatment. Furthermore, no drugs are currently approved to alleviate complications, a fact which highlights the need for further insight into disease mechanisms. NAFLD pathogenesis is associated with complex cellular changes, including hepatocyte steatosis, immune cell infiltration, endothelial dysfunction, hepatic stellate cell activation, and epithelial ductular reaction. Many of these cellular changes are controlled by dramatic changes in gene expression orchestrated by the cis-regulatory genome and associated transcription factors. Thus, to understand disease mechanisms, we need extensive insights into the gene regulatory mechanisms associated with tissue remodeling. Mapping cis-regulatory regions genome-wide is a step towards this objective and several current and emerging technologies allow detection of accessible chromatin and specific histone modifications in enriched cell populations of the liver, as well as in single cells. Here, we discuss recent insights into the cis-regulatory genome in NAFLD both at the organ-level and in specific cell populations of the liver. Moreover, we highlight emerging technologies that enable single-cell resolved analysis of the cis-regulatory genome of the liver.
|
40
|
Zhang R, Zhou T, Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat Biotechnol 2022; 40:254-261. [PMID: 34635838 PMCID: PMC8843812 DOI: 10.1038/s41587-021-01034-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 02/12/2021] [Accepted: 07/27/2021] [Indexed: 02/08/2023]
Abstract
Single-cell Hi-C (scHi-C) can identify cell-to-cell variability of three-dimensional (3D) chromatin organization, but the sparseness of measured interactions poses an analysis challenge. Here we report Higashi, an algorithm based on hypergraph representation learning that can incorporate the latent correlations among single cells to enhance overall imputation of contact maps. Higashi outperforms existing methods for embedding and imputation of scHi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability. Moreover, Higashi can incorporate epigenomic signals jointly profiled in the same cell into the hypergraph representation learning framework, as compared to separate analysis of two modalities, leading to improved embeddings for single-nucleus methyl-3C data. In an scHi-C dataset from human prefrontal cortex, Higashi identifies connections between 3D genome features and cell-type-specific gene regulation. Higashi can also potentially be extended to analyze single-cell multiway chromatin interactions and other multimodal single-cell omics data.
Affiliation(s)
- Ruochi Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tianming Zhou
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
|
41
|
Shim AR, Huang K, Backman V, Szleifer I. Chromatin as self-returning walks: From population to single cell and back. Biophysical Reports 2021; 2:100042. [PMID: 36425085 PMCID: PMC9680733 DOI: 10.1016/j.bpr.2021.100042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 12/08/2021] [Indexed: 10/19/2022]
Abstract
With a growing understanding of the chromatin structure, many efforts remain focused on bridging the gap between what is suggested by population-averaged data and what is visualized for single cells. A popular approach to traversing these scales is to fit a polymer model to Hi-C contact data. However, Hi-C is an average of millions to billions of cells, and each cell may not contain all population-averaged contacts. Therefore, we employ a novel approach of summing individual chromosome trajectories, determined by our Self-Returning Random Walk model, to create populations of cells. We allow single cells to consist of disparate structures and reproduce a variety of experimentally relevant contact maps. We show that the amount of shared topology between cells, and their mechanism of formation, changes the population-averaged structure. Therefore, we present a modeling technique that, with few constraints and little oversight, can be used to understand which single-cell chromatin structures underlie population-averaged behavior.
Affiliation(s)
- Anne R. Shim
  - Department of Biomedical Engineering, Northwestern University, Evanston, Illinois
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois
- Kai Huang (corresponding author)
  - Shenzhen Bay Laboratory, Shenzhen, Guangdong Province, P. R. China
- Vadim Backman
  - Department of Biomedical Engineering, Northwestern University, Evanston, Illinois
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois
- Igal Szleifer (corresponding author)
  - Department of Biomedical Engineering, Northwestern University, Evanston, Illinois
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois
  - Department of Chemistry, Northwestern University, Evanston, Illinois
|
42
|
Mulqueen RM, Pokholok D, O’Connell BL, Thornton CA, Zhang F, O’Roak BJ, Link J, Yardımcı GG, Sears RC, Steemers FJ, Adey AC. High-content single-cell combinatorial indexing. Nat Biotechnol 2021; 39:1574-1580. [PMID: 34226710 PMCID: PMC8678206 DOI: 10.1038/s41587-021-00962-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 01/08/2021] [Accepted: 05/20/2021] [Indexed: 02/06/2023]
Abstract
Single-cell combinatorial indexing (sci) with transposase-based library construction increases the throughput of single-cell genomics assays but produces sparse coverage in terms of usable reads per cell. We develop symmetrical strand sci ('s3'), a uracil-based adapter switching approach that improves the rate of conversion of source DNA into viable sequencing library fragments following tagmentation. We apply this chemistry to assay chromatin accessibility (s3-assay for transposase-accessible chromatin, s3-ATAC) in human cortical and mouse whole-brain tissues, with mouse datasets demonstrating a six- to 13-fold improvement in usable reads per cell compared with other available methods. Application of s3 to single-cell whole-genome sequencing (s3-WGS) and to whole-genome plus chromatin conformation (s3-GCC) yields 148- and 14.8-fold improvements, respectively, in usable reads per cell compared with sci-DNA-sequencing and sci-HiC. We show that s3-WGS and s3-GCC resolve subclonal genomic alterations in patient-derived pancreatic cancer cell lines. We expect that the s3 platform will be compatible with other transposase-based techniques, including sci-MET or CUT&Tag.
Affiliation(s)
- Ryan M. Mulqueen
  - Oregon Health & Science University, Department of Molecular and Medical Genetics, Portland, OR
- Brendan L. O’Connell
  - Oregon Health & Science University, Department of Molecular and Medical Genetics, Portland, OR
- Casey A. Thornton
  - Oregon Health & Science University, Department of Molecular and Medical Genetics, Portland, OR
- Brian J. O’Roak
  - Oregon Health & Science University, Department of Molecular and Medical Genetics, Portland, OR
- Jason Link
  - Oregon Health & Science University, Department of Molecular and Medical Genetics, Portland, OR
  - Oregon Health & Science University, Knight Cancer Institute, Portland, OR
  - Oregon Health & Science University, Brendan Colson Center for Pancreatic Care, Portland, OR
- Galip Gürkan Yardımcı
  - Oregon Health & Science University, Knight Cancer Institute, Portland, OR
  - Oregon Health & Science University, Cancer Early Detection Advanced Research Center, Portland, OR
- Rosalie C. Sears
  - Oregon Health & Science University, Department of Molecular and Medical Genetics, Portland, OR
  - Oregon Health & Science University, Knight Cancer Institute, Portland, OR
  - Oregon Health & Science University, Brendan Colson Center for Pancreatic Care, Portland, OR
  - Oregon Health & Science University, Cancer Early Detection Advanced Research Center, Portland, OR
- Andrew C. Adey (corresponding author)
  - Oregon Health & Science University, Department of Molecular and Medical Genetics, Portland, OR
  - Oregon Health & Science University, Knight Cancer Institute, Portland, OR
  - Oregon Health & Science University, Cancer Early Detection Advanced Research Center, Portland, OR
  - Oregon Health & Science University, Department of Oncological Sciences, Portland, OR
  - Oregon Health & Science University, Knight Cardiovascular Institute, Portland, OR
|
43
|
Large-scale DNA demethylation occurs in proliferating ovarian granulosa cells during mouse follicular development. Commun Biol 2021; 4:1334. [PMID: 34824385 PMCID: PMC8617273 DOI: 10.1038/s42003-021-02849-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/16/2020] [Accepted: 11/04/2021] [Indexed: 12/20/2022] Open
Abstract
During ovarian follicular development, granulosa cells proliferate and progressively differentiate to support oocyte maturation and ovulation. To determine the underlying links between proliferation and differentiation in granulosa cells, we determined changes in 1) the expression of genes regulating DNA methylation and 2) DNA methylation patterns, histone acetylation levels and genomic DNA structure. In response to equine chorionic gonadotropin (eCG), granulosa cell proliferation increased, DNA methyltransferase (DNMT1) significantly decreased and Tet methylcytosine dioxygenase 2 (TET2) significantly increased in S-phase granulosa cells. Comprehensive MeDIP-seq analyses documented that eCG treatment decreased methylation of promoter regions in approximately 40% of the genes in granulosa cells. The expression of specific demethylated genes was significantly increased in association with specific histone modifications and changes in DNA structure. These epigenetic processes were suppressed by a cell cycle inhibitor. Based on these results, we propose that the timing of sequential epigenetic events is essential for progressive, stepwise changes in granulosa cell differentiation.
|
44
|
Galitsyna AA, Gelfand MS. Single-cell Hi-C data analysis: safety in numbers. Brief Bioinform 2021; 22:bbab316. [PMID: 34406348 PMCID: PMC8575028 DOI: 10.1093/bib/bbab316] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/31/2021] [Revised: 07/09/2021] [Accepted: 07/21/2021] [Indexed: 02/06/2023] Open
Abstract
Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
Affiliation(s)
- Aleksandra A Galitsyna
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems, RAS, Moscow, Russia
  - Institute of Gene Biology, RAS, Moscow, Russia
- Mikhail S Gelfand
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems, RAS, Moscow, Russia
|
45
|
Bonora G, Ramani V, Singh R, Fang H, Jackson DL, Srivatsan S, Qiu R, Lee C, Trapnell C, Shendure J, Duan Z, Deng X, Noble WS, Disteche CM. Single-cell landscape of nuclear configuration and gene expression during stem cell differentiation and X inactivation. Genome Biol 2021; 22:279. [PMID: 34579774 PMCID: PMC8474932 DOI: 10.1186/s13059-021-02432-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/03/2020] [Accepted: 07/07/2021] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. RESULTS: Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a "bookmark" mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. CONCLUSIONS: Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.
Affiliation(s)
- Giancarlo Bonora
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Vijay Ramani
  - Department of Biochemistry & Biophysics, University of California San Francisco, San Francisco, CA, USA
- Ritambhara Singh
  - Department of Computer Science, Brown University, Providence, RI, USA
  - Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- He Fang
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Dana L Jackson
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Sanjay Srivatsan
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Ruolan Qiu
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Choli Lee
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cole Trapnell
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
- Zhijun Duan
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, USA
  - Division of Hematology, Department of Medicine, University of Washington, Seattle, USA
- Xinxian Deng
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Christine M Disteche
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Department of Medicine, University of Washington, Seattle, WA, USA
|
46
|
Abstract
The spatial organization of the genome in the cell nucleus is pivotal to cell function. However, how the 3D genome organization and its dynamics influence cellular phenotypes remains poorly understood. The very recent development of single-cell technologies for probing the 3D genome, especially single-cell Hi-C (scHi-C), has ushered in a new era of unveiling cell-to-cell variability of 3D genome features at an unprecedented resolution. Here, we review recent developments in computational approaches to the analysis of scHi-C, including data processing, dimensionality reduction, imputation for enhancing data quality, and the revealing of 3D genome features at single-cell resolution. While much progress has been made in computational method development to analyze single-cell 3D genomes, substantial future work is needed to improve data interpretation and multimodal data integration, which are critical to reveal fundamental connections between genome structure and function among heterogeneous cell populations in various biological contexts.
Affiliation(s)
- Tianming Zhou
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Ruochi Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
|
47
|
Durham TJ, Daza RM, Gevirtzman L, Cusanovich DA, Bolonduro O, Noble WS, Shendure J, Waterston RH. Comprehensive characterization of tissue-specific chromatin accessibility in L2 Caenorhabditis elegans nematodes. Genome Res 2021; 31:1952-1969. [PMID: 33888511 DOI: 10.1101/gr.271791.120] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/15/2020] [Accepted: 04/13/2021] [Indexed: 11/24/2022]
Abstract
Recently developed single-cell technologies allow researchers to characterize cell states at ever greater resolution and scale. Caenorhabditis elegans is a particularly tractable system for studying development, and recent single-cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns give insight about gene function and into the biochemical state of different cell types; recent advances in other single-cell genomics technologies can now also characterize the regulatory context of the genome that gives rise to these gene expression levels at a single-cell resolution. To explore the regulatory DNA of individual cell types in C. elegans, we collected single-cell chromatin accessibility data using the sci-ATAC-seq assay in L2 larvae to match the available single-cell RNA-seq data set. By using a novel implementation of the latent Dirichlet allocation algorithm, we identify 37 clusters of cells that correspond to different cell types in the worm, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation.
Affiliation(s)
- Timothy J Durham
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Riza M Daza
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Louis Gevirtzman
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Darren A Cusanovich
  - Department of Cellular and Molecular Medicine, University of Arizona, Tucson, Arizona 85721, USA
- Olubusayo Bolonduro
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
  - Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, Washington 98195, USA
  - Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, Washington 98195, USA
- Robert H Waterston
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
48
|
Kundu S, Ray MD, Sharma A. Interplay between genome organization and epigenomic alterations of pericentromeric DNA in cancer. J Genet Genomics 2021; 48:184-197. [PMID: 33840602 DOI: 10.1016/j.jgg.2021.02.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/06/2020] [Revised: 02/07/2021] [Accepted: 02/20/2021] [Indexed: 12/16/2022]
Abstract
In eukaryotic genome biology, the genomic organization inside the three-dimensional (3D) nucleus is highly complex, and whether this organization governs gene expression is poorly understood. Nuclear lamina (NL) is a filamentous meshwork of proteins present at the lining of inner nuclear membrane that serves as an anchoring platform for genome organization. Large chromatin domains termed as lamina-associated domains (LADs), play a major role in silencing genes at the nuclear periphery. The interaction of the NL and genome is dynamic and stochastic. Furthermore, many genes change their positions during developmental processes or under disease conditions such as cancer, to activate certain sorts of genes and/or silence others. Pericentromeric heterochromatin (PCH) is mostly in the silenced region within the genome, which localizes at the nuclear periphery. Studies show that several genes located at the PCH are aberrantly expressed in cancer. The interesting question is that despite being localized in the pericentromeric region, how these genes still manage to overcome pericentromeric repression. Although epigenetic mechanisms control the expression of the pericentromeric region, recent studies about genome organization and genome-nuclear lamina interaction have shed light on a new aspect of pericentromeric gene regulation through a complex and coordinated interplay between epigenomic remodeling and genomic organization in cancer.
Affiliation(s)
- Subhadip Kundu
  - Laboratory of Chromatin and Cancer Epigenetics, Department of Biochemistry, All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- M D Ray
  - Department of Surgical Oncology, IRCH, All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Ashok Sharma
  - Laboratory of Chromatin and Cancer Epigenetics, Department of Biochemistry, All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
|
49
|
Cahan P, Cacchiarelli D, Dunn SJ, Hemberg M, de Sousa Lopes SMC, Morris SA, Rackham OJL, Del Sol A, Wells CA. Computational Stem Cell Biology: Open Questions and Guiding Principles. Cell Stem Cell 2021; 28:20-32. [PMID: 33417869 PMCID: PMC7799393 DOI: 10.1016/j.stem.2020.12.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
Computational biology is enabling an explosive growth in our understanding of stem cells and our ability to use them for disease modeling, regenerative medicine, and drug discovery. We discuss four topics that exemplify applications of computation to stem cell biology: cell typing, lineage tracing, trajectory inference, and regulatory networks. We use these examples to articulate principles that have guided computational biology broadly and call for renewed attention to these principles as computation becomes increasingly important in stem cell biology. We also discuss important challenges for this field with the hope that it will inspire more to join this exciting area.
Affiliation(s)
- Patrick Cahan
  - Institute for Cell Engineering, Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Davide Cacchiarelli
  - Telethon Institute of Genetics and Medicine (TIGEM), Armenise/Harvard Laboratory of Integrative Genomics, Pozzuoli, Italy
  - Department of Translational Medicine, University of Naples "Federico II," Naples, Italy
- Sara-Jane Dunn
  - DeepMind, 14-18 Handyside Street, London N1C 4DN, UK
  - Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- Martin Hemberg
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Samantha A Morris
  - Department of Developmental Biology, Department of Genetics, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Owen J L Rackham
  - Centre for Computational Biology and The Program for Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, Singapore
- Antonio Del Sol
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, Belvaux 4366, Luxembourg
  - CIC bioGUNE, Bizkaia Technology Park, 801 Building, 48160 Derio, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao 48013, Spain
- Christine A Wells
  - Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC 3010, Australia
|