1
Wang F, Lin J, Alinejad-Rokny H, Ma W, Meng L, Huang L, Yu J, Chen N, Wang Y, Yao Z, Xie W, Wong KC, Li X. Unveiling Multi-Scale Architectural Features in Single-Cell Hi-C Data Using scCAFE. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2025:e2416432. [PMID: 40270467 DOI: 10.1002/advs.202416432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2024] [Revised: 03/12/2025] [Indexed: 04/25/2025]
Abstract
Single-cell Hi-C (scHi-C) has provided unprecedented insights into the heterogeneity of 3D genome organization. However, its sparse and noisy nature poses challenges for computational analyses, such as chromatin architectural feature identification. Here, scCAFE is introduced, which is a deep learning model for the multi-scale detection of architectural features at the single-cell level. scCAFE provides a unified framework for annotating chromatin loops, TAD-like domains (TLDs), and compartments across individual cells. This model outperforms previous scHi-C loop calling methods and delivers accurate predictions of TLDs and compartments that are biologically consistent with previous studies. The resulting single-cell annotations also offer a measure to characterize the heterogeneity of different levels of architectural features across cell types. This heterogeneity is then leveraged to identify a series of marker loop anchors, demonstrating the potential of the 3D genome data to annotate cell identities without the aid of simultaneously sequenced omics data. Overall, scCAFE not only serves as a useful tool for analyzing single-cell genomic architecture, but also paves the way for precise cell-type annotations solely based on 3D genome features.
Affiliation(s)
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Jiecong Lin
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, 000000, Hong Kong SAR
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, MA, 02129, USA
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab, Graduate School of Biomedical Engineering, University of New South Wales, Sydney, 2052, Australia
- Wenjing Ma
  - School of Artificial Intelligence, Jilin University, Changchun, 132000, China
- Lingkuan Meng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Lei Huang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Jixiang Yu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Nanjun Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Yuchen Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Zhongyu Yao
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Weidun Xie
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, 000000, Hong Kong SAR
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, 518057, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Changchun, 132000, China
2
Schultz ER, Kyhl S, Willett R, de Pablo JJ. Chromatin structures from integrated AI and polymer physics model. PLoS Comput Biol 2025; 21:e1012912. [PMID: 40203073 PMCID: PMC12005555 DOI: 10.1371/journal.pcbi.1012912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 04/17/2025] [Accepted: 02/26/2025] [Indexed: 04/11/2025] Open
Abstract
The physical organization of the genome in three-dimensional space regulates many biological processes, including gene expression and cell differentiation. Three-dimensional characterization of genome structure is critical to understanding these biological processes. Direct experimental measurements of genome structure are challenging; computational models of chromatin structure are therefore necessary. We develop an approach that combines a particle-based chromatin polymer model, molecular simulation, and machine learning to efficiently and accurately estimate chromatin structure from indirect measures of genome structure. More specifically, we introduce a new approach where the interaction parameters of the polymer model are extracted from experimental Hi-C data using a graph neural network (GNN). We train the GNN on simulated data from the underlying polymer model, avoiding the need for large quantities of experimental data. The resulting approach accurately estimates chromatin structures across all chromosomes and across several experimental cell lines despite being trained almost exclusively on simulated data. The proposed approach can be viewed as a general framework for combining physical modeling with machine learning, and it could be extended to integrate additional biological data modalities. Ultimately, we achieve accurate and high-throughput estimations of chromatin structure from Hi-C data, which will be necessary as experimental methodologies, such as single-cell Hi-C, improve.
Affiliation(s)
- Eric R. Schultz
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois, United States of America
- Soren Kyhl
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois, United States of America
- Rebecca Willett
  - Department of Statistics and Computer Science, The University of Chicago, Chicago, Illinois, United States of America
- Juan J. de Pablo
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois, United States of America
  - Tandon School of Engineering, New York University, Brooklyn, New York, United States of America
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois, United States of America
3
Wang X, Luo J, Wu L, Luo H, Guo F. deepTAD: an approach for identifying topologically associated domains based on convolutional neural network and transformer model. Brief Bioinform 2025; 26:bbaf127. [PMID: 40131313 PMCID: PMC11934553 DOI: 10.1093/bib/bbaf127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2024] [Revised: 03/02/2025] [Accepted: 03/05/2025] [Indexed: 03/26/2025] Open
Abstract
MOTIVATION: Topologically associated domains (TADs) play a key role in the 3D organization and function of genomes, and accurate detection of TADs is essential for revealing the relationship between genomic structure and function. Most current methods identify TADs by extracting features from the Hi-C interaction matrix. However, due to the complexity of Hi-C contact matrices, it is difficult to directly extract features associated with TADs, which prevents current methods from identifying TADs accurately. RESULTS: In this paper, a novel method, deepTAD, is proposed, based on a convolutional neural network (CNN) and a transformer model. First, based on the Hi-C contact matrix, deepTAD uses the CNN to directly extract features associated with TAD boundaries. Next, deepTAD uses the transformer model to analyze the variation of features around TAD boundaries and determines the TAD boundaries. Then, deepTAD uses the Wilcoxon rank-sum test to further identify false-positive boundaries. Finally, deepTAD computes cosine similarity among the identified TAD boundaries and assembles them to obtain hierarchical TADs. The experimental results show that TAD boundaries identified by deepTAD are significantly enriched for biological features, including structural proteins, histone modifications, and transcription start site loci. Additionally, when the completeness and accuracy of the identified TADs are evaluated, deepTAD performs well compared with other methods. The source code of deepTAD is available at https://github.com/xiaoyan-wang99/deepTAD.
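As a rough illustration of the cosine-similarity step mentioned in the abstract, the sketch below compares boundary feature vectors pairwise. The vectors, the boundary names, and the `cosine_similarity` helper are hypothetical; the abstract does not say how deepTAD represents boundaries or how the similarities are turned into a hierarchy, so this shows only the similarity measure itself.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors for three candidate boundaries (illustration only).
boundaries = {
    "b1": [1.0, 0.0, 2.0],
    "b2": [2.0, 0.0, 4.0],  # same direction as b1
    "b3": [0.0, 3.0, 0.0],  # orthogonal to b1
}

names = sorted(boundaries)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = cosine_similarity(boundaries[a], boundaries[b])
        print(a, b, round(sim, 3))
```

Boundaries whose vectors point in similar directions score close to 1, which is the kind of signal a hierarchical grouping could build on.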
Affiliation(s)
- Xiaoyan Wang
  - School of Software, Henan Polytechnic University, 2001 Century Road, Jiaozuo 454003, China
- Junwei Luo
  - School of Software, Henan Polytechnic University, 2001 Century Road, Jiaozuo 454003, China
- Lili Wu
  - School of Software, Henan Polytechnic University, 2001 Century Road, Jiaozuo 454003, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, North Section of Jinming Avenue, Kaifeng 475001, China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, 932 Lushan South Road, Changsha 410083, China
4
Wang Y, Cheng J. Reconstructing 3D chromosome structures from single-cell Hi-C data with SO(3)-equivariant graph neural networks. NAR Genom Bioinform 2025; 7:lqaf027. [PMID: 40124711 PMCID: PMC11928942 DOI: 10.1093/nargab/lqaf027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2024] [Revised: 02/23/2025] [Accepted: 03/05/2025] [Indexed: 03/25/2025] Open
Abstract
The spatial conformation of chromosomes and genomes of single cells is relevant to cellular function and useful for elucidating the mechanism underlying gene expression and genome methylation. The chromosomal contacts (i.e. chromosomal regions in spatial proximity) entailing the three-dimensional (3D) structure of the genome of a single cell can be obtained by single-cell chromosome conformation capture techniques, such as single-cell Hi-C (ScHi-C). However, due to the sparsity of chromosomal contacts in ScHi-C data, it is still challenging for traditional 3D conformation optimization methods to reconstruct the 3D chromosome structures from ScHi-C data. Here, we present a machine learning-based method based on a novel SO(3)-equivariant graph neural network (HiCEGNN) to reconstruct 3D structures of chromosomes of single cells from ScHi-C data. HiCEGNN consistently outperforms both the traditional optimization methods and the only other deep learning method across diverse cells, different structural resolutions, and different noise levels of the data. Moreover, HiCEGNN is robust against the noise in the ScHi-C data.
Affiliation(s)
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
5
Tahir M, Norouzi M, Khan SS, Davie JR, Yamanaka S, Ashraf A. Artificial intelligence and deep learning algorithms for epigenetic sequence analysis: A review for epigeneticists and AI experts. Comput Biol Med 2024; 183:109302. [PMID: 39500240 DOI: 10.1016/j.compbiomed.2024.109302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 09/22/2024] [Accepted: 10/17/2024] [Indexed: 11/20/2024]
Abstract
Epigenetics encompasses mechanisms that can alter the expression of genes without changing the underlying genetic sequence. The epigenetic regulation of gene expression is initiated and sustained by several mechanisms such as DNA methylation, histone modifications, chromatin conformation, and non-coding RNA. The changes in gene regulation and expression can manifest in the form of various diseases and disorders such as cancer and congenital deformities. Over the last few decades, high-throughput experimental approaches have been used to identify and understand epigenetic changes, but these laboratory experimental approaches and biochemical processes are time-consuming and expensive. To overcome these challenges, machine learning and artificial intelligence (AI) approaches have been extensively used for mapping epigenetic modifications to their phenotypic manifestations. In this paper we provide a narrative review of published research on AI models trained on epigenomic data to address a variety of problems such as prediction of disease markers, gene expression, enhancer-promoter interaction, and chromatin states. The purpose of this review is twofold, as it is addressed to both AI experts and epigeneticists. For AI researchers, we provide a taxonomy of epigenetics research problems that can benefit from an AI-based approach. For epigeneticists, for each of the above problems we provide a list of candidate AI solutions from the literature. We also identify several gaps in the literature and research challenges, and offer recommendations to address these challenges.
Affiliation(s)
- Muhammad Tahir
  - Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, R3T 5V6, MB, Canada
- Mahboobeh Norouzi
  - Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, R3T 5V6, MB, Canada
- Shehroz S Khan
  - College of Engineering and Technology, American University of the Middle East, Kuwait
- James R Davie
  - Department of Biochemistry and Medical Genetics, Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Soichiro Yamanaka
  - Graduate School of Science, Department of Biophysics and Biochemistry, University of Tokyo, Japan
- Ahmed Ashraf
  - Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, R3T 5V6, MB, Canada
6
Gong H, Zhang S, Zhang X, Chen Y. A method for chromatin domain partitioning based on hypergraph clustering. Comput Struct Biotechnol J 2024; 23:1584-1593. [PMID: 38655013 PMCID: PMC11035048 DOI: 10.1016/j.csbj.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 03/29/2024] [Accepted: 04/04/2024] [Indexed: 04/26/2024] Open
Abstract
For many years, multi-scale models of chromatin domains, such as A/B compartments, sub-compartments, topologically associated domains (TADs), sub-TADs, and loops have been popular. However, existing methods can only identify structures at a single scale and cannot partition multi-scale structures. In this paper, we propose a method (TORNADOES) for chromatin domain partitioning based on hypergraph clustering. First, we use a density clustering algorithm to identify TADs at different scales based on Hi-C data with different resolutions. Then, by combining ChIP-seq data features and TAD results at different scales, we generate a hypergraph based on these TADs. Finally, we partition the chromatin domain structure at different scales, including A/B, A1, A2, B1, B2, and B3, based on the Laplacian matrix feature of the hypergraph. Similarity comparison experiments and ChIP-seq signal enrichment analysis are performed at the A/B region and sub-TAD levels, respectively, demonstrating that our method can identify chromatin domains with distinct features and provide a deeper understanding of the organizational patterns and functional differences of TADs within the hierarchical structure of the genome. Comparative analysis of data from multiple cell lines shows that, compared with other methods, TORNADOES can better classify different numbers and types of compartments by varying the ChIP-seq data and the number of clusters used to characterize TADs. Source code for the TORNADOES method can be found at https://github.com/ghaiyan/TORNADOES.
Affiliation(s)
- Haiyan Gong
  - Beijing Advanced Innovation Center for Materials Genome Engineering, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China
  - Shunde Innovation School, University of Science and Technology Beijing, Foshan, 528399, Guangdong, China
- Sichen Zhang
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaotong Zhang
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
  - Shunde Innovation School, University of Science and Technology Beijing, Foshan, 528399, Guangdong, China
- Yang Chen
  - The State Key Laboratory of Common Mechanism Research for Major Diseases, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005, China
7
Bera P, Mondal J. Machine learning unravels inherent structural patterns in Escherichia coli Hi-C matrices and predicts chromosome dynamics. Nucleic Acids Res 2024; 52:10836-10849. [PMID: 39217471 PMCID: PMC11472170 DOI: 10.1093/nar/gkae749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] Open
Abstract
The high-dimensional nature of the chromosomal conformation contact map ('Hi-C map'), even for a microscopically small bacterial cell, poses challenges for extracting meaningful information related to its complex organization. Here we first demonstrate that an artificial deep neural network-based machine-learnt (ML) low-dimensional representation of a recently reported Hi-C interaction map of the archetypal bacterium Escherichia coli can decode crucial underlying structural patterns. The ML-derived representation of the Hi-C map can automatically detect a set of spatially distinct domains across the E. coli genome, reminiscent of the six putative macro-domains previously posited via recombination assay. Subsequently, an ML-generated model assimilates the intricate relationship between a large array of Hi-C-derived chromosomal contact probabilities and the respective diffusive dynamics of each individual chromosomal gene, and identifies an optimal number of functionally important chromosomal contact pairs that are chiefly responsible for the heterogeneous, coordinate-dependent sub-diffusive motions of chromosomal loci. Finally, the ML models, trained on wild-type E. coli, showcase their predictive capabilities on mutant bacterial strains, shedding light on the structural and dynamic nuances of ΔMatP30MM and ΔMukBEF22MM chromosomes. Overall, our results illuminate the power of ML techniques in unraveling the complex relationship between the structure and dynamics of bacterial chromosomal loci, promising meaningful connections between ML-derived insights and biological phenomena.
Affiliation(s)
- Palash Bera
  - Tata Institute of Fundamental Research Hyderabad, Telangana 500046, India
- Jagannath Mondal
  - Tata Institute of Fundamental Research Hyderabad, Telangana 500046, India
8
Banerjee A, Zhang S, Bahar I. Genome structural dynamics: insights from Gaussian network analysis of Hi-C data. Brief Funct Genomics 2024; 23:525-537. [PMID: 38654598 PMCID: PMC11428154 DOI: 10.1093/bfgp/elae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/11/2024] [Accepted: 04/02/2024] [Indexed: 04/26/2024] Open
Abstract
Characterization of the spatiotemporal properties of the chromatin is essential to gaining insights into the physical bases of gene co-expression, transcriptional regulation and epigenetic modifications. The Gaussian network model (GNM) has proven in recent work to serve as a useful tool for modeling chromatin structural dynamics, using as input high-throughput chromosome conformation capture data. We focus here on the exploration of the collective dynamics of chromosomal structures at hierarchical levels of resolution, from single gene loci to topologically associating domains or entire chromosomes. The GNM permits us to identify long-range interactions between gene loci, shedding light on the role of cross-correlations between distal regions of the chromosomes in regulating gene expression. Notably, GNM analysis performed across diverse cell lines highlights the conservation of the global/cooperative movements of the chromatin across different types of cells. Variations driven by localized couplings between genomic loci, on the other hand, underlie cell differentiation, underscoring the significance of the four-dimensional properties of the genome in defining cellular identity. Finally, we demonstrate the close relation between the cell type-dependent mobility profiles of gene loci and their gene expression patterns, providing a clear demonstration of the role of chromosomal 4D features in defining cell-specific differential expression of genes.
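To make the GNM idea concrete, here is a minimal sketch of the standard Gaussian network model calculation applied to a contact map: contacts act as spring constants, the Kirchhoff (graph Laplacian) matrix is built from them, and per-locus mobility is read from the diagonal of its pseudo-inverse. Using raw contact values as spring constants, the `gnm_mobility` name, and the three-locus toy map are assumptions for illustration; the abstract does not describe how the authors normalize or preprocess Hi-C data.

```python
import numpy as np


def gnm_mobility(contacts):
    """Per-locus mean-square fluctuations, up to a constant factor."""
    c = np.asarray(contacts, dtype=float)
    if c.ndim != 2 or c.shape[0] != c.shape[1]:
        raise ValueError("contact map must be a square matrix")
    c = (c + c.T) / 2.0       # enforce symmetry
    np.fill_diagonal(c, 0.0)  # self-contacts do not act as springs
    # Kirchhoff matrix: node degree on the diagonal, minus contacts off it.
    kirchhoff = np.diag(c.sum(axis=1)) - c
    # The pseudo-inverse excludes the zero mode (rigid-body translation).
    covariance = np.linalg.pinv(kirchhoff)
    return np.diag(covariance).copy()


# Toy map (assumption): three loci in a chain, each touching its neighbor.
toy = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
print(gnm_mobility(toy))
```

In this toy chain the two end loci come out more mobile than the middle one, since the middle locus is constrained by two springs rather than one.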
Affiliation(s)
- Anupam Banerjee
  - Laufer Center for Physical & Quantitative Biology, Stony Brook University, NY 11794, USA
- She Zhang
  - OpenEye, Cadence Molecular Sciences, Santa Fe, NM 87508, USA
- Ivet Bahar
  - Laufer Center for Physical & Quantitative Biology, Stony Brook University, NY 11794, USA
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, NY 11794, USA
9
Park SJ, Nakai K. A computational approach for deciphering the interactions between proximal and distal gene regulators in GC B-cell response. NAR Genom Bioinform 2024; 6:lqae050. [PMID: 38711859 PMCID: PMC11071120 DOI: 10.1093/nargab/lqae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 04/15/2024] [Accepted: 04/27/2024] [Indexed: 05/08/2024] Open
Abstract
Delineating the intricate interplay between promoter-proximal and -distal regulators is crucial for understanding the function of transcriptional mediator complexes implicated in the regulation of gene expression. The present study aimed to develop a computational method for accurately modeling the spatial proximal and distal regulatory interactions. Our method combined regression-based models to identify key regulators through gene expression prediction and a graph-embedding approach to detect coregulated genes. This approach enabled a detailed investigation of the gene regulatory mechanisms for germinal center B cells, accompanied by dramatic rearrangements of the genome structure. We found that while the promoter-proximal regulatory elements were the principal regulators of gene expression, the distal regulators fine-tuned transcription. Moreover, our approach unveiled the presence of modular regulators, such as cofactors and proximal/distal transcription factors, which were co-expressed with their target genes. Some of these modules exhibited abnormal expression patterns in lymphoma. These findings suggest that the dysregulation of interactions between transcriptional and architectural factors is associated with chromatin reorganization failure, which may increase the risk of malignancy. Therefore, our computational approach helps decipher the spatially interacting transcriptional cis-regulatory code.
Affiliation(s)
- Sung-Joon Park
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Kenta Nakai
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
10
Zheng S, Thakkar N, Harris HL, Liu S, Zhang M, Gerstein M, Aiden EL, Rowley MJ, Noble WS, Gürsoy G, Singh R. Predicting A/B compartments from histone modifications using deep learning. iScience 2024; 27:109570. [PMID: 38646172 PMCID: PMC11031843 DOI: 10.1016/j.isci.2024.109570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 02/28/2024] [Accepted: 03/22/2024] [Indexed: 04/23/2024] Open
Abstract
The three-dimensional organization of genomes plays a crucial role in essential biological processes. The segregation of chromatin into A and B compartments highlights regions of activity and inactivity, providing a window into the genomic activities specific to each cell type. Yet, the steep costs associated with acquiring Hi-C data, necessary for studying this compartmentalization across various cell types, pose a significant barrier to studying cell-type-specific genome organization. To address this, we present a prediction tool called compartment prediction using recurrent neural networks (CoRNN), which predicts compartmentalization of the 3D genome using histone modification enrichment. CoRNN demonstrates robust cross-cell-type prediction of A/B compartments with an average AuROC of 90.9%. Cell-type-specific predictions align well with known functional elements, with H3K27ac and H3K36me3 identified as highly predictive histone marks. We further investigate our mispredictions and find that they are located in regions with ambiguous compartmental status. Furthermore, our model's generalizability is validated by predicting compartments in independent tissue samples, which underscores its broad applicability.
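For readers unfamiliar with the AuROC figure quoted in the abstract, the sketch below computes it directly from its definition: the probability that a randomly chosen positive bin (here labelled A) receives a higher score than a randomly chosen negative bin (B), with ties counted as half. The labels and scores are made-up values and `auroc` is a hypothetical helper; this is not CoRNN's evaluation code.

```python
def auroc(labels, scores):
    """AuROC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up genomic bins: 1 = A compartment, 0 = B compartment.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auroc(labels, scores), 3))
```

The pairwise form is quadratic in the number of bins, which is fine for a small illustration; genome-wide evaluations would normally use a rank-based implementation instead.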
Affiliation(s)
- Suchen Zheng
  - Department of Computer Science, Brown University, Providence, RI, USA
- Nitya Thakkar
  - Department of Computer Science, Brown University, Providence, RI, USA
- Hannah L. Harris
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, USA
- Susanna Liu
  - Data Science and Statistics, Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, USA
- Megan Zhang
  - Data Science and Statistics, Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, USA
- Mark Gerstein
  - Computational Biology and Bioinformatics, Molecular Biophysics & Biochemistry, Data Science and Statistics, Computer Science, Yale University, New Haven, CT, USA
- Erez Lieberman Aiden
  - Department of Genetics, Baylor College of Medicine, Department of Computer Science, Computational and Applied Mathematics, Rice University, Houston, TX, USA
- M. Jordan Rowley
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, USA
- William Stafford Noble
  - Department of Genome Sciences, Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Gamze Gürsoy
  - Department of Biomedical Informatics, Columbia University, New York Genome Center, New York, NY, USA
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, RI, USA
11
Xiong K, Zhang R, Ma J. scGHOST: identifying single-cell 3D genome subcompartments. Nat Methods 2024; 21:814-822. [PMID: 38589516 PMCID: PMC11127718 DOI: 10.1038/s41592-024-02230-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 03/01/2024] [Indexed: 04/10/2024]
Abstract
Single-cell Hi-C (scHi-C) technologies allow for probing of genome-wide cell-to-cell variability in three-dimensional (3D) genome organization from individual cells. Computational methods have been developed to reveal single-cell 3D genome features based on scHi-C, including A/B compartments, topologically associating domains and chromatin loops. However, no method exists for annotating single-cell subcompartments, which is important for understanding chromosome spatial localization in single cells. Here we present scGHOST, a single-cell subcompartment annotation method using graph embedding with constrained random walk sampling. Applications of scGHOST to scHi-C data and contact maps derived from single-cell 3D genome imaging demonstrate reliable identification of single-cell subcompartments, offering insights into cell-to-cell variability of nuclear subcompartments. Using scHi-C data from complex tissues, scGHOST identifies cell-type-specific or allele-specific subcompartments linked to gene transcription across various cell types and developmental stages, suggesting functional implications of single-cell subcompartments. scGHOST is an effective method for annotating single-cell 3D genome subcompartments in a broad range of biological contexts.
Affiliation(s)
- Kyle Xiong
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Ruochi Zhang
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
  - Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
12
Nolan B, Harris HL, Kalluchi A, Reznicek TE, Cummings CT, Rowley MJ. HiCrayon reveals distinct layers of multi-state 3D chromatin organization. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.11.579821. [PMID: 38405883 PMCID: PMC10888951 DOI: 10.1101/2024.02.11.579821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
The co-visualization of chromatin conformation with 1D 'omics data is key to the multi-omics driven data analysis of 3D genome organization. Chromatin contact maps are often shown as 2D heatmaps and visually compared to 1D genomic data by simple juxtaposition. While common, this strategy is imprecise, placing the onus on the reader to align features with each other. To remedy this, we developed HiCrayon, an interactive tool that facilitates the integration of 3D chromatin organization maps and 1D datasets. This visualization method integrates data from genomic assays directly into the chromatin contact map by coloring interactions according to 1D signal. HiCrayon is implemented using R Shiny and Python to create a graphical user interface (GUI) application, available in both web and containerized formats to promote accessibility. HiCrayon is implemented in R, and includes a graphical user interface (GUI), as well as a slimmed-down web-based version that lets users quickly produce publication-ready images. We demonstrate the utility of HiCrayon in visualizing the effectiveness of compartment calling and the relationship between ChIP-seq and various features of chromatin organization. We also demonstrate the improved visualization of other 3D genomic phenomena, such as differences between loops associated with CTCF/cohesin vs. those associated with H3K27ac. We then demonstrate HiCrayon's visualization of organizational changes that occur during differentiation and use HiCrayon to detect compartment patterns that cannot be assigned to either A or B compartments, revealing a distinct third chromatin compartment. Overall, we demonstrate the utility of co-visualizing 2D chromatin conformation with 1D genomic signals within the same matrix to reveal fundamental aspects of genome organization. Local version: https://github.com/JRowleyLab/HiCrayon Web version: https://jrowleylab.com/HiCrayon.
Affiliation(s)
- Ben Nolan
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Emile St, Omaha, 68198, NE, USA
- Hannah L. Harris
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Emile St, Omaha, 68198, NE, USA
- Achyuth Kalluchi
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Emile St, Omaha, 68198, NE, USA
- Timothy E. Reznicek
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Emile St, Omaha, 68198, NE, USA
- Christopher T. Cummings
- Department of Pediatrics, University of Nebraska Medical Center, Emile St, Omaha, 68198, NE, USA
- M. Jordan Rowley
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Emile St, Omaha, 68198, NE, USA
|
13
|
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
- Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
|
14
|
Han MH, Issagulova D, Park M. Interplay between epigenome and 3D chromatin structure. BMB Rep 2023; 56:633-644. [PMID: 38052424 PMCID: PMC10761748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 11/28/2023] [Accepted: 12/05/2023] [Indexed: 12/07/2023] Open
Abstract
Epigenetic mechanisms, primarily mediated through histone and DNA modifications, play a pivotal role in orchestrating the functional identity of a cell and its response to environmental cues. Similarly, the spatial arrangement of chromatin within the three-dimensional (3D) nucleus has been recognized as a significant factor influencing genomic function. Investigating the relationship between epigenetic regulation and 3D chromatin structure has revealed correlation and causality between these processes, from the global alignment of average chromatin structure with chromatin marks to the nuanced correlations at smaller scales. This review aims to dissect the biological significance and the interplay between the epigenome and 3D chromatin structure, while also exploring the underlying molecular mechanisms. By synthesizing insights from both experimental and modeling perspectives, we seek to provide a comprehensive understanding of cellular functions. [BMB Reports 2023; 56(12): 633-644].
Affiliation(s)
- Man-Hyuk Han
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Dariya Issagulova
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Minhee Park
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea; Graduate School of Engineering Biology, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141; KAIST Institute for the BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141; KAIST Stem Cell Center, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
|
15
|
Raffo A, Paulsen J. The shape of chromatin: insights from computational recognition of geometric patterns in Hi-C data. Brief Bioinform 2023; 24:bbad302. [PMID: 37646128 PMCID: PMC10516369 DOI: 10.1093/bib/bbad302] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/18/2023] [Revised: 07/05/2023] [Accepted: 08/03/2023] [Indexed: 09/01/2023] Open
Abstract
The three-dimensional organization of chromatin plays a crucial role in gene regulation and cellular processes like deoxyribonucleic acid (DNA) transcription, replication and repair. Hi-C and related techniques provide detailed views of spatial proximities within the nucleus. However, data analysis is challenging partially due to a lack of well-defined, underpinning mathematical frameworks. Recently, recognizing and analyzing geometric patterns in Hi-C data has emerged as a powerful approach. This review provides a summary of algorithms for automatic recognition and analysis of geometric patterns in Hi-C data and their correspondence with chromatin structure. We classify existing algorithms on the basis of the data representation and pattern recognition paradigm they make use of. Finally, we outline some of the challenges ahead and promising future directions.
Affiliation(s)
- Andrea Raffo
- Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Jonas Paulsen
- Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
|
16
|
Xu J, Zhang P, Sun W, Zhang J, Zhang W, Hou C, Li L. EpiMCI: Predicting Multi-Way Chromatin Interactions from Epigenomic Signals. BIOLOGY 2023; 12:1203. [PMID: 37759602 PMCID: PMC10525350 DOI: 10.3390/biology12091203] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/27/2023] [Revised: 08/31/2023] [Accepted: 08/31/2023] [Indexed: 09/29/2023]
Abstract
The recently emerging high-throughput Pore-C (HiPore-C) can identify whole-genome high-order chromatin multi-way interactions with an ultra-high output, contributing to deciphering three-dimensional (3D) genome organization. However, it also brings new challenges to relevant data analysis. To address these challenges, we proposed EpiMCI, a model for multi-way chromatin interaction prediction based on a hypergraph neural network with epigenomic signals as the input. EpiMCI integrated separate hyperedge representations with coupling hyperedge information and obtained AUCs of 0.981 and 0.984 in the GM12878 and K562 datasets, respectively, outperforming the currently available method. Moreover, EpiMCI can be applied to denoise the HiPore-C data and improve the data quality efficiently. Furthermore, the vertex embeddings extracted from EpiMCI reflected the global chromatin architecture accurately. Principal component analysis suggested that the embeddings were well aligned with the activities of genomic regions at the chromatin compartment level. Taken together, EpiMCI can accurately predict multi-way chromatin interactions and can be applied to studies relying on chromatin architecture.
Affiliation(s)
- Jinsheng Xu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Junying Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wenxue Zhang
- Food Science Program, Division of Food, Nutrition and Exercise Sciences, University of Missouri, 1406 E Rollins Street, Columbia, MO 65211, USA
- Chunhui Hou
- China State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430074, China
|
17
|
Yildirim A, Hua N, Boninsegna L, Zhan Y, Polles G, Gong K, Hao S, Li W, Zhou XJ, Alber F. Evaluating the role of the nuclear microenvironment in gene function by population-based modeling. Nat Struct Mol Biol 2023; 30:1193-1206. [PMID: 37580627 PMCID: PMC10442234 DOI: 10.1038/s41594-023-01036-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 06/16/2023] [Indexed: 08/16/2023]
Abstract
The nuclear folding of chromosomes relative to nuclear bodies is an integral part of gene function. Here, we demonstrate that population-based modeling, from ensemble Hi-C data, provides a detailed description of the nuclear microenvironment of genes and its role in gene function. We define the microenvironment by the subnuclear positions of genomic regions with respect to nuclear bodies, local chromatin compaction, and preferences in chromatin compartmentalization. These structural descriptors are determined in single-cell models, thereby revealing the structural variability between cells. We demonstrate that the microenvironment of a genomic region is linked to its functional potential in gene transcription, replication, and chromatin compartmentalization. Some chromatin regions feature a strong preference for a single microenvironment, due to association with specific nuclear bodies in most cells. Other chromatin shows high structural variability, which is a strong indicator of functional heterogeneity. Moreover, we identify specialized nuclear microenvironments, which distinguish chromatin in different functional states and reveal a key role of nuclear speckles in chromosome organization. We demonstrate that our method produces highly predictive three-dimensional genome structures, which accurately reproduce data from a variety of orthogonal experiments, thus considerably expanding the range of Hi-C data analysis.
Affiliation(s)
- Asli Yildirim
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Nan Hua
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Lorenzo Boninsegna
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Yuxiang Zhan
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Guido Polles
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Ke Gong
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Shengli Hao
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Wenyuan Li
- Department of Pathology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Xianghong Jasmine Zhou
- Department of Pathology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
|
18
|
Yin Z, Cui S, Xue S, Xie Y, Wang Y, Zhao C, Zhang Z, Wu T, Hou G, Wang W, Xie SQ, Wu Y, Guo Y. Identification of Two Subsets of Subcompartment A1 Associated with High Transcriptional Activity and Frequent Loop Extrusion. BIOLOGY 2023; 12:1058. [PMID: 37626945 PMCID: PMC10451812 DOI: 10.3390/biology12081058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 07/24/2023] [Accepted: 07/24/2023] [Indexed: 08/27/2023]
Abstract
Three-dimensional genome organization has been increasingly recognized as an important determinant of the precise regulation of gene expression in mammalian cells, yet the relationship between gene transcriptional activity and spatial subcompartment positioning is still not fully comprehended. Here, we first utilized genome-wide Hi-C data to infer eight types of subcompartment (labeled A1, A2, A3, A4, B1, B2, B3, and B4) in mouse embryonic stem cells and four primary differentiated cell types, including thymocytes, macrophages, neural progenitor cells, and cortical neurons. Transitions of subcompartments may confer gene expression changes in different cell types. Intriguingly, we identified two subsets of subcompartments defined by higher gene density and characterized by strongly looped contact domains, named common A1 and variable A1, respectively. We revealed that common A1, which includes highly expressed genes and abundant housekeeping genes, shows a ~2-fold higher gene density than the variable A1, where cell type-specific genes are significantly enriched. Thus, our study supports a model in which both types of genomic loci with constitutive and regulatory high transcriptional activity can drive the subcompartment A1 formation. Special chromatin subcompartment arrangement and intradomain interactions may, in turn, contribute to maintaining proper levels of gene expression, especially for regulatory non-housekeeping genes.
Affiliation(s)
- Zihang Yin
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
- Shuang Cui
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
- Song Xue
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yufan Xie
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
- Yefan Wang
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
- Chengling Zhao
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
- Zhiyu Zhang
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
- Tao Wu
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
- Guojun Hou
- Shanghai Institute of Rheumatology, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai 200001, China
- Wuming Wang
- CUHK-SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Sheila Q. Xie
- MRC London Institute of Medical Sciences, London W12 0NN, UK
- Institute of Clinical Sciences, Imperial College London, London W12 0NN, UK
- Yue Wu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ya Guo
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- WLA Laboratories, Shanghai 201203, China
|
19
|
Dodero-Rojas E, Mello MF, Brahmachari S, Oliveira Junior AB, Contessoto VG, Onuchic JN. PyMEGABASE: Predicting cell-type-specific structural annotations of chromosomes using the epigenome. J Mol Biol 2023:168180. [PMID: 37302549 DOI: 10.1016/j.jmb.2023.168180] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/15/2022] [Revised: 06/03/2023] [Accepted: 06/06/2023] [Indexed: 06/13/2023]
Abstract
The folding patterns of interphase genomes in higher eukaryotes, as obtained from DNA-proximity-ligation or Hi-C experiments, are used to classify loci into structural classes called compartments and subcompartments. These structurally annotated (sub)compartments are known to exhibit specific epigenomic characteristics and cell-type-specific variations. To explore the relationship between genome structure and the epigenome, we present PyMEGABASE (PYMB), a maximum-entropy-based neural network model that predicts (sub)compartment annotations of a locus based solely on the local epigenome, such as ChIP-Seq of histone post-translational modifications. PYMB builds upon our previous model while improving robustness, capability to handle diverse inputs and user-friendly implementation. We employed PYMB to predict subcompartments for over a hundred human cell types available in ENCODE, shedding light on the links between subcompartments, cell identity, and epigenomic signals. The fact that PYMB, trained on data for human cells, can accurately predict compartments in mice suggests that the model is learning underlying physicochemical principles transferable across cell types and species. Reliable at higher resolutions (up to 5 kbp), PYMB is used to investigate compartment-specific gene expression. Not only can PYMB generate (sub)compartment information without Hi-C experiments, but its predictions are also interpretable. Analyzing PYMB's trained parameters, we explore the importance of various epigenomic marks in each subcompartment prediction. Furthermore, the predictions of the model can be used as input for OpenMiChroM software, which has been calibrated to generate three-dimensional structures of the genome. Detailed documentation of PYMB is available at https://pymegabase.readthedocs.io, including an installation guide using pip or conda, and Jupyter/Colab notebook tutorials.
Affiliation(s)
- Matheus F Mello
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA; Department of Physics & Astronomy, Rice University, Houston, TX, USA; Department of Chemistry, Rice University, Houston, TX, USA; Department of Biosciences, Rice University, Houston, TX, USA
|
20
|
Xiong K, Zhang R, Ma J. scGHOST: Identifying single-cell 3D genome subcompartments. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.24.542032. [PMID: 37292994 PMCID: PMC10245874 DOI: 10.1101/2023.05.24.542032] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
New single-cell Hi-C (scHi-C) technologies enable probing of the genome-wide cell-to-cell variability in 3D genome organization from individual cells. Several computational methods have been developed to reveal single-cell 3D genome features based on scHi-C data, including A/B compartments, topologically associating domains, and chromatin loops. However, no scHi-C analysis method currently exists for annotating single-cell subcompartments, which are crucial for providing a more refined view of large-scale chromosome spatial localization in single cells. Here, we present scGHOST, a single-cell subcompartment annotation method based on graph embedding with constrained random walk sampling. Applications of scGHOST to scHi-C data and single-cell 3D genome imaging data demonstrate the reliable identification of single-cell subcompartments and offer new insights into cell-to-cell variability of nuclear subcompartments. Using scHi-C data from the human prefrontal cortex, scGHOST identifies cell type-specific subcompartments that are strongly connected to cell type-specific gene expression, suggesting the functional implications of single-cell subcompartments. Overall, scGHOST is an effective new method for single-cell 3D genome subcompartment annotation based on scHi-C data for a broad range of biological contexts.
Affiliation(s)
- Kyle Xiong
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ruochi Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
21
|
Zheng X, Tran JR, Zheng Y. CscoreTool-M infers 3D sub-compartment probabilities within cell population. Bioinformatics 2023; 39:btad314. [PMID: 37166448 PMCID: PMC10206090 DOI: 10.1093/bioinformatics/btad314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 02/07/2023] [Accepted: 05/10/2023] [Indexed: 05/12/2023] Open
Abstract
MOTIVATION: Computational inference of genome organization based on Hi-C sequencing has greatly aided the understanding of chromatin and nuclear organization in three dimensions (3D). However, existing computational methods fail to address cell population heterogeneity. Here we describe a probabilistic-modeling-based method called CscoreTool-M that infers multiple 3D genome sub-compartments from Hi-C data. RESULTS: The compartment scores inferred using CscoreTool-M represent the probability of a genomic region being located in a specific sub-compartment. Compared to published methods, CscoreTool-M is more accurate in inferring sub-compartments corresponding to both active and repressed chromatin. The compartment scores calculated by CscoreTool-M also help to quantify the levels of heterogeneity in sub-compartment localization within cell populations. By comparing proliferating cells and terminally differentiated non-proliferating cells, we show that the proliferating cells have higher genome organization heterogeneity, which is likely caused by cells at different cell-cycle stages. By analyzing 10 sub-compartments, we found a sub-compartment containing chromatin potentially related to the early-G1 chromatin regions proximal to the nuclear lamina in HCT116 cells, suggesting the method can deconvolve cell cycle stage-specific genome organization among asynchronously dividing cells. Finally, we show that CscoreTool-M can identify sub-compartments that contain genes enriched in housekeeping or cell-type-specific functions. AVAILABILITY AND IMPLEMENTATION: https://github.com/scoutzxb/CscoreTool-M.
Affiliation(s)
- Xiaobin Zheng
- Department of Embryology, Carnegie Institution for Science, Baltimore, MD 21218, United States
- Joseph R Tran
- Department of Embryology, Carnegie Institution for Science, Baltimore, MD 21218, United States
- Yixian Zheng
- Department of Embryology, Carnegie Institution for Science, Baltimore, MD 21218, United States
|
22
|
Bessadok A, Mahjoub MA, Rekik I. Graph Neural Networks in Network Neuroscience. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:5833-5848. [PMID: 36155474 DOI: 10.1109/tpami.2022.3209686] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Indexed: 05/06/2023]
Abstract
Noninvasive medical neuroimaging has yielded many discoveries about brain connectivity. Several substantial techniques mapping morphological, structural and functional brain connectivities were developed to create a comprehensive road map of neuronal activities in the human brain, namely the brain graph. Relying on its non-Euclidean data type, the graph neural network (GNN) provides a clever way of learning the deep graph structure and is rapidly becoming the state of the art, leading to enhanced performance in various network neuroscience tasks. Here we review current GNN-based methods, highlighting the ways that they have been used in several applications related to brain graphs such as missing brain graph synthesis and disease classification. We conclude by charting a path toward a better application of GNN models in the network neuroscience field for neurological disorder diagnosis and population graph integration. The list of papers cited in our work is available at https://github.com/basiralab/GNNs-in-Network-Neuroscience.
|
23
|
Kalluchi A, Harris HL, Reznicek TE, Rowley MJ. Considerations and caveats for analyzing chromatin compartments. Front Mol Biosci 2023; 10:1168562. [PMID: 37091873 PMCID: PMC10113542 DOI: 10.3389/fmolb.2023.1168562] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/17/2023] [Accepted: 03/27/2023] [Indexed: 04/08/2023] Open
Abstract
Genomes are organized into nuclear compartments, separating active from inactive chromatin. Chromatin compartments are readily visible in a large number of species by experiments that map chromatin conformation genome-wide. When analyzing these maps, a common step is the identification of genomic intervals that interact within A (active) and B (inactive) compartments. It has also become increasingly common to identify and analyze subcompartments. We review different strategies to identify A/B and subcompartment intervals, including a discussion of various machine-learning approaches to predict these features. We then discuss the strengths and limitations of current strategies and examine how these aspects of analysis may have impacted our understanding of chromatin compartments.
Affiliation(s)
- M. Jordan Rowley
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
|
24
|
Shokraneh N, Arab M, Libbrecht M. Integrative chromatin domain annotation through graph embedding of Hi-C data. Bioinformatics 2022; 39:6935783. [PMID: 36534827 PMCID: PMC9848054 DOI: 10.1093/bioinformatics/btac813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 11/02/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: The organization of the genome into domains plays a central role in gene expression and other cellular activities. Researchers identify genomic domains mainly through two views: 1D functional assays such as ChIP-seq, and chromatin conformation assays such as Hi-C. Fully understanding domains requires integrative modeling that combines these two views. However, the predominant form of integrative modeling uses segmentation and genome annotation (SAGA) along with the rigid assumption that loci in contact are more likely to share the same domain type, which is not necessarily true for epigenomic domain types and genome-wide chromatin interactions. RESULTS: Here, we present an integrative approach that annotates domains using both 1D functional genomic signals and Hi-C measurements of genome-wide 3D interactions without the use of a pairwise prior. We do so by using a graph embedding to learn structural features corresponding to each genomic region, then inputting learned structural features along with functional genomic signals to a SAGA algorithm. We show that our domain types recapitulate well-known subcompartments with an additional granularity that distinguishes a combination of the spatial and functional states of the genomic regions. In particular, we identified a division of the previously identified A2 subcompartment such that the divided domain types have significantly varying expression levels. AVAILABILITY AND IMPLEMENTATION: https://github.com/nedashokraneh/IChDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Neda Shokraneh
- Computing Science Department, Simon Fraser University, Burnaby V5A 1S6, Canada
| | - Mariam Arab
- Computing Science Department, Simon Fraser University, Burnaby V5A 1S6, Canada
|
25
|
Nascimben M, Rimondini L, Corà D, Venturin M. Polygenic risk modeling of tumor stage and survival in bladder cancer. BioData Min 2022; 15:23. [PMID: 36175974 PMCID: PMC9523990 DOI: 10.1186/s13040-022-00306-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 09/18/2022] [Indexed: 11/26/2022]
Abstract
Introduction Bladder cancer assessment with non-invasive gene expression signatures facilitates the detection of patients at risk and surveillance of their status, bypassing the discomforts given by cystoscopy. To achieve accurate cancer estimation, analysis pipelines for gene expression data (GED) may integrate a sequence of several machine learning and bio-statistical techniques to model complex characteristics of pathological patterns. Methods Numerical experiments tested the combination of GED preprocessing by discretization with tree ensemble embeddings and nonlinear dimensionality reductions to categorize oncological patients comprehensively. Modeling aimed to identify tumor stage and distinguish survival outcomes in two situations: complete and partial data embedding. This latter experimental condition simulates the addition of new patients to an existing model for rapid monitoring of disease progression. Machine learning procedures were employed to identify the most relevant genes involved in patient prognosis and test the performance of preprocessed GED compared to untransformed data in predicting patient conditions. Results Data embedding paired with dimensionality reduction produced prognostic maps with well-defined clusters of patients, suitable for medical decision support. A second experiment simulated the addition of new patients to an existing model (partial data embedding): Uniform Manifold Approximation and Projection (UMAP) methodology with uniform data discretization led to better outcomes than other analyzed pipelines. Further exploration of parameter space for UMAP and t-distributed stochastic neighbor embedding (t-SNE) underlined the importance of tuning a higher number of parameters for UMAP rather than t-SNE. 
Moreover, two different machine learning experiments identified a group of genes valuable for partitioning patients (gene relevance analysis) and showed the higher precision obtained by preprocessed data in predicting tumor outcomes for cancer stage and survival rate (six classes prediction). Conclusions The present investigation proposed new analysis pipelines for disease outcome modeling from bladder cancer-related biomarkers. Complete and partial data embedding experiments suggested that pipelines employing UMAP had a more accurate predictive ability, supporting the recent literature trends on this methodology. However, it was also found that several UMAP parameters influence experimental results, therefore deriving a recommendation for researchers to pay attention to this aspect of the UMAP technique. Machine learning procedures further demonstrated the effectiveness of the proposed preprocessing in predicting patients’ conditions and determined a sub-group of biomarkers significant for forecasting bladder cancer prognosis.
Affiliation(s)
- Mauro Nascimben
- Department of Health Sciences, Università del Piemonte Orientale, Via Solaroli 17, 28100, Novara, Italy
- Enginsoft SpA, Via Giambellino 7, 35129, Padova, Italy
| | - Lia Rimondini
- Department of Health Sciences, Università del Piemonte Orientale, Via Solaroli 17, 28100, Novara, Italy
| | - Davide Corà
- Department of Health Sciences, Università del Piemonte Orientale, Via Solaroli 17, 28100, Novara, Italy
- Department of Translational Medicine, Università del Piemonte Orientale, Via Solaroli 17, 28100, Novara, Italy
|
26
|
Regulation associated modules reflect 3D genome modularity associated with chromatin activity. Nat Commun 2022; 13:5281. [PMID: 36075900 PMCID: PMC9458634 DOI: 10.1038/s41467-022-32911-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/12/2022] [Accepted: 08/19/2022] [Indexed: 12/02/2022]
Abstract
The 3D genome has been shown to be organized into modules including topologically associating domains (TADs) and compartments that are primarily defined by spatial contacts from Hi-C. There exists a gap to investigate whether and how the spatial modularity of the chromatin is related to the functional modularity resulting from chromatin activity. Despite histone modifications reflecting chromatin activity, inferring spatial modularity of the genome directly from the histone modification patterns has not been well explored. Here, we report that histone modifications show a modular pattern (referred to as regulation associated modules, RAMs) that reflects spatial chromatin modularity. Enhancer-promoter interactions, loop anchors, super-enhancer clusters and extrachromosomal DNAs (ecDNAs) are found to occur more often within the same RAMs than within the same TADs. Consistently, compared to the TAD boundaries, deletions of RAM boundaries perturb the chromatin structure more severely (may even cause cell death) and somatic variants in cancer samples are more enriched in RAM boundaries. These observations suggest that RAMs reflect a modular organization of the 3D genome at a scale better aligned with chromatin activity, providing a bridge connecting the structural and functional modularity of the genome.
|
27
|
Dsouza KB, Maslova A, Al-Jibury E, Merkenschlager M, Bhargava VK, Libbrecht MW. Learning representations of chromatin contacts using a recurrent neural network identifies genomic drivers of conformation. Nat Commun 2022; 13:3704. [PMID: 35764630 PMCID: PMC9240038 DOI: 10.1038/s41467-022-31337-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Accepted: 06/15/2022] [Indexed: 11/28/2022]
Abstract
Despite the availability of chromatin conformation capture experiments, discerning the relationship between the 1D genome and 3D conformation remains a challenge, which limits our understanding of their effect on gene expression and disease. We propose Hi-C-LSTM, a method that produces low-dimensional latent representations that summarize intra-chromosomal Hi-C contacts via a recurrent long short-term memory neural network model. We find that these representations contain all the information needed to recreate the observed Hi-C matrix with high accuracy, outperforming existing methods. These representations enable the identification of a variety of conformation-defining genomic elements, including nuclear compartments and conformation-related transcription factors. They furthermore enable in-silico perturbation experiments that measure the influence of cis-regulatory elements on conformation.
Affiliation(s)
- Kevin B Dsouza
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
| | - Alexandra Maslova
- School of Computing Science, Simon Fraser University, Burnaby, Canada
| | - Ediem Al-Jibury
- MRC, London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Department of Computing, Imperial College London, London, UK
| | - Matthias Merkenschlager
- MRC, London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
| | - Vijay K Bhargava
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
|
28
|
Xu M, Singh AV, Karniadakis GE. DynG2G: An Efficient Stochastic Graph Embedding Method for Temporal Graphs. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:985-998. [PMID: 35687628 DOI: 10.1109/tnnls.2022.3178706] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/15/2023]
Abstract
Dynamic graph embedding has gained great attention recently due to its capability of learning low-dimensional and meaningful graph representations for complex temporal graphs with high accuracy. However, recent advances mostly focus on learning node embeddings as deterministic "vectors" for static graphs, hence disregarding the key graph temporal dynamics and the evolving uncertainties associated with node embedding in the latent space. In this work, we propose an efficient stochastic dynamic graph embedding method (DynG2G) that applies an inductive feedforward encoder trained with node triplet energy-based ranking loss. Every node per timestamp is encoded as a time-dependent probabilistic multivariate Gaussian distribution in the latent space, and, hence, we are able to quantify the node embedding uncertainty on-the-fly. We have considered eight different benchmarks that represent diversity in size (from 96 to 87,626 nodes and from 13,398 to 4,870,863 edges) as well as diversity in dynamics, from slowly changing temporal evolution to rapidly varying multirate dynamics. We demonstrate through extensive experiments based on these eight dynamic graph benchmarks that DynG2G achieves new state-of-the-art performance in capturing the underlying temporal node embeddings. We also demonstrate that DynG2G can simultaneously predict the evolving node embedding uncertainty, which plays a crucial role in quantifying the intrinsic dimensionality of the dynamical system over time. In particular, we obtain a "universal" relation of the optimal embedding dimension, L_o, versus the effective dimensionality of uncertainty, D_u, and infer that L_o = D_u for all cases. This, in turn, implies that the uncertainty quantification approach we employ in the DynG2G algorithm correctly captures the intrinsic dimensionality of the dynamics of such evolving graphs despite the diverse nature and composition of the graphs at each timestamp. In addition, this L_o-D_u correlation provides a clear path to adaptively selecting the optimum embedding size at each timestamp by setting L ≥ D_u.
|
29
|
Wen Z, Zhang W, Zhong Q, Xu J, Hou C, Qin ZS, Li L. Extensive Chromatin Structure-Function Associations Revealed by Accurate 3D Compartmentalization Characterization. Front Cell Dev Biol 2022; 10:845118. [PMID: 35517497 PMCID: PMC9062080 DOI: 10.3389/fcell.2022.845118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/29/2021] [Accepted: 03/24/2022] [Indexed: 11/30/2022]
Abstract
A/B compartments are observed in Hi-C data and coincide with eu/hetero-chromatin. However, many genomic regions are ambiguous under the A/B compartment scheme. We develop MOSAIC (MOdularity and Singular vAlue decomposition-based Identification of Compartments), an accurate compartmental state detection scheme. MOSAIC reveals that those ambiguous regions segregate into two additional compartmental states, which typically correspond to short genomic regions flanked by long canonical A/B compartments with opposite activities. They are denoted as micro-compartments accordingly. In contrast to the canonical A/B compartments, micro-compartments cover ∼30% of the genome and are highly dynamic across cell types. More importantly, distinguishing the micro-compartments underpins accurate characterization of the chromatin structure-function relationship. By applying MOSAIC to GM12878 and K562 cells, we identify CD86, ILDR1 and GATA2, which show concordance between gene expression and compartmental states beyond the scheme of A/B compartments. Taken together, MOSAIC uncovers fine-scale and dynamic compartmental states underlying transcriptional regulation and disease.
Affiliation(s)
- Zi Wen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, China
| | - Weihan Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Quan Zhong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, China
| | - Jinsheng Xu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, China
| | - Chunhui Hou
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Zhaohui Steve Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
| | - Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
- *Correspondence: Li Li
|
30
|
Sefer E. ProbC: joint modeling of epigenome and transcriptome effects in 3D genome. BMC Genomics 2022; 23:287. [PMID: 35397520 PMCID: PMC8994916 DOI: 10.1186/s12864-022-08498-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/20/2021] [Accepted: 03/23/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Hi-C and its high nucleosome resolution variant Micro-C provide a window into the spatial packing of a genome in 3D within the cell. Even though neither technique directly depends on the binding of specific antibodies, previous work has revealed enriched interactions and domain structures around multiple chromatin marks: epigenetic modifications and transcription factor binding sites. However, the joint impact of chromatin marks on Hi-C and Micro-C interactions has not been globally characterized, which limits our understanding of 3D genome characteristics. An emerging question is whether it is possible to deduce 3D genome characteristics and interactions by integrative analysis of multiple chromatin marks and to associate interactions with the functionality of the interacting loci. RESULT We propose a probabilistic method, PROBC, to decompose Hi-C and Micro-C interactions by known chromatin marks. PROBC is based on convex likelihood optimization, which can directly take into account both interaction existence and nonexistence. Through PROBC, we discover histone modifications (H3K27ac, H3K9me3, H3K4me3, H3K4me1) and CTCF as particularly predictive of Hi-C and Micro-C contacts across cell types and species. Moreover, histone modifications are more effective than transcription factor binding sites in explaining the genome's 3D shape through these interactions. PROBC can successfully predict Hi-C and Micro-C interactions in a given species even when it is trained on different cell types or species. For instance, it can predict missing nucleosome resolution Micro-C interactions in human ES cells when trained only on mouse ES cells, using just these 5 chromatin marks, with AUC above 0.75. Additionally, PROBC outperforms the existing methods in predicting interactions across almost all chromosomes. CONCLUSION Via our proposed method, we optimally decompose Hi-C interactions in terms of these chromatin marks at genome and chromosome levels. We find a subset of histone modifications and transcription factor binding sites to be predictive of both Hi-C and Micro-C interactions and TADs across human, mouse, and different cell types. Through learned models, we can predict interactions from chromatin marks alone in species for which Hi-C data may be limited.
Affiliation(s)
- Emre Sefer
- Department of Computer Science, Ozyegin University, Istanbul, Turkey
|
31
|
Scalvini B, Schiessel H, Golovnev A, Mashaghi A. Circuit topology analysis of cellular genome reveals signature motifs, conformational heterogeneity, and scaling. iScience 2022; 25:103866. [PMID: 35243229 PMCID: PMC8861635 DOI: 10.1016/j.isci.2022.103866] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/25/2021] [Revised: 12/14/2021] [Accepted: 01/31/2022] [Indexed: 11/30/2022]
Abstract
Reciprocal regulation of genome topology and function is a fundamental and enduring puzzle in biology. The wealth of data provided by Hi-C libraries offers the opportunity to unravel this relationship. However, there is a need for a comprehensive theoretical framework in order to extract topological information for genome characterization and comparison. Here, we develop a toolbox for topological analysis based on Circuit Topology, allowing for the quantification of inter- and intracellular genomic heterogeneity, at various levels of fold complexity: pairwise contact arrangement, higher-order contact arrangement, and topological fractal dimension. Single-cell Hi-C data were analyzed and characterized based on topological content, revealing not only a strong multiscale heterogeneity but also highly conserved features such as a characteristic topological length scale and topological signature motifs in the genome. We propose that these motifs inform on the topological state of the nucleus and indicate the presence of active loop extrusion. Circuit topology quantifies heterogeneity in genomic arrangement. Scale analysis reveals a characteristic length scale of 10 Mb in genome topology. We identify highly conserved topological structures related to loop extrusion. We suggest a topological model of chromatin arrangement for loop extrusion, the L-loop.
Affiliation(s)
- Barbara Scalvini
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Einsteinweg 55, 2333CC Leiden, the Netherlands
- Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Einsteinweg 55, 2333CC Leiden, the Netherlands
| | - Helmut Schiessel
- Cluster of Excellence Physics of Life, Technical University of Dresden, 01062 Dresden, Germany
| | - Anatoly Golovnev
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Einsteinweg 55, 2333CC Leiden, the Netherlands
- Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Einsteinweg 55, 2333CC Leiden, the Netherlands
| | - Alireza Mashaghi
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Einsteinweg 55, 2333CC Leiden, the Netherlands
- Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Einsteinweg 55, 2333CC Leiden, the Netherlands
- Corresponding author
|
32
|
Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for the mapping of the epigenome and resulting data in cancer samples has provided the opportunity for gaining insights into and understanding the roles of epigenetic processes in cancer. However, the complexity, high-dimensionality, sparsity, and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited for epigenomic data analyses due to their flexibility and ability to learn underlying hidden structures. We will discuss four overlapping but distinct major categories under ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data with the hope to provide an overview of how ML approaches can be used to explore fundamental questions on the roles of epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
|
33
|
Pezoulas VC, Hazapis O, Lagopati N, Exarchos TP, Goules AV, Tzioufas AG, Fotiadis DI, Stratis IG, Yannacopoulos AN, Gorgoulis VG. Machine Learning Approaches on High Throughput NGS Data to Unveil Mechanisms of Function in Biology and Disease. Cancer Genomics Proteomics 2021; 18:605-626. [PMID: 34479914 PMCID: PMC8441762 DOI: 10.21873/cgp.20284] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/25/2021] [Revised: 07/21/2021] [Accepted: 08/03/2021] [Indexed: 12/13/2022]
Abstract
In this review, the fundamentals of machine learning (ML) and data mining (DM) are summarized together with the techniques for distilling knowledge from state-of-the-art omics experiments. This includes an introduction to the basic mathematical principles of unsupervised/supervised learning methods, dimensionality reduction techniques, deep neural network architectures and the applications of these in bioinformatics. Several case studies under evaluation mainly involve next generation sequencing (NGS) experiments, like deciphering gene expression from total and single cell (scRNA-seq) analysis; for the latter, recent artificial intelligence (AI) methods for the investigation of cell sub-types, biomarkers and imputation techniques are described. Other areas of interest where various ML schemes have been investigated include transcription factor (TF) binding sites, chromatin organization patterns and RNA binding proteins (RBPs), while analyses of RNA sequence and structure as well as 3D protein structure predictions with the use of ML are also described. Furthermore, we summarize the recent methods of using ML in clinical oncology, when taking into consideration the current omics data with pharmacogenomics to determine personalized treatments. With this review we wish to provide the scientific community with a thorough investigation of the main novel ML applications which take into consideration the latest achievements in genomics, thus unraveling the fundamental mechanisms of biology towards the understanding and cure of diseases.
Affiliation(s)
- Vasileios C Pezoulas
- Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Orsalia Hazapis
- Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
| | - Nefeli Lagopati
- Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, Athens, Greece
| | - Themis P Exarchos
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
- Department of Informatics, Ionian University, Corfu, Greece
| | - Andreas V Goules
- Department of Pathophysiology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
| | - Athanasios G Tzioufas
- Department of Pathophysiology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
| | - Dimitrios I Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Ioannis G Stratis
- Department of Mathematics, National and Kapodistrian University of Athens, Athens, Greece
| | - Athanasios N Yannacopoulos
- Department of Statistics, and Stochastic Modelling and Applications Laboratory, Athens University of Economics and Business (AUEB), Athens, Greece
| | - Vassilis G Gorgoulis
- Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, Athens, Greece
- Division of Cancer Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester Cancer Research Centre, NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester, U.K
- Center for New Biotechnologies and Precision Medicine, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Faculty of Health and Medical Sciences, University of Surrey, Surrey, U.K
|
34
|
Liu Y, Nanni L, Sungalee S, Zufferey M, Tavernari D, Mina M, Ceri S, Oricchio E, Ciriello G. Systematic inference and comparison of multi-scale chromatin sub-compartments connects spatial organization to cell phenotypes. Nat Commun 2021; 12:2439. [PMID: 33972523 PMCID: PMC8110550 DOI: 10.1038/s41467-021-22666-3] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 01/04/2021] [Accepted: 03/16/2021] [Indexed: 12/21/2022]
Abstract
Chromatin compartmentalization reflects biological activity. However, inference of chromatin sub-compartments and compartment domains from chromosome conformation capture (Hi-C) experiments is limited by data resolution. As a result, these have been characterized only in a few cell types and systematic comparisons across multiple tissues and conditions are missing. Here, we present Calder, an algorithmic approach that enables the identification of multi-scale sub-compartments at variable data resolution. Calder allows the inference and comparison of chromatin sub-compartments and compartment domains in >100 cell lines. Our results reveal sub-compartments enriched for poised chromatin states and undergoing spatial repositioning during lineage differentiation and oncogenic transformation.
Affiliation(s)
- Yuanlong Liu
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Luca Nanni
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
| | - Stephanie Sungalee
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC) School of Life Sciences, EPFL, Épalinges, Switzerland
| | - Marie Zufferey
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Daniele Tavernari
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marco Mina
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Stefano Ceri
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
| | - Elisa Oricchio
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute for Experimental Cancer Research (ISREC) School of Life Sciences, EPFL, Épalinges, Switzerland
| | - Giovanni Ciriello
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
35
|
Targeting Chromatin Complexes in Myeloid Malignancies and Beyond: From Basic Mechanisms to Clinical Innovation. Cells 2020; 9:cells9122721. [PMID: 33371192 PMCID: PMC7767226 DOI: 10.3390/cells9122721] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/19/2020] [Revised: 12/13/2020] [Accepted: 12/20/2020] [Indexed: 12/12/2022]
Abstract
The aberrant function of chromatin regulatory networks (epigenetics) is a hallmark of cancer promoting oncogenic gene expression. A growing body of evidence suggests that the disruption of specific chromatin-associated protein complexes has therapeutic potential in malignant conditions, particularly those that are driven by aberrant chromatin modifiers. Of note, a number of enzymatic inhibitors that block the catalytic function of histone modifying enzymes have been established and entered clinical trials. Unfortunately, many of these molecules do not have potent single-agent activity. One potential explanation for this phenomenon is that those drugs do not profoundly disrupt the integrity of the aberrant network of multiprotein complexes on chromatin. Recent advances in drug development have led to the establishment of novel inhibitors of protein–protein interactions as well as targeted protein degraders that may provide inroads into the longstanding effort to physically disrupt oncogenic multiprotein complexes on chromatin. In this review, we summarize some of the current concepts on the role of epigenetic modifiers in malignant chromatin states, with a specific focus on myeloid malignancies and recent advances in early-phase clinical trials.
|