1
Zeng Y, You Z, Guo J, Zhao J, Zhou Y, Huang J, Lyu X, Chen L, Li Q. Chrombus-XMBD: a graph convolution model predicting 3D-genome from chromatin features. Brief Bioinform 2025; 26:bbaf183. [PMID: 40315432 PMCID: PMC12047703 DOI: 10.1093/bib/bbaf183] [Received: 11/16/2024] [Revised: 03/11/2025] [Accepted: 03/26/2025] [Indexed: 05/04/2025] Open
Abstract
The 3D conformation of chromatin is crucial for transcriptional regulation. However, current experimental techniques for detecting the 3D structure of the genome are costly and limited to the biological conditions assayed. Here, we describe "Chrombus-XMBD," a graph convolution model capable of predicting chromatin interactions ab initio from available chromatin features. Using dynamic edge convolution with a multihead attention mechanism, Chrombus encodes 2D chromatin features into a learnable embedding space, thereby generating a genome-wide 3D contact map. In validation, Chrombus effectively recapitulated topologically associating domains, expression quantitative trait loci, and promoter/enhancer interactions. In particular, Chrombus outperforms existing algorithms in predicting chromatin interactions over 1-2 Mb, increasing prediction correlation by 11.8%-48.7%, and predicts long-range interactions over 2 Mb (Pearson's coefficient 0.243-0.582). Chrombus also exhibits strong generalizability across human and mouse-derived cell lines. Additionally, the parameters of Chrombus inform the biological mechanisms underlying the cistrome. Our model provides a new, generalizable analytical tool for understanding the complex dynamics of chromatin interactions and the landscape of cis-regulation of gene expression.
Affiliation(s)
- Yuanyuan Zeng
  - Department of Hematology, The First Affiliated Hospital of Xiamen University and Institute of Hematology, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Zhiyu You
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Jiayang Guo
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Jialin Zhao
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Ying Zhou
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Jialiang Huang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
- Xiaowen Lyu
  - State Key Laboratory of Cellular Stress Biology, Fujian Provincial Key Laboratory of Reproductive Health Research, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Longbiao Chen
  - Fujian Key Laboratory of Sensing and Computing for Smart Cities (SCSC), School of Informatics, Xiamen University, Xiamen, Fujian 361102, China
- Qiyuan Li
  - Department of Hematology, The First Affiliated Hospital of Xiamen University and Institute of Hematology, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, Fujian 361102, China
2
Betti MJ, Lin P, Aldrich MC, Gamazon ER. Genetically regulated eRNA expression predicts chromatin contact frequency and reveals genetic mechanisms at GWAS loci. Nat Commun 2025; 16:3193. [PMID: 40180945 PMCID: PMC11968980 DOI: 10.1038/s41467-025-58023-x] [Received: 06/07/2024] [Accepted: 02/18/2025] [Indexed: 04/05/2025] Open
Abstract
The biological functions of extragenic enhancer RNAs and their impact on disease risk remain relatively underexplored. In this work, we develop in silico models of genetically regulated expression of enhancer RNAs across 49 cell and tissue types, characterizing their degree of genetic control. Leveraging the estimated genetically regulated expression for enhancer RNAs and canonical genes in a large-scale DNA biobank (N > 70,000) and high-resolution Hi-C contact data, we train a deep learning-based model of pairwise three-dimensional chromatin contact frequency for enhancer-enhancer and enhancer-gene pairs in cerebellum and whole blood. Notably, the use of genetically regulated expression of enhancer RNAs provides substantial tissue-specific predictive power, supporting a role for these transcripts in modulating spatial chromatin organization. We identify schizophrenia-associated enhancer RNAs independent of GWAS loci using enhancer RNA-based TWAS and determine the causal effects of these enhancer RNAs using Mendelian randomization. Using enhancer RNA-based TWAS, we generate a comprehensive resource of tissue-specific enhancer associations with complex traits in the UK Biobank. Finally, we show that a substantially greater proportion (63%) of GWAS associations colocalize with causal regulatory variation when enhancer RNAs are included.
Affiliation(s)
- Michael J Betti
  - Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, 2525 West End Avenue, Suite 700, Nashville, TN, 37203, USA
- Phillip Lin
  - Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, 2525 West End Avenue, Suite 700, Nashville, TN, 37203, USA
- Melinda C Aldrich
  - Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, 2525 West End Avenue, Suite 700, Nashville, TN, 37203, USA
- Eric R Gamazon
  - Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, 2525 West End Avenue, Suite 700, Nashville, TN, 37203, USA
  - Clare Hall, University of Cambridge, Herschel Rd, Cambridge, CB3 9AL, UK
3
Smaruj PN, Xiao Y, Fudenberg G. Recipes and ingredients for deep learning models of 3D genome folding. Curr Opin Genet Dev 2025; 91:102308. [PMID: 39862604 PMCID: PMC11867851 DOI: 10.1016/j.gde.2024.102308] [Received: 09/12/2024] [Revised: 12/19/2024] [Accepted: 12/31/2024] [Indexed: 01/27/2025]
Abstract
Three-dimensional genome folding plays roles in gene regulation and disease. In this review, we compare and contrast recent deep learning models for predicting genome contact maps. We survey preprocessing, architecture, training, evaluation, and interpretation methods, highlighting the capabilities and limitations of different models. In each area, we highlight challenges, opportunities, and potential future directions for genome-folding models.
Affiliation(s)
- Paulina N Smaruj
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Yao Xiao
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Geoffrey Fudenberg
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
4
Dubocanin D, Kalygina A, Franklin JM, Chittenden C, Vollger MR, Neph S, Stergachis AB, Altemose N. Integrating Single-Molecule Sequencing and Deep Learning to Predict Haplotype-Specific 3D Chromatin Organization in a Mendelian Condition. bioRxiv 2025:2025.02.26.640261. [PMID: 40166185 PMCID: PMC11957061 DOI: 10.1101/2025.02.26.640261] [Indexed: 04/02/2025]
Abstract
The three-dimensional (3D) architecture of the genome plays a crucial role in gene regulation and various human diseases. Short-read sequencing methods for measuring 3D genome organization are powerful, but they lack the ability to resolve individual human haplotypes or structurally complex regions. To address this, we present FiberFold, a deep learning model that combines convolutional neural networks and transformer architectures to accurately predict cell-type-specific and haplotype-specific 3D genome organization using multi-omic data from a single long-read sequencing assay, Fiber-seq. By applying FiberFold to a cell line with allelic X-inactivation, we show that topologically associating domains (TADs) are attenuated on the inactive chrX. Furthermore, FiberFold predicts significant changes to TADs surrounding a 13;X balanced translocation in a patient with a rare Mendelian disease. FiberFold showcases the power of integrating long-read epigenomic sequencing with deep learning tools to investigate fundamental chromatin biology as well as the molecular basis of human disease.
Affiliation(s)
- Danilo Dubocanin
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Anna Kalygina
  - Department of Biology, University of Oxford, Oxford, UK
- J. Matthew Franklin
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Cy Chittenden
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Mitchell R Vollger
  - Division of Medical Genetics, Dept. of Medicine, University of Washington, Seattle, WA, USA
- Shane Neph
  - Division of Medical Genetics, Dept. of Medicine, University of Washington, Seattle, WA, USA
- Andrew B Stergachis
  - Division of Medical Genetics, Dept. of Medicine, University of Washington, Seattle, WA, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Nicolas Altemose
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
  - Chan Zuckerberg Biohub – San Francisco, San Francisco, CA, USA
5
An Z, Jiang A, Chen J. Toward understanding the role of genomic repeat elements in neurodegenerative diseases. Neural Regen Res 2025; 20:646-659. [PMID: 38886931 PMCID: PMC11433896 DOI: 10.4103/nrr.nrr-d-23-01568] [Received: 09/18/2023] [Revised: 12/21/2023] [Accepted: 03/02/2024] [Indexed: 06/20/2024] Open
Abstract
Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, the complex molecular mechanisms thereof are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in search of disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Affiliation(s)
- Zhengyu An
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Aidi Jiang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Jingqi Chen
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China
6
Liu X, Ling X, Tian Q, Huang Z, Ding J. Nuclear remodeling during cell fate transitions. Curr Opin Genet Dev 2025; 90:102287. [PMID: 39631291 DOI: 10.1016/j.gde.2024.102287] [Received: 09/12/2024] [Revised: 10/08/2024] [Accepted: 11/12/2024] [Indexed: 12/07/2024]
Abstract
Totipotent stem cells, the earliest cells in embryonic development, can differentiate into complete embryos and extra-embryonic tissues, making them essential for understanding both development and regenerative medicine. This review examines recent advances in the dynamic remodeling of nuclear structures during the transition between totipotency and pluripotency, as well as other cell fate transition processes. Additionally, we highlight innovative experimental and computational methods that elucidate the relationship between nuclear architecture and cell fate decisions. By integrating these insights, we aim to enhance our understanding of how nuclear remodeling influences totipotency and other cell fate transitions, paving the way for future research in this critical field.
Affiliation(s)
- Xinyi Liu
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Xiaoru Ling
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Qi Tian
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Zibin Huang
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Junjun Ding
  - Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, Guangdong, China
7
Li K, Zhang P, Xu J, Wen Z, Zhang J, Zi Z, Li L. COCOA: A Framework for Fine-scale Mapping of Cell-type-specific Chromatin Compartments Using Epigenomic Information. Genomics Proteomics Bioinformatics 2025; 22:qzae091. [PMID: 39724385 PMCID: PMC11993304 DOI: 10.1093/gpbjnl/qzae091] [Received: 11/30/2023] [Revised: 11/05/2024] [Accepted: 12/09/2024] [Indexed: 12/28/2024]
Abstract
Chromatin compartmentalization and epigenomic modifications play crucial roles in cell differentiation and disease development. However, precise mapping of chromatin compartment patterns requires Hi-C or Micro-C data at high sequencing depth. Exploring the systematic relationship between epigenomic modifications and compartment patterns remains challenging. To address these issues, we present COCOA, a deep neural network framework using convolution and attention mechanisms to infer fine-scale chromatin compartment patterns from six histone modification signals. COCOA extracts 1D track features through bidirectional feature reconstruction after resolution-specific binning of epigenomic signals. These track features are then cross-fused with contact features using an attention mechanism and transformed into chromatin compartment patterns through residual feature reduction. COCOA demonstrates accurate inference of chromatin compartmentalization at a fine-scale resolution and exhibits stable performance on test sets. Additionally, we explored the impact of histone modifications on chromatin compartmentalization prediction through in silico epigenomic perturbation experiments. Unlike obscure compartments observed in high-depth experimental data at 1-kb resolution, COCOA generates clear and detailed compartment patterns, highlighting its superior performance. Finally, we demonstrate that COCOA enables cell-type-specific prediction of unrevealed chromatin compartment patterns in various biological processes, making it an effective tool for gaining insights into chromatin compartmentalization from epigenomics in diverse biological scenarios. The COCOA Python code is publicly available at https://github.com/onlybugs/COCOA and https://ngdc.cncb.ac.cn/biocode/tools/BT007498.
Affiliation(s)
- Kai Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zi Wen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Junying Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhike Zi
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
8
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Indexed: 09/25/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNAse-seq, etc.), DNA sequencing information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P G Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
9
Wang X, Zhang Y, Ray S, Jha A, Fang T, Hang S, Doulatov S, Noble WS, Wang S. A generalizable Hi-C foundation model for chromatin architecture, single-cell and multi-omics analysis across species. bioRxiv 2024:2024.12.16.628821. [PMID: 39763871 PMCID: PMC11702576 DOI: 10.1101/2024.12.16.628821] [Indexed: 01/19/2025]
Abstract
Nuclear DNA is organized into a compact three-dimensional (3D) structure that impacts critical cellular processes. High-throughput chromosome conformation capture (Hi-C) is the most widely used method for measuring 3D genome architecture, while linear epigenomic assays, such as ATAC-seq, DNase-seq, and ChIP-seq, are extensively employed to characterize epigenomic regulation. However, the integrative analysis of chromatin interactions and associated epigenomic regulation remains challenging due to the pairwise nature of Hi-C data, mismatched resolution between Hi-C and epigenomic assays, and inconsistencies among analysis tools. Here we propose HiCFoundation, a Hi-C-based foundation model for integrative analysis linking chromatin structure to downstream regulatory function. HiCFoundation is trained from hundreds of Hi-C assays encompassing 118 million contact matrix submatrices. The model achieves state-of-the-art performance on multiple types of 3D genome analysis, including reproducibility analysis, resolution enhancement, and loop detection. We further demonstrate the model's generalizability through genome architecture analysis of 316 species. Notably, by enhancing low-coverage experimental Hi-C data, HiCFoundation reveals genome-wide loop loss during differentiation of hematopoietic stem and progenitor cells (HSPCs) to neutrophils. Additionally, HiCFoundation is able to predict multiple types of epigenomic activity from Hi-C input and further interprets the link between Hi-C input and epigenomic output to reveal the relationship between chromatin conformation and genome function. Finally, HiCFoundation can analyze single-cell Hi-C data, shedding light on genome structure at single-cell resolution. 
HiCFoundation thus provides a unified, efficient, generalizable, and interpretable foundation for genome architecture, single-cell and multi-omics analysis across species, paving the path for systematically studying genome 3D architecture and its regulatory mechanisms.
Affiliation(s)
- Xiao Wang
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
- Yuanyuan Zhang
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Suhita Ray
  - Division of Hematology and Oncology, University of Washington, Seattle, WA, 98105, USA
- Anupama Jha
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Tangqi Fang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
- Shengqi Hang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
- Sergei Doulatov
  - Division of Hematology and Oncology, University of Washington, Seattle, WA, 98105, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98105, USA
10
Banecki K, Korsak S, Plewczynski D. Advancements and future directions in single-cell Hi-C based 3D chromatin modeling. Comput Struct Biotechnol J 2024; 23:3549-3558. [PMID: 39963420 PMCID: PMC11832020 DOI: 10.1016/j.csbj.2024.09.026] [Received: 08/06/2024] [Revised: 09/27/2024] [Accepted: 09/29/2024] [Indexed: 02/20/2025] Open
Abstract
Single-cell Hi-C data provides valuable insights into the three-dimensional organization of chromatin within individual cells, yet modeling this data poses significant challenges due to its inherent sparsity and variability. This review comprehensively explores the predominant approaches to reconstructing 3D chromatin structures from single-cell Hi-C data, positioning these methods within the broader contexts of single-cell Hi-C research and bulk Hi-C data modeling. We categorize the modeling strategies based on their objective functions, which are framed in terms of force fields, potentials, cost functions, or likelihood probabilities. Despite their diverse methodologies, these approaches exhibit deep underlying similarities. We further dissect the basic components of these models, such as attractive restraint forces and repulsive forces, and discuss additional terms like fluid viscosity and variation penalties. The review also critically evaluates the current state of model validation, highlighting the inconsistencies across various studies and emphasizing the need for a comprehensive validation framework. We detail common validation techniques, including the comparison of distance matrices and the assessment of contact violations. We argue that the future of single-cell Hi-C modeling lies in integrating multiple data modalities and incorporating cell cycle trajectory information. Such integration could significantly advance our understanding of chromatin conformation dynamics during cell cycle progression and cell differentiation. We also foresee the continued growth of optimization-based and molecular dynamics approaches, supported by general molecular dynamics toolkits.
Affiliation(s)
- Krzysztof Banecki
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Sevastianos Korsak
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
11
Wang Y, Kong S, Zhou C, Wang Y, Zhang Y, Fang Y, Li G. A review of deep learning models for the prediction of chromatin interactions with DNA and epigenomic profiles. Brief Bioinform 2024; 26:bbae651. [PMID: 39708837 DOI: 10.1093/bib/bbae651] [Received: 09/23/2024] [Revised: 10/29/2024] [Accepted: 12/03/2024] [Indexed: 12/23/2024] Open
Abstract
Advances in three-dimensional (3D) genomics have revealed the spatial characteristics of chromatin interactions in gene expression regulation, which is crucial for understanding molecular mechanisms in biological processes. High-throughput technologies like ChIA-PET, Hi-C, and their derivative methods have greatly enhanced our knowledge of 3D chromatin architecture. However, the chromatin interaction mechanisms remain largely unexplored. Deep learning, with its powerful feature extraction and pattern recognition capabilities, offers a promising approach for integrating multi-omics data to build accurate predictive models of chromatin interaction matrices. This review systematically summarizes recent advances in chromatin interaction matrix prediction models, examining the latest methods that integrate DNA sequences and epigenetic signals. This article details various models, focusing on how one-dimensional (1D) information is transformed into the 3D structure of chromatin interactions, and how the integration of different deep learning modules affects model accuracy. Additionally, we discuss the critical role of DNA sequence information and epigenetic markers in shaping 3D genome interaction patterns. Finally, this review addresses the challenges in predicting chromatin interaction matrices, in order to improve the precise mapping between chromatin interaction matrices and DNA sequence and to support the transformation and theoretical development of 3D genomics across biological systems.
Collapse
Affiliation(s)
- Yunlong Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
| | - Siyuan Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
| | - Cong Zhou
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
| | - Yanfang Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), No. 2 West Yuanmingyuan Rd, Haidian District, Beijing 100193, China
| | - Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
- Sequencing Facility, Frederick National Laboratory for Cancer Research, 8560 Progress Drive, Frederick, MD 21701, United States
- Yaping Fang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
12
Gao VR, Yang R, Das A, Luo R, Luo H, McNally DR, Karagiannidis I, Rivas MA, Wang ZM, Barisic D, Karbalayghareh A, Wong W, Zhan YA, Chin CR, Noble WS, Bilmes JA, Apostolou E, Kharas MG, Béguelin W, Viny AD, Huangfu D, Rudensky AY, Melnick AM, Leslie CS. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. Nat Commun 2024; 15:9432. [PMID: 39487131 PMCID: PMC11530433 DOI: 10.1038/s41467-024-53628-0] [Received: 10/18/2023] [Accepted: 10/14/2024] [Indexed: 11/04/2024]
Abstract
Identifying cell-type-specific 3D chromatin interactions between regulatory elements can help decipher gene regulation and interpret disease-associated non-coding variants. However, achieving this resolution with current 3D genomics technologies is often infeasible given limited input cell numbers. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps, including regulatory interactions, from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility across metacells, and a CTCF motif track as inputs and employs a lightweight architecture to train on standard GPUs. Trained on paired scATAC-seq and Hi-C data in human samples, ChromaFold accurately predicts the 3D contact map and peak-level interactions across diverse human and mouse test cell types. Compared to leading contact map prediction models that use ATAC-seq and CTCF ChIP-seq, ChromaFold achieves state-of-the-art performance using only scATAC-seq. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations.
Affiliation(s)
- Vianne R Gao
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Rui Yang
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Arnav Das
- University of Washington, Seattle, WA, USA
- Renhe Luo
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Hanzhi Luo
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dylan R McNally
- Caryl and Israel Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
- Ioannis Karagiannidis
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Martin A Rivas
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Department of Biochemistry & Molecular Biology; Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
- Zhong-Min Wang
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Darko Barisic
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Alireza Karbalayghareh
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wilfred Wong
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Yingqian A Zhan
- Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Christopher R Chin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Effie Apostolou
- Joan and Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Michael G Kharas
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wendy Béguelin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Aaron D Viny
- Departments of Medicine, Division of Hematology & Oncology, and of Genetics & Development, Columbia Stem Cell Initiative, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Danwei Huangfu
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Alexander Y Rudensky
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ari M Melnick
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Christina S Leslie
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
13
Dekker J, Oksuz BA, Zhang Y, Wang Y, Minsk MK, Kuang S, Yang L, Gibcus JH, Krietenstein N, Rando OJ, Xu J, Janssens DH, Henikoff S, Kukalev A, Willemin A, Winick-Ng W, Kempfer R, Pombo A, Yu M, Kumar P, Zhang L, Belmont AS, Sasaki T, van Schaik T, Brueckner L, Peric-Hupkes D, van Steensel B, Wang P, Chai H, Kim M, Ruan Y, Zhang R, Quinodoz SA, Bhat P, Guttman M, Zhao W, Chien S, Liu Y, Venev SV, Plewczynski D, Azcarate II, Szabó D, Thieme CJ, Szczepińska T, Chiliński M, Sengupta K, Conte M, Esposito A, Abraham A, Zhang R, Wang Y, Wen X, Wu Q, Yang Y, Liu J, Boninsegna L, Yildirim A, Zhan Y, Chiariello AM, Bianco S, Lee L, Hu M, Li Y, Barnett RJ, Cook AL, Emerson DJ, Marchal C, Zhao P, Park P, Alver BH, Schroeder A, Navelkar R, Bakker C, Ronchetti W, Ehmsen S, Veit A, Gehlenborg N, Wang T, Li D, Wang X, Nicodemi M, Ren B, Zhong S, Phillips-Cremins JE, Gilbert DM, Pollard KS, Alber F, Ma J, Noble WS, Yue F. An integrated view of the structure and function of the human 4D nucleome. bioRxiv 2024:2024.09.17.613111. [PMID: 39484446 PMCID: PMC11526861 DOI: 10.1101/2024.09.17.613111] [Indexed: 11/03/2024]
Abstract
The dynamic three-dimensional (3D) organization of the human genome (the "4D Nucleome") is closely linked to genome function. Here, we integrate a wide variety of genomic data generated by the 4D Nucleome Project to provide a detailed view of human 3D genome organization in widely used embryonic stem cells (H1-hESCs) and immortalized fibroblasts (HFFc6). We provide extensive benchmarking of 3D genome mapping assays and integrate these diverse datasets to annotate spatial genomic features across scales. The data reveal a rich complexity of chromatin domains and their sub-nuclear positions, and over one hundred thousand structural loops and promoter-enhancer interactions. We developed 3D models of population-based and individual cell-to-cell variation in genome structure, establishing connections between chromosome folding, nuclear organization, chromatin looping, gene transcription, and DNA replication. We demonstrate the use of computational methods to predict genome folding from DNA sequence, uncovering potential effects of genetic variants on genome structure and function. Together, this comprehensive analysis contributes insights into human genome organization and enhances our understanding of connections between the regulation of genome function and 3D genome organization in general.
Affiliation(s)
- Job Dekker
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Betul Akgol Oksuz
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Ye Wang
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Miriam K. Minsk
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Liyan Yang
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Johan H. Gibcus
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Nils Krietenstein
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen
- Oliver J. Rando
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Jie Xu
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Derek H. Janssens
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
- Steven Henikoff
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Alexander Kukalev
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Andréa Willemin
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Warren Winick-Ng
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Rieke Kempfer
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Ana Pombo
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Miao Yu
- University of California, San Diego School of Medicine, Department of Cellular and Molecular Medicine, La Jolla, CA, USA
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
- Pradeep Kumar
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Liguo Zhang
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Andrew S Belmont
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Tom van Schaik
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Oncode Institute, the Netherlands
- Laura Brueckner
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Daan Peric-Hupkes
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Oncode Institute, the Netherlands
- Bas van Steensel
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Oncode Institute, the Netherlands
- Ping Wang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Haoxi Chai
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang Province, 310058, P.R. China
- Minji Kim
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yijun Ruan
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang Province, 310058, P.R. China
- Ran Zhang
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Sofia A. Quinodoz
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Prashant Bhat
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- David Geffen School of Medicine at UCLA, Los Angeles, USA
- Mitchell Guttman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Wenxin Zhao
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Shu Chien
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yuan Liu
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Sergey V. Venev
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology ul. Koszykowa 75, 00-662 Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c Street, 02-097 Warsaw, Poland
- Ibai Irastorza Azcarate
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Dominik Szabó
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Christoph J. Thieme
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Teresa Szczepińska
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Centre for Advanced Materials and Technologies CEZAMAT, Warsaw University of Technology, Poleczki 19, 02-822 Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c Street, 02-097 Warsaw, Poland
- Mateusz Chiliński
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology ul. Koszykowa 75, 00-662 Warsaw, Poland
- Kaustav Sengupta
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology ul. Koszykowa 75, 00-662 Warsaw, Poland
- Mattia Conte
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Andrea Esposito
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Alex Abraham
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Ruochi Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Yuchuan Wang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Xingzhao Wen
- Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Qiuyang Wu
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yang Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Jie Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Lorenzo Boninsegna
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Asli Yildirim
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Yuxiang Zhan
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Andrea Maria Chiariello
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Simona Bianco
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Lindsay Lee
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Yun Li
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- R. Jordan Barnett
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Ashley L. Cook
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Daniel J. Emerson
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Peiyao Zhao
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Peter Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Burak H. Alver
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Andrew Schroeder
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Rahi Navelkar
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Clara Bakker
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- William Ronchetti
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Shannon Ehmsen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Alexander Veit
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Ting Wang
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Daofeng Li
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Xiaotao Wang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai, China
- Mario Nicodemi
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Bing Ren
- University of California, San Diego School of Medicine, Department of Cellular and Molecular Medicine, La Jolla, CA, USA
- Sheng Zhong
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Jennifer E. Phillips-Cremins
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Frank Alber
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Feng Yue
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, Illinois, USA
14
Jha A, Hristov B, Wang X, Wang S, Greenleaf WJ, Kundaje A, Aiden EL, Bertero A, Noble WS. Prediction and functional interpretation of inter-chromosomal genome architecture from DNA sequence with TwinC. bioRxiv 2024:2024.09.16.613355. [PMID: 39345598 PMCID: PMC11429679 DOI: 10.1101/2024.09.16.613355] [Indexed: 10/01/2024]
Abstract
Three-dimensional nuclear DNA architecture comprises well-studied intra-chromosomal (cis) folding and less characterized inter-chromosomal (trans) interfaces. Current predictive models of 3D genome folding can effectively infer pairwise cis-chromatin interactions from the primary DNA sequence but generally ignore trans contacts. There is an unmet need for robust models of trans-genome organization that provide insights into their underlying principles and functional relevance. We present TwinC, an interpretable convolutional neural network model that reliably predicts trans contacts measurable through genome-wide chromatin conformation capture (Hi-C). TwinC uses a paired sequence design from replicate Hi-C experiments to learn single base pair relevance in trans interactions across two stretches of DNA. The method achieves high predictive accuracy (AUROC=0.80) on a cross-chromosomal test set from Hi-C experiments in heart tissue. Mechanistically, the neural network learns the importance of compartments, chromatin accessibility, clustered transcription factor binding and G-quadruplexes in forming trans contacts. In summary, TwinC models and interprets trans genome architecture, shedding light on this poorly understood aspect of gene regulation.
Affiliation(s)
- Anupama Jha
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Borislav Hristov
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Xiao Wang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen Center for Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Sheng Wang
- Paul G. Allen Center for Computer Science & Engineering, University of Washington, Seattle, WA, USA
- William J Greenleaf
- Department of Genetics, Stanford University, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Erez Lieberman Aiden
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Computational and Applied Mathematics, Rice University, Houston, TX, USA
- Alessandro Bertero
- Molecular Biotechnology Center "Guido Tarone," Department of Molecular Biotechnology and Health Sciences, University of Turin, Torino, Italy
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen Center for Computer Science & Engineering, University of Washington, Seattle, WA, USA
15
Kuang S, Pollard KS. Exploring the roles of RNAs in chromatin architecture using deep learning. Nat Commun 2024; 15:6373. [PMID: 39075082 PMCID: PMC11286850 DOI: 10.1038/s41467-024-50573-w] [Received: 10/23/2023] [Accepted: 07/12/2024] [Indexed: 07/31/2024]
Abstract
Recent studies have highlighted the impact of both transcription and transcripts on 3D genome organization, particularly its dynamics. Here, we propose a deep learning framework, called AkitaR, that leverages both genome sequences and genome-wide RNA-DNA interactions to investigate the roles of chromatin-associated RNAs (caRNAs) on genome folding in HFFc6 cells. In order to disentangle the cis- and trans-regulatory roles of caRNAs, we have compared models with nascent transcripts, trans-located caRNAs, open chromatin data, or DNA sequence alone. Both nascent transcripts and trans-located caRNAs improve the models' predictions, especially at cell-type-specific genomic regions. Analyses of feature importance scores reveal the contribution of caRNAs at TAD boundaries, chromatin loops and nuclear sub-structures such as nuclear speckles and nucleoli to the models' predictions. Furthermore, we identify non-coding RNAs (ncRNAs) known to regulate chromatin structures, such as MALAT1 and NEAT1, as well as several new RNAs, RNY5, RPPH1, POLG-DT and THBS1-IT1, that might modulate chromatin architecture through trans-interactions in HFFc6. Our modeling also suggests that transcripts from Alus and other repetitive elements may facilitate chromatin interactions through trans R-loop formation. Our findings provide insights and generate testable hypotheses about the roles of caRNAs in shaping chromatin organization.
Affiliation(s)
- Shuzhen Kuang
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
16
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024]
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
17
Murtaza G, Butaney B, Wagner J, Singh R. scGrapHiC: deep learning-based graph deconvolution for Hi-C using single cell gene expression. Bioinformatics 2024; 40:i490-i500. [PMID: 38940151 PMCID: PMC11256916 DOI: 10.1093/bioinformatics/btae223] [Indexed: 06/29/2024]
Abstract
SUMMARY: The single-cell Hi-C (scHi-C) protocol helps identify cell-type-specific chromatin interactions and sheds light on cell differentiation and disease progression. Despite providing crucial insights, scHi-C data is often underutilized due to the high cost and the complexity of the experimental protocol. We present a deep learning framework, scGrapHiC, that predicts pseudo-bulk scHi-C contact maps using pseudo-bulk scRNA-seq data. Specifically, scGrapHiC performs graph deconvolution to extract genome-wide single-cell interactions from a bulk Hi-C contact map using scRNA-seq as a guiding signal. Our evaluations show that scGrapHiC, trained on seven cell-type co-assay datasets, outperforms typical sequence encoder approaches. For example, scGrapHiC achieves a substantial improvement of 23.2% in recovering cell-type-specific Topologically Associating Domains over the baselines. It also generalizes to unseen embryo and brain tissue samples. scGrapHiC is a novel method to generate cell-type-specific scHi-C contact maps using widely available genomic signals that enables the study of cell-type-specific chromatin interactions. AVAILABILITY AND IMPLEMENTATION: The source code of scGrapHiC, together with scripts to preprocess publicly available datasets and produce the results and visualizations discussed in this manuscript, is available at https://github.com/rsinghlab/scGrapHiC.
Affiliation(s)
- Ghulam Murtaza
  - Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
- Byron Butaney
  - Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, United States
- Ritambhara Singh
  - Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
  - Center for Computational Molecular Biology, Brown University, 164 Angell Street, Providence, RI, 02912, United States

18
Yang R, Das A, Gao VR, Karbalayghareh A, Noble WS, Bilmes JA, Leslie CS. Author Correction: Epiphany: predicting Hi-C contact maps from 1D epigenomic signals. Genome Biol 2024; 25:132. [PMID: 38783328 PMCID: PMC11112830 DOI: 10.1186/s13059-024-03281-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024] Open
Affiliation(s)
- Rui Yang
  - Memorial Sloan Kettering Cancer Center, New York, USA
- Arnav Das
  - University of Washington, Seattle, USA
- Vianne R Gao
  - Memorial Sloan Kettering Cancer Center, New York, USA

19
Liu T, Zhu H, Wang Z. Learning Micro-C from Hi-C with diffusion models. PLoS Comput Biol 2024; 20:e1012136. [PMID: 38758956 PMCID: PMC11139321 DOI: 10.1371/journal.pcbi.1012136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 05/30/2024] [Accepted: 05/05/2024] [Indexed: 05/19/2024] Open
Abstract
In the last few years, Micro-C has shown itself as an improved alternative to Hi-C. It replaced the restriction enzymes in Hi-C assays with micrococcal nuclease (MNase), resulting in capturing nucleosome resolution chromatin interactions. The signal-to-noise improvement of Micro-C allows it to detect more chromatin loops than high-resolution Hi-C. However, compared with massive Hi-C datasets available in the literature, there are only a limited number of Micro-C datasets. To take full advantage of these Hi-C datasets, we present HiC2MicroC, a computational method learning and then predicting Micro-C from Hi-C based on the denoising diffusion probabilistic models (DDPM). We trained our DDPM and other regression models in human foreskin fibroblast (HFFc6) cell line and evaluated these methods in six different cell types at 5-kb and 1-kb resolution. Our evaluations demonstrate that both HiC2MicroC and regression methods can markedly improve Hi-C towards Micro-C, and our DDPM-based HiC2MicroC outperforms regression in various terms. First, HiC2MicroC successfully recovers most of the Micro-C loops even those not detected in Hi-C maps. Second, a majority of the HiC2MicroC-recovered loops anchor CTCF binding sites in a convergent orientation. Third, HiC2MicroC loops share genomic and epigenetic properties with Micro-C loops, including linking promoters and enhancers, and their anchors are enriched for structural proteins (CTCF and cohesin) and histone modifications. Lastly, we find our recovered loops are also consistent with the loops identified from promoter capture Micro-C (PCMicro-C) and Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET). Overall, HiC2MicroC is an effective tool for further studying Hi-C data with Micro-C as a template. HiC2MicroC is publicly available at https://github.com/zwang-bioinformatics/HiC2MicroC/.
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Hao Zhu
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Zheng Wang
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America

20
Min A, Schreiber J, Kundaje A, Noble WS. Predicting chromatin conformation contact maps. bioRxiv 2024:2024.04.12.589240. [PMID: 38645064 PMCID: PMC11030330 DOI: 10.1101/2024.04.12.589240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Over the past 15 years, a variety of next-generation sequencing assays have been developed for measuring the 3D conformation of DNA in the nucleus. Each of these assays gives, for a particular cell or tissue type, a distinct picture of 3D chromatin architecture. Accordingly, making sense of the relationship between genome structure and function requires teasing apart two closely related questions: how does chromatin 3D structure change from one cell type to the next, and how do different measurements of that structure differ from one another, even when the two assays are carried out in the same cell type? In this work, we assemble a collection of chromatin 3D datasets, each represented as a 2D contact map, spanning multiple assay types and cell types. We then build a machine learning model that predicts missing contact maps in this collection. We use the model to systematically explore how genome 3D architecture changes, at the level of compartments, domains, and loops, between cell types and between assay types.
Affiliation(s)
- Alan Min
  - Department of Statistics, University of Washington
- William Stafford Noble
  - Department of Genome Sciences, University of Washington
  - Paul G. Allen School of Computer Science and Engineering, University of Washington

21
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-Dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNAse-seq, etc.), DNA sequencing information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles of computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA

22
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
  - Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

23
Klie A, Laub D, Talwar JV, Stites H, Jores T, Solvason JJ, Farley EK, Carter H. Predictive analyses of regulatory sequences with EUGENe. Nat Comput Sci 2023; 3:946-956. [PMID: 38177592 PMCID: PMC10768637 DOI: 10.1038/s43588-023-00544-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/27/2023] [Indexed: 01/06/2024]
Abstract
Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.
Affiliation(s)
- Adam Klie
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- David Laub
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- James V Talwar
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Tobias Jores
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Joe J Solvason
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Emma K Farley
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA

24
Umarov R, Hon CC. Enhancer target prediction: state-of-the-art approaches and future prospects. Biochem Soc Trans 2023; 51:1975-1988. [PMID: 37830459 DOI: 10.1042/bst20230917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/02/2023] [Accepted: 10/02/2023] [Indexed: 10/14/2023]
Abstract
Enhancers are genomic regions that regulate gene transcription and are located far away from the transcription start sites of their target genes. Enhancers are highly enriched in disease-associated variants and thus deciphering the interactions between enhancers and genes is crucial to understanding the molecular basis of genetic predispositions to diseases. Experimental validations of enhancer targets can be laborious. Computational methods have thus emerged as a valuable alternative for studying enhancer-gene interactions. A variety of computational methods have been developed to predict enhancer targets by incorporating genomic features (e.g. conservation, distance, and sequence), epigenomic features (e.g. histone marks and chromatin contacts) and activity measurements (e.g. covariations of enhancer activity and gene expression). With the recent advances in genome perturbation and chromatin conformation capture technologies, data on experimentally validated enhancer targets are becoming available for supervised training of these methods and evaluation of their performance. In this review, we categorize enhancer target prediction methods based on their rationales and approaches. Then we discuss their merits and limitations and highlight the future directions for enhancer targets prediction.
Affiliation(s)
- Ramzan Umarov
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan
- Chung-Chau Hon
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan

25
Kuang S, Pollard KS. Exploring the Roles of RNAs in Chromatin Architecture Using Deep Learning. bioRxiv 2023:2023.10.22.563498. [PMID: 37961712 PMCID: PMC10634726 DOI: 10.1101/2023.10.22.563498] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/15/2023]
Abstract
Recent studies have highlighted the impact of both transcription and transcripts on 3D genome organization, particularly its dynamics. Here, we propose a deep learning framework, called AkitaR, that leverages both genome sequences and genome-wide RNA-DNA interactions to investigate the roles of chromatin-associated RNAs (caRNAs) on genome folding in HFFc6 cells. In order to disentangle the cis- and trans-regulatory roles of caRNAs, we compared models with nascent transcripts, trans-located caRNAs, open chromatin data, or DNA sequence alone. Both nascent transcripts and trans-located caRNAs improved the models' predictions, especially at cell-type-specific genomic regions. Analyses of feature importance scores revealed the contribution of caRNAs at TAD boundaries, chromatin loops and nuclear sub-structures such as nuclear speckles and nucleoli to the models' predictions. Furthermore, we identified non-coding RNAs (ncRNAs) known to regulate chromatin structures, such as MALAT1 and NEAT1, as well as several novel RNAs, RNY5, RPPH1, POLG-DT and THBS1-IT, that might modulate chromatin architecture through trans-interactions in HFFc6. Our modeling also suggests that transcripts from Alus and other repetitive elements may facilitate chromatin interactions through trans R-loop formation. Our findings provide new insights and generate testable hypotheses about the roles of caRNAs in shaping chromatin organization.
Affiliation(s)
- Shuzhen Kuang
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Katherine S. Pollard
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA

26
Gunsalus LM, Keiser MJ, Pollard KS. In silico discovery of repetitive elements as key sequence determinants of 3D genome folding. Cell Genom 2023; 3:100410. [PMID: 37868032 PMCID: PMC10589630 DOI: 10.1016/j.xgen.2023.100410] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/12/2022] [Revised: 11/08/2022] [Accepted: 08/31/2023] [Indexed: 10/24/2023]
Abstract
Natural and experimental genetic variants can modify DNA loops and insulating boundaries to tune transcription, but it is unknown how sequence perturbations affect chromatin organization genome wide. We developed a deep-learning strategy to quantify the effect of any insertion, deletion, or substitution on chromatin contacts and systematically scored millions of synthetic variants. While most genetic manipulations have little impact, regions with CTCF motifs and active transcription are highly sensitive, as expected. Our unbiased screen and subsequent targeted experiments also point to noncoding RNA genes and several families of repetitive elements as CTCF-motif-free DNA sequences with particularly large effects on nearby chromatin interactions, sometimes exceeding the effects of CTCF sites and explaining interactions that lack CTCF. We anticipate that our disruption tracks may be of broad interest and utility as a measure of 3D genome sensitivity, and our computational strategies may serve as a template for biological inquiry with deep learning.
Affiliation(s)
- Laura M. Gunsalus
  - Gladstone Institutes, San Francisco, CA, USA
  - Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Michael J. Keiser
  - Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Katherine S. Pollard
  - Gladstone Institutes, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA

27
Baur B, Roy S. Predicting patient-specific enhancer-promoter interactions. Cell Rep Methods 2023; 3:100594. [PMID: 37751694 PMCID: PMC10545932 DOI: 10.1016/j.crmeth.2023.100594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 08/30/2023] [Accepted: 08/30/2023] [Indexed: 09/28/2023]
Abstract
Computational methods that can predict hard-to-measure modalities from those that are easier to measure, in a patient-specific manner, play a critical role in personalized medicine. In this issue of Cell Reports Methods, Khurana et al. present differential gene targets of accessible chromatin (DGTAC), an approach which predicts patient-specific enhancer-promoter interactions.
Affiliation(s)
- Brittany Baur
  - Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI 53715, USA
  - The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI, USA
  - Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, USA

28
Xu J, Zhang P, Sun W, Zhang J, Zhang W, Hou C, Li L. EpiMCI: Predicting Multi-Way Chromatin Interactions from Epigenomic Signals. Biology (Basel) 2023; 12:1203. [PMID: 37759602 PMCID: PMC10525350 DOI: 10.3390/biology12091203] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/27/2023] [Revised: 08/31/2023] [Accepted: 08/31/2023] [Indexed: 09/29/2023]
Abstract
The recently emerging high-throughput Pore-C (HiPore-C) can identify whole-genome high-order chromatin multi-way interactions with an ultra-high output, contributing to deciphering three-dimensional (3D) genome organization. However, it also brings new challenges to relevant data analysis. To alleviate this problem, we proposed the EpiMCI, a model for multi-way chromatin interaction prediction based on a hypergraph neural network with epigenomic signals as the input. The EpiMCI integrated separate hyperedge representations with coupling hyperedge information and obtained AUCs of 0.981 and 0.984 in the GM12878 and K562 datasets, respectively, which outperformed the current available method. Moreover, the EpiMCI can be applied to denoise the HiPore-C data and improve the data quality efficiently. Furthermore, the vertex embeddings extracted from the EpiMCI reflected the global chromatin architecture accurately. The principal component analysis suggested that it was well aligned with the activities of genomic regions at the chromatin compartment level. Taken together, the EpiMCI can accurately predict multi-way chromatin interactions and can be applied to studies relying on chromatin architecture.
Affiliation(s)
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Junying Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wenxue Zhang
  - Food Science Program, Division of Food, Nutrition and Exercise Sciences, University of Missouri, 1406 E Rollins Street, Columbia, MO 65211, USA
- Chunhui Hou
  - China State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430074, China
|