1
Wang J, Wu L, Wei J, Yan C, Luo H, Luo J, Guo F. CGLoop: a neural network framework for chromatin loop prediction. BMC Genomics 2025; 26:342. [PMID: 40186170 PMCID: PMC11971808 DOI: 10.1186/s12864-025-11531-y] [Received: 08/25/2024] [Accepted: 03/25/2025] [Indexed: 04/07/2025] Open
Abstract
BACKGROUND: Chromosomes exhibit a variety of high-dimensional organizational features, among which chromatin loops are fundamental structures in the three-dimensional (3D) organization of the genome. Chromatin loops appear as speckled patterns on the Hi-C contact matrices generated by chromosome conformation capture methods. They play an important role in gene expression, and predicting the chromatin loops formed during genome-wide interactions is crucial for a deeper understanding of 3D genome structure and function. RESULTS: Here, we propose CGLoop, a deep-learning-based neural network framework that detects chromatin loops in Hi-C contact matrices. CGLoop combines a convolutional neural network (CNN) with a Convolutional Block Attention Module (CBAM) and a Bidirectional Gated Recurrent Unit (BiGRU) to capture important features related to chromatin loops by comprehensively analyzing the Hi-C contact matrix, enabling the prediction of candidate chromatin loops. CGLoop then employs a density-based clustering method to filter the candidate chromatin loops predicted by the neural network model. Finally, we compared CGLoop with other chromatin loop prediction methods on several cell lines, including GM12878, K562, IMR90, and mESC. The code is available from https://github.com/wllwuliliwll/CGLoop. CONCLUSIONS: The experimental results show that loops predicted by CGLoop have high APA scores and that multiple transcription factors and binding proteins are enriched at the predicted loop anchors; CGLoop outperforms other methods in the accuracy and validity of chromatin loop prediction.
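The density-based filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not CGLoop's actual implementation: the function name `filter_candidates`, the parameters `radius` and `min_neighbors`, the Chebyshev distance, and the rule of keeping the highest-scoring candidate per cluster are all assumptions made for illustration, in the spirit of DBSCAN-style clustering.

```python
from collections import deque


def filter_candidates(candidates, radius=1, min_neighbors=3):
    """Keep one representative per dense cluster of candidate loops.

    candidates: list of (row_bin, col_bin, score) tuples (hypothetical format).
    Isolated candidates are treated as noise and dropped.
    """
    n = len(candidates)
    # Neighbors under Chebyshev distance: within `radius` bins on both axes.
    neighbors = [
        [k for k in range(n)
         if max(abs(candidates[i][0] - candidates[k][0]),
                abs(candidates[i][1] - candidates[k][1])) <= radius]
        for i in range(n)
    ]
    # A core point has at least `min_neighbors` candidates nearby (itself included).
    core = [len(nb) >= min_neighbors for nb in neighbors]
    seen = [False] * n
    loops = []
    for start in range(n):
        if seen[start] or not core[start]:
            continue
        # Breadth-first expansion over density-connected points.
        seen[start] = True
        queue = deque([start])
        cluster = []
        while queue:
            i = queue.popleft()
            cluster.append(i)
            if not core[i]:
                continue  # border points join a cluster but do not extend it
            for k in neighbors[i]:
                if not seen[k]:
                    seen[k] = True
                    queue.append(k)
        # Assumed rule: report the highest-scoring candidate of each cluster.
        best = max(cluster, key=lambda i: candidates[i][2])
        loops.append(candidates[best])
    return loops


# Made-up candidates: two dense groups and one isolated false positive.
example = [
    (10, 20, 0.90), (10, 21, 0.95), (11, 20, 0.80), (11, 21, 0.70),
    (50, 80, 0.99),
    (30, 40, 0.60), (31, 40, 0.65), (30, 41, 0.85),
]
filtered = filter_candidates(example, radius=1, min_neighbors=3)
```

In this toy input the isolated candidate at (50, 80) is discarded despite its high score, which is the point of filtering by local density rather than by score alone.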
Affiliation(s)
- Junfeng Wang
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Lili Wu
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Jingjing Wei
  - College of Chemical and Environmental Engineering, Anyang Institute of Technology, Anyang, 455000, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
- Junwei Luo
  - School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
2
Gjoni K, Gunsalus LM, Kuang S, McArthur E, Pittman M, Capra JA, Pollard KS. Comparing chromatin contact maps at scale: methods and insights. Nat Methods 2025; 22:824-833. [PMID: 40108448 PMCID: PMC11978506 DOI: 10.1038/s41592-025-02630-5] [Received: 04/20/2023] [Accepted: 02/14/2025] [Indexed: 03/22/2025]
Abstract
Comparing chromatin contact maps is an essential step in quantifying how three-dimensional (3D) genome organization shapes development, evolution, and disease. However, methods often disagree, and no gold standard exists for comparing pairs of maps. Here, we evaluate 25 ways to compare contact maps using Micro-C and Hi-C data from two cell types and in silico-generated contact maps. We identify similarities and differences between the methods and quantify their robustness to common sources of biological and technical variation, including losses and gains of CTCF-binding sites, changes in contact intensity or patterns, and noise. We find that global comparison methods, such as mean squared error, are suitable for initial screening; however, biologically informed methods are necessary for identifying how maps diverge and for proposing specific functional hypotheses. We provide a reference guide, codebase, and thorough evaluation for rapidly comparing chromatin contact maps at scale to enable biological insights into 3D genome organization.
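As a rough illustration of the kind of global comparison the abstract names, the sketch below computes the mean squared error between two contact maps of equal shape. The function name `mean_squared_error` and the toy matrices are assumptions for illustration and are not taken from the authors' codebase.

```python
def mean_squared_error(map_a, map_b):
    """Mean squared difference between two equally shaped contact maps.

    Each map is a list of rows of contact values (hypothetical input format).
    """
    if len(map_a) != len(map_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(map_a, map_b)
    ):
        raise ValueError("contact maps must have the same shape")
    total = 0.0
    count = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("contact maps must not be empty")
    return total / count


# Toy 3x3 maps: the second loses the contact between the first and last bins.
reference = [[4, 2, 1], [2, 4, 2], [1, 2, 4]]
perturbed = [[4, 2, 0], [2, 4, 2], [0, 2, 4]]
mse = mean_squared_error(reference, perturbed)
```

A single number like this flags that two maps differ but not where or why, which is consistent with the abstract's point that such methods suit initial screening while biologically informed methods are needed to interpret the divergence.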
Affiliation(s)
- Ketrin Gjoni
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
- Laura M Gunsalus
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
- Shuzhen Kuang
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
- Evonne McArthur
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Maureen Pittman
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
- John A Capra
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Katherine S Pollard
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
3
Kumar Halder A, Agarwal A, Jodkowska K, Plewczynski D. A systematic analyses of different bioinformatics pipelines for genomic data and its impact on deep learning models for chromatin loop prediction. Brief Funct Genomics 2024; 23:538-548. [PMID: 38555493 DOI: 10.1093/bfgp/elae009] [Received: 11/28/2023] [Revised: 02/07/2024] [Accepted: 03/04/2024] [Indexed: 04/02/2024] Open
Abstract
Genomic data analysis has witnessed a surge in complexity and volume, primarily driven by the advent of high-throughput technologies. In particular, studying chromatin loops and structures has become pivotal in understanding gene regulation and genome organization. This systematic investigation explores the realm of specialized bioinformatics pipelines designed specifically for the analysis of chromatin loops and structures. Our investigation incorporates two protein (CTCF and Cohesin) factor-specific loop interaction datasets from six distinct pipelines, amassing a comprehensive collection of 36 diverse datasets. Through a meticulous review of existing literature, we offer a holistic perspective on the methodologies, tools and algorithms underpinning the analysis of this multifaceted genomic feature. We illuminate the vast array of approaches deployed, encompassing pivotal aspects such as data preparation pipeline, preprocessing, statistical features and modelling techniques. Beyond this, we rigorously assess the strengths and limitations inherent in these bioinformatics pipelines, shedding light on the interplay between data quality and the performance of deep learning models, ultimately advancing our comprehension of genomic intricacies.
Affiliation(s)
- Anup Kumar Halder
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Abhishek Agarwal
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Karolina Jodkowska
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
4
Shen J, Wang Y, Luo J. CD-Loop: a chromatin loop detection method based on the diffusion model. Front Genet 2024; 15:1393406. [PMID: 38770419 PMCID: PMC11102972 DOI: 10.3389/fgene.2024.1393406] [Received: 02/29/2024] [Accepted: 04/11/2024] [Indexed: 05/22/2024] Open
Abstract
Motivation: In recent years, there have been significant advances in various chromatin conformation capture techniques, and annotating the topological structure from Hi-C contact maps has become crucial for studying the three-dimensional structure of chromosomes. However, the structure and function of chromatin loops are highly dynamic and diverse, influenced by multiple factors. Therefore, obtaining the three-dimensional structure of the genome remains a challenging task. Many chromatin loop prediction methods struggle to fully extract features from the contact map and to make accurate predictions at low sequencing depths. Results: In this study, we put forward a deep learning framework based on the diffusion model, called CD-Loop, for accurately predicting chromatin loops. First, by pre-training on the input data, we obtain prior probabilities for classifying the Hi-C contact map. Then, by combining the denoising process of the diffusion model with the prior probability obtained by pre-training, candidate loops are predicted from the input Hi-C contact map. Finally, CD-Loop uses a density-based clustering algorithm to cluster the candidate chromatin loops and predict the final chromatin loops. We compared CD-Loop with currently popular methods, such as Peakachu, Chromosight, and Mustache, and found that across different cell types, species, and sequencing depths, CD-Loop outperforms other methods in loop annotation. We conclude that CD-Loop can accurately predict chromatin loops and reveal cell-type specificity. The code is available at https://github.com/wangyang199897/CD-Loop.
Affiliation(s)
- Junwei Luo
  - School of Software, Henan Polytechnic University, Jiaozuo, China
5
Li K, Zhang P, Wang Z, Shen W, Sun W, Xu J, Wen Z, Li L. iEnhance: a multi-scale spatial projection encoding network for enhancing chromatin interaction data resolution. Brief Bioinform 2023; 24:bbad245. [PMID: 37381618 DOI: 10.1093/bib/bbad245] [Received: 03/21/2023] [Revised: 06/06/2023] [Accepted: 06/12/2023] [Indexed: 06/30/2023] Open
Abstract
Although sequencing-based high-throughput chromatin interaction data are widely used to uncover genome-wide three-dimensional chromatin architecture, their sparseness and low signal-to-noise ratio greatly restrict the precision of the obtained structural elements. To improve data quality, we here present iEnhance (chromatin interaction data resolution enhancement), a multi-scale spatial projection and encoding network, to predict high-resolution chromatin interaction matrices from low-resolution and noisy input data. Specifically, iEnhance projects the input data into matrix spaces to extract multi-scale global and local feature sets, then hierarchically fuses these features with an attention mechanism. After that, dense channel encoding and residual channel decoding are used to effectively infer robust chromatin interaction maps. iEnhance outperforms state-of-the-art Hi-C resolution enhancement tools in both visual and quantitative evaluation. Comprehensive analysis shows that, unlike other tools, iEnhance can recover both short-range structural elements and long-range interaction patterns precisely. More importantly, iEnhance can be transferred to data enhancement of other tissues or cell lines of unknown resolution. Furthermore, iEnhance performs robustly in the enhancement of diverse chromatin interaction data, including those from single-cell Hi-C and Micro-C experiments.
Affiliation(s)
- Kai Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zilin Wang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wei Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zi Wen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
6
Gunsalus LM, McArthur E, Gjoni K, Kuang S, Pittman M, Capra JA, Pollard KS. Comparing chromatin contact maps at scale: methods and insights. bioRxiv 2023:2023.04.04.535480. [PMID: 37066196 PMCID: PMC10104037 DOI: 10.1101/2023.04.04.535480] [Indexed: 04/18/2023]
Abstract
Comparing chromatin contact maps is an essential step in quantifying how three-dimensional (3D) genome organization shapes development, evolution, and disease. However, no gold standard exists for comparing contact maps, and even simple methods often disagree. In this study, we propose novel comparison methods and evaluate them alongside existing approaches using genome-wide Hi-C data and 22,500 in silico predicted contact maps. We also quantify the robustness of methods to common sources of biological and technical variation, such as boundary size and noise. We find that simple difference-based methods such as mean squared error are suitable for initial screening, but biologically informed methods are necessary to identify why maps diverge and propose specific functional hypotheses. We provide a reference guide, codebase, and benchmark for rapidly comparing chromatin contact maps at scale to enable biological insights into the 3D organization of the genome.
Affiliation(s)
- Laura M. Gunsalus
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Evonne McArthur
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Ketrin Gjoni
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Shuzhen Kuang
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Maureen Pittman
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- John A. Capra
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Katherine S. Pollard
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA