1. Yao Z, Fang K, Liu G, Bjørås M, Jin VX, Wang J. Integrated analysis of differential intra-chromosomal community interactions: A study of breast cancer. Artif Intell Med 2025; 167:103180. [PMID: 40449144 DOI: 10.1016/j.artmed.2025.103180] [Received: 10/02/2024] [Revised: 05/15/2025] [Accepted: 05/23/2025] [Indexed: 06/02/2025]
Abstract
It is challenging to analyze the dynamics of intra-chromosomal interactions when considering multiple high-dimensional epigenetic datasets. A computational approach, differential network analysis in intra-chromosomal community interaction (DNAICI), was proposed here to elucidate these dynamics by integrating Hi-C data with other epigenetic data. DNAICI utilized a novel hyperparameter tuning method that optimizes the network clustering to identify valid intra-chromosomal community interactions at different resolutions. The approach was first trained on Hi-C data and other epigenetic data in an untreated and one-hour estrogen (E2)-treated breast cancer cell line, MCF7, and uncovered two major types of valid intra-chromosomal community interactions (active/repressive) that resemble the properties of A/B compartments (or open/closed chromatin domains). It was further tested on the breast cancer cell line MCF7 and its corresponding tamoxifen-resistant (TR) derivative, MCF7TR, and identified 515 differentially interacting and expressed genes (DIEGs) within intra-chromosomal community interactions. In silico analysis of these DIEGs revealed that endocrine resistance is among the top biological pathways, suggesting an interacting/looping-mediated mechanism in regulating breast cancer tamoxifen resistance. This novel integrated network analysis approach offers a broad application in diverse biological systems for identifying a biological-context-specific differential community interaction.
Affiliation(s)
- Zhihao Yao
- Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway; Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Kun Fang
- Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA; MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Gege Liu
- Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
- Magnar Bjørås
- Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway; Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Victor X Jin
- Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA; MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA.
- Junbai Wang
- Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway.
2. Xie Q, Meng W, Lin S. scHiCSRS: a self-representation smoothing method with Gaussian mixture model for imputing single cell Hi-C data. BMC Bioinformatics 2025; 26:132. [PMID: 40399810 PMCID: PMC12093726 DOI: 10.1186/s12859-025-06147-8] [Received: 07/08/2024] [Accepted: 04/23/2025] [Indexed: 05/23/2025] [Open Access]
Abstract
BACKGROUND Single cell Hi-C (scHi-C) techniques make it possible to study cell-to-cell variability, but an excess of zeros makes scHi-C matrices extremely sparse and difficult to use in downstream analyses. The observed zeros are a combination of two events: structural zeros, for which two loci never interact due to underlying biological mechanisms, and dropouts (sampling zeros), where two loci interact but the interaction is not captured due to insufficient sequencing depth. Although data quality improvement approaches have been proposed, little has been done to differentiate these two types of zeros, even though such a distinction can greatly benefit downstream analysis such as clustering. RESULTS We propose scHiCSRS, a self-representation smoothing method that improves data quality, and a Gaussian mixture model that identifies structural zeros among observed zeros. scHiCSRS not only takes spatial dependencies of a scHi-C data matrix into account but also borrows information from similar single cells. Through an extensive set of simulation studies, we demonstrate the ability of scHiCSRS to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses for three experimental datasets show that data improved by scHiCSRS yield more accurate clustering of cells than simply using observed data or improved data from comparison methods. CONCLUSION In summary, scHiCSRS provides a valuable tool for identifying structural zeros and imputing dropouts. The resulting data are improved for downstream analysis, especially for understanding cell-to-cell variation through subtype clustering.
Affiliation(s)
- Qing Xie
- Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH, 43210, USA
- Wang Meng
- Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, 43205, USA
- Shili Lin
- Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH, 43210, USA.
- Department of Statistics, The Ohio State University, Columbus, OH, 43210, USA.
3. Lee J, Mo HL, Ha Y, Nam DY, Lim G, Park JW, Park S, Choi WY, Lee HJ, Rhee JK. Unraveling the three-dimensional genome structure using machine learning. BMB Rep 2025; 58:203-208. [PMID: 40058875 PMCID: PMC12123201] [Received: 01/24/2024] [Revised: 03/07/2024] [Accepted: 09/06/2024] [Indexed: 05/29/2025] [Open Access]
Abstract
The study of chromatin interactions has advanced considerably with technologies such as high-throughput chromosome conformation capture (Hi-C) sequencing, providing a genome-wide view of physical interactions within the nucleus. These techniques have revealed the existence of hierarchical chromatin structures such as compartments, topologically associating domains (TADs), and chromatin loops, which are crucial in genome organization and regulation. However, identifying and analyzing these structural features require advanced computational methods. In recent years, machine learning approaches, particularly deep learning, have emerged as powerful tools for detecting and analyzing structural information. In this review, we present an overview of various machine learning-based techniques for determining chromosomal organization. Starting with the progress in predicting interactions from DNA sequences, we describe methods for identifying various hierarchical structures from Hi-C data. Additionally, we present advances in enhancing the chromosome contact frequency map resolution to overcome the limitations of Hi-C data. Finally, we identify the remaining challenges and propose potential solutions and future directions. [BMB Reports 2025; 58(5): 203-208].
Affiliation(s)
- Jiho Lee
- School of Systems Biomedical Science, Soongsil University, Seoul 06978, Korea
- Hye-Lim Mo
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Yoon Ha
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Dong Yeon Nam
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Geumnim Lim
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Jeong-Woon Park
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Seoyoung Park
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Woo-Young Choi
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Hyun Ji Lee
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
- Je-Keun Rhee
- School of Systems Biomedical Science, Soongsil University, Seoul 06978, Korea
- Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
4. Wang H, Yang J, Yu X, Zhang Y, Qian J, Wang J. Tensor-FLAMINGO unravels the complexity of single-cell spatial architectures of genomes at high-resolution. Nat Commun 2025; 16:3435. [PMID: 40210623 PMCID: PMC11986053 DOI: 10.1038/s41467-025-58674-w] [Received: 03/15/2024] [Accepted: 03/26/2025] [Indexed: 04/12/2025] [Open Access]
Abstract
The dynamic three-dimensional spatial conformations of chromosomes demonstrate complex structural variations across single cells, which play pivotal roles in modulating single-cell specific transcription and epigenetic landscapes. The high rates of missing contacts in single-cell chromatin contact maps pose significant challenges for reconstructing high-resolution spatial chromatin configurations. We develop a data-driven algorithm, Tensor-FLAMINGO, based on a low-rank tensor completion strategy. Implemented on a diverse panel of single-cell chromatin datasets, Tensor-FLAMINGO generates 10kb- and 30kb-resolution spatial chromosomal architectures across individual cells. Tensor-FLAMINGO achieves superior accuracy in reconstructing 3D chromatin structures, recovering missing contacts, and delineating cell clusters. The unprecedented high-resolution characterization of single-cell genome folding enables expanded identification of single-cell specific long-range chromatin interactions, multi-way spatial hubs, and the mechanisms of disease-associated GWAS variants. Beyond the sparse 2D contact maps, the complete 3D chromatin conformations open an avenue to understand the dynamics of spatially coordinated molecular processes across different cells.
Affiliation(s)
- Hao Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Jiaxin Yang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Xinrui Yu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Yu Zhang
- Department of Microbiology, Genetics, and Immunology, Michigan State University, East Lansing, MI, 48824, USA.
- Jianliang Qian
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
- Jianrong Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
5. Angel JC, El Amraoui N, Gürsoy G. pC-SAC: A method for high-resolution 3D genome reconstruction from low-resolution Hi-C data. Nucleic Acids Res 2025; 53:gkaf289. [PMID: 40226920 PMCID: PMC11995266 DOI: 10.1093/nar/gkaf289] [Received: 07/02/2024] [Revised: 02/25/2025] [Accepted: 03/28/2025] [Indexed: 04/15/2025] [Open Access]
Abstract
The three-dimensional (3D) organization of the genome is crucial for gene regulation, with disruptions linked to various diseases. High-throughput Chromosome Conformation Capture (Hi-C) and related technologies have advanced our understanding of 3D genome organization by mapping interactions between distal genomic regions. However, capturing enhancer-promoter interactions at high resolution remains challenging due to the high sequencing depth required. We introduce pC-SAC (probabilistically Constrained Self-Avoiding Chromatin), a novel computational method for producing accurate high-resolution Hi-C matrices from low-resolution data. pC-SAC uses adaptive importance sampling with sequential Monte Carlo to generate ensembles of 3D chromatin chains that satisfy physical constraints derived from low-resolution Hi-C data. Our method achieves over 95% accuracy in reconstructing high-resolution chromatin maps and identifies novel interactions enriched with candidate cis-regulatory elements (cCREs) and expression quantitative trait loci (eQTLs). Benchmarking against state-of-the-art deep learning models demonstrates pC-SAC's performance in both short- and long-range interaction reconstruction. pC-SAC offers a cost-effective solution for enhancing the resolution of Hi-C data, thus enabling deeper insights into 3D genome organization and its role in gene regulation and disease. Our tool can be found at https://github.com/G2Lab/pCSAC.
Affiliation(s)
- J Carlos Angel
- Department of Molecular Pharmacology and Therapeutics, Columbia University, New York, NY 10032, United States
- New York Genome Center, New York, NY 10013, United States
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Gamze Gürsoy
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- New York Genome Center, New York, NY 10013, United States
- Department of Computer Science, Columbia University, New York, NY 10027, United States
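The pC-SAC abstract above mentions generating ensembles of self-avoiding 3D chromatin chains by sequential Monte Carlo with importance sampling. As a rough illustration of that general idea only, the sketch below grows self-avoiding chains on a cubic lattice and weights them with Rosenbluth-style importance weights. It is a toy under stated assumptions, not the pC-SAC algorithm: the lattice, the function names `grow_chain` and `contact_frequency`, and the contact cutoff are all hypothetical, and the adaptive sampling and Hi-C-derived constraints described in the abstract are not modeled.

```python
import random

# Unit steps on a 3D cubic lattice (a toy stand-in for chromatin bead moves).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow_chain(n_beads, rng):
    """Grow one self-avoiding chain bead by bead; return (chain, weight).

    The weight is the product of the number of free neighbouring sites at
    each growth step (Rosenbluth weight). A weight of 0.0 marks a chain
    that hit a dead end before reaching n_beads.
    """
    chain = [(0, 0, 0)]
    occupied = {chain[0]}
    weight = 1.0
    while len(chain) < n_beads:
        x, y, z = chain[-1]
        free = [(x + dx, y + dy, z + dz) for dx, dy, dz in STEPS
                if (x + dx, y + dy, z + dz) not in occupied]
        if not free:
            return chain, 0.0  # trapped: discard via zero weight
        weight *= len(free)
        nxt = rng.choice(free)
        chain.append(nxt)
        occupied.add(nxt)
    return chain, weight


def contact_frequency(chains, weights, i, j, cutoff=1):
    """Weighted fraction of chains where beads i and j lie within cutoff
    (Manhattan distance on the lattice)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no complete chains were sampled")
    hit = 0.0
    for chain, w in zip(chains, weights):
        if w == 0.0:
            continue  # incomplete chain, may be too short to index
        dist = sum(abs(a - b) for a, b in zip(chain[i], chain[j]))
        if dist <= cutoff:
            hit += w
    return hit / total


rng = random.Random(0)
samples = [grow_chain(30, rng) for _ in range(200)]
chains = [c for c, _ in samples]
weights = [w for _, w in samples]
print(contact_frequency(chains, weights, 0, 3))
```

Averaging a contact indicator over the weighted ensemble yields a model contact map entry; a constrained method would additionally bias or filter the growth steps using observed Hi-C contacts.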
6. Li C, Mowlaei ME, Carnevale V, Kumar S, Shi X. TRUHiC: A TRansformer-embedded U-2 Net to enhance Hi-C data for 3D chromatin structure characterization. bioRxiv 2025:2025.03.29.646133. [PMID: 40236218 PMCID: PMC11996377 DOI: 10.1101/2025.03.29.646133] [Indexed: 04/17/2025]
Abstract
High-throughput chromosome conformation capture sequencing (Hi-C) is a key technology for studying the three-dimensional (3D) structure of genomes and chromatin folding. Hi-C data reveals important patterns of genome organization such as topologically associating domains (TADs) and chromatin loops with critical roles in transcriptional regulation and disease etiology and progression. However, the relatively low resolution of existing Hi-C data often hinders robust and reliable inference of 3D structures. Hence, we propose TRUHiC, a new computational method that leverages recent state-of-the-art deep generative modeling to augment low-resolution Hi-C data for the characterization of 3D chromatin structures. Applying TRUHiC to publicly available Hi-C data for human and mouse, we demonstrate that the augmented data significantly improves the characterization of TADs and loops across diverse cell lines and species. We further present a TRUHiC model pre-trained on human lymphoblastoid cell lines that is adaptable and transferable to improve chromatin characterization of various cell lines, tissues, and species.
7. Gao R, Ferraro TN, Chen L, Zhang S, Chen Y. Enhancing Single-Cell and Bulk Hi-C Data Using a Generative Transformer Model. Biology 2025; 14:288. [PMID: 40136544 PMCID: PMC11940666 DOI: 10.3390/biology14030288] [Received: 02/11/2025] [Revised: 03/01/2025] [Accepted: 03/10/2025] [Indexed: 03/27/2025]
Abstract
The 3D organization of chromatin in the nucleus plays a critical role in regulating gene expression and maintaining cellular functions in eukaryotic cells. High-throughput chromosome conformation capture (Hi-C) and its derivative technologies have been developed to map genome-wide chromatin interactions at the population and single-cell levels. However, insufficient sequencing depth and high noise levels in bulk Hi-C data, particularly in single-cell Hi-C (scHi-C) data, result in low-resolution contact matrices, thereby limiting diverse downstream computational analyses in identifying complex chromosomal organizations. To address these challenges, we developed a transformer-based deep learning model, HiCENT, to impute and enhance both scHi-C and Hi-C contact matrices. Validation experiments on large-scale bulk Hi-C and scHi-C datasets demonstrated that HiCENT achieves superior enhancement effects compared to five popular methods. When applied to real Hi-C data from the GM12878 cell line, HiCENT effectively enhanced 3D structural features at the scales of topologically associated domains and chromosomal loops. Furthermore, when applied to scHi-C data from five human cell lines, it significantly improved clustering performance, outperforming five widely used methods. The adaptability of HiCENT across different datasets and its capacity to improve the quality of chromatin interaction data will facilitate diverse downstream computational analyses in 3D genome research, single-cell studies and other large-scale omics investigations.
Affiliation(s)
- Ruoying Gao
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Thomas N. Ferraro
- Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
8. Taguchi YH, Turki T. Novel AI-powered computational method using tensor decomposition for identification of common optimal bin sizes when integrating multiple Hi-C datasets. Sci Rep 2025; 15:7459. [PMID: 40033014 DOI: 10.1038/s41598-025-91355-8] [Received: 10/22/2024] [Accepted: 02/19/2025] [Indexed: 03/05/2025] [Open Access]
Abstract
Identifying the optimal bin sizes (or resolutions) for the integration of multiple Hi-C datasets is a challenge because bin sizes must be common over multiple datasets. By contrast, the dependence of quality upon bin sizes can vary from dataset to dataset. Moreover, common structures should not be sought in bin sizes smaller than the optimal bin sizes, below which common structure cannot be the primary structure any more even after increasing the number of mapped short reads per bin. In this case, there are no common structures at finer resolutions, suggesting that individual Hi-C datasets may have to be analyzed separately in the bin sizes smaller than the optimal one. Thus, quality assessments of individual datasets have a limited ability to determine the best bin size for all datasets. In this study, we propose a novel application of tensor decomposition (TD) based unsupervised feature extraction (FE) to choose the optimal bin sizes for the integration of multiple Hi-C datasets. TD-based unsupervised FE exhibits phase transition-like phenomena through which the smallest possible bin size (or the highest resolution) can be automatically estimated empirically, without the need to manually set a threshold value, for the integration of multiple Hi-C datasets retrieved from GEO (GEO IDs GSE260760 and GSE255264). To our knowledge, ours is the first method that can optimize bin sizes over multiple Hi-C profiles without any tunable parameters.
Affiliation(s)
- Y-H Taguchi
- Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan.
- Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
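The abstract of entry 8 turns on the notion of a shared bin size across Hi-C datasets. To make that notion concrete, here is a minimal sketch of how a contact matrix at a fine bin size can be re-binned to a coarser one by summing blocks, and how sparsity changes as a result. It illustrates only what "bin size" means; it is not the tensor-decomposition-based feature extraction proposed in the paper. The function names `rebin` and `fraction_nonzero` and the example counts are hypothetical.

```python
def rebin(matrix, factor):
    """Coarsen a square contact matrix by summing factor x factor blocks.

    A matrix binned at, say, 10 kb becomes a matrix at 10 * factor kb.
    If the size is not a multiple of factor, the last bin is partial.
    """
    n = len(matrix)
    m = (n + factor - 1) // factor  # ceiling division
    coarse = [[0] * m for _ in range(m)]
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            coarse[i // factor][j // factor] += count
    return coarse


def fraction_nonzero(matrix):
    """Share of matrix cells with at least one observed contact."""
    n = len(matrix)
    return sum(1 for row in matrix for v in row if v > 0) / (n * n)


# Made-up symmetric contact counts at a fine bin size.
fine = [
    [4, 1, 0, 0],
    [1, 3, 1, 0],
    [0, 1, 5, 2],
    [0, 0, 2, 6],
]
coarse = rebin(fine, 2)
print(coarse)                    # [[9, 1], [1, 15]]
print(fraction_nonzero(fine))    # 0.625
print(fraction_nonzero(coarse))  # 1.0
```

Larger bins collect more reads per cell at the cost of resolution, which is the trade-off that makes a common bin size across datasets non-trivial to choose.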
9. Menon R, Mohit Chowdhury H, Oluwadare O. ScHiCAtt: Enhancing single-cell Hi-C data resolution using attention-based models. Comput Struct Biotechnol J 2025; 27:978-991. [PMID: 40160860 PMCID: PMC11953966 DOI: 10.1016/j.csbj.2025.02.031] [Received: 12/16/2024] [Revised: 02/24/2025] [Accepted: 02/25/2025] [Indexed: 04/02/2025] [Open Access]
Abstract
The spatial organization of chromatin is fundamental to gene regulation and essential for proper cellular function. The Hi-C technique remains one of the leading methods for unraveling 3D genome structures; however, limited resolution, data sparsity, and incomplete coverage in single-cell Hi-C data pose significant challenges for comprehensive analysis. Traditional convolutional neural network-based models often suffer from blurring and loss of fine details, while generative adversarial network based methods encounter difficulties in maintaining diversity and generalization. Moreover, existing algorithms perform poorly in cross-cell line generalization, where a model trained on one cell type is used to enhance high-resolution data in another cell type. To address these limitations, we propose ScHiCAtt (Single-cell Hi-C Attention-Based Model), which leverages attention mechanisms to capture both long-range and local dependencies in Hi-C data, significantly enhancing resolution while preserving biologically meaningful interactions. By dynamically focusing on regions of interest, attention mechanisms effectively mitigate data sparsity and enhance model performance in low-resolution contexts. Extensive experiments on Human and Drosophila single-cell Hi-C data demonstrate that ScHiCAtt consistently outperforms existing methods in terms of computational and biological reproducibility metrics across various downsampling ratios. Our results also show superior generalization across different chromosomes of the same cell type, as well as across cell types, species, and from single-cell to bulk Hi-C data, highlighting the robustness and adaptability of our approach. ScHiCAtt source code is publicly available at https://github.com/OluwadareLab/ScHiCAtt.
Affiliation(s)
- Rohit Menon
- Department of Computer Science, University of Colorado at Colorado Springs, Colorado Springs, 80918, CO, USA
- H.M.A. Mohit Chowdhury
- Department of Computer Science, University of Colorado at Colorado Springs, Colorado Springs, 80918, CO, USA
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado at Colorado Springs, Colorado Springs, 80918, CO, USA
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
10. Li K, Zhang P, Xu J, Wen Z, Zhang J, Zi Z, Li L. COCOA: A Framework for Fine-scale Mapping of Cell-type-specific Chromatin Compartments Using Epigenomic Information. Genomics Proteomics Bioinformatics 2025; 22:qzae091. [PMID: 39724385 PMCID: PMC11993304 DOI: 10.1093/gpbjnl/qzae091] [Received: 11/30/2023] [Revised: 11/05/2024] [Accepted: 12/09/2024] [Indexed: 12/28/2024]
Abstract
Chromatin compartmentalization and epigenomic modifications play crucial roles in cell differentiation and disease development. However, precise mapping of chromatin compartment patterns requires Hi-C or Micro-C data at high sequencing depth. Exploring the systematic relationship between epigenomic modifications and compartment patterns remains challenging. To address these issues, we present COCOA, a deep neural network framework using convolution and attention mechanisms to infer fine-scale chromatin compartment patterns from six histone modification signals. COCOA extracts 1D track features through bidirectional feature reconstruction after resolution-specific binning of epigenomic signals. These track features are then cross-fused with contact features using an attention mechanism and transformed into chromatin compartment patterns through residual feature reduction. COCOA demonstrates accurate inference of chromatin compartmentalization at a fine-scale resolution and exhibits stable performance on test sets. Additionally, we explored the impact of histone modifications on chromatin compartmentalization prediction through in silico epigenomic perturbation experiments. Unlike obscure compartments observed in high-depth experimental data at 1-kb resolution, COCOA generates clear and detailed compartment patterns, highlighting its superior performance. Finally, we demonstrate that COCOA enables cell-type-specific prediction of unrevealed chromatin compartment patterns in various biological processes, making it an effective tool for gaining insights into chromatin compartmentalization from epigenomics in diverse biological scenarios. The COCOA Python code is publicly available at https://github.com/onlybugs/COCOA and https://ngdc.cncb.ac.cn/biocode/tools/BT007498.
Affiliation(s)
- Kai Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinsheng Xu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zi Wen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Junying Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhike Zi
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Hubei Hongshan Laboratory, Wuhan 430070, China
11. Chen C, Zhang Z, Liu Y, Hong W, Karahan H, Wang J, Li W, Diao L, Yu M, Saykin AJ, Nho K, Kim J, Han L. Comprehensive characterization of the transcriptional landscape in Alzheimer's disease (AD) brains. Science Advances 2025; 11:eadn1927. [PMID: 39752483 PMCID: PMC11698078 DOI: 10.1126/sciadv.adn1927] [Received: 11/27/2023] [Accepted: 11/26/2024] [Indexed: 01/06/2025]
Abstract
Alzheimer's disease (AD) is the leading dementia among the elderly with complex origins. Despite extensive investigation into the AD-associated protein-coding genes, the involvement of noncoding RNAs (ncRNAs) and posttranscriptional modification (PTM) in AD pathogenesis remains unclear. Here, we comprehensively characterized the landscape of ncRNAs and PTM events in 1460 samples across six brain regions sourced from the Mount Sinai/JJ Peters VA Medical Center Brain Bank Study and Mayo cohorts, encompassing 33,321 long ncRNAs, 92,897 enhancer RNAs, 53,763 alternative polyadenylation events, and 900,221 A-to-I RNA editing events. We additionally identified 25,351 aberrantly expressed ncRNAs and altered PTM events associated with AD traits and further identified the corresponding protein-coding genes to construct regulatory networks. Furthermore, we developed a user-friendly data portal, ADatlas, facilitating users in exploring our results. Our study aims to establish a comprehensive data platform for ncRNAs and PTMs in AD to advance related research.
Affiliation(s)
- Chengxuan Chen
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
- Zhao Zhang
- Department of Biochemistry and Molecular Biology, McGovern Medical School, University of Texas Health Science Center, Houston, TX 77030, USA
- Yuan Liu
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
- Wei Hong
- Department of Biochemistry and Molecular Biology, McGovern Medical School, University of Texas Health Science Center, Houston, TX 77030, USA
- Hande Karahan
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Jun Wang
- Department of Pediatrics, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- The University of Texas MD Anderson Cancer Center and UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Wenbo Li
- Department of Biochemistry and Molecular Biology, McGovern Medical School, University of Texas Health Science Center, Houston, TX 77030, USA
- The University of Texas MD Anderson Cancer Center and UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Lixia Diao
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Meichen Yu
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana University Network Science Institute, Bloomington, IN, USA
- Andrew J. Saykin
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana University Network Science Institute, Bloomington, IN, USA
- Kwangsik Nho
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Jungsu Kim
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Leng Han
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
12
|
Li C, Bonder MJ, Syed S, Jensen M, Gerstein MB, Zody MC, Chaisson MJP, Talkowski ME, Marschall T, Korbel JO, Eichler EE, Lee C, Shi X. An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes. Genome Res 2024; 34:2304-2318. [PMID: 39638559 PMCID: PMC11694747 DOI: 10.1101/gr.279419.124] [Received: 03/29/2024] [Accepted: 10/04/2024] [Indexed: 12/07/2024]
Abstract
The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, which separate adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes by disrupting TADs, which play an essential role in insulating genes from aberrant regulation by outside regulatory elements. However, how SVs affect 3D genome structure, and how they are associated with different aspects of gene regulation and candidate cis-regulatory elements (cCREs), has rarely been studied systematically. Here, we assess the impact of SVs intersecting with TAD boundaries by developing an integrative Hi-C analysis pipeline, which enables the generation of an in-depth catalog of TADs and TAD boundaries in human lymphoblastoid cell lines (LCLs) to fill the gap of limited resources. Our catalog contains 18,865 TADs, including 4596 sub-TADs, with 185 SVs (TAD-SVs) that alter chromatin architecture. By leveraging the ENCODE registry of cCREs in humans, we determine that 34 of 185 TAD-SVs intersect with cCREs and observe significant enrichment of TAD-SVs within cCREs. This study provides a database of TADs and TAD-SVs in the human genome that will facilitate future investigations of the impact of SVs on chromatin structure and gene regulation in health and disease.
Affiliation(s)
- Chong Li
- Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, Pennsylvania 19122, USA
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania 19122, USA
| | - Marc Jan Bonder
- Department of Genetics, Groningen, University of Groningen, University Medical Center Groningen, Groningen 9713 AV, Netherlands
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, 69120 Heidelberg, Germany
| | - Sabriya Syed
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Matthew Jensen
- Department of Molecular Biochemistry and Biophysics, Yale University, New Haven, Connecticut 06510, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
| | - Mark B Gerstein
- Department of Molecular Biochemistry and Biophysics, Yale University, New Haven, Connecticut 06510, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
| | | | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
| | - Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, Connecticut 06030-6403, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, Pennsylvania 19122, USA
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania 19122, USA
|
13
|
Tavallaee G, Orouji E. Mapping the 3D genome architecture. Comput Struct Biotechnol J 2024; 27:89-101. [PMID: 39816913 PMCID: PMC11732852 DOI: 10.1016/j.csbj.2024.12.018] [Received: 11/05/2024] [Revised: 12/17/2024] [Accepted: 12/20/2024] [Indexed: 01/18/2025] Open
Abstract
The spatial organization of the genome plays a critical role in regulating gene expression, cellular differentiation, and genome stability. This review provides an in-depth examination of the methodologies, computational tools, and frameworks developed to map the three-dimensional (3D) architecture of the genome, focusing on both ligation-based and ligation-free techniques. We also explore the limitations of these methods, including biases introduced by restriction enzyme digestion and ligation inefficiencies, and compare them to more recent ligation-free approaches such as Genome Architecture Mapping (GAM) and Split-Pool Recognition of Interactions by Tag Extension (SPRITE). These techniques offer unique insights into higher-order chromatin structures by bypassing ligation steps, thus enabling the capture of complex multi-way interactions that are often challenging to resolve with traditional methods. Furthermore, we discuss the integration of chromatin interaction data with other genomic layers through multimodal approaches, including recent advances in single-cell technologies like sci-HiC and scSPRITE, which help unravel the heterogeneity of chromatin architecture in development and disease.
Affiliation(s)
- Ghazaleh Tavallaee
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Elias Orouji
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
|
14
|
Kenari NS, Bayat F, Libbrecht MW. VSS-Hi-C: variance-stabilized signals for chromatin contacts. Bioinformatics 2024; 40:btae715. [PMID: 39658249 PMCID: PMC11648998 DOI: 10.1093/bioinformatics/btae715] [Received: 05/31/2024] [Revised: 10/29/2024] [Accepted: 12/07/2024] [Indexed: 12/12/2024] Open
Abstract
MOTIVATION: The genome-wide chromosome conformation capture assay Hi-C is widely used to study chromatin 3D structures and their functional implications. Read counts from Hi-C indicate the strength of chromatin contact between each pair of genomic loci. These read counts are heteroskedastic: that is, a difference between interaction frequencies of 0 and 100 is much more significant than a difference between interaction frequencies of 1000 and 1100. This property impedes visualization and downstream analysis because it violates the Gaussian assumption of many computational tools. Thus, heuristic transformations aimed at stabilizing the variance of signals, such as the shifted-log transformation, are typically applied to the data before visualization and before input to models with Gaussian assumptions. However, such heuristic transformations cannot fully stabilize the variance because of their restrictive assumptions about the mean-variance relationship in the data. RESULTS: Here, we present VSS-Hi-C, a data-driven variance stabilization method for Hi-C data. We show that VSS-Hi-C signals have unit variance, improving visualization of Hi-C data, for example in heatmap contact maps. VSS-Hi-C signals also improve the performance of subcompartment callers relying on Gaussian observations. VSS-Hi-C is implemented as an R package and can be used for variance stabilization of different genomic and epigenomic data types with two replicates available. AVAILABILITY AND IMPLEMENTATION: https://github.com/nedashokraneh/vssHiC.
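The heteroskedasticity argument in this abstract can be illustrated with the shifted-log heuristic it mentions. The following is a minimal Python sketch of that heuristic only, not of VSS-Hi-C itself; the function name `shifted_log` and the shift of 1 are assumptions chosen for illustration.

```python
import math


def shifted_log(count, shift=1.0):
    """Shifted-log heuristic: log(count + shift).

    The shift of 1.0 is an illustrative assumption; it keeps log(0) defined.
    """
    if count < 0:
        raise ValueError("read counts must be non-negative")
    return math.log(count + shift)


# The two count differences used as an example in the abstract.
low_gap = shifted_log(100) - shifted_log(0)       # 0 vs 100 reads
high_gap = shifted_log(1100) - shifted_log(1000)  # 1000 vs 1100 reads

print(f"transformed gap, 0 vs 100:      {low_gap:.4f}")
print(f"transformed gap, 1000 vs 1100:  {high_gap:.4f}")
```

On the transformed scale the same raw difference of 100 reads is much larger at low counts than at high counts, which matches the intuition stated in the abstract. The transform fixes one mean-variance relationship in advance, which is the limitation the abstract attributes to heuristic approaches.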
Affiliation(s)
- Neda Shokraneh Kenari
- Department of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
| | - Faezeh Bayat
- Department of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
| | - Maxwell W Libbrecht
- Department of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
|
15
|
Wang Y, Kong S, Zhou C, Wang Y, Zhang Y, Fang Y, Li G. A review of deep learning models for the prediction of chromatin interactions with DNA and epigenomic profiles. Brief Bioinform 2024; 26:bbae651. [PMID: 39708837 DOI: 10.1093/bib/bbae651] [Received: 09/23/2024] [Revised: 10/29/2024] [Accepted: 12/03/2024] [Indexed: 12/23/2024] Open
Abstract
Advances in three-dimensional (3D) genomics have revealed the spatial characteristics of chromatin interactions in gene expression regulation, which is crucial for understanding molecular mechanisms in biological processes. High-throughput technologies like ChIA-PET, Hi-C, and their derivative methods have greatly enhanced our knowledge of 3D chromatin architecture. However, the chromatin interaction mechanisms remain largely unexplored. Deep learning, with its powerful feature extraction and pattern recognition capabilities, offers a promising approach for integrating multi-omics data to build accurate predictive models of chromatin interaction matrices. This review systematically summarizes recent advances in chromatin interaction matrix prediction models that integrate DNA sequences and epigenetic signals. It details various models, focusing on how one-dimensional (1D) information is transformed into the 3D structure of chromatin interactions, and how the integration of different deep learning modules affects model accuracy. Additionally, we discuss the critical role of DNA sequence information and epigenetic markers in shaping 3D genome interaction patterns. Finally, this review addresses the challenges in predicting chromatin interaction matrices, with the aim of improving the precise mapping between chromatin interaction matrices and DNA sequence and supporting the transformation and theoretical development of 3D genomics across biological systems.
Affiliation(s)
- Yunlong Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
| | - Siyuan Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
| | - Cong Zhou
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
| | - Yanfang Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), No. 2 West Yuanmingyuan Rd, Haidian District, Beijing 100193, China
| | - Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
- Sequencing Facility, Frederick National Laboratory for Cancer Research, 8560 Progress Drive, Frederick, MD 21701, United States
| | - Yaping Fang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
| | - Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
|
16
|
Gao VR, Yang R, Das A, Luo R, Luo H, McNally DR, Karagiannidis I, Rivas MA, Wang ZM, Barisic D, Karbalayghareh A, Wong W, Zhan YA, Chin CR, Noble WS, Bilmes JA, Apostolou E, Kharas MG, Béguelin W, Viny AD, Huangfu D, Rudensky AY, Melnick AM, Leslie CS. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. Nat Commun 2024; 15:9432. [PMID: 39487131 PMCID: PMC11530433 DOI: 10.1038/s41467-024-53628-0] [Received: 10/18/2023] [Accepted: 10/14/2024] [Indexed: 11/04/2024] Open
Abstract
Identifying cell-type-specific 3D chromatin interactions between regulatory elements can help decipher gene regulation and interpret disease-associated non-coding variants. However, achieving this resolution with current 3D genomics technologies is often infeasible given limited input cell numbers. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps, including regulatory interactions, from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility across metacells, and a CTCF motif track as inputs and employs a lightweight architecture to train on standard GPUs. Trained on paired scATAC-seq and Hi-C data in human samples, ChromaFold accurately predicts the 3D contact map and peak-level interactions across diverse human and mouse test cell types. Compared to leading contact map prediction models that use ATAC-seq and CTCF ChIP-seq, ChromaFold achieves state-of-the-art performance using only scATAC-seq. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations.
Affiliation(s)
- Vianne R Gao
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
| | - Rui Yang
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
| | - Arnav Das
- University of Washington, Seattle, WA, USA
| | - Renhe Luo
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
| | - Hanzhi Luo
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Dylan R McNally
- Caryl and Israel Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
| | - Ioannis Karagiannidis
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Martin A Rivas
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Department of Biochemistry & Molecular Biology; Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Zhong-Min Wang
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Darko Barisic
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Alireza Karbalayghareh
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Wilfred Wong
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
| | - Yingqian A Zhan
- Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Christopher R Chin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | | | | | - Effie Apostolou
- Joan and Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Michael G Kharas
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Wendy Béguelin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Aaron D Viny
- Departments of Medicine, Division of Hematology & Oncology, and of Genetics & Development, Columbia Stem Cell Initiative, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
| | - Danwei Huangfu
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
| | - Alexander Y Rudensky
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Ari M Melnick
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Christina S Leslie
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
|
17
|
Bera P, Mondal J. Machine learning unravels inherent structural patterns in Escherichia coli Hi-C matrices and predicts chromosome dynamics. Nucleic Acids Res 2024; 52:10836-10849. [PMID: 39217471 PMCID: PMC11472170 DOI: 10.1093/nar/gkae749] [Received: 03/08/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] Open
Abstract
The high-dimensional nature of the chromosomal conformation contact map ('Hi-C map'), even for a microscopically small bacterial cell, poses challenges for extracting meaningful information about its complex organization. Here we first demonstrate that a low-dimensional representation of a recently reported Hi-C interaction map of the archetypal bacterium Escherichia coli, learnt by an artificial deep neural network (machine learning, ML), can decode crucial underlying structural patterns. The ML-derived representation of the Hi-C map can automatically detect a set of spatially distinct domains across the E. coli genome that are reminiscent of the six putative macro-domains previously posited via recombination assays. Subsequently, an ML-generated model assimilates the intricate relationship between a large array of Hi-C-derived chromosomal contact probabilities and the diffusive dynamics of each individual chromosomal gene, and identifies an optimal number of functionally important chromosomal contact pairs that are chiefly responsible for the heterogeneous, coordinate-dependent sub-diffusive motions of chromosomal loci. Finally, the ML models, trained on wild-type E. coli, showcase their predictive capabilities on mutant bacterial strains, shedding light on the structural and dynamic nuances of ΔMatP30MM and ΔMukBEF22MM chromosomes. Overall, our results illuminate the power of ML techniques in unraveling the complex relationship between structure and dynamics of bacterial chromosomal loci, promising meaningful connections between ML-derived insights and biological phenomena.
Affiliation(s)
- Palash Bera
- Tata Institute of Fundamental Research Hyderabad, Telangana 500046, India
| | - Jagannath Mondal
- Tata Institute of Fundamental Research Hyderabad, Telangana 500046, India
|
18
|
Murtaza G, Wagner J, Zook JM, Singh R. GrapHiC: An integrative graph based approach for imputing missing Hi-C reads. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; PP:10.1109/TCBB.2024.3477909. [PMID: 39392732 PMCID: PMC12034241 DOI: 10.1109/tcbb.2024.3477909] [Indexed: 10/13/2024]
Abstract
Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types. Availability: https://github.com/rsinghlab/GrapHiC.
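The graph representation described in this abstract (one node per genomic bin, edge weights from observed Hi-C reads, ChIP-seq and relative position as node attributes) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the GrapHiC implementation: the names `BinGraph` and `build_bin_graph`, the toy matrices, and the choice to scale position to the range 0 to 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BinGraph:
    node_features: list  # per bin: [chip_signal_1, ..., relative_position]
    edges: dict          # (i, j) with i < j -> observed Hi-C read count


def build_bin_graph(contacts, chip_tracks):
    """Build a graph from a square Hi-C contact matrix and ChIP-seq tracks.

    contacts    : n x n list of lists of read counts (assumed symmetric)
    chip_tracks : list of tracks, each a list of n per-bin signal values
    """
    n = len(contacts)
    if any(len(row) != n for row in contacts):
        raise ValueError("contact matrix must be square")
    if any(len(track) != n for track in chip_tracks):
        raise ValueError("each ChIP-seq track needs one value per bin")

    node_features = []
    for i in range(n):
        # Relative position along the region, scaled to [0, 1] (an assumption).
        rel_pos = i / (n - 1) if n > 1 else 0.0
        node_features.append([track[i] for track in chip_tracks] + [rel_pos])

    # Keep only the upper triangle and skip bin pairs with no observed reads.
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j] > 0:
                edges[(i, j)] = float(contacts[i][j])
    return BinGraph(node_features, edges)


# Toy example: three bins and one ChIP-seq track (values are made up).
toy_contacts = [[5, 2, 0],
                [2, 4, 1],
                [0, 1, 3]]
toy_chip = [[0.1, 0.9, 0.4]]
graph = build_bin_graph(toy_contacts, toy_chip)
print(graph.edges)
print(graph.node_features)
```

Storing edges sparsely reflects the setting the abstract emphasizes, where the input Hi-C map has low read coverage and many bin pairs have no observed contacts.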
|
19
|
Zhou Y, Li T, Choppavarapu L, Fang K, Lin S, Jin VX. Integration of scHi-C and scRNA-seq data defines distinct 3D-regulated and biological-context dependent cell subpopulations. Nat Commun 2024; 15:8310. [PMID: 39333113 PMCID: PMC11436782 DOI: 10.1038/s41467-024-52440-0] [Received: 05/29/2024] [Accepted: 09/06/2024] [Indexed: 09/29/2024] Open
Abstract
An integration of 3D chromatin structure and gene expression at single-cell resolution has yet to be demonstrated. Here, we develop a computational method, a multiomic data integration (MUDI) algorithm, which integrates scHi-C and scRNA-seq data to precisely define the 3D-regulated and biological-context dependent cell subpopulations, or topologically integrated subpopulations (TISPs). We demonstrate its algorithmic utility on publicly available and newly generated scHi-C and scRNA-seq data. We then test and apply MUDI in a breast cancer cell model system to demonstrate its biological-context dependent utility. We find that the newly defined topologically conserved associating domain (CAD) is the characteristic single-cell 3D chromatin structure and better characterizes chromatin domains at single-cell resolution. We further identify 20 TISPs uniquely characterizing 3D-regulated breast cancer cellular states. We reveal that two of the TISPs remarkably resemble high-cycling breast cancer persister cells and that chromatin modifying enzymes might be functional regulators driving the alteration of the 3D chromatin structures. Our comprehensive integration of scHi-C and scRNA-seq data in cancer cells at single-cell resolution provides mechanistic insights into 3D-regulated heterogeneity of developing drug-tolerant cancer cells.
Affiliation(s)
- Yufan Zhou
- Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
| | - Tian Li
- Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
| | - Lavanya Choppavarapu
- Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
- MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
| | - Kun Fang
- Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
- MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
| | - Shili Lin
- Department of Statistics, The Ohio State University, Columbus, OH, USA
| | - Victor X Jin
- Division of Biostatistics, The Medical College of Wisconsin, Milwaukee, WI, USA
- MCW Cancer Center, The Medical College of Wisconsin, Milwaukee, WI, USA
|
20
|
Zhou B, Liu Q, Wang M, Wu H. Deep neural network models for cell type prediction based on single-cell Hi-C data. BMC Genomics 2024; 22:922. [PMID: 39285318 PMCID: PMC11406723 DOI: 10.1186/s12864-024-10764-7] [Received: 10/13/2020] [Accepted: 09/02/2024] [Indexed: 09/19/2024] Open
Abstract
BACKGROUND: Cell type prediction is crucial to cell type identification in genomics, cancer diagnosis and drug development, and it can address the time-consuming and difficult problem of cell classification in biological experiments. Therefore, a computational method is urgently needed to classify and predict cell types using single-cell Hi-C data. Previous studies have lacked a convenient and accurate method to predict cell types based on single-cell Hi-C data. Deep neural networks can form complex representations of single-cell Hi-C data and make it possible to handle multidimensional and sparse biological datasets. RESULTS: We compare the performance of SCANN with existing methods and analyze the model using five different evaluation metrics. When using only the ML1 and ML3 datasets, the ARI and NMI values of SCANN increase by 14% and 11%, respectively, over those of scHiCluster. When using all six libraries of data, the ARI and NMI values of SCANN increase by 63% and 88%, respectively, over those of scHiCluster. These findings show that SCANN is highly accurate in predicting the type of independent cell samples using single-cell Hi-C data. CONCLUSIONS: SCANN enhances the training speed and requires fewer resources for predicting cell types. In addition, when the number of cells in different cell types is extremely unbalanced, SCANN has higher stability and flexibility in solving cell classification and cell type prediction using single-cell Hi-C data. This prediction method can assist biologists in studying the differences in chromosome structure between cell types.
Affiliation(s)
- Bing Zhou
- School of Software, Shandong University, Jinan, Shandong, 250100, China
- College of Information Engineering, Northwest A&F University, 712100, Yangling, Shaanxi, China
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, 712100, Yangling, Shaanxi, China
| | - Meili Wang
- College of Information Engineering, Northwest A&F University, 712100, Yangling, Shaanxi, China
| | - Hao Wu
- School of Software, Shandong University, Jinan, Shandong, 250100, China
|
21
|
Murtaza G, Butaney B, Wagner J, Singh R. scGrapHiC: deep learning-based graph deconvolution for Hi-C using single cell gene expression. Bioinformatics 2024; 40:i490-i500. [PMID: 38940151 PMCID: PMC11256916 DOI: 10.1093/bioinformatics/btae223] [Indexed: 06/29/2024] Open
Abstract
SUMMARY: The single-cell Hi-C (scHi-C) protocol helps identify cell-type-specific chromatin interactions and sheds light on cell differentiation and disease progression. Despite providing crucial insights, scHi-C data is often underutilized due to the high cost and the complexity of the experimental protocol. We present a deep learning framework, scGrapHiC, that predicts pseudo-bulk scHi-C contact maps using pseudo-bulk scRNA-seq data. Specifically, scGrapHiC performs graph deconvolution to extract genome-wide single-cell interactions from a bulk Hi-C contact map using scRNA-seq as a guiding signal. Our evaluations show that scGrapHiC, trained on seven cell-type co-assay datasets, outperforms typical sequence encoder approaches. For example, scGrapHiC achieves a substantial improvement of 23.2% in recovering cell-type-specific Topologically Associating Domains over the baselines. It also generalizes to unseen embryo and brain tissue samples. scGrapHiC is a novel method for generating cell-type-specific scHi-C contact maps from widely available genomic signals, enabling the study of cell-type-specific chromatin interactions. AVAILABILITY AND IMPLEMENTATION: The GitHub repository https://github.com/rsinghlab/scGrapHiC contains the source code of scGrapHiC and associated scripts to preprocess publicly available datasets and to produce the results and visualizations discussed in this manuscript.
Affiliation(s)
- Ghulam Murtaza
- Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
- Byron Butaney
- Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, United States
- Ritambhara Singh
- Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
- Center for Computational Molecular Biology, Brown University, 164 Angell Street, Providence, RI, 02912, United States
22
Fang T, Liu Y, Woicik A, Lu M, Jha A, Wang X, Li G, Hristov B, Liu Z, Xu H, Noble WS, Wang S. Enhancing Hi-C contact matrices for loop detection with Capricorn: a multiview diffusion model. Bioinformatics 2024; 40:i471-i480. [PMID: 38940142 PMCID: PMC11211821 DOI: 10.1093/bioinformatics/btae211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
MOTIVATION High-resolution Hi-C contact matrices reveal the detailed three-dimensional architecture of the genome, but high-coverage experimental Hi-C data are expensive to generate. Simultaneously, chromatin structure analyses struggle with extremely sparse contact matrices. To address this problem, computational methods to enhance low-coverage contact matrices have been developed, but existing methods are largely based on resolution enhancement methods for natural images and hence often employ models that do not distinguish between biologically meaningful contacts, such as loops, and other stochastic contacts. RESULTS We present Capricorn, a machine learning model for Hi-C resolution enhancement that incorporates small-scale chromatin features as additional views of the input Hi-C contact matrix and leverages a diffusion probability model backbone to generate a high-coverage matrix. We show that Capricorn outperforms the state of the art in a cross-cell-line setting, improving on existing methods by 17% in mean squared error and 26% in F1 score for chromatin loop identification from the generated high-coverage data. We also demonstrate that Capricorn performs well in the cross-chromosome setting and cross-chromosome, cross-cell-line setting, improving the downstream loop F1 score by 14% relative to existing methods. We further show that our multiview idea can also be used to improve several existing methods, HiCARN and HiCNN, indicating the wide applicability of this approach. Finally, we use DNA sequence to validate discovered loops and find that the fraction of CTCF-supported loops from Capricorn is similar to those identified from the high-coverage data. Capricorn is a powerful Hi-C resolution enhancement method that enables scientists to find chromatin features that cannot be identified in the low-coverage contact matrix.
AVAILABILITY AND IMPLEMENTATION Implementation of Capricorn and source code for reproducing all figures in this paper are available at https://github.com/CHNFTQ/Capricorn.
Affiliation(s)
- Tangqi Fang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Yifeng Liu
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Addie Woicik
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Minsi Lu
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Anupama Jha
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, United States
- Gang Li
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- eScience Institute, University of Washington, Seattle, WA 98195, United States
- Borislav Hristov
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Zixuan Liu
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Hanwen Xu
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- William S Noble
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Sheng Wang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
23
Zhu H, Liu T, Wang Z. C2c: Predicting Micro-C from Hi-C. Genes (Basel) 2024; 15:673. [PMID: 38927609 PMCID: PMC11203216 DOI: 10.3390/genes15060673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 05/16/2024] [Accepted: 05/20/2024] [Indexed: 06/28/2024]
Abstract
MOTIVATION High-resolution Hi-C data, capable of detecting chromatin features below the level of Topologically Associating Domains (TADs), significantly enhance our understanding of gene regulation. Micro-C, a variant of Hi-C incorporating a micrococcal nuclease (MNase) digestion step to examine interactions between nucleosome pairs, has been developed to overcome the resolution limitations of Hi-C. However, Micro-C experiments pose greater technical challenges compared to Hi-C, owing to the need for precise MNase digestion control and higher-resolution sequencing. Therefore, developing computational methods to derive Micro-C data from existing Hi-C datasets could lead to better usage of a large amount of existing Hi-C data in the scientific community and cost savings. RESULTS We developed C2c ("high" or upper case C to "micro" or lower case c), a computational tool based on a residual neural network to learn the mapping between Hi-C and Micro-C contact matrices and then predict Micro-C contact matrices based on Hi-C contact matrices. Our evaluation results show that the predicted Micro-C contact matrices reveal more chromatin loops than the input Hi-C contact matrices, and more of the loops detected from predicted Micro-C match the promoter-enhancer interactions. Furthermore, we found that the mutual loops from real and predicted Micro-C better match the ChIA-PET data compared to Hi-C and real Micro-C loops, and the predicted Micro-C leads to more TAD-boundaries detected compared to the Hi-C data. The website URL of C2c can be found in the Data Availability Statement.
Affiliation(s)
- Zheng Wang
- Department of Computer Science, University of Miami, 330M Ungar Building, 1365 Memorial Drive, Coral Gables, FL 33124-4245, USA; (H.Z.); (T.L.)
24
Wang Y, Cheng J. HiCDiff: single-cell Hi-C data denoising with diffusion models. Brief Bioinform 2024; 25:bbae279. [PMID: 38856167 PMCID: PMC11163381 DOI: 10.1093/bib/bbae279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/21/2024] [Accepted: 05/29/2024] [Indexed: 06/11/2024]
Abstract
The genome-wide single-cell chromosome conformation capture technique, i.e. single-cell Hi-C (scHi-C), was recently developed to interrogate the conformation of the genome of individual cells. However, single-cell Hi-C data are much sparser than bulk Hi-C data of a population of cells, and noise in single-cell Hi-C makes it difficult to apply and analyze them in biological research. Here, we developed the first generative diffusion models (HiCDiff) to denoise single-cell Hi-C data in the form of chromosomal contact matrices. HiCDiff uses a deep residual network to remove the noise in the reverse process of diffusion and can be trained in both unsupervised and supervised learning modes. Benchmarked on several single-cell Hi-C test datasets, the diffusion models substantially remove the noise in single-cell Hi-C data. The unsupervised HiCDiff outperforms most supervised non-diffusion deep learning methods and achieves performance comparable to the state-of-the-art supervised deep learning method in terms of multiple metrics, demonstrating that diffusion models are a useful approach to denoising single-cell Hi-C data. Moreover, its good performance also holds when denoising bulk Hi-C data.
Affiliation(s)
- Yanli Wang
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
25
Xu J, Xu X, Huang D, Luo Y, Lin L, Bai X, Zheng Y, Yang Q, Cheng Y, Huang A, Shi J, Bo X, Gu J, Chen H. A comprehensive benchmarking with interpretation and operational guidance for the hierarchy of topologically associating domains. Nat Commun 2024; 15:4376. [PMID: 38782890 PMCID: PMC11116433 DOI: 10.1038/s41467-024-48593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 05/03/2024] [Indexed: 05/25/2024]
Abstract
Topologically associating domains (TADs), megabase-scale features of chromatin spatial architecture, are organized in a domain-within-domain TAD hierarchy. Within TADs, the inner and smaller subTADs not only manifest cell-to-cell variability, but also precisely regulate transcription and differentiation. Although over 20 TAD callers are able to detect TADs, their usability in biomedicine is confined by disagreement among their outputs and a limited understanding of TAD hierarchy. We compare 13 computational tools across various conditions and develop a metric to evaluate the similarity of TAD hierarchy. Although outputs of TAD hierarchy at each level vary among callers, data resolutions, sequencing depths, and matrix normalization methods, they are more consistent when they have a higher similarity of larger TADs. We present comprehensive benchmarking of TAD hierarchy callers and operational guidance for life science researchers. Moreover, by simulating the mixing of different types of cells, we confirm that TAD hierarchy is generated not simply from stacking Hi-C heatmaps of heterogeneous cells. Finally, we propose an air conditioner model to decipher the role of TAD hierarchy in transcription.
Affiliation(s)
- Jingxuan Xu
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Xiang Xu
- Academy of Military Medical Science, Beijing, 100850, China
- Dandan Huang
- Department of Oncology, Peking University Shougang Hospital, Beijing, China
- Center for Precision Diagnosis and Treatment of Colorectal Cancer and Inflammatory Diseases, Peking University Health Science Center, Beijing, China
- Yawen Luo
- Academy of Military Medical Science, Beijing, 100850, China
- Lin Lin
- Academy of Military Medical Science, Beijing, 100850, China
- School of Computer Science and Information Technology & KLAS, Northeast Normal University, Changchun, China
- Xuemei Bai
- Academy of Military Medical Science, Beijing, 100850, China
- Yang Zheng
- Academy of Military Medical Science, Beijing, 100850, China
- Qian Yang
- Academy of Military Medical Science, Beijing, 100850, China
- Yu Cheng
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- An Huang
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Jingyi Shi
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Xiaochen Bo
- Academy of Military Medical Science, Beijing, 100850, China.
- Jin Gu
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing, 100142, China.
- Department of Oncology, Peking University Shougang Hospital, Beijing, China.
- Center for Precision Diagnosis and Treatment of Colorectal Cancer and Inflammatory Diseases, Peking University Health Science Center, Beijing, China.
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China.
- Peking University International Cancer Institute, Beijing, China.
- Hebing Chen
- Academy of Military Medical Science, Beijing, 100850, China.
26
Yoon I, Kim U, Jung KO, Song Y, Park T, Lee DS. 3C methods in cancer research: recent advances and future prospects. Exp Mol Med 2024; 56:788-798. [PMID: 38658701 PMCID: PMC11059347 DOI: 10.1038/s12276-024-01236-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/17/2023] [Revised: 03/15/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024]
Abstract
In recent years, Hi-C technology has revolutionized cancer research by elucidating the mystery of three-dimensional chromatin organization and its role in gene regulation. This paper explored the impact of Hi-C advancements on cancer research by delving into high-resolution techniques, such as chromatin loops, structural variants, haplotype phasing, and extrachromosomal DNA (ecDNA). Distant regulatory elements interact with their target genes through chromatin loops. Structural variants contribute to the development and progression of cancer. Haplotype phasing is crucial for understanding allele-specific genomic rearrangements and somatic clonal evolution in cancer. The role of ecDNA in driving oncogene amplification and drug resistance in cancer cells has also been revealed. These innovations offer a deeper understanding of cancer biology and the potential for personalized therapies. Despite these advancements, challenges, such as the accurate mapping of repetitive sequences and precise identification of structural variants, persist. Integrating Hi-C with multiomics data is key to overcoming these challenges and comprehensively understanding complex cancer genomes. Thus, Hi-C is a powerful tool for guiding precision medicine in cancer research and treatment.
Affiliation(s)
- Insoo Yoon
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
- Uijin Kim
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
- Kyung Oh Jung
- Department of Anatomy, College of Medicine, Chung-Ang University, Seoul, 06974, Republic of Korea
- Yousuk Song
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
- Taesoo Park
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea
- Dong-Sung Lee
- Department of Life Science, University of Seoul, Seoul, 02504, Republic of Korea.
27
Liu R, Xu R, Yan S, Li P, Jia C, Sun H, Sheng K, Wang Y, Zhang Q, Guo J, Xin X, Li X, Guo D. Hi-C, a chromatin 3D structure technique advancing the functional genomics of immune cells. Front Genet 2024; 15:1377238. [PMID: 38586584 PMCID: PMC10995239 DOI: 10.3389/fgene.2024.1377238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 03/13/2024] [Indexed: 04/09/2024]
Abstract
The functional performance of immune cells relies on a complex transcriptional regulatory network. The three-dimensional structure of chromatin can affect chromatin status and gene expression patterns, and plays an important regulatory role in gene transcription. Currently available techniques for studying chromatin spatial structure include chromatin conformation capture techniques and their derivatives, chromatin accessibility sequencing techniques, and others. Additionally, the recently emerged deep learning technology can be utilized as a tool to enhance the analysis of data. In this review, we elucidate the definition and significance of the three-dimensional chromatin structure, summarize the technologies available for studying it, and describe the research progress on the chromatin spatial structure of dendritic cells, macrophages, T cells, B cells, and neutrophils.
Affiliation(s)
- Dianhao Guo
- School of Clinical and Basic Medical Sciences, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
28
Robson ES, Ioannidis NM. GUANinE v1.0: Benchmark Datasets for Genomic AI Sequence-to-Function Models. bioRxiv 2024:2023.10.12.562113. [PMID: 37904945 PMCID: PMC10614795 DOI: 10.1101/2023.10.12.562113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields - including benchmarking, auditing, and algorithmic fairness - are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks. Compared to existing task formulations in computational genomics, GUANinE is large-scale, de-noised, and suitable for evaluating pretrained models. GUANinE v1.0 primarily focuses on functional genomics tasks such as functional element annotation and gene expression prediction, and it also draws upon connections to evolutionary biology through sequence conservation tasks. The current GUANinE tasks provide insight into the performance of existing genomic AI models and non-neural baselines, with opportunities to be refined, revisited, and broadened as the field matures. Finally, the GUANinE benchmark allows us to evaluate new self-supervised T5 models and explore the tradeoffs between tokenization and model performance, while showcasing the potential for self-supervision to complement existing pretraining procedures.
Affiliation(s)
- Eyes S Robson
- Center for Computational Biology, UC Berkeley, Berkeley, CA 94720
- Nilah M Ioannidis
- Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720
29
Zhang Y, Cameron CJF, Blanchette M. Posterior inference of Hi-C contact frequency through sampling. Front Bioinform 2024; 3:1285828. [PMID: 38455089 PMCID: PMC10919286 DOI: 10.3389/fbinf.2023.1285828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/20/2023] [Indexed: 03/09/2024]
Abstract
Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to the limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are further reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimations. Here, we present the Hi-C interaction frequency sampler (HiCSampler) that reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequency. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.
Affiliation(s)
- Yanlin Zhang
- School of Computer Science, McGill University, Montréal, QC, Canada
- Christopher J. F. Cameron
- School of Computer Science, McGill University, Montréal, QC, Canada
- Department of Biochemistry and Goodman Cancer Research Center, McGill University, Montreal, QC, Canada
30
Murtaza G, Jain A, Hughes M, Wagner J, Singh R. A Comprehensive Evaluation of Generalizability of Deep Learning-Based Hi-C Resolution Improvement Methods. Genes (Basel) 2023; 15:54. [PMID: 38254945 PMCID: PMC10815746 DOI: 10.3390/genes15010054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 12/24/2023] [Accepted: 12/26/2023] [Indexed: 01/24/2024]
Abstract
Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, the existing works evaluate these methods on either synthetically downsampled datasets, or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework, Hi-CY, that compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets belonging to various levels of read sparsities originating from three cell lines on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loops recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvements in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.
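The abstract contrasts synthetically downsampled datasets with experimentally generated sparse ones. As a rough illustration of what synthetic downsampling usually means, the sketch below thins a small contact-count matrix by keeping each read pair independently with a fixed probability. This is an assumption about the general technique, not the Hi-CY code: the function name `downsample_contacts`, the toy matrix, and the 1/16 ratio are all hypothetical.

```python
import random


def downsample_contacts(matrix, keep_fraction, seed=0):
    """Binomially thin a symmetric matrix of integer contact counts.

    Each read pair counted in the upper triangle (diagonal included) is kept
    independently with probability keep_fraction; the result is mirrored so
    the output stays symmetric.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # One Bernoulli trial per read pair in this bin.
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < keep_fraction)
            out[i][j] = kept
            out[j][i] = kept
    return out


# Hypothetical 3-bin contact map; real maps have thousands of bins.
dense = [
    [40, 12, 3],
    [12, 35, 9],
    [3, 9, 50],
]
sparse = downsample_contacts(dense, keep_fraction=1 / 16)
```

Thinning of this kind only removes reads uniformly at random, which is one plausible reason a model trained on it may not transfer to experimentally sparse data, where coverage can differ in other ways.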
Affiliation(s)
- Ghulam Murtaza
- Department of Computer Science, Brown University, Providence, RI 02912, USA; (G.M.); (A.J.); (M.H.)
- Atishay Jain
- Department of Computer Science, Brown University, Providence, RI 02912, USA; (G.M.); (A.J.); (M.H.)
- Madeline Hughes
- Department of Computer Science, Brown University, Providence, RI 02912, USA; (G.M.); (A.J.); (M.H.)
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA;
- Ritambhara Singh
- Department of Computer Science, Brown University, Providence, RI 02912, USA; (G.M.); (A.J.); (M.H.)
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
31
Race AM, Fuchs A, Chung HR. Visualization and data exploration of chromosome conformation capture data using Voronoi diagrams with v3c-viz. Sci Rep 2023; 13:22020. [PMID: 38086827 PMCID: PMC10716258 DOI: 10.1038/s41598-023-49179-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 12/05/2023] [Indexed: 12/18/2023]
Abstract
Chromosome conformation capture (3C) sequencing approaches, like Hi-C or micro-C, allow for an unbiased view of chromatin interactions. Most analysis methods rely on so-called interaction matrices, which are derived from counting read pairs in bins of fixed size. Here, we propose the Voronoi diagram, as implemented in Voronoi for chromosome conformation capture data visualization (v3c-viz), to visualize 3C data. The Voronoi diagram corresponds to an adaptive-binning strategy that adapts to the local densities of points. In this way, visualization of data obtained by moderate sequencing depth pinpoints many, if not most, interesting features such as high frequency contacts. The favorable visualization properties of the Voronoi diagram indicate that the Voronoi diagram as density estimator can be used to identify high frequency contacts at a resolution approaching the typical size of enhancers and promoters. v3c-viz is available at https://github.com/imbbLab/v3c-viz.
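To make the adaptive-binning idea concrete, here is a minimal one-dimensional sketch of a Voronoi density estimate. It is a deliberate simplification and not the v3c-viz implementation, which per the abstract works on 3C contact data: in 1D a point's Voronoi cell runs between the midpoints to its two neighbours, and the density at the point is taken as one over the cell length. The function name `voronoi_density_1d` and the sample coordinates are hypothetical.

```python
def voronoi_density_1d(positions, lower, upper):
    """Return (position, density) pairs using 1 / Voronoi-cell length.

    The cell of each point spans from the midpoint with its left neighbour to
    the midpoint with its right neighbour; the two outermost cells are clipped
    to the interval [lower, upper].
    """
    pts = sorted(positions)
    if not pts:
        raise ValueError("positions must not be empty")
    if len(set(pts)) != len(pts):
        raise ValueError("positions must be distinct")
    if not (lower < upper and lower <= pts[0] and pts[-1] <= upper):
        raise ValueError("positions must lie inside [lower, upper]")
    estimates = []
    for k, p in enumerate(pts):
        left = lower if k == 0 else (pts[k - 1] + p) / 2
        right = upper if k == len(pts) - 1 else (p + pts[k + 1]) / 2
        # Small cells (crowded points) give high density, large cells low density.
        estimates.append((p, 1.0 / (right - left)))
    return estimates


# Three crowded points near 1.2 and two isolated ones.
estimates = voronoi_density_1d([1.0, 1.2, 1.4, 5.0, 9.0], lower=0.0, upper=10.0)
densest_position = max(estimates, key=lambda pair: pair[1])[0]
```

Unlike fixed-size bins, the cell widths shrink where points are crowded, so the estimate resolves the cluster without requiring a bin size to be chosen in advance.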
Affiliation(s)
- Alan M Race
- Philipps University Marburg, Institute for Medical Bioinformatics and Biostatistics, Marburg, 35043, Germany
- Alisa Fuchs
- Max Planck Institute for Molecular Genetics, Epigenomics, Berlin, 14195, Germany
- Berlin Institute for Medical Systems Biology, Max Delbrück Center, Berlin, 10115, Germany
- Ho-Ryun Chung
- Philipps University Marburg, Institute for Medical Bioinformatics and Biostatistics, Marburg, 35043, Germany.
- Max Planck Institute for Molecular Genetics, Epigenomics, Berlin, 14195, Germany.
32
Salafranca J, Ko JK, Mukherjee AK, Fritzsche M, van Grinsven E, Udalova IA. Neutrophil nucleus: shaping the past and the future. J Leukoc Biol 2023; 114:585-594. [PMID: 37480361 PMCID: PMC10673716 DOI: 10.1093/jleuko/qiad084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 07/10/2023] [Accepted: 07/12/2023] [Indexed: 07/24/2023]
Abstract
Neutrophils are innate immune cells that are key to protecting the host against infection and maintaining body homeostasis. However, if dysregulated, they can contribute to disease, such as in cancer or chronic autoinflammatory disorders. Recent studies have highlighted the heterogeneity in the neutrophil compartment and identified the presence of immature neutrophils and their precursors in these pathologies. Therefore, understanding neutrophil maturity and the mechanisms through which they contribute to disease is critical. Neutrophils were first characterized morphologically by Ehrlich in 1879 using microscopy, and since then, different technologies have been used to assess neutrophil maturity. The advances in the imaging field, including state-of-the-art microscopy and machine learning algorithms for image analysis, reinforce the use of neutrophil nuclear morphology as a fundamental marker of maturity, applicable for objective classification in clinical diagnostics. New emerging approaches, such as the capture of changes in chromatin topology, will provide mechanistic links between the nuclear shape, chromatin organization, and transcriptional regulation during neutrophil maturation.
Affiliation(s)
- Julia Salafranca
- The Kennedy Institute of Rheumatology, University of Oxford, Old Road Campus Research Build, Roosevelt Dr, Headington, Oxford OX3 7DQ, United Kingdom
- Jacky Ka Ko
- The Kennedy Institute of Rheumatology, University of Oxford, Old Road Campus Research Build, Roosevelt Dr, Headington, Oxford OX3 7DQ, United Kingdom
- Ananda K Mukherjee
- The Kennedy Institute of Rheumatology, University of Oxford, Old Road Campus Research Build, Roosevelt Dr, Headington, Oxford OX3 7DQ, United Kingdom
- Marco Fritzsche
- The Kennedy Institute of Rheumatology, University of Oxford, Old Road Campus Research Build, Roosevelt Dr, Headington, Oxford OX3 7DQ, United Kingdom
- Erinke van Grinsven
- The Kennedy Institute of Rheumatology, University of Oxford, Old Road Campus Research Build, Roosevelt Dr, Headington, Oxford OX3 7DQ, United Kingdom
- Irina A Udalova
- The Kennedy Institute of Rheumatology, University of Oxford, Old Road Campus Research Build, Roosevelt Dr, Headington, Oxford OX3 7DQ, United Kingdom
33
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023]
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Junyou Zhang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Zhaoshuo Liu
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Yingying Duan
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Chunyan Li
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
|
34
|
Wu H, Zhou B, Zhou H, Zhang P, Wang M. Be-1DCNN: a neural network model for chromatin loop prediction based on bagging ensemble learning. Brief Funct Genomics 2023; 22:475-484. [PMID: 37133976 DOI: 10.1093/bfgp/elad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 03/10/2023] [Accepted: 03/29/2023] [Indexed: 05/04/2023]
Abstract
The chromatin loops in the three-dimensional (3D) structure of chromosomes are essential for the regulation of gene expression. Despite the fact that high-throughput chromatin capture techniques can identify the 3D structure of chromosomes, chromatin loop detection utilizing biological experiments is arduous and time-consuming. Therefore, a computational method is required to detect chromatin loops. Deep neural networks can form complex representations of Hi-C data and provide the possibility of processing biological datasets. Therefore, we propose a bagging ensemble one-dimensional convolutional neural network (Be-1DCNN) to detect chromatin loops from genome-wide Hi-C maps. First, to obtain accurate and reliable chromatin loops in genome-wide contact maps, the bagging ensemble learning method is utilized to synthesize the prediction results of multiple 1DCNN models. Second, each 1DCNN model consists of three 1D convolutional layers for extracting high-dimensional features from input samples and one dense layer for producing the prediction results. Finally, the prediction results of Be-1DCNN are compared to those of the existing models. The experimental results indicate that Be-1DCNN predicts high-quality chromatin loops and outperforms the state-of-the-art methods using the same evaluation metrics. The source code of Be-1DCNN is available for free at https://github.com/HaoWuLab-Bioinformatics/Be1DCNN.
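The bagging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-feature threshold learner is a hypothetical stand-in for a trained 1DCNN, and the toy data, the ensemble size of 25, and all function names are assumptions made for illustration. The sketch only shows how bootstrap resampling and vote averaging combine several models into one prediction.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one 'bag')."""
    return [rng.choice(data) for _ in data]


def fit_stump(bag):
    """Toy base learner standing in for a 1DCNN (hypothetical).

    Places a threshold halfway between the class means and assumes
    positive examples have larger feature values than negatives.
    """
    pos = [x for x, y in bag if y == 1]
    neg = [x for x, y in bag if y == 0]
    if pos and neg:
        threshold = (mean(pos) + mean(neg)) / 2
    else:
        # A bag holding a single class falls back to a fixed threshold.
        threshold = 0.5
    return lambda x: 1.0 if x > threshold else 0.0


def bagging_predict(models, x):
    """Average the member votes, then threshold the average at 0.5."""
    score = mean([m(x) for m in models])
    return score, int(score >= 0.5)


# Toy (feature, label) pairs: label 1 marks a "loop" candidate.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

rng = random.Random(0)
models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagging_predict(models, 0.95))
print(bagging_predict(models, 0.05))
```

Averaging over models trained on different bootstrap samples reduces the variance of any single model, which is the usual motivation for bagging.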
Affiliation(s)
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
  - School of Software, Shandong University, Jinan, 250101 Shandong, China
- Bing Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Haoru Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Meili Wang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
|
35
|
Rapakoulia T, Lopez Ruiz De Vargas S, Omgba PA, Laupert V, Ulitsky I, Vingron M. CENTRE: a gradient boosting algorithm for Cell-type-specific ENhancer-Target pREdiction. Bioinformatics 2023; 39:btad687. [PMID: 37982748 PMCID: PMC10666202 DOI: 10.1093/bioinformatics/btad687] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/17/2023] [Revised: 10/11/2023] [Accepted: 11/17/2023] [Indexed: 11/21/2023]
Abstract
MOTIVATION: Identifying target promoters of active enhancers is a crucial step for understanding gene regulation and deciphering phenotypes and diseases. To date, several computational methods have been developed to predict enhancer-gene interactions, but they require either many epigenomic and transcriptomic experimental assays to generate cell-type (CT)-specific predictions or a single experiment applied to a large cohort of CTs to extract correlations between activities of regulatory elements. Thus, inferring CT-specific enhancer-gene interactions in unstudied or poorly annotated CTs becomes a laborious and costly task. RESULTS: Here, we aim to infer CT-specific enhancer-target interactions using minimal experimental input. We introduce Cell-specific ENhancer Target pREdiction (CENTRE), a machine learning framework that predicts enhancer-target interactions in a CT-specific manner, using only gene expression and ChIP-seq data for three histone modifications for the CT of interest. CENTRE exploits the wealth of available datasets and extracts cell-type agnostic statistics to complement the CT-specific information. CENTRE is thoroughly tested across many datasets and CTs and achieves performance equivalent or superior to that of existing algorithms that require massive experimental data. AVAILABILITY AND IMPLEMENTATION: CENTRE's open-source code is available at GitHub via https://github.com/slrvv/CENTRE.
Affiliation(s)
- Verena Laupert
  - Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Igor Ulitsky
  - Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
  - Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot 76100, Israel
  - Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
- Martin Vingron
  - Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
|
36
|
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 128] [Impact Index Per Article: 64.0] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
  - Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Lorna Wessels
  - Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
  - Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Sophia Müller-Dott
  - Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Rémi Trimbour
  - Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
- Ricardo O Ramirez Flores
  - Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
|
37
|
Huang L, Song M, Shen H, Hong H, Gong P, Deng HW, Zhang C. Deep Learning Methods for Omics Data Imputation. Biology 2023; 12:1313. [PMID: 37887023 PMCID: PMC10604785 DOI: 10.3390/biology12101313] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/08/2023] [Revised: 09/28/2023] [Accepted: 10/02/2023] [Indexed: 10/28/2023]
Abstract
One common problem in omics data analysis is missing values, which can arise due to various reasons, such as poor tissue quality and insufficient sample volumes. Instead of discarding missing values and related data, imputation approaches offer an alternative means of handling missing data. However, the imputation of missing omics data is a non-trivial task. Difficulties mainly come from high dimensionality, non-linear or non-monotonic relationships within features, technical variations introduced by sampling methods, sample heterogeneity, and the non-random missingness mechanism. Several advanced imputation methods, including deep learning-based methods, have been proposed to address these challenges. Due to its capability of modeling complex patterns and relationships in large and high-dimensional datasets, many researchers have adopted deep learning models to impute missing omics data. This review provides a comprehensive overview of the currently available deep learning-based methods for omics imputation from the perspective of deep generative model architectures such as autoencoder, variational autoencoder, generative adversarial networks, and Transformer, with an emphasis on multi-omics data imputation. In addition, this review also discusses the opportunities that deep learning brings and the challenges that it might face in this field.
Affiliation(s)
- Lei Huang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Meng Song
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Hui Shen
  - Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Ping Gong
  - Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA
- Hong-Wen Deng
  - Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
|
38
|
Wang X, Zeng H, Lin L, Huang Y, Lin H, Que Y. Deep learning-empowered crop breeding: intelligent, efficient and promising. Front Plant Sci 2023; 14:1260089. [PMID: 37860239 PMCID: PMC10583549 DOI: 10.3389/fpls.2023.1260089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/18/2023] [Accepted: 09/13/2023] [Indexed: 10/21/2023]
Abstract
Crop breeding is one of the main approaches to increase crop yield and improve crop quality. However, the breeding process faces challenges such as complex data, difficulties in data acquisition, and low prediction accuracy, resulting in low breeding efficiency and long breeding cycles. Deep learning-based crop breeding is a strategy that applies deep learning techniques to improve and optimize the breeding process, leading to accelerated crop improvement, enhanced breeding efficiency, and the development of higher-yielding, more adaptive, and disease-resistant varieties for agricultural production. This perspective briefly discusses the mechanisms, key applications, and impact of deep learning in crop breeding. We also highlight the current challenges associated with this topic and provide insights into its future application prospects.
Affiliation(s)
- Xiaoding Wang
  - Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Haitao Zeng
  - Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Limei Lin
  - Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Yanze Huang
  - School of Computer Science and Mathematics, Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
- Hui Lin
  - Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Youxiong Que
  - Key Laboratory of Sugarcane Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, Fujian Agriculture and Forestry University, Fuzhou, China
  - National Key Laboratory for Tropical Crop Breeding, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Hainan, China
|
39
|
Zhou Y, Li T, Choppavarapu L, Jin VX. Integration of scHi-C and scRNA-seq data defines distinct 3D-regulated and biological-context dependent cell subpopulations. bioRxiv 2023:2023.09.29.560193. [PMID: 37873257 PMCID: PMC10592853 DOI: 10.1101/2023.09.29.560193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
An integration of 3D chromatin structure and gene expression at single-cell resolution has yet to be demonstrated. Here, we develop a computational method, a multiomic data integration (MUDI) algorithm, which integrates scHi-C and scRNA-seq data to precisely define the 3D-regulated and biological-context dependent cell subpopulations or topologically integrated subpopulations (TISPs). We demonstrate its algorithmic utility on publicly available and newly generated scHi-C and scRNA-seq data. We then test and apply MUDI in a breast cancer cell model system to demonstrate its biological-context dependent utility. We find that the newly defined topologically conserved associating domain (CAD) is the characteristic single-cell 3D chromatin structure and better characterizes chromatin domains at single-cell resolution. We further identify 20 TISPs uniquely characterizing 3D-regulated breast cancer cellular states. We reveal that two of the TISPs remarkably resemble high-cycling breast cancer persister cells and that chromatin-modifying enzymes might be functional regulators driving the alteration of the 3D chromatin structures. Our comprehensive integration of scHi-C and scRNA-seq data in cancer cells at single-cell resolution provides mechanistic insights into 3D-regulated heterogeneity of developing drug-tolerant cancer cells.
|
40
|
Raffo A, Paulsen J. The shape of chromatin: insights from computational recognition of geometric patterns in Hi-C data. Brief Bioinform 2023; 24:bbad302. [PMID: 37646128 PMCID: PMC10516369 DOI: 10.1093/bib/bbad302] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/18/2023] [Revised: 07/05/2023] [Accepted: 08/03/2023] [Indexed: 09/01/2023]
Abstract
The three-dimensional organization of chromatin plays a crucial role in gene regulation and cellular processes like deoxyribonucleic acid (DNA) transcription, replication and repair. Hi-C and related techniques provide detailed views of spatial proximities within the nucleus. However, data analysis is challenging partially due to a lack of well-defined, underpinning mathematical frameworks. Recently, recognizing and analyzing geometric patterns in Hi-C data has emerged as a powerful approach. This review provides a summary of algorithms for automatic recognition and analysis of geometric patterns in Hi-C data and their correspondence with chromatin structure. We classify existing algorithms on the basis of the data representation and pattern recognition paradigm they make use of. Finally, we outline some of the challenges ahead and promising future directions.
Affiliation(s)
- Andrea Raffo
  - Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Jonas Paulsen
  - Department of Biosciences, University of Oslo, 0316 Oslo, Norway
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
|
41
|
Senapati S, Irshad IU, Sharma AK, Kumar H. Fundamental insights into the correlation between chromosome configuration and transcription. Phys Biol 2023; 20:051002. [PMID: 37467757 DOI: 10.1088/1478-3975/ace8e5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/18/2023] [Accepted: 07/19/2023] [Indexed: 07/21/2023]
Abstract
Eukaryotic chromosomes exhibit a hierarchical organization that spans a spectrum of length scales, ranging from sub-regions known as loops, which typically comprise hundreds of base pairs, to much larger chromosome territories that can encompass a few megabase pairs. Chromosome conformation capture experiments that involve high-throughput sequencing methods combined with microscopy techniques have enabled a new understanding of inter- and intra-chromosomal interactions in unprecedented detail. This information also provides mechanistic insights into the relationship between genome architecture and gene expression. In this article, we review the recent findings on three-dimensional interactions among chromosomes at the compartment, topologically associating domain, and loop levels and the impact of these interactions on the transcription process. We also discuss the current understanding of various biophysical processes involved in the multi-layer structural organization of chromosomes. Then, we discuss the relationships between gene expression and genome structure from perturbative genome-wide association studies. Furthermore, for a better understanding of how chromosome architecture and function are linked, we emphasize the role of epigenetic modifications in the regulation of gene expression. Such an understanding of the relationship between genome architecture and gene expression can provide a new perspective on the range of potential future discoveries and therapeutic research.
Affiliation(s)
- Swayamshree Senapati
  - School of Basic Sciences, Indian Institute of Technology, Bhubaneswar, Argul, Odisha 752050, India
- Inayat Ullah Irshad
  - Department of Physics, Indian Institute of Technology, Jammu, Jammu 181221, India
- Ajeet K Sharma
  - Department of Physics, Indian Institute of Technology, Jammu, Jammu 181221, India
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Jammu, Jammu 181221, India
- Hemant Kumar
  - School of Basic Sciences, Indian Institute of Technology, Bhubaneswar, Argul, Odisha 752050, India
|
42
|
Wang Y, Guo Z, Cheng J. Single-cell Hi-C data enhancement with deep residual and generative adversarial networks. Bioinformatics 2023; 39:btad458. [PMID: 37498561 PMCID: PMC10403428 DOI: 10.1093/bioinformatics/btad458] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/19/2023] [Revised: 07/19/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023]
Abstract
MOTIVATION: The spatial genome organization of a eukaryotic cell is important for its function. The development of single-cell technologies for probing the 3D genome conformation, especially single-cell chromosome conformation capture techniques, has enabled us to understand genome function better than before. However, due to the extreme sparsity and high noise associated with single-cell Hi-C data, it is still difficult to study genome structure and function using the Hi-C data of a single cell. RESULTS: In this work, we developed a deep learning method, ScHiCEDRN, based on deep residual networks and generative adversarial networks for the imputation and enhancement of Hi-C data of a single cell. In terms of both image evaluation and Hi-C reproducibility metrics, ScHiCEDRN outperforms four deep learning methods (DeepHiC, HiCPlus, HiCSR, and Loopenhance) on enhancing the raw single-cell Hi-C data of human and Drosophila. The experiments also show that it can generate single-cell Hi-C data more suitable for identifying topologically associating domain boundaries and reconstructing 3D chromosome structures than the existing methods. Moreover, ScHiCEDRN's performance generalizes well across different single cells and cell types, and it can be applied to improving population Hi-C data. AVAILABILITY AND IMPLEMENTATION: The source code of ScHiCEDRN is available at the GitHub repository: https://github.com/BioinfoMachineLearning/ScHiCEDRN.
Affiliation(s)
- Yanli Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
|
43
|
Gao VR, Yang R, Das A, Luo R, Luo H, McNally DR, Karagiannidis I, Rivas MA, Wang ZM, Barisic D, Karbalayghareh A, Wong W, Zhan YA, Chin CR, Noble W, Bilmes JA, Apostolou E, Kharas MG, Béguelin W, Viny AD, Huangfu D, Rudensky AY, Melnick AM, Leslie CS. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. bioRxiv 2023:2023.07.27.550836. [PMID: 37546906 PMCID: PMC10402156 DOI: 10.1101/2023.07.27.550836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
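One of the input features named in this abstract, co-accessibility across metacells, can be illustrated with a small sketch. The abstract does not say how ChromaFold computes it, so the Pearson correlation used here is an assumption, as are the toy matrix and the function names `pearson` and `coaccessibility`. Each row holds the accessibility of one genomic bin across four hypothetical metacells.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0


def coaccessibility(matrix):
    """Pairwise correlation between bins; matrix[i][k] is bin i in metacell k."""
    n = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


# Toy accessibility values: 3 bins x 4 metacells (illustrative only).
acc = [
    [1, 2, 3, 4],  # bin 0
    [2, 4, 6, 8],  # bin 1 rises and falls with bin 0
    [4, 3, 2, 1],  # bin 2 moves opposite to bin 0
]
co = coaccessibility(acc)
print([round(x, 3) for x in co[0]])
```

Bins whose accessibility varies together across metacells receive a correlation near 1, which is the kind of signal a model could use as a proxy for regulatory contact.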
Affiliation(s)
- Vianne R. Gao
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Rui Yang
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Arnav Das
  - University of Washington, Seattle, WA, USA
- Renhe Luo
  - Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Hanzhi Luo
  - Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dylan R. McNally
  - Caryl and Israel Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
- Ioannis Karagiannidis
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Martin A. Rivas
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Zhong-Min Wang
  - Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Darko Barisic
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Alireza Karbalayghareh
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wilfred Wong
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Yingqian A. Zhan
  - Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Christopher R. Chin
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Effie Apostolou
  - Sanford I Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Michael G. Kharas
  - Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wendy Béguelin
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Aaron D. Viny
  - Departments of Medicine, Division of Hematology & Oncology, and of Genetics & Development, Columbia Stem Cell Initiative, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Danwei Huangfu
  - Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Alexander Y. Rudensky
  - Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ari M. Melnick
  - Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Christina S. Leslie
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
|
44
|
Li K, Zhang P, Wang Z, Shen W, Sun W, Xu J, Wen Z, Li L. iEnhance: a multi-scale spatial projection encoding network for enhancing chromatin interaction data resolution. Brief Bioinform 2023; 24:bbad245. [PMID: 37381618 DOI: 10.1093/bib/bbad245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/21/2023] [Revised: 06/06/2023] [Accepted: 06/12/2023] [Indexed: 06/30/2023]
Abstract
Although sequencing-based high-throughput chromatin interaction data are widely used to uncover genome-wide three-dimensional chromatin architecture, their sparseness and low signal-to-noise ratio greatly restrict the precision of the obtained structural elements. To improve data quality, we here present iEnhance (chromatin interaction data resolution enhancement), a multi-scale spatial projection and encoding network, to predict high-resolution chromatin interaction matrices from low-resolution and noisy input data. Specifically, iEnhance projects the input data into matrix spaces to extract multi-scale global and local feature sets, then hierarchically fuses these features with an attention mechanism. After that, dense channel encoding and residual channel decoding are used to effectively infer robust chromatin interaction maps. iEnhance outperforms state-of-the-art Hi-C resolution enhancement tools in both visual and quantitative evaluation. Comprehensive analysis shows that, unlike other tools, iEnhance can recover both short-range structural elements and long-range interaction patterns precisely. More importantly, iEnhance can be transferred to data enhancement of other tissues or cell lines of unknown resolution. Furthermore, iEnhance performs robustly in enhancement of diverse chromatin interaction data including those from single-cell Hi-C and Micro-C experiments.
Affiliation(s)
- Kai Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zilin Wang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wei Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zi Wen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
|
45
|
Zhang Y, Blanchette M. Reference panel-guided super-resolution inference of Hi-C data. Bioinformatics 2023; 39:i386-i393. [PMID: 37387127 DOI: 10.1093/bioinformatics/btad266] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Accurately assessing contacts between DNA fragments inside the nucleus with Hi-C experiments is crucial for understanding the role of 3D genome organization in gene regulation. This task is challenging, in part because high-resolution analyses require Hi-C libraries of high sequencing depth. Most existing Hi-C data are collected with limited sequencing coverage, leading to poor estimation of chromatin interaction frequencies. Current computational approaches to enhance Hi-C signals focus on the analysis of individual Hi-C datasets of interest, without taking advantage of the facts that (i) several hundred Hi-C contact maps are publicly available and (ii) the vast majority of local spatial organizations are conserved across multiple cell types. RESULTS Here, we present RefHiC-SR, an attention-based deep learning framework that uses a reference panel of Hi-C datasets to facilitate the enhancement of Hi-C data resolution of a given study sample. We compare RefHiC-SR against tools that do not use reference samples and find that RefHiC-SR outperforms other programs across different cell types and sequencing depths. It also enables high-accuracy mapping of structures such as loops and topologically associating domains. AVAILABILITY AND IMPLEMENTATION https://github.com/BlanchetteLab/RefHiC.
Affiliation(s)
- Yanlin Zhang
- School of Computer Science, McGill University, Montréal, Québec H3A 0E9, Canada
- Mathieu Blanchette
- School of Computer Science, McGill University, Montréal, Québec H3A 0E9, Canada
|
46
|
Wang B, Liu K, Li Y, Wang J. DFHiC: a dilated full convolution model to enhance the resolution of Hi-C data. Bioinformatics 2023; 39:btad211. [PMID: 37084258 PMCID: PMC10166584 DOI: 10.1093/bioinformatics/btad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 02/13/2023] [Accepted: 04/12/2023] [Indexed: 04/22/2023] Open
Abstract
MOTIVATION Hi-C is the most widely used chromosome conformation capture (3C) technology. It measures the frequency of all pairwise interactions across the entire genome, making it a powerful tool for studying the 3D structure of the genome. The fineness of the reconstructed genome structure depends on the resolution of the Hi-C data. However, because high-resolution Hi-C data require deep sequencing and thus high experimental cost, most available Hi-C data are low-resolution. Hence, it is essential to enhance the quality of Hi-C data by developing effective computational methods. RESULTS In this work, we propose a novel method, DFHiC, which generates a high-resolution Hi-C matrix from a low-resolution Hi-C matrix using a dilated convolutional neural network. The dilated convolution effectively explores global patterns in the overall Hi-C matrix by drawing on information from longer genomic distances. Consequently, DFHiC can improve the resolution of the Hi-C matrix reliably and accurately. More importantly, the super-resolution Hi-C data enhanced by DFHiC are more in line with real high-resolution Hi-C data than those produced by other existing methods, in terms of both significant chromatin interactions and the identification of topologically associating domains. AVAILABILITY AND IMPLEMENTATION https://github.com/BinWangCSU/DFHiC.
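The statement that dilated convolutions draw on longer genomic distances can be made concrete with a receptive-field calculation. The sketch below is not taken from DFHiC: the kernel size, dilation rates, and 10 kb bin size are illustrative assumptions. It relies only on the standard rule that, for stride-1 layers, a layer with kernel size k and dilation d widens the receptive field by (k - 1) * d bins.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in bins) of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d adds (k - 1) * d bins.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Hypothetical settings for illustration only (not DFHiC's actual configuration).
BIN_SIZE_KB = 10
plain = receptive_field(3, [1, 1, 1, 1])    # four ordinary 3x3 layers
dilated = receptive_field(3, [1, 2, 4, 8])  # same depth, doubling dilation

print(plain, plain * BIN_SIZE_KB)      # 9 bins, 90 kb
print(dilated, dilated * BIN_SIZE_KB)  # 31 bins, 310 kb
```

With the same number of layers and parameters, the dilated stack in this toy setting sees roughly three times the genomic span of the plain stack.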
Affiliation(s)
- Bin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Kun Liu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
|
47
|
Kalluchi A, Harris HL, Reznicek TE, Rowley MJ. Considerations and caveats for analyzing chromatin compartments. Front Mol Biosci 2023; 10:1168562. [PMID: 37091873 PMCID: PMC10113542 DOI: 10.3389/fmolb.2023.1168562] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/17/2023] [Accepted: 03/27/2023] [Indexed: 04/08/2023] Open
Abstract
Genomes are organized into nuclear compartments, separating active from inactive chromatin. Chromatin compartments are readily visible in a large number of species by experiments that map chromatin conformation genome-wide. When analyzing these maps, a common step is the identification of genomic intervals that interact within A (active) and B (inactive) compartments. It has also become increasingly common to identify and analyze subcompartments. We review different strategies to identify A/B and subcompartment intervals, including a discussion of various machine-learning approaches to predict these features. We then discuss the strengths and limitations of current strategies and examine how these aspects of analysis may have impacted our understanding of chromatin compartments.
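One widely used strategy of the kind reviewed here assigns A/B labels from the sign of the leading eigenvector of the correlation matrix of a distance-normalized (observed/expected) contact map. The sketch below illustrates that idea on a toy 4-bin matrix in pure Python. The matrix values and function names are invented for illustration and are not taken from the review. The eigenvector sign is arbitrary, so real pipelines typically orient it with an external signal such as GC content or gene density.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length rows (assumes non-constant rows)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leading_eigenvector(matrix, iterations=100):
    """Power iteration; assumes the start vector is not orthogonal to the result."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v


# Toy observed/expected matrix: bins 0-1 and bins 2-3 each interact
# preferentially among themselves (a checkerboard pattern).
oe = [
    [2.0, 2.0, 0.5, 0.5],
    [2.0, 2.0, 0.5, 0.5],
    [0.5, 0.5, 2.0, 2.0],
    [0.5, 0.5, 2.0, 2.0],
]

# Correlate every row with every other row, then take the first eigenvector.
corr = [[pearson(r1, r2) for r2 in oe] for r1 in oe]
pc1 = leading_eigenvector(corr)

# The sign splits bins into two groups; which group is "A" is arbitrary here.
labels = ["A" if value > 0 else "B" for value in pc1]
print(labels)  # bins 0-1 share one label, bins 2-3 share the other
```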
Affiliation(s)
- M. Jordan Rowley
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
|
48
|
Gong H, Li M, Ji M, Zhang X, Yuan Z, Zhang S, Yang Y, Li C, Chen Y. MINE is a method for detecting spatial density of regulatory chromatin interactions based on a multi-modal network. Cell Reports Methods 2023; 3:100386. [PMID: 36814847 PMCID: PMC9939382 DOI: 10.1016/j.crmeth.2022.100386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/15/2022] [Accepted: 12/16/2022] [Indexed: 06/18/2023]
Abstract
Chromatin interactions play essential roles in chromatin conformation and gene expression. However, few tools exist to analyze the spatial density of regulatory chromatin interactions (SD-RCI). Here, we present the multi-modal network (MINE) toolkit, including MINE-Loop, MINE-Density, and MINE-Viewer. The MINE-Loop network aims to enhance the detection of RCIs, MINE-Density quantifies the SD-RCI, and MINE-Viewer facilitates 3D visualization of the density of chromatin interactions and participating regulatory factors (e.g., transcription factors). We applied MINE to investigate the relationship between the SD-RCI and chromatin volume change in HeLa cells before and after liquid-liquid phase separation. Changes in SD-RCI before and after treating the HeLa cells with 1,6-hexanediol suggest that changes in chromatin organization were related to the degree of activation or repression of genes. Together, the MINE toolkit enables quantitative studies on different aspects of chromatin conformation and regulatory activity.
Affiliation(s)
- Haiyan Gong
- Beijing Advanced Innovation Center for Materials Genome Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Minghong Li
- Beijing Advanced Innovation Center for Materials Genome Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Mengdie Ji
- State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100005, China
- Xiaotong Zhang
- Beijing Advanced Innovation Center for Materials Genome Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Shunde Innovation School, University of Science and Technology Beijing, Foshan 528399, China
- Zan Yuan
- State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100005, China
- Sichen Zhang
- Beijing Advanced Innovation Center for Materials Genome Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Yi Yang
- Beijing Advanced Innovation Center for Materials Genome Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Chun Li
- School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Yang Chen
- State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100005, China
|
49
|
Yang JY, Chang JM. Pattern recognition of topologically associating domains using deep learning. BMC Bioinformatics 2022; 22:634. [PMID: 36482308 PMCID: PMC9732975 DOI: 10.1186/s12859-022-05075-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/07/2022] [Accepted: 11/22/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Increasing evidence indicates that three-dimensional chromosome structure plays an important role in genomic function. Topologically associating domains (TADs) are self-interacting regions that have been shown to be a chromosomal structural unit. Comparisons of synteny blocks across species show that TADs are conserved during evolution. Are there common TAD patterns across species or cell lines? RESULTS To address this question, we propose a novel task, TAD recognition, as opposed to traditional TAD identification. Specifically, we treat Hi-C maps as images, thus recasting TAD recognition as image pattern recognition, for which we use a convolutional neural network and a residual neural network. In addition, we propose an elegant way to generate non-TAD data for binary classification. The deep learning models show promising performance (AUC > 0.80) in cross-species and cell-type validation. CONCLUSIONS TADs have been shown to be conserved during evolution. Interestingly, our results confirm that the TAD recognition model is practical across species, which indicates that TADs in human and mouse show common patterns from an image classification point of view. Our approach could be a new way to identify TAD variations or patterns among Hi-C maps. For example, the TADs of two Hi-C maps are conserved if the two classification models are exchangeable.
Affiliation(s)
- Jhen Yuan Yang
- Department of Computer Science, National Chengchi University, 11605 Taipei City, Taiwan
- Jia-Ming Chang
- Department of Computer Science, National Chengchi University, 11605 Taipei City, Taiwan
|
50
|
Reference panel guided topological structure annotation of Hi-C data. Nat Commun 2022; 13:7426. [PMID: 36460680 PMCID: PMC9718747 DOI: 10.1038/s41467-022-35231-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/10/2022] [Accepted: 11/22/2022] [Indexed: 12/03/2022] Open
Abstract
Accurately annotating topological structures (e.g., loops and topologically associating domains) from Hi-C data is critical for understanding the role of 3D genome organization in gene regulation. This is a challenging task, especially at high resolution, in part due to the limited sequencing coverage of Hi-C data. Current approaches focus on the analysis of individual Hi-C datasets of interest, without taking advantage of the facts that (i) several hundred Hi-C contact maps are publicly available, and (ii) the vast majority of topological structures are conserved across multiple cell types. Here, we present RefHiC, an attention-based deep learning framework that uses a reference panel of Hi-C datasets to facilitate topological structure annotation from a given study sample. We compare RefHiC against tools that do not use reference samples and find that RefHiC outperforms other programs at both topologically associating domain and loop annotation across different cell types, species, and sequencing depths.
|