1
Wang J, Cheng K, Yan C, Luo H, Luo J. DconnLoop: a deep learning model for predicting chromatin loops based on multi-source data integration. BMC Bioinformatics 2025; 26:96. [PMID: 40170155 PMCID: PMC11959853 DOI: 10.1186/s12859-025-06092-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Accepted: 02/19/2025] [Indexed: 04/03/2025]
Abstract
BACKGROUND Chromatin loops are critical for the three-dimensional organization of the genome and gene regulation. Accurate identification of chromatin loops is essential for understanding the regulatory mechanisms in disease. However, current mainstream detection methods rely primarily on single-source data, such as Hi-C, which limits these methods' ability to capture the diverse features of chromatin loop structures. In contrast, multi-source data integration and deep learning approaches, though not yet widely applied, hold significant potential. RESULTS In this study, we developed a method called DconnLoop to integrate Hi-C, ChIP-seq, and ATAC-seq data to predict chromatin loops. This method achieves feature extraction and fusion of multi-source data by integrating residual mechanisms, directional connectivity excitation modules, and interactive feature space decoders. Finally, we apply density estimation and density clustering to the genome-wide prediction results to identify more representative loops. The code is available at https://github.com/kuikui-C/DconnLoop. CONCLUSIONS The results demonstrate that DconnLoop outperforms existing methods in both precision and recall. In various experiments, including Aggregate Peak Analysis and peak enrichment comparisons, DconnLoop consistently shows advantages. Extensive ablation studies and validation across different sequencing depths further confirm DconnLoop's robustness and generalizability.
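Aggregate Peak Analysis (APA), one of the evaluations cited above, piles up the Hi-C submatrices centered on each called loop and averages them, so a genuine loop set produces a central pixel that stands out from its surroundings. The minimal pure-Python sketch below assumes a dense contact matrix stored as a list of lists and loops given as (row bin, column bin) pairs; the function names, window size and corner size are illustrative choices. The peak-to-lower-left ratio follows the summary score commonly reported by APA tools such as Juicer's, not a detail taken from the DconnLoop paper.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def aggregate_peaks(contacts: Matrix, loops: Sequence[Tuple[int, int]], window: int) -> Matrix:
    """Average the (2*window+1) x (2*window+1) submatrices centered on each loop pixel.

    Loops whose window would extend past the matrix edge are skipped.
    """
    n = len(contacts)
    size = 2 * window + 1
    total = [[0.0] * size for _ in range(size)]
    used = 0
    for i, j in loops:
        if min(i, j) - window < 0 or max(i, j) + window >= n:
            continue  # window not fully inside the matrix
        for di in range(size):
            for dj in range(size):
                total[di][dj] += contacts[i - window + di][j - window + dj]
        used += 1
    if used == 0:
        raise ValueError("no loop has a complete window inside the matrix")
    return [[value / used for value in row] for row in total]


def peak_to_lower_left(apa: Matrix, corner: int) -> float:
    """Ratio of the central pixel to the mean of the corner x corner lower-left block."""
    size = len(apa)
    center = apa[size // 2][size // 2]
    block = [apa[r][c] for r in range(size - corner, size) for c in range(corner)]
    return center / (sum(block) / len(block))  # assumes a nonzero background


# Toy example: a flat background of 1.0 with two enriched loop pixels.
contacts = [[1.0] * 9 for _ in range(9)]
contacts[3][6] = 5.0
contacts[2][5] = 3.0
apa = aggregate_peaks(contacts, [(3, 6), (2, 5), (0, 8)], window=2)  # (0, 8) is skipped
print(peak_to_lower_left(apa, corner=2))  # 4.0
```

A ratio well above 1 indicates that the called loops are enriched over their local background; real analyses typically also normalize the contact matrix and restrict loops to a distance range, which this sketch omits.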
Affiliation(s)
- Junfeng Wang
- School of Physics and Electronic Information Engineering, Henan Polytechnic University, Jiaozuo, 454003, China
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Kuikui Cheng
- School of Physics and Electronic Information Engineering, Henan Polytechnic University, Jiaozuo, 454003, China
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
- Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
2
Kaiser VB, Semple CA. CTCF-anchored chromatin loop dynamics during human meiosis. BMC Biol 2025; 23:83. [PMID: 40114154 PMCID: PMC11927364 DOI: 10.1186/s12915-025-02181-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Accepted: 03/03/2025] [Indexed: 03/22/2025]
Abstract
BACKGROUND During meiosis, the mammalian genome is organised within chromatin loops, which facilitate synapsis, crossing over and chromosome segregation, setting the stage for recombination events and the generation of genetic diversity. Chromatin looping is thought to play a major role in the establishment of crossovers during prophase I of meiosis, in diploid early primary spermatocytes. However, chromatin conformation dynamics during human meiosis are difficult to study experimentally, due to the transience of each cell division and the difficulty of obtaining stage-resolved cell populations. Here, we employed a machine learning framework trained on single-cell ATAC-seq and RNA-seq data to predict CTCF-anchored looping during spermatogenesis, including cell types at different stages of meiosis. RESULTS We find dramatic changes in genome-wide looping patterns throughout meiosis: compared to pre- and post-meiotic germline cell types, loops in meiotic early primary spermatocytes are more abundant, more variable between individual cells, and more evenly spread throughout the genome. In preparation for the first meiotic division, loops also include longer stretches of DNA, encompassing more than half of the total genome. These loop structures then influence the rate of recombination initiation and resolution as crossovers. In contrast, in later mature sperm stages, we find evidence of genome compaction, with loops being confined to the telomeric ends of the chromosomes. CONCLUSION Overall, we find that chromatin loops do not orchestrate the gene expression dynamics seen during spermatogenesis, but loops do play important roles in recombination, influencing the positions of DNA breakage and crossover events.
Affiliation(s)
- Vera B Kaiser
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Colin A Semple
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
3
Lyu H, Chen X, Cheng Y, Zhang T, Wang P, Wong JHY, Wang J, Stasiak L, Sun L, Yang G, Wang L, Yue F. Pioneer factor GATA6 promotes colorectal cancer through 3D genome regulation. Sci Adv 2025; 11:eads4985. [PMID: 39919174 PMCID: PMC11804904 DOI: 10.1126/sciadv.ads4985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Accepted: 01/09/2025] [Indexed: 02/09/2025]
Abstract
Colorectal cancer (CRC) is one of the most lethal and prevalent malignancies. While the overexpression of pioneer factor GATA6 in CRC has been linked with metastasis, its role in genome-wide gene expression dysregulation remains unclear. Through studies of primary human CRC tissues and analysis of the TCGA data, we found that GATA6 preferentially binds at CRC-specific active enhancers, with enrichment at enhancer-promoter loop anchors. GATA6 protein also physically interacts with CTCF, suggesting its critical role in 3D genome organization. The ablation of GATA6 through AID and CRISPR systems severely impaired cancer cell clonogenicity and proliferation. Mechanistically, GATA6 knockout induced global loss of CRC-specific open chromatin and extensive alterations of critical enhancer-promoter interactions for CRC oncogenes. Last, we showed that GATA6 knockout greatly reduced tumor growth and improved survival in mice. Together, we revealed a previously unidentified mechanism by which GATA6 contributes to the pathogenesis of colorectal cancer.
Affiliation(s)
- Huijue Lyu
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Xintong Chen
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Yang Cheng
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Te Zhang
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Ping Wang
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Josiah Hiu-yuen Wong
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Juan Wang
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Lena Stasiak
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Leyu Sun
- Department of Pathology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Guangyu Yang
- Department of Pathology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Lu Wang
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Feng Yue
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, USA
4
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequence information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
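As a concrete illustration of the k-mer sequence features mentioned above, a k-mer representation turns a DNA segment (for example, a candidate loop anchor) into a fixed-length vector of k-mer frequencies that a classifier can consume. This is a generic sketch, not code from any specific tool covered by the review; the function names and the choice to skip k-mers containing ambiguous bases such as N are assumptions.

```python
from itertools import product
from typing import Dict, List


def kmer_vocabulary(k: int) -> Dict[str, int]:
    """Map every ACGT k-mer to a column index, in lexicographic order."""
    return {"".join(p): idx for idx, p in enumerate(product("ACGT", repeat=k))}


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Return normalized k-mer counts for seq (length 4**k).

    Overlapping k-mers are counted; any k-mer containing a non-ACGT base is skipped.
    An all-ambiguous sequence yields an all-zero vector.
    """
    vocab = kmer_vocabulary(k)
    counts = [0] * len(vocab)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        idx = vocab.get(seq[start:start + k])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

For example, `kmer_frequencies("ACGTN", 2)` counts AC, CG and GT once each (TN is skipped), giving a 16-dimensional vector with three entries of 1/3. Normalizing by the number of valid k-mers makes segments of different lengths comparable.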
Affiliation(s)
- Brydon P G Wall
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
- Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
- Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
5
Kumar Halder A, Agarwal A, Jodkowska K, Plewczynski D. A systematic analyses of different bioinformatics pipelines for genomic data and its impact on deep learning models for chromatin loop prediction. Brief Funct Genomics 2024; 23:538-548. [PMID: 38555493 DOI: 10.1093/bfgp/elae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/07/2024] [Accepted: 03/04/2024] [Indexed: 04/02/2024]
Abstract
Genomic data analysis has witnessed a surge in complexity and volume, primarily driven by the advent of high-throughput technologies. In particular, studying chromatin loops and structures has become pivotal in understanding gene regulation and genome organization. This systematic investigation examines bioinformatics pipelines designed specifically for the analysis of chromatin loops and structures. Our investigation incorporates loop interaction datasets specific to two protein factors (CTCF and cohesin) from six distinct pipelines, amassing a collection of 36 diverse datasets. Through a review of existing literature, we offer a holistic perspective on the methodologies, tools and algorithms underpinning the analysis of this multifaceted genomic feature. We describe the range of approaches deployed, encompassing key aspects such as the data preparation pipeline, preprocessing, statistical features and modelling techniques. Beyond this, we assess the strengths and limitations inherent in these bioinformatics pipelines, shedding light on the interplay between data quality and the performance of deep learning models.
Affiliation(s)
- Anup Kumar Halder
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Abhishek Agarwal
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Karolina Jodkowska
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
6
Rosean S, Sosa EA, O'Shea D, Raj SM, Seoighe C, Greally JM. Regulatory landscape enrichment analysis (RLEA): a computational toolkit for non-coding variant enrichment and cell type prioritization. BMC Bioinformatics 2024; 25:179. [PMID: 38714913 PMCID: PMC11075237 DOI: 10.1186/s12859-024-05794-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 04/22/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND As genomic studies continue to implicate non-coding sequences in disease, testing the roles of these variants requires insights into the cell type(s) in which they are likely to be mediating their effects. Prior methods for associating non-coding variants with cell types have involved approaches using linkage disequilibrium or ontological associations, incurring significant processing requirements. GaiaAssociation is a freely available, open-source software package that enables thousands of genomic loci implicated in a phenotype to be tested for enrichment at regulatory loci of multiple cell types in minutes, permitting insights into the cell type(s) mediating the studied phenotype. RESULTS In this work, we present Regulatory Landscape Enrichment Analysis (RLEA) by GaiaAssociation and demonstrate its capability to test the enrichment of 12,133 variants across the cis-regulatory regions of 44 cell types. This analysis was completed in 134.0 ± 2.3 s, highlighting the efficient processing provided by GaiaAssociation. The intuitive interface requires only four inputs, offers a collection of customizable functions, and visualizes variant enrichment in cell-type regulatory regions through a heatmap matrix. GaiaAssociation is available on PyPI for download as a command line tool or Python package, and the source code can also be installed from GitHub at https://github.com/GreallyLab/gaiaAssociation. CONCLUSIONS GaiaAssociation is a novel package that provides an intuitive and efficient resource to understand the enrichment of non-coding variants across the cis-regulatory regions of different cell types, empowering studies seeking to identify disease-mediating cell types.
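RLEA's core question, whether a phenotype's variants land in a cell type's regulatory regions more often than chance would predict, can be illustrated with a one-sided hypergeometric test on binned genome coordinates. This is a generic stand-in, not GaiaAssociation's published statistic: it assumes the genome has been split into `n_bins` equal bins and that variants and regulatory regions have each been reduced to sets of bin indices, and both function names are hypothetical.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    denom = comb(population, draws)
    upper = min(successes, draws)
    return sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    ) / denom


def bin_enrichment(variant_bins: Set[int], regulatory_bins: Set[int], n_bins: int) -> float:
    """One-sided p-value that variant bins overlap regulatory bins more than expected by chance.

    Treats the variant bins as draws without replacement from all n_bins bins,
    of which the regulatory bins are the "successes".
    """
    overlap = len(variant_bins & regulatory_bins)
    return hypergeom_sf(overlap, n_bins, len(regulatory_bins), len(variant_bins))
```

For instance, with 10 bins, 3 of them regulatory, and 2 variant bins that both fall in regulatory bins, `bin_enrichment({1, 2}, {1, 2, 3}, 10)` gives 3/45, about 0.067. Repeating the test for each cell type's regulatory bins, with a multiple-testing correction, would yield a per-cell-type enrichment table of the kind a heatmap could display.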
Affiliation(s)
- Samuel Rosean
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Eric A Sosa
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Dónal O'Shea
- School of Mathematics, Statistics & Applied Mathematics, National University of Ireland Galway, Galway, H91 TK33, Ireland
- Srilakshmi M Raj
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Cathal Seoighe
- School of Mathematics, Statistics & Applied Mathematics, National University of Ireland Galway, Galway, H91 TK33, Ireland
- John M Greally
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
7
Shen J, Wang Y, Luo J. CD-Loop: a chromatin loop detection method based on the diffusion model. Front Genet 2024; 15:1393406. [PMID: 38770419 PMCID: PMC11102972 DOI: 10.3389/fgene.2024.1393406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 04/11/2024] [Indexed: 05/22/2024]
Abstract
Motivation In recent years, there have been significant advances in various chromatin conformation capture techniques, and annotating the topological structure from Hi-C contact maps has become crucial for studying the three-dimensional structure of chromosomes. However, the structure and function of chromatin loops are highly dynamic and diverse, influenced by multiple factors. Therefore, obtaining the three-dimensional structure of the genome remains a challenging task. Many existing chromatin loop prediction methods struggle to fully extract features from the contact map and to make accurate predictions at low sequencing depths. Results In this study, we propose a deep learning framework based on the diffusion model, called CD-Loop, for predicting accurate chromatin loops. First, by pre-training on the input data, we obtain prior probabilities for classifying the Hi-C contact map. Then, by combining the denoising process of the diffusion model with the prior probabilities obtained by pre-training, candidate loops are predicted from the input Hi-C contact map. Finally, CD-Loop uses a density-based clustering algorithm to cluster the candidate chromatin loops and predict the final chromatin loops. We compared CD-Loop with currently popular methods, such as Peakachu, Chromosight, and Mustache, and found that across different cell types, species, and sequencing depths, CD-Loop outperforms these methods in loop annotation. We conclude that CD-Loop can accurately predict chromatin loops and reveal cell-type specificity. The code is available at https://github.com/wangyang199897/CD-Loop.
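The final step described above, density-based clustering of candidate loops followed by selection of one loop per cluster, can be sketched with a small DBSCAN implementation over candidate pixel coordinates. This is a generic illustration under stated assumptions, not CD-Loop's code: candidates are hypothetical (row bin, column bin, score) tuples, two candidates are neighbors when they lie within `eps` bins in both directions, `min_pts` counts the point itself, and the highest-scoring member stands in for each cluster. CD-Loop's actual clustering parameters and summary rule may differ.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[int, int, float]  # hypothetical format: (row bin, column bin, predicted score)


def dbscan(points: List[Tuple[int, int]], eps: int, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Neighbors are points within eps bins in both coordinates (Chebyshev distance),
    and a point counts as its own neighbor. Runs in O(n^2), which is fine for a sketch.
    """
    def neighbors(p: int) -> List[int]:
        pi, pj = points[p]
        return [q for q, (qi, qj) in enumerate(points)
                if max(abs(pi - qi), abs(pj - qj)) <= eps]

    labels: List[Optional[int]] = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        seeds = neighbors(p)
        if len(seeds) < min_pts:
            labels[p] = -1  # noise for now; may later be claimed as a border point
            continue
        labels[p] = cluster  # p is a core point and starts a new cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # border point: joins the cluster but is not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is also a core point: keep growing
        cluster += 1
    return labels  # every point has been labeled by now


def representative_loops(candidates: List[Candidate], eps: int = 2, min_pts: int = 2) -> List[Candidate]:
    """Keep the highest-scoring candidate of each cluster; noise candidates are dropped."""
    labels = dbscan([(i, j) for i, j, _ in candidates], eps, min_pts)
    best: Dict[int, Candidate] = {}
    for cand, label in zip(candidates, labels):
        if label == -1:
            continue
        if label not in best or cand[2] > best[label][2]:
            best[label] = cand
    return [best[c] for c in sorted(best)]
```

Isolated candidates end up labeled as noise and are discarded, which is one way such post-processing can suppress spurious single-pixel calls while collapsing a blob of adjacent high-scoring pixels into a single loop.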
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, China
8
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequence information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Mikhail G. Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
9
Ren L, Ma W, Wang Y. SpecLoop predicts cell type-specific chromatin loop via transcription factor cooperation. Comput Biol Med 2024; 171:108182. [PMID: 38422958 DOI: 10.1016/j.compbiomed.2024.108182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/18/2024] [Accepted: 02/18/2024] [Indexed: 03/02/2024]
Abstract
Cell-type-specific chromatin loops (CSCLs) are crucial for gene regulation and cell fate determination. However, the mechanisms governing their establishment remain elusive. Here, we present SpecLoop, a network regularization-based machine learning framework, to investigate the role of transcription factor (TF) cooperation in CSCL formation. SpecLoop integrates multi-omics data, including gene expression, chromatin accessibility, sequence, protein-protein interaction, and TF binding motif data, to predict CSCLs and identify cooperating TFs. Using high-resolution Hi-C data as the gold standard, SpecLoop accurately predicts CSCLs in seven cell types (GM12878, IMR90, HeLa-S3, K562, HUVEC, HMEC, and NHEK), with AUROC values ranging from 0.8645 to 0.9852 and AUPR values ranging from 0.8654 to 0.9734. Notably, SpecLoop demonstrates improved accuracy in predicting long-distance CSCLs and identifies TF complexes with strong predictive ability. Our study systematically explores the TFs and TF pairs associated with CSCLs through effective integration of diverse omics data. SpecLoop is freely available at https://github.com/AMSSwanglab/SpecLoop.
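SpecLoop's accuracy is summarized above with AUROC and AUPR. For readers unfamiliar with these metrics, the sketch below computes both from predicted scores and binary labels in plain Python: AUROC through its rank-sum (Mann-Whitney) interpretation, with ties counted as one half, and AUPR as average precision, which is one common estimator of the area under the precision-recall curve. The function names are hypothetical, and the paper may have used a different AUPR estimator (for example, trapezoidal interpolation).

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean precision at the rank of each positive; tied scores keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # stable sort
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos
```

With scores `[0.9, 0.8, 0.7, 0.6]` and labels `[1, 0, 1, 0]`, three of the four positive/negative pairs are ordered correctly, so the AUROC is 0.75, and the average precision is (1/1 + 2/3) / 2 = 5/6. AUPR is usually reported alongside AUROC because it is more sensitive to performance on the positive class when positives are rare.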
Affiliation(s)
- Lixin Ren
- Department of Applied Mathematics, School of Mathematics and Physics, University of Science and Technology Beijing, 100083, Beijing, China
- Wanbiao Ma
- Department of Applied Mathematics, School of Mathematics and Physics, University of Science and Technology Beijing, 100083, Beijing, China
- Yong Wang
- CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China; Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 330106, China
10
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
- Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
11
Zhang P, Wu H. IChrom-Deep: An Attention-Based Deep Learning Model for Identifying Chromatin Interactions. IEEE J Biomed Health Inform 2023; 27:4559-4568. [PMID: 37402191 DOI: 10.1109/jbhi.2023.3292299] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 07/06/2023]
Abstract
Identification of chromatin interactions is crucial for advancing our knowledge of gene regulation. However, due to the limitations of high-throughput experimental techniques, there is an urgent need to develop computational methods for predicting chromatin interactions. In this study, we propose a novel attention-based deep learning model, termed IChrom-Deep, to identify chromatin interactions using sequence features and genomic features. Experimental results on datasets from three cell lines demonstrate that IChrom-Deep achieves satisfactory performance and is superior to previous methods. We also investigate the effects of DNA sequence, sequence-associated features, and genomic features on chromatin interactions, and highlight the scenarios in which some features, such as sequence conservation and distance, are applicable. Moreover, we identify a few genomic features that are extremely important across different cell lines, and IChrom-Deep using only these genomic features performs comparably to IChrom-Deep using all genomic features. IChrom-Deep is expected to serve as a useful tool for future studies that seek to identify chromatin interactions.
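The abstract above describes IChrom-Deep only as attention-based. As a generic illustration of that building block, and not the model's published architecture, the sketch below implements single-head scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, on small lists of vectors in plain Python. The function names are hypothetical; a real model would use a deep learning framework with learned query, key and value projections.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: List[Vector], keys: List[Vector],
                                 values: List[Vector]) -> List[Vector]:
    """For each query, return a weighted sum of the value vectors.

    The weights are softmax(q . k / sqrt(d)) over all keys, where d is the key dimension.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

When all keys are identical the weights are uniform and each output is simply the mean of the value vectors; when one key aligns strongly with the query, its value dominates. That weighting is what lets an attention layer emphasize the most informative positions or features in its input.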
12
Xu H, Yi X, Fan X, Wu C, Wang W, Chu X, Zhang S, Dong X, Wang Z, Wang J, Zhou Y, Zhao K, Yao H, Zheng N, Wang J, Chen Y, Plewczynski D, Sham PC, Chen K, Huang D, Li MJ. Inferring CTCF-binding patterns and anchored loops across human tissues and cell types. Patterns (N Y) 2023; 4:100798. [PMID: 37602215 PMCID: PMC10436006 DOI: 10.1016/j.patter.2023.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 01/25/2023] [Accepted: 06/20/2023] [Indexed: 08/22/2023]
Abstract
CCCTC-binding factor (CTCF) is a transcription regulator with a complex role in gene regulation. The recognition and effects of CTCF on DNA sequences, chromosome barriers, and enhancer blocking are not well understood. Existing computational tools struggle to assess the regulatory potential of CTCF-binding sites and their impact on chromatin loop formation. Here we have developed a deep-learning model, DeepAnchor, to accurately characterize CTCF binding using high-resolution genomic/epigenomic features. This has revealed distinct chromatin and sequence patterns for CTCF-mediated insulation and looping. An optimized implementation of a previous loop model based on DeepAnchor score excels in predicting CTCF-anchored loops. We have established a compendium of CTCF-anchored loops across 52 human tissue/cell types, and this suggests that genomic disruption of these loops could be a general mechanism of disease pathogenesis. These computational models and resources can help investigate how CTCF-mediated cis-regulatory elements shape context-specific gene regulation in cell development and disease progression.
Affiliation(s)
- Hang Xu
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648, Singapore
- Xianfu Yi
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xutong Fan
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Chengyue Wu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Wei Wang
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Xinlei Chu
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Shijie Zhang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xiaobao Dong
- Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Jianhua Wang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Yao Zhou
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Ke Zhao
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hongcheng Yao
- Centre for PanorOmic Sciences-Genomics and Bioinformatics Cores, The University of Hong Kong, Hong Kong 999077, China
- Nan Zheng
- Department of Network Security and Informatization, Tianjin Medical University, Tianjin 300070, China
- Junwen Wang
- Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ 85259, USA
- Yupeng Chen
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Dariusz Plewczynski
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Pak Chung Sham
- Centre for PanorOmic Sciences-Genomics and Bioinformatics Cores, The University of Hong Kong, Hong Kong 999077, China
- Kexin Chen
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Dandan Huang
- Wuxi School of Medicine, Jiangnan University, Wuxi 214122, China
- Mulin Jun Li
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
13
Liu T, Wang Z. DeepChIA-PET: Accurately predicting ChIA-PET from Hi-C and ChIP-seq with deep dilated networks. PLoS Comput Biol 2023; 19:e1011307. [PMID: 37440599 PMCID: PMC10368233 DOI: 10.1371/journal.pcbi.1011307] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/20/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023]
Abstract
Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) can capture genome-wide chromatin interactions mediated by a specific DNA-associated protein. The ChIA-PET experiments have been applied to explore the key roles of different protein factors in chromatin folding and transcription regulation. However, compared with widely available Hi-C and ChIP-seq data, there are not many ChIA-PET datasets available in the literature. A computational method for accurately predicting ChIA-PET interactions from Hi-C and ChIP-seq data is needed that can save the efforts of performing wet-lab experiments. Here we present DeepChIA-PET, a supervised deep learning approach that can accurately predict ChIA-PET interactions by learning the latent relationships between ChIA-PET and two widely used data types: Hi-C and ChIP-seq. We trained our deep models with CTCF-mediated ChIA-PET of GM12878 as ground truth, and the deep network contains 40 dilated residual convolutional blocks. We first showed that DeepChIA-PET with only Hi-C as input significantly outperforms Peakachu, another computational method for predicting ChIA-PET from Hi-C but using random forests. We next showed that adding ChIP-seq as one extra input does improve the classification performance of DeepChIA-PET, but Hi-C plays a more prominent role in DeepChIA-PET than ChIP-seq. Our evaluation results indicate that our learned models can accurately predict not only CTCF-mediated ChIA-PET in GM12878 and HeLa but also non-CTCF ChIA-PET interactions, including RNA polymerase II (RNAPII) ChIA-PET of GM12878, RAD21 ChIA-PET of GM12878, and RAD21 ChIA-PET of K562. Overall, DeepChIA-PET is an accurate tool for predicting the ChIA-PET interactions mediated by various chromatin-associated proteins from different cell types.
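To make the phrase "dilated residual convolutional block" concrete, the plain-Python sketch below implements one simplified, single-channel block: a 3×3 convolution whose taps are spaced `dilation` bins apart on the contact map, followed by ReLU and a skip connection that adds the input back. This is an illustrative assumption, not DeepChIA-PET's published architecture; the real network stacks 40 blocks over multi-channel Hi-C and ChIP-seq inputs, and its channel counts, normalization layers, and dilation schedule are not given here. The names `dilated_conv2d` and `dilated_residual_block` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def dilated_conv2d(x: Matrix, kernel: Matrix, dilation: int) -> Matrix:
    """3x3 dilated cross-correlation with zero padding; output has the same shape as x."""
    n_rows, n_cols = len(x), len(x[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            for a in range(3):
                for b in range(3):
                    # Kernel taps sit `dilation` bins apart around the centre bin (i, j).
                    r = i + (a - 1) * dilation
                    c = j + (b - 1) * dilation
                    if 0 <= r < n_rows and 0 <= c < n_cols:  # zero padding outside the map
                        total += kernel[a][b] * x[r][c]
            out[i][j] = total
    return out


def dilated_residual_block(x: Matrix, kernel: Matrix, dilation: int) -> Matrix:
    """Simplified residual block (hypothetical): x + ReLU(dilated_conv2d(x))."""
    conv = dilated_conv2d(x, kernel, dilation)
    return [
        [x[i][j] + max(0.0, conv[i][j]) for j in range(len(x[0]))]
        for i in range(len(x))
    ]


# Toy 5x5 "contact map" and a kernel with a single top-left tap.
contact_map = [[float(5 * i + j) for j in range(5)] for i in range(5)]
corner_tap = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
shifted = dilated_conv2d(contact_map, corner_tap, dilation=2)
print(shifted[2][2], shifted[4][3])  # 0.0 11.0: each output reads the bin 2 rows up, 2 columns left
```

Stacking such blocks with growing dilation widens the receptive field over the contact map without pooling, which is the usual motivation for dilated convolutions; whether DeepChIA-PET uses any particular dilation schedule is not stated in the abstract.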
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Zheng Wang
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
14
Villaman C, Pollastri G, Saez M, Martin AJ. Benefiting from the intrinsic role of epigenetics to predict patterns of CTCF binding. Comput Struct Biotechnol J 2023; 21:3024-3031. [PMID: 37266407 PMCID: PMC10229758 DOI: 10.1016/j.csbj.2023.05.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 06/03/2023]
Abstract
Motivation One of the most relevant mechanisms involved in the determination of chromatin structure is the formation of structural loops that are also related to the conservation of chromatin states. Many of these loops are stabilized by CCCTC-binding factor (CTCF) proteins at their base. Despite the relevance of chromatin structure and the key role of CTCF, the role of the epigenetic factors that are involved in the regulation of CTCF binding, and thus, in the formation of structural loops in the chromatin, is not thoroughly understood. Results Here we describe a CTCF binding predictor based on Random Forest that employs different epigenetic data and genomic features. Importantly, given the ability of Random Forests to determine the relevance of features for the prediction, our approach also shows how the different types of descriptors impact the binding of CTCF, confirming previous knowledge of the relevance of chromatin accessibility and DNA methylation, but demonstrating the effect of epigenetic modifications on the activity of CTCF. We compared our approach against other predictors and found improved performance in terms of areas under PR and ROC curves (PRAUC-ROCAUC), outperforming current state-of-the-art methods.
Affiliation(s)
- Camilo Villaman
  - Programa de Doctorado en Genómica Integrativa, Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
  - Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Escuela de Ingeniería, Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago, Chile
- Mauricio Saez
  - Centro de Oncología de Precisión, Facultad de Medicina y Ciencias de la Salud, Universidad Mayor, Santiago, Chile
  - Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Chile
- Alberto J.M. Martin
  - Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Escuela de Ingeniería, Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago, Chile
15
Zhang X, Zhu W, Sun H, Ding Y, Liu L. Prediction of CTCF loop anchor based on machine learning. Front Genet 2023; 14:1181956. [PMID: 37077544 PMCID: PMC10106609 DOI: 10.3389/fgene.2023.1181956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 03/24/2023] [Indexed: 04/05/2023]
Abstract
Introduction: Various activities in biological cells are affected by three-dimensional genome structure. Insulators play an important role in the organization of higher-order structure. CTCF is a representative mammalian insulator, which can form barriers that prevent the continued extrusion of chromatin loops. As a multifunctional protein, CTCF has tens of thousands of binding sites in the genome, but only a portion of them can serve as anchors of chromatin loops. It is still unclear how cells select anchors during chromatin looping. Methods: In this paper, a comparative analysis is performed to investigate the sequence preference and binding strength of anchor and non-anchor CTCF binding sites. Furthermore, a machine learning model based on CTCF binding intensity and DNA sequence is proposed to predict which CTCF sites can form chromatin loop anchors. Results: The accuracy of the machine learning model that we constructed for predicting the anchors of CTCF-mediated chromatin loops reached 0.8646. We also find that loop anchor formation is mainly influenced by CTCF binding strength and binding pattern (which can be interpreted as the binding of different zinc fingers). Discussion: In conclusion, our results suggest that the CTCF core motif and its flanking sequence may be responsible for binding specificity. This work contributes to understanding the mechanism of loop anchor selection and provides a reference for the prediction of CTCF-mediated chromatin loops.
Affiliation(s)
- Xiao Zhang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Wen Zhu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Corresponding author
- Huimin Sun
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Yijie Ding
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
16
Shin H, Kim Y. Regulation of loop extrusion on the interphase genome. Crit Rev Biochem Mol Biol 2023; 58:1-18. [PMID: 36921088 DOI: 10.1080/10409238.2023.2182273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
In the human cell nucleus, dynamically organized chromatin is the substrate for gene regulation, DNA replication, and repair. A central mechanism of DNA loop formation is loop extrusion mediated by cohesin, an ATPase motor. Cohesin complexes load onto and unload from the chromosome under the control of other regulators that physically interact with cohesin and affect its motor activity. Regulation of the dynamic loading cycle of cohesin influences not only chromatin structure but also genome-associated human disorders and aging. This review focuses on recently spotlighted genome-organizing factors and the mechanisms by which their dynamic interactions shape genome architecture in interphase.
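Loop extrusion can be illustrated with a deliberately minimal, deterministic toy model, sketched below in Python. One cohesin complex loads at a bin on a 1D lattice and moves both legs outward one bin per step until each leg meets a blocking CTCF site or the end of the lattice, or until the complex unloads after `max_steps` steps (a crude stand-in for the regulated unloading described above). The blocking rule assumes the convergent-orientation pattern commonly reported for CTCF loop anchors (forward motif at the left anchor, reverse motif at the right anchor). Real extrusion is stochastic and depends on the additional regulators this review discusses, none of which are modelled; `extrude_loop` and its parameters are hypothetical.

```python
from typing import Dict, Tuple


def extrude_loop(n_bins: int, load_at: int, ctcf: Dict[int, str], max_steps: int) -> Tuple[int, int]:
    """Return the (left, right) anchors reached by one toy cohesin complex.

    ctcf maps a bin index to a motif orientation, '+' (forward) or '-' (reverse).
    Assumed convergent rule: the left leg halts at a '+' site, the right leg at a '-' site.
    """
    left = right = load_at
    left_done = right_done = False
    for _ in range(max_steps):  # max_steps models residence time before unloading
        if not left_done:
            if left == 0 or ctcf.get(left) == "+":
                left_done = True
            else:
                left -= 1
        if not right_done:
            if right == n_bins - 1 or ctcf.get(right) == "-":
                right_done = True
            else:
                right += 1
        if left_done and right_done:
            break
    return left, right


print(extrude_loop(20, 10, {3: "+", 15: "-"}, max_steps=100))  # (3, 15): convergent pair anchors the loop
print(extrude_loop(20, 10, {3: "-", 15: "+"}, max_steps=100))  # (0, 19): divergent sites do not block
```

Lowering `max_steps` makes the complex unload before reaching any barrier, which gives smaller loops that are not anchored at CTCF sites.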
Affiliation(s)
- Hyogyung Shin
  - Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
- Yoori Kim
  - Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
  - New Biology Research Center, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu, South Korea
17
Zhao X, Zhu S, Peng W, Xue HH. The Interplay of Transcription and Genome Topology Programs T Cell Development and Differentiation. J Immunol 2022; 209:2269-2278. [PMID: 36469845 PMCID: PMC9731349 DOI: 10.4049/jimmunol.2200625] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/22/2022] [Accepted: 09/14/2022] [Indexed: 01/04/2023]
Abstract
T cells are essential for mounting defense against various pathogens and malignantly transformed cells. Thymic development and peripheral T cell differentiation are highly orchestrated biological processes that require precise gene regulation. Higher-order genome organization on multiple scales, in the form of chromatin loops, topologically associating domains and compartments, provides pivotal control of T cell gene expression. CTCF and the cohesin machinery are ubiquitously expressed architectural proteins responsible for establishing chromatin structures. Recent studies indicate that transcription factors, such as T lineage-defining Tcf1 and TCR-induced Batf, may have intrinsic ability and/or engage CTCF to shape chromatin architecture. In this article, we summarize current knowledge on the dynamic changes in genome topology that underlie normal or leukemic T cell development, CD4+ helper T cell differentiation, and CD8+ cytotoxic T cell functions. The knowledge lays a solid foundation for elucidating the causative link of spatial chromatin configuration to transcriptional and functional output in T cells.
Affiliation(s)
- Xin Zhao
  - Center for Discovery and Innovation, Hackensack University Medical Center, Nutley, NJ 07110
- Shaoqi Zhu
  - Department of Physics, The George Washington University, Washington DC, 20052
- Weiqun Peng
  - Department of Physics, The George Washington University, Washington DC, 20052
- Hai-Hui Xue
  - Center for Discovery and Innovation, Hackensack University Medical Center, Nutley, NJ 07110
  - New Jersey Veterans Affairs Health Care System, East Orange, NJ 07018
18
DLoopCaller: A deep learning approach for predicting genome-wide chromatin loops by integrating accessible chromatin landscapes. PLoS Comput Biol 2022; 18:e1010572. [PMID: 36206320 PMCID: PMC9581407 DOI: 10.1371/journal.pcbi.1010572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 10/19/2022] [Accepted: 09/14/2022] [Indexed: 11/20/2022]
Abstract
In recent years, major advances have been made in various chromosome conformation capture technologies to further satisfy the needs of researchers for high-quality, high-resolution contact interactions. Discriminating loops from genome-wide contact interactions is crucial for dissecting three-dimensional (3D) genome structure and function. Here, we present DLoopCaller, a deep learning method that predicts genome-wide chromatin loops by combining accessible chromatin landscapes and raw Hi-C contact maps. Available orthogonal data (ChIA-PET/HiChIP and Capture Hi-C) were used to generate positive samples with a wider contact matrix, which makes it possible to find more potential genome-wide chromatin loops. The experimental results demonstrate that DLoopCaller effectively improves the accuracy of predicting genome-wide chromatin loops compared with the state-of-the-art method Peakachu. Moreover, compared with two of the most popular loop callers, HiCCUPS and Fit-Hi-C, DLoopCaller identifies some unique interactions. We conclude that combining chromatin landscapes on the one-dimensional genome contributes to understanding 3D genome organization, and the identified chromatin loops reveal cell-type specificity and transcription factor motif co-enrichment across different cell lines and species.
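Pixel-classifying loop callers such as Peakachu, the baseline named above, commonly give the model a small window of the contact matrix centred on each candidate bin pair. The plain-Python sketch below shows one way such an input could be assembled, with a second channel built from 1D chromatin accessibility as the outer product of the signal flanking the two anchors. Both the window construction and the outer-product channel are assumptions for illustration; the abstract does not specify DLoopCaller's exact input encoding, and `extract_window` and `accessibility_channel` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def extract_window(contacts: Matrix, i: int, j: int, half_width: int) -> Matrix:
    """(2*half_width+1)^2 submatrix centred on bin pair (i, j), zero-padded at the edges."""
    n = len(contacts)
    size = 2 * half_width + 1
    window = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            r, c = i - half_width + a, j - half_width + b
            if 0 <= r < n and 0 <= c < n:
                window[a][b] = contacts[r][c]
    return window


def accessibility_channel(atac: List[float], i: int, j: int, half_width: int) -> Matrix:
    """Outer product of the accessibility signal around anchor i and anchor j."""

    def local(center: int) -> List[float]:
        # Signal in the bins flanking one anchor; bins off the chromosome count as 0.
        return [atac[k] if 0 <= k < len(atac) else 0.0
                for k in range(center - half_width, center + half_width + 1)]

    rows, cols = local(i), local(j)
    return [[r * c for c in cols] for r in rows]


# Toy 6-bin chromosome: each contact value encodes its position as 10*row + column.
contacts = [[float(10 * r + c) for c in range(6)] for r in range(6)]
atac = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sample = [extract_window(contacts, 2, 3, 1), accessibility_channel(atac, 2, 3, 1)]
print(sample[0])  # [[12.0, 13.0, 14.0], [22.0, 23.0, 24.0], [32.0, 33.0, 34.0]]
```

Zero padding keeps every window the same shape, so candidates near the chromosome ends can be batched with the rest.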
19
Zhang P, Wu Y, Zhou H, Zhou B, Zhang H, Wu H. CLNN-loop: a deep learning model to predict CTCF-mediated chromatin loops in the different cell lines and CTCF-binding sites (CBS) pair types. Bioinformatics 2022; 38:4497-4504. [PMID: 35997565 DOI: 10.1093/bioinformatics/btac575] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/09/2022] [Revised: 06/28/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Three-dimensional (3D) genome organization is of vital importance in gene regulation and disease mechanisms. Previous studies have shown that CTCF-mediated chromatin loops are crucial to studying the 3D structure of cells. Although various experimental techniques have been developed to detect chromatin loops, they have been found to be time-consuming and costly. Nowadays, various sequence-based computational methods can capture significant features of 3D genome organization and help predict chromatin loops. However, these methods have low performance and poor generalization ability in predicting chromatin loops. RESULTS Here, we propose a novel deep learning model, called CLNN-loop, to predict chromatin loops in different cell lines and CTCF-binding sites (CBS) pair types by fusing multiple sequence-based features. The analysis of a series of examinations based on the datasets in the previous study shows that CLNN-loop has satisfactory performance and is superior to the existing methods in terms of predicting chromatin loops. In addition, we apply the SHAP framework to interpret the predictions of different models, and find that CTCF motif and sequence conservation are important signs of chromatin loops in different cell lines and CBS pair types. AVAILABILITY AND IMPLEMENTATION The source code of CLNN-loop is freely available at https://github.com/HaoWuLab-Bioinformatics/CLNN-loop and the webserver of CLNN-loop is freely available at http://hwclnn.sdu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengyu Zhang
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yingfu Wu
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Haoru Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Bing Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Hongming Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Hao Wu
  - School of Software, Shandong University, Jinan, Shandong 250101, China
20
Yang M, Ma J. Machine Learning Methods for Exploring Sequence Determinants of 3D Genome Organization. J Mol Biol 2022; 434:167666. [PMID: 35659533 DOI: 10.1016/j.jmb.2022.167666] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/27/2021] [Revised: 05/23/2022] [Accepted: 05/27/2022] [Indexed: 01/25/2023]
Abstract
In higher eukaryotic cells, chromosomes are folded inside the nucleus. Recent advances in whole-genome mapping technologies have revealed the multiscale features of 3D genome organization that are intertwined with fundamental genome functions. However, DNA sequence determinants that modulate the formation of 3D genome organization remain poorly characterized. In the past few years, predicting 3D genome organization based on DNA sequence features has become an active area of research. Here, we review the recent progress in computational approaches to unraveling important sequence elements for 3D genome organization. In particular, we discuss the rapid development of machine learning-based methods that facilitate the connections between DNA sequence features and 3D genome architectures at different scales. While much progress has been made in developing predictive models for revealing important sequence features for 3D genome organization, new research is urgently needed to incorporate multi-omic data and enhance model interpretability, further advancing our understanding of gene regulation mechanisms through the lens of 3D genome organization.
Affiliation(s)
- Muyu Yang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, United States (https://twitter.com/muyu_wendy_yang)
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, United States
21
Ellingford JM, Ahn JW, Bagnall RD, Baralle D, Barton S, Campbell C, Downes K, Ellard S, Duff-Farrier C, FitzPatrick DR, Greally JM, Ingles J, Krishnan N, Lord J, Martin HC, Newman WG, O'Donnell-Luria A, Ramsden SC, Rehm HL, Richardson E, Singer-Berk M, Taylor JC, Williams M, Wood JC, Wright CF, Harrison SM, Whiffin N. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med 2022; 14:73. [PMID: 35850704 PMCID: PMC9295495 DOI: 10.1186/s13073-022-01073-3] [Citation(s) in RCA: 114] [Impact Index Per Article: 38.0] [Received: 02/09/2022] [Accepted: 06/16/2022] [Indexed: 01/28/2023]
Abstract
BACKGROUND The majority of clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. The important role of variants in non-coding regions in penetrant disease is, however, increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines designed primarily for variants in protein-coding regions should be adapted for variants identified in other genomic contexts. METHODS We convened a panel of nine clinical and research scientists with wide-ranging expertise in clinical variant interpretation, with specific experience in variants within non-coding regions. This panel discussed and refined an initial draft of the guidelines which were then extensively tested and reviewed by external groups. RESULTS We discuss considerations specifically for variants in non-coding regions of the genome. We outline how to define candidate regulatory elements, highlight examples of mechanisms through which non-coding region variants can lead to penetrant monogenic disease, and outline how existing guidelines can be adapted for the interpretation of these variants. CONCLUSIONS These recommendations aim to increase the number and range of non-coding region variants that can be clinically interpreted, which, together with a compatible phenotype, can lead to new diagnoses and catalyse the discovery of novel disease mechanisms.
Affiliation(s)
- Jamie M Ellingford
  - Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT, UK
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
  - Genomics England, London, UK
- Joo Wook Ahn
  - Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Richard D Bagnall
  - Agnes Ginges Centre for Molecular Cardiology at Centenary Institute, University of Sydney, Sydney, Australia
- Diana Baralle
  - School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
  - Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Stephanie Barton
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Chris Campbell
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Kate Downes
  - Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Sian Ellard
  - Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
  - South West Genomic Laboratory Hub, Exeter Genomic Laboratory, Royal Devon and Exeter NHS Foundation Trust, Exeter, UK
- Celia Duff-Farrier
  - South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- David R FitzPatrick
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK
- John M Greally
  - Department of Pediatrics, Division of Pediatric Genetic Medicine, Children's Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
- Jodie Ingles
  - Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Australia
- Neesha Krishnan
  - Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Australia
- Jenny Lord
  - School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
- Hilary C Martin
  - Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- William G Newman
  - Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT, UK
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Simon C Ramsden
  - Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Ebony Richardson
  - Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Australia
- Moriel Singer-Berk
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jenny C Taylor
  - National Institute for Health Research Oxford Biomedical Research Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Maggie Williams
  - South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- Jordan C Wood
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caroline F Wright
  - Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Steven M Harrison
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Ambry Genetics, Aliso Viejo, CA, USA
- Nicola Whiffin
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
22
Yang D, Chung T, Kim D. DeepLUCIA: predicting tissue-specific chromatin loops using Deep Learning-based Universal Chromatin Interaction Annotator. Bioinformatics 2022; 38:3501-3512. [PMID: 35640981 DOI: 10.1093/bioinformatics/btac373] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/21/2021] [Revised: 04/17/2022] [Accepted: 05/27/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION The importance of chromatin loops in gene regulation is broadly accepted. There are mainly two approaches to predicting chromatin loops: the transcription factor (TF) binding-dependent approach and the genomic variation-based approach. However, neither of these approaches provides an adequate understanding of gene regulation in human tissues. To address this issue, we developed a deep learning-based chromatin loop prediction model called DeepLUCIA (Deep Learning-based Universal Chromatin Interaction Annotator). RESULTS Although DeepLUCIA does not use the TF binding profile data on which previous TF binding-dependent methods critically rely, its prediction accuracies are comparable to those of the previous TF binding-dependent methods. More importantly, DeepLUCIA enables tissue-specific chromatin loop predictions from tissue-specific epigenomes, which cannot be handled by the genomic variation-based approach. We demonstrated the utility of DeepLUCIA by predicting several novel target genes of SNPs identified in genome-wide association studies targeting Brugada syndrome, COVID-19 severity, and age-related macular degeneration. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dongchan Yang
  - Department of Bio and Brain Engineering, KAIST, Daejeon, 34141, Republic of Korea
- Taesu Chung
  - Biotechnology & Healthcare Examination Division, KIPO, Daejeon, 35208, Republic of Korea
- Dongsup Kim
  - Department of Bio and Brain Engineering, KAIST, Daejeon, 34141, Republic of Korea
23
InsuLock: A Weakly Supervised Learning Approach for Accurate Insulator Prediction, and Variant Impact Quantification. Genes (Basel) 2022; 13:genes13040621. [PMID: 35456427 PMCID: PMC9026820 DOI: 10.3390/genes13040621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 03/24/2022] [Accepted: 03/25/2022] [Indexed: 02/01/2023]
Abstract
Mapping chromatin insulator loops is crucial to investigating genome evolution, elucidating critical biological functions, and ultimately quantifying variant impact in diseases. However, chromatin conformation profiling assays are usually expensive, time-consuming, and may report fuzzy insulator annotations with low resolution. Therefore, we propose a weakly supervised deep learning method, InsuLock, to address these challenges. Specifically, InsuLock first utilizes a Siamese neural network to predict the existence of insulators within a given region (up to 2000 bp). Then, it uses an object detection module for precise insulator boundary localization via gradient-weighted class activation mapping (~40 bp resolution). Finally, it quantifies variant impacts by comparing the insulator score differences between the wild-type and mutant alleles. We applied InsuLock on various bulk and single-cell datasets for performance testing and benchmarking. We showed that it outperformed existing methods with an AUROC of ~0.96 and condensed insulator annotations to ~2.5% of their original size while still demonstrating higher conservation scores and better motif enrichments. Finally, we utilized InsuLock to make cell-type-specific variant impacts from brain scATAC-seq data and identified a schizophrenia GWAS variant disrupting an insulator loop proximal to a known risk gene, indicating a possible new mechanism of action for the disease.
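The variant-impact step described above (scoring the wild-type and mutant alleles and comparing them) can be sketched independently of the model that produces the scores. In the Python sketch below, `toy_motif_score` is a hypothetical stand-in for InsuLock's trained Siamese network: it only reports the best fraction of positions matching the pentamer CCCTC, borrowed from CTCF's full name (CCCTC-binding factor) rather than from a real motif model. Only the structure, impact = score(mutant) - score(wild type), reflects the abstract; all function names are assumptions.

```python
from typing import Callable


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return the mutant sequence, checking that the reference base matches."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_impact(seq: str, pos: int, ref: str, alt: str, score: Callable[[str], float]) -> float:
    """Impact of a single-nucleotide variant as score(mutant) - score(wild type)."""
    return score(apply_snv(seq, pos, ref, alt)) - score(seq)


def toy_motif_score(seq: str, motif: str = "CCCTC") -> float:
    """Hypothetical scorer: best fraction of matching bases over all motif-sized windows."""
    k = len(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        matches = sum(1 for a, b in zip(seq[start:start + k], motif) if a == b)
        best = max(best, matches)
    return best / k


wild_type = "AACCCTCAA"
print(round(variant_impact(wild_type, 4, "C", "G", toy_motif_score), 2))  # -0.2: the variant weakens the best match
```

Because `score` is passed in as a callable, the same difference calculation works unchanged if the toy scorer is replaced by any trained model that maps a sequence to a number.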
24
Shen Y, Zhong Q, Liu T, Wen Z, Shen W, Li L. CharID: a two-step model for universal prediction of interactions between chromatin accessible regions. Brief Bioinform 2022; 23:6514800. [PMID: 35077535 DOI: 10.1093/bib/bbab602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 12/23/2021] [Accepted: 12/24/2021] [Indexed: 11/14/2022]
Abstract
Open chromatin regions (OCRs) allow direct interaction between cis-regulatory elements and trans-acting factors. Therefore, predicting all potential OCR-mediated loops is essential for deciphering the regulatory mechanisms of gene expression. However, existing loop prediction tools are restricted to specific anchor types. Here, we present CharID (Chromatin Accessible Region Interaction Detector), a two-step model that combines a neural network and ensemble learning to predict OCR-mediated loops. In the first step, CharID-Anchor, an attention-based hybrid CNN-BiGRU network, discriminates between anchor and non-anchor OCRs. In the second step, CharID-Loop uses a gradient boosting decision tree with a chromosome-split strategy to predict interactions between anchor OCRs. Performance was assessed in three human cell lines, and CharID showed superior prediction performance compared with other algorithms. Unlike methods designed to predict a particular type of loop, CharID can detect a variety of chromatin loops, not limited to enhancer-promoter loops or architectural protein-mediated loops. We constructed an OCR-mediated interaction network from the predicted loops and identified hub anchors, which are notable for their proximity to housekeeping genes. By analyzing loops containing SNPs associated with cardiovascular disease, we identified an SNP-gene loop that suggests a regulatory mechanism for GFOD1. Taken together, CharID universally predicts diverse chromatin loops, going beyond other state-of-the-art methods, which are limited by anchor type, and experimental techniques, whose sensitivity decays drastically with the genomic distance between anchors. Finally, we host Peaksniffer, a user-friendly web server that provides online prediction, query and visualization of OCRs and associated loops.
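A chromosome-split strategy like the one used for CharID-Loop holds out whole chromosomes, so test pairs never share a chromosome with training pairs; this limits leakage between neighboring, correlated candidates. A minimal sketch, assuming intra-chromosomal anchor pairs and a hypothetical `AnchorPair` record; which chromosomes to hold out is an arbitrary choice here, not CharID's.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class AnchorPair(NamedTuple):
    """Hypothetical record for one candidate interaction between two anchor OCRs."""
    chrom: str   # both anchors are assumed to lie on the same chromosome
    start1: int  # start of the first anchor (bp)
    start2: int  # start of the second anchor (bp)
    label: int   # 1 = loop, 0 = non-loop


def chromosome_split(pairs: Iterable[AnchorPair],
                     test_chroms: Set[str]) -> Tuple[List[AnchorPair], List[AnchorPair]]:
    """Send every pair on a held-out chromosome to the test set and all others to training."""
    train: List[AnchorPair] = []
    test: List[AnchorPair] = []
    for pair in pairs:
        (test if pair.chrom in test_chroms else train).append(pair)
    return train, test


pairs = [
    AnchorPair("chr1", 10_000, 55_000, 1),
    AnchorPair("chr2", 20_000, 90_000, 0),
    AnchorPair("chr8", 5_000, 40_000, 1),
]
train, test = chromosome_split(pairs, test_chroms={"chr8"})
print(len(train), len(test))  # 2 1
```

A random split of the same pairs could place overlapping candidates from one locus on both sides, which tends to inflate test scores.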
Affiliation(s)
- Yin Shen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Quan Zhong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Tian Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Zi Wen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Wei Shen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Li Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, 430070, P. R. China
25
Stilianoudakis SC, Marshall MA, Dozmorov MG. preciseTAD: a transfer learning framework for 3D domain boundary prediction at base-pair resolution. Bioinformatics 2022; 38:621-630. [PMID: 34741515 PMCID: PMC8756196 DOI: 10.1093/bioinformatics/btab743] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/10/2021] [Revised: 10/07/2021] [Accepted: 11/02/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Chromosome conformation capture technologies (Hi-C) revealed extensive DNA folding into discrete 3D domains, such as Topologically Associating Domains and chromatin loops. The correct binding of CTCF and cohesin at domain boundaries is integral to maintaining the proper structure and function of these 3D domains. 3D domains have been mapped at resolutions of 1 kilobase and coarser. However, it has not been possible to define their boundaries at the resolution of boundary-forming proteins. RESULTS To predict domain boundaries at base-pair resolution, we developed preciseTAD, an optimized transfer learning framework trained on high-resolution genome annotation data. In contrast to current TAD/loop callers, preciseTAD-predicted boundaries are strongly supported by experimental evidence. Importantly, this approach can accurately delineate boundaries in cells without Hi-C data. preciseTAD provides a powerful framework to improve our understanding of how genomic regulators shape the 3D structure of the genome at base-pair resolution. AVAILABILITY AND IMPLEMENTATION preciseTAD is an R/Bioconductor package available at https://bioconductor.org/packages/preciseTAD/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Spiro C Stilianoudakis
- Department of Biostatistics, Department of Pathology, Virginia Commonwealth University, Richmond, VA 23298, USA
- Maggie A Marshall
- Bioinformatics Program, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G Dozmorov
- Department of Biostatistics, Department of Pathology, Virginia Commonwealth University, Richmond, VA 23298, USA
26
Kai Y, Li BE, Zhu M, Li GY, Chen F, Han Y, Cha HJ, Orkin SH, Cai W, Huang J, Yuan GC. Mapping the evolving landscape of super-enhancers during cell differentiation. Genome Biol 2021; 22:269. [PMID: 34526084 PMCID: PMC8442463 DOI: 10.1186/s13059-021-02485-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/11/2020] [Accepted: 09/02/2021] [Indexed: 12/26/2022]
Abstract
BACKGROUND Super-enhancers are clusters of enhancer elements that play critical roles in the maintenance of cell identity. Current investigations of super-enhancers focus on established super-enhancers in static cell types. How super-enhancers are established during cell differentiation remains obscure. RESULTS Here, by developing an unbiased approach to systematically analyze the evolving landscape of super-enhancers during cell differentiation in multiple lineages, we discover a general trend in which super-enhancers emerge through three distinct temporal patterns: conserved, temporally hierarchical, and de novo. The three types of super-enhancers further differ in their associations with target gene expression, functional enrichment, and 3D chromatin organization, suggesting they may represent distinct structural and functional subtypes. Furthermore, we dissect the enhancer repertoire within temporally hierarchical super-enhancers and find that enhancers emerging at early and late stages are enriched with distinct transcription factors, suggesting that the temporal order in which elements within super-enhancers are established may be directed by the underlying DNA sequence. CRISPR-mediated deletion of individual enhancers in differentiated cells shows that both early- and late-emerged enhancers are indispensable for target gene expression, while in undifferentiated cells early enhancers are involved in regulating target genes. CONCLUSIONS In summary, our analysis highlights the heterogeneity of the super-enhancer population and provides new insights into enhancer functions within super-enhancers.
Affiliation(s)
- Yan Kai
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, 02115, USA
- Bin E Li
- Cancer and Blood Disorders Center, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02115, USA
- Ming Zhu
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Grace Y Li
- Cancer and Blood Disorders Center, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02115, USA
- Fei Chen
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Yingli Han
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Hye Ji Cha
- Cancer and Blood Disorders Center, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02115, USA
- Stuart H Orkin
- Cancer and Blood Disorders Center, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Wenqing Cai
- Cancer and Blood Disorders Center, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02115, USA
- Jialiang Huang
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Guo-Cheng Yuan
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, 02115, USA
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Precision Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
27
Wang W, Gao L, Ye Y, Gao Y. CCIP: Predicting CTCF-mediated chromatin loops with transitivity. Bioinformatics 2021; 37:4635-4642. [PMID: 34289010 PMCID: PMC8665748 DOI: 10.1093/bioinformatics/btab534] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/17/2021] [Revised: 06/18/2021] [Accepted: 07/19/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION CTCF-mediated chromatin loops underlie the formation of topologically associating domains and serve as the structural basis for transcriptional regulation. However, the formation mechanism of these loops remains unclear, and genome-wide mapping of these loops is costly and difficult. Motivated by recent studies on the formation mechanism of CTCF-mediated loops, we studied whether transitivity-related information of interacting CTCF anchors can be used to predict CTCF loops computationally. In this context, transitivity arises when two CTCF anchors interact with the same third anchor through the loop extrusion mechanism and are thereby brought close to each other spatially, forming an indirect loop. RESULTS To determine whether transitivity is informative for predicting CTCF loops and to obtain an accurate, low-cost prediction method, we proposed a two-stage random-forest-based machine learning method, CTCF-mediated Chromatin Interaction Prediction (CCIP), to predict CTCF-mediated chromatin loops. Our two-stage learning approach makes it possible to train a prediction model that takes advantage of transitivity-related information as well as functional genomic data and genomic data. Experimental studies showed that our method predicts CTCF-mediated loops more accurately than other methods and that transitivity, when used as a properly defined attribute, is informative for predicting CTCF loops. Furthermore, we found that transitivity explains the formation of tandem CTCF loops and facilitates enhancer–promoter interactions. Our work contributes to the understanding of the formation mechanism and function of CTCF-mediated chromatin loops. AVAILABILITY AND IMPLEMENTATION The source code of CCIP can be accessed at: https://github.com/GaoLabXDU/CCIP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
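Transitivity, as defined here, can be made concrete as a graph count: for a candidate pair of anchors, count the third anchors that already loop with both. This is a simplified sketch that assumes known loops are given as pairs of anchor IDs; CCIP's actual transitivity attributes are defined in the paper and may differ, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Anchor = Hashable


def build_adjacency(loops: Iterable[Tuple[Anchor, Anchor]]) -> Dict[Anchor, Set[Anchor]]:
    """Undirected adjacency sets built from a list of known anchor-anchor loops."""
    adj: Dict[Anchor, Set[Anchor]] = defaultdict(set)
    for a, b in loops:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def transitivity_count(adj: Dict[Anchor, Set[Anchor]], a: Anchor, b: Anchor) -> int:
    """Number of third anchors c that loop with both a and b (shared neighbors)."""
    shared = adj.get(a, set()) & adj.get(b, set())
    return len(shared - {a, b})


known_loops = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("A", "E")]
adj = build_adjacency(known_loops)
print(transitivity_count(adj, "A", "B"))  # 2 (via C and D)
print(transitivity_count(adj, "A", "E"))  # 0
```

A count like this would be one input attribute for a classifier such as a random forest, alongside the functional genomic features, rather than a predictor on its own.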
Affiliation(s)
- Weibing Wang
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Yong Gao
- Department of Computer Science, The University of British Columbia Okanagan, Kelowna, BC, V1V 1V5, Canada
28
Lv H, Dao FY, Zulfiqar H, Su W, Ding H, Liu L, Lin H. A sequence-based deep learning approach to predict CTCF-mediated chromatin loop. Brief Bioinform 2021; 22:6149346. [PMID: 33634313 DOI: 10.1093/bib/bbab031] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 10/26/2020] [Revised: 12/01/2020] [Accepted: 01/21/2021] [Indexed: 12/13/2022]
Abstract
The three-dimensional (3D) architecture of chromosomes is crucial for transcriptional regulation and DNA replication. Various high-throughput chromosome conformation capture-based methods have revealed that CTCF-mediated chromatin loops are a major component of 3D architecture. However, CTCF-mediated chromatin loops are cell-type specific, and most chromatin interaction capture techniques are time-consuming and labor-intensive, which restricts their use across a very large number of cell types. Genomic sequence-based computational models are sophisticated enough to capture important features of chromatin architecture and help to identify chromatin loops. In this work, we develop Deep-loop, a convolutional neural network model that integrates k-tuple nucleotide frequency components, nucleotide pair spectrum encoding, position conservation, position scoring function and natural vector features to predict chromatin loops. In a series of cross-validation experiments, Deep-loop showed excellent performance in identifying chromatin loops from different cell types. The source code of Deep-loop is freely available at https://github.com/linDing-group/Deep-loop.
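Of the features listed, the k-tuple nucleotide frequency component is the easiest to illustrate: slide a window of length k along the anchor sequence and record the normalized count of each of the 4^k possible k-mers. A minimal sketch; the normalization and the handling of ambiguous bases (windows containing N are skipped) are assumptions, not necessarily Deep-loop's choices.

```python
from itertools import product
from typing import List


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Normalized frequencies of all 4**k DNA k-mers, in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        idx = index.get(seq[i:i + k])
        if idx is not None:  # windows with N or other non-ACGT symbols are skipped
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


# A feature vector concatenating k = 1..3 has 4 + 16 + 64 = 84 entries.
features = [f for k in (1, 2, 3) for f in kmer_frequencies("ACGTTGCAACGN", k)]
print(len(features))  # 84
```

The fixed-length vector makes anchors of different lengths comparable, which is what lets such features be fed to a CNN alongside the other encodings.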
Affiliation(s)
- Hao Lv
- Informational Biology at University of Electronic Science and Technology of China
- Fu-Ying Dao
- Informational Biology at University of Electronic Science and Technology of China
- Hasan Zulfiqar
- Informational Biology at University of Electronic Science and Technology of China
- Wei Su
- Informational Biology at University of Electronic Science and Technology of China
- Hui Ding
- Informational Biology at University of Electronic Science and Technology of China
- Li Liu
- Laboratory of Theoretical Biophysics at Inner Mongolia University
- Hao Lin
- Informational Biology at University of Electronic Science and Technology of China
29
Loop competition and extrusion model predicts CTCF interaction specificity. Nat Commun 2021; 12:1046. [PMID: 33594051 PMCID: PMC7886907 DOI: 10.1038/s41467-021-21368-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/08/2020] [Accepted: 01/22/2021] [Indexed: 12/20/2022]
Abstract
Three-dimensional chromatin looping interactions play an important role in constraining enhancer–promoter interactions and mediating transcriptional gene regulation. CTCF is thought to play a critical role in the formation of these loops, but which CTCF binding events form loops and which do not is difficult to predict. Loops often have convergent CTCF binding site motif orientation, but this constraint alone is only weakly predictive of genome-wide interaction data. Here we present a simple, easily interpretable mathematical model of CTCF-mediated loop formation that is consistent with cohesin extrusion and can predict ChIA-PET CTCF looping interaction measurements with high accuracy. Competition between overlapping loops is a critical determinant of loop specificity. We show that this model is consistent with observed chromatin interaction frequency changes induced by CTCF binding site deletion, inversion, and mutation, and is also consistent with observed constraints on validated enhancer–promoter interactions. Boundaries of topologically associated domains in genomes are marked by CTCF and cohesin binding. Here the authors predict CTCF interaction specificity by building a simple mathematical model with features including loop competition and extrusion.
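The convergent-orientation rule, which the abstract describes as only weakly predictive on its own, is simple to encode and is therefore a common baseline. A minimal sketch, assuming each anchor's CTCF motif strand is already known (for example from a motif scan) and that the left anchor is the upstream one; this is not the paper's competition and extrusion model.

```python
def motif_orientation(left_strand: str, right_strand: str) -> str:
    """Classify a loop's CTCF motif orientation from the strands of its two anchor motifs.

    Convergent: left motif on '+' and right motif on '-' (motifs point toward each other).
    Divergent: left on '-' and right on '+'. Tandem: both motifs on the same strand.
    """
    valid = {"+", "-"}
    if left_strand not in valid or right_strand not in valid:
        raise ValueError("strands must be '+' or '-'")
    if left_strand == right_strand:
        return "tandem"
    return "convergent" if left_strand == "+" else "divergent"


loops = [("+", "-"), ("+", "+"), ("-", "+"), ("+", "-")]
labels = [motif_orientation(left, right) for left, right in loops]
print(labels.count("convergent") / len(labels))  # 0.5
```

The paper's point is that a pairwise rule like this ignores competition between overlapping loops, which is what its model adds.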
30
Kuang S, Wang L. Identification and analysis of consensus RNA motifs binding to the genome regulator CTCF. NAR Genom Bioinform 2021; 2:lqaa031. [PMID: 33575587 PMCID: PMC7671415 DOI: 10.1093/nargab/lqaa031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/03/2019] [Revised: 03/21/2020] [Accepted: 04/28/2020] [Indexed: 12/14/2022]
Abstract
CCCTC-binding factor (CTCF) is a key regulator of 3D genome organization and gene expression. Recent studies suggest that RNA transcripts, mostly long non-coding RNAs (lncRNAs), can serve as locus-specific factors to bind and recruit CTCF to the chromatin. However, it remains unclear whether specific sequence patterns are shared by the CTCF-binding RNA sites, and no RNA motif has been reported so far for CTCF binding. In this study, we have developed DeepLncCTCF, a new deep learning model based on a convolutional neural network and a bidirectional long short-term memory network, to discover the RNA recognition patterns of CTCF and identify candidate lncRNAs that bind CTCF. When evaluated on two datasets (human U2OS and mouse ESC), DeepLncCTCF accurately predicted CTCF-binding RNA sites from nucleotide sequence. By examining the sequence features learned by DeepLncCTCF, we discovered a novel RNA motif with the consensus sequence AGAUNGGA for potential CTCF binding in humans. Furthermore, the applicability of DeepLncCTCF was demonstrated by identifying nearly 5000 candidate lncRNAs that might bind CTCF in the nucleus. Our results provide useful information for understanding the molecular mechanisms of CTCF function in 3D genome organization.
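The reported consensus AGAUNGGA contains the IUPAC code N (any nucleotide), so locating candidate sites is a small pattern-matching problem. A minimal sketch using Python's `re` module, with a zero-width lookahead so overlapping hits are reported; converting T to U so that DNA-encoded transcripts can be scanned is a convenience assumption, and a consensus match alone is not evidence of CTCF binding.

```python
import re
from typing import List

# Only the IUPAC symbols needed for this motif; extend the table for other codes.
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U", "N": "[ACGU]"}


def motif_pattern(motif: str) -> "re.Pattern[str]":
    """Compile a consensus motif into a regex; the lookahead lets matches overlap."""
    body = "".join(IUPAC_RNA[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, motif: str = "AGAUNGGA") -> List[int]:
    """0-based start positions of every (possibly overlapping) motif match in seq."""
    rna = seq.upper().replace("T", "U")
    return [m.start() for m in motif_pattern(motif).finditer(rna)]


print(find_motif("CCAGAUCGGAUU"))     # [2]
print(find_motif("AGAUAGGAGAUUGGA"))  # [0, 7]
```

In the second example the two sites share the A at position 7; without the lookahead, a plain `finditer` would report only the first.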
Affiliation(s)
- Shuzhen Kuang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
- Liangjiang Wang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
31
Belokopytova P, Fishman V. Predicting Genome Architecture: Challenges and Solutions. Front Genet 2021; 11:617202. [PMID: 33552135 PMCID: PMC7862721 DOI: 10.3389/fgene.2020.617202] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/14/2020] [Accepted: 12/15/2020] [Indexed: 12/22/2022]
Abstract
Genome architecture plays a pivotal role in gene regulation. High-throughput methods for chromatin profiling and 3D interaction mapping provide rich experimental data sets describing genome organization and dynamics. These data call for the development of new models and algorithms connecting genome architecture with epigenetic marks. In this review, we describe how chromatin architecture can be reconstructed from epigenetic data using biophysical or statistical approaches. We discuss the applicability and limitations of these methods for understanding the mechanisms of chromatin organization. We also highlight the emergence of new predictive approaches for scoring the effects of structural variations in human cells.
Affiliation(s)
- Polina Belokopytova
- Natural Sciences Department, Novosibirsk State University, Novosibirsk, Russia
- Institute of Cytology and Genetics Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
- Veniamin Fishman
- Natural Sciences Department, Novosibirsk State University, Novosibirsk, Russia
- Institute of Cytology and Genetics Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
32
Tao H, Li H, Xu K, Hong H, Jiang S, Du G, Wang J, Sun Y, Huang X, Ding Y, Li F, Zheng X, Chen H, Bo X. Computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles. Brief Bioinform 2021; 22:6102668. [PMID: 33454752 PMCID: PMC8424394 DOI: 10.1093/bib/bbaa405] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/21/2020] [Revised: 11/26/2020] [Accepted: 12/10/2020] [Indexed: 12/14/2022]
Abstract
The exploration of three-dimensional chromatin interaction and organization provides insight into mechanisms underlying gene regulation, cell differentiation and disease development. Advances in chromosome conformation capture technologies, such as high-throughput chromosome conformation capture (Hi-C) and chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the exploration of chromatin interaction and organization. However, high-resolution Hi-C and ChIA-PET data are available for only a limited number of cell lines, and their acquisition is costly, time-consuming, laborious and affected by theoretical limitations. Increasing evidence shows that DNA sequence and epigenomic features are informative predictors of regulatory interaction and chromatin architecture. Based on these features, numerous computational methods have been developed to predict chromatin interaction and organization, but they have not been extensively applied in biomedical studies. A systematic study that summarizes and evaluates such methods is still needed to facilitate their application. Here, we summarize 48 computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles, categorize them and compare their performance. In addition, we provide a comprehensive guideline for selecting suitable methods to predict chromatin interaction and organization based on the available data and the biological question of interest.
Affiliation(s)
- Huan Tao
- Beijing Institute of Radiation Medicine
- Hao Li
- Beijing Institute of Radiation Medicine
- Kang Xu
- Beijing Institute of Radiation Medicine
- Hao Hong
- Beijing Institute of Radiation Medicine, Department of Biotechnology
- Shuai Jiang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
- Guifang Du
- Beijing Institute of Radiation Medicine, Department of Biotechnology
- Yu Sun
- Beijing Institute of Radiation Medicine, Department of Biotechnology
- Xin Huang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
- Yang Ding
- Beijing Institute of Radiation Medicine
- Fei Li
- Chinese Academy of Sciences, Department of Computer Network Information Center
33
Dao FY, Lv H, Zhang D, Zhang ZM, Liu L, Lin H. DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops. Brief Bioinform 2020; 22:6024741. [PMID: 33279983 DOI: 10.1093/bib/bbaa356] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 09/21/2020] [Revised: 10/19/2020] [Accepted: 11/04/2020] [Indexed: 12/29/2022]
Abstract
The protein Yin Yang 1 (YY1) can form dimers that facilitate interactions between active enhancers and promoter-proximal elements. YY1-mediated enhancer-promoter interaction is a general feature of mammalian gene control. Recently, some computational methods have been developed to characterize interactions between DNA elements by elucidating important features of chromatin folding; however, no computational method had been developed for identifying YY1-mediated chromatin loops. In this study, we developed a deep learning algorithm named DeepYY1, based on word2vec, to determine whether a pair of YY1 motifs would form a loop. The proposed models showed high prediction performance (AUCs ≥ 0.93) on both training and testing datasets in different cell types, demonstrating that DeepYY1 performs well in identifying YY1-mediated chromatin loops. Our study also suggested that sequence plays an important role in the formation of YY1-mediated chromatin loops. Furthermore, we briefly discussed the distribution of replication origin sites in the loops. Finally, a user-friendly web server was established and can be freely accessed at http://lin-group.cn/server/DeepYY1.
Affiliation(s)
- Fu-Ying Dao
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Hao Lv
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Dan Zhang
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Zi-Mei Zhang
- Center for Informational Biology at the University of Electronic Science and Technology of China
- Li Liu
- Laboratory of Theoretical Biophysics at the Inner Mongolia University
- Hao Lin
- Center for Informational Biology at the University of Electronic Science and Technology of China
34
A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions. Genes (Basel) 2020; 11:genes11090985. [PMID: 32847102 PMCID: PMC7563616 DOI: 10.3390/genes11090985] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/08/2020] [Revised: 08/18/2020] [Accepted: 08/20/2020] [Indexed: 02/07/2023]
Abstract
The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long-range chromatin interaction maps provided by Chromosome Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental chromatin interaction data are not available. Several methods have relied on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, approaches vary both in the way they model chromatin interactions and in the machine learning strategy they rely on, making it challenging to compare the performance of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors carry most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, besides architectural proteins, transcription factors are shown to be highly informative.
Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best-suited algorithm for this task and highlights cell-type-specific binding of transcription factors at the anchors as an important determinant of cohesin-mediated chromatin wiring.
35
Zhang R, Ma J. MATCHA: Probing multi-way chromatin interaction with hypergraph representation learning. Cell Syst 2020; 10:397-407.e5. [PMID: 32550271 PMCID: PMC7299183 DOI: 10.1016/j.cels.2020.04.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
Recent advances in ligation-free, genome-wide chromatin interaction mapping such as SPRITE and ChIA-Drop have enabled the identification of simultaneous interactions involving multiple genomic loci within the same nucleus, which help delineate higher-order genome organization and gene regulation mechanisms at single-nucleus resolution. Unfortunately, computational methods for analyzing multi-way chromatin interaction data remain significantly underexplored. Here we develop an algorithm, called MATCHA, based on hypergraph representation learning, in which multi-way chromatin interactions are represented as hyperedges. Applications to SPRITE and ChIA-Drop data suggest that MATCHA is effective at denoising the data and making de novo predictions, which greatly enhances the data quality for analyzing the properties of multi-way chromatin interactions. MATCHA provides a promising framework to significantly improve the analysis of multi-way chromatin interaction data and has the potential to offer unique insights into higher-order chromosome organization and function. MATCHA is freely available at https://github.com/ma-compbio/MATCHA.
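Representing a multi-way interaction as a hyperedge matters because the usual pairwise (clique-expanded) view cannot distinguish one three-way contact from three separate two-way contacts. The sketch below illustrates only that data-representation point, assuming genomic bins are identified by integers; it does not implement MATCHA's hypergraph representation learning.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable


def clique_expand(hyperedges: Iterable[FrozenSet[int]]) -> Counter:
    """Project multi-way contacts (hyperedges over genomic bins) onto pairwise contact counts."""
    pair_counts: Counter = Counter()
    for edge in hyperedges:
        for a, b in combinations(sorted(edge), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


one_three_way = [frozenset({1, 2, 3})]                                      # one multi-way cluster
three_two_way = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]  # three pairwise contacts

# The hypergraphs differ, but their pairwise projections are identical.
print(set(one_three_way) == set(three_two_way))                      # False
print(clique_expand(one_three_way) == clique_expand(three_two_way))  # True
```

Keeping the hyperedges themselves, as MATCHA does, preserves exactly the information that this projection discards.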
Affiliation(s)
- Ruochi Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
36
Trieu T, Martinez-Fundichely A, Khurana E. DeepMILO: a deep learning approach to predict the impact of non-coding sequence variants on 3D chromatin structure. Genome Biol 2020; 21:79. [PMID: 32216817 PMCID: PMC7098089 DOI: 10.1186/s13059-020-01987-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 11/04/2019] [Accepted: 03/06/2020] [Indexed: 12/17/2022]
Abstract
Non-coding variants have been linked to disease through alteration of 3D genome structure. We propose a deep learning method, DeepMILO, to predict the effects of variants on CTCF/cohesin-mediated insulator loops. Applying DeepMILO to variants from the whole-genome sequences of 1834 patients across twelve cancer types revealed 672 insulator loops disrupted in at least 10% of patients. Our results show that mutations at loop anchors are associated with upregulation of the cancer driver genes BCL2 and MYC in malignant lymphoma, pointing to a possible new mechanism for their dysregulation via alteration of insulator loops.
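A recurrence filter of the kind behind "loops disrupted in at least 10% of patients" amounts to counting, for each loop, how many patients carry at least one predicted disrupting variant. A minimal sketch, assuming per-patient lists of loop IDs that a model has already flagged as disrupted; the loop IDs, patient IDs and integer-percent threshold are illustrative, not DeepMILO's interface.

```python
from collections import Counter
from typing import Dict, Iterable


def recurrent_loops(disrupted_by_patient: Dict[str, Iterable[str]],
                    min_percent: int = 10) -> Dict[str, int]:
    """Loops disrupted in at least `min_percent`% of patients, mapped to their patient counts."""
    n_patients = len(disrupted_by_patient)
    counts: Counter = Counter()
    for loops in disrupted_by_patient.values():
        counts.update(set(loops))  # count each loop at most once per patient
    # Integer comparison avoids floating-point edge cases exactly at the threshold.
    return {loop: c for loop, c in counts.items() if c * 100 >= min_percent * n_patients}


patients: Dict[str, Iterable[str]] = {f"p{i}": [] for i in range(20)}
patients["p0"] = ["L1", "L1", "L3"]  # the duplicate hit on L1 counts once
patients["p1"] = ["L3", "L5"]
patients["p2"] = ["L5"]
patients["p3"] = ["L5"]
print(sorted(recurrent_loops(patients).items()))  # [('L3', 2), ('L5', 3)]
```

Patients without any flagged loop still count toward the denominator, which is why they are kept as empty lists.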
Affiliation(s)
- Tuan Trieu
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Alexander Martinez-Fundichely
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Ekta Khurana
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA
37
Li Y, Tao T, Du L, Zhu X. Three-dimensional genome: developmental technologies and applications in precision medicine. J Hum Genet 2020; 65:497-511. [PMID: 32152365 DOI: 10.1038/s10038-020-0737-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/30/2019] [Revised: 02/20/2020] [Accepted: 02/22/2020] [Indexed: 12/17/2022]
Abstract
In the 20th century, the familiar picture of DNA was the double helix. Technical limitations long prevented us from understanding the finer structure of the genome, let alone its transcriptional regulation; only with the advent of 3C technologies did this begin to change. Three-dimensional (3D) genomics is an emerging field that mainly studies the 3D structure and transcriptional regulation of eukaryotic genomes. Its main techniques are currently the Hi-C and ChIA-PET families of methods. Through 3D genomics, we can understand the basic structure of DNA, the growth and development of organisms, and the occurrence of diseases, thereby advancing human medicine and health. This review introduces the main research techniques of 3D genomics and their characteristics, the latest developments in 3D genome structure, the relationship between diseases and 3D genome structure, the applications of the 3D genome in precision medicine, and the development of the 4D Nucleome project.
Affiliation(s)
- Yingqi Li
- Marine Medical Research Institute of Guangdong Zhanjiang (GDZJMMRI), Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang, Guangdong Medical University, Zhanjiang, 524023, China
- Tao Tao
- Department of Gastroenterology, Zibo Central Hospital, Zibo, 255000, China
- Likun Du
- First Affiliated Hospital, Heilongjiang University of Traditional Chinese Medicine, Harbin, 150040, China
- Xiao Zhu
- Marine Medical Research Institute of Guangdong Zhanjiang (GDZJMMRI), Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang, Guangdong Medical University, Zhanjiang, 524023, China
38
Belokopytova PS, Nuriddinov MA, Mozheiko EA, Fishman D, Fishman V. Quantitative prediction of enhancer-promoter interactions. Genome Res 2019; 30:72-84. [PMID: 31804952 PMCID: PMC6961579 DOI: 10.1101/gr.249367.119] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 02/13/2019] [Accepted: 11/25/2019] [Indexed: 11/24/2022]
Abstract
Recent experimental and computational efforts have provided large data sets describing the three-dimensional organization of mouse and human genomes and have shown the interconnection between the expression profile, epigenetic state, and spatial interactions of loci. These interconnections have been used to infer the spatial organization of chromatin, including enhancer–promoter contacts, from one-dimensional epigenetic marks. Here, we show that the predictive power of some of these algorithms is overestimated due to peculiar properties of the biological data. We propose an alternative approach that provides high-quality predictions of chromatin interactions using only gene expression and CTCF-binding information. Using multiple metrics, we confirmed that our algorithm can efficiently predict the three-dimensional architecture of both normal and rearranged genomes.
Affiliation(s)
- Polina S Belokopytova
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia 630090
- Novosibirsk State University, Novosibirsk, Russia 630090
- Daniil Fishman
- Novosibirsk State University, Novosibirsk, Russia 630090
- Veniamin Fishman
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia 630090
- Novosibirsk State University, Novosibirsk, Russia 630090
39
Li X, An Z, Zhang Z. Comparison of computational methods for 3D genome analysis at single-cell Hi-C level. Methods 2019; 181-182:52-61. [PMID: 31445093 DOI: 10.1016/j.ymeth.2019.08.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/13/2019] [Revised: 07/09/2019] [Accepted: 08/19/2019] [Indexed: 11/18/2022]
Abstract
Hi-C is a high-throughput chromosome conformation capture technology that is becoming routine in the literature. Although the price of sequencing has been dropping dramatically, high-resolution Hi-C data are not always an option for many studies, such as those in single cells. However, the performance of current Hi-C computational methods under ultra-sparse data conditions has yet to be fully assessed. Therefore, in this paper, after briefly surveying the primary computational methods for Hi-C data analysis, we assess the performance of representative methods for data normalization and for the identification of compartments, Topologically Associating Domains (TADs) and chromatin loops at ultra-low resolution. We show that most state-of-the-art methods do not work properly under that condition. We then apply the three best-performing methods to real single-cell Hi-C data; their performance indicates that compartments may be a statistical feature emerging from the cell population, while TADs and chromatin loops may exist dynamically in single cells.
Affiliation(s)
- Xiao Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China
- Ziyang An
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China
- Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China
40
Large-scale chromatin organisation in interphase, mitosis and meiosis. Biochem J 2019; 476:2141-2156. [DOI: 10.1042/bcj20180512] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/20/2019] [Revised: 07/16/2019] [Accepted: 07/18/2019] [Indexed: 01/17/2023]
Abstract
The spatial configuration of chromatin is fundamental to ensuring that any given cell can fulfil its functional duties, from gene expression to specialised cellular division. Significant technological innovations have facilitated further insights into the structure, function and regulation of three-dimensional chromatin organisation. To date, the vast majority of investigations into chromatin organisation have been conducted in interphase and mitotic cells, leaving meiotic chromatin relatively unexplored. In combination, cytological and genome-wide contact frequency analyses in mammalian germ cells have recently demonstrated that large-scale chromatin structures in meiotic prophase I are reminiscent of the sequential loop arrays found in mitotic cells, although interphase-like segmentation of transcriptionally active and inactive regions is also evident along the length of chromosomes. Here, we discuss the similarities and differences in such large-scale chromatin architecture between interphase, mitotic and meiotic cells, as well as their functional relevance and the proposed modulatory mechanisms that underlie them.
41
Shimbo T, Kawamura M, Wijaya E, Takaki E, Kaneda Y, Tamai K. Cut-C: cleavage under tethered nuclease for conformational capture. BMC Genomics 2019; 20:614. [PMID: 31357933 PMCID: PMC6664727 DOI: 10.1186/s12864-019-5989-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/28/2019] [Accepted: 07/22/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND Deciphering the 3D structure of the genome is essential for elucidating the regulatory mechanisms of gene expression in detail. Existing methods, such as chromosome conformation capture (3C) and Hi-C, have enabled the identification of novel aspects of chromatin structure. Further identification of protein-centric chromatin conformation is enabled by coupling the Hi-C procedure with a conventional chromatin immunoprecipitation assay. However, these methods are time-consuming and require independent methods for validation. RESULTS To simultaneously identify protein-centric chromatin conformation and target protein localization, we have developed Cut-C, a method that combines antibody-mediated cleavage by tethered nuclease with chromosome conformation capture to identify chromatin interactions mediated by a protein of interest. Applying Cut-C to H3K4me3, a histone modification enriched at active gene promoters, we successfully identified chromatin loops mediated by H3K4me3 along with the genome-wide distribution of H3K4me3. Cut-C also identified chromatin loops mediated by CTCF, validating the general applicability of the method. CONCLUSIONS Cut-C identifies protein-centric chromatin conformations along with the genome-wide distribution of target proteins using simple procedures. The simplified protocol will improve the efficiency of analysing chromatin conformation using precious materials, such as clinical samples.
Affiliation(s)
- Takashi Shimbo
- Department of Stem Cell Therapy Science, Graduate School of Medicine, Osaka University, Suita, Osaka, 5650871 Japan
- Machika Kawamura
- Department of Stem Cell Therapy Science, Graduate School of Medicine, Osaka University, Suita, Osaka, 5650871 Japan
- StemRIM Co., Ltd., Ibaraki, Osaka, 5670085 Japan
- Edward Wijaya
- Department of Stem Cell Therapy Science, Graduate School of Medicine, Osaka University, Suita, Osaka, 5650871 Japan
- StemRIM Co., Ltd., Ibaraki, Osaka, 5670085 Japan
- Eiichi Takaki
- Department of Stem Cell Therapy Science, Graduate School of Medicine, Osaka University, Suita, Osaka, 5650871 Japan
- StemRIM Co., Ltd., Ibaraki, Osaka, 5670085 Japan
- Yasufumi Kaneda
- Division of Gene Therapy Science, Graduate School of Medicine, Osaka University, Suita, Osaka, 5650871 Japan
- Katsuto Tamai
- Department of Stem Cell Therapy Science, Graduate School of Medicine, Osaka University, Suita, Osaka, 5650871 Japan
42
Qi Y, Zhang B. Predicting three-dimensional genome organization with chromatin states. PLoS Comput Biol 2019; 15:e1007024. [PMID: 31181064 PMCID: PMC6586364 DOI: 10.1371/journal.pcbi.1007024] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Received: 12/05/2018] [Revised: 06/20/2019] [Accepted: 04/13/2019] [Indexed: 11/19/2022]
Abstract
We introduce a computational model to simulate chromatin structure and dynamics. Starting from one-dimensional genomics and epigenomics data that are available for hundreds of cell types, this model enables de novo prediction of chromatin structures at five-kilobase resolution. Simulated chromatin structures recapitulate known features of genome organization, including the formation of chromatin loops, topologically associating domains (TADs) and compartments, and are in quantitative agreement with chromosome conformation capture experiments and super-resolution microscopy measurements. Detailed characterization of the predicted structural ensemble reveals the dynamical flexibility of chromatin loops and the presence of cross-talk among neighboring TADs. Analysis of the model's energy function uncovers distinct mechanisms for chromatin folding at various length scales and suggests a need to go beyond simple A/B compartment types to predict specific contacts between regulatory elements using polymer simulations.
Affiliation(s)
- Yifeng Qi
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America