1
Mitra S, Hartemink AJ. Inferring differential protein binding from time-series chromatin accessibility data. Bioinformatics Advances 2025; 5:vbaf080. [PMID: 40297777 PMCID: PMC12037103 DOI: 10.1093/bioadv/vbaf080] [Received: 01/14/2025] [Revised: 03/08/2025] [Accepted: 04/07/2025] [Indexed: 04/30/2025]
Abstract
Motivation: Due to internal and external factors, the epigenomic landscape is constantly changing in ways that are linked to changes in gene expression. Chromatin accessibility data, such as MNase-seq, provide valuable insights into this landscape and have been used to compute chromatin occupancy profiles. Multiple datasets generated over time or under different conditions can thus be used to study dynamic changes in chromatin occupancy across the genome.
Results: Our existing model, RoboCOP, computes a genome-wide chromatin occupancy profile for nucleosomes and hundreds of transcription factors. Here, we present a new method called DynaCOP that takes multiple chromatin occupancy profiles and uses them to generate a series of nucleosome-guided difference profiles. These profiles identify differentially binding transcription factors and reveal changes in nucleosome occupancy and positioning. We apply DynaCOP to chromatin occupancy profiles derived from deeply sequenced time-series MNase-seq data to study differential chromatin occupancy in the yeast genome under cadmium stress. We find strong correlations between the observed chromatin changes and changes in transcription.
Availability and implementation: https://github.com/HarteminkLab/RoboCOP.
Affiliation(s)
- Sneha Mitra
  - Department of Computer Science, Duke University, Durham, NC 27708-0129, United States
- Alexander J Hartemink
  - Department of Computer Science, Duke University, Durham, NC 27708-0129, United States
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27710, United States
2
Luo K, Zhong J, Safi A, Hong LK, Tewari AK, Song L, Reddy TE, Ma L, Crawford GE, Hartemink AJ. Profiling the quantitative occupancy of myriad transcription factors across conditions by modeling chromatin accessibility data. Genome Res 2022; 32:1183-1198. [PMID: 35609992 PMCID: PMC9248881 DOI: 10.1101/gr.272203.120] [Received: 09/30/2020] [Accepted: 05/06/2022] [Indexed: 11/24/2022]
Abstract
Over a thousand different transcription factors (TFs) bind with varying occupancy across the human genome. Chromatin immunoprecipitation (ChIP) can assay occupancy genome-wide, but only one TF at a time, limiting our ability to comprehensively observe the TF occupancy landscape, let alone quantify how it changes across conditions. We developed TF occupancy profiler (TOP), a Bayesian hierarchical regression framework, to profile genome-wide quantitative occupancy of numerous TFs using data from a single chromatin accessibility experiment (DNase- or ATAC-seq). TOP is supervised, and its hierarchical structure allows it to predict the occupancy of any sequence-specific TF, even those never assayed with ChIP. We used TOP to profile the quantitative occupancy of hundreds of sequence-specific TFs at sites throughout the genome and examined how their occupancies changed in multiple contexts: in approximately 200 human cell types, through 12 h of exposure to different hormones, and across the genetic backgrounds of 70 individuals. TOP enables cost-effective exploration of quantitative changes in the landscape of TF binding.
Affiliation(s)
- Kaixuan Luo
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
  - Department of Human Genetics, The University of Chicago, Chicago, Illinois 60637, USA
- Jianling Zhong
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
- Alexias Safi
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Linda K Hong
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Alok K Tewari
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Lingyun Song
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Timothy E Reddy
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Biostatistics and Bioinformatics, Durham, North Carolina 27710, USA
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
  - Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA
- Li Ma
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Department of Statistical Science, Duke University, Durham, North Carolina 27708, USA
- Gregory E Crawford
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Alexander J Hartemink
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
  - Department of Biology, Duke University, Durham, North Carolina 27708, USA
3
Oiwa NN, Li K, Cordeiro CE, Heermann DW. Prediction and comparative analysis of CTCF binding sites based on a first principle approach. Phys Biol 2022; 19. [PMID: 35290214 DOI: 10.1088/1478-3975/ac5dca] [Received: 10/28/2021] [Accepted: 03/09/2022] [Indexed: 11/12/2022]
Abstract
We calculated the patterns for the CCCTC transcription factor (CTCF) binding sites across many genomes on a first principle approach. The validation of the first principle method was done on the human as well as on the mouse genome. The predicted human CTCF binding sites are consistent with the consensus sequence, ChIP-seq data for the K562 cell, nucleosome positions for IMR90 cell as well as the CTCF binding sites in the mouse HOXA gene. The analysis of Homo sapiens, Mus musculus, Sus scrofa, Capra hircus and Drosophila melanogaster whole genomes shows: binding sites are organized in cluster-like groups, where two consecutive sites obey a power-law with coefficient ranging from 0.3292 ± 0.0068 to 0.5409 ± 0.0064; the distance between these groups varies from 18.08 ± 0.52 kbp to 42.1 ± 2.0 kbp. The genome of Aedes aegypti does not show a power law, but 19.9% of binding sites are 144 ± 4 and 287 ± 5 bp distant of each other. We run negative tests, confirming the under-representation of CTCF binding sites in Caenorhabditis elegans, Plasmodium falciparum and Arabidopsis thaliana complete genomes.
Affiliation(s)
- Nestor Norio Oiwa
  - Theoretical Physics, Heidelberg University, Philosophenweg 19, Heidelberg, Baden-Württemberg, 69120, Germany
- Kunhe Li
  - Theoretical Physics, Heidelberg University, Philosophenweg 19, Heidelberg, 69117, Germany
- Claudette E Cordeiro
  - Department of Physics, Universidade Federal Fluminense, Avenida Atlantica s/n, Gragoatal, Niteroi, Rio de Janeiro, 24220-900, Brazil
- Dieter W Heermann
  - Theoretical Physics, Heidelberg University, Philosophenweg 19, Heidelberg, 69120, Germany
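
As a rough illustration of the power-law relation reported in the abstract of entry 3, the sketch below estimates an exponent by least-squares regression in log-log space. It assumes a relation of the form y = c * x^(-alpha) between a distance x separating consecutive sites and its frequency y. That functional form, the synthetic data, and the helper name `fit_power_law` are assumptions made for illustration and are not taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**(-alpha) by ordinary least squares on (log x, log y).

    Returns (alpha, c). Hypothetical helper, not the authors' procedure.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    slope = sxy / sxx                      # slope of the log-log line
    intercept = mean_y - slope * mean_x    # equals log(c)
    return -slope, math.exp(intercept)


# Synthetic, noise-free data generated from an assumed exponent and prefactor.
true_alpha, true_c = 0.45, 1000.0
distances = [10 * k for k in range(1, 51)]
counts = [true_c * d ** (-true_alpha) for d in distances]

alpha, c = fit_power_law(distances, counts)
print(f"estimated exponent: {alpha:.4f}, prefactor: {c:.1f}")
```

On real binding-site data the counts would be noisy, so the estimate would carry an uncertainty comparable in spirit to the ± values quoted in the abstract; a maximum-likelihood estimator may be preferable to log-log regression in that setting.
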
4
Mitra S, Zhong J, Tran TQ, MacAlpine DM, Hartemink AJ. RoboCOP: jointly computing chromatin occupancy profiles for numerous factors from chromatin accessibility data. Nucleic Acids Res 2021; 49:7925-7938. [PMID: 34255854 PMCID: PMC8373080 DOI: 10.1093/nar/gkab553] [Received: 12/23/2020] [Revised: 05/28/2021] [Accepted: 07/08/2021] [Indexed: 01/25/2023]
Abstract
Chromatin is a tightly packaged structure of DNA and protein within the nucleus of a cell. The arrangement of different protein complexes along the DNA modulates and is modulated by gene expression. Measuring the binding locations and occupancy levels of different transcription factors (TFs) and nucleosomes is therefore crucial to understanding gene regulation. Antibody-based methods for assaying chromatin occupancy are capable of identifying the binding sites of specific DNA binding factors, but only one factor at a time. In contrast, epigenomic accessibility data like MNase-seq, DNase-seq, and ATAC-seq provide insight into the chromatin landscape of all factors bound along the genome, but with little insight into the identities of those factors. Here, we present RoboCOP, a multivariate state space model that integrates chromatin accessibility data with nucleotide sequence to jointly compute genome-wide probabilistic scores of nucleosome and TF occupancy, for hundreds of different factors. We apply RoboCOP to MNase-seq and ATAC-seq data to elucidate the protein-binding landscape of nucleosomes and 150 TFs across the yeast genome, and show that our model makes better predictions than existing methods. We also compute a chromatin occupancy profile of the yeast genome under cadmium stress, revealing chromatin dynamics associated with transcriptional regulation.
Affiliation(s)
- Sneha Mitra
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- Jianling Zhong
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
- Trung Q Tran
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- David M MacAlpine
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
  - Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Alexander J Hartemink
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
5
Mitra S, Zhong J, MacAlpine DM, Hartemink AJ. RoboCOP: Multivariate State Space Model Integrating Epigenomic Accessibility Data to Elucidate Genome-Wide Chromatin Occupancy. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2020; 12074:136-151. [PMID: 34386808 PMCID: PMC8356533 DOI: 10.1007/978-3-030-45257-5_9] [Indexed: 04/04/2024]
Abstract
Chromatin is the tightly packaged structure of DNA and protein within the nucleus of a cell. The arrangement of different protein complexes along the DNA modulates and is modulated by gene expression. Measuring the binding locations and level of occupancy of different transcription factors (TFs) and nucleosomes is therefore crucial to understanding gene regulation. Antibody-based methods for assaying chromatin occupancy are capable of identifying the binding sites of specific DNA binding factors, but only one factor at a time. On the other hand, epigenomic accessibility data like ATAC-seq, DNase-seq, and MNase-seq provide insight into the chromatin landscape of all factors bound along the genome, but with minimal insight into the identities of those factors. Here, we present RoboCOP, a multivariate state space model that integrates chromatin information from epigenomic accessibility data with nucleotide sequence to compute genome-wide probabilistic scores of nucleosome and TF occupancy, for hundreds of different factors at once. RoboCOP can be applied to any epigenomic dataset that provides quantitative insight into chromatin accessibility in any organism, but here we apply it to MNase-seq data to elucidate the protein-binding landscape of nucleosomes and 150 TFs across the yeast genome. Using available protein-binding datasets from the literature, we show that our model more accurately predicts the binding of these factors genome-wide.
Affiliation(s)
- Sneha Mitra
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- Jianling Zhong
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
- David M MacAlpine
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
  - Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Alexander J Hartemink
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
6
Abstract
“Big Data” has surpassed “systems biology” and “omics” as the hottest buzzword in the biological sciences, but is there any substance behind the hype? Certainly, we have learned about various aspects of cell and molecular biology from the many individual high-throughput data sets that have been published in the past 15–20 years. These data, although useful as individual data sets, can provide much more knowledge when interrogated with Big Data approaches, such as applying integrative methods that leverage the heterogeneous data compendia in their entirety. Here we discuss the benefits and challenges of such Big Data approaches in biology and how cell and molecular biologists can best take advantage of them.
Affiliation(s)
- Kara Dolinski
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540
- Olga G Troyanskaya
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540
  - Department of Computer Science, Princeton University, Princeton, NJ 08540
  - Simons Center for Data Analysis, Simons Foundation, New York, NY 10010
7
Abstract
Although deoxyribonuclease I (DNase I) was used to probe the structure of the nucleosome in the 1960s and 1970s, in the current high-throughput sequencing era, DNase I has mainly been used to study genomic regions devoid of nucleosomes. Here, we reveal for the first time that DNase I can be used to precisely map the (translational) positions of in vivo nucleosomes genome-wide. Specifically, exploiting a distinctive DNase I cleavage profile within nucleosome-associated DNA—including a signature 10.3 base pair oscillation that corresponds to accessibility of the minor groove as DNA winds around the nucleosome—we develop a Bayes-factor–based method that can be used to map nucleosome positions along the genome. Compared to methods that require genetically modified histones, our DNase-based approach is easily applied in any organism, which we demonstrate by producing maps in yeast and human. Compared to micrococcal nuclease (MNase)-based methods that map nucleosomes based on cuts in linker regions, we utilize DNase I cuts both outside and within nucleosomal DNA; the oscillatory nature of the DNase I cleavage profile within nucleosomal DNA enables us to identify translational positioning details not apparent in MNase digestion of linker DNA. Because the oscillatory pattern corresponds to nucleosome rotational positioning, it also reveals the rotational context of transcription factor (TF) binding sites. We show that potential binding sites within nucleosome-associated DNA are often centered preferentially on an exposed major or minor groove. This preferential localization may modulate TF interaction with nucleosome-associated DNA as TFs search for binding sites.
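
To make the 10.3 base pair oscillation mentioned in the abstract of entry 7 concrete, the sketch below scans candidate periods and picks the one with the most spectral power in a cleavage profile. It is only an illustration of periodicity detection, not the Bayes-factor method the abstract describes. The synthetic cosine profile, the 147 bp window, the candidate grid, and the helper names `periodogram_power` and `dominant_period` are all assumptions.

```python
import math


def periodogram_power(signal, period):
    """Power of the mean-centered signal at one period (in bp).

    Evaluates a single Fourier component, so non-integer periods are allowed.
    """
    n = len(signal)
    mean = sum(signal) / n
    re = sum((v - mean) * math.cos(2 * math.pi * i / period)
             for i, v in enumerate(signal))
    im = sum((v - mean) * math.sin(2 * math.pi * i / period)
             for i, v in enumerate(signal))
    return (re * re + im * im) / n


def dominant_period(signal, candidates):
    """Return the candidate period carrying the most power."""
    return max(candidates, key=lambda p: periodogram_power(signal, p))


# Synthetic cleavage profile over an assumed 147 bp nucleosomal window.
WINDOW_BP = 147
ASSUMED_PERIOD = 10.3
profile = [5.0 + 2.0 * math.cos(2 * math.pi * i / ASSUMED_PERIOD)
           for i in range(WINDOW_BP)]

# Candidate periods from 8.0 to 12.0 bp in steps of 0.1 bp.
candidates = [8.0 + 0.1 * k for k in range(41)]
best = dominant_period(profile, candidates)
print(f"dominant period: {best:.1f} bp")
```

With real DNase I cut counts the profile would be noisy and of varying depth, so a scan like this would more plausibly serve as a quick diagnostic than as a positioning method.
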
8
Abstract
Recent advances in experimental and computational methodologies are enabling ultra-high resolution genome-wide profiles of protein-DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein-DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein-DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein-DNA binding events. We discuss the challenges and opportunities associated with such approaches.
Affiliation(s)
- Shaun Mahony
  - Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- B Franklin Pugh
  - Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA