1
Orenstein Y, Puccinelli R, Kim R, Fordyce P, Berger B. Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping. Cell Syst 2017; 5:230-236.e5. [PMID: 28957657 DOI: 10.1016/j.cels.2017.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2017] [Revised: 04/14/2017] [Accepted: 07/27/2017] [Indexed: 11/27/2022]
Abstract
Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost.
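As an illustration of the coverage property described in this abstract, the following minimal Python sketch checks whether a sequence containing joker characters covers every k-mer at least p times, counting only windows with at most one joker. It is not the JokerCAKE algorithm itself, which generates such sequences; the joker symbol `N` and the function names `kmer_coverage` and `covers_all` are assumptions made for this example.

```python
from itertools import product

JOKER = "N"  # assumed symbol for the joker character (not from the paper)


def kmer_coverage(seq, k, alphabet="ACGT"):
    """Count how often each k-mer is covered by windows of seq
    that contain at most one joker character."""
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        n_jokers = window.count(JOKER)
        if n_jokers == 0:
            counts[window] += 1
        elif n_jokers == 1:
            # A single joker stands for every character of the alphabet.
            j = window.index(JOKER)
            for c in alphabet:
                counts[window[:j] + c + window[j + 1:]] += 1
        # Windows with two or more jokers are ignored.
    return counts


def covers_all(seq, k, p=1, alphabet="ACGT"):
    """True if every k-mer over the alphabet is covered at least p times."""
    return all(c >= p for c in kmer_coverage(seq, k, alphabet).values())
```

For the two-letter alphabet `AB` and k = 2, `covers_all("ANBA", 2, 1, "AB")` holds with only 4 characters, whereas a joker-free linear sequence needs at least 5 characters to contain all four 2-mers.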
Affiliation(s)
- Yaron Orenstein
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Robert Puccinelli
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Ryan Kim
- Research Science Institute, Center for Excellence in Education, McLean, VA 22102, USA
- Polly Fordyce
- Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; ChEM-H Institute, Stanford University, Stanford, CA 94305, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
2
Ruan S, Swamidass SJ, Stormo GD. BEESEM: estimation of binding energy models using HT-SELEX data. Bioinformatics 2017; 33:2288-2295. [PMID: 28379348 DOI: 10.1093/bioinformatics/btx191] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 11/16/2016] [Accepted: 03/30/2017] [Indexed: 12/24/2022]
Abstract
Motivation: Characterizing the binding specificities of transcription factors (TFs) is crucial to the study of gene expression regulation. Recently developed high-throughput experimental methods, including protein binding microarrays (PBM) and high-throughput SELEX (HT-SELEX), have enabled rapid measurements of the specificities for hundreds of TFs. However, few studies have developed efficient algorithms for estimating binding motifs based on HT-SELEX data. Also, the simple method of constructing a position weight matrix (PWM) by comparing the frequency of the preferred sequence with single-nucleotide variants risks generating motifs with higher information content than the true binding specificity. Results: We developed an algorithm called BEESEM that builds on a comprehensive biophysical model of protein-DNA interactions, which is trained using the expectation maximization method. BEESEM is capable of selecting the optimal motif length and calculating the confidence intervals of estimated parameters. By comparing BEESEM with the published motifs estimated using the same HT-SELEX data, we demonstrate that BEESEM provides significant improvements. We also evaluate several motif discovery algorithms on independent PBM and ChIP-seq data. BEESEM provides significantly better fits to in vitro data, but its performance is similar to some other methods on in vivo data under the criterion of the area under the receiver operating characteristic curve (AUROC). This highlights the limitations of the purely rank-based AUROC criterion. Using quantitative binding data to assess models, however, demonstrates that BEESEM improves on prior models. Availability and Implementation: Freely available on the web at http://stormo.wustl.edu/resources.html. Contact: stormo@wustl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- S Joshua Swamidass
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis 63110, USA
3
Gao T, Shu J, Cui J. A systematic approach to RNA-associated motif discovery. BMC Genomics 2018; 19:146. [PMID: 29444662 PMCID: PMC5813387 DOI: 10.1186/s12864-018-4528-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/06/2017] [Accepted: 02/05/2018] [Indexed: 12/30/2022]
Abstract
BACKGROUND: Sequencing-based large-scale screening of RNA-protein and RNA-RNA interactions has enabled the mechanistic study of post-transcriptional RNA processing and sorting, including exosome-mediated RNA secretion. The downstream analysis of RNA binding sites has encouraged the investigation of novel sequence motifs, which has raised new challenges for identifying motifs from very short sequences (e.g., small non-coding RNAs or truncated messenger RNAs), where conventional methods tend to be ineffective. To address these challenges, we propose a novel motif-finding method and validate it on a wide range of RNA applications. RESULTS: We first perform motif analysis on microRNAs and longer RNA fragments from various cellular and exosomal sources, and then validate our predictions through literature search and experimental tests. For example, a 4 bp-long motif, GUUG, was detected to be responsible for microRNA loading in exosomes involved in human colon cancer (SW620). Additional performance comparisons in various case studies have shown that this new approach outperforms several existing state-of-the-art methods in detecting motifs with exceptionally high coverage and explicitness. CONCLUSIONS: In this work, we have demonstrated the promising performance of a new motif discovery approach that is particularly effective in current RNA applications. Important discoveries resulting from this work include the identification of possible RNA-loading motifs in a variety of exosomes, as well as novel insights into sequence features of RNA cargos; for example, short non-coding RNAs and messenger RNAs may share a similar loading mechanism into exosomes. This method has been implemented and deployed as a new webserver named MDS2, which is accessible at http://sbbi-panda.unl.edu/MDS2/, along with a standalone package available for download at https://github.com/sbbi/MDS2.
Affiliation(s)
- Tian Gao
- Systems Biology and Biomedical Informatics (SBBI) Laboratory, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588 USA
- Jiang Shu
- Systems Biology and Biomedical Informatics (SBBI) Laboratory, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588 USA
- Juan Cui
- Systems Biology and Biomedical Informatics (SBBI) Laboratory, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588 USA
4
Abstract
Protein-DNA binding plays a central role in gene regulation and thereby in all processes in the living cell. Novel experimental and computational approaches facilitate better understanding of protein-DNA binding preferences via high-throughput measurement of protein binding to a large number of DNA sequences and inference of binding models from them. Here we review the state of the art in measuring protein-DNA binding in vitro, emphasizing the advantages and limitations of different technologies. In addition, we describe models for representing protein-DNA binding preferences and key computational approaches to learn those from high-throughput data. Using large experimental data sets, we test the performance of different models based on different measuring techniques. We conclude with pertinent open problems.
5
Chen D, Orenstein Y, Golodnitsky R, Pellach M, Avrahami D, Wachtel C, Ovadia-Shochat A, Shir-Shapira H, Kedmi A, Juven-Gershon T, Shamir R, Gerber D. SELMAP - SELEX affinity landscape MAPping of transcription factor binding sites using integrated microfluidics. Sci Rep 2016; 6:33351. [PMID: 27628341 PMCID: PMC5024299 DOI: 10.1038/srep33351] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/28/2015] [Accepted: 08/19/2016] [Indexed: 01/19/2023]
Abstract
Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high- and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA 10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression.
Affiliation(s)
- Dana Chen
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Yaron Orenstein
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Rada Golodnitsky
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Michal Pellach
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Dorit Avrahami
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Chaim Wachtel
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Avital Ovadia-Shochat
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Hila Shir-Shapira
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Adi Kedmi
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Tamar Juven-Gershon
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Doron Gerber
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan, 5290002, Israel
6
Kibet CK, Machanick P. Transcription factor motif quality assessment requires systematic comparative analysis. F1000Res 2015; 4:ISCB Comm J-1429. [PMID: 27092243 PMCID: PMC4821295 DOI: 10.12688/f1000research.7408.2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Accepted: 02/29/2016] [Indexed: 11/22/2022]
Abstract
Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. We also demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
Affiliation(s)
- Caleb Kipkurui Kibet
- Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
- Philip Machanick
- Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
7
Kibet CK, Machanick P. Transcription factor motif quality assessment requires systematic comparative analysis. F1000Res 2015; 4:ISCB Comm J-1429. [PMID: 27092243 DOI: 10.12688/f1000research.7408.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 11/19/2015] [Indexed: 03/26/2024]
Abstract
Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. Finally, we demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
Affiliation(s)
- Caleb Kipkurui Kibet
- Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
- Philip Machanick
- Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
8
Glick Y, Orenstein Y, Chen D, Avrahami D, Zor T, Shamir R, Gerber D. Integrated microfluidic approach for quantitative high-throughput measurements of transcription factor binding affinities. Nucleic Acids Res 2015; 44:e51. [PMID: 26635393 PMCID: PMC4824076 DOI: 10.1093/nar/gkv1327] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/12/2015] [Accepted: 11/14/2015] [Indexed: 01/16/2023]
Abstract
Protein binding to DNA is a fundamental process in gene regulation. Methodologies such as ChIP-Seq and mapping of DNase I hypersensitive sites provide global information on this regulation in vivo. In vitro methodologies provide valuable complementary information on protein-DNA specificities. However, current methods still do not measure absolute binding affinities. There is a real need for large-scale quantitative protein-DNA affinity measurements. We developed QPID, a microfluidic application for measuring protein-DNA affinities. A single run is equivalent to 4096 gel-shift experiments. Using QPID, we characterized the different affinities of ATF1, c-Jun, c-Fos and AP-1 to the CRE consensus motif and CRE half-site in two different genomic sequences on a single device. We discovered that binding of ATF1, but not of AP-1, to the CRE half-site is highly affected by its genomic context. This effect was highly correlated with ATF1 ChIP-seq and PBM experiments. Next, we characterized the affinities of ATF1 and ATF3 to 128 genomic CRE and CRE half-site sequences. Our affinity measurements explained that in vivo binding differences between ATF1 and ATF3 to CRE and CRE half-sites are partially mediated by differences in the minor groove width. We believe that QPID would become a central tool for quantitative characterization of biophysical aspects affecting protein-DNA binding.
Affiliation(s)
- Yair Glick
- Mina and Everard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Yaron Orenstein
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Dana Chen
- Mina and Everard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Dorit Avrahami
- Mina and Everard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
- Tsaffrir Zor
- Department of Biochemistry & Molecular Biology, Life Sciences Institute, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Doron Gerber
- Mina and Everard Goodman life science faculty, Bar Ilan University, Ramat-Gan, 5290002, Israel
9
Andrilenas KK, Penvose A, Siggers T. Using protein-binding microarrays to study transcription factor specificity: homologs, isoforms and complexes. Brief Funct Genomics 2014; 14:17-29. [PMID: 25431149 DOI: 10.1093/bfgp/elu046] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 01/14/2023]
Abstract
Protein-DNA binding is central to specificity in gene regulation, and methods for characterizing transcription factor (TF)-DNA binding remain crucial to studies of regulatory specificity. High-throughput (HT) technologies have revolutionized our ability to characterize protein-DNA binding by significantly increasing the number of binding measurements that can be performed. Protein-binding microarrays (PBMs) are a robust and powerful HT platform for studying DNA-binding specificity of TFs. Analysis of PBM-determined DNA-binding profiles has provided new insight into the scope and mechanisms of TF binding diversity. In this review, we focus specifically on the PBM technique and discuss its application to the study of TF specificity, in particular, the binding diversity of TF homologs and multi-protein complexes.
10
Orenstein Y, Shamir R. A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data. Nucleic Acids Res 2014; 42:e63. [PMID: 24500199 PMCID: PMC4005680 DOI: 10.1093/nar/gku117] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 12/29/2022]
Abstract
Understanding gene regulation is a key challenge in today's biology. The new technologies of protein-binding microarrays (PBMs) and high-throughput SELEX (HT-SELEX) allow measurement of the binding intensities of one transcription factor (TF) to numerous synthetic double-stranded DNA sequences in a single experiment. Recently, Jolma et al. reported the results of 547 HT-SELEX experiments covering human and mouse TFs. Because 162 of these TFs were also covered by PBM technology, for the first time, a large-scale comparison between implementations of these two in vitro technologies is possible. Here we assessed the similarities and differences between binding models, represented as position weight matrices, inferred from PBM and HT-SELEX, and also measured how well these models predict in vivo binding. Our results show that HT-SELEX- and PBM-derived models agree for most TFs. For some TFs, the HT-SELEX-derived models are longer versions of the PBM-derived models, whereas for other TFs, the HT-SELEX models match the secondary PBM-derived models. Remarkably, PBM-based 8-mer ranking is more accurate than that of HT-SELEX, but models derived from HT-SELEX predict in vivo binding better. In addition, we reveal several biases in HT-SELEX data including nucleotide frequency bias, enrichment of C-rich k-mers and oligos and underrepresentation of palindromes.
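To make the notion of a position weight matrix concrete, here is a minimal Python sketch that builds a log-odds PWM from a handful of aligned binding sites and uses it to score a k-mer. It is a generic textbook construction, not the procedure used in the study above; the example sites, the pseudocount of 1, the uniform background of 0.25, and the function names `build_pwm` and `score` are all assumptions chosen for illustration.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def score(pwm, kmer):
    """Sum the per-position log-odds of a k-mer of the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, kmer))


# Hypothetical aligned sites, used only to exercise the functions.
example_sites = ["ACGT", "ACGT", "ACGA"]
example_pwm = build_pwm(example_sites)
```

With these example sites, `score(example_pwm, "ACGT")` is higher than `score(example_pwm, "TTTT")`, which is the kind of ranking of k-mers that a PWM-based model provides.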
Affiliation(s)
- Yaron Orenstein
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel
11
Orenstein Y, Shamir R. Design of shortest double-stranded DNA sequences covering all k-mers with applications to protein-binding microarrays and synthetic enhancers. Bioinformatics 2013; 29:i71-9. [PMID: 23813011 PMCID: PMC3694677 DOI: 10.1093/bioinformatics/btt230] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
Motivation: Novel technologies can generate large sets of short double-stranded DNA sequences that can be used to measure their regulatory effects. Microarrays can measure in vitro the binding intensity of a protein to thousands of probes. Synthetic enhancer sequences inserted into an organism’s genome allow us to measure in vivo the effect of such sequences on the phenotype. In both applications, by using sequence probes that cover all k-mers, a comprehensive picture of the effect of all possible short sequences on gene regulation is obtained. The value of k that can be used in practice is, however, severely limited by cost and space considerations. A key challenge is, therefore, to cover all k-mers with a minimal number of probes. The standard way to do this uses the de Bruijn sequence of length 4^k. However, as probes are double stranded, when a k-mer is included in a probe, its reverse complement k-mer is accounted for as well. Results: Here, we show how to efficiently create a shortest possible sequence with the property that it contains each k-mer or its reverse complement, but not necessarily both. The length of the resulting sequence approaches half that of the de Bruijn sequence as k increases, resulting in a more efficient array, which allows covering more longer sequences; alternatively, additional sequences with redundant k-mers of interest can be added. Availability: The software is freely available from our website http://acgt.cs.tau.ac.il/shortcake/. Contact: rshamir@tau.ac.il
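The following Python sketch illustrates the two quantities contrasted in this abstract: a de Bruijn sequence over the DNA alphabet, and the number of k-mer classes that remain once a k-mer and its reverse complement are treated as equivalent. It uses the standard Lyndon-word construction for de Bruijn sequences and does not implement the authors' method for building a shortest reverse-complement-aware sequence; the function names `de_bruijn`, `revcomp` and `rc_classes` are assumptions made for this example.

```python
from itertools import product


def de_bruijn(alphabet, k):
    """Cyclic de Bruijn sequence of order k (length len(alphabet)**k),
    built with the standard Lyndon-word concatenation algorithm."""
    n = len(alphabet)
    a = [0] * (n * k)
    out = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer):
    """Reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def rc_classes(k):
    """Number of k-mer classes when a k-mer equals its reverse complement."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return len({min(m, revcomp(m)) for m in kmers})


k = 3
cyclic = de_bruijn("ACGT", k)
linear = cyclic + cyclic[:k - 1]  # unroll so every cyclic window appears
distinct = {linear[i:i + k] for i in range(len(cyclic))}
```

For k = 3 the cyclic sequence has 4^3 = 64 characters and `distinct` contains all 64 3-mers, while `rc_classes(3)` is 32, half as many. For even k, k-mers that are their own reverse complement push the count slightly above half: `rc_classes(2)` is 10 out of 16.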
Affiliation(s)
- Yaron Orenstein
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel
12
Zhong S, He X, Bar-Joseph Z. Predicting tissue specific transcription factor binding sites. BMC Genomics 2013; 14:796. [PMID: 24238150 PMCID: PMC3898213 DOI: 10.1186/1471-2164-14-796] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/30/2013] [Accepted: 11/06/2013] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Studies of gene regulation often utilize genome-wide predictions of transcription factor (TF) binding sites. Most existing prediction methods are based on sequence information alone, ignoring biological contexts such as developmental stages and tissue types. Experimental methods to study in vivo binding, including ChIP-chip and ChIP-seq, can only study one transcription factor in a single cell type and under a specific condition in each experiment, and therefore cannot scale to determine the full set of regulatory interactions in mammalian transcriptional regulatory networks. RESULTS: We developed a new computational approach, PIPES, for predicting tissue-specific TF binding. PIPES integrates in vitro protein binding microarrays (PBMs), sequence conservation and tissue-specific epigenetic (DNase I hypersensitivity) information. We demonstrate that PIPES improves over existing methods on distinguishing between in vivo bound and unbound sequences using ChIP-seq data for 11 mouse TFs. In addition, our predictions are in good agreement with current knowledge of tissue-specific TF regulation. CONCLUSIONS: We provide a systematic map of computationally predicted tissue-specific binding targets for 284 mouse TFs across 55 tissue/cell types. Such a comprehensive resource is useful for researchers studying gene regulation.
Affiliation(s)
- Ziv Bar-Joseph
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.