1
Mitra S, Hartemink AJ. Inferring differential protein binding from time-series chromatin accessibility data. Bioinformatics Advances 2025; 5:vbaf080. [PMID: 40297777 PMCID: PMC12037103 DOI: 10.1093/bioadv/vbaf080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2025] [Revised: 03/08/2025] [Accepted: 04/07/2025] [Indexed: 04/30/2025]
Abstract
Motivation: Due to internal and external factors, the epigenomic landscape is constantly changing in ways that are linked to changes in gene expression. Chromatin accessibility data, such as MNase-seq, provide valuable insights into this landscape and have been used to compute chromatin occupancy profiles. Multiple datasets generated over time or under different conditions can thus be used to study dynamic changes in chromatin occupancy across the genome.
Results: Our existing model, RoboCOP, computes a genome-wide chromatin occupancy profile for nucleosomes and hundreds of transcription factors. Here, we present a new method called DynaCOP that takes multiple chromatin occupancy profiles and uses them to generate a series of nucleosome-guided difference profiles. These profiles identify differentially binding transcription factors and reveal changes in nucleosome occupancy and positioning. We apply DynaCOP to chromatin occupancy profiles derived from deeply sequenced time-series MNase-seq data to study differential chromatin occupancy in the yeast genome under cadmium stress. We find strong correlations between the observed chromatin changes and changes in transcription.
Availability and implementation: https://github.com/HarteminkLab/RoboCOP.
Affiliation(s)
- Sneha Mitra
  - Department of Computer Science, Duke University, Durham, NC 27708-0129, United States
- Alexander J Hartemink
  - Department of Computer Science, Duke University, Durham, NC 27708-0129, United States
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27710, United States
2
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist in studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
  - CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
  - CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
  - CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
3
Minnoye L, Marinov GK, Krausgruber T, Pan L, Marand AP, Secchia S, Greenleaf WJ, Furlong EEM, Zhao K, Schmitz RJ, Bock C, Aerts S. Chromatin accessibility profiling methods. Nature Reviews. Methods Primers 2021; 1:10. [PMID: 38410680 PMCID: PMC10895463 DOI: 10.1038/s43586-020-00008-9] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Accepted: 12/01/2020] [Indexed: 02/06/2023]
Abstract
Chromatin accessibility, or the physical access to chromatinized DNA, is a widely studied characteristic of the eukaryotic genome. As active regulatory DNA elements are generally 'accessible', the genome-wide profiling of chromatin accessibility can be used to identify candidate regulatory genomic regions in a tissue or cell type. Multiple biochemical methods have been developed to profile chromatin accessibility, both in bulk and at the single-cell level. Depending on the method, enzymatic cleavage, transposition or DNA methyltransferases are used, followed by high-throughput sequencing, providing a view of genome-wide chromatin accessibility. In this Primer, we discuss these biochemical methods, as well as bioinformatics tools for analysing and interpreting the generated data, and insights into the key regulators underlying developmental, evolutionary and disease processes. We outline standards for data quality, reproducibility and deposition used by the genomics community. Although chromatin accessibility profiling is invaluable to study gene regulation, alone it provides only a partial view of this complex process. Orthogonal assays facilitate the interpretation of accessible regions with respect to enhancer-promoter proximity, functional transcription factor binding and regulatory function. We envision that technological improvements including single-molecule, multi-omics and spatial methods will bring further insight into the secrets of genome regulation.
Affiliation(s)
- Liesbeth Minnoye
  - Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Thomas Krausgruber
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Lixia Pan
  - Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
- Stefano Secchia
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Keji Zhao
  - Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
- Christoph Bock
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
  - Institute of Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Stein Aerts
  - Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
4
Liu Y, Fu L, Kaufmann K, Chen D, Chen M. A practical guide for DNase-seq data analysis: from data management to common applications. Brief Bioinform 2020; 20:1865-1877. [PMID: 30010713 DOI: 10.1093/bib/bby057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/08/2018] [Revised: 06/06/2018] [Accepted: 06/10/2018] [Indexed: 01/01/2023]
Abstract
Deoxyribonuclease I (DNase I)-hypersensitive site sequencing (DNase-seq) has been widely used to determine chromatin accessibility and its underlying regulatory lexicon. However, exploring DNase-seq data requires sophisticated downstream bioinformatics analyses. In this study, we first review computational methods for all of the major steps in DNase-seq data analysis, including experimental design, quality control, read alignment, peak calling, annotation of cis-regulatory elements, genomic footprinting and visualization. The challenges associated with each step are highlighted. Next, we provide a practical guideline and a computational pipeline for DNase-seq data analysis by integrating some of these tools. We also discuss the competing techniques and the potential applications of this pipeline for the analysis of analogous experimental data. Finally, we discuss the integration of DNase-seq with other functional genomics techniques.
Affiliation(s)
- Yongjing Liu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Liangyu Fu
  - Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
- Kerstin Kaufmann
  - Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
- Dijun Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
  - Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin 10115, Germany
5
Keilwagen J, Posch S, Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol 2019; 20:9. [PMID: 30630522 PMCID: PMC6327544 DOI: 10.1186/s13059-018-1614-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 07/25/2018] [Accepted: 12/18/2018] [Indexed: 01/11/2023]
Abstract
Prediction of cell type-specific, in vivo transcription factor binding sites is one of the central challenges in regulatory genomics. Here, we present our approach that earned a shared first rank in the "ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge" in 2017. In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance. Finally, we provide 682 lists of predicted peaks for a total of 31 transcription factors in 22 primary cell types and tissues and a user-friendly version of our approach, Catchitt, for download.
Affiliation(s)
- Jens Keilwagen
  - Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Erwin-Baur-Straße 27, Quedlinburg, 06484 Germany
- Stefan Posch
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), 06120 Germany
- Jan Grau
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), 06120 Germany
6
Ma W, Yang L, Rohs R, Noble WS. DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding. Bioinformatics 2018; 33:3003-3010. [PMID: 28541376 PMCID: PMC5870879 DOI: 10.1093/bioinformatics/btx336] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/25/2016] [Accepted: 05/23/2017] [Indexed: 01/07/2023]
Abstract
Motivation: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites.
Results: We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values.
Availability and implementation: The software is available at https://bitbucket.org/wenxiu/sequence-shape.git.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenxiu Ma
  - Department of Statistics, University of California Riverside, Riverside, CA 92521, USA
- Lin Yang
  - Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
  - Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- William Stafford Noble
  - Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
7
Tripodi IJ, Allen MA, Dowell RD. Detecting Differential Transcription Factor Activity from ATAC-Seq Data. Molecules 2018; 23:molecules23051136. [PMID: 29748466 PMCID: PMC6099720 DOI: 10.3390/molecules23051136] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 03/31/2018] [Revised: 05/05/2018] [Accepted: 05/06/2018] [Indexed: 02/06/2023]
Abstract
Transcription factors are managers of the cellular factory, and key components to many diseases. Many non-coding single nucleotide polymorphisms affect transcription factors, either by directly altering the protein or its functional activity at individual binding sites. Here we first briefly summarize high-throughput approaches to studying transcription factor activity. We then demonstrate, using published chromatin accessibility data (specifically ATAC-seq), that the genome-wide profile of TF recognition motifs relative to regions of open chromatin can determine the key transcription factor altered by a perturbation. Our method of determining which TFs are altered by a perturbation is simple, is quick to implement, and can be used when biological samples are limited. In the future, we envision that this method could be applied to determine which TFs show altered activity in response to a wide variety of drugs and diseases.
Affiliation(s)
- Ignacio J Tripodi
  - Computer Science, University of Colorado, Boulder, CO 80305, USA
  - BioFrontiers Institute, University of Colorado, Boulder, CO 80303, USA
- Mary A Allen
  - BioFrontiers Institute, University of Colorado, Boulder, CO 80303, USA
- Robin D Dowell
  - Computer Science, University of Colorado, Boulder, CO 80305, USA
  - BioFrontiers Institute, University of Colorado, Boulder, CO 80303, USA
  - Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO 80305, USA
8
LRPPRC-mediated folding of the mitochondrial transcriptome. Nat Commun 2017; 8:1532. [PMID: 29146908 PMCID: PMC5691074 DOI: 10.1038/s41467-017-01221-z] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 04/18/2017] [Accepted: 08/24/2017] [Indexed: 01/01/2023]
Abstract
The expression of the compact mammalian mitochondrial genome requires transcription, RNA processing, translation and RNA decay, much like the more complex chromosomal systems, and here we use it as a model system to understand the fundamental aspects of gene expression. Here we combine RNase footprinting with PAR-CLIP at unprecedented depth to reveal the importance of RNA-protein interactions in dictating RNA folding within the mitochondrial transcriptome. We show that LRPPRC, in complex with its protein partner SLIRP, binds throughout the mitochondrial transcriptome, with a preference for mRNAs, and its loss affects the entire secondary structure and stability of the transcriptome. We demonstrate that the LRPPRC-SLIRP complex is a global RNA chaperone that stabilizes RNA structures to expose the required sites for translation, stabilization, and polyadenylation. Our findings reveal a general mechanism where extensive RNA-protein interactions ensure that RNA is accessible for its biological functions.
9
Quach B, Furey TS. DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter. Bioinformatics 2017; 33:956-963. [PMID: 27993786 DOI: 10.1093/bioinformatics/btw740] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/03/2016] [Accepted: 11/18/2016] [Indexed: 11/13/2022]
Abstract
Motivation: Identifying the locations of transcription factor binding sites is critical for understanding how gene transcription is regulated across different cell types and conditions. Chromatin accessibility experiments such as DNaseI sequencing (DNase-seq) and Assay for Transposase Accessible Chromatin sequencing (ATAC-seq) produce genome-wide data that include distinct 'footprint' patterns at binding sites. Nearly all existing computational methods to detect footprints from these data assume that footprint signals are highly homogeneous across footprint sites. Additionally, a comprehensive and systematic comparison of footprinting methods for specifically identifying which motif sites for a specific factor are bound has not been performed.
Results: Using DNase-seq data from the ENCODE project, we show that a large degree of previously uncharacterized site-to-site variability exists in footprint signal across motif sites for a transcription factor. To model this heterogeneity in the data, we introduce a novel, supervised learning footprinter called Detecting Footprints Containing Motifs (DeFCoM). We compare DeFCoM to nine existing methods using evaluation sets from four human cell-lines and eighteen transcription factors and show that DeFCoM outperforms current methods in determining bound and unbound motif sites. We also analyze the impact of several biological and technical factors on the quality of footprint predictions to highlight important considerations when conducting footprint analyses and assessing the performance of footprint prediction methods. Finally, we show that DeFCoM can detect footprints using ATAC-seq data with similar accuracy as when using DNase-seq data.
Availability and Implementation: Python code available at https://bitbucket.org/bryancquach/defcom.
Contact: bquach@email.unc.edu or tsfurey@email.unc.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bryan Quach
  - Curriculum in Bioinformatics and Computational Biology, Department of Genetics, and Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
- Terrence S Furey
  - Department of Genetics and Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
10
Schmidt F, Gasparoni N, Gasparoni G, Gianmoena K, Cadenas C, Polansky JK, Ebert P, Nordström K, Barann M, Sinha A, Fröhler S, Xiong J, Dehghani Amirabad A, Behjati Ardakani F, Hutter B, Zipprich G, Felder B, Eils J, Brors B, Chen W, Hengstler JG, Hamann A, Lengauer T, Rosenstiel P, Walter J, Schulz MH. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Res 2017; 45:54-66. [PMID: 27899623 PMCID: PMC5224477 DOI: 10.1093/nar/gkw1061] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Received: 05/27/2016] [Revised: 10/18/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]
Abstract
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively.
Affiliation(s)
- Florian Schmidt
  - Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany
  - Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
- Nina Gasparoni
  - Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
- Gilles Gasparoni
  - Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
- Kathrin Gianmoena
  - Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
- Cristina Cadenas
  - Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
- Julia K Polansky
  - Experimental Rheumatology, German Rheumatism Research Centre, Berlin, 10117, Germany
- Peter Ebert
  - Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
  - International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
- Karl Nordström
  - Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
- Matthias Barann
  - Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
- Anupam Sinha
  - Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
- Sebastian Fröhler
  - Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
- Jieyi Xiong
  - Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
- Azim Dehghani Amirabad
  - Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany
  - Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
  - International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
- Fatemeh Behjati Ardakani
  - Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany
  - Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
- Barbara Hutter
  - Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
- Gideon Zipprich
  - Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
- Bärbel Felder
  - Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
- Jürgen Eils
  - Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
- Benedikt Brors
  - Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
- Wei Chen
  - Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
- Jan G Hengstler
  - Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
- Alf Hamann
  - International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
- Thomas Lengauer
  - Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
- Philip Rosenstiel
  - Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
- Jörn Walter
  - Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
- Marcel H Schulz
  - Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany
  - Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
11
Onisko A, Druzdzel MJ, Austin RM. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling. J Pathol Inform 2016; 7:50. [PMID: 28163973 PMCID: PMC5248402 DOI: 10.4103/2153-3539.197191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/02/2016] [Accepted: 11/17/2016] [Indexed: 01/11/2023]
Abstract
Background: Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data.
Aim: The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data.
Materials and Methods: This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years.
Results: The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation.
Conclusion: Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
Affiliation(s)
- Agnieszka Onisko
  - Department of Pathology, University of Pittsburgh Medical Center, Magee-Womens Hospital, Pittsburgh, PA 15213, USA
  - Faculty of Computer Science, Bialystok University of Technology, 15-351 Bialystok, Poland
- Marek J. Druzdzel
  - Faculty of Computer Science, Bialystok University of Technology, 15-351 Bialystok, Poland
  - Decision Systems Laboratory, School of Information Sciences and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, USA
- R. Marshall Austin
  - Department of Pathology, University of Pittsburgh Medical Center, Magee-Womens Hospital, Pittsburgh, PA 15213, USA
12
Hira S, Deshpande PS. Mining precise cause and effect rules in large time series data of socio-economic indicators. SpringerPlus 2016; 5:1625. [PMID: 27722044 PMCID: PMC5031588 DOI: 10.1186/s40064-016-3292-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/16/2016] [Accepted: 09/11/2016] [Indexed: 11/21/2022]
Abstract
Discovery of cause–effect relationships, particularly in large databases of time-series is challenging because of continuous data of different characteristics and complex lagged relationships. In this paper, we have proposed a novel approach, to extract cause–effect relationships in large time series data set of socioeconomic indicators. The method enhances the scope of relationship discovery to cause–effect relationships by identifying multiple causal structures such as binary, transitive, many to one and cyclic. We use temporal association and temporal odds ratio to exclude noncausal association and to ensure the high reliability of discovered causal rules. We assess the method with both synthetic and real-world datasets. Our proposed method will help to build quantitative models to analyze socioeconomic processes by generating a precise cause–effect relationship between different economic indicators. The outcome shows that the proposed method can effectively discover existing causality structure in large time series databases.
Affiliation(s)
- Swati Hira
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, 440010 Nagpur, India
| | - P S Deshpande
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, 440010 Nagpur, India
|
13
|
Vierstra J, Stamatoyannopoulos JA. Genomic footprinting. Nat Methods 2016; 13:213-21. [DOI: 10.1038/nmeth.3768] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 09/14/2015] [Accepted: 01/13/2016] [Indexed: 01/08/2023]
|
14
|
Ebert-Uphoff I, Deng Y. Identifying Physical Interactions from Climate Data: Challenges and Opportunities. Comput Sci Eng 2015. [DOI: 10.1109/mcse.2015.129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
|
15
|
Abstract
Strict control of tissue-specific gene expression plays a pivotal role during lineage commitment. The transcription factor c-Myb has an essential role in adult haematopoiesis and functions as an oncogene when rearranged in human cancers. Here we have exploited digital genomic footprinting analysis to obtain a global picture of c-Myb occupancy in the genome of six different haematopoietic cell-types. We have biologically validated several c-Myb footprints using c-Myb knockdown data, reporter assays and DamID analysis. We show that our predicted conserved c-Myb footprints are highly dependent on the haematopoietic cell type, but that there is a group of gene targets common to all cell-types analysed. Furthermore, we find that c-Myb footprints co-localise with active histone mark H3K4me3 and are significantly enriched at exons. We analysed co-localisation of c-Myb footprints with 104 chromatin regulatory factors in K562 cells, and identified nine proteins that are enriched together with c-Myb footprints on genes positively regulated by c-Myb and one protein enriched on negatively regulated genes. Our data suggest that c-Myb is a transcription factor with multifaceted target regulation depending on cell type.
|
16
|
Abstract
Recent advances in experimental and computational methodologies are enabling ultra-high resolution genome-wide profiles of protein-DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein-DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein-DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein-DNA binding events. We discuss the challenges and opportunities associated with such approaches.
Affiliation(s)
- Shaun Mahony
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
| | - B Franklin Pugh
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
|
17
|
Wang C, Lv Y, Wang B, Yin C, Lin Y, Pan L. Survey of protein-DNA interactions in Aspergillus oryzae on a genomic scale. Nucleic Acids Res 2015; 43:4429-46. [PMID: 25883143 PMCID: PMC4482085 DOI: 10.1093/nar/gkv334] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/21/2014] [Accepted: 03/31/2015] [Indexed: 01/23/2023]
Abstract
The genome-scale delineation of in vivo protein–DNA interactions is key to understanding genome function. Only ∼5% of transcription factors (TFs) in the Aspergillus genus have been identified using traditional methods. Although the Aspergillus oryzae genome contains >600 TFs, knowledge of the in vivo genome-wide TF-binding sites (TFBSs) in aspergilli remains limited because of the lack of high-quality antibodies. We investigated the landscape of in vivo protein–DNA interactions across the A. oryzae genome through coupling the DNase I digestion of intact nuclei with massively parallel sequencing and the analysis of cleavage patterns in protein–DNA interactions at single-nucleotide resolution. The resulting map identified overrepresented de novo TF-binding motifs from genomic footprints, and provided the detailed chromatin remodeling patterns and the distribution of digital footprints near transcription start sites. The TFBSs of 19 known Aspergillus TFs were also identified based on DNase I digestion data surrounding potential binding sites in conjunction with TF binding specificity information. We observed that the cleavage patterns of TFBSs were dependent on the orientation of TF motifs and independent of strand orientation, consistent with the DNA shape features of binding motifs with flanking sequences.
Affiliation(s)
- Chao Wang
- School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
| | - Yangyong Lv
- School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
| | - Bin Wang
- School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
| | - Chao Yin
- School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
| | - Ying Lin
- School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
| | - Li Pan
- School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
|
18
|
Tsompana M, Buck MJ. Chromatin accessibility: a window into the genome. Epigenetics Chromatin 2014; 7:33. [PMID: 25473421 PMCID: PMC4253006 DOI: 10.1186/1756-8935-7-33] [Citation(s) in RCA: 270] [Impact Index Per Article: 24.5] [Received: 09/15/2014] [Accepted: 11/05/2014] [Indexed: 01/09/2023]
Abstract
Transcriptional activation throughout the eukaryotic lineage has been tightly linked with disruption of nucleosome organization at promoters, enhancers, silencers, insulators and locus control regions due to transcription factor binding. Regulatory DNA thus coincides with open or accessible genomic sites of remodeled chromatin. Current chromatin accessibility assays are used to separate the genome by enzymatic or chemical means and isolate either the accessible or protected locations. The isolated DNA is then quantified using a next-generation sequencing platform. Wide application of these assays has recently focused on the identification of the instrumental epigenetic changes responsible for differential gene expression, cell proliferation, functional diversification and disease development. Here we discuss the limitations and advantages of current genome-wide chromatin accessibility assays with especial attention on experimental precautions and sequence data analysis. We conclude with our perspective on future improvements necessary for moving the field of chromatin profiling forward.
Affiliation(s)
- Maria Tsompana
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, 701 Ellicott St, Buffalo, NY 14203 USA
| | - Michael J Buck
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, 701 Ellicott St, Buffalo, NY 14203, USA; Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY, USA
|
19
|
Fattori J, Indolfo NDC, Campos JCLDO, Videira NB, Bridi AV, Doratioto TR, Assis MAD, Figueira ACM. Investigation of Interactions between DNA and Nuclear Receptors: A Review of the Most Used Methods. NUCLEAR RECEPTOR RESEARCH 2014. [DOI: 10.11131/2014/101090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Affiliation(s)
- Juliana Fattori
- Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), P.O. Box 6192, Campinas-SP, Brazil
| | - Nathalia de Carvalho Indolfo
- Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), P.O. Box 6192, Campinas-SP, Brazil
| | | | - Natália Bernardi Videira
- Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), P.O. Box 6192, Campinas-SP, Brazil
| | - Aline Villanova Bridi
- Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), P.O. Box 6192, Campinas-SP, Brazil
| | - Tábata Renée Doratioto
- Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), P.O. Box 6192, Campinas-SP, Brazil
| | - Michelle Alexandrino de Assis
- Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), P.O. Box 6192, Campinas-SP, Brazil
| | - Ana Carolina Migliorini Figueira
- Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), P.O. Box 6192, Campinas-SP, Brazil
|
20
|
Sung MH, Guertin MJ, Baek S, Hager GL. DNase footprint signatures are dictated by factor dynamics and DNA sequence. Mol Cell 2014; 56:275-285. [PMID: 25242143 DOI: 10.1016/j.molcel.2014.08.016] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Received: 02/18/2014] [Revised: 05/05/2014] [Accepted: 08/15/2014] [Indexed: 12/13/2022]
Abstract
Genomic footprinting has emerged as an unbiased discovery method for transcription factor (TF) occupancy at cognate DNA in vivo. A basic premise of footprinting is that sequence-specific TF-DNA interactions are associated with localized resistance to nucleases, leaving observable signatures of cleavage within accessible chromatin. This phenomenon is interpreted to imply protection of the critical nucleotides by the stably bound protein factor. However, this model conflicts with previous reports of many TFs exchanging with specific binding sites in living cells on a timescale of seconds. We show that TFs with short DNA residence times have no footprints at bound motif elements. Moreover, the nuclease cleavage profile within a footprint originates from the DNA sequence in the factor-binding site, rather than from the protein occupying specific nucleotides. These findings suggest a revised understanding of TF footprinting and reveal limitations in comprehensive reconstruction of the TF regulatory network using this approach.
Affiliation(s)
- Myong-Hee Sung
- Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, NIH, Building 41, 41 Library Drive, Bethesda, MD 20892, USA
| | - Michael J Guertin
- Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, NIH, Building 41, 41 Library Drive, Bethesda, MD 20892, USA
| | - Songjoon Baek
- Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, NIH, Building 41, 41 Library Drive, Bethesda, MD 20892, USA
| | - Gordon L Hager
- Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, NIH, Building 41, 41 Library Drive, Bethesda, MD 20892, USA.
|
21
|
Meyer CA, Liu XS. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet 2014; 15:709-21. [PMID: 25223782 DOI: 10.1038/nrg3788] [Citation(s) in RCA: 219] [Impact Index Per Article: 19.9] [Indexed: 02/07/2023]
Abstract
Next-generation sequencing (NGS) technologies have been used in diverse ways to investigate various aspects of chromatin biology by identifying genomic loci that are bound by transcription factors, occupied by nucleosomes or accessible to nuclease cleavage, or loci that physically interact with remote genomic loci. However, reaching sound biological conclusions from such NGS enrichment profiles requires many potential biases to be taken into account. In this Review, we discuss common ways in which biases may be introduced into NGS chromatin profiling data, approaches to diagnose these biases and analytical techniques to mitigate their effect.
Affiliation(s)
- Clifford A Meyer
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, Massachusetts 02115, USA; and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
| | - X Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, Massachusetts 02115, USA; and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
|
22
|
Zhong J, Wasson T, Hartemink AJ. Learning protein-DNA interaction landscapes by integrating experimental data through computational models. Bioinformatics 2014; 30:2868-74. [PMID: 24974204 DOI: 10.1093/bioinformatics/btu408] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
Abstract
MOTIVATION Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors (TFs), nucleosomes and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only partial information regarding one aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase can be used to reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Currently, few statistical frameworks jointly model these different data sources to reveal an accurate, holistic view of the in vivo protein-DNA interaction landscape. RESULTS Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation. AVAILABILITY AND IMPLEMENTATION The C source code for compete and Python source code for MCMC-based inference are available at http://www.cs.duke.edu/~amink. CONTACT amink@cs.duke.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianling Zhong
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA
| | - Todd Wasson
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA
| | - Alexander J Hartemink
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA
|
23
|
Sherwood RI, Hashimoto T, O'Donnell CW, Lewis S, Barkal AA, van Hoff JP, Karun V, Jaakkola T, Gifford DK. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat Biotechnol 2014; 32:171-178. [PMID: 24441470 PMCID: PMC3951735 DOI: 10.1038/nbt.2798] [Citation(s) in RCA: 323] [Impact Index Per Article: 29.4] [Received: 03/13/2013] [Accepted: 12/16/2013] [Indexed: 11/12/2022]
Abstract
We describe protein interaction quantitation (PIQ), a computational method for modeling the magnitude and shape of genome-wide DNase I hypersensitivity profiles to identify transcription factor (TF) binding sites. Through the use of machine-learning techniques, PIQ identified binding sites for >700 TFs from one DNase I hypersensitivity analysis followed by sequencing (DNase-seq) experiment with accuracy comparable to that of chromatin immunoprecipitation followed by sequencing (ChIP-seq). We applied PIQ to analyze DNase-seq data from mouse embryonic stem cells differentiating into prepancreatic and intestinal endoderm. We identified 120 and experimentally validated eight 'pioneer' TF families that dynamically open chromatin. Four pioneer TF families only opened chromatin in one direction from their motifs. Furthermore, we identified 'settler' TFs whose genomic binding is principally governed by proximity to open chromatin. Our results support a model of hierarchical TF binding in which directional and nondirectional pioneer activity shapes the chromatin landscape for population by settler TFs.
Affiliation(s)
- Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
| | - Tatsunori Hashimoto
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142
| | - Charles W O'Donnell
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142
- Department of Stem Cell and Regenerative Biology, Harvard University and Harvard Medical School, 7 Divinity Avenue, Cambridge, MA 02138
| | - Sophia Lewis
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
| | - Amira A Barkal
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
| | - John Peter van Hoff
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
| | - Vivek Karun
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
| | - Tommi Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142
| | - David K Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142
- Department of Stem Cell and Regenerative Biology, Harvard University and Harvard Medical School, 7 Divinity Avenue, Cambridge, MA 02138
|
24
|
Liu G, Mercer TR, Shearwood AMJ, Siira SJ, Hibbs ME, Mattick JS, Rackham O, Filipovska A. Mapping of mitochondrial RNA-protein interactions by digital RNase footprinting. Cell Rep 2013; 5:839-48. [PMID: 24183674 DOI: 10.1016/j.celrep.2013.09.036] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/30/2013] [Revised: 09/04/2013] [Accepted: 09/25/2013] [Indexed: 01/01/2023]
Abstract
Human mitochondrial DNA is transcribed as long polycistronic transcripts that encompass each strand of the genome and are processed subsequently into mature mRNAs, tRNAs, and rRNAs, necessitating widespread posttranscriptional regulation. Here, we establish methods for massively parallel sequencing and analyses of RNase-accessible regions of human mitochondrial RNA and thereby identify specific regions within mitochondrial transcripts that are bound by proteins. This approach provides a range of insights into the contribution of RNA-binding proteins to the regulation of mitochondrial gene expression.
Affiliation(s)
- Ganqiang Liu
- Garvan Institute of Medical Research, Sydney NSW 2010, Australia; Institute for Molecular Bioscience, The University of Queensland, Brisbane QLD 4072, Australia
|
25
|
Luo K, Hartemink AJ. Using DNase digestion data to accurately identify transcription factor binding sites. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2013:80-91. [PMID: 23424114 PMCID: PMC3716004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Identifying binding sites of transcription factors (TFs) is a key task in deciphering transcriptional regulation. ChIP-based methods are used to survey the genomic locations of a single TF in each experiment. But methods combining DNase digestion data with TF binding specificity information could potentially be used to survey the locations of many TFs in the same experiment, provided such methods permit reasonable levels of sensitivity and specificity. Here, we present a simple such method that outperforms a leading recent method, CENTIPEDE, marginally in human but dramatically in yeast (average auROC across 20 TFs increases from 74% to 94%). Our method is based on logistic regression and thus benefits from supervision, but we show that partially and completely unsupervised variants perform nearly as well. Because the number of parameters in our method is at least an order of magnitude smaller than CENTIPEDE, we dub it MILLIPEDE.
Affiliation(s)
- Kaixuan Luo
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA.
|
26
|
Madrigal P, Krajewski P. Current bioinformatic approaches to identify DNase I hypersensitive sites and genomic footprints from DNase-seq data. Front Genet 2012; 3:230. [PMID: 23118738 PMCID: PMC3484326 DOI: 10.3389/fgene.2012.00230] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 07/28/2012] [Accepted: 10/13/2012] [Indexed: 12/16/2022]
Affiliation(s)
- Pedro Madrigal
- Laboratory of Biometry, Institute of Plant Genetics, Polish Academy of Sciences Poznań, Poland
|
27
|
Smith RP, Lam ET, Markova S, Yee SW, Ahituv N. Pharmacogene regulatory elements: from discovery to applications. Genome Med 2012; 4:45. [PMID: 22630332 PMCID: PMC3506911 DOI: 10.1186/gm344] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 12/15/2022]
Abstract
Regulatory elements play an important role in the variability of individual responses to drug treatment. This has been established through studies on three classes of elements that regulate RNA and protein abundance: promoters, enhancers and microRNAs. Each of these elements, and genetic variants within them, are being characterized at an exponential pace by next-generation sequencing (NGS) technologies. In this review, we outline examples of how each class of element affects drug response via regulation of drug targets, transporters and enzymes. We also discuss the impact of NGS technologies such as chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq), and the ramifications of new techniques such as high-throughput chromosome capture (Hi-C), chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) and massively parallel reporter assays (MPRA). NGS approaches are generating data faster than they can be analyzed, and new methods will be required to prioritize laboratory results before they are ready for the clinic. However, there is no doubt that these approaches will bring about a systems-level understanding of the interplay between genetic variants and drug response. An understanding of the importance of regulatory variants in pharmacogenomics will facilitate the identification of responders versus non-responders, the prevention of adverse effects and the optimization of therapies for individual patients.
Affiliation(s)
- Robin P Smith
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Ernest T Lam
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
| | - Svetlana Markova
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
|
28
|
Flores O, Orozco M. nucleR: a package for non-parametric nucleosome positioning. Bioinformatics 2011; 27:2149-50. [DOI: 10.1093/bioinformatics/btr345] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Indexed: 11/14/2022]
|
29
|
Ghedira K, Hornischer K, Konovalova T, Jenhani AZ, Benkahla A, Kel A. Identification of key mechanisms controlling gene expression in Leishmania infected macrophages using genome-wide promoter analysis. INFECTION GENETICS AND EVOLUTION 2010; 11:769-77. [PMID: 21093613 DOI: 10.1016/j.meegid.2010.10.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/25/2010] [Revised: 10/18/2010] [Accepted: 10/19/2010] [Indexed: 01/15/2023]
Abstract
The present study describes the in silico prediction of the regulatory network of Leishmania infected human macrophages. The construction of the gene regulatory network requires the identification of Transcription Factor Binding Sites (TFBSs) in the regulatory regions (promoters, enhancers) of genes that are regulated upon Leishmania infection. The promoters of human, mouse, rat, dog and chimpanzee genes were first identified in the whole genomes using available experimental data on full length cDNA sequences or deep CAGE tag data (DBTSS, FANTOM3, FANTOM4), mRNA models (ENSEMBL), or using hand annotated data (EPD, TRANSFAC). A phylogenetic footprinting analysis and a MATCH analysis of the promoter sequences were then performed to predict TFBS. Then, an SQL database that integrates all results of promoter analysis as well as other genome annotation information obtained from ENSEMBL, TRANSFAC, TRED (Transcription Regulatory Element Database), ORegAnno and the ENCODE project, was established. Finally publicly available expression data from human Leishmania infected macrophages were analyzed using the genome-wide information on predicted TFBS with the computer system ExPlain™. The gene regulatory network was constructed and activated signal transduction pathways were revealed. The Irak1 pathway was identified as a key pathway regulating gene expression changes in Leishmania infected macrophages.
Affiliation(s)
- Kais Ghedira
- Laboratory of Immunology, Vaccinology, and Molecular Genetics, Institut Pasteur de Tunis, 13 place Pasteur BP 74, Tunis, Tunisia
|
30
|
Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res 2010; 21:447-55. [PMID: 21106904 DOI: 10.1101/gr.112623.110] [Citation(s) in RCA: 390] [Impact Index Per Article: 26.0] [Indexed: 02/06/2023]
Abstract
Accurate functional annotation of regulatory elements is essential for understanding global gene regulation. Here, we report a genome-wide map of 827,000 transcription factor binding sites in human lymphoblastoid cell lines, which is comprised of sites corresponding to 239 position weight matrices of known transcription factor binding motifs, and 49 novel sequence motifs. To generate this map, we developed a probabilistic framework that integrates cell- or tissue-specific experimental data such as histone modifications and DNase I cleavage patterns with genomic information such as gene annotation and evolutionary conservation. Comparison to empirical ChIP-seq data suggests that our method is highly accurate yet has the advantage of targeting many factors in a single assay. We anticipate that this approach will be a valuable tool for genome-wide studies of gene regulation in a wide variety of cell types or tissues under diverse conditions.
Affiliation(s)
- Roger Pique-Regi
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA.
|