1. Foroozandeh Shahraki M, Farahbod M, Libbrecht MW. Robust chromatin state annotation. Genome Res 2024; 34:469-483. [PMID: 38514204 PMCID: PMC11067878 DOI: 10.1101/gr.278343.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 03/19/2024] [Indexed: 03/23/2024]
Abstract
With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%-69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to chromatin state annotations. SAGAconf works with any SAGA method and assigns an r-value to each genomic bin of a chromatin state annotation that represents the probability that the label of this bin will be reproduced in a replicated experiment. Thus, SAGAconf allows a researcher to select only the reliable predictions from a chromatin annotation for use in downstream analyses.
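A simpler statistic helps illustrate the reproducibility question SAGAconf addresses. The Python sketch below is not SAGAconf's r-value, which the abstract describes as a calibrated per-bin probability. It only computes, for each label, the fraction of bins in one replicate annotation whose label is reproduced in the other. It assumes both annotations cover the same bins and that labels have already been matched across replicates. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def per_label_agreement(rep1, rep2):
    """For each label in rep1, the fraction of its bins that carry the same label in rep2.

    Assumes rep1 and rep2 label the same genomic bins in the same order and
    that label identities were already matched between the two replicate
    annotations (a hypothetical simplification: label IDs from independent
    SAGA runs are generally arbitrary).
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicate annotations must cover the same bins")
    totals = Counter(rep1)
    matches = Counter(a for a, b in zip(rep1, rep2) if a == b)
    return {label: matches[label] / n for label, n in totals.items()}


# Toy annotations of six bins from two hypothetical replicate experiments.
rep1 = ["Enh", "Enh", "Prom", "Quies", "Enh", "Prom"]
rep2 = ["Enh", "Quies", "Prom", "Quies", "Quies", "Prom"]
print(per_label_agreement(rep1, rep2))
```

In this toy input, two of the three enhancer bins in rep1 are not reproduced in rep2. The enhancer label therefore scores 1/3, while the promoter and quiescent labels score 1.0. A per-label summary like this cannot say which individual bins are unreliable. A per-bin confidence score is meant to fill that gap.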
Affiliation(s)
- Marjan Farahbod
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
2. Mar D, Babenko IM, Zhang R, Noble WS, Denisenko O, Vaisar T, Bomsztyk K. MultiomicsTracks96: A high throughput PIXUL-Matrix-based toolbox to profile frozen and FFPE tissues multiomes. bioRxiv 2023:2023.03.16.533031. [PMID: 36993219 PMCID: PMC10055122 DOI: 10.1101/2023.03.16.533031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Background: The multiome is an integrated assembly of distinct classes of molecules and molecular properties, or "omes," measured in the same biospecimen. Freezing and formalin-fixed paraffin-embedding (FFPE) are two common ways to store tissues, and these practices have generated vast biospecimen repositories. However, these biospecimens have been underutilized for multi-omic analysis due to the low throughput of current analytical technologies that impede large-scale studies. Methods: Tissue sampling, preparation, and downstream analysis were integrated into a 96-well format multi-omics workflow, MultiomicsTracks96. Frozen mouse organs were sampled using the CryoGrid system, and matched FFPE samples were processed using a microtome. The 96-well format sonicator, PIXUL, was adapted to extract DNA, RNA, chromatin, and protein from tissues. The 96-well format analytical platform, Matrix, was used for chromatin immunoprecipitation (ChIP), methylated DNA immunoprecipitation (MeDIP), methylated RNA immunoprecipitation (MeRIP), and RNA reverse transcription (RT) assays followed by qPCR and sequencing. LC-MS/MS was used for protein analysis. The Segway genome segmentation algorithm was used to identify functional genomic regions, and linear regressors based on the multi-omics data were trained to predict protein expression. Results: MultiomicsTracks96 was used to generate 8-dimensional datasets including RNA-seq measurements of mRNA expression; MeRIP-seq measurements of m6A and m5C; ChIP-seq measurements of H3K27Ac, H3K4me3, and Pol II; MeDIP-seq measurements of 5mC; and LC-MS/MS measurements of proteins. We observed high correlation between data from matched frozen and FFPE organs. The Segway genome segmentation algorithm applied to epigenomic profiles (ChIP-seq: H3K27Ac, H3K4me3, Pol II; MeDIP-seq: 5mC) was able to recapitulate and predict organ-specific super-enhancers in both FFPE and frozen samples. Linear regression analysis showed that proteomic expression profiles can be more accurately predicted by the full suite of multi-omics data than by epigenomic, transcriptomic, or epitranscriptomic measurements individually. Conclusions: The MultiomicsTracks96 workflow is well suited for high-dimensional multi-omics studies, for instance, multiorgan animal models of disease, drug toxicities, environmental exposure, and aging, as well as large-scale clinical investigations involving the use of biospecimens from existing tissue repositories.
3. Niu YN, Roberts EG, Denisko D, Hoffman MM. Assessing and assuring interoperability of a genomics file format. Bioinformatics 2022; 38:3327-3336. [PMID: 35575355 PMCID: PMC9237710 DOI: 10.1093/bioinformatics/btac327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 03/30/2022] [Accepted: 05/11/2022] [Indexed: 12/01/2022]
Abstract
Motivation: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, waste time and frustrate users. At worst, interoperability issues could lead to undetected errors in scientific results. Results: We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases: potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of the 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite. Availability and implementation: Acidbio is available at https://github.com/hoffmangroup/acidbio. Supplementary information: Supplementary data are available at Bioinformatics online.
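The Python sketch below shows the kind of edge cases such a test suite probes, using a strict check of BED3 data lines. It is not Acidbio's code. It assumes tab-separated fields, the three required columns (chrom, chromStart, chromEnd), and non-negative integer coordinates with chromStart <= chromEnd. This is our reading of the BED format. Whether space-delimited lines or zero-length intervals should be accepted is exactly the sort of variant behavior on which tools disagree. The function names are hypothetical.

```python
import re

_NON_NEG_INT = re.compile(r"[0-9]+")


def is_header(line):
    """Treat comment, track, and browser lines as non-data lines."""
    return line.startswith(("#", "track", "browser"))


def check_bed3_line(line):
    """Return None if `line` looks like a valid BED3+ data line, else a reason string.

    Assumptions: tab-separated fields; at least chrom, chromStart, chromEnd;
    coordinates are non-negative integers with chromStart <= chromEnd.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return "fewer than 3 tab-separated fields"
    chrom, start, end = fields[:3]
    if not chrom:
        return "empty chrom field"
    if not (_NON_NEG_INT.fullmatch(start) and _NON_NEG_INT.fullmatch(end)):
        return "coordinates must be non-negative integers"
    if int(start) > int(end):
        return "chromStart greater than chromEnd"
    return None


# Edge cases that a parser might silently mishandle.
examples = [
    "track name=demo",
    "chr1\t100\t200",
    "chr1 100 200",      # space-delimited
    "chr1\t-5\t10",      # negative coordinate
    "chr1\t300\t200",    # start after end
    "chr1\t150\t150",    # zero-length interval, accepted by this sketch
]
for line in examples:
    if not is_header(line):
        print(repr(line), "->", check_bed3_line(line) or "ok")
```

This sketch rejects the space-delimited line only because it is strict about tabs. A tool that accepts that line is implementing a different BED variant, and the badge system described in the abstract is meant to make that kind of difference explicit.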
Affiliation(s)
- Yi Nian Niu
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2C1, Canada
- Eric G Roberts
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2C1, Canada
- Danielle Denisko
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2C1, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Michael M Hoffman
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2C1, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
- Vector Institute, Toronto, ON, M5G 1M1, Canada
4. Masoumi S, Libbrecht MW, Wiese KC. SigTools: exploratory visualization for genomic signals. Bioinformatics 2022; 38:1126-1128. [PMID: 34718413 DOI: 10.1093/bioinformatics/btab742] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Revised: 09/29/2021] [Accepted: 10/25/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: With the advancement of sequencing technologies, genomic data sets are constantly being expanded by high volumes of different data types. One recently introduced data type in genomic science is the genomic signal, which is usually a short-read coverage measurement over the genome. To understand and evaluate the results of such studies, one needs to understand and analyze the characteristics of the input data. RESULTS: SigTools is an R-based genomic signals visualization package developed with two objectives: (i) to facilitate genomic signals exploration in order to uncover insights for later model training, refinement and development by including distribution and autocorrelation plots; (ii) to enable genomic signals interpretation by including correlation and aggregation plots. In addition, our corresponding web application, SigTools-Shiny, extends the accessibility of these modules to people who are more comfortable working with graphical user interfaces than with command-line tools. AVAILABILITY AND IMPLEMENTATION: SigTools source code, installation guide and manual are freely available at http://github.com/shohre73.
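An autocorrelation plot summarizes how quickly a binned signal decorrelates along the genome. The following Python sketch computes the sample autocorrelation for lags measured in bins. SigTools itself is an R package, so this is not its implementation. The toy coverage track and the function name are hypothetical.

```python
import numpy as np


def autocorrelation(signal, max_lag):
    """Sample autocorrelation of a 1-D binned signal for lags 0..max_lag (in bins)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        raise ValueError("constant signal: autocorrelation is undefined")
    # Lag-k term: sum of products of the centered signal with itself shifted by k bins,
    # normalized by the lag-0 sum so that the lag-0 value is exactly 1.
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


# Hypothetical coverage track: alternating blocks of 25 low bins and 25 high bins.
coverage = np.repeat([0.0, 5.0, 0.0, 5.0], 25)
acf = autocorrelation(coverage, max_lag=3)
print(np.round(acf, 2))
```

For this blocky track the autocorrelation decays slowly (about 0.93 at a lag of one bin), as expected for a signal whose values persist over many adjacent bins. An uncorrelated, noisy track would instead drop toward zero immediately after lag 0.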
Affiliation(s)
- Shohre Masoumi
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Kay C Wiese
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
5. Libbrecht MW, Chan RCW, Hoffman MM. Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns. PLoS Comput Biol 2021; 17:e1009423. [PMID: 34648491 PMCID: PMC8516206 DOI: 10.1371/journal.pcbi.1009423] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
Abstract
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
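The core idea of the common SAGA framework is to partition the genome into bins and assign each bin a label so that bins with the same label share similar data. A hidden Markov model decoded by the Viterbi algorithm can sketch this idea. The Python toy below is a hypothetical illustration, not ChromHMM or Segway. It uses a single signal track, two labels with Gaussian emissions, and hand-set parameters. Real SAGA methods use many tracks and learn their parameters from the data.

```python
import numpy as np


def viterbi_gaussian(obs, means, sd, trans, init):
    """Most likely label path for a 1-D binned signal under an HMM with Gaussian emissions.

    obs: one observation per genomic bin; means: per-label emission means;
    sd: shared emission standard deviation; trans[i, j]: P(label j | previous label i);
    init: initial label probabilities. All parameters here are illustrative.
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    n_bins, n_labels = len(obs), len(means)
    # Log Gaussian densities, dropping the constant term shared by all labels.
    log_emit = -0.5 * ((obs[:, None] - means[None, :]) / sd) ** 2
    log_trans = np.log(trans)
    score = np.log(init) + log_emit[0]
    back = np.zeros((n_bins, n_labels), dtype=int)
    for t in range(1, n_bins):
        # cand[i, j]: best path ending in label i at bin t-1, then moving to label j.
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace back from the best final label to recover the full path.
    path = [int(score.argmax())]
    for t in range(n_bins - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


signal = [0.1, 0.3, 4.8, 5.2, 0.2, 5.1, 4.9, 0.0]
trans = np.array([[0.9, 0.1], [0.1, 0.9]])  # "sticky" transitions favor long segments
labels = viterbi_gaussian(signal, means=[0.0, 5.0], sd=1.0, trans=trans, init=[0.5, 0.5])
print(labels)
```

On this input the sticky transition matrix penalizes label switches. Even so, the low value at index 4 is strong enough evidence to interrupt the high-signal segment, giving the labels [0, 0, 1, 1, 0, 1, 1, 0].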
Affiliation(s)
- Rachel C. W. Chan
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Michael M. Hoffman
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
6. Pálinkás HL, Békési A, Róna G, Pongor L, Papp G, Tihanyi G, Holub E, Póti Á, Gemma C, Ali S, Morten MJ, Rothenberg E, Pagano M, Szűts D, Győrffy B, Vértessy BG. Genome-wide alterations of uracil distribution patterns in human DNA upon chemotherapeutic treatments. eLife 2020; 9:e60498. [PMID: 32956035 PMCID: PMC7505663 DOI: 10.7554/elife.60498] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/29/2020] [Accepted: 08/23/2020] [Indexed: 12/17/2022]
Abstract
Numerous anti-cancer drugs perturb thymidylate biosynthesis and lead to genomic uracil incorporation, contributing to their antiproliferative effect. Still, whether uracil incorporation has any positional preference has not yet been characterized. Here, we aimed to uncover genome-wide alterations in uracil pattern upon drug treatments in human cancer cell line models derived from HCT116. We developed a straightforward U-DNA sequencing method (U-DNA-Seq) that was combined with in situ super-resolution imaging. Using a novel robust analysis pipeline, we found broad regions with elevated probability of uracil occurrence in both treated and non-treated cells. Correlation with chromatin markers and other genomic features shows that non-treated cells possess uracil in the late-replicating constitutive heterochromatic regions, while drug treatment induced a shift of incorporated uracil towards segments that are normally more active/functional. Data were corroborated by colocalization studies via dSTORM microscopy. This approach can be applied to study the dynamic spatio-temporal nature of genomic uracil.
Affiliation(s)
- Hajnalka L Pálinkás
- Genome Metabolism Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Department of Applied Biotechnology and Food Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Doctoral School of Multidisciplinary Medical Science, University of Szeged, Szeged, Hungary
- Angéla Békési
- Genome Metabolism Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Department of Applied Biotechnology and Food Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Gergely Róna
- Department of Applied Biotechnology and Food Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, United States
- Perlmutter Cancer Center, New York University School of Medicine, New York, United States
- Howard Hughes Medical Institute, New York University School of Medicine, New York, United States
- Lőrinc Pongor
- Cancer Biomarker Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Department of Bioinformatics and 2nd Department of Pediatrics, Semmelweis University, Budapest, Hungary
- Gábor Papp
- Department of Applied Biotechnology and Food Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Gergely Tihanyi
- Genome Metabolism Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Department of Applied Biotechnology and Food Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Eszter Holub
- Department of Applied Biotechnology and Food Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Ádám Póti
- Genome Stability Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Carolina Gemma
- Department of Surgery and Cancer, Imperial College London, Hammersmith Hospital Campus, London, United Kingdom
- Simak Ali
- Department of Surgery and Cancer, Imperial College London, Hammersmith Hospital Campus, London, United Kingdom
- Michael J Morten
- Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, United States
- Eli Rothenberg
- Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, United States
- Michele Pagano
- Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, United States
- Perlmutter Cancer Center, New York University School of Medicine, New York, United States
- Howard Hughes Medical Institute, New York University School of Medicine, New York, United States
- Dávid Szűts
- Genome Stability Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Balázs Győrffy
- Cancer Biomarker Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Department of Bioinformatics and 2nd Department of Pediatrics, Semmelweis University, Budapest, Hungary
- Beáta G Vértessy
- Genome Metabolism Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Department of Applied Biotechnology and Food Sciences, Budapest University of Technology and Economics, Budapest, Hungary
7. Furcila D, García M, Toader C, Morales J, LaTorre A, Rodríguez Á, Pastor L, DeFelipe J, Alonso-Nanclares L. InTool Explorer: An Interactive Exploratory Analysis Tool for Versatile Visualizations of Neuroscientific Data. Front Neuroanat 2019; 13:28. [PMID: 30914926 PMCID: PMC6421977 DOI: 10.3389/fnana.2019.00028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/09/2018] [Accepted: 02/18/2019] [Indexed: 02/05/2023]
Abstract
The bottleneck for progress in many research areas within neuroscience has shifted from data acquisition to data analysis. In the present article, we propose a method named InTool Explorer that we have developed to perform interactive exploratory data analysis, focusing on neuroanatomy as an example of its utility. This tool is freely available software that has been designed to facilitate the study of complex neuroscience data. InTool Explorer requires no more than an internet connection and a web browser. The main contribution of this tool is to provide a user-designed canvas for data visualization and interaction, so that users can perform specific exploratory tasks according to their needs. Moreover, InTool Explorer permits visualization of the datasets in a very dynamic and versatile way using a linked-card approach. For this purpose, the tool allows the user to select among different predefined card types. Each card type offers an abstract data representation, a filtering tool or a set of statistical analysis methods. Additionally, InTool Explorer makes it possible to link raw images to the data. These images can be used by InTool Explorer to define new customized filtering cards. Another significant contribution of this tool is that it allows fast visualization of the data, error finding, and re-evaluation to establish new hypotheses or new lines of research. Thus, in practical laboratory use, InTool Explorer provides a new opportunity to study and analyze neuroscience data before any statistical analysis is carried out.
Affiliation(s)
- Diana Furcila
- Laboratorio Cajal de Circuitos Corticales (CTB), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Facultad de Psicología, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Marcos García
- Escuela Técnica Superior de Ingeniería Informática, Universidad Rey Juan Carlos, Madrid, Spain
- Center for Computational Simulation (CCS), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Cosmin Toader
- Escuela Técnica Superior de Ingeniería Informática, Universidad Rey Juan Carlos, Madrid, Spain
- Antonio LaTorre
- Center for Computational Simulation (CCS), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Ángel Rodríguez
- Center for Computational Simulation (CCS), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Luis Pastor
- Escuela Técnica Superior de Ingeniería Informática, Universidad Rey Juan Carlos, Madrid, Spain
- Center for Computational Simulation (CCS), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Javier DeFelipe
- Laboratorio Cajal de Circuitos Corticales (CTB), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Department of Functional and Systems Neurobiology, Instituto Cajal (CSIC), Madrid, Spain
- Lidia Alonso-Nanclares
- Laboratorio Cajal de Circuitos Corticales (CTB), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Department of Functional and Systems Neurobiology, Instituto Cajal (CSIC), Madrid, Spain
8. Richard G, Legeai F, Prunier-Leterme N, Bretaudeau A, Tagu D, Jaquiéry J, Le Trionnaire G. Dosage compensation and sex-specific epigenetic landscape of the X chromosome in the pea aphid. Epigenetics Chromatin 2017. [PMID: 28638443 PMCID: PMC5471693 DOI: 10.1186/s13072-017-0137-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/26/2023]
Abstract
Background: Heterogametic species display a differential number of sex chromosomes, resulting in imbalanced transcription levels for these chromosomes between males and females. To correct this disequilibrium, dosage compensation mechanisms involving the regulation of gene expression and chromatin accessibility have emerged throughout evolution. In insects, these mechanisms have been extensively characterized only in Drosophila, not in insects of agronomical importance. Aphids are major pests of a wide range of crops. Their remarkable ability to switch from asexual to sexual reproduction during their life cycle largely explains the economic losses they can cause. As heterogametic insects, male aphids are X0, while females (asexual and sexual) are XX. Results: Here, we analyzed transcriptomic and open chromatin data obtained from whole male and female individuals to evaluate the putative existence of a dosage compensation mechanism involving differential chromatin accessibility of the pea aphid’s X chromosome. Transcriptomic analyses first showed X/AA and XX/AA expression ratios for expressed genes close to 1 in males and females, respectively, suggesting dosage compensation in the pea aphid. Analyses of open chromatin data obtained by Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE-seq) revealed that X chromosome chromatin accessibility is globally and significantly higher in males than in females, while autosomal chromatin accessibility is similar between sexes. Moreover, the chromatin environment of X-linked genes displaying similar expression levels in males and females, and thus likely to be compensated, is significantly more accessible in males. Conclusions: Our results suggest the existence of an underlying epigenetic mechanism enhancing X chromosome chromatin accessibility in males to allow X-linked gene dose correction between sexes in the pea aphid, similar to Drosophila. Our study provides new evidence on dosage compensation in relation to chromatin biology in insects, and newly extends it to a major crop pest, drawing on both transcriptomic and open chromatin data. Electronic supplementary material: The online version of this article (doi:10.1186/s13072-017-0137-1) contains supplementary material, which is available to authorized users.
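The X/AA expression ratio reported in this abstract can be sketched as a ratio of medians over expressed genes. This Python example is an illustration built on assumptions, not the study's pipeline. It assumes expression values are already normalized per gene and uses a hypothetical threshold (min_expr) to define expressed genes. The gene names and chromosome labels are toy values. A ratio near 1 in a male (X0) sample is the pattern the abstract interprets as dosage compensation.

```python
import statistics


def x_to_aa_ratio(expr_by_gene, chrom_by_gene, x_chrom="X", min_expr=1.0):
    """Median expression of expressed X-linked genes divided by that of expressed autosomal genes.

    Assumes expression values are already normalized per gene and that
    "expressed" means value >= min_expr; both are illustrative choices,
    not the study's documented method.
    """
    x_vals, a_vals = [], []
    for gene, value in expr_by_gene.items():
        if value < min_expr:
            continue  # drop non-expressed genes before taking medians
        if chrom_by_gene[gene] == x_chrom:
            x_vals.append(value)
        else:
            a_vals.append(value)
    return statistics.median(x_vals) / statistics.median(a_vals)


# Hypothetical male sample: a single X copy, autosomes present in pairs.
expr = {"g1": 10.0, "g2": 12.0, "g3": 0.2, "g4": 11.0, "g5": 9.0}
chrom = {"g1": "X", "g2": "X", "g3": "X", "g4": "A1", "g5": "A2"}
print(x_to_aa_ratio(expr, chrom))
```

Here gene g3 falls below the threshold and is excluded. The ratio therefore compares the median of the two expressed X-linked genes (11.0) with the median of the autosomal genes (10.0).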
Affiliation(s)
- Gautier Richard
- EGI, UMR 1349, INRA, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Domaine de la Motte, BP 35327, Le Rheu, France
- Fabrice Legeai
- BIPAA, UMR 1349, INRA, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Campus Beaulieu, Rennes, France
- Genscale, INRIA, IRISA, Campus Beaulieu, Rennes, France
- Nathalie Prunier-Leterme
- EGI, UMR 1349, INRA, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Domaine de la Motte, BP 35327, Le Rheu, France
- Anthony Bretaudeau
- BIPAA, UMR 1349, INRA, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Campus Beaulieu, Rennes, France
- Genouest, INRIA, IRISA, Campus Beaulieu, Rennes, France
- Denis Tagu
- EGI, UMR 1349, INRA, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Domaine de la Motte, BP 35327, Le Rheu, France
- Julie Jaquiéry
- CNRS, UMR 6553, EcoBio, University of Rennes 1, 35042 Rennes, France
- Gaël Le Trionnaire
- EGI, UMR 1349, INRA, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Domaine de la Motte, BP 35327, Le Rheu, France
9. Hoffman MM, Ernst J, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Giardine B, Ellenbogen PM, Bilmes JA, Birney E, Hardison RC, Dunham I, Kellis M, Noble WS. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 2012; 41:827-41. [PMID: 23221638 PMCID: PMC3553955 DOI: 10.1093/nar/gks1284] [Citation(s) in RCA: 357] [Impact Index Per Article: 29.8] [Indexed: 02/01/2023]
Abstract
The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotation of non-coding regulatory elements correlates strongly with mammalian evolutionary constraint and provides an unbiased approach for evaluating metrics of evolutionary constraint in humans. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.
Affiliation(s)
- Michael M Hoffman
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065, USA
10. Nair NU, Sahu AD, Bucher P, Moret BME. ChIPnorm: a statistical method for normalizing and identifying differential regions in histone modification ChIP-seq libraries. PLoS One 2012; 7:e39573. [PMID: 22870189 PMCID: PMC3411705 DOI: 10.1371/journal.pone.0039573] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 01/23/2012] [Accepted: 05/22/2012] [Indexed: 11/19/2022]
Abstract
The advent of high-throughput technologies such as ChIP-seq has made it possible to study histone modifications. A problem of particular interest is the identification of regions of the genome where different cell types from the same organism exhibit different patterns of histone enrichment. This problem turns out to be surprisingly difficult, even in simple pairwise comparisons, because of the significant level of noise in ChIP-seq data. In this paper we propose a two-stage statistical method, called ChIPnorm, to normalize ChIP-seq data and to find differential regions in the genome, given two libraries of histone modifications of different cell types. We show that the ChIPnorm method removes most of the noise and bias in the data and outperforms other normalization methods. We correlate the histone marks with gene expression data and confirm that histone modifications H3K27me3 and H3K4me3 act as a repressor and an activator of genes, respectively. Compared to what was previously reported in the literature, we find that a substantially higher fraction of bivalent marks in ES cells for H3K27me3 and H3K4me3 moves into a K27-only state. We find that most of the promoter regions in protein-coding genes have differential histone-modification sites. The software for this work can be downloaded from http://lcbb.epfl.ch/software.html.
Affiliation(s)
- Nishanth Ulhas Nair
- Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Avinash Das Sahu
- Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Philipp Bucher
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute for Bioinformatics, Lausanne, Switzerland
- Bernard M. E. Moret
- Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Swiss Institute for Bioinformatics, Lausanne, Switzerland
11. Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods 2012; 9:473-6. [PMID: 22426492 DOI: 10.1038/nmeth.1937] [Citation(s) in RCA: 383] [Impact Index Per Article: 31.9] [Received: 07/01/2011] [Accepted: 02/14/2012] [Indexed: 01/24/2023]
Abstract
We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open chromatin, all derived from a human chronic myeloid leukemia cell line. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, transcriptional regulator CTCF-binding regions and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/.
Affiliation(s)
- Michael M Hoffman
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA