1
Ning H, Boyes I, Numanagić I, Rott M, Xing L, Zhang X. Diagnostics of viral infections using high-throughput genome sequencing data. Brief Bioinform 2024; 25:bbae501. [PMID: 39417677 PMCID: PMC11483527 DOI: 10.1093/bib/bbae501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 08/30/2024] [Indexed: 10/19/2024]
Abstract
Plant viral infections cause significant economic losses, totalling $350 billion USD in 2021. With no treatment for virus-infected plants, accurate and efficient diagnosis is crucial to preventing and controlling these diseases. High-throughput sequencing (HTS) enables cost-efficient identification of known and unknown viruses. However, existing diagnostic pipelines face challenges. First, many methods depend on subjectively chosen parameter values, undermining their robustness across various data sources. Second, artifacts (e.g. false peaks) in the mapped sequence data can lead to incorrect diagnostic results. While some methods require manual or subjective verification to address these artifacts, others overlook them entirely, affecting the overall method performance and leading to imprecise or labour-intensive outcomes. To address these challenges, we introduce IIMI, a new automated analysis pipeline using machine learning to diagnose infections from 1583 plant viruses with HTS data. It adopts a data-driven approach for parameter selection, reducing subjectivity, and automatically filters out regions affected by artifacts, thus improving accuracy. Testing with in-house and published data shows IIMI's superiority over existing methods. Besides a prediction model, IIMI also provides resources on plant virus genomes, including annotations of regions prone to artifacts. The method is available as an R package (iimi) on CRAN and will integrate with the web application www.virtool.ca, enhancing accessibility and user convenience.
Affiliation(s)
- Haochen Ning: Department of Mathematics and Statistics, University of Victoria, 3800 Finnerty Road (Ring Road), BC V8P 5C2, Canada
- Ian Boyes: Canadian Food Inspection Agency, Centre for Plant Health, 8801 Saanich Road E., North Saanich, BC V8L 1H3, Canada
- Ibrahim Numanagić: Department of Computer Science, University of Victoria, 3800 Finnerty Road (Ring Road), BC V8P 5C2, Canada
- Michael Rott: Canadian Food Inspection Agency, Centre for Plant Health, 8801 Saanich Road E., North Saanich, BC V8L 1H3, Canada
- Li Xing: Department of Mathematics and Statistics, University of Saskatchewan, 106 Wiggins Road, Saskatoon, SK S7N 5E6, Canada
- Xuekui Zhang: Department of Mathematics and Statistics, University of Victoria, 3800 Finnerty Road (Ring Road), BC V8P 5C2, Canada
2
Mapping nucleosome and chromatin architectures: A survey of computational methods. Comput Struct Biotechnol J 2022; 20:3955-3962. [PMID: 35950186 PMCID: PMC9340519 DOI: 10.1016/j.csbj.2022.07.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 07/22/2022] [Accepted: 07/22/2022] [Indexed: 11/21/2022]
Abstract
With ever-growing genomic sequencing data, the data variabilities and the underlying biases of the sequencing technologies pose significant computational challenges ranging from the need for accurately detecting the nucleosome positioning or chromatin interaction to the need for developing normalization methods to eliminate systematic biases. This review mainly surveys the computational methods for mapping the higher-resolution nucleosome and higher-order chromatin architectures. While a detailed discussion of the underlying algorithms is beyond the scope of our survey, we have discussed the methods and tools that can detect the nucleosomes in the genome, then demonstrated the computational methods for identifying 3D chromatin domains and interactions. We further illustrated computational approaches for integrating multi-omics data with Hi-C data and the advance of single-cell (sc)Hi-C data analysis. Our survey provides a comprehensive and valuable resource for biomedical scientists interested in studying nucleosome organization and chromatin structures as well as for computational scientists who are interested in improving upon them.
3
Lee W, Kim J, Yun JM, Ohn T, Gong Q. MeCP2 regulates gene expression through recognition of H3K27me3. Nat Commun 2020; 11:3140. [PMID: 32561780 PMCID: PMC7305159 DOI: 10.1038/s41467-020-16907-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 01/28/2019] [Accepted: 05/27/2020] [Indexed: 02/08/2023]
Abstract
MeCP2 plays a multifaceted role in gene expression regulation and chromatin organization. Interaction between MeCP2 and methylated DNA in the regulation of gene expression is well established. However, the widespread distribution of MeCP2 suggests it has additional interactions with chromatin. Here we demonstrate, by both biochemical and genomic analyses, that MeCP2 directly interacts with nucleosomes and its genomic distribution correlates with that of H3K27me3. In particular, the methyl-CpG-binding domain of MeCP2 shows preferential interactions with H3K27me3. We further observe that the impact of MeCP2 on transcriptional changes correlates with histone post-translational modification patterns. Our findings indicate that MeCP2 interacts with genomic loci via binding to DNA as well as histones, and that interaction between MeCP2 and histone proteins plays a key role in gene expression regulation.
Affiliation(s)
- Wooje Lee: Department of Cellular & Molecular Medicine, College of Medicine, Chosun University, Gwangju, 61452, South Korea
- Jeeho Kim: Department of Cellular & Molecular Medicine, College of Medicine, Chosun University, Gwangju, 61452, South Korea
- Jung-Mi Yun: Department of Food and Nutrition, Chonnam National University, Gwangju, 61186, South Korea
- Takbum Ohn: Department of Cellular & Molecular Medicine, College of Medicine, Chosun University, Gwangju, 61452, South Korea
- Qizhi Gong: Department of Cell Biology and Human Anatomy, University of California at Davis, School of Medicine, Davis, CA, 95616, USA
4
Ankolkar M, Deshpande SS, Balasinor NH. Systemic hormonal modulation induces sperm nucleosomal imbalance in rat spermatozoa. Andrologia 2018; 50:e13060. [DOI: 10.1111/and.13060] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/25/2018] [Revised: 04/18/2018] [Accepted: 05/03/2018] [Indexed: 02/04/2023]
Affiliation(s)
- Mandar Ankolkar: Department of Neuroendocrinology, National Institute for Research in Reproductive Health (ICMR), Mumbai, India
- Sharvari S. Deshpande: Department of Neuroendocrinology, National Institute for Research in Reproductive Health (ICMR), Mumbai, India
- Nafisa H. Balasinor: Department of Neuroendocrinology, National Institute for Research in Reproductive Health (ICMR), Mumbai, India
5
Vainshtein Y, Rippe K, Teif VB. NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data. BMC Genomics 2017; 18:158. [PMID: 28196481 PMCID: PMC5309995 DOI: 10.1186/s12864-017-3580-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/26/2016] [Accepted: 02/10/2017] [Indexed: 12/21/2022]
Abstract
Background Biomedical applications of high-throughput sequencing methods generate a vast amount of data in which numerous chromatin features are mapped along the genome. The results are frequently analysed by creating binary data sets that link the presence/absence of a given feature to specific genomic loci. However, the nucleosome occupancy or chromatin accessibility landscape is essentially continuous. It is currently a challenge in the field to cope with continuous distributions of deep sequencing chromatin readouts and to integrate the different types of discrete chromatin features to reveal linkages between them. Results Here we introduce the NucTools suite of Perl scripts as well as MATLAB- and R-based visualization programs for a nucleosome-centred downstream analysis of deep sequencing data. NucTools accounts for the continuous distribution of nucleosome occupancy. It allows calculations of nucleosome occupancy profiles averaged over several replicates, comparisons of nucleosome occupancy landscapes between different experimental conditions, and the estimation of the changes of integral chromatin properties such as the nucleosome repeat length. Furthermore, NucTools facilitates the annotation of nucleosome occupancy with other chromatin features like binding of transcription factors or architectural proteins, and epigenetic marks like histone modifications or DNA methylation. The applications of NucTools are demonstrated for the comparison of several datasets for nucleosome occupancy in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). Conclusions The typical workflows of data processing and integrative analysis with NucTools reveal information on the interplay of nucleosome positioning with other features such as for example binding of a transcription factor CTCF, regions with stable and unstable nucleosomes, and domains of large organized chromatin K9me2 modifications (LOCKs). 
As potential limitations and problems, we discuss how inter-replicate variability of MNase-seq experiments can be addressed.
Affiliation(s)
- Yevhen Vainshtein: Functional Genomics Group, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Nobelstraße 12, 70569, Stuttgart, Germany
- Karsten Rippe: Research Group Genome Organization & Function, German Cancer Research Center (DKFZ) and Bioquant, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Vladimir B Teif: School of Biological Sciences, University of Essex, Wivenhoe Park, CO4 3SQ, Colchester, UK
6
Yelagandula R, Osakabe A, Axelsson E, Berger F, Kawashima T. Genome-Wide Profiling of Histone Modifications and Histone Variants in Arabidopsis thaliana and Marchantia polymorpha. Methods Mol Biol 2017; 1610:93-106. [PMID: 28439859 DOI: 10.1007/978-1-4939-7003-2_7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/03/2022]
Abstract
Histone modifications and histone variants barcode the genome and play major roles in epigenetic regulations. Chromatin immunoprecipitation (ChIP) coupled with next-generation sequencing (NGS) is a well-established method to investigate the landscape of epigenetic marks at a genomic level. Here, we describe procedures for conducting ChIP, subsequent NGS library construction, and data analysis on histone modifications and histone variants in Arabidopsis thaliana. We also describe an optimized nuclear isolation procedure to prepare chromatin for ChIP in the liverwort, Marchantia polymorpha, which is the emerging model plant ideal for evolutionary studies.
Affiliation(s)
- Ramesh Yelagandula: Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria; Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Akihisa Osakabe: Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Elin Axelsson: Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Frederic Berger: Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Tomokazu Kawashima: Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria; Department of Plant and Soil Sciences, University of Kentucky, Lexington, KY, 40546, USA
7
Blocker AW, Airoldi EM. Template-Based Models for Genome-Wide Analysis of Next-Generation Sequencing Data at Base-Pair Resolution. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2016.1141095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
8
Abstract
Nucleosome positioning is an important process required for proper genome packing and its accessibility to execute the genetic program in a cell-specific, timely manner. In recent years, hundreds of papers have been devoted to the bioinformatics, physics and biology of nucleosome positioning. The purpose of this review is to cover a practical aspect of this field, namely, to provide a guide to the multitude of nucleosome positioning resources available online. These include almost 300 experimental datasets of genome-wide nucleosome occupancy profiles determined in different cell types and more than 40 computational tools for the analysis of experimental nucleosome positioning data and prediction of intrinsic nucleosome formation probabilities from the DNA sequence. A manually curated, up-to-date list of these resources will be maintained at http://generegulation.info.
9
Abstract
Recent advances in experimental and computational methodologies are enabling ultra-high resolution genome-wide profiles of protein-DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein-DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein-DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein-DNA binding events. We discuss the challenges and opportunities associated with such approaches.
Affiliation(s)
- Shaun Mahony: Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- B Franklin Pugh: Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
10
Joshi A. Mammalian transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes and are predicted to act as transcriptional activator hubs. BMC Bioinformatics 2014; 15:412. [PMID: 25547756 PMCID: PMC4302108 DOI: 10.1186/s12859-014-0412-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/02/2014] [Accepted: 12/08/2014] [Indexed: 12/20/2022]
Abstract
BACKGROUND Transcriptional hotspots are defined as genomic regions bound by multiple factors. They have been identified recently as cell type specific enhancers regulating developmentally essential genes in many species such as worm, fly and humans. The in-depth analysis of hotspots across multiple cell types in same species still remains to be explored and can bring new biological insights. RESULTS We therefore collected 108 transcription-related factor (TF) ChIP sequencing data sets in ten murine cell types and classified the peaks in each cell type in three groups according to binding occupancy as singletons (low-occupancy), combinatorials (mid-occupancy) and hotspots (high-occupancy). The peaks in the three groups clustered largely according to the occupancy, suggesting priming of genomic loci for mid occupancy irrespective of cell type. We then characterized hotspots for diverse structural functional properties. The genes neighbouring hotspots had a small overlap with hotspot genes in other cell types and were highly enriched for cell type specific function. Hotspots were enriched for sequence motifs of key TFs in that cell type and more than 90% of hotspots were occupied by pioneering factors. Though we did not find any sequence signature in the three groups, the H3K4me1 binding profile had bimodal peaks at hotspots, distinguishing hotspots from mono-modal H3K4me1 singletons. In ES cells, differentially expressed genes after perturbation of activators were enriched for hotspot genes suggesting hotspots primarily act as transcriptional activator hubs. Finally, we proposed that ES hotspots might be under control of SetDB1 and not DNMT for silencing. CONCLUSION Transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes. In ES cells, they are predicted to act as transcriptional activator hubs and might be under SetDB1 control for silencing.
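The three-way grouping by binding occupancy described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the occupancy cutoffs, so the thresholds `low_max` and `high_min`, the function name, and the example region and factor names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def classify_by_occupancy(
    region_factors: Dict[str, Set[str]],
    low_max: int = 1,   # hypothetical cutoff: at most this many factors -> singleton
    high_min: int = 5,  # hypothetical cutoff: at least this many factors -> hotspot
) -> Dict[str, str]:
    """Label each region by the number of distinct factors bound to it."""
    labels = {}
    for region, factors in region_factors.items():
        n_bound = len(factors)
        if n_bound <= low_max:
            labels[region] = "singleton"      # low occupancy
        elif n_bound >= high_min:
            labels[region] = "hotspot"        # high occupancy
        else:
            labels[region] = "combinatorial"  # mid occupancy
    return labels


# Made-up regions and factor names, for illustration only.
example = {
    "region_1": {"TF_A"},
    "region_2": {"TF_A", "TF_B", "TF_C"},
    "region_3": {"TF_A", "TF_B", "TF_C", "TF_D", "TF_E"},
}
print(classify_by_occupancy(example))
```

In practice the input would come from overlapping ChIP-seq peak calls across factors within one cell type; that step is omitted here.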
Affiliation(s)
- Anagha Joshi: Division of Developmental Biology, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 8GR, UK
11
Quintales L, Vázquez E, Antequera F. Comparative analysis of methods for genome-wide nucleosome cartography. Brief Bioinform 2014; 16:576-87. [PMID: 25296770 DOI: 10.1093/bib/bbu037] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 07/31/2014] [Accepted: 08/26/2014] [Indexed: 11/13/2022]
Abstract
Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use.
12
Chen W, Liu Y, Zhu S, Green CD, Wei G, Han JDJ. Improved nucleosome-positioning algorithm iNPS for accurate nucleosome positioning from sequencing data. Nat Commun 2014; 5:4909. [DOI: 10.1038/ncomms5909] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 11/29/2013] [Accepted: 08/04/2014] [Indexed: 11/09/2022]
13
Polishko A, Bunnik EM, Le Roch KG, Lonardi S. PuFFIN--a parameter-free method to build nucleosome maps from paired-end reads. BMC Bioinformatics 2014; 15 Suppl 9:S11. [PMID: 25252810 PMCID: PMC4168711 DOI: 10.1186/1471-2105-15-s9-s11] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Background: We introduce a novel method, called PuFFIN, that takes advantage of paired-end short reads to build genome-wide nucleosome maps with larger numbers of detected nucleosomes and higher accuracy than existing tools. In contrast to other approaches that require users to optimize several parameters according to their data (e.g., the maximum allowed nucleosome overlap or legal ranges for the fragment sizes), our algorithm can accurately determine a genome-wide set of non-overlapping nucleosomes without any user-defined parameter. This feature makes PuFFIN significantly easier to use and prevents users from choosing the "wrong" parameters and obtaining sub-optimal nucleosome maps. Results: PuFFIN builds genome-wide nucleosome maps using a multi-scale (or multi-resolution) approach. Our algorithm relies on a set of nucleosome "landscape" functions at different resolution levels: each function represents the likelihood of each genomic location to be occupied by a nucleosome for a particular value of the smoothing parameter. After a set of candidate nucleosomes is computed for each function, PuFFIN produces a consensus set that satisfies non-overlapping constraints and maximizes the number of nucleosomes. Conclusions: We report comprehensive experimental results that compare PuFFIN with recently published tools (NOrMAL, TEMPLATE FILTERING, and NucPosSimulator) on several synthetic datasets as well as real data for S. cerevisiae and P. falciparum. Experimental results show that our approach produces more accurate nucleosome maps with a higher number of non-overlapping nucleosomes than other tools.
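The final step described in this abstract, choosing a set of candidates that is non-overlapping and as large as possible, is a form of interval scheduling. The Python sketch below shows the standard greedy earliest-end solution to that general problem. It is not PuFFIN's implementation: the multi-scale landscape functions and scoring are not modeled, and the function name and candidate coordinates are assumptions made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, half-open: end is exclusive


def max_non_overlapping(candidates: List[Interval]) -> List[Interval]:
    """Return a largest subset of pairwise non-overlapping intervals.

    Greedy rule: scan candidates by increasing end coordinate and keep each
    one that starts at or after the end of the last interval kept.
    """
    chosen: List[Interval] = []
    last_end = None
    for start, end in sorted(candidates, key=lambda iv: (iv[1], iv[0])):
        if last_end is None or start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# Made-up 147 bp candidate nucleosomes pooled from several resolution levels.
candidates = [(0, 147), (100, 247), (150, 297), (290, 437), (300, 447)]
print(max_non_overlapping(candidates))
# [(0, 147), (150, 297), (300, 447)]
```

The greedy rule is optimal for maximizing the count of non-overlapping intervals, which is why it is used here; a method that also weighs candidate likelihoods would need a different objective.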
14
Murray V, Chen JK, Galea AM. Enhanced DNA repair of bleomycin-induced 3'-phosphoglycolate termini at the transcription start sites of actively transcribed genes in human cells. Mutat Res 2014; 769:93-9. [PMID: 25771728 DOI: 10.1016/j.mrfmmm.2014.06.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 01/13/2014] [Revised: 05/29/2014] [Accepted: 06/18/2014] [Indexed: 10/25/2022]
Abstract
The anti-tumour agent, bleomycin, cleaves DNA to give 3'-phosphoglycolate and 5'-phosphate termini. The removal of 3'-phosphoglycolate to give 3'-OH ends is a very important step in the DNA repair of these lesions. In this study, next-generation DNA sequencing was utilised to investigate the repair of these 3'-phosphoglycolate termini at the transcription start sites (TSSs) of genes in HeLa cells. The 143,600 identified human TSSs in HeLa cells comprised 82,596 non-transcribed genes and 61,004 transcribed genes; and the transcribed genes were divided into quintiles of 12,201 genes comprising the top 20%, 20-40%, 40-60%, 60-80%, 80-100% of expressed genes. Repair of bleomycin-induced 3'-phosphoglycolate termini was enhanced at actively transcribed genes. The top 20% and 20-40% quintiles had a very similar level of enhanced repair, the 40-60% quintile was intermediate, while the 60-80% and 80-100% quintiles were close to the low level of enhancement found in non-transcribed genes. There were also interesting differences regarding bleomycin repair on the sense and antisense strands of DNA at TSSs. The sense strand had highly enhanced repair between 0 and 250bp relative to the TSS, while for the antisense strand highly enhanced repair was between 150 and 450bp. Repair of DNA damage is a major mechanism of resistance to anti-tumour drugs and this study provides an insight into this process in human tumour cells.
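The counts quoted in this abstract can be checked with a few lines of Python. The sketch uses only the figures stated in the abstract; the variable names are illustrative.

```python
# Figures as stated in the abstract.
total_tss = 143_600
non_transcribed = 82_596
transcribed = 61_004

# The two groups partition the identified TSSs.
assert non_transcribed + transcribed == total_tss

# Splitting the transcribed genes into five equal groups.
quintile_size = transcribed / 5
print(quintile_size)         # 12200.8
print(round(quintile_size))  # 12201
```

Since 5 x 12,201 = 61,005, the quintiles cannot all contain exactly 12,201 genes; the quoted quintile size is therefore best read as a rounded value.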
Affiliation(s)
- Vincent Murray: School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Jon K Chen: School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Anne M Galea: School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
15
Murray V, Chen JK, Galea AM. The anti-tumor drug bleomycin preferentially cleaves at the transcription start sites of actively transcribed genes in human cells. Cell Mol Life Sci 2014; 71:1505-12. [PMID: 23982755 PMCID: PMC11113418 DOI: 10.1007/s00018-013-1456-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 06/02/2013] [Revised: 08/06/2013] [Accepted: 08/12/2013] [Indexed: 11/26/2022]
Abstract
The genome-wide pattern of DNA cleavage at transcription start sites (TSSs) for the anti-tumor drug bleomycin was examined in human HeLa cells using next-generation DNA sequencing. It was found that actively transcribed genes were preferentially cleaved compared with non-transcribed genes. The 143,600 identified human TSSs were split into non-transcribed genes (82,596) and transcribed genes (61,004) for HeLa cells. These transcribed genes were further split into quintiles of 12,201 genes comprising the top 20, 20-40, 40-60, 60-80, and 80-100 % of expressed genes. The bleomycin cleavage pattern at highly transcribed gene TSSs was greatly enhanced compared with purified DNA and non-transcribed gene TSSs. The top 20 and 20-40 % quintiles had a very similar enhanced cleavage pattern, the 40-60 % quintile was intermediate, while the 60-80 and 80-100 % quintiles were close to the non-transcribed and purified DNA profiles. The pattern of bleomycin enhanced cleavage had peaks that were approximately 200 bp apart, and this indicated that bleomycin was identifying the presence of phased nucleosomes at TSSs. Hence bleomycin can be utilized to detect chromatin structures that are present at actively transcribed genes. In this study, for the first time, the pattern of DNA damage by a clinically utilized cancer chemotherapeutic agent was performed on a human genome-wide scale at the nucleotide level.
Affiliation(s)
- Vincent Murray: School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
16
Mammana A, Vingron M, Chung HR. Inferring nucleosome positions with their histone mark annotation from ChIP data. Bioinformatics 2013; 29:2547-54. [PMID: 23981350 PMCID: PMC3789549 DOI: 10.1093/bioinformatics/btt449] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Motivation: The nucleosome is the basic repeating unit of chromatin. It contains two copies each of the four core histones H2A, H2B, H3 and H4 and about 147 bp of DNA. The residues of the histone proteins are subject to numerous post-translational modifications, such as methylation or acetylation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique that provides genome-wide occupancy data of these modified histone proteins, and it requires appropriate computational methods. Results: We present NucHunter, an algorithm that uses the data from ChIP-seq experiments directed against many histone modifications to infer positioned nucleosomes. NucHunter annotates each of these nucleosomes with the intensities of the histone modifications. We demonstrate that these annotations can be used to infer nucleosomal states with distinct correlations to underlying genomic features and chromatin-related processes, such as transcriptional start sites, enhancers, elongation by RNA polymerase II and chromatin-mediated repression. Thus, NucHunter is a versatile tool that can be used to predict positioned nucleosomes from a panel of histone modification ChIP-seq experiments and infer distinct histone modification patterns associated with different chromatin states. Availability: The software is available at http://epigen.molgen.mpg.de/nuchunter/. Contact: chung@molgen.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alessandro Mammana: Otto-Warburg-Laboratories, Epigenomics and Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany
17
Schöpflin R, Teif VB, Müller O, Weinberg C, Rippe K, Wedemann G. Modeling nucleosome position distributions from experimental nucleosome positioning maps. Bioinformatics 2013; 29:2380-6. [PMID: 23846748 DOI: 10.1093/bioinformatics/btt404] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/19/2022]
Abstract
MOTIVATION Recent experimental advancements allow determining positions of nucleosomes for complete genomes. However, the resulting nucleosome occupancy maps are averages of heterogeneous cell populations. Accordingly, they represent a snapshot of a dynamic ensemble at a single time point with an overlay of many configurations from different cells. To study the organization of nucleosomes along the genome and to understand the mechanisms of nucleosome translocation, it is necessary to retrieve features of specific conformations from the population average. RESULTS Here, we present a method for identifying non-overlapping nucleosome configurations that combines binary-variable analysis and a Monte Carlo approach with a simulated annealing scheme. In this manner, we obtain specific nucleosome configurations and optimized solutions for the complex positioning patterns from experimental data. We apply the method to compare nucleosome positioning at transcription factor binding sites in different mouse cell types. Our method can model nucleosome translocations at regulatory genomic elements and generate configurations for simulations of the spatial folding of the nucleosome chain. AVAILABILITY Source code, precompiled binaries, test data and a web-based test installation are freely available at http://bioinformatics.fh-stralsund.de/nucpos/
Affiliation(s)
- Robert Schöpflin: Institute for Applied Computer Science, University of Applied Sciences Stralsund, Zur Schwedenschanze 15, Stralsund 18435, Germany and Deutsches Krebsforschungszentrum (DKFZ) & BioQuant, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
18
Woo S, Zhang X, Sauteraud R, Robert F, Gottardo R. PING 2.0: an R/Bioconductor package for nucleosome positioning using next-generation sequencing data. Bioinformatics 2013; 29:2049-50. [PMID: 23786769 DOI: 10.1093/bioinformatics/btt348] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 01/21/2023]
Abstract
SUMMARY MNase-Seq and ChIP-Seq have evolved as popular techniques to study chromatin and histone modification. Although many tools have been developed to identify enriched regions, software tools for nucleosome positioning are still limited. We introduce a flexible and powerful open-source R package, PING 2.0, for nucleosome positioning using MNase-Seq data or MNase- or sonicated- ChIP-Seq data combined with either single-end or paired-end sequencing. PING uses a model-based approach, which enables nucleosome predictions even in the presence of low read counts. We illustrate PING using two paired-end datasets from Saccharomyces cerevisiae and compare its performance with nucleR and ChIPseqR. AVAILABILITY PING 2.0 is available from the Bioconductor website at http://bioconductor.org. It can run on Linux, Mac and Windows.
Affiliation(s)
- Sangsoon Woo
  - Vaccine and Infectious Diseases and Public Health Sciences Divisions, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
19
Tennant BR, Robertson AG, Kramer M, Li L, Zhang X, Beach M, Thiessen N, Chiu R, Mungall K, Whiting CJ, Sabatini PV, Kim A, Gottardo R, Marra MA, Lynn FC, Jones SJM, Hoodless PA, Hoffman BG. Identification and analysis of murine pancreatic islet enhancers. Diabetologia 2013; 56:542-52. [PMID: 23238790 PMCID: PMC4773896 DOI: 10.1007/s00125-012-2797-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 09/06/2012] [Accepted: 11/20/2012] [Indexed: 01/05/2023]
Abstract
AIMS/HYPOTHESIS: The paucity of information on the epigenetic barriers that are blocking reprogramming protocols, and on what makes a beta cell unique, has hampered efforts to develop novel beta cell sources. Here, we aimed to identify enhancers in pancreatic islets, to understand their developmental ontologies, and to identify enhancers unique to islets to increase our understanding of islet-specific gene expression. METHODS: We combined H3K4me1-based nucleosome predictions with pancreatic and duodenal homeobox 1 (PDX1), neurogenic differentiation 1 (NEUROD1), v-Maf musculoaponeurotic fibrosarcoma oncogene family, protein A (MAFA) and forkhead box A2 (FOXA2) occupancy data to identify enhancers in mouse islets. RESULTS: We identified 22,223 putative enhancer loci in in vivo mouse islets. Our validation experiments suggest that nearly half of these loci are active in regulating islet gene expression, with the remaining regions probably poised for activity. We showed that these loci have at least nine developmental ontologies, and that islet enhancers predominately acquire H3K4me1 during differentiation. We next discriminated 1,799 enhancers unique to islets and showed that these islet-specific enhancers have reduced association with annotated genes, and identified a subset that are instead associated with novel islet-specific long non-coding RNAs (lncRNAs). CONCLUSIONS/INTERPRETATIONS: Our results indicate that genes with islet-specific expression and function tend to have enhancers devoid of histone methylation marks or, less often, that are bivalent or repressed, in embryonic stem cells and liver. Further, we identify a subset of enhancers unique to islets that are associated with novel islet-specific genes and lncRNAs. We anticipate that these data will facilitate the development of novel sources of functional beta cell mass.
Affiliation(s)
- B. R. Tennant
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
- A. G. Robertson
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- M. Kramer
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
- L. Li
  - Biostatistics Branch, National Institute of Environmental Health Sciences/NIH, Research Triangle Park, NC, USA
- X. Zhang
  - Department of Statistics, University of British Columbia, Vancouver, BC, Canada
- M. Beach
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
- N. Thiessen
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- R. Chiu
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- K. Mungall
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- C. J. Whiting
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
- P. V. Sabatini
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
- A. Kim
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
- R. Gottardo
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- M. A. Marra
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- F. C. Lynn
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
  - Department of Surgery, University of British Columbia, Vancouver, BC, Canada
- S. J. M. Jones
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Vancouver, BC, Canada
- P. A. Hoodless
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  - Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC, Canada
- B. G. Hoffman
  - Child and Family Research Institute, British Columbia Children’s Hospital and Sunny Hill Health Centre, Room A4-185, 950 W28th Avenue, Vancouver, BC, Canada V5Z 4H4
  - Department of Surgery, University of British Columbia, Vancouver, BC, Canada
20
Nellore A, Bobkov K, Howe E, Pankov A, Diaz A, Song JS. NSeq: a multithreaded Java application for finding positioned nucleosomes from sequencing data. Front Genet 2013; 3:320. [PMID: 23335939 PMCID: PMC3542818 DOI: 10.3389/fgene.2012.00320] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/13/2012] [Accepted: 12/21/2012] [Indexed: 01/27/2023] Open
Abstract
We introduce NSeq, a fast and efficient Java application for finding positioned nucleosomes from the high-throughput sequencing of MNase-digested mononucleosomal DNA. NSeq includes a user-friendly graphical interface, computes false discovery rates (FDRs) for candidate nucleosomes from Monte Carlo simulations, plots nucleosome coverage and centers, and exploits the availability of multiple processor cores by parallelizing its computations. Java binaries and source code are freely available at https://github.com/songlab/NSeq. The software is supported on all major platforms equipped with Java Runtime Environment 6 or later.
Affiliation(s)
- Abhinav Nellore
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
  - The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA