1. Kravatsky YV, Chechetkin VR, Tchurikov NA, Kravatskaya GI. Genome-Wide Study of Colocalization between Genomic Stretches: A Method and Applications to the Regulation of Gene Expression. BIOLOGY 2022; 11:1422. [PMID: 36290327 PMCID: PMC9598420 DOI: 10.3390/biology11101422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
In this paper, we describe a method for the study of colocalization effects between stretch-stretch and stretch-point genome tracks based on a set of indices varying within the (-1, +1) interval. The indices combine the distances between the centers of neighboring stretches and their lengths. The extreme boundaries of the interval correspond to the complete colocalization of the genome tracks or its complete absence. We also obtained the relevant criteria of statistical significance for such indices using the complete permutation test. The method is robust with respect to strongly inhomogeneous positioning and length distribution of the genome tracks. On the basis of this approach, we created command-line software, the Genome Track Colocalization Analyzer. The software was tested, compared with other available packages, and applied to particular problems related to gene expression. The package, Genome Track Colocalization Analyzer (GTCA), is freely available to the users. GTCA complements our previous software, the Genome Track Analyzer, intended for the search for pairwise correlations between point-like genome tracks (also freely available). The corresponding details are provided in Data Availability Statement at the end of the text.
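The abstract describes the indices only qualitatively, so the following is a rough illustration rather than the GTCA formula. It assumes a hypothetical per-stretch score that equals +1 when two stretch centers coincide, 0 when the stretches just touch, and -1 when they are far apart, and it estimates significance by Monte Carlo randomization with uniform placement, which is a simplification of the complete permutation test mentioned above. All function names, the scoring rule, and the null model are assumptions made for this sketch.

```python
import random


def stretch_index(a, b):
    """Hypothetical score in [-1, 1] for two stretches given as (start, end)."""
    center_a = (a[0] + a[1]) / 2
    center_b = (b[0] + b[1]) / 2
    distance = abs(center_a - center_b)
    # Center distance at which the two stretches just touch.
    half_span = ((a[1] - a[0]) + (b[1] - b[0])) / 2
    if half_span == 0:  # two point-like features
        return 1.0 if distance == 0 else -1.0
    return max(-1.0, 1.0 - distance / half_span)


def track_index(track_a, track_b):
    """Mean, over stretches in A, of the best score against any stretch in B."""
    best = [max(stretch_index(a, b) for b in track_b) for a in track_a]
    return sum(best) / len(best)


def randomization_p_value(track_a, track_b, genome_length, n_shuffles=999, seed=0):
    """One-sided Monte Carlo p-value: B is re-placed uniformly, lengths kept."""
    rng = random.Random(seed)
    observed = track_index(track_a, track_b)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = []
        for start, end in track_b:
            length = end - start
            new_start = rng.randint(0, genome_length - length)
            shuffled.append((new_start, new_start + length))
        if track_index(track_a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_shuffles + 1)
```

Calling `randomization_p_value(track_a, track_b, genome_length)` returns the observed mean index together with its estimated p-value. A uniform null ignores the strongly inhomogeneous positioning that the paper says its method is robust to, so this sketch should not be read as a substitute for the published test.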
Affiliation(s)
- Yuri V. Kravatsky
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Vladimir R. Chechetkin
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Nickolai A. Tchurikov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Galina I. Kravatskaya
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
2. Bürger A, Dugas M. Cogito: automated and generic comparison of annotated genomic intervals. BMC Bioinformatics 2022; 23:315. [PMID: 35927614 PMCID: PMC9351259 DOI: 10.1186/s12859-022-04853-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/23/2022] [Indexed: 11/27/2022]
Abstract
Background Genetic and epigenetic biological studies often combine different types of experiments and multiple conditions. While the corresponding raw and processed data are made available through specialized public databases, the processed files are usually limited to a specific research question. Hence, they are unsuitable for an unbiased, systematic overview of a complex dataset. However, possible combinations of different sample types and conditions grow exponentially with the amount of sample types and conditions. Therefore the risk to miss a correlation or to overrate an identified correlation should be mitigated in a complex dataset. Since reanalysis of a full study is rarely a viable option, new methods are needed to address these issues systematically, reliably, reproducibly and efficiently. Results Cogito “COmpare annotated Genomic Intervals TOol” provides a workflow for an unbiased, structured overview and systematic analysis of complex genomic datasets consisting of different data types (e.g. RNA-seq, ChIP-seq) and conditions. Cogito is able to visualize valuable key information of genomic or epigenomic interval-based data, thereby providing a straightforward analysis approach for comparing different conditions. It supports getting an unbiased impression of a dataset and developing an appropriate analysis strategy for it. In addition to a text-based report, Cogito offers a fully customizable report as a starting point for further in-depth investigation. Conclusions Cogito implements a novel approach to facilitate high-level overview analyses of complex datasets, and offers additional insights into the data without the need for a full, time-consuming reanalysis. The R/Bioconductor package is freely available at https://bioconductor.org/packages/release/bioc/html/Cogito.html, a comprehensive documentation with detailed descriptions and reproducible examples is included. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04853-1.
Affiliation(s)
- Annika Bürger
- Institute of Medical Informatics, Westfälische Wilhelms-Universität Münster, Albert-Schweitzer-Campus 1, 48149, Münster, Germany.
- Martin Dugas
- Institute of Medical Informatics, Heidelberg University Hospital, Seminarstr. 2, 69117, Heidelberg, Germany
3. Tchurikov NA, Alembekov IR, Klushevskaya ES, Kretova AN, Keremet AM, Sidorova AE, Meilakh PB, Chechetkin VR, Kravatskaya GI, Kravatsky YV. Genes Possessing the Most Frequent DNA DSBs Are Highly Associated with Development and Cancers, and Essentially Overlap with the rDNA-Contacting Genes. Int J Mol Sci 2022; 23:ijms23137201. [PMID: 35806206 PMCID: PMC9266645 DOI: 10.3390/ijms23137201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/12/2022] [Revised: 06/15/2022] [Accepted: 06/27/2022] [Indexed: 12/13/2022]
Abstract
Double-strand DNA breaks (DSBs) are the most deleterious and widespread examples of DNA damage. They inevitably originate from endogenous mechanisms in the course of transcription, replication, and recombination, as well as from different exogenous factors. If not properly repaired, DSBs result in cell death or diseases. Genome-wide analysis of DSBs has revealed numerous endogenous DSBs in human chromosomes. However, until now, it has not been clear what kind of genes are preferentially subjected to breakage. We performed a genetic and epigenetic analysis of the most frequent DSBs in HEK293T cells. Here, we show that they predominantly occur in the active genes controlling differentiation, development, and morphogenesis. These genes are highly associated with cancers and other diseases. About one-third of the genes possessing frequent DSBs correspond to rDNA-contacting genes. Our data suggest that a specific set of active genes controlling morphogenesis are the main targets of DNA breakage in human cells, although there is a specific set of silent genes controlling metabolism that also are enriched in DSBs. We detected this enrichment by different activators and repressors of transcription at DSB target sites, as well as breakage at promoters. We propose that both active transcription and silencing of genes give a propensity for DNA breakage. These results have implications for medicine and gene therapy.
Affiliation(s)
- Nickolai A. Tchurikov
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Ildar R. Alembekov
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Elena S. Klushevskaya
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Antonina N. Kretova
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Ann M. Keremet
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Anastasia E. Sidorova
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Polina B. Meilakh
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Vladimir R. Chechetkin
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Galina I. Kravatskaya
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Yuri V. Kravatsky
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
4. Tchurikov NA, Klushevskaya ES, Alembekov IR, Bukreeva AS, Kretova AN, Chechetkin VR, Kravatskaya GI, Kravatsky YV. Fragments of rDNA Genes Scattered over the Human Genome Are Targets of Small RNAs. Int J Mol Sci 2022; 23:ijms23063014. [PMID: 35328433 PMCID: PMC8954558 DOI: 10.3390/ijms23063014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/03/2022] [Revised: 03/08/2022] [Accepted: 03/09/2022] [Indexed: 02/06/2023]
Abstract
Small noncoding RNAs of different origins and classes play several roles in the regulation of gene expression. Here, we show that diverged and rearranged fragments of rDNA units are scattered throughout the human genome and that endogenous small noncoding RNAs are processed by the Microprocessor complex from specific regions of ribosomal RNAs shaping hairpins. These small RNAs correspond to particular sites inside the fragments of rDNA that mostly reside in intergenic regions or the introns of about 1500 genes. The targets of these small ribosomal RNAs (srRNAs) are characterized by a set of epigenetic marks, binding sites of Pol II, RAD21, CBP, and P300, DNase I hypersensitive sites, and by enrichment or depletion of active histone marks. In HEK293T cells, genes that are targeted by srRNAs (srRNA target genes) are involved in differentiation and development. srRNA target genes are enriched with more actively transcribed genes. Our data suggest that remnants of rDNA sequences and srRNAs may be involved in the upregulation or downregulation of a specific set of genes in human cells. These results have implications for diverse fields, including epigenetics and gene therapy.
Affiliation(s)
- Nickolai A. Tchurikov
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Elena S. Klushevskaya
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Ildar R. Alembekov
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Anastasiia S. Bukreeva
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Antonina N. Kretova
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Vladimir R. Chechetkin
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Galina I. Kravatskaya
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Yuri V. Kravatsky
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology Russian Academy of Sciences, 119334 Moscow, Russia
5. Busa VF, Favorov AV, Fertig EJ, Leung AK. Spatial correlation statistics enable transcriptome-wide characterization of RNA structure binding. CELL REPORTS METHODS 2021; 1:100088. [PMID: 35474897 PMCID: PMC9017189 DOI: 10.1016/j.crmeth.2021.100088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/15/2020] [Revised: 06/23/2021] [Accepted: 08/30/2021] [Indexed: 11/20/2022]
Abstract
Molecular interactions at identical transcriptomic locations or at proximal but non-overlapping sites can mediate RNA modification and regulation, necessitating tools to uncover these spatial relationships. We present nearBynding, a flexible algorithm and software pipeline that models spatial correlation between transcriptome-wide tracks from diverse data types. nearBynding can process and correlate interval as well as continuous data and incorporate experimentally derived or in silico predicted transcriptomic tracks. nearBynding offers visualization functions for its statistics to identify colocalizations and adjacent features. We demonstrate the application of nearBynding to correlate RNA-binding protein (RBP) binding preferences with other RBPs, RNA structure, or RNA modification. By cross-correlating RBP binding and RNA structure data, we demonstrate that nearBynding recapitulates known RBP binding to structural motifs and provides biological insights into RBP binding preference of G-quadruplexes. nearBynding is available as an R/Bioconductor package and can run on a personal computer, making correlation of transcriptomic features broadly accessible.
Affiliation(s)
- Veronica F. Busa
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Alexander V. Favorov
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Elana J. Fertig
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University Whiting School of Engineering, Baltimore, MD 21205, USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD 21205, USA
- Anthony K.L. Leung
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
6. Non-canonical DNA/RNA structures during Transcription-Coupled Double-Strand Break Repair: Roadblocks or Bona fide repair intermediates? DNA Repair (Amst) 2019; 81:102661. [PMID: 31331819 DOI: 10.1016/j.dnarep.2019.102661] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Indexed: 01/12/2023]
Abstract
Although long overlooked, it is now well understood that DNA does not systematically assemble into a canonical double helix, known as B-DNA, throughout the entire genome but can also accommodate other structures including DNA hairpins, G-quadruplexes and RNA:DNA hybrids. Notably, these non-canonical DNA structures form preferentially at transcriptionally active loci. Acting as replication roadblocks and being targeted by multiple machineries, these structures weaken the genome and render it prone to damage, including DNA double-strand breaks (DSB). In addition, secondary structures also further accumulate upon DSB formation. Here we discuss the potential functions of pre-existing or de novo formed nucleic acid structures, as bona fide repair intermediates or repair roadblocks, especially during Transcription-Coupled DNA Double-Strand Break repair (TC-DSBR), and provide an update on the specialized protein complexes displaying the ability to remove these structures to safeguard genome integrity.
7. Stavrovskaya ED, Niranjan T, Fertig EJ, Wheelan SJ, Favorov AV, Mironov AA. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data. Bioinformatics 2018; 33:3158-3165. [PMID: 29028265 DOI: 10.1093/bioinformatics/btx379] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/22/2016] [Accepted: 06/12/2017] [Indexed: 12/13/2022]
Abstract
Motivation Genomics features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Results Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. Availability and implementation The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact favorov@sensi.org. Supplementary information Supplementary data are available at Bioinformatics online.
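To make the idea of correlating neighboring positions concrete, here is a minimal sketch of one way a kernel can enter a correlation between two binned tracks. It is not StereoGene's implementation: the Gaussian kernel, the choice to smooth only one track before taking a Pearson correlation, and every function name and default below are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius (in bins)."""
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, kernel):
    """Convolve a binned track with the kernel, treating out-of-range bins as 0."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + j - radius
            if 0 <= pos < len(signal):
                acc += w * signal[pos]
        out.append(acc)
    return out


def pearson(x, y):
    """Plain Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kernel_correlation(f, g, sigma=2.0, radius=6):
    """Correlate track f with a kernel-smoothed copy of track g."""
    return pearson(f, smooth(g, gaussian_kernel(sigma, radius)))
```

With a single peak in `f` and a peak two bins away in `g`, the position-by-position Pearson correlation is negative, whereas `kernel_correlation` is positive because the kernel spreads the second peak over its neighborhood. This is the sense in which a kernel lets nearby but non-overlapping features count as correlated.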
Affiliation(s)
- Elena D Stavrovskaya
- Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia; Institute for Information Transmission Problems, RAS, Moscow 127994, Russia
- Tejasvi Niranjan
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Elana J Fertig
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Sarah J Wheelan
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Alexander V Favorov
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, RAS, Moscow 119333, Russia; Laboratory of Bioinformatics, Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow 117545, Russia
- Andrey A Mironov
- Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia; Institute for Information Transmission Problems, RAS, Moscow 127994, Russia
8. Dozmorov MG. Epigenomic annotation-based interpretation of genomic data: from enrichment analysis to machine learning. Bioinformatics 2018; 33:3323-3330. [PMID: 29028263 DOI: 10.1093/bioinformatics/btx414] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 01/05/2017] [Accepted: 06/22/2017] [Indexed: 12/12/2022]
Abstract
Motivation One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. Results The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. Contact mikhail.dozmorov@vcuhealth.org. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
9. Beckwith SL, Schwartz EK, García-Nieto PE, King DA, Gowans GJ, Wong KM, Eckley TL, Paraschuk AP, Peltan EL, Lee LR, Yao W, Morrison AJ. The INO80 chromatin remodeler sustains metabolic stability by promoting TOR signaling and regulating histone acetylation. PLoS Genet 2018; 14:e1007216. [PMID: 29462149 PMCID: PMC5834206 DOI: 10.1371/journal.pgen.1007216] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 12/17/2017] [Revised: 03/02/2018] [Accepted: 01/23/2018] [Indexed: 12/16/2022]
Abstract
Chromatin remodeling complexes are essential for gene expression programs that coordinate cell function with metabolic status. However, how these remodelers are integrated in metabolic stability pathways is not well known. Here, we report an expansive genetic screen with chromatin remodelers and metabolic regulators in Saccharomyces cerevisiae. We found that, unlike the SWR1 remodeler, the INO80 chromatin remodeling complex is composed of multiple distinct functional subunit modules. We identified a strikingly divergent genetic signature for the Ies6 subunit module that links the INO80 complex to metabolic homeostasis. In particular, mitochondrial maintenance is disrupted in ies6 mutants. INO80 is also needed to communicate TORC1-mediated signaling to chromatin, as ino80 mutants exhibit defective transcriptional profiles and altered histone acetylation of TORC1-responsive genes. Furthermore, comparative analysis reveals subunits of INO80 and mTORC1 have high co-occurrence of alterations in human cancers. Collectively, these results demonstrate that the INO80 complex is a central component of metabolic homeostasis that influences histone acetylation and may contribute to disease when disrupted.
Affiliation(s)
- Sean L. Beckwith
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Erin K. Schwartz
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Devin A. King
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Graeme J. Gowans
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Ka Man Wong
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Tessa L. Eckley
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Egan L. Peltan
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Laura R. Lee
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Wei Yao
- Department of Biology, Stanford University, Stanford, CA, United States of America
- Ashby J. Morrison
- Department of Biology, Stanford University, Stanford, CA, United States of America
10. Chechetkin VR, Lobzin VV. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique. J Theor Biol 2017; 426:162-179. [PMID: 28552553 DOI: 10.1016/j.jtbi.2017.05.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/08/2017] [Revised: 04/23/2017] [Accepted: 05/23/2017] [Indexed: 12/15/2022]
Abstract
The use of state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on chromosome folding and the large-scale genome organization. Could a part of the information on chromosome folding be obtained directly from the underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to replication and transcription and can also be used for the assessment of temperature or supercoiling effects on chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, a study of large-scale genome organization for bacteriophage PHIX174 and the bacterium Caulobacter crescentus was also added. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge on the chromosome architecture and genome organization.
Affiliation(s)
- V R Chechetkin
- Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Vavilov str., 32, Moscow 119334, Russia; Theoretical Department of Division for Perspective Investigations, Troitsk Institute of Innovation and Thermonuclear Investigations (TRINITI), Moscow, Troitsk District 108840, Russia.
- V V Lobzin
- School of Physics, University of Sydney, Sydney, NSW 2006, Australia.
11. Kudrin RA, Mironov AA, Stavrovskaya ED. Chromatin and Polycomb: Biology and bioinformatics. Mol Biol 2017. [DOI: 10.1134/s0026893316060121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
12. Pope BJ, Mahmood K, Jung CH, Georgeson P, Park DJ. Single nucleotide-level mapping of DNA double-strand breaks in human HEK293T cells. GENOMICS DATA 2016; 11:43-45. [PMID: 27942458 PMCID: PMC5133665 DOI: 10.1016/j.gdata.2016.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2016] [Revised: 11/07/2016] [Accepted: 11/09/2016] [Indexed: 11/29/2022]
Abstract
Constitutional biological processes involve the generation of DNA double-strand breaks (DSBs). The production of such breaks and their subsequent resolution are also highly relevant to neurodegenerative diseases and cancer, in which extensive DNA fragmentation has been described (Stephens et al., 2011; Blondet et al., 2001). Tchurikov et al. (2011, 2013) have reported previously that frequent sites of DSBs occur in chromosomal domains involved in the co-ordinated expression of genes. This group reports that hot spots of DSBs in human HEK293T cells often coincide with H3K4me3 marks, associated with active transcription (Kravatsky et al., 2015), and that frequent sites of DNA double-strand breakage are likely to be relevant to cancer genomics (Tchurikov et al., 2013, 2016). Recently, they applied a RAFT (rapid amplification of forum termini) protocol that selects for blunt-ended DSB sites and mapped these to the human genome within defined co-ordinate ‘windows’. In this paper, we re-analyse public RAFT data to derive sites of DSBs at the single-nucleotide level across the built genome for human HEK293T cells (https://figshare.com/s/35220b2b79eaaaf64ed8). This refined mapping, combined with accessory ENCODE data tracks and ribosomal DNA-related sequence annotations, will likely be of value for the design of clinically relevant targeted assays such as those for cancer susceptibility, diagnosis, treatment-matching and prognostication.
Affiliation(s)
- Bernard J Pope
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Khalid Mahmood
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Chol-Hee Jung
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Peter Georgeson
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Daniel J Park
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia; Genomic Technologies Group, Genetic Epidemiology Laboratory, Department of Pathology, The University of Melbourne, Australia
|
13
|
Sheffield NC, Bock C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics 2015; 32:587-9. [PMID: 26508757 PMCID: PMC4743627 DOI: 10.1093/bioinformatics/btv612] [Citation(s) in RCA: 246] [Impact Index Per Article: 27.3] [Received: 05/24/2015] [Accepted: 10/16/2015] [Indexed: 12/31/2022]
Abstract
Summary: Genomic datasets are often interpreted in the context of large-scale reference databases. One approach is to identify significantly overlapping gene sets, which works well for gene-centric data. However, many types of high-throughput data are based on genomic regions. Locus Overlap Analysis (LOLA) provides easy and automatable enrichment analysis for genomic region sets, thus facilitating the interpretation of functional genomics and epigenomics data. Availability and Implementation: R package available in Bioconductor and on the following website: http://lola.computational-epigenetics.org. Contact: nsheffield@cemm.oeaw.ac.at or cbock@cemm.oeaw.ac.at
Affiliation(s)
- Nathan C Sheffield
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences
- Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria and Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
|
14
|
Genome-wide mapping of hot spots of DNA double-strand breaks in human cells as a tool for epigenetic studies and cancer genomics. GENOMICS DATA 2015; 5:89-93. [PMID: 26484232 PMCID: PMC4583641 DOI: 10.1016/j.gdata.2015.05.018] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/15/2015] [Accepted: 05/24/2015] [Indexed: 11/23/2022]
Abstract
Hot spots of DNA double-strand breaks (DSBs) are associated with coordinated expression of genes in chromosomal domains (Tchurikov et al., 2011; 2013). These 50–150-kb DNA domains (denoted “forum domains”) can be visualized by separation of undigested chromosomal DNA in pulsed-field agarose gels (Tchurikov et al., 1988; 1992) and used for genome-wide mapping of the DSBs that produce them. Recently, we described nine hot spots of DSBs in human rDNA genes and observed that, in rDNA units, the hot spots coincide with CTCF binding sites and H3K4me3 marks (Tchurikov et al., 2014), suggesting a role for DSBs in active transcription. Here we have used Illumina sequencing to map DSBs in chromosomes of human HEK293T cells, and describe in detail the experimental design and bioinformatics analysis of the data deposited in the Gene Expression Omnibus with accession number GSE53811 and associated with the study published in DNA Research (Kravatsky et al., 2015). Our data indicate that H3K4me3 marks often coincide with hot spots of DSBs in HEK293T cells and that the mapping of these hot spots is important for cancer genomic studies.
|