1. Jalili V, Cremona MA, Palluzzi F. Rescuing biologically relevant consensus regions across replicated samples. BMC Bioinformatics 2023; 24:240. [PMID: 37286963 PMCID: PMC10246347 DOI: 10.1186/s12859-023-05340-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 05/16/2023] [Indexed: 06/09/2023] Open
Abstract
BACKGROUND Protein-DNA binding sites of ChIP-seq experiments are identified where the binding affinity is significant based on a given threshold. The choice of the threshold is a trade-off between conservative region identification and discarding weak, but true binding sites. RESULTS We rescue weak binding sites using MSPC, which efficiently exploits replicates to lower the threshold required to identify a site while keeping a low false-positive rate, and we compare it to IDR, a widely used post-processing method for identifying highly reproducible peaks across replicates. We observe several master transcription regulators (e.g., SP1 and GATA3) and HDAC2-GATA1 regulatory networks on rescued regions in the K562 cell line. CONCLUSIONS We argue the biological relevance of weak binding sites and the information they add when rescued by MSPC. An implementation of the proposed extended MSPC methodology and the scripts to reproduce the performed analysis are freely available at https://genometric.github.io/MSPC/; MSPC is distributed as a command-line application and an R package available from Bioconductor (https://doi.org/doi:10.18129/B9.bioc.rmspc).
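Illustrative note: the abstract does not spell out how evidence from replicates is combined. As a rough sketch only, the snippet below uses Fisher's combined probability test, one standard way to pool independent p-values, to show how a peak that is weak in each replicate can still be strongly supported jointly. The function names and both thresholds are hypothetical and are not taken from MSPC.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Pool independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For an even
    number of degrees of freedom the survival function has a closed form,
    so no external library is needed.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    # sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def is_rescued(pvalues, weak_threshold, combined_threshold):
    """Hypothetical rule: every replicate passes a permissive threshold
    and the pooled evidence passes a stringent one."""
    if any(p > weak_threshold for p in pvalues):
        return False
    return fisher_combined_pvalue(pvalues) <= combined_threshold


if __name__ == "__main__":
    # Two replicates, each too weak for a stringent 1e-8 cut-off on its own.
    replicate_pvalues = [1e-5, 1e-5]
    print(fisher_combined_pvalue(replicate_pvalues))
    print(is_rescued(replicate_pvalues, weak_threshold=1e-4, combined_threshold=1e-8))
```

The design point is that the per-replicate threshold can be permissive because the pooled test, not the individual one, controls how much total evidence is required.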
Affiliation(s)
- Vahid Jalili
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Marzia A Cremona
  - Department of Operations and Decision Systems, Université Laval, Quebec, Canada.
  - CHU de Québec - Université Laval Research Center, Quebec, Canada.
- Fernando Palluzzi
  - Department of Brain and Behavioral Sciences, Università di Pavia, Pavia, Italy.
2. Karimzadeh M, Hoffman MM. Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome. Genome Biol 2022; 23:126. [PMID: 35681170 PMCID: PMC9185870 DOI: 10.1186/s13059-022-02690-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/04/2022] [Accepted: 05/16/2022] [Indexed: 11/29/2022] Open
Abstract
Existing methods for computational prediction of transcription factor (TF) binding sites evaluate genomic regions with similarity to known TF sequence preferences. Most TF binding sites, however, do not resemble known TF sequence motifs, and many TFs are not sequence-specific. We developed Virtual ChIP-seq, which predicts binding of individual TFs in new cell types, integrating learned associations with gene expression and binding, TF binding sites from other cell types, and chromatin accessibility data in the new cell type. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 TFs (MCC>0.3).
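Illustrative note: the MCC threshold quoted above refers to the Matthews correlation coefficient, which is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the counts in the example are invented for illustration and do not come from the paper.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; 0 is returned when any marginal total is
    zero, a common convention for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Made-up, heavily imbalanced example: 100 bound regions among 10,000.
    # Accuracy is 98.6%, yet the MCC stays just below 0.3.
    print(matthews_corrcoef(tp=30, fp=70, tn=9830, fn=70))
```

Because bound regions are rare relative to the genome, a measure such as MCC that accounts for all four cells is more informative here than raw accuracy.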
Affiliation(s)
- Mehran Karimzadeh
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
- Michael M Hoffman
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
3. Lombardo SD, Wangsaputra IF, Menche J, Stevens A. Network Approaches for Charting the Transcriptomic and Epigenetic Landscape of the Developmental Origins of Health and Disease. Genes (Basel) 2022; 13:764. [PMID: 35627149 PMCID: PMC9141211 DOI: 10.3390/genes13050764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 04/04/2022] [Accepted: 04/13/2022] [Indexed: 02/04/2023] Open
Abstract
The early developmental phase is of critical importance for human health and disease later in life. To decipher the molecular mechanisms at play, current biomedical research is increasingly relying on large quantities of diverse omics data. The integration and interpretation of the different datasets pose a critical challenge towards the holistic understanding of the complex biological processes that are involved in early development. In this review, we outline the major transcriptomic and epigenetic processes and the respective datasets that are most relevant for studying the periconceptional period. We cover both basic data processing and analysis steps, as well as more advanced data integration methods. A particular focus is given to network-based methods. Finally, we review the medical applications of such integrative analyses.
4. Yellajoshyula D, Rogers AE, Kim AJ, Kim S, Pappas SS, Dauer WT. A pathogenic DYT-THAP1 dystonia mutation causes hypomyelination and loss of YY1 binding. Hum Mol Genet 2022; 31:1096-1104. [PMID: 34686877 PMCID: PMC8976427 DOI: 10.1093/hmg/ddab310] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/04/2021] [Revised: 09/27/2021] [Accepted: 10/19/2021] [Indexed: 12/24/2022] Open
Abstract
Dystonia is a disabling disease that manifests as prolonged involuntary twisting movements. DYT-THAP1 is an inherited form of isolated dystonia caused by mutations in THAP1 encoding the transcription factor THAP1. The phe81leu (F81L) missense mutation is representative of a category of poorly understood mutations that do not occur on residues critical for DNA binding. Here, we demonstrate that the F81L mutation (THAP1F81L) impairs THAP1 transcriptional activity and disrupts CNS myelination. Strikingly, THAP1F81L exhibits normal DNA binding but causes a significantly reduced DNA binding of YY1, its transcriptional partner that also has an established role in oligodendrocyte lineage progression. Our results suggest a model of molecular pathogenesis whereby THAP1F81L normally binds DNA but is unable to efficiently organize an active transcription complex.
Affiliation(s)
- Abigail E Rogers
  - Molecular Cellular and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Audrey J Kim
  - Peter O’Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Sumin Kim
  - Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
  - Cellular and Molecular Biology Graduate Program, University of Michigan, Ann Arbor, MI 48109, USA
- Samuel S Pappas
  - Peter O’Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of Neurology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- William T Dauer
  - Peter O’Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of Neurology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
5. Decker KT, Gao Y, Rychel K, Al Bulushi T, Chauhan S, Kim D, Cho BK, Palsson B. proChIPdb: a chromatin immunoprecipitation database for prokaryotic organisms. Nucleic Acids Res 2022; 50:D1077-D1084. [PMID: 34791440 PMCID: PMC8728212 DOI: 10.1093/nar/gkab1043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/14/2021] [Revised: 10/05/2021] [Accepted: 10/14/2021] [Indexed: 12/03/2022] Open
Abstract
The transcriptional regulatory network in prokaryotes controls global gene expression mostly through transcription factors (TFs), which are DNA-binding proteins. Chromatin immunoprecipitation (ChIP) with DNA sequencing methods can identify TF binding sites across the genome, providing a bottom-up, mechanistic understanding of how gene expression is regulated. ChIP provides indispensable evidence toward the goal of acquiring a comprehensive understanding of cellular adaptation and regulation, including condition-specificity. ChIP-derived data's importance and labor-intensiveness motivate its broad dissemination and reuse, which is currently an unmet need in the prokaryotic domain. To fill this gap, we present proChIPdb (prochipdb.org), an information-rich, interactive web database. This website collects public ChIP-seq/-exo data across several prokaryotes and presents them in dashboards that include curated binding sites, nucleotide-resolution genome viewers, and summary plots such as motif enrichment sequence logos. Users can search for TFs of interest or their target genes, download all data, dashboards, and visuals, and follow external links to understand regulons through biological databases and the literature. This initial release of proChIPdb covers diverse organisms, including most major TFs of Escherichia coli, and can be expanded to support regulon discovery across the prokaryotic domain.
Affiliation(s)
- Katherine T Decker
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Ye Gao
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Kevin Rychel
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Tahani Al Bulushi
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Siddharth M Chauhan
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
- Byung-Kwan Cho
  - Department of Biological Sciences and KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Bernhard O Palsson
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
6. Lourenco C, Resetca D, Redel C, Lin P, MacDonald AS, Ciaccio R, Kenney TMG, Wei Y, Andrews DW, Sunnerhagen M, Arrowsmith CH, Raught B, Penn LZ. MYC protein interactors in gene transcription and cancer. Nat Rev Cancer 2021; 21:579-591. [PMID: 34188192 DOI: 10.1038/s41568-021-00367-9] [Citation(s) in RCA: 117] [Impact Index Per Article: 39.0] [Accepted: 05/04/2021] [Indexed: 02/07/2023]
Abstract
The transcription factor and oncoprotein MYC is a potent driver of many human cancers and can regulate numerous biological activities that contribute to tumorigenesis. How a single transcription factor can regulate such a diverse set of biological programmes is central to the understanding of MYC function in cancer. In this Perspective, we highlight how multiple proteins that interact with MYC enable MYC to regulate several central control points of gene transcription. These include promoter binding, epigenetic modifications, initiation, elongation and post-transcriptional processes. Evidence shows that a combination of multiple protein interactions enables MYC to function as a potent oncoprotein, working together in a 'coalition model', as presented here. Moreover, as MYC depends on its protein interactome for function, we discuss recent research that emphasizes an unprecedented opportunity to target protein interactors to directly impede MYC oncogenesis.
Affiliation(s)
- Diana Resetca
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Cornelia Redel
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Peter Lin
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Alannah S MacDonald
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Roberto Ciaccio
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Tristan M G Kenney
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Yong Wei
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Biological Sciences, Sunnybrook Research Institute, Toronto, ON, Canada
- David W Andrews
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Biological Sciences, Sunnybrook Research Institute, Toronto, ON, Canada
- Maria Sunnerhagen
  - Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
- Cheryl H Arrowsmith
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Structural Genomics Consortium, Toronto, ON, Canada
- Brian Raught
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Linda Z Penn
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
7. Chiliński M, Sengupta K, Plewczynski D. From DNA human sequence to the chromatin higher order organisation and its biological meaning: Using biomolecular interaction networks to understand the influence of structural variation on spatial genome organisation and its functional effect. Semin Cell Dev Biol 2021; 121:171-185. [PMID: 34429265 DOI: 10.1016/j.semcdb.2021.08.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/19/2021] [Revised: 08/06/2021] [Accepted: 08/12/2021] [Indexed: 12/30/2022]
Abstract
The three-dimensional structure of the human genome has been proven to have a significant functional impact on gene expression. The high-order spatial chromatin is organised first by looping mediated by multiple protein factors, and then it is further formed into larger structures of topologically associated domains (TADs) or chromatin contact domains (CCDs), followed by A/B compartments and finally the chromosomal territories (CTs). The genetic variation observed in human population influences the multi-scale structures, posing a question regarding the functional impact of structural variants reflected by the variability of the genes expression patterns. The current methods of evaluating the functional effect include eQTLs analysis which uses statistical testing of influence of variants on spatially close genes. Rarely, non-coding DNA sequence changes are evaluated by their impact on the biomolecular interaction network (BIN) reflecting the cellular interactome that can be analysed by the classical graph-theoretic algorithms. Therefore, in the second part of the review, we introduce the concept of BIN, i.e. a meta-network model of the complete molecular interactome developed by integrating various biological networks. The BIN meta-network model includes DNA-protein binding by the plethora of protein factors as well as chromatin interactions, therefore allowing connection of genomics with the downstream biomolecular processes present in a cell. As an illustration, we scrutinise the chromatin interactions mediated by the CTCF protein detected in a ChIA-PET experiment in the human lymphoblastoid cell line GM12878. In the corresponding BIN meta-network the DNA spatial proximity is represented as a graph model, combined with the Proteins-Interaction Network (PIN) of human proteome using the Gene Association Network (GAN). Furthermore, we enriched the BIN with the signalling and metabolic pathways and Gene Ontology (GO) terms to assert its functional context. 
Finally, we mapped the Single Nucleotide Polymorphisms (SNPs) from the GWAS studies and identified the chromatin mutational hot-spots associated with a significant enrichment of SNPs related to autoimmune diseases. Afterwards, we mapped Structural Variants (SVs) from healthy individuals of 1000 Genomes Project and identified an interesting example of the missing protein complex associated with protein Q6GYQ0 due to a deletion on chromosome 14. Such an analysis using the meta-network BIN model is therefore helpful in evaluating the influence of genetic variation on spatial organisation of the genome and its functional effect in a cell.
Affiliation(s)
- Mateusz Chiliński
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Kaustav Sengupta
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
8. Fischer J, Ardakani FB, Kattler K, Walter J, Schulz MH. CpG content-dependent associations between transcription factors and histone modifications. PLoS One 2021; 16:e0249985. [PMID: 33857234 PMCID: PMC8049299 DOI: 10.1371/journal.pone.0249985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2021] [Accepted: 03/30/2021] [Indexed: 11/18/2022] Open
Abstract
Understanding the factors that underlie the epigenetic regulation of genes is crucial to understand the gene regulatory machinery as a whole. Several experimental and computational studies examined the relationship between different factors involved. Here we investigate the relationship between transcription factors (TFs) and histone modifications (HMs), based on ChIP-seq data in cell lines. As it was shown that gene regulation by TFs differs depending on the CpG class of a promoter, we study the impact of the CpG content in promoters on the associations between TFs and HMs. We suggest an approach based on sparse linear regression models to infer associations between TFs and HMs with respect to CpG content. A study of the partial correlation of HMs for the two classes of high and low CpG content reveals possible CpG dependence and potential candidates for confounding factors in our models. We show that the models are accurate, inferred associations reflect known biological relationships, and we give new insight into associations with respect to CpG content. Moreover, analysis of a ChIP-seq dataset in HepG2 cells of the HM H3K122ac, an HM about which little is known, reveals novel TF associations and supports a previously established link to active transcription.
Affiliation(s)
- Jonas Fischer
  - Max Planck Institute for Informatics, Databases and Information Systems, Saarbrücken, Germany
  - Cluster of Excellence for Multimodal Computing and Interaction, High Throughput Genomics and Systems Biology, Saarbrücken, Germany
- Fatemeh Behjati Ardakani
  - Max Planck Institute for Informatics, Computational Biology and Applied Algorithmics, Saarbrücken, Germany
  - Cluster of Excellence for Multimodal Computing and Interaction, High Throughput Genomics and Systems Biology, Saarbrücken, Germany
  - Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
- Kathrin Kattler
  - Department of Genetics, University of Saarland, Saarbrücken, Germany
- Jörn Walter
  - Department of Genetics, University of Saarland, Saarbrücken, Germany
- Marcel H. Schulz
  - Max Planck Institute for Informatics, Computational Biology and Applied Algorithmics, Saarbrücken, Germany
  - Cluster of Excellence for Multimodal Computing and Interaction, High Throughput Genomics and Systems Biology, Saarbrücken, Germany
  - Institute of Cardiovascular Regeneration, Goethe University, Frankfurt, Germany
9. Hernández HG, Hernández-Castañeda AA, Pieschacón MP, Arboleda H. ZNF718, HOXA4, and ZFP57 are differentially methylated in periodontitis in comparison with periodontal health: Epigenome-wide DNA methylation pilot study. J Periodontal Res 2021; 56:710-725. [PMID: 33660869 DOI: 10.1111/jre.12868] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/30/2020] [Revised: 02/07/2021] [Accepted: 02/14/2021] [Indexed: 12/21/2022]
Abstract
OBJECTIVE To investigate the differences in the epigenomic patterns of DNA methylation in peripheral leukocytes between patients with periodontitis and gingivally healthy controls evaluating its functional meaning by functional enrichment analysis. BACKGROUND The DNA methylation profiling of peripheral leukocytes as immune-related tissue potentially relevant as a source of biomarkers between periodontitis patients and gingivally healthy subjects has not been investigated. METHODS A DNA methylation epigenome-wide study of peripheral leukocytes was conducted using the Illumina MethylationEPIC platform in sixteen subjects, eight diagnosed with periodontitis patients and eight age-matched and sex-matched periodontally healthy controls. A trained periodontist performed the clinical evaluation. Global DNA methylation was estimated using methylation-sensitive high-resolution melting in LINE1. Routine cell count cytometry and metabolic laboratory tests were also performed. The analysis of differentially methylated positions (DMPs) and differentially methylated regions (DMRs) was made using R/Bioconductor environment considering leukocyte populations assessed in both routine cell counts and using the FlowSorted.Blood.EPIC package. Finally, a DMP and DMR intersection analysis was performed. Functional enrichment analysis was carried out with the differentially methylated genes found in DMP. RESULTS DMP analysis identified 81 differentially hypermethylated genes and 21 differentially hypomethylated genes. Importantly, the intersection analysis showed that zinc finger protein 718 (ZNF718) and homeobox A4 (HOXA4) were differentially hypermethylated and zinc finger protein 57 (ZFP57) was differentially hypomethylated in periodontitis. The functional enrichment analysis found clearly immune-related ontologies such as "detection of bacterium" and "antigen processing and presentation." 
CONCLUSION The results of this study propose three new periodontitis-related genes: ZNF718, HOXA4, and ZFP57 but also evidence the suitability and relevance of studying leukocytes' DNA methylome for biological interpretation of systemic immune-related epigenetic patterns in periodontitis.
Affiliation(s)
- Hernán G Hernández
  - Faculty of Dentistry, Division of Health Sciences, Universidad Santo Tomás, Bucaramanga, Colombia
- Maria P Pieschacón
  - Faculty of Dentistry, Division of Health Sciences, Universidad Santo Tomás, Bucaramanga, Colombia
- Humberto Arboleda
  - Neurosciences Research Group, Faculty of Medicine and Genetic Institute, Universidad Nacional de Colombia, Bogotá, Colombia
10. Ceddia G, Martino LN, Parodi A, Secchi P, Campaner S, Masseroli M. Association rule mining to identify transcription factor interactions in genomic regions. Bioinformatics 2020; 36:1007-1013. [PMID: 31504203 DOI: 10.1093/bioinformatics/btz687] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/20/2019] [Revised: 08/07/2019] [Accepted: 08/29/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Genome regulatory networks have different layers and ways to modulate cellular processes, such as cell differentiation, proliferation, and adaptation to external stimuli. Transcription factors and other chromatin-associated proteins act as combinatorial protein complexes that control gene transcription. Thus, identifying functional interaction networks among these proteins is a fundamental task to understand the genome regulation framework. RESULTS We developed a novel approach to infer interactions among transcription factors in user-selected genomic regions, by combining the computation of association rules and of a novel Importance Index on ChIP-seq datasets. The hallmark of our method is the definition of the Importance Index, which provides a relevance measure of the interaction among transcription factors found associated in the computed rules. Examples on synthetic data explain the index use and potential. A straightforward pre-processing pipeline enables the easy extraction of input data for our approach from any set of ChIP-seq experiments. Applications on ENCODE ChIP-seq data prove that our approach can reliably detect interactions between transcription factors, including known interactions that validate our approach. AVAILABILITY AND IMPLEMENTATION An R/Bioconductor package implementing our association rules and Importance Index-based method is available at http://bioconductor.org/packages/release/bioc/html/TFARM.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
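Illustrative note: the abstract builds on association rules over genomic regions. The sketch below shows only the two textbook rule measures, support and confidence, treating each region as the set of transcription factors with a peak in it. It does not implement the paper's Importance Index or the TFARM package API; the function name and the TF labels are placeholders.

```python
def rule_support_confidence(regions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    regions    : iterable of sets, each holding the TFs present in one region
    antecedent : TFs on the left-hand side of the rule
    consequent : TFs on the right-hand side of the rule

    support    = fraction of regions containing antecedent and consequent
    confidence = among regions containing the antecedent, the fraction
                 that also contain the consequent
    """
    regions = [frozenset(r) for r in regions]
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    n_regions = len(regions)
    n_lhs = sum(1 for r in regions if lhs <= r)
    n_both = sum(1 for r in regions if both <= r)
    support = n_both / n_regions if n_regions else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


if __name__ == "__main__":
    # Placeholder TF labels; each set is one genomic region.
    toy_regions = [
        {"TF_A", "TF_B", "TF_C"},
        {"TF_A", "TF_B"},
        {"TF_A", "TF_C"},
        {"TF_B"},
    ]
    print(rule_support_confidence(toy_regions, {"TF_A", "TF_B"}, {"TF_C"}))
```

A rule with high confidence but very low support rests on few regions, which is one reason an additional relevance measure on top of these two can be useful.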
Affiliation(s)
- Gaia Ceddia
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Italy
- Alice Parodi
  - MOX - Dipartimento di Matematica, Politecnico di Milano, Milan 20133, Italy
- Piercesare Secchi
  - MOX - Dipartimento di Matematica, Politecnico di Milano, Milan 20133, Italy
  - Center for Analysis, Decisions and Society, Human Technopole, Milan 20157, Italy
- Stefano Campaner
  - Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Milan 20139, Italy
- Marco Masseroli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Italy
11. Hiranuma N, Lundberg SM, Lee SI. AIControl: replacing matched control experiments with machine learning improves ChIP-seq peak identification. Nucleic Acids Res 2019; 47:e58. [PMID: 30869146 PMCID: PMC6547432 DOI: 10.1093/nar/gkz156] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/16/2018] [Revised: 02/15/2019] [Accepted: 02/28/2019] [Indexed: 01/24/2023] Open
Abstract
ChIP-seq is a technique to determine binding locations of transcription factors, which remains a central challenge in molecular biology. Current practice is to use a 'control' dataset to remove background signals from an immunoprecipitation (IP) 'target' dataset. We introduce the AIControl framework, which eliminates the need to obtain a control dataset and instead identifies binding peaks by estimating the distributions of background signals from many publicly available control ChIP-seq datasets. We thereby avoid the cost of running control experiments while simultaneously increasing the accuracy of binding location identification. Specifically, AIControl can (i) estimate background signals at fine resolution, (ii) systematically weigh the most appropriate control datasets in a data-driven way, (iii) capture sources of potential biases that may be missed by one control dataset and (iv) remove the need for costly and time-consuming control experiments. We applied AIControl to 410 IP datasets in the ENCODE ChIP-seq database, using 440 control datasets from 107 cell types to impute background signal. Without using matched control datasets, AIControl identified peaks that were more enriched for putative binding sites than those identified by other popular peak callers that used a matched control dataset. We also demonstrated that our framework identifies binding sites that recover documented protein interactions more accurately.
Affiliation(s)
- Naozumi Hiranuma
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, WA, USA, 98195-2350
- Scott M Lundberg
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, WA, USA, 98195-2350
- Su-In Lee
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, WA, USA, 98195-2350
12. Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 210] [Impact Index Per Article: 42.0] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University,
Stanford, CA, USA
| | - Francis Nguyen
- Department of Medical Biophysics, University of Toronto,
Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University,
Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute,
Toronto, ON, Canada
- Department of Computer Science, University of Toronto,
Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto,
Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto,
Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| |
|
13
|
Wang R, Wang Y, Zhang X, Zhang Y, Du X, Fang Y, Li G. Hierarchical cooperation of transcription factors from integration analysis of DNA sequences, ChIP-Seq and ChIA-PET data. BMC Genomics 2019; 20:296. [PMID: 32039697 PMCID: PMC7226942 DOI: 10.1186/s12864-019-5535-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/05/2023] Open
Abstract
Background: Chromosomal architecture, which is constituted by chromatin loops, plays an important role in cellular functions. Gene expression and cell identity can be regulated by the chromatin loop, which is formed by proximal or distal enhancers and promoters in linear DNA (1D). Enhancers and promoters are fundamental non-coding elements enriched with transcription factors (TFs) to form chromatin loops. However, the specific cooperation of TFs involved in forming chromatin loops is not fully understood. Results: Here, we proposed a method for investigating the cooperation of TFs in four cell lines by the integrative analysis of DNA sequences, ChIP-Seq and ChIA-PET data. Results demonstrate that the interaction of enhancers and promoters is a hierarchical and dynamic complex process with cooperative interactions of different TFs synergistically regulating gene expression and chromatin structure. The TF cooperation involved in maintaining and regulating the chromatin loop of cells can be regulated by epigenetic factors, such as other TFs and DNA methylation. Conclusions: Such cooperation among TFs provides the potential features that can affect chromatin’s 3D architecture in cells. The regulation of chromatin 3D organization and gene expression is a complex process associated with the hierarchical and dynamic properties of TFs. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5535-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ruimin Wang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Yunlong Wang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Xueying Zhang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Yaliang Zhang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
| | - Xiaoyong Du
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China; Huazhong Agricultural University, Wuhan, 430070, China
| | - Yaping Fang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China; Huazhong Agricultural University, Wuhan, 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China; Huazhong Agricultural University, Wuhan, 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| |
|
14
|
Ng FSL, Ruau D, Wernisch L, Göttgens B. A graphical model approach visualizes regulatory relationships between genome-wide transcription factor binding profiles. Brief Bioinform 2018; 19:162-173. [PMID: 27780826 PMCID: PMC5496675 DOI: 10.1093/bib/bbw102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/08/2015] [Indexed: 11/16/2022] Open
Abstract
Integrated analysis of multiple genome-wide transcription factor (TF)-binding profiles will be vital to advance our understanding of the global impact of TF binding. However, existing methods for measuring similarity in large numbers of chromatin immunoprecipitation assays with sequencing (ChIP-seq), such as correlation, mutual information or enrichment analysis, are limited in their ability to display functionally relevant TF relationships. In this study, we propose the use of graphical models to determine conditional independence between TFs and show that network visualization provides a promising alternative to distinguish ‘direct’ versus ‘indirect’ TF interactions. We applied four algorithms to measure ‘direct’ dependence to a compendium of 367 mouse haematopoietic TF ChIP-seq samples and obtained a consensus network known as a ‘TF association network’ where edges in the network corresponded to likely causal pairwise relationships between TFs. The ‘TF association network’ illustrates the role of TFs in developmental pathways, is reminiscent of combinatorial TF regulation, corresponds to known protein–protein interactions and indicates substantial TF-binding reorganization in leukemic cell types. With the rapid increase in TF ChIP-seq data sets, the approach presented here will be a powerful tool to study transcriptional programmes across a wide range of biological systems.
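As an illustration of the 'direct' versus 'indirect' distinction drawn in this abstract, the sketch below uses first-order partial correlation, one simple test of conditional independence. It is not claimed to be one of the four algorithms the authors applied, which the abstract does not name. The signal names `tf_a`, `tf_b` and `tf_c` and their values are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy binding signals (invented): tf_c drives both tf_a and tf_b.
tf_c = [1, 2, 3, 4, 5, 6, 7, 8]
tf_a = [c + e for c, e in zip(tf_c, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
tf_b = [c + e for c, e in zip(tf_c, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]

marginal = pearson(tf_a, tf_b)                # high: looks like an association
conditional = partial_corr(tf_a, tf_b, tf_c)  # close to zero: explained by tf_c
```

In this toy case the marginal correlation between `tf_a` and `tf_b` is high, but it largely vanishes once `tf_c` is conditioned on. A graphical model would treat such a pair as an 'indirect' relationship and draw no edge between them.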
Affiliation(s)
- Felicia S L Ng
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
| | - David Ruau
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
| | - Lorenz Wernisch
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
| | - Berthold Göttgens
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Corresponding author: Berthold Gottgens, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge CB2 0XY, UK. Tel: 01223-336829; Fax: 01223-762670; E-mail:
| |
|
15
|
Dozmorov MG. Epigenomic annotation-based interpretation of genomic data: from enrichment analysis to machine learning. Bioinformatics 2018; 33:3323-3330. [PMID: 29028263 DOI: 10.1093/bioinformatics/btx414] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 01/05/2017] [Accepted: 06/22/2017] [Indexed: 12/12/2022] Open
Abstract
Motivation: One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. Results: The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. Contact: mikhail.dozmorov@vcuhealth.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
| |
|
16
|
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018; 15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 764] [Impact Index Per Article: 127.3] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022] Open
Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Brett K Beaulieu-Jones
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Alexandr A Kalinin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | - Gregory P Way
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
| | | | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Wei Xie
- Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
| | - Gail L Rosen
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Benjamin J Lengerich
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Johnny Israeli
- Biophysics Program, Stanford University, Stanford, CA, USA
| | - Jack Lanchantin
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Evan M Cofer
- Department of Computer Science, Trinity University, San Antonio, TX, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
| | - Srinivas C Turaga
- Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
| | - Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - David J Harris
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
| | | | - Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Yifan Peng
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Laura K Wiley
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Marwin H S Segler
- Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
| | - Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
| | - S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
| | - Austin Huang
- Department of Medicine, Brown University, Providence, RI, USA
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| |
|
17
|
Rajarajan P, Jiang Y, Kassim BS, Akbarian S. Chromosomal Conformations and Epigenomic Regulation in Schizophrenia. Prog Mol Biol Transl Sci 2018; 157:21-40. [PMID: 29933951 DOI: 10.1016/bs.pmbts.2017.11.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Chromosomal conformations, including promoter-enhancer loops, provide a critical regulatory layer for the transcriptional machinery. Therefore, schizophrenia, a common psychiatric disorder associated with broad changes in neuronal gene expression in prefrontal cortex and other brain regions implicated in psychosis, could be associated with alterations in higher-order chromatin. Here, we review early studies on spatial genome organization in the schizophrenia postmortem brain and discuss how integrative approaches using cell culture and animal model systems could gain deeper insight into the potential roles of higher-order chromatin for the neurobiology of and novel treatment avenues for common psychiatric disease.
|
18
|
Abstract
We developed a predictive, stable, and interpretable tool: the iterative random forest algorithm (iRF). iRF discovers high-order interactions among biomolecules with the same order of computational cost as random forests. We demonstrate the efficacy of iRF by finding known and promising interactions among biomolecules, of up to fifth and sixth order, in two data examples in transcriptional regulation and alternative splicing. Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding how these high-order interactions drive gene expression presents a substantial statistical challenge. Building on random forests (RFs) and random intersection trees (RITs) and through extensive, biologically inspired simulations, we developed the iterative random forest algorithm (iRF). iRF trains a feature-weighted ensemble of decision trees to detect stable, high-order interactions with the same order of computational cost as the RF. We demonstrate the utility of iRF for high-order interaction discovery in two prediction problems: enhancer activity in the early Drosophila embryo and alternative splicing of primary transcripts in human-derived cell lines. In Drosophila, among the 20 pairwise transcription factor interactions iRF identifies as stable (returned in more than half of bootstrap replicates), 80% have been previously reported as physical interactions. Moreover, third-order interactions, e.g., between Zelda (Zld), Giant (Gt), and Twist (Twi), suggest high-order relationships that are candidates for follow-up experiments. 
In human-derived cells, iRF rediscovered a central role of H3K36me3 in chromatin-mediated splicing regulation and identified interesting fifth- and sixth-order interactions, indicative of multivalent nucleosomes with specific roles in splicing regulation. By decoupling the order of interactions from the computational cost of identification, iRF opens additional avenues of inquiry into the molecular mechanisms underlying genome biology.
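The stability criterion mentioned in this abstract (an interaction counts as stable when it is returned in more than half of bootstrap replicates) can be sketched as a simple frequency filter. This is only a minimal illustration of that filtering step, not of iRF itself: the iterative feature reweighting and the random intersection trees are omitted. The function name `stable_interactions` and the replicate contents are hypothetical.

```python
from collections import Counter


def stable_interactions(replicates, threshold=0.5):
    """Map each interaction to its stability score, keeping only those
    recovered in more than `threshold` of the bootstrap replicates."""
    counts = Counter()
    for found in replicates:
        # frozenset makes an interaction order-independent, and the set
        # comprehension counts it at most once per replicate.
        counts.update({frozenset(interaction) for interaction in found})
    n = len(replicates)
    return {i: c / n for i, c in counts.items() if c / n > threshold}


# Invented output of four bootstrap replicates (not data from the paper).
replicates = [
    [("Zld", "Gt"), ("Gt", "Twi")],
    [("Gt", "Zld")],
    [("Zld", "Gt", "Twi")],
    [("Zld", "Gt"), ("Zld", "Gt", "Twi")],
]

stable = stable_interactions(replicates)
```

Here the pair Zld-Gt appears in three of four replicates and is kept. The third-order interaction appears in exactly half and is dropped, because the criterion is strictly "more than half".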
|
19
|
Bessière C, Taha M, Petitprez F, Vandel J, Marin JM, Bréhélin L, Lèbre S, Lecellier CH. Probing instructions for expression regulation in gene nucleotide compositions. PLoS Comput Biol 2018; 14:e1005921. [PMID: 29293496 PMCID: PMC5766238 DOI: 10.1371/journal.pcbi.1005921] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/11/2017] [Revised: 01/12/2018] [Accepted: 12/10/2017] [Indexed: 01/22/2023] Open
Abstract
Gene expression is orchestrated by distinct regulatory regions to ensure a wide variety of cell types and functions. A challenge is to identify which regulatory regions are active, what their associated features are and how they work together in each cell type. Several approaches have tackled this problem by modeling gene expression based on epigenetic marks, with the ultimate goal of identifying driving regions and associated genomic variations that are clinically relevant, in particular in precision medicine. However, these models rely on experimental data, which are limited to specific samples (often even to cell lines) and cannot be generated for all regulators and all patients. In addition, we show here that, although these approaches are accurate in predicting gene expression, inference of TF combinations from this type of model is not straightforward. Furthermore, these methods are not designed to capture regulation instructions present at the sequence level, before the binding of regulators or the opening of the chromatin. Here, we probe sequence-level instructions for gene expression and develop a method to explain mRNA levels based solely on nucleotide features. Our method positions nucleotide composition as a critical component of gene expression. Moreover, our approach, able to rank regulatory regions according to their contribution, unveils a strong influence of the gene body sequence, in particular introns. We further provide evidence that the contribution of nucleotide content can be linked to co-regulations associated with genome 3D architecture and to associations of genes within topologically associated domains.
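The abstract does not specify which nucleotide features the model uses. As a hedged sketch of what "nucleotide composition" of a region could look like as model input, the helper below computes k-mer frequencies (k=1 for single nucleotides, k=2 for dinucleotides). The function name `composition` and the example sequence are illustrative assumptions, not the authors' implementation.

```python
from itertools import product


def composition(seq, k=1):
    """Relative frequency of every k-mer over the alphabet ACGT in `seq`.

    Windows containing other symbols (e.g. N) are skipped, so the
    frequencies sum to 1 whenever at least one valid window exists.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {m: (c / total if total else 0.0) for m, c in counts.items()}


# Invented intron-like fragment, for illustration only.
mono = composition("ACGTAC", k=1)  # 4 features
di = composition("ACGTAC", k=2)    # 16 features
```

Feature vectors of this kind, computed per region (for example promoter, introns or other parts of the gene body), could then be fed to a regression model of mRNA level, and the fitted contributions used to rank regions.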
Affiliation(s)
- Chloé Bessière
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| | - May Taha
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
- IMAG, Univ. Montpellier, CNRS, Montpellier, France
| | - Florent Petitprez
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| | - Jimmy Vandel
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France
| | - Jean-Michel Marin
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- IMAG, Univ. Montpellier, CNRS, Montpellier, France
| | - Laurent Bréhélin
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France
| | - Sophie Lèbre
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- IMAG, Univ. Montpellier, CNRS, Montpellier, France
- Univ. Paul-Valéry-Montpellier 3, Montpellier, France
| | - Charles-Henri Lecellier
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| |
|
20
|
Javidfar B, Park R, Kassim BS, Bicks LK, Akbarian S. The epigenomics of schizophrenia, in the mouse. Am J Med Genet B Neuropsychiatr Genet 2017; 174:631-640. [PMID: 28699694 PMCID: PMC5573750 DOI: 10.1002/ajmg.b.32566] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/04/2017] [Revised: 05/04/2017] [Accepted: 06/12/2017] [Indexed: 01/02/2023]
Abstract
Large-scale consortia including the Psychiatric Genomics Consortium, the Common Minds Consortium, BrainSeq and PsychENCODE, and many other studies taken together provide increasingly detailed insights into the genetic and epigenetic risk architectures of schizophrenia (SCZ) and offer vast amounts of molecular information, but with largely unexplored therapeutic potential. Here we discuss how epigenomic studies in human brain could guide animal work to test the impact of disease-associated alterations in chromatin structure and function on cognition and behavior. For example, transcription factors such as MYOCYTE-SPECIFIC ENHANCER FACTOR 2C (MEF2C), or multiple regulators of the open chromatin mark, methyl-histone H3-lysine 4, are associated with the genetic risk architectures of common psychiatric disease and alterations in chromatin structure and function in diseased brain tissue. Importantly, these molecules also affect cognition and behavior in genetically engineered mice, including virus-mediated expression changes in prefrontal cortex (PFC) and other key nodes in the circuitry underlying psychosis. Therefore, preclinical and small laboratory animal work could target genomic sequences affected by chromatin alterations in SCZ. To this end, in vivo editing of enhancer and other regulatory non-coding DNA by RNA-guided nucleases including CRISPR-Cas, and designer transcription factors, could be expected to deliver pipelines for novel therapeutic approaches aimed at improving cognitive dysfunction and other core symptoms of SCZ.
Affiliation(s)
| | | | | | - Lucy K. Bicks
- Department of Psychiatry; Friedman Brain Institute; Icahn School of Medicine at Mount Sinai; New York New York
| | - Schahram Akbarian
- Department of Psychiatry; Friedman Brain Institute; Icahn School of Medicine at Mount Sinai; New York New York
| |
|
21
|
Hollstein R, Reiz B, Kötter L, Richter A, Schaake S, Lohmann K, Kaiser FJ. Dystonia-causing mutations in the transcription factor THAP1 disrupt HCFC1 cofactor recruitment and alter gene expression. Hum Mol Genet 2017; 26:2975-2983. [DOI: 10.1093/hmg/ddx187] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/02/2017] [Accepted: 05/05/2017] [Indexed: 12/14/2022] Open
|
22
|
Chetverina D, Fujioka M, Erokhin M, Georgiev P, Jaynes JB, Schedl P. Boundaries of loop domains (insulators): Determinants of chromosome form and function in multicellular eukaryotes. Bioessays 2017; 39. [PMID: 28133765 DOI: 10.1002/bies.201600233] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 11/10/2022]
Abstract
Chromosomes in multicellular animals are subdivided into a series of looped domains. In addition to being the underlying principle for organizing the chromatin fiber, looping is critical for processes ranging from gene regulation to recombination and repair. The subdivision of chromosomes into looped domains depends upon a special class of architectural elements called boundaries or insulators. These elements are distributed throughout the genome and are ubiquitous building blocks of chromosomes. In this review, we focus on features of boundaries that are critical in determining the topology of the looped domains and their genetic properties. We highlight the properties of fly boundaries that are likely to have an important bearing on the organization of looped domains in vertebrates, and discuss the functional consequences of the observed similarities and differences.
Affiliation(s)
- Darya Chetverina
- Department of the Control of Genetic Processes, Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
| | - Miki Fujioka
- Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA, USA
| | - Maksim Erokhin
- Department of the Control of Genetic Processes, Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
| | - Pavel Georgiev
- Department of the Control of Genetic Processes, Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
| | - James B Jaynes
- Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA, USA
| | - Paul Schedl
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA; Laboratory of Gene Expression Regulation in Development, Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
| |
|