1
Huang Z, Wang J, Yan Z, Guo M. Differentially expressed genes prediction by multiple self-attention on epigenetics data. Brief Bioinform 2022; 23:6563414. [PMID: 35380603 DOI: 10.1093/bib/bbac117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 11/12/2022]
Abstract
Predicting differentially expressed genes (DEGs) from epigenetics signal data is key to understanding how epigenetics controls cell functional heterogeneity through gene regulation. This knowledge can help in developing 'epigenetics drugs' for complex diseases such as cancers. Most existing machine learning-based methods fall short in prediction accuracy, interpretability or training speed. To address these problems, we propose a Multiple Self-Attention model for predicting DEGs on Epigenetic data (Epi-MSA). Epi-MSA first uses convolutional neural networks to embed information from neighboring bins, and then employs multiple self-attention encoders on the different input epigenetics factor data to learn which gene locations are important for predicting DEGs. Next, it trains a soft attention module to select which epigenetics factors are significant. The attention mechanism makes the model interpretable, and because self-attention consists purely of matrix operations, the model can be computed in parallel, which speeds up training. Experiments on datasets from the Roadmap Epigenome Project and BluePrint Data Analysis Portal (BDAP) show that Epi-MSA outperforms existing competitive methods and has a smaller standard deviation, indicating that it is both effective and stable. In addition, Epi-MSA has good interpretability, as confirmed by comparing its attention weight matrix with existing biological knowledge.
Affiliation(s)
- Zimo Huang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, China
- Jun Wang
- Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, China
- Zhongmin Yan
- School of Software, Shandong University, Jinan 250101, China
- Maozu Guo
- College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
2
Bejjani F, Tolza C, Boulanger M, Downes D, Romero R, Maqbool M, Zine El Aabidine A, Andrau JC, Lebre S, Brehelin L, Parrinello H, Rohmer M, Kaoma T, Vallar L, Hughes J, Zibara K, Lecellier CH, Piechaczyk M, Jariel-Encontre I. Fra-1 regulates its target genes via binding to remote enhancers without exerting major control on chromatin architecture in triple negative breast cancers. Nucleic Acids Res 2021; 49:2488-2508. [PMID: 33533919 PMCID: PMC7968996 DOI: 10.1093/nar/gkab053] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/13/2020] [Revised: 12/21/2020] [Accepted: 01/25/2021] [Indexed: 12/12/2022]
Abstract
The ubiquitous family of dimeric transcription factors AP-1 is made up of Fos and Jun family proteins. It has long been thought to operate principally at gene promoters, and how it controls transcription is still poorly understood. The Fos family protein Fra-1 is overexpressed in triple negative breast cancers (TNBCs), where it contributes to tumor aggressiveness. To address its transcriptional actions in TNBCs, we combined transcriptomics, ChIP-seq, machine learning and NG Capture-C. Additionally, we studied its Fos family kin Fra-2, which is also expressed in TNBCs, albeit at a much lower level. Consistent with their pleiotropic effects, Fra-1 and Fra-2 up- and downregulate, individually, together or redundantly, many genes associated with a wide range of biological processes. Target gene regulation is principally due to binding of Fra-1 and Fra-2 at regulatory elements located distantly from cognate promoters, where Fra-1 modulates the recruitment of the transcriptional co-regulator p300/CBP and where differences in AP-1 variant motif recognition can underlie preferential Fra-1 or Fra-2 binding. Our work also shows no major role for Fra-1 in chromatin architecture control at target gene loci, but suggests collaboration between Fra-1-bound and -unbound enhancers within chromatin hubs that sometimes include promoters for other Fra-1-regulated genes. Our work impacts our view of AP-1.
Affiliation(s)
- Fabienne Bejjani
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- PRASE, DSST, ER045, Lebanese University, Beirut, Lebanon
- Claire Tolza
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- Damien Downes
- Medical Research Council, Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Raphaël Romero
- IMAG, Univ Montpellier, CNRS, Montpellier, France
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
- Sophie Lebre
- IMAG, Univ Montpellier, CNRS, Montpellier, France
- Hughes Parrinello
- Montpellier GenomiX, MGX, BioCampus Montpellier, CNRS, INSERM, Univ. Montpellier, F-34094 Montpellier, France
- Marine Rohmer
- Montpellier GenomiX, MGX, BioCampus Montpellier, CNRS, INSERM, Univ. Montpellier, F-34094 Montpellier, France
- Tony Kaoma
- Computational Biomedecine, Quantitative Biology Unit, Luxembourg Institute of Health, Strassen, Luxembourg
- Laurent Vallar
- Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Jim R Hughes
- Medical Research Council, Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Kazem Zibara
- PRASE, DSST, ER045, Lebanese University, Beirut, Lebanon
- Biology Department, Faculty of Sciences-I, Lebanese University, Beirut, Lebanon
- Charles-Henri Lecellier
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
3
Boulanger M, Chakraborty M, Tempé D, Piechaczyk M, Bossis G. SUMO and Transcriptional Regulation: The Lessons of Large-Scale Proteomic, Modifomic and Genomic Studies. Molecules 2021; 26:molecules26040828. [PMID: 33562565 PMCID: PMC7915335 DOI: 10.3390/molecules26040828] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 01/12/2021] [Revised: 01/29/2021] [Accepted: 02/01/2021] [Indexed: 12/12/2022]
Abstract
One major role of the eukaryotic peptidic post-translational modifier SUMO in the cell is transcriptional control. This occurs via modification of virtually all classes of transcriptional actors, which include transcription factors, transcriptional coregulators, diverse chromatin components, as well as the Pol I, Pol II and Pol III transcriptional machineries and their regulators. For many years, the role of SUMOylation was essentially studied on individual proteins, or small groups of proteins, principally in the context of Pol II-mediated transcription. This provided only a fragmentary view of how SUMOylation controls transcription. The recent advent of large-scale proteomic, modifomic and genomic studies has, however, considerably refined our perception of the part played by SUMO in gene expression control. Here we review these developments and the new concepts they have given rise to, together with the limitations of our knowledge. We also consider how they illuminate the SUMO-dependent transcriptional mechanisms that have been characterized thus far and how they impact our view of SUMO-dependent chromatin organization.
Affiliation(s)
- Mathias Boulanger
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Mehuli Chakraborty
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Denis Tempé
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Marc Piechaczyk
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Correspondence: (M.P.); (G.B.)
- Guillaume Bossis
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Correspondence: (M.P.); (G.B.)
4
Osmala M, Lähdesmäki H. Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns. BMC Bioinformatics 2020; 21:317. [PMID: 32689977 PMCID: PMC7370432 DOI: 10.1186/s12859-020-03621-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Accepted: 06/19/2020] [Indexed: 12/11/2022]
Abstract
Background: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data have been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently.
Results: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity between the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods.
Conclusion: PREPRINT compares favorably with the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from cell types not utilised for training, and it often performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.
Affiliation(s)
- Maria Osmala
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02150, Finland
- Harri Lähdesmäki
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02150, Finland