1
|
Xin R, Cheng Q, Chi X, Feng X, Zhang H, Wang Y, Duan M, Xie T, Song X, Yu Q, Fan Y, Huang L, Zhou F. Computational Characterization of Undifferentially Expressed Genes with Altered Transcription Regulation in Lung Cancer. Genes (Basel) 2023; 14:2169. [PMID: 38136991 PMCID: PMC10742656 DOI: 10.3390/genes14122169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 11/19/2023] [Accepted: 11/27/2023] [Indexed: 12/24/2023]
Abstract
A transcriptome profiles the expression levels of genes in cells, and a huge amount of public transcriptome data has accumulated. Most existing biomarker-related studies investigated the differential expression of individual transcriptomic features under the assumption of inter-feature independence. Many transcriptomic features without differential expression were therefore excluded from the biomarker lists. This study proposed a computational analysis protocol (mqTrans) to analyze transcriptomes from the view of high-dimensional inter-feature correlations. The mqTrans protocol trained a regression model to predict the expression of an mRNA feature from those of the transcription factors (TFs). The difference between the predicted and real expression of an mRNA feature in a query sample was defined as the mqTrans feature. The new mqTrans view facilitated the detection of thirteen transcriptomic features with differentially expressed mqTrans features, but without differential expression in the original transcriptomic values, in three independent datasets of lung cancer. These features were called dark biomarkers because they would have been ignored in a conventional differential analysis. The detailed discussion of one dark biomarker, GBP5, and additional validation experiments suggested that the overlapping long non-coding RNAs might have contributed to this interesting phenomenon. In summary, this study aimed to find undifferentially expressed genes with significantly changed mqTrans values in lung cancer. These genes are usually ignored in biomarker detection studies because they are not differentially expressed. However, their differentially expressed mqTrans values in three independent datasets suggested their strong associations with lung cancer.
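The mqTrans definition above (predicted minus real expression of an mRNA, with the prediction made from TF expression) can be illustrated with a minimal sketch. The abstract does not state which regression model was used, so the ordinary least squares fit below is an assumption, as are the toy data, the function names (`fit_ols`, `mqtrans_feature`), and the sign convention of the difference.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(tf_rows, mrna):
    """Fit mrna ~ intercept + TFs via the normal equations (assumed model)."""
    x = [[1.0] + list(row) for row in tf_rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, mrna)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, tf_row):
    """Predicted mRNA expression for one sample's TF expression values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], tf_row))


def mqtrans_feature(coef, tf_row, observed):
    """Predicted minus observed expression (sign convention is an assumption)."""
    return predict(coef, tf_row) - observed


# Toy reference samples: two hypothetical TFs and a noise-free linear target.
rng = random.Random(0)
reference_tfs = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(20)]
reference_mrna = [2.0 + 0.5 * a - 1.0 * b for a, b in reference_tfs]
coef = fit_ols(reference_tfs, reference_mrna)

# Query sample whose mRNA level departs from what its TF levels predict.
query_tfs = [4.0, 1.0]
query_observed = 5.0
query_mqtrans = mqtrans_feature(coef, query_tfs, query_observed)
```

In this toy setting the query sample's mRNA level could look unremarkable on its own, while its mqTrans value is clearly non-zero because the TF levels predict a lower expression. That is the kind of signal the abstract describes for dark biomarkers.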
Affiliation(s)
- Ruihao Xin
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Jilin Institute of Chemical Technology, College of Information and Control Engineering, Jilin 132000, China
- Qian Cheng
  - Jilin Institute of Chemical Technology, College of Information and Control Engineering, Jilin 132000, China
- Xiaohang Chi
  - Jilin Institute of Chemical Technology, College of Information and Control Engineering, Jilin 132000, China
- Xin Feng
  - School of Science, Jilin Institute of Chemical Technology, Jilin 132000, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun 130012, China
- Hang Zhang
  - Jilin Institute of Chemical Technology, College of Information and Control Engineering, Jilin 132000, China
- Yueying Wang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Meiyu Duan
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Tunyang Xie
  - Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WA, UK
- Xiaonan Song
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Software, Jilin University, Changchun 130012, China
- Qiong Yu
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun 130012, China
- Yusi Fan
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Software, Jilin University, Changchun 130012, China
- Lan Huang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Fengfeng Zhou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, China
|
2
|
Pan-cancer identification of the relationship of metabolism-related differentially expressed transcription regulation with non-differentially expressed target genes via a gated recurrent unit network. Comput Biol Med 2022; 148:105883. [PMID: 35878490 DOI: 10.1016/j.compbiomed.2022.105883] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/06/2022] [Revised: 07/10/2022] [Accepted: 07/16/2022] [Indexed: 11/20/2022]
Abstract
The transcriptome describes the expression of all genes in a sample. Most studies have investigated the differential patterns or discrimination powers of transcript expression levels. In this study, we hypothesized that the quantitative correlations between the expression levels of transcription factors (TFs) and their regulated target genes (mRNAs) serve as a novel view of healthy status, and a disease sample exhibits a differential landscape (mqTrans) of transcription regulations compared with healthy status. We formulated quantitative transcription regulation relationships of metabolism-related genes as a multi-input multi-output regression model via a gated recurrent unit (GRU) network. The GRU model was trained using healthy blood transcriptomes, and the expression levels of mRNAs were predicted from those of the TFs. The mqTrans feature of a gene was defined as the difference between its predicted and actual expression levels. A pan-cancer investigation of the differentially expressed mqTrans features was conducted between the early- and late-stage cancers in 26 cancer types of The Cancer Genome Atlas database. This study focused on the differentially expressed mqTrans features that did not show differential expression in the actual expression levels. These genes could not be detected by conventional differential analysis. Such dark biomarkers are worthy of further wet-lab investigation. The experimental data also showed that the proposed mqTrans investigation improved the classification between early- and late-stage samples for some cancer types. Thus, the mqTrans features serve as a complementary view to transcriptomes, an OMIC type with mature high-throughput production technologies and abundant public resources.
|
3
|
Ray-Jones H, Spivakov M. Transcriptional enhancers and their communication with gene promoters. Cell Mol Life Sci 2021; 78:6453-6485. [PMID: 34414474 PMCID: PMC8558291 DOI: 10.1007/s00018-021-03903-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/12/2021] [Revised: 07/08/2021] [Accepted: 07/19/2021] [Indexed: 12/13/2022]
Abstract
Transcriptional enhancers play a key role in the initiation and maintenance of gene expression programmes, particularly in metazoa. How these elements control their target genes in the right place and time is one of the most pertinent questions in functional genomics, with wide implications for most areas of biology. Here, we synthesise classic and recent evidence on the regulatory logic of enhancers, including the principles of enhancer organisation, factors that facilitate and delimit enhancer-promoter communication, and the joint effects of multiple enhancers. We show how modern approaches building on classic insights have begun to unravel the complexity of enhancer-promoter relationships, paving the way towards a quantitative understanding of gene control.
Affiliation(s)
- Helen Ray-Jones
  - MRC London Institute of Medical Sciences, London, W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK
- Mikhail Spivakov
  - MRC London Institute of Medical Sciences, London, W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK
|
4
|
MACMIC Reveals A Dual Role of CTCF in Epigenetic Regulation of Cell Identity Genes. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:140-153. [PMID: 33677108 PMCID: PMC8498966 DOI: 10.1016/j.gpb.2020.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 08/28/2020] [Accepted: 11/17/2020] [Indexed: 11/23/2022]
Abstract
Numerous studies of the relationship between epigenomic features have focused on their strong correlation across the genome, likely because such relationships can be easily identified by many established methods for correlation analysis. However, two features with little correlation may still colocalize at many genomic sites to implement important functions. There is no bioinformatic tool for researchers to specifically identify such feature pairs. Here, we develop a method to identify feature pairs in which two features have maximal colocalization and minimal correlation (MACMIC) across the genome. By MACMIC analysis of 3306 feature pairs in 16 human cell types, we reveal a dual role of CCCTC-binding factor (CTCF) in epigenetic regulation of cell identity genes. Although super-enhancers are associated with activation of target genes, only a subset of super-enhancers colocalized with CTCF regulate cell identity genes. At super-enhancers colocalized with CTCF, CTCF is required for the active marker H3K27ac in cell types requiring the activation, and also required for the repressive marker H3K27me3 in other cell types requiring repression. Our work demonstrates the biological utility of the MACMIC analysis and reveals a key role for CTCF in epigenetic regulation of cell identity. The code for MACMIC is available at https://github.com/bxia888/MACMIC.
|
5
|
Weighill D, Guebila MB, Lopes-Ramos C, Glass K, Quackenbush J, Platig J, Burkholz R. Gene regulatory network inference as relaxed graph matching. PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE 2021; 35:10263-10272. [PMID: 34707916 PMCID: PMC8546743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Bipartite network inference is a ubiquitous problem across disciplines. One important example in the field of molecular biology is gene regulatory network inference. Gene regulatory networks are an instrumental tool aiding in the discovery of the molecular mechanisms driving diverse diseases, including cancer. However, only noisy observations of the projections of these regulatory networks are typically assayed. In an effort to better estimate regulatory networks from their noisy projections, we formulate a non-convex but analytically tractable optimization problem called OTTER. This problem can be interpreted as relaxed graph matching between the two projections of the bipartite network. OTTER's solutions can be derived explicitly and inspire a spectral algorithm, for which we provide network recovery guarantees. We also provide an alternative approach based on gradient descent that is more robust to noise compared to the spectral algorithm. Interestingly, this gradient descent approach resembles the message passing equations of an established gene regulatory network inference method, PANDA. Using three cancer-related data sets, we show that OTTER outperforms state-of-the-art inference methods in predicting transcription factor binding to gene regulatory regions. To encourage new graph matching applications to this problem, we have made all networks and validation data publicly available.
Affiliation(s)
- Deborah Weighill
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Marouen Ben Guebila
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Camila Lopes-Ramos
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Kimberly Glass
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
  - Channing Division of Network Medicine, Brigham and Women's Hospital
  - Harvard Medical School, Boston, MA 02115
- John Quackenbush
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
  - Channing Division of Network Medicine, Brigham and Women's Hospital
  - Harvard Medical School, Boston, MA 02115
- John Platig
  - Channing Division of Network Medicine, Brigham and Women's Hospital
  - Harvard Medical School, Boston, MA 02115
- Rebekka Burkholz
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
|
6
|
van der Lee R, Correard S, Wasserman WW. Deregulated Regulators: Disease-Causing cis Variants in Transcription Factor Genes. Trends Genet 2020; 36:523-539. [DOI: 10.1016/j.tig.2020.04.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/21/2020] [Revised: 04/15/2020] [Accepted: 04/16/2020] [Indexed: 12/12/2022]
|
7
|
Mitchelmore J, Grinberg NF, Wallace C, Spivakov M. Functional effects of variation in transcription factor binding highlight long-range gene regulation by epromoters. Nucleic Acids Res 2020; 48:2866-2879. [PMID: 32112106 PMCID: PMC7102942 DOI: 10.1093/nar/gkaa123] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/28/2019] [Revised: 02/14/2020] [Accepted: 02/17/2020] [Indexed: 02/06/2023]
Abstract
Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control. Natural genetic variation can point to the possible gene regulatory function of specific sequences through their allelic associations with gene expression. However, comprehensive identification of causal regulatory sequences in brute-force association testing without incorporating prior knowledge is challenging due to limited statistical power and effects of linkage disequilibrium. Sequence variants affecting transcription factor (TF) binding at CRMs have a strong potential to influence gene regulatory function, which provides a motivation for prioritizing such variants in association testing. Here, we generate an atlas of CRMs showing predicted allelic variation in TF binding affinity in human lymphoblastoid cell lines and test their association with the expression of their putative target genes inferred from Promoter Capture Hi-C and immediate linear proximity. We reveal >1300 CRM TF-binding variants associated with target gene expression, the majority of them undetected with standard association testing. A large proportion of CRMs showing associations with the expression of genes they contact in 3D localize to the promoter regions of other genes, supporting the notion of 'epromoters': dual-action CRMs with promoter and distal enhancer activity.
Affiliation(s)
- Joanna Mitchelmore
  - Nuclear Dynamics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
- Nastasiya F Grinberg
  - Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- Chris Wallace
  - Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
- Mikhail Spivakov
  - Nuclear Dynamics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College, Du Cane Road, London W12 0NN, UK
|
8
|
Trefflich S, Dalmolin RJS, Ortega JM, Castro MAA. Which came first, the transcriptional regulator or its target genes? An evolutionary perspective into the construction of eukaryotic regulons. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2019; 1863:194472. [PMID: 31825805 DOI: 10.1016/j.bbagrm.2019.194472] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/31/2019] [Revised: 11/06/2019] [Accepted: 11/30/2019] [Indexed: 01/06/2023]
Abstract
Eukaryotic regulons are regulatory units formed by a set of genes under the control of the same transcription factor (TF). Despite the functional plasticity, TFs are highly conserved and recognize the same DNA sequences in different organisms. One of the main factors that confer regulatory specificity is the distribution of the binding sites of the TFs along the genome, allowing the configuration of different transcriptional regulatory networks (TRNs) from the same regulator. A similar scenario occurs between tissues of the same organism, where a TRN can be rewired by epigenetic factors, modulating the accessibility of the TF to its binding sites. In this article we discuss concepts that can help to formulate testable hypotheses about the construction of regulons, exploring the presence and absence of the elements that form a TRN throughout the evolution of an ancestral lineage. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Sheyla Trefflich
  - Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
  - Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba 81520-260, Brazil
- Rodrigo J S Dalmolin
  - Bioinformatics Multidisciplinary Environment, Federal University of Rio Grande do Norte, Natal 59078-400, Brazil
- José Miguel Ortega
  - Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- Mauro A A Castro
  - Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba 81520-260, Brazil
|
9
|
Penzar DD, Zinkevich AO, Vorontsov IE, Sitnik VV, Favorov AV, Makeev VJ, Kulakovskiy IV. What Do Neighbors Tell About You: The Local Context of Cis-Regulatory Modules Complicates Prediction of Regulatory Variants. Front Genet 2019; 10:1078. [PMID: 31737053 PMCID: PMC6834773 DOI: 10.3389/fgene.2019.01078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/15/2019] [Accepted: 10/09/2019] [Indexed: 02/05/2023]
Abstract
Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants. Here, we explore the computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in the massively parallel reporter assays, based on the data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. Particularly, training on the sequence segments located next to the validation data results in the "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios preventing such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even the performance estimation will become reliable only in the future with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
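The contrast between leaky and leakage-free validation described above can be sketched as a split that holds out whole regulatory regions, so that no validation variant has neighbors from the same region in the training set. This is only an illustration under assumptions: the tuple layout `(region_id, position, effect)`, the toy values, and the function name `leave_one_region_out` are hypothetical and not taken from the authors' code.

```python
from collections import defaultdict


def leave_one_region_out(variants):
    """Yield (held_out_region, train, validation) with no region shared between sets.

    `variants` is a list of (region_id, position, effect) tuples; this layout is
    a hypothetical choice made for the illustration.
    """
    by_region = defaultdict(list)
    for variant in variants:
        by_region[variant[0]].append(variant)
    for held_out in sorted(by_region):
        validation = by_region[held_out]
        train = [
            v
            for region, region_variants in by_region.items()
            if region != held_out
            for v in region_variants
        ]
        yield held_out, train, validation


# Toy variants from three made-up regulatory regions.
toy_variants = [
    ("regionA", 10, 0.2),
    ("regionA", 11, 0.3),
    ("regionB", 50, -0.1),
    ("regionB", 51, 0.0),
    ("regionC", 90, 0.5),
]

splits = list(leave_one_region_out(toy_variants))
```

A random split over individual variants would, by contrast, routinely place positions 10 and 11 of the same region on opposite sides of the split, which is the local-context leakage the abstract warns about.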
Affiliation(s)
- Dmitry D. Penzar
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
  - Department of Medical and Biological Physics, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Russia
- Arsenii O. Zinkevich
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Ilya E. Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Vasily V. Sitnik
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Alexander V. Favorov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Vsevolod J. Makeev
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Department of Medical and Biological Physics, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Russia
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Ivan V. Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia
|
10
|
The spatial binding model of the pioneer factor Oct4 with its target genes during cell reprogramming. Comput Struct Biotechnol J 2019; 17:1226-1233. [PMID: 31921389 PMCID: PMC6944736 DOI: 10.1016/j.csbj.2019.09.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/19/2019] [Revised: 09/05/2019] [Accepted: 09/07/2019] [Indexed: 12/18/2022]
Abstract
Understanding the regulatory relationship between a pioneer factor and the genes it binds is crucial for improving the efficiency of TF-mediated reprogramming. Because Oct4 is the only factor that cannot be substituted by other POU members, a quantitative model describing its spatial binding pattern with its target genes is urgently needed. The dynamic profiles of Oct4 binding showed that the major wave occurs at the intermediate stage of cell reprogramming (from day 7 to day 15) and that promoters are the preferred target regions. The Oct4-binding distribution shows a significant chromosome bias: the overall enrichment on chromosomes 1–11 is higher than on the others. The dramatic events of TF-mediated reprogramming are mainly concentrated on autosomes. We also found that the spatial binding ability of Oct4 can be represented quantitatively using three peak parameters (height, width and distance). The dynamic changes in Oct4 binding demonstrated that width plays a more important role in regulating the expression of target genes. Finally, a multivariate linear regression was introduced to establish the spatial binding model of Oct4 binding. The evaluation results confirmed that height and width are positively correlated with gene expression, and that additive interaction terms of height and width optimize model performance better than multiplicative terms. The best average coefficient of determination of the improved model reached 81.38%. Our study provides new insights into the cooperative regulation of the spatial binding pattern of pioneer factors in cell reprogramming.
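For readers unfamiliar with the metric quoted above, the coefficient of determination compares the residual error of a model with the variance of the observed values. The sketch below uses the standard definition, one minus the ratio of residual to total sum of squares. The function name and the toy numbers are illustrative assumptions and are unrelated to the Oct4 data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Toy expression values and model predictions (made up for illustration).
toy_observed = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [1.1, 1.9, 3.2, 3.8]
toy_r2 = r_squared(toy_observed, toy_predicted)  # residual SS 0.10, total SS 5.0
```

With these numbers the value is about 0.98, meaning the predictions account for roughly 98% of the variance in the observed values.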
|