1
Tudose C, Bond J, Ryan CJ. Gene essentiality in cancer is better predicted by mRNA abundance than by gene regulatory network-inferred activity. NAR Cancer 2023; 5:zcad056. [PMID: 38035131 PMCID: PMC10683780 DOI: 10.1093/narcan/zcad056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2023] [Revised: 10/30/2023] [Accepted: 11/09/2023] [Indexed: 12/02/2023] Open
Abstract
Gene regulatory networks (GRNs) are often deregulated in tumor cells, resulting in altered transcriptional programs that facilitate tumor growth. These altered networks may make tumor cells vulnerable to the inhibition of specific regulatory proteins. Consequently, the reconstruction of GRNs in tumors is often proposed as a means to identify therapeutic targets. While there are examples of individual targets identified using GRNs, the extent to which GRNs can be used to predict sensitivity to targeted intervention in general remains unknown. Here we use the results of genome-wide CRISPR screens to systematically assess the ability of GRNs to predict sensitivity to gene inhibition in cancer cell lines. Using GRNs derived from multiple sources, including GRNs reconstructed from tumor transcriptomes and from curated databases, we infer regulatory gene activity in cancer cell lines from ten cancer types. We then ask, in each cancer type, if the inferred regulatory activity of each gene is predictive of sensitivity to CRISPR perturbation of that gene. We observe slight variation in the correlation between gene regulatory activity and gene sensitivity depending on the source of the GRN and the activity estimation method used. However, we find that there is consistently a stronger relationship between mRNA abundance and gene sensitivity than there is between regulatory gene activity and gene sensitivity. This is true both when gene sensitivity is treated as a binary property and when it is treated as a quantitative property. Overall, our results suggest that gene sensitivity is better predicted by measured expression than by GRN-inferred activity.
Affiliation(s)
- Cosmin Tudose
  - Systems Biology Ireland, University College Dublin, Dublin, Ireland
  - School of Medicine, University College Dublin, Dublin, Ireland
  - The SFI Centre for Research Training in Genomics Data Science, Ireland
- Jonathan Bond
  - Systems Biology Ireland, University College Dublin, Dublin, Ireland
  - School of Medicine, University College Dublin, Dublin, Ireland
  - Children's Health Ireland at Crumlin, Dublin, Ireland
- Colm J Ryan
  - Systems Biology Ireland, University College Dublin, Dublin, Ireland
  - School of Computer Science, University College Dublin, Dublin, Ireland
  - Conway Institute, University College Dublin, Dublin, Ireland
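As an illustrative aside on entry 1 (Tudose et al.), the kind of comparison the abstract describes, correlating gene sensitivity with mRNA abundance on the one hand and with inferred regulatory activity on the other, might be sketched as follows. This is a minimal sketch and not the authors' pipeline: the choice of Spearman correlation, the helper names (`ranks`, `pearson`, `spearman`) and all numeric values are assumptions invented for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented toy values for one gene across six cell lines (not real data).
sensitivity = [-1.2, -0.9, -0.4, -0.3, 0.0, 0.1]  # hypothetical CRISPR scores
mrna = [8.1, 7.5, 5.2, 5.0, 3.1, 2.9]             # hypothetical mRNA abundance
activity = [1.0, -0.2, 0.8, 0.1, 0.3, -0.5]       # hypothetical inferred activity

rho_mrna = spearman(mrna, sensitivity)
rho_activity = spearman(activity, sensitivity)
print(f"mRNA vs sensitivity:     {rho_mrna:.2f}")
print(f"activity vs sensitivity: {rho_activity:.2f}")
```

In this toy setup the predictor with the larger absolute correlation would be considered more informative about sensitivity; the numbers carry no biological meaning.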
2
Li X, Lappalainen T, Bussemaker HJ. Identifying genetic regulatory variants that affect transcription factor activity. Cell Genomics 2023; 3:100382. [PMID: 37719147 PMCID: PMC10504674 DOI: 10.1016/j.xgen.2023.100382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 05/19/2023] [Accepted: 07/21/2023] [Indexed: 09/19/2023]
Abstract
Genetic variants affecting gene expression levels in humans have been mapped in the Genotype-Tissue Expression (GTEx) project. Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach. The transparent peer review record is available.
Affiliation(s)
- Xiaoting Li
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Tuuli Lappalainen
  - New York Genome Center, New York, NY 10013, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Harmen J. Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
3
Jiménez S, Schreiber V, Mercier R, Gradwohl G, Molina N. Characterization of cell-fate decision landscapes by estimating transcription factor dynamics. Cell Reports Methods 2023; 3:100512. [PMID: 37533652 PMCID: PMC10391345 DOI: 10.1016/j.crmeth.2023.100512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2022] [Revised: 03/23/2023] [Accepted: 06/01/2023] [Indexed: 08/04/2023]
Abstract
Time-specific modulation of gene expression during differentiation by transcription factors promotes cell diversity. However, estimating their dynamic regulatory activity at the single-cell level and in a high-throughput manner remains challenging. We present FateCompass, an integrative approach that utilizes single-cell transcriptomics data to identify lineage-specific transcription factors throughout differentiation. By combining a probabilistic framework with RNA velocities or differentiation potential, we estimate transition probabilities, while a linear model of gene regulation is employed to compute transcription factor activities. Considering dynamic changes and correlations of expression and activities, FateCompass identifies lineage-specific regulators. Our validation using in silico data and application to pancreatic endocrine cell differentiation datasets highlight both known and potentially novel lineage-specific regulators. Notably, we uncovered undescribed transcription factors of an enterochromaffin-like population during in vitro differentiation toward β-like cells. FateCompass provides a valuable framework for hypothesis generation, advancing our understanding of the gene regulatory networks driving cell-fate decisions.
Affiliation(s)
- Sara Jiménez
  - Université de Strasbourg, Strasbourg, France
  - CNRS, UMR 7104, 67400 Illkirch, France
  - INSERM, UMR-S 1258, 67400 Illkirch, France
  - IGBMC, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67400 Illkirch, France
- Valérie Schreiber
  - Université de Strasbourg, Strasbourg, France
  - CNRS, UMR 7104, 67400 Illkirch, France
  - INSERM, UMR-S 1258, 67400 Illkirch, France
  - IGBMC, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67400 Illkirch, France
- Reuben Mercier
  - Université de Strasbourg, Strasbourg, France
  - CNRS, UMR 7104, 67400 Illkirch, France
  - INSERM, UMR-S 1258, 67400 Illkirch, France
  - IGBMC, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67400 Illkirch, France
- Gérard Gradwohl
  - Université de Strasbourg, Strasbourg, France
  - CNRS, UMR 7104, 67400 Illkirch, France
  - INSERM, UMR-S 1258, 67400 Illkirch, France
  - IGBMC, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67400 Illkirch, France
- Nacho Molina
  - Université de Strasbourg, Strasbourg, France
  - CNRS, UMR 7104, 67400 Illkirch, France
  - INSERM, UMR-S 1258, 67400 Illkirch, France
  - IGBMC, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67400 Illkirch, France
4
Arriojas A, Patalano S, Macoska J, Zarringhalam K. A Bayesian Noisy Logic Model for Inference of Transcription Factor Activity from Single Cell and Bulk Transcriptomic Data. bioRxiv 2023:2023.05.03.539308. [PMID: 37205561 PMCID: PMC10187261 DOI: 10.1101/2023.05.03.539308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
The advent of high-throughput sequencing has made it possible to measure the expression of genes at relatively low cost. However, direct measurement of regulatory mechanisms, such as Transcription Factor (TF) activity is still not readily feasible in a high-throughput manner. Consequently, there is a need for computational approaches that can reliably estimate regulator activity from observable gene expression data. In this work, we present a noisy Boolean logic Bayesian model for TF activity inference from differential gene expression data and causal graphs. Our approach provides a flexible framework to incorporate biologically motivated TF-gene regulation logic models. Using simulations and controlled over-expression experiments in cell cultures, we demonstrate that our method can accurately identify TF activity. Moreover, we apply our method to bulk and single cell transcriptomics measurements to investigate transcriptional regulation of fibroblast phenotypic plasticity. Finally, to facilitate usage, we provide user-friendly software packages and a web-interface to query TF activity from user input differential gene expression data: https://umbibio.math.umb.edu/nlbayes/.
Affiliation(s)
- Argenis Arriojas
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
  - Department of Physics, University of Massachusetts Boston, Boston, MA 02125, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
- Susan Patalano
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
- Jill Macoska
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
- Kourosh Zarringhalam
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
5
Wu Y, Xue L, Huang W, Deng M, Lin Y. Profiling transcription factor activity dynamics using intronic reads in time-series transcriptome data. PLoS Comput Biol 2022; 18:e1009762. [PMID: 35007289 PMCID: PMC8782462 DOI: 10.1371/journal.pcbi.1009762] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Revised: 01/21/2022] [Accepted: 12/15/2021] [Indexed: 11/19/2022] Open
Abstract
Activities of transcription factors (TFs) are temporally modulated to regulate dynamic cellular processes, including development, homeostasis, and disease. Recent developments of bioinformatic tools have enabled the analysis of TF activities using transcriptome data. However, because these methods typically use exon-based target expression levels, the estimated TF activities have limited temporal accuracy. To address this, we proposed a TF activity measure based on intron-level information in time-series RNA-seq data, and implemented it to decode the temporal control of TF activities during dynamic processes. We showed that TF activities inferred from intronic reads can better recapitulate instantaneous TF activities compared to the exon-based measure. By analyzing public and our own time-series transcriptome data, we found that intron-based TF activities improve the characterization of temporal phasing of cycling TFs during circadian rhythm, and facilitate the discovery of two temporally opposing TF modules during T cell activation. Collectively, we anticipate that the proposed approach would be broadly applicable for decoding global transcriptional architecture during dynamic processes.
Affiliation(s)
- Yan Wu
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China
- Lingfeng Xue
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Wen Huang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Minghua Deng
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China
- Yihan Lin
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
6
Gallegos JE, Adames NR, Rogers MF, Kraikivski P, Ibele A, Nurzynski-Loth K, Kudlow E, Murali TM, Tyson JJ, Peccoud J. Genetic interactions derived from high-throughput phenotyping of 6589 yeast cell cycle mutants. NPJ Syst Biol Appl 2020; 6:11. [PMID: 32376972 PMCID: PMC7203125 DOI: 10.1038/s41540-020-0134-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/25/2019] [Accepted: 04/06/2020] [Indexed: 11/09/2022] Open
Abstract
Over the last 30 years, computational biologists have developed increasingly realistic mathematical models of the regulatory networks controlling the division of eukaryotic cells. These models capture data resulting from two complementary experimental approaches: low-throughput experiments aimed at extensively characterizing the functions of small numbers of genes, and large-scale genetic interaction screens that provide a systems-level perspective on the cell division process. The former is insufficient to capture the interconnectivity of the genetic control network, while the latter is fraught with irreproducibility issues. Here, we describe a hybrid approach in which the 630 genetic interactions between 36 cell-cycle genes are quantitatively estimated by high-throughput phenotyping with an unprecedented number of biological replicates. Using this approach, we identify a subset of high-confidence genetic interactions, which we use to refine a previously published mathematical model of the cell cycle. We also present a quantitative dataset of the growth rate of these mutants under six different media conditions in order to inform future cell cycle models.
Affiliation(s)
- Jenna E Gallegos
  - Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- Neil R Adames
  - Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
  - New Culture, Inc., San Francisco, CA, USA
- Pavel Kraikivski
  - Virginia Tech, Academy of Integrated Sciences, Blacksburg, VA, USA
- Aubrey Ibele
  - Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- Kevin Nurzynski-Loth
  - Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- Eric Kudlow
  - Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- T M Murali
  - Virginia Tech, Computer Science, Blacksburg, VA, USA
- John J Tyson
  - Virginia Tech, Biological Sciences, Blacksburg, VA, USA
- Jean Peccoud
  - Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
  - GenoFAB, Inc., Fort Collins, CO, USA
7
Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, Joughin BA, Stegle O, Lauffenburger DA, Heyn H, Szalai B, Saez-Rodriguez J. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol 2020; 21:36. [PMID: 32051003 PMCID: PMC7017576 DOI: 10.1186/s13059-020-1949-z] [Citation(s) in RCA: 196] [Impact Index Per Article: 39.2] [Received: 09/03/2019] [Accepted: 01/29/2020] [Indexed: 12/31/2022] Open
Abstract
Background: Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events and low library sizes. It is thus not clear if functional TF and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way. Results: To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We include the bulk-RNA tools PROGENy, GO enrichment, and DoRothEA that estimate pathway and transcription factor (TF) activities, respectively, and compare them against the tools SCENIC/AUCell and metaVIPER, designed for scRNA-seq. For the in silico study, we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments. We complement the simulated data with real scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks on simulated and real data reveal comparable performance to the original bulk data. Additionally, we show that the TF and pathway activities preserve cell type-specific variability by analyzing a mixture sample sequenced with 13 scRNA-seq protocols. We also provide the benchmark data for further use by the community. Conclusions: Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint gene sets can be applied to scRNA-seq data, partially outperforming dedicated single-cell tools. Furthermore, we find that the performance of functional analysis tools is more sensitive to the gene sets than to the statistic used.
Affiliation(s)
- Christian H Holland
  - Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, Aachen, Germany
- Jovan Tanevski
  - Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
- Javier Perales-Patón
  - Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany
- Jan Gleixner
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Manu P Kumar
  - Department of Biological Engineering, MIT, Cambridge, MA, USA
- Elisabetta Mereu
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Brian A Joughin
  - Department of Biological Engineering, MIT, Cambridge, MA, USA
  - Koch Institute for Integrative Cancer Biology, MIT, Cambridge, MA, USA
- Oliver Stegle
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Holger Heyn
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bence Szalai
  - Faculty of Medicine, Department of Physiology, Semmelweis University, Budapest, Hungary
- Julio Saez-Rodriguez
  - Institute for Computational Biomedicine, Bioquant, Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Heidelberg, Germany
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, Aachen, Germany
8
Li P, Guo M, Sun B. Integration of multi-omics data to mine cancer-related gene modules. J Bioinform Comput Biol 2020; 17:1950038. [PMID: 32019413 DOI: 10.1142/s0219720019500380] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/31/2023]
Abstract
The identification of cancer-related genes is a major research goal, with implications for determining the pathogenesis of cancer and identifying biomarkers for early diagnosis and treatment. In this study, by integrating multi-omics data, including gene expression, DNA copy number variation, DNA methylation, transcription factors, miRNA, and lncRNA data, we propose a method for mining cancer-related genes based on network models. First, using a random forest-based feature selection method, multi-omics data are integrated to identify key regulatory factors that affect gene expression, and genome-wide regulatory networks are then constructed. Next, by comparing the regulatory networks of key candidate genes in variant samples and non-variant samples, a differential expression regulatory network is generated. The differential network contains a collection of abnormal regulatory genes of key candidate genes. Then, by introducing functional similarity as a distance metric for gene sets, a density-based clustering method is used to mine gene modules related to cancer. We applied this method to LUSC (lung squamous cell carcinoma) and mined cancer-related gene modules composed of 20 genes. GO function and KEGG pathway analyses indicated that the modules were closely related to cancer. A survival analysis was used to verify that the mined gene modules can effectively distinguish between high- and low-risk groups. Overall, these results suggest that the proposed method can be used to identify cancer-related gene modules, providing a basis for the development of biomarkers for diagnosis and treatment.
Affiliation(s)
- Peng Li
  - School of Artificial Intelligence, Beijing Normal University, Beijing 100875, P. R. China
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, P. R. China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, P. R. China
- Bo Sun
  - School of Artificial Intelligence, Beijing Normal University, Beijing 100875, P. R. China
9
Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res 2019; 29:1363-1375. [PMID: 31340985 PMCID: PMC6673718 DOI: 10.1101/gr.240663.118] [Citation(s) in RCA: 536] [Impact Index Per Article: 89.3] [Received: 06/18/2018] [Accepted: 05/28/2019] [Indexed: 12/25/2022]
Abstract
The prediction of transcription factor (TF) activities from the gene expression of their targets (i.e., TF regulon) is becoming a widely used approach to characterize the functional status of transcriptional regulatory circuits. Several strategies and data sets have been proposed to link the target genes likely regulated by a TF, each one providing a different level of evidence. The most established ones are (1) manually curated repositories, (2) interactions derived from ChIP-seq binding data, (3) in silico prediction of TF binding on gene promoters, and (4) reverse-engineered regulons from large gene expression data sets. However, it is not known how these different sources of regulons affect the TF activity estimations and, thereby, downstream analysis and interpretation. Here we compared the accuracy and biases of these strategies to define human TF regulons by means of their ability to predict changes in TF activities in three reference benchmark data sets. We assembled a collection of TF-target interactions for 1541 human TFs and evaluated how different molecular and regulatory properties of the TFs, such as the DNA-binding domain, specificities, or mode of interaction with the chromatin, affect the predictions of TF activity. We assessed their coverage and found little overlap on the regulons derived from each strategy and better performance by literature-curated information followed by ChIP-seq data. We provide an integrated resource of all TF-target interactions derived through these strategies, with confidence scores, as a resource for enhanced prediction of TF activities.
Affiliation(s)
- Luz Garcia-Alonso
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
  - Open Targets, Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
- Christian H Holland
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany
- Mahmoud M Ibrahim
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Department of Nephrology, RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
- Denes Turei
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany
- Julio Saez-Rodriguez
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
  - Open Targets, Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany
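As an illustrative aside on entry 9 (Garcia-Alonso et al.), the general idea of scoring a TF from the expression of its regulon, after filtering TF-target interactions by confidence, might look like the sketch below. It is a deliberately simplified stand-in and not the authors' method: the regulon tuples, the letter-coded confidence levels, the signed-mean statistic and the function name `tf_activity` are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical regulon entries: (TF, target gene, mode of regulation, confidence).
# Mode is +1 for activation and -1 for repression; confidence letters are an
# assumed encoding in which "A" is the best supported level.
regulon = [
    ("TF1", "geneA", +1, "A"),
    ("TF1", "geneB", +1, "B"),
    ("TF1", "geneC", -1, "A"),
    ("TF1", "geneD", +1, "E"),
    ("TF2", "geneD", +1, "A"),
]


def tf_activity(tf, expression, regulon, allowed_levels=("A", "B", "C")):
    """Signed mean expression of a TF's targets, restricted to the allowed
    confidence levels. Returns None when no usable target is found."""
    signed = [
        mode * expression[target]
        for source, target, mode, level in regulon
        if source == tf and level in allowed_levels and target in expression
    ]
    return mean(signed) if signed else None


# Invented expression z-scores for one sample (not real data).
expression = {"geneA": 1.5, "geneB": 0.5, "geneC": -1.0, "geneD": -3.0}

score = tf_activity("TF1", expression, regulon)
print("TF1 activity:", score)
```

Tightening or loosening `allowed_levels` changes which targets contribute, which is one simple way to see how the choice of regulon source could influence the resulting activity estimate.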
10
Estimation of Transcription Factor Activity in Knockdown Studies. Sci Rep 2019; 9:9593. [PMID: 31270369 PMCID: PMC6610105 DOI: 10.1038/s41598-019-46053-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/26/2018] [Accepted: 06/20/2019] [Indexed: 11/24/2022] Open
Abstract
Numerous methods have been developed to infer the actual regulatory events in a sample. A prominent class of methods models genome-wide gene expression as linear equations derived from a transcription factor (TF)-gene network and optimizes parameters to fit the measured expression intensities. We apply four such methods to experiments with a TF knockdown (KD) in human and E. coli. The transcriptome data provide clear expression signals and thus represent an extremely favorable test setting. The methods estimate activity changes of all TFs, which we expect to be highest in the KD TF. However, the KD TF ranked in the top 5% in only 15 out of 54 cases. We show that this poor overall performance cannot be attributed to a low effectiveness of the knockdown or to the specific regulatory network provided as background knowledge. Further, the ranks of regulators related to the KD TF by the network or pathway are not significantly different from a random selection. In general, the overlaps between the results of different methods are small, indicating that they draw very different conclusions when presented with the same, presumably simple, inference problem. These results show that the investigated methods cannot yield robust TF activity estimates in knockdown schemes.
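As an illustrative aside on entry 10, the class of methods its abstract refers to, fitting TF activities so that a linear combination over a TF-gene network reproduces measured expression, might be sketched with ordinary least squares. This is a minimal sketch under stated assumptions and does not reproduce any of the four benchmarked methods: the signed network matrix, the expression changes and the helper names (`solve`, `tf_activities`) are invented for illustration.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (assumes a is non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def tf_activities(network, expression):
    """Least-squares TF activities a minimizing ||N a - e||^2, obtained from
    the normal equations (N^T N) a = N^T e."""
    n_tfs = len(network[0])
    ntn = [
        [sum(row[i] * row[j] for row in network) for j in range(n_tfs)]
        for i in range(n_tfs)
    ]
    nte = [
        sum(row[i] * e for row, e in zip(network, expression))
        for i in range(n_tfs)
    ]
    return solve(ntn, nte)


# Hypothetical signed network: rows are genes, columns are TFs (TF_A, TF_B).
tfs = ["TF_A", "TF_B"]
network = [
    [1, 0],
    [1, 0],
    [0, 1],
    [-1, 1],
]
# Invented expression changes after a simulated knockdown of TF_A.
expression = [-2.0, -2.0, 0.5, 2.5]

activities = tf_activities(network, expression)
# Rank TFs by absolute activity change; in a knockdown benchmark the
# perturbed TF would be expected near the top of this ranking.
ranked = sorted(zip(tfs, activities), key=lambda pair: abs(pair[1]), reverse=True)
print(ranked)
```

The toy data are noise-free by construction, so the perturbed TF ranks first here; the abstract's point is that on real knockdown data this expectation frequently fails.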
11
Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif Intell Med 2019; 95:133-145. [DOI: 10.1016/j.artmed.2018.10.006] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 06/21/2018] [Revised: 10/23/2018] [Accepted: 10/23/2018] [Indexed: 01/14/2023]
12
Chen Y, Widschwendter M, Teschendorff AE. Systems-epigenomics inference of transcription factor activity implicates aryl-hydrocarbon-receptor inactivation as a key event in lung cancer development. Genome Biol 2017; 18:236. [PMID: 29262847 PMCID: PMC5738803 DOI: 10.1186/s13059-017-1366-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/05/2017] [Accepted: 11/27/2017] [Indexed: 12/25/2022] Open
Abstract
Background: Diverse molecular alterations associated with smoking in normal and precursor lung cancer cells have been reported, yet their role in lung cancer etiology remains unclear. A prominent example is hypomethylation of the aryl hydrocarbon-receptor repressor (AHRR) locus, which is observed in blood and squamous epithelial cells of smokers, but not in lung cancer. Results: Using a novel systems-epigenomics algorithm, called SEPIRA, which leverages the power of a large RNA-sequencing expression compendium to infer regulatory activity from messenger RNA expression or DNA methylation (DNAm) profiles, we infer the landscape of binding activity of lung-specific transcription factors (TFs) in lung carcinogenesis. We show that lung-specific TFs become preferentially inactivated in lung cancer and precursor lung cancer lesions and further demonstrate that these results can be derived using only DNAm data. We identify subsets of TFs which become inactivated in precursor cells. Among these regulatory factors, we identify AHR, the aryl hydrocarbon-receptor which controls a healthy immune response in the lung epithelium and whose repressor, AHRR, has recently been implicated in smoking-mediated lung cancer. In addition, we identify FOXJ1, a TF which promotes growth of airway cilia and effective clearance of the lung airway epithelium from carcinogens. Conclusions: We identify TFs, such as AHR, which become inactivated in the earliest stages of lung cancer and which, unlike AHRR hypomethylation, are also inactivated in lung cancer itself. The novel systems-epigenomics algorithm SEPIRA will be useful to the wider epigenome-wide association study community as a means of inferring regulatory activity.
Affiliation(s)
- Yuting Chen
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, 320 Yue Yang Road, Shanghai, 200031, China
- Martin Widschwendter
  - Department of Women's Cancer, University College London, 74 Huntley Street, London, WC1E 6AU, UK
- Andrew E Teschendorff
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, 320 Yue Yang Road, Shanghai, 200031, China
  - Department of Women's Cancer, University College London, 74 Huntley Street, London, WC1E 6AU, UK
  - UCL Cancer Institute, University College London, Paul O'Gorman Building, 72 Huntley Street, London, WC1E 6BT, UK