1. Skok Gibbs C, Mahmood O, Bonneau R, Cho K. PMF-GRN: a variational inference approach to single-cell gene regulatory network inference using probabilistic matrix factorization. Genome Biol 2024; 25:88. [PMID: 38589899 PMCID: PMC11003171 DOI: 10.1186/s13059-024-03226-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 03/26/2024] [Indexed: 04/10/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.
Affiliation(s)
- Omar Mahmood
- Center for Data Science, New York University, New York, NY, 10011, USA
- Richard Bonneau
- Center for Data Science, New York University, New York, NY, 10011, USA
- Prescient Design, Genentech, New York, NY, 10010, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, NY, 10011, USA.
- Prescient Design, Genentech, New York, NY, 10010, USA.
2. Hecker D, Lauber M, Behjati Ardakani F, Ashrafiyan S, Manz Q, Kersting J, Hoffmann M, Schulz MH, List M. Computational tools for inferring transcription factor activity. Proteomics 2023; 23:e2200462. [PMID: 37706624 DOI: 10.1002/pmic.202200462] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/17/2023] [Revised: 08/11/2023] [Accepted: 08/22/2023] [Indexed: 09/15/2023]
Abstract
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions experimentally is laborious and convoluted. Thus, computational tools are continually being developed that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. We give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.
Affiliation(s)
- Dennis Hecker
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Michael Lauber
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Fatemeh Behjati Ardakani
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Shamim Ashrafiyan
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Quirin Manz
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johannes Kersting
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- GeneSurge GmbH, München, Germany
- Markus Hoffmann
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Institute for Advanced Study, Technical University of Munich, Garching, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Marcel H Schulz
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Markus List
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
3. Zhang X, Wang K, Dai H, Cai J, Liu Y, Yin C, Wu J, Li X, Wu G, Lu A, Liu Q, Guan D. Quantification of promoting efficiency and reducing toxicity of Traditional Chinese Medicine: A case study of the combination of Tripterygium wilfordii hook. f. and Lysimachia christinae hance in the treatment of lung cancer. Front Pharmacol 2022; 13:1018273. [PMID: 36339610 PMCID: PMC9631451 DOI: 10.3389/fphar.2022.1018273] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/13/2022] [Accepted: 10/07/2022] [Indexed: 11/13/2022] Open
Abstract
Traditional Chinese medicine (TCM) usually acts in the form of compound prescriptions in the treatment of complex diseases. The herbs contained in each prescription have the dual nature of efficiency and toxicity due to their complex chemical component, and the principle of prescription is usually to increase efficiency and reduce toxicity. At present, the studies on prescriptions have mainly focused on the consideration of the material basis and possible mechanism of the action mode, but the quantitative research on the compatibility rule of increasing efficiency and reducing toxicity is still the tip of the iceberg. With the extensive application of computational pharmacology technology in the research of TCM prescriptions, it is possible to quantify the mechanism of synergism and toxicity reduction of the TCM formula. Currently, there are some classic drug pairs commonly used to treat complex diseases, such as Tripterygium wilfordii Hook. f. with Lysimachia christinae Hance for lung cancer, Aconitum carmichaelii Debeaux with Glycyrrhiza uralensis Fisch. in the treatment of coronary heart disease, but there is a lack of systematic quantitative analysis model and strategy to quantitatively study the compatibility rule and potential mechanism of synergism and toxicity reduction. To address this issue, we designed an integrated model which integrates matrix decomposition and shortest path propagation, taking into account both the crosstalk of the effective network and the propagation characteristics. With the integrated model strategy, we can quantitatively detect the possible mechanisms of synergism and attenuation of Tripterygium wilfordii Hook. f. and Lysimachia christinae Hance in the treatment of lung cancer. The results showed the compatibility of Tripterygium wilfordii Hook. f. and Lysimachia christinae Hance could increase the efficacy and decrease the toxicity of lung cancer treatment through MAPK pathway and PD-1 checkpoint pathway in lung cancer.
Affiliation(s)
- Xiaoyi Zhang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Kexin Wang
- Guangdong Provincial Key Laboratory on Brain Function Repair and Regeneration, Department of Neurosurgery, National Key Clinical Specialty/Engineering Technology Research Center of Education Ministry of China, Neurosurgery Institute, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Institute of Integrated Bioinformedicine and Translational Science, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Hui Dai
- Hospital Office, Ganzhou People’s Hospital, Ganzhou, China
- Hospital Office, Ganzhou Hospital-Nanfang Hospital, Southern Medical University, Guangdong, China
- Jieqi Cai
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Yujie Liu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Chuanhui Yin
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Jie Wu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Xiaowei Li
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Guiyong Wu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Aiping Lu
- Institute of Integrated Bioinformedicine and Translational Science, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Guangdong-Hong Kong-Macau Joint Lab on Chinese Medicine and Immune Disease Research, Guangzhou, China
- Qinwen Liu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
- Daogang Guan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
4. Exploring the changes of brain immune microenvironment in Alzheimer's disease based on PANDA algorithm combined with blood brain barrier injury-related genes. Biochem Biophys Res Commun 2021; 557:159-165. [PMID: 33865224 DOI: 10.1016/j.bbrc.2021.04.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2021] [Accepted: 04/05/2021] [Indexed: 02/07/2023]
Abstract
Studies have shown that the specific entry of peripheral cells into the brain parenchyma caused by blood-brain barrier (BBB) injury and the imbalance of the immune microenvironment in the brain are closely related to the pathogenesis of Alzheimer's disease (AD). Because data from inside the brain are difficult to obtain, there is an urgent need for machine learning methods that relate peripheral and intracerebral data and clarify their influence on the development of AD. However, in practical algorithm design, it remains a challenge to extract relevant information from a variety of data to establish a complete and accurate regulatory network. To overcome these difficulties, we present a method based on a message-passing model (Passing Attributes between Networks for Data Assimilation, PANDA) to discover correlations between the inside and outside of the brain via BBB injury-related genes, and to further explore their regulatory mechanisms in the brain immune environment in AD pathology. Biological analysis of the results showed that pathways such as the immune response, inflammatory response and chemokine signaling pathways are closely related to the pathogenesis of AD. In particular, significant genes such as RELA, LAMA4 and PPBP were found to play roles in BBB injury and the change of permeability in AD patients, thus leading to changes in the immune microenvironment of the AD brain.
5. Shi M, Tan S, Xie XP, Li A, Yang W, Zhu T, Wang HQ. Globally learning gene regulatory networks based on hidden atomic regulators from transcriptomic big data. BMC Genomics 2020; 21:711. [PMID: 33054712 PMCID: PMC7559338 DOI: 10.1186/s12864-020-07079-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/02/2020] [Accepted: 09/18/2020] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: Genes are regulated by various types of regulators, most of which are still unknown or unobserved. Current methods for reverse engineering gene regulatory networks (GRNs) often neglect the unknown regulators and infer regulatory relationships in a local and sub-optimal manner. RESULTS: This paper proposes a global GRN inference framework based on dictionary learning, named dlGRN. The method intends to learn atomic regulators (ARs) from gene expression data using a modified dictionary learning (DL) algorithm, which reflects the whole gene regulatory system, and predicts the regulation between a known regulator and a target gene in a global regression way. The modified DL algorithm fits the scale-free property of biological networks, enabling dlGRN to intrinsically discern direct and indirect regulations. CONCLUSIONS: Extensive experimental results on simulation and real-world data demonstrate the effectiveness and efficiency of dlGRN in reverse engineering GRNs. A novel predicted transcriptional regulation between the TF TFAP2C and the oncogene EGFR was experimentally verified in lung cancer cells. Furthermore, the real application reveals the prevalence of DNA methylation regulation in the gene regulatory system. dlGRN can be a standalone tool for GRN inference for its globalization and robustness.
Affiliation(s)
- Ming Shi
- MICB Laboratory, Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China
- Current Address: MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China
- Sheng Tan
- The CAS Key Laboratory of Innate Immunity and Chronic Disease, Division of Life Sciences and Medicine, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui, 230026, P. R. China
- Xin-Ping Xie
- School of Mathematics and Physics, Anhui Jianzhu University, 856 Jinzhai Road, Hefei, Anhui, 230022, P. R. China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui, 230026, P. R. China
- Wulin Yang
- Cancer hospital & Anhui Province Key Laboratory of Medical Physics and Technology, Center of Medical Physics and Technology, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China
- Tao Zhu
- Current Address: MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China.
- Hong-Qiang Wang
- MICB Laboratory, Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China.
- Cancer hospital & Anhui Province Key Laboratory of Medical Physics and Technology, Center of Medical Physics and Technology, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China.
6. Mignone P, Pio G, D'Elia D, Ceci M. Exploiting transfer learning for the reconstruction of the human gene regulatory network. Bioinformatics 2020; 36:1553-1561. [PMID: 31608946 DOI: 10.1093/bioinformatics/btz781] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/12/2019] [Revised: 09/13/2019] [Accepted: 10/09/2019] [Indexed: 01/26/2023] Open
Abstract
MOTIVATION: The reconstruction of gene regulatory networks (GRNs) from gene expression data has received increasing attention in recent years, due to its usefulness in the understanding of regulatory mechanisms involved in human diseases. Most of the existing methods reconstruct the network through machine learning approaches, by analyzing known examples of interactions. However, (i) they often produce poor results when the amount of labeled examples is limited, or when no negative example is available and (ii) they are not able to exploit information extracted from GRNs of other (better studied) related organisms, when this information is available. RESULTS: In this paper, we propose a novel machine learning method that overcomes these limitations, by exploiting the knowledge about the GRN of a source organism for the reconstruction of the GRN of the target organism, by means of a novel transfer learning technique. Moreover, the proposed method is natively able to work in the positive-unlabeled setting, where no negative example is available, by fruitfully exploiting a (possibly large) set of unlabeled examples. In our experiments, we reconstructed the human GRN, by exploiting the knowledge of the GRN of Mus musculus. Results showed that the proposed method outperforms state-of-the-art approaches and identifies previously unknown functional relationships among the analyzed genes. AVAILABILITY AND IMPLEMENTATION: http://www.di.uniba.it/~mignone/systems/biosfer/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Paolo Mignone
- Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy
- National Interuniversity Consortium for Informatics (CINI), Roma 00185, Italy
- Gianvito Pio
- Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy
- National Interuniversity Consortium for Informatics (CINI), Roma 00185, Italy
- Domenica D'Elia
- Institute for Biomedical Technologies, CNR, Institute for Biomedical Technologies, Bari 70126, Italy
- Michelangelo Ceci
- Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy
- National Interuniversity Consortium for Informatics (CINI), Roma 00185, Italy
- Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana 1000, Slovenia
7. Sun Q, Kong W, Mou X, Wang S. Transcriptional Regulation Analysis of Alzheimer's Disease Based on FastNCA Algorithm. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190919150411] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022]
Abstract
Background: Understanding the relationship between genetic variation and gene expression is a central issue in genetics. Although many studies have identified genetic variations associated with gene expression, it is unclear how they perturb the underlying regulatory network of gene expression. Objective: To explore how genetic variations perturb potential transcriptional regulation networks of Alzheimer’s disease (AD) to paint a more complete picture of the complex landscape of transcription regulation. Methods: Fast network component analysis (FastNCA), which can capture the genetic variations in the form of single nucleotide polymorphisms (SNPs), is applied to analyse the expression activities of transcription factors (TFs) and their regulatory strengths on target genes (TGs) using microarray and RNA-seq data of AD. Then, multi-data fusion analysis was used to analyze the different TGs regulated by the same TFs in the different data by constructing the transcriptional regulatory networks of differentially expressed genes. Results: Although the TGs regulated by the same TFs are not necessarily identical in different data, they may be involved in the same pathways that are closely related to the pathogenesis of AD, such as immune response, signal transduction and cytokine-cytokine receptor interaction pathways. Even if they are involved in different pathways, these pathways are also confirmed to have a potential link with AD. Conclusion: The study shows that the pathways of different TGs regulated by the same TFs in different data are all closely related to AD. Multi-data fusion analysis can be complementary to some extent and yield more comprehensive results in the process of exploring the pathogenesis of AD.
Affiliation(s)
- Qianni Sun
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Wei Kong
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Xiaoyang Mou
- Department of Biochemistry, Rowan University and Guava Medicine, Glassboro, New Jersey 08028, United States
- Shuaiqun Wang
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
8. Poon ENY, Hao B, Guan D, Jun Li M, Lu J, Yang Y, Wu B, Wu SCM, Webb SE, Liang Y, Miller AL, Yao X, Wang J, Yan B, Boheler KR. Integrated transcriptomic and regulatory network analyses identify microRNA-200c as a novel repressor of human pluripotent stem cell-derived cardiomyocyte differentiation and maturation. Cardiovasc Res 2019; 114:894-906. [PMID: 29373717 DOI: 10.1093/cvr/cvy019] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 09/28/2017] [Accepted: 01/22/2018] [Indexed: 11/12/2022] Open
Abstract
Aims: MicroRNAs (miRNAs) are crucial for the post-transcriptional control of protein-encoding genes and together with transcription factors (TFs) regulate gene expression; however, the regulatory activities of miRNAs during cardiac development are only partially understood. In this study, we tested the hypothesis that integrative computational approaches could identify miRNAs that experimentally could be shown to regulate cardiomyogenesis. Methods and results: We integrated expression profiles with bioinformatics analyses of miRNA and TF regulatory programs to identify candidate miRNAs involved with cardiac development. Expression profiling showed that miR-200c, which is not normally detected in adult heart, is progressively down-regulated both during cardiac development and in vitro differentiation of human embryonic stem cells (hESCs) to cardiomyocytes (CMs). We employed computational methodologies to predict target genes of both miR-200c and five key cardiac TFs to identify co-regulated gene networks. The inferred cardiac networks revealed that the cooperative action of miR-200c with these five key TFs, including three (GATA4, SRF and TBX5) targeted by miR-200c, should modulate key processes and pathways necessary for CM development and function. Experimentally, over-expression (OE) of miR-200c in hESC-CMs reduced the mRNA levels of GATA4, SRF and TBX5. Cardiac expression of Ca2+, K+ and Na+ ion channel genes (CACNA1C, KCNJ2 and SCN5A) was also significantly altered by knockdown or OE of miR-200c. Luciferase reporter assays validated miR-200c binding sites on the 3' untranslated region of CACNA1C. In hESC-CMs, elevated miR-200c increased beating frequency and repressed both Ca2+ influx mediated by the L-type Ca2+ channel and Ca2+ transients. Conclusions: Our analyses demonstrate that miR-200c represses hESC-CM differentiation and maturation. The integrative computational and experimental approaches described here, when applied more broadly, will enhance our understanding of the interplay between miRNAs and TFs in controlling cardiac development and disease processes.
Affiliation(s)
- Ellen Ngar-Yun Poon
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Baixia Hao
- Division of Life Science and State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Daogang Guan
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Centre of Genomics Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Jun Lu
- School of Biomedical Sciences, LSK Institute of Health Science, The Chinese University of Hong Kong, Hong Kong, China
- Yong Yang
- Laboratory for Food Safety and Environmental Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Ave, Shenzhen, Guangdong 518055, China
- Binbin Wu
- Laboratory for Food Safety and Environmental Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Ave, Shenzhen, Guangdong 518055, China
- Stanley Chun-Ming Wu
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Sarah E Webb
- Division of Life Science and State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Yan Liang
- Laboratory for Food Safety and Environmental Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Ave, Shenzhen, Guangdong 518055, China
- Andrew L Miller
- Division of Life Science and State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Marine Biology Laboratory, Woods Hole, MA 02543, USA
- Xiaoqiang Yao
- School of Biomedical Sciences, LSK Institute of Health Science, The Chinese University of Hong Kong, Hong Kong, China
- Junwen Wang
- Centre of Genomics Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Center for Individualized Medicine, Department of Health Sciences Research, Mayo Clinic, Scottsdale, AZ 85259, USA
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA
- Bin Yan
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Centre of Genomics Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Laboratory for Food Safety and Environmental Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Ave, Shenzhen, Guangdong 518055, China
- Kenneth R Boheler
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Division of Cardiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
9. Noor A, Ahmad A, Serpedin E. SparseNCA: Sparse Network Component Analysis for Recovering Transcription Factor Activities with Incomplete Prior Information. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:387-395. [PMID: 26529780 DOI: 10.1109/tcbb.2015.2495224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Network component analysis (NCA) is an important method for inferring transcriptional regulatory networks (TRNs) and recovering transcription factor activities (TFAs) using gene expression data and prior information about the connectivity matrix. The algorithms currently available crucially depend on the completeness of this prior information. However, inaccuracies in the measurement process may leave the available knowledge about the connectivity matrix incomplete. Hence, computationally efficient algorithms are needed to overcome the possible incompleteness in the available data. We present a sparse network component analysis algorithm (sparseNCA), which incorporates the effect of incompleteness in the estimation of TRNs by imposing an additional sparsity constraint using the norm, which results in a greater estimation accuracy. In order to improve the computational efficiency, an iterative re-weighted method is proposed for the NCA problem which not only promotes sparsity but is hundreds of times faster than the norm based solution. The performance of sparseNCA is rigorously compared to that of FastNCA and NINCA using synthetic data as well as real data. It is shown that sparseNCA outperforms the existing state-of-the-art algorithms both in terms of estimation accuracy and consistency with the added advantage of low computational complexity. The performance of sparseNCA compared to its predecessors is particularly pronounced in case of incomplete prior information about the sparsity of the network. Subnetwork analysis is performed on the E. coli data which reiterates the superior consistency of the proposed algorithm.
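Illustrative note: the basic NCA decomposition that this abstract builds on can be sketched as below. This is a minimal sketch under stated assumptions, not the sparseNCA algorithm and not the authors' code: it assumes the usual NCA model E ≈ A P (expression = connectivity × TF activity), fits it by plain alternating least squares, and omits the sparsity penalty and re-weighting described above. The function name `nca_als`, the variable names, and the synthetic data are all hypothetical.

```python
import numpy as np

def nca_als(E, mask, n_iter=50, seed=0):
    """Fit E ~= A @ P by alternating least squares.

    E    : (genes x samples) expression matrix
    mask : (genes x TFs) boolean prior; A may be non-zero only where mask is True
    """
    rng = np.random.default_rng(seed)
    n_genes, _ = mask.shape
    # Random start that already respects the prior connectivity pattern.
    A = np.where(mask, rng.normal(size=mask.shape), 0.0)
    for _ in range(n_iter):
        # With A fixed, the TF activities P are an ordinary least-squares fit.
        P, *_ = np.linalg.lstsq(A, E, rcond=None)
        # With P fixed, refit each gene's weights on its allowed TFs only.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size == 0:
                continue
            coef, *_ = np.linalg.lstsq(P[idx].T, E[g], rcond=None)
            A[g, idx] = coef
    # Refresh P so that it matches the final A.
    P, *_ = np.linalg.lstsq(A, E, rcond=None)
    return A, P

# Synthetic example (hypothetical sizes): 30 genes, 3 TFs, 20 samples.
rng = np.random.default_rng(1)
mask = rng.random((30, 3)) < 0.5
mask[:3] = np.eye(3, dtype=bool)  # make sure every TF has at least one target
A_true = np.where(mask, rng.normal(size=mask.shape), 0.0)
P_true = rng.normal(size=(3, 20))
E = A_true @ P_true

A_hat, P_hat = nca_als(E, mask)
rel_err = np.linalg.norm(E - A_hat @ P_hat) / np.linalg.norm(E)
print(f"relative reconstruction error: {rel_err:.3g}")
```

The recovered A and P are only defined up to a per-TF scaling, so comparisons with a ground truth would need a normalization step that this sketch leaves out.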
10. An integrative method to decode regulatory logics in gene transcription. Nat Commun 2017; 8:1044. [PMID: 29051499 PMCID: PMC5715098 DOI: 10.1038/s41467-017-01193-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 01/20/2016] [Accepted: 08/25/2017] [Indexed: 12/27/2022] Open
Abstract
Modeling of transcriptional regulatory networks (TRNs) has been increasingly used to dissect the nature of gene regulation. Inference of regulatory relationships among transcription factors (TFs) and genes, especially among multiple TFs, is still challenging. In this study, we introduced an integrative method, LogicTRN, to decode TF–TF interactions that form TF logics in regulating target genes. By combining cis-regulatory logics and transcriptional kinetics into one single model framework, LogicTRN can naturally integrate dynamic gene expression data and TF-DNA-binding signals in order to identify the TF logics and to reconstruct the underlying TRNs. We evaluated the newly developed methodology using simulation, comparison and application studies, and the results not only show its consistency with existing knowledge, but also demonstrate its ability to accurately reconstruct TRNs in complex biological systems. Existing transcriptional regulatory network models fall short of deciphering the cooperation between multiple transcription factors on dynamic gene expression. Here the authors develop an integrative method that combines gene expression and transcription factor-DNA binding data to decode transcription regulatory logics.
Collapse
|
11
|
Local network component analysis for quantifying transcription factor activities. Methods 2017; 124:25-35. [PMID: 28710010 DOI: 10.1016/j.ymeth.2017.06.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/30/2017] [Revised: 05/02/2017] [Accepted: 06/17/2017] [Indexed: 12/16/2022]
Abstract
Transcription factors (TFs) could regulate physiological transitions or determine stable phenotypic diversity. The accurate estimation of TF regulatory signals or functional activities is of great significance to guide biological experiments or elucidate molecular mechanisms, but still remains challenging. Traditional methods identify TF regulatory signals at the population level, which masks heterogeneous regulation mechanisms in individuals or subgroups, thus resulting in inaccurate analyses. Here, we propose a novel computational framework, namely local network component analysis (LNCA), to exploit data heterogeneity and automatically quantify accurate transcription factor activity (TFA) in practical terms, through integrating the partitioned expression sets (i.e., local information) and prior TF-gene regulatory knowledge. Specifically, LNCA adopts an adaptive optimization strategy, which evaluates the local similarities of regulation controls and corrects biases during data integration, to construct the TFA landscape. In particular, we first numerically demonstrate the effectiveness of LNCA for the simulated data sets, compared with traditional methods, such as FastNCA, ROBNCA and NINCA. Then, we apply our model to two real data sets with implicit temporal or spatial regulation variations. The results show that LNCA not only recognizes the periodic mode along the S. cerevisiae cell cycle process, but also substantially outperforms other methods in terms of accuracy and consistency. In addition, the cross-validation study for glioblastoma multiforme (GBM) indicates that the TFAs, identified by LNCA, can better distinguish clinically distinct tumor groups than the expression values of the corresponding TFs, thus opening a new way to classify tumor subtypes and also providing a novel insight into cancer heterogeneity.
AVAILABILITY: LNCA was implemented as a Matlab package, which is available at http://sysbio.sibcb.ac.cn/cb/chenlab/software.htm/LNCApackage_0.1.rar.
|
12
|
Zhang F, Liu R, Zheng J. Sig2GRN: a software tool linking signaling pathway with gene regulatory network for dynamic simulation. BMC Systems Biology 2016; 10:123. [PMID: 28155685 PMCID: PMC5259907 DOI: 10.1186/s12918-016-0365-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023]
Abstract
Background: Linking computational models of signaling pathways to predicted cellular responses such as gene expression regulation is a major challenge in computational systems biology. In this work, we present Sig2GRN, a Cytoscape plugin that is able to simulate time-course gene expression data given the user-defined external stimuli to the signaling pathways. Methods: A generalized logical model is used in modeling the upstream signaling pathways. Then a Boolean model and a thermodynamics-based model are employed to predict the downstream changes in gene expression based on the simulated dynamics of transcription factors in signaling pathways. Results: Our empirical case studies show that the simulation of Sig2GRN can predict changes in gene expression patterns induced by DNA damage signals and drug treatments. Conclusions: As a software tool for modeling cellular dynamics, Sig2GRN can facilitate studies in systems biology by hypotheses generation and wet-lab experimental design. Availability: http://histone.scse.ntu.edu.sg/Sig2GRN/
Affiliation(s)
- Fan Zhang
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Runsheng Liu
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Jie Zheng
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Complexity Institute, Nanyang Technological University, Singapore, 637723, Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672, Singapore
|
13
|
Wang X, Alshawaqfeh M, Dang X, Wajid B, Noor A, Qaraqe M, Serpedin E. An Overview of NCA-Based Algorithms for Transcriptional Regulatory Network Inference. Microarrays 2015; 4:596-617. [PMID: 27600242 PMCID: PMC4996402 DOI: 10.3390/microarrays4040596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/01/2015] [Revised: 10/07/2015] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
In systems biology, the regulation of gene expressions involves a complex network of regulators. Transcription factors (TFs) represent an important component of this network: they are proteins that control which genes are turned on or off in the genome by binding to specific DNA sequences. Transcription regulatory networks (TRNs) describe gene expressions as a function of regulatory inputs specified by interactions between proteins and DNA. A complete understanding of TRNs helps to predict a variety of biological processes and to diagnose, characterize and eventually develop more efficient therapies. Recent advances in biological high-throughput technologies, such as DNA microarray data and next-generation sequence (NGS) data, have made the inference of transcription factor activities (TFAs) and TF-gene regulations possible. Network component analysis (NCA) represents an efficient computational framework for TRN inference from the information provided by microarrays, ChIP-on-chip and the prior information about TF-gene regulation. However, NCA suffers from several shortcomings. Recently, several algorithms based on the NCA framework have been proposed to overcome these shortcomings. This paper first overviews the computational principles behind NCA, and then, it surveys the state-of-the-art NCA-based algorithms proposed in the literature for TRN reconstruction.
Affiliation(s)
- Xu Wang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Mustafa Alshawaqfeh
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Xuan Dang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Bilal Wajid
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Amina Noor
- Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
- Marwa Qaraqe
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Erchin Serpedin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
|
14
|
Jayavelu ND, Aasgaard LS, Bar N. Iterative sub-network component analysis enables reconstruction of large scale genetic networks. BMC Bioinformatics 2015; 16:366. [PMID: 26537518 PMCID: PMC4634733 DOI: 10.1186/s12859-015-0768-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/23/2015] [Accepted: 10/09/2015] [Indexed: 11/28/2022]
Abstract
Background: Network component analysis (NCA) has become a popular tool to understand complex regulatory networks. The method uses high-throughput gene expression data and a priori topology to reconstruct transcription factor activity profiles. Current NCA algorithms are constrained by several conditions posed on the network topology, to guarantee unique reconstruction (termed compliancy). However, the restrictions these conditions pose are not necessarily true from a biological perspective and they force network size reduction, pruning potentially important components. Results: To address this, we developed a novel Iterative Sub-Network Component Analysis (ISNCA) for reconstructing networks of any size. By dividing the initial network into smaller, compliant subnetworks, the algorithm first predicts the reconstruction of each subnetwork using standard NCA algorithms. It then subtracts from the reconstruction the contribution of the shared components from the other subnetwork. We tested the ISNCA on real, large datasets using various NCA algorithms. The size of the networks we tested and the accuracy of the reconstruction increased significantly. Importantly, FOXA1, ATF2, ATF3 and many other known key regulators in breast cancer could not be incorporated by any NCA algorithm because of the necessary conditions. However, their temporal activities could be reconstructed by our algorithm, and therefore their involvement in breast cancer could be analyzed. Conclusions: Our framework enables reconstruction of large gene expression data networks, without reducing their size or pruning potentially important components, and at the same time rendering the results more biologically plausible. Our ISNCA method is not only suitable for prediction of key regulators in cancer studies, but it can be applied to any high-throughput gene expression data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0768-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Naresh Doni Jayavelu
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
- Lasse S Aasgaard
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
- Nadav Bar
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
|
15
|
Cai Y, Dai X, Zhang Q, Dai Z. Gene expression of OCT4, SOX2, KLF4 and MYC (OSKM) induced pluripotent stem cells: identification for potential mechanisms. Diagn Pathol 2015; 10:35. [PMID: 25907774 PMCID: PMC4414430 DOI: 10.1186/s13000-015-0263-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/20/2014] [Accepted: 04/06/2015] [Indexed: 02/09/2023]
Abstract
Background: Somatic cells could be reprogrammed to induced pluripotent stem cells (iPS) by ectopic expression of OCT4, SOX2, KLF4 and MYC (OSKM). We aimed to gain insights into the early mechanisms underlying the induction of pluripotency. Methods: GSE28688 containing 14 gene expression profiles was downloaded from GEO, including untreated human neonatal foreskin fibroblasts (HFF1) as control, OSKM-induced HFF1 (at 24, 48, 72 h post-transduction of OSKM encoding viruses), two iPS cell lines, and two embryonic stem (ES) cell lines. Differentially expressed genes (DEGs) were screened between different cell lines and the control by Limma package in Bioconductor. KEGG pathway enrichment analysis was performed by DAVID. The STRING database was used to construct protein-protein interaction (PPI) network. Activities and regulatory networks of transcription factors (TFs) were calculated and constructed by Fast Network Component Analysis (FastNCA). Results: Compared with untreated HFF1, 117, 347, 557, 2263 and 2307 DEGs were obtained from the three post-transduction HFF1 time points, iPS cells and ES cells, respectively. Meanwhile, up-regulated DEGs in the first two days of HFF1 were mainly enriched in RIG-I-like receptor (RLR) and Toll-like receptor (TLR) signaling pathways. Down-regulated DEGs at 72 h were significantly enriched in the focal adhesion pathway, which was similar to iPS cells. Moreover, ISG15, IRF7, STAT1 and DDX58 had higher degrees in the PPI networks across the time series. Furthermore, the targets of six selected TFs were mainly enriched in screened DEGs. Conclusion: In this study, the screened DEGs, including ISG15, IRF7 and CCL5, that participated in OSKM-induced pluripotency might attenuate the immune response post-transduction through RLR and TLR signaling pathways. Virtual slides: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/2503890341543007.
Affiliation(s)
- Yanning Cai
- School of Information Science and Technology, Sun Yat-sen University, Higher Education Mega Center, No.132 East Outer Ring Road, Guangzhou, China
- SYSU-CMU Shunde International Joint Research Institute (JRI), Shunde, Guangdong, China
- Xianhua Dai
- School of Information Science and Technology, Sun Yat-sen University, Higher Education Mega Center, No.132 East Outer Ring Road, Guangzhou, China
- SYSU-CMU Shunde International Joint Research Institute (JRI), Shunde, Guangdong, China
- Qianhua Zhang
- School of Information Science and Technology, Sun Yat-sen University, Higher Education Mega Center, No.132 East Outer Ring Road, Guangzhou, China
- SYSU-CMU Shunde International Joint Research Institute (JRI), Shunde, Guangdong, China
- Zhiming Dai
- School of Information Science and Technology, Sun Yat-sen University, Higher Education Mega Center, No.132 East Outer Ring Road, Guangzhou, China
- SYSU-CMU Shunde International Joint Research Institute (JRI), Shunde, Guangdong, China
|
16
|
Chen YH, Yang CD, Tseng CP, Huang HD, Ho SY. GeNOSA: inferring and experimentally supporting quantitative gene regulatory networks in prokaryotes. Bioinformatics 2015; 31:2151-8. [DOI: 10.1093/bioinformatics/btv075] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/19/2014] [Accepted: 01/30/2015] [Indexed: 11/14/2022]
|
17
|
Shi X, Gu J, Chen X, Shajahan A, Hilakivi-Clarke L, Clarke R, Xuan J. mAPC-GibbsOS: an integrated approach for robust identification of gene regulatory networks. BMC Systems Biology 2013; 7 Suppl 5:S4. [PMID: 24564939 PMCID: PMC4028818 DOI: 10.1186/1752-0509-7-s5-s4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/29/2023]
Abstract
Background: Identification of cooperative gene regulatory networks is an important topic for biological study, especially in cancer research. Traditional approaches suffer from large noise in gene expression data and false positive connections in motif binding data; they also fail to identify the modularized structure of gene regulatory networks. Methods that can reveal the underlying modularized structure and are robust to noise and false positives need to be developed. Results: We proposed and developed an integrated approach to identify gene regulatory networks, which consists of a novel clustering method (namely motif-guided affinity propagation clustering (mAPC)) and a sampling based method (called Gibbs sampler based on outlier sum statistic (GibbsOS)). mAPC is used in the first step to obtain co-regulated gene modules by clustering genes with a similarity measurement taking into account both gene expression data and binding motif information. This clustering method can reduce the noise effect from microarray data to obtain modularized gene clusters. However, due to many false positives in motif binding data, some genes not regulated by certain transcription factors (TFs) will be falsely clustered with true target genes. To overcome this problem, GibbsOS is applied in the second step to refine each cluster for the identification of true target genes. In order to evaluate the performance of the proposed method, we generated simulation data under different signal-to-noise ratios and false positive ratios to test the method. The experimental results show an improved accuracy in terms of clustering and transcription factor identification. Moreover, an improved performance is demonstrated in target gene identification as compared with GibbsOS.
Finally, we applied the proposed method to two breast cancer patient datasets to identify cooperative transcriptional regulatory networks associated with recurrence of breast cancer, as supported by their functional annotations. Conclusions: We have developed a two-step approach for gene regulatory network identification, featuring an integrated method to identify modularized regulatory structures and refine their target genes subsequently. Simulation studies have shown the robustness of the method against noise in gene expression data and false positives in motif binding data. The proposed method has been applied to two breast cancer gene expression datasets to infer the hidden regulation mechanisms. The experimental results demonstrate the efficacy of the method in identifying key regulatory networks related to the progression and recurrence of breast cancer.
|
18
|
Misra A, Sriram G. Network component analysis provides quantitative insights on an Arabidopsis transcription factor-gene regulatory network. BMC Systems Biology 2013; 7:126. [PMID: 24228871 PMCID: PMC3843564 DOI: 10.1186/1752-0509-7-126] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/24/2013] [Accepted: 11/05/2013] [Indexed: 01/01/2023]
Abstract
Background: Gene regulatory networks (GRNs) are models of molecule-gene interactions instrumental in the coordination of gene expression. Transcription factor (TF)-GRNs are an important subset of GRNs that characterize gene expression as the effect of TFs acting on their target genes. Although such networks can qualitatively summarize TF-gene interactions, it is highly desirable to quantitatively determine the strengths of the interactions in a TF-GRN as well as the magnitudes of TF activities. To our knowledge, such analysis is rare in plant biology. A computational methodology developed for this purpose is network component analysis (NCA), which has been used for studying large-scale microbial TF-GRNs to obtain nontrivial, mechanistic insights. In this work, we employed NCA to quantitatively analyze a plant TF-GRN important in floral development using available regulatory information from AGRIS, by processing previously reported gene expression data from four shoot apical meristem cell types. Results: The NCA model satisfactorily accounted for gene expression measurements in a TF-GRN of seven TFs (LFY, AG, SEPALLATA3 [SEP3], AP2, AGL15, HY5 and AP3/PI) and 55 genes. NCA found strong interactions between certain TF-gene pairs including LFY → MYB17, AG → CRC, AP2 → RD20, AGL15 → RAV2 and HY5 → HLH1, and the direction of the interaction (activation or repression) for some AGL15 targets for which this information was not previously available. The activity trends of four TFs - LFY, AG, HY5 and AP3/PI as deduced by NCA correlated well with the changes in expression levels of the genes encoding these TFs across all four cell types; such a correlation was not observed for SEP3, AP2 and AGL15. Conclusions: For the first time, we have reported the use of NCA to quantitatively analyze a plant TF-GRN important in floral development for obtaining nontrivial information about connectivity strengths between TFs and their target genes as well as TF activity.
However, since NCA relies on documented connectivity information about the underlying TF-GRN, it is currently limited in its application to larger plant networks because of the lack of documented connectivities. In the future, the identification of interactions between plant TFs and their target genes on a genome scale would allow the use of NCA to provide quantitative regulatory information about plant TF-GRNs, leading to improved insights on cellular regulatory programs.
Affiliation(s)
- Ganesh Sriram
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD 20742, USA.
|
19
|
Chen X, Xuan J, Wang C, Shajahan AN, Riggins RB, Clarke R. Reconstruction of transcriptional regulatory networks by stability-based network component analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1347-1358. [PMID: 24407294 PMCID: PMC3652899 DOI: 10.1109/tcbb.2012.146] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/03/2023]
Abstract
Reliable inference of transcription regulatory networks is a challenging task in computational biology. Network component analysis (NCA) has become a powerful scheme to uncover regulatory networks behind complex biological processes. However, the performance of NCA is impaired by the high rate of false connections in binding information. In this paper, we integrate stability analysis with NCA to form a novel scheme, namely stability-based NCA (sNCA), for regulatory network identification. The method mainly addresses the inconsistency between gene expression data and binding motif information. Small perturbations are introduced to the prior regulatory network, and the distance among multiple estimated transcription factor (TF) activities is computed to reflect the stability for each TF's binding network. For target gene identification, multivariate regression and t-statistic are used to calculate the significance for each TF-gene connection. Simulation studies are conducted and the experimental results show that sNCA can achieve an improved and robust performance in TF identification as compared to NCA. The approach for target gene identification is also demonstrated to be suitable for identifying true connections between TFs and their target genes. Furthermore, we have successfully applied sNCA to breast cancer data to uncover the role of TFs in regulating endocrine resistance in breast cancer.
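The perturb-and-compare idea described in this abstract can be sketched roughly as follows. This is a simplified stand-in, not the sNCA procedure: activities are estimated by a single least-squares fit against the binary prior instead of a full NCA decomposition, and the distance is taken to be one minus the absolute Pearson correlation between activity profiles. The function names, the flip fraction `frac` and the choice of distance are all assumptions made for this example.

```python
import numpy as np


def tfa_least_squares(E, prior):
    """Assumed stand-in estimator: regress expression on the binary prior."""
    P, *_ = np.linalg.lstsq(prior.astype(float), E, rcond=None)
    return P  # shape (n_tfs, n_samples)


def perturb_prior(prior, frac, rng):
    """Flip a small random fraction of entries in the TF-gene prior."""
    flat = prior.astype(bool).ravel().copy()
    n_flip = max(1, int(round(frac * flat.size)))
    idx = rng.choice(flat.size, size=n_flip, replace=False)
    flat[idx] = ~flat[idx]
    return flat.reshape(prior.shape)


def tf_instability(E, prior, n_perturb=20, frac=0.05, seed=0):
    """Per-TF instability: mean pairwise distance between activity profiles
    estimated from independently perturbed priors (higher = less stable)."""
    rng = np.random.default_rng(seed)
    runs = np.stack([
        tfa_least_squares(E, perturb_prior(prior, frac, rng))
        for _ in range(n_perturb)
    ])  # shape (n_perturb, n_tfs, n_samples)
    upper = np.triu_indices(n_perturb, k=1)
    scores = np.empty(prior.shape[1])
    for k in range(prior.shape[1]):
        corr = np.corrcoef(runs[:, k, :])  # correlations between runs for TF k
        # Absolute value so that sign flips do not count as instability;
        # nanmean skips pairs where a profile happened to be constant.
        scores[k] = np.nanmean(1.0 - np.abs(corr[upper]))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    prior = rng.random((30, 3)) < 0.5      # synthetic TF-gene prior
    E = rng.normal(size=(30, 10))          # synthetic expression data
    print(tf_instability(E, prior))
```

A TF whose estimated activity changes a lot under small changes to its assumed targets gets a score near 1, which in this sketch plays the role of a low-stability binding network.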
Affiliation(s)
- Xi Chen
- Virginia Polytechnic Institute and State University, Arlington
- Jianhua Xuan
- Virginia Polytechnic Institute and State University, Arlington
- Chen Wang
- Virginia Polytechnic Institute and State University, Arlington
|
20
|
Noor A, Ahmad A, Serpedin E, Nounou M, Nounou H. ROBNCA: robust network component analysis for recovering transcription factor activities. Bioinformatics 2013; 29:2410-8. [PMID: 23940252 DOI: 10.1093/bioinformatics/btt433] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Network component analysis (NCA) is an efficient method of reconstructing the transcription factor activity (TFA), which makes use of the gene expression data and prior information available about transcription factor (TF)-gene regulations. Most of the contemporary algorithms either exhibit the drawback of inconsistency and poor reliability, or suffer from prohibitive computational complexity. In addition, the existing algorithms do not possess the ability to counteract the presence of outliers in the microarray data. Hence, robust and computationally efficient algorithms are needed to enable practical applications. RESULTS: We propose ROBust Network Component Analysis (ROBNCA), a novel iterative algorithm that explicitly models the possible outliers in the microarray data. An attractive feature of the ROBNCA algorithm is the derivation of a closed form solution for estimating the connectivity matrix, which was not available in prior contributions. The ROBNCA algorithm is compared with FastNCA and the non-iterative NCA (NI-NCA). ROBNCA estimates the TF activity profiles as well as the TF-gene control strength matrix with a much higher degree of accuracy than FastNCA and NI-NCA, irrespective of varying noise, correlation and/or amount of outliers in case of synthetic data. The ROBNCA algorithm is also tested on Saccharomyces cerevisiae data and Escherichia coli data, and it is observed to outperform the existing algorithms. The run time of the ROBNCA algorithm is comparable with that of FastNCA, and is hundreds of times faster than NI-NCA. AVAILABILITY: The ROBNCA software is available at http://people.tamu.edu/~amina/ROBNCA
Affiliation(s)
- Amina Noor
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA, Corporate Research and Development, Qualcomm Technologies Inc., San Diego, CA 92121, USA, Department of Chemical Engineering and Department of Electrical Engineering, Texas A&M University at Qatar, Doha Qatar
|
21
|
Chintapalli VR, Wang J, Herzyk P, Davies SA, Dow JAT. Data-mining the FlyAtlas online resource to identify core functional motifs across transporting epithelia. BMC Genomics 2013; 14:518. [PMID: 23895496 PMCID: PMC3734111 DOI: 10.1186/1471-2164-14-518] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 11/05/2012] [Accepted: 07/26/2013] [Indexed: 11/23/2022]
Abstract
Background: Comparative analysis of tissue-specific transcriptomes is a powerful technique to uncover tissue functions. Our FlyAtlas.org provides authoritative gene expression levels for multiple tissues of Drosophila melanogaster (1). Although the main use of such resources is single gene lookup, there is the potential for powerful meta-analysis to address questions that could not easily be framed otherwise. Here, we illustrate the power of data-mining of FlyAtlas data by comparing epithelial transcriptomes to identify a core set of highly-expressed genes, across the four major epithelial tissues (salivary glands, Malpighian tubules, midgut and hindgut) of both adults and larvae. Method: Parallel hypothesis-led and hypothesis-free approaches were adopted to identify core genes that underpin insect epithelial function. In the former, gene lists were created from transport processes identified in the literature, and their expression profiles mapped from the flyatlas.org online dataset. In the latter, gene enrichment lists were prepared for each epithelium, and genes (both transport related and unrelated) consistently enriched in transporting epithelia identified. Results: A key set of transport genes, comprising V-ATPases, cation exchangers, aquaporins, potassium and chloride channels, and carbonic anhydrase, was found to be highly enriched across the epithelial tissues, compared with the whole fly. Additionally, a further set of genes that had not been predicted to have epithelial roles were co-expressed with the core transporters, extending our view of what makes a transporting epithelium work. Further insights were obtained by studying the genes uniquely overexpressed in each epithelium; for example, the salivary gland expresses lipases, the midgut organic solute transporters, the tubules specialize for purine metabolism and the hindgut overexpresses still unknown genes.
Conclusion: Taken together, these data provide a unique insight into epithelial function in this key model insect, and a framework for comparison with other species. They also provide a methodology for function-led datamining of FlyAtlas.org and other multi-tissue expression datasets.
Affiliation(s)
- Venkateswara R Chintapalli
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
|
22
|
Membrane stress caused by octanoic acid in Saccharomyces cerevisiae. Appl Microbiol Biotechnol 2013; 97:3239-51. [DOI: 10.1007/s00253-013-4773-5] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Received: 01/11/2013] [Revised: 02/07/2013] [Accepted: 02/11/2013] [Indexed: 02/04/2023]
|
23
|
Polo JM, Anderssen E, Walsh RM, Schwarz BA, Nefzger CM, Lim SM, Borkent M, Apostolou E, Alaei S, Cloutier J, Bar-Nur O, Cheloufi S, Stadtfeld M, Figueroa ME, Robinton D, Natesan S, Melnick A, Zhu J, Ramaswamy S, Hochedlinger K. A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 2012; 151:1617-32. [PMID: 23260147 PMCID: PMC3608203 DOI: 10.1016/j.cell.2012.11.039] [Citation(s) in RCA: 669] [Impact Index Per Article: 51.5] [Received: 05/12/2012] [Revised: 10/09/2012] [Accepted: 11/20/2012] [Indexed: 12/28/2022]
Abstract
Factor-induced reprogramming of somatic cells into induced pluripotent stem cells (iPSCs) is inefficient, complicating mechanistic studies. Here, we examined defined intermediate cell populations poised to become iPSCs by genome-wide analyses. We show that induced pluripotency elicits two transcriptional waves, which are driven by c-Myc/Klf4 (first wave) and Oct4/Sox2/Klf4 (second wave). Cells that become refractory to reprogramming activate the first but fail to initiate the second transcriptional wave and can be rescued by elevated expression of all four factors. The establishment of bivalent domains occurs gradually after the first wave, whereas changes in DNA methylation take place after the second wave when cells acquire stable pluripotency. This integrative analysis allowed us to identify genes that act as roadblocks during reprogramming and surface markers that further enrich for cells prone to forming iPSCs. Collectively, our data offer new mechanistic insights into the nature and sequence of molecular events inherent to cellular reprogramming.
Affiliation(s)
- Jose M. Polo
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
- Monash Immunology and Stem Cell Laboratories, Monash University, Wellington Rd, Clayton, Vic 3800, Australia
- Adjunct to Australian Regenerative Medicine Institute, Monash University, Wellington Rd, Clayton, Vic 3800, Australia
- Endre Anderssen
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
- Ryan M. Walsh
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
- Benjamin A. Schwarz
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
- Christian M. Nefzger
- Monash Immunology and Stem Cell Laboratories, Monash University, Wellington Rd, Clayton, Vic 3800, Australia
- Sue Mei Lim
- Monash Immunology and Stem Cell Laboratories, Monash University, Wellington Rd, Clayton, Vic 3800, Australia
| | - Marti Borkent
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
- Erasmus Medical Center Rotterdam, Department of Reproduction and Development, Dr. Molewaterplein 50, 3015 GE Rotterdam, The Netherlands
| | - Effie Apostolou
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
| | - Sara Alaei
- Monash Immunology and Stem Cell Laboratories, Monash University, Wellington Rd, Clayton, Vic 3800, Australia
| | - Jennifer Cloutier
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
| | - Ori Bar-Nur
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
| | - Sihem Cheloufi
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
| | - Matthias Stadtfeld
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
| | - Maria Eugenia Figueroa
- Department of Medicine, Hematology Oncology Division, Weill Cornell Medical College, New York, NY 10065, USA
| | - Daisy Robinton
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
| | | | - Ari Melnick
- Department of Medicine, Hematology Oncology Division, Weill Cornell Medical College, New York, NY 10065, USA
| | - Jinfang Zhu
- National Institute of Allergy and Infectious Disease, National Institutes of Health, Bethesda, MD 20892, USA
| | - Sridhar Ramaswamy
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
| | - Konrad Hochedlinger
- Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA
- Harvard Stem Cell Institute, 1350 Massachusetts Avenue, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute and Department of Stem Cell and Regenerative Biology, Harvard University and Harvard Medical School, 7 Divinity Avenue, Cambridge, MA 02138, USA
| |
|
24
|
Jacklin N, Ding Z, Chen W, Chang C. Noniterative convex optimization methods for network component analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1472-1481. [PMID: 22641712 DOI: 10.1109/tcbb.2012.81] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]
Abstract
This work studies the reconstruction of gene regulatory networks by means of network component analysis (NCA). We present a family of convex optimization-based methods for estimating the transcription factor control strengths and the transcription factor activities (TFAs). The approach taken in this work is to decompose the problem into a network connectivity strength estimation phase and a transcription factor activity estimation phase. In the control strength estimation phase, we formulate a new subspace-based method incorporating a choice of multiple error metrics. For the source estimation phase we propose a total least squares (TLS) formulation that generalizes many existing methods. Both estimation procedures are noniterative and yield the optimal estimates according to various proposed error metrics. We test the performance of the proposed algorithms on simulated data and experimental gene expression data for the yeast Saccharomyces cerevisiae and demonstrate that the proposed algorithms have superior effectiveness in comparison with both Bayesian Decomposition (BD) and our previous FastNCA approach, while the computational complexity is still orders of magnitude less than BD.
Affiliation(s)
- Neil Jacklin
- Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA.
| | | | | | | |
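The NCA model referred to in the abstract by Jacklin et al. above writes the expression matrix as E ≈ A P, where A (genes x TFs) holds control strengths constrained to a known connectivity pattern and P (TFs x samples) holds the TFAs. The sketch below is not the noniterative convex method of that paper; it is a plain alternating least-squares loop, included only to make the two-phase structure (control strengths, then activities) concrete. The function name `nca_als`, the mask `Z`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def nca_als(E, Z, n_iter=50, seed=0):
    """Alternating least squares for E ~ A @ P with A restricted to mask Z.

    E: genes x samples expression matrix.
    Z: boolean genes x TFs mask; A[g, t] may be non-zero only where Z[g, t].
    Illustrative sketch only, not the method of the cited paper.
    """
    rng = np.random.default_rng(seed)
    A = np.where(Z, rng.standard_normal(Z.shape), 0.0)
    for _ in range(n_iter):
        # Activity phase: with A fixed, solve for P in the least-squares sense.
        P, *_ = np.linalg.lstsq(A, E, rcond=None)
        # Control-strength phase: with P fixed, refit each gene's allowed entries.
        for g in range(E.shape[0]):
            idx = np.flatnonzero(Z[g])
            if idx.size:
                coef, *_ = np.linalg.lstsq(P[idx].T, E[g], rcond=None)
                A[g] = 0.0
                A[g, idx] = coef
    # Final activity estimate for the last A.
    P, *_ = np.linalg.lstsq(A, E, rcond=None)
    return A, P

# Toy example: 5 genes, 2 TFs, 8 samples, noise-free data.
rng = np.random.default_rng(1)
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]], dtype=bool)
A_true = np.where(Z, rng.standard_normal(Z.shape), 0.0)
P_true = rng.standard_normal((2, 8))
E = A_true @ P_true
A_hat, P_hat = nca_als(E, Z)
```

Note that A and P are only identifiable up to a per-TF scaling, so estimates should be compared with the truth after normalising each TF's column of A (or row of P).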
|
25
|
Wang C, Xuan J, Shih IM, Clarke R, Wang Y. Regulatory component analysis: a semi-blind extraction approach to infer gene regulatory networks with imperfect biological knowledge. Signal Processing 2012; 92:1902-1915. [PMID: 22685363 PMCID: PMC3367667 DOI: 10.1016/j.sigpro.2011.11.028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/01/2023]
Abstract
With the advent of high-throughput biotechnology capable of monitoring genomic signals, it becomes increasingly promising to understand molecular cellular mechanisms through systems biology approaches. One of the active research topics in systems biology is to infer gene transcriptional regulatory networks using various genomic data; this inference problem can be formulated as a linear model with latent signals associated with some regulatory proteins called transcription factors (TFs). As common statistical assumptions may not hold for genomic signals, typical latent variable algorithms such as independent component analysis (ICA) are unable to reveal the true underlying regulatory signals. Liao et al. [1] proposed to perform inference using an approach named network component analysis (NCA), the optimization of which is achieved by a least-squares fitting approach with biological knowledge constraints. However, the incompleteness of biological knowledge and its inconsistency with gene expression data are not considered in the original NCA solution, which could greatly affect the inference accuracy. To overcome these limitations, we propose a linear extraction scheme, namely regulatory component analysis (RCA), to infer underlying regulatory signals even with partial biological knowledge. Numerical simulations show a significant improvement of our proposed RCA over NCA, not only when the signal-to-noise ratio (SNR) is low, but also when the given biological knowledge is incomplete and inconsistent with gene expression data. Furthermore, real biological experiments on E. coli are performed for regulatory network inference in comparison with several typical linear latent variable methods, which again demonstrates the effectiveness and improved performance of the proposed algorithm.
Affiliation(s)
- Chen Wang
- Bradley Dept. of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| | - Jianhua Xuan
- Bradley Dept. of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| | - Ie-Ming Shih
- Dept. of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Robert Clarke
- Lombardi Comprehensive Cancer Center and Department of Oncology, Physiology and Biophysics, Georgetown University, Washington, DC 20057, USA
| | - Yue Wang
- Bradley Dept. of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| |
|
26
|
Hasdemir D, Smits GJ, Westerhuis JA, Smilde AK. Topology of transcriptional regulatory networks: testing and improving. PLoS One 2012; 7:e40082. [PMID: 22844399 PMCID: PMC3402518 DOI: 10.1371/journal.pone.0040082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/12/2012] [Accepted: 06/05/2012] [Indexed: 12/03/2022] Open
Abstract
With the increasing amount and complexity of data generated in biological experiments it is becoming necessary to enhance the performance and applicability of existing statistical data analysis methods. This enhancement is needed for the hidden biological information to be better resolved and better interpreted. Towards that aim, systematic incorporation of prior information in biological data analysis has been a challenging problem for systems biology. Several methods have been proposed to integrate data from different levels of information most notably from metabolomics, transcriptomics and proteomics and thus enhance biological interpretation. However, in order not to be misled by the dominance of incorrect prior information in the analysis, being able to discriminate between competing prior information is required. In this study, we show that discrimination between topological information in competing transcriptional regulatory network models is possible solely based on experimental data. We use network topology dependent decomposition of synthetic gene expression data to introduce both local and global discriminating measures. The measures indicate how well the gene expression data can be explained under the constraints of the model network topology and how much each regulatory connection in the model refuses to be constrained. Application of the method to the cell cycle regulatory network of Saccharomyces cerevisiae leads to the prediction of novel regulatory interactions, improving the information content of the hypothesized network model.
Affiliation(s)
- Dicle Hasdemir
- Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands.
| | | | | | | |
|
27
|
Gu J, Xuan J, Riggins RB, Chen L, Wang Y, Clarke R. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic. Bioinformatics 2012; 28:1990-7. [PMID: 22595208 DOI: 10.1093/bioinformatics/bts296] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. RESULTS In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. AVAILABILITY AND IMPLEMENTATION The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. CONTACT xuan@vt.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jinghua Gu
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| | | | | | | | | | | |
|
28
|
Zhang X, Cheng W, Listgarten J, Kadie C, Huang S, Wang W, Heckerman D. Learning transcriptional regulatory relationships using sparse graphical models. PLoS One 2012; 7:e35762. [PMID: 22586449 PMCID: PMC3346750 DOI: 10.1371/journal.pone.0035762] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/21/2011] [Accepted: 03/21/2012] [Indexed: 11/19/2022] Open
Abstract
Understanding the organization and function of transcriptional regulatory networks by analyzing high-throughput gene expression profiles is a key problem in computational biology. The challenges in this work are 1) the lack of complete knowledge of the regulatory relationship between the regulators and the associated genes, 2) the potential for spurious associations due to confounding factors, and 3) the number of parameters to learn is usually larger than the number of available microarray experiments. We present a sparse (L1 regularized) graphical model to address these challenges. Our model incorporates known transcription factors and introduces hidden variables to represent possible unknown transcription and confounding factors. The expression level of a gene is modeled as a linear combination of the expression levels of known transcription factors and hidden factors. Using gene expression data covering 39,296 oligonucleotide probes from 1109 human liver samples, we demonstrate that our model better predicts out-of-sample data than a model with no hidden variables. We also show that some of the gene sets associated with hidden variables are strongly correlated with Gene Ontology categories. The software including source code is available at http://grnl1.codeplex.com.
Affiliation(s)
- Xiang Zhang
- Microsoft Research, Los Angeles, California, United States of America
- Case Western Reserve University, Cleveland, Ohio, United States of America
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Wei Cheng
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | | | - Carl Kadie
- Microsoft Research, Los Angeles, California, United States of America
| | - Shunping Huang
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Wei Wang
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - David Heckerman
- Microsoft Research, Los Angeles, California, United States of America
| |
|
29
|
Ochs MF, Fertig EJ. Matrix Factorization for Transcriptional Regulatory Network Inference. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology Proceedings 2012; 2012:387-396. [PMID: 25364782 DOI: 10.1109/cibcb.2012.6217256] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 01/30/2023]
Abstract
Inference of Transcriptional Regulatory Networks (TRNs) provides insight into the mechanisms driving biological systems, especially mammalian development and disease. Many techniques have been developed for TRN estimation from indirect biochemical measurements. Although successful when initially tested in model organisms, these regulatory models often fail when applied to data from multicellular organisms where multiple regulation and gene reuse increase dramatically. Non-negative matrix factorization techniques were initially introduced to find non-orthogonal patterns in data, making them ideal techniques for inference in cases of multiple regulation. We review these techniques and their application to TRN analysis.
Affiliation(s)
- Michael F Ochs
- School of Medicine, Johns Hopkins University, Baltimore, MD 21205
| | - Elana J Fertig
- School of Medicine, Johns Hopkins University, Baltimore, MD 21205
| |
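As a concrete companion to the review by Ochs and Fertig above, the following is a minimal sketch of non-negative matrix factorization using the standard multiplicative updates of Lee and Seung for the squared-error objective. It is a generic textbook variant, not any specific algorithm covered in the review; the function name `nmf`, the rank, the iteration count, and the toy matrix are illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (genes x samples) as W @ H.

    W (genes x rank) can be read as gene loadings on each pattern and
    H (rank x samples) as pattern activity per sample. Both stay
    non-negative because each update multiplies by a non-negative ratio.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy example: a rank-2 non-negative matrix with 20 genes and 12 samples.
rng = np.random.default_rng(0)
V = rng.random((20, 2)) @ rng.random((2, 12))
W, H = nmf(V, rank=2)
```

Because a gene can carry non-zero weight in several columns of W, the factorization can represent a gene that takes part in more than one regulatory pattern, which is the property the abstract highlights for cases of multiple regulation.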
|
30
|
Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Med 2012; 4:41. [PMID: 22548828 PMCID: PMC3506907 DOI: 10.1186/gm340] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.7] [Received: 12/20/2011] [Revised: 02/22/2012] [Accepted: 05/01/2012] [Indexed: 12/15/2022] Open
Abstract
Background Altered networks of gene regulation underlie many complex conditions, including cancer. Inferring gene regulatory networks from high-throughput microarray expression data is a fundamental but challenging task in computational systems biology and its translation to genomic medicine. Although diverse computational and statistical approaches have been brought to bear on the gene regulatory network inference problem, their relative strengths and disadvantages remain poorly understood, largely because comparative analyses usually consider only small subsets of methods, use only synthetic data, and/or fail to adopt a common measure of inference quality. Methods We report a comprehensive comparative evaluation of nine state-of-the-art gene regulatory network inference methods encompassing the main algorithmic approaches (mutual information, correlation, partial correlation, random forests, support vector machines) using 38 simulated datasets and empirical serous papillary ovarian adenocarcinoma expression-microarray data. We then apply the best-performing method to infer normal and cancer networks. We assess the druggability of the proteins encoded by our predicted target genes using the CancerResource and PharmGKB webtools and databases. Results We observe large differences in the accuracy with which these methods predict the underlying gene regulatory network depending on features of the data, network size, topology, experiment type, and parameter settings. Applying the best-performing method (the supervised method SIRENE) to the serous papillary ovarian adenocarcinoma dataset, we infer and rank regulatory interactions, some previously reported and others novel. For selected novel interactions we propose testable mechanistic models linking gene regulation to cancer. Using network analysis and visualization, we uncover cross-regulation of angiogenesis-specific genes through three key transcription factors in normal and cancer conditions. Druggability analysis of proteins encoded by the 10 highest-confidence target genes, and by 15 genes with differential regulation in normal and cancer conditions, reveals 75% to be potential drug targets. Conclusions Our study represents a concrete application of gene regulatory network inference to ovarian cancer, demonstrating the complete cycle of computational systems biology research, from genome-scale data analysis via network inference, evaluation of methods, to the generation of novel testable hypotheses, their prioritization for experimental validation, and discovery of potential drug targets.
|
31
|
Zhang J, Zheng CH, Liu JX, Wang HQ. Discovering the transcriptional modules using microarray data by penalized matrix decomposition. Comput Biol Med 2011; 41:1041-50. [PMID: 22001074 DOI: 10.1016/j.compbiomed.2011.09.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/20/2011] [Revised: 08/30/2011] [Accepted: 09/12/2011] [Indexed: 11/25/2022]
Abstract
Uncovering the transcriptional modules with context-specific cellular activities or functions is important for understanding biological networks, deciphering regulatory mechanisms and identifying biomarkers. In this paper, we propose to use the penalized matrix decomposition (PMD) to discover the transcriptional modules from microarray data. With the sparsity constraint on the decomposition factors, metagenes can be extracted from the gene expression data and they can capture the intrinsic patterns of genes with similar functions. Meanwhile, the PMD factors of each gene are good indicators of the cluster it belongs to. Compared with traditional methods, our method can cluster genes that have similar functions but dissimilar expression profiles. It can also assign a gene to several modules. Moreover, the clustering results of our method are stable, and more biologically relevant transcriptional modules can be discovered. Experimental results on two public datasets show that the proposed PMD-based method is promising for discovering transcriptional modules.
Affiliation(s)
- Jun Zhang
- College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui, China
| | | | | | | |
|
32
|
Fu Y, Jarboe LR, Dickerson JA. Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities. BMC Bioinformatics 2011; 12:233. [PMID: 21668997 PMCID: PMC3224099 DOI: 10.1186/1471-2105-12-233] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 11/15/2010] [Accepted: 06/13/2011] [Indexed: 01/16/2023] Open
Abstract
Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models that do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network, including the predicted new regulatory links, shows promising biological significance. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. Conclusions The GTRNetwork algorithm introduces the hidden layer TFA into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves the detection of new links and reduces the rate of false positives. The application of GTRNetwork on E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions.
Affiliation(s)
- Yao Fu
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, USA
| | | | | |
|
33
|
Chen W, Chang C, Hung YS. Transcription factor activity estimation based on particle swarm optimization and fast network component analysis. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2011; 2010:1061-4. [PMID: 21096999 DOI: 10.1109/iembs.2010.5627641] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
Abstract
Transcription factors (TFs) play an important role in regulating the expression of genes. The accurate measurement of transcription factor activities (TFAs) depends on a series of experimental technologies of molecular biology and is intractable in most practical situations. Some signal processing methods for blind source separation have been applied in the prediction of TFAs from gene expression data. Most such methods make use of statistical properties of the gene expression data only, leading to inaccurate detection of TFAs. In contrast, network component analysis (NCA) can provide much improved results by utilizing the structural information of the gene regulatory network. However, the structure of the gene regulatory network, required by NCA, is not available in most practical cases, so NCA is not directly applicable. In this paper, we propose to use particle swarm optimization (PSO) to find the most plausible network structure iteratively from the gene expression data, with the assistance of the recently developed fast algorithm for network component analysis (FastNCA). This novel approach to TFA inference can thus take advantage of NCA, even when the required network structure is unknown. The effectiveness of our novel approach has been demonstrated by applications to both simulated data and real gene expression microarray data, in the sense that TFAs can be inferred with high accuracy.
Affiliation(s)
- Wei Chen
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong.
| | | | | |
|
34
|
Gong T, Xuan J, Chen L, Riggins RB, Li H, Hoffman EP, Clarke R, Wang Y. Motif-guided sparse decomposition of gene expression data for regulatory module identification. BMC Bioinformatics 2011; 12:82. [PMID: 21426557 PMCID: PMC3072956 DOI: 10.1186/1471-2105-12-82] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/02/2010] [Accepted: 03/22/2011] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Genes work coordinately as gene modules or gene networks. Various computational approaches have been proposed to find gene modules based on gene expression data; for example, gene clustering is a popular method for grouping genes with similar gene expression patterns. However, traditional gene clustering often yields unsatisfactory results for regulatory module identification because the resulting gene clusters are co-expressed but not necessarily co-regulated. RESULTS We propose a novel approach, motif-guided sparse decomposition (mSD), to identify gene regulatory modules by integrating gene expression data and DNA sequence motif information. The mSD approach is implemented as a two-step algorithm comprising estimates of (1) transcription factor activity and (2) the strength of the predicted gene regulation event(s). Specifically, a motif-guided clustering method is first developed to estimate the transcription factor activity of a gene module; sparse component analysis is then applied to estimate the regulation strength, and so predict the target genes of the transcription factors. The mSD approach was first tested for its improved performance in finding regulatory modules using simulated and real yeast data, revealing functionally distinct gene modules enriched with biologically validated transcription factors. We then demonstrated the efficacy of the mSD approach on breast cancer cell line data and uncovered several important gene regulatory modules related to endocrine therapy of breast cancer. CONCLUSION We have developed a new integrated strategy, namely motif-guided sparse decomposition (mSD) of gene expression data, for regulatory module identification. The mSD method features a novel motif-guided clustering method for transcription factor activity estimation by finding a balance between co-regulation and co-expression. The mSD method further utilizes a sparse decomposition method for regulation strength estimation. 
The experimental results show that such a motif-guided strategy can provide context-specific regulatory modules in both yeast and breast cancer studies.
Affiliation(s)
- Ting Gong
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| | - Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| | - Li Chen
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| | - Rebecca B Riggins
- Lombardi Comprehensive Cancer Center and Department of Oncology, Physiology and Biophysics, Georgetown University, Washington, DC 20057, USA
| | - Huai Li
- Bioinformatics Unit, RRB, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA
| | - Eric P Hoffman
- Research Center for Genetic Medicine, Children's National Medical Center, Washington, DC 20010, USA
| | - Robert Clarke
- Lombardi Comprehensive Cancer Center and Department of Oncology, Physiology and Biophysics, Georgetown University, Washington, DC 20057, USA
| | - Yue Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
| |
|
35
|
Markovsky I, Niranjan M. Approximate low-rank factorization with structured factors. Comput Stat Data Anal 2010. [DOI: 10.1016/j.csda.2009.06.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
|
36
|
Abstract
In many organisms the expression level of each gene is controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Based on previous experimental work, it is often the case that partial information about the TRN is available. For example, certain TFs may be known to regulate a given gene or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses L1 penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making many fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.
Affiliation(s)
- Gareth M James
- University of Southern California, Stanford University, University of Michigan and University of Michigan
|
37
|
Alternative splicing regulatory network reconstruction from exon array data. J Theor Biol 2010; 263:471-80. [DOI: 10.1016/j.jtbi.2009.12.025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/02/2009] [Revised: 11/14/2009] [Accepted: 12/22/2009] [Indexed: 11/17/2022]
|
38
|
Zhang Y, Hatch KA, Bacon J, Wernisch L. An integrated machine learning approach for predicting DosR-regulated genes in Mycobacterium tuberculosis. BMC Syst Biol 2010; 4:37. [PMID: 20356371 PMCID: PMC2867773 DOI: 10.1186/1752-0509-4-37] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/08/2009] [Accepted: 03/31/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: DosR is an important regulator of the response to stress such as limited oxygen availability in Mycobacterium tuberculosis. Time course gene expression data enable us to dissect this response on the gene regulatory level. The mRNA expression profile of a regulator, however, is not necessarily a direct reflection of its activity. Knowing the transcription factor activity (TFA) can be exploited to predict novel target genes regulated by the same transcription factor. Various approaches have been proposed to reconstruct TFAs from gene expression data. Most of them capture only a first-order approximation to the complex transcriptional processes by assuming linear gene responses and linear dynamics in TFA, or ignore the temporal information in data from such systems.
RESULTS: In this paper, we approach the problem of inferring dynamic hidden TFAs using Gaussian processes (GP). We are able to model dynamic TFAs and to account for both linear and nonlinear gene responses. To test the validity of the proposed approach, we reconstruct the hidden TFA of p53, a tumour suppressor activated by DNA damage, using published time course gene expression data. Our reconstructed TFA is closer to the experimentally determined profile of p53 concentration than that from the original study. We then apply the model to time course gene expression data obtained from chemostat cultures of M. tuberculosis under reduced oxygen availability. After estimation of the TFA of DosR based on a number of known target genes using the GP model, we predict novel DosR-regulated genes: the parameters of the model are interpreted as relevance parameters indicating an existing functional relationship between TFA and gene expression. We further improve the prediction by integrating promoter sequence information in a logistic regression model. Apart from the documented DosR-regulated genes, our prediction yields ten novel genes under direct control of DosR.
CONCLUSIONS: Chemostat cultures are an ideal experimental system for controlling noise and variability when monitoring the response of bacterial organisms such as M. tuberculosis to finely controlled changes in culture conditions and available metabolites. Nonlinear hidden TFA dynamics of regulators can be reconstructed remarkably well with Gaussian processes from such data. Moreover, estimated parameters of the GP can be used to assess whether a gene is controlled by the reconstructed TFA or not. It is straightforward to combine these parameters with further information, such as the presence of binding motifs, to increase prediction accuracy.
Affiliation(s)
- Yi Zhang
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK
- Kim A Hatch
- TB research, Health Protection Agency, CEPR, Porton Down, Salisbury SP4 0JG, UK
- Joanna Bacon
- TB research, Health Protection Agency, CEPR, Porton Down, Salisbury SP4 0JG, UK
- Lorenz Wernisch
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK
- MRC Biostatistics Unit, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
|
39
|
Lee WP, Tzou WS. Computational methods for discovering gene networks from expression data. Brief Bioinform 2009; 10:408-23. [PMID: 19505889 DOI: 10.1093/bib/bbp028] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.6] [Indexed: 11/12/2022]
Abstract
Designing and conducting experiments are routine practices for modern biologists. The real challenge, especially in the post-genome era, usually comes not from acquiring data, but from subsequent activities such as data processing, analysis, knowledge generation and gaining insight into the research question of interest. The approach of inferring gene regulatory networks (GRNs) has been flourishing for many years, and new methods from mathematics, information science, engineering and social sciences have been applied. We review different kinds of computational methods biologists use to infer networks of varying levels of accuracy and complexity. The primary concern of biologists is how to translate the inferred network into hypotheses that can be tested with real-life experiments. Taking the biologists' viewpoint, we scrutinized several methods for predicting GRNs in mammalian cells, and more importantly show how the power of different knowledge databases of different types can be used to identify modules and subnetworks, thereby reducing complexity and facilitating the generation of testable hypotheses.
Affiliation(s)
- Wei-Po Lee
- Department of Information Management, National Sun Yat-sen University, Kaohsiung, Taiwan.
|
40
|
Modelling Transcriptional Regulation with a Mixture of Factor Analyzers and Variational Bayesian Expectation Maximization. EURASIP J Bioinform Syst Biol 2009:601068. [PMID: 19572011 PMCID: PMC3171433 DOI: 10.1155/2009/601068] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/02/2008] [Accepted: 02/27/2009] [Indexed: 11/17/2022]
Abstract
Understanding the mechanisms of gene transcriptional regulation through analysis of high-throughput postgenomic data is one of the central problems of computational systems biology. Various approaches have been proposed, but most of them fail to address at least one of the following objectives: (1) allow for the fact that transcription factors are potentially subject to posttranscriptional regulation; (2) allow for the fact that transcription factors cooperate as a functional complex in regulating gene expression, and (3) provide a model and a learning algorithm with manageable computational complexity. The objective of the present study is to propose and test a method that addresses these three issues. The model we employ is a mixture of factor analyzers, in which the latent variables correspond to different transcription factors, grouped into complexes or modules. We pursue inference in a Bayesian framework, using the Variational Bayesian Expectation Maximization (VBEM) algorithm for approximate inference of the posterior distributions of the model parameters, and estimation of a lower bound on the marginal likelihood for model selection. We have evaluated the performance of the proposed method on three criteria: activity profile reconstruction, gene clustering, and network inference.
|
41
|
Benuskova L, Kasabov N. Modeling brain dynamics using computational neurogenetic approach. Cogn Neurodyn 2008; 2:319-34. [PMID: 19003458 PMCID: PMC2585617 DOI: 10.1007/s11571-008-9061-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 08/22/2007] [Revised: 08/19/2008] [Accepted: 08/19/2008] [Indexed: 01/10/2023]
Abstract
The paper introduces a novel computational approach to brain dynamics modeling that integrates dynamic gene-protein regulatory networks with a neural network model. Interaction of genes and proteins in neurons affects the dynamics of the whole neural network. Through tuning the gene-protein interaction network and the initial gene/protein expression values, different states of the neural network dynamics can be achieved. A generic computational neurogenetic model is introduced that implements this approach. It is illustrated by means of a simple neurogenetic model of a spiking neural network of the generation of local field potential. Our approach allows for investigation of how deleted or mutated genes can alter the dynamics of a model neural network. We conclude with the proposal how to extend this approach to model cognitive neurodynamics.
Affiliation(s)
- Lubica Benuskova
- Department of Computer Science, University of Otago, 90 Union Place East, Dunedin, 9016 New Zealand
- Nikola Kasabov
- Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, AUT Technology Park, 583-585 Great South Road, Penrose, Auckland, 1135 New Zealand
|