1
Otto JE, Ursu O, Wu AP, Winter EB, Cuoco MS, Ma S, Qian K, Michel BC, Buenrostro JD, Berger B, Regev A, Kadoch C. Structural and functional properties of mSWI/SNF chromatin remodeling complexes revealed through single-cell perturbation screens. Mol Cell 2023; 83:1350-1367.e7. [PMID: 37028419 DOI: 10.1016/j.molcel.2023.03.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/28/2022] [Revised: 02/07/2023] [Accepted: 03/10/2023] [Indexed: 04/09/2023]
Abstract
The mammalian SWI/SNF (mSWI/SNF or BAF) family of chromatin remodeling complexes plays critical roles in regulating DNA accessibility and gene expression. The three final-form subcomplexes (cBAF, PBAF, and ncBAF) are distinct in biochemical componentry, chromatin targeting, and roles in disease; however, the contributions of their constituent subunits to gene expression remain incompletely defined. Here, we performed Perturb-seq-based CRISPR-Cas9 knockout screens targeting mSWI/SNF subunits individually and in select combinations, followed by single-cell RNA-seq and SHARE-seq. We uncovered complex-, module-, and subunit-specific contributions to distinct regulatory networks and defined paralog subunit relationships and shifted subcomplex functions upon perturbations. Synergistic, intra-complex genetic interactions between subunits reveal functional redundancy and modularity. Importantly, single-cell subunit perturbation signatures mapped across bulk primary human tumor expression profiles both mirror and predict cBAF loss-of-function status in cancer. Our findings highlight the utility of Perturb-seq to dissect disease-relevant gene regulatory impacts of heterogeneous, multi-component master regulatory complexes.
Affiliation(s)
- Jordan E Otto
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Chemical Biology Program, Harvard University, Cambridge, MA, USA
- Oana Ursu
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Alexander P Wu
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Evan B Winter
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sai Ma
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Kristin Qian
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Biological and Biomedical Sciences Program, Harvard Medical School, Boston, MA, USA
- Brittany C Michel
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Biological and Biomedical Sciences Program, Harvard Medical School, Boston, MA, USA
- Jason D Buenrostro
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Bonnie Berger
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Cigall Kadoch
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Chemical Biology Program, Harvard University, Cambridge, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
2
Najnin T, Saimon SH, Sunter G, Ruan J. A Network-Based Approach for Improving Annotation of Transcription Factor Functions and Binding Sites in Arabidopsis thaliana. Genes (Basel) 2023; 14:genes14020282. [PMID: 36833209 PMCID: PMC9957447 DOI: 10.3390/genes14020282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 01/12/2023] [Accepted: 01/19/2023] [Indexed: 01/26/2023] Open
Abstract
Transcription factors are an integral component of the cellular machinery responsible for regulating many biological processes, and they recognize distinct DNA sequence patterns as well as internal/external signals to mediate target gene expression. The functional roles of an individual transcription factor can be traced back to the functions of its target genes. While such functional associations can be inferred through the use of binding evidence from high-throughput sequencing technologies available today, including chromatin immunoprecipitation sequencing, such experiments can be resource-consuming. On the other hand, exploratory analysis driven by computational techniques can alleviate this burden by narrowing the search scope, but the results are often deemed low-quality or non-specific by biologists. In this paper, we introduce a data-driven, statistics-based strategy to predict novel functional associations for transcription factors in the model plant Arabidopsis thaliana. To achieve this, we leverage one of the largest available gene expression compendia to build a genome-wide transcriptional regulatory network and infer regulatory relationships among transcription factors and their targets. We then use this network to build a pool of likely downstream targets for each transcription factor and query each target pool for functionally enriched gene ontology terms. The results exhibited sufficient statistical significance to annotate most of the transcription factors in Arabidopsis with highly specific biological processes. We also perform DNA binding motif discovery for transcription factors based on their target pool. We show that the predicted functions and motifs strongly agree with curated databases constructed from experimental evidence. In addition, statistical analysis of the network revealed interesting patterns and connections between network topology and system-level transcriptional regulation properties. 
We believe that the methods demonstrated in this work can be extended to other species to improve the annotation of transcription factors and understand transcriptional regulation on a system level.
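The enrichment query mentioned in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a one-sided hypergeometric test, which is a common choice for gene ontology enrichment, and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, pool, hits):
    """Return P(X >= hits) for a one-sided hypergeometric test.

    population: number of genes in the background
    annotated:  background genes carrying the GO term
    pool:       size of the target pool drawn for one transcription factor
    hits:       pool genes carrying the GO term
    """
    max_hits = min(annotated, pool)
    total = comb(population, pool)
    # Sum the probability mass of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, pool - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 genes, 5 annotated, pool of 4 with 3 hits.
p_value = hypergeom_enrichment_pvalue(population=20, annotated=5, pool=4, hits=3)
```

In practice such p-values would be corrected for multiple testing across GO terms; that step is omitted here.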
Affiliation(s)
- Tanzira Najnin
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Sakhawat Hossain Saimon
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- Garry Sunter
- Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA
- Jianhua Ruan
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA
3
Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks. Sci Rep 2022; 12:18704. [PMID: 36333425 PMCID: PMC9636198 DOI: 10.1038/s41598-022-21957-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 10/06/2022] [Indexed: 11/06/2022] Open
Abstract
Reconstruction of Gene Regulatory Networks (GRNs) from gene expression data with Probabilistic Network Models (PNMs) is an open problem. Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e., they are large-p-small-n). Moreover, dependencies of various orders coexist in the datasets: on the one hand, transcription factor-encoding genes act like hubs and regulate target genes; on the other hand, target genes show local dependencies. In the field of Undirected Network Models (UNMs), a subclass of PNMs, the Glasso algorithm has been proposed to deal with high-dimensional microarray datasets by forcing sparsity. To overcome the problem of the complex structure of interactions, modifications of the default Glasso algorithm have been developed that integrate the expected dependency structure in the UNMs beforehand. In this work we advocate the use of a simple score-based Hill Climbing algorithm (HC) that learns Gaussian Bayesian networks based on directed acyclic graphs. We compare HC with Glasso and its variants in the UNM framework based on their capability to reconstruct GRNs from microarray data from the benchmarking synthetic dataset from the DREAM5 challenge and from real-world data from the Escherichia coli genome. We conclude that dependencies in complex data are learned best by the HC algorithm, which represents them most accurately and efficiently, simultaneously modelling strong local and weaker but significant global connections coexisting in the gene expression dataset. The HC algorithm adapts intrinsically to the complex dependency structure of the dataset, without forcing a specific structure in advance.
4
Sarmah D, Smith GR, Bouhaddou M, Stern AD, Erskine J, Birtwistle MR. Network inference from perturbation time course data. NPJ Syst Biol Appl 2022; 8:42. [PMID: 36316338 PMCID: PMC9622863 DOI: 10.1038/s41540-022-00253-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 10/18/2022] [Indexed: 11/05/2022] Open
Abstract
Networks underlie much of biology from subcellular to ecological scales. Yet, understanding what experimental data are needed and how to use them for unambiguously identifying the structure of even small networks remains a broad challenge. Here, we integrate a dynamic least squares framework into established modular response analysis (DL-MRA), that specifies sufficient experimental perturbation time course data to robustly infer arbitrary two and three node networks. DL-MRA considers important network properties that current methods often struggle to capture: (i) edge sign and directionality; (ii) cycles with feedback or feedforward loops including self-regulation; (iii) dynamic network behavior; (iv) edges external to the network; and (v) robust performance with experimental noise. We evaluate the performance of and the extent to which the approach applies to cell state transition networks, intracellular signaling networks, and gene regulatory networks. Although signaling networks are often an application of network reconstruction methods, the results suggest that only under quite restricted conditions can they be robustly inferred. For gene regulatory networks, the results suggest that incomplete knockdown is often more informative than full knockout perturbation, which may change experimental strategies for gene regulatory network reconstruction. Overall, the results give a rational basis to experimental data requirements for network reconstruction and can be applied to any such problem where perturbation time course experiments are possible.
Affiliation(s)
- Deepraj Sarmah
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Gregory R Smith
- Department of Neurology, Center for Advanced Research on Diagnostic Assays, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mehdi Bouhaddou
- J. David Gladstone Institutes, San Francisco, CA, 94158, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, 94158, USA
- Alan D Stern
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- James Erskine
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Marc R Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Department of Bioengineering, Clemson University, Clemson, SC, USA
5
Chowdhury S, Wang R, Yu Q, Huntoon CJ, Karnitz LM, Kaufmann SH, Gygi SP, Birrer MJ, Paulovich AG, Peng J, Wang P. DAGBagM: learning directed acyclic graphs of mixed variables with an application to identify protein biomarkers for treatment response in ovarian cancer. BMC Bioinformatics 2022; 23:321. [PMID: 35931981 PMCID: PMC9354326 DOI: 10.1186/s12859-022-04864-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 07/28/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Applying directed acyclic graph (DAG) models to proteogenomic data has been shown to be effective for detecting causal biomarkers of complex diseases. However, there remain unsolved challenges in DAG learning to jointly model binary clinical outcome variables and continuous biomarker measurements. RESULTS: In this paper, we propose a new tool, DAGBagM, to learn DAGs with both continuous and binary nodes. By using appropriate models, DAGBagM allows for either continuous or binary nodes to be parent or child nodes. It employs a bootstrap aggregating strategy to reduce false positives in edge inference. At the same time, the aggregation procedure provides a flexible framework to robustly incorporate prior information on edges. CONCLUSIONS: Through extensive simulation experiments, we demonstrate that DAGBagM has superior performance compared to alternative strategies for modeling mixed types of nodes. In addition, DAGBagM is computationally more efficient than two competing methods. When applying DAGBagM to proteogenomic datasets from ovarian cancer studies, we identify potential protein biomarkers for platinum refractory/resistant response in ovarian cancer. DAGBagM is made available as a GitHub repository at https://github.com/jie108/dagbagM.
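The bootstrap aggregating idea named in this abstract can be sketched generically. The code below is not DAGBagM and does not use its API: a simple edge-frequency threshold is swapped in for whatever aggregation rule the tool applies, and `learn_edges`, `min_frequency`, and the toy edge sets are hypothetical.

```python
import random
from collections import Counter


def bootstrap_resample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def aggregate_edges(edge_sets, min_frequency=0.5):
    """Keep directed edges present in at least `min_frequency` of the edge sets."""
    counts = Counter(edge for edges in edge_sets for edge in set(edges))
    n_sets = len(edge_sets)
    return {edge for edge, count in counts.items() if count / n_sets >= min_frequency}


def bagged_edges(rows, learn_edges, n_boot=50, min_frequency=0.5, seed=0):
    """Learn one edge set per bootstrap resample, then aggregate them.

    `learn_edges` is any caller-supplied structure learner that maps a list
    of rows to a set of (parent, child) tuples.
    """
    rng = random.Random(seed)
    edge_sets = [learn_edges(bootstrap_resample(rows, rng)) for _ in range(n_boot)]
    return aggregate_edges(edge_sets, min_frequency)


# Hypothetical edge sets, as if learned from four bootstrap resamples.
learned = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "A")},
    {("A", "B"), ("B", "C")},
]
stable = aggregate_edges(learned, min_frequency=0.5)
```

Edges that appear in only a few resamples, such as `("C", "A")` here, are dropped, which is how aggregation of this kind can reduce false positives.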
Affiliation(s)
- Shrabanti Chowdhury
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Ru Wang
- Department of Statistics, University of California, Davis, CA, 95616, USA
- Qing Yu
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Catherine J Huntoon
- Division of Oncology Research and Department of Oncology, Mayo Clinic, Rochester, MN, 55905, USA
- Larry M Karnitz
- Division of Oncology Research and Department of Oncology, Mayo Clinic, Rochester, MN, 55905, USA
- Scott H Kaufmann
- Division of Oncology Research, Mayo Clinic, Rochester, MN, 55905, USA
- Steven P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Michael J Birrer
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Amanda G Paulovich
- Clinical Research Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jie Peng
- Department of Statistics, University of California, Davis, CA, 95616, USA
- Pei Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
6
Dsouza KB, Li AY, Bhargava VK, Libbrecht MW. Latent Representation of the Human Pan-Celltype Epigenome Through a Deep Recurrent Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2313-2323. [PMID: 34043510 DOI: 10.1109/tcbb.2021.3084147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell type-specific, applying to a single tissue or cell state. Recently, neural networks have made it possible to summarize data across tissues to produce a pan-cell type representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder to capture the long-term dependencies in the epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene-expression, promoter-enhancer interactions, replication timing, frequently interacting regions, and evolutionary conservation. These representations outperform existing methods in a majority of cell types while yielding smoother representations along the genomic axis due to their sequential nature.
7
Dsouza KB, Maslova A, Al-Jibury E, Merkenschlager M, Bhargava VK, Libbrecht MW. Learning representations of chromatin contacts using a recurrent neural network identifies genomic drivers of conformation. Nat Commun 2022; 13:3704. [PMID: 35764630 PMCID: PMC9240038 DOI: 10.1038/s41467-022-31337-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2021] [Accepted: 06/15/2022] [Indexed: 11/28/2022] Open
Abstract
Despite the availability of chromatin conformation capture experiments, discerning the relationship between the 1D genome and 3D conformation remains a challenge, which limits our understanding of their effect on gene expression and disease. We propose Hi-C-LSTM, a method that produces low-dimensional latent representations that summarize intra-chromosomal Hi-C contacts via a recurrent long short-term memory neural network model. We find that these representations contain all the information needed to recreate the observed Hi-C matrix with high accuracy, outperforming existing methods. These representations enable the identification of a variety of conformation-defining genomic elements, including nuclear compartments and conformation-related transcription factors. They furthermore enable in-silico perturbation experiments that measure the influence of cis-regulatory elements on conformation.
Affiliation(s)
- Kevin B Dsouza
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
- Alexandra Maslova
- School of Computing Science, Simon Fraser University, Burnaby, Canada
- Ediem Al-Jibury
- MRC, London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Department of Computing, Imperial College London, London, UK
- Matthias Merkenschlager
- MRC, London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
- Vijay K Bhargava
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
8
Kim DW, Hong H, Kim JK. Systematic inference identifies a major source of heterogeneity in cell signaling dynamics: The rate-limiting step number. Science Advances 2022; 8:eabl4598. [PMID: 35302852 PMCID: PMC8932658 DOI: 10.1126/sciadv.abl4598] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/14/2021] [Accepted: 01/26/2022] [Indexed: 06/14/2023]
Abstract
Identifying the sources of cell-to-cell variability in signaling dynamics is essential to understand drug response variability and develop effective therapeutics. However, it is challenging because not all signaling intermediate reactions can be experimentally measured simultaneously. This can be overcome by replacing them with a single random time delay, but the resulting process is non-Markovian, making it difficult to infer cell-to-cell heterogeneity in reaction rates and time delays. To address this, we developed an efficient and scalable moment-based Bayesian inference method (MBI) with a user-friendly computational package that infers cell-to-cell heterogeneity in the non-Markovian signaling process. We applied MBI to single-cell expression profiles from promoters responding to antibiotics and discovered a major source of cell-to-cell variability in antibiotic stress response: the number of rate-limiting steps in signaling cascades. This knowledge can help identify effective therapies that destroy all pathogenic or cancer cells, and the approach can be applied to precision medicine.
Affiliation(s)
- Dae Wook Kim
- Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Biomedical Mathematics Group, Institute for Basic Science, Daejeon 34126, Republic of Korea
- Hyukpyo Hong
- Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Biomedical Mathematics Group, Institute for Basic Science, Daejeon 34126, Republic of Korea
- Jae Kyoung Kim
- Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Biomedical Mathematics Group, Institute for Basic Science, Daejeon 34126, Republic of Korea
9
Sahoo A, Pechmann S. Functional network motifs defined through integration of protein-protein and genetic interactions. PeerJ 2022; 10:e13016. [PMID: 35223214 PMCID: PMC8877332 DOI: 10.7717/peerj.13016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Accepted: 02/06/2022] [Indexed: 01/11/2023] Open
Abstract
Cells are enticingly complex systems. The identification of feedback regulation is critically important for understanding this complexity. Network motifs defined as small graphlets that occur more frequently than expected by chance have revolutionized our understanding of feedback circuits in cellular networks. However, with their definition solely based on statistical over-representation, network motifs often lack biological context, which limits their usefulness. Here, we define functional network motifs (FNMs) through the systematic integration of genetic interaction data that directly inform on functional relationships between genes and encoded proteins. Occurring two orders of magnitude less frequently than conventional network motifs, we found FNMs significantly enriched in genes known to be functionally related. Moreover, our comprehensive analyses of FNMs in yeast showed that they are powerful at capturing both known and putative novel regulatory interactions, thus suggesting a promising strategy towards the systematic identification of feedback regulation in biological networks. Many FNMs appeared as excellent candidates for the prioritization of follow-up biochemical characterization, which is a recurring bottleneck in the targeting of complex diseases. More generally, our work highlights a fruitful avenue for integrating and harnessing genomic network data.
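To make the notion of a network motif concrete, the sketch below counts one classic small graphlet, the feed-forward loop, in a toy directed network. The choice of motif, the function name, and the toy edges are assumptions for illustration; the sketch covers neither the statistical over-representation test against randomized networks nor the genetic-interaction integration that defines FNMs in the paper.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count node triples (a, b, c) with directed edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


# Toy regulatory network (hypothetical): X regulates Z directly and via Y.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffl_count = count_feed_forward_loops(toy_edges)
```

A motif in the conventional sense would be called only if this count exceeded what is seen in degree-preserving randomizations of the same network.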
Affiliation(s)
- Amruta Sahoo
- Département de Biochimie, Université de Montréal, Montréal, QC, Canada
10
McMurray HR, Ambeskovic A, Newman LA, Aldersley J, Balakrishnan V, Smith B, Stern HA, Land H, McCall MN. Gene network modeling via TopNet reveals functional dependencies between diverse tumor-critical mediator genes. Cell Rep 2021; 37:110136. [PMID: 34936873 PMCID: PMC8803128 DOI: 10.1016/j.celrep.2021.110136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Revised: 08/02/2021] [Accepted: 11/25/2021] [Indexed: 11/08/2022] Open
Abstract
Malignant cell transformation and the underlying reprogramming of gene expression require the cooperation of multiple oncogenic mutations. This cooperation is reflected in the synergistic regulation of non-mutant downstream genes, so-called cooperation response genes (CRGs). CRGs affect diverse hallmark features of cancer cells and are not known to be functionally connected. However, they act as critical mediators of the cancer phenotype at an unexpectedly high frequency >50%, as indicated by genetic perturbations. Here, we demonstrate that CRGs function within a network of strong genetic interdependencies that are critical to the malignant state. Our network modeling methodology, TopNet, takes the approach of incorporating uncertainty in the underlying gene perturbation data and can identify non-linear gene interactions. In the dense space of gene connectivity, TopNet reveals a sparse topological gene network architecture, effectively pinpointing functionally relevant gene interactions. Thus, among diverse potential applications, TopNet has utility for identification of non-mutant targets for cancer intervention.
Malignant cell transformation requires the cooperation of multiple oncogenic mutations. Here, we demonstrate that non-mutated genes function within a network of strong genetic interdependencies that are critical to the malignant state. Our network modeling methodology, TopNet, reveals a sparse topological gene network architecture, effectively pinpointing functionally relevant gene interactions.
11
Zhang J, Liu L, Xu T, Zhang W, Li J, Rao N, Le TD. Time to infer miRNA sponge modules. Wiley Interdisciplinary Reviews: RNA 2021; 13:e1686. [PMID: 34342388 DOI: 10.1002/wrna.1686] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/06/2021] [Revised: 07/14/2021] [Accepted: 07/14/2021] [Indexed: 01/01/2023]
Abstract
Inferring competing endogenous RNA (ceRNA) or microRNA (miRNA) sponge modules is a challenging and meaningful task for revealing ceRNA regulation mechanisms at the module level. Modules in this context refer to groups of miRNA sponges which have mutual competitions and act as functional units for achieving biological processes. The recent development of computational methods based on heterogeneous data provides a novel way to discern the competitive effects of miRNA sponges on human complex diseases. This article aims to provide a comprehensive perspective on miRNA sponge module discovery methods. We first review the publicly available databases of cancer-related miRNA sponges, as the miRNA sponges involved in human cancers contribute to the discovery of cancer-associated modules. Then we review the existing computational methods for inferring miRNA sponge modules. Furthermore, we conduct an assessment of the performance of the module discovery methods with the pan-cancer dataset, and the comparison study indicates that it is useful to infer biologically meaningful miRNA sponge modules by directly mapping heterogeneous data to the competitive modules. Finally, we discuss the future directions and associated challenges in developing in silico methods to infer miRNA sponge modules. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Small Molecule-RNA Interactions; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- Junpeng Zhang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China; School of Engineering, Dali University, Dali, Yunnan, China
- Lin Liu
- UniSA STEM, University of South Australia, Mawson Lakes, South Australia, Australia
- Taosheng Xu
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, China
- Wu Zhang
- School of Agriculture and Biological Sciences, Dali University, Dali, Yunnan, China
- Jiuyong Li
- UniSA STEM, University of South Australia, Mawson Lakes, South Australia, Australia
- Nini Rao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Thuc Duy Le
- UniSA STEM, University of South Australia, Mawson Lakes, South Australia, Australia
12
Sinha M, Tadepalli P, Ramsey SA. Voting-based integration algorithm improves causal network learning from interventional and observational data: An application to cell signaling network inference. PLoS One 2021; 16:e0245776. [PMID: 33556096 PMCID: PMC7869988 DOI: 10.1371/journal.pone.0245776] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/10/2020] [Accepted: 01/07/2021] [Indexed: 11/19/2022] Open
Abstract
In order to increase statistical power for learning a causal network, data are often pooled from multiple observational and interventional experiments. However, if the direct effects of interventions are uncertain, multi-experiment data pooling can result in false causal discoveries. We present a new method, "Learn and Vote," for inferring causal interactions from multi-experiment datasets. In our method, experiment-specific networks are learned from the data and then combined by weighted averaging to construct a consensus network. Through empirical studies on synthetic and real-world datasets, we found that for most of the larger-sized network datasets that we analyzed, our method is more accurate than state-of-the-art network inference approaches.
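The combination step described in this abstract, experiment-specific networks merged by weighted averaging, can be sketched as follows. This is an illustrative reading of the abstract rather than the published "Learn and Vote" code: the edge-score representation, the weights, the threshold, and the function name are all assumptions.

```python
def consensus_network(edge_scores_per_experiment, weights, threshold=0.5):
    """Combine per-experiment edge scores by weighted averaging.

    edge_scores_per_experiment: list of dicts mapping (source, target) to a
        score in [0, 1]; edges absent from an experiment count as 0.
    weights: one non-negative weight per experiment.
    Returns the set of edges whose averaged score reaches `threshold`,
    together with the averaged scores themselves.
    """
    total_weight = sum(weights)
    combined = {}
    for scores, weight in zip(edge_scores_per_experiment, weights):
        for edge, score in scores.items():
            combined[edge] = combined.get(edge, 0.0) + weight * score
    averaged = {edge: value / total_weight for edge, value in combined.items()}
    kept = {edge for edge, value in averaged.items() if value >= threshold}
    return kept, averaged


# Hypothetical networks learned from three experiments.
experiments = [
    {("A", "B"): 1.0, ("B", "C"): 1.0},
    {("A", "B"): 1.0},
    {("A", "B"): 0.0, ("C", "A"): 1.0},
]
weights = [1.0, 1.0, 2.0]
edges, averaged = consensus_network(experiments, weights)
```

Learning each network separately before voting avoids pooling samples across experiments whose intervention effects are uncertain, which is the failure mode the abstract highlights.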
Affiliation(s)
- Meghamala Sinha
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Prasad Tadepalli
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Stephen A. Ramsey
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Department of Biomedical Sciences, Oregon State University, Corvallis, Oregon, United States of America
13
Hackett SR, Baltz EA, Coram M, Wranik BJ, Kim G, Baker A, Fan M, Hendrickson DG, Berndl M, McIsaac RS. Learning causal networks using inducible transcription factors and transcriptome-wide time series. Mol Syst Biol 2021; 16:e9174. [PMID: 32181581 PMCID: PMC7076914 DOI: 10.15252/msb.20199174] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/13/2019] [Revised: 02/13/2020] [Accepted: 02/19/2020] [Indexed: 11/27/2022] Open
Abstract
We present IDEA (the Induction Dynamics gene Expression Atlas), a dataset constructed by independently inducing hundreds of transcription factors (TFs) and measuring timecourses of the resulting gene expression responses in budding yeast. Each experiment captures a regulatory cascade connecting a single induced regulator to the genes it causally regulates. We discuss the regulatory cascade of a single TF, Aft1, in detail; however, IDEA contains > 200 TF induction experiments with 20 million individual observations and 100,000 signal‐containing dynamic responses. As an application of IDEA, we integrate all timecourses into a whole‐cell transcriptional model, which is used to predict and validate multiple new and underappreciated transcriptional regulators. We also find that the magnitudes of coefficients in this model are predictive of genetic interaction profile similarities. In addition to being a resource for exploring regulatory connectivity between TFs and their target genes, our modeling approach shows that combining rapid perturbations of individual genes with genome‐scale time‐series measurements is an effective strategy for elucidating gene regulatory networks.
Affiliation(s)
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Adam Baker
- Calico Life Sciences LLC, South San Francisco, CA, USA

14
Zhou Y, Song PXK, Wen X. Structural factor equation models for causal network construction via directed acyclic mixed graphs. Biometrics 2020; 77:573-586. [PMID: 32627167 DOI: 10.1111/biom.13322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/08/2018] [Accepted: 05/29/2020] [Indexed: 11/30/2022]
Abstract
Directed acyclic mixed graphs (DAMGs) provide a useful representation of network topology with both directed and undirected edges subject to the restriction of no directed cycles in the graph. This graphical framework may arise in many biomedical studies, for example, when a directed acyclic graph (DAG) of interest is contaminated with undirected edges induced by some unobserved confounding factors (eg, unmeasured environmental factors). Directed edges in a DAG are widely used to evaluate causal relationships among variables in a network, but detecting them is challenging when the underlying causality is obscured by some shared latent factors. The objective of this paper is to develop an effective structural equation model (SEM) method to extract reliable causal relationships from a DAMG. The proposed approach, termed structural factor equation model (SFEM), uses the SEM to capture the network topology of the DAG while accounting for the undirected edges in the graph with a factor analysis model. The latent factors in the SFEM enable the identification and removal of undirected edges, leading to a simpler and more interpretable causal network. The proposed method is evaluated and compared to existing methods through extensive simulation studies, and illustrated through the construction of gene regulatory networks related to breast cancer.
Affiliation(s)
- Yan Zhou
- Gilead Sciences, Foster City, California
- Peter X-K Song
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan

15
Khan A, Saha G, Pal RK. Modified Half-System Based Method for Reverse Engineering of Gene Regulatory Networks. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1303-1316. [PMID: 30640623 DOI: 10.1109/tcbb.2019.2892450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The accurate reconstruction of gene regulatory networks for proper understanding of the intricacies of complex biological mechanisms still provides motivation for researchers. Due to accessibility of various gene expression data, we can now attempt to computationally infer genetic interactions. Among the established network inference techniques, S-system is preferred because of its efficiency in replicating biological systems though it is computationally more expensive. This provides motivation for us to develop a similar system with lesser computational load. In this work, we have proposed a novel methodology for reverse engineering of gene regulatory networks based on a new technique: half-system. Half-systems use half the number of parameters compared to S-systems and thus significantly reduce the computational complexity. We have implemented our proposed technique for reconstructing four benchmark networks from their corresponding temporal expression profiles: an 8-gene, a 10-gene, and two 20-gene networks. Being a new technique, to the best of our knowledge, there are no comparable results for this in the contemporary literature. Therefore, we have compared our results with those obtained from the contemporary literature using other methodologies, including the state-of-the-art method, GENIE3. The results obtained in this work stack favourably against the competition, even showing quantifiable improvements in some cases.

16
Cardner M, Meyer-Schaller N, Christofori G, Beerenwinkel N. Inferring signalling dynamics by integrating interventional with observational data. Bioinformatics 2020; 35:i577-i585. [PMID: 31510686 PMCID: PMC6612850 DOI: 10.1093/bioinformatics/btz325] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022] Open
Abstract
Motivation: In order to infer a cell signalling network, we generally need interventional data from perturbation experiments. If the perturbation experiments are time-resolved, then signal progression through the network can be inferred. However, such designs are infeasible for large signalling networks, where it is more common to have steady-state perturbation data on the one hand, and a non-interventional time series on the other. Such was the design in a recent experiment investigating the coordination of epithelial–mesenchymal transition (EMT) in murine mammary gland cells. We aimed to infer the underlying signalling network of transcription factors and microRNAs coordinating EMT, as well as the signal progression during EMT. Results: In the context of nested effects models, we developed a method for integrating perturbation data with a non-interventional time series. We applied the model to RNA sequencing data obtained from an EMT experiment. Part of the network inferred from RNA interference was validated experimentally using luciferase reporter assays. Our model extension is formulated as an integer linear programme, which can be solved efficiently using heuristic algorithms. This extension allowed us to infer the signal progression through the network during an EMT time course, and thereby assess when each regulator is necessary for EMT to advance. Availability and implementation: R package at https://github.com/cbg-ethz/timeseriesNEM. The RNA sequencing data and microscopy images can be explored through a Shiny app at https://emt.bsse.ethz.ch. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mathias Cardner
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland

17
Holding AN, Cook HV, Markowetz F. Data generation and network reconstruction strategies for single cell transcriptomic profiles of CRISPR-mediated gene perturbations. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194441. [PMID: 31756390 DOI: 10.1016/j.bbagrm.2019.194441] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/07/2019] [Revised: 10/01/2019] [Accepted: 10/01/2019] [Indexed: 02/05/2023]
Abstract
Recent advances in single-cell RNA-sequencing (scRNA-seq) in combination with CRISPR/Cas9 technologies have enabled the development of methods for large-scale perturbation studies with transcriptional readouts. These methods are highly scalable and have the potential to provide a wealth of information on the biological networks that underlie cellular response. Here we discuss how to overcome several key challenges to generate and analyse data for the confident reconstruction of models of the underlying cellular network. Some challenges are generic, and apply to analysing any single-cell transcriptomic data, while others are specific to combined single-cell CRISPR/Cas9 data, in particular barcode swapping, knockdown efficiency, multiplicity of infection and potential confounding factors. We also provide a curated collection of published data sets to aid the development of analysis strategies. Finally, we discuss several network reconstruction approaches, including co-expression networks and Bayesian networks, as well as their limitations, and highlight the potential of Nested Effects Models for network reconstruction from scRNA-seq data. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Andrew N Holding
- Department of Biology, University of York, York, UK; York Biomedical Research Institute, University of York, York, UK; CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, UK; The Alan Turing Institute, 96 Euston Road, Kings Cross, London, UK
- Helen V Cook
- Department of Biology, University of York, York, UK

18
Condition-Specific Modeling of Biophysical Parameters Advances Inference of Regulatory Networks. Cell Rep 2019; 23:376-388. [PMID: 29641998 PMCID: PMC5987223 DOI: 10.1016/j.celrep.2018.03.048] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/25/2017] [Revised: 01/12/2018] [Accepted: 03/12/2018] [Indexed: 12/31/2022] Open
Abstract
Large-scale inference of eukaryotic transcription-regulatory networks remains challenging. One underlying reason is that existing algorithms typically ignore crucial regulatory mechanisms, such as RNA degradation and post-transcriptional processing. Here, we describe InfereCLaDR, which incorporates such elements and advances prediction in Saccharomyces cerevisiae. First, InfereCLaDR employs a high-quality Gold Standard dataset that we use separately as prior information and for model validation. Second, InfereCLaDR explicitly models transcription factor activity and RNA half-lives. Third, it introduces expression subspaces to derive condition-responsive regulatory networks for every gene. InfereCLaDR’s final network is validated by known data and trends and results in multiple insights. For example, it predicts long half-lives for transcripts of the nucleic acid metabolism genes and members of the cytosolic chaperonin complex as targets of the proteasome regulator Rpn4p. InfereCLaDR demonstrates that more biophysically realistic modeling of regulatory networks advances prediction accuracy both in eukaryotes and prokaryotes.

19
Choobdar S, Ahsen ME, Crawford J, Tomasoni M, Fang T, Lamparter D, Lin J, Hescott B, Hu X, Mercer J, Natoli T, Narayan R, Subramanian A, Zhang JD, Stolovitzky G, Kutalik Z, Lage K, Slonim DK, Saez-Rodriguez J, Cowen LJ, Bergmann S, Marbach D. Assessment of network module identification across complex diseases. Nat Methods 2019; 16:843-852. [PMID: 31471613 PMCID: PMC6719725 DOI: 10.1038/s41592-019-0509-5] [Citation(s) in RCA: 140] [Impact Index Per Article: 28.0] [Received: 11/29/2018] [Accepted: 07/10/2019] [Indexed: 12/11/2022]
Abstract
Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.
Affiliation(s)
- Sarvenaz Choobdar
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mehmet E Ahsen
- Icahn Institute for Genomics and Multiscale Biology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jake Crawford
- Department of Computer Science, Tufts University, Medford, MA, USA
- Mattia Tomasoni
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tao Fang
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- David Lamparter
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Verge Genomics, San Francisco, CA, USA
- Junyuan Lin
- Department of Mathematics, Tufts University, Medford, MA, USA
- Benjamin Hescott
- College of Computer and Information Science, Northeastern University, Boston, MA, USA
- Xiaozhe Hu
- Department of Mathematics, Tufts University, Medford, MA, USA
- Johnathan Mercer
- Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center at the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ted Natoli
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Rajiv Narayan
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jitao D Zhang
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Gustavo Stolovitzky
- Icahn Institute for Genomics and Multiscale Biology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
- Zoltán Kutalik
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Institute of Primary Care and Public Health, University of Lausanne, Lausanne, Switzerland
- Kasper Lage
- Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center at the Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Biological Psychiatry, Mental Health Center Sct. Hans, University of Copenhagen, Roskilde, Denmark
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA, USA
- Department of Immunology, Tufts University School of Medicine, Boston, MA, USA
- Julio Saez-Rodriguez
- Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University, Bioquant, Heidelberg, Germany
- RWTH Aachen University, Faculty of Medicine, Joint Research Center for Computational Biomedicine, Aachen, Germany
- Lenore J Cowen
- Department of Computer Science, Tufts University, Medford, MA, USA
- Department of Mathematics, Tufts University, Medford, MA, USA
- Sven Bergmann
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa.
- Daniel Marbach
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland.

20
Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Front Genet 2019; 10:524. [PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Received: 08/07/2018] [Accepted: 05/13/2019] [Indexed: 12/11/2022] Open
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Affiliation(s)
- Clark Glymour
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Kun Zhang
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Peter Spirtes
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States

21
Dozmorov MG. Disease classification: from phenotypic similarity to integrative genomics and beyond. Brief Bioinform 2019; 20:1769-1780. [DOI: 10.1093/bib/bby049] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/21/2018] [Revised: 05/01/2018] [Indexed: 02/06/2023] Open
Abstract
A fundamental challenge of modern biomedical research is understanding how diseases that are similar on the phenotypic level are similar on the molecular level. Integration of various genomic data sets with the traditionally used phenotypic disease similarity revealed novel genetic and molecular mechanisms and blurred the distinction between monogenic (Mendelian) and complex diseases. Network-based medicine has emerged as a complementary approach for identifying disease-causing genes, genetic mediators, disruptions in the underlying cellular functions and for drug repositioning. The recent development of machine and deep learning methods allows for leveraging real-life information about diseases to refine genetic and phenotypic disease relationships. This review describes the historical development and recent methodological advancements for studying disease classification (nosology).
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, 830 East Main Street, Richmond, VA, USA

22
Alexiou A, Chatzichronis S, Perveen A, Hafeez A, Ashraf GM. Algorithmic and Stochastic Representations of Gene Regulatory Networks and Protein-Protein Interactions. Curr Top Med Chem 2019; 19:413-425. [PMID: 30854971 DOI: 10.2174/1568026619666190311125256] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/27/2018] [Revised: 10/15/2018] [Accepted: 12/26/2018] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Latest studies reveal the importance of Protein-Protein interactions on physiologic functions and biological structures. Several stochastic and algorithmic methods have been published until now, for the modeling of the complex nature of the biological systems. OBJECTIVE: Biological Networks computational modeling is still a challenging task. The formulation of the complex cellular interactions is a research field of great interest. In this review paper, several computational methods for the modeling of GRN and PPI are presented analytically. METHODS: Several well-known GRN and PPI models are presented and discussed in this review study such as: Graphs representation, Boolean Networks, Generalized Logical Networks, Bayesian Networks, Relevance Networks, Graphical Gaussian models, Weight Matrices, Reverse Engineering Approach, Evolutionary Algorithms, Forward Modeling Approach, Deterministic models, Static models, Hybrid models, Stochastic models, Petri Nets, BioAmbients calculus and Differential Equations. RESULTS: GRN and PPI methods have been already applied in various clinical processes with potential positive results, establishing promising diagnostic tools. CONCLUSION: In literature many stochastic algorithms are focused in the simulation, analysis and visualization of the various biological networks and their dynamics interactions, which are referred and described in depth in this review paper.
Affiliation(s)
- Asma Perveen
- Glocal School of Life Sciences, Glocal University, Mirzapur Pole, Saharanpur, Uttar Pradesh, India
- Abdul Hafeez
- Glocal School of Pharmacy, Glocal University, Mirzapur Pole, Saharanpur, Uttar Pradesh, India
- Ghulam Md. Ashraf
- King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia

23
Inferring Gene Regulatory Networks from a Population of Yeast Segregants. Sci Rep 2019; 9:1197. [PMID: 30718595 PMCID: PMC6361976 DOI: 10.1038/s41598-018-37667-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/31/2018] [Accepted: 11/30/2018] [Indexed: 12/14/2022] Open
Abstract
Constructing gene regulatory networks is crucial to unraveling the genetic architecture of complex traits and to understanding the mechanisms of diseases. On the basis of gene expression and single nucleotide polymorphism data in the yeast, Saccharomyces cerevisiae, we constructed gene regulatory networks using a two-stage penalized least squares method. A large system of structural equations via optimal prediction of a set of surrogate variables was established at the first stage, followed by consistent selection of regulatory effects at the second stage. Using this approach, we identified subnetworks that were enriched in gene ontology categories, revealing directional regulatory mechanisms controlling these biological pathways. Our mapping and analysis of expression-based quantitative trait loci uncovered a known alteration of gene expression within a biological pathway that results in regulatory effects on companion pathway genes in the phosphocholine network. In addition, we identify nodes in these gene ontology-enriched subnetworks that are coordinately controlled by transcription factors driven by trans-acting expression quantitative trait loci. Altogether, the integration of documented transcription factor regulatory associations with subnetworks defined by a system of structural equations using quantitative trait loci data is an effective means to delineate the transcriptional control of biological pathways.

24
van der Wijst MGP, de Vries DH, Brugge H, Westra HJ, Franke L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Med 2018; 10:96. [PMID: 30567569 PMCID: PMC6299585 DOI: 10.1186/s13073-018-0608-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 02/01/2023] Open
Abstract
Only a small fraction of patients respond to the drug prescribed to treat their disease, which means that most are at risk of unnecessary exposure to side effects through ineffective drugs. This inter-individual variation in drug response is driven by differences in gene interactions caused by each patient's genetic background, environmental exposures, and the proportions of specific cell types involved in disease. These gene interactions can now be captured by building gene regulatory networks, by taking advantage of RNA velocity (the time derivative of the gene expression state), the ability to study hundreds of thousands of cells simultaneously, and the falling price of single-cell sequencing. Here, we propose an integrative approach that leverages these recent advances in single-cell data with the sensitivity of bulk data to enable the reconstruction of personalized, cell-type- and context-specific gene regulatory networks. We expect this approach will allow the prioritization of key driver genes for specific diseases and will provide knowledge that opens new avenues towards improved personalized healthcare.
Affiliation(s)
- Monique G P van der Wijst
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Dylan H de Vries
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm Brugge
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm-Jan Westra
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Lude Franke
- Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.

25
Rougny A, Gloaguen P, Langonné N, Reiter E, Crépieux P, Poupon A, Froidevaux C. A logic-based method to build signaling networks and propose experimental plans. Sci Rep 2018; 8:7830. [PMID: 29777117 PMCID: PMC5959848 DOI: 10.1038/s41598-018-26006-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/08/2018] [Accepted: 04/27/2018] [Indexed: 11/15/2022] Open
Abstract
With the dramatic increase of the diversity and the sheer quantity of biological data generated, the construction of comprehensive signaling networks that include precise mechanisms cannot be carried out manually anymore. In this context, we propose a logic-based method that allows building large signaling networks automatically. Our method is based on a set of expert rules that make explicit the reasoning made by biologists when interpreting experimental results coming from a wide variety of experiment types. These rules allow formulating all the conclusions that can be inferred from a set of experimental results, and thus building all the possible networks that explain these results. Moreover, given a hypothesis, our system proposes experimental plans to carry out in order to validate or invalidate it. To evaluate the performance of our method, we applied our framework to the reconstruction of the FSHR-induced and the EGFR-induced signaling networks. The FSHR is known to induce the transactivation of the EGFR, but very little is known about the resulting FSH- and EGF-dependent network. We built a single network using data underlying both networks. This leads to a new hypothesis on the activation of MEK by p38MAPK, which we validate experimentally. These preliminary results represent a first step in the demonstration of a cross-talk between these two major MAP kinase pathways.
Affiliation(s)
- Adrien Rougny
- Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Aomi, Tokyo, 135-0064, Japan; Laboratoire de Recherche en Informatique UMR CNRS 8623, Université Paris-Sud, Université Paris-Saclay, Orsay Cedex, 91405, France
- Pauline Gloaguen
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- Nathalie Langonné
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France; CNRS; Université François-Rabelais de Tours, UMR 7292, 37032, Tours, France
- Eric Reiter
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- Pascale Crépieux
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France
- Anne Poupon
- PRC, INRA, CNRS, Université François Rabelais-Tours, 37380, Nouzilly, France.
- Christine Froidevaux
- Laboratoire de Recherche en Informatique UMR CNRS 8623, Université Paris-Sud, Université Paris-Saclay, Orsay Cedex, 91405, France

26
Yu J, Silva JM. Bayesian Network to Infer Drug-Induced Apoptosis Circuits from Connectivity Map Data. Methods Mol Biol 2018; 1783:361-378. [PMID: 29767372 DOI: 10.1007/978-1-4939-7834-2_18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
The Connectivity Map (CMAP) project profiled human cancer cell lines exposed to a library of anticancer compounds with the goal of connecting cancer with underlying genes and potential treatments. As most targeted anticancer therapeutics aim to induce tumor-selective apoptosis, it is critical to understand the specific cell death pathways triggered by drugs. This can help to better understand the mechanism of how cancer cells respond to chemical stimulations and improve the treatment of human tumors. In this study, using Connectivity MAP microarray-based gene expression data, we applied a Bayesian network modeling approach and identified apoptosis as a major drug-induced cellular pathway. We focused on 13 apoptotic genes that showed significant differential expression across all drug-perturbed samples to reconstruct the apoptosis network. In our predicted subnetwork, 9 out of 15 high-confidence interactions were validated in literature, and our inferred network captured two major cell death pathways by identifying BCL2L11 and PMAIP1 as key interacting players for the intrinsic apoptosis pathway, and TAXBP1 and TNFAIP3 for the extrinsic apoptosis pathway. Our inferred apoptosis network also suggested the role of BCL2L11 and TNFAIP3 as "gateway" genes in the drug-induced intrinsic and extrinsic apoptosis pathways.
Affiliation(s)
- Jiyang Yu
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA.
- Jose M Silva
- Department of Pathology, Icahn School of Medicine at Mount Sinai, The Mount Sinai Hospital, New York, NY, USA

27
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023] Open
Abstract
Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of the search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on the benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
|
28
|
Szczurek E, Beerenwinkel N. Linear effects models of signaling pathways from combinatorial perturbation data. Bioinformatics 2017; 32:i297-i305. [PMID: 27307630 PMCID: PMC4908352 DOI: 10.1093/bioinformatics/btw268] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022] Open
Abstract
Motivation: Perturbations constitute the central means to study signaling pathways. Interrupting components of the pathway and analyzing observed effects of those interruptions can give insight into unknown connections within the signaling pathway itself, as well as the link from the pathway to the effects. Different pathway components may have different individual contributions to the measured perturbation effects, such as gene expression changes. Those effects will be observed in combination when the pathway components are perturbed. Extant approaches focus either on the reconstruction of pathway structure or on resolving how the pathway components control the downstream effects. Results: Here, we propose a linear effects model, which can be applied to solve both these problems from combinatorial perturbation data. We use simulated data to demonstrate the accuracy of learning the pathway structure as well as estimation of the individual contributions of pathway components to the perturbation effects. The practical utility of our approach is illustrated by an application to perturbations of the mitogen-activated protein kinase pathway in Saccharomyces cerevisiae. Availability and Implementation: lem is available as an R package at http://www.mimuw.edu.pl/~szczurek/lem. Contact: szczurek@mimuw.edu.pl; niko.beerenwinkel@bsse.ethz.ch. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ewa Szczurek
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics
|
29
|
Pirkl M, Diekmann M, van der Wees M, Beerenwinkel N, Fröhlich H, Markowetz F. Inferring modulators of genetic interactions with epistatic nested effects models. PLoS Comput Biol 2017; 13:e1005496. [PMID: 28406896 PMCID: PMC5407847 DOI: 10.1371/journal.pcbi.1005496] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/06/2016] [Revised: 04/27/2017] [Accepted: 04/03/2017] [Indexed: 12/27/2022] Open
Abstract
Maps of genetic interactions can dissect functional redundancies in cellular networks. Gene expression profiles as high-dimensional molecular readouts of combinatorial perturbations provide a detailed view of genetic interactions, but can be hard to interpret if different gene sets respond in different ways (called mixed epistasis). Here we test the hypothesis that mixed epistasis between a gene pair can be explained by the action of a third gene that modulates the interaction. We have extended the framework of Nested Effects Models (NEMs), a type of graphical model specifically tailored to analyze high-dimensional gene perturbation data, to incorporate logical functions that describe interactions between regulators on downstream genes and proteins. We benchmark our approach in the controlled setting of a simulation study and show high accuracy in inferring the correct model. In an application to data from deletion mutants of kinases and phosphatases in S. cerevisiae we show that epistatic NEMs can point to modulators of genetic interactions. Our approach is implemented in the R-package 'epiNEM' available from https://github.com/cbg-ethz/epiNEM and https://bioconductor.org/packages/epiNEM/.
Affiliation(s)
- Martin Pirkl
- ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Madeline Diekmann
- ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | | - Niko Beerenwinkel
- ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Holger Fröhlich
- Bonn-Aachen International Center for IT (B-IT), University of Bonn, Bonn, Germany
- UCB Biosciences GmbH, Monheim, Germany
| | - Florian Markowetz
- University of Cambridge, Cancer Research UK Cambridge Institute, Cambridge, United Kingdom
|
30
|
Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data. PLoS One 2017; 12:e0171240. [PMID: 28166542 PMCID: PMC5293552 DOI: 10.1371/journal.pone.0171240] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/21/2016] [Accepted: 01/17/2017] [Indexed: 11/19/2022] Open
Abstract
Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
|
31
|
Hartmann AK, Nuel G. Using Triplet Ordering Preferences for Estimating Causal Effects in the Analysis of Gene Expression Data. PLoS One 2017; 12:e0170514. [PMID: 28141825 PMCID: PMC5283676 DOI: 10.1371/journal.pone.0170514] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/22/2016] [Accepted: 01/05/2017] [Indexed: 12/04/2022] Open
Abstract
Triplet ordering preferences are used to perform Monte Carlo sampling of the posterior causal orderings originating from the analysis of gene-expression experiments involving observation as well as, usually few, interventions, like knock-outs. The performance of this sampling approach is compared to a previously used sampling via pairwise ordering preference as well as to the sampling of the full posterior distribution. For a fair comparison, the latter approach is restricted to twice the numerical effort of the triplet-based approach. This is done for artificially generated causal, i.e., directed acyclic graphs (DAGs) and for actual experimental data taken from the ROSETTA challenge. The sampling using the triplets ordering turns out to be superior to both other approaches.
Affiliation(s)
| | - Grégory Nuel
- LPMA, CNRS 7599, Université Pierre et Marie Curie, Paris, France
|
32
|
|
33
|
Bianchi F. Bioinformatics for Clinical Use in Breast Cancer. Breast Cancer 2017. [DOI: 10.1007/978-3-319-48848-6_82] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
34
|
Sinha S. A pedagogical walkthrough of computational modeling and simulation of Wnt signaling pathway using static causal models in MATLAB. EURASIP J Bioinform Syst Biol 2016; 2017:1. [PMID: 27547217 PMCID: PMC4977324 DOI: 10.1186/s13637-016-0044-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/12/2015] [Accepted: 07/22/2016] [Indexed: 12/26/2022]
Abstract
Simulation studies in systems biology involving computational experiments on Wnt signaling pathways abound in the literature but often lack a pedagogical perspective that might ease understanding for beginner students and researchers in transition who intend to work on modeling the pathway. This paucity might be due to restrictive business policies which enforce an unwanted embargo on the sharing of important scientific knowledge. A tutorial introduction to computational modeling of the Wnt signaling pathway in a human colorectal cancer dataset using static Bayesian network models is provided. The walkthrough might aid biologists/informaticians in understanding the design of computational experiments, which is interleaved with an exposition of the Matlab code and causal models from the Bayesian network toolbox. The manuscript elucidates the coding contents of the advance article by Sinha (Integr. Biol. 6:1034-1048, 2014) and takes the reader step by step through how (a) the collection and transformation of the available biological information from the literature is done, (b) the integration of the heterogeneous data and prior biological knowledge in the network is achieved, (c) the simulation study is designed, (d) the hypothesis regarding a biological phenomenon is transformed into a computational framework, and (e) results and inferences drawn using d-connectivity/separability are reported. The manuscript ends with a programming assignment to help readers get hands-on experience of a perturbation project. A description of the Matlab files is made available under the GNU GPL v3 license at the Google code project on https://code.google.com/p/static-bn-for-wnt-signaling-pathway and https://sites.google.com/site/shriprakashsinha/shriprakashsinha/projects/static-bn-for-wnt-signaling-pathway. Latest updates can be found on the latter website.
|
35
|
Expanding the Immunology Toolbox: Embracing Public-Data Reuse and Crowdsourcing. Immunity 2016; 45:1191-1204. [DOI: 10.1016/j.immuni.2016.12.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/12/2016] [Revised: 11/30/2016] [Accepted: 12/01/2016] [Indexed: 12/15/2022]
|
36
|
Non-obvious correlations to disease management unraveled by Bayesian artificial intelligence analyses of CMS data. Artif Intell Med 2016; 74:1-8. [PMID: 27964799 DOI: 10.1016/j.artmed.2016.11.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 07/25/2016] [Revised: 11/01/2016] [Accepted: 11/07/2016] [Indexed: 12/23/2022]
Abstract
OBJECTIVE Given the availability of extensive digitized healthcare data from medical records, claims and prescription information, it is now possible to use hypothesis-free, data-driven approaches to mine medical databases for novel insight. The goal of this analysis was to demonstrate the use of artificial intelligence based methods such as Bayesian networks to open up opportunities for creation of new knowledge in management of chronic conditions. MATERIALS AND METHODS Hospital level Medicare claims data containing discharge numbers for most common diagnoses were analyzed in a hypothesis-free manner using Bayesian networks learning methodology. RESULTS While many interactions identified between discharge rates of diagnoses using this data set are supported by current medical knowledge, a novel interaction linking asthma and renal failure was discovered. This interaction is non-obvious and had not been looked at by the research and clinical communities in epidemiological or clinical data. A plausible pharmacological explanation of this link is proposed together with a verification of the risk significance by conventional statistical analysis. CONCLUSION Potential clinical and molecular pathways defining the relationship between commonly used asthma medications and renal disease are discussed. The study underscores the need for further epidemiological research to validate this novel hypothesis. Validation will lead to advancement in clinical treatment of asthma & bronchitis, thereby, improving patient outcomes and leading to long term cost savings. In summary, this study demonstrates that application of advanced artificial intelligence methods in healthcare has the potential to enhance the quality of care by discovering non-obvious, clinically relevant relationships and enabling timely care intervention.
|
37
|
Hassan A, Naz A, Obaid A, Paracha RZ, Naz K, Awan FM, Muhmmad SA, Janjua HA, Ahmad J, Ali A. Pangenome and immuno-proteomics analysis of Acinetobacter baumannii strains revealed the core peptide vaccine targets. BMC Genomics 2016; 17:732. [PMID: 27634541 PMCID: PMC5025611 DOI: 10.1186/s12864-016-2951-4] [Citation(s) in RCA: 81] [Impact Index Per Article: 10.1] [Received: 02/12/2016] [Accepted: 07/19/2016] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Acinetobacter baumannii has emerged as a significant nosocomial pathogen during the last few years, exhibiting resistance to almost all major classes of antibiotics. Alternative treatment options such as vaccines tend to be most promising and cost effective approaches against this resistant pathogen. In the current study, we have explored the pan-genome of A. baumannii followed by immune-proteomics and reverse vaccinology approaches to identify potential core vaccine targets. RESULTS The pan-genome of all available A. baumannii strains (30 complete genomes) is estimated to contain 7,606 gene families and the core genome consists of 2,445 gene families (~32 % of the pan-genome). Phylogenetic tree, comparative genomic and proteomic analysis revealed both intra- and inter genomic similarities and evolutionary relationships. Among the conserved core genome, thirteen proteins, including P pilus assembly protein, pili assembly chaperone, AdeK, PonA, OmpA, general secretion pathway protein D, FhuE receptor, Type VI secretion system OmpA/MotB, TonB dependent siderophore receptor, general secretion pathway protein D, outer membrane protein, peptidoglycan associated lipoprotein and peptidyl-prolyl cis-trans isomerase are identified as highly antigenic. Epitope mapping of the target proteins revealed the presence of antigenic surface exposed 9-mer T-cell epitopes. Protein-protein interaction and functional annotation have shown their involvement in significant biological and molecular processes. The pipeline is validated by predicting already known immunogenic targets against Gram negative pathogen Helicobacter pylori as a positive control. CONCLUSION The study, based upon combinatorial approach of pan-genomics, core genomics, proteomics and reverse vaccinology led us to find out potential vaccine candidates against A. baumannii. 
The comprehensive analysis of all the completely sequenced genomes revealed thirteen putative antigens which could elicit a substantial immune response. The integration of computational vaccinology strategies would help tackle the rapid dissemination of resistant A. baumannii strains. The scarcity of effective antibiotics and the global expansion of sequencing data make this approach desirable in the development of effective vaccines against A. baumannii and other bacterial pathogens.
Affiliation(s)
- Afreenish Hassan
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
| | - Anam Naz
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
| | - Ayesha Obaid
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
| | - Rehan Zafar Paracha
- Research Center for Modeling and Simulation (RCMS), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
| | - Kanwal Naz
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
| | - Faryal Mehwish Awan
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
| | - Syed Aun Muhmmad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
| | - Hussnain Ahmed Janjua
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
| | - Jamil Ahmad
- Research Center for Modeling and Simulation (RCMS), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
- Department of Computer Science and Information Technology, Stratford University, Falls Church, VA 22043 USA
| | - Amjad Ali
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
|
38
|
Guo L, Zhao G, Xu J, Kistler HC, Gao L, Ma L. Compartmentalized gene regulatory network of the pathogenic fungus Fusarium graminearum. New Phytol 2016; 211:527-41. [PMID: 26990214 PMCID: PMC5069591 DOI: 10.1111/nph.13912] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/30/2015] [Accepted: 01/25/2016] [Indexed: 05/09/2023]
Abstract
Head blight caused by Fusarium graminearum threatens world-wide wheat production, resulting in both yield loss and mycotoxin contamination. We reconstructed the global F. graminearum gene regulatory network (GRN) from a large collection of transcriptomic data using Bayesian network inference, a machine-learning algorithm. This GRN reveals connectivity between key regulators and their target genes. Focusing on key regulators, this network contains eight distinct but interwoven modules. Enriched for unique functions, such as cell cycle, DNA replication, transcription, translation and stress responses, each module exhibits distinct expression profiles. Evolutionarily, the F. graminearum genome can be divided into core regions shared with closely related species and variable regions harboring genes that are unique to F. graminearum and perform species-specific functions. Interestingly, the inferred top regulators regulate genes that are significantly enriched from the same genomic regions (P < 0.05), revealing a compartmentalized network structure that may reflect network rewiring related to specific adaptation of this plant pathogen. This first-ever reconstructed filamentous fungal GRN primes our understanding of pathogenicity at the systems biology level and provides enticing prospects for novel disease control strategies involving the targeting of master regulators in pathogens. The program can be used to construct GRNs of other plant pathogens.
Affiliation(s)
- Li Guo
- Department of Biochemistry and Molecular Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
| | - Guoyi Zhao
- Department of Electrical & Computer Engineering, University of Massachusetts Amherst, Amherst, MA 01003, USA
| | - Jin-Rong Xu
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
| | - H. Corby Kistler
- USDA-ARS Cereal Disease Laboratory, University of Minnesota, St Paul, MN 55108, USA
| | - Lixin Gao
- Department of Electrical & Computer Engineering, University of Massachusetts Amherst, Amherst, MA 01003, USA
| | - Li-Jun Ma
- Department of Biochemistry and Molecular Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
|
39
|
He B, Tan K. Understanding transcriptional regulatory networks using computational models. Curr Opin Genet Dev 2016; 37:101-108. [PMID: 26950762 DOI: 10.1016/j.gde.2016.02.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 09/15/2015] [Revised: 01/29/2016] [Accepted: 02/08/2016] [Indexed: 01/06/2023]
Abstract
Transcriptional regulatory networks (TRNs) encode instructions for animal development and physiological responses. Recent advances in genomic technologies and computational modeling have revolutionized our ability to construct models of TRNs. Here, we survey current computational methods for inferring TRN models using genome-scale data. We discuss their advantages and limitations. We summarize representative TRNs constructed using genome-scale data in both normal and disease development. We discuss lessons learned about the structure/function relationship of TRNs, based on examining various large-scale TRN models. Finally, we outline some open questions regarding TRNs, including how to improve model accuracy by integrating complementary data types, how to infer condition-specific TRNs, and how to compare TRNs across conditions and species in order to understand their structure/function relationship.
Affiliation(s)
- Bing He
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA
| | - Kai Tan
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA; Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.
|
40
|
Cho H, Berger B, Peng J. Reconstructing Causal Biological Networks through Active Learning. PLoS One 2016; 11:e0150611. [PMID: 26930205 PMCID: PMC4773135 DOI: 10.1371/journal.pone.0150611] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/23/2015] [Accepted: 02/16/2016] [Indexed: 11/28/2022] Open
Abstract
Reverse-engineering of biological networks is a central problem in systems biology. Intervention data, such as gene knockouts or knockdowns, are typically used for teasing apart causal relationships among genes. Under time or resource constraints, one needs to carefully choose which intervention experiments to carry out. Previous approaches for selecting the most informative interventions have largely focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and the DREAM4 network inference challenge data sets. In comparison to random selection of intervention experiments, our method generally leads to faster recovery of the underlying network structure and faster convergence to the final distribution of confidence scores over candidate graph structures obtained using the full data.
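To make concrete why knockout data help orient edges in a Gaussian Bayesian network, the following minimal sketch computes the expected expression levels of a small linear-Gaussian network before and after clamping one gene. It does not reproduce the active learning algorithm of the paper; the function `gbn_mean`, the three-gene chain, and all weights are hypothetical choices for illustration.

```python
import numpy as np


def gbn_mean(weights, bias, knockout=None, value=0.0):
    """Mean of a linear-Gaussian network x = W x + bias + noise (zero-mean noise).

    weights[child, parent] is the weight of the edge parent -> child.
    A knockout clamps one node: its incoming edges are cut and its level is
    fixed to `value`, so downstream nodes shift while upstream nodes do not.
    """
    W = np.array(weights, dtype=float)  # copies, so the caller's data is untouched
    b = np.array(bias, dtype=float)
    if knockout is not None:
        W[knockout, :] = 0.0
        b[knockout] = value
    # For an acyclic network (I - W) is invertible, giving mean = (I - W)^{-1} b.
    return np.linalg.solve(np.eye(len(b)) - W, b)


# Hypothetical chain gene0 -> gene1 -> gene2.
W = [[0, 0, 0],
     [2, 0, 0],    # gene1 = 2 * gene0 + noise
     [0, -1, 0]]   # gene2 = -gene1 + 3 + noise
b = [1, 0, 3]

baseline = gbn_mean(W, b)                   # [1, 2, 1]
ko_upstream = gbn_mean(W, b, knockout=0)    # [0, 0, 3]: gene1 and gene2 both shift
ko_downstream = gbn_mean(W, b, knockout=2)  # [1, 2, 0]: gene0 and gene1 unchanged
print(baseline, ko_upstream, ko_downstream)
```

The asymmetry between the two knockouts is what observational data alone cannot show. An active learning scheme such as the one summarized above would then choose which gene to knock out next based on how much the outcome is expected to reduce uncertainty over candidate graphs.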
Affiliation(s)
- Hyunghoon Cho
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
- Department of Mathematics, MIT, Cambridge, MA, United States of America
| | - Jian Peng
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
- Department of Mathematics, MIT, Cambridge, MA, United States of America
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
|
41
|
Longitudinal Prediction of the Infant Gut Microbiome with Dynamic Bayesian Networks. Sci Rep 2016; 6:20359. [PMID: 26853461 PMCID: PMC4745046 DOI: 10.1038/srep20359] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 06/05/2015] [Accepted: 12/31/2015] [Indexed: 12/22/2022] Open
Abstract
Sequencing of the 16S rRNA gene allows comprehensive assessment of bacterial community composition from human body sites. Previously published and publicly accessible data on 58 preterm infants in the Neonatal Intensive Care Unit who underwent frequent stool collection were used. We constructed Dynamic Bayesian Networks (DBNs) from the data and analyzed predictive performance and network characteristics. We constructed a DBN model of the infant gut microbial ecosystem, which explicitly captured specific relationships and general trends in the data: increasing amounts of Clostridia, residual amounts of Bacilli, and increasing amounts of Gammaproteobacteria that then give way to Clostridia. Predictions from DBNs with fewer edges were overall more accurate, although less so on harder-to-predict subjects (p = 0.045). DBNs provided quantitative likelihood estimates for rare abruption events. Iterative prediction was less accurate (p < 0.001), but showed remarkable insensitivity to initial conditions and predicted convergence to a mix of Clostridia, Gammaproteobacteria, and Bacilli. DBNs were able to identify important relationships between microbiome taxa and predict future changes in microbiome composition from measured or synthetic initial conditions. DBNs also provided likelihood estimates for sudden, dramatic shifts in microbiome composition, which may be useful in guiding further analysis of those samples.
|
42
|
Fröhlich H, Bahamondez G, Götschel F, Korf U. Dynamic Bayesian Network Modeling of the Interplay between EGFR and Hedgehog Signaling. PLoS One 2015; 10:e0142646. [PMID: 26571415 PMCID: PMC4646463 DOI: 10.1371/journal.pone.0142646] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 08/14/2015] [Accepted: 10/23/2015] [Indexed: 11/23/2022] Open
Abstract
Aberrant activation of sonic Hedgehog (SHH) signaling has been found to disrupt cellular differentiation in many human cancers and to increase proliferation. The SHH pathway is known to cross-talk with EGFR-dependent signaling. Recent studies experimentally addressed this interplay in Daoy cells, which are presumably a model system for medulloblastoma, a highly malignant brain tumor that predominantly occurs in children. Several clinical trials for different solid cancers are currently ongoing, designed to validate the clinical benefits of targeting SHH in combination with other pathways. This has motivated us to investigate interactions between EGFR- and SHH-dependent signaling in greater depth. To our knowledge, there is so far no mathematical model describing the interplay between EGFR- and SHH-dependent signaling in medulloblastoma. Here we present a fully probabilistic approach using Dynamic Bayesian Networks (DBNs). To build our model, we made use of literature-based knowledge describing SHH and EGFR signaling and integrated gene expression (Illumina) and cellular-location-dependent time series protein expression data (Reverse Phase Protein Arrays). We validated our model by sub-sampling training data and making Bayesian predictions on the left-out test data. Our predictions, focusing on key transcription factors and p70S6K, showed a high level of concordance with experimental data. Furthermore, the stability of our model was tested by a parametric bootstrap approach. Stable network features were in agreement with published data. Altogether we believe that our model improved our understanding of the interplay between two highly oncogenic signaling pathways in Daoy cells. This may open new perspectives for the future therapy of Hedgehog/EGF-dependent solid tumors.
Affiliation(s)
- Holger Fröhlich
- Algorithmic Bioinformatics, Institute for Computer Science, c/o Bonn-Aachen International Center for IT (B-IT), University of Bonn, Bonn, Germany
| | - Gloria Bahamondez
- Algorithmic Bioinformatics, Institute for Computer Science, c/o Bonn-Aachen International Center for IT (B-IT), University of Bonn, Bonn, Germany
| | - Frank Götschel
- Division of Molecular Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Ulrike Korf
- Division of Molecular Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany
|
43
|
Aghdam R, Ganjali M, Niloofar P, Eslahchi C. Inferring gene regulatory networks by an order independent algorithm using incomplete data sets. J Appl Stat 2015. [DOI: 10.1080/02664763.2015.1079307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/05/2023]
|
44
|
Jennen DGJ, van Leeuwen DM, Hendrickx DM, Gottschalk RWH, van Delft JHM, Kleinjans JCS. Bayesian Network Inference Enables Unbiased Phenotypic Anchoring of Transcriptomic Responses to Cigarette Smoke in Humans. Chem Res Toxicol 2015; 28:1936-48. [PMID: 26360787 DOI: 10.1021/acs.chemrestox.5b00145] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Microarray-based transcriptomic analysis has been demonstrated to hold the opportunity to study the effects of human exposure to, e.g., chemical carcinogens at the whole genome level, thus yielding broad-ranging molecular information on possible carcinogenic effects. Since genes do not operate individually but rather through concerted interactions, analyzing and visualizing networks of genes should provide important mechanistic information, especially upon connecting them to functional parameters, such as those derived from measurements of biomarkers for exposure and carcinogenic risk. Conventional methods such as hierarchical clustering and correlation analyses are frequently used to address these complex interactions but are limited as they do not provide directional causal dependence relationships. Therefore, our aim was to apply Bayesian network inference with the purpose of phenotypic anchoring of modified gene expressions. We investigated a use case on transcriptomic responses to cigarette smoking in humans, in association with plasma cotinine levels as biomarkers of exposure and aromatic DNA-adducts in blood cells as biomarkers of carcinogenic risk. Many of the genes that appear in the Bayesian networks surrounding plasma cotinine, and to a lesser extent around aromatic DNA-adducts, hold biologically relevant functions in inducing severe adverse effects of smoking. In conclusion, this study shows that Bayesian network inference enables unbiased phenotypic anchoring of transcriptomics responses. Furthermore, in all inferred Bayesian networks several dependencies are found which point to known but also to new relationships between the expression of specific genes, cigarette smoke exposure, DNA damaging-effects, and smoking-related diseases, in particular associated with apoptosis, DNA repair, and tumor suppression, as well as with autoimmunity.
Affiliation(s)
- Danyel G J Jennen
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Danitsja M van Leeuwen
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Diana M Hendrickx
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Ralph W H Gottschalk
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Joost H M van Delft
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Jos C S Kleinjans
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
|
45
|
Sinha S. Integration of prior biological knowledge and epigenetic information enhances the prediction accuracy of the Bayesian Wnt pathway. Integr Biol (Camb) 2015; 6:1034-48. [PMID: 25167061 DOI: 10.1039/c4ib00124a] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/21/2022]
Abstract
Computational modeling of the Wnt signaling pathway has gained prominence for its use as a diagnostic tool to develop therapeutic cancer target drugs and predict test samples as tumorous/normal. Diagnostic tools entail modeling of the biological phenomena behind the pathway while prediction requires inclusion of factors for discriminative classification. This manuscript develops simple static Bayesian network predictive models of varying complexity by encompassing prior partially available biological knowledge about intra/extracellular factors and incorporating information regarding epigenetic modification into a few genes that are known to have an inhibitory effect on the pathway. Incorporation of epigenetic information enhances the prediction accuracy of test samples in human colorectal cancer. In comparison to the Naive Bayes model where β-catenin transcription complex activation predictions are assumed to correspond to sample predictions, the new biologically inspired models shed light on differences in behavior of the transcription complex and the state of samples. Receiver operator curves and their respective area under the curve measurements obtained from predictions of the state of the test sample and the corresponding predictions of the state of activation of the β-catenin transcription complex of the pathway for the test sample indicate a significant difference between the transcription complex being on (off) and its association with the sample being tumorous (normal). The two-sample Kolmogorov-Smirnov test confirms the statistical deviation between the distributions of these predictions. Hitherto unknown relationship between factors like DKK2, DKK3-1 and SFRP-2/3/5 w.r.t. the β-catenin transcription complex has been inferred using these causal models.
Affiliation(s)
- Shriprakash Sinha
- Netherlands Bioinformatics Centre, 6500 HB, Nijmegen, The Netherlands.
|
46
|
Liu ZP. Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 2015; 16:3-22. [PMID: 25937810 PMCID: PMC4412962 DOI: 10.2174/1389202915666141110210634] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 03/07/2014] [Revised: 09/05/2014] [Accepted: 09/05/2014] [Indexed: 12/17/2022] Open
Abstract
Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global scenario of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We give an overview of each strategy and introduce representative methods for each. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also discussed.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
|
47
|
Recovering drug-induced apoptosis subnetwork from Connectivity Map data. BIOMED RESEARCH INTERNATIONAL 2015; 2015:708563. [PMID: 25883971 PMCID: PMC4389823 DOI: 10.1155/2015/708563] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/17/2014] [Revised: 03/06/2015] [Accepted: 03/09/2015] [Indexed: 01/29/2023]
Abstract
The Connectivity Map (CMAP) project profiled human cancer cell lines exposed to a library of anticancer compounds with the goal of connecting cancer with underlying genes and potential treatments. Since the therapeutic goal of most anticancer drugs is to induce tumor-selective apoptosis, it is critical to understand the specific cell death pathways triggered by drugs. This can help to better understand the mechanism of how cancer cells respond to chemical stimulations and improve the treatment of human tumors. In this study, using CMAP microarray data from breast cancer cell line MCF7, we applied a Gaussian Bayesian network modeling approach and identified apoptosis as a major drug-induced cellular-pathway. We then focused on 13 apoptotic genes that showed significant differential expression across all drug-perturbed samples to reconstruct the apoptosis network. In our predicted subnetwork, 9 out of 15 high-confidence interactions were validated in the literature, and our inferred network captured two major cell death pathways by identifying BCL2L11 and PMAIP1 as key interacting players for the intrinsic apoptosis pathway and TAXBP1 and TNFAIP3 for the extrinsic apoptosis pathway. Our inferred apoptosis network also suggested the role of BCL2L11 and TNFAIP3 as "gateway" genes in the drug-induced intrinsic and extrinsic apoptosis pathways.
|
48
|
Huang X, Zi Z. Inferring cellular regulatory networks with Bayesian model averaging for linear regression (BMALR). MOLECULAR BIOSYSTEMS 2015; 10:2023-30. [PMID: 24899235 DOI: 10.1039/c4mb00053f] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/29/2023]
Abstract
Bayesian network and linear regression methods have been widely applied to reconstruct cellular regulatory networks. In this work, we propose a Bayesian model averaging for linear regression (BMALR) method to infer molecular interactions in biological systems. This method uses a new closed form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. We have assessed the performance of BMALR by benchmarking on both in silico DREAM datasets and real experimental datasets. The results show that BMALR achieves both high prediction accuracy and high computational efficiency across different benchmarks. A pre-processing of the datasets with the log transformation can further improve the performance of BMALR, leading to a new top overall performance. In addition, BMALR can achieve robust high performance in community predictions when it is combined with other competing methods. The proposed method BMALR is competitive compared to the existing network inference methods. Therefore, BMALR will be useful to infer regulatory interactions in biological networks. A free open source software tool for the BMALR algorithm is available at https://sites.google.com/site/bmalr4netinfer/.
Affiliation(s)
- Xun Huang
- BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104, Freiburg, Germany.
|
49
|
Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 2015; 16:133-45. [PMID: 25628217 DOI: 10.1038/nrg3833] [Citation(s) in RCA: 733] [Impact Index Per Article: 81.4] [Indexed: 12/12/2022]
Abstract
The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.
Affiliation(s)
- Oliver Stegle
- European Molecular Biology Laboratory European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sarah A Teichmann
- [1] European Molecular Biology Laboratory European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. [2] Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- John C Marioni
- [1] European Molecular Biology Laboratory European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. [2] Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
|
50
|
Guo NL, Wan YW. Network-based identification of biomarkers coexpressed with multiple pathways. Cancer Inform 2014; 13:37-47. [PMID: 25392692 PMCID: PMC4218687 DOI: 10.4137/cin.s14054] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/23/2014] [Revised: 06/25/2014] [Accepted: 06/29/2014] [Indexed: 02/07/2023] Open
Abstract
Unraveling complex molecular interactions and networks and incorporating clinical information in modeling will present a paradigm shift in molecular medicine. Embedding biological relevance via modeling molecular networks and pathways has become increasingly important for biomarker identification in cancer susceptibility and metastasis studies. Here, we give a comprehensive overview of computational methods used for biomarker identification, and provide a performance comparison of several network models used in studies of cancer susceptibility, disease progression, and prognostication. Specifically, we evaluated implication networks, Boolean networks, Bayesian networks, and Pearson’s correlation networks in constructing gene coexpression networks for identifying lung cancer diagnostic and prognostic biomarkers. The results show that implication networks, implemented in Genet package, identified sets of biomarkers that generated an accurate prediction of lung cancer risk and metastases; meanwhile, implication networks revealed more biologically relevant molecular interactions than Boolean networks, Bayesian networks, and Pearson’s correlation networks when evaluated with MSigDB database.
Affiliation(s)
- Nancy Lan Guo
- Mary Babb Randolph Cancer Center/School of Public Health, West Virginia University, Morgantown, WV, USA
- Ying-Wooi Wan
- Mary Babb Randolph Cancer Center/School of Public Health, West Virginia University, Morgantown, WV, USA
|