1
Ito Y, Uda S, Kokaji T, Hirayama A, Soga T, Suzuki Y, Kuroda S, Kubota H. Comparison of hepatic responses to glucose perturbation between healthy and obese mice based on the edge type of network structures. Sci Rep 2023; 13:4758. [PMID: 36959243 PMCID: PMC10036622 DOI: 10.1038/s41598-023-31547-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 03/14/2023] [Indexed: 03/25/2023]
Abstract
Interactions between various molecular species in biological phenomena give rise to numerous networks. The investigation of these networks, including their statistical and biochemical interactions, supports a deeper understanding of biological phenomena. The clustering of nodes associated with molecular species and enrichment analysis is frequently applied to examine the biological significance of such network structures. However, these methods focus on delineating the function of a node. As such, in-depth investigations of the edges, which are the connections between the nodes, are rarely explored. In the current study, we aimed to investigate the functions of the edges rather than the nodes. To accomplish this, for each network, we categorized the edges and defined the edge type based on their biological annotations. Subsequently, we used the edge type to compare the network structures of the metabolome and transcriptome in the livers of healthy (wild-type) and obese (ob/ob) mice following oral glucose administration (OGTT). The findings demonstrate that the edge type can facilitate the characterization of the state of a network structure, thereby reducing the information available through datasets containing the OGTT response in the metabolome and transcriptome.
Affiliation(s)
- Yuki Ito
  - Division of Integrated Omics, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka, 812-8582, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
- Shinsuke Uda
  - Division of Integrated Omics, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka, 812-8582, Japan
- Toshiya Kokaji
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916-5, Takayamacho, Ikoma, Nara, 630-0192, Japan
- Akiyoshi Hirayama
  - Institute for Advanced Biosciences, Keio University, 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
- Tomoyoshi Soga
  - Institute for Advanced Biosciences, Keio University, 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
- Shinya Kuroda
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
  - Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Bunkyo-ku, Tokyo, 113-0033, Japan
- Hiroyuki Kubota
  - Division of Integrated Omics, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka, 812-8582, Japan
2
Banerjee A, Goswami RP, Chatterjee M. Network theoretic analysis of JAK/STAT pathway and extrapolation to drugs and viruses including COVID-19. Sci Rep 2021; 11:2512. [PMID: 33510353 PMCID: PMC7844052 DOI: 10.1038/s41598-021-82139-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/16/2020] [Accepted: 12/15/2020] [Indexed: 01/20/2023]
Abstract
Whenever some phenomenon can be represented as a graph or a network it seems pertinent to explore how much the mathematical properties of that network impact the phenomenon. In this study we explore the same philosophy in the context of immunology. Our objective was to assess the correlation of "size" (number of edges and minimum vertex cover) of the JAK/STAT network with treatment effect in rheumatoid arthritis (RA), phenotype of viral infection and effect of immunosuppressive agents on a system infected with the coronavirus. We extracted the JAK/STAT pathway from Kyoto Encyclopedia of Genes and Genomes (KEGG, hsa04630). The effects of the following drugs, and their combinations, commonly used in RA were tested: methotrexate, prednisolone, rituximab, tocilizumab, tofacitinib and baricitinib. The following viral systems were also tested for their ability to evade the JAK/STAT pathway: Measles, Influenza A, West Nile virus, Japanese B virus, Yellow Fever virus, respiratory syncytial virus, Kaposi's sarcoma virus, Hepatitis B and C virus, cytomegalovirus, Hendra and Nipah virus and Coronavirus. Good correlations of edges and minimum vertex cover with clinical efficacy were observed (for edges, rho = -0.815, R² = 0.676, p = 0.007; for vertex cover, rho = -0.793, R² = 0.635, p = 0.011). In the viral systems both edges and vertex cover were associated with acuteness of viral infections. In the JAK/STAT system already infected with coronavirus, maximum reduction in size was achieved with baricitinib. To conclude, algebraic and combinatorial invariants of a network may explain its biological behaviour. At least theoretically, baricitinib may be an attractive target for treatment of coronavirus infection.
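The two "size" measures named in this abstract, the number of edges and the minimum vertex cover, can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: the function names and the five-node graph are hypothetical, and the brute-force search is only practical for very small graphs (minimum vertex cover is NP-hard in general).

```python
from itertools import combinations


def edge_count(edges):
    """Count distinct undirected edges, ignoring self-loops and duplicates."""
    return len({frozenset(e) for e in edges if e[0] != e[1]})


def minimum_vertex_cover(edges):
    """Return a smallest set of nodes that touches every edge.

    Exhaustive search over node subsets of increasing size; exponential
    in the number of nodes, so suitable for small illustrative graphs only.
    """
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = sorted({n for e in edge_set for n in e})
    for k in range(len(nodes) + 1):
        for candidate in combinations(nodes, k):
            chosen = set(candidate)
            # A cover must share at least one endpoint with every edge.
            if all(e & chosen for e in edge_set):
                return chosen
    return set(nodes)


# Hypothetical linear cascade; the labels are placeholders, not KEGG entries.
toy_edges = [
    ("ligand", "receptor"),
    ("receptor", "kinase"),
    ("kinase", "stat"),
    ("stat", "target"),
]

print(edge_count(toy_edges))
print(sorted(minimum_vertex_cover(toy_edges)))
```

Removing a node together with its incident edges and recomputing both quantities gives a rough sense of how a perturbation changes network "size" under these assumptions.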
Affiliation(s)
- Arindam Banerjee
  - Department of Mathematics, Ramakrishna Mission Vivekananda Educational and Research Institute, Belur, India
- Rudra Prosad Goswami
  - Department of Rheumatology, All India Institute of Medical Sciences, New Delhi, India
- Moumita Chatterjee
  - Department of Mathematics and Statistics, Aliah University, Kolkata, India
3
Chang C, Oh J, Min EJ, Long Q. Knowledge-Guided Biclustering via Sparse Variational EM Algorithm. 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE : PROCEEDINGS : 10-11 NOVEMBER 2019, BEIJING, CHINA. IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (10TH : 2019 : BEIJING, CHINA) 2019; 2019:25-32. [PMID: 34290493 PMCID: PMC8291726 DOI: 10.1109/icbk.2019.00012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
A biclustering in the analysis of a gene expression data matrix, for example, is defined as a set of biclusters where each bicluster is a group of genes and a group of samples for which the genes are differentially expressed. Although many data mining approaches for biclustering exist in the literature, only a few are able to incorporate prior knowledge into the analysis, which can lead to great improvements in terms of accuracy and interpretability, and all are limited in handling discrete data types. We propose a generalized biclustering approach that can be used for integrative analysis of multi-omics data with different data types. Our method is capable of utilizing biological information that can be represented by a graph, such as functional genomics and functional proteomics, and of accommodating a combination of continuous and discrete data types. The proposed method builds on a generalized Bayesian factor analysis framework, and a variational EM approach is used to obtain parameter estimates, where the latent quantities in the log-likelihood are iteratively imputed by their conditional expectations. The biclusters are retrieved via the sparse estimates of the factor loadings and the conditional expectation of the latent factors. In order to obtain the sparse conditional expectation of the latent factors, a novel sparse variational EM algorithm is used. We demonstrate the superiority of our method over several existing biclustering methods in extensive simulation experiments and in integrative analysis of multi-omics data.
Affiliation(s)
- Changgee Chang
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Jihwan Oh
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Eun Jeong Min
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Qi Long
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
4
Sun W, Chang C, Zhao Y, Long Q. Knowledge-Guided Bayesian Support Vector Machine for High-Dimensional Data with Application to Analysis of Genomics Data. PROCEEDINGS : ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA. IEEE INTERNATIONAL CONFERENCE ON BIG DATA 2018; 2018:1484-1493. [PMID: 31041431 PMCID: PMC6486656 DOI: 10.1109/bigdata.2018.8622484] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
Abstract
Support vector machine (SVM) is a popular classification method for the analysis of a wide range of data, including big data. Many SVM methods with feature selection have been developed under frequentist regularization or Bayesian shrinkage frameworks. On the other hand, the importance of incorporating a priori known biological knowledge, such as gene pathway information which stems from the gene regulatory network, into the statistical analysis of genomic data has been recognized in recent years. In this article, we propose a new Bayesian SVM approach that enables the feature selection to be guided by knowledge of the graphical structure among predictors. The proposed method uses the spike-and-slab prior for feature selection, combined with the Ising prior that encourages group-wise selection of the predictors adjacent to each other on the known graph. A Gibbs sampling algorithm is used for Bayesian inference. The performance of our method is evaluated and compared with existing SVM methods in terms of prediction and feature selection in extensive simulation settings. In addition, our method is illustrated in the analysis of genomic data from a cancer study, demonstrating its advantage in generating biologically meaningful results and identifying potentially important features.
Affiliation(s)
- Wenli Sun
  - Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
- Changgee Chang
  - Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
- Yize Zhao
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, New York, NY, 10065
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
5
Hammami R, Fliss I. Use of SciDBMaker as Tool for the Design of Specialized Biological Databases. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The exponential growth of molecular biology research in recent decades has brought concomitant growth in the number and size of genomic and proteomic databases used to interpret experimental findings. Particularly, growth of protein sequence records created the need for smaller and manually annotated databases. Since scientists are continually developing new specific databases to enhance their understanding of biological processes, the authors created SciDBMaker to provide a tool for easy building of new specialized protein knowledge bases. This chapter also suggests best practices for specialized biological databases design, and provides examples for the implementation of these practices.
6
Molecular signaling network complexity is correlated with cancer patient survivability. Proc Natl Acad Sci U S A 2012; 109:9209-12. [PMID: 22615392 DOI: 10.1073/pnas.1201416109] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Indexed: 12/23/2022]
Abstract
The 5-y survival for cancer patients after diagnosis and treatment is strongly dependent on tumor type. Prostate cancer patients have a >99% chance of survival past 5 y after diagnosis, and pancreatic patients have <6% chance of survival past 5 y. Because each cancer type has its own molecular signaling network, we asked if there are "signatures" embedded in these networks that inform us as to the 5-y survival. In other words, are there statistical metrics of the network that correlate with survival? Furthermore, if there are, can such signatures provide clues to selecting new therapeutic targets? From the Kyoto Encyclopedia of Genes and Genomes Cancer Pathway database we computed several conventional and some less conventional network statistics. In particular we found a correlation (R² = 0.7) between degree-entropy and 5-y survival based on the Surveillance Epidemiology and End Results database. This correlation suggests that cancers that have a more complex molecular pathway are more refractory than those with less complex molecular pathway. We also found potential new molecular targets for drugs by computing the betweenness (a statistical metric of the centrality of a node) for the molecular networks.
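Degree-entropy, the network statistic highlighted in this abstract, can be sketched under one common definition: the Shannon entropy of the node-degree distribution, H = -Σ p(k) log2 p(k), where p(k) is the fraction of nodes with degree k. The abstract does not spell out the exact formula or logarithm base the authors used, so the definition, the function name, and the toy graphs below are assumptions for illustration only.

```python
import math
from collections import Counter


def degree_entropy(edges):
    """Shannon entropy (in bits) of the degree distribution of an undirected graph.

    Assumes a simple graph: duplicate edges and self-loops are ignored, and
    only nodes that appear in at least one edge are counted.
    """
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for edge in edge_set:
        for node in edge:
            degree[node] += 1
    n = len(degree)
    if n == 0:
        return 0.0
    # Histogram: how many nodes have each degree value.
    histogram = Counter(degree.values())
    return -sum((c / n) * math.log2(c / n) for c in histogram.values())


# Hypothetical toy graphs: a 4-node ring (every node has degree 2)
# and a star with one hub and four leaves.
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star = [("hub", "l1"), ("hub", "l2"), ("hub", "l3"), ("hub", "l4")]

print(degree_entropy(ring))
print(degree_entropy(star))
```

A graph in which every node has the same degree has zero degree-entropy under this definition, while a more heterogeneous degree distribution gives a larger value.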
7
Mazza T, Ballarini P, Guido R, Prandi D. The Relevance of Topology in Parallel Simulation of Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:911-923. [PMID: 22331861 DOI: 10.1109/tcbb.2012.27] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]
Abstract
Important achievements in traditional biology have deepened the knowledge about living systems, leading to an extensive identification of the parts list of the cell as well as of the interactions among biochemical species responsible for the cell's regulation. Such an expanding knowledge also introduces new issues. For example, the increasing comprehension of the interdependencies between pathways (pathway cross-talk) has resulted, on the one hand, in the growth of informational complexity and, on the other, in a strong lack of information coherence. The overall grand challenge remains unchanged: to be able to assemble the knowledge of every 'piece' of a system in order to figure out the behavior of the whole (integrative approach). In light of these considerations, high performance computing plays a fundamental role in the context of in-silico biology. Stochastic simulation is a renowned analysis tool which, although widely used, is subject to stringent computational requirements, in particular when dealing with heterogeneous and high dimensional systems. Here we introduce and discuss a methodology aimed at alleviating the burden of simulating complex biological networks. The method, which springs from graph theory, is based on the principle of fragmenting the computational space of a simulation trace and delegating the computation of fragments to a number of parallel processes.
8
Guney E, Sanz-Pamplona R, Sierra A, Oliva B. Understanding Cancer Progression Using Protein Interaction Networks. SYSTEMS BIOLOGY IN CANCER RESEARCH AND DRUG DISCOVERY 2012:167-195. [DOI: 10.1007/978-94-007-4819-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
9
Hester SD, Johnstone AF, Boyes WK, Bushnell PJ, Shafer TJ. Acute toluene exposure alters expression of genes in the central nervous system associated with synaptic structure and function. Neurotoxicol Teratol 2011; 33:521-9. [DOI: 10.1016/j.ntt.2011.07.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 04/19/2011] [Revised: 07/07/2011] [Accepted: 07/20/2011] [Indexed: 10/17/2022]
10
Kandasamy K, Mohan SS, Raju R, Keerthikumar S, Kumar GSS, Venugopal AK, Telikicherla D, Navarro JD, Mathivanan S, Pecquet C, Gollapudi SK, Tattikota SG, Mohan S, Padhukasahasram H, Subbannayya Y, Goel R, Jacob HKC, Zhong J, Sekhar R, Nanjappa V, Balakrishnan L, Subbaiah R, Ramachandra YL, Rahiman BA, Prasad TSK, Lin JX, Houtman JCD, Desiderio S, Renauld JC, Constantinescu SN, Ohara O, Hirano T, Kubo M, Singh S, Khatri P, Draghici S, Bader GD, Sander C, Leonard WJ, Pandey A. NetPath: a public resource of curated signal transduction pathways. Genome Biol 2010; 11:R3. [PMID: 20067622 PMCID: PMC2847715 DOI: 10.1186/gb-2010-11-1-r3] [Citation(s) in RCA: 351] [Impact Index Per Article: 23.4] [Received: 04/21/2009] [Revised: 11/02/2009] [Accepted: 01/12/2010] [Indexed: 12/18/2022]
Abstract
NetPath, a novel community resource of curated human signaling pathways is presented and its utility demonstrated using immune signaling data. We have developed NetPath as a resource of curated human signaling pathways. As an initial step, NetPath provides detailed maps of a number of immune signaling pathways, which include approximately 1,600 reactions annotated from the literature and more than 2,800 instances of transcriptionally regulated genes - all linked to over 5,500 published articles. We anticipate NetPath to become a consolidated resource for human signaling pathways that should enable systems biology approaches.
Affiliation(s)
- Kumaran Kandasamy
  - Institute of Bioinformatics, International Tech Park, Bangalore 560066, India
11
Mazza T, Iaccarino G, Priami C. Snazer: the simulations and networks analyzer. BMC SYSTEMS BIOLOGY 2010; 4:1. [PMID: 20056001 PMCID: PMC2880970 DOI: 10.1186/1752-0509-4-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 04/24/2009] [Accepted: 01/07/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Networks are widely recognized as key determinants of structure and function in systems that span the biological, physical, and social sciences. They are static pictures of the interactions among the components of complex systems. Often, much effort is required to identify networks as part of particular patterns as well as to visualize and interpret them. From a pure dynamical perspective, simulation represents a relevant way out. Many simulator tools capitalized on the "noisy" behavior of some systems and used formal models to represent cellular activities as temporal trajectories. Statistical methods have been applied to a fairly large number of replicated trajectories in order to infer knowledge. A tool which both graphically manipulates reactive models and deals with sets of simulation time-course data by aggregation, interpretation and statistical analysis is missing and could add value to simulators. RESULTS: We designed and implemented Snazer, the simulations and networks analyzer. Its goal is to aid the processes of visualizing and manipulating reactive models, as well as to share and interpret time-course data produced by stochastic simulators or by any other means. CONCLUSIONS: Snazer is a solid prototype that integrates biological network and simulation time-course data analysis techniques.
Affiliation(s)
- Tommaso Mazza
  - The Microsoft Research University of Trento, CoSBi, Trento, Italy
- Corrado Priami
  - The Microsoft Research University of Trento, CoSBi, Trento, Italy
  - DISI - University of Trento, Trento, Italy
12
Abstract
BACKGROUND: Increasingly, effective drug discovery involves the searching and data mining of large volumes of information from many sources covering the domains of chemistry, biology and pharmacology amongst others. This has led to a proliferation of databases and data sources relevant to drug discovery. OBJECTIVE: This paper provides a review of the publicly available large-scale databases relevant to drug discovery, describes the kinds of data mining approaches that can be applied to them and discusses recent work in integrative data mining that looks for associations that span multiple sources, including the use of Semantic Web techniques. CONCLUSION: The future of mining large data sets for drug discovery requires intelligent, semantic aggregation of information from all of the data sources described in this review, along with the application of advanced methods such as intelligent agents and inference engines in client applications.
Affiliation(s)
- David J Wild
  - Director of Cheminformatics Program, Assistant Professor of Informatics, Indiana University, School of Informatics and Computing, 901 E. 10th St., Bloomington, IN 47408, USA; +1 812 856 1848; +1 608 541 5402
13
Jiang K, Huang Y, Robertson J. A method of biological pathway similarity search using high performance computing. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2009; 2009:4933-4936. [PMID: 19963871 DOI: 10.1109/iembs.2009.5332712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Comparative study of biological pathway structures and composition can aid us in elucidating the functions of newly discovered pathways, understanding evolutionary traits, and determining missing pathway elements. A method has been developed to perform pair-wise comparison and similarity search of biological pathways. The comparison determines the differences of each pair of pathways represented in the XML format. The similarity search uses a scoring mechanism to rank the similarities of the pathway in question against those in the pathway repository. To achieve a reasonably good performance, the method is being implemented using the Condor high performance computing environment.
Affiliation(s)
- Keyuan Jiang
  - Department of Computer Information Technology and Graphics, Purdue University Calumet, Hammond, IN 46323, USA
14
Hahne F, Mehrle A, Arlt D, Poustka A, Wiemann S, Beissbarth T. Extending pathways based on gene lists using InterPro domain signatures. BMC Bioinformatics 2008; 9:3. [PMID: 18177498 PMCID: PMC2245903 DOI: 10.1186/1471-2105-9-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/16/2007] [Accepted: 01/04/2008] [Indexed: 12/28/2022]
Abstract
Background: High-throughput technologies like functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well established technique to test for the statistically significant over-representation of particular pathways. A shortcoming of this method, however, is that most genes investigated in the experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. This works by comparing InterPro domain signatures of the candidate gene lists with domain signatures of gene sets derived from known classifications, e.g. KEGG pathways. Results: In order to validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we create test gene lists by randomly selecting pathway genes, removing these genes from the known pathways and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that we can recover pathway memberships based on the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example. Conclusion: Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, to routinely perform this analysis on the results of high-throughput screening, is available via Bioconductor.
Affiliation(s)
- Florian Hahne
  - German Cancer Research Center, Molecular Genome Analysis, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
15
Lu LJ, Sboner A, Huang YJ, Lu HX, Gianoulis TA, Yip KY, Kim PM, Montelione GT, Gerstein MB. Comparing classical pathways and modern networks: towards the development of an edge ontology. Trends Biochem Sci 2007; 32:320-31. [PMID: 17583513 DOI: 10.1016/j.tibs.2007.06.003] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Received: 11/23/2006] [Revised: 05/02/2007] [Accepted: 06/06/2007] [Indexed: 02/04/2023]
Abstract
Pathways are integral to systems biology. Their classical representation has proven useful but is inconsistent in the meaning assigned to each arrow (or edge) and inadvertently implies the isolation of one pathway from another. Conversely, modern high-throughput (HTP) experiments offer standardized networks that facilitate topological calculations. Combining these perspectives, classical pathways can be embedded within large-scale networks and thus demonstrate the crosstalk between them. As more diverse types of HTP data become available, both perspectives can be effectively merged, embedding pathways simultaneously in multiple networks. However, the original problem still remains - the current edge representation is inadequate to accurately convey all the information in pathways. Therefore, we suggest that a standardized and well-defined edge ontology is necessary and propose a prototype as a starting point for reaching this goal.
Affiliation(s)
- Long J Lu
  - Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
16
BioPP: a tool for web-publication of biological networks. BMC Bioinformatics 2007; 8:168. [PMID: 17519033 PMCID: PMC1885811 DOI: 10.1186/1471-2105-8-168] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 02/09/2007] [Accepted: 05/22/2007] [Indexed: 11/25/2022]
Abstract
Background: Cellular processes depend on the function of intracellular molecular networks. The curation of the literature relevant to specific biological pathways is important for many theoretical and experimental research teams and communities. No current tool supports web publication or hosting of user-developed, large-scale annotated pathway diagrams. Sharing via web publication is needed to allow real-time access to the current literature pathway knowledgebase, either privately within a research team or publicly among the outside research community. Web publication also facilitates team and/or community input into the curation process while allowing centralized control of the curation and validation process. We have developed a new tool to address these needs. Biological Pathway Publisher (BioPP) is a software suite for converting CellDesigner Systems Biology Markup Language (CD-SBML) formatted pathways into a web viewable format. The BioPP suite is available for private use and for depositing knowledgebases into a newly created public repository. Results: The BioPP suite is a web-based application that allows pathway knowledgebases stored in CD-SBML to be web published with an easily navigated user interface. The BioPP suite consists of four interrelated elements: a pathway publisher, an upload web interface, a pathway repository for user-deposited knowledgebases and a pathway navigator. Users have the option to convert their CD-SBML files to HTML for restricted use or to allow their knowledgebase to be web-accessible to the scientific community. All entities in all knowledgebases in the repository are linked to public database entries as well as to a newly created public wiki which provides a discussion forum. Conclusion: BioPP tools and the public repository facilitate sharing of pathway knowledgebases and interactive curation for research teams and scientific communities. BioPP suite is accessible at
17
Efroni S, Schaefer CF, Buetow KH. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2007; 2:e425. [PMID: 17487280 PMCID: PMC1855990 DOI: 10.1371/journal.pone.0000425] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.6] [Received: 01/05/2007] [Accepted: 03/29/2007] [Indexed: 11/19/2022]
Abstract
Cancer is recognized to be a family of gene-based diseases whose causes are to be found in disruptions of basic biologic processes. An increasingly deep catalogue of canonical networks details the specific molecular interaction of genes and their products. However, mapping of disease phenotypes to alterations of these networks of interactions is accomplished indirectly and non-systematically. Here we objectively identify pathways associated with malignancy, staging, and outcome in cancer through application of an analytic approach that systematically evaluates differences in the activity and consistency of interactions within canonical biologic processes. Using large collections of publicly accessible genome-wide gene expression, we identify small, common sets of pathways – Trka Receptor, Apoptosis response to DNA Damage, Ceramide, Telomerase, CD40L and Calcineurin – whose differences robustly distinguish diverse tumor types from corresponding normal samples, predict tumor grade, and distinguish phenotypes such as estrogen receptor status and p53 mutation state. Pathways identified through this analysis perform as well or better than phenotypes used in the original studies in predicting cancer outcome. This approach provides a means to use genome-wide characterizations to map key biological processes to important clinical features in disease.
Collapse
Affiliation(s)
- Sol Efroni
- National Cancer Institute Center for Bioinformatics, Rockville, Maryland, United States of America
- Carl F. Schaefer
- National Cancer Institute Center for Bioinformatics, Rockville, Maryland, United States of America
- Kenneth H. Buetow
- National Cancer Institute Center for Bioinformatics, Rockville, Maryland, United States of America
- Laboratory of Population Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- * To whom correspondence should be addressed. E-mail:
|
18
|
Abnizova I, Subhankulova T, Gilks WR. Recent computational approaches to understand gene regulation: mining gene regulation in silico. Curr Genomics 2007; 8:79-91. [PMID: 18660846 PMCID: PMC2435357 DOI: 10.2174/138920207780368150] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/30/2006] [Revised: 12/13/2006] [Accepted: 12/15/2006] [Indexed: 01/03/2023]
Abstract
This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factors binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing, therefore all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may enlighten the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas: gene regulatory network construction; evolution of regulatory DNA; studies of its structural and statistical informational properties; and finally, regulatory RNA.
Affiliation(s)
- T Subhankulova
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, Cambridge, UK
|
19
|
Cerami EG, Bader GD, Gross BE, Sander C. cPath: open source software for collecting, storing, and querying biological pathways. BMC Bioinformatics 2006; 7:497. [PMID: 17101041 PMCID: PMC1660554 DOI: 10.1186/1471-2105-7-497] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.9] [Received: 06/16/2006] [Accepted: 11/13/2006] [Indexed: 11/10/2022]
Abstract
Background: Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. Results: We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. Conclusion: cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.
Affiliation(s)
- Ethan G Cerami
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 460, New York, NY 10021, USA
- Gary D Bader
- Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, Ontario M5S 3E1, Canada
- Benjamin E Gross
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 460, New York, NY 10021, USA
- Chris Sander
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 460, New York, NY 10021, USA
|