1
Aromolaran O, Aromolaran D, Isewon I, Oyelade J. Machine learning approach to gene essentiality prediction: a review. Brief Bioinform 2021; 22:6219158. [PMID: 33842944 DOI: 10.1093/bib/bbab128] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 01/18/2021] [Revised: 03/04/2021] [Accepted: 03/17/2021] [Indexed: 12/17/2022]
Abstract
Essential genes are critical for the growth and survival of any organism. Machine learning approaches complement experimental methods by minimizing the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, to improve the generalizability of prediction models across organisms, and to construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes, since the essentiality status of a gene can change under specific conditions of the organism. This review examines various methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed the other categories, mainly because of its high correlation with the genes' biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting machine learning prediction of conditional gene essentiality is the unavailability of labeled data for the conditions of interest with which to train a classifier. Therefore, cooperative machine learning could be further exploited to build models that perform well in conditional essentiality prediction.
SHORT ABSTRACT Identification of essential genes is imperative because it provides an understanding of an organism's core structure and function and accelerates the discovery of drug targets, among other benefits. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and to highlight the factors responsible for the current limitations in using machine learning for conditional gene essentiality prediction. The choice of features and of ML technique was identified as an important factor for predicting essential genes effectively.
Affiliation(s)
- Olufemi Aromolaran
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Damilare Aromolaran
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Itunuoluwa Isewon
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Jelili Oyelade
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
2
Nagpal S, Baksi KD, Kuntal BK, Mande SS. NetConfer: a web application for comparative analysis of multiple biological networks. BMC Biol 2020; 18:53. [PMID: 32430035 PMCID: PMC7236966 DOI: 10.1186/s12915-020-00781-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/03/2020] [Accepted: 04/14/2020] [Indexed: 12/29/2022]
Abstract
Background Most biological experiments are inherently designed to compare changes or transitions of state between conditions of interest. Advances in data-intensive research have in particular elevated the need for resources and tools that enable comparative analysis of biological data. The complexity of biological systems and the interactions of their various components, such as genes, proteins, taxa, and metabolites, have been inferred, represented, and visualized via graph theory-based networks. Comparing multiple networks can help identify variations across different biological systems, thereby providing additional insights. However, while a number of online and stand-alone tools exist for generating, analyzing, and visualizing individual biological networks, support for batch processing and comprehensive comparison of multiple networks is limited. Results Here, we present a graphical user interface (GUI)-based web application that implements multiple network comparison methodologies and presents them as organized analysis workflows. Dedicated comparative visualization modules allow end-users to obtain easy-to-comprehend, insightful, and meaningful comparisons of various biological networks. We demonstrate the utility and power of our tool using publicly available microbial and gene expression data. Conclusion The NetConfer tool was developed for researchers in biological data analysis who have limited programming expertise. It is also expected to be useful for advanced users from biological as well as other domains (working with association networks), who benefit from the ready-made workflows because these allow them to focus directly on the results without worrying about the implementation.
While the web version can be used without installation or dependency requirements, a stand-alone version is also provided to support offline processing of large networks.
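The abstract does not specify which comparison metrics NetConfer computes, so the following is only a minimal sketch of one common way to compare several networks: the Jaccard similarity of their edge sets. The function names and the toy networks are hypothetical and are not NetConfer's API.

```python
from itertools import combinations


def edge_set(edges):
    """Normalize undirected edges so that (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_networks(networks):
    """Pairwise edge-set Jaccard similarity for a dict {name: edge list}."""
    sets = {name: edge_set(edges) for name, edges in networks.items()}
    return {
        (n1, n2): jaccard(sets[n1], sets[n2])
        for n1, n2 in combinations(sorted(sets), 2)
    }


# Hypothetical association networks inferred under two conditions.
networks = {
    "control": [("A", "B"), ("B", "C"), ("C", "D")],
    "treated": [("B", "A"), ("C", "D"), ("D", "E")],
}
print(compare_networks(networks))  # {('control', 'treated'): 0.5}
```

Two of the four distinct edges are shared, giving 0.5; the `frozenset` normalization is what lets ("A", "B") and ("B", "A") count as the same undirected edge.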
Affiliation(s)
- Sunil Nagpal
- Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., 54-B Hadapsar Industrial Estate, Pune, 411 013, India
- Krishanu Das Baksi
- Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., 54-B Hadapsar Industrial Estate, Pune, 411 013, India
- Bhusan K Kuntal
- Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., 54-B Hadapsar Industrial Estate, Pune, 411 013, India; Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Dr. Homi Bhabha Road, Pune, 411 008, India; Academy of Scientific and Innovative Research (AcSIR), CSIR-National Chemical Laboratory Campus, Pune, 411 008, India
- Sharmila S Mande
- Bio-Sciences R&D Division, TCS Research, Tata Consultancy Services Ltd., 54-B Hadapsar Industrial Estate, Pune, 411 013, India
3
Wang Z, Ishihara Y, Ishikawa T, Hoshijima M, Odagaki N, Ei Hsu Hlaing E, Kamioka H. Screening of key candidate genes and pathways for osteocytes involved in the differential response to different types of mechanical stimulation using a bioinformatics analysis. J Bone Miner Metab 2019; 37:614-626. [PMID: 30413886 DOI: 10.1007/s00774-018-0963-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/15/2018] [Accepted: 09/25/2018] [Indexed: 12/16/2022]
Abstract
This study aimed to predict the key genes and pathways that are activated when different types of mechanical loading are applied to osteocytes. mRNA expression datasets (series GSE62128 and GSE42874) were obtained from the Gene Expression Omnibus (GEO) database. High gravity-treated osteocytic MLO-Y4 cell-line samples from GSE62128 (Set1) and fluid flow-treated MLO-Y4 samples from GSE42874 (Set2) were employed. After the differentially expressed genes (DEGs) were identified, functional enrichment analysis was performed. The DEGs common to Set1 and Set2 were considered key DEGs; a protein-protein interaction (PPI) network linking most of the key DEGs was then constructed using the minimal set of nodes drawn from all of the DEGs in Set1 and Set2. Several open-source software programs were employed to process and analyze the original data. The bioinformatic results and their biological meaning were validated by in vitro experiments. High gravity and fluid flow induced opposite expression trends in the key DEGs. The hypoxia-related biological process and signaling pathway were the common functional enrichment terms among the DEGs from Set1, Set2 and the PPI network. The expression of almost all of the key DEGs (Pdk1, Ccng2, Eno2, Egln1, Higd1a, Slc5a3 and Mxi1) was mechano-sensitive. Eno2 was identified as the hub gene in the PPI network. Eno2 knockdown resulted in expression changes in some of the other key DEGs (Pdk1, Mxi1 and Higd1a). Our findings indicate that the hypoxia response might play an important role in the differential responses of osteocytes to different types of mechanical force.
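The selection logic described above (key DEGs as the overlap between the two datasets, and a hub gene as a highly connected node in the PPI network) can be sketched as follows. This is a minimal illustration assuming that the DEG lists are already available as gene sets and that "hub" means the node of highest degree; the gene identifiers and interactions are invented placeholders, not the study's data.

```python
from collections import Counter


def key_degs(set1_degs, set2_degs):
    """Key DEGs: genes differentially expressed in both datasets."""
    return set(set1_degs) & set(set2_degs)


def hub_gene(ppi_edges):
    """Node with the highest degree in an undirected PPI edge list."""
    degree = Counter()
    for a, b in ppi_edges:
        degree[a] += 1
        degree[b] += 1
    # most_common(1) returns [(node, degree)]; ties go to the node counted first.
    return degree.most_common(1)[0][0]


# Placeholder gene IDs and interactions, invented for illustration.
set1 = {"g1", "g2", "g3", "g4"}
set2 = {"g2", "g3", "g5"}
ppi = [("g1", "g2"), ("g2", "g3"), ("g2", "g4"), ("g3", "g5")]

print(sorted(key_degs(set1, set2)))  # ['g2', 'g3']
print(hub_gene(ppi))                 # g2
```

Degree is only the simplest hub criterion; betweenness or other centralities could be substituted without changing the overall structure.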
Affiliation(s)
- Ziyi Wang
- Department of Orthodontics, Okayama University Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, 2-5-1 Shikata, Kita-ku, Okayama, 700-8525, Japan
- Takanori Ishikawa
- Department of Orthodontics, Okayama University Hospital, Okayama, Japan
- Mitsuhiro Hoshijima
- Department of Orthodontics, Okayama University Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, 2-5-1 Shikata, Kita-ku, Okayama, 700-8525, Japan
- Naoya Odagaki
- Department of Orthodontics, Okayama University Hospital, Okayama, Japan
- Ei Ei Hsu Hlaing
- Department of Orthodontics, Okayama University Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, 2-5-1 Shikata, Kita-ku, Okayama, 700-8525, Japan
- Hiroshi Kamioka
- Department of Orthodontics, Okayama University Graduate School of Medicine, Dentistry, and Pharmaceutical Sciences, 2-5-1 Shikata, Kita-ku, Okayama, 700-8525, Japan.
4
Zhang P, Tao L, Zeng X, Qin C, Chen S, Zhu F, Li Z, Jiang Y, Chen W, Chen YZ. A protein network descriptor server and its use in studying protein, disease, metabolic and drug targeted networks. Brief Bioinform 2017; 18:1057-1070. [PMID: 27542402 PMCID: PMC5862332 DOI: 10.1093/bib/bbw071] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 04/15/2016] [Revised: 06/14/2016] [Indexed: 02/06/2023]
Abstract
Genetic, proteomic, disease and pharmacological studies have generated rich data on protein interactions, disease regulation and drug activities that are useful for systems-level study of biological, disease and drug therapeutic processes. These studies are facilitated by established and emerging computational methods. More recently, network descriptors developed in other disciplines have been increasingly used to study protein-protein interaction, gene regulatory, metabolic and disease networks. However, these useful network features are inadequately covered by public web servers. We therefore introduced up to 313 literature-reported network descriptors in the PROFEAT web server for describing the topological, connectivity and complexity characteristics of undirected unweighted (uniform binding constants and molecular levels), undirected edge-weighted (varying binding constants), undirected node-weighted (varying molecular levels), undirected edge-node-weighted (varying binding constants and molecular levels) and directed unweighted (oriented process) networks. The usefulness of the PROFEAT-computed network descriptors is illustrated by their literature-reported applications in studying protein-protein interaction, gene regulatory, gene co-expression, protein-drug and metabolic networks. PROFEAT is accessible free of charge at http://bidd2.nus.edu.sg/cgi-bin/profeat2016/main.cgi.
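To make the weighted/unweighted and directed/undirected distinctions concrete, the sketch below computes a few of the simplest descriptors: degree for an unweighted network, node strength (summed edge weights) for an edge-weighted network, in/out-degree for a directed network, and density. These standard textbook definitions are an assumption for illustration; they are not taken from PROFEAT, whose exact descriptor formulas are not given here.

```python
from collections import defaultdict


def degree(edges):
    """Unweighted, undirected: number of edges incident to each node."""
    deg = defaultdict(int)
    for u, v, *_ in edges:  # any extra fields (e.g. a weight) are ignored
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def strength(weighted_edges):
    """Edge-weighted, undirected: sum of incident edge weights (node strength)."""
    s = defaultdict(float)
    for u, v, w in weighted_edges:
        s[u] += w
        s[v] += w
    return dict(s)


def in_out_degree(arcs):
    """Directed, unweighted: (in-degree, out-degree) for each node."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n]) for n in nodes}


def density(n_nodes, n_edges):
    """Undirected simple graph: fraction of the n(n-1)/2 possible edges present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Toy networks; weights could stand for binding constants, as in the abstract.
weighted = [("A", "B", 0.5), ("B", "C", 2.0), ("A", "C", 1.0)]
arcs = [("A", "B"), ("A", "C"), ("C", "B")]

print(degree(weighted))           # {'A': 2, 'B': 2, 'C': 2}
print(strength(weighted))         # {'A': 1.5, 'B': 2.5, 'C': 3.0}
print(in_out_degree(arcs))
print(density(3, len(weighted)))  # 1.0
```

The same edge list gives identical unweighted degrees for every node but different strengths once weights are considered, which is the kind of distinction the weighted descriptor families are meant to capture.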
Affiliation(s)
- Peng Zhang
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore
- College of Science, Sichuan Agricultural University, Yaan, P. R. China
- Lin Tao
- College of Science, Sichuan Agricultural University, Yaan, P. R. China
- Xian Zeng
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore
- Chu Qin
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore
- Shangying Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore
- Feng Zhu
- College of Chemistry, Sichuan University, Chengdu, P. R. China
- Zerong Li
- Molecular Medicine Research Center, State Key Laboratory of Biotherapy, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, P. R. China
- Key Lab of Agricultural Products Processing and Quality Control of Nanchang City, Jiangxi Agricultural University, Nanchang, P. R. China
- Yuyang Jiang
- The Ministry-Province Jointly Constructed Base for State Key Lab, Shenzhen Technology and Engineering Lab for Personalized Cancer Diagnostics and Therapeutics, and Shenzhen Kivita Innovative Drug Discovery Institute, Tsinghua University Shenzhen Graduate School, Shenzhen, P.R. China
- Weiping Chen
- Key Lab of Agricultural Products Processing and Quality Control of Nanchang City, Jiangxi Agricultural University, Nanchang, P. R. China
- Yu-Zong Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore
5
Chatterjee P, Roy D, Bhattacharyya M, Bandyopadhyay S. Biological networks in Parkinson's disease: an insight into the epigenetic mechanisms associated with this disease. BMC Genomics 2017; 18:721. [PMID: 28899360 PMCID: PMC5596942 DOI: 10.1186/s12864-017-4098-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 09/05/2016] [Accepted: 08/30/2017] [Indexed: 01/25/2023]
Abstract
BACKGROUND Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder in the world. Studying PD from a systems biology perspective involving genes and their regulators might provide deeper insights into the complex molecular interactions associated with this disease. RESULT We studied a gene co-expression network obtained from PD-specific microarray data. The co-expression network identified 11 hub genes, eight of which were not previously known to be associated with PD. Further study of the functionality of these eight novel hub genes revealed that they play important roles in several neurodegenerative diseases. Furthermore, we studied the tissue-specific expression and histone modification patterns of the novel hub genes. Most of these genes possess several histone modification sites that are already known to be associated with neurodegenerative diseases. The mTF-miRNA-gene-gTF regulatory network involves microRNA Transcription Factor (mTF), microRNA (miRNA), gene and gene Transcription Factor (gTF), whereas the long noncoding RNA (lncRNA)-mediated regulatory network involves miRNA, gene, mTF and lncRNA. The mTF-miRNA-gene-gTF regulatory network identified a novel feed-forward loop. The lncRNA-mediated regulatory network identified novel lncRNAs of PD and revealed a two-way regulatory pattern of PD-specific miRNAs, in which miRNAs can be regulated by both TFs and lncRNAs. SNP analysis of the most significant genes of the co-expression network identified 20 SNPs. These SNPs are present in the 3' UTRs of known PD genes and are controlled by miRNAs that are also involved in PD. CONCLUSION Our study identified eight novel hub genes that can be considered possible candidates for future biomarker identification studies for PD.
The two regulatory networks studied in our work provide a detailed overview of the cellular regulatory mechanisms in which the non-coding RNAs miRNA and lncRNA can act as epigenetic regulators of PD. The SNPs identified in our study may help identify PD at an earlier stage. Overall, this study may provide a better understanding of the complex molecular interactions associated with PD from a systems biology perspective.
Affiliation(s)
- Paulami Chatterjee
- Department of Biophysics, Bose Institute, Acharya J.C. Bose Centenary Building, P-1/12 C.I.T. Scheme VII M, Kolkata, 700054 India
- Debjani Roy
- Department of Biophysics, Bose Institute, Acharya J.C. Bose Centenary Building, P-1/12 C.I.T. Scheme VII M, Kolkata, 700054 India
- Malay Bhattacharyya
- Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur, Botanic Garden, Howrah, PO 711103 India
6
Zhou S, Liu P, Jiang W, Zhang H. Identification of potential target genes associated with the effect of propranolol on angiosarcoma via microarray analysis. Oncol Lett 2017; 13:4267-4275. [PMID: 28588707 PMCID: PMC5452868 DOI: 10.3892/ol.2017.5968] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/17/2016] [Accepted: 02/13/2017] [Indexed: 01/16/2023]
Abstract
The purpose of the present study was to explore the effect of propranolol on angiosarcoma and the potential target genes involved in the proliferation and differentiation of angiosarcoma tumor cells. The mRNA expression profile (GSE42534) was downloaded from the Gene Expression Omnibus database and included three samples without propranolol treatment (control), three samples treated with propranolol for 4 h and three samples treated with propranolol for 24 h. The differentially expressed genes (DEGs) in angiosarcoma tumor cells with or without propranolol treatment were obtained via the limma package of R and designated DEGs-4 h and DEGs-24 h. The DEGs-24 h group was divided into two sets: Set 1 contained the DEGs also present in the DEGs-4 h group, and Set 2 contained the remaining DEGs. Functional and pathway enrichment analysis of sets 1 and 2 was performed. The protein-protein interaction (PPI) networks of sets 1 and 2 were constructed, termed PPI 1 and PPI 2, and visualized using Cytoscape software. Modules of the two PPI networks were analyzed, and their topological structures were simulated using the tYNA platform. A total of 543 and 2,025 DEGs were identified in angiosarcoma tumor cells treated with propranolol for 4 and 24 h, respectively, compared with the control group. A total of 401 DEGs were common to DEGs-4 h and DEGs-24 h, including metallothionein 1, heme oxygenase 1, WW domain-binding protein 2 and sequestosome 1. Significantly enriched gene ontology (GO) terms and pathways were identified for sets 1 and 2, including 28 overlapping GO terms. Furthermore, PPI 1 comprised 121 nodes and 700 associated pairs, whereas PPI 2 comprised 1,324 nodes and 11,839 associated pairs. A total of 45 and 593 potential target genes were obtained according to the node degrees of PPI 1 and PPI 2, respectively.
The results of the present study indicated that a number of potential target genes, including AXL receptor tyrosine kinase, coatomer subunit α, DR1-associated protein 1 and ERBB receptor feedback inhibitor 1 may be involved in the effect of propranolol on angiosarcoma.
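The partition of DEGs-24 h into Set 1 and Set 2, followed by degree-based selection of candidate targets, can be sketched as below. The abstract does not state the degree cutoff used, so `min_degree` is a hypothetical parameter, and the gene identifiers and edges are toy placeholders rather than the study's data.

```python
from collections import Counter


def split_degs(degs_4h, degs_24h):
    """Set 1: 24 h DEGs also found at 4 h. Set 2: the remaining 24 h DEGs."""
    set1 = set(degs_24h) & set(degs_4h)
    set2 = set(degs_24h) - set1
    return set1, set2


def candidates_by_degree(ppi_edges, min_degree):
    """Genes whose PPI degree is at least min_degree (a hypothetical cutoff)."""
    deg = Counter()
    for a, b in ppi_edges:
        deg[a] += 1
        deg[b] += 1
    return sorted(g for g, d in deg.items() if d >= min_degree)


# Toy gene IDs and interactions, invented for illustration.
degs_4h = {"g1", "g2", "g3"}
degs_24h = {"g2", "g3", "g4", "g5", "g6", "g7"}
set1, set2 = split_degs(degs_4h, degs_24h)
ppi_2 = [("g4", "g5"), ("g4", "g6"), ("g4", "g7"), ("g5", "g6")]

print(sorted(set1), sorted(set2))                 # ['g2', 'g3'] ['g4', 'g5', 'g6', 'g7']
print(candidates_by_degree(ppi_2, min_degree=2))  # ['g4', 'g5', 'g6']
```

Note that a gene differentially expressed only at 4 h (g1 here) falls into neither set, since the split is applied to DEGs-24 h alone.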
Affiliation(s)
- Shiyong Zhou
- Department of Lymphoma, Sino-US Center of Lymphoma and Leukemia, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, P.R. China
- Pengfei Liu
- Department of Lymphoma, Sino-US Center of Lymphoma and Leukemia, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, P.R. China
- Wenhua Jiang
- Department of Radiotherapy, Second Hospital of Tianjin Medical University, Tianjin 300211, P.R. China
- Huilai Zhang
- Department of Lymphoma, Sino-US Center of Lymphoma and Leukemia, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, P.R. China
7
Omics analysis of mouse brain models of human diseases. Gene 2017; 600:90-100. [DOI: 10.1016/j.gene.2016.11.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/15/2016] [Revised: 11/04/2016] [Accepted: 11/10/2016] [Indexed: 01/24/2023]
8
PROFEAT Update: A Protein Features Web Server with Added Facility to Compute Network Descriptors for Studying Omics-Derived Networks. J Mol Biol 2016; 429:416-425. [PMID: 27742592 DOI: 10.1016/j.jmb.2016.10.013] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 06/30/2016] [Revised: 09/25/2016] [Accepted: 10/06/2016] [Indexed: 02/05/2023]
Abstract
Studies of biological, disease, and pharmacological networks are facilitated by systems-level investigations using computational tools. In particular, network descriptors developed in other disciplines have found increasing application in the study of protein, gene regulatory, metabolic, disease, and drug-targeted networks. Public web servers provide facilities for computing network descriptors, but many descriptors are not covered, including those used in or useful for biological studies. We upgraded the PROFEAT web server http://bidd2.nus.edu.sg/cgi-bin/profeat2016/main.cgi to compute up to 329 network descriptors and protein-protein interaction descriptors. PROFEAT network descriptors comprehensively describe the topological and connectivity characteristics of unweighted (uniform binding constants and molecular levels), edge-weighted (varying binding constants), node-weighted (varying molecular levels), edge-node-weighted (varying binding constants and molecular levels), and directed (oriented processes) networks. The usefulness of the network descriptors is illustrated by literature-reported studies of biological networks derived from genome, interactome, transcriptome, metabolome, and diseasome profiles.
9
Alavi Majd H, Talebi A, Gilany K, Khayyer N. Two-Way Gene Interaction From Microarray Data Based on Correlation Methods. IRANIAN RED CRESCENT MEDICAL JOURNAL 2016; 18:e24373. [PMID: 27621916 PMCID: PMC5002968 DOI: 10.5812/ircmj.24373] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/13/2014] [Revised: 03/26/2015] [Accepted: 04/21/2015] [Indexed: 11/26/2022]
Abstract
Background The development of high-throughput techniques for monitoring various aspects of gene activity has led to an explosion of interest in gene networks. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. Objectives The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a gene co-expression network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. Materials and Methods In this foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman’s rank correlation coefficient and Blomqvist’s measure, and compared them with Pearson’s correlation coefficient. We surveyed six genes of venous thrombosis disease, constructed a matrix whose entries represent the score for each gene pair, and obtained two-way interactions using Pearson’s correlation, Spearman’s rank correlation, and Blomqvist’s coefficient. Finally, these methods were compared with visual methods: Cytoscape, based on BIND, and Gene Ontology (GO), based on molecular function. R software version 3.2 and Bioconductor were used to perform these analyses. Results The Pearson and Spearman correlations gave the same results, which were confirmed by the Cytoscape and GO visual methods; however, the Blomqvist coefficient results were not confirmed by the visual methods. Conclusions Some correlation coefficient results do not agree with the visualizations, possibly because of the small amount of data.
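The two-step procedure above (score every gene pair, then keep pairs whose score passes a threshold) can be sketched in plain Python. This is an illustrative sketch, not the authors' R/Bioconductor code: the expression values are invented, keeping pairs with |score| ≥ threshold is an assumed convention, and `blomqvist` implements one common sample version of Blomqvist's medial correlation in which observations lying on a median are ignored.

```python
import math
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def blomqvist(x, y):
    """Sample Blomqvist beta: (concordant - discordant) / (concordant + discordant),
    counting sign agreement of deviations from each vector's median."""
    mx, my = median(x), median(y)
    products = [(a - mx) * (b - my) for a, b in zip(x, y)]
    conc = sum(1 for p in products if p > 0)
    disc = sum(1 for p in products if p < 0)
    return (conc - disc) / (conc + disc) if conc + disc else 0.0


def coexpression_network(expr, score, threshold):
    """Step 1: score every gene pair. Step 2: keep pairs with |score| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        s = score(expr[g1], expr[g2])
        if abs(s) >= threshold:
            edges.append((g1, g2, round(s, 3)))
    return edges


# Invented expression profiles for three genes across five samples.
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 3, 4, 1, 2],
}
for name, fn in [("Pearson", pearson), ("Spearman", spearman), ("Blomqvist", blomqvist)]:
    print(name, coexpression_network(expr, fn, threshold=0.9))
```

With this toy data, Pearson and Spearman keep only the geneA-geneB edge at a 0.9 threshold, while the median-based Blomqvist measure also keeps the two negative edges involving geneC. This only illustrates that the choice of score can change the resulting network; it says nothing about the study's own findings.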
Affiliation(s)
- Hamid Alavi Majd
- Department of Biostatistics, School of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran
- Atefeh Talebi
- Department of Biostatistics, School of Paramedical Sciences, Students’ Research Committee, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran
- Corresponding Author: Atefeh Talebi, Department of Biostatistics, School of Paramedical Sciences, Students’ Research Committee, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran. Tel: +98-2122707347, Fax: +98-2122721150, E-mail:
- Kambiz Gilany
- Reproductive Biotechnology Research Center, Avicenna Research Institute, ACECR, Tehran, IR Iran
- Nasibeh Khayyer
- Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran
10
Faust K, Lima-Mendez G, Lerat JS, Sathirapongsasuti JF, Knight R, Huttenhower C, Lenaerts T, Raes J. Cross-biome comparison of microbial association networks. Front Microbiol 2015; 6:1200. [PMID: 26579106 PMCID: PMC4621437 DOI: 10.3389/fmicb.2015.01200] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Received: 07/07/2015] [Accepted: 10/15/2015] [Indexed: 12/22/2022]
Abstract
Clinical and environmental meta-omics studies are accumulating an ever-growing amount of microbial abundance data over a wide range of ecosystems. With a sufficiently large number of samples, these microbial communities can be explored by constructing and analyzing co-occurrence networks, which detect taxon associations from abundance data and can give insights into community structure. Here, we investigate how co-occurrence networks differ across biomes and which other factors influence their properties. For this, we inferred microbial association networks from 20 different 16S rDNA sequencing data sets and observed that soil microbial networks harbor proportionally fewer positive associations and are less densely interconnected than host-associated networks. After excluding sample number, sequencing depth and beta-diversity as possible drivers, we found a negative correlation between community evenness and positive edge percentage. This correlation likely results from a skewed distribution of negative interactions, which take place preferentially between less prevalent taxa. Overall, our results suggest an under-appreciated role of evenness in shaping microbial association networks.
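The two quantities correlated in this study can be computed as sketched below. The abstract does not say which evenness index was used, so Pielou's evenness (Shannon entropy divided by the log of the number of observed taxa) is an assumption here, as is representing each association edge by the sign of its score.

```python
import math


def pielou_evenness(abundances):
    """Pielou's J = H' / ln(S), where H' is the Shannon entropy of the relative
    abundances and S is the number of taxa with non-zero abundance."""
    counts = [a for a in abundances if a > 0]
    s = len(counts)
    if s < 2:
        raise ValueError("evenness needs at least two observed taxa")
    total = sum(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts)
    return h / math.log(s)


def positive_edge_percentage(edge_scores):
    """Percentage of association edges with a positive score (sign only)."""
    if not edge_scores:
        raise ValueError("network has no edges")
    return 100.0 * sum(1 for w in edge_scores if w > 0) / len(edge_scores)


# Invented samples: one dominant taxon versus a perfectly even community.
print(round(pielou_evenness([97, 1, 1, 1]), 3))
print(round(pielou_evenness([10, 10, 10, 10]), 3))      # 1.0
print(positive_edge_percentage([0.6, 0.4, -0.3, 0.7]))  # 75.0
```

J ranges from near 0 (one taxon dominates) to 1 (all taxa equally abundant), so a negative correlation with positive edge percentage means more even communities tended to have proportionally more negative edges.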
Affiliation(s)
- Karoline Faust
- Center for the Biology of Disease, VIB, Leuven, Belgium
- Department of Microbiology and Immunology, REGA Institute, KU Leuven, Leuven, Belgium
- Department of Applied Biological Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Gipsi Lima-Mendez
- Center for the Biology of Disease, VIB, Leuven, Belgium
- Department of Microbiology and Immunology, REGA Institute, KU Leuven, Leuven, Belgium
- Department of Applied Biological Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Jean-Sébastien Lerat
- Machine Learning Group, Department of Computer Science, Université Libre de Bruxelles, Brussels, Belgium
- Rob Knight
- Department of Chemistry and Biochemistry and BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Tom Lenaerts
- Machine Learning Group, Department of Computer Science, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence Lab, Department of Computer Science, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles–Vrije Universiteit Brussel, Brussels, Belgium
- Jeroen Raes
- Center for the Biology of Disease, VIB, Leuven, Belgium
- Department of Microbiology and Immunology, REGA Institute, KU Leuven, Leuven, Belgium
- Department of Applied Biological Sciences, Vrije Universiteit Brussel, Brussels, Belgium
11
Abstract
Essential genes are indispensable for the target organism's survival. Large-scale identification and characterization of essential genes has been shown to be beneficial in both fundamental biology and medicine. Existing genome-scale experimental screens for essential genes are time-consuming and costly, and sometimes yield erroneous essential gene annotations. To circumvent these difficulties, many research groups have turned to computational approaches as an alternative means of identifying essential genes. Here, we developed an integrative machine-learning-based statistical framework to accurately predict essential genes in microorganisms. First, we extracted a variety of relevant features derived from different aspects of an organism's genomic sequences. Then, we selected a subset of features with high predictive power for gene essentiality through a carefully designed feature selection system. Using the selected features as input, we constructed an ensemble classifier and trained the model on a well-studied microorganism. After fine-tuning the model parameters in cross-validation, we tested the model on another microorganism. Tenfold cross-validation within the same organism achieved high predictive accuracy (AUC ~0.9), and cross-organism predictions between distantly related organisms yielded AUC scores from 0.69 to 0.89, significantly outperforming homology mapping.
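The evaluation above reports AUC under tenfold cross-validation. A minimal sketch of both pieces, assuming label 1 marks an essential gene: `auc` computes the ROC AUC as the probability that a random essential gene is scored above a random non-essential one (the Mann-Whitney formulation), and `kfold_indices` splits sample indices into k folds. This is not the authors' ensemble pipeline; in practice the data would be shuffled (and usually stratified by class) before splitting.

```python
def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative).
    Label 1 = essential, 0 = non-essential; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Invented classifier scores for four genes.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
print([len(f) for f in kfold_indices(25)])      # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

The pairwise loop is quadratic in the number of genes, which is fine for illustration; a rank-based computation gives the same value in O(n log n) for genome-scale data.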
12
Pavlopoulos GA, Malliarakis D, Papanikolaou N, Theodosiou T, Enright AJ, Iliopoulos I. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future. Gigascience 2015; 4:38. [PMID: 26309733 PMCID: PMC4548842 DOI: 10.1186/s13742-015-0077-2] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 05/26/2015] [Accepted: 08/03/2015] [Indexed: 01/31/2023]
Abstract
"A picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although the notion of capturing complex ideas using images is generally very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? The life sciences are among the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances, and the complexity of these datasets can make them incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on future human-computer interaction trends that could further enhance visualization.
Affiliation(s)
- Georgios A Pavlopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
- Nikolas Papanikolaou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
- Theodosis Theodosiou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
- Anton J Enright
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD UK
- Ioannis Iliopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
13
Zhang ST, Zuo C, Li WN, Fu XQ, Xing S, Zhang XP. Identification of key genes associated with the effect of estrogen on ovarian cancer using microarray analysis. Arch Gynecol Obstet 2015; 293:421-7. [PMID: 26264810 DOI: 10.1007/s00404-015-3833-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/02/2015] [Accepted: 07/27/2015] [Indexed: 01/15/2023]
Abstract
PURPOSE To identify key genes related to the effect of estrogen on ovarian cancer. METHODS Microarray data (GSE22600) were downloaded from Gene Expression Omnibus. Eight estrogen and seven placebo treatment samples were obtained using a 2 × 2 factorial design comprising two cell lines (PEO4 and 2008) and two treatments (estrogen and placebo). Differentially expressed genes (DEGs) were identified by Bayesian methods, with P < 0.05 and |log2FC| ≥ 0.5 (FC, fold change) as the cut-off criteria. Differentially co-expressed genes (DCGs) and differentially regulated genes (DRGs) were identified with the DCe and DRsort functions, respectively, of the DCGL package. Topological structure analysis was performed on the important transcription factors (TFs) and genes in the transcriptional regulatory network using tYNA. Functional enrichment analyses of the DEGs and of the important genes were performed using the Gene Ontology and KEGG databases. RESULTS In total, 465 DEGs were identified. Functional enrichment analysis of the DEGs indicated that ACVR2B, LTBP1, BMP7 and MYC were involved in the TGF-beta signaling pathway. In addition, 2285 DCG pairs and 357 DRGs were identified. Topological structure analysis identified 52 important TFs and 65 important genes. Functional enrichment analysis of the important genes showed that TP53 and MLH1 participated in the DNA damage response and that ACVR2B, LTBP1, BMP7 and MYC were involved in the TGF-beta signaling pathway. CONCLUSION TP53, MLH1, ACVR2B, LTBP1 and BMP7 might participate in the pathogenesis of ovarian cancer.
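The cut-off criterion above amounts to a two-condition filter on each gene's P value and log2 fold change. A minimal sketch, assuming the Bayesian differential-expression step has already produced a P value and a log2FC for every gene; the function name, input format and gene names are hypothetical, and the toy values are not taken from GSE22600.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=0.5):
    """Return genes with P < p_cutoff and |log2FC| >= lfc_cutoff, sorted by name.

    results: mapping gene -> (p_value, log2_fold_change); hypothetical input format.
    """
    return sorted(
        gene
        for gene, (p, lfc) in results.items()
        if p < p_cutoff and abs(lfc) >= lfc_cutoff
    )


# Hypothetical values, for illustration only.
toy = {
    "GENE_A": (0.001, 1.2),   # up-regulated, passes both thresholds
    "GENE_B": (0.03, -0.7),   # down-regulated, passes (absolute value is used)
    "GENE_C": (0.20, 2.0),    # fails the P-value cutoff
    "GENE_D": (0.01, 0.3),    # fails the fold-change cutoff
}
print(select_degs(toy))  # ['GENE_A', 'GENE_B']
```

Taking the absolute value of log2FC keeps both up- and down-regulated genes, matching the |log2FC| notation in the abstract.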
Affiliation(s)
- Shi-tao Zhang
- Key Laboratory for Molecular Enzymology and Engineering, The Ministry of Education, Jilin University, Changchun, 130012, China
- Chao Zuo
- Department of Anesthesiology, The Fifth Affiliated Hospital of Zunyi Medical College, Zhu Hai, 519100, China
- Wan-nan Li
- Key Laboratory for Molecular Enzymology and Engineering, The Ministry of Education, Jilin University, Changchun, 130012, China
- Xue-qi Fu
- Key Laboratory for Molecular Enzymology and Engineering, The Ministry of Education, Jilin University, Changchun, 130012, China
- Shu Xing
- Key Laboratory for Molecular Enzymology and Engineering, The Ministry of Education, Jilin University, Changchun, 130012, China
- Xiao-ping Zhang
- Department of Obstetrics and Gynecology, China-Japan Union Hospital of Jilin University, 126 Xiantai Street, Changchun, 130033, China
14
Rakshit H, Rathi N, Roy D. Construction and analysis of the protein-protein interaction networks based on gene expression profiles of Parkinson's disease. PLoS One 2014; 9:e103047. [PMID: 25170921 PMCID: PMC4149362 DOI: 10.1371/journal.pone.0103047] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 11/11/2013] [Accepted: 06/26/2014] [Indexed: 11/29/2022] [Open access]
Abstract
Background Parkinson's disease (PD) is one of the most prevalent neurodegenerative diseases. Improving the diagnosis and treatment of this disease is essential, as there is currently no cure. Microarray and proteomics data have revealed abnormal expression of several genes and proteins responsible for PD. Nevertheless, few studies have reported PD-specific protein-protein interactions. Results Microarray-based gene expression data and protein-protein interaction (PPI) databases were combined to construct the PPI networks of differentially expressed (DE) genes in post-mortem brain tissue samples from patients with Parkinson's disease. Samples were collected from the substantia nigra and the frontal cerebral cortex. From the microarray data, two sets of DE genes were selected by two-tailed t-tests and by Significance Analysis of Microarrays (SAM), run separately, to construct two Query-Query PPI (QQPPI) networks. Several topological properties of these networks were studied. Nodes with high connectivity (hubs) and nodes with high betweenness but low connectivity (bottlenecks) were identified as the most significant nodes of the networks. Three- and four-cliques were identified in the QQPPI networks. These cliques, which contain most of the topologically significant nodes of the networks, form core functional modules consisting of tightly knit sub-networks. Thirty-seven hitherto unreported PD markers were identified based on their topological significance in the networks. Of these 37 markers, eight were significantly involved in the core functional modules and showed significant changes in co-expression levels. Four of the 37 markers (ARRB2, STX1A, TFRC and MARCKS) were found to be associated with several neurotransmitters, including dopamine. Conclusion This study represents a novel investigation of the PPI networks for PD, a complex disease. The 37 proteins identified in our study can be considered PD network biomarkers, and these network biomarkers may serve as potential therapeutic targets for PD.
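The hub and bottleneck definitions above (high degree versus high betweenness with low degree) can be made concrete in a few lines. A minimal pure-Python sketch, assuming an undirected, unweighted network and a hypothetical cut-off (the top 20% of nodes by degree and by betweenness); the abstract does not state the thresholds the study used, and the function names and toy graph are illustrative only. Betweenness is computed with Brandes' algorithm.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each unordered pair was counted twice


def hubs_and_bottlenecks(adj, top_fraction=0.2):
    """Hubs: top-degree nodes. Bottlenecks: top-betweenness nodes that are not hubs.

    top_fraction is a hypothetical threshold, not the cut-off used in the study.
    """
    degree = {v: len(adj[v]) for v in adj}
    bc = betweenness(adj)
    k = max(1, int(len(adj) * top_fraction))
    deg_cut = sorted(degree.values(), reverse=True)[k - 1]
    bc_cut = sorted(bc.values(), reverse=True)[k - 1]
    hubs = {v for v in adj if degree[v] >= deg_cut}
    bottlenecks = {v for v in adj if bc[v] >= bc_cut and degree[v] < deg_cut}
    return hubs, bottlenecks


# Two triangles joined through a single low-degree connector X.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"),
         ("X", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
adj = build_adjacency(edges)
hubs, bottlenecks = hubs_and_bottlenecks(adj)
print(sorted(hubs), sorted(bottlenecks))  # ['C', 'D'] ['X']
```

In the toy graph, X has only two links but lies on every shortest path between the two triangles, which is exactly the "high betweenness, low connectivity" pattern. Graph libraries such as NetworkX offer equivalent degree and betweenness functions; the hand-written version only keeps the example dependency-free.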
Affiliation(s)
- Hindol Rakshit
- Integrated Science Education & Research Centre (ISERC), Visva-Bharati University, Shantiniketan, Birbhum, West Bengal, India
- Nitin Rathi
- Cognizant Technology Solutions India Pvt. Ltd., Rajiv Gandhi Infotech Park, MIDC, Hinjewadi, Pune, Maharashtra, India
- Debjani Roy
- Department of Biophysics, Bose Institute, Acharya J.C. Bose Centenary Building, Kolkata, West Bengal, India
15
ModuleRole: a tool for modulization, role determination and visualization in protein-protein interaction networks. PLoS One 2014; 9:e94608. [PMID: 24788790 PMCID: PMC4006751 DOI: 10.1371/journal.pone.0094608] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/23/2013] [Accepted: 03/17/2014] [Indexed: 11/19/2022] [Open access]
Abstract
UNLABELLED Rapidly increasing amounts of (physical and genetic) protein-protein interaction (PPI) data are produced by various high-throughput techniques, and interpreting these data remains a major challenge. To gain insight into the organization and structure of the resulting large, complex networks of interacting molecules, we developed ModuleRole, a user-friendly web server tool that uses simulated annealing, a method based on node connectivity, to find modules in a PPI network, defines the role of every node, and produces files for visualization in Cytoscape and Pajek. For given proteins, it analyzes the PPI network from the BioGRID database, finds and visualizes the modules these proteins form, and then defines the role every node plays in this network based on two topological parameters, the Participation Coefficient and the Z-score. This is the first program that provides an interactive and user-friendly interface for biologists to find and visualize modules and roles of proteins in PPI networks. It can be tested online at http://www.bioinfo.org/modulerole/index.php, which is free and open to all users with no login requirement; demo data are provided under "User Guide" in the Help menu. A non-server application of this program is intended for high-throughput data with more than 200 nodes or for users' own interaction datasets. Users can bookmark the link to a result page and access it later. As an interactive and highly customizable application, ModuleRole requires no expert knowledge of graph theory on the user's side and can be used on both Linux and Windows systems, making it a useful tool for biologists to analyze and visualize PPI networks from databases such as BioGRID. AVAILABILITY ModuleRole is implemented in Java and C and is freely available at http://www.bioinfo.org/modulerole/index.php. Supplementary information (user guide, demo data) is also available at this website. The API for ModuleRole used by this program can be obtained upon request.
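The two role parameters named above are widely defined following Guimerà and Amaral: the participation coefficient P_i = 1 - Σ_s (k_is / k_i)^2 measures how evenly node i's k_i links are spread across modules s, and the within-module degree z-score compares the number of links a node has inside its own module with the other members of that module. The sketch below assumes those standard definitions (with a population standard deviation) and takes the module assignment as given; it does not reproduce ModuleRole's simulated-annealing module search, and all names and the toy network are hypothetical.

```python
import math


def participation_coefficient(adj, module):
    """P_i = 1 - sum over modules s of (k_is / k_i)^2; 0 for isolated nodes."""
    pc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            pc[v] = 0.0
            continue
        per_module = {}  # module -> number of v's links into that module
        for w in nbrs:
            per_module[module[w]] = per_module.get(module[w], 0) + 1
        pc[v] = 1.0 - sum((c / k) ** 2 for c in per_module.values())
    return pc


def within_module_z(adj, module):
    """z_i = (k_in(i) - mean k_in of i's module) / SD of k_in of i's module."""
    k_in = {v: sum(1 for w in adj[v] if module[w] == module[v]) for v in adj}
    members = {}
    for v in adj:
        members.setdefault(module[v], []).append(k_in[v])
    z = {}
    for v in adj:
        vals = members[module[v]]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((x - mean) ** 2 for x in vals) / len(vals))  # population SD
        z[v] = 0.0 if sd == 0 else (k_in[v] - mean) / sd
    return z


# Hypothetical toy network: module 0 = {a, b, c, d}, module 1 = {e, f}.
adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a", "e"}, "e": {"d", "f"}, "f": {"e"},
}
module = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1}
pc = participation_coefficient(adj, module)
z = within_module_z(adj, module)
print(round(pc["d"], 3), round(z["a"], 3))  # 0.5 1.414
```

Here node d splits its links evenly between the two modules (P = 0.5, a "connector"-like role), while a has all its links inside module 0 (P = 0) and the highest within-module degree (z ≈ 1.414, a "hub"-like role).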
16
Chatterjee P, Bhattacharyya M, Bandyopadhyay S, Roy D. Studying the system-level involvement of microRNAs in Parkinson's disease. PLoS One 2014; 9:e93751. [PMID: 24690883 PMCID: PMC3972105 DOI: 10.1371/journal.pone.0093751] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 12/11/2013] [Accepted: 03/08/2014] [Indexed: 12/15/2022] [Open access]
Abstract
Background Parkinson's disease (PD) is a progressive neurologic disorder that affects movement and balance. Recent studies have revealed the importance of microRNAs (miRs) in PD. However, the detailed role of miRs and their regulation by transcription factors (TFs) remain unexplored. In this work, we studied for the first time the TF-miR-mRNA regulatory network as well as the miR co-expression network in PD. Result We compared the 204 differentially expressed miRs from microarray data with 73 PD-related miRs obtained from the literature and the Human MicroRNA Disease Database, and found a significant overlap of 47 PD-related miRs (p-value < 0.05). Functional enrichment analyses of these 47 common (Group 1) miRs and the remaining 157 (Group 2) miRs revealed similar over-represented GO Biological Processes and KEGG pathways. This strengthens the possibility that some of the Group 2 miRs, hitherto unidentified in any study, have functional roles in PD progression. To explore the cross talk between TFs, miRs and target mRNAs, regulatory networks were constructed. Study of these networks revealed 14 inter-regulatory hub miRs, whereas the miR co-expression network revealed 18 co-expressed hub miRs. Of these 32 hub miRs, 23 had not previously been associated with PD. Hierarchical clustering analysis further supports the roles of these novel miRs in different PD pathways. Furthermore, hsa-miR-92a appeared as a novel hub miR in both the regulatory and co-expression networks, indicating a strong functional role in PD. High conservation patterns were observed for most of these 23 novel hub miRs across different species, including human. Thus these 23 novel hub miRs can be considered potential biomarkers for PD. Conclusion Our study identified 23 novel miR markers that can open up new avenues for future studies and shed light on potential therapeutic targets for PD.
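The abstract reports a significant overlap (47 of the 73 literature-derived PD miRs are among the 204 differentially expressed miRs) but does not say which test produced the p-value. One common choice for this kind of list-overlap question is a one-sided hypergeometric test, sketched below with exact integer arithmetic. The background size N (the number of miRs that could have been observed) is not given in the abstract, so any value supplied for it is an assumption, and the function name is hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    N: background set size (an assumption here), K: size of one list,
    n: size of the other list, k: observed overlap.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("list sizes must lie between 0 and N")
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Sanity check on a tiny case: draw 5 of 10 items and get all 5 "successes".
print(hypergeom_overlap_pvalue(N=10, K=5, n=5, k=5))  # 1/252, about 0.00397
```

With the study's list sizes one would call `hypergeom_overlap_pvalue(N, 73, 204, 47)` for an assumed background N; the resulting p-value depends strongly on that choice.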
Affiliation(s)
- Paulami Chatterjee
- Department of Biophysics, Bose Institute, Acharya J.C. Bose Centenary Building, Kolkata, India
- Debjani Roy
- Department of Biophysics, Bose Institute, Acharya J.C. Bose Centenary Building, Kolkata, India
17
Amin MS, Finley RL, Jamil HM. Top-k similar graph matching using TraM in biological networks. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1790-1804. [PMID: 22732692 DOI: 10.1109/tcbb.2012.90] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/01/2023]
Abstract
Many emerging database applications entail sophisticated graph-based query manipulation, most evidently in large-scale scientific applications. To access the information embedded in graphs, efficient graph matching tools and algorithms have become of prime importance. Although the prohibitive time complexity of exact subgraph isomorphism techniques has limited their efficacy in this application domain, approximate yet efficient graph matching techniques have received much attention due to their pragmatic applicability. Since public domain databases are noisy and incomplete, inexact graph matching techniques have proven more promising for inferring knowledge from numerous structural data repositories. In this paper, we propose a novel technique called TraM for approximate graph matching that off-loads a significant amount of its processing onto the database, making the approach viable for large graphs. Moreover, the vector space embedding of the graphs and efficient filtration of the search space enable computation of approximate graph similarity at negligible cost. We annotate nodes of the query graphs by means of their global topological properties and compare them with neighborhood-biased segments of the data graph to find proper matches. We have conducted experiments on several real data sets and have demonstrated the effectiveness and efficiency of the proposed method.
Affiliation(s)
- Mohammad Shafkat Amin
- Department of Computer Science, Wayne State University, 555 E Washington Ave, Apt 1807, Sunnyvale, CA 94086, USA.
18
Faust K, Sathirapongsasuti JF, Izard J, Segata N, Gevers D, Raes J, Huttenhower C. Microbial co-occurrence relationships in the human microbiome. PLoS Comput Biol 2012; 8:e1002606. [PMID: 22807668 PMCID: PMC3395616 DOI: 10.1371/journal.pcbi.1002606] [Citation(s) in RCA: 991] [Impact Index Per Article: 76.2] [Received: 01/30/2012] [Accepted: 05/21/2012] [Indexed: 02/07/2023] [Open access]
Abstract
The healthy microbiota show remarkable variability within and among individuals. In addition to external exposures, ecological relationships (both oppositional and symbiotic) between microbial inhabitants are important contributors to this variation. It is thus of interest to assess what relationships might exist among microbes and determine their underlying reasons. The initial Human Microbiome Project (HMP) cohort, comprising 239 individuals and 18 different microbial habitats, provides an unprecedented resource to detect, catalog, and analyze such relationships. Here, we applied an ensemble method based on multiple similarity measures in combination with generalized boosted linear models (GBLMs) to taxonomic marker (16S rRNA gene) profiles of this cohort, resulting in a global network of 3,005 significant co-occurrence and co-exclusion relationships between 197 clades occurring throughout the human microbiome. This network revealed strong niche specialization, with most microbial associations occurring within body sites and a number of accompanying inter-body site relationships. Microbial communities within the oropharynx grouped into three distinct habitats, which themselves showed no direct influence on the composition of the gut microbiota. Conversely, niches such as the vagina demonstrated little to no decomposition into region-specific interactions. Diverse mechanisms underlie individual interactions, with some, such as the co-exclusion of Porphyromonaceae family members and Streptococcus in the subgingival plaque, supported by known biochemical dependencies. These differences varied among broad phylogenetic groups as well, with the Bacilli and Fusobacteria, for example, both enriched for exclusion of taxa from other clades. Comparing phylogenetic versus functional similarities among bacteria, we show that dominant commensal taxa (such as Prevotellaceae and Bacteroides in the gut) often compete, while potential pathogens (e.g. Treponema and Prevotella in the dental plaque) are more likely to co-occur in complementary niches. This approach thus serves to open new opportunities for future targeted mechanistic studies of the microbial ecology of the human microbiome.
The human body is a complex ecosystem where microbes compete and cooperate. These interactions can support health or promote disease, e.g. in dental plaque formation. The Human Microbiome Project collected and sequenced ca. 5,000 samples from 18 different body sites, including the airways, gut, skin, oral cavity and vagina. These data allowed the first assessment of significant patterns of co-presence and exclusion among human-associated bacteria. We combined sparse regression with an ensemble of similarity measures to predict microbial relationships within and between body sites. This captured known relationships in the dental plaque, vagina, and gut, and also predicted novel interactions involving members of under-characterized phyla such as TM7. We detected relationships necessary for plaque formation and differences in community composition among dominant members of the gut and vaginal microbiomes. Most relationships were strongly niche-specific, with only a few hub microorganisms forming links across multiple body areas. We also found that phylogenetic distance had a strong impact on the interaction type: closely related microorganisms co-occurred within the same niche, whereas most exclusive relationships occurred between more distantly related microorganisms. This establishes both the specific organisms and general principles by which microbial communities associated with healthy humans are assembled and maintained.
Affiliation(s)
- Karoline Faust
- Department of Structural Biology, VIB, Brussels, Belgium
- Department of Applied Biological Sciences (DBIT), Vrije Universiteit Brussel, Brussels, Belgium
- J. Fah Sathirapongsasuti
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Jacques Izard
- Department of Molecular Genetics, Forsyth Institute, Cambridge, Massachusetts, United States of America
- Department of Oral Medicine, Infection and Immunity, Harvard School of Dental Medicine, Boston, Massachusetts, United States of America
- Nicola Segata
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Dirk Gevers
- Microbial Systems and Communities, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Jeroen Raes
- Department of Structural Biology, VIB, Brussels, Belgium
- Department of Applied Biological Sciences (DBIT), Vrije Universiteit Brussel, Brussels, Belgium
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Microbial Systems and Communities, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
19
Cromar GL, Xiong X, Chautard E, Ricard-Blum S, Parkinson J. Toward a systems level view of the ECM and related proteins: a framework for the systematic definition and analysis of biological systems. Proteins 2012; 80:1522-44. [PMID: 22275077 DOI: 10.1002/prot.24036] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 10/24/2011] [Revised: 12/19/2011] [Accepted: 12/29/2011] [Indexed: 12/20/2022]
Abstract
Advances in high-throughput 'omic technologies are starting to provide unprecedented insights into how components of biological systems are organized and interact. Key to exploiting these datasets is the definition of the components that comprise the system of interest. Although a variety of knowledge bases capture such information, a major challenge is determining how these resources may best be utilized. Here we present a systematic curation strategy to define a systems-level view of the human extracellular matrix (ECM), a three-dimensional meshwork of proteins and polysaccharides that imparts structure and mechanical stability to tissues. Employing our curation strategy, we define a set of 357 proteins that represent core components of the ECM, together with an additional 524 genes that mediate related functional roles, and construct a map of their physical interactions. Topological properties help identify modules of functionally related proteins, including those involved in cell adhesion, bone formation and blood clotting. Because of the ECM's major role in cell adhesion, proliferation and morphogenesis, defects in the ECM have been implicated in cancer, atherosclerosis, asthma, fibrosis, and arthritis. We use MeSH annotations to identify modules enriched for specific disease terms, which help strengthen existing gene-disease associations and predict novel ones. Mapping expression and conservation data onto the network reveals modules that evolved in parallel to confer tissue-specific functionality on otherwise broadly expressed units. In addition to demonstrating an effective workflow for defining biological systems, this study crystallizes our current knowledge of the organization of the ECM.
Affiliation(s)
- Graham L Cromar
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
20
A targeted association study of immunity genes and networks suggests novel associations with placental malaria infection. PLoS One 2011; 6:e24996. [PMID: 21949827 PMCID: PMC3176307 DOI: 10.1371/journal.pone.0024996] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/29/2011] [Accepted: 08/22/2011] [Indexed: 01/17/2023] [Open access]
Abstract
A large proportion of the death toll associated with malaria is a consequence of malaria infection during pregnancy, causing up to 200,000 infant deaths annually. We previously published the first extensive genetic association study of placental malaria infection, and here we extend this analysis considerably, investigating genetic variation in over 9,000 SNPs in more than 1,000 genes involved in immunity and inflammation for their involvement in susceptibility to placental malaria infection. We applied a new approach incorporating results from both single gene analysis as well as gene-gene interactions on a protein-protein interaction network. We found suggestive associations of variants in the gene KLRK1 in the single gene analysis, as well as evidence for associations of multiple members of the IL-7/IL-7R signalling cascade in the combined analysis. To our knowledge, this is the first large-scale genetic study on placental malaria infection to date, opening the door for follow-up studies trying to elucidate the genetic basis of this neglected form of malaria.
21
Zhang M, Zhu C, Jacomy A, Lu L, Jegga A. The orphan disease networks. Am J Hum Genet 2011; 88:755-766. [PMID: 21664998 DOI: 10.1016/j.ajhg.2011.05.006] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 11/15/2010] [Revised: 04/29/2011] [Accepted: 05/06/2011] [Indexed: 01/29/2023] [Open access]
Abstract
The low prevalence rate of orphan diseases (OD) requires special combined efforts to improve diagnosis, prevention, and discovery of novel therapeutic strategies. To identify and investigate relationships based on shared genes or shared functional features, we have conducted a bioinformatic-based global analysis of all orphan diseases with known disease-causing mutant genes. Starting with a bipartite network of known OD and OD-causing mutant genes and using the human protein interactome, we first construct and topologically analyze three networks: the orphan disease network, the orphan disease-causing mutant gene network, and the orphan disease-causing mutant gene interactome. Our results demonstrate that, in contrast to common disease-causing mutant genes, which are predominantly nonessential, a majority of orphan disease-causing mutant genes are essential. Consistent with this finding, we found that OD-causing mutant genes are topologically important in the protein interactome and are ubiquitously expressed. Additionally, functional enrichment analysis of the genes in which mutations cause ODs shows that a majority result in premature death or are lethal in the orthologous mouse gene knockout models. To address the limitations of traditional gene-based disease networks, we also construct and analyze OD networks on the basis of shared enriched features (biological processes, cellular components, pathways, phenotypes, and literature citations). Analyzing these functionally linked OD networks, we identified several additional OD-OD relations that are both phenotypically similar and phenotypically diverse. Surprisingly, we observed that the wiring of the gene-based network is largely different from that of the other feature-based OD networks; this suggests that the relationships between ODs cannot be fully captured by the gene-based network alone.
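The orphan disease network and the disease-causing gene network described above are, in the usual construction that the phrase "relationships based on shared genes" suggests, projections of the disease-gene bipartite network: two diseases are linked if they share at least one causal gene, and two genes are linked if they are implicated in at least one common disease. A minimal sketch of that projection, assuming the bipartite network is given as a mapping from each disease to its set of genes; the function name and the disease and gene labels are placeholders, not data from the study.

```python
from itertools import combinations


def project(disease_genes):
    """Project a disease -> genes bipartite map onto disease-disease and gene-gene edges."""
    disease_edges = set()
    for d1, d2 in combinations(sorted(disease_genes), 2):
        if disease_genes[d1] & disease_genes[d2]:      # at least one shared gene
            disease_edges.add((d1, d2))
    gene_edges = set()
    for genes in disease_genes.values():
        for g1, g2 in combinations(sorted(genes), 2):  # genes sharing this disease
            gene_edges.add((g1, g2))
    return disease_edges, gene_edges


toy = {"OD1": {"g1", "g2"}, "OD2": {"g2"}, "OD3": {"g3"}}
diseases, genes = project(toy)
print(sorted(diseases), sorted(genes))  # [('OD1', 'OD2')] [('g1', 'g2')]
```

Sorting before pairing stores each undirected edge once, in a canonical order. The feature-based OD networks mentioned in the abstract would follow the same pattern with shared enriched features in place of shared genes.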
22
Abstract
Systems biology is all about networks. A recent trend has been to associate systems biology exclusively with the study of gene regulatory or protein-interaction networks. However, systems biology approaches can be applied at many other scales, from the subatomic to the ecosystem. In this review, we describe studies at the sub-cellular, tissue, whole-plant and crop scales and highlight how these studies can be related to systems biology. We discuss the properties of systems approaches at each scale as well as their current limits, and in each case pinpoint advances that are unique to the scale considered but hold potential for the other scales. We conclude by examining plant models that bridge different scales and by considering the future prospects of plant systems biology.
Affiliation(s)
- Mikaël Lucas
- Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK.
23
Gianoulis TA, Agarwal A, Snyder M, Gerstein MB. The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics. Genome Biol 2011; 12:R32. [PMID: 21453526 PMCID: PMC3129682 DOI: 10.1186/gb-2011-12-3-r32] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/04/2010] [Revised: 01/31/2011] [Accepted: 03/31/2011] [Indexed: 12/03/2022] [Open access]
Abstract
Biological data is often tabular but finding statistically valid connections between entities in a sequence of tables can be problematic - for example, connecting particular entities in a drug property table to gene properties in a second table, using a third table associating genes with drugs. Here we present an approach (CRIT) to find connections such as these and show how it can be applied in a variety of genomic contexts including chemogenomics data.
Affiliation(s)
- Tara A Gianoulis
- Department of Genetics, 77 Ave. of Louis Pasteur, Harvard Medical School, Boston, MA 02115, USA
24
Kuchaiev O, Stevanović A, Hayes W, Pržulj N. GraphCrunch 2: Software tool for network modeling, alignment and clustering. BMC Bioinformatics 2011; 12:24. [PMID: 21244715 PMCID: PMC3036622 DOI: 10.1186/1471-2105-12-24] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Received: 07/18/2010] [Accepted: 01/19/2011] [Indexed: 02/02/2023] [Open access]
Abstract
Background Recent advancements in experimental biotechnology have produced large amounts of protein-protein interaction (PPI) data. The topology of PPI networks is believed to have a strong link to their function. Hence, the abundance of PPI data for many organisms stimulates the development of computational techniques for the modeling, comparison, alignment, and clustering of networks. In addition, finding representative models for PPI networks will improve our understanding of the cell just as a model of gravity has helped us understand planetary motion. To decide if a model is representative, we need quantitative comparisons of model networks to real ones. However, exact network comparison is computationally intractable, and therefore several heuristics have been used instead. Some of these heuristics are easily computable "network properties," such as the degree distribution or the clustering coefficient. An important special case of network comparison is the network alignment problem. Analogous to sequence alignment, this problem asks to find the "best" mapping between regions in two networks. It is expected that network alignment might have as strong an impact on our understanding of biology as sequence alignment has had. Topology-based clustering of nodes in PPI networks is another example of an important network analysis problem that can uncover relationships between interaction patterns and phenotype. Results We introduce the GraphCrunch 2 software tool, which addresses these problems. It is a significant extension of GraphCrunch, which implements the most popular random network models and compares them with the data networks with respect to many network properties. Also, GraphCrunch 2 implements the GRAph ALigner algorithm ("GRAAL") for purely topological network alignment. GRAAL can align any pair of networks and exposes large, dense, contiguous regions of topological and functional similarities far larger than those exposed by any other existing tool.
Finally, GraphCrunch 2 implements an algorithm for clustering nodes within a network based solely on their topological similarities. Using GraphCrunch 2, we demonstrate that eukaryotic and viral PPI networks may belong to different graph model families and show that topology-based clustering can reveal important functional similarities between proteins within yeast and human PPI networks. Conclusions GraphCrunch 2 is a software tool that implements the latest research on biological network analysis. It parallelizes computationally intensive tasks to fully utilize the potential of modern multi-core CPUs. It is open-source and freely available for research use. It runs under the Windows and Linux platforms.
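The abstract names the degree distribution and the clustering coefficient as easily computable network properties. A minimal pure-Python sketch of both, for illustration only (it is not GraphCrunch 2 code); it assumes an undirected network stored as a dict of neighbour sets and counts nodes of degree below 2 as having a clustering coefficient of 0, which is one of several conventions in use. The function names and toy network are hypothetical.

```python
from collections import Counter
from itertools import combinations


def degree_distribution(adj):
    """Fraction of nodes with each degree k, i.e. P(k), keyed by ascending k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among v's neighbours, then divide by the k(k-1)/2 possible ones.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(degree_distribution(adj))            # {1: 0.25, 2: 0.5, 3: 0.25}
print(round(average_clustering(adj), 4))   # 0.5833
```

Comparing such summary statistics between a data network and networks drawn from a random model is the kind of heuristic comparison the abstract describes; no single property fully characterizes a network, which is why tools compare many of them.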
Affiliation(s)
- Oleksii Kuchaiev
- Department of Computer Science, University of California, Irvine, CA, USA
25
Babu MM. Early Career Research Award Lecture. Structure, evolution and dynamics of transcriptional regulatory networks. Biochem Soc Trans 2010; 38:1155-78. [PMID: 20863280 DOI: 10.1042/bst0381155] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 01/09/2023]
Abstract
The availability of entire genome sequences and the wealth of literature on gene regulation have enabled researchers to model an organism's transcriptional regulation system in the form of a network. In such a network, TFs (transcription factors) and TGs (target genes) are represented as nodes and regulatory interactions between TFs and TGs are represented as directed links. In the present review, I address the following topics pertaining to transcriptional regulatory networks. (i) Structure and organization: first, I introduce the concept of networks and discuss our understanding of the structure and organization of transcriptional networks. (ii) Evolution: I then describe the different mechanisms and forces that influence network evolution and shape network structure. (iii) Dynamics: I discuss studies that have integrated information on dynamics, such as mRNA abundance or half-life, with data on transcriptional networks in order to elucidate general principles of regulatory network dynamics. In particular, I discuss how cell-to-cell variability in the expression level of TFs could permit differential utilization of the same underlying network by distinct members of a genetically identical cell population. Finally, I conclude by discussing open questions for future research and highlighting the implications for evolution, development, disease and applications such as genetic engineering.
Affiliation(s)
- M Madan Babu
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.
26
Zhang M, Lu LJ. Investigating the validity of current network analysis on static conglomerate networks by protein network stratification. BMC Bioinformatics 2010; 11:466. [PMID: 20846443 PMCID: PMC2949894 DOI: 10.1186/1471-2105-11-466] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/02/2010] [Accepted: 09/16/2010] [Indexed: 01/25/2023]
Abstract
Background A molecular network perspective forms the foundation of systems biology. A common practice in analyzing protein-protein interaction (PPI) networks is to perform network analysis on a conglomerate network that is an assembly of all available binary interactions in a given organism from diverse data sources. Recent studies on network dynamics suggested that this approach might have ignored the dynamic nature of context-dependent molecular systems. Results In this study, we employed a network stratification strategy to investigate the validity of the current network analysis on conglomerate PPI networks. Using the genome-scale tissue- and condition-specific proteomics data in Arabidopsis thaliana, we present here the first systematic investigation into this question. We stratified a conglomerate A. thaliana PPI network into three levels of context-dependent subnetworks. We then focused on the three most commonly conducted types of network analysis, i.e., topological, functional and modular analyses, and compared the results from these network analyses on the conglomerate network and five stratified context-dependent subnetworks corresponding to specific tissues. Conclusions We found that the results based on the conglomerate PPI network are often significantly different from those of context-dependent subnetworks corresponding to specific tissues or conditions. This conclusion depends neither on relatively arbitrary cutoffs (such as those defining network hubs or bottlenecks), nor on specific network clustering algorithms for module extraction, nor on the possible high false positive rates of binary interactions in PPI networks. We also found that our conclusions are likely to be valid in human PPI networks. Furthermore, network stratification may help resolve many controversies in current systems biology research.
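The stratification idea can be illustrated with a small sketch. It assumes, for illustration only, that a context-dependent subnetwork is the subgraph induced by the proteins detected in one tissue (an interaction is kept only if both partners are detected); the protein names and tissue sets are hypothetical toy data, and this is not claimed to be the authors' exact procedure.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def stratify(edges: Iterable[Edge], detected: Set[str]) -> Set[Edge]:
    """Induced subnetwork: keep an interaction only if both partners are detected."""
    return {(a, b) for a, b in edges if a in detected and b in detected}


def degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """Undirected node degree computed from an edge list."""
    deg: Dict[str, int] = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Hypothetical conglomerate PPI network and tissue-specific protein sets.
conglomerate: Set[Edge] = {("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")}
tissues: Dict[str, Set[str]] = {"root": {"P1", "P2", "P3"}, "leaf": {"P1", "P4", "P5"}}

print("conglomerate", sorted(degrees(conglomerate).items()))
for tissue, proteins in tissues.items():
    sub = stratify(conglomerate, proteins)
    print(tissue, sorted(degrees(sub).items()))
```

In this toy example P1 has the highest degree in the conglomerate network but not in the "leaf" subnetwork, showing how a hub defined on the conglomerate network need not be a hub in a given context.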
Affiliation(s)
- Minlu Zhang
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229, USA
27
Global network analysis of lipid-raft-related proteins reveals their centrality in the network and their roles in multiple biological processes. J Mol Biol 2010; 402:761-73. [PMID: 20709075 DOI: 10.1016/j.jmb.2010.08.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/06/2010] [Revised: 08/03/2010] [Accepted: 08/09/2010] [Indexed: 11/20/2022]
Abstract
Lipid rafts are specialized cholesterol-enriched microdomains in the cell membrane. They are known to serve as a platform for protein-protein interactions and to take part in multiple biological processes. Nevertheless, how lipid rafts influence protein properties at the proteomic level is still an open question for researchers using traditional biochemical approaches. Here, by annotating the lipid raft localization of proteins in human protein-protein interaction networks, we performed a systematic analysis of the function of proteins related to lipid rafts. Our results demonstrated that lipid raft proteins and their interactions were critical for the structure and stability of the whole network, and that the interactions between them were significantly enriched. Furthermore, for each protein in the network, we calculated its "lipid raft dependency (LRD)," which indicates how closely it is topologically associated with lipid rafts, and we then uncovered the connection between LRD and protein functions. Proteins with high LRD tended to be essential for mammalian development, and malfunction of these proteins was likely to cause human diseases. Coordinated with their neighbors, high-LRD proteins participated in multiple biological processes and targeted many pathways in disease pathogenesis. High-LRD proteins were also found to have tissue-specific expression. In summary, our network-based analysis indicates that lipid raft proteins have higher centrality in the network, and that lipid-raft-related proteins have multiple functions and are probably involved in many biological processes in disease development.
28
Dogrusoz U, Cetintas A, Demir E, Babur O. Algorithms for effective querying of compound graph-based pathway databases. BMC Bioinformatics 2009; 10:376. [PMID: 19917102 PMCID: PMC2784781 DOI: 10.1186/1471-2105-10-376] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 11/02/2008] [Accepted: 11/16/2009] [Indexed: 12/22/2022]
Abstract
BACKGROUND Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a logically related group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools. RESULTS Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all kinds of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries. The algorithms were implemented within the querying component of a new version of the software tool PATIKAweb (Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases. CONCLUSION The PATIKA Project Web site is http://www.patika.org.
PATIKAweb version 2.1 is available at http://web.patika.org.
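Because each query returns a set of nodes, combining queries with "AND" and "OR" can be pictured as set intersection and union. The sketch below assumes a flat directed graph; it ignores the compound (nested) nodes and ubiquitous entities that the framework described above handles, and the function names and the toy pathway are hypothetical, not the PATIKAweb API.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]


def neighborhood(g: Graph, source: str, k: int) -> Set[str]:
    """Nodes reachable from `source` in at most k directed steps (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond the requested radius
        for v in g.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def shortest_path(g: Graph, source: str, target: str) -> Optional[List[str]]:
    """One shortest directed path from source to target, or None if unreachable."""
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for v in g.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


# Hypothetical toy pathway (edges point downstream).
pathway: Graph = {
    "EGF": ["EGFR"], "EGFR": ["GRB2", "PLCG1"], "GRB2": ["SOS1"],
    "SOS1": ["RAS"], "RAS": ["RAF1"], "PLCG1": ["PKC"], "PKC": ["RAF1"],
}

near_egfr = neighborhood(pathway, "EGFR", 2)
near_sos1 = neighborhood(pathway, "SOS1", 2)
print("AND:", sorted(near_egfr & near_sos1))  # intersection of the two query results
print("OR:", sorted(near_egfr | near_sos1))   # union of the two query results
print("path:", shortest_path(pathway, "EGF", "RAF1"))
```

Here the "AND" of the two neighborhood queries leaves only SOS1, and the shortest path from EGF to RAF1 runs through PLCG1 and PKC because that route is one step shorter than the one through RAS.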
Affiliation(s)
- Ugur Dogrusoz
- Center for Bioinformatics and Computer Engineering Dept., Bilkent University, Ankara, Turkey
- Ahmet Cetintas
- Center for Bioinformatics and Computer Engineering Dept., Bilkent University, Ankara, Turkey
- Emek Demir
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Ozgun Babur
- Center for Bioinformatics and Computer Engineering Dept., Bilkent University, Ankara, Turkey
29
Mayya V, Lundgren DH, Hwang SI, Rezaul K, Wu L, Eng JK, Rodionov V, Han DK. Quantitative phosphoproteomic analysis of T cell receptor signaling reveals system-wide modulation of protein-protein interactions. Sci Signal 2009; 2:ra46. [PMID: 19690332 DOI: 10.1126/scisignal.2000007] [Citation(s) in RCA: 307] [Impact Index Per Article: 19.2] [Indexed: 11/02/2022]
Abstract
Protein phosphorylation events during T cell receptor (TCR) signaling control the formation of complexes among proteins proximal to the TCR, the activation of kinase cascades, and the activation of transcription factors; however, the mode and extent of the influence of phosphorylation in coordinating the diverse phenomena associated with T cell activation are unclear. Therefore, we used the human Jurkat T cell leukemia cell line as a model system and performed large-scale quantitative phosphoproteomic analyses of TCR signaling. We identified 10,665 unique phosphorylation sites, of which 696 showed TCR-responsive changes. In addition, we analyzed broad trends in phosphorylation data sets to uncover underlying mechanisms associated with T cell activation. We found that, upon stimulation of the TCR, phosphorylation events extensively targeted protein modules involved in all of the salient phenomena associated with T cell activation: patterning of surface proteins, endocytosis of the TCR, formation of the F-actin cup, inside-out activation of integrins, polarization of microtubules, production of cytokines, and alternative splicing of messenger RNA. Further, case-by-case analysis of TCR-responsive phosphorylation sites on proteins belonging to relevant functional modules together with network analysis allowed us to deduce that serine-threonine (S-T) phosphorylation modulated protein-protein interactions (PPIs) in a system-wide fashion. We also provide experimental support for this inference by showing that phosphorylation of tubulin on six distinct serine residues abrogated PPIs during the assembly of microtubules. We propose that modulation of PPIs by stimulus-dependent changes in S-T phosphorylation state is a widespread phenomenon applicable to many other signaling systems.
Affiliation(s)
- Viveka Mayya
- Department of Cell Biology, University of Connecticut Health Center, Farmington, 06030, USA
30
Janky R, Helden JV, Babu MM. Investigating transcriptional regulation: from analysis of complex networks to discovery of cis-regulatory elements. Methods 2009; 48:277-86. [PMID: 19450688 DOI: 10.1016/j.ymeth.2009.04.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/06/2009] [Revised: 04/17/2009] [Accepted: 04/18/2009] [Indexed: 10/20/2022]
Abstract
Regulation of gene expression at the transcriptional level is a fundamental mechanism that is well conserved in all cellular systems. Due to advances in large-scale experimental analyses, we now have a wealth of information on gene regulation such as mRNA expression level across multiple conditions, genome-wide location data of transcription factors and data on transcription factor binding sites. This knowledge can be used to reconstruct transcriptional regulatory networks. Such networks are usually represented as directed graphs where regulatory interactions are depicted as directed edges from the transcription factor nodes to the target gene nodes. This abstract representation allows us to apply graph theory to study transcriptional regulation at global and local levels, to predict regulatory motifs and regulatory modules such as regulons and to compare the regulatory network of different genomes. Here we review some of the available computational methodologies for studying transcriptional regulatory networks as well as their interpretation.
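A minimal sketch of this directed-graph representation, assuming a plain adjacency list from each transcription factor to its targets. Here a regulon is taken to be the set of genes targeted by one TF, TF out-degree is used to flag candidate global regulators, and target in-degree counts how many TFs regulate a gene; the interactions and the hub cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical regulatory interactions: (transcription factor, target gene).
interactions: List[Tuple[str, str]] = [
    ("tfA", "g1"), ("tfA", "g2"), ("tfA", "g3"), ("tfA", "tfB"),
    ("tfB", "g3"), ("tfB", "g4"),
    ("tfC", "g4"),
]


def build_network(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Directed adjacency list: each edge points from a TF to one of its targets."""
    net: Dict[str, Set[str]] = defaultdict(set)
    for tf, target in pairs:
        net[tf].add(target)
    return dict(net)


def in_degree(net: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of TFs regulating each target gene."""
    counts: Dict[str, int] = defaultdict(int)
    for targets in net.values():
        for t in targets:
            counts[t] += 1
    return dict(counts)


net = build_network(interactions)
regulons = {tf: sorted(targets) for tf, targets in net.items()}  # one regulon per TF
out_deg = {tf: len(targets) for tf, targets in net.items()}
hubs = [tf for tf, d in out_deg.items() if d >= 3]  # arbitrary illustrative cutoff
print(regulons)
print(out_deg, hubs, in_degree(net))
```

Keeping edges directed matters here: tfB appears both as a regulator (a key of `net`) and as a target of tfA, and genes such as g3 and g4 with in-degree 2 are under combinatorial control.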
Affiliation(s)
- Rekin's Janky
- Structural Studies Division, Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom.
31
Rodriguez-Llorente I, Caviedes MA, Dary M, Palomares AJ, Cánovas FM, Peregrín-Alvarez JM. The Symbiosis Interactome: a computational approach reveals novel components, functional interactions and modules in Sinorhizobium meliloti. BMC Syst Biol 2009; 3:63. [PMID: 19531251 PMCID: PMC2701930 DOI: 10.1186/1752-0509-3-63] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/29/2009] [Accepted: 06/16/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Rhizobium-Legume symbiosis is an attractive biological process that has been studied for decades because of its importance in agriculture. Although many of the major factors underpinning this well-studied process have been discovered using traditional methods, much remains to be discovered. RESULTS Here we present an analysis of the 'Symbiosis Interactome' using novel computational methods in order to address the complex dynamic interactions between proteins involved in the symbiosis of the model bacterium Sinorhizobium meliloti with its plant hosts. Our study constitutes the first large-scale analysis attempting to reconstruct this complex biological process, and to identify novel proteins involved in establishing symbiosis. We identified 263 novel proteins potentially associated with the Symbiosis Interactome. The topology of the Symbiosis Interactome was used to guide experimental techniques attempting to validate novel proteins involved in different stages of symbiosis. The contribution of a set of novel proteins was tested by analyzing the symbiotic properties of several S. meliloti mutants. We found mutants with altered symbiotic phenotypes, suggesting that some of the novel proteins play key complementary roles in symbiosis. CONCLUSION Our 'systems-based model' represents a novel framework for studying host-microbe interactions, provides a theoretical basis for further experimental validations, and can also be applied to the study of other complex processes such as diseases.
32
Peregrín-Alvarez JM, Sanford C, Parkinson J. The conservation and evolutionary modularity of metabolism. Genome Biol 2009; 10:R63. [PMID: 19523219 PMCID: PMC2718497 DOI: 10.1186/gb-2009-10-6-r63] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.4] [Received: 02/05/2009] [Revised: 05/27/2009] [Accepted: 06/12/2009] [Indexed: 01/09/2023]
Abstract
A novel evolutionary analysis of metabolic networks across 26 taxa reveals a highly conserved but flexible core of metabolic enzymes. Background Cellular metabolism is a fundamental biological system consisting of myriad enzymatic reactions that together fulfill the basic requirements of life. The recent availability of vast amounts of sequence data from diverse sets of organisms provides an opportunity to systematically examine metabolism from a comparative perspective. Here we supplement existing genome and protein resources with partial genome datasets derived from 193 eukaryotes to present a comprehensive survey of the conservation of metabolism across 26 taxa representing the three domains of life. Results In general, metabolic enzymes are highly conserved. However, organizing these enzymes within the context of functional pathways revealed a spectrum of conservation from those that are highly conserved (for example, carbohydrate, energy, amino acid and nucleotide metabolism enzymes) to those specific to individual taxa (for example, those involved in glycan metabolism and secondary metabolite pathways). A novel co-conservation analysis showed that KEGG-defined pathways did not generally display evolutionary coherence. Instead, such modularity appears restricted to smaller subsets of enzymes. Expanding analyses to a global metabolic network revealed a highly conserved, but nonetheless flexible, 'core' of enzymes largely involved in multiple reactions across different pathways. Enzymes and pathways associated with the periphery of this network were less well conserved and associated with taxon-specific innovations. Conclusions These findings point to an emerging picture in which a core of enzyme activities involving amino acid, energy, carbohydrate and lipid metabolism has evolved to provide the basic functions required for life. However, the precise complement of enzymes associated within this core for each species is flexible.
Affiliation(s)
- José M Peregrín-Alvarez
- Program in Molecular Structure and Function, Hospital for Sick Children, College Street, Toronto, ON M5G 1L7, Canada.
33
Minguez P, Götz S, Montaner D, Al-Shahrour F, Dopazo J. SNOW, a web-based tool for the statistical analysis of protein-protein interaction networks. Nucleic Acids Res 2009; 37:W109-14. [PMID: 19454602 PMCID: PMC2703972 DOI: 10.1093/nar/gkp402] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 12/25/2022]
Abstract
Understanding the structure and the dynamics of the complex intercellular network of interactions that contributes to the structure and function of a living cell is one of the main challenges of today's biology. SNOW takes as input a collection of protein (or gene) identifiers and, using the interactome as a scaffold, draws the connections among them, calculates several relevant network parameters and, unlike other tools of its class, estimates their statistical significance. The parameters calculated for each node are: connectivity, betweenness and clustering coefficient. It also calculates the number of components, number of bicomponents and articulation points. An interactive network viewer is also available to explore the resulting network. SNOW is available at http://snow.bioinfo.cipf.es.
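The per-node parameters listed above can be computed directly from an adjacency list. This pure-Python sketch on a hypothetical toy network computes degree (connectivity), the local clustering coefficient, and betweenness using Brandes' algorithm for unweighted, undirected graphs (unnormalized, each unordered pair counted once). It is not SNOW's implementation and omits SNOW's statistical significance estimation.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Adj = Dict[str, Set[str]]


def build(edges: Iterable[Tuple[str, str]]) -> Adj:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def clustering(adj: Adj, v: str) -> float:
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def betweenness(adj: Adj) -> Dict[str, float]:
    """Brandes' algorithm for unweighted graphs; halved because pairs are undirected."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS phase: count shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulation phase, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical toy network: triangle A-B-C with a tail C-D-E.
adj = build([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
degree = {v: len(n) for v, n in adj.items()}
clust = {v: clustering(adj, v) for v in adj}
bc = betweenness(adj)
for v in sorted(adj):
    print(v, degree[v], round(clust[v], 3), bc[v])
```

In the toy network, C and D bridge the triangle A-B-C and the tail node E, so they are the only nodes with nonzero betweenness (4 and 3), while C's clustering coefficient drops to 1/3 because only one of its three neighbour pairs is connected.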
Affiliation(s)
- Pablo Minguez
- Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe, Valencia, Spain
34
Evolutionary rates and centrality in the yeast gene regulatory network. Genome Biol 2009; 10:R35. [PMID: 19358738 PMCID: PMC2688926 DOI: 10.1186/gb-2009-10-4-r35] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 10/09/2008] [Accepted: 04/09/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Transcription factors play a fundamental role in regulating physiological responses and developmental processes. Here we examine the evolution of the yeast transcription factors in the context of the structure of the gene regulatory network. RESULTS In contrast to previous results for the protein-protein interaction and metabolic networks, we find that the position of a gene within the transcription network affects the rate of protein evolution such that more central transcription factors tend to evolve faster. Centrality is also positively correlated with expression variability, suggesting that the higher rate of divergence among central transcription factors may be due to their role in controlling information flow and may be the result of adaptation to changing environmental conditions. Alternatively, more central transcription factors could be more buffered against environmental perturbations and, therefore, less subject to strong purifying selection. Importantly, the relationship between centrality and evolutionary rates is independent of expression level, expression variability and gene essentiality. CONCLUSIONS Our analysis of the transcription network highlights the influence of network structure on protein evolutionary rate. Further, the effect of network centrality on nucleotide divergence differs among the metabolic, protein-protein and transcriptional networks, suggesting that the effect of gene position is dependent on the function of the specific network under study. A better understanding of how these three cellular networks interact with one another may be needed to fully examine the impact of network structure on the function and evolution of biological systems.
35
Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol 2009; 27:199-204. [PMID: 19182785 DOI: 10.1038/nbt.1522] [Citation(s) in RCA: 496] [Impact Index Per Article: 31.0] [Received: 09/25/2008] [Accepted: 12/18/2008] [Indexed: 01/21/2023]
Abstract
Changes in the biochemical wiring of oncogenic cells drive phenotypic transformations that directly affect disease outcome. Here we examine the dynamic structure of the human protein interaction network (interactome) to determine whether changes in the organization of the interactome can be used to predict patient outcome. An analysis of hub proteins identified intermodular hub proteins that are co-expressed with their interacting partners in a tissue-restricted manner and intramodular hub proteins that are co-expressed with their interacting partners in all or most tissues. Substantial differences in biochemical structure were observed between the two types of hubs. Signaling domains were found more often in intermodular hub proteins, which were also more frequently associated with oncogenesis. Analysis of two breast cancer patient cohorts revealed that altered modularity of the human interactome may be useful as an indicator of breast cancer prognosis.
36
Functional analysis of OMICs data and small molecule compounds in an integrated "knowledge-based" platform. Methods Mol Biol 2009; 563:177-96. [PMID: 19597786 DOI: 10.1007/978-1-60761-175-2_10] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Indexed: 12/24/2022]
Abstract
Analysis of microarray, SNPs, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high-fidelity annotated knowledge base of protein interactions, pathways, and functional ontologies. This knowledge base has to be structured in a computer-readable format and must include software tools for managing experimental data, analysis, and reporting. Here we present MetaDiscovery, an integrated platform for functional data analysis that has been under development at GeneGo for the past 8 years. On the content side, MetaDiscovery encompasses a comprehensive database of protein interactions of different types, pathways, network models and 10 functional ontologies covering human, mouse, and rat proteins. The analytical toolkit includes tools for gene/protein list enrichment analysis, a statistical "interactome" tool for identification of over- and under-connected proteins in the data set, and a network module made up of network generation algorithms and filters. The suite also features MetaSearch, an application for combinatorial search of the database content, as well as a Java-based tool called MapEditor for drawing and editing custom pathway maps. Applications of MetaDiscovery include identification of potential biomarkers and drug targets, pathway hypothesis generation, analysis of biological effects for novel small molecule compounds, and clinical applications (analysis of large cohorts of patients and translational and personalized medicine).
37
Jordan IK, Katz LS, Denver DR, Streelman JT. Natural selection governs local, but not global, evolutionary gene coexpression networks in Caenorhabditis elegans. BMC Syst Biol 2008; 2:96. [PMID: 19014554 PMCID: PMC2596099 DOI: 10.1186/1752-0509-2-96] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 04/22/2008] [Accepted: 11/13/2008] [Indexed: 11/13/2022]
Abstract
Background Large-scale evaluation of gene expression variation among Caenorhabditis elegans lines that have diverged from a common ancestor allows for the analysis of a novel class of biological networks – evolutionary gene coexpression networks. Comparative analysis of these evolutionary networks has the potential to uncover the effects of natural selection in shaping coexpression network topologies since C. elegans mutation accumulation (MA) lines evolve essentially free from the effects of natural selection, whereas natural isolate (NI) populations are subject to selective constraints. Results We compared evolutionary gene coexpression networks for C. elegans MA lines versus NI populations to evaluate the role that natural selection plays in shaping the evolution of network topologies. MA and NI evolutionary gene coexpression networks were found to have very similar global topological properties as measured by a number of network topological parameters. Observed MA and NI networks show node degree distributions and average values for node degree, clustering coefficient, path length, eccentricity and betweenness that are statistically indistinguishable from one another yet highly distinct from randomly simulated networks. On the other hand, at the local level the MA and NI coexpression networks are highly divergent; pairs of genes coexpressed in the MA versus NI lines are almost entirely different as are the connectivity and clustering properties of individual genes. Conclusion It appears that selective forces shape how local patterns of coexpression change over time but do not control the global topology of C. elegans evolutionary gene coexpression networks. These results have implications for the evolutionary significance of global network topologies, which are known to be conserved across disparate complex systems.
Affiliation(s)
- I King Jordan
- School of Biology, Georgia Institute of Technology, Atlanta, GA, USA.
38
Hong SE, Park I, Cha H, Rho SH, Park WJ, Cho C, Kim DH. Identification of mouse heart transcriptomic network sensitive to various heart diseases. Biotechnol J 2008; 3:648-58. [PMID: 18320566 DOI: 10.1002/biot.200700250] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Exploring biological systems from highly complex datasets is an important task for systems biology. The present study examined co-expression dynamics of the mouse heart transcriptome by spectral graph clustering (SGC) to identify a heart transcriptomic network. SGC of microarray data produced 17 classified biological conditions (called condition spectrum, CS) and co-expression patterns by generating bi-clusters. The results showed dynamic co-expression patterns with a modular structure enriched in heart-related CS (CS-1 and -13) containing abundant heart-related microarray data. Consequently, a mouse heart transcriptomic network was constructed by clique analysis from the gene clusters exclusively present in the heart-related CS; 31 cliques were used for constructing the network. The participating genes in the network were closely associated with important cardiac functions (e.g., development and lipid and glycogen metabolism). The Online Mendelian Inheritance in Man (OMIM) database indicates that mutations in genes of the network induce serious heart diseases. Many of the tested genes in the network showed significantly altered gene expression in an animal model of hypertrophy. The results suggest that the present approach is critical for constructing a heart-related transcriptomic network and for deducing important genes involved in the pathogenesis of various heart diseases.
Affiliation(s)
- Seong-Eui Hong
- Department of Life Science, Gwangju Institute of Science and Technology, Gwangju, Korea
39
Milenković T, Lai J, Przulj N. GraphCrunch: a tool for large network analyses. BMC Bioinformatics 2008; 9:70. [PMID: 18230190 PMCID: PMC2275247 DOI: 10.1186/1471-2105-9-70] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Received: 06/03/2007] [Accepted: 01/30/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND The recent explosion in biological and other real-world network data has created the need for improved tools for large network analyses. In addition to well established global network properties, several new mathematical techniques for analyzing local structural properties of large networks have been developed. Small over-represented subgraphs, called network motifs, have been introduced to identify simple building blocks of complex networks. Small induced subgraphs, called graphlets, have been used to develop "network signatures" that summarize network topologies. Based on these network signatures, two new highly sensitive measures of network local structural similarities were designed: the relative graphlet frequency distance (RGF-distance) and the graphlet degree distribution agreement (GDD-agreement). Finding adequate null-models for biological networks is important in many research domains. Network properties are used to assess the fit of network models to the data. Various network models have been proposed. To date, there does not exist a software tool that measures the above-mentioned local network properties. Moreover, none of the existing tools compare real-world networks against a series of network models with respect to these local as well as a multitude of global network properties. RESULTS Thus, we introduce GraphCrunch, a software tool that finds well-fitting network models by comparing large real-world networks against random graph models according to various network structural similarity measures. It is uniquely capable of computing the computationally expensive RGF-distance and GDD-agreement measures. In addition, it computes several standard global network measures and therefore supports the largest variety of network measures to date.
Also, it is the first software tool that compares real-world networks against a series of network models and that has built-in parallel computing capabilities, allowing for a user-specified list of machines on which to perform compute-intensive searches for local network properties. Furthermore, GraphCrunch is easily extensible to include additional network measures and models. CONCLUSION GraphCrunch is a software tool that implements the latest research on biological network models and properties: it compares real-world networks against a series of random graph models with respect to a multitude of local and global network properties. We present GraphCrunch as a comprehensive, parallelizable, and easily extensible software tool for analyzing and modeling large biological networks. The software is open-source and freely available at http://www.ics.uci.edu/~bio-nets/graphcrunch/. It runs under Linux, MacOS, and Windows Cygwin. In addition, it has an easy-to-use online web interface that is available from the above web page.
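The RGF-distance named above compares how often each small induced subgraph (graphlet) occurs in two networks. The Python sketch below assumes the published definition as we understand it: F_i(G) = -log(N_i(G)/T(G)), where N_i(G) is the count of graphlet i in G and T(G) is the total graphlet count, and D(G, H) = sum over i of |F_i(G) - F_i(H)|. GraphCrunch itself works with all graphlets of three to five nodes; for brevity this sketch counts only the two 3-node graphlets (the induced path and the triangle). The function names are hypothetical and are not GraphCrunch's API.

```python
import math
from itertools import combinations


def three_node_graphlet_counts(edges):
    """Count induced 3-node graphlets in a simple undirected graph.

    Returns (paths, triangles): induced 2-edge paths and 3-cliques.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A "wedge" is a pair of neighbours of a centre node. Every wedge is either
    # an induced path or lies inside a triangle; a triangle holds three wedges.
    wedges = sum(math.comb(len(nbrs), 2) for nbrs in adj.values())
    corner_hits = 0
    for nbrs in adj.values():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                corner_hits += 1
    triangles = corner_hits // 3  # each triangle is seen once from each corner
    return wedges - 3 * triangles, triangles


def rgf_distance(counts_g, counts_h):
    """D(G, H) = sum_i |F_i(G) - F_i(H)|, with F_i(G) = -log(N_i(G) / T(G))."""
    if len(counts_g) != len(counts_h):
        raise ValueError("count vectors must cover the same graphlets")
    t_g, t_h = sum(counts_g), sum(counts_h)
    distance = 0.0
    for n_g, n_h in zip(counts_g, counts_h):
        if n_g == 0 or n_h == 0:
            # Simplification: the log is undefined for absent graphlets.
            raise ValueError("this sketch requires non-zero graphlet counts")
        distance += abs(-math.log(n_g / t_g) + math.log(n_h / t_h))
    return distance


if __name__ == "__main__":
    g = [(0, 1), (1, 2), (2, 0), (2, 3)]          # triangle with a pendant node
    h = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # 4-cycle with one chord
    cg, ch = three_node_graphlet_counts(g), three_node_graphlet_counts(h)
    print(cg, ch)                # (2, 1) (2, 2)
    print(rgf_distance(cg, ch))  # approximately 0.693 (= ln 2)
```

Zero counts are rejected rather than smoothed here; how GraphCrunch treats graphlets that are absent from one network is not covered by this sketch.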
Affiliation(s)
- Tijana Milenković
- Department of Computer Science, University of California, Irvine, CA 92697-3435, USA.
40
Assenov Y, Ramírez F, Schelhorn SE, Lengauer T, Albrecht M. Computing topological parameters of biological networks. Bioinformatics 2007; 24:282-4. [PMID: 18006545 DOI: 10.1093/bioinformatics/btm554] [Citation(s) in RCA: 1245] [Impact Index Per Article: 69.2] [Indexed: 12/21/2022]
Abstract
Rapidly increasing amounts of molecular interaction data are being produced by various experimental techniques and computational prediction methods. To gain insight into the organization and structure of the resulting large complex networks formed by the interacting molecules, we have developed the versatile Cytoscape plugin NetworkAnalyzer. It computes and displays a comprehensive set of topological parameters, which includes the number of nodes, edges, and connected components, the network diameter, radius, density, centralization, heterogeneity, and clustering coefficient, the characteristic path length, and the distributions of node degrees, neighborhood connectivities, average clustering coefficients, and shortest path lengths. NetworkAnalyzer can be applied to both directed and undirected networks and also contains extra functionality to construct the intersection or union of two networks. It is an interactive and highly customizable application that requires no expert knowledge in graph theory from the user. AVAILABILITY NetworkAnalyzer can be downloaded via the Cytoscape web site: http://www.cytoscape.org
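Several of the parameters listed above follow from simple definitions on an undirected graph. The plain-Python sketch below computes node and edge counts, density 2m/(n(n-1)), the average local clustering coefficient, the characteristic path length (mean shortest-path length over connected node pairs) and the diameter. It is an illustration under stated assumptions, not NetworkAnalyzer's code: in particular, giving nodes with fewer than two neighbours a clustering coefficient of 0 is a convention assumed here, and the plugin's own handling of such cases may differ.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops and duplicate edges are dropped."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_summary(edges):
    adj = build_adjacency(edges)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Local clustering C_v = 2 e_v / (k_v (k_v - 1)); degree < 2 gives 0 (assumed).
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        clustering.append(2 * links / (k * (k - 1)))

    # Characteristic path length and diameter over all connected node pairs.
    total, pairs, diameter = 0, 0, 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
                diameter = max(diameter, d)

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "avg_clustering": sum(clustering) / n if n else 0.0,
        "char_path_length": total / pairs if pairs else 0.0,
        "diameter": diameter,
    }
```

For a triangle with one pendant node, `topological_summary([(0, 1), (1, 2), (2, 0), (2, 3)])` reports 4 nodes, 4 edges, density 2/3, diameter 2 and a characteristic path length of 8/6.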
Affiliation(s)
- Yassen Assenov
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany
41
Campanaro S, Picelli S, Torregrossa R, Colluto L, Ceol M, Del Prete D, D'Angelo A, Valle G, Anglani F. Genes involved in TGF beta1-driven epithelial-mesenchymal transition of renal epithelial cells are topologically related in the human interactome map. BMC Genomics 2007; 8:383. [PMID: 17953753 PMCID: PMC2174485 DOI: 10.1186/1471-2164-8-383] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 06/28/2007] [Accepted: 10/22/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Understanding how mesenchymal cells arise from epithelial cells could have a strong impact in unveiling mechanisms of epithelial cell plasticity underlying kidney regeneration and repair. In primary human tubular epithelial cells (HUTEC) exposed to different TGF-β1 concentrations, we had observed epithelial-to-mesenchymal transition (EMT) but not epithelial-myofibroblast transdifferentiation. We hypothesized that the process triggered by TGF-β1 could be a dedifferentiation event. The purpose of this study is to comprehensively delineate genetic programs associated with TGF-β1-driven EMT in our in vitro model using gene expression profiling on large-scale oligonucleotide microarrays. RESULTS In HUTEC under TGF-β1 stimulus, 977 genes were found differentially expressed. Thirty genes were identified whose expression depended directly on TGF-β1 concentration. By mapping the differentially expressed genes in the Human Interactome Map using Cytoscape software, we identified a single scale-free network consisting of 2630 interacting proteins and containing 449 differentially expressed proteins. We identified 27 hub proteins in the interactome that have more than 29 incident edges and are encoded by differentially expressed genes. The Gene Ontology analysis showed an excess of up-regulated proteins involved in biological processes such as "morphogenesis", "cell fate determination" and "regulation of development", and the most up-regulated genes belonged to these categories. In addition, 267 genes were mapped to the KEGG pathways, and 14 pathways with more than nine differentially expressed genes were identified. In our model, Smad signaling was not the effector of TGF-β1 action; instead, engagement of the RAS/MAPK signaling pathway appears mainly to regulate genes involved in the cell cycle and proliferation/apoptosis.
CONCLUSION Our present findings support the hypothesis that the context-dependent EMT generated in our model by TGF-β1 might be the outcome of dedifferentiation. In fact: 1) the principal biological categories involved in the process concern morphogenesis and development; 2) the most up-regulated genes belong to these categories; and, finally, 3) some intracellular pathways are involved, whose engagement during kidney development and nephrogenesis is well known. These long-term effects of TGF-β1 in HUTEC involve genes that are highly interconnected, thereby generating a scale-free network that we named the "TGF-β1 interactome", whose hubs represent proteins that may have a crucial role for HUTEC in response to TGF-β1.
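The hub criterion above (more than 29 incident edges, encoded by a differentially expressed gene) is a degree filter combined with set membership. A minimal Python sketch follows; it assumes node identifiers have already been mapped from genes to their encoded proteins, and the function name and example identifiers are hypothetical rather than taken from the study.

```python
from collections import Counter


def differentially_expressed_hubs(edges, de_genes, min_degree=30):
    """Nodes with degree >= min_degree that are also in de_genes.

    "More than 29 edges" corresponds to min_degree=30. Identifiers are
    assumed to be already mapped from genes to their encoded proteins.
    """
    degree = Counter()
    seen = set()
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            continue  # skip self-loops and duplicate interactions
        seen.add(key)
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, k in degree.items()
                  if k >= min_degree and node in de_genes)


if __name__ == "__main__":
    # Hypothetical toy interactome: "HUB" has 30 partners, "NEAR" only 29.
    edges = [("HUB", f"L{i}") for i in range(30)]
    edges += [("NEAR", f"M{i}") for i in range(29)]
    print(differentially_expressed_hubs(edges, {"HUB", "NEAR", "L0"}))  # ['HUB']
```

Duplicate interactions are counted once, so the degree reflects distinct partners rather than the number of records in the input.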
Affiliation(s)
- Stefano Campanaro
- CRIBI Biotechnology Center, Department of Biology, University of Padova, Italy.
42
Ma S, Gong Q, Bohnert HJ. An Arabidopsis gene network based on the graphical Gaussian model. Genome Res 2007; 17:1614-25. [PMID: 17921353 DOI: 10.1101/gr.6911207] [Citation(s) in RCA: 177] [Impact Index Per Article: 9.8] [Indexed: 01/28/2023]
Abstract
We describe a gene network for the Arabidopsis thaliana transcriptome based on a modified graphical Gaussian model (GGM). Through partial correlation (pcor), GGM infers coregulation patterns between gene pairs conditional on the behavior of other genes. Regularized GGM calculated pcor between gene pairs among approximately 2000 input genes at a time. Regularized GGM coupled with iterative random samplings of genes was expanded into a network that covered the Arabidopsis genome (22,266 genes). This resulted in a high-confidence network of 18,625 interactions (edges) among 6760 genes (nodes), with the connections representing approximately 0.01% of all possible edges. When the network was queried for selected genes, locally coherent subnetworks emerged, mainly related to metabolic functions and stress responses. Examples of networks for biochemical pathways, cell wall metabolism, and cold responses are presented. GGM displayed known coregulation pathways as subnetworks and added novel components to known edges. Finally, the network reconciled individual subnetworks in a topology joined at the whole-genome level and provided a general framework that can instruct future studies on plant metabolism and stress responses. The network model is included.
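The partial correlation at the core of a GGM can be read off the inverse covariance (precision) matrix P as pcor_ij = -P_ij / sqrt(P_ii P_jj). The NumPy sketch below shows this on simulated data. The `ridge` term is a simple stand-in for regularization and is an assumption, not the estimator used by Ma et al.; the gene sampling scheme they used to reach genome scale is not reproduced, and the edge threshold is arbitrary.

```python
import numpy as np


def partial_correlation_matrix(expr, ridge=0.0):
    """Partial correlations between columns (genes) of a samples x genes matrix.

    pcor_ij = -P_ij / sqrt(P_ii * P_jj), with P the inverse covariance.
    `ridge` adds ridge * I before inversion (illustrative regularization only).
    """
    cov = np.cov(expr, rowvar=False)
    precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def ggm_edges(pcor, names, threshold):
    """Gene pairs whose absolute partial correlation exceeds threshold."""
    n = len(names)
    return [(names[i], names[j], float(pcor[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if abs(pcor[i, j]) > threshold]


if __name__ == "__main__":
    # Simulated chain x -> y -> z: x and z correlate, but only through y.
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    y = x + rng.normal(size=5000)
    z = y + rng.normal(size=5000)
    pcor = partial_correlation_matrix(np.column_stack([x, y, z]))
    print(np.corrcoef(x, z)[0, 1])  # marginal correlation, roughly 0.58
    print(ggm_edges(pcor, ["x", "y", "z"], threshold=0.2))  # x-y and y-z only
```

The example illustrates the point made in the abstract: conditioning on other genes removes the indirect x-z association that a plain correlation network would report as an edge.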
Affiliation(s)
- Shisong Ma
- Physiological and Molecular Plant Biology Program, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
43
Byrne AB, Weirauch MT, Wong V, Koeva M, Dixon SJ, Stuart JM, Roy PJ. A global analysis of genetic interactions in Caenorhabditis elegans. J Biol 2007; 6:8. [PMID: 17897480 PMCID: PMC2373897 DOI: 10.1186/jbiol58] [Citation(s) in RCA: 121] [Impact Index Per Article: 6.7] [Received: 06/04/2007] [Revised: 07/31/2007] [Accepted: 08/17/2007] [Indexed: 01/10/2023]
Abstract
Background Understanding gene function and genetic relationships is fundamental to our efforts to better understand biological systems. Previous studies systematically describing genetic interactions on a global scale have either focused on core biological processes in protozoans or surveyed catastrophic interactions in metazoans. Here, we describe a reliable high-throughput approach capable of revealing both weak and strong genetic interactions in the nematode Caenorhabditis elegans. Results We investigated interactions between 11 'query' mutants in conserved signal transduction pathways and hundreds of 'target' genes compromised by RNA interference (RNAi). Mutant-RNAi combinations that grew more slowly than controls were identified, and genetic interactions inferred through an unbiased global analysis of the interaction matrix. A network of 1,246 interactions was uncovered, establishing the largest metazoan genetic-interaction network to date. We refer to this approach as systematic genetic interaction analysis (SGI). To investigate how genetic interactions connect genes on a global scale, we superimposed the SGI network on existing networks of physical, genetic, phenotypic and coexpression interactions. We identified 56 putative functional modules within the superimposed network, one of which regulates fat accumulation and is coordinated by interactions with bar-1(ga80), which encodes a homolog of β-catenin. We also discovered that SGI interactions link distinct subnetworks on a global scale. Finally, we showed that the properties of genetic networks are conserved between C. elegans and Saccharomyces cerevisiae, but that the connectivity of interactions within the current networks is not. Conclusions Synthetic genetic interactions may reveal redundancy among functional modules on a global scale, which is a previously unappreciated level of organization within metazoan systems. 
Although the buffering between functional modules may differ between species, studying these differences may provide insight into the evolution of divergent form and function.
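One common way to formalize "grew more slowly than expected" is the multiplicative model: with fitness values scaled so wild type is 1, the expected fitness of a double perturbation is the product of the single-perturbation fitnesses, and the interaction score is observed minus expected. The sketch below uses that model as a swapped-in illustration; it is not the SGI scoring procedure, which the abstract describes only as a global analysis of the interaction matrix. The threshold and identifiers are hypothetical.

```python
def aggravating_interactions(single_query, single_target, double, threshold=-0.1):
    """Score mutant x RNAi combinations with a multiplicative model.

    single_query:  {query: fitness of the query mutant alone, wild type = 1}
    single_target: {target: fitness after RNAi of target in wild type}
    double:        {(query, target): fitness of query mutant with target RNAi}
    Returns {(query, target): epsilon} where epsilon = observed - expected
    falls below threshold, i.e. the combination grew worse than expected.
    """
    hits = {}
    for (q, t), observed in double.items():
        expected = single_query[q] * single_target[t]
        epsilon = observed - expected
        if epsilon < threshold:
            hits[(q, t)] = epsilon
    return hits


if __name__ == "__main__":
    # Hypothetical values for one query mutant and two RNAi targets.
    single_query = {"q1": 0.8}
    single_target = {"t1": 0.9, "t2": 0.5}
    double = {("q1", "t1"): 0.5, ("q1", "t2"): 0.4}
    print(aggravating_interactions(single_query, single_target, double))
    # q1 x t1: expected 0.72, observed 0.5, so epsilon is about -0.22
```

Only the (q1, t1) pair is reported: the (q1, t2) combination grows exactly as the product of its single effects predicts.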
Affiliation(s)
- Alexandra B Byrne
- Department of Medical Genetics and Microbiology, The Terrence Donnelly Centre for Cellular and Biomolecular Research, 160 College St, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Collaborative Program in Developmental Biology, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Matthew T Weirauch
- Department of Biomolecular Engineering, 1156 High Street, Mail Stop SOE2, University of California, Santa Cruz, CA 95064, USA
- Victoria Wong
- Department of Medical Genetics and Microbiology, The Terrence Donnelly Centre for Cellular and Biomolecular Research, 160 College St, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Martina Koeva
- Department of Biomolecular Engineering, 1156 High Street, Mail Stop SOE2, University of California, Santa Cruz, CA 95064, USA
- Scott J Dixon
- Department of Medical Genetics and Microbiology, The Terrence Donnelly Centre for Cellular and Biomolecular Research, 160 College St, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Collaborative Program in Developmental Biology, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Joshua M Stuart
- Department of Biomolecular Engineering, 1156 High Street, Mail Stop SOE2, University of California, Santa Cruz, CA 95064, USA
- Peter J Roy
- Department of Medical Genetics and Microbiology, The Terrence Donnelly Centre for Cellular and Biomolecular Research, 160 College St, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Collaborative Program in Developmental Biology, University of Toronto, Toronto, ON, M5S 3E1, Canada
44
Ferro A, Giugno R, Pigola G, Pulvirenti A, Skripin D, Bader GD, Shasha D. NetMatch: a Cytoscape plugin for searching biological networks. Bioinformatics 2007; 23:910-2. [PMID: 17277332 DOI: 10.1093/bioinformatics/btm032] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
NetMatch is a Cytoscape plugin that searches biological networks for subcomponents matching a given query. Queries may be approximate in the sense that certain parts of the subgraph-query may be left unspecified. To make the query creation process easy, a drawing tool is provided. Cytoscape is a bioinformatics software platform for the visualization and analysis of biological networks. AVAILABILITY The full package, a tutorial and associated examples are available at the following web sites: http://alpha.dmi.unict.it/~ctnyu/netmatch.html, http://baderlab.org/Software/NetMatch.
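Searching a network for subcomponents that match a query, with some query parts left unspecified, can be illustrated with a backtracking matcher over labeled graphs. The sketch below is in the spirit of the description above but is not NetMatch's algorithm: it finds non-induced embeddings (every query edge must map to a target edge), and it uses a hypothetical `"?"` marker for query nodes whose label is unspecified. NetMatch's own approximate-query syntax is not reproduced here.

```python
WILDCARD = "?"  # hypothetical marker for an unspecified query-node label


def find_matches(target_adj, target_labels, query_adj, query_labels):
    """All injective maps query node -> target node that preserve query edges
    and labels; WILDCARD query labels match any target label.

    Adjacency arguments are dicts of sets describing undirected graphs.
    """
    order = list(query_adj)  # fixed order in which query nodes are assigned
    matches = []

    def compatible(qn, tn, mapping):
        label = query_labels[qn]
        if label != WILDCARD and label != target_labels[tn]:
            return False
        # Each already-mapped query neighbour must map to a target neighbour.
        return all(mapping[qm] in target_adj[tn]
                   for qm in query_adj[qn] if qm in mapping)

    def extend(i, mapping, used):
        if i == len(order):
            matches.append(dict(mapping))
            return
        qn = order[i]
        for tn in target_adj:
            if tn not in used and compatible(qn, tn, mapping):
                mapping[qn] = tn
                used.add(tn)
                extend(i + 1, mapping, used)
                del mapping[qn]
                used.discard(tn)

    extend(0, {}, set())
    return matches


if __name__ == "__main__":
    target_adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
    target_labels = {"A": "kinase", "B": "TF", "C": "kinase", "D": "TF"}
    query_adj = {"x": {"y"}, "y": {"x"}}
    print(len(find_matches(target_adj, target_labels, query_adj,
                           {"x": "kinase", "y": WILDCARD})))  # 5
    print(len(find_matches(target_adj, target_labels, query_adj,
                           {"x": "kinase", "y": "TF"})))      # 3
```

Because each query edge is checked when its second endpoint is assigned, every edge constraint is enforced exactly once; the search is exponential in the worst case, which is acceptable only for small queries like these.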
Affiliation(s)
- A Ferro
- Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, I-95125 Catania, Italy.
45
Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, Hanspers K, Isserlin R, Kelley R, Killcoyne S, Lotia S, Maere S, Morris J, Ono K, Pavlovic V, Pico AR, Vailaya A, Wang PL, Adler A, Conklin BR, Hood L, Kuiper M, Sander C, Schmulevich I, Schwikowski B, Warner GJ, Ideker T, Bader GD. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2007; 2:2366-82. [PMID: 17947979 PMCID: PMC3685583 DOI: 10.1038/nprot.2007.324] [Citation(s) in RCA: 1870] [Impact Index Per Article: 103.9] [Indexed: 11/08/2022]
Abstract
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
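Step (v), identifying enriched Gene Ontology annotations, is commonly done with a one-sided hypergeometric test: given N genes in the universe, K of them annotated with a term, and a selected set of n genes (for example a putative module) containing k annotated genes, the p-value is P(X >= k). The standard-library sketch below assumes that test; the protocol's own tooling may use different statistics or corrections, no multiple-testing correction is applied, and the GO term identifier is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_draws, k_success, n_total):
    """P(X >= k) for X ~ Hypergeometric(n_total, k_success, n_draws)."""
    upper = min(n_draws, k_success)
    numerator = sum(comb(k_success, i) * comb(n_total - k_success, n_draws - i)
                    for i in range(k, upper + 1))
    return numerator / comb(n_total, n_draws)


def go_enrichment(selected, annotations, universe):
    """Uncorrected enrichment p-values for each GO term hit by `selected`.

    selected:    set of genes (e.g. a network module)
    annotations: {go_term: set of annotated genes}
    universe:    set of all genes considered in the analysis
    """
    selected = selected & universe
    results = {}
    for term, genes in annotations.items():
        genes = genes & universe
        k = len(selected & genes)
        if k == 0:
            continue  # terms with no hits cannot be enriched
        results[term] = hypergeom_upper_tail(k, len(selected), len(genes), len(universe))
    return results


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(10)}
    annotations = {"GO:EXAMPLE": {"g0", "g1", "g2", "g3"}}  # hypothetical term
    print(go_enrichment({"g0", "g1", "g2"}, annotations, universe))
    # p = C(4,3) * C(6,0) / C(10,3) = 4/120, about 0.0333
```

In practice the resulting p-values would be adjusted for the number of terms tested before calling any annotation enriched.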
Affiliation(s)
- Melissa S Cline
- Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris cedex 15, France