Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Sharan R, Ideker T, Kelley B, Shamir R, Karp RM. Identification of Protein Complexes by Comparative Analysis of Yeast and Bacterial Protein Interaction Data. J Comput Biol 2005;12:835-46. [PMID: 16108720 DOI: 10.1089/cmb.2005.12.835] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022] Open

For:	Sharan R, Ideker T, Kelley B, Shamir R, Karp RM. Identification of Protein Complexes by Comparative Analysis of Yeast and Bacterial Protein Interaction Data. J Comput Biol 2005;12:835-46. [PMID: 16108720 DOI: 10.1089/cmb.2005.12.835] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022] Open

Number

Cited by Other Article(s)

Wang W, Meng X, Xiang J, Shuai Y, Bedru HD, Li M. CACO: A Core-Attachment Method With Cross-Species Functional Ortholog Information to Detect Human Protein Complexes. IEEE J Biomed Health Inform 2023;27:4569-4578. [PMID: 37399160 DOI: 10.1109/jbhi.2023.3289490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/05/2023]

Liany H, Lin Y, Jeyasekharan A, Rajan V. An Algorithm to Mine Therapeutic Motifs for Cancer from Networks of Genetic Interactions. IEEE J Biomed Health Inform 2022;26:2830-2838. [PMID: 34990373 DOI: 10.1109/jbhi.2022.3141076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Advanced Network Sampling with Heterogeneous Multiple Chains. SENSORS 2021;21:s21051905. [PMID: 33803175 PMCID: PMC7963173 DOI: 10.3390/s21051905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Revised: 02/22/2021] [Accepted: 03/03/2021] [Indexed: 12/02/2022]

Wu Z, Liao Q, Fan S, Liu B. idenPC-CAP: Identify protein complexes from weighted RNA-protein heterogeneous interaction networks using co-assemble partner relation. Brief Bioinform 2020;22:6041167. [PMID: 33333549 DOI: 10.1093/bib/bbaa372] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Revised: 11/07/2020] [Accepted: 11/20/2020] [Indexed: 12/18/2022] Open

Wu Z, Liao Q, Liu B. A comprehensive review and evaluation of computational methods for identifying protein complexes from protein-protein interaction networks. Brief Bioinform 2020;21:1531-1548. [PMID: 31631226 DOI: 10.1093/bib/bbz085] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2019] [Revised: 06/17/2019] [Accepted: 06/17/2019] [Indexed: 01/03/2025] Open

Liu X, Yang Z, Sang S, Lin H, Wang J, Xu B. Detection of protein complexes from multiple protein interaction networks using graph embedding. Artif Intell Med 2019;96:107-115. [PMID: 31164203 DOI: 10.1016/j.artmed.2019.04.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2018] [Revised: 04/06/2019] [Accepted: 04/06/2019] [Indexed: 12/22/2022]

Zahiri J, Emamjomeh A, Bagheri S, Ivazeh A, Mahdevar G, Sepasi Tehrani H, Mirzaie M, Fakheri BA, Mohammad-Noori M. Protein complex prediction: A survey. Genomics 2019;112:174-183. [PMID: 30660789 DOI: 10.1016/j.ygeno.2019.01.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Revised: 11/27/2018] [Accepted: 01/15/2019] [Indexed: 02/08/2023]

Xu B, Li K, Zheng W, Liu X, Zhang Y, Zhao Z, He Z. Protein complexes identification based on go attributed network embedding. BMC Bioinformatics 2018;19:535. [PMID: 30572820 PMCID: PMC6302388 DOI: 10.1186/s12859-018-2555-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2018] [Accepted: 11/30/2018] [Indexed: 01/19/2023] Open

Abstract

Background

Identifying protein complexes from protein-protein interaction (PPI) network is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidences to enhance the quality of predicted complexes. However, it is still a challenge to integrate different types of biological information into the complexes discovery process under a unified framework. Recently, attributed network embedding methods have be proved to be remarkably effective in generating vector representations for nodes in the network. In the transformed vector space, both the topological proximity and node attributed affinity between different nodes are preserved. Therefore, such attributed network embedding methods provide us a unified framework to integrate various biological evidences into the protein complexes identification process.

Results

In this article, we propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding. Firstly, it learns the vector representation for each protein from a GO attributed PPI network. Based on the pair-wise vector representation similarity, a weighted adjacency matrix is constructed. Secondly, it uses the clique mining method to generate candidate cores. Consequently, seed cores are obtained by ranking candidate cores based on their densities on the weighted adjacency matrix and removing redundant cores. For each seed core, its attachments are the proteins with correlation score that is larger than a given threshold. The combination of a seed core and its attachment proteins is reported as a predicted protein complex by the GANE algorithm. For performance evaluation, we compared GANE with six protein complex identification methods on five yeast PPI networks. Experimental results showes that GANE performs better than the competing algorithms in terms of different evaluation metrics.

Conclusions

GANE provides a framework that integrate many valuable and different biological information into the task of protein complex identification. The protein vector representation learned from our attributed PPI network can also be used in other tasks, such as PPI prediction and disease gene prediction.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2555-x) contains supplementary material, which is available to authorized users.

Collapse

Diament A, Tuller T. Modeling three-dimensional genomic organization in evolution and pathogenesis. Semin Cell Dev Biol 2018;90:78-93. [PMID: 30030143 DOI: 10.1016/j.semcdb.2018.07.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2018] [Accepted: 07/08/2018] [Indexed: 12/17/2022]

Mohammadi S, Gleich DF, Kolda TG, Grama A. Triangular Alignment (TAME): A Tensor-Based Approach for Higher-Order Network Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017;14:1446-1458. [PMID: 27483461 DOI: 10.1109/tcbb.2016.2595583] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Ma CY, Chen YPP, Berger B, Liao CS. Identification of protein complexes by integrating multiple alignment of protein interaction networks. Bioinformatics 2017;33:1681-1688. [PMID: 28130237 PMCID: PMC5860626 DOI: 10.1093/bioinformatics/btx043] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2016] [Revised: 11/22/2016] [Accepted: 01/20/2017] [Indexed: 01/04/2023] Open

Abstract

MOTIVATION

Protein complexes are one of the keys to studying the behavior of a cell system. Many biological functions are carried out by protein complexes. During the past decade, the main strategy used to identify protein complexes from high-throughput network data has been to extract near-cliques or highly dense subgraphs from a single protein-protein interaction (PPI) network. Although experimental PPI data have increased significantly over recent years, most PPI networks still have many false positive interactions and false negative edge loss due to the limitations of high-throughput experiments. In particular, the false negative errors restrict the search space of such conventional protein complex identification approaches. Thus, it has become one of the most challenging tasks in systems biology to automatically identify protein complexes.

RESULTS

In this study, we propose a new algorithm, NEOComplex ( NE CC- and O rtholog-based Complex identification by multiple network alignment), which integrates functional orthology information that can be obtained from different types of multiple network alignment (MNA) approaches to expand the search space of protein complex detection. As part of our approach, we also define a new edge clustering coefficient (NECC) to assign weights to interaction edges in PPI networks so that protein complexes can be identified more accurately. The NECC is based on the intuition that there is functional information captured in the common neighbors of the common neighbors as well. Our results show that our algorithm outperforms well-known protein complex identification tools in a balance between precision and recall on three eukaryotic species: human, yeast, and fly. As a result of MNAs of the species, the proposed approach can tolerate edge loss in PPI networks and even discover sparse protein complexes which have traditionally been a challenge to predict.

AVAILABILITY AND IMPLEMENTATION

http://acolab.ie.nthu.edu.tw/bionetwork/NEOComplex.

CONTACT

bab@csail.mit.edu or csliao@ie.nthu.edu.tw.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Peng W, Li M, Chen L, Wang L. Predicting Protein Functions by Using Unbalanced Random Walk Algorithm on Three Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017;14:360-369. [PMID: 28368814 DOI: 10.1109/tcbb.2015.2394314] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Snider J, Kotlyar M, Saraon P, Yao Z, Jurisica I, Stagljar I. Fundamentals of protein interaction network mapping. Mol Syst Biol 2015;11:848. [PMID: 26681426 PMCID: PMC4704491 DOI: 10.15252/msb.20156351] [Citation(s) in RCA: 201] [Impact Index Per Article: 20.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Peng W, Wang J, Wu F, Yi P. Detecting conserved protein complexes using a dividing-and-matching algorithm and unequally lenient criteria for network comparison. Algorithms Mol Biol 2015;10:21. [PMID: 26136815 PMCID: PMC4487215 DOI: 10.1186/s13015-015-0053-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2014] [Accepted: 05/26/2015] [Indexed: 01/09/2023] Open

Srihari S, Yong CH, Patil A, Wong L. Methods for protein complex prediction and their contributions towards understanding the organisation, function and dynamics of complexes. FEBS Lett 2015;589:2590-602. [PMID: 25913176 DOI: 10.1016/j.febslet.2015.04.026] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2015] [Revised: 04/14/2015] [Accepted: 04/14/2015] [Indexed: 12/30/2022]

Caufield JH, Abreu M, Wimble C, Uetz P. Protein complexes in bacteria. PLoS Comput Biol 2015;11:e1004107. [PMID: 25723151 PMCID: PMC4344305 DOI: 10.1371/journal.pcbi.1004107] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2014] [Accepted: 01/02/2015] [Indexed: 01/26/2023] Open

Abstract

Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We have coupled the results of these mass spectrometry-characterized protein complexes with the 285 “gold standard” protein complexes identified by EcoCyc. A comparison with databases of gene orthology, conservation, and essentiality identified proteins conserved or lost in complexes of other species. For instance, of 285 “gold standard” protein complexes in E. coli, less than 10% are fully conserved among a set of 7 distantly-related bacterial “model” species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 out of 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes containing well-conserved components and uncharacterized proteins which will be interesting targets for future experimental studies.

Though more than 20,000 binary protein-protein interactions have been published for a few well-studied bacterial species, the results rarely capture the full extent to which proteins take part in complexes. Here, we use experimentally-observed protein complexes from E. coli or Mycoplasma pneumoniae, as well as gene orthology, to predict protein complexes across many species of bacteria. Surprisingly, the majority of protein complexes is not conserved, demonstrating an unexpected evolutionary flexibility. We also observe broader trends within protein complex conservation, especially in genome-reduced species with minimal sets of protein complexes.

Collapse

Chen W, Schmidt M, Tian W, Samatova NF, Zhang S. An efficient algorithm for pairwise local alignment of protein interaction networks. J Bioinform Comput Biol 2014;13:1550003. [PMID: 25477149 DOI: 10.1142/s0219720015500031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Wu M, Kwoh CK, Li X, Zheng J. Finding trans-regulatory genes and protein complexes modulating meiotic recombination hotspots of human, mouse and yeast. BMC SYSTEMS BIOLOGY 2014;8:107. [PMID: 25208583 PMCID: PMC4236725 DOI: 10.1186/s12918-014-0107-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/10/2013] [Accepted: 07/11/2014] [Indexed: 11/18/2022]

Abstract

Background

The regulatory mechanism of recombination is one of the most fundamental problems in genomics, with wide applications in genome wide association studies (GWAS), birth-defect diseases, molecular evolution, cancer research, etc. Recombination events cluster into short genomic regions called “recombination hotspots”. Recently, a zinc finger protein PRDM9 was reported to regulate recombination hotspots in human and mouse genomes. In addition, a 13-mer motif contained in the binding sites of PRDM9 is found to be enriched in human hotspots. However, this 13-mer motif only covers a fraction of hotspots, indicating that PRDM9 is not the only regulator of recombination hotspots. Therefore, the challenge of discovering other regulators of recombination hotspots becomes significant. Furthermore, recombination is a complex process. Hence, multiple proteins acting as machinery, rather than individual proteins, are more likely to carry out this process in a precise and stable manner. Therefore, the extension of the prediction of individual trans-regulators to protein complexes is also highly desired.

Results

In this paper, we introduce a pipeline to identify genes and protein complexes associated with recombination hotspots. First, we prioritize proteins associated with hotspots based on their preference of binding to hotspots and coldspots. Second, using the above identified genes as seeds, we apply the Random Walk with Restart algorithm (RWR) to propagate their influences to other proteins in protein-protein interaction (PPI) networks. Hence, many proteins without DNA-binding information will also be assigned a score to implicate their roles in recombination hotspots. Third, we construct sub-PPI networks induced by top genes ranked by RWR for various species (e.g., yeast, human and mouse) and detect protein complexes in those sub-PPI networks.

Conclusions

The GO term analysis show that our prioritizing methods and the RWR algorithm are capable of identifying novel genes associated with recombination hotspots. The trans-regulators predicted by our pipeline are enriched with epigenetic functions (e.g., histone modifications), demonstrating the epigenetic regulatory mechanisms of recombination hotspots. The identified protein complexes also provide us with candidates to further investigate the molecular machineries for recombination hotspots. Moreover, the experimental data and results are available on our web site http://www.ntu.edu.sg/home/zhengjie/data/RecombinationHotspot/NetPipe/.

Collapse

Garcia-Reyero N, Tingaud-Sequeira A, Cao M, Zhu Z, Perkins EJ, Hu W. Endocrinology: advances through omics and related technologies. Gen Comp Endocrinol 2014;203:262-73. [PMID: 24726988 DOI: 10.1016/j.ygcen.2014.03.042] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/04/2013] [Revised: 03/20/2014] [Accepted: 03/22/2014] [Indexed: 12/27/2022]

Pache RA, Aloy P. Increasing the precision of orthology-based complex prediction through network alignment. PeerJ 2014;2:e413. [PMID: 24918034 PMCID: PMC4045337 DOI: 10.7717/peerj.413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2014] [Accepted: 05/13/2014] [Indexed: 12/01/2022] Open

Abstract

Macromolecular assemblies play an important role in almost all cellular processes. However, despite several large-scale studies, our current knowledge about protein complexes is still quite limited, thus advocating the use of in silico predictions to gather information on complex composition in model organisms. Since protein–protein interactions present certain constraints on the functional divergence of macromolecular assemblies during evolution, it is possible to predict complexes based on orthology data. Here, we show that incorporating interaction information through network alignment significantly increases the precision of orthology-based complex prediction. Moreover, we performed a large-scale in silico screen for protein complexes in human, yeast and fly, through the alignment of hundreds of known complexes to whole organism interactomes. Systematic comparison of the resulting network alignments to all complexes currently known in those species revealed many conserved complexes, as well as several novel complex components. In addition to validating our predictions using orthogonal data, we were able to assign specific functional roles to the predicted complexes. In several cases, the incorporation of interaction data through network alignment allowed to distinguish real complex components from other orthologous proteins. Our analyses indicate that current knowledge of yeast protein complexes exceeds that in other organisms and that predicting complexes in fly based on human and yeast data is complementary rather than redundant. Lastly, assessing the conservation of protein complexes of the human pathogen Mycoplasma pneumoniae, we discovered that its complexes repertoire is different from that of eukaryotes, suggesting new points of therapeutic intervention, whereas targeting the pathogen’s Restriction enzyme complex might lead to adverse effects due to its similarity to ATP-dependent metalloproteases in the human host.

Collapse

Mina M, Guzzi PH. Improving the Robustness of Local Network Alignment: Design and Extensive Assessment of a Markov Clustering-Based Approach. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014;11:561-572. [PMID: 26356023 DOI: 10.1109/tcbb.2014.2318707] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Peng W, Wang J, Cai J, Chen L, Li M, Wu FX. Improving protein function prediction using domain and protein complexes in PPI networks. BMC SYSTEMS BIOLOGY 2014;8:35. [PMID: 24655481 PMCID: PMC3994332 DOI: 10.1186/1752-0509-8-35] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/06/2012] [Accepted: 03/14/2014] [Indexed: 01/25/2023]

Narayanan T, Subramaniam S. A Newtonian framework for community detection in undirected biological networks. IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS 2014;8:65-73. [PMID: 24681920 DOI: 10.1109/tbcas.2013.2288155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]

Liu Q, Chen YPP, Li J. k-Partite cliques of protein interactions: A novel subgraph topology for functional coherence analysis on PPI networks. J Theor Biol 2014;340:146-54. [PMID: 24056214 DOI: 10.1016/j.jtbi.2013.09.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2013] [Revised: 08/09/2013] [Accepted: 09/10/2013] [Indexed: 01/02/2023]

Liu H, Beck TN, Golemis EA, Serebriiskii IG. Integrating in silico resources to map a signaling network. Methods Mol Biol 2014;1101:197-245. [PMID: 24233784 DOI: 10.1007/978-1-62703-721-1_11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]

Integrative approaches for finding modular structure in biological networks. Nat Rev Genet 2013;14:719-32. [PMID: 24045689 DOI: 10.1038/nrg3552] [Citation(s) in RCA: 365] [Impact Index Per Article: 30.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

Nguyen PV, Srihari S, Leong HW. Identifying conserved protein complexes between species by constructing interolog networks. BMC Bioinformatics 2013;14 Suppl 16:S8. [PMID: 24564762 PMCID: PMC4098725 DOI: 10.1186/1471-2105-14-s16-s8] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Abstract

BACKGROUND

Protein complexes conserved across species indicate processes that are core to cellular machinery (e.g. cell-cycle or DNA damage-repair complexes conserved across human and yeast). While numerous computational methods have been devised to identify complexes from the protein interaction (PPI) networks of individual species, these are severely limited by noise and errors (false positives) in currently available datasets. Our analysis using human and yeast PPI networks revealed that these methods missed several important complexes including those conserved between the two species (e.g. the MLH1-MSH2-PMS2-PCNA mismatch-repair complex). Here, we note that much of the functionalities of yeast complexes have been conserved in human complexes not only through sequence conservation of proteins but also of critical functional domains. Therefore, integrating information of domain conservation might throw further light on conservation patterns between yeast and human complexes.

RESULTS

We identify conserved complexes by constructing an interolog network (IN) leveraging on the functional conservation of proteins between species through domain conservation (from Ensembl) in addition to sequence similarity. We employ 'state-of-the-art' methods to cluster the interolog network, and map these clusters back to the original PPI networks to identify complexes conserved between the species. Evaluation of our IN-based approach (called COCIN) on human and yeast interaction data identifies several additional complexes (76% recall) compared to direct complex detection from the original PINs (54% recall). Our analysis revealed that the IN-construction removes several non-conserved interactions many of which are false positives, thereby improving complex prediction. In fact removing non-conserved interactions from the original PINs also resulted in higher number of conserved complexes, thereby validating our IN-based approach. These complexes included the mismatch repair complex, MLH1-MSH2-PMS2-PCNA, and other important ones namely, RNA polymerase-II, EIF3 and MCM complexes, all of which constitute core cellular processes known to be conserved across the two species.

CONCLUSIONS

Our method based on integrating domain conservation and sequence similarity to construct interolog networks helps to identify considerably more conserved complexes between the PPI networks from two species compared to direct complex prediction from the PPI networks. We observe from our experiments that protein complexes are not conserved from yeast to human in a straightforward way, that is, it is not the case that a yeast complex is a (proper) sub-set of a human complex with a few additional proteins present in the human complex. Instead complexes have evolved multifold with considerable re-organization of proteins and re-distribution of their functions across complexes. This finding can have significant implications on attempts to extrapolate other kinds of relationships such as synthetic lethality from yeast to human, for example in the identification of novel cancer targets.

AVAILABILITY

http://www.comp.nus.edu.sg/~leonghw/COCIN/.

Collapse

Dutkowski J, Tiuryn J. A probabilistic model of neutral and selective dynamics of protein network evolution. J Comput Biol 2013;20:631-42. [PMID: 23931333 DOI: 10.1089/cmb.2012.0295] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Fan JH, Chen J, Sze SH. Identifying complexes from protein interaction networks according to different types of neighborhood density. J Comput Biol 2013;19:1284-94. [PMID: 23210476 DOI: 10.1089/cmb.2012.0195] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Zhang B, Shi Z. Modules in Biological Networks. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open

Singh P, Sreenivasan S, Szymanski BK, Korniss G. Threshold-limited spreading in social networks with multiple initiators. Sci Rep 2013;3:2330. [PMID: 23900230 PMCID: PMC3728590 DOI: 10.1038/srep02330] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2013] [Accepted: 07/15/2013] [Indexed: 11/17/2022] Open

Srihari S, Leong HW. A survey of computational methods for protein complex prediction from protein interaction networks. J Bioinform Comput Biol 2012;11:1230002. [PMID: 23600810 DOI: 10.1142/s021972001230002x] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Chen Z, Hendrix W, Guan H, Tetteh IK, Choudhary A, Semazzi F, Samatova NF. Discovery of extreme events-related communities in contrasting groups of physical system networks. Data Min Knowl Discov 2012. [DOI: 10.1007/s10618-012-0289-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Narayanan T, Gersten M, Subramaniam S, Grama A. Modularity detection in protein-protein interaction networks. BMC Res Notes 2011;4:569. [PMID: 22206604 PMCID: PMC3292542 DOI: 10.1186/1756-0500-4-569] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2011] [Accepted: 12/29/2011] [Indexed: 11/30/2022] Open

Abstract

Background

Many recent studies have investigated modularity in biological networks, and its role in functional and structural characterization of constituent biomolecules. A technique that has shown considerable promise in the domain of modularity detection is the Newman and Girvan (NG) algorithm, which relies on the number of shortest-paths across pairs of vertices in the network traversing a given edge, referred to as the betweenness of that edge. The edge with the highest betweenness is iteratively eliminated from the network, with the betweenness of the remaining edges recalculated in every iteration. This generates a complete dendrogram, from which modules are extracted by applying a quality metric called modularity denoted by Q. This exhaustive computation can be prohibitively expensive for large networks such as Protein-Protein Interaction Networks. In this paper, we present a novel optimization to the modularity detection algorithm, in terms of an efficient termination criterion based on a target edge betweenness value, using which the process of iterative edge removal may be terminated.

Results

We validate the robustness of our approach by applying our algorithm on real-world protein-protein interaction networks of Yeast, C.Elegans and Drosophila, and demonstrate that our algorithm consistently has significant computational gains in terms of reduced runtime, when compared to the NG algorithm. Furthermore, our algorithm produces modules comparable to those from the NG algorithm, qualitatively and quantitatively. We illustrate this using comparison metrics such as module distribution, module membership cardinality, modularity Q, and Jaccard Similarity Coefficient.

Conclusions

We have presented an optimized approach for efficient modularity detection in networks. The intuition driving our approach is the extraction of holistic measures of centrality from graphs, which are representative of inherent modular structure of the underlying network, and the application of those measures to efficiently guide the modularity detection process. We have empirically evaluated our approach in the specific context of real-world large scale biological networks, and have demonstrated significant savings in computational time while maintaining comparable quality of detected modules.

Collapse

Use of comparative genomics approaches to characterize interspecies differences in response to environmental chemicals: challenges, opportunities, and research needs. Toxicol Appl Pharmacol 2011;271:372-85. [PMID: 22142766 DOI: 10.1016/j.taap.2011.11.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2011] [Revised: 11/11/2011] [Accepted: 11/16/2011] [Indexed: 01/12/2023]

Abstract

A critical challenge for environmental chemical risk assessment is the characterization and reduction of uncertainties introduced when extrapolating inferences from one species to another. The purpose of this article is to explore the challenges, opportunities, and research needs surrounding the issue of how genomics data and computational and systems level approaches can be applied to inform differences in response to environmental chemical exposure across species. We propose that the data, tools, and evolutionary framework of comparative genomics be adapted to inform interspecies differences in chemical mechanisms of action. We compare and contrast existing approaches, from disciplines as varied as evolutionary biology, systems biology, mathematics, and computer science, that can be used, modified, and combined in new ways to discover and characterize interspecies differences in chemical mechanism of action which, in turn, can be explored for application to risk assessment. We consider how genetic, protein, pathway, and network information can be interrogated from an evolutionary biology perspective to effectively characterize variations in biological processes of toxicological relevance among organisms. We conclude that comparative genomics approaches show promise for characterizing interspecies differences in mechanisms of action, and further, for improving our understanding of the uncertainties inherent in extrapolating inferences across species in both ecological and human health risk assessment. To achieve long-term relevance and consistent use in environmental chemical risk assessment, improved bioinformatics tools, computational methods robust to data gaps, and quantitative approaches for conducting extrapolations across species are critically needed. Specific areas ripe for research to address these needs are recommended.

Collapse

YU L, GAO L, SUN PG. Research on Algorithms for Complexes and Functional Modules Prediction in Protein-Protein Interaction Networks. ACTA ACUST UNITED AC 2011. [DOI: 10.3724/sp.j.1016.2011.01239] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Gallone G, Simpson TI, Armstrong JD, Jarman AP. Bio::Homology::InterologWalk--a Perl module to build putative protein-protein interaction networks through interolog mapping. BMC Bioinformatics 2011;12:289. [PMID: 21767381 PMCID: PMC3161927 DOI: 10.1186/1471-2105-12-289] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2011] [Accepted: 07/18/2011] [Indexed: 02/25/2023] Open

Abstract

BACKGROUND

Protein-protein interaction (PPI) data are widely used to generate network models that aim to describe the relationships between proteins in biological systems. The fidelity and completeness of such networks is primarily limited by the paucity of protein interaction information and by the restriction of most of these data to just a few widely studied experimental organisms. In order to extend the utility of existing PPIs, computational methods can be used that exploit functional conservation between orthologous proteins across taxa to predict putative PPIs or 'interologs'. To date most interolog prediction efforts have been restricted to specific biological domains with fixed underlying data sources and there are no software tools available that provide a generalised framework for 'on-the-fly' interolog prediction.

RESULTS

We introduce Bio::Homology::InterologWalk, a Perl module to retrieve, prioritise and visualise putative protein-protein interactions through an orthology-walk method. The module uses orthology and experimental interaction data to generate putative PPIs and optionally collates meta-data into an Interaction Prioritisation Index that can be used to help prioritise interologs for further analysis. We show the application of our interolog prediction method to the genomic interactome of the fruit fly, Drosophila melanogaster. We analyse the resulting interaction networks and show that the method proposes new interactome members and interactions that are candidates for future experimental investigation.

CONCLUSIONS

Our interolog prediction tool employs the Ensembl Perl API and PSICQUIC enabled protein interaction data sources to generate up to date interologs 'on-the-fly'. This represents a significant advance on previous methods for interolog prediction as it allows the use of the latest orthology and protein interaction data for all of the genomes in Ensembl. The module outputs simple text files, making it easy to customise the results by post-processing, allowing the putative PPI datasets to be easily integrated into existing analysis workflows. The Bio::Homology::InterologWalk module, sample scripts and full documentation are freely available from the Comprehensive Perl Archive Network (CPAN) under the GNU Public license.

Collapse

Hsu JT, Peng CH, Hsieh WP, Lan CY, Tang CY. A novel method to identify cooperative functional modules: study of module coordination in the Saccharomyces cerevisiae cell cycle. BMC Bioinformatics 2011;12:281. [PMID: 21749690 PMCID: PMC3143111 DOI: 10.1186/1471-2105-12-281] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2011] [Accepted: 07/12/2011] [Indexed: 12/23/2022] Open

Abstract

BACKGROUND

Identifying key components in biological processes and their associations is critical for deciphering cellular functions. Recently, numerous gene expression and molecular interaction experiments have been reported in Saccharomyces cerevisiae, and these have enabled systematic studies. Although a number of approaches have been used to predict gene functions and interactions, tools that analyze the essential coordination of functional components in cellular processes still need to be developed.

RESULTS

In this work, we present a new approach to study the cooperation of functional modules (sets of functionally related genes) in a specific cellular process. A cooperative module pair is defined as two modules that significantly cooperate with certain functional genes in a cellular process. This method identifies cooperative module pairs that significantly influence a cellular process and the correlated genes and interactions that are essential to that process. Using the yeast cell cycle as an example, we identified 101 cooperative module associations among 82 modules, and importantly, we established a cell cycle-specific cooperative module network. Most of the identified module pairs cover cooperative pathways and components essential to the cell cycle. We found that 14, 36, 18, 15, and 20 cooperative module pairs significantly cooperate with genes regulated in early G1, late G1, S, G2, and M phase, respectively. Fifty-nine module pairs that correlate with Cdc28 and other essential regulators were also identified. These results are consistent with previous studies and demonstrate that our methodology is effective for studying cooperative mechanisms in the cell cycle.

CONCLUSIONS

In this work, we propose a new approach to identifying condition-related cooperative interactions, and importantly, we establish a cell cycle-specific cooperation module network. These results provide a global view of the cell cycle and the method can be used to discover the dynamic coordination properties of functional components in other cellular processes.

Collapse

Ozery-Flato M, Linhart C, Trakhtenbrot L, Izraeli S, Shamir R. Large-scale analysis of chromosomal aberrations in cancer karyotypes reveals two distinct paths to aneuploidy. Genome Biol 2011;12:R61. [PMID: 21714908 PMCID: PMC3218849 DOI: 10.1186/gb-2011-12-6-r61] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2011] [Revised: 05/17/2011] [Accepted: 06/29/2011] [Indexed: 01/05/2023] Open

Feng J, Jiang R, Jiang T. A max-flow-based approach to the identification of protein complexes using protein interaction and microarray data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:621-34. [PMID: 20733237 DOI: 10.1109/tcbb.2010.78] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]

Hou L, Wang L, Qian M, Li D, Tang C, Zhu Y, Deng M, Li F. Modular analysis of the probabilistic genetic interaction network. ACTA ACUST UNITED AC 2011;27:853-9. [PMID: 21278184 PMCID: PMC3051332 DOI: 10.1093/bioinformatics/btr031] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Lagrangian Relaxation Applied to Sparse Global Network Alignment. PATTERN RECOGNITION IN BIOINFORMATICS 2011. [DOI: 10.1007/978-3-642-24855-9_20] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]

Bernarde C, Lehours P, Lasserre JP, Castroviejo M, Bonneu M, Mégraud F, Ménard A. Complexomics study of two Helicobacter pylori strains of two pathological origins: potential targets for vaccine development and new insight in bacteria metabolism. Mol Cell Proteomics 2010;9:2796-826. [PMID: 20610778 PMCID: PMC3101863 DOI: 10.1074/mcp.m110.001065] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2010] [Indexed: 12/12/2022] Open

Module discovery by exhaustive search for densely connected, co-expressed regions in biomolecular interaction networks. PLoS One 2010;5:e13348. [PMID: 21049092 PMCID: PMC2963598 DOI: 10.1371/journal.pone.0013348] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2009] [Accepted: 06/06/2010] [Indexed: 01/09/2023] Open

Abstract

Background

Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well studied large-scale data sources, available for many not yet exhaustively annotated organisms. It has been well established, when analyzing these two data sources jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear and methods by which to exhaustively search for such constellations had not been presented.

Methodology/Principal Findings

We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automatized filtering procedure reduces our output which results in smaller collections of modules, comparable to state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition which adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search, by using the unreduced output to more quickly perform GO term related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples.

Conclusion/Significance

We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large-scale datasets.

Availability

Software and data sets are available at http://www.sfu.ca/~ester/software/DECOB.zip.

Collapse

Jancura P, Marchiori E. Dividing protein interaction networks for modular network comparative analysis. Pattern Recognit Lett 2010. [DOI: 10.1016/j.patrec.2010.04.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Bruckner S, Hüffner F, Karp RM, Shamir R, Sharan R. Topology-free querying of protein interaction networks. J Comput Biol 2010;17:237-52. [PMID: 20377443 DOI: 10.1089/cmb.2009.0170] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Koyutürk M. Algorithmic and analytical methods in network biology. WILEY INTERDISCIPLINARY REVIEWS. SYSTEMS BIOLOGY AND MEDICINE 2010;2:277-292. [PMID: 20836029 PMCID: PMC3087298 DOI: 10.1002/wsbm.61] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Abstract

During the genomic revolution, algorithmic and analytical methods for organizing, integrating, analyzing, and querying biological sequence data proved invaluable. Today, increasing availability of high-throughput data pertaining to functional states of biomolecules, as well as their interactions, enables genome-scale studies of the cell from a systems perspective. The past decade witnessed significant efforts on the development of computational infrastructure for large-scale modeling and analysis of biological systems, commonly using network models. Such efforts lead to novel insights into the complexity of living systems, through development of sophisticated abstractions, algorithms, and analytical techniques that address a broad range of problems, including the following: (1) inference and reconstruction of complex cellular networks; (2) identification of common and coherent patterns in cellular networks, with a view to understanding the organizing principles and building blocks of cellular signaling, regulation, and metabolism; and (3) characterization of cellular mechanisms that underlie the differences between living systems, in terms of evolutionary diversity, development and differentiation, and complex phenotypes, including human disease. These problems pose significant algorithmic and analytical challenges because of the inherent complexity of the systems being studied; limitations of data in terms of availability, scope, and scale; intractability of resulting computational problems; and limitations of reference models for reliable statistical inference. This article provides a broad overview of existing algorithmic and analytical approaches to these problems, highlights key biological insights provided by these approaches, and outlines emerging opportunities and challenges in computational systems biology.

Collapse

Li X, Wu M, Kwoh CK, Ng SK. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics 2010;11 Suppl 1:S3. [PMID: 20158874 PMCID: PMC2822531 DOI: 10.1186/1471-2164-11-s1-s3] [Citation(s) in RCA: 180] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

Background

Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially of those involving more than two protein partners, are relatively limited in the current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast-two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse. Thus computational approaches for detecting protein complexes from protein interaction data are useful complements to the limited experimental methods. They can be used together with the experimental methods for mapping the interactions of proteins to understand how different proteins are organized into higher-level substructures to perform various cellular functions.

Results

Given the abundance of pairwise protein interaction data from high-throughput genome-wide experimental screenings, a protein interaction network can be constructed from protein interaction data by considering individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins as a link. This binary protein interaction graph can then be used for detecting protein complexes using graph clustering techniques. In this paper, we review and evaluate the state-of-the-art techniques for computational detection of protein complexes, and discuss some promising research directions in this field.

Conclusions

Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes. In addition, the computational approaches have also improved in performance over the years. Further improvements could be achieved if the quality of the underlying protein interaction data can be considered adequately to minimize the undesirable effects from the irrelevant and noisy sources, and the various biological evidences can be better incorporated into the detection process to maximize the exploitation of the increasing wealth of biological knowledge available.

Collapse

Pinkert S, Schultz J, Reichardt J. Protein interaction networks--more than mere modules. PLoS Comput Biol 2010;6:e1000659. [PMID: 20126533 PMCID: PMC2813263 DOI: 10.1371/journal.pcbi.1000659] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2008] [Accepted: 12/22/2009] [Indexed: 11/26/2022] Open

Abstract

It is widely believed that the modular organization of cellular function is reflected in a modular structure of molecular networks. A common view is that a “module” in a network is a cohesively linked group of nodes, densely connected internally and sparsely interacting with the rest of the network. Many algorithms try to identify functional modules in protein-interaction networks (PIN) by searching for such cohesive groups of proteins. Here, we present an alternative approach independent of any prior definition of what actually constitutes a “module”. In a self-consistent manner, proteins are grouped into “functional roles” if they interact in similar ways with other proteins according to their functional roles. Such grouping may well result in cohesive modules again, but only if the network structure actually supports this. We applied our method to the PIN from the Human Protein Reference Database (HPRD) and found that a representation of the network in terms of cohesive modules, at least on a global scale, does not optimally represent the network's structure because it focuses on finding independent groups of proteins. In contrast, a decomposition into functional roles is able to depict the structure much better as it also takes into account the interdependencies between roles and even allows groupings based on the absence of interactions between proteins in the same functional role. This, for example, is the case for transmembrane proteins, which could never be recognized as a cohesive group of nodes in a PIN. When mapping experimental methods onto the groups, we identified profound differences in the coverage suggesting that our method is able to capture experimental bias in the data, too. For example yeast-two-hybrid data were highly overrepresented in one particular group. Thus, there is more structure in protein-interaction networks than cohesive modules alone and we believe this finding can significantly improve automated function prediction algorithms.

Cellular function is widely believed to be organized in a modular fashion. On all scales and at all levels of complexity, relatively independent sub-units perform relatively independent sub-tasks. This functional modularity must be reflected in the topology of molecular networks. But how a functional module should be represented in an interaction network is an open question. On a small scale, one can identify a protein-complex as a module in protein-interaction networks (PIN), i.e., modules are understood as densely linked (interacting) groups of proteins, that are only sparsely interacting with the rest of the network. In this contribution, we show that extrapolating this concept of cohesively linked clusters of proteins as modules to the scale of the entire PIN inevitably misses important and functionally relevant structure inherent in the network. As an alternative, we introduce a novel way of decomposing a network into functional roles and show that this represents network structure and function more efficiently. This finding should have a profound impact on all module assisted methods of protein function prediction and should shed new light on how functional modules can be represented in molecular interaction networks in general.

Collapse

Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol 2010;6:e1000641. [PMID: 20090828 PMCID: PMC2797085 DOI: 10.1371/journal.pcbi.1000641] [Citation(s) in RCA: 582] [Impact Index Per Article: 38.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2009] [Accepted: 12/14/2009] [Indexed: 11/18/2022] Open

Abstract

A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and usage of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross validation setting) to rank the true causal gene first for 34% of the diseases, and infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have been found already: prostate cancer, alzheimer and type 2 diabetes mellitus. PRINCE's predictions for these diseases highly match the known literature, suggesting several novel causal genes and protein complexes for further investigation.

Understanding the genetic background of diseases is crucial to medical research, with implications in diagnosis, treatment and drug development. As molecular approaches to this challenge are time consuming and costly, computational approaches offer an efficient alternative. Such approaches aim at prioritizing genes in a genomic interval of interest according to their predicted strength-of-association with a given disease. State-of-the-art prioritization problems are based on the observation that genes causing similar diseases tend to lie close to one another in a network of protein-protein interactions. Here we develop a novel prioritization approach that uses the network data in a global manner and can tie not only single genes but also whole protein machineries with a given disease. Our method, PRINCE, is shown to outperform previous methods in both the gene prioritization task and the protein complex task. Applying PRINCE to prostate cancer, alzheimer's disease and type 2 diabetes, we are able to infer new causal genes and related protein complexes with high confidence.

Collapse