1
|
Miyauchi A, Kawase Y. Z-Score-Based Modularity for Community Detection in Networks. PLoS One 2016; 11:e0147805. [PMID: 26808270 PMCID: PMC4726636 DOI: 10.1371/journal.pone.0147805] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2015] [Accepted: 01/08/2016] [Indexed: 11/19/2022] Open
Abstract
Identifying community structure in networks is an issue of particular interest in network science. The modularity introduced by Newman and Girvan is the most popular quality function for community detection in networks. In this study, we identify a problem in the concept of modularity and suggest a solution to overcome this problem. Specifically, we obtain a new quality function for community detection. We refer to the function as Z-modularity because it measures the Z-score of a given partition with respect to the fraction of the number of edges within communities. Our theoretical analysis shows that Z-modularity mitigates the resolution limit of the original modularity in certain cases. Computational experiments using both artificial networks and well-known real-world networks demonstrate the validity and reliability of the proposed quality function.
Collapse
Affiliation(s)
- Atsushi Miyauchi
- Graduate School of Decision Science and Technology, Tokyo Institute of Technology, Ookayama 2-12-1, Meguro-ku, Tokyo 152-8552, Japan
- * E-mail:
| | - Yasushi Kawase
- Graduate School of Decision Science and Technology, Tokyo Institute of Technology, Ookayama 2-12-1, Meguro-ku, Tokyo 152-8552, Japan
| |
Collapse
|
2
|
The reconstruction of complex networks with community structure. Sci Rep 2015; 5:17287. [PMID: 26620158 PMCID: PMC4664866 DOI: 10.1038/srep17287] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2015] [Accepted: 10/28/2015] [Indexed: 11/28/2022] Open
Abstract
Link prediction is a fundamental problem with applications in many fields ranging from biology to computer science. In the literature, most effort has been devoted to estimate the likelihood of the existence of a link between two nodes, based on observed links and nodes’ attributes in a network. In this paper, we apply several representative link prediction methods to reconstruct the network, namely to add the missing links with high likelihood of existence back to the network. We find that all these existing methods fail to identify the links connecting different communities, resulting in a poor reproduction of the topological and dynamical properties of the true network. To solve this problem, we propose a community-based link prediction method. We find that our method has high prediction accuracy and is very effective in reconstructing the inter-community links.
Collapse
|
3
|
Schülke C, Ricci-Tersenghi F. Multiple phases in modularity-based community detection. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:042804. [PMID: 26565286 DOI: 10.1103/physreve.92.042804] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2015] [Indexed: 06/05/2023]
Abstract
Detecting communities in a network, based only on the adjacency matrix, is a problem of interest to several scientific disciplines. Recently, Zhang and Moore have introduced an algorithm [Proc. Natl. Acad. Sci. USA 111, 18144 (2014)], called mod-bp, that avoids overfitting the data by optimizing a weighted average of modularity (a popular goodness-of-fit measure in community detection) and entropy (i.e., number of configurations with a given modularity). The adjustment of the relative weight, the "temperature" of the model, is crucial for getting a correct result from mod-bp. In this work we study the many phase transitions that mod-bp may undergo by changing the two parameters of the algorithm: the temperature T and the maximum number of groups q. We introduce a new set of order parameters that allow us to determine the actual number of groups q̂, and we observe on both synthetic and real networks the existence of phases with any q̂∈{1,q}, which were unknown before. We discuss how to interpret the results of mod-bp and how to make the optimal choice for the problem of detecting significant communities.
Collapse
Affiliation(s)
- Christophe Schülke
- Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris, France and Dipartimento di Fisica, Università di Roma "La Sapienza," Piazzale Aldo Moro 2, 00185 Rome, Italy
| | - Federico Ricci-Tersenghi
- Dipartimento di Fisica, INFN-Sezione di Roma 1, and CNR-NANOTEC, UOS di Roma, Università di Roma "La Sapienza," Piazzale Aldo Moro 2, 00185 Rome, Italy
| |
Collapse
|
4
|
Bennett L, Kittas A, Muirhead G, Papageorgiou LG, Tsoka S. Detection of composite communities in multiplex biological networks. Sci Rep 2015; 5:10345. [PMID: 26012716 PMCID: PMC4446847 DOI: 10.1038/srep10345] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Accepted: 03/26/2015] [Indexed: 12/23/2022] Open
Abstract
The detection of community structure is a widely accepted means of investigating the
principles governing biological systems. Recent efforts are exploring ways in which
multiple data sources can be integrated to generate a more comprehensive model of
cellular interactions, leading to the detection of more biologically relevant
communities. In this work, we propose a mathematical programming model to cluster
multiplex biological networks, i.e. multiple network slices, each with a different
interaction type, to determine a single representative partition of composite
communities. Our method, known as SimMod, is evaluated through its application to
yeast networks of physical, genetic and co-expression interactions. A comparative
analysis involving partitions of the individual networks, partitions of aggregated
networks and partitions generated by similar methods from the literature highlights
the ability of SimMod to identify functionally enriched modules. It is further shown
that SimMod offers enhanced results when compared to existing approaches without the
need to train on known cellular interactions.
Collapse
Affiliation(s)
- Laura Bennett
- Centre for Process Systems Engineering, Department of Chemical Engineering,University College London, Torrington Place, London WC1E 7JE, United Kingdom
| | - Aristotelis Kittas
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UnitedKingdom
| | - Gareth Muirhead
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UnitedKingdom
| | - Lazaros G Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering,University College London, Torrington Place, London WC1E 7JE, United Kingdom
| | - Sophia Tsoka
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UnitedKingdom
| |
Collapse
|
5
|
Le Thi HA, Nguyen MC, Dinh TP. A DC Programming Approach for Finding Communities in Networks. Neural Comput 2014; 26:2827-54. [DOI: 10.1162/neco_a_00673] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Automatic discovery of community structures in complex networks is a fundamental task in many disciplines, including physics, biology, and the social sciences. The most used criterion for characterizing the existence of a community structure in a network is modularity, a quantitative measure proposed by Newman and Girvan ( 2004 ). The discovery community can be formulated as the so-called modularity maximization problem that consists of finding a partition of nodes of a network with the highest modularity. In this letter, we propose a fast and scalable algorithm called DCAM, based on DC (difference of convex function) programming and DCA (DC algorithms), an innovative approach in nonconvex programming framework for solving the modularity maximization problem. The special structure of the problem considered here has been well exploited to get an inexpensive DCA scheme that requires only a matrix-vector product at each iteration. Starting with a very large number of communities, DCAM furnishes, as output results, an optimal partition together with the optimal number of communities [Formula: see text]; that is, the number of communities is discovered automatically during DCAM’s iterations. Numerical experiments are performed on a variety of real-world network data sets with up to 4,194,304 nodes and 30,359,198 edges. The comparative results with height reference algorithms show that the proposed approach outperforms them not only on quality and rapidity but also on scalability. Moreover, it realizes a very good trade-off between the quality of solutions and the run time.
Collapse
Affiliation(s)
- Hoai An Le Thi
- Laboratory of Theoretical and Applied Computer Science, University of Lorraine, Ile du Saulcy, 57045 Metz, France
| | - Manh Cuong Nguyen
- Laboratory of Theoretical and Applied Computer Science, University of Lorraine, Ile du Saulcy, 57045 Metz, France
| | - Tao Pham Dinh
- Laboratoire of Mathematics, National Institute for Applied Sciences—Rouen, 76801 Saint-Étienne-du-Rouvray cedex, France
| |
Collapse
|
6
|
Bennett L, Kittas A, Liu S, Papageorgiou LG, Tsoka S. Community structure detection for overlapping modules through mathematical programming in protein interaction networks. PLoS One 2014; 9:e112821. [PMID: 25412367 PMCID: PMC4239042 DOI: 10.1371/journal.pone.0112821] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2014] [Accepted: 10/15/2014] [Indexed: 12/05/2022] Open
Abstract
Community structure detection has proven to be important in revealing the underlying properties of complex networks. The standard problem, where a partition of disjoint communities is sought, has been continually adapted to offer more realistic models of interactions in these systems. Here, a two-step procedure is outlined for exploring the concept of overlapping communities. First, a hard partition is detected by employing existing methodologies. We then propose a novel mixed integer non linear programming (MINLP) model, known as OverMod, which transforms disjoint communities to overlapping. The procedure is evaluated through its application to protein-protein interaction (PPI) networks of the rat, E. coli, yeast and human organisms. Connector nodes of hard partitions exhibit topological and functional properties indicative of their suitability as candidates for multiple module membership. OverMod identifies two types of connector nodes, inter and intra-connector, each with their own particular characteristics pertaining to their topological and functional role in the organisation of the network. Inter-connector proteins are shown to be highly conserved proteins participating in pathways that control essential cellular processes, such as proliferation, differentiation and apoptosis and their differences with intra-connectors is highlighted. Many of these proteins are shown to possess multiple roles of distinct nature through their participation in different network modules, setting them apart from proteins that are simply ‘hubs’, i.e. proteins with many interaction partners but with a more specific biochemical role.
Collapse
Affiliation(s)
- Laura Bennett
- Centre for Process Systems Engineering, Department of Chemical Engineering, UCL (University College London), Torrington Place, WC1E 7JE, London, United Kingdom
| | - Aristotelis Kittas
- Department of Informatics, King's College London, Strand, WC2R 2LS, London, United Kingdom
| | - Songsong Liu
- Centre for Process Systems Engineering, Department of Chemical Engineering, UCL (University College London), Torrington Place, WC1E 7JE, London, United Kingdom
| | - Lazaros G. Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering, UCL (University College London), Torrington Place, WC1E 7JE, London, United Kingdom
| | - Sophia Tsoka
- Department of Informatics, King's College London, Strand, WC2R 2LS, London, United Kingdom
- * E-mail:
| |
Collapse
|
7
|
Wu J, Jiao L, Jin C, Liu F, Gong M, Shang R, Chen W. Overlapping community detection via network dynamics. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 85:016115. [PMID: 22400633 DOI: 10.1103/physreve.85.016115] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/25/2011] [Revised: 12/10/2011] [Indexed: 05/31/2023]
Abstract
The modular structure of a network is closely related to the dynamics toward clustering. In this paper, a method for community detection is proposed via the clustering dynamics of a network. The initial phases of the nodes in the network are given randomly, and then they evolve according to a set of dedicatedly designed differential equations. The phases of the nodes are naturally separated into several clusters after a period of evolution, and each cluster corresponds to a community in the network. For the networks with overlapping communities, the phases of the overlapping nodes will evolve to the interspace of the two communities. The proposed method is illustrated with applications to both synthetically generated and real-world complex networks.
Collapse
Affiliation(s)
- Jianshe Wu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi'an 710071, China
| | | | | | | | | | | | | |
Collapse
|
8
|
Finding and testing network communities by lumped Markov chains. PLoS One 2011; 6:e27028. [PMID: 22073245 PMCID: PMC3207820 DOI: 10.1371/journal.pone.0027028] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2011] [Accepted: 10/09/2011] [Indexed: 11/23/2022] Open
Abstract
Identifying communities (or clusters), namely groups of nodes with comparatively strong internal connectivity, is a fundamental task for deeply understanding the structure and function of a network. Yet, there is a lack of formal criteria for defining communities and for testing their significance. We propose a sharp definition that is based on a quality threshold. By means of a lumped Markov chain model of a random walker, a quality measure called “persistence probability” is associated to a cluster, which is then defined as an “-community” if such a probability is not smaller than . Consistently, a partition composed of -communities is an “-partition.” These definitions turn out to be very effective for finding and testing communities. If a set of candidate partitions is available, setting the desired -level allows one to immediately select the -partition with the finest decomposition. Simultaneously, the persistence probabilities quantify the quality of each single community. Given its ability in individually assessing each single cluster, this approach can also disclose single well-defined communities even in networks that overall do not possess a definite clusterized structure.
Collapse
|