1
|
Aref S, Mostajabdaveh M, Chheda H. Bayan algorithm: Detecting communities in networks through exact and approximate optimization of modularity. Phys Rev E 2024; 110:044315. [PMID: 39562863 DOI: 10.1103/physreve.110.044315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2024] [Accepted: 09/24/2024] [Indexed: 11/21/2024]
Abstract
Community detection is a classic network problem with extensive applications in various fields. Its most common method is using modularity maximization heuristics which rarely return an optimal partition or anything similar. Partitions with globally optimal modularity are difficult to compute, and therefore have been underexplored. Using structurally diverse networks, we compare 30 community detection methods including our proposed algorithm that offers optimality and approximation guarantees: the Bayan algorithm. Unlike existing methods, Bayan globally maximizes modularity or approximates it within a factor. Our results show the distinctive accuracy and stability of maximum-modularity partitions in retrieving planted partitions at rates higher than most alternatives for a wide range of parameter settings in two standard benchmarks. Compared to the partitions from 29 other algorithms, maximum-modularity partitions have the best medians for description length, coverage, performance, average conductance, and well clusteredness. These advantages come at the cost of additional computations which Bayan makes possible for small networks (networks that have up to 3000 edges in their largest connected component). Bayan is several times faster than using open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Our results point to a few well-performing algorithms, among which Bayan stands out as the most reliable method for small networks. A python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for python.
Collapse
|
2
|
Brusco M, Steinley D, Watts AL. Improving the Walktrap Algorithm Using K-Means Clustering. MULTIVARIATE BEHAVIORAL RESEARCH 2024; 59:266-288. [PMID: 38361218 PMCID: PMC11014777 DOI: 10.1080/00273171.2023.2254767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/17/2024]
Abstract
The walktrap algorithm is one of the most popular community-detection methods in psychological research. Several simulation studies have shown that it is often effective at determining the correct number of communities and assigning items to their proper community. Nevertheless, it is important to recognize that the walktrap algorithm relies on hierarchical clustering because it was originally developed for networks much larger than those encountered in psychological research. In this paper, we present and demonstrate a computational alternative to the hierarchical algorithm that is conceptually easier to understand. More importantly, we show that better solutions to the sum-of-squares optimization problem that is heuristically tackled by hierarchical clustering in the walktrap algorithm can often be obtained using exact or approximate methods for K-means clustering. Three simulation studies and analyses of empirical networks were completed to assess the impact of better sum-of-squares solutions.
Collapse
|
3
|
Baghersad M, Emadikhiav M, Huang CD, Behara RS. Modularity maximization to design contiguous policy zones for pandemic response. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH 2023; 304:99-112. [PMID: 35039709 PMCID: PMC8755430 DOI: 10.1016/j.ejor.2022.01.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Accepted: 01/05/2022] [Indexed: 05/05/2023]
Abstract
The health and economic devastation caused by the COVID-19 pandemic has created a significant global humanitarian disaster. Pandemic response policies guided by geospatial approaches are appropriate additions to traditional epidemiological responses when addressing this disaster. However, little is known about finding the optimal set of locations or jurisdictions to create policy coordination zones. In this study, we propose optimization models and algorithms to identify coordination communities based on the natural movement of people. To do so, we develop a mixed-integer quadratic-programming model to maximize the modularity of detected communities while ensuring that the jurisdictions within each community are contiguous. To solve the problem, we present a heuristic and a column-generation algorithm. Our computational experiments highlight the effectiveness of the models and algorithms in various instances. We also apply the proposed optimization-based solutions to identify coordination zones within North Carolina and South Carolina, two highly interconnected states in the U.S. Results of our case study show that the proposed model detects communities that are significantly better for coordinating pandemic related policies than the existing geopolitical boundaries.
Collapse
Affiliation(s)
- Milad Baghersad
- Department of Information Technology & Operations Management, College of Business, Florida Atlantic University, Boca Raton, FL 33431-0991, USA
| | - Mohsen Emadikhiav
- Department of Information Technology & Operations Management, College of Business, Florida Atlantic University, Boca Raton, FL 33431-0991, USA
| | - C Derrick Huang
- Department of Information Technology & Operations Management, College of Business, Florida Atlantic University, Boca Raton, FL 33431-0991, USA
| | - Ravi S Behara
- Department of Information Technology & Operations Management, College of Business, Florida Atlantic University, Boca Raton, FL 33431-0991, USA
| |
Collapse
|
4
|
Lu Z, Zhou Y, Hao JK. A Hybrid Evolutionary Algorithm for the Clique Partitioning Problem. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:9391-9403. [PMID: 33635807 DOI: 10.1109/tcyb.2021.3051243] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The clique partitioning problem (CPP) of an edge-weighted complete graph is to partition the vertex set V into k disjoint subsets such that the sum of the edge weights within all cliques induced by the subsets is as large as possible. The problem has a number of practical applications in areas, such as data mining, engineering, and bioinformatics, and is, however, computationally challenging. To solve this NP-hard problem, we propose the first evolutionary algorithm that combines a dedicated merge-divide crossover operator to generate offspring solutions and an effective simulated annealing-based local optimization procedure to find high-quality local optima. The extensive experiments on three sets of 94 benchmark instances (including two sets of 63 classical benchmark instances and one new set of 31 large benchmark) show a remarkable performance of the proposed approach compared to the state-of-the-art methods. We analyze the key algorithmic ingredients to shed light on their impacts on the performance of the algorithm. The algorithm and its available source code can benefit people working on practical problems related to CPP.
Collapse
|
5
|
de Santiago R, Lamb LC. A ground truth contest between modularity maximization and modularity density maximization. Artif Intell Rev 2020. [DOI: 10.1007/s10462-019-09802-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
6
|
|
7
|
Singh A, Rossi A, Sevaux M. Heuristics for lifetime maximization in camera sensor networks. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.01.017] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
8
|
Community Structure Detection for Directed Networks through Modularity Optimisation. ALGORITHMS 2016. [DOI: 10.3390/a9040073] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
9
|
Basu S, Maulik U. Community detection based on strong Nash stable graph partition. SOCIAL NETWORK ANALYSIS AND MINING 2015. [DOI: 10.1007/s13278-015-0299-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
10
|
Schülke C, Ricci-Tersenghi F. Multiple phases in modularity-based community detection. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:042804. [PMID: 26565286 DOI: 10.1103/physreve.92.042804] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2015] [Indexed: 06/05/2023]
Abstract
Detecting communities in a network, based only on the adjacency matrix, is a problem of interest to several scientific disciplines. Recently, Zhang and Moore have introduced an algorithm [Proc. Natl. Acad. Sci. USA 111, 18144 (2014)], called mod-bp, that avoids overfitting the data by optimizing a weighted average of modularity (a popular goodness-of-fit measure in community detection) and entropy (i.e., number of configurations with a given modularity). The adjustment of the relative weight, the "temperature" of the model, is crucial for getting a correct result from mod-bp. In this work we study the many phase transitions that mod-bp may undergo by changing the two parameters of the algorithm: the temperature T and the maximum number of groups q. We introduce a new set of order parameters that allow us to determine the actual number of groups q̂, and we observe on both synthetic and real networks the existence of phases with any q̂∈{1,q}, which were unknown before. We discuss how to interpret the results of mod-bp and how to make the optimal choice for the problem of detecting significant communities.
Collapse
Affiliation(s)
- Christophe Schülke
- Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris, France and Dipartimento di Fisica, Università di Roma "La Sapienza," Piazzale Aldo Moro 2, 00185 Rome, Italy
| | - Federico Ricci-Tersenghi
- Dipartimento di Fisica, INFN-Sezione di Roma 1, and CNR-NANOTEC, UOS di Roma, Università di Roma "La Sapienza," Piazzale Aldo Moro 2, 00185 Rome, Italy
| |
Collapse
|
11
|
Bennett L, Kittas A, Muirhead G, Papageorgiou LG, Tsoka S. Detection of composite communities in multiplex biological networks. Sci Rep 2015; 5:10345. [PMID: 26012716 PMCID: PMC4446847 DOI: 10.1038/srep10345] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Accepted: 03/26/2015] [Indexed: 12/23/2022] Open
Abstract
The detection of community structure is a widely accepted means of investigating the
principles governing biological systems. Recent efforts are exploring ways in which
multiple data sources can be integrated to generate a more comprehensive model of
cellular interactions, leading to the detection of more biologically relevant
communities. In this work, we propose a mathematical programming model to cluster
multiplex biological networks, i.e. multiple network slices, each with a different
interaction type, to determine a single representative partition of composite
communities. Our method, known as SimMod, is evaluated through its application to
yeast networks of physical, genetic and co-expression interactions. A comparative
analysis involving partitions of the individual networks, partitions of aggregated
networks and partitions generated by similar methods from the literature highlights
the ability of SimMod to identify functionally enriched modules. It is further shown
that SimMod offers enhanced results when compared to existing approaches without the
need to train on known cellular interactions.
Collapse
Affiliation(s)
- Laura Bennett
- Centre for Process Systems Engineering, Department of Chemical Engineering,University College London, Torrington Place, London WC1E 7JE, United Kingdom
| | - Aristotelis Kittas
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UnitedKingdom
| | - Gareth Muirhead
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UnitedKingdom
| | - Lazaros G Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering,University College London, Torrington Place, London WC1E 7JE, United Kingdom
| | - Sophia Tsoka
- Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, Strand, London WC2R 2LS, UnitedKingdom
| |
Collapse
|
12
|
Le Thi HA, Nguyen MC, Dinh TP. A DC Programming Approach for Finding Communities in Networks. Neural Comput 2014; 26:2827-54. [DOI: 10.1162/neco_a_00673] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Automatic discovery of community structures in complex networks is a fundamental task in many disciplines, including physics, biology, and the social sciences. The most used criterion for characterizing the existence of a community structure in a network is modularity, a quantitative measure proposed by Newman and Girvan ( 2004 ). The discovery community can be formulated as the so-called modularity maximization problem that consists of finding a partition of nodes of a network with the highest modularity. In this letter, we propose a fast and scalable algorithm called DCAM, based on DC (difference of convex function) programming and DCA (DC algorithms), an innovative approach in nonconvex programming framework for solving the modularity maximization problem. The special structure of the problem considered here has been well exploited to get an inexpensive DCA scheme that requires only a matrix-vector product at each iteration. Starting with a very large number of communities, DCAM furnishes, as output results, an optimal partition together with the optimal number of communities [Formula: see text]; that is, the number of communities is discovered automatically during DCAM’s iterations. Numerical experiments are performed on a variety of real-world network data sets with up to 4,194,304 nodes and 30,359,198 edges. The comparative results with height reference algorithms show that the proposed approach outperforms them not only on quality and rapidity but also on scalability. Moreover, it realizes a very good trade-off between the quality of solutions and the run time.
Collapse
Affiliation(s)
- Hoai An Le Thi
- Laboratory of Theoretical and Applied Computer Science, University of Lorraine, Ile du Saulcy, 57045 Metz, France
| | - Manh Cuong Nguyen
- Laboratory of Theoretical and Applied Computer Science, University of Lorraine, Ile du Saulcy, 57045 Metz, France
| | - Tao Pham Dinh
- Laboratoire of Mathematics, National Institute for Applied Sciences—Rouen, 76801 Saint-Étienne-du-Rouvray cedex, France
| |
Collapse
|
13
|
Bennett L, Kittas A, Liu S, Papageorgiou LG, Tsoka S. Community structure detection for overlapping modules through mathematical programming in protein interaction networks. PLoS One 2014; 9:e112821. [PMID: 25412367 PMCID: PMC4239042 DOI: 10.1371/journal.pone.0112821] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2014] [Accepted: 10/15/2014] [Indexed: 12/05/2022] Open
Abstract
Community structure detection has proven to be important in revealing the underlying properties of complex networks. The standard problem, where a partition of disjoint communities is sought, has been continually adapted to offer more realistic models of interactions in these systems. Here, a two-step procedure is outlined for exploring the concept of overlapping communities. First, a hard partition is detected by employing existing methodologies. We then propose a novel mixed integer non linear programming (MINLP) model, known as OverMod, which transforms disjoint communities to overlapping. The procedure is evaluated through its application to protein-protein interaction (PPI) networks of the rat, E. coli, yeast and human organisms. Connector nodes of hard partitions exhibit topological and functional properties indicative of their suitability as candidates for multiple module membership. OverMod identifies two types of connector nodes, inter and intra-connector, each with their own particular characteristics pertaining to their topological and functional role in the organisation of the network. Inter-connector proteins are shown to be highly conserved proteins participating in pathways that control essential cellular processes, such as proliferation, differentiation and apoptosis and their differences with intra-connectors is highlighted. Many of these proteins are shown to possess multiple roles of distinct nature through their participation in different network modules, setting them apart from proteins that are simply ‘hubs’, i.e. proteins with many interaction partners but with a more specific biochemical role.
Collapse
Affiliation(s)
- Laura Bennett
- Centre for Process Systems Engineering, Department of Chemical Engineering, UCL (University College London), Torrington Place, WC1E 7JE, London, United Kingdom
| | - Aristotelis Kittas
- Department of Informatics, King's College London, Strand, WC2R 2LS, London, United Kingdom
| | - Songsong Liu
- Centre for Process Systems Engineering, Department of Chemical Engineering, UCL (University College London), Torrington Place, WC1E 7JE, London, United Kingdom
| | - Lazaros G. Papageorgiou
- Centre for Process Systems Engineering, Department of Chemical Engineering, UCL (University College London), Torrington Place, WC1E 7JE, London, United Kingdom
| | - Sophia Tsoka
- Department of Informatics, King's College London, Strand, WC2R 2LS, London, United Kingdom
- * E-mail:
| |
Collapse
|
14
|
Hüffner F, Komusiewicz C, Liebtrau A, Niedermeier R. Partitioning Biological Networks into Highly Connected Clusters with Maximum Edge Coverage. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:455-467. [PMID: 26356014 DOI: 10.1109/tcbb.2013.177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
A popular clustering algorithm for biological networks which was proposed by Hartuv and Shamir identifies nonoverlapping highly connected components. We extend the approach taken by this algorithm by introducing the combinatorial optimization problem Highly Connected Deletion, which asks for removing as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that Highly Connected Deletion is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies, based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 75 percent of the edges that are deleted for an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 6,000 vertices and 13,500 edges within five hours. Additionally, we present a new heuristic that finds more clusters than the method by Hartuv and Shamir.
Collapse
|
15
|
An iterated tabu search approach for the clique partitioning problem. ScientificWorldJournal 2014; 2014:353101. [PMID: 24737968 PMCID: PMC3967401 DOI: 10.1155/2014/353101] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2013] [Accepted: 01/15/2014] [Indexed: 11/17/2022] Open
Abstract
Given an edge-weighted undirected graph with weights specifying dissimilarities between pairs of objects, represented by the vertices of the graph, the clique partitioning problem (CPP) is to partition the vertex set of the graph into mutually disjoint subsets such that the sum of the edge weights over all cliques induced by the subsets is as small as possible. We develop an iterated tabu search (ITS) algorithm for solving this problem. The proposed algorithm incorporates tabu search, local search, and solution perturbation procedures. We report computational results on CPP instances of size up to 2000 vertices. Performance comparisons of ITS against state-of-the-art methods from the literature demonstrate the competitiveness of our approach.
Collapse
|
16
|
Lee J, Lee J. Hidden information revealed by optimal community structure from a protein-complex bipartite network improves protein function prediction. PLoS One 2013; 8:e60372. [PMID: 23577106 PMCID: PMC3618231 DOI: 10.1371/journal.pone.0060372] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2012] [Accepted: 02/25/2013] [Indexed: 11/18/2022] Open
Abstract
The task of extracting the maximal amount of information from a biological network has drawn much attention from researchers, for example, predicting the function of a protein from a protein-protein interaction (PPI) network. It is well known that biological networks consist of modules/communities, a set of nodes that are more densely inter-connected among themselves than with the rest of the network. However, practical applications of utilizing the community information have been rather limited. For protein function prediction on a network, it has been shown that none of the existing community-based protein function prediction methods outperform a simple neighbor-based method. Recently, we have shown that proper utilization of a highly optimal modularity community structure for protein function prediction can outperform neighbor-assisted methods. In this study, we propose two function prediction approaches on bipartite networks that consider the community structure information as well as the neighbor information from the network: 1) a simple screening method and 2) a random forest based method. We demonstrate that our community-assisted methods outperform neighbor-assisted methods and the random forest method yields the best performance. In addition, we show that using the optimal community structure information is essential for more accurate function prediction for the protein-complex bipartite network of Saccharomyces cerevisiae. Community detection can be carried out either using a modified modularity for dealing with the original bipartite network or first projecting the network into a single-mode network (i.e., PPI network) and then applying community detection to the reduced network. We find that the projection leads to the loss of information in a significant way. Since our prediction methods rely only on the network topology, they can be applied to various fields where an efficient network-based analysis is required.
Collapse
Affiliation(s)
- Juyong Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
| | - Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- * E-mail:
| |
Collapse
|
17
|
|
18
|
Bettinelli A, Hansen P, Liberti L. Algorithm for parametric community detection in networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 86:016107. [PMID: 23005491 DOI: 10.1103/physreve.86.016107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2011] [Revised: 04/17/2012] [Indexed: 06/01/2023]
Abstract
Modularity maximization is extensively used to detect communities in complex networks. It has been shown, however, that this method suffers from a resolution limit: Small communities may be undetectable in the presence of larger ones even if they are very dense. To alleviate this defect, various modifications of the modularity function have been proposed as well as multiresolution methods. In this paper we systematically study a simple model (proposed by Pons and Latapy [Theor. Comput. Sci. 412, 892 (2011)] and similar to the parametric model of Reichardt and Bornholdt [Phys. Rev. E 74, 016110 (2006)]) with a single parameter α that balances the fraction of within community edges and the expected fraction of edges according to the configuration model. An exact algorithm is proposed to find optimal solutions for all values of α as well as the corresponding successive intervals of α values for which they are optimal. This algorithm relies upon a routine for exact modularity maximization and is limited to moderate size instances. An agglomerative hierarchical heuristic is therefore proposed to address parametric modularity detection in large networks. At each iteration the smallest value of α for which it is worthwhile to merge two communities of the current partition is found. Then merging is performed and the data are updated accordingly. An implementation is proposed with the same time and space complexity as the well-known Clauset-Newman-Moore (CNM) heuristic [Phys. Rev. E 70, 066111 (2004)]. Experimental results on artificial and real world problems show that (i) communities are detected by both exact and heuristic methods for all values of the parameter α; (ii) the dendrogram summarizing the results of the heuristic method provides a useful tool for substantive analysis, as illustrated particularly on a Les Misérables data set; (iii) the difference between the parametric modularity values given by the exact method and those given by the heuristic is moderate; (iv) the heuristic version of the proposed parametric method, viewed as a modularity maximization tool, gives better results than the CNM heuristic for large instances.
Collapse
Affiliation(s)
- Andrea Bettinelli
- Dipartimento di Tecnologie dell'Infomazione, Università degli Studi di Milano, via Bramante 65, Crema, Italy.
| | | | | |
Collapse
|
19
|
Lee J, Gross SP, Lee J. Modularity optimization by conformational space annealing. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 85:056702. [PMID: 23004898 DOI: 10.1103/physreve.85.056702] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/21/2012] [Indexed: 06/01/2023]
Abstract
We propose a modularity optimization method, Mod-CSA, based on stochastic global optimization algorithm, conformational space annealing (CSA). Our method outperforms simulated annealing in terms of both efficiency and accuracy, finding higher modularity partitions with less computational resources required. The high modularity values found by our method are higher than, or equal to, the largest values previously reported. In addition, the method can be combined with other heuristic methods, and implemented in parallel fashion, allowing it to be applicable to large graphs with more than 10,000 nodes.
Collapse
Affiliation(s)
- Juyong Lee
- School of Computational Sciences, Korea Institute of Advanced Study, Seoul, Korea.
| | | | | |
Collapse
|
20
|
Cafieri S, Caporossi G, Hansen P, Perron S, Costa A. Finding communities in networks in the strong and almost-strong sense. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 85:046113. [PMID: 22680544 DOI: 10.1103/physreve.85.046113] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/08/2011] [Indexed: 06/01/2023]
Abstract
Finding communities, or clusters or modules, in networks can be done by optimizing an objective function defined globally and/or by specifying conditions which must be satisfied by all communities. Radicchi et al. [Proc. Natl. Acad. Sci. USA 101, 2658 (2004)] define a susbset of vertices of a network to be a community in the strong sense if each vertex of that subset has a larger inner degree than its outer degree. A partition in the strong sense has only strong communities. In this paper we first define an enumerative algorithm to list all partitions in the strong sense of a network of moderate size. The results of this algorithm are given for the Zachary karate club data set, which is solved by hand, as well as for several well-known real-world problems of the literature. Moreover, this algorithm is slightly modified in order to apply it to larger networks, keeping only partitions with the largest number of communities. It is shown that some of the partitions obtained are informative, although they often have only a few communities, while they fail to give any information in other cases having only one community. It appears that degree 2 vertices play a big role in forcing large inhomogeneous communities. Therefore, a weakening of the strong condition is proposed and explored: we define a partition in the almost-strong sense by substituting a nonstrict inequality to a strict one in the definition of strong community for all vertices of degree 2. Results, for the same set of problems as before, then give partitions with a larger number of communities and are more informative.
Collapse
Affiliation(s)
- Sonia Cafieri
- Laboratoire MAIAA, École Nationale de l'Aviation Civile, Toulouse, France.
| | | | | | | | | |
Collapse
|
21
|
Costa A, Hansen P. Comment on "Evolutionary method for finding communities in bipartite networks". PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 84:058101. [PMID: 22181549 DOI: 10.1103/physreve.84.058101] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/18/2011] [Indexed: 05/31/2023]
Abstract
In a recent paper, Zhan, Zhang, Guan, and Zhou [Phys. Rev. E 83, 066120 (2011)] presented a modified adaptive genetic algorithm (MAGA) tailored to the discovery of maximum modularity partitions of the node set into communities in unipartite, bipartite, and directed networks. The authors claim that "detection of communities in unipartite networks or in directed networks can be transformed into the same task in bipartite networks." Actually, some tests show that it is not the case for the proposed transformations, and why. Experimental results of MAGA for modularity maximization of untransformed unipartite or bipartite networks are also discussed.
Collapse
Affiliation(s)
- Alberto Costa
- LIX, École Polytechnique, F-91128 Palaiseau, France.
| | | |
Collapse
|
22
|
Cafieri S, Hansen P, Liberti L. Locally optimal heuristic for modularity maximization of networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 83:056105. [PMID: 21728603 DOI: 10.1103/physreve.83.056105] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2010] [Revised: 02/08/2011] [Indexed: 05/31/2023]
Abstract
Community detection in networks based on modularity maximization is currently done with hierarchical divisive or agglomerative as well as partitioning heuristics, hybrids, and, in a few papers, exact algorithms. We consider here the case of hierarchical networks in which communities should be detected and propose a divisive heuristic which is locally optimal in the sense that each of the successive bipartitions is done in a provably optimal way. This heuristic is compared with the spectral-based hierarchical divisive heuristic of Newman [Proc. Natl. Acad. Sci. USA 103, 8577 (2006).] and with the hierarchical agglomerative heuristic of Clauset, Newman, and Moore [Phys. Rev. E 70, 066111 (2004).]. Computational results are given for a series of problems of the literature with up to 4941 vertices and 6594 edges. They show that the proposed divisive heuristic gives better results than the divisive heuristic of Newman and than the agglomerative heuristic of Clauset et al.
Collapse
Affiliation(s)
- Sonia Cafieri
- Laboratoire MAIA, École Nationale de l'Aviation Civile, 7 Avenue Edouard Belin, F-31055 Toulouse, France.
| | | | | |
Collapse
|