1
|
Preti G, De Francisci Morales G, Riondato M. Impossibility result for Markov chain Monte Carlo sampling from microcanonical bipartite graph ensembles. Phys Rev E 2024; 109:L053301. [PMID: 38907506 DOI: 10.1103/physreve.109.l053301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Accepted: 04/16/2024] [Indexed: 06/24/2024]
Abstract
Markov Chain Monte Carlo (MCMC) algorithms are commonly used to sample from graph ensembles. Two graphs are neighbors in the state space if one can be obtained from the other with only a few modifications, e.g., edge rewirings. For many common ensembles, e.g., those preserving the degree sequences of bipartite graphs, rewiring operations involving two edges are sufficient to create a fully connected state space, and they can be performed efficiently. We show that, for ensembles of bipartite graphs with fixed degree sequences and number of butterflies (k_{2,2} bicliques), there is no universal constant c such that a rewiring of at most c edges at every step is sufficient for any such ensemble to be fully connected. Our proof relies on an explicit construction of a family of pairs of graphs with the same degree sequences and number of butterflies, with each pair indexed by a natural c, and such that any sequence of rewiring operations transforming one graph into the other must include at least one rewiring operation involving at least c edges. Whether rewiring this many edges is sufficient to guarantee the full connectivity of the state space of any such ensemble remains an open question. Our result implies the impossibility of developing efficient, graph-agnostic, MCMC algorithms for these ensembles, as the necessity to rewire an impractically large number of edges may hinder taking a step on the state space.
Collapse
Affiliation(s)
| | | | - Matteo Riondato
- Amherst College, 25 East Drive, Amherst, Massachusetts 01002, USA
| |
Collapse
|
2
|
Casiraghi G, Nanumyan V. Configuration models as an urn problem. Sci Rep 2021; 11:13416. [PMID: 34183694 PMCID: PMC8239003 DOI: 10.1038/s41598-021-92519-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Accepted: 06/11/2021] [Indexed: 11/09/2022] Open
Abstract
A fundamental issue of network data science is the ability to discern observed features that can be expected at random from those beyond such expectations. Configuration models play a crucial role there, allowing us to compare observations against degree-corrected null-models. Nonetheless, existing formulations have limited large-scale data analysis applications either because they require expensive Monte-Carlo simulations or lack the required flexibility to model real-world systems. With the generalized hypergeometric ensemble, we address both problems. To achieve this, we map the configuration model to an urn problem, where edges are represented as balls in an appropriately constructed urn. Doing so, we obtain the generalized hypergeometric ensemble of random graphs: a random graph model reproducing and extending the properties of standard configuration models, with the critical advantage of a closed-form probability distribution.
Collapse
|
3
|
de Santiago R, Lamb LC. A ground truth contest between modularity maximization and modularity density maximization. Artif Intell Rev 2020. [DOI: 10.1007/s10462-019-09802-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
4
|
He M, Glasser J, Pritchard N, Bhamidi S, Kaza N. Demarcating geographic regions using community detection in commuting networks with significant self-loops. PLoS One 2020; 15:e0230941. [PMID: 32348311 PMCID: PMC7190107 DOI: 10.1371/journal.pone.0230941] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2019] [Accepted: 03/12/2020] [Indexed: 11/19/2022] Open
Abstract
We develop a method to identify statistically significant communities in a weighted network with a high proportion of self-looping weights. We use this method to find overlapping agglomerations of U.S. counties by representing inter-county commuting as a weighted network. We identify three types of communities; non-nodal, nodal and monads, which correspond to different types of regions. The results suggest that traditional regional delineations that rely on ad hoc thresholds do not account for important and pervasive connections that extend far beyond expected metropolitan boundaries or megaregions.
Collapse
Affiliation(s)
- Mark He
- Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
| | - Joseph Glasser
- Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
| | - Nathaniel Pritchard
- Statistics, University of Wisconsin at Madison, Madison, WI, United States of America
| | - Shankar Bhamidi
- Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
| | - Nikhil Kaza
- City & Regional Planning, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
| |
Collapse
|
5
|
Bettinelli A, Hansen P, Liberti L. Algorithm for parametric community detection in networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 86:016107. [PMID: 23005491 DOI: 10.1103/physreve.86.016107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2011] [Revised: 04/17/2012] [Indexed: 06/01/2023]
Abstract
Modularity maximization is extensively used to detect communities in complex networks. It has been shown, however, that this method suffers from a resolution limit: Small communities may be undetectable in the presence of larger ones even if they are very dense. To alleviate this defect, various modifications of the modularity function have been proposed as well as multiresolution methods. In this paper we systematically study a simple model (proposed by Pons and Latapy [Theor. Comput. Sci. 412, 892 (2011)] and similar to the parametric model of Reichardt and Bornholdt [Phys. Rev. E 74, 016110 (2006)]) with a single parameter α that balances the fraction of within community edges and the expected fraction of edges according to the configuration model. An exact algorithm is proposed to find optimal solutions for all values of α as well as the corresponding successive intervals of α values for which they are optimal. This algorithm relies upon a routine for exact modularity maximization and is limited to moderate size instances. An agglomerative hierarchical heuristic is therefore proposed to address parametric modularity detection in large networks. At each iteration the smallest value of α for which it is worthwhile to merge two communities of the current partition is found. Then merging is performed and the data are updated accordingly. An implementation is proposed with the same time and space complexity as the well-known Clauset-Newman-Moore (CNM) heuristic [Phys. Rev. E 70, 066111 (2004)]. Experimental results on artificial and real world problems show that (i) communities are detected by both exact and heuristic methods for all values of the parameter α; (ii) the dendrogram summarizing the results of the heuristic method provides a useful tool for substantive analysis, as illustrated particularly on a Les Misérables data set; (iii) the difference between the parametric modularity values given by the exact method and those given by the heuristic is moderate; (iv) the heuristic version of the proposed parametric method, viewed as a modularity maximization tool, gives better results than the CNM heuristic for large instances.
Collapse
Affiliation(s)
- Andrea Bettinelli
- Dipartimento di Tecnologie dell'Infomazione, Università degli Studi di Milano, via Bramante 65, Crema, Italy.
| | | | | |
Collapse
|
6
|
Wu J, Jiao L, Jin C, Liu F, Gong M, Shang R, Chen W. Overlapping community detection via network dynamics. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 85:016115. [PMID: 22400633 DOI: 10.1103/physreve.85.016115] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/25/2011] [Revised: 12/10/2011] [Indexed: 05/31/2023]
Abstract
The modular structure of a network is closely related to the dynamics toward clustering. In this paper, a method for community detection is proposed via the clustering dynamics of a network. The initial phases of the nodes in the network are given randomly, and then they evolve according to a set of dedicatedly designed differential equations. The phases of the nodes are naturally separated into several clusters after a period of evolution, and each cluster corresponds to a community in the network. For the networks with overlapping communities, the phases of the overlapping nodes will evolve to the interspace of the two communities. The proposed method is illustrated with applications to both synthetically generated and real-world complex networks.
Collapse
Affiliation(s)
- Jianshe Wu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi'an 710071, China
| | | | | | | | | | | | | |
Collapse
|
7
|
Cafieri S, Hansen P, Liberti L. Locally optimal heuristic for modularity maximization of networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 83:056105. [PMID: 21728603 DOI: 10.1103/physreve.83.056105] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2010] [Revised: 02/08/2011] [Indexed: 05/31/2023]
Abstract
Community detection in networks based on modularity maximization is currently done with hierarchical divisive or agglomerative as well as partitioning heuristics, hybrids, and, in a few papers, exact algorithms. We consider here the case of hierarchical networks in which communities should be detected and propose a divisive heuristic which is locally optimal in the sense that each of the successive bipartitions is done in a provably optimal way. This heuristic is compared with the spectral-based hierarchical divisive heuristic of Newman [Proc. Natl. Acad. Sci. USA 103, 8577 (2006).] and with the hierarchical agglomerative heuristic of Clauset, Newman, and Moore [Phys. Rev. E 70, 066111 (2004).]. Computational results are given for a series of problems of the literature with up to 4941 vertices and 6594 edges. They show that the proposed divisive heuristic gives better results than the divisive heuristic of Newman and than the agglomerative heuristic of Clauset et al.
Collapse
Affiliation(s)
- Sonia Cafieri
- Laboratoire MAIA, École Nationale de l'Aviation Civile, 7 Avenue Edouard Belin, F-31055 Toulouse, France.
| | | | | |
Collapse
|
8
|
Aloise D, Cafieri S, Caporossi G, Hansen P, Perron S, Liberti L. Column generation algorithms for exact modularity maximization in networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2010; 82:046112. [PMID: 21230350 DOI: 10.1103/physreve.82.046112] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/04/2010] [Indexed: 05/30/2023]
Abstract
Finding modules, or clusters, in networks currently attracts much attention in several domains. The most studied criterion for doing so, due to Newman and Girvan [Phys. Rev. E 69, 026113 (2004)], is modularity maximization. Many heuristics have been proposed for maximizing modularity and yield rapidly near optimal solution or sometimes optimal ones but without a guarantee of optimality. There are few exact algorithms, prominent among which is a paper by Xu [Eur. Phys. J. B 60, 231 (2007)]. Modularity maximization can also be expressed as a clique partitioning problem and the row generation algorithm of Grötschel and Wakabayashi [Math. Program. 45, 59 (1989)] applied. We propose to extend both of these algorithms using the powerful column generation methods for linear and non linear integer programming. Performance of the four resulting algorithms is compared on problems from the literature. Instances with up to 512 entities are solved exactly. Moreover, the computing time of previously solved problems are reduced substantially.
Collapse
Affiliation(s)
- Daniel Aloise
- Department of Production Engineering, Universidade Federal do Rio Grande do Norte, Campus Universitário s/n, Natal, RN 59072-970, Brazil.
| | | | | | | | | | | |
Collapse
|