1. Mangold L, Roth C. Quantifying metadata relevance to network block structure using description length. Communications Physics 2024; 7:331. [PMID: 39398491 PMCID: PMC11469959 DOI: 10.1038/s42005-024-01819-y] [Received: 03/19/2024] [Accepted: 09/30/2024] [Indexed: 10/15/2024]
Abstract
Network analysis is often enriched by including an examination of node metadata. In the context of understanding the mesoscale of networks, it is often assumed that node groups based on metadata and node groups based on connectivity patterns are intrinsically linked. This assumption is increasingly being challenged, whereby metadata might be entirely unrelated to structure or, similarly, multiple sets of metadata might be relevant to the structure of a network in different ways. We propose the metablox tool to quantify the relationship between a network's node metadata and its mesoscale structure, measuring the strength of the relationship and the type of structural arrangement exhibited by the metadata. We show on a number of synthetic and empirical networks that our tool distinguishes relevant metadata and does so in a comparative setting, demonstrating that it can be used as part of systematic meta-analyses for the comparison of networks from different domains.
Affiliation(s)
- Lena Mangold
  - Centre d’Analyse et de Mathématique Sociales (CNRS/EHESS), 54 Bd Raspail, 75006 Paris, France
  - Computational Social Science Team, Centre Marc Bloch (CNRS/MEAE), Friedrichstr. 191, 10117 Berlin, Germany
- Camille Roth
  - Centre d’Analyse et de Mathématique Sociales (CNRS/EHESS), 54 Bd Raspail, 75006 Paris, France
  - Computational Social Science Team, Centre Marc Bloch (CNRS/MEAE), Friedrichstr. 191, 10117 Berlin, Germany
2. Nelson APK, Mole J, Pombo G, Gray RJ, Ruffle JK, Chan E, Rees GE, Cipolotti L, Nachev P. The minimal computational substrate of fluid intelligence. Cortex 2024; 179:62-76. [PMID: 39141936 DOI: 10.1016/j.cortex.2024.07.003] [Received: 01/18/2024] [Revised: 05/08/2024] [Accepted: 07/01/2024] [Indexed: 08/16/2024]
Abstract
The quantification of cognitive powers rests on identifying a behavioural task that depends on them. Such dependence cannot be assured, for the powers a task invokes cannot be experimentally controlled or constrained a priori, resulting in unknown vulnerability to failure of specificity and generalisability. Evaluating a compact version of Raven's Advanced Progressive Matrices (RAPM), a widely used clinical test of fluid intelligence, we show that LaMa, a self-supervised artificial neural network trained solely on the completion of partially masked images of natural environmental scenes, achieves representative human-level test scores a prima vista, without any task-specific inductive bias or training. Compared with cohorts of healthy and focally lesioned participants, LaMa exhibits human-like variation with item difficulty, and produces errors characteristic of right frontal lobe damage under degradation of its ability to integrate global spatial patterns. LaMa's narrow training and limited capacity suggest matrix-style tests may be open to computationally simple solutions that need not necessarily invoke the substrates of reasoning.
Affiliation(s)
- Amy P K Nelson
  - High Dimensional Neurology Group, UCL Queen Square Institute of Neurology, University College London, Russell Square House, Bloomsbury, London, UK
- Joe Mole
  - Department of Neuropsychology, National Hospital for Neurology and Neurosurgery, London, UK; UCL Queen Square Institute of Neurology, London, UK
- Guilherme Pombo
  - High Dimensional Neurology Group, UCL Queen Square Institute of Neurology, University College London, Russell Square House, Bloomsbury, London, UK
- Robert J Gray
  - High Dimensional Neurology Group, UCL Queen Square Institute of Neurology, University College London, Russell Square House, Bloomsbury, London, UK
- James K Ruffle
  - High Dimensional Neurology Group, UCL Queen Square Institute of Neurology, University College London, Russell Square House, Bloomsbury, London, UK
- Edgar Chan
  - Department of Neuropsychology, National Hospital for Neurology and Neurosurgery, London, UK; UCL Queen Square Institute of Neurology, London, UK
- Geraint E Rees
  - UCL Queen Square Institute of Neurology, London, UK; University College London, Gower Street, London, UK
- Lisa Cipolotti
  - Department of Neuropsychology, National Hospital for Neurology and Neurosurgery, London, UK; UCL Queen Square Institute of Neurology, London, UK
- Parashkev Nachev
  - High Dimensional Neurology Group, UCL Queen Square Institute of Neurology, University College London, Russell Square House, Bloomsbury, London, UK
3. Murphy C, Thibeault V, Allard A, Desrosiers P. Duality between predictability and reconstructability in complex systems. Nat Commun 2024; 15:4478. [PMID: 38796449 PMCID: PMC11127975 DOI: 10.1038/s41467-024-48020-x] [Received: 03/08/2023] [Accepted: 04/15/2024] [Indexed: 05/28/2024]
Abstract
Predicting the evolution of a large system of units using its structure of interaction is a fundamental problem in complex system theory. And so is the problem of reconstructing the structure of interaction from temporal observations. Here, we find an intricate relationship between predictability and reconstructability using an information-theoretical point of view. We use the mutual information between a random graph and a stochastic process evolving on this random graph to quantify their codependence. Then, we show how the uncertainty coefficients, which are intimately related to that mutual information, quantify our ability to reconstruct a graph from an observed time series, and our ability to predict the evolution of a process from the structure of its interactions. We provide analytical calculations of the uncertainty coefficients for many different systems, including continuous deterministic systems, and describe a numerical procedure when exact calculations are intractable. Interestingly, we find that predictability and reconstructability, even though closely connected by the mutual information, can behave differently, even in a dual manner. We prove how such duality universally emerges when changing the number of steps in the process. Finally, we provide evidence that predictability-reconstruction dualities may exist in dynamical processes on real networks close to criticality.
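As a rough illustration of the quantities named in this abstract, the sketch below computes the mutual information between a graph variable G and a process variable X from a small joint probability table, then normalizes it by each marginal entropy. It assumes the textbook definition of the uncertainty coefficient, U(A|B) = I(A;B)/H(A); the function names and the toy table are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def uncertainty_coefficients(joint):
    """joint[i][j] = P(G = i, X = j) for a toy graph variable G and process X.

    Returns (mutual information, I/H(X), I/H(G)). The two ratios are the
    standard uncertainty coefficients; reading them as "predictability" and
    "reconstructability" is an assumption made for this sketch.
    """
    p_g = [sum(row) for row in joint]          # marginal of G
    p_x = [sum(col) for col in zip(*joint)]    # marginal of X
    h_g = entropy(p_g)
    h_x = entropy(p_x)
    h_joint = entropy([p for row in joint for p in row])
    mi = h_g + h_x - h_joint
    predictability = mi / h_x if h_x > 0 else 0.0
    reconstructability = mi / h_g if h_g > 0 else 0.0
    return mi, predictability, reconstructability


# Toy table: three equally or unequally likely graphs, two process outcomes,
# with the outcome fully determined by the graph.
toy_joint = [[0.5, 0.0], [0.0, 0.25], [0.0, 0.25]]
mi, predictability, reconstructability = uncertainty_coefficients(toy_joint)
```

In this toy table the process outcome is fully determined by the graph, so the first ratio equals 1, while the second is about 0.67 because two graphs yield the same outcome. The same mutual information therefore produces two different normalized scores.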
Affiliation(s)
- Charles Murphy
  - Département de physique, de génie physique et d'optique, Université Laval, Québec, QC, G1V 0A6, Canada
  - Centre interdisciplinaire en modélisation mathématique, Université Laval, Québec, QC, G1V 0A6, Canada
- Vincent Thibeault
  - Département de physique, de génie physique et d'optique, Université Laval, Québec, QC, G1V 0A6, Canada
  - Centre interdisciplinaire en modélisation mathématique, Université Laval, Québec, QC, G1V 0A6, Canada
- Antoine Allard
  - Département de physique, de génie physique et d'optique, Université Laval, Québec, QC, G1V 0A6, Canada
  - Centre interdisciplinaire en modélisation mathématique, Université Laval, Québec, QC, G1V 0A6, Canada
- Patrick Desrosiers
  - Département de physique, de génie physique et d'optique, Université Laval, Québec, QC, G1V 0A6, Canada
  - Centre interdisciplinaire en modélisation mathématique, Université Laval, Québec, QC, G1V 0A6, Canada
  - Centre de recherche CERVO, Québec, QC, G1J 2G3, Canada
4. Ruffle JK, Gray RJ, Mohinta S, Pombo G, Kaul C, Hyare H, Rees G, Nachev P. Computational limits to the legibility of the imaged human brain. Neuroimage 2024; 291:120600. [PMID: 38569979 DOI: 10.1016/j.neuroimage.2024.120600] [Received: 12/19/2023] [Revised: 03/08/2024] [Accepted: 03/31/2024] [Indexed: 04/05/2024]
Abstract
Our knowledge of the organisation of the human brain at the population level is yet to translate into power to predict functional differences at the individual level, limiting clinical applications and casting doubt on the generalisability of inferred mechanisms. It remains unknown whether the difficulty arises from the absence of individuating biological patterns within the brain, or from limited power to access them with the models and compute at our disposal. Here we comprehensively investigate the resolvability of such patterns with data and compute at unprecedented scale. Across 23 810 unique participants from UK Biobank, we systematically evaluate the predictability of 25 individual biological characteristics, from all available combinations of structural and functional neuroimaging data. Over 4526 GPU hours of computation, we train, optimize, and evaluate out-of-sample 700 individual predictive models, including fully-connected feed-forward neural networks of demographic, psychological, serological, chronic disease, and functional connectivity characteristics, and both uni- and multi-modal 3D convolutional neural network models of macro- and micro-structural brain imaging. We find a marked discrepancy between the high predictability of sex (balanced accuracy 99.7%), age (mean absolute error 2.048 years, R² 0.859), and weight (mean absolute error 2.609 kg, R² 0.625), for which we set new state-of-the-art performance, and the surprisingly low predictability of other characteristics. Neither structural nor functional imaging predicted an individual's psychology better than the coincidence of common chronic disease (p < 0.05). Serology predicted chronic disease (p < 0.05) and was best predicted by it (p < 0.001), followed by structural neuroimaging (p < 0.05). Our findings suggest either more informative imaging or more powerful models will be needed to decipher individual-level characteristics from the human brain. We make our models and code openly available.
Affiliation(s)
- James K Ruffle
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Robert J Gray
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Samia Mohinta
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Guilherme Pombo
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Chaitanya Kaul
  - School of Computing Science, University of Glasgow, Glasgow, United Kingdom
- Harpreet Hyare
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Geraint Rees
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
- Parashkev Nachev
  - Queen Square Institute of Neurology, University College London, London, United Kingdom
5. Ruffle JK, Mohinta S, Pombo G, Gray R, Kopanitsa V, Lee F, Brandner S, Hyare H, Nachev P. Brain tumour genetic network signatures of survival. Brain 2023; 146:4736-4754. [PMID: 37665980 PMCID: PMC10629773 DOI: 10.1093/brain/awad199] [Received: 02/21/2023] [Revised: 05/12/2023] [Accepted: 05/30/2023] [Indexed: 09/06/2023]
Abstract
Tumour heterogeneity is increasingly recognized as a major obstacle to therapeutic success across neuro-oncology. Gliomas are characterized by distinct combinations of genetic and epigenetic alterations, resulting in complex interactions across multiple molecular pathways. Predicting disease evolution and prescribing individually optimal treatment requires statistical models complex enough to capture the intricate (epi)genetic structure underpinning oncogenesis. Here, we formalize this task as the inference of distinct patterns of connectivity within hierarchical latent representations of genetic networks. Evaluating multi-institutional clinical, genetic and outcome data from 4023 glioma patients over 14 years, across 12 countries, we employ Bayesian generative stochastic block modelling to reveal a hierarchical network structure of tumour genetics spanning molecularly confirmed glioblastoma, IDH-wildtype; oligodendroglioma, IDH-mutant and 1p/19q codeleted; and astrocytoma, IDH-mutant. Our findings illuminate the complex dependence between features across the genetic landscape of brain tumours and show that generative network models reveal distinct signatures of survival with better prognostic fidelity than current gold standard diagnostic categories.
Affiliation(s)
- James K Ruffle
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Samia Mohinta
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Guilherme Pombo
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Robert Gray
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Valeriya Kopanitsa
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Faith Lee
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Sebastian Brandner
  - Division of Neuropathology and Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Harpreet Hyare
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
- Parashkev Nachev
  - Queen Square Institute of Neurology, University College London, London WC1N 3BG, UK
6. Peixoto TP, Kirkley A. Implicit models, latent compression, intrinsic biases, and cheap lunches in community detection. Phys Rev E 2023; 108:024309. [PMID: 37723811 DOI: 10.1103/physreve.108.024309] [Received: 10/24/2022] [Accepted: 08/02/2023] [Indexed: 09/20/2023]
Abstract
The task of community detection, which aims to partition a network into clusters of nodes to summarize its large-scale structure, has spawned the development of many competing algorithms with varying objectives. Some community detection methods are inferential, explicitly deriving the clustering objective through a probabilistic generative model, while other methods are descriptive, dividing a network according to an objective motivated by a particular application, making it challenging to compare these methods on the same scale. Here we present a solution to this problem that associates any community detection objective, inferential or descriptive, with its corresponding implicit network generative model. This allows us to compute the description length of a network and its partition under arbitrary objectives, providing a principled measure to compare the performance of different algorithms without the need for "ground-truth" labels. Our approach also gives access to instances of the community detection problem that are optimal to any given algorithm and in this way reveals intrinsic biases in popular descriptive methods, explaining their tendency to overfit. Using our framework, we compare a number of community detection methods on artificial networks and on a corpus of over 500 structurally diverse empirical networks. We find that more expressive community detection methods exhibit consistently superior compression performance on structured data instances, without having degraded performance on a minority of situations where more specialized algorithms perform optimally. Our results undermine the implications of the "no free lunch" theorem for community detection, both conceptually and in practice, since it is confined to unstructured data instances, unlike relevant community detection problems which are structured by requirement.
Affiliation(s)
- Tiago P Peixoto
  - Department of Network and Data Science, Central European University, 1100 Vienna, Austria
- Alec Kirkley
  - Institute of Data Science, University of Hong Kong, Hong Kong; Department of Urban Planning and Design, University of Hong Kong, Hong Kong; and Urban Systems Institute, University of Hong Kong, Hong Kong
7. Schaub MT, Li J, Peel L. Hierarchical community structure in networks. Phys Rev E 2023; 107:054305. [PMID: 37329032 DOI: 10.1103/physreve.107.054305] [Received: 10/24/2020] [Accepted: 04/24/2023] [Indexed: 06/18/2023]
Abstract
Modular and hierarchical community structures are pervasive in real-world complex systems. A great deal of effort has gone into trying to detect and study these structures. Important theoretical advances in the detection of modular structure have included identifying fundamental limits of detectability by formally defining community structure using probabilistic generative models. Detecting hierarchical community structure introduces additional challenges alongside those inherited from community detection. Here we present a theoretical study on hierarchical community structure in networks, which has thus far not received the same rigorous attention. We address the following questions. (1) How should we define a hierarchy of communities? (2) How do we determine if there is sufficient evidence of a hierarchical structure in a network? (3) How can we detect hierarchical structure efficiently? We approach these questions by introducing a definition of hierarchy based on the concept of stochastic externally equitable partitions and their relation to probabilistic models, such as the popular stochastic block model. We enumerate the challenges involved in detecting hierarchies and, by studying the spectral properties of hierarchical structure, present an efficient and principled method for detecting them.
Affiliation(s)
- Michael T Schaub
  - Department of Computer Science, RWTH Aachen University, 52074 Aachen, Germany
- Jiaze Li
  - Department of Data Analytics and Digitalisation, School of Business and Economics, Maastricht University, 6211 LM Maastricht, The Netherlands
- Leto Peel
  - Department of Data Analytics and Digitalisation, School of Business and Economics, Maastricht University, 6211 LM Maastricht, The Netherlands
8. Paton J, Hartle H, Stepanyants H, van der Hoorn P, Krioukov D. Entropy of labeled versus unlabeled networks. Phys Rev E 2022; 106:054308. [PMID: 36559397 DOI: 10.1103/physreve.106.054308] [Received: 04/20/2022] [Accepted: 10/24/2022] [Indexed: 06/17/2023]
Abstract
The structure of a network is an unlabeled graph, yet graphs in most models of complex networks are labeled by meaningless random integers. Is the associated labeling noise always negligible, or can it overpower the network-structural signal? To address this question, we introduce and consider the sparse unlabeled versions of popular network models and compare their entropy against the original labeled versions. We show that labeled and unlabeled Erdős-Rényi graphs are entropically equivalent, even though their degree distributions are very different. The labeled and unlabeled versions of the configuration model may have different prefactors in their leading entropy terms, although this remains conjectural. Our main results are upper and lower bounds for the entropy of labeled and unlabeled one-dimensional random geometric graphs. We show that their unlabeled entropy is negligible in comparison with the labeled entropy. This means that in sparse networks the entropy of meaningless labeling may dominate the entropy of the network structure. The main implication of this result is that the common practice of using exchangeable models to reason about real-world networks with distinguishable nodes may introduce uncontrolled aberrations into conclusions made about these networks, suggesting a need for a thorough reexamination of the statistical foundations and key results of network science.
Affiliation(s)
- Jeremy Paton
  - Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
  - Network Science Institute, Northeastern University, Boston, Massachusetts 02115, USA
- Harrison Hartle
  - Network Science Institute, Northeastern University, Boston, Massachusetts 02115, USA
- Huck Stepanyants
  - Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
  - Network Science Institute, Northeastern University, Boston, Massachusetts 02115, USA
- Pim van der Hoorn
  - Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600 MB Eindhoven, Netherlands
- Dmitri Krioukov
  - Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
  - Network Science Institute, Northeastern University, Boston, Massachusetts 02115, USA
  - Department of Mathematics, Northeastern University, Boston, Massachusetts 02115, USA
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts 02115, USA
9. Bianconi G. Grand Canonical Ensembles of Sparse Networks and Bayesian Inference. Entropy (Basel, Switzerland) 2022; 24:633. [PMID: 35626517 PMCID: PMC9146839 DOI: 10.3390/e24050633] [Received: 04/13/2022] [Revised: 04/25/2022] [Accepted: 04/27/2022] [Indexed: 02/04/2023]
Abstract
Maximum entropy network ensembles have been very successful in modelling sparse network topologies and in solving challenging inference problems. However, the sparse maximum entropy network models proposed so far have a fixed number of nodes and are typically not exchangeable. Here we consider hierarchical models for exchangeable networks in the sparse limit, i.e., with the total number of links scaling linearly with the total number of nodes. The approach is grand canonical, i.e., the number of nodes of the network is not fixed a priori: it is finite but can be arbitrarily large. In this way the grand canonical network ensembles circumvent the difficulties in treating infinite sparse exchangeable networks which according to the Aldous-Hoover theorem must vanish. The approach can treat networks with given degree distribution or networks with given distribution of latent variables. When only a subgraph induced by a subset of nodes is known, this model allows a Bayesian estimation of the network size and the degree sequence (or the sequence of latent variables) of the entire network which can be used for network reconstruction.
Affiliation(s)
- Ginestra Bianconi
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK
  - The Alan Turing Institute, The British Library, London NW1 2DB, UK
10. Bianconi G. Statistical physics of exchangeable sparse simple networks, multiplex networks, and simplicial complexes. Phys Rev E 2022; 105:034310. [PMID: 35428066 DOI: 10.1103/physreve.105.034310] [Received: 11/07/2021] [Accepted: 03/01/2022] [Indexed: 06/14/2023]
Abstract
Exchangeability is a desired statistical property of network ensembles requiring their invariance upon relabeling of the nodes. However, combining sparsity of network ensembles with exchangeability is challenging. Here we propose a statistical physics framework and a Metropolis-Hastings algorithm defining exchangeable sparse network ensembles. The model generates networks with heterogeneous degree distributions by enforcing only global constraints while existing (nonexchangeable) exponential random graphs enforce an extensive number of local constraints. This very general theoretical framework to describe exchangeable networks is here first formulated for uncorrelated simple networks and then it is extended to treat simple networks with degree correlations, directed networks, bipartite networks, and generalized network structures including multiplex networks and simplicial complexes. In particular here we formulate and treat both uncorrelated and correlated exchangeable ensembles of simplicial complexes using statistical mechanics approaches.
Affiliation(s)
- Ginestra Bianconi
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom and The Alan Turing Institute, The British Library, London NW1 2DB, United Kingdom
11. On model selection for dense stochastic block models. Adv Appl Probab 2022. [DOI: 10.1017/apr.2021.29] [Indexed: 11/07/2022]
Abstract
This paper studies estimation of stochastic block models with Rissanen’s minimum description length (MDL) principle in the dense graph asymptotics. We focus on the problem of model specification, i.e., identification of the number of blocks. Refinements of the true partition always decrease the code part corresponding to the edge placement, and thus a respective increase of the code part specifying the model should outweigh that gain in order to yield a minimum at the true partition. The balance between these effects turns out to be delicate. We show that the MDL principle identifies the true partition among models whose relative block sizes are bounded away from zero. The results are extended to models with Poisson-distributed edge weights.
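The trade-off described in this abstract can be sketched with a crude two-part code. The version below is an illustration under stated assumptions, not the code length analyzed in the paper: the edge part is the plug-in Bernoulli entropy of each block pair, and the model part charges log2(k) bits per node label plus half of log2(number of node pairs) per block-pair density, in the style of BIC. The function names and the toy graph are hypothetical.

```python
import math
from itertools import combinations


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def description_length(n, edges, labels):
    """Return (model_bits, edge_bits) for a simple undirected graph.

    n      -- number of nodes, labelled 0..n-1 (n >= 2)
    edges  -- iterable of (u, v) pairs with u != v
    labels -- labels[u] is the integer block of node u
    Both code parts are simplifying assumptions made for this sketch.
    """
    blocks = sorted(set(labels))
    k = len(blocks)
    sizes = {b: labels.count(b) for b in blocks}

    # Count edges for every unordered pair of blocks.
    counts = {}
    for u, v in edges:
        key = tuple(sorted((labels[u], labels[v])))
        counts[key] = counts.get(key, 0) + 1

    # Edge part: node pairs in a block pair times the entropy of its density.
    edge_bits = 0.0
    for i, r in enumerate(blocks):
        for s in blocks[i:]:
            if r == s:
                pairs = sizes[r] * (sizes[r] - 1) // 2
            else:
                pairs = sizes[r] * sizes[s]
            if pairs > 0:
                edge_bits += pairs * binary_entropy(counts.get((r, s), 0) / pairs)

    # Model part: node labels plus one density parameter per block pair.
    total_pairs = n * (n - 1) // 2
    n_params = k * (k + 1) // 2
    model_bits = n * math.log2(k) + 0.5 * n_params * math.log2(total_pairs)
    return model_bits, edge_bits


# Toy graph: two disjoint 4-cliques on nodes 0..3 and 4..7.
toy_edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
one_block = description_length(8, toy_edges, [0] * 8)
true_blocks = description_length(8, toy_edges, [0, 0, 0, 0, 1, 1, 1, 1])
refined = description_length(8, toy_edges, [0, 0, 1, 1, 2, 2, 3, 3])
```

On this toy graph the refined four-block partition does not raise the edge part relative to the two-block partition, but its model part is larger, so the total is smallest at the two-block partition. With the plug-in entropy used here, refining a partition can never increase the edge part, which is why the model part has to carry the penalty.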
12. Han J, Guo T, Zhou Q, Han W, Bai B, Zhang G. Structural Entropy of the Stochastic Block Models. Entropy 2022; 24:e24010081. [PMID: 35052107 PMCID: PMC8775199 DOI: 10.3390/e24010081] [Received: 11/24/2021] [Revised: 12/28/2021] [Accepted: 12/30/2021] [Indexed: 11/16/2022]
Abstract
With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes of context-dependent data are extremely desirable. A particularly interesting direction is to compress the data while keeping the "structural information" only and ignoring the concrete labelings. Under this direction, Choi and Szpankowski introduced the structures (unlabeled graphs) which allowed them to compute the structural entropy of the Erdős-Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expectation in linear time. In this paper, we consider the stochastic block models with an arbitrary number of parts. Indeed, we define a partitioned structural entropy for stochastic block models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the stochastic block models, and provide a compression scheme that asymptotically achieves this entropy limit.
Affiliation(s)
- Jie Han
  - Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd., Hong Kong SAR, China
- Tao Guo
  - Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd., Hong Kong SAR, China
- Qiaoqiao Zhou
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore 11741, Singapore
- Wei Han
  - Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd., Hong Kong SAR, China
- Bo Bai
  - Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd., Hong Kong SAR, China
- Gong Zhang
  - Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd., Hong Kong SAR, China
13. Zamani Esfahlani F, Jo Y, Puxeddu MG, Merritt H, Tanner JC, Greenwell S, Patel R, Faskowitz J, Betzel RF. Modularity maximization as a flexible and generic framework for brain network exploratory analysis. Neuroimage 2021; 244:118607. [PMID: 34607022 DOI: 10.1016/j.neuroimage.2021.118607] [Received: 07/03/2021] [Revised: 09/03/2021] [Accepted: 09/20/2021] [Indexed: 11/28/2022]
Abstract
The modular structure of brain networks supports specialized information processing, complex dynamics, and cost-efficient spatial embedding. Inter-individual variation in modular structure has been linked to differences in performance, disease, and development. There exist many data-driven methods for detecting and comparing modular structure, the most popular of which is modularity maximization. Although modularity maximization is a general framework that can be modified and reparameterized to address domain-specific research questions, its application to neuroscientific datasets has, thus far, been narrow. Here, we highlight several strategies in which the "out-of-the-box" version of modularity maximization can be extended to address questions specific to neuroscience. First, we present approaches for detecting "space-independent" modules and for applying modularity maximization to signed matrices. Next, we show that the modularity maximization framework is well-suited for detecting task- and condition-specific modules. Finally, we highlight the role of multi-layer models in detecting and tracking modules across time, tasks, subjects, and modalities. In summary, modularity maximization is a flexible and general framework that can be adapted to detect modular structure resulting from a wide range of hypotheses. This article highlights multiple frontiers for future research and applications.
Affiliation(s)
- Farnaz Zamani Esfahlani
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States
- Youngheun Jo
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States
- Maria Grazia Puxeddu
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States; Department of Computer, Control and Management Engineering "Antonio Ruberti", Sapienza University of Rome, Rome 00185, Italy; IRCCS Fondazione Santa Lucia, Rome 00179, Italy
- Haily Merritt
  - Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47405, United States; Cognitive Science Program, Indiana University, Bloomington, IN 47405, United States
- Jacob C Tanner
  - Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47405, United States; Cognitive Science Program, Indiana University, Bloomington, IN 47405, United States
- Sarah Greenwell
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States
- Riya Patel
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States
- Joshua Faskowitz
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States; Program in Neuroscience, Indiana University, Bloomington, IN 47405, United States
- Richard F Betzel
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, United States; Cognitive Science Program, Indiana University, Bloomington, IN 47405, United States; Program in Neuroscience, Indiana University, Bloomington, IN 47405, United States; Network Science Institute, Indiana University, Bloomington, IN 47405, United States
14
|
Mitrai I, Daoutidis P. Efficient Solution of Enterprise-Wide Optimization Problems Using Nested Stochastic Blockmodeling. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c01570] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Ilias Mitrai
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Prodromos Daoutidis
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, United States
|
15
|
De Nicola G, Sischka B, Kauermann G. Mixture models and networks: The stochastic blockmodel. STAT MODEL 2021. [DOI: 10.1177/1471082x211033169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Mixture models are probabilistic models aimed at uncovering and representing latent subgroups within a population. In the realm of network data analysis, the latent subgroups of nodes are typically identified by their connectivity behaviour, with nodes behaving similarly belonging to the same community. In this context, mixture modelling is pursued through stochastic blockmodelling. We consider stochastic blockmodels and some of their variants and extensions from a mixture modelling perspective. We also explore some of the main classes of estimation methods available and propose an alternative approach based on the reformulation of the blockmodel as a graphon. In addition to the discussion of inferential properties and estimating procedures, we focus on the application of the models to several real-world network datasets, showcasing the advantages and pitfalls of different approaches.
Affiliation(s)
- Giacomo De Nicola
  - Department of Statistics, Faculty of Mathematics, Informatics and Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
- Benjamin Sischka
  - Department of Statistics, Faculty of Mathematics, Informatics and Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
- Göran Kauermann
  - Department of Statistics, Faculty of Mathematics, Informatics and Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
|
16
|
Wegner AE, Olhede S. Atomic subgraphs and the statistical mechanics of networks. Phys Rev E 2021; 103:042311. [PMID: 34005963 DOI: 10.1103/physreve.103.042311] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/25/2020] [Accepted: 02/17/2021] [Indexed: 11/07/2022]
Abstract
We develop random graph models where graphs are generated by connecting not only pairs of vertices by edges, but also larger subsets of vertices by copies of small atomic subgraphs of arbitrary topology. This allows for the generation of graphs with extensive numbers of triangles and other network motifs commonly observed in many real-world networks. More specifically, we focus on maximum entropy ensembles under constraints placed on the counts and distributions of atomic subgraphs and derive general expressions for the entropy of such models. We also present a procedure for combining distributions of multiple atomic subgraphs that enables the construction of models with fewer parameters. Expanding the model to include atoms with edge and vertex labels we obtain a general class of models that can be parametrized in terms of basic building blocks and their distributions that include many widely used models as special cases. These models include random graphs with arbitrary distributions of subgraphs, random hypergraphs, bipartite models, stochastic block models, models of multilayer networks and their degree-corrected and directed versions. We show that the entropy for all these models can be derived from a single expression that is characterized by the symmetry groups of atomic subgraphs.
Affiliation(s)
- Anatol E Wegner
  - Department of Statistical Science, University College London, London, United Kingdom
- Sofia Olhede
  - Department of Statistical Science, University College London, London, United Kingdom
  - Institute of Mathematics, Statistical Data Science Group, EPFL, Lausanne, Switzerland
|
17
|
Hartle H, Papadopoulos F, Krioukov D. Dynamic hidden-variable network models. Phys Rev E 2021; 103:052307. [PMID: 34134209 DOI: 10.1103/physreve.103.052307] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/05/2021] [Accepted: 03/12/2021] [Indexed: 11/07/2022]
Abstract
Models of complex networks often incorporate node-intrinsic properties abstracted as hidden variables. The probability of connections in the network is then a function of these variables. Real-world networks evolve over time and many exhibit dynamics of node characteristics as well as of linking structure. Here we introduce and study natural temporal extensions of static hidden-variable network models with stochastic dynamics of hidden variables and links. The dynamics is controlled by two parameters: one that tunes the rate of change of hidden variables and another that tunes the rate at which node pairs reevaluate their connections given the current values of hidden variables. Snapshots of networks in the dynamic models are equivalent to networks generated by the static models only if the link reevaluation rate is sufficiently larger than the rate of hidden-variable dynamics or if an additional mechanism is added whereby links actively respond to changes in hidden variables. Otherwise, links are out of equilibrium with respect to hidden variables and network snapshots exhibit structural deviations from the static models. We examine the level of structural persistence in the considered models and quantify deviations from staticlike behavior. We explore temporal versions of popular static models with community structure, latent geometry, and degree heterogeneity. While we do not attempt to directly model real networks, we comment on interesting qualitative resemblances to real systems. In particular, we speculate that links in some real networks are out of equilibrium with respect to hidden variables, partially explaining the presence of long-ranged links in geometrically embedded systems and intergroup connectivity in modular systems. We also discuss possible extensions, generalizations, and applications of the introduced class of dynamic network models.
Affiliation(s)
- Harrison Hartle
  - Network Science Institute, Northeastern University, Boston, 02115 Massachusetts, USA
- Fragkiskos Papadopoulos
  - Department of Electrical Engineering, Computer Engineering and Informatics, Cyprus University of Technology, 3036 Limassol, Cyprus
- Dmitri Krioukov
  - Network Science Institute, Northeastern University, Boston, 02115 Massachusetts, USA
  - Northeastern University, Departments of Physics, Mathematics, and Electrical & Computer Engineering, Boston, 02115 Massachusetts, USA
|
18
|
Schawe H, Hartmann AK. Large deviations of connected components in the stochastic block model. Phys Rev E 2020; 102:052108. [PMID: 33327148 DOI: 10.1103/physreve.102.052108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2020] [Accepted: 10/19/2020] [Indexed: 06/12/2023]
Abstract
We study the stochastic block model, which is often used to model community structures and study community-detection algorithms. We consider the case of two blocks in regard to its largest connected component and largest biconnected component, respectively. We are especially interested in the distributions of their sizes including the tails down to probabilities smaller than 10^{-800}. For this purpose we use sophisticated Markov chain Monte Carlo simulations to sample graphs from the stochastic block model ensemble. We use these data to study the large-deviation rate function and conjecture that the large-deviation principle holds. Further we compare the distribution to the well-known Erdős-Rényi ensemble, where we notice subtle differences at and above the percolation threshold.
Affiliation(s)
- Hendrik Schawe
  - Laboratoire de Physique Théorique et Modélisation, UMR-8089 CNRS, CY Cergy Paris Université, 95000 Cergy, France
  - Institut für Physik, Universität Oldenburg, 26111 Oldenburg, Germany
|
19
|
Morel-Balbi S, Peixoto TP. Null models for multioptimized large-scale network structures. Phys Rev E 2020; 102:032306. [PMID: 33075868 DOI: 10.1103/physreve.102.032306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2020] [Accepted: 08/31/2020] [Indexed: 11/07/2022]
Abstract
We study the emerging large-scale structures in networks subject to selective pressures that simultaneously drive toward higher modularity and robustness against random failures. We construct maximum-entropy null models that isolate the effects of the joint optimization on the network structure from any kind of evolutionary dynamics. Our analysis reveals a rich phase diagram of optimized structures, composed of many combinations of modular, core-periphery, and bipartite patterns. Furthermore, we observe parameter regions where the simultaneous optimization can be either synergistic or antagonistic, with the improvement of one criterion directly aiding or hindering the other, respectively. Our results show how interactions between different selective pressures can be pivotal in determining the emerging network structure, and that these interactions can be captured by simple network models.
Affiliation(s)
- Sebastian Morel-Balbi
  - Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
- Tiago P Peixoto
  - Department of Network and Data Science, Central European University, 1100 Vienna, Austria; ISI Foundation, 10126 Torino, Italy; and Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
|
20
|
Yen TC, Larremore DB. Community detection in bipartite networks with stochastic block models. Phys Rev E 2020; 102:032309. [PMID: 33075933 DOI: 10.1103/physreve.102.032309] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/22/2020] [Accepted: 07/23/2020] [Indexed: 11/07/2022]
Abstract
In bipartite networks, community structures are restricted to being disassortative, in that nodes of one type are grouped according to common patterns of connection with nodes of the other type. This makes the stochastic block model (SBM), a highly flexible generative model for networks with block structure, an intuitive choice for bipartite community detection. However, typical formulations of the SBM do not make use of the special structure of bipartite networks. Here we introduce a Bayesian nonparametric formulation of the SBM and a corresponding algorithm to efficiently find communities in bipartite networks which parsimoniously chooses the number of communities. The biSBM improves community detection results over general SBMs when data are noisy, improves the model resolution limit by a factor of √2, and expands our understanding of the complicated optimization landscape associated with community detection tasks. A direct comparison of certain terms of the prior distributions in the biSBM and a related high-resolution hierarchical SBM also reveals a counterintuitive regime of community detection problems, populated by smaller and sparser networks, where nonhierarchical models outperform their more flexible counterpart.
Affiliation(s)
- Tzu-Chi Yen
  - Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
- Daniel B Larremore
  - Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado 80303, USA
|
21
|
Lu X, Cross B, Szymanski BK. Asymptotic resolution bounds of generalized modularity and multi-scale community detection. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.03.082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
|
22
|
Funke T, Becker T. Stochastic block models: A comparison of variants and inference methods. PLoS One 2019; 14:e0215296. [PMID: 31013290 PMCID: PMC6478296 DOI: 10.1371/journal.pone.0215296] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/28/2018] [Accepted: 03/30/2019] [Indexed: 11/19/2022]
Abstract
Finding communities in complex networks is a challenging task and one promising approach is the Stochastic Block Model (SBM). But the influences from various fields led to a diversity of variants and inference methods. Therefore, a comparison of the existing techniques and an independent analysis of their capabilities and weaknesses is needed. As a first step, we review the development of different SBM variants such as the degree-corrected SBM of Karrer and Newman or Peixoto's hierarchical SBM. Besides stating all these variants in a uniform notation, we show the reasons for their development. Knowing the variants, we discuss a variety of approaches to infer the optimal partition like the Metropolis-Hastings algorithm. We perform our analysis based on our extension of the Girvan-Newman test and the Lancichinetti-Fortunato-Radicchi benchmark as well as a selection of some real world networks. Using these results, we give some guidance to the challenging task of selecting an inference method and SBM variant. In addition, we give a simple heuristic to determine the number of steps for the Metropolis-Hastings algorithms that lack a usual stop criterion. With our comparison, we hope to guide researchers in the field of SBM and highlight the problem of existing techniques to focus future research. Finally, by making our code freely available, we want to promote a faster development, integration and exchange of new ideas.
Affiliation(s)
- Thorben Funke
  - Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
  - Faculty of Production Engineering, University of Bremen, Bremen, Bremen, Germany
- Till Becker
  - Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
  - Faculty of Business Studies, University of Applied Sciences Emden/Leer, Emden, Lower Saxony, Germany
|
23
|
Cantwell GT, Newman MEJ. Mixing patterns and individual differences in networks. Phys Rev E 2019; 99:042306. [PMID: 31108687 DOI: 10.1103/physreve.99.042306] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/05/2018] [Indexed: 11/07/2022]
Abstract
We study mixing patterns in networks, meaning the propensity for nodes of different kinds to connect to one another. The phenomenon of assortative mixing, whereby nodes prefer to connect to others that are similar to themselves, has been widely studied, but here we go further and examine how and to what extent nodes that are otherwise similar can have different preferences. Many individuals in a friendship network, for instance, may prefer friends who are roughly the same age as themselves, but some may display a preference for older or younger friends. We introduce a network model that captures this behavior and a method for fitting it to empirical network data. We propose metrics to characterize the mean and variation of mixing patterns and show how to infer their values from the fitted model, either using maximum-likelihood estimates of model parameters or in a Bayesian framework that does not require fixing any parameters.
Affiliation(s)
- George T Cantwell
  - Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
- M E J Newman
  - Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan 48109, USA
|
24
|
Basurto-Flores R, Guzmán-Vargas L, Velasco S, Medina A, Calvo Hernandez A. On entropy research analysis: cross-disciplinary knowledge transfer. Scientometrics 2018; 117:123-139. [PMID: 30237641 PMCID: PMC6133011 DOI: 10.1007/s11192-018-2860-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/17/2017] [Indexed: 12/02/2022]
Abstract
Our aim is to illustrate how the thermodynamics-based concept of entropy has spread across different areas of knowledge by analyzing the distribution of papers, citations and the use of words related to entropy in the predefined Scopus categories. To achieve this, we analyze the Scopus papers database related to entropy research during the last 20 years, collecting 750 K research papers which directly contain or mention the word entropy. First, some well-recognized works which introduced novel entropy-related definitions are monitored. Then we compare the hierarchical structure which emerges for the different cases of association, which can be in terms of citations among papers, classification of papers in categories or key words in abstracts and titles. Our study allowed us to evaluate, to some extent, the utility and versatility of concepts such as entropy to permeate in different areas of science. Furthermore, the use of specific terms (key words) in titles and abstracts provided a useful way to account for the interaction between areas in the category research space.
Affiliation(s)
- R. Basurto-Flores
  - Instituto Politécnico Nacional, Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, 07340 Ciudad de México, Mexico
- L. Guzmán-Vargas
  - Instituto Politécnico Nacional, Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, 07340 Ciudad de México, Mexico
- S. Velasco
  - Departamento de Física Aplicada, Universidad de Salamanca, 37008 Salamanca, Spain
  - Instituto Universitario de Física Fundamental y Matemáticas (IUFFyM), Universidad de Salamanca, 37008 Salamanca, Spain
- A. Medina
  - Departamento de Física Aplicada, Universidad de Salamanca, 37008 Salamanca, Spain
- A. Calvo Hernandez
  - Departamento de Física Aplicada, Universidad de Salamanca, 37008 Salamanca, Spain
  - Instituto Universitario de Física Fundamental y Matemáticas (IUFFyM), Universidad de Salamanca, 37008 Salamanca, Spain
|
25
|
Hric D, Kaski K, Kivelä M. Stochastic block model reveals maps of citation patterns and their evolution in time. J Informetr 2018. [DOI: 10.1016/j.joi.2018.05.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 10/28/2022]
|
26
|
Kartun-Giles AP, Krioukov D, Gleeson JP, Moreno Y, Bianconi G. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data. ENTROPY (BASEL, SWITZERLAND) 2018; 20:e20040257. [PMID: 33265348 PMCID: PMC7512772 DOI: 10.3390/e20040257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2018] [Revised: 04/04/2018] [Accepted: 04/05/2018] [Indexed: 06/12/2023]
Abstract
A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.
Affiliation(s)
- Dmitri Krioukov
  - Departments of Physics, Mathematics, and Electrical & Computer Engineering, Northeastern University, Boston 02120, MA, USA
- James P. Gleeson
  - MACSI, Department of Mathematics and Statistics, University of Limerick, Limerick V94 T9PX, Ireland
- Yamir Moreno
  - Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza 50013, Spain
  - Department of Theoretical Physics, Faculty of Sciences, University of Zaragoza, Zaragoza 50013, Spain
  - Institute for Scientific Interchange (ISI Foundation), Turin 10121, Italy
  - Complexity Science Hub Vienna, Vienna 22180, Austria
- Ginestra Bianconi
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK
|
27
|
Young JG, Desrosiers P, Hébert-Dufresne L, Laurence E, Dubé LJ. Finite-size analysis of the detectability limit of the stochastic block model. Phys Rev E 2017; 95:062304. [PMID: 28709195 DOI: 10.1103/physreve.95.062304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/31/2016] [Indexed: 06/07/2023]
Abstract
It has been shown in recent years that the stochastic block model is sometimes undetectable in the sparse limit, i.e., that no algorithm can identify a partition correlated with the partition used to generate an instance, if the instance is sparse enough and infinitely large. In this contribution, we treat the finite case explicitly, using arguments drawn from information theory and statistics. We give a necessary condition for finite-size detectability in the general SBM. We then distinguish the concept of average detectability from the concept of instance-by-instance detectability and give explicit formulas for both definitions. Using these formulas, we prove that there exist large equivalence classes of parameters, where widely different network ensembles are equally detectable with respect to our definitions of detectability. In an extensive case study, we investigate the finite-size detectability of a simplified variant of the SBM, which encompasses a number of important models as special cases. These models include the symmetric SBM, the planted coloring model, and more exotic SBMs not previously studied. We conclude with three appendices, where we study the interplay of noise and detectability, establish a connection between our information-theoretic approach and random matrix theory, and provide proofs of some of the more technical results.
Affiliation(s)
- Jean-Gabriel Young
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
- Patrick Desrosiers
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
  - Centre de recherche de l'Institut universitaire en santé mentale de Québec, Québec (Québec), Canada G1J 2G3
- Edward Laurence
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
- Louis J Dubé
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
|
28
|
Peel L, Larremore DB, Clauset A. The ground truth about metadata and community detection in networks. SCIENCE ADVANCES 2017; 3:e1602548. [PMID: 28508065 PMCID: PMC5415338 DOI: 10.1126/sciadv.1602548] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Received: 10/17/2016] [Accepted: 03/08/2017] [Indexed: 05/30/2023]
Abstract
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
Affiliation(s)
- Leto Peel
  - Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
  - naXys, Université de Namur, Namur, Belgium
- Aaron Clauset
  - Santa Fe Institute, Santa Fe, NM 87501, USA
  - Department of Computer Science, University of Colorado, Boulder, CO 80309, USA
  - BioFrontiers Institute, University of Colorado, Boulder, CO 80309, USA
|
29
|
Pelillo M, Elezi I, Fiorucci M. Revealing structure in large graphs: Szemerédi’s regularity lemma and its use in pattern recognition. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2016.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
30
|
Peixoto TP. Nonparametric Bayesian inference of the microcanonical stochastic block model. Phys Rev E 2017; 95:012317. [PMID: 28208453 DOI: 10.1103/physreve.95.012317] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 10/20/2016] [Indexed: 11/07/2022]
Abstract
A principled approach to characterize the hidden structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
Affiliation(s)
- Tiago P Peixoto
  - Department of Mathematical Sciences and Centre for Networks and Collective Behaviour, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom and ISI Foundation, Via Alassio 11/c, 10126 Torino, Italy
|
31
|
Young JG, Hébert-Dufresne L, Allard A, Dubé LJ. Growing networks of overlapping communities with internal structure. Phys Rev E 2016; 94:022317. [PMID: 27627327 DOI: 10.1103/physreve.94.022317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/17/2016] [Indexed: 06/06/2023]
Abstract
We introduce an intuitive model that describes both the emergence of community structure and the evolution of the internal structure of communities in growing social networks. The model comprises two complementary mechanisms: One mechanism accounts for the evolution of the internal link structure of a single community, and the second mechanism coordinates the growth of multiple overlapping communities. The first mechanism is based on the assumption that each node establishes links with its neighbors and introduces new nodes to the community at different rates. We demonstrate that this simple mechanism gives rise to an effective maximal degree within communities. This observation is related to the anthropological theory known as Dunbar's number, i.e., the empirical observation of a maximal number of ties which an average individual can sustain within its social groups. The second mechanism is based on a recently proposed generalization of preferential attachment to community structure, appropriately called structural preferential attachment (SPA). The combination of these two mechanisms into a single model (SPA+) allows us to reproduce a number of the global statistics of real networks: The distribution of community sizes, of node memberships, and of degrees. The SPA+ model also predicts (a) three qualitative regimes for the degree distribution within overlapping communities and (b) strong correlations between the number of communities to which a node belongs and its number of connections within each community. We present empirical evidence that supports our findings in real complex networks.
Affiliation(s)
- Jean-Gabriel Young
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
- Laurent Hébert-Dufresne
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
  - Santa Fe Institute, Santa Fe, New Mexico 87501, USA
- Antoine Allard
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
  - Departament de Física de la Matèria Condensada, Universitat de Barcelona, Martí i Franquès 1, E-08028 Barcelona, Spain
  - Universitat de Barcelona Institute of Complex Systems (UBICS), Universitat de Barcelona, Barcelona, Spain
- Louis J Dubé
  - Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
|
32
|
Iacovacci J, Bianconi G. Extracting information from multiplex networks. CHAOS (WOODBURY, N.Y.) 2016; 26:065306. [PMID: 27368796 DOI: 10.1063/1.4953161] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]
Abstract
Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes is connected by links that have different connotations. Multiplex networks are ubiquitous since they describe social, financial, engineering, and biological networks as well. Extending our ability to analyze complex networks to multiplex network structures greatly increases the level of information that it is possible to extract from big data. For these reasons, characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper, we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function Θ̃(S) for describing their mesoscale organization and community structure. As working examples for studying these measures, we consider three multiplex network datasets coming from social science.
Affiliation(s)
- Jacopo Iacovacci
  - School of Mathematical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom
- Ginestra Bianconi
  - School of Mathematical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom
|
33
|
Traxl D, Boers N, Kurths J. Deep graphs-A general framework to represent and analyze heterogeneous complex systems across scales. CHAOS (WOODBURY, N.Y.) 2016; 26:065303. [PMID: 27368793 DOI: 10.1063/1.4952963] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
Network theory has proven to be a powerful tool in describing and analyzing systems by modelling the relations between their constituent objects. Particularly in recent years, great progress has been made by augmenting "traditional" network theory in order to account for the multiplex nature of many networks, multiple types of connections between objects, the time-evolution of networks, networks of networks and other intricacies. However, existing network representations still lack crucial features in order to serve as a general data analysis tool. These include, most importantly, an explicit association of information with possibly heterogeneous types of objects and relations, and a conclusive representation of the properties of groups of nodes as well as the interactions between such groups on different scales. In this paper, we introduce a collection of definitions resulting in a framework that, on the one hand, entails and unifies existing network representations (e.g., network of networks and multilayer networks), and on the other hand, generalizes and extends them by incorporating the above features. To implement these features, we first specify the nodes and edges of a finite graph as sets of properties (which are permitted to be arbitrary mathematical objects). Second, the mathematical concept of partition lattices is transferred to network theory in order to demonstrate how partitioning the node and edge set of a graph into supernodes and superedges allows us to aggregate, compute, and allocate information on and between arbitrary groups of nodes. The derived partition lattice of a graph, which we denote by deep graph, constitutes a concise, yet comprehensive representation that enables the expression and analysis of heterogeneous properties, relations, and interactions on all scales of a complex system in a self-contained manner.
Furthermore, to be able to utilize existing network-based methods and models, we derive different representations of multilayer networks from our framework and demonstrate the advantages of our representation. On the basis of the formal framework described here, we provide a rich, fully scalable (and self-explanatory) software package that integrates into the PyData ecosystem and offers interfaces to popular network packages, making it a powerful, general-purpose data analysis toolkit. We exemplify an application of deep graphs using a real world dataset, comprising 16 years of satellite-derived global precipitation measurements. We deduce a deep graph representation of these measurements in order to track and investigate local formations of spatio-temporal clusters of extreme precipitation events.
Affiliation(s)
- Dominik Traxl
  - Department of Physics, Humboldt Universität zu Berlin, Berlin, Germany
- Niklas Boers
  - Potsdam Institute for Climate Impact Research, Potsdam, Germany
- Jürgen Kurths
  - Department of Physics, Humboldt Universität zu Berlin, Berlin, Germany
|
34
|
Krioukov D. Clustering Implies Geometry in Networks. PHYSICAL REVIEW LETTERS 2016; 116:208302. [PMID: 27258887 DOI: 10.1103/physrevlett.116.208302] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/02/2016] [Indexed: 06/05/2023]
Abstract
Network models with latent geometry have been used successfully in many applications in network science and other disciplines, yet it is usually impossible to tell if a given real network is geometric, meaning if it is a typical element in an ensemble of random geometric graphs. Here we identify structural properties of networks that guarantee that random graphs having these properties are geometric. Specifically we show that random graphs in which expected degree and clustering of every node are fixed to some constants are equivalent to random geometric graphs on the real line, if clustering is sufficiently strong. Large numbers of triangles, homogeneously distributed across all nodes as in real networks, are thus a consequence of network geometricity. The methods we use to prove this are quite general and applicable to other network ensembles, geometric or not, and to certain problems in quantum gravity.
Affiliation(s)
- Dmitri Krioukov
  - Northeastern University, Departments of Physics, Mathematics, and Electrical and Computer Engineering, Boston, Massachusetts 02115, USA
|
35
|
Šubelj L, van Eck NJ, Waltman L. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods. PLoS One 2016; 11:e0154404. [PMID: 27124610 PMCID: PMC4849655 DOI: 10.1371/journal.pone.0154404] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 12/30/2015] [Accepted: 04/13/2016] [Indexed: 11/19/2022] Open
Abstract
Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.
Affiliation(s)
- Lovro Šubelj
  - University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
- Nees Jan van Eck
  - Leiden University, Centre for Science and Technology Studies, Leiden, Netherlands
- Ludo Waltman
  - Leiden University, Centre for Science and Technology Studies, Leiden, Netherlands
|
36
|
Génois M, Vestergaard CL, Cattuto C, Barrat A. Compensating for population sampling in simulations of epidemic spread on temporal contact networks. Nat Commun 2015; 6:8860. [PMID: 26563418 PMCID: PMC4660211 DOI: 10.1038/ncomms9860] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 04/13/2015] [Accepted: 10/09/2015] [Indexed: 11/09/2022] Open
Abstract
Data describing human interactions often suffer from incomplete sampling of the underlying population. As a consequence, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk. Here we present a systematic method to alleviate this issue and obtain a better estimation of the risk in the context of epidemic models informed by high-resolution time-resolved contact data. We consider several such data sets collected in various contexts and perform controlled resampling experiments. We show how the statistical information contained in the resampled data can be used to build a series of surrogate versions of the unknown contacts. We simulate epidemic processes on the resulting reconstructed data sets and show that it is possible to obtain good estimates of the outcome of simulations performed using the complete data set. We discuss limitations and potential improvements of our method.
Affiliation(s)
- Mathieu Génois
  - Aix Marseille Université, Université de Toulon, CNRS, CPT, UMR 7332, 13288 Marseille, France
- Ciro Cattuto
  - Data Science Laboratory, ISI Foundation, 10126 Torino, Italy
- Alain Barrat
  - Aix Marseille Université, Université de Toulon, CNRS, CPT, UMR 7332, 13288 Marseille, France
  - Data Science Laboratory, ISI Foundation, 10126 Torino, Italy
|
37
|
Sagarra O, Pérez Vicente CJ, Díaz-Guilera A. Role of adjacency-matrix degeneracy in maximum-entropy-weighted network models. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:052816. [PMID: 26651753 DOI: 10.1103/physreve.92.052816] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/03/2015] [Indexed: 06/05/2023]
Abstract
Complex network null models based on entropy maximization are becoming a powerful tool to characterize and analyze data from real systems. However, it is not easy to extract good and unbiased information from these models: A proper understanding of the nature of the underlying events represented in them is crucial. In this paper we emphasize this fact stressing how an accurate counting of configurations compatible with given constraints is fundamental to build good null models for the case of networks with integer-valued adjacency matrices constructed from an aggregation of one or multiple layers. We show how different assumptions about the elements from which the networks are built give rise to distinctively different statistics, even when considering the same observables to match those of real data. We illustrate our findings by applying the formalism to three data sets using an open-source software package accompanying the present work and demonstrate how such differences are clearly seen when measuring network observables.
Affiliation(s)
- O Sagarra
  - Departament de Física Fonamental, Universitat de Barcelona, 08028 Barcelona, Spain
- C J Pérez Vicente
  - Departament de Física Fonamental, Universitat de Barcelona, 08028 Barcelona, Spain
- A Díaz-Guilera
  - Departament de Física Fonamental, Universitat de Barcelona, 08028 Barcelona, Spain
|
38
|
Iacovacci J, Wu Z, Bianconi G. Mesoscopic structures reveal the network between the layers of multiplex data sets. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:042806. [PMID: 26565288 DOI: 10.1103/physreve.92.042806] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 05/14/2015] [Indexed: 06/05/2023]
Abstract
Multiplex networks describe a large variety of complex systems, whose elements (nodes) can be connected by different types of interactions forming different layers (networks) of the multiplex. Multiplex networks include social networks, transportation networks, or biological networks in the cell or in the brain. Extracting relevant information from these networks is of crucial importance for solving challenging inference problems and for characterizing the multiplex networks' microscopic and mesoscopic structure. Here we propose an information theory method to extract the network between the layers of multiplex data sets, forming a "network of networks." We build an indicator function, based on the entropy of network ensembles, to characterize the mesoscopic similarities between the layers of a multiplex network, and we use clustering techniques to characterize the communities present in this network of networks. We apply the proposed method to study the Multiplex Collaboration Network formed by scientists collaborating on different subjects and publishing in the American Physical Society journals. The analysis of this data set reveals the interplay between the collaboration networks and the organization of knowledge in physics.
Affiliation(s)
- Jacopo Iacovacci
  - School of Mathematical Sciences, Queen Mary University of London, London, United Kingdom
- Zhihao Wu
  - School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Ginestra Bianconi
  - School of Mathematical Sciences, Queen Mary University of London, London, United Kingdom
|
39
|
Peixoto TP. Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:042807. [PMID: 26565289 DOI: 10.1103/physreve.92.042807] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 06/18/2015] [Indexed: 05/24/2023]
Abstract
Many network systems are composed of interdependent but distinct types of interactions, which cannot be fully understood in isolation. These different types of interactions are often represented as layers, attributes on the edges, or as a time dependence of the network structure. Although they are crucial for a more comprehensive scientific understanding, these representations offer substantial challenges. Namely, it is an open problem how to precisely characterize the large or mesoscale structure of network systems in relation to these additional aspects. Furthermore, the direct incorporation of these features invariably increases the effective dimension of the network description, and hence aggravates the problem of overfitting, i.e., the use of overly complex characterizations that mistake purely random fluctuations for actual structure. In this work, we propose a robust and principled method to tackle these problems, by constructing generative models of modular network structure, incorporating layered, attributed and time-varying properties, as well as a nonparametric Bayesian methodology to infer the parameters from data and select the most appropriate model according to statistical evidence. We show that the method is capable of revealing hidden structure in layered, edge-valued, and time-varying networks, and that the most appropriate level of granularity with respect to the additional dimensions can be reliably identified. We illustrate our approach on a variety of empirical systems, including a social network of physicians, the voting correlations of deputies in the Brazilian national congress, the global airport network, and a proximity network of high-school students.
Affiliation(s)
- Tiago P Peixoto
  - Institut für Theoretische Physik, Universität Bremen, Hochschulring 18, D-28359 Bremen, Germany
|
40
|
Kawamoto T, Kabashima Y. Limitations in the spectral method for graph partitioning: Detectability threshold and localization of eigenvectors. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 91:062803. [PMID: 26172750 DOI: 10.1103/physreve.91.062803] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/25/2015] [Indexed: 06/04/2023]
Abstract
Investigating the performance of different methods is a fundamental problem in graph partitioning. In this paper, we estimate the so-called detectability threshold for the spectral method with both un-normalized and normalized Laplacians in sparse graphs. The detectability threshold is the critical point at which the result of the spectral method is completely uncorrelated to the planted partition. We also analyze whether the localization of eigenvectors affects the partitioning performance in the detectable region. We use the replica method, which is often used in the field of spin-glass theory, and focus on the case of bisection. We show that the gap between the estimated threshold for the spectral method and the threshold obtained from Bayesian inference is considerable in sparse graphs, even without eigenvector localization. This gap closes in a dense limit.
Affiliation(s)
- Tatsuro Kawamoto
  - Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259-G5-22, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8502, Japan
- Yoshiyuki Kabashima
  - Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259-G5-22, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8502, Japan
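As a rough illustration of the spectral bisection that the Kawamoto and Kabashima abstract analyzes, the sketch below splits a small graph by the sign of an approximate Fiedler vector of the unnormalized Laplacian. It is not the authors' analysis or code: the function name `fiedler_bisection`, the shifted power iteration, and the toy graph of two cliques joined by one edge are all assumptions made for this example.

```python
def fiedler_bisection(n, edges, iterations=500):
    """Bisect a connected, undirected graph by the sign of an approximate
    Fiedler vector of the unnormalized Laplacian L = D - A.

    Hypothetical helper: uses power iteration on (shift*I - L) after
    projecting out the constant vector, so no linear-algebra library is needed.
    """
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    degree = [len(nb) for nb in neighbors]
    shift = 2 * max(degree)  # upper bound on the largest Laplacian eigenvalue

    vec = [float(i) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        mean = sum(vec) / n
        vec = [x - mean for x in vec]  # remove the constant (trivial) eigenvector
        # Apply (shift*I - L) to vec.
        new = [(shift - degree[i]) * vec[i] + sum(vec[j] for j in neighbors[i])
               for i in range(n)]
        norm = sum(x * x for x in new) ** 0.5
        vec = [x / norm for x in new]
    return [0 if x < 0 else 1 for x in vec]


# Toy planted partition (an assumption): two 5-cliques joined by one edge.
clique_a = [(i, j) for i in range(5) for j in range(i + 1, 5)]
clique_b = [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges = clique_a + clique_b + [(4, 5)]

labels = fiedler_bisection(10, edges)
print(labels)
```

On a graph this strongly clustered the sign split recovers the two cliques; the abstract concerns the sparse regime, where such a split can become uncorrelated with the planted partition.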
|
41
|
Peixoto TP. Model Selection and Hypothesis Testing for Large-Scale Network Models with Overlapping Groups. PHYSICAL REVIEW X 2015; 5:011033. [DOI: 10.1103/physrevx.5.011033] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 01/03/2025]
|
42
|
Priester C, Schmitt S, Peixoto TP. Limits and trade-offs of topological network robustness. PLoS One 2014; 9:e108215. [PMID: 25250565 PMCID: PMC4176960 DOI: 10.1371/journal.pone.0108215] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/23/2014] [Accepted: 08/26/2014] [Indexed: 11/18/2022] Open
Abstract
We investigate the trade-off between the robustness against random and targeted removal of nodes from a network. To this end we utilize the stochastic block model to study ensembles of infinitely large networks with arbitrary large-scale structures. We present results from numerical two-objective optimization simulations for networks with various fixed mean degree and number of blocks. The results provide strong evidence that three different blocks are sufficient to realize the best trade-off between the two measures of robustness, i.e. to obtain the complete front of Pareto-optimal networks. For all values of the mean degree, a characteristic three block structure emerges over large parts of the Pareto-optimal front. This structure can be often characterized as a core-periphery structure, composed of a group of core nodes with high degree connected among themselves and to a periphery of low-degree nodes, in addition to a third group of nodes which is disconnected from the periphery, and weakly connected to the core. Only at both extremes of the Pareto-optimal front, corresponding to maximal robustness against random and targeted node removal, a two-block core-periphery structure or a one-block fully random network are found, respectively.
Affiliation(s)
- Christopher Priester
  - Institut für Festkörperphysik, Technische Universität Darmstadt, Darmstadt, Germany
- Tiago P. Peixoto
  - Institut für Theoretische Physik, Universität Bremen, Bremen, Germany
|
43
|
Herlau T, Schmidt MN, Mørup M. Infinite-degree-corrected stochastic block model. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2014; 90:032819. [PMID: 25314493 DOI: 10.1103/physreve.90.032819] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/09/2013] [Indexed: 06/04/2023]
Abstract
In stochastic block models, which are among the most prominent statistical models for cluster analysis of complex networks, clusters are defined as groups of nodes with statistically similar link probabilities within and between groups. A recent extension by Karrer and Newman [Karrer and Newman, Phys. Rev. E 83, 016107 (2011)] incorporates a node degree correction to model degree heterogeneity within each group. Although this demonstrably leads to better performance on several networks, it is not obvious whether modeling node degree is always appropriate or necessary. We formulate the degree corrected stochastic block model as a nonparametric Bayesian model, incorporating a parameter to control the amount of degree correction that can then be inferred from data. Additionally, our formulation yields principled ways of inferring the number of groups as well as predicting missing links in the network that can be used to quantify the model's predictive performance. On synthetic data we demonstrate that including the degree correction yields better performance on both recovering the true group structure and predicting missing links when degree heterogeneity is present, whereas performance is on par for data with no degree heterogeneity within clusters. On seven real networks (with no ground truth group structure available) we show that predictive performance is about equal whether or not degree correction is included; however, for some networks significantly fewer clusters are discovered when correcting for degree, indicating that the data can be more compactly explained by clusters of heterogeneous degree nodes.
Affiliation(s)
- Tue Herlau
  - Section for Cognitive Systems, DTU Compute, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Mikkel N Schmidt
  - Section for Cognitive Systems, DTU Compute, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Morten Mørup
  - Section for Cognitive Systems, DTU Compute, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
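To make the contrast between the plain and degree-corrected block models in the abstract above concrete, the sketch below scores a node partition with the two profile log-likelihood objectives usually attributed to Karrer and Newman (2011), the paper the abstract cites. It does not implement the nonparametric Bayesian model of Herlau et al.; the function names and the toy graph of two triangles joined by one edge are assumptions for this example.

```python
from math import log


def block_counts(edges, labels):
    """Return (m, size, kappa): m[r][s] counts edge ends between blocks r and s
    (within-block edges counted twice), size[r] is the number of nodes in
    block r, and kappa[r] is the sum of degrees in block r."""
    blocks = sorted(set(labels))
    index = {b: i for i, b in enumerate(blocks)}
    num_blocks = len(blocks)
    m = [[0] * num_blocks for _ in range(num_blocks)]
    size = [0] * num_blocks
    kappa = [0] * num_blocks
    for lab in labels:
        size[index[lab]] += 1
    for u, v in edges:
        r, s = index[labels[u]], index[labels[v]]
        m[r][s] += 1
        m[s][r] += 1  # for r == s this adds 2 in total, as intended
        kappa[r] += 1
        kappa[s] += 1
    return m, size, kappa


def sbm_score(edges, labels, degree_corrected):
    """Unnormalized log-likelihood: sum_rs m_rs * log(m_rs / (w_r * w_s)),
    with w = block sizes (plain model) or block degree sums (degree-corrected)."""
    m, size, kappa = block_counts(edges, labels)
    weight = kappa if degree_corrected else size
    total = 0.0
    for r in range(len(m)):
        for s in range(len(m)):
            if m[r][s] > 0:
                total += m[r][s] * log(m[r][s] / (weight[r] * weight[s]))
    return total


# Toy graph (an assumption): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
planted = [0, 0, 0, 1, 1, 1]
alternating = [0, 1, 0, 1, 0, 1]

for corrected in (False, True):
    print(corrected,
          sbm_score(edges, planted, corrected),
          sbm_score(edges, alternating, corrected))
```

Higher scores are better under either objective. In this toy case both objectives prefer the planted split; the two can disagree on networks with strong degree heterogeneity within groups.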
|
44
|
Anand K, Krioukov D, Bianconi G. Entropy distribution and condensation in random networks with a given degree distribution. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2014; 89:062807. [PMID: 25019833 DOI: 10.1103/physreve.89.062807] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/24/2014] [Indexed: 06/03/2023]
Abstract
The entropy of network ensembles characterizes the amount of information encoded in the network structure and can be used to quantify network complexity and the relevance of given structural properties observed in real network datasets with respect to a random hypothesis. In many real networks the degrees of individual nodes are not fixed but change in time, while their statistical properties, such as the degree distribution, are preserved. Here we characterize the distribution of entropy of random networks with given degree sequences, where each degree sequence is drawn randomly from a given degree distribution. We show that the leading term of the entropy of scale-free network ensembles depends only on the network size and average degree and that entropy is self-averaging, meaning that its relative variance vanishes in the thermodynamic limit. We also characterize large fluctuations of entropy that are fully determined by the average degree in the network. Finally, above a certain threshold, large fluctuations of the average degree in the ensemble can lead to condensation, meaning that a single node in a network of size N can attract O(N) links.
Affiliation(s)
- Kartik Anand
  - Bank of Canada, 234 Laurier Ave West, Ottawa, Ontario K1A 0G9, Canada
- Dmitri Krioukov
  - Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
- Ginestra Bianconi
  - School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK
|
45
|
Halu A, Mukherjee S, Bianconi G. Emergence of overlap in ensembles of spatial multiplexes and statistical mechanics of spatial interacting network ensembles. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2014; 89:012806. [PMID: 24580280 DOI: 10.1103/physreve.89.012806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2013] [Indexed: 05/09/2023]
Abstract
Spatial networks range from brain networks to transportation networks and infrastructures. Recently, interacting and multiplex networks have been attracting great attention because their dynamics and robustness cannot be understood without treating several networks at the same time. Here we present maximal entropy ensembles of spatial multiplex and spatial interacting networks that can be used in order to model spatial multilayer network structures and to build null models of real data sets. We show that spatial multiplexes naturally develop a significant overlap of the links, a noticeable property of many multiplexes that can significantly affect the dynamics taking place on them. Additionally, we characterize ensembles of spatial interacting networks and we analyze the structure of interacting airport and railway networks in India, showing the effect of space in determining the link probability.
Affiliation(s)
- Arda Halu
  - Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
- Satyam Mukherjee
  - Kellogg School of Management, Northwestern University, Evanston, Illinois 60208, USA
- Ginestra Bianconi
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
|
46
|
Peixoto TP. Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2014; 89:012804. [PMID: 24580278 DOI: 10.1103/physreve.89.012804] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 10/17/2013] [Indexed: 05/22/2023]
Abstract
We present an efficient algorithm for the inference of stochastic block models in large networks. The algorithm can be used as an optimized Markov chain Monte Carlo (MCMC) method, with a fast mixing time and a much reduced susceptibility to getting trapped in metastable states, or as a greedy agglomerative heuristic, with an almost linear O(N ln² N) complexity, where N is the number of nodes in the network, independent of the number of blocks being inferred. We show that the heuristic is capable of delivering results which are indistinguishable from the more exact and numerically expensive MCMC method in many artificial and empirical networks, despite being much faster. The method is entirely unbiased towards any specific mixing pattern, and in particular it does not favor assortative community structures.
Affiliation(s)
- Tiago P Peixoto
  - Institut für Theoretische Physik, Universität Bremen, Hochschulring 18, D-28359 Bremen, Germany
|
47
|
Fronczak P, Fronczak A, Bujok M. Exponential random graph models for networks with community structure. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2013; 88:032810. [PMID: 24125315 DOI: 10.1103/physreve.88.032810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/17/2013] [Indexed: 06/02/2023]
Abstract
Although community structure is an important characteristic of real-world networks, most traditional network models fail to reproduce this feature. Therefore, the models are useless as benchmark graphs for testing community detection algorithms. They are also inadequate to predict various properties of real networks. With this paper we intend to fill the gap. We develop an exponential random graph approach to networks with community structure. To this end we mainly build upon the idea of blockmodels. We consider both the classical blockmodel and its degree-corrected counterpart and study many of their properties analytically. We show that in the degree-corrected blockmodel, node degrees display an interesting scaling property, which is reminiscent of what is observed in real-world fractal networks. A short description of Monte Carlo simulations of the models is also given in the hope of being useful to others working in the field.
Affiliation(s)
- Piotr Fronczak
  - Faculty of Physics, Warsaw University of Technology, Koszykowa 75, PL-00-662 Warsaw, Poland
|
48
|
Bianconi G. Statistical mechanics of multiplex networks: entropy and overlap. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2013; 87:062806. [PMID: 23848728 DOI: 10.1103/physreve.87.062806] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Received: 03/15/2013] [Indexed: 05/16/2023]
Abstract
There is growing interest in multiplex networks where individual nodes take part in several layers of networks simultaneously. This is the case, for example, in social networks where each individual node has different kinds of social ties or transportation systems where each location is connected to another location by different types of transport. Many of these multiplexes are characterized by a significant overlap of the links in different layers. In this paper we introduce a statistical mechanics framework to describe multiplex ensembles. A multiplex is a system formed by N nodes and M layers of interactions where each node belongs to the M layers at the same time. Each layer α is formed by a network G^{α}. Here we introduce the concept of correlated multiplex ensembles in which the existence of a link in one layer is correlated with the existence of a link in another layer. This implies that a typical multiplex of the ensemble can have a significant overlap of the links in the different layers. Moreover, we characterize microcanonical and canonical multiplex ensembles satisfying respectively hard and soft constraints and we discuss how to construct multiplexes in these ensembles. Finally, we provide the expression for the entropy of these ensembles that can be useful to address different inference problems involving multiplexes.
Affiliation(s)
- Ginestra Bianconi
- School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
|
49
|
Peixoto TP. Parsimonious module inference in large networks. PHYSICAL REVIEW LETTERS 2013; 110:148701. [PMID: 25167049 DOI: 10.1103/physrevlett.110.148701] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 12/19/2012] [Indexed: 06/03/2023]
Abstract
We investigate the detectability of modules in large networks when the number of modules is not known in advance. We employ the minimum description length principle, which seeks to minimize the total amount of information required to describe the network and to avoid overfitting. According to this criterion, we obtain general bounds on the detectability of any prescribed block structure, given the number of nodes and edges in the sampled network. We also find that the maximum number of detectable blocks scales as √N, where N is the number of nodes in the network, for a fixed average degree ⟨k⟩. We also show that the simplicity of the minimum description length approach yields an efficient multilevel Monte Carlo inference algorithm with a complexity of O(τN log N) if the number of blocks is unknown, and O(τN) if it is known, where τ is the mixing time of the Markov chain. We illustrate the application of the method on a large network of actors and films with over 10^6 edges and a disassortative, bipartite block structure.
Affiliation(s)
- Tiago P Peixoto
- Institut für Theoretische Physik, Universität Bremen, Hochschulring 18, D-28359 Bremen, Germany
|
50
|
Peixoto TP, Bornholdt S. Evolution of robust network topologies: emergence of central backbones. PHYSICAL REVIEW LETTERS 2012; 109:118703. [PMID: 23005691 DOI: 10.1103/physrevlett.109.118703] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 05/11/2012] [Indexed: 06/01/2023]
Abstract
We model the robustness against random failure or an intentional attack of networks with an arbitrary large-scale structure. We construct a block-based model which incorporates, in a general fashion, both connectivity and interdependence links, as well as arbitrary degree distributions and block correlations. By optimizing the percolation properties of this general class of networks, we identify a simple core-periphery structure as the topology most robust against random failure. In such networks, a distinct and small "core" of nodes with higher degree is responsible for most of the connectivity, functioning as a central "backbone" of the system. This centralized topology remains the optimal structure when other constraints are imposed, such as a given fraction of interdependence links and fixed degree distributions. This distinguishes simple centralized topologies as the most likely to emerge when robustness against failure is the dominant evolutionary force.
Affiliation(s)
- Tiago P Peixoto
- Institut für Theoretische Physik, Universität Bremen, Hochschulring 18, D-28359 Bremen, Germany
|