1
|
Gaume B, Achitouv I, Chavalarias D. Two antagonistic objectives for one multi-scale graph clustering framework. Sci Rep 2025; 15:13368. [PMID: 40246874 PMCID: PMC12006389 DOI: 10.1038/s41598-025-90454-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 02/13/2025] [Indexed: 04/19/2025]
Abstract
In the current state of knowledge, there is no consensus on an objective criterion for evaluating network communities as cohesive sets of nodes with the following two properties: first, each community is densely connected; second, communities are weakly connected to each other. This makes it difficult to conduct comparative studies between the dozens of graph clustering methods proposed over more than 20 years. To fill this gap, we propose a graph clustering framework that faithfully formalizes the first property with precision and the second with recall, two simple, meaningful and well-known metrics already widely used for many tasks in most sciences. The meaning of these metrics in the context of graph clustering is therefore easily interpretable by most users of real-world graphs. We show that for most graphs these two metrics are antagonistic, i.e. there is no solution that simultaneously maximizes precision and recall. In other words, to select a clustering among the Pareto optimal solutions (clusterings such that no other clustering exists that increases both precision and recall) we must first make a subjective compromise, according to our needs, between the two properties. We then show how to use this framework to compare, even without 'ground truth', the performance of five hitherto incommensurable state-of-the-art clustering methods, as well as that of a new family of clustering methods inspired by our approach.
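The antagonism between the two metrics can be illustrated with a minimal Python sketch. It assumes a pairwise reading that the abstract does not spell out: a clustering "predicts" a link for every pair of nodes placed in the same cluster, precision is the share of those pairs that are actual edges, and recall is the share of edges that fall inside a cluster. The function name `clustering_precision_recall` and the toy graph are hypothetical.

```python
from itertools import combinations

def clustering_precision_recall(edges, clusters):
    """Pairwise precision and recall of a clustering against the edge set.

    Assumed reading (not taken from the paper): every pair of nodes that
    shares a cluster counts as a predicted link.
    """
    edge_set = {frozenset(e) for e in edges}
    predicted = set()
    for cluster in clusters:
        predicted.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    hits = len(predicted & edge_set)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(edge_set) if edge_set else 0.0
    return precision, recall

# Toy graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
two_clusters = [{1, 2, 3}, {4, 5, 6}]
one_cluster = [{1, 2, 3, 4, 5, 6}]

p_two, r_two = clustering_precision_recall(edges, two_clusters)
p_one, r_one = clustering_precision_recall(edges, one_cluster)
print(p_two, r_two)
print(p_one, r_one)
```

On this toy graph the two-cluster solution reaches full precision but misses the bridge edge, while the single-cluster solution reaches full recall at the cost of precision, so neither dominates the other.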
Affiliation(s)
- Bruno Gaume
- Cognition, Langues, Langage, Ergonomie (CLLE, UMR 5263), CNRS, Paris, France.
- Complex Systems Institute of Paris île-de-France (ISC-PIF, UAR3611), Paris, France.
| | - Ixandra Achitouv
- Complex Systems Institute of Paris île-de-France (ISC-PIF, UAR3611), Paris, France
| | - David Chavalarias
- Centre d'Analyse et de Mathématique Sociales (CAMS, UMR8557), Paris, France.
- Complex Systems Institute of Paris île-de-France (ISC-PIF, UAR3611), Paris, France.
|
2
|
Russell M, Aqil A, Saitou M, Gokcumen O, Masuda N. Gene communities in co-expression networks across different tissues. PLoS Comput Biol 2023; 19:e1011616. [PMID: 37976327 PMCID: PMC10691702 DOI: 10.1371/journal.pcbi.1011616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 12/01/2023] [Accepted: 10/19/2023] [Indexed: 11/19/2023]
Abstract
With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further found gene co-expression communities where the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.
Affiliation(s)
- Madison Russell
- Department of Mathematics, State University of New York at Buffalo, Buffalo, New York, United States of America
| | - Alber Aqil
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
| | - Marie Saitou
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Omer Gokcumen
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
| | - Naoki Masuda
- Department of Mathematics, State University of New York at Buffalo, Buffalo, New York, United States of America
- Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, New York, United States of America
|
3
|
Wang MB, Boring MJ, Ward MJ, Richardson RM, Ghuman AS. Deep brain stimulation for Parkinson's disease induces spontaneous cortical hypersynchrony in extended motor and cognitive networks. Cereb Cortex 2022; 32:4480-4491. [PMID: 35136991 PMCID: PMC9574237 DOI: 10.1093/cercor/bhab496] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/17/2021] [Revised: 12/04/2021] [Accepted: 12/05/2021] [Indexed: 11/14/2022]
Abstract
The mechanism of action of deep brain stimulation (DBS) to the basal ganglia for Parkinson's disease remains unclear. Studies have shown that DBS decreases pathological beta hypersynchrony between the basal ganglia and motor cortex. However, little is known about DBS's effects on long range corticocortical synchronization. Here, we use machine learning combined with graph theory to compare resting-state cortical connectivity between the off and on-stimulation states and to healthy controls. We found that turning DBS on increased high beta and gamma band synchrony (26 to 50 Hz) in a cortical circuit spanning the motor, occipitoparietal, middle temporal, and prefrontal cortices. The synchrony in this network was greater in DBS on relative to both DBS off and controls, with no significant difference between DBS off and controls. Turning DBS on also increased network efficiency and strength and subnetwork modularity relative to both DBS off and controls in the beta and gamma band. Thus, unlike DBS's subcortical normalization of pathological basal ganglia activity, it introduces greater synchrony relative to healthy controls in cortical circuitry that includes both motor and non-motor systems. This increased high beta/gamma synchronization may reflect compensatory mechanisms related to DBS's clinical benefits, as well as undesirable non-motor side effects.
Affiliation(s)
- Maxwell B Wang
- Medical Scientist Training Program, University of Pittsburgh School of Medicine; Program of Neural Computation, Carnegie Mellon University, Pittsburgh, PA 15213
| | - Matthew J Boring
- Center for Neuroscience at the University of Pittsburgh, Pittsburgh, PA 15213, USA; Center for the Neural Basis of Cognition, University of Pittsburgh and Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Michael J Ward
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - R Mark Richardson
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA; Department of Neurosurgery, Massachusetts General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA
| | - Avniel Singh Ghuman
- Program of Neural Computation, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Center for Neuroscience at the University of Pittsburgh, Pittsburgh, PA 15213, USA; Center for the Neural Basis of Cognition, University of Pittsburgh and Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
4
|
He Z, Chen W, Wei X, Liu Y. On the statistical significance of communities from weighted graphs. Sci Rep 2021; 11:20304. [PMID: 34645850 PMCID: PMC8514603 DOI: 10.1038/s41598-021-99175-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2021] [Accepted: 09/21/2021] [Indexed: 11/09/2022]
Abstract
Community detection is a fundamental procedure in the analysis of network data. Despite decades of research, there is still no consensus on the definition of a community. To analytically test whether a candidate community in a weighted network is real, we present a general formulation from a significance testing perspective. In this new formulation, the edge weight is modeled as a censored observation because of the noisy characteristics of real networks. In particular, the edge weights of missing links are incorporated as well, and are set to zero on the assumption that they are truncated or unobserved. The community significance assessment problem is then formulated as a two-sample test on censored data. More precisely, the logrank test is employed to conduct the significance testing on two sets of augmented edge weights: the internal weight set and the external weight set. The presented approach is evaluated on both weighted and unweighted networks. The experimental results show that our method can outperform widely used prior evaluation metrics on the task of individual community validation.
Affiliation(s)
- Zengyou He
- School of Software, Dalian University of Technology, Dalian, 116024, China; Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, 116024, China.
| | - Wenfang Chen
- School of Software, Dalian University of Technology, Dalian, 116024, China
| | - Xiaoqi Wei
- School of Software, Dalian University of Technology, Dalian, 116024, China
| | - Yan Liu
- School of Software, Dalian University of Technology, Dalian, 116024, China
|
5
|
Dozmorov MG, Cresswell KG, Bacanu SA, Craver C, Reimers M, Kendler KS. A method for estimating coherence of molecular mechanisms in major human disease and traits. BMC Bioinformatics 2020; 21:473. [PMID: 33087046 PMCID: PMC7579960 DOI: 10.1186/s12859-020-03821-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2020] [Accepted: 10/15/2020] [Indexed: 11/29/2022]
Abstract
BACKGROUND: Phenotypes such as height and intelligence are thought to be a product of the collective effects of multiple phenotype-associated genes and interactions among their protein products. A high or low degree of interactions is suggestive of coherent or random molecular mechanisms, respectively. Comparing the degree of interactions may help to better understand the coherence of phenotype-specific molecular mechanisms and the potential for therapeutic intervention. However, direct comparison of the degree of interactions is difficult due to the different sizes and configurations of phenotype-associated gene networks.
METHODS: We introduce a metric for measuring coherence of molecular-interaction networks as a slope of internal versus external distributions of the degree of interactions. The internal degree distribution is defined by interaction counts within a phenotype-specific gene network, while the external degree distribution counts interactions with other genes in the whole protein-protein interaction (PPI) network. We present a novel method for normalizing the coherence estimates, making them directly comparable.
RESULTS: Using STRING and BioGrid PPI databases, we compared the coherence of 116 phenotype-associated gene sets from GWAScatalog against size-matched KEGG pathways (the reference for high coherence) and random networks (the lower limit of coherence). We observed a range of coherence estimates for each category of phenotypes. Metabolic traits and diseases were the most coherent, while psychiatric disorders and intelligence-related traits were the least coherent. We demonstrate that coherence and modularity measures capture distinct network properties.
CONCLUSIONS: We present a general-purpose method for estimating and comparing the coherence of molecular-interaction gene networks that accounts for the network size and shape differences. Our results highlight gaps in our current knowledge of genetics and molecular mechanisms of complex phenotypes and suggest priorities for future GWASs.
Affiliation(s)
- Mikhail G. Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
- Department of Pathology, Virginia Commonwealth University, Richmond, VA USA
| | - Kellen G. Cresswell
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
| | - Silviu-Alin Bacanu
- Virginia Institute for Psychiatric and Behavior Genetics and the Department of Psychiatry, Virginia Commonwealth University, Richmond, VA USA
| | - Carl Craver
- Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO USA
| | - Mark Reimers
- Department of Physiology, Michigan State University, East Lansing, MI USA
- Department of Biomedical Engineering, Michigan State University, East Lansing, MI USA
| | - Kenneth S. Kendler
- Virginia Institute for Psychiatric and Behavior Genetics and the Department of Psychiatry, Virginia Commonwealth University, Richmond, VA USA
|
6
|
He Z, Liang H, Chen Z, Zhao C, Liu Y. Computing exact P-values for community detection. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00681-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
|
7
|
Palowitch J. Computing the statistical significance of optimized communities in networks. Sci Rep 2019; 9:18444. [PMID: 31804528 PMCID: PMC6895225 DOI: 10.1038/s41598-019-54708-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/08/2019] [Accepted: 11/14/2019] [Indexed: 11/25/2022]
Abstract
In scientific problems involving systems that can be modeled as a network (or "graph"), it is often of interest to find network communities - strongly connected node subsets - for unsupervised learning, feature discovery, anomaly detection, or scientific study. The vast majority of community detection methods proceed via optimization of a quality function, which is possible even on random networks without communities. Therefore there is usually not an easy way to tell if a community is "significant", in this context meaning more internally connected than would be expected under a random graph model without communities. This paper generalizes existing null models and statistical tests for this purpose to bipartite graphs, and introduces a new significance scoring algorithm called Fast Optimized Community Significance (FOCS) that is highly scalable and agnostic to the type of graph. Compared with existing methods on unipartite graphs, FOCS is more numerically stable and better balances the trade-off between detection power and false positives. On a large-scale bipartite graph derived from the Internet Movie Database (IMDB), the significance scores provided by FOCS correlate strongly with meaningful actor/director collaborations on serial cinematic projects.
|
8
|
Li Y, Qi Y. Asymptotic distribution of modularity in networks. Metrika 2019. [DOI: 10.1007/s00184-019-00740-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
9
|
Lenormand M, Papuga G, Argagnon O, Soubeyrand M, De Barros G, Alleaume S, Luque S. Biogeographical network analysis of plant species distribution in the Mediterranean region. Ecol Evol 2019; 9:237-250. [PMID: 30680110 PMCID: PMC6342112 DOI: 10.1002/ece3.4718] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/13/2018] [Revised: 10/09/2018] [Accepted: 10/14/2018] [Indexed: 12/04/2022]
Abstract
The delimitation of bioregions helps to understand historical and ecological drivers of species distribution. In this work, we performed a network analysis of the spatial distribution patterns of plants in the south of France (Languedoc-Roussillon and Provence-Alpes-Côte d'Azur) to analyze the biogeographical structure of the French Mediterranean flora at different scales. We used a network approach to identify and characterize biogeographical regions, based on a large database containing 2.5 million geolocalized plant records corresponding to more than 3,500 plant species. The methodology follows five steps, from the construction of the biogeographical bipartite network to the identification of biogeographical regions in the form of spatial network communities, the analysis of their interactions, and the identification of clusters of plant species based on the species' contribution to the biogeographical regions. First, we identified two sub-networks that distinguish Mediterranean and temperate biota. Then, we separated eight statistically significant bioregions that present a complex spatial structure. Some of them are spatially well delimited and match particular geological entities. On the other hand, fuzzy transitions arise between adjacent bioregions that share a common geological setting but are spread along a climatic gradient. The proposed network approach illustrates the biogeographical structure of the flora in southern France and provides precise insights into the relationships between bioregions. This approach sheds light on ecological drivers shaping the distribution of Mediterranean biota: the interplay between a climatic gradient and geological substrate shapes biodiversity patterns. Finally, this work exemplifies why fragmented distributions are common in the Mediterranean region, isolating groups of species that share a similar eco-evolutionary history.
Affiliation(s)
| | - Guillaume Papuga
- Conservatoire botanique national méditerranéen de Porquerolles, Parc scientifique Agropolis, Montferrier sur Lez, France
- UMR 5175 CEFE, CNRS, Montpellier Cedex 5, France
| | - Olivier Argagnon
- Conservatoire botanique national méditerranéen de Porquerolles, Parc scientifique Agropolis, Montferrier sur Lez, France
| | | | - Guilhem De Barros
- Conservatoire botanique national méditerranéen de Porquerolles, Parc scientifique Agropolis, Montferrier sur Lez, France
|
10
|
Kojaku S, Masuda N. A generalised significance test for individual communities in networks. Sci Rep 2018; 8:7351. [PMID: 29743534 PMCID: PMC5943579 DOI: 10.1038/s41598-018-25560-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 12/11/2017] [Accepted: 04/24/2018] [Indexed: 11/09/2022]
Abstract
Many empirical networks have community structure, in which nodes are densely interconnected within each community (i.e., a group of nodes) and sparsely across different communities. Like other local and meso-scale structures of networks, communities are generally heterogeneous in various aspects such as the size, density of edges, connectivity to other communities and significance. In the present study, we propose a method to statistically test the significance of individual communities in a given network. Compared to previous methods, the present algorithm is unique in that it accepts different community-detection algorithms and the corresponding quality function for single communities. The present method requires that the quality of each community can be quantified and that community detection is performed as optimisation of such a quality function summed over the communities. Various community detection algorithms, including modularity maximisation and graph partitioning, meet this criterion. Our method estimates a distribution of the quality function for randomised networks to calculate a likelihood of each community in the given network. We illustrate our algorithm on synthetic and empirical networks.
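The general idea of comparing a community's quality against randomised networks can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' procedure: the quality function is taken to be the number of internal edges, the null model is a random graph with the same number of nodes and edges, and the community is held fixed rather than re-detected on each randomised network. The names `internal_edges` and `empirical_p_value` are hypothetical.

```python
import random
from itertools import combinations

def internal_edges(edges, community):
    """Quality function assumed here: number of edges inside the community."""
    members = set(community)
    return sum(1 for u, v in edges if u in members and v in members)

def empirical_p_value(nodes, edges, community, n_samples=500, seed=0):
    """Share of randomised networks whose quality is at least the observed one.

    Assumed null model: uniformly random simple graphs with the same number
    of nodes and edges as the input graph.
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(sorted(nodes), 2))
    observed = internal_edges(edges, community)
    at_least = 0
    for _ in range(n_samples):
        null_edges = rng.sample(all_pairs, len(edges))
        if internal_edges(null_edges, community) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly positive.
    return (1 + at_least) / (1 + n_samples)

# Toy graph: two triangles joined by a bridge; test the triangle {1, 2, 3}.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
p = empirical_p_value(nodes, edges, {1, 2, 3})
print(p)
```

A degree-preserving null model or a different quality function could be swapped in without changing the structure of the sketch.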
Affiliation(s)
- Sadamori Kojaku
- CREST, JST, Kawaguchi Center Building, 4-1-8, Honcho, Kawaguchi-shi, Saitama, 332-0012, Japan; Department of Engineering Mathematics, Merchant Venturers Building, University of Bristol, Woodland Road, Clifton, Bristol, BS8 1UB, United Kingdom
| | - Naoki Masuda
- Department of Engineering Mathematics, Merchant Venturers Building, University of Bristol, Woodland Road, Clifton, Bristol, BS8 1UB, United Kingdom.
|
11
|
|
12
|
Tokuda T. Statistical test for detecting community structure in real-valued edge-weighted graphs. PLoS One 2018; 13:e0194079. [PMID: 29558487 PMCID: PMC5860707 DOI: 10.1371/journal.pone.0194079] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/11/2017] [Accepted: 02/23/2018] [Indexed: 11/28/2022]
Abstract
We propose a novel method to test the existence of community structure in undirected, real-valued, edge-weighted graphs. The method is based on the asymptotic behavior of extreme eigenvalues of a real symmetric edge-weight matrix. We provide a theoretical foundation for this method and report on its performance using synthetic and real data, suggesting that this new method outperforms other state-of-the-art methods.
Affiliation(s)
- Tomoki Tokuda
- Okinawa Institute of Science and Technology Graduate University, 1919-1, Tancha, Onna-son, Okinawa, Japan
|
13
|
Bongiorno C, London A, Miccichè S, Mantegna RN. Core of communities in bipartite networks. Phys Rev E 2017; 96:022321. [PMID: 28950546 DOI: 10.1103/physreve.96.022321] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/06/2017] [Indexed: 06/07/2023]
Abstract
We use the information present in a bipartite network to detect cores of communities of each set of the bipartite system. Cores of communities are found by investigating statistically validated projected networks obtained using information present in the bipartite network. Cores of communities are highly informative and robust with respect to the presence of errors or missing entries in the bipartite network. We assess the statistical robustness of cores by investigating an artificial benchmark network, the coauthorship network, and the actor-movie network. The accuracy and precision of the partition obtained with respect to the reference partition are measured in terms of the adjusted Rand index and the adjusted Wallace index, respectively. The detection of cores is highly precise, although the accuracy of the methodology can be limited in some cases.
Affiliation(s)
- Christian Bongiorno
- Dipartimento di Fisica e Chimica, Università degli Studi di Palermo, Viale delle Scienze Ed. 18, I-90128 Palermo, Italy
| | - András London
- Institute of Informatics, University of Szeged, Árpád tér 2, H-6720 Szeged, Hungary
| | - Salvatore Miccichè
- Dipartimento di Fisica e Chimica, Università degli Studi di Palermo, Viale delle Scienze Ed. 18, I-90128 Palermo, Italy
| | - Rosario N Mantegna
- Dipartimento di Fisica e Chimica, Università degli Studi di Palermo, Viale delle Scienze Ed. 18, I-90128 Palermo, Italy
- Center for Network Science, Central European University, Nador 9, H-1051 Budapest, Hungary
- Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom
|
14
|
Bae J, Cha YJ, Lee H, Lee B, Baek S, Choi S, Jang D. Social networks and inference about unknown events: A case of the match between Google's AlphaGo and Sedol Lee. PLoS One 2017; 12:e0171472. [PMID: 28222114 PMCID: PMC5319654 DOI: 10.1371/journal.pone.0171472] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/11/2016] [Accepted: 01/20/2017] [Indexed: 11/24/2022]
Abstract
This study examines whether the way that a person makes inferences about unknown events is associated with his or her social relations, more precisely, those characterized by ego network density that reflects the structure of a person’s immediate social relation. From the analysis of individual predictions over the Go match between AlphaGo and Sedol Lee in March 2016 in Seoul, Korea, this study shows that the low-density group scored higher than the high-density group in the accuracy of the prediction over a future state of a social event, i.e., the outcome of the first game. We corroborated this finding with three replication tests that asked the participants to predict the following: film awards, President Park’s impeachment in Korea, and the counterfactual assessment of the US presidential election. Taken together, this study suggests that network density is negatively associated with vision advantage, i.e., the ability to discover and forecast an unknown aspect of a social event.
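Ego network density, the quantity this abstract relies on, can be illustrated with a short Python sketch. It assumes the common definition (the share of possible ties among a person's direct contacts that are actually present, with the ego's own ties excluded); the abstract does not state which variant the study used, and the function name `ego_network_density` and the toy data are hypothetical.

```python
from itertools import combinations

def ego_network_density(ego, edges):
    """Share of possible alter-alter ties that exist (assumed definition)."""
    edge_set = {frozenset(e) for e in edges}
    # Alters are the nodes directly tied to the ego.
    alters = sorted({v for e in edge_set if ego in e for v in e if v != ego})
    if len(alters) < 2:
        return 0.0
    possible = list(combinations(alters, 2))
    present = sum(1 for pair in possible if frozenset(pair) in edge_set)
    return present / len(possible)

# Toy ego network: "a" knows "b", "c" and "d"; only "b" and "c" know each other.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
density = ego_network_density("a", edges)
print(density)
```

Here one of the three possible ties among the alters is present, which is what a low-density ego network looks like in this reading.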
Affiliation(s)
- Jonghoon Bae
- Graduate School of Business, Seoul National University, Seoul, Korea
- Transdisciplinary Research Center for Culture-Brain Dynamics, Seoul National University, Seoul, Korea
| | - Young-Jae Cha
- Transdisciplinary Research Center for Culture-Brain Dynamics, Seoul National University, Seoul, Korea
- Interdisciplinary Program in Cognitive Science, Seoul National University, Seoul, Korea
| | - Hyungsuk Lee
- Transdisciplinary Research Center for Culture-Brain Dynamics, Seoul National University, Seoul, Korea
- Interdisciplinary Program in History and Philosophy of Science, Seoul National University, Seoul, Korea
| | - Boyun Lee
- Transdisciplinary Research Center for Culture-Brain Dynamics, Seoul National University, Seoul, Korea
- Interdisciplinary Program in History and Philosophy of Science, Seoul National University, Seoul, Korea
| | - Sojung Baek
- Transdisciplinary Research Center for Culture-Brain Dynamics, Seoul National University, Seoul, Korea
- Interdisciplinary Program in Cognitive Science, Seoul National University, Seoul, Korea
| | - Semin Choi
- Transdisciplinary Research Center for Culture-Brain Dynamics, Seoul National University, Seoul, Korea
- Department of Statistics, Seoul National University, Seoul, Korea
| | - Dayk Jang
- Transdisciplinary Research Center for Culture-Brain Dynamics, Seoul National University, Seoul, Korea
- Interdisciplinary Program in Cognitive Science, Seoul National University, Seoul, Korea
- Interdisciplinary Program in History and Philosophy of Science, Seoul National University, Seoul, Korea
- College of Liberal Studies, Seoul National University, Seoul, Korea
|
15
|
Kujala R, Glerean E, Pan RK, Jääskeläinen IP, Sams M, Saramäki J. Graph coarse-graining reveals differences in the module-level structure of functional brain networks. Eur J Neurosci 2016; 44:2673-2684. [PMID: 27602806 DOI: 10.1111/ejn.13392] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/12/2016] [Revised: 08/30/2016] [Accepted: 08/30/2016] [Indexed: 01/22/2023]
Abstract
Networks have become a standard tool for analyzing functional magnetic resonance imaging (fMRI) data. In this approach, brain areas and their functional connections are mapped to the nodes and links of a network. Even though this mapping reduces the complexity of the underlying data, it remains challenging to understand the structure of the resulting networks due to the large number of nodes and links. One solution is to partition networks into modules and then investigate the modules' composition and relationship with brain functioning. While this approach works well for single networks, understanding differences between two networks by comparing their partitions is difficult and alternative approaches are thus necessary. To this end, we present a coarse-graining framework that uses a single set of data-driven modules as a frame of reference, enabling one to zoom out from the node- and link-level details. As a result, differences in the module-level connectivity can be understood in a transparent, statistically verifiable manner. We demonstrate the feasibility of the method by applying it to networks constructed from fMRI data recorded from 13 healthy subjects during rest and movie viewing. While independently partitioning the rest and movie networks is shown to yield little insight, the coarse-graining framework enables one to pinpoint differences in the module-level structure, such as the increased number of intra-module links within the visual cortex during movie viewing. In addition to quantifying differences due to external stimuli, the approach could also be applied in clinical settings, such as comparing patients with healthy controls.
Affiliation(s)
- Rainer Kujala
- Department of Computer Science, Aalto University, PO Box 15400, FI-00076, Aalto, Finland.
| | - Enrico Glerean
- Department of Neuroscience and Biomedical Engineering, Aalto University, Aalto, Finland
| | - Raj Kumar Pan
- Department of Computer Science, Aalto University, PO Box 15400, FI-00076, Aalto, Finland
| | - Iiro P Jääskeläinen
- Department of Neuroscience and Biomedical Engineering, Aalto University, Aalto, Finland
| | - Mikko Sams
- Department of Neuroscience and Biomedical Engineering, Aalto University, Aalto, Finland
| | - Jari Saramäki
- Department of Computer Science, Aalto University, PO Box 15400, FI-00076, Aalto, Finland
|
16
|
Burgess M, Adar E, Cafarella M. Link-Prediction Enhanced Consensus Clustering for Complex Networks. PLoS One 2016; 11:e0153384. [PMID: 27203750 PMCID: PMC4874693 DOI: 10.1371/journal.pone.0153384] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 08/12/2015] [Accepted: 03/29/2016] [Indexed: 11/19/2022]
Abstract
Many real networks that are collected or inferred from data are incomplete due to missing edges. Missing edges can be inherent to the dataset (Facebook friend links will never be complete) or the result of sampling (one may only have access to a portion of the data). The consequence is that downstream analyses that "consume" the network will often yield less accurate results than if the edges were complete. Community detection algorithms, in particular, often suffer when critical intra-community edges are missing. We propose a novel consensus clustering algorithm to enhance community detection on incomplete networks. Our framework utilizes existing community detection algorithms that process networks imputed by our link prediction based sampling algorithm and merges their multiple partitions into a final consensus output. On average our method boosts performance of existing algorithms by 7% on artificial data and 17% on ego networks collected from Facebook.
Affiliation(s)
- Matthew Burgess
- Computer Science & Engineering, University of Michigan, Ann Arbor, MI, United States of America
- Eytan Adar
- Computer Science & Engineering, University of Michigan, Ann Arbor, MI, United States of America
- School of Information, University of Michigan, Ann Arbor, MI, United States of America
- Michael Cafarella
- Computer Science & Engineering, University of Michigan, Ann Arbor, MI, United States of America
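The merging step mentioned in the Burgess et al. abstract above (combining several partitions into one consensus output) can be illustrated with a generic co-association vote. This is a minimal sketch of a common consensus-clustering technique, not the authors' algorithm: the function name `consensus_partition`, the `threshold` parameter and the toy partitions are all hypothetical, and the link-prediction imputation step is not modelled.

```python
from itertools import combinations


def consensus_partition(partitions, threshold=0.5):
    """Merge several partitions of the same node set by majority co-association.

    partitions: list of dicts mapping node -> community label (labels need not
                agree across partitions).
    threshold:  two nodes are joined when the fraction of partitions placing
                them in the same community is strictly greater than this value.
    Returns a list of node sets, ordered by their smallest member.
    """
    nodes = sorted(partitions[0])
    parent = {n: n for n in nodes}  # union-find forest over the nodes

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(nodes, 2):
        agree = sum(p[u] == p[v] for p in partitions) / len(partitions)
        if agree > threshold:
            parent[find(u)] = find(v)  # join the two groups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=min)


# Three hypothetical runs of a community detection algorithm on four nodes.
runs = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 2, "d": 2},
]
result = consensus_partition(runs)  # two groups: {a, b} and {c, d}
```

Joining nodes through connected components of the thresholded co-association graph is only one possible design; raising `threshold` makes the consensus more conservative and tends to split groups that the individual runs disagree on.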
|
17
|
Schülke C, Ricci-Tersenghi F. Multiple phases in modularity-based community detection. Phys Rev E Stat Nonlin Soft Matter Phys 2015; 92:042804. [PMID: 26565286 DOI: 10.1103/physreve.92.042804] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/16/2015] [Indexed: 06/05/2023]
Abstract
Detecting communities in a network, based only on the adjacency matrix, is a problem of interest to several scientific disciplines. Recently, Zhang and Moore have introduced an algorithm [Proc. Natl. Acad. Sci. USA 111, 18144 (2014)], called mod-bp, that avoids overfitting the data by optimizing a weighted average of modularity (a popular goodness-of-fit measure in community detection) and entropy (i.e., number of configurations with a given modularity). The adjustment of the relative weight, the "temperature" of the model, is crucial for getting a correct result from mod-bp. In this work we study the many phase transitions that mod-bp may undergo by changing the two parameters of the algorithm: the temperature T and the maximum number of groups q. We introduce a new set of order parameters that allow us to determine the actual number of groups q̂, and we observe on both synthetic and real networks the existence of phases with any q̂∈{1,q}, which were unknown before. We discuss how to interpret the results of mod-bp and how to make the optimal choice for the problem of detecting significant communities.
Affiliation(s)
- Christophe Schülke
- Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris, France and Dipartimento di Fisica, Università di Roma "La Sapienza," Piazzale Aldo Moro 2, 00185 Rome, Italy
- Federico Ricci-Tersenghi
- Dipartimento di Fisica, INFN-Sezione di Roma 1, and CNR-NANOTEC, UOS di Roma, Università di Roma "La Sapienza," Piazzale Aldo Moro 2, 00185 Rome, Italy
|
18
|
Abstract
The development of new technologies for mapping structural and functional brain connectivity has led to the creation of comprehensive network maps of neuronal circuits and systems. The architecture of these brain networks can be examined and analyzed with a large variety of graph theory tools. Methods for detecting modules, or network communities, are of particular interest because they uncover major building blocks or subnetworks that are particularly densely connected, often corresponding to specialized functional components. A large number of methods for community detection have become available and are now widely applied in network neuroscience. This article first surveys a number of these methods, with an emphasis on their advantages and shortcomings; then it summarizes major findings on the existence of modules in both structural and functional brain networks and briefly considers their potential functional roles in brain evolution, wiring minimization, and the emergence of functional specialization and complex dynamics.
Affiliation(s)
- Olaf Sporns
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, Indiana 47405; Indiana University Network Science Institute, Indiana University, Bloomington, Indiana 47405
- Richard F Betzel
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, Indiana 47405
|
19
|
Lenormand M, Gonçalves B, Tugores A, Ramasco JJ. Human diffusion and city influence. J R Soc Interface 2015; 12:20150473. [PMID: 26179991 PMCID: PMC4535413 DOI: 10.1098/rsif.2015.0473] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 05/26/2015] [Accepted: 06/24/2015] [Indexed: 11/12/2022] Open
Abstract
Cities are characterized by concentrating population, economic activity and services. However, not all cities are equal and a natural hierarchy at local, regional or global scales spontaneously emerges. In this work, we introduce a method to quantify city influence using geolocated tweets to characterize human mobility. Rome and Paris appear consistently as the cities attracting most diverse visitors. The ratio between locals and non-local visitors turns out to be fundamental for a city to truly be global. Focusing only on urban residents' mobility flows, a city-to-city network can be constructed. This network allows us to analyse centrality measures at different scales. New York and London play a central role on the global scale, while urban rankings suffer substantial changes if the focus is set at a regional level.
Affiliation(s)
- Maxime Lenormand
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain
- Bruno Gonçalves
- Aix Marseille Université, Université de Toulon, CNRS, CPT, UMR 7332, Marseille 13288, France
- Antònia Tugores
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain
- José J Ramasco
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain
|
20
|
Li HJ, Daniels JJ. Social significance of community structure: statistical view. Phys Rev E Stat Nonlin Soft Matter Phys 2015; 91:012801. [PMID: 25679651 DOI: 10.1103/physreve.91.012801] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/16/2014] [Indexed: 05/06/2023]
Abstract
Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and then we obtain a significance measure of statistical form. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.
Affiliation(s)
- Hui-Jia Li
- School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100080, China and Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Jasmine J Daniels
- Department of Applied Physics, Stanford University, Stanford, California 94305, USA
|
21
|
Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proc Natl Acad Sci U S A 2014; 111:18144-9. [PMID: 25489096 DOI: 10.1073/pnas.1409770111] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022] Open
Abstract
Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory "communities" in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature and using an efficient belief propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all of the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods.
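For readers unfamiliar with the quantity this abstract refers to, the standard Newman-Girvan modularity of a partition can be computed as below. This is a minimal sketch for an unweighted, undirected graph without self-loops; the function name `modularity` and the toy graph are illustrative assumptions, and the code does not implement the belief-propagation algorithm the paper proposes.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ].

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    L_c is the number of edges inside community c, d_c the total degree of
    its nodes, and m the number of edges in the graph.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_split = modularity(edges, two_groups)  # 5/14, roughly 0.357
q_whole = modularity(edges, one_group)   # 0.0
```

Placing every node in one community always gives Q = 0 under this definition, which is why positive values are read as evidence of structure; the abstract's point is that high values can also arise by chance in random graphs.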
|
22
|
Weng T, Zhao Y, Small M, Huang DD. Time-series analysis of networks: exploring the structure with random walks. Phys Rev E Stat Nonlin Soft Matter Phys 2014; 90:022804. [PMID: 25215778 DOI: 10.1103/physreve.90.022804] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/05/2013] [Indexed: 06/03/2023]
Abstract
We generate time series from scale-free networks based on a finite-memory random walk traversing the network. These time series reveal topological and functional properties of networks via their temporal correlations. Remarkably, networks with different node-degree mixing patterns exhibit distinct self-similar characteristics. In particular, assortative networks are transformed into time series with long-range correlation, while disassortative networks are transformed into time series exhibiting anticorrelation. These relationships are consistent across a diverse variety of real networks. Moreover, we show that multiscale analysis of these time series can describe and classify various physical networks ranging from social and technological to biological networks according to their functional origin. These results suggest that there is a unified dynamical mechanism that governs the structural organization of many seemingly different networks.
Affiliation(s)
- Tongfeng Weng
- Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, People's Republic of China
- Yi Zhao
- Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, People's Republic of China
- Michael Small
- The University of Western Australia, Crawley, WA 6009, Australia
|
23
|
|
24
|
Liu Y, Tennant DA, Zhu Z, Heath JK, Yao X, He S. DiME: a scalable disease module identification algorithm with application to glioma progression. PLoS One 2014; 9:e86693. [PMID: 24523864 PMCID: PMC3921127 DOI: 10.1371/journal.pone.0086693] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 09/17/2013] [Accepted: 12/13/2013] [Indexed: 11/21/2022] Open
Abstract
A disease module is a group of molecular components that interact intensively in the disease-specific biological network. Since the connectivity and activity of disease modules may shed light on the molecular mechanisms of pathogenesis and disease progression, their identification has become one of the most important challenges in network medicine, an emerging paradigm to study complex human disease. This paper proposes a novel algorithm, DiME (Disease Module Extraction), to identify putative disease modules from biological networks. We have developed novel heuristics to optimise Community Extraction, a module criterion originally proposed for social network analysis, to extract topological core modules from biological networks as putative disease modules. In addition, we have incorporated a statistical significance measure, B-score, to evaluate the quality of extracted modules. As an application to complex diseases, we have employed DiME to investigate the molecular mechanisms that underpin the progression of glioma, the most common type of brain tumour. We have built low-grade (grade II) and high-grade (GBM) glioma co-expression networks from three independent datasets and then applied DiME to extract potential disease modules from both networks for comparison. Examination of the interconnectivity of the identified modules has revealed changes in topology and module activity (expression) between low- and high-grade tumours, which are characteristic of the major shifts in the constitution and physiology of tumour cells during glioma progression. Our results suggest that transcription factors E2F4, AR and ETS1 are potential key regulators in tumour progression. Our DiME compiled software, R/C++ source code, sample data and a tutorial are available at http://www.cs.bham.ac.uk/~szh/DiME.
Affiliation(s)
- Yunpeng Liu
- School of Computer Science, University of Birmingham, Birmingham, United Kingdom
- Daniel A. Tennant
- School of Cancer Sciences, University of Birmingham, Birmingham, United Kingdom
- Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- John K. Heath
- Centre for Systems Biology, School of Biological Sciences, University of Birmingham, Birmingham, United Kingdom
- Xin Yao
- School of Computer Science, University of Birmingham, Birmingham, United Kingdom
- Shan He
- School of Computer Science, University of Birmingham, Birmingham, United Kingdom
- Centre for Systems Biology, School of Biological Sciences, University of Birmingham, Birmingham, United Kingdom
|
25
|
Chang YT, Pantazis D, Leahy RM. To cut or not to cut? Assessing the modular structure of brain networks. Neuroimage 2014; 91:99-108. [PMID: 24440531 DOI: 10.1016/j.neuroimage.2014.01.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 09/05/2013] [Revised: 12/12/2013] [Accepted: 01/08/2014] [Indexed: 11/18/2022] Open
Abstract
A wealth of methods has been developed to identify natural divisions of brain networks into groups or modules, with one of the most prominent being modularity. Compared with the popularity of methods to detect community structure, only a few methods exist to statistically control for spurious modules, relying almost exclusively on resampling techniques. It is well known that even random networks can exhibit high modularity because of incidental concentration of edges, even though they have no underlying organizational structure. Consequently, interpretation of community structure is confounded by the lack of principled and computationally tractable approaches to statistically control for spurious modules. In this paper we show that the modularity of random networks follows a transformed version of the Tracy-Widom distribution, providing for the first time a link between module detection and random matrix theory. We compute parametric formulas for the distribution of modularity for random networks as a function of network size and edge variance, and show that we can efficiently control for false positives in brain and other real-world networks.
Affiliation(s)
- Yu-Teng Chang
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Dimitrios Pantazis
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Richard M Leahy
- Signal and Image Processing Institute, University of Southern California, Los Angeles, CA 90089, USA
|
26
|
Tremblay N, Barrat A, Forest C, Nornberg M, Pinton JF, Borgnat P. Bootstrapping under constraint for the assessment of group behavior in human contact networks. Phys Rev E Stat Nonlin Soft Matter Phys 2013; 88:052812. [PMID: 24329323 DOI: 10.1103/physreve.88.052812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/14/2012] [Revised: 07/19/2013] [Indexed: 06/03/2023]
Abstract
The increasing availability of time- and space-resolved data describing human activities and interactions gives insights into both static and dynamic properties of human behavior. In practice, nevertheless, real-world data sets can often be considered as only one realization of a particular event. This highlights a key issue in social network analysis: the statistical significance of estimated properties. In this context, we focus here on the assessment of quantitative features of a specific subset of nodes in empirical networks. We present a method of statistical resampling based on bootstrapping groups of nodes under constraints within the empirical network. The method enables us to define acceptance intervals for various null hypotheses concerning relevant properties of the subset of nodes under consideration, in order to characterize its behavior as "normal" or not by a statistical test. We apply this method to a high-resolution data set describing the face-to-face proximity of individuals during two colocated scientific conferences. As a case study, we show how to probe whether colocating the two conferences succeeded in bringing together the two corresponding groups of scientists.
Affiliation(s)
- Nicolas Tremblay
- Physics Laboratory, ENS Lyon, Université de Lyon, CNRS UMR 5672, Lyon, France
- Alain Barrat
- Aix Marseille Université, CNRS, CPT, UMR 7332, 13288 Marseille, France and Université de Toulon, CNRS, CPT, UMR 7332, 83957 La Garde, France and Data Science Laboratory, Institute for Scientific Interchange (ISI) Foundation, Torino, Italy
- Cary Forest
- University of Wisconsin-Madison, Physics Department, Madison, Wisconsin, USA
- Mark Nornberg
- University of Wisconsin-Madison, Physics Department, Madison, Wisconsin, USA
- Pierre Borgnat
- Physics Laboratory, ENS Lyon, Université de Lyon, CNRS UMR 5672, Lyon, France
|
27
|
Traag VA, Krings G, Van Dooren P. Significant scales in community structure. Sci Rep 2013; 3:2930. [PMID: 24121597 PMCID: PMC3796307 DOI: 10.1038/srep02930] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 06/17/2013] [Accepted: 09/24/2013] [Indexed: 11/09/2022] Open
Abstract
Many complex networks show signs of modular structure, uncovered by community detection. Although many methods succeed in revealing various partitions, it remains difficult to detect at what scale some partition is significant. This problem shows foremost in multi-resolution methods. We here introduce an efficient method for scanning for resolutions in one such method. Additionally, we introduce the notion of "significance" of a partition, based on subgraph probabilities. Significance is independent of the exact method used, so could also be applied in other methods, and can be interpreted as the gain in encoding a graph by making use of a partition. Using significance, we can determine "good" resolution parameters, which we demonstrate on benchmark networks. Moreover, optimizing significance itself also shows excellent performance. We demonstrate our method on voting data from the European Parliament. Our analysis suggests the European Parliament has become increasingly ideologically divided and that nationality plays no role.
Affiliation(s)
- V A Traag
- [1] ICTEAM, Université catholique de Louvain [2] Royal Netherlands Institute of Southeast Asian and Caribbean Studies
|
28
|
Wang Y, Zeng A, Di Z, Fan Y. Spectral coarse graining for random walks in bipartite networks. Chaos 2013; 23:013104. [PMID: 23556941 DOI: 10.1063/1.4773823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Many real-world networks display a natural bipartite structure, yet analyzing and visualizing large bipartite networks is one of the open challenges in complex network research. A practical approach to this problem would be to reduce the complexity of the bipartite system while at the same time preserving its functionality. However, we find that existing coarse graining methods for monopartite networks usually fail for bipartite networks. In this paper, we use spectral analysis to design a coarse graining scheme specific to bipartite networks, which keeps their random walk properties unchanged. Numerical analysis on both artificial and real-world networks indicates that our coarse graining can better preserve most of the relevant spectral properties of the network. We validate our coarse graining method by directly comparing the mean first passage time of the walker in the original network and the reduced one.
Affiliation(s)
- Yang Wang
- Department of Systems Science, School of Management, Beijing Normal University, Beijing 100875, People's Republic of China
|
29
|
Bassett DS, Porter MA, Wymbs NF, Grafton ST, Carlson JM, Mucha PJ. Robust detection of dynamic community structure in networks. Chaos 2013; 23:013142. [PMID: 23556979 PMCID: PMC3618100 DOI: 10.1063/1.4790830] [Citation(s) in RCA: 274] [Impact Index Per Article: 22.8] [Received: 06/19/2012] [Accepted: 01/08/2013] [Indexed: 05/05/2023]
Abstract
We describe techniques for the robust detection of community structure in some classes of time-dependent networks. Specifically, we consider the use of statistical null models for facilitating the principled identification of structural modules in semi-decomposable systems. Null models play an important role both in the optimization of quality functions such as modularity and in the subsequent assessment of the statistical validity of identified community structure. We examine the sensitivity of such methods to model parameters and show how comparisons to null models can help identify system scales. By considering a large number of optimizations, we quantify the variance of network diagnostics over optimizations ("optimization variance") and over randomizations of network structure ("randomization variance"). Because the modularity quality function typically has a large number of nearly degenerate local optima for networks constructed using real data, we develop a method to construct representative partitions that uses a null model to correct for statistical noise in sets of partitions. To illustrate our results, we employ ensembles of time-dependent networks extracted from both nonlinear oscillators and empirical neuroscience data.
Affiliation(s)
- Danielle S Bassett
- Department of Physics, University of California, Santa Barbara, California 93106, USA
|
30
|
Resampling effects on significance analysis of network clustering and ranking. PLoS One 2013; 8:e53943. [PMID: 23372677 PMCID: PMC3553110 DOI: 10.1371/journal.pone.0053943] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/10/2012] [Accepted: 12/06/2012] [Indexed: 11/19/2022] Open
Abstract
Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling assumes independence between samples, while the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984–2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains such dependencies. We find that citation resampling underestimates the variance of link weights. Moreover, this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to the link-weight variances of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies we can maintain in the resampling scheme, the earlier we can predict structural change.
|
31
|
Hu D, Ronhovde P, Nussinov Z. Stability-to-instability transition in the structure of large-scale networks. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 86:066106. [PMID: 23368003 DOI: 10.1103/physreve.86.066106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/25/2012] [Indexed: 06/01/2023]
Abstract
We examine phase transitions between the "easy," "hard," and "unsolvable" phases when attempting to identify structure in large complex networks ("community detection") in the presence of disorder induced by network "noise" (spurious links that obscure structure), heat bath temperature T, and system size N. The partition of a graph into q optimally disjoint subgraphs or "communities" inherently requires Potts-type variables. In earlier work [Philos. Mag. 92, 406 (2012)], when examining power law and other networks (and general associated Potts models), we illustrated that transitions in the computational complexity of the community detection problem typically correspond to spin-glass-type transitions (and transitions to chaotic dynamics in mechanical analogs) at both high and low temperatures and/or noise. The computationally "hard" phase exhibits spin-glass type behavior including memory effects. The region over which the hard phase extends in the noise and temperature phase diagram decreases as N increases while holding the average number of nodes per community fixed. This suggests that in the thermodynamic limit a direct sharp transition may occur between the easy and unsolvable phases. When present, transitions at low temperature or low noise correspond to entropy driven (or "order by disorder") annealing effects, wherein stability may initially increase as temperature or noise is increased before becoming unsolvable at sufficiently high temperature or noise. Additional transitions between contending viable solutions (such as those at different natural scales) are also possible. Identifying community structure via a dynamical approach where "chaotic-type" transitions were found earlier. The correspondence between the spin-glass-type complexity transitions and transitions into chaos in dynamical analogs might extend to other hard computational problems. 
In this work, we examine large networks (with a power law distribution in cluster size) that have a large number of communities (q≫1). We infer that large systems at a constant ratio of q to the number of nodes N asymptotically tend towards insolvability in the limit of large N for any positive T. The asymptotic behavior of temperatures below which structure identification might be possible, T_× = O[1/ln q], decreases slowly, so for practical system sizes, there remains an accessible, and generally easy, global solvable phase at low temperature. We further employ multivariate Tutte polynomials to show that increasing q emulates increasing T for a general Potts model, leading to a similar stability region at low T. Given the relation between Tutte and Jones polynomials, our results further suggest a link between the above complexity transitions and transitions associated with random knots.
Affiliation(s)
- Dandan Hu
- Department of Physics, Washington University in St. Louis, Campus Box 1105, 1 Brookings Drive, St. Louis, Missouri 63130, USA
|
32
|
Unravelling the intrinsic functional organization of the human lateral frontal cortex: a parcellation scheme based on resting state fMRI. J Neurosci 2012; 32:10238-52. [PMID: 22836258 DOI: 10.1523/jneurosci.5852-11.2012] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 11/21/2022] Open
Abstract
Human and nonhuman primates exhibit flexible behavior. Functional, anatomical, and lesion studies indicate that the lateral frontal cortex (LFC) plays a pivotal role in such behavior. LFC consists of distinct subregions exhibiting distinct connectivity patterns that possibly relate to functional specializations. Inference about the border of each subregion in the human brain is performed with the aid of macroscopic landmarks and/or cytoarchitectonic parcellations extrapolated in a stereotaxic system. However, the high interindividual variability, the limited availability of cytoarchitectonic probabilistic maps, and the absence of robust functional localizers render the in vivo delineation and examination of the LFC subregions challenging. In this study, we use resting state fMRI for the in vivo parcellation of the human LFC in a subject-wise, data-driven manner. This approach succeeds in uncovering neuroanatomically realistic subregions, with potential anatomical substrates including BA 46, 44, 45, 9 and related (sub)divisions. Ventral LFC subregions exhibit different functional connectivity (FC), which can account for different contributions in the language domain, while more dorsal adjacent subregions mark a transition to visuospatial/sensorimotor networks. Dorsal LFC subregions participate in known large-scale networks obeying an external/internal information processing dichotomy. Furthermore, we traced "families" of LFC subregions organized along the dorsal-ventral and anterior-posterior axis with distinct functional networks also encompassing specialized cingulate divisions. Similarities with the connectivity of macaque candidate homologs were observed, such as the premotor affiliation of presumed BA 46. The current findings partially support dominant LFC models.
Collapse
|
33
|
Network analysis reveals increased integration during emotional and motivational processing. J Neurosci 2012; 32:8361-72. [PMID: 22699916 DOI: 10.1523/jneurosci.0821-12.2012] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Indexed: 11/21/2022]
Abstract
In recent years, a large number of human studies have investigated large-scale network properties of the brain, typically during the resting state. A critical gap in the knowledge base concerns the understanding of network properties of a focused set of brain regions during task conditions engaging these regions. Although emotion and motivation recruit many brain regions, it is currently unknown how they affect network-level properties of inter-region interactions. In the present study, we sought to characterize network structure during "mini-states" engendered by emotional and motivational cues investigated in separate studies. To do so, we used graph-theoretic network analysis to probe network-, community-, and node-level properties of the trial-by-trial functional connectivity between regions of interest. We used methods that operate on weighted graphs that make use of the continuous information of connectivity strength. In both the emotion and motivation datasets, global efficiency increased and decomposability decreased. Thus, processing became less segregated with the context signaled by the cue (potential shock or potential reward). Our findings also revealed several important features of inter-community communication, including notable contributions of the bed nucleus of the stria terminalis, anterior insula, and thalamus during threat and of the caudate and nucleus accumbens during reward. Together, the results suggest that one way in which emotional and motivational processing affect brain responses is by enhancing signal communication between regions, especially between cortical and subcortical ones.
|
34
|
Bagrow JP. Communities and bottlenecks: trees and treelike networks have high modularity. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 85:066118. [PMID: 23005173 DOI: 10.1103/physreve.85.066118] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 01/02/2012] [Indexed: 06/01/2023]
Abstract
Much effort has gone into understanding the modular nature of complex networks. Communities, also known as clusters or modules, are typically considered to be densely interconnected groups of nodes that are only sparsely connected to other groups in the network. Discovering high quality communities is a difficult and important problem in a number of areas. The most popular approach is the objective function known as modularity, used both to discover communities and to measure their strength. To understand the modular structure of networks it is then crucial to know how such functions evaluate different topologies, what features they account for, and what implicit assumptions they may make. We show that trees and treelike networks can have unexpectedly and often arbitrarily high values of modularity. This is surprising since trees are maximally sparse connected graphs and are not typically considered to possess modular structure, yet the nonlocal null model used by modularity assigns low probabilities, and thus high significance, to the densities of these sparse tree communities. We further study the practical performance of popular methods on model trees and on a genealogical data set and find that the discovered communities also have very high modularity, often approaching its maximum value. Statistical tests reveal the communities in trees to be significant, in contrast with known results for partitions of sparse, random graphs.
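The claim that trees can score highly is easy to check by hand. The sketch below is an illustration only, not the paper's analysis: it assumes the standard Newman-Girvan modularity, Q = Σ_c [m_c/m − (d_c/2m)²], and uses a hypothetical example of a 100-node path (a tree) cut into ten contiguous segments.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity: Q = sum_c [ m_c/m - (d_c/(2m))^2 ]."""
    m = len(edges)
    internal = defaultdict(int)    # m_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in list(degree_sum))

# Hypothetical example: a path on 100 nodes has 99 edges and no cycles.
n, block = 100, 10
path_edges = [(i, i + 1) for i in range(n - 1)]
segments = {i: i // block for i in range(n)}  # ten contiguous segments

q_path = modularity(path_edges, segments)
print(round(q_path, 3))  # 0.809
```

Although each segment is as sparse as a connected subgraph can be, the partition reaches Q ≈ 0.81, because only 9 of the 99 edges cross segment boundaries.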
Affiliation(s)
- James P Bagrow
- Department of Engineering Sciences and Applied Mathematics, Northwestern Institute on Complex Systems, Northwestern University, Evanston, Illinois 60208, USA.
|
35
|
Peixoto TP. Entropy of stochastic blockmodel ensembles. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 85:056122. [PMID: 23004836 DOI: 10.1103/physreve.85.056122] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 02/01/2012] [Indexed: 06/01/2023]
Abstract
Stochastic blockmodels are generative network models where the vertices are separated into discrete groups, and the probability of an edge existing between two vertices is determined solely by their group membership. In this paper, we derive expressions for the entropy of stochastic blockmodel ensembles. We consider several ensemble variants, including the traditional model as well as the newly introduced degree-corrected version [Karrer et al., Phys. Rev. E 83, 016107 (2011)], which imposes a degree sequence on the vertices, in addition to the block structure. The imposed degree sequence is implemented both as "soft" constraints, where only the expected degrees are imposed, and as "hard" constraints, where they are required to be the same on all samples of the ensemble. We also consider generalizations to multigraphs and directed graphs. We illustrate one of many applications of this measure by directly deriving a log-likelihood function from the entropy expression, and using it to infer latent block structure in observed data. Due to the general nature of the ensembles considered, the method works well for ensembles with intrinsic degree correlations (i.e., with entropic origin) as well as extrinsic degree correlations, which go beyond the block structure.
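As a minimal sketch of the generative idea in the first sentence, the following samples an undirected graph from the traditional (non-degree-corrected) blockmodel, where each pair of vertices is linked independently with a probability that depends only on the two group labels. The function name `sample_sbm`, the group labels, and the probability matrix are assumptions made for illustration; the entropy expressions derived in the paper are not reproduced here.

```python
import random
from itertools import combinations

def sample_sbm(groups, p, seed=None):
    """Sample a simple undirected graph from a traditional stochastic blockmodel.

    groups[i] is the group label of vertex i; p[r][s] is the probability
    of an edge between a vertex in group r and a vertex in group s.
    """
    rng = random.Random(seed)
    edges = []
    for u, v in combinations(range(len(groups)), 2):
        # The edge probability depends only on the group memberships.
        if rng.random() < p[groups[u]][groups[v]]:
            edges.append((u, v))
    return edges

# Hypothetical two-group example with dense blocks and sparse cross-links.
groups = [0, 0, 0, 1, 1, 1]
p_assortative = [[0.9, 0.1],
                 [0.1, 0.9]]
sampled_edges = sample_sbm(groups, p_assortative, seed=1)
print(sampled_edges)
```

Setting the diagonal of `p` to 1 and the off-diagonal to 0 yields two disconnected cliques, which is a quick sanity check of the sampler.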
Affiliation(s)
- Tiago P Peixoto
- Institut für Theoretische Physik, Universität Bremen, Germany.
|
36
|
Abstract
Researchers use community-detection algorithms to reveal large-scale organization in biological and social networks, but community detection is useful only if the communities are significant and not a result of noisy data. To assess the statistical significance of the network communities, or the robustness of the detected structure, one approach is to perturb the network structure by removing links and measure how much the communities change. However, perturbing sparse networks is challenging because they are inherently sensitive; they shatter easily if links are removed. Here we propose a simple method to perturb sparse networks and assess the significance of their communities. We generate resampled networks by adding extra links based on local information, then we aggregate the information from multiple resampled networks to find a coarse-grained description of significant clusters. In addition to testing our method on benchmark networks, we use our method on the sparse network of the European Court of Justice (ECJ) case law, to detect significant and insignificant areas of law. We use our significance analysis to draw a map of the ECJ case law network that reveals the relations between the areas of law.
|
37
|
Grabowicz PA, Ramasco JJ, Moro E, Pujol JM, Eguiluz VM. Social features of online networks: the strength of intermediary ties in online social media. PLoS One 2012; 7:e29358. [PMID: 22247773 PMCID: PMC3256152 DOI: 10.1371/journal.pone.0029358] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 08/05/2011] [Accepted: 11/27/2011] [Indexed: 11/29/2022]
Abstract
An increasing fraction of today's social interactions occur using online social media as communication channels. Recent worldwide events, such as social movements in Spain or revolts in the Middle East, highlight their capacity to boost people's coordination. Online networks generally display a rich internal structure in which users can choose among different types and intensities of interaction. Despite this, there are still open questions regarding the social value of online interactions. For example, the existence of users with millions of online friends casts doubt on the relevance of these relations. In this work, we focus on Twitter, one of the most popular online social networks, and find that the network formed by the basic type of connections is organized in groups. The activity of the users conforms to the landscape determined by such groups. Furthermore, Twitter's distinction between different types of interactions allows us to establish a parallelism between online and offline social networks: personal interactions are more likely to occur on internal links to the groups (the weakness of strong ties); events transmitting new information go preferentially through links connecting different groups (the strength of weak ties) or even more through links connecting to users belonging to several groups that act as brokers (the strength of intermediary ties).
Affiliation(s)
- Przemyslaw A. Grabowicz
  - Instituto de Fisica Interdisciplinaria y Sistemas Complejos (CSIC-UIB), Palma de Mallorca, Spain
- José J. Ramasco
  - Instituto de Fisica Interdisciplinaria y Sistemas Complejos (CSIC-UIB), Palma de Mallorca, Spain
- Esteban Moro
  - Instituto de Ingeniera del Conocimiento, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Ciencias Matemáticas CSIC-UAM-UC3M-UCM, Departamento de Matemáticas y GISC, Universidad Carlos III de Madrid, Leganés, Spain
- Josep M. Pujol
  - Telefónica Research, Barcelona, Spain
  - 3scale Networks, Barcelona, Spain
- Victor M. Eguiluz
  - Instituto de Fisica Interdisciplinaria y Sistemas Complejos (CSIC-UIB), Palma de Mallorca, Spain
|
38
|
Lancichinetti A, Fortunato S. Limits of modularity maximization in community detection. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 84:066122. [PMID: 22304170 DOI: 10.1103/physreve.84.066122] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.1] [Received: 07/06/2011] [Revised: 10/17/2011] [Indexed: 05/21/2023]
Abstract
Modularity maximization is the most popular technique for the detection of community structure in graphs. The resolution limit of the method is supposedly solvable with the introduction of modified versions of the measure, with tunable resolution parameters. We show that multiresolution modularity suffers from two opposite coexisting problems: the tendency to merge small subgraphs, which dominates when the resolution is low; the tendency to split large subgraphs, which dominates when the resolution is high. In benchmark networks with heterogeneous distributions of cluster sizes, the simultaneous elimination of both biases is not possible and multiresolution modularity is not capable of recovering the planted community structure for any value of the resolution parameter, not even when that structure is pronounced and easily detectable by other methods. This holds for other multiresolution techniques and it is likely to be a general problem of methods based on global optimization.
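The effect of a resolution parameter can be sketched on a toy graph. The code below assumes one common multiresolution form, Q_γ = Σ_c [m_c/m − γ (d_c/2m)²], and a hypothetical ring of 30 five-node cliques joined by single edges; it illustrates only the merging bias and how γ shifts it, not the paper's heterogeneous benchmark.

```python
from itertools import combinations

def modularity(edges, community, gamma=1.0):
    """Multiresolution modularity: sum_c [ m_c/m - gamma * (d_c/(2m))^2 ]."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - gamma * (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

def ring_of_cliques(k, s):
    """k cliques of s nodes each; consecutive cliques share one bridging edge."""
    edges = []
    for c in range(k):
        edges.extend(combinations(range(c * s, (c + 1) * s), 2))
        edges.append((c * s + s - 1, ((c + 1) % k) * s))  # bridge to next clique
    return edges

k, s = 30, 5
ring_edges = ring_of_cliques(k, s)
one_per_clique = {v: v // s for v in range(k * s)}        # the intuitive partition
merged_pairs = {v: v // (2 * s) for v in range(k * s)}    # adjacent cliques merged

for gamma in (1.0, 2.0):
    q_single = modularity(ring_edges, one_per_clique, gamma)
    q_merged = modularity(ring_edges, merged_pairs, gamma)
    print(gamma, round(q_single, 4), round(q_merged, 4))
```

At γ = 1 the merged partition scores higher (0.8879 versus 0.8758), so maximization prefers to fuse cliques; at γ = 2 the ordering flips (0.8212 versus 0.8424). Raising γ far enough to protect small clusters is what, according to the abstract, starts splitting large ones.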
Affiliation(s)
- Andrea Lancichinetti
- Complex Networks and Systems Lagrange Lab, Institute for Scientific Interchange, I-10133 Torino, Italy
|
39
|
Zhan W, Zhang Z, Guan J, Zhou S. Evolutionary method for finding communities in bipartite networks. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 83:066120. [PMID: 21797454 DOI: 10.1103/physreve.83.066120] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/17/2010] [Revised: 03/03/2011] [Indexed: 05/31/2023]
Abstract
An important step in unveiling the relation between network structure and dynamics defined on networks is to detect communities, and numerous methods have been developed separately to identify community structure in different classes of networks, such as unipartite networks, bipartite networks, and directed networks. Here, we show that the finding of communities in such networks can be unified in a general framework: detection of community structure in bipartite networks. Moreover, we propose an evolutionary method for efficiently identifying communities in bipartite networks. To this end, we show that both unipartite and directed networks can be represented as bipartite networks, and that their modularity is completely consistent with that for bipartite networks, on which the detection of modular structure can be reformulated as modularity maximization. To optimize the bipartite modularity, we develop a modified adaptive genetic algorithm (MAGA), which is shown to be especially efficient for community structure detection. The high efficiency of the MAGA is based on the following three improvements we make. First, we introduce a different measure for the informativeness of a locus instead of the standard deviation, which can exactly determine which loci mutate. This measure is the bias between the distribution of a locus over the current population and the uniform distribution of the locus, i.e., the Kullback-Leibler divergence between them. Second, we develop a reassignment technique for differentiating the informative state a locus has attained from the random state in the initial phase. Third, we present a modified mutation rule which, by incorporating related operations, can guarantee the convergence of the MAGA to the global optimum and can speed up the convergence process. Experimental results show that the MAGA outperforms existing methods in terms of modularity for both bipartite and unipartite networks.
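The informativeness measure described in the first improvement can be sketched as a Kullback-Leibler divergence between the empirical distribution of one locus across the population and the uniform distribution over its possible values. The encoding of individuals as lists of integer labels, the function name, and the parameter `num_values` are assumptions for illustration; how the MAGA uses the score to decide which loci mutate is not shown.

```python
import math
from collections import Counter

def locus_informativeness(population, locus, num_values):
    """KL divergence D(P || U) between the empirical distribution P of one
    locus over the population and the uniform distribution U on num_values."""
    counts = Counter(individual[locus] for individual in population)
    n = len(population)
    kl = 0.0
    for count in counts.values():
        p = count / n
        kl += p * math.log(p * num_values)  # p * log(p / (1 / num_values))
    return kl

# Hypothetical population of four individuals with two loci, four labels each.
population = [[0, 0], [0, 1], [0, 2], [0, 3]]
converged = locus_informativeness(population, locus=0, num_values=4)
random_like = locus_informativeness(population, locus=1, num_values=4)
print(converged, random_like)
```

A locus on which the whole population agrees attains the maximum value log(num_values), while a locus spread evenly over all labels scores 0, so the measure separates settled loci from those still in a random state.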
Affiliation(s)
- Weihua Zhan
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, China.
|
40
|
Berry JW, Hendrickson B, LaViolette RA, Phillips CA. Tolerating the community detection resolution limit with edge weighting. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 83:056119. [PMID: 21728617 DOI: 10.1103/physreve.83.056119] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 01/04/2011] [Indexed: 05/31/2023]
Abstract
Communities of vertices within a giant network such as the World Wide Web are likely to be vastly smaller than the network itself. However, Fortunato and Barthélemy have proved that modularity maximization algorithms for community detection may fail to resolve communities with fewer than √(L/2) edges, where L is the number of edges in the entire network. This resolution limit leads modularity maximization algorithms to have notoriously poor accuracy on many real networks. Fortunato and Barthélemy's argument can be extended to networks with weighted edges as well, and we derive this corollary argument. We conclude that weighted modularity algorithms may fail to resolve communities with less than √(Wε/2) total edge weight, where W is the total edge weight in the network and ε is the maximum weight of an intercommunity edge. If ε is small, then small communities can be resolved. Given a weighted or unweighted network, we describe how to derive new edge weights in order to achieve a low ε, we modify the Clauset, Newman, and Moore (CNM) community detection algorithm to maximize weighted modularity, and we show that the resulting algorithm has greatly improved accuracy. In experiments with an emerging community standard benchmark, we find that our simple CNM variant is competitive with the most accurate community detection methods yet proposed.
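A small worked calculation shows why a low ε helps. The sketch reads the two thresholds as √(L/2) and √(Wε/2); the helper names and the numbers (a network of 330 edges, ε = 0.01) are hypothetical and not taken from the paper.

```python
import math

def unweighted_threshold(num_edges):
    """Communities with fewer internal edges than this may go unresolved."""
    return math.sqrt(num_edges / 2)

def weighted_threshold(total_weight, epsilon):
    """Weighted analogue: epsilon is the largest intercommunity edge weight."""
    return math.sqrt(total_weight * epsilon / 2)

L = 330  # hypothetical edge count
print(round(unweighted_threshold(L), 2))        # 12.85
print(round(weighted_threshold(L, 1.0), 2))     # 12.85: unit weights recover the unweighted case
print(round(weighted_threshold(L, 0.01), 2))    # 1.28
```

With unit weights a community of 10 internal edges falls below the threshold of about 12.85 and risks being merged. If reweighting pushes every intercommunity edge down to weight 0.01 while the total weight stays near 330, the threshold drops to about 1.28 and the same community clears it.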
Affiliation(s)
- Jonathan W Berry
- Sandia National Laboratories, P.O. Box 5800, Albuquerque, New Mexico 87185, USA.
|
41
|
Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S. Finding statistically significant communities in networks. PLoS One 2011; 6:e18961. [PMID: 21559480 PMCID: PMC3084717 DOI: 10.1371/journal.pone.0018961] [Citation(s) in RCA: 252] [Impact Index Per Article: 18.0] [Received: 12/09/2010] [Accepted: 03/14/2011] [Indexed: 11/18/2022]
Abstract
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a strong need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
Affiliation(s)
- Andrea Lancichinetti
  - Complex Networks and Systems Lagrange Laboratory, Institute for Scientific Interchange (ISI), Torino, Italy
  - Physics Department, Politecnico di Torino, Torino, Italy
- Filippo Radicchi
  - Howard Hughes Medical Institute (HHMI), Northwestern University, Evanston, Illinois, United States of America
- José J. Ramasco
  - Complex Networks and Systems Lagrange Laboratory, Institute for Scientific Interchange (ISI), Torino, Italy
  - Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Palma de Mallorca, Spain
- Santo Fortunato
  - Complex Networks and Systems Lagrange Laboratory, Institute for Scientific Interchange (ISI), Torino, Italy
|
42
|
Radicchi F, Lancichinetti A, Ramasco JJ. Combinatorial approach to modularity. Phys Rev E Stat Nonlin Soft Matter Phys 2010; 82:026102. [PMID: 20866871 DOI: 10.1103/physreve.82.026102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/29/2010] [Indexed: 05/29/2023]
Abstract
Communities are clusters of nodes with a higher than average density of internal connections. Their detection is of great relevance to better understand the structure and hierarchies present in a network. Modularity has become a standard tool in the area of community detection, providing at the same time a way to evaluate partitions and, by maximizing it, a method to find communities. In this work, we study the modularity from a combinatorial point of view. Our analysis (as the modularity definition) relies on the use of the configurational model, a technique that given a graph produces a series of randomized copies keeping the degree sequence invariant. We develop an approach that enumerates the null model partitions and can be used to calculate the probability distribution function of the modularity. Our theory allows for a deep inquiry of several interesting features characterizing modularity such as its resolution limit and the statistics of the partitions that maximize it. Additionally, the study of the probability of extremes of the modularity in the random graph partitions opens the way for a definition of the statistical significance of network partitions.
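The null model mentioned here, randomized copies that keep the degree sequence fixed, can be sketched with stub matching: cut every edge into two half-edges, shuffle them, and pair them up again. This is a minimal illustration under that assumption, with hypothetical function names; the resulting copy may contain self-loops and repeated edges, and the paper's combinatorial enumeration of null-model partitions is not attempted.

```python
import random
from collections import Counter

def configuration_model_copy(edges, seed=None):
    """Return a randomized copy of the graph with the same degree sequence.

    Uses stub matching, so the copy is a multigraph: self-loops and
    repeated edges can occur.
    """
    rng = random.Random(seed)
    stubs = [node for edge in edges for node in edge]  # one stub per edge endpoint
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

def degree_sequence(edges):
    """Degree of each node; a self-loop contributes 2 to its node."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees

# Hypothetical example: a triangle with a pendant node attached.
original = [(0, 1), (1, 2), (2, 0), (2, 3)]
randomized = configuration_model_copy(original, seed=0)
print(degree_sequence(original) == degree_sequence(randomized))  # True
```

Evaluating a fixed partition on many such copies gives an empirical version of the modularity distribution that the paper obtains analytically.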
Affiliation(s)
- Filippo Radicchi
- Complex Networks Lagrange Laboratory, ISI Foundation, Turin, Italy
|