1
|
Economou KN, Gentleman WC, Krumhansl KA, DiBacco C, Reijnders D, Wang Z, Lyons DA, Lowen B. Assessing spatial structure in marine populations using network theory: A case study of Atlantic sea scallop (Placopecten magellanicus) connectivity. PLoS One 2024; 19:e0308787. [PMID: 39535997 PMCID: PMC11559974 DOI: 10.1371/journal.pone.0308787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2024] [Accepted: 07/31/2024] [Indexed: 11/16/2024] Open
Abstract
Knowledge of the geographic distribution and connectivity of marine populations is essential for ecological understanding and informing management. Previous works have assessed spatial structure by quantifying exchange using Lagrangian particle-tracking simulations, but their scope of analysis is limited by their use of predefined subpopulations. To instead delineate subpopulations emerging naturally from marine population connectivity, we interpret this connectivity as a network, enabling the use of powerful analytic tools from the field of network theory. The modelling approach presented here uses particle-tracking to construct a transport network, and then applies the community detection algorithm Infomap to identify subpopulations that exhibit high internal connectivity and sparse connectivity with other subpopulations. An established quality metric, the coherence ratio, and a new metric we introduce indicating self-recruitment to subpopulations, dubbed the fortress ratio, are used to interpret community-level exchange. We use the Atlantic sea scallop (Placopecten magellanicus) in the northwest Atlantic as a case study. Results suggest that genetic lineages of P. magellanicus demonstrate spatial substructure that depends on horizontal transport, vertical motility, and suitable habitat. Our results support connectivity previously characterized on Georges Bank and Mid-Atlantic Bight. The Gulf of St. Lawrence genetic lineage is found to consist of five subpopulations that are classified as being a sink, source, permeable, or impermeable using quality metrics. This approach may be applied to other planktonic dispersers and prove useful to management.
Collapse
Affiliation(s)
- Karsten N. Economou
- Department of Engineering Mathematics and Internetworking, Dalhousie University, Halifax, Nova Scotia, Canada
- Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
| | - Wendy C. Gentleman
- Department of Engineering Mathematics and Internetworking, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Kira A. Krumhansl
- Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
| | - Claudio DiBacco
- Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
| | - Daan Reijnders
- Institute for Marine and Atmospheric Research Utrecht, Utrecht University, Utrecht, Utrecht, The Netherlands
| | - Zeliang Wang
- Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
| | - Devin A. Lyons
- Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
| | - Ben Lowen
- Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, Nova Scotia, Canada
| |
Collapse
|
2
|
Zamora-López G, Gilson M. An integrative dynamical perspective for graph theory and the analysis of complex networks. CHAOS (WOODBURY, N.Y.) 2024; 34:041501. [PMID: 38625080 DOI: 10.1063/5.0202241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Accepted: 02/25/2024] [Indexed: 04/17/2024]
Abstract
Built upon the shoulders of graph theory, the field of complex networks has become a central tool for studying real systems across various fields of research. Represented as graphs, different systems can be studied using the same analysis methods, which allows for their comparison. Here, we challenge the widespread idea that graph theory is a universal analysis tool, uniformly applicable to any kind of network data. Instead, we show that many classical graph metrics-including degree, clustering coefficient, and geodesic distance-arise from a common hidden propagation model: the discrete cascade. From this perspective, graph metrics are no longer regarded as combinatorial measures of the graph but as spatiotemporal properties of the network dynamics unfolded at different temporal scales. Once graph theory is seen as a model-based (and not a purely data-driven) analysis tool, we can freely or intentionally replace the discrete cascade by other canonical propagation models and define new network metrics. This opens the opportunity to design-explicitly and transparently-dedicated analyses for different types of real networks by choosing a propagation model that matches their individual constraints. In this way, we take stand that network topology cannot always be abstracted independently from network dynamics but shall be jointly studied, which is key for the interpretability of the analyses. The model-based perspective here proposed serves to integrate into a common context both the classical graph analysis and the more recent network metrics defined in the literature which were, directly or indirectly, inspired by propagation phenomena on networks.
Collapse
Affiliation(s)
- Gorka Zamora-López
- Center for Brain and Cognition, Pompeu Fabra University, 08005 Barcelona, Spain
- Department of Information and Communication Technologies, Pompeu Fabra University, 08018 Barcelona, Spain
| | - Matthieu Gilson
- Institut des Neurosciences de la Timone, CNRS-AMU, 13005 Marseille, France
- Institut des Neurosciences des Systemes, INSERM-AMU, 13005 Marseille, France
| |
Collapse
|
3
|
Kawamoto T. Single-trajectory map equation. Sci Rep 2023; 13:6597. [PMID: 37087492 PMCID: PMC10122677 DOI: 10.1038/s41598-023-33880-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Accepted: 04/20/2023] [Indexed: 04/24/2023] Open
Abstract
Community detection, the process of identifying module structures in complex systems represented on networks, is an effective tool in various fields of science. The map equation, which is an information-theoretic framework based on the random walk on a network, is a particularly popular community detection method. Despite its outstanding performance in many applications, the inner workings of the map equation have not been thoroughly studied. Herein, we revisit the original formulation of the map equation and address the existence of its "raw form," which we refer to as the single-trajectory map equation. This raw form sheds light on many details behind the principle of the map equation that are hidden in the steady-state limit of the random walk. Most importantly, the single-trajectory map equation provides a more balanced community structure, naturally reducing the tendency of the overfitting phenomenon in the map equation.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, 135-0064, Japan.
| |
Collapse
|
4
|
Okamoto H, Qiu X. Detecting hierarchical organization of pervasive communities by modular decomposition of Markov chain. Sci Rep 2022; 12:20211. [PMID: 36418410 PMCID: PMC9684584 DOI: 10.1038/s41598-022-24567-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2022] [Accepted: 11/17/2022] [Indexed: 11/25/2022] Open
Abstract
Connecting nodes that contingently co-appear, which is a common process of networking in social and biological systems, normally leads to modular structure characterized by the absence of definite boundaries. This study seeks to find and evaluate methods to detect such modules, which will be called 'pervasive' communities. We propose a mathematical formulation to decompose a random walk spreading over the entire network into localized random walks as a proxy for pervasive communities. We applied this formulation to biological and social as well as synthetic networks to demonstrate that it can properly detect communities as pervasively structured objects. We further addressed a question that is fundamental but has been little discussed so far: What is the hierarchical organization of pervasive communities and how can it be extracted? Here we show that hierarchical organization of pervasive communities is unveiled from finer to coarser layers through discrete phase transitions that intermittently occur as the value for a resolution-controlling parameter is quasi-statically increased. To our knowledge, this is the first elucidation of how the pervasiveness and hierarchy, both hallmarks of community structure of real-world networks, are unified.
Collapse
Affiliation(s)
- Hiroshi Okamoto
- Department of Bioengineering, The University of Tokyo, Tokyo, 113-8656, Japan.
- DWANGO Co., Ltd., Tokyo , Japan.
| | - Xule Qiu
- FUJIFILM Business Innovation Corp., Tokyo, Japan
| |
Collapse
|
5
|
Bassolas A, Holmgren A, Marot A, Rosvall M, Nicosia V. Mapping nonlocal relationships between metadata and network structure with metadata-dependent encoding of random walks. SCIENCE ADVANCES 2022; 8:eabn7558. [PMID: 36306360 PMCID: PMC9616498 DOI: 10.1126/sciadv.abn7558] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/17/2021] [Accepted: 09/12/2022] [Indexed: 06/16/2023]
Abstract
Integrating structural information and metadata, such as gender, social status, or interests, enriches networks and enables a better understanding of the large-scale structure of complex systems. However, existing approaches to augment networks with metadata for community detection only consider immediately adjacent nodes and cannot exploit the nonlocal relationships between metadata and large-scale network structure present in many spatial and social systems. Here, we develop a flow-based community detection framework based on the map equation that integrates network information and metadata of distant nodes and reveals more complex relationships. We analyze social and spatial networks and find that our methodology can detect functional metadata-informed communities distinct from those derived solely from network information or metadata. For example, in a mobility network of London, we identify communities that reflect the heterogeneity of income distribution, and in a European power grid network, we identify communities that capture relationships between geography and energy prices beyond country borders.
Collapse
Affiliation(s)
- Aleix Bassolas
- School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK
- Departament d’Enginyeria Informatica i Matematiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain
| | - Anton Holmgren
- Integrated Science Lab, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| | - Antoine Marot
- RTE Réseau de Transport d’Electricité, Paris, France
| | - Martin Rosvall
- Integrated Science Lab, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| | - Vincenzo Nicosia
- School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK
| |
Collapse
|
6
|
Faccin M, Schaub MT, Delvenne JC. State Aggregations in Markov Chains and Block Models of Networks. PHYSICAL REVIEW LETTERS 2021; 127:078301. [PMID: 34459654 DOI: 10.1103/physrevlett.127.078301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2020] [Revised: 06/17/2021] [Accepted: 07/15/2021] [Indexed: 06/13/2023]
Abstract
We consider state-aggregation schemes for Markov chains from an information-theoretic perspective. Specifically, we consider aggregating the states of a Markov chain such that the mutual information of the aggregated states separated by T time steps is maximized. We show that for T=1 this recovers the maximum-likelihood estimator of the degree-corrected stochastic block model as a particular case, which enables us to explain certain features of the likelihood landscape of this generative network model from a dynamical lens. We further highlight how we can uncover coherent, long-range dynamical modules for which considering a timescale T≫1 is essential. We demonstrate our results using synthetic flows and real-world ocean currents, where we are able to recover the fundamental features of the surface currents of the oceans.
Collapse
Affiliation(s)
- Mauro Faccin
- ICTEAM, Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
| | - Michael T Schaub
- Department of Engineering Science, University of Oxford, Oxford OX1 2JD, United Kingdom
- Department of Computer Science, RWTH Aachen University, 52074 Aachen, Germany
| | - Jean-Charles Delvenne
- ICTEAM, Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- CORE, Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
| |
Collapse
|
7
|
Tandon A, Albeshri A, Thayananthan V, Alhalabi W, Radicchi F, Fortunato S. Community detection in networks using graph embeddings. Phys Rev E 2021; 103:022316. [PMID: 33736102 DOI: 10.1103/physreve.103.022316] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Accepted: 02/04/2021] [Indexed: 11/07/2022]
Abstract
Graph embedding methods are becoming increasingly popular in the machine learning community, where they are widely used for tasks such as node classification and link prediction. Embedding graphs in geometric spaces should aid the identification of network communities as well because nodes in the same community should be projected close to each other in the geometric space, where they can be detected via standard data clustering algorithms. In this paper, we test the ability of several graph embedding techniques to detect communities on benchmark graphs. We compare their performance against that of traditional community detection algorithms. We find that the performance is comparable, if the parameters of the embedding techniques are suitably chosen. However, the optimal parameter set varies with the specific features of the benchmark graphs, like their size, whereas popular community detection algorithms do not require any parameter. So, it is not possible to indicate beforehand good parameter sets for the analysis of real networks. This finding, along with the high computational cost of embedding a network and grouping the points, suggests that, for community detection, current embedding techniques do not represent an improvement over network clustering algorithms.
Collapse
Affiliation(s)
- Aditya Tandon
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
| | - Aiiad Albeshri
- Department of Computer Science, Faculty of Computing and Information Technology King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia
| | - Vijey Thayananthan
- Department of Computer Science, Faculty of Computing and Information Technology King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia
| | - Wadee Alhalabi
- Department of Computer Science, Faculty of Computing and Information Technology King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia
| | - Filippo Radicchi
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA.,Indiana University Network Science Institute (IUNI), Bloomington, Indiana 47408, USA
| | - Santo Fortunato
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA.,Indiana University Network Science Institute (IUNI), Bloomington, Indiana 47408, USA
| |
Collapse
|
8
|
Blöcker C, Rosvall M. Mapping flows on bipartite networks. Phys Rev E 2020; 102:052305. [PMID: 33327187 DOI: 10.1103/physreve.102.052305] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2020] [Accepted: 10/10/2020] [Indexed: 05/07/2023]
Abstract
Mapping network flows provides insight into the organization of networks, but even though many real networks are bipartite, no method for mapping flows takes advantage of the bipartite structure. What do we miss by discarding this information and how can we use it to understand the structure of bipartite networks better? The map equation models network flows with a random walk and exploits the information-theoretic duality between compression and finding regularities to detect communities in networks. However, it does not use the fact that random walks in bipartite networks alternate between node types, information worth 1 bit. To make some or all of this information available to the map equation, we developed a coding scheme that remembers node types at different rates. We explored the community landscape of bipartite real-world networks from no node-type information to full node-type information and found that using node types at a higher rate generally leads to deeper community hierarchies and a higher resolution. The corresponding compression of network flows exceeds the amount of extra information provided. Consequently, taking advantage of the bipartite structure increases the resolution and reveals more network regularities.
Collapse
Affiliation(s)
- Christopher Blöcker
- Integrated Science Laboratory, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| | - Martin Rosvall
- Integrated Science Laboratory, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| |
Collapse
|
9
|
Clarke J, Beaney T, Majeed A, Darzi A, Barahona M. Identifying naturally occurring communities of primary care providers in the English National Health Service in London. BMJ Open 2020; 10:e036504. [PMID: 32690744 PMCID: PMC7375630 DOI: 10.1136/bmjopen-2019-036504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES Primary Care Networks (PCNs) are a new organisational hierarchy with wide-ranging responsibilities introduced in the National Health Service (NHS) Long Term Plan. The vision is that PCNs should represent 'natural' communities of general practices (GP practices) collaborating at scale and covering a geography that fits well with practices, other healthcare providers and local communities. Our study aims to identify natural communities of GP practices based on patient registration patterns using Markov Multiscale Community Detection, an unsupervised network-based clustering technique to create catchments for these communities. DESIGN Retrospective observational study using Hospital Episode Statistics - patient-level administrative records of attendances to hospital. SETTING General practices in the 32 Clinical Commissioning Groups of Greater London PARTICIPANTS: All adult patients resident in and registered to a GP practice in Greater London that had one or more outpatient encounters at NHS hospitals between 1st April 2017 and 31st March 2018. MAIN OUTCOME MEASURES The allocation of GP practices in Greater London to PCNs based on the registrations of patients resident in each Lower Layer Super Output Area (LSOA) of Greater London. The population size and coverage of each proposed PCN. RESULTS 3 428 322 unique patients attended 1334 GPs in 4835 LSOAs in Greater London. Our model grouped 1291 GPs (96.8%) and 4721 LSOAs (97.6%) into 165 mutually exclusive PCNs. Median PCN list size was 53 490, with a lower quartile of 38 079 patients and an upper quartile of 72 982 patients. A median of 70.1% of patients attended a GP within their allocated PCN, ranging from 44.6% to 91.4%. CONCLUSIONS With PCNs expected to take a role in population health management and with community providers expected to reconfigure around them, it is vital to recognise how PCNs represent their communities. Our method may be used by policymakers to understand the populations and geography shared between networks.
Collapse
Affiliation(s)
- Jonathan Clarke
- Centre for Health Policy, Imperial College London, London, UK
- Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
| | - Thomas Beaney
- Department of Primary Care, Imperial College of Science Technology and Medicine, London, UK
| | | | | | - Mauricio Barahona
- Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK
- Department of Mathematics, Imperial College London, London, UK
| |
Collapse
|
10
|
Verkhivker GM, Agajanian S, Hu G, Tao P. Allosteric Regulation at the Crossroads of New Technologies: Multiscale Modeling, Networks, and Machine Learning. Front Mol Biosci 2020; 7:136. [PMID: 32733918 PMCID: PMC7363947 DOI: 10.3389/fmolb.2020.00136] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Accepted: 06/08/2020] [Indexed: 12/12/2022] Open
Abstract
Allosteric regulation is a common mechanism employed by complex biomolecular systems for regulation of activity and adaptability in the cellular environment, serving as an effective molecular tool for cellular communication. As an intrinsic but elusive property, allostery is a ubiquitous phenomenon where binding or disturbing of a distal site in a protein can functionally control its activity and is considered as the "second secret of life." The fundamental biological importance and complexity of these processes require a multi-faceted platform of synergistically integrated approaches for prediction and characterization of allosteric functional states, atomistic reconstruction of allosteric regulatory mechanisms and discovery of allosteric modulators. The unifying theme and overarching goal of allosteric regulation studies in recent years have been integration between emerging experiment and computational approaches and technologies to advance quantitative characterization of allosteric mechanisms in proteins. Despite significant advances, the quantitative characterization and reliable prediction of functional allosteric states, interactions, and mechanisms continue to present highly challenging problems in the field. In this review, we discuss simulation-based multiscale approaches, experiment-informed Markovian models, and network modeling of allostery and information-theoretical approaches that can describe the thermodynamics and hierarchy allosteric states and the molecular basis of allosteric mechanisms. The wealth of structural and functional information along with diversity and complexity of allosteric mechanisms in therapeutically important protein families have provided a well-suited platform for development of data-driven research strategies. Data-centric integration of chemistry, biology and computer science using artificial intelligence technologies has gained a significant momentum and at the forefront of many cross-disciplinary efforts. We discuss new developments in the machine learning field and the emergence of deep learning and deep reinforcement learning applications in modeling of molecular mechanisms and allosteric proteins. The experiment-guided integrated approaches empowered by recent advances in multiscale modeling, network science, and machine learning can lead to more reliable prediction of allosteric regulatory mechanisms and discovery of allosteric modulators for therapeutically important protein targets.
Collapse
Affiliation(s)
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States
| | - Steve Agajanian
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
| | - Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Peng Tao
- Department of Chemistry, Center for Drug Discovery, Design, and Delivery (CD4), Center for Scientific Computation, Southern Methodist University, Dallas, TX, United States
| |
Collapse
|
11
|
Schaub MT, Delvenne JC, Lambiotte R, Barahona M. Multiscale dynamical embeddings of complex networks. Phys Rev E 2019; 99:062308. [PMID: 31330590 DOI: 10.1103/physreve.99.062308] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2018] [Indexed: 11/07/2022]
Abstract
Complex systems and relational data are often abstracted as dynamical processes on networks. To understand, predict, and control their behavior, a crucial step is to extract reduced descriptions of such networks. Inspired by notions from control theory, we propose a time-dependent dynamical similarity measure between nodes, which quantifies the effect a node-input has on the network. This dynamical similarity induces an embedding that can be employed for several analysis tasks. Here we focus on (i) dimensionality reduction, i.e., projecting nodes onto a low-dimensional space that captures dynamic similarity at different timescales, and (ii) how to exploit our embeddings to uncover functional modules. We exemplify our ideas through case studies focusing on directed networks without strong connectivity and signed networks. We further highlight how certain ideas from community detection can be generalized and linked to control theory, by using the here developed dynamical perspective.
Collapse
Affiliation(s)
- Michael T Schaub
- Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.,Department of Engineering Science, University of Oxford, Oxford, United Kingdom
| | - Jean-Charles Delvenne
- ICTEAM, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium.,CORE, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium
| | - Renaud Lambiotte
- Mathematical Institute, University of Oxford, Oxford, United Kingdom
| | - Mauricio Barahona
- Department of Mathematics, Imperial College London, London SW7 2AZ, United Kingdom
| |
Collapse
|
12
|
Altuncu MT, Mayer E, Yaliraki SN, Barahona M. From free text to clusters of content in health records: an unsupervised graph partitioning approach. APPLIED NETWORK SCIENCE 2019; 4:2. [PMID: 30906850 PMCID: PMC6400329 DOI: 10.1007/s41109-018-0109-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/25/2018] [Accepted: 11/06/2018] [Indexed: 05/07/2023]
Abstract
Electronic healthcare records contain large volumes of unstructured data in different forms. Free text constitutes a large portion of such data, yet this source of richly detailed information often remains under-used in practice because of a lack of suitable methodologies to extract interpretable content in a timely manner. Here we apply network-theoretical tools to the analysis of free text in Hospital Patient Incident reports in the English National Health Service, to find clusters of reports in an unsupervised manner and at different levels of resolution based directly on the free text descriptions contained within them. To do so, we combine recently developed deep neural network text-embedding methodologies based on paragraph vectors with multi-scale Markov Stability community detection applied to a similarity graph of documents obtained from sparsified text vector similarities. We showcase the approach with the analysis of incident reports submitted in Imperial College Healthcare NHS Trust, London. The multiscale community structure reveals levels of meaning with different resolution in the topics of the dataset, as shown by relevant descriptive terms extracted from the groups of records, as well as by comparing a posteriori against hand-coded categories assigned by healthcare personnel. Our content communities exhibit good correspondence with well-defined hand-coded categories, yet our results also provide further medical detail in certain areas as well as revealing complementary descriptors of incidents beyond the external classification. We also discuss how the method can be used to monitor reports over time and across different healthcare providers, and to detect emerging trends that fall outside of pre-existing categories.
Collapse
Affiliation(s)
- M. Tarik Altuncu
- Department of Mathematics, Imperial College London, South Kensington campus, London, SW7 2AZ UK
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, South Kensington campus, London, SW7 2AZ UK
| | - Erik Mayer
- Centre for Health Policy, Institute of Global Health Innovation, Imperial College London, St Mary’s campus, London, W2 1NY UK
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, South Kensington campus, London, SW7 2AZ UK
| | - Sophia N. Yaliraki
- Department of Chemistry, Imperial College London, South Kensington campus, London, SW7 2AZ UK
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, South Kensington campus, London, SW7 2AZ UK
| | - Mauricio Barahona
- Department of Mathematics, Imperial College London, South Kensington campus, London, SW7 2AZ UK
- EPSRC Centre for Mathematics of Precision Healthcare, Imperial College London, South Kensington campus, London, SW7 2AZ UK
| |
Collapse
|
13
|
Gilson M, Kouvaris NE, Deco G, Zamora-López G. Framework based on communicability and flow to analyze complex network dynamics. Phys Rev E 2018; 97:052301. [PMID: 29906867 DOI: 10.1103/physreve.97.052301] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2018] [Indexed: 06/08/2023]
Abstract
Graph theory constitutes a widely used and established field providing powerful tools for the characterization of complex networks. The intricate topology of networks can also be investigated by means of the collective dynamics observed in the interactions of self-sustained oscillations (synchronization patterns) or propagationlike processes such as random walks. However, networks are often inferred from real-data-forming dynamic systems, which are different from those employed to reveal their topological characteristics. This stresses the necessity for a theoretical framework dedicated to the mutual relationship between the structure and dynamics in complex networks, as the two sides of the same coin. Here we propose a rigorous framework based on the network response over time (i.e., Green function) to study interactions between nodes across time. For this purpose we define the flow that describes the interplay between the network connectivity and external inputs. This multivariate measure relates to the concepts of graph communicability and the map equation. We illustrate our theory using the multivariate Ornstein-Uhlenbeck process, which describes stable and non-conservative dynamics, but the formalism can be adapted to other local dynamics for which the Green function is known. We provide applications to classical network examples, such as small-world ring and hierarchical networks. Our theory defines a comprehensive framework that is canonically related to directed and weighted networks, thus paving a way to revise the standards for network analysis, from the pairwise interactions between nodes to the global properties of networks including community detection.
Collapse
Affiliation(s)
- M Gilson
- Center for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Carrer Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain
| | - N E Kouvaris
- Center for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Carrer Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain
- Namur Institute for Complex Systems (naXys), Department of Mathematics, University of Namur, Rempart de la Vierge 8, B 5000 Namur, Belgium
| | - G Deco
- Center for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Carrer Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain
- Institució Catalana de la Recerca i Estudis Avanats (ICREA), Universitat Pompeu Fabra, Passeig Lluís Companys 23, Barcelona, 08010, Spain
| | - G Zamora-López
- Center for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Carrer Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain
| |
Collapse
|
14
|
Jeub LGS, Sporns O, Fortunato S. Multiresolution Consensus Clustering in Networks. Sci Rep 2018; 8:3259. [PMID: 29459635 PMCID: PMC5818657 DOI: 10.1038/s41598-018-21352-7] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2017] [Accepted: 02/02/2018] [Indexed: 11/16/2022] Open
Abstract
Networks often exhibit structure at disparate scales. We propose a method for identifying community structure at different scales based on multiresolution modularity and consensus clustering. Our contribution consists of two parts. First, we propose a strategy for sampling the entire range of possible resolutions for the multiresolution modularity quality function. Our approach is directly based on the properties of modularity and, in particular, provides a natural way of avoiding the need to increase the resolution parameter by several orders of magnitude to break a few remaining small communities, necessitating the introduction of ad-hoc limits to the resolution range with standard sampling approaches. Second, we propose a hierarchical consensus clustering procedure, based on a modified modularity, that allows one to construct a hierarchical consensus structure given a set of input partitions. While here we are interested in its application to partitions sampled using multiresolution modularity, this consensus clustering procedure can be applied to the output of any clustering algorithm. As such, we see many potential applications of the individual parts of our multiresolution consensus clustering procedure in addition to using the procedure itself to identify hierarchical structure in networks.
Collapse
Affiliation(s)
- Lucas G S Jeub
- School of Informatics, Computing and Engineering, Indiana University, Indiana, United States.
| | - Olaf Sporns
- Department of Psychological and Brain Sciences, Indiana University, Indiana, United States
- Network Science Institute (IUNI), Indiana University, Indiana, United States
| | - Santo Fortunato
- School of Informatics, Computing and Engineering, Indiana University, Indiana, United States
- Network Science Institute (IUNI), Indiana University, Indiana, United States
| |
Collapse
|
15
|
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale. PLoS One 2016; 11:e0159161. [PMID: 27391786 PMCID: PMC4938516 DOI: 10.1371/journal.pone.0159161] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2016] [Accepted: 06/28/2016] [Indexed: 11/30/2022] Open
Abstract
Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
Collapse
|
16
|
Salnikov V, Schaub MT, Lambiotte R. Using higher-order Markov models to reveal flow-based communities in networks. Sci Rep 2016; 6:23194. [PMID: 27029508 PMCID: PMC4814833 DOI: 10.1038/srep23194] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2015] [Accepted: 03/02/2016] [Indexed: 12/01/2022] Open
Abstract
Complex systems made of interacting elements are commonly abstracted as networks, in which nodes are associated with dynamic state variables, whose evolution is driven by interactions mediated by the edges. Markov processes have been the prevailing paradigm to model such a network-based dynamics, for instance in the form of random walks or other types of diffusions. Despite the success of this modelling perspective for numerous applications, it represents an over-simplification of several real-world systems. Importantly, simple Markov models lack memory in their dynamics, an assumption often not realistic in practice. Here, we explore possibilities to enrich the system description by means of second-order Markov models, exploiting empirical pathway information. We focus on the problem of community detection and show that standard network algorithms can be generalized in order to extract novel temporal information about the system under investigation. We also apply our methodology to temporal networks, where we can uncover communities shaped by the temporal correlations in the system. Finally, we discuss relations of the framework of second order Markov processes and the recently proposed formalism of using non-backtracking matrices for community detection.
Collapse
Affiliation(s)
- Vsevolod Salnikov
- naXys, University of Namur, Rempart de la Vierge 8, 5000 Namur, Belgium
| | - Michael T Schaub
- naXys, University of Namur, Rempart de la Vierge 8, 5000 Namur, Belgium.,ICTEAM, Université catholique de Louvain, Avenue George Lemaître 4, B-1348 Louvain-la-Neuve, Belgium
| | - Renaud Lambiotte
- naXys, University of Namur, Rempart de la Vierge 8, 5000 Namur, Belgium
| |
Collapse
|
17
|
Kheirkhahzadeh M, Lancichinetti A, Rosvall M. Efficient community detection of network flows for varying Markov times and bipartite networks. Phys Rev E 2016; 93:032309. [PMID: 27078368 DOI: 10.1103/physreve.93.032309] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2015] [Indexed: 11/07/2022]
Abstract
Community detection of network flows conventionally assumes one-step dynamics on the links. For sparse networks and interest in large-scale structures, longer timescales may be more appropriate. Oppositely, for large networks and interest in small-scale structures, shorter timescales may be better. However, current methods for analyzing networks at different timescales require expensive and often infeasible network reconstructions. To overcome this problem, we introduce a method that takes advantage of the inner workings of the map equation and evades the reconstruction step. This makes it possible to efficiently analyze large networks at different Markov times with no extra overhead cost. The method also evades the costly unipartite projection for identifying flow modules in bipartite networks.
Collapse
Affiliation(s)
- Masoumeh Kheirkhahzadeh
- Department of IT and Computer Engineering, Iran University of Science and Technology, Teheran, Iran.,Integrated Science Lab, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| | - Andrea Lancichinetti
- Integrated Science Lab, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| | - Martin Rosvall
- Integrated Science Lab, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| |
Collapse
|
18
|
Beguerisse-Díaz M, Garduño-Hernández G, Vangelov B, Yaliraki SN, Barahona M. Interest communities and flow roles in directed networks: the Twitter network of the UK riots. J R Soc Interface 2015; 11:20140940. [PMID: 25297320 PMCID: PMC4223916 DOI: 10.1098/rsif.2014.0940] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Directionality is a crucial ingredient in many complex networks in which information, energy or influence are transmitted. In such directed networks, analysing flows (and not only the strength of connections) is crucial to reveal important features of the network that might go undetected if the orientation of connections is ignored. We showcase here a flow-based approach for community detection through the study of the network of the most influential Twitter users during the 2011 riots in England. Firstly, we use directed Markov Stability to extract descriptions of the network at different levels of coarseness in terms of interest communities, i.e. groups of nodes within which flows of information are contained and reinforced. Such interest communities reveal user groupings according to location, profession, employer and topic. The study of flows also allows us to generate an interest distance, which affords a personalized view of the attention in the network as viewed from the vantage point of any given user. Secondly, we analyse the profiles of incoming and outgoing long-range flows with a combined approach of role-based similarity and the novel relaxed minimum spanning tree algorithm to reveal that the users in the network can be classified into five roles. These flow roles go beyond the standard leader/follower dichotomy and differ from classifications based on regular/structural equivalence. We then show that the interest communities fall into distinct informational organigrams characterized by a different mix of user roles reflecting the quality of dialogue within them. Our generic framework can be used to provide insight into how flows are generated, distributed, preserved and consumed in directed networks.
Collapse
Affiliation(s)
- Mariano Beguerisse-Díaz
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK Department of Chemistry, Imperial College London, London SW7 2AZ, UK
| | | | - Borislav Vangelov
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
| | - Sophia N Yaliraki
- Department of Chemistry, Imperial College London, London SW7 2AZ, UK
| | - Mauricio Barahona
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
| |
Collapse
|
19
|
Ser-Giacomi E, Rossi V, López C, Hernández-García E. Flow networks: a characterization of geophysical fluid transport. CHAOS (WOODBURY, N.Y.) 2015; 25:036404. [PMID: 25833442 DOI: 10.1063/1.4908231] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
We represent transport between different regions of a fluid domain by flow networks, constructed from the discrete representation of the Perron-Frobenius or transfer operator associated to the fluid advection dynamics. The procedure is useful to analyze fluid dynamics in geophysical contexts, as illustrated by the construction of a flow network associated to the surface circulation in the Mediterranean sea. We use network-theory tools to analyze the flow network and gain insights into transport processes. In particular, we quantitatively relate dispersion and mixing characteristics, classically quantified by Lyapunov exponents, to the degree of the network nodes. A family of network entropies is defined from the network adjacency matrix and related to the statistics of stretching in the fluid, in particular, to the Lyapunov exponent field. Finally, we use a network community detection algorithm, Infomap, to partition the Mediterranean network into coherent regions, i.e., areas internally well mixed, but with little fluid interchange between them.
Collapse
Affiliation(s)
- Enrico Ser-Giacomi
- Instituto de Física Interdisciplinar y Sistemas Complejos, IFISC (CSIC-UIB), Campus Universitat de les Illes Balears, E-07122 Palma de Mallorca, Spain
| | - Vincent Rossi
- Instituto de Física Interdisciplinar y Sistemas Complejos, IFISC (CSIC-UIB), Campus Universitat de les Illes Balears, E-07122 Palma de Mallorca, Spain
| | - Cristóbal López
- Instituto de Física Interdisciplinar y Sistemas Complejos, IFISC (CSIC-UIB), Campus Universitat de les Illes Balears, E-07122 Palma de Mallorca, Spain
| | - Emilio Hernández-García
- Instituto de Física Interdisciplinar y Sistemas Complejos, IFISC (CSIC-UIB), Campus Universitat de les Illes Balears, E-07122 Palma de Mallorca, Spain
| |
Collapse
|
20
|
Jeub LGS, Balachandran P, Porter MA, Mucha PJ, Mahoney MW. Think locally, act locally: detection of small, medium-sized, and large communities in large networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 91:012821. [PMID: 25679670 PMCID: PMC5125638 DOI: 10.1103/physreve.91.012821] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/15/2014] [Indexed: 06/04/2023]
Abstract
It is common in the study of networks to investigate intermediate-sized (or "meso-scale") features to try to gain an understanding of network structure and function. For example, numerous algorithms have been developed to try to identify "communities," which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective that communities are associated with bottlenecks of locally biased dynamical processes that begin at seed sets of nodes, and we employ several different community-identification procedures (using diffusion-based and geodesic-based dynamics) to investigate community quality as a function of community size. Using several empirical and synthetic networks, we identify several distinct scenarios for "size-resolved community structure" that can arise in real (and realistic) networks: (1) the best small groups of nodes can be better than the best large groups (for a given formulation of the idea of a good community); (2) the best small groups can have a quality that is comparable to the best medium-sized and large groups; and (3) the best small groups of nodes can be worse than the best large groups. As we discuss in detail, which of these three cases holds for a given network can make an enormous difference when investigating and making claims about network community structure, and it is important to take this into account to obtain reliable downstream conclusions. Depending on which scenario holds, one may or may not be able to successfully identify "good" communities in a given network (and good communities might not even exist for a given community quality measure), the manner in which different small communities fit together to form meso-scale network structures can be very different, and processes such as viral propagation and information diffusion can exhibit very different dynamics. In addition, our results suggest that, for many large realistic networks, the output of locally biased methods that focus on communities that are centered around a given seed node (or set of seed nodes) might have better conceptual grounding and greater practical utility than the output of global community-detection methods. They also illustrate structural properties that are important to consider in the development of better benchmark networks to test methods for community detection.
Collapse
Affiliation(s)
- Lucas G S Jeub
- Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| | - Prakash Balachandran
- Morgan Stanley, Montreal, Quebec, H3C 3S4, Canada and Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA
| | - Mason A Porter
- Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom and CABDyN Complexity Centre, University of Oxford, Oxford OX1 1HP, United Kingdom
| | - Peter J Mucha
- Carolina Center for Interdisciplinary Applied Mathematics, Department of Mathematics, University of North Carolina, Chapel Hill, North Carolina 27599-3250, USA
| | - Michael W Mahoney
- International Computer Science Institute, Berkeley, California 94704, USA and Department of Statistics, University of California at Berkeley, Berkeley, California 94720, USA
| |
Collapse
|
21
|
Kawamoto T, Rosvall M. Estimating the resolution limit of the map equation in community detection. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 91:012809. [PMID: 25679659 DOI: 10.1103/physreve.91.012809] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/18/2014] [Indexed: 06/04/2023]
Abstract
A community detection algorithm is considered to have a resolution limit if the scale of the smallest modules that can be resolved depends on the size of the analyzed subnetwork. The resolution limit is known to prevent some community detection algorithms from accurately identifying the modular structure of a network. In fact, any global objective function for measuring the quality of a two-level assignment of nodes into modules must have some sort of resolution limit or an external resolution parameter. However, it is yet unknown how the resolution limit affects the so-called map equation, which is known to be an efficient objective function for community detection. We derive an analytical estimate and conclude that the resolution limit of the map equation is set by the total number of links between modules instead of the total number of links in the full network as for modularity. This mechanism makes the resolution limit much less restrictive for the map equation than for modularity; in practice, it is orders of magnitudes smaller. Furthermore, we argue that the effect of the resolution limit often results from shoehorning multilevel modular structures into two-level descriptions. As we show, the hierarchical map equation effectively eliminates the resolution limit for networks with nested multilevel modular structures.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259-G5-22, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8502, Japan
| | - Martin Rosvall
- Integrated Science Lab, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| |
Collapse
|
22
|
Structure of complex networks: Quantifying edge-to-edge relations by failure-induced flow redistribution. ACTA ACUST UNITED AC 2014. [DOI: 10.1017/nws.2014.4] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
AbstractThe analysis of complex networks has so far revolved mainly around the role of nodes and communities of nodes. However, the dynamics of interconnected systems is often focalized on edge processes, and a dual edge-centric perspective can often prove more natural. Here we present graph-theoretical measures to quantify edge-to-edge relations inspired by the notion of flow redistribution induced by edge failures. Our measures, which are related to the pseudo-inverse of the Laplacian of the network, are global and reveal the dynamical interplay between the edges of a network, including potentiallynon-localinteractions. Our framework also allows us to define the embeddedness of an edge, a measure of how strongly an edge features in the weighted cuts of the network. We showcase the general applicability of our edge-centric framework through analyses of the Iberian power grid, traffic flow in road networks, and theC. elegansneuronal network.
Collapse
|
23
|
Zhang S, Zhao H. Normalized modularity optimization method for community identification with degree adjustment. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2013; 88:052802. [PMID: 24329313 DOI: 10.1103/physreve.88.052802] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/03/2013] [Revised: 07/11/2013] [Indexed: 06/03/2023]
Abstract
As a fundamental problem in network study, community identification has attracted much attention from different fields. Representing a seminal work in this area, the modularity optimization method has been widely applied and studied. However, this method has issues in resolution limit and extreme degeneracy and may not perform well for networks with unbalanced structures. Although several methods have been proposed to overcome these limitations, they are all based on the original idea of defining modularity through comparing the total number of edges within the putative communities in the observed network with that in an equivalent randomly generated network. In this paper, we show that this modularity definition is not suitable to analyze some networks such as those with unbalanced structures. Instead, we propose to define modularity through the average degree within the communities and formulate modularity as comparing the sum of average degree within communities of the observed network to that of an equivalent randomly generated network. In addition, we also propose a degree-adjusted approach for further improvement when there are unbalanced structures. We analyze the theoretical properties of our degree adjusted method. Numerical experiments for both artificial networks and real networks demonstrate that average degree plays an important role in network community identification, and our proposed methods have better performance than existing ones.
Collapse
Affiliation(s)
- Shuqin Zhang
- Center for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai 200433, China
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
| |
Collapse
|