1
|
Maekawa S, Sasaki Y, Fletcher G, Onizuka M. GenCAT: Generating attributed graphs with controlled relationships between classes, attributes, and topology. Inform Syst 2023. [DOI: 10.1016/j.is.2023.102195] [Indexed: 02/13/2023]
|
2
|
Gomes Ferreira CH, Murai F, Silva APC, Trevisan M, Vassio L, Drago I, Mellia M, Almeida JM. On network backbone extraction for modeling online collective behavior. PLoS One 2022; 17:e0274218. [PMID: 36107952 PMCID: PMC9477297 DOI: 10.1371/journal.pone.0274218] [Received: 02/05/2022] [Accepted: 08/23/2022] [Indexed: 11/18/2022] [Open access]
Abstract
Collective user behavior in social media applications often drives several important online and offline phenomena linked to the spread of opinions and information. Several studies have focused on the analysis of such phenomena using networks to model user interactions, represented by edges. However, only a fraction of edges contribute to the actual investigation. Even worse, the often large number of non-relevant edges may obfuscate the salient interactions, blurring the underlying structures and user communities that capture the collective behavior patterns driving the target phenomenon. To solve this issue, researchers have proposed several network backbone extraction techniques to obtain a reduced and representative version of the network that better explains the phenomenon of interest. Each technique has its specific assumptions and procedure to extract the backbone. However, the literature lacks a clear methodology to highlight such assumptions, discuss how they affect the choice of a method, and offer validation strategies in scenarios where no ground truth exists. In this work, we fill this gap by proposing a principled methodology for comparing and selecting the most appropriate backbone extraction method given a phenomenon of interest. We characterize ten state-of-the-art techniques in terms of their assumptions, requirements, and other aspects that one must consider to apply them in practice. We present four steps to apply, evaluate, and select the best method(s) for a given target phenomenon. We validate our approach using two case studies with different requirements: online discussions on Instagram and coordinated behavior in WhatsApp groups. We show that each method can produce very different backbones, underscoring that the choice of an adequate method is of utmost importance to reveal valuable knowledge about the particular phenomenon under investigation.
Affiliation(s)
- Carlos Henrique Gomes Ferreira
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Department of Computing and Systems, Universidade Federal de Ouro Preto, João Monlevade, Minas Gerais, Brazil
  - Department of Electronics and Telecommunications, Politecnico di Torino, Torino, Italy
- Fabricio Murai
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Ana P. C. Silva
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Martino Trevisan
  - Department of Electronics and Telecommunications, Politecnico di Torino, Torino, Italy
- Luca Vassio
  - Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
- Idilio Drago
  - Department of Computer Science, Università di Torino, Torino, Italy
- Marco Mellia
  - Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
- Jussara M. Almeida
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
|
3
|
Liu M, Yang J, Guo J, Chen J, Zhang Y. An improved two-stage label propagation algorithm based on LeaderRank. PeerJ Comput Sci 2022; 8:e981. [PMID: 36091993 PMCID: PMC9454888 DOI: 10.7717/peerj-cs.981] [Received: 11/11/2021] [Accepted: 04/25/2022] [Indexed: 06/15/2023]
Abstract
To solve the problems of poor stability and low modularity (Q) of community division results caused by the randomness of node selection and label update in the traditional label propagation algorithm, an improved two-stage label propagation algorithm based on LeaderRank was proposed in this study. In the first stage, the order of node updating was determined by the participation coefficient (PC). Then, a new similarity measure was defined to improve the label selection mechanism so as to solve the problem of label oscillation caused by multiple labels of the node with the most similarity to the node. Moreover, the influence of the nodes was comprehensively used to find the initial community structure. In the second stage, the rough communities obtained in the first stage were regarded as nodes, and their merging sequence was determined by the PC. Next, the non-weak community and the community with the largest number of connected edges were combined. Finally, the community structure was further optimized to improve the modularity so as to obtain the final partition result. Experiments were performed on nine classic realistic networks and 19 artificial datasets with different scales, complexities, and densities. The modularity and normalized mutual information (NMI) were used as evaluation indexes for comparing the improved algorithm with dozens of relevant classic algorithms. The results showed that the proposed algorithm yields superior performance, and the results of community partitioning obtained using the improved algorithm were stable and more accurate than those obtained using other algorithms. In addition, the proposed algorithm always performs well in nine large-scale artificial data sets with 6,000 to 50,000 nodes and three large realistic network datasets, which verifies its computational performance and utility in community detection for large-scale networks.
Affiliation(s)
- Miaomiao Liu
  - School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang, China
  - Key Laboratory of Petroleum Big Data and Intelligent Analysis of Heilongjiang Province, Northeast Petroleum University, Daqing, Heilongjiang, China
- Jinyun Yang
  - School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang, China
- Jingfeng Guo
  - College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei, China
- Jing Chen
  - College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei, China
- Yongsheng Zhang
  - School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang, China
|
4
|
Malvestio I, Cardillo A, Masuda N. Interplay between k-core and community structure in complex networks. Sci Rep 2020; 10:14702. [PMID: 32895432 PMCID: PMC7477593 DOI: 10.1038/s41598-020-71426-8] [Received: 05/02/2020] [Accepted: 08/07/2020] [Indexed: 11/12/2022] [Open access]
Abstract
The organisation of a network in a maximal set of nodes having at least k neighbours within the set, known as k-core decomposition, has been used for studying various phenomena. It has been shown that nodes in the innermost k-shells play a crucial role in contagion processes, emergence of consensus, and resilience of the system. It is known that the k-core decomposition of many empirical networks cannot be explained by the degree of each node alone, or equivalently, random graph models that preserve the degree of each node (i.e., configuration model). Here we study the k-core decomposition of some empirical networks as well as that of some randomised counterparts, and examine the extent to which the k-shell structure of the networks can be accounted for by the community structure. We find that preserving the community structure in the randomisation process is crucial for generating networks whose k-core decomposition is close to the empirical one. We also highlight the existence, in some networks, of a concentration of the nodes in the innermost k-shells into a small number of communities.
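To make the definition in this abstract concrete, the following is a minimal sketch of k-core decomposition by iterative peeling. It is not the authors' code: the function name `core_numbers` and the edge-list input format are assumptions made for illustration, and the graph is assumed to be undirected and simple.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: k-shell index} for an undirected simple graph.

    Hypothetical helper for illustration. A node's k-shell index is the
    largest k such that the node belongs to the k-core, i.e. the maximal
    set of nodes that each have at least k neighbours within the set.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neigh) for node, neigh in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        # The shell index never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neigh in adj[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core


# A triangle (1, 2, 3) with a pendant node 4 attached to node 3.
shells = core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)])
print(shells)
```

In this toy graph the pendant node falls in the 1-shell and the triangle forms the 2-core. The linear scan for the minimum-degree node keeps the sketch short; a bucket queue would be the usual choice for large networks.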
Affiliation(s)
- Irene Malvestio
  - Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1UB UK
- Alessio Cardillo
  - Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1UB UK
  - Department of Computer Science and Mathematics, University Rovira i Virgili, 43007 Tarragona, Spain
  - GOTHAM Lab – Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, 50018 Zaragoza, Spain
- Naoki Masuda
  - Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1UB UK
  - Department of Mathematics, University at Buffalo, Buffalo, NY 14260-2900 United States
  - Computational and Data-Enabled Science and Engineering Program, University at Buffalo, State University of New York, Buffalo, NY 14260-5030 USA
|
5
|
Perotti JI, Almeira N, Saracco F. Towards a generalization of information theory for hierarchical partitions. Phys Rev E 2020; 101:062148. [PMID: 32688491 DOI: 10.1103/physreve.101.062148] [Received: 02/27/2020] [Accepted: 06/02/2020] [Indexed: 11/07/2022]
Abstract
Complex systems often exhibit multiple levels of organization covering a wide range of physical scales, so the study of the hierarchical decomposition of their structure and function is frequently convenient. To better understand this phenomenon, we introduce a generalization of information theory that works with hierarchical partitions. We begin by revisiting the recently introduced hierarchical mutual information (HMI), and show that it can be written as a level-by-level summation of classical conditional mutual information terms. Then, we prove that the HMI is bounded from above by the corresponding hierarchical joint entropy. In this way, in analogy to the classical case, we derive hierarchical generalizations of many other classical information-theoretic quantities. In particular, we prove that, as opposed to its classical counterpart, the hierarchical generalization of the variation of information is not a metric distance, but it admits a transformation into one. Moreover, focusing on potential applications of the existing developments of the theory, we show how to adjust the HMI for chance. We also corroborate and analyze all the presented theoretical results with exhaustive numerical computations, and include an illustrative application example of the introduced formalism. Finally, we mention some open problems that should eventually be addressed for the proposed generalization of information theory to reach maturity.
Affiliation(s)
- Juan Ignacio Perotti
  - Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
  - Instituto de Física Enrique Gaviola (IFEG-CONICET), Ciudad Universitaria, Córdoba, Argentina
- Nahuel Almeira
  - Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
  - Instituto de Física Enrique Gaviola (IFEG-CONICET), Ciudad Universitaria, Córdoba, Argentina
- Fabio Saracco
  - IMT School for Advanced Studies Lucca, Piazza San Francesco 19, I-55100, Lucca, Italy
|
6
|
Hu K, Xiang J, Yu YX, Tang L, Xiang Q, Li JM, Tang YH, Chen YJ, Zhang Y. Significance-based multi-scale method for network community detection and its application in disease-gene prediction. PLoS One 2020; 15:e0227244. [PMID: 32196490 PMCID: PMC7083276 DOI: 10.1371/journal.pone.0227244] [Received: 09/30/2019] [Accepted: 12/16/2019] [Indexed: 11/18/2022] [Open access]
Abstract
Community detection in complex networks is an important issue in network science. Several statistical measures have been proposed and widely applied to detecting the communities in various complex networks. However, because they lack a flexible resolution, some of them suffer from the resolution limit and thus are not compatible with the multi-scale structures of complex networks. In this paper, we investigated a statistical measure of interest for community detection, Significance [Sci. Rep. 3 (2013) 2930], and analyzed its critical behaviors based on the theoretical derivation of the critical number of communities and the phase diagram of the community-partition transition. It was revealed that Significance exhibits far higher resolution than the traditional Modularity when the intra- and inter-link densities of communities are clearly different. Following the critical analysis, we developed a multi-resolution version of Significance for identifying communities in multi-scale networks. Experimental tests on several typical networks confirmed that the generalized Significance is capable of multi-scale community detection. Moreover, it can effectively relax the first- and second-type resolution limits. Finally, we demonstrated an important potential application of the multi-scale Significance in computational biology: disease-gene identification, showing that extracting information from the perspective of multi-scale module mining is helpful for disease gene prediction.
Affiliation(s)
- Ke Hu
  - School of Physics and Optoelectronic Engineering, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, People’s Republic of China
  - School of Basic Medical Sciences, Changsha Medical University, Changsha, Hunan, People’s Republic of China
- Yun-Xia Yu
  - School of Physics and Optoelectronic Engineering, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Liang Tang
  - School of Basic Medical Sciences, Changsha Medical University, Changsha, Hunan, People’s Republic of China
- Qin Xiang
  - School of Basic Medical Sciences, Changsha Medical University, Changsha, Hunan, People’s Republic of China
- Jian-Ming Li
  - School of Basic Medical Sciences, Changsha Medical University, Changsha, Hunan, People’s Republic of China
  - Department of Neurology, Xiang-ya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
  - Department of Rehabilitation, Xiangya Boai Rehabilitation Hospital, Changsha, Hunan, People’s Republic of China
  - Department of Neurology, Nanhua Affiliated Hospital, University of South China, Hengyang, Hunan, People’s Republic of China
- Yong-Hong Tang
  - Department of Neurology, Nanhua Affiliated Hospital, University of South China, Hengyang, Hunan, People’s Republic of China
- Yong-Jun Chen
  - Department of Neurology, Nanhua Affiliated Hospital, University of South China, Hengyang, Hunan, People’s Republic of China
- Yan Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, People’s Republic of China
  - School of Basic Medical Sciences, Changsha Medical University, Changsha, Hunan, People’s Republic of China
|