1
|
Sirianni AD, Morgan JH, Zöller N, Rogers KB, Schröder T. Complements and competitors: Examining technological co-diffusion and relatedness on a collaborative coding platform. PNAS NEXUS 2024; 3:pgae549. [PMID: 39677372 PMCID: PMC11646702 DOI: 10.1093/pnasnexus/pgae549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/19/2024] [Accepted: 11/19/2024] [Indexed: 12/17/2024]
Abstract
Diffusive and contagious processes spread in the context of one another in connected populations. Diffusions may be more likely to pass through portions of a network where compatible diffusions are already present. We examine this by incorporating the concept of "relatedness" from the economic complexity literature into a network co-diffusion model. Building on the "product space" concept used in this work, we consider technologies themselves as nodes in "product networks," where edges define relationships between products. Specifically, coding languages on GitHub, an online platform for collaborative coding, are considered. From rates of language co-occurrence in coding projects, we calculate rates of functional cohesion and functional equivalence for each pair of languages. From rates of how individuals adopt and abandon coding languages over time, we calculate measures of complementary diffusion and substitutive diffusion for each pair of languages relative to one another. Consistent with the principle of relatedness, network regression techniques (MR-QAP) reveal strong evidence that functional cohesion positively predicts complementary diffusion. We also find limited evidence that functional equivalence predicts substitutive (competitive) diffusion. Results support the broader finding that functional dependencies between diffusive processes will dictate how said processes spread relative to one another across a population of potential adopters.
Collapse
Affiliation(s)
- Antonio D Sirianni
- Department of Sociology, Dartmouth College, 20 N Main St, Hanover, NH 03755, USA
- McCourt School of Public Policy, Georgetown University, 125 E St NW, Washington, DC 20001, USA
| | - Jonathan H Morgan
- Department of Sociology, Duke University, 417 Chapel Drive, Durham, NC 27708, USA
- Institute for Applied Research Urban Future, Potsdam University of Applied Sciences, Kiepenheuerallee 5, Potsdam 14469, Germany
| | - Nikolas Zöller
- Institute for Applied Research Urban Future, Potsdam University of Applied Sciences, Kiepenheuerallee 5, Potsdam 14469, Germany
- Department of Adaptive Rationality, Max Planck Institute for Human Development, Lentzeallee 94, Berlin 14195, Germany
| | - Kimberly B Rogers
- Department of Sociology, Dartmouth College, 20 N Main St, Hanover, NH 03755, USA
| | - Tobias Schröder
- Institute for Applied Research Urban Future, Potsdam University of Applied Sciences, Kiepenheuerallee 5, Potsdam 14469, Germany
| |
Collapse
|
2
|
Ruggeri N, Contisciani M, Battiston F, De Bacco C. Community detection in large hypergraphs. SCIENCE ADVANCES 2023; 9:eadg9159. [PMID: 37436987 PMCID: PMC10337898 DOI: 10.1126/sciadv.adg9159] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Accepted: 06/12/2023] [Indexed: 07/14/2023]
Abstract
Hypergraphs, describing networks where interactions take place among any number of units, are a natural tool to model many real-world social and biological systems. Here, we propose a principled framework to model the organization of higher-order data. Our approach recovers community structure with accuracy exceeding that of currently available state-of-the-art algorithms, as tested in synthetic benchmarks with both hard and overlapping ground-truth partitions. Our model is flexible and allows capturing both assortative and disassortative community structures. Moreover, our method scales orders of magnitude faster than competing algorithms, making it suitable for the analysis of very large hypergraphs, containing millions of nodes and interactions among thousands of nodes. Our work constitutes a practical and general tool for hypergraph analysis, broadening our understanding of the organization of real-world higher-order systems.
Collapse
Affiliation(s)
- Nicolò Ruggeri
- Max Planck Institute for Intelligent Systems, Cyber Valley, 72076 Tübingen, Germany
- Department of Computer Science, ETH, 8004 Zürich, Switzerland
| | - Martina Contisciani
- Max Planck Institute for Intelligent Systems, Cyber Valley, 72076 Tübingen, Germany
| | - Federico Battiston
- Department of Network and Data Science, Central European University, 1100 Vienna, Austria
| | - Caterina De Bacco
- Max Planck Institute for Intelligent Systems, Cyber Valley, 72076 Tübingen, Germany
| |
Collapse
|
3
|
Dobon B, Musciotto F, Mira A, Greenacre M, Schlaepfer R, Aguileta G, Astete LH, Ngales M, Latora V, Battiston F, Vinicius L, Migliano AB, Bertranpetit J. The making of the oral microbiome in Agta hunter-gatherers. EVOLUTIONARY HUMAN SCIENCES 2023; 5:e13. [PMID: 37587941 PMCID: PMC10426117 DOI: 10.1017/ehs.2023.9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2022] [Revised: 04/04/2023] [Accepted: 04/06/2023] [Indexed: 08/18/2023] Open
Abstract
Ecological and genetic factors have influenced the composition of the human microbiome during our evolutionary history. We analysed the oral microbiota of the Agta, a hunter-gatherer population where some members have adopted an agricultural diet. We show that age is the strongest factor modulating the microbiome, probably through immunosenescence since we identified an increase in the number of species classified as pathogens with age. We also characterised biological and cultural processes generating sexual dimorphism in the oral microbiome. A small subset of oral bacteria is influenced by the host genome, linking host collagen genes to bacterial biofilm formation. Our data also suggest that shifting from a fish/meat diet to a rice-rich diet transforms their microbiome, mirroring the Neolithic transition. All of these factors have implications in the epidemiology of oral diseases. Thus, the human oral microbiome is multifactorial and shaped by various ecological and social factors that modify the oral environment.
Collapse
Affiliation(s)
- Begoña Dobon
- Department of Anthropology, University of Zurich, Switzerland
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
| | - Federico Musciotto
- Department of Anthropology, University of Zurich, Switzerland
- Dipartimento di Fisica e Chimica, Università di Palermo, Italy
| | - Alex Mira
- Department of Health and Genomics, Center for Advanced Research in Public Health, FISABIO Foundation, Valencia, Spain
- CIBER Center for Epidemiology and Public Health, Madrid, Spain
| | - Michael Greenacre
- Department of Economics and Business, Universitat Pompeu Fabra and Barcelona Graduate School of Economics, Barcelona, Spain
- Faculty of Biosciences, Fisheries and Economics, University of Tromsø, Norway
| | | | - Gabriela Aguileta
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
| | - Leonora H. Astete
- Lyceum of the Philippines University, Intramuros, Manila, Philippines
| | - Marilyn Ngales
- Lyceum of the Philippines University, Intramuros, Manila, Philippines
| | - Vito Latora
- School of Mathematical Sciences, Queen Mary University of London, UK
- Dipartimento di Fisica ed Astronomia, Università di Catania and INFN, Catania, Italy
- Complexity Science Hub Vienna, Vienna, Austria
| | - Federico Battiston
- Department of Anthropology, University of Zurich, Switzerland
- Department of Network and Data Science, Central European University, Vienna 1100, Austria
| | - Lucio Vinicius
- Department of Anthropology, University of Zurich, Switzerland
- Department of Anthropology, University College London, UK
| | - Andrea B. Migliano
- Department of Anthropology, University of Zurich, Switzerland
- Department of Anthropology, University College London, UK
| | - Jaume Bertranpetit
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
| |
Collapse
|
4
|
Um S, Zhang B, Wattal S, Yoo Y. Software Components and Product Variety in a Platform Ecosystem: A Dynamic Network Analysis of WordPress. INFORMATION SYSTEMS RESEARCH 2022. [DOI: 10.1287/isre.2022.1172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Software components such as application programming interfaces (APIs) provided by external developers are vital to online digital platforms. Although APIs generally increase the variety of products according to anecdote, the precise relationship between the categories of APIs and product variety is not yet known. We find that APIs, regarding their use frequency, are categorized into three groups. The core is a group of frequently used APIs, whereas the periphery is a group of sparsely used APIs. In a large and mature platform ecosystem, an additional group of APIs, the regular core, mainly provided by third-party developers, emerges. APIs in the regular core are the main driver of product variety. However, we also find that the strength of this effect diminishes in a newly created product category when most of the new products are built by duplicating the usage of APIs from other products. A platform owner can stimulate developers’ creativity by acting as a bridge between digital product providers and third-party developers. It can collect functional needs from third-party developers and then share them with product providers. Therefore, the latter can build APIs that developers need.
Collapse
Affiliation(s)
- Sungyong Um
- Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 119391, Singapore
| | - Bin Zhang
- Department of Information and Operations Management, Mays Business School, Texas A&M University, College Station, Texas 77843
| | - Sunil Wattal
- Department of Management Information Systems, Fox School of Business, Temple University, Philadelphia, Pennsylvania 19122
| | - Youngjin Yoo
- Department of Design and Innovation, Weatherhead School of Management, Case Western Reserve University, Cleveland, Ohio 44106
| |
Collapse
|
5
|
Krishnagopal S. The collective vs individual nature of mountaineering: a network and simplicial approach. APPLIED NETWORK SCIENCE 2022; 7:62. [PMID: 36072295 PMCID: PMC9440880 DOI: 10.1007/s41109-022-00503-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 08/22/2022] [Indexed: 06/15/2023]
Abstract
Mountaineering is a sport of contrary forces: teamwork plays a large role in mental fortitude and skills, but the actual act of climbing, and indeed survival, is largely individualistic. This work studies the effects of the structure and topology of relationships within climbers on the level of cooperation and success. It does so using simplicial complexes, where relationships between climbers are captured through simplices that correspond to joint previous expeditions with dimension given by the number of climbers minus one and weight given by the number of occurrences of the simplex. First, this analysis establishes the importance of relationships in mountaineering and shows that chances of failure to summit reduce drastically when climbing with repeated partners. From a climber-centric perspective, it finds that climbers that belong to simplices with large dimension were more likely to be successful, across all experience levels. Then, the distribution of relationships within a group is explored to categorize collective human behavior in expeditions, on a spectrum from polarized to cooperative. Expeditions containing simplices with large dimension, and usually low weight (weak relationships), implying that a large number of people participated in a small number of joint expeditions, tended to be more cooperative, improving chances of success of all members of the group, not just those that were part of the simplex. On the other hand, the existence of small, usually high weight (i.e., strong relationships) simplices, subgroups lead to a polarized style where climbers that were not a part of the subgroup were less likely to succeed. Lastly, this work examines the effects of individual features (such as age, gender, climber experience etc.) and expedition-wide factors (number of camps, total number of days etc.) that are more important determiners of success in individualistic and cooperative expeditions respectively. Centrality indicates that individual features of youth and oxygen use while ascending are the most important predictors of success. Of expedition-wide factors, the expedition size and number of expedition days are found to be strongly correlated with success rate.
Collapse
Affiliation(s)
- Sanjukta Krishnagopal
- Department of Mathematics, University of California, Los Angeles, Los Angeles, United States
- Berkeley Artificial Intelligence Research Lab, University of California, Berkeley, Berkeley, United States
| |
Collapse
|
6
|
Peng Z, Zhou Q. An empirical Bayes approach to stochastic blockmodels and graphons: shrinkage estimation and model selection. PeerJ Comput Sci 2022; 8:e1006. [PMID: 35875655 PMCID: PMC9299287 DOI: 10.7717/peerj-cs.1006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Accepted: 05/24/2022] [Indexed: 06/15/2023]
Abstract
The graphon (W-graph), including the stochastic block model as a special case, has been widely used in modeling and analyzing network data. Estimation of the graphon function has gained a lot of recent research interests. Most existing works focus on inference in the latent space of the model, while adopting simple maximum likelihood or Bayesian estimates for the graphon or connectivity parameters given the identified latent variables. In this work, we propose a hierarchical model and develop a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function. Based on our hierarchical model, we further introduce a new model selection criterion for choosing the number of communities. Numerical results on extensive simulations and two well-annotated social networks demonstrate the superiority of our approach in terms of parameter estimation and model selection.
Collapse
|
7
|
Bartlett TE. Comodularity and detection of co-communities. Phys Rev E 2021; 104:054309. [PMID: 34942704 DOI: 10.1103/physreve.104.054309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Accepted: 11/08/2021] [Indexed: 11/07/2022]
Abstract
This paper introduces the notion of comodularity, to cocluster observations of bipartite networks into co-communities. The task of coclustering is to group together nodes of one type with nodes of another type, according to the interactions that are the most similar. The measure of comodularity is introduced to assess the strength of co-communities, as well as to arrange the representation of nodes and clusters for visualization, and to define an objective function for optimization. We demonstrate the usefulness of our proposed methodology on simulated data, and with examples from genomics and consumer-product reviews.
Collapse
Affiliation(s)
- Thomas E Bartlett
- Department of Statistical Science, University College London, London WC1E 7HB, United Kingdom
| |
Collapse
|
8
|
Wang J, Zhang J, Liu B, Zhu J, Guo J. Fast Network Community Detection With Profile-Pseudo Likelihood Methods. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1996378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Affiliation(s)
- Jiangzhou Wang
- School of Mathematics and Statistics & KLAS, Northeast Normal University, Jilin, China
- Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China
| | - Jingfei Zhang
- Department of Management Science, University of Miami, Coral Gables, FL
| | - Binghui Liu
- School of Mathematics and Statistics & KLAS, Northeast Normal University, Jilin, China
| | - Ji Zhu
- Department of Statistics, University of Michigan, Ann Arbor, MI
| | - Jianhua Guo
- School of Mathematics and Statistics & KLAS, Northeast Normal University, Jilin, China
| |
Collapse
|
9
|
Long Y, Huang T. Statistical inference on group Rasch mixture network model. Stat (Int Stat Inst) 2021. [DOI: 10.1002/sta4.436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
- Yuhang Long
- School of Statistics and Management Shanghai University of Finance and Economics China
| | - Tao Huang
- School of Statistics and Management Shanghai University of Finance and Economics China
| |
Collapse
|
10
|
Arroyo J, Priebe CE, Lyzinski V. Graph Matching between Bipartite and Unipartite Networks: to Collapse, or not to Collapse, that is the Question. IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING 2021; 8:3019-3033. [PMID: 35224127 PMCID: PMC8865401 DOI: 10.1109/tnse.2021.3086508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Graph matching consists of aligning the vertices of two unlabeled graphs in order to maximize the shared structure across networks; when the graphs are unipartite, this is commonly formulated as minimizing their edge disagreements. In this paper we address the common setting in which one of the graphs to match is a bipartite network and one is unipartite. Commonly, the bipartite networks are collapsed or projected into a unipartite graph, and graph matching proceeds as in the classical setting. This potentially leads to noisy edge estimates and loss of information. We formulate the graph matching problem between a bipartite and a unipartite graph using an undirected graphical model, and introduce methods to find the alignment with this model without collapsing. We theoretically demonstrate that our methodology is consistent, and provide non-asymptotic conditions that ensure exact recovery of the matching solution. In simulations and real data examples, we show how our methods can result in a more accurate matching than the naive approach of transforming the bipartite networks into unipartite, and we demonstrate the performance gains achieved by our method in simulated and real data networks, including a co-authorship-citation network pair, and brain structural and functional data.
Collapse
Affiliation(s)
| | - Carey E Priebe
- Department of Applied Mathematics and Statistics, and the Center for Imaging Science, Johns Hopkins University, Baltimore, MD 21218
| | - Vince Lyzinski
- Department of Mathematics, University of Maryland, College Park, MD 20742
| |
Collapse
|
11
|
Wang J, Li K. Community structure exploration considering latent link patterns in complex networks. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.06.032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
12
|
Mitrai I, Tang W, Daoutidis P. Stochastic blockmodeling for learning the structure of optimization problems. AIChE J 2021. [DOI: 10.1002/aic.17415] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Affiliation(s)
- Ilias Mitrai
- Department of Chemical Engineering and Materials Science University of Minnesota Minneapolis Minnesota USA
| | - Wentao Tang
- Projects and Technology Shell Global Solutions (U.S.) Inc. Houston Texas USA
| | - Prodromos Daoutidis
- Department of Chemical Engineering and Materials Science University of Minnesota Minneapolis Minnesota USA
| |
Collapse
|
13
|
Kawamoto T, Aoki T, Ueda M. Graph-based open-ended survey on concerns related to COVID-19. PLoS One 2021; 16:e0256212. [PMID: 34388225 PMCID: PMC8362959 DOI: 10.1371/journal.pone.0256212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2020] [Accepted: 08/02/2021] [Indexed: 11/20/2022] Open
Abstract
The COVID-19 pandemic is an unprecedented public health crisis with broad social and economic consequences. We conducted four surveys between April and August 2020 using the graph-based open-ended survey (GOS) framework, and investigated the most pressing concerns and issues for the general public in Japan. The GOS framework is a hybrid of the two traditional survey frameworks that allows respondents to post their opinions in a free-format style, which can subsequently serve as one of the choice items for other respondents, just as in a multiple-choice survey. As a result, this framework generates an opinion graph that relates opinions and respondents. We can also construct annotated opinion graphs to achieve a higher resolution. By clustering the annotated opinion graphs, we revealed the characteristic evolution of the response patterns as well as the interconnectedness and multi-faceted nature of opinions. Substantively, our notable finding is that “social pressure,” not “infection risk,” was one of the major concerns of our respondents. Social pressure refers to criticism and discrimination that they anticipate receiving from others should they contract COVID-19. It is possible that the collectivist nature of Japanese culture coupled with the government’s policy of relying on personal responsibility to combat COVID-19 explains some of the above findings, as the latter has led to the emergence of vigilantes. The presence of mutual surveillance can contribute to growing skepticism toward others as well as fear of ostracism, which may have negative consequences at both the societal and individual levels.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- * E-mail:
| | - Takaaki Aoki
- Faculty of Education, Kagawa University, Takamatsu, Japan
| | - Michiko Ueda
- Faculty of Political Science and Economics, Waseda University, Tokyo, Japan
| |
Collapse
|
14
|
Smith MJ, Baxter AJ, Skivington K, McCann M, Hilton S, Katikireddi SV. Examining the sources of evidence in e-cigarette policy recommendations: A citation network analysis of international public health recommendations. PLoS One 2021; 16:e0255604. [PMID: 34347823 PMCID: PMC8336794 DOI: 10.1371/journal.pone.0255604] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2021] [Accepted: 07/19/2021] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Public health policies and recommendations aim to be informed by the best available evidence. Evidence underpinning e-cigarettes policy recommendations has been necessarily limited due to the novelty of the technology and the lack of long-term epidemiological studies and trials. Some public health bodies have actively encouraged e-cigarette use whilst others have raised concerns over introducing new health risks and renormalising tobacco smoking. Using citation network analysis we investigated the author conflicts of interest and study funding statements within sources of evidence used by public health bodies when making recommendations about e-cigarette policy. METHODS We conducted citation network analysis of public health recommendation documents across four purposively selected diverse jurisdictions: WHO, UK, Australia, and USA. We extracted all citations from 15 public health recommendation documents, with more detailed data collected for influential citations (used in 3+ recommendation documents). We analysed the relationships between the sources of evidence used across jurisdictions using block modelling to determine if similar groups of documents were used across different jurisdictions. We assessed the frequency and nature of conflicts of interest. RESULTS 1700 unique citations were included across the 15 public health recommendation documents, with zero to 923 citations per document (median = 63, IQR = 7.5-132). The evidence base underpinning public health recommendations did not systematically differ across jurisdictions. Of the 1700 citations included, the majority were journal articles (n = 1179). Across 1081 journal articles published between 1998-2018, 200 declared a conflict of interest, 288 contained no mention of conflicts of interest, and 593 declared none. Conflicts of interest were reported with tobacco (3%; n = 37 journal articles of 1081), e-cigarette (7%; n = 72), and pharmaceutical companies (12%; n = 127), with such conflicts present even in the most recent years. There were 53 influential citations, the most common study type was basic science research without human subjects (e.g. examination of aerosols and e-liquids) (n = 18) followed by systematic review (n = 10); with randomised control trial being least common (n = 4). Network analysis identified clusters of highly-cited articles with a higher prevalence of conflicts of interest. CONCLUSION Public health bodies across different jurisdictions drew upon similar sources of evidence, despite articulating different policy approaches to e-cigarettes. The evidence drawn upon, including the most influential evidence, contained substantial conflicts of interest (including relationships with e-cigarette and tobacco industries). Processes to explicitly manage conflicts of interest arising from the underlying evidence base may be required when developing public health recommendations.
Collapse
Affiliation(s)
- Marissa J. Smith
- MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, United Kingdom
| | - Andrew J. Baxter
- MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, United Kingdom
| | - Kathryn Skivington
- MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, United Kingdom
| | - Mark McCann
- MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, United Kingdom
| | - Shona Hilton
- MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, United Kingdom
| | | |
Collapse
|
15
|
Chodrow PS, Veldt N, Benson AR. Generative hypergraph clustering: From blockmodels to modularity. SCIENCE ADVANCES 2021; 7:eabh1303. [PMID: 34233880 PMCID: PMC11559555 DOI: 10.1126/sciadv.abh1303] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/17/2021] [Accepted: 05/24/2021] [Indexed: 06/13/2023]
Abstract
Hypergraphs are a natural modeling paradigm for networked systems with multiway interactions. A standard task in network analysis is the identification of closely related or densely interconnected nodes. We propose a probabilistic generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum likelihood inference in this model leads to a clustering objective that generalizes the popular modularity objective for graphs. From this, we derive an inference algorithm that generalizes the Louvain graph community detection method, and a faster, specialized variant in which edges are expected to lie fully within clusters. Using synthetic and empirical data, we demonstrate that the specialized method is highly scalable and can detect clusters where graph-based methods fail. We also use our model to find interpretable higher-order structure in school contact networks, U.S. congressional bill cosponsorship and committees, product categories in copurchasing behavior, and hotel locations from web browsing sessions.
Collapse
Affiliation(s)
- Philip S Chodrow
- Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, CA 90095, USA.
| | - Nate Veldt
- Center for Applied Mathematics, Cornell University, 657 Frank H.T. Rhodes Hall, Ithaca, NY 14853, USA
| | - Austin R Benson
- Department of Computer Science, Cornell University, 413B Gates Hall, Ithaca, NY 14853, USA
| |
Collapse
|
16
|
Sun YE, Zhou HJ, Li JJ. Bipartite tight spectral clustering (BiTSC) algorithm for identifying conserved gene co-clusters in two species. Bioinformatics 2021; 37:1225-1233. [PMID: 32814973 DOI: 10.1093/bioinformatics/btaa741] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2019] [Revised: 05/20/2020] [Accepted: 08/13/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Gene clustering is a widely used technique that has enabled computational prediction of unknown gene functions within a species. However, it remains a challenge to refine gene function prediction by leveraging evolutionarily conserved genes in another species. This challenge calls for a new computational algorithm to identify gene co-clusters in two species, so that genes in each co-cluster exhibit similar expression levels in each species and strong conservation between the species. RESULTS Here, we develop the bipartite tight spectral clustering (BiTSC) algorithm, which identifies gene co-clusters in two species based on gene orthology information and gene expression data. BiTSC novelly implements a formulation that encodes gene orthology as a bipartite network and gene expression data as node covariates. This formulation allows BiTSC to adopt and combine the advantages of multiple unsupervised learning techniques: kernel enhancement, bipartite spectral clustering, consensus clustering, tight clustering and hierarchical clustering. As a result, BiTSC is a flexible and robust algorithm capable of identifying informative gene co-clusters without forcing all genes into co-clusters. Another advantage of BiTSC is that it does not rely on any distributional assumptions. Beyond cross-species gene co-clustering, BiTSC also has wide applications as a general algorithm for identifying tight node co-clusters in any bipartite network with node covariates. We demonstrate the accuracy and robustness of BiTSC through comprehensive simulation studies. In a real data example, we use BiTSC to identify conserved gene co-clusters of Drosophila melanogaster and Caenorhabditis elegans, and we perform a series of downstream analysis to both validate BiTSC and verify the biological significance of the identified co-clusters. AVAILABILITY AND IMPLEMENTATION The Python package BiTSC is open-access and available at https://github.com/edensunyidan/BiTSC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yidan Eden Sun
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
| | - Heather J Zhou
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
| | - Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA.,Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA.,Department of Computational Medicine, University of California, Los Angeles, CA 90095-1766, USA
| |
Collapse
|
17
|
Gupta S, Kumar P. A constrained agglomerative clustering approach for unipartite and bipartite networks with application to credit networks. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2019.12.085] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
18
|
Cai C, Li G, Chi Y, Poor HV, Chen Y. Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees. Ann Stat 2021. [DOI: 10.1214/20-aos1986] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Affiliation(s)
- Changxiao Cai
- Department of Electrical Engineering, Princeton University
| | - Gen Li
- Department of Electronic Engineering, Tsinghua University
| | - Yuejie Chi
- Department of Electrical and Computer Engineering, Carnegie Mellon University
| | | | - Yuxin Chen
- Department of Electrical Engineering, Princeton University
| |
Collapse
|
19
|
Gallagher RJ, Young JG, Welles BF. A clarified typology of core-periphery structure in networks. SCIENCE ADVANCES 2021; 7:7/12/eabc9800. [PMID: 33731343 PMCID: PMC7968838 DOI: 10.1126/sciadv.abc9800] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/26/2020] [Accepted: 01/29/2021] [Indexed: 06/12/2023]
Abstract
Core-periphery structure, the arrangement of a network into a dense core and sparse periphery, is a versatile descriptor of various social, biological, and technological networks. In practice, different core-periphery algorithms are often applied interchangeably despite the fact that they can yield inconsistent descriptions of core-periphery structure. For example, two of the most widely used algorithms, the k-cores decomposition and the classic two-block model of Borgatti and Everett, extract fundamentally different structures: The latter partitions a network into a binary hub-and-spoke layout, while the former divides it into a layered hierarchy. We introduce a core-periphery typology to clarify these differences, along with Bayesian stochastic block modeling techniques to classify networks in accordance with this typology. Empirically, we find a rich diversity of core-periphery structure among networks. Through a detailed case study, we demonstrate the importance of acknowledging this diversity and situating networks within the core-periphery typology when conducting domain-specific analyses.
Collapse
Affiliation(s)
- Ryan J Gallagher
- Network Science Institute, Northeastern University, Boston, MA 02115, USA.
| | - Jean-Gabriel Young
- Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
| | - Brooke Foucault Welles
- Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Department of Communication Studies, Northeastern University, Boston, MA 02115, USA
| |
Collapse
|
20
|
Bouguessa M, Nouri K. BiNeTClus. ACM T INTEL SYST TEC 2021. [DOI: 10.1145/3423067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
We investigate the problem of community detection in bipartite networks that are characterized by the presence of two types of nodes such that connections exist only between nodes of different types. While some approaches have been proposed to identify community structures in bipartite networks, there are a number of problems still to solve. In fact, the majority of the proposed approaches suffer from one or even more of the following limitations: (1) difficulty in detecting communities in the presence of many non-discriminating nodes with atypical connections that hide the community structures, (2) loss of relevant topological information due to the transformation of the bipartite network to standard plain graphs, and (3) manually specifying several input parameters, including the number of communities to be identified. To alleviate these problems, we propose BiNeTClus, a parameter-free community detection algorithm in bipartite networks that operates in two phases. The first phase focuses on identifying an initial grouping of nodes through a transactional data model capable of dealing with the situation that involves networks with many atypical connections, that is, sparsely connected nodes and nodes of one type that massively connect to all other nodes of the second type. The second phase aims to refine the clustering results of the first phase via an optimization strategy of the bipartite modularity to identify the final community structures. Our experiments on both synthetic and real networks illustrate the suitability of the proposed approach.
Collapse
Affiliation(s)
| | - Khaled Nouri
- University of Quebec at Montreal, Montreal, Canada
| |
Collapse
|
21
|
Tamarit I, Pereda M, Cuesta JA. Hierarchical clustering of bipartite data sets based on the statistical significance of coincidences. Phys Rev E 2020; 102:042304. [PMID: 33212688 DOI: 10.1103/physreve.102.042304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2020] [Accepted: 09/13/2020] [Indexed: 11/07/2022]
Abstract
When some 'entities' are related by the 'features' they share they are amenable to a bipartite network representation. Plant-pollinator ecological communities, co-authorship of scientific papers, customers and purchases, or answers in a poll, are but a few examples. Analyzing clustering of such entities in the network is a useful tool with applications in many fields, like internet technology, recommender systems, or detection of diseases. The algorithms most widely applied to find clusters in bipartite networks are variants of modularity optimization. Here, we provide a hierarchical clustering algorithm based on a dissimilarity between entities that quantifies the probability that the features shared by two entities are due to mere chance. The algorithm performance is O(n^{2}) when applied to a set of n entities, and its outcome is a dendrogram exhibiting the connections of those entities. Through the introduction of a 'susceptibility' measure we can provide an 'optimal' choice for the clustering as well as quantify its quality. The dendrogram reveals further useful structural information though-like the existence of subclusters within clusters or of nodes that do not fit in any cluster. We illustrate the algorithm by applying it first to a set of synthetic networks, and then to a selection of examples. We also illustrate how to transform our algorithm into a valid alternative for one-mode networks as well, and show that it performs at least as well as the standard, modularity-based algorithms-with a higher numerical performance. We provide an implementation of the algorithm in python freely accessible from GitHub.
Collapse
Affiliation(s)
- Ignacio Tamarit
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Departamento de Matemáticas de la Universidad Carlos III de Madrid, Leganés, Spain.,Unidad Mixta Interdisciplinar de Comportamiento y Complejidad Social (UMICCS), Madrid, Spain
| | - María Pereda
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Departamento de Matemáticas de la Universidad Carlos III de Madrid, Leganés, Spain.,Unidad Mixta Interdisciplinar de Comportamiento y Complejidad Social (UMICCS), Madrid, Spain.,Grupo de Investigación Ingeniería de Organización y Logística (IOL), Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, Madrid, Spain
| | - José A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Departamento de Matemáticas de la Universidad Carlos III de Madrid, Leganés, Spain.,Unidad Mixta Interdisciplinar de Comportamiento y Complejidad Social (UMICCS), Madrid, Spain.,Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, Spain.,UC3M-Santander Big Data Institute (IBiDat), Getafe, Spain
| |
Collapse
|
22
|
Blöcker C, Rosvall M. Mapping flows on bipartite networks. Phys Rev E 2020; 102:052305. [PMID: 33327187 DOI: 10.1103/physreve.102.052305] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2020] [Accepted: 10/10/2020] [Indexed: 05/07/2023]
Abstract
Mapping network flows provides insight into the organization of networks, but even though many real networks are bipartite, no method for mapping flows takes advantage of the bipartite structure. What do we miss by discarding this information and how can we use it to understand the structure of bipartite networks better? The map equation models network flows with a random walk and exploits the information-theoretic duality between compression and finding regularities to detect communities in networks. However, it does not use the fact that random walks in bipartite networks alternate between node types, information worth 1 bit. To make some or all of this information available to the map equation, we developed a coding scheme that remembers node types at different rates. We explored the community landscape of bipartite real-world networks from no node-type information to full node-type information and found that using node types at a higher rate generally leads to deeper community hierarchies and a higher resolution. The corresponding compression of network flows exceeds the amount of extra information provided. Consequently, taking advantage of the bipartite structure increases the resolution and reveals more network regularities.
Collapse
Affiliation(s)
- Christopher Blöcker
- Integrated Science Laboratory, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| | - Martin Rosvall
- Integrated Science Laboratory, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
| |
Collapse
|
23
|
Yen TC, Larremore DB. Community detection in bipartite networks with stochastic block models. Phys Rev E 2020; 102:032309. [PMID: 33075933 DOI: 10.1103/physreve.102.032309] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 07/23/2020] [Indexed: 11/07/2022]
Abstract
In bipartite networks, community structures are restricted to being disassortative, in that nodes of one type are grouped according to common patterns of connection with nodes of the other type. This makes the stochastic block model (SBM), a highly flexible generative model for networks with block structure, an intuitive choice for bipartite community detection. However, typical formulations of the SBM do not make use of the special structure of bipartite networks. Here we introduce a Bayesian nonparametric formulation of the SBM and a corresponding algorithm to efficiently find communities in bipartite networks which parsimoniously chooses the number of communities. The biSBM improves community detection results over general SBMs when data are noisy, improves the model resolution limit by a factor of sqrt[2], and expands our understanding of the complicated optimization landscape associated with community detection tasks. A direct comparison of certain terms of the prior distributions in the biSBM and a related high-resolution hierarchical SBM also reveals a counterintuitive regime of community detection problems, populated by smaller and sparser networks, where nonhierarchical models outperform their more flexible counterpart.
Collapse
Affiliation(s)
- Tzu-Chi Yen
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
| | - Daniel B Larremore
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA.,BioFrontiers Institute, University of Colorado, Boulder, Colorado 80303, USA
| |
Collapse
|
24
|
Public Strategy and Eco-Social Engagement in Latin American States: An Analysis of Complex Networks Arising from Their Constitutions. SUSTAINABILITY 2020. [DOI: 10.3390/su12208558] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Over the past four decades, Latin American states have drafted relatively new constitutions in comparison with other regions of the world. These transformations, in some cases, have helped governments leave behind the former authoritarian regimes, or in others, have simply established a more democratic system incorporating a forward-looking approach to rights. For example, stronger individual and collective rights have been forged, together with new avenues for citizen participation. Certainly, many of the new constitutions grant a much broader base of rights, including collective political and territorial rights for indigenous communities, protections against ethnic, racial, and gender discrimination, and greater guarantees of privacy and control over information. Consequently, some Latin American constitutions are held up as among the best in the world. For this study, the constitutional texts of 22 Latin American countries were analyzed with the aim of understanding their regulatory changes and impacts, pointing out the existing inequalities they address, as well as the clear positive trend established in terms of the generation of greater social engagement.
Collapse
|
25
|
Abstract
MOTIVATION Recent single-cell DNA sequencing technologies enable whole-genome sequencing of hundreds to thousands of individual cells. However, these technologies have ultra-low sequencing coverage (<0.5× per cell) which has limited their use to the analysis of large copy-number aberrations (CNAs) in individual cells. While CNAs are useful markers in cancer studies, single-nucleotide mutations are equally important, both in cancer studies and in other applications. However, ultra-low coverage sequencing yields single-nucleotide mutation data that are too sparse for current single-cell analysis methods. RESULTS We introduce SBMClone, a method to infer clusters of cells, or clones, that share groups of somatic single-nucleotide mutations. SBMClone uses a stochastic block model to overcome sparsity in ultra-low coverage single-cell sequencing data, and we show that SBMClone accurately infers the true clonal composition on simulated datasets with coverage at low as 0.2×. We applied SBMClone to single-cell whole-genome sequencing data from two breast cancer patients obtained using two different sequencing technologies. On the first patient, sequenced using the 10X Genomics CNV solution with sequencing coverage ≈0.03×, SBMClone recovers the major clonal composition when incorporating a small amount of additional information. On the second patient, where pre- and post-treatment tumor samples were sequenced using DOP-PCR with sequencing coverage ≈0.5×, SBMClone shows that tumor cells are present in the post-treatment sample, contrary to published analysis of this dataset. AVAILABILITY AND IMPLEMENTATION SBMClone is available on the GitHub repository https://github.com/raphael-group/SBMClone. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Matthew A Myers
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
| | - Simone Zaccaria
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
| |
Collapse
|
26
|
Vasques Filho D, O'Neale DRJ. Transitivity and degree assortativity explained: The bipartite structure of social networks. Phys Rev E 2020; 101:052305. [PMID: 32575287 DOI: 10.1103/physreve.101.052305] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2019] [Accepted: 04/10/2020] [Indexed: 06/11/2023]
Abstract
Dynamical processes, such as the diffusion of knowledge, opinions, pathogens, "fake news," innovation, and others, are highly dependent on the structure of the social network in which they occur. However, questions on why most social networks present some particular structural features, namely, high levels of transitivity and degree assortativity, when compared to other types of networks remain open. First, we argue that every one-mode network can be regarded as a projection of a bipartite network, and we show that this is the case using two simple examples solved with the generating functions formalism. Second, using synthetic and empirical data, we reveal how the combination of the degree distribution of both sets of nodes of the bipartite network-together with the presence of cycles of lengths four and six-explain the observed values of transitivity and degree assortativity coefficients in the one-mode projected network. Bipartite networks with top node degrees that display a more right-skewed distribution than the bottom nodes result in highly transitive and degree assortative projections, especially if a large number of small cycles are present in the bipartite structure.
Collapse
Affiliation(s)
- Demival Vasques Filho
- Leibniz-Institut für Europäische Geschichte, Alte Universitätsstraße 19, 55116 Mainz, Germany
| | - Dion R J O'Neale
- Department of Physics, University of Auckland, Private Bag 92019, Auckland, New Zealand and Te Puūaha Matatini, University of Auckland, Private Bag 92019, Auckland, New Zealand
| |
Collapse
|
27
|
Valejo A, Faleiros T, Oliveira MCFD, Lopes ADA. A coarsening method for bipartite networks via weight-constrained label propagation. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105678] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
28
|
Terry JCD, Lewis OT. Finding missing links in interaction networks. Ecology 2020; 101:e03047. [PMID: 32219855 DOI: 10.1002/ecy.3047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Revised: 02/05/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022]
Abstract
Documenting which species interact within ecological communities is challenging and labor intensive. As a result, many interactions remain unrecorded, potentially distorting our understanding of network structure and dynamics. We test the utility of four structural models and a new coverage-deficit model for predicting missing links in both simulated and empirical bipartite networks. We find they can perform well, although the predictive power of structural models varies with the underlying network structure. The accuracy of predictions can be improved by ensembling multiple models. Augmenting observed networks with most-likely missing links improves estimates of qualitative network metrics. Tools to identify likely missing links can be simple to implement, allowing the prioritization of research effort and more robust assessment of network properties.
Collapse
Affiliation(s)
| | - Owen T Lewis
- Department of Zoology, University of Oxford, Oxford, OX1 3PS, United Kingdom
| |
Collapse
|
29
|
Optimizing Variational Graph Autoencoder for Community Detection with Dual Optimization. ENTROPY 2020; 22:e22020197. [PMID: 33285972 PMCID: PMC7516625 DOI: 10.3390/e22020197] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/31/2019] [Revised: 02/03/2020] [Accepted: 02/04/2020] [Indexed: 11/17/2022]
Abstract
Variational Graph Autoencoder (VGAE) has recently gained traction for learning representations on graphs. Its inception has allowed models to achieve state-of-the-art performance for challenging tasks such as link prediction, rating prediction, and node clustering. However, a fundamental flaw exists in Variational Autoencoder (VAE)-based approaches. Specifically, merely minimizing the loss of VAE increases the deviation from its primary objective. Focusing on Variational Graph Autoencoder for Community Detection (VGAECD) we found that optimizing the loss using the stochastic gradient descent often leads to sub-optimal community structure especially when initialized poorly. We address this shortcoming by introducing a dual optimization procedure. This procedure aims to guide the optimization process and encourage learning of the primary objective. Additionally, we linearize the encoder to reduce the number of learning parameters. The outcome is a robust algorithm that outperforms its predecessor.
Collapse
|
30
|
|
31
|
Aslan S, Kaya B. Time-aware link prediction based on strengthened projection in bipartite networks. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.08.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
32
|
Wilinski M, Mazzarisi P, Tantari D, Lillo F. Detectability of macroscopic structures in directed asymmetric stochastic block model. Phys Rev E 2019; 99:042310. [PMID: 31108631 DOI: 10.1103/physreve.99.042310] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2018] [Indexed: 11/07/2022]
Abstract
We study the problem of identifying macroscopic structures in networks, characterizing the impact of introducing link directions on the detectability phase transition. To this end, building on the stochastic block model, we construct a class of nontrivially detectable directed networks. We find closed-form solutions by using the belief propagation method, showing how the transition line depends on the assortativity and the asymmetry of the network. Finally, we numerically identify the existence of a hard phase for detection close to the transition point.
Collapse
Affiliation(s)
- Mateusz Wilinski
- Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
| | - Piero Mazzarisi
- Dipartimento di Matematica, Porta di Piazza San Donato 5, 40126 Bologna, Italy
| | - Daniele Tantari
- Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
| | - Fabrizio Lillo
- Dipartimento di Matematica, Porta di Piazza San Donato 5, 40126 Bologna, Italy
| |
Collapse
|
33
|
Becatti C, Caldarelli G, Saracco F. Entropy-based randomization of rating networks. Phys Rev E 2019; 99:022306. [PMID: 30934284 DOI: 10.1103/physreve.99.022306] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2018] [Indexed: 05/28/2023]
Abstract
In recent years, due to the great diffusion of e-commerce, online rating platforms quickly became a common tool for purchase recommendations. However, instruments for their analysis did not evolve at the same speed. Indeed, interesting information about users' habits and tastes can be recovered just considering the bipartite network of users and products, in which links represent products' purchases and have different weights due to the score assigned to the item in users' reviews. With respect to other weighted bipartite networks, in these systems we observe a maximum possible weight per link, that limits the variability of the outcomes. In the present article we propose an entropy-based randomization method for this type of networks (i.e., bipartite rating networks) by extending the configuration model framework: the randomized network satisfies the constraints of the degree per rating, i.e., the number of given ratings received by the specified product or assigned by the single user. We first show that such a null model is able to reproduce several nontrivial features of the real network better than other null models. Then, using our model as benchmark, we project the information contained in the real system on one of the layers: To provide an interpretation of the projection obtained, we run the Louvain community detection on the obtained network and discuss the observed division in clusters. We are able to detect groups of music albums due to the consumers' taste or communities of movies due to their audience. Finally, we show that our method is also able to handle the special case of categorical bipartite networks: we consider the bipartite categorical network of scientific journals recognized for the scientific qualification in economics and statistics. In the end, from the outcome of our method, the probability that each user appreciate every product can be easily recovered. Therefore, this information may be employed in future applications to implement a more detailed recommendation system that also takes into account information regarding the topology of the observed network.
Collapse
Affiliation(s)
- Carolina Becatti
- IMT School for Advanced Studies, Piazza S.Francesco 19, 55100 Lucca, Italy
| | - Guido Caldarelli
- IMT School for Advanced Studies, Piazza S.Francesco 19, 55100 Lucca, Italy
- Istituto dei Sistemi Complessi (ISC)-CNR UoS Università "Sapienza", Piazzale Aldo Moro 5, 00185 Roma, Italy
- ECLT San Marco 2940, 30124 Venezia, Italy
| | - Fabio Saracco
- IMT School for Advanced Studies, Piazza S.Francesco 19, 55100 Lucca, Italy
| |
Collapse
|
34
|
Pesantez-Cabrera P, Kalyanaraman A. Efficient Detection of Communities in Biological Bipartite Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:258-271. [PMID: 29990252 DOI: 10.1109/tcbb.2017.2765319] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Methods to efficiently uncover and extract community structures are required in a number of biological applications where networked data and their interactions can be modeled as graphs, and observing tightly-knit groups of vertices ("communities") can offer insights into the structural and functional building blocks of the underlying network. Classical applications of community detection have largely focused on unipartite networks - i.e., graphs built out of a single type of objects. However, due to increased availability of biological data from various sources, there is now an increasing need for handling heterogeneous networks which are built out of multiple types of objects. In this paper, we address the problem of identifying communities from biological bipartite networks - i.e., networks where interactions are observed between two different types of objects (e.g., genes and diseases, drugs and protein complexes, plants and pollinators, and hosts and pathogens). Toward detecting communities in such bipartite networks, we make the following contributions: i) (metric) we propose a variant of bipartite modularity; ii) (algorithms) we present an efficient algorithm called biLouvain that implements a set of heuristics toward fast and precise community detection in bipartite networks (https://github.com/paolapesantez/biLouvain); and iii) (experiments) we present a thorough experimental evaluation of our algorithm including comparison to other state-of-the-art methods to identify communities in bipartite networks. Experimental results show that our biLouvain algorithm identifies communities that have a comparable or better quality (as measured by bipartite modularity) than existing methods, while significantly reducing the time-to-solution between one and four orders of magnitude.
Collapse
|
35
|
Vasques Filho D, O'Neale DRJ. Degree distributions of bipartite networks and their projections. Phys Rev E 2018; 98:022307. [PMID: 30253604 DOI: 10.1103/physreve.98.022307] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2018] [Indexed: 11/07/2022]
Abstract
Bipartite (two-mode) networks are important in the analysis of social and economic systems as they explicitly show conceptual links between different types of entities. However, applications of such networks often work with a projected (one-mode) version of the original bipartite network. The topology of the projected network, and the dynamics that take place on it, are highly dependent on the degree distributions of the two different node types from the original bipartite structure. To date, the interaction between the degree distributions of bipartite networks and their one-mode projections is well understood for only a few cases, or for networks that satisfy a restrictive set of assumptions. Here we show a broader analysis in order to fill the gap left by previous studies. We use the formalism of generating functions to prove that the degree distributions of both node types in the original bipartite network affect the degree distribution in the projected version. To support our analysis, we simulate several types of synthetic bipartite networks using a configuration model where node degrees are assigned from specific probability distributions, ranging from peaked to heavy-tailed distributions. Our findings show that when projecting a bipartite network onto a particular set of nodes, the degree distribution for the resulting one-mode network follows the distribution of the nodes being projected on to, but only so long as the degree distribution for the opposite set of nodes does not have a heavier tail. Furthermore, we show that bipartite degree distributions are not the only feature driving topology formation of projected networks, in contrast to what is commonly described in the literature.
Collapse
Affiliation(s)
- Demival Vasques Filho
- Department of Physics, University of Auckland, Private Bag 92019, Auckland, New Zealand and Te Pūnaha Matatini, University of Auckland, Private Bag 92019, Auckland, New Zealand
| | - Dion R J O'Neale
- Department of Physics, University of Auckland, Private Bag 92019, Auckland, New Zealand and Te Pūnaha Matatini, University of Auckland, Private Bag 92019, Auckland, New Zealand
| |
Collapse
|
36
|
Cluster-based network proximities for arbitrary nodal subsets. Sci Rep 2018; 8:14371. [PMID: 30254231 PMCID: PMC6156331 DOI: 10.1038/s41598-018-32172-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2017] [Accepted: 08/07/2018] [Indexed: 11/08/2022] Open
Abstract
The concept of a cluster or community in a network context has been of considerable interest in a variety of settings in recent years. In this paper, employing random walks and geodesic distance, we introduce a unified measure of cluster-based proximity between nodes, relative to a given subset of interest. The inherent simplicity and informativeness of the approach could make it of value to researchers in a variety of scientific fields. Applicability is demonstrated via application to clustering for a number of existent data sets (including multipartite networks). We view community detection (i.e. when the full set of network nodes is considered) as simply the limiting instance of clustering (for arbitrary subsets). This perspective should add to the dialogue on what constitutes a cluster or community within a network. In regards to health-relevant attributes in social networks, identification of clusters of individuals with similar attributes can support targeting of collective interventions. The method performs well in comparisons with other approaches, based on comparative measures such as NMI and ARI.
Collapse
|
37
|
BMTK: a toolkit for determining modules in biological bipartite networks. QUANTITATIVE BIOLOGY 2018. [DOI: 10.1007/s40484-018-0132-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
38
|
Daoutidis P, Tang W, Jogwar SS. Decomposing complex plants for distributed control: Perspectives from network theory. Comput Chem Eng 2018. [DOI: 10.1016/j.compchemeng.2017.10.015] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
39
|
A comparative study of the measures for evaluating community structure in bipartite networks. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.03.036] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
40
|
Pavlopoulos GA, Kontou PI, Pavlopoulou A, Bouyioukos C, Markou E, Bagos PG. Bipartite graphs in systems biology and medicine: a survey of methods and applications. Gigascience 2018; 7:1-31. [PMID: 29648623 PMCID: PMC6333914 DOI: 10.1093/gigascience/giy014] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2017] [Revised: 01/15/2018] [Accepted: 02/13/2018] [Indexed: 11/14/2022] Open
Abstract
The latest advances in high-throughput techniques during the past decade allowed the systems biology field to expand significantly. Today, the focus of biologists has shifted from the study of individual biological components to the study of complex biological systems and their dynamics at a larger scale. Through the discovery of novel bioentity relationships, researchers reveal new information about biological functions and processes. Graphs are widely used to represent bioentities such as proteins, genes, small molecules, ligands, and others such as nodes and their connections as edges within a network. In this review, special focus is given to the usability of bipartite graphs and their impact on the field of network biology and medicine. Furthermore, their topological properties and how these can be applied to certain biological case studies are discussed. Finally, available methodologies and software are presented, and useful insights on how bipartite graphs can shape the path toward the solution of challenging biological problems are provided.
Collapse
Affiliation(s)
- Georgios A Pavlopoulos
- Lawrence Berkeley Labs, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
| | - Panagiota I Kontou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
| | - Athanasia Pavlopoulou
- Izmir International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylül University, 35340, Turkey
| | - Costas Bouyioukos
- Université Paris Diderot, Sorbonne Paris Cité, Epigenetics and Cell Fate, UMR7216, CNRS, France
| | - Evripides Markou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
| | - Pantelis G Bagos
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
| |
Collapse
|
41
|
Kawamoto T. Algorithmic detectability threshold of the stochastic block model. Phys Rev E 2018; 97:032301. [PMID: 29776051 DOI: 10.1103/physreve.97.032301] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2017] [Indexed: 06/08/2023]
Abstract
The assumption that the values of model parameters are known or correctly learned, i.e., the Nishimori condition, is one of the requirements for the detectability analysis of the stochastic block model in statistical inference. In practice, however, there is no example demonstrating that we can know the model parameters beforehand, and there is no guarantee that the model parameters can be learned accurately. In this study, we consider the expectation-maximization (EM) algorithm with belief propagation (BP) and derive its algorithmic detectability threshold. Our analysis is not restricted to the community structure but includes general modular structures. Because the algorithm cannot always learn the planted model parameters correctly, the algorithmic detectability threshold is qualitatively different from the one with the Nishimori condition.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi, Koto-ku, Tokyo, Japan
| |
Collapse
|
42
|
Abstract
A comprehensive analysis of statistical properties of a network of organic reactions reveals several generic traits. This knowledge can be used in the development of optimal reaction sequences.
Collapse
Affiliation(s)
| | - Alexei Lapkin
- Department of Chemical Engineering and Biotechnology
- University of Cambridge
- Cambridge
- UK
| |
Collapse
|
43
|
Wang W, Chen X, Jiao P, Jin D. Similarity-based Regularized Latent Feature Model for Link Prediction in Bipartite Networks. Sci Rep 2017; 7:16996. [PMID: 29208988 PMCID: PMC5717264 DOI: 10.1038/s41598-017-17157-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2017] [Accepted: 11/21/2017] [Indexed: 01/05/2023] Open
Abstract
Link prediction is an attractive research topic in the field of data mining and has significant applications in improving performance of recommendation system and exploring evolving mechanisms of the complex networks. A variety of complex systems in real world should be abstractly represented as bipartite networks, in which there are two types of nodes and no links connect nodes of the same type. In this paper, we propose a framework for link prediction in bipartite networks by combining the similarity based structure and the latent feature model from a new perspective. The framework is called Similarity Regularized Nonnegative Matrix Factorization (SRNMF), which explicitly takes the local characteristics into consideration and encodes the geometrical information of the networks by constructing a similarity based matrix. We also develop an iterative scheme to solve the objective function based on gradient descent. Extensive experiments on a variety of real world bipartite networks show that the proposed framework of link prediction has a more competitive, preferable and stable performance in comparison with the state-of-art methods.
Collapse
Affiliation(s)
- Wenjun Wang
- School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China.,Tianjin Engineering Center of SmartSafety and Bigdata Technology, Tianjin University, Tianjin, 300354, China.,Tianjin Key Laboratory of Advanced Networking (TANK), Tianjin Key Laboratory, Tianjin, 300354, China
| | - Xue Chen
- School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China
| | - Pengfei Jiao
- School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China.
| | - Di Jin
- School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China
| |
Collapse
|
44
|
Azizi A, Dewar J, Wu T, Hyman JM. Generating Bipartite Networks with a Prescribed Joint Degree Distribution. JOURNAL OF COMPLEX NETWORKS 2017; 5:839-857. [PMID: 29854407 PMCID: PMC5972839 DOI: 10.1093/comnet/cnx014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
We describe a class of new algorithms to construct bipartite networks that preserves a prescribed degree and joint-degree (degree-degree) distribution of the nodes. Bipartite networks are graphs that can represent real-world interactions between two disjoint sets, such as actor-movie networks, author-article networks, co-occurrence networks, and heterosexual partnership networks. Often there is a strong correlation between the degree of a node and the degrees of the neighbors of that node that must be preserved when generating a network that reflects the structure of the underling system. Our bipartite 2K (B2K) algorithms generate an ensemble of networks that preserve prescribed degree sequences for the two disjoint set of nodes in the bipartite network, and the joint-degree distribution that is the distribution of the degrees of all neighbors of nodes with the same degree. We illustrate the effectiveness of the algorithms on a romance network using the NetworkX software environment to compare other properties of a target network that are not directly enforced by the B2K algorithms. We observe that when average degree of nodes is low, as is the case for romance and heterosexual partnership networks, then the B2K networks tend to preserve additional properties, such as the cluster coefficients, than algorithms that do not preserve the joint-degree distribution of the original network.
Collapse
Affiliation(s)
| | - Jeremy Dewar
- Department of Mathematics, Tulane University, New Orleans, LA 70118, USA
| | - Tong Wu
- Department of mathematics, North Carolina State University, Raleigh, NC 27695, USA
| | - James M. Hyman
- Department of Mathematics, Tulane University, New Orleans, LA 70118, USA
| |
Collapse
|
45
|
Bongiorno C, London A, Miccichè S, Mantegna RN. Core of communities in bipartite networks. Phys Rev E 2017; 96:022321. [PMID: 28950546 DOI: 10.1103/physreve.96.022321] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2017] [Indexed: 06/07/2023]
Abstract
We use the information present in a bipartite network to detect cores of communities of each set of the bipartite system. Cores of communities are found by investigating statistically validated projected networks obtained using information present in the bipartite network. Cores of communities are highly informative and robust with respect to the presence of errors or missing entries in the bipartite network. We assess the statistical robustness of cores by investigating an artificial benchmark network, the coauthorship network, and the actor-movie network. The accuracy and precision of the partition obtained with respect to the reference partition are measured in terms of the adjusted Rand index and the adjusted Wallace index, respectively. The detection of cores is highly precise, although the accuracy of the methodology can be limited in some cases.
Collapse
Affiliation(s)
- Christian Bongiorno
- Dipartimento di Fisica e Chimica, Università degli Studi di Palermo, Viale delle Scienze Ed. 18, I-90128 Palermo, Italy
| | - András London
- Institute of Informatics, University of Szeged, Árpád tér 2, H-6720 Szeged, Hungary
| | - Salvatore Miccichè
- Dipartimento di Fisica e Chimica, Università degli Studi di Palermo, Viale delle Scienze Ed. 18, I-90128 Palermo, Italy
| | - Rosario N Mantegna
- Dipartimento di Fisica e Chimica, Università degli Studi di Palermo, Viale delle Scienze Ed. 18, I-90128 Palermo, Italy
- Center for Network Science, Central European University, Nador 9, H-1051 Budapest, Hungary
- Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom
| |
Collapse
|
46
|
|
47
|
Connor N, Barberán A, Clauset A. Using null models to infer microbial co-occurrence networks. PLoS One 2017; 12:e0176751. [PMID: 28493918 PMCID: PMC5426617 DOI: 10.1371/journal.pone.0176751] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2017] [Accepted: 04/12/2017] [Indexed: 12/21/2022] Open
Abstract
Although microbial communities are ubiquitous in nature, relatively little is known about the structural and functional roles of their constituent organisms' underlying interactions. A common approach to study such questions begins with extracting a network of statistically significant pairwise co-occurrences from a matrix of observed operational taxonomic unit (OTU) abundances across sites. The structure of this network is assumed to encode information about ecological interactions and processes, resistance to perturbation, and the identity of keystone species. However, common methods for identifying these pairwise interactions can contaminate the network with spurious patterns that obscure true ecological signals. Here, we describe this problem in detail and develop a solution that incorporates null models to distinguish ecological signals from statistical noise. We apply these methods to the initial OTU abundance matrix and to the extracted network. We demonstrate this approach by applying it to a large soil microbiome data set and show that many previously reported patterns for these data are statistical artifacts. In contrast, we find the frequency of three-way interactions among microbial OTUs to be highly statistically significant. These results demonstrate the importance of using appropriate null models when studying observational microbiome data, and suggest that extracting and characterizing three-way interactions among OTUs is a promising direction for unraveling the structure and function of microbial ecosystems.
Collapse
Affiliation(s)
- Nora Connor
- Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America
- * E-mail:
| | - Albert Barberán
- Department of Soil, Water, and Environmental Science, University of Arizona, Tucson, Arizona, United States of America
| | - Aaron Clauset
- Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America
- BioFrontiers Institute, University of Colorado, Boulder, Colorado, United States of America
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
| |
Collapse
|
48
|
Peel L, Larremore DB, Clauset A. The ground truth about metadata and community detection in networks. SCIENCE ADVANCES 2017; 3:e1602548. [PMID: 28508065 PMCID: PMC5415338 DOI: 10.1126/sciadv.1602548] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/17/2016] [Accepted: 03/08/2017] [Indexed: 05/30/2023]
Abstract
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
Collapse
Affiliation(s)
- Leto Peel
- Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- naXys, Université de Namur, Namur, Belgium
| | | | - Aaron Clauset
- Santa Fe Institute, Santa Fe, NM 87501, USA
- Department of Computer Science, University of Colorado, Boulder, CO 80309, USA
- BioFrontiers Institute, University of Colorado, Boulder, CO 80309, USA
| |
Collapse
|
49
|
Kitsak M, Papadopoulos F, Krioukov D. Latent geometry of bipartite networks. Phys Rev E 2017; 95:032309. [PMID: 28415237 DOI: 10.1103/physreve.95.032309] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2016] [Indexed: 06/07/2023]
Abstract
Despite the abundance of bipartite networked systems, their organizing principles are less studied compared to unipartite networks. Bipartite networks are often analyzed after projecting them onto one of the two sets of nodes. As a result of the projection, nodes of the same set are linked together if they have at least one neighbor in common in the bipartite network. Even though these projections allow one to study bipartite networks using tools developed for unipartite networks, one-mode projections lead to significant loss of information and artificial inflation of the projected network with fully connected subgraphs. Here we pursue a different approach for analyzing bipartite systems that is based on the observation that such systems have a latent metric structure: network nodes are points in a latent metric space, while connections are more likely to form between nodes separated by shorter distances. This approach has been developed for unipartite networks, and relatively little is known about its applicability to bipartite systems. Here, we fully analyze a simple latent-geometric model of bipartite networks and show that this model explains the peculiar structural properties of many real bipartite systems, including the distributions of common neighbors and bipartite clustering. We also analyze the geometric information loss in one-mode projections in this model and propose an efficient method to infer the latent pairwise distances between nodes. Uncovering the latent geometry underlying real bipartite networks can find applications in diverse domains, ranging from constructing efficient recommender systems to understanding cell metabolism.
Collapse
Affiliation(s)
- Maksim Kitsak
- Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
| | - Fragkiskos Papadopoulos
- Department of Electrical Engineering, Computer Engineering and Informatics, Cyprus University of Technology, 33 Saripolou Street, 3036 Limassol, Cyprus
| | - Dmitri Krioukov
- Department of Physics, Department of Mathematics, Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts 02115, USA
| |
Collapse
|
50
|
Su Y, Wang B, Zhang X. A seed-expanding method based on random walks for community detection in networks with ambiguous community structures. Sci Rep 2017; 7:41830. [PMID: 28157183 PMCID: PMC5291113 DOI: 10.1038/srep41830] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2016] [Accepted: 12/28/2016] [Indexed: 11/09/2022] Open
Abstract
Community detection has received a great deal of attention, since it could help to reveal the useful information hidden in complex networks. Although most previous modularity-based and local modularity-based community detection algorithms could detect strong communities, they may fail to exactly detect several weak communities. In this work, we define a network with clear or ambiguous community structures based on the types of its communities. A seed-expanding method based on random walks is proposed to detect communities for networks, especially for the networks with ambiguous community structures. We identify local maximum degree nodes, and detect seed communities in a network. Then, the probability of a node belonging to each community is calculated based on the total probability model and random walks, and each community is expanded by repeatedly adding the node which is most likely to belong to it. Finally, we use the community optimization method to ensure that each node is in a community. Experimental results on both computer-generated and real-world networks demonstrate that the quality of the communities detected by the proposed algorithm is superior to the- state-of-the-art algorithms in the networks with ambiguous community structures.
Collapse
Affiliation(s)
- Yansen Su
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230039, China
| | - Bangju Wang
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230039, China
| | - Xingyi Zhang
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230039, China
| |
Collapse
|