1
He Z, Zhao Y, Bickel P, Weko C, Cheng D, Wang J. Network Inference Using the Hub Model and Variants. J Am Stat Assoc 2023. [DOI: 10.1080/01621459.2023.2183133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]

2
Barbarino G, Noferini V, Van Dooren P. Role extraction for digraphs via neighborhood pattern similarity. Phys Rev E 2022; 106:054301. [PMID: 36559511 DOI: 10.1103/physreve.106.054301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 10/06/2022] [Indexed: 12/24/2022]
Abstract
We analyze the recovery of different roles in a network modeled by a directed graph, based on the so-called Neighborhood Pattern Similarity approach. Our analysis uses results from random matrix theory to show that, when assuming that the graph is generated as a particular stochastic block model with Bernoulli probability distributions for the different blocks, then the recovery is asymptotically correct when the graph has a sufficiently large dimension. Under these assumptions there is a sufficient gap between the dominant and dominated eigenvalues of the similarity matrix, which guarantees the asymptotic correct identification of the number of different roles. We also comment on the connections with the literature on stochastic block models, including the case of probabilities of order log(n)/n where n is the graph size. We provide numerical experiments to assess the effectiveness of the method when applied to practical networks of finite size.
Affiliation(s)
- Giovanni Barbarino
  - Aalto University, Department of Mathematics and Systems Analysis, P.O. Box 11100, FI-00076 Aalto, Finland
- Vanni Noferini
  - Aalto University, Department of Mathematics and Systems Analysis, P.O. Box 11100, FI-00076 Aalto, Finland
- Paul Van Dooren
  - Université catholique de Louvain, Department of Mathematical Engineering, Av. Lemaitre 4, B-1348 Louvain-la-Neuve, Belgium

3
Huang S, Weng H, Feng Y. Spectral clustering via adaptive layer aggregation for multi-layer networks. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2134874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Sihan Huang
  - Department of Statistics, Columbia University
- Haolei Weng
  - Department of Statistics and Probability, Michigan State University
- Yang Feng
  - Department of Biostatistics, New York University

4
Qing H. A Useful Criterion on Studying Consistent Estimation in Community Detection. Entropy (Basel) 2022; 24:1098. [PMID: 36010762 PMCID: PMC9407257 DOI: 10.3390/e24081098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 06/15/2023]
Abstract
In network analysis, developing a unified theoretical framework that can compare methods under different models is an interesting problem. This paper proposes a partial solution to this problem. We summarize the idea of using a separation condition for a standard network and sharp threshold of the Erdös-Rényi random graph to study consistent estimation, and compare theoretical error rates and requirements on the network sparsity of spectral methods under models that can degenerate to a stochastic block model as a four-step criterion SCSTC. Using SCSTC, we find some inconsistent phenomena on separation condition and sharp threshold in community detection. In particular, we find that the original theoretical results of the SPACL algorithm introduced to estimate network memberships under the mixed membership stochastic blockmodel are sub-optimal. To find the formation mechanism of inconsistencies, we re-establish the theoretical convergence rate of this algorithm by applying recent techniques on row-wise eigenvector deviation. The results are further extended to the degree-corrected mixed membership model. By comparison, our results enjoy smaller error rates, lesser dependence on the number of communities, weaker requirements on network sparsity, and so forth. The separation condition and sharp threshold obtained from our theoretical results match the classical results, so the usefulness of this criterion on studying consistent estimation is guaranteed. Numerical results for computer-generated networks support our finding that spectral methods considered in this paper achieve the threshold of separation condition.
Affiliation(s)
- Huan Qing
  - School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China

5
Zhao Y. An optimal uniform concentration inequality for discrete entropies on finite alphabets in the high-dimensional setting. Bernoulli 2022. [DOI: 10.3150/21-bej1403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Yunpeng Zhao
  - School of Mathematical and Natural Sciences, Arizona State University, AZ, 85306

6
Austern M, Orbanz P. Limit theorems for distributions invariant under groups of transformations. Ann Stat 2022. [DOI: 10.1214/21-aos2165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Peter Orbanz
  - Gatsby Computational Neuroscience Unit, University College London

7
Abstract
Network data often exhibit block structures characterized by clusters of nodes with similar patterns of edge formation. When such relational data are complemented by additional information on exogenous node partitions, these sources of knowledge are typically included in the model to supervise the cluster assignment mechanism or to improve inference on edge probabilities. Although these solutions are routinely implemented, there is a lack of formal approaches to test if a given external node partition is in line with the endogenous clustering structure encoding stochastic equivalence patterns among the nodes in the network. To fill this gap, we develop a formal Bayesian testing procedure which relies on the calculation of the Bayes factor between a stochastic block model with known grouping structure defined by the exogenous node partition and an infinite relational model that allows the endogenous clustering configurations to be unknown, random and fully revealed by the block–connectivity patterns in the network. A simple Markov chain Monte Carlo method for computing the Bayes factor and quantifying uncertainty in the endogenous groups is proposed. This strategy is evaluated in simulations, and in applications studying brain networks of Alzheimer’s patients.

8
Nolau I, Ferreira GS. An alternative class of models to position social network groups in latent spaces. Braz J Probab Stat 2022. [DOI: 10.1214/21-bjps526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Izabel Nolau
  - Department of Statistical Methods, Federal University of Rio de Janeiro, 149 Athos da Silveira Ramos Ave, ZIP 21941-909, Rio de Janeiro, RJ, Brazil
- Gustavo S. Ferreira
  - National School of Statistical Sciences (ENCE), Brazilian Institute of Geography and Statistics (IBGE), 106 Andre Cavalcanti St, ZIP 20231-050, Rio de Janeiro, RJ, Brazil

9
Tang M, Cape J, Priebe CE. Asymptotically efficient estimators for stochastic blockmodels: The naive MLE, the rank-constrained MLE, and the spectral estimator. Bernoulli 2022. [DOI: 10.3150/21-bej1376] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Minh Tang
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Joshua Cape
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Carey E. Priebe
  - Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA

10
Affiliation(s)
- Yuan Zhang
  - Department of Statistics, The Ohio State University
- Dong Xia
  - Department of Mathematics, The Hong Kong University of Science and Technology

11
Zhang H, Guo X, Chang X. Randomized Spectral Clustering in Large-Scale Stochastic Block Models. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2034636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Hai Zhang
  - Center for Modern Statistics, School of Mathematics, Northwest University, China
- Xiao Guo
  - Center for Modern Statistics, School of Mathematics, Northwest University, China
- Xiangyu Chang
  - Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi’an Jiaotong University, China

12
Weng H, Feng Y. Community detection with nodal information: Likelihood and its variational approximation. Stat (Int Stat Inst) 2022. [DOI: 10.1002/sta4.428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Haolei Weng
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
- Yang Feng
  - Department of Biostatistics, New York University, New York City, New York, USA

13
Abstract
We propose a weighted stochastic block model (WSBM) which extends the stochastic block model to the important case in which edges are weighted. We address the parameter estimation of the WSBM by use of maximum likelihood and variational approaches, and establish the consistency of these estimators. The problem of choosing the number of classes in a WSBM is addressed. The proposed model is applied to simulated data and an illustrative data set.
Affiliation(s)
- Tin Lok James Ng
  - School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland

14
De Nicola G, Sischka B, Kauermann G. Mixture models and networks: The stochastic blockmodel. Stat Model 2021. [DOI: 10.1177/1471082x211033169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Mixture models are probabilistic models aimed at uncovering and representing latent subgroups within a population. In the realm of network data analysis, the latent subgroups of nodes are typically identified by their connectivity behaviour, with nodes behaving similarly belonging to the same community. In this context, mixture modelling is pursued through stochastic blockmodelling. We consider stochastic blockmodels and some of their variants and extensions from a mixture modelling perspective. We also explore some of the main classes of estimation methods available and propose an alternative approach based on the reformulation of the blockmodel as a graphon. In addition to the discussion of inferential properties and estimating procedures, we focus on the application of the models to several real-world network datasets, showcasing the advantages and pitfalls of different approaches.
Affiliation(s)
- Giacomo De Nicola
  - Department of Statistics, Faculty of Mathematics, Informatics and Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
- Benjamin Sischka
  - Department of Statistics, Faculty of Mathematics, Informatics and Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
- Göran Kauermann
  - Department of Statistics, Faculty of Mathematics, Informatics and Statistics, Ludwig-Maximilians-Universität München, Munich, Germany

15
Gu Y, Xu G. A Joint MLE Approach to Large-Scale Structured Latent Attribute Analysis. J Am Stat Assoc 2021; 118:746-760. [PMID: 37153844 PMCID: PMC10162480 DOI: 10.1080/01621459.2021.1955689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Structured Latent Attribute Models (SLAMs) are a family of discrete latent variable models widely used in education, psychology, and epidemiology to model multivariate categorical data. A SLAM assumes that multiple discrete latent attributes explain the dependence of observed variables in a highly structured fashion. Usually, the maximum marginal likelihood estimation approach is adopted for SLAMs, treating the latent attributes as random effects. The increasing scope of modern assessment data involves large numbers of observed variables and high-dimensional latent attributes. This poses challenges to classical estimation methods and requires new methodology and understanding of latent variable modeling. Motivated by this, we consider the joint maximum likelihood estimation (MLE) approach to SLAMs, treating latent attributes as fixed unknown parameters. We investigate estimability, consistency, and computation in the regime where sample size, number of variables, and number of latent attributes all can diverge. We establish the statistical consistency of the joint MLE and propose efficient algorithms that scale well to large-scale data for several popular SLAMs. Simulation studies demonstrate the superior empirical performance of the proposed methods. An application to real data from an international educational assessment gives interpretable findings of cognitive diagnosis.
Affiliation(s)
- Yuqi Gu
  - Department of Statistics, Columbia University, New York
- Gongjun Xu
  - Department of Statistics, University of Michigan, Ann Arbor, MI

16
Li Y, Fan X, Chen L, Li B, Sisson SA. Smoothing graphons for modelling exchangeable relational data. Mach Learn 2022; 111:319-44. [DOI: 10.1007/s10994-021-06046-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

17
Yang H, Xiong W, Zhang X, Wang K, Tian M. Penalized homophily latent space models for directed scale-free networks. PLoS One 2021; 16:e0253873. [PMID: 34339437 PMCID: PMC8328337 DOI: 10.1371/journal.pone.0253873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2021] [Accepted: 06/14/2021] [Indexed: 11/28/2022]
Abstract
Online social networks like Twitter and Facebook are among the most popular sites on the Internet. Most online social networks involve some specific features, including reciprocity, transitivity and degree heterogeneity. Such networks are so called scale-free networks and have drawn lots of attention in research. The aim of this paper is to develop a novel methodology for directed network embedding within the latent space model (LSM) framework. It is known, the link probability between two individuals may increase as the features of each become similar, which is referred to as homophily attributes. To this end, penalized pair-specific attributes, acting as a distance measure, are introduced to provide with more powerful interpretation and improve link prediction accuracy, named penalized homophily latent space models (PHLSM). The proposed models also involve in-degree heterogeneity of directed scale-free networks by embedding with the popularity scales. We also introduce LASSO-based PHLSM to produce an accurate and sparse model for high-dimensional covariates. We make Bayesian inference using MCMC algorithms. The finite sample performance of the proposed models is evaluated by three benchmark simulation datasets and two real data examples. Our methods are competitive and interpretable, they outperform existing approaches for fitting directed networks.
Affiliation(s)
- Hanxuan Yang
  - School of Statistics, University of International Business and Economics, Beijing, China
- Wei Xiong
  - School of Statistics, University of International Business and Economics, Beijing, China
- Xueliang Zhang
  - Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China
- Kai Wang
  - Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China
- Maozai Tian
  - Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China

18
Affiliation(s)
- Yubai Yuan
  - Department of Statistics, University of California, Irvine
- Annie Qu
  - Department of Statistics, University of California, Irvine

19

20
Suh N, Huo X, Heim E, Seversky L. A network model that combines latent factors and sparse graphs. Stat Anal Data Min 2021. [DOI: 10.1002/sam.11492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Namjoon Suh
  - Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Xiaoming Huo
  - Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Eric Heim
  - Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Lee Seversky
  - Information System Division (AFRL/RIS), The Air Force Research Laboratory, Rome, New York, USA

21
Affiliation(s)
- Jiangzhou Wang
  - School of Mathematics and Statistics & KLAS, Northeast Normal University, Changchun 130024, China
- Jianhua Guo
  - School of Mathematics and Statistics & KLAS, Northeast Normal University, Changchun 130024, China
- Binghui Liu
  - School of Mathematics and Statistics & KLAS, Northeast Normal University, Changchun 130024, China

22
Liu H, Jin IH, Zhang Z, Yuan Y. Social Network Mediation Analysis: A Latent Space Approach. Psychometrika 2021; 86:272-298. [PMID: 33346886 DOI: 10.1007/s11336-020-09736-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/09/2019] [Accepted: 11/24/2020] [Indexed: 06/12/2023]
Abstract
A social network comprises both actors and the social connections among them. Such connections reflect the dependence among social actors, which is essential for individuals' mental health and social development. In this article, we propose a mediation model with a social network as a mediator to investigate the potential mediation role of a social network. In the model, the dependence among actors is accounted for by a few mutually orthogonal latent dimensions which form a social space. The individuals' positions in such a latent social space are directly involved in the mediation process between an independent and dependent variable. After showing that all the latent dimensions are equivalent in terms of their relationship to the social network and the meaning of each dimension is arbitrary, we propose to measure the whole mediation effect of a network. Although individuals' positions in the latent space are not unique, we rigorously articulate that the proposed network mediation effect is still well defined. We use a Bayesian estimation method to estimate the model and evaluate its performance through an extensive simulation study under representative conditions. The usefulness of the network mediation model is demonstrated through an application to a college friendship network.
Affiliation(s)
- Haiyan Liu
  - Psychological Sciences, University of California, Merced, 5200 N. Lake Road, Merced, CA, 95343, USA
- Ick Hoon Jin
  - Department of Applied Statistics, Department of Statistics and Data Science, Yonsei University, Seoul, South Korea
- Zhiyong Zhang
  - Department of Psychology, University of Notre Dame, Notre Dame, USA
- Ying Yuan
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, USA

23
Babkin S, Stewart JR, Long X, Schweinberger M. Large-scale estimation of random graph models with local dependence. Comput Stat Data Anal 2020; 152:107029. [PMID: 32834264 PMCID: PMC7282802 DOI: 10.1016/j.csda.2020.107029] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/06/2019] [Revised: 03/11/2020] [Accepted: 06/04/2020] [Indexed: 01/23/2023]
Abstract
A class of random graph models is considered, combining features of exponential-family models and latent structure models, with the goal of retaining the strengths of both of them while reducing the weaknesses of each of them. An open problem is how to estimate such models from large networks. A novel approach to large-scale estimation is proposed, taking advantage of the local structure of such models for the purpose of local computing. The main idea is that random graphs with local dependence can be decomposed into subgraphs, which enables parallel computing on subgraphs and suggests a two-step estimation approach. The first step estimates the local structure underlying random graphs. The second step estimates parameters given the estimated local structure of random graphs. Both steps can be implemented in parallel, which enables large-scale estimation. The advantages of the two-step estimation approach are demonstrated by simulation studies with up to 10,000 nodes and an application to a large Amazon product recommendation network with more than 10,000 products.
Affiliation(s)
- Jonathan R. Stewart
  - Department of Statistics, Florida State University, United States of America
- Xiaochen Long
  - Department of Statistics, Rice University, United States of America

24
Xie F, Xu Y. Optimal Bayesian estimation for random dot product graphs. Biometrika 2020. [DOI: 10.1093/biomet/asaa031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Summary
We propose and prove the optimality of a Bayesian approach for estimating the latent positions in random dot product graphs, which we call posterior spectral embedding. Unlike classical spectral-based adjacency, or Laplacian spectral embedding, posterior spectral embedding is a fully likelihood-based graph estimation method that takes advantage of the Bernoulli likelihood information of the observed adjacency matrix. We develop a minimax lower bound for estimating the latent positions, and show that posterior spectral embedding achieves this lower bound in the following two senses: it both results in a minimax-optimal posterior contraction rate and yields a point estimator achieving the minimax risk asymptotically. The convergence results are subsequently applied to clustering in stochastic block models with positive semidefinite block probability matrices, strengthening an existing result concerning the number of misclustered vertices. We also study a spectral-based Gaussian spectral embedding as a natural Bayesian analogue of adjacency spectral embedding, but the resulting posterior contraction rate is suboptimal by an extra logarithmic factor. The practical performance of the proposed methodology is illustrated through extensive synthetic examples and the analysis of Wikipedia graph data.
Affiliation(s)
- Fangzheng Xie
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland 21218, U.S.A
- Yanxun Xu
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland 21218, U.S.A

25
Schweinberger M, Krivitsky PN, Butts CT, Stewart JR. Exponential-Family Models of Random Graphs: Inference in Finite, Super and Infinite Population Scenarios. Stat Sci 2020. [DOI: 10.1214/19-sts743] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 11/19/2022]

26
Affiliation(s)
- Jianwei Hu
  - Department of Statistics, Central China Normal University, Wuhan, China
- Jingfei Zhang
  - Department of Management Science, University of Miami, Coral Gables, FL
- Hong Qin
  - Department of Statistics, Central China Normal University, Wuhan, China
  - Department of Statistics, Zhongnan University of Economics and Law, Wuhan, China
- Ting Yan
  - Department of Statistics, Central China Normal University, Wuhan, China
- Ji Zhu
  - Department of Statistics, University of Michigan, Ann Arbor, MI

27

28
Pavlović DM, Guillaume BR, Afyouni S, Nichols TE. Multi‐subject stochastic blockmodels with mixed effects for adaptive analysis of individual differences in human brain network cluster structure. Stat Neerl 2020. [DOI: 10.1111/stan.12219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Dragana M. Pavlović
  - Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Bryan R.L. Guillaume
  - Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Soroosh Afyouni
  - Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Thomas E. Nichols
  - Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK

29

30
Xia CH, Ma Z, Cui Z, Bzdok D, Thirion B, Bassett DS, Satterthwaite TD, Shinohara RT, Witten DM. Multi-scale network regression for brain-phenotype associations. Hum Brain Mapp 2020; 41:2553-2566. [PMID: 32216125 PMCID: PMC7383128 DOI: 10.1002/hbm.24982] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/15/2019] [Revised: 01/31/2020] [Accepted: 02/26/2020] [Indexed: 02/03/2023]
Abstract
Brain networks are increasingly characterized at different scales, including summary statistics, community connectivity, and individual edges. While research relating brain networks to behavioral measurements has yielded many insights into brain‐phenotype relationships, common analytical approaches only consider network information at a single scale. Here, we designed, implemented, and deployed Multi‐Scale Network Regression (MSNR), a penalized multivariate approach for modeling brain networks that explicitly respects both edge‐ and community‐level information by assuming a low rank and sparse structure, both encouraging less complex and more interpretable modeling. Capitalizing on a large neuroimaging cohort (n = 1,051), we demonstrate that MSNR recapitulates interpretable and statistically significant connectivity patterns associated with brain development, sex differences, and motion‐related artifacts. Compared to single‐scale methods, MSNR achieves a balance between prediction performance and model complexity, with improved interpretability. Together, by jointly exploiting both edge‐ and community‐level information, MSNR has the potential to yield novel insights into brain‐behavior relationships.
Affiliation(s)
- Cedric Huchuan Xia
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Zongming Ma
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Zaixu Cui
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Danilo Bzdok
- Department of Psychiatry, Psychopathology and Psychosomatics, RWTH Aachen University, Aachen, Germany; JARA-BRAIN, Jülich-Aachen Research Alliance, Jülich, Germany; Université Paris-Saclay, CEA, Inria, Gif-sur-Yvette, France; Department of Bioengineering, McGill University, Montreal, Canada
- Danielle S Bassett
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Electrical and Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Physics and Astronomy, School of Arts and Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Santa Fe Institute, Santa Fe, New Mexico, USA
- Theodore D Satterthwaite
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Russell T Shinohara
- Penn Statistics and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Biomedical Imaging Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Daniela M Witten
- Department of Statistics, College of Arts and Science, University of Washington, Seattle, Washington, USA; Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington, USA
|
31
|
Zhang Y, Yuan M. Nonreconstruction of high-dimensional stochastic block model with bounded degree. Stat Probab Lett 2020; 158:108675. [DOI: 10.1016/j.spl.2019.108675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
32
|
|
33
|
|
34
|
Abstract
Dynamic networks are a general language for describing time-evolving complex systems, and discrete time network models provide an emerging statistical technique for various applications. It is a fundamental research question to detect a set of nodes sharing similar connectivity patterns in time-evolving networks. Our work is primarily motivated by detecting groups based on interesting features of the time-evolving networks (e.g., stability). In this work, we propose a model-based clustering framework for time-evolving networks based on discrete time exponential-family random graph models, which simultaneously allows both modeling and detecting group structure. To choose the number of groups, we use the conditional likelihood to construct an effective model selection criterion. Furthermore, we propose an efficient variational expectation-maximization (EM) algorithm to find approximate maximum likelihood estimates of network parameters and mixing proportions. The power of our method is demonstrated in simulation studies and empirical applications to international trade networks and the collaboration networks of a large research university.
Affiliation(s)
- Kevin H. Lee
- Department of Statistics, Western Michigan University, Kalamazoo, MI 49008, USA
- Lingzhou Xue
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- David R. Hunter
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
|
35
|
Abstract
Summary
We consider multi-layer network data where the relationships between pairs of elements are reflected in multiple modalities, and may be described by multivariate or even high-dimensional vectors. Under the multi-layer stochastic block model framework we derive consistency results for a least squares estimation of memberships. Our theorems show that, as compared to single-layer community detection, a multi-layer network provides much richer information that allows for consistent community detection from a much sparser network, with required edge density reduced by a factor of the square root of the number of layers. Moreover, the multi-layer framework can detect cohesive community structure across layers, which might be hard to detect by any single-layer or simple aggregation. Simulations and a data example are provided to support the theoretical results.
Affiliation(s)
- Jing Lei
- Department of Statistics and Data Science, Carnegie Mellon University, Baker Hall 132, Pittsburgh, Pennsylvania 15213, U.S.A
- Kehui Chen
- Department of Statistics, University of Pittsburgh, 1800 Wesley W. Posvar Hall, Pittsburgh, Pennsylvania 15260, U.S.A
- Brian Lynch
- Department of Statistics, University of Pittsburgh, 1800 Wesley W. Posvar Hall, Pittsburgh, Pennsylvania 15260, U.S.A
|
36
|
|
37
|
Roy S, Atchadé Y, Michailidis G. Likelihood Inference for Large Scale Stochastic Blockmodels with Covariates based on a Divide-and-Conquer Parallelizable Algorithm with Communication. J Comput Graph Stat 2019; 28:609-619. [PMID: 31595140 DOI: 10.1080/10618600.2018.1554486] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Abstract
We consider a stochastic blockmodel equipped with node covariate information, which is helpful in analyzing social network data. The key objective is to obtain maximum likelihood estimates of the model parameters. For this task, we devise a fast, scalable Monte Carlo EM type algorithm based on case-control approximation of the log-likelihood coupled with a subsampling approach. A key feature of the proposed algorithm is its parallelizability: portions of the data are processed on several cores, while key statistics are communicated across the cores during each iteration of the algorithm. The performance of the algorithm is evaluated on synthetic data sets and compared with competing methods for blockmodel parameter estimation. We also illustrate the model on data from a Facebook derived social network enhanced with node covariate information.
Affiliation(s)
- Sandipan Roy
- Department of Statistical Science, University College London
- Yves Atchadé
- Department of Statistics, University of Michigan
- George Michailidis
- Department of Statistics & Informatics Institute, University of Florida November 15, 2018
|
38
|
Ma F, Wang Y, Yuen KF, Wang W, Li X, Liang Y. The Evolution of the Spatial Association Effect of Carbon Emissions in Transportation: A Social Network Perspective. Int J Environ Res Public Health 2019; 16:ijerph16122154. [PMID: 31216689 PMCID: PMC6616870 DOI: 10.3390/ijerph16122154] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/09/2019] [Revised: 05/21/2019] [Accepted: 06/14/2019] [Indexed: 11/23/2022]
Abstract
The association effect between provincial transportation carbon emissions has become an important issue in regional carbon emission management. This study explored the relationship and development trends associated with regional transportation carbon emissions. A social network method was used to analyze the structural characteristics of the spatial association of transportation carbon emissions. Indicators for each of the structural characteristics were selected from three dimensions: The integral network, node network, and spatial clustering. Then, this study established an association network for transportation carbon emissions (ANTCE) using a gravity model with China’s provincial data during the period of 2007 to 2016. Further, a block model (a method of partitioning provinces based on the information of transportation carbon emission) was used to group the ANTCE network of inter-provincial transportation carbon emissions to examine the overall association structure. There were three key findings. First, the tightness of China’s ANTCE network is growing, and its complexity and robustness are gradually increasing. Second, China’s ANTCE network shows a structural characteristic of “dense east and thin west.” That is, the transportation carbon emissions of eastern provinces in China are highly correlated, while those of central and western provinces are less correlated. Third, the eastern provinces belong to the two-way spillover or net benefit block, the central regions belong to the broker block, and the western provinces belong to the net spillover block. This indicates that the transportation carbon emissions in the western regions are flowing to the eastern and central regions. Finally, a regression analysis using a quadratic assignment procedure (QAP) was used to explore the spatial association between provinces. 
We found that per capita gross domestic product (GDP) and fixed transportation investments significantly influence the association and spillover effects of the ANTCE network. The research findings provide a theoretical foundation for the development of policies that may better coordinate carbon emission mitigation in regional transportation.
Affiliation(s)
- Fei Ma
- School of Economics and Management, Chang'an University, Xi'an 710064, China.
- Yixuan Wang
- School of Economics and Management, Chang'an University, Xi'an 710064, China.
- Kum Fai Yuen
- Department of International Logistics, Chung-Ang University, Seoul 06974, Korea.
- Wenlin Wang
- School of Economics and Management, Chang'an University, Xi'an 710064, China.
- Xiaodan Li
- School of Economics and Management, Chang'an University, Xi'an 710064, China.
- Yuan Liang
- School of Economics and Management, Chang'an University, Xi'an 710064, China.
|
39
|
Zhao MJ, Driscoll AR, Sengupta S, Stevens NT, Fricker RD, Woodall WH. The effect of temporal aggregation level in social network monitoring. PLoS One 2018; 13:e0209075. [PMID: 30566509 PMCID: PMC6300332 DOI: 10.1371/journal.pone.0209075] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/23/2018] [Accepted: 11/29/2018] [Indexed: 11/19/2022] Open
Abstract
Social networks have become ubiquitous in modern society, which makes social network monitoring a research area of significant practical importance. Social network data consist of social interactions between pairs of individuals that are temporally aggregated over a certain interval of time, and the level of such temporal aggregation can have substantial impact on social network monitoring. There have been several studies on the effect of temporal aggregation in the process monitoring literature, but no studies on the effect of temporal aggregation in social network monitoring. We use the degree corrected stochastic block model (DCSBM) to simulate social networks and network anomalies and analyze these networks in the context of both count and binary network data. In conjunction with this model, we use the Priebe scan method as the monitoring method. We demonstrate that temporal aggregation at high levels leads to a considerable decrease in the ability to detect an anomaly within a specified time period. Moreover, converting social network communication data from counts to binary indicators can result in a significant loss of information, hindering detection performance. Aggregation at an appropriate level with count data, however, can amplify the anomalous signal generated by network anomalies and improve detection performance. Our results provide both insights on the practical effects of temporal aggregation and a framework for the study of other combinations of network models, surveillance methods, and types of anomalies.
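The contrast between count and binary data at a given aggregation level can be made concrete with a small simulation. The sketch below assumes a Poisson formulation of the degree-corrected stochastic block model, in which the expected count between nodes i and j is the product of their degree parameters and a block rate; all parameter values, sizes, and function names are hypothetical, and the Priebe scan monitoring step from the abstract is not implemented.

```python
import numpy as np

def simulate_dcsbm_counts(theta, labels, block_rates, rng):
    """Draw one symmetric count network with A_ij ~ Poisson(theta_i * theta_j * B[g_i, g_j])."""
    rates = np.outer(theta, theta) * block_rates[np.ix_(labels, labels)]
    upper = np.triu(rng.poisson(rates), k=1)  # keep pairs i < j only, so no self-loops
    return upper + upper.T

def aggregate(snapshots, width):
    """Sum consecutive snapshots over non-overlapping windows of `width` periods."""
    return [sum(snapshots[i:i + width]) for i in range(0, len(snapshots), width)]

rng = np.random.default_rng(0)
n = 20
labels = np.repeat([0, 1], n // 2)        # two hypothetical communities
theta = rng.uniform(0.5, 1.5, size=n)     # hypothetical degree parameters
block_rates = np.array([[0.8, 0.2],
                        [0.2, 0.8]])      # hypothetical per-period interaction rates

# Twelve base time periods, aggregated into windows of four periods each.
snapshots = [simulate_dcsbm_counts(theta, labels, block_rates, rng) for _ in range(12)]
counts_w4 = aggregate(snapshots, 4)
# Binary version: records only whether a pair interacted at all within the window.
binary_w4 = [(a > 0).astype(int) for a in counts_w4]
```

Wider windows leave fewer networks to monitor, and the binary version discards how often each pair interacted, which are the two effects the abstract examines.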
Affiliation(s)
- Meng J. Zhao
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America
- Anne R. Driscoll
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America
- Srijan Sengupta
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America
- Nathaniel T. Stevens
- Department of Mathematics and Statistics, University of San Francisco, San Francisco, California, United States of America
- Ronald D. Fricker
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America
- William H. Woodall
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America
|
40
|
Abstract
Many existing statistical and machine learning tools for social network analysis focus on a single level of analysis. Methods designed for clustering optimize a global partition of the graph, whereas projection-based approaches (e.g., the latent space model in the statistics literature) represent in rich detail the roles of individuals. Many pertinent questions in sociology and economics, however, span multiple scales of analysis. Further, many questions involve comparisons across disconnected graphs that will inevitably be of different sizes, either due to missing data or the inherent heterogeneity in real-world networks. We propose a class of network models that represent network structure on multiple scales and facilitate comparison across graphs with different numbers of individuals. These models differentially invest modeling effort within subgraphs of high density, often termed communities, while maintaining a parsimonious structure between said subgraphs. We show that our model class is projective, highlighting an ongoing discussion in the social network modeling literature on the dependence of inference paradigms on the size of the observed graph. We illustrate the utility of our method using data on household relations from Karnataka, India. Supplementary material for this article is available online.
Affiliation(s)
- Bailey K Fosdick
- Department of Statistics, Colorado State University, Fort Collins, CO
- Tyler H McCormick
- Department of Statistics, Department of Sociology, University of Washington, Seattle, WA
- Thomas Brendan Murphy
- School of Mathematics and Statistics, University College Dublin, Belfield, Dublin, Ireland
- Tin Lok James Ng
- School of Mathematics and Statistics, University College Dublin, Belfield, Dublin, Ireland
- Ted Westling
- Department of Statistics, University of Washington, Seattle, WA
|
41
|
|
42
|
|
43
|
|
44
|
Zhao Y, Pan Q, Du C. Logistic regression augmented community detection for network data with application in identifying autism-related gene pathways. Biometrics 2018; 75:222-234. [PMID: 30039855 DOI: 10.1111/biom.12955] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/01/2016] [Revised: 06/01/2018] [Accepted: 07/01/2018] [Indexed: 12/01/2022]
Abstract
When searching for gene pathways leading to specific disease outcomes, additional information on gene characteristics is often available that may help differentiate genes related to the disease from irrelevant background genes when connections involving both types of genes are observed and their relationships to the disease are unknown. We propose a method to single out irrelevant background genes with the help of auxiliary information through a logistic regression, and to cluster relevant genes into cohesive groups using the adjacency matrix. The expectation-maximization algorithm is modified to maximize a joint pseudo-likelihood assuming latent indicators for relevance to the disease and latent group memberships, as well as Poisson or multinomial distributed link numbers within and between groups. A robust version allowing arbitrary linkage patterns within the background is further derived. Asymptotic consistency of label assignments under the stochastic blockmodel is proven. Superior performance and robustness in finite samples are observed in simulation studies. The proposed robust method identifies previously missed gene sets underlying autism related neurological diseases using diverse data sources including de novo mutations, gene expressions, and protein-protein interactions.
Affiliation(s)
- Yunpeng Zhao
- School of Mathematical and Natural Sciences, Arizona State University, Tempe, Arizona 85281, U.S.A
- Qing Pan
- Department of Statistics, George Washington University, Washington DC 20032, U.S.A
- Chengan Du
- Center for Outcomes Research and Evaluation, Yale University, New Haven, Connecticut 06510, U.S.A
|
45
|
|
46
|
Affiliation(s)
- Yuguo Chen
- University of Illinois Urbana–Champaign, USA
|
47
|
Affiliation(s)
- Yunpeng Zhao
- Department of Statistics, George Mason University, Fairfax, VA, USA
|
48
|
|
49
|
|
50
|
Affiliation(s)
- Yi Yu
- Statistical Laboratory, University of Cambridge, Cambridge, UK
- Yang Feng
- Department of Statistics, Columbia University, New York, New York
|