1
Ren M, Zhang S, Wang J. Consistent estimation of the number of communities via regularized network embedding. Biometrics 2023; 79:2404-2416. [PMID: 36573805 DOI: 10.1111/biom.13815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 12/20/2022] [Indexed: 12/28/2022]
Abstract
Network analysis plays an important role in numerous application domains, including biomedicine. Estimating the number of communities is a fundamental and critical issue in network analysis. Most existing studies either assume that the number of communities is known a priori or lack a rigorous theoretical guarantee on estimation consistency. In this paper, we propose a regularized network embedding model to simultaneously estimate the community structure and the number of communities in a unified formulation. The proposed model equips network embedding with a novel composite regularization term, which pushes each embedding vector toward its community center and pushes similar community centers to collapse into each other. A rigorous theoretical analysis is conducted, establishing asymptotic consistency in terms of community detection and estimation of the number of communities. Extensive numerical experiments have also been conducted on both synthetic networks and a brain functional connectivity network, which demonstrate the superior performance of the proposed method compared with existing alternatives.
Affiliation(s)
- Mingyang Ren
  - Department of Statistics, The Chinese University of Hong Kong, Ma Liu Shui, Hong Kong
  - School of Mathematical Sciences, Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, China
- Sanguo Zhang
  - School of Mathematical Sciences, Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, China
- Junhui Wang
  - Department of Statistics, The Chinese University of Hong Kong, Ma Liu Shui, Hong Kong
2
Xu Y, Koidis A, Tian X, Xu S, Xu X, Wei X, Jiang A, Lei H. Bayesian Fusion Model Enhanced Codfish Classification Using Near Infrared and Raman Spectrum. Foods 2022; 11:4100. [PMID: 36553842 PMCID: PMC9777887 DOI: 10.3390/foods11244100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/07/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/24/2022] Open
Abstract
In this study, a Bayesian-based decision fusion technique was developed for the first time to quickly and non-destructively identify codfish using near-infrared spectroscopy (NIRS) and Raman spectroscopy (RS). NIRS and RS spectra from 320 codfish samples were collected, and separate partial least squares discriminant analysis (PLS-DA) models were developed to establish the relationship between the raw data and cod identity for each spectral technique. Three fusion methods (decision fusion, data layer fusion, and feature layer fusion) were tested and compared. The decision fusion model based on the Bayesian algorithm (NIRS-RS-B) was developed on the optimal discrimination features of the NIRS and RS data (NIRS-RS) extracted by the PLS-DA method, whereas the other fusion models followed conventional, non-Bayesian approaches. The Bayesian model showed enhanced classification metrics (92% sensitivity, 98% specificity, 98% accuracy) that were significantly superior to those of either individual spectroscopic method (NIRS, RS) and of the two data fusion methods (data layer fused, NIRS-RS-D, or feature layer fused, NIRS-RS-F). This novel approach can provide an alternative classification method for codfish and potentially for other food speciation cases.
Affiliation(s)
- Yi Xu
  - Guangdong Provincial Key Laboratory of Food Quality and Safety/Nation-Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
  - College of Light Industry and Engineering, Sichuan Technology & Business College, Chengdu 611800, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510642, China
- Anastasios Koidis
  - Institute for Global Food Security, Queen’s University Belfast, 19 Chlorine Gardens, Belfast BT9 5DJ, UK
- Xingguo Tian
  - Guangdong Provincial Key Laboratory of Food Quality and Safety/Nation-Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Sai Xu
  - Public Monitoring Center of Agricultural Products, Guangdong Academy of Agricultural Sciences, Guangzhou 510642, China
- Xiaoyan Xu
  - Guangdong Provincial Key Laboratory of Food Quality and Safety/Nation-Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Xiaoqun Wei
  - Guangdong Provincial Key Laboratory of Food Quality and Safety/Nation-Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
- Aimin Jiang
  - Guangdong Provincial Key Laboratory of Food Quality and Safety/Nation-Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
  - Correspondence: (A.J.); (H.L.); Tel.: +86-20-8528-0270 (A.J.); +86-20-8528-3925 (H.L.)
- Hongtao Lei
  - Guangdong Provincial Key Laboratory of Food Quality and Safety/Nation-Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou 510642, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510642, China
  - Correspondence: (A.J.); (H.L.); Tel.: +86-20-8528-0270 (A.J.); +86-20-8528-3925 (H.L.)
3
Legramanti S, Rigon T, Durante D, Dunson DB. Extended stochastic block models with application to criminal networks. Ann Appl Stat 2022; 16:2369-2395. [DOI: 10.1214/21-aoas1595] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Sirio Legramanti
  - Department of Decision Sciences and Institute for Data Science and Analytics, Bocconi University
- Tommaso Rigon
  - Department of Economics, Management and Statistics, University of Milano-Bicocca
- Daniele Durante
  - Department of Decision Sciences and Institute for Data Science and Analytics, Bocconi University
4
Huang S, Weng H, Feng Y. Spectral clustering via adaptive layer aggregation for multi-layer networks. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2134874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Sihan Huang
  - Department of Statistics, Columbia University
- Haolei Weng
  - Department of Statistics and Probability, Michigan State University
- Yang Feng
  - Department of Biostatistics, New York University
5
Qing H. A Useful Criterion on Studying Consistent Estimation in Community Detection. Entropy (Basel) 2022; 24:1098. [PMID: 36010762 PMCID: PMC9407257 DOI: 10.3390/e24081098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 06/15/2023]
Abstract
In network analysis, developing a unified theoretical framework that can compare methods under different models is an interesting problem. This paper proposes a partial solution to this problem. We summarize, as a four-step criterion called SCSTC, the idea of using the separation condition for a standard network and the sharp threshold of the Erdős-Rényi random graph to study consistent estimation and to compare the theoretical error rates and network sparsity requirements of spectral methods under models that can degenerate to a stochastic block model. Using SCSTC, we find some inconsistencies in the separation condition and sharp threshold in community detection. In particular, we find that the original theoretical results of the SPACL algorithm, introduced to estimate network memberships under the mixed membership stochastic blockmodel, are sub-optimal. To find how these inconsistencies arise, we re-establish the theoretical convergence rate of this algorithm by applying recent techniques on row-wise eigenvector deviation. The results are further extended to the degree-corrected mixed membership model. By comparison, our results enjoy smaller error rates, weaker dependence on the number of communities, weaker requirements on network sparsity, and so forth. The separation condition and sharp threshold obtained from our theoretical results match the classical results, so the usefulness of this criterion for studying consistent estimation is guaranteed. Numerical results for computer-generated networks support our finding that the spectral methods considered in this paper achieve the threshold of the separation condition.
Affiliation(s)
- Huan Qing
  - School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China
6
A likelihood-ratio type test for stochastic block models with bounded degrees. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2021.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
7
Rubin-Delanchy P, Cape J, Tang M, Priebe CE. A statistical interpretation of spectral embedding: The generalised random dot product graph. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12509] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
Affiliation(s)
- Joshua Cape
  - University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Minh Tang
  - North Carolina State University, Raleigh, North Carolina, USA
8
Tang M, Cape J, Priebe CE. Asymptotically efficient estimators for stochastic blockmodels: The naive MLE, the rank-constrained MLE, and the spectral estimator. Bernoulli 2022. [DOI: 10.3150/21-bej1376] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Minh Tang
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Joshua Cape
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Carey E. Priebe
  - Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
9
Zhang H, Guo X, Chang X. Randomized Spectral Clustering in Large-Scale Stochastic Block Models. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2034636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Hai Zhang
  - Center for Modern Statistics, School of Mathematics, Northwest University, China
- Xiao Guo
  - Center for Modern Statistics, School of Mathematics, Northwest University, China
- Xiangyu Chang
  - Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi’an Jiaotong University, China
10
Keriven N, Vaiter S. Sparse and smooth: Improved guarantees for spectral clustering in the dynamic stochastic block model. Electron J Stat 2022. [DOI: 10.1214/22-ejs1986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
11
Two provably consistent divide-and-conquer clustering algorithms for large networks. Proc Natl Acad Sci U S A 2021; 118:2100482118. [PMID: 34716259 DOI: 10.1073/pnas.2100482118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/30/2021] [Indexed: 11/18/2022] Open
Abstract
In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly bring down the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, likelihood-based methods, etc., without losing accuracy, and even improving accuracy at times. These algorithms are also, by nature, parallelizable. Since most traditional algorithms are accurate, and the corresponding optimization problems are much simpler in small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove the consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings.
12
Löffler M, Zhang AY, Zhou HH. Optimality of spectral clustering in the Gaussian mixture model. Ann Stat 2021. [DOI: 10.1214/20-aos2044] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
13
Smoothing graphons for modelling exchangeable relational data. Mach Learn 2021. [DOI: 10.1007/s10994-021-06046-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
14
Xie F, Xu Y. Efficient Estimation for Random Dot Product Graphs via a One-Step Procedure. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1948419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Fangzheng Xie
  - Department of Statistics, Indiana University, Bloomington, IN
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD
- Yanxun Xu
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD
15
Landa B, Coifman RR, Kluger Y. Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise. SIAM J Math Data Sci 2021; 3:388-413. [PMID: 34124607 PMCID: PMC8194191 DOI: 10.1137/20m1342124] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g., the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate $m^{-1/2}$, where $m$ is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.
Affiliation(s)
- Boris Landa
  - Program in Applied Mathematics, Yale University
- Yuval Kluger
  - Program in Applied Mathematics, Yale University
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University
  - Department of Pathology, Yale University School of Medicine
16
Gao C, Ma Z. Minimax Rates in Network Analysis: Graphon Estimation, Community Detection and Hypothesis Testing. Stat Sci 2021. [DOI: 10.1214/19-sts736] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022]
17
Hu J, Zhang J, Qin H, Yan T, Zhu J. Using Maximum Entry-Wise Deviation to Test the Goodness of Fit for Stochastic Block Models. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1722676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Jianwei Hu
  - Department of Statistics, Central China Normal University, Wuhan, China
- Jingfei Zhang
  - Department of Management Science, University of Miami, Coral Gables, FL
- Hong Qin
  - Department of Statistics, Central China Normal University, Wuhan, China
  - Department of Statistics, Zhongnan University of Economics and Law, Wuhan, China
- Ting Yan
  - Department of Statistics, Central China Normal University, Wuhan, China
- Ji Zhu
  - Department of Statistics, University of Michigan, Ann Arbor, MI
18
Abstract
While many statistical models and methods are now available for network analysis, resampling of network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but it is not directly applicable to networks since splitting network nodes into groups requires deleting edges and destroys some of the network structure. In this paper we propose a new network resampling strategy, based on splitting node pairs rather than nodes, that is applicable to cross-validation for a wide range of network model selection tasks. We provide theoretical justification for our method in a general setting and examples of how the method can be used in specific network model selection and parameter tuning tasks. Numerical results on simulated networks and on a statisticians’ citation network show that the proposed cross-validation approach works well for model selection.
Affiliation(s)
- Tianxi Li
  - Department of Statistics, University of Virginia, B005 Halsey Hall, 148 Amphitheater Way, Charlottesville, Virginia 22904, U.S.A.
- Elizaveta Levina
  - Department of Statistics, University of Michigan, 459 West Hall, 1085 South University Avenue, Ann Arbor, Michigan 48105, U.S.A.
- Ji Zhu
  - Department of Statistics, University of Michigan, 459 West Hall, 1085 South University Avenue, Ann Arbor, Michigan 48105, U.S.A.
19
Cape J, Tang M, Priebe CE. The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. Ann Stat 2019. [DOI: 10.1214/18-aos1752] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/19/2022]
20
Cape J, Tang M, Priebe CE. Signal-plus-noise matrix models: eigenvector deviations and fluctuations. Biometrika 2019. [DOI: 10.1093/biomet/asy070] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022] Open
Affiliation(s)
- J Cape
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland, USA
- M Tang
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland, USA
- C E Priebe
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland, USA
22
Tang M, Priebe CE. Limit theorems for eigenvectors of the normalized Laplacian for random graphs. Ann Stat 2018. [DOI: 10.1214/17-aos1623] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 11/19/2022]
23
Abstract
Community detection is challenging when the network structure is estimated with uncertainty. Dynamic networks present additional challenges but also add information across time periods. We propose a global community detection method, persistent communities by eigenvector smoothing (PisCES), that combines information across a series of networks, longitudinally, to strengthen the inference for each period. Our method is derived from evolutionary spectral clustering and degree correction methods. Data-driven solutions to the problem of tuning parameter selection are provided. In simulations we find that PisCES performs better than competing methods designed for a low signal-to-noise ratio. Recently obtained gene expression data from rhesus monkey brains provide samples from finely partitioned brain regions over a broad time span including pre- and postnatal periods. Of interest is how gene communities develop over space and time; however, once the data are divided into homogeneous spatial and temporal periods, sample sizes are very small, making inference quite challenging. Applying PisCES to the medial prefrontal cortex in rhesus monkey brains from near conception to adulthood reveals dense communities that persist, merge, and diverge over time and others that are loosely organized and short lived, illustrating how dynamic community detection can yield interesting insights into processes such as brain development.
24
Zhao Y. A survey on theoretical advances of community detection in networks. WIREs Comput Stat 2017. [DOI: 10.1002/wics.1403] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Affiliation(s)
- Yunpeng Zhao
  - Department of Statistics, George Mason University, Fairfax, VA, USA
25
Le CM, Levina E, Vershynin R. Optimization via low-rank approximation for community detection in networks. Ann Stat 2016. [DOI: 10.1214/15-aos1360] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]