1
Wu B, Guo Y, Kang J. Bayesian Spatial Blind Source Separation via the Thresholded Gaussian Process. J Am Stat Assoc 2022; 119:422-433. [PMID: 38545331 PMCID: PMC10964322 DOI: 10.1080/01621459.2022.2123336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2020] [Accepted: 09/05/2022] [Indexed: 10/14/2022]
Abstract
Blind source separation (BSS) aims to separate latent source signals from their mixtures. For spatially dependent signals in high-dimensional, large-scale data such as neuroimaging, most existing BSS methods do not take into account the spatial dependence and the sparsity of the latent source signals. To address these major limitations, we propose a Bayesian spatial blind source separation (BSP-BSS) approach for neuroimaging data analysis. We model the expectation of the observed images as a linear mixture of multiple sparse and piecewise smooth latent source signals, for which we construct a new class of Bayesian nonparametric prior models by thresholding Gaussian processes. We assign von Mises-Fisher (vMF) priors to the mixing coefficients in the model. Under some regularity conditions, we show that the proposed method has several desirable theoretical properties, including large support for the priors, consistency of the joint posterior distribution of the latent source intensity functions and the mixing coefficients, and selection consistency on the number of latent sources. We use extensive simulation studies and an analysis of the resting-state fMRI data in the Autism Brain Imaging Data Exchange (ABIDE) study to demonstrate that BSP-BSS outperforms the existing method for separating latent brain networks and detecting brain activation in the latent sources.
Affiliation(s)
- Ben Wu
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, CN, 100872
- Ying Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322
- Jian Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
2
Mejia AF, Bolin D, Yue YR, Wang J, Caffo BS, Nebel MB. Template independent component analysis with spatial priors for accurate subject-level brain network estimation and inference. J Comput Graph Stat 2022; 32:413-433. [PMID: 37377728 PMCID: PMC10292763 DOI: 10.1080/10618600.2022.2104289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Accepted: 06/14/2022] [Indexed: 10/17/2022]
Abstract
Independent component analysis (ICA) is commonly applied to functional magnetic resonance imaging (fMRI) data to extract independent components (ICs) representing functional brain networks. While ICA produces reliable group-level estimates, single-subject ICA often produces noisy results. Template ICA is a hierarchical ICA model using empirical population priors to produce more reliable subject-level estimates. However, this and other hierarchical ICA models assume unrealistically that subject effects are spatially independent. Here, we propose spatial template ICA (stICA), which incorporates spatial priors into the template ICA framework for greater estimation efficiency. Additionally, the joint posterior distribution can be used to identify brain regions engaged in each network using an excursion set approach. By leveraging spatial dependencies and avoiding massive multiple comparisons, stICA has high power to detect true effects. We derive an efficient expectation-maximization algorithm to obtain maximum likelihood estimates of the model parameters and posterior moments of the latent fields. Based on analysis of simulated data and fMRI data from the Human Connectome Project, we find that stICA produces estimates that are more accurate and reliable than benchmark approaches, and identifies larger and more reliable areas of engagement. The algorithm is computationally tractable, achieving convergence within 12 hours for whole-cortex fMRI analysis.
Affiliation(s)
- Amanda F. Mejia
  - Department of Statistics, Indiana University, Bloomington, IN, 47408
- David Bolin
  - CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Yu Ryan Yue
  - Paul H. Chook Department of Information Systems and Statistics, Baruch College, The City University of New York, New York, NY, 10010
- Jiongran Wang
  - Department of Statistics, Indiana University, Bloomington, IN, 47408
- Brian S. Caffo
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205
- Mary Beth Nebel
  - Center for Neurodevelopmental and Imaging Research, Kennedy Krieger Institute, Baltimore, MD, 21205
  - Department of Neurology, Johns Hopkins University, Baltimore, MD, 21205
3
Lee S, Shen H, Truong Y. Sampling Properties of color Independent Component Analysis. J Multivar Anal 2020; 181. [PMID: 33162620 DOI: 10.1016/j.jmva.2020.104692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Independent Component Analysis (ICA) offers an effective data-driven approach for blind source extraction encountered in many signal and image processing problems. Although many ICA methods have been developed, they have received relatively little attention in the statistics literature, especially in terms of rigorous theoretical investigation for statistical inference. The current paper aims at narrowing this gap and investigates the statistical sampling properties of the colorICA (cICA) method. The cICA incorporates the correlation structure within sources through parametric time series models in the frequency domain and outperforms several existing ICA alternatives numerically. We establish the consistency and asymptotic normality of the cICA estimates, which then enables statistical inference based on the estimates. These asymptotic properties are further validated using simulation studies.
Affiliation(s)
- Seonjoo Lee
  - Department of Psychiatry and Biostatistics, Columbia University, New York, NY, USA
  - Mental Health Data Science, New York State Psychiatric Institute and Research Foundation for Mental Hygiene, Inc., New York, NY, USA
- Haipeng Shen
  - Innovation and Information Management, Faculty of Business and Economics, University of Hong Kong, Hong Kong, China
- Young Truong
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
4
Guo R, Zhang C, Zhang Z. Maximum Independent Component Analysis with Application to EEG Data. Stat Sci 2020. [DOI: 10.1214/19-sts763] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
5
Risk BB, Matteson DS, Ruppert D. Linear Non-Gaussian Component Analysis Via Maximum Likelihood. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2017.1407772] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/18/2022]
Affiliation(s)
- Benjamin B. Risk
  - Department of Biostatistics & Bioinformatics, Emory University, Atlanta, GA
- David Ruppert
  - Department of Statistical Science, Cornell University, Ithaca, NY
6
Zhang W, Lv J, Li X, Zhu D, Jiang X, Zhang S, Zhao Y, Guo L, Ye J, Hu D, Liu T. Experimental Comparisons of Sparse Dictionary Learning and Independent Component Analysis for Brain Network Inference From fMRI Data. IEEE Trans Biomed Eng 2019; 66:289-299. [DOI: 10.1109/tbme.2018.2831186] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 11/07/2022]
7
Pfister N, Bühlmann P, Schölkopf B, Peters J. Kernel-based tests for joint independence. J R Stat Soc Series B Stat Methodol 2017. [DOI: 10.1111/rssb.12235] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 11/29/2022]
Affiliation(s)
- Jonas Peters
  - Max Planck Institute for Intelligent Systems, Tübingen, Germany
  - University of Copenhagen, Denmark
8
Affiliation(s)
- David S. Matteson
  - Department of Social Statistics and Statistical Science, Cornell University, Ithaca, NY
- Ruey S. Tsay
  - Booth School of Business, University of Chicago, Chicago, IL
9
Zanini P, Shen H, Truong Y. Understanding resident mobility in Milan through independent component analysis of Telecom Italia mobile usage data. Ann Appl Stat 2016. [DOI: 10.1214/16-aoas913] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
10
Oja H, Paindaveine D, Taskinen S. Affine-invariant rank tests for multivariate independence in independent component models. Electron J Stat 2016. [DOI: 10.1214/16-ejs1174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
11
12
Miettinen J, Taskinen S, Nordhausen K, Oja H. Fourth Moments and Independent Component Analysis. Stat Sci 2015. [DOI: 10.1214/15-sts520] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/19/2022]
13
14
Eloyan A, Ghosh SK. A Semiparametric Approach to Source Separation using Independent Component Analysis. Comput Stat Data Anal 2014; 58:383-396. [PMID: 24526802 DOI: 10.1016/j.csda.2012.09.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
Data processing and source identification using lower dimensional hidden structure play an essential role in many fields of applications, including image processing, neural networks, genome studies, signal processing and other areas where large datasets are often encountered. One of the common methods for source separation using lower dimensional structure involves the use of Independent Component Analysis, which is based on a linear representation of the observed data in terms of independent hidden sources. The problem thus involves the estimation of the linear mixing matrix and the densities of the independent hidden sources. However, the solution to the problem depends on the identifiability of the sources. This paper first presents a set of sufficient conditions to establish the identifiability of the sources and the mixing matrix using moment restrictions of the hidden source variables. Under such sufficient conditions a semi-parametric maximum likelihood estimate of the mixing matrix is obtained using a class of mixture distributions. The consistency of our proposed estimate is established under additional regularity conditions. The proposed method is illustrated and compared with existing methods using simulated and real data sets.
15
Sokol A, Maathuis MH, Falkeborg B. Quantifying identifiability in independent component analysis. Electron J Stat 2014. [DOI: 10.1214/14-ejs932] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
16
Risk BB, Matteson DS, Ruppert D, Eloyan A, Caffo BS. An evaluation of independent component analyses with an application to resting-state fMRI. Biometrics 2013; 70:224-36. [PMID: 24350655 DOI: 10.1111/biom.12111] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 01/01/2013] [Revised: 06/01/2013] [Accepted: 08/01/2013] [Indexed: 11/29/2022]
Abstract
We examine differences between independent component analyses (ICAs) arising from different assumptions, measures of dependence, and starting points of the algorithms. ICA is a popular method with diverse applications including artifact removal in electrophysiology data, feature extraction in microarray data, and identifying brain networks in functional magnetic resonance imaging (fMRI). ICA can be viewed as a generalization of principal component analysis (PCA) that takes into account higher-order cross-correlations. Whereas the PCA solution is unique, there are many ICA methods, whose solutions may differ. Infomax, FastICA, and JADE are commonly applied to fMRI studies, with FastICA being arguably the most popular. Hastie and Tibshirani (2003) demonstrated that ProDenICA outperformed FastICA in simulations with two components. We introduce the application of ProDenICA to simulations with more components and to fMRI data. ProDenICA was more accurate in simulations, and we identified differences between biologically meaningful ICs from ProDenICA versus other methods in the fMRI analysis. ICA methods require nonconvex optimization, yet current practices do not recognize the importance of, nor adequately address sensitivity to, initial values. We found that local optima led to dramatically different estimates in both simulations and group ICA of fMRI, and we provide evidence that the global optimum from ProDenICA is the best estimate. We applied a modification of the Hungarian (Kuhn-Munkres) algorithm to match ICs from multiple estimates, thereby gaining novel insights into how brain networks vary in their sensitivity to initial values and ICA method.
Affiliation(s)
- Benjamin B Risk
  - Department of Statistical Science, Cornell University, 301 Malott Hall, Ithaca, New York, U.S.A
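The abstract of entry 16 mentions matching ICs across multiple estimates as an assignment problem. The following is only a minimal sketch of that idea, not the authors' procedure: it assumes absolute Pearson correlation as the similarity measure (ICs are identified only up to sign and order) and replaces the Hungarian algorithm with a brute-force search over permutations, which solves the same assignment problem but is feasible only for a handful of components. The function names `abs_corr` and `match_components` and the toy data are hypothetical.

```python
from itertools import permutations
from math import sqrt


def abs_corr(u, v):
    """Absolute Pearson correlation; the sign of an IC is arbitrary."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return abs(cov / (su * sv))


def match_components(run_a, run_b):
    """Return perm such that run_b[perm[i]] is matched to run_a[i].

    Brute-force search over all k! permutations (an assumption made for
    brevity); the Hungarian algorithm solves the same assignment problem
    in polynomial time.
    """
    k = len(run_a)
    sim = [[abs_corr(a, b) for b in run_b] for a in run_a]
    best = max(permutations(range(k)),
               key=lambda p: sum(sim[i][p[i]] for i in range(k)))
    return list(best)


# Toy data: run_b holds the same three components as run_a,
# reordered and with one sign flipped.
run_a = [[1, 2, 3, 4, 5], [1, -1, 1, -1, 1], [5, 1, 4, 2, 3]]
run_b = [[-5, -1, -4, -2, -3], [1, 2, 3, 4, 5], [1, -1, 1, -1, 1]]
print(match_components(run_a, run_b))  # -> [1, 2, 0]
```

For realistic numbers of components, a polynomial-time assignment solver would be the natural replacement for the permutation search.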
17
Hyvärinen A. Independent component analysis: recent advances. Philos Trans A Math Phys Eng Sci 2013; 371:20110534. [PMID: 23277597 PMCID: PMC3538438 DOI: 10.1098/rsta.2011.0534] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Indexed: 05/04/2023]
Abstract
Independent component analysis is a probabilistic method for learning a linear transform of a random vector. The goal is to find components that are maximally independent and non-Gaussian (non-normal). Its fundamental difference to classical multi-variate statistical methods is in the assumption of non-Gaussianity, which enables the identification of original, underlying components, in contrast to classical methods. The basic theory of independent component analysis was mainly developed in the 1990s and summarized, for example, in our monograph in 2001. Here, we provide an overview of some recent developments in the theory since the year 2000. The main topics are: analysis of causal relations, testing independent components, analysing multiple datasets (three-way data), modelling dependencies between the components and improved methods for estimating the basic model.
Affiliation(s)
- Aapo Hyvärinen
  - Department of Computer Science, and HIIT, University of Helsinki, Helsinki, Finland.
18
Eloyan A, Crainiceanu CM, Caffo BS. Likelihood-based population independent component analysis. Biostatistics 2013; 14:514-27. [PMID: 23314416 DOI: 10.1093/biostatistics/kxs055] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
Abstract
Independent component analysis (ICA) is a widely used technique for blind source separation, used heavily in several scientific research areas including acoustics, electrophysiology, and functional neuroimaging. We propose a scalable two-stage iterative true group ICA methodology for analyzing population level functional magnetic resonance imaging (fMRI) data where the number of subjects is very large. The method is based on likelihood estimators of the underlying source densities and the mixing matrix. As opposed to many commonly used group ICA algorithms, the proposed method does not require significant data reduction by a 2-fold singular value decomposition. In addition, the method can be applied to a large group of subjects since the memory requirements are not restrictive. The performance of our approach is compared with a commonly used group ICA algorithm via simulation studies. Furthermore, the proposed method is applied to a large collection of resting state fMRI datasets. The results show that established brain networks are well recovered by the proposed algorithm.
Affiliation(s)
- Ani Eloyan
  - Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA.
19
Samworth RJ, Yuan M. Independent component analysis via nonparametric maximum likelihood estimation. Ann Stat 2012. [DOI: 10.1214/12-aos1060] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022]
20
Lio G, Boulinguez P. Greater robustness of second order statistics than higher order statistics algorithms to distortions of the mixing matrix in blind source separation of human EEG: implications for single-subject and group analyses. Neuroimage 2012. [PMID: 23194817 DOI: 10.1016/j.neuroimage.2012.11.015] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 11/26/2022] Open
Abstract
A mandatory assumption in blind source separation (BSS) of the human electroencephalogram (EEG) is that the mixing matrix remains invariant, i.e., that the sources, electrodes and geometry of the head do not change during the experiment. In practice, this is often not the case. For instance, it is common that some electrodes move slightly during EEG recording. This issue is even more critical for group independent component analysis (gICA), a method of growing interest, in which only one mixing matrix is estimated for several subjects. Indeed, because of interindividual anatomo-functional variability, this method violates the mandatory principle of invariance. Here, using simulated (experiments 1 and 2) and real (experiment 3) EEG data, we test how eleven current BSS algorithms respond to distortions of the mixing matrix. We show that this usual kind of perturbation creates non-Gaussian features that are virtually added to all sources, impairing the estimation of real higher order statistics (HOS) features of the actual sources by HOS algorithms (e.g., Ext-INFOMAX, FASTICA). HOS-based methods are likely to identify more components (with similar properties) than actual neurological sources, a problem frequently encountered by BSS users. In practice, the quality of the recovered signal and the efficiency of subsequent source localization are substantially impaired. Performing dimensionality reduction before applying HOS-based BSS does not seem to be a safe strategy to circumvent the problem. Second order statistics (SOS)-based BSS methods belonging to the less popular SOBI family class are much less sensitive to this bias.
Affiliation(s)
- Guillaume Lio
  - Université de Lyon, F-69622, Lyon, France; Université Lyon 1, Villeurbanne, France
21
Lee S, Shen H, Truong Y, Lewis M, Huang X. Independent Component Analysis Involving Autocorrelated Sources With an Application to Functional Magnetic Resonance Imaging. J Am Stat Assoc 2012; 106:1009-1024. [PMID: 27524847 DOI: 10.1198/jasa.2011.tm10332] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/21/2022]
Abstract
Independent component analysis (ICA) is an effective data-driven method for blind source separation. It has been successfully applied to separate source signals of interest from their mixtures. Most existing ICA procedures are carried out by relying solely on the estimation of the marginal density functions, either parametrically or nonparametrically. In many applications, correlation structures within each source also play an important role besides the marginal distributions. One important example is functional magnetic resonance imaging (fMRI) analysis where the brain-function-related signals are temporally correlated. In this article, we consider a novel approach to ICA that fully exploits the correlation structures within the source signals. Specifically, we propose to estimate the spectral density functions of the source signals instead of their marginal density functions. This is made possible by virtue of the intrinsic relationship between the (unobserved) sources and the (observed) mixed signals. Our methodology is described and implemented using spectral density functions from frequently used time series models such as autoregressive moving average (ARMA) processes. The time series parameters and the mixing matrix are estimated via maximizing the Whittle likelihood function. We illustrate the performance of the proposed method through extensive simulation studies and a real fMRI application. The numerical results indicate that our approach outperforms several popular methods including the most widely used fastICA algorithm. This article has supplementary material online.
Affiliation(s)
- Seonjoo Lee
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599
- Haipeng Shen
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599
- Young Truong
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599
- Mechelle Lewis
  - Neurology and Pharmacology, Pennsylvania State University College of Medicine, Hershey, PA 17033
- Xuemei Huang
  - Neurology, Pharmacology, Radiology, Neurosurgery, Kinesiology, and Bioengineering, Pennsylvania State University Milton S. Hershey Medical Center, Hershey, PA 17033
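The abstract of entry 21 describes estimating time series parameters by maximizing a Whittle likelihood built from spectral densities. As a rough illustration only, the sketch below evaluates the standard Whittle log-likelihood for a single, unmixed AR(1) series and picks the autoregressive coefficient by grid search. It is not the authors' estimator: the mixing matrix is left out entirely, the innovation variance is assumed known and equal to 1, and the function names, simulation settings, and grid are all hypothetical choices.

```python
import cmath
import math
import random


def periodogram(x):
    """Return (w_k, I(w_k)) at Fourier frequencies w_k = 2*pi*k/n in (0, pi)."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    out = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * k / n
        dft = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(x))
        out.append((w, abs(dft) ** 2 / (2 * math.pi * n)))
    return out


def ar1_density(w, phi, sigma2=1.0):
    """Spectral density of the AR(1) process x_t = phi * x_{t-1} + e_t."""
    return sigma2 / (2 * math.pi * (1 - 2 * phi * math.cos(w) + phi ** 2))


def whittle_loglik(pgram, phi, sigma2=1.0):
    """Whittle approximation to the Gaussian log-likelihood, up to a constant."""
    total = 0.0
    for w, i_w in pgram:
        f = ar1_density(w, phi, sigma2)
        total -= math.log(f) + i_w / f
    return total


# Simulate an AR(1) series with phi = 0.6 and unit innovation variance.
random.seed(0)
x, prev = [], 0.0
for t in range(356):
    prev = 0.6 * prev + random.gauss(0.0, 1.0)
    if t >= 100:  # discard burn-in, keeping 256 observations
        x.append(prev)

pgram = periodogram(x)
grid = [i / 20 for i in range(-18, 19)]  # candidate phi values in [-0.9, 0.9]
phi_hat = max(grid, key=lambda p: whittle_loglik(pgram, p))
print(phi_hat)
```

The grid search keeps the sketch dependency-free; a numerical optimizer over the full parameter vector would be the usual choice in practice.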
22
Ilmonen P, Paindaveine D. Semiparametrically efficient inference based on signed ranks in symmetric independent component models. Ann Stat 2011. [DOI: 10.1214/11-aos906] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 11/19/2022]
23
Capobianco E. Gene feature interference deconvolution. Math Biosci 2010; 227:136-46. [PMID: 20673773 DOI: 10.1016/j.mbs.2010.07.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/05/2009] [Revised: 06/14/2010] [Accepted: 07/19/2010] [Indexed: 10/19/2022]
Abstract
High-throughput microarray technologies measure the abundance of thousands of mRNA targets simultaneously. Due to the usual disparity between a few available samples (from limited conditions or time course points) and many gene expression values (entire genomes), a complex high-dimensional genomic system has to be analyzed, for instance by reverse engineering methods. The latter aim to reconstruct gene networks from experimentally observed expression changes caused by various kinds of perturbations. In particular, elucidating regulatory paths and assessing their reliability across replicates are central topics in this article. The reconstruction problem requires efficiency and accuracy from numerical optimization algorithms and statistical inference techniques. To this end, we focus on methods but also on the available experimental information produced in technical replicates. We propose a model-based approach consisting of a few steps. First, feature selection is performed by a projective method that combines the gene measurements observed across replicates. Second, a heuristic sieving strategy is pursued to bypass the usual recourse to averaging. Third, the impact of dimensionality reduction on the biological system under study is evaluated. Evidence is obtained from the application of our approach to replicated microarray time course data, and suggests that gene features, once identified, can be used for stabilization purposes relative to the replicate variability. Both a quantitative representation and a qualitative assessment of the observed gene feature interference are reported in order to decipher the specific gene regulatory map and the pathway-associated dynamics.
Affiliation(s)
- Enrico Capobianco
  - CRS4 Bioinformatics Laboratory, Technology Park of Sardinia, 09010 Pula (Cagliari), Sardinia, Italy.
24
Paul D, Peng J. Consistency of restricted maximum likelihood estimators of principal components. Ann Stat 2009. [DOI: 10.1214/08-aos608] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/19/2022]
25
Kawanabe M, Sugiyama M, Blanchard G, Müller KR. A new algorithm of non-Gaussian component analysis with radial kernel functions. Ann Inst Stat Math 2006. [DOI: 10.1007/s10463-006-0098-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]