1
Shamshoian J, Marco N, Şentürk D, Jeste S, Telesca D. Bayesian covariance regression in functional data analysis with applications to functional brain imaging. Int J Biostat 2025:ijb-2023-0029. [PMID: 39903849 DOI: 10.1515/ijb-2023-0029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 01/07/2025] [Indexed: 02/06/2025]
Abstract
Function-on-scalar regression models relate functional outcomes to scalar predictors through the conditional mean function. With few and limited exceptions, most functional regression frameworks operate under the assumption that covariate information does not affect patterns of covariation. In this manuscript, we address this disparity by developing a Bayesian functional regression model, providing joint inference for both the conditional mean and covariance functions. Our work hinges on basis expansions of both the functional evaluation domain and covariate space, to define flexible non-parametric forms of dependence. To aid interpretation, we develop novel low-dimensional summaries, which indicate the degree of covariate-dependent heteroskedasticity. The proposed modeling framework is motivated by and applied to a case study in functional brain imaging through electroencephalography, aiming to elucidate potential differentiation in the neural development of children with autism spectrum disorder.
Affiliation(s)
- John Shamshoian
  - Department of Biostatistics, University of California, Los Angeles, CA, USA
- Nicholas Marco
  - Department of Biostatistics, University of California, Los Angeles, CA, USA
- Damla Şentürk
  - Department of Biostatistics, University of California, Los Angeles, CA, USA
- Shafali Jeste
  - Division of Neurology and Neurological Institute, Children's Hospital Los Angeles, Los Angeles, USA
- Donatello Telesca
  - Department of Biostatistics, University of California, Los Angeles, CA, USA
2
Qian Q, Nguyen DV, Telesca D, Kurum E, Rhee CM, Banerjee S, Li Y, Senturk D. Multivariate spatiotemporal functional principal component analysis for modeling hospitalization and mortality rates in the dialysis population. Biostatistics 2024; 25:718-735. [PMID: 37337346 PMCID: PMC11358256 DOI: 10.1093/biostatistics/kxad013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 03/14/2023] [Accepted: 05/30/2023] [Indexed: 06/21/2023] Open
Abstract
Dialysis patients experience frequent hospitalizations and a higher mortality rate compared to other Medicare populations, in whom hospitalizations are a major contributor to morbidity, mortality, and healthcare costs. Patients also typically remain on dialysis for the duration of their lives or until kidney transplantation. Hence, there is growing interest in studying the spatiotemporal trends in the correlated outcomes of hospitalization and mortality among dialysis patients as a function of time starting from transition to dialysis across the United States. Utilizing national data from the United States Renal Data System (USRDS), we propose a novel multivariate spatiotemporal functional principal component analysis model to study the joint spatiotemporal patterns of hospitalization and mortality rates among dialysis patients. The proposal is based on a multivariate Karhunen-Loève expansion that describes leading directions of variation across time and induces spatial correlations among region-specific scores. An efficient estimation procedure is proposed using only univariate principal components decompositions and a Markov Chain Monte Carlo framework for targeting the spatial correlations. The finite sample performance of the proposed method is studied through simulations. Novel applications to the USRDS data highlight hot spots across the United States with higher hospitalization and/or mortality rates and time periods of elevated risk.
Affiliation(s)
- Qi Qian
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Danh V Nguyen
  - Department of Medicine, University of California, Irvine, CA 92868, USA
- Donatello Telesca
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Esra Kurum
  - Department of Statistics, University of California, Riverside, CA 92521, USA
- Connie M Rhee
  - Department of Medicine, University of California, Irvine, CA 92868, USA
  - Harold Simmons Center for Chronic Disease Research and Epidemiology, University of California School of Medicine, Irvine, CA 92868, USA
- Sudipto Banerjee
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Yihao Li
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Damla Senturk
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
3
Dempsey W. Recurrent event analysis in the presence of real-time high frequency data via random subsampling. J Comput Graph Stat 2023; 33:525-537. [PMID: 38868625 PMCID: PMC11165938 DOI: 10.1080/10618600.2023.2276114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 10/17/2023] [Indexed: 06/14/2024]
Abstract
Digital monitoring studies collect real-time high frequency data via mobile sensors in the subjects' natural environment. These data can be used to model the impact of changes in physiology on recurrent event outcomes such as smoking, drug use, alcohol use, or self-identified moments of suicidal ideation. Likelihood calculations for the recurrent event analysis, however, become computationally prohibitive in this setting. Motivated by this, a random subsampling framework is proposed for computationally efficient, approximate likelihood-based estimation. A subsampling-unbiased estimator for the derivative of the cumulative hazard enters into an approximation of the log-likelihood. The estimator has two sources of variation: the first due to the recurrent event model and the second due to subsampling. The latter can be reduced by increasing the sampling rate; however, this leads to increased computational costs. The approximate score equations are equivalent to logistic regression score equations, allowing for standard, "off-the-shelf" software to be used in fitting these models. Simulations demonstrate the method and efficiency-computation trade-off. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation.
Affiliation(s)
- Walter Dempsey
  - Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
4
Boland J, Telesca D, Sugar C, Jeste S, Dickinson A, DiStefano C, Şentürk D. Central Posterior Envelopes for Bayesian Functional Principal Component Analysis. JOURNAL OF DATA SCIENCE : JDS 2023; 21:715-734. [PMID: 38883309 PMCID: PMC11178334 DOI: 10.6339/23-jds1085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2024]
Abstract
Bayesian methods provide direct inference in functional data analysis applications without reliance on bootstrap techniques. A major tool in functional data applications is functional principal component analysis, which decomposes the data around a common mean function and identifies leading directions of variation. Bayesian functional principal components analysis (BFPCA) provides uncertainty quantification on the estimated functional model components via the posterior samples obtained. We propose central posterior envelopes (CPEs) for BFPCA based on functional depth as a descriptive visualization tool to summarize variation in the posterior samples of the estimated functional model components, contributing to uncertainty quantification in BFPCA. The proposed BFPCA relies on a latent factor model and targets model parameters within a mixed effects modeling framework using modified multiplicative gamma process shrinkage priors on the variance components. Functional depth provides a center-outward order to a sample of functions. We utilize modified band depth and modified volume depth for ordering of a sample of functions and surfaces, respectively, to arrive at CPEs of the mean and eigenfunctions within the BFPCA framework. The proposed CPEs are showcased in extensive simulations. Finally, the proposed CPEs are applied to the analysis of a sample of power spectral densities (PSD) from resting state electroencephalography (EEG) where they lead to novel insights on diagnostic group differences among children diagnosed with autism spectrum disorder and their typically developing peers across age.
Affiliation(s)
- Joanna Boland
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90025, USA
- Donatello Telesca
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90025, USA
- Catherine Sugar
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90025, USA
  - Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90025, USA
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90025, USA
- Shafali Jeste
  - Division of Neurology, Children’s Hospital Los Angeles, Los Angeles, CA 90027, USA
- Abigail Dickinson
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90025, USA
- Charlotte DiStefano
  - Division of Neurology, Children’s Hospital Los Angeles, Los Angeles, CA 90027, USA
- Damla Şentürk
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90025, USA
  - Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90025, USA
5
Zhang J, Siegle GJ, Sun T, D’andrea W, Krafty RT. Interpretable principal component analysis for multilevel multivariate functional data. Biostatistics 2023; 24:227-243. [PMID: 34545394 PMCID: PMC10102903 DOI: 10.1093/biostatistics/kxab018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/27/2020] [Revised: 03/27/2021] [Accepted: 04/12/2021] [Indexed: 11/14/2022] Open
Abstract
Many studies collect functional data from multiple subjects that have both multilevel and multivariate structures. An example of such data comes from popular neuroscience experiments where participants' brain activity is recorded using modalities such as electroencephalography and summarized as power within multiple time-varying frequency bands within multiple electrodes, or brain regions. Summarizing the joint variation across multiple frequency bands for both whole-brain variability between subjects, as well as location-variation within subjects, can help to explain neural reactions to stimuli. This article introduces a novel approach to conducting interpretable principal components analysis on multilevel multivariate functional data that decomposes total variation into subject-level and replicate-within-subject-level (i.e., electrode-level) variation and provides interpretable components that can be both sparse among variates (e.g., frequency bands) and have localized support over time within each frequency band. Smoothness is achieved through a roughness penalty, while sparsity and localization of components are achieved by solving an innovative rank-one based convex optimization problem with block Frobenius and matrix $L_1$-norm-based penalties. The method is used to analyze data from a study to better understand reactions to emotional information in individuals with histories of trauma and the symptom of dissociation, revealing new neurophysiological insights into how subject- and electrode-level brain activity are associated with these phenomena. Supplementary materials for this article are available online.
Affiliation(s)
- Jun Zhang
  - Department of Biostatistics, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA, 15261, USA
- Greg J Siegle
  - Department of Psychiatry, University of Pittsburgh, 3811 O’Hara Street, Pittsburgh, PA, 15213, USA
- Tao Sun
  - Center for Applied Statistics, School of Statistics, Renmin University of China, 59 Zhongguancun Street, Beijing, 100872, China
- Wendy D’andrea
  - Department of Psychology, New School for Social Research, 80 Fifth Avenue, New York, NY, 10011, USA
- Robert T Krafty
  - Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road NE, Atlanta, GA, 30322, USA
6
Li Y, Nguyen DV, Kürüm E, Rhee CM, Banerjee S, Şentürk D. Multilevel Varying Coefficient Spatiotemporal Model. Stat (Int Stat Inst) 2022; 11:e438. [PMID: 35693320 PMCID: PMC9175782 DOI: 10.1002/sta4.438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 11/13/2021] [Indexed: 11/11/2022]
Abstract
Over 785,000 individuals in the U.S. have end-stage renal disease (ESRD) with about 70% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience frequent hospitalizations. In order to identify risk factors of hospitalizations, we utilize data from the large national database, United States Renal Data System (USRDS). To account for the hierarchical structure of the data, with longitudinal hospitalization rates nested in dialysis facilities and dialysis facilities nested in geographic regions across the U.S., we propose a multilevel varying coefficient spatiotemporal model (M-VCSM) where region- and facility-specific random deviations are modeled through a multilevel Karhunen-Loève (KL) expansion. The proposed M-VCSM includes time-varying effects of multilevel risk factors at the region- (e.g., urbanicity and area deprivation index) and facility-levels (e.g., patient demographic makeup) and incorporates spatial correlations across regions via a conditional autoregressive (CAR) structure. Efficient estimation and inference are achieved through the fusion of functional principal component analysis (FPCA) and Markov Chain Monte Carlo (MCMC). Applications to the USRDS data highlight significant region- and facility-level risk factors of hospitalizations and characterize time periods and spatial locations with elevated hospitalization risk. Finite sample performance of the proposed methodology is studied through simulations.
Affiliation(s)
- Yihao Li
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Danh V Nguyen
  - Department of Medicine, University of California Irvine, Orange, CA 92868, USA
- Esra Kürüm
  - Department of Statistics, University of California, Riverside, CA 92521, USA
- Connie M Rhee
  - Department of Medicine, University of California Irvine, Orange, CA 92868, USA
  - Harold Simmons Center for Chronic Disease Research and Epidemiology, University of California Irvine School of Medicine, Orange, CA 92868, USA
- Sudipto Banerjee
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Damla Şentürk
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
7
Park Y, Li B, Li Y. Crop Yield Prediction Using Bayesian Spatially Varying Coefficient Models with Functional Predictors. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2123333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Yeonjoo Park
  - Management Science and Statistics, University of Texas at San Antonio
- Bo Li
  - Department of Statistics, University of Illinois at Urbana-Champaign
- Yehua Li
  - Department of Statistics, University of California at Riverside
8
Campos E, Scheffler AW, Telesca D, Sugar C, DiStefano C, Jeste S, Levin AR, Naples A, Webb SJ, Shic F, Dawson G, Faja S, McPartland JC, Şentürk D. Multilevel hybrid principal components analysis for region-referenced functional electroencephalography data. Stat Med 2022; 41:3737-3757. [PMID: 35611602 PMCID: PMC9308678 DOI: 10.1002/sim.9445] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2021] [Revised: 03/15/2022] [Accepted: 05/10/2022] [Indexed: 01/27/2023]
Abstract
Electroencephalography experiments produce region-referenced functional data representing brain signals in the time or the frequency domain collected across the scalp. The data typically also have a multilevel structure with high-dimensional observations collected across multiple experimental conditions or visits. Common analysis approaches reduce the data complexity by collapsing the functional and regional dimensions, where event-related potential (ERP) features or band power are targeted in a pre-specified scalp region. This practice can fail to portray more comprehensive differences in the entire ERP signal or the power spectral density (PSD) across the scalp. Building on the weak separability of the high-dimensional covariance process, the proposed multilevel hybrid principal components analysis (M-HPCA) utilizes dimension reduction tools from both vector and functional principal components analysis to decompose the total variation into between- and within-subject variance. The resulting model components are estimated in a mixed effects modeling framework via a computationally efficient minorization-maximization algorithm coupled with bootstrap. The diverse array of applications of M-HPCA is showcased with two studies of individuals with autism. While ERP responses to match vs mismatch conditions are compared in an audio odd-ball paradigm in the first study, short-term reliability of the PSD across visits is compared in the second. Finite sample properties of the proposed methodology are studied in extensive simulations.
Affiliation(s)
- Emilie Campos
  - Department of Biostatistics, University of California, Los Angeles, California, USA
- Aaron Wolfe Scheffler
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Donatello Telesca
  - Department of Biostatistics, University of California, Los Angeles, California, USA
- Catherine Sugar
  - Department of Biostatistics, University of California, Los Angeles, California, USA
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, California, USA
- Charlotte DiStefano
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, California, USA
- Shafali Jeste
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, California, USA
- April R. Levin
  - Department of Neurology, Boston Children’s Hospital and Harvard Medical School, Massachusetts, USA
- Adam Naples
  - Child Study Center, School of Medicine, Yale University, Connecticut, USA
- Sara J. Webb
  - Center for Child Health, Behavior, and Development, Seattle Children’s Research Institute, Washington, USA
  - Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington, USA
- Frederick Shic
  - Center for Child Health, Behavior, and Development, Seattle Children’s Research Institute, Washington, USA
  - Department of Pediatrics, University of Washington, Seattle, Washington, USA
- Geraldine Dawson
  - Duke Institute for Brain Sciences, Duke University, Durham, North Carolina, USA
  - Duke Center for Autism and Brain Development, Duke University, Durham, North Carolina, USA
  - Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
- Susan Faja
  - Laboratory of Cognitive Neuroscience, Division of Developmental Medicine, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Damla Şentürk
  - Department of Biostatistics, University of California, Los Angeles, California, USA
9
Li Y, Qiu Y, Xu Y. From multivariate to functional data analysis: fundamentals, recent developments, and emerging areas. J MULTIVARIATE ANAL 2022; 188:104806. [PMID: 39040141 PMCID: PMC11261241 DOI: 10.1016/j.jmva.2021.104806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
Functional data analysis (FDA), a branch of statistics concerned with modeling infinite-dimensional random vectors residing in function spaces, has become a major research area for the Journal of Multivariate Analysis. We review some fundamental concepts of FDA, their origins in and connections to multivariate analysis, and some of its recent developments, including multi-level functional data analysis, high-dimensional functional regression, and dependent functional data analysis. We also discuss the impact of these new methodological developments on genetics, plant science, wearable device data analysis, image data analysis, and business analytics. Two real data examples are provided to motivate our discussions.
Affiliation(s)
- Yehua Li
  - University of California - Riverside, Riverside, CA 92521, USA
- Yumou Qiu
  - Iowa State University, Ames, IA 50011, USA
- Yuhang Xu
  - Bowling Green State University, Bowling Green, OH 43403, USA
10
Yarger D, Stoev S, Hsing T. A functional-data approach to the Argo data. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Drew Yarger
  - Department of Statistics, University of Michigan
- Tailen Hsing
  - Department of Statistics, University of Michigan
11
Scheffler AW, Dickinson A, DiStefano C, Jeste S, Şentürk D. Covariate-adjusted hybrid principal components analysis for region-referenced functional EEG data. STATISTICS AND ITS INTERFACE 2022; 15:209-223. [PMID: 35664510 PMCID: PMC9165697 DOI: 10.4310/21-sii712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Electroencephalography (EEG) studies produce region-referenced functional data via EEG signals recorded across scalp electrodes. The high-dimensional data can be used to contrast neurodevelopmental trajectories between diagnostic groups, for example between typically developing (TD) children and children with autism spectrum disorder (ASD). Valid inference requires characterization of the complex EEG dependency structure as well as covariate-dependent heteroscedasticity, such as changes in variation over developmental age. In our motivating study, EEG data is collected on TD and ASD children aged two to twelve years old. The peak alpha frequency, a prominent peak in the alpha spectrum, is a biomarker linked to neurodevelopment that shifts as children age. To retain information, we model patterns of alpha spectral variation, rather than just the peak location, regionally across the scalp and chronologically across development. We propose a covariate-adjusted hybrid principal components analysis (CA-HPCA) for EEG data, which utilizes both vector and functional principal components analysis while simultaneously adjusting for covariate-dependent heteroscedasticity. CA-HPCA assumes the covariance process is weakly separable conditional on observed covariates, allowing for covariate-adjustments to be made on the marginal covariances rather than the full covariance leading to stable and computationally efficient estimation. The proposed methodology provides novel insights into neurodevelopmental differences between TD and ASD children.
Affiliation(s)
- Abigail Dickinson
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA
- Charlotte DiStefano
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA
- Shafali Jeste
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA
- Damla Şentürk
  - Department of Biostatistics, University of California, Los Angeles, USA
12
Xu Y, Li Y, Qiu Y. Growth dynamics and heritability for plant high-throughput phenotyping studies using hierarchical functional data analysis. Biom J 2021; 63:1325-1341. [PMID: 33830499 DOI: 10.1002/bimj.202000315] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Revised: 02/03/2021] [Accepted: 02/22/2021] [Indexed: 11/08/2022]
Abstract
In modern high-throughput plant phenotyping, images of plants of different genotypes are repeatedly taken throughout the growing season, and phenotypic traits of plants (e.g., plant height) are extracted through image processing. It is of interest to recover whole trait trajectories and their derivatives at both genotype and plant levels based on observations made at irregular discrete time points. We propose to model trait trajectories using hierarchical functional principal component analysis (HFPCA) and show that the problem of recovering derivatives of the trajectories is reduced to estimating derivatives of eigenfunctions, which is solved by differentiating eigenequations. Based on HFPCA, we also propose a new measure for the broad-sense heritability by allowing it to vary over time during plant growth. Simulation studies show that the proposed procedure performs better than its competitors in terms of recovering both trait trajectories and their derivatives. Interesting characteristics of plant growth and heritability dynamics are revealed in the application to a modern plant phenotyping study.
Affiliation(s)
- Yuhang Xu
  - Department of Applied Statistics and Operations Research, Bowling Green State University, Bowling Green, OH, USA
- Yehua Li
  - Department of Statistics, University of California - Riverside, Riverside, CA, USA
- Yumou Qiu
  - Department of Statistics, Iowa State University, Ames, IA, USA
13
Zhang H, Li Y. Unified Principal Component Analysis for Sparse and Dense Functional Data under Spatial Dependency. JOURNAL OF BUSINESS & ECONOMIC STATISTICS : A PUBLICATION OF THE AMERICAN STATISTICAL ASSOCIATION 2021; 40:1523-1537. [PMID: 36582252 PMCID: PMC9793858 DOI: 10.1080/07350015.2021.1938085] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/17/2023]
Abstract
We consider spatially dependent functional data collected under a geostatistics setting, where locations are sampled from a spatial point process. The functional response is the sum of a spatially dependent functional effect and a spatially independent functional nugget effect. Observations on each function are made on discrete time points and contaminated with measurement errors. Under the assumption of spatial stationarity and isotropy, we propose a tensor product spline estimator for the spatio-temporal covariance function. When a coregionalization covariance structure is further assumed, we propose a new functional principal component analysis method that borrows information from neighboring functions. The proposed method also generates nonparametric estimators for the spatial covariance functions, which can be used for functional kriging. Under a unified framework for sparse and dense functional data, infill and increasing domain asymptotic paradigms, we develop the asymptotic convergence rates for the proposed estimators. Advantages of the proposed approach are demonstrated through simulation studies and two real data applications representing sparse and dense functional data, respectively.
14
Jang JH. Principal component analysis of hybrid functional and vector data. Stat Med 2021; 40:5152-5173. [PMID: 34160848 DOI: 10.1002/sim.9117] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/21/2020] [Revised: 04/09/2021] [Accepted: 06/10/2021] [Indexed: 11/07/2022]
Abstract
We propose a practical principal component analysis (PCA) framework that provides a nonparametric means of simultaneously reducing the dimensions of and modeling functional and vector (multivariate) data. We first introduce a Hilbert space that combines functional and vector objects as a single hybrid object. The framework, termed a PCA of hybrid functional and vector data (HFV-PCA), is then based on the eigen-decomposition of a covariance operator that captures simultaneous variations of functional and vector data in the new space. This approach leads to interpretable principal components that have the same structure as each observation and a single set of scores that serves well as a low-dimensional proxy for hybrid functional and vector data. To support practical application of HFV-PCA, the explicit relationship between the hybrid PC decomposition and the functional and vector PC decompositions is established, leading to a simple and robust estimation scheme where components of HFV-PCA are calculated using the components estimated from the existing functional and classical PCA methods. This estimation strategy allows flexible incorporation of sparse and irregular functional data as well as multivariate functional data. We derive the consistency results and asymptotic convergence rates for the proposed estimators. We demonstrate the efficacy of the method through simulations and analysis of renal imaging data.
Affiliation(s)
- Jeong Hoon Jang
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, Indiana, USA
15
Li Y, Nguyen DV, Banerjee S, Rhee CM, Kalantar-Zadeh K, Kürüm E, Şentürk D. Multilevel modeling of spatially nested functional data: Spatiotemporal patterns of hospitalization rates in the US dialysis population. Stat Med 2021; 40:3937-3952. [PMID: 33902165 DOI: 10.1002/sim.9007] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/11/2020] [Revised: 02/15/2021] [Accepted: 04/08/2021] [Indexed: 11/12/2022]
Abstract
End-stage renal disease patients on dialysis experience frequent hospitalizations. In addition to known temporal patterns of hospitalizations over the life span on dialysis, where poor outcomes are typically exacerbated during the first year on dialysis, variations in hospitalizations among dialysis facilities across the US contribute to spatial variation. Utilizing national data from the United States Renal Data System (USRDS), we propose a novel multilevel spatiotemporal functional model to study spatiotemporal patterns of hospitalization rates among dialysis facilities. Hospitalization rates of dialysis facilities are considered as spatially nested functional data (FD) with longitudinal hospitalizations nested in dialysis facilities and dialysis facilities nested in geographic regions. A multilevel Karhunen-Loève expansion is utilized to model the two-level (facility and region) FD, where spatial correlations are induced among region-specific principal component scores accounting for regional variation. A new efficient algorithm based on functional principal component analysis and Markov Chain Monte Carlo is proposed for estimation and inference. We report a novel application using USRDS data to characterize spatiotemporal patterns of hospitalization rates for over 400 health service areas across the US and over the posttransition time on dialysis. Finite sample performance of the proposed method is studied through simulations.
Affiliation(s)
- Yihao Li
  - Department of Biostatistics, University of California, Los Angeles, California
- Danh V Nguyen
  - Department of Medicine, UC Irvine School of Medicine, Orange, California
- Sudipto Banerjee
  - Department of Biostatistics, University of California, Los Angeles, California
- Connie M Rhee
  - Department of Medicine, UC Irvine School of Medicine, Orange, California
- Esra Kürüm
  - Department of Statistics, University of California, Riverside, California
- Damla Şentürk
  - Department of Biostatistics, University of California, Los Angeles, California
|
16
|
Maiti T, Safikhani A, Zhong P. On uncertainty estimation in functional linear mixed models. Can J Stat 2020. [DOI: 10.1002/cjs.11585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Tapabrata Maiti
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, U.S.A
- Abolfazl Safikhani
  - Department of Statistics, University of Florida, Gainesville, FL 32611, U.S.A
- Ping-Shou Zhong
  - Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, U.S.A
|
17
|
Scheffler A, Telesca D, Li Q, Sugar CA, Distefano C, Jeste S, Şentürk D. Hybrid principal components analysis for region-referenced longitudinal functional EEG data. Biostatistics 2020; 21:139-157. [PMID: 30084925 DOI: 10.1093/biostatistics/kxy034] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/20/2017] [Revised: 01/25/2018] [Accepted: 06/11/2018] [Indexed: 11/12/2022] Open
Abstract
Electroencephalography (EEG) data possess a complex structure that includes regional, functional, and longitudinal dimensions. Our motivating example is a word segmentation paradigm in which typically developing (TD) children and children with autism spectrum disorder (ASD) were exposed to a continuous speech stream. For each subject, continuous EEG signals recorded at each electrode were divided into one-second segments and projected into the frequency domain via fast Fourier transform. Following a spectral principal components analysis, the resulting data consist of region-referenced principal power indexed regionally by scalp location, functionally across frequencies, and longitudinally by one-second segments. Standard EEG power analyses often collapse information across the longitudinal and functional dimensions by averaging power across segments and concentrating on specific frequency bands. We propose a hybrid principal components analysis for region-referenced longitudinal functional EEG data, which utilizes both vector and functional principal components analyses and does not collapse information along any of the three dimensions of the data. The proposed decomposition only assumes weak separability of the higher-dimensional covariance process and utilizes a product of one dimensional eigenvectors and eigenfunctions, obtained from the regional, functional, and longitudinal marginal covariances, to represent the observed data, providing a computationally feasible non-parametric approach. A mixed effects framework is proposed to estimate the model components coupled with a bootstrap test for group level inference, both geared towards sparse data applications. Analysis of the data from the word segmentation paradigm leads to valuable insights about group-region differences among the TD and verbal and minimally verbal children with ASD. Finite sample properties of the proposed estimation framework and bootstrap inference procedure are further studied via extensive simulations.
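The preprocessing step mentioned in the abstract above (dividing a continuous signal into one-second segments and taking a fast Fourier transform of each) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `segment_power`, the integer sampling rate, the use of the squared FFT magnitude as "power", and the dropping of any trailing partial segment are all choices made here. The spectral principal components analysis and the hybrid decomposition are not covered.

```python
import numpy as np

def segment_power(signal, sampling_rate):
    """Split a 1-D signal into one-second segments and return per-segment power.

    Hypothetical helper for illustration; `sampling_rate` is assumed to be an
    integer number of samples per second.
    """
    signal = np.asarray(signal, dtype=float)
    n_segments = signal.size // sampling_rate
    # Drop any trailing partial second, then stack segments as rows.
    segments = signal[: n_segments * sampling_rate].reshape(n_segments, sampling_rate)
    # Real-input FFT of each one-second segment (one row per segment).
    spectrum = np.fft.rfft(segments, axis=1)
    power = np.abs(spectrum) ** 2
    # With one-second segments the frequency bins are spaced 1 Hz apart.
    freqs = np.fft.rfftfreq(sampling_rate, d=1.0 / sampling_rate)
    return freqs, power
```

The result is a segments-by-frequencies array for one electrode, which matches the "longitudinal by functional" layout the abstract describes; stacking such arrays over electrodes would add the regional dimension.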
Affiliation(s)
- Aaron Scheffler
  - Department of Biostatistics, University of California Los Angeles, 650 Charles E Young Drive, Los Angeles, CA, USA
- Donatello Telesca
  - Department of Biostatistics, University of California Los Angeles, 650 Charles E Young Drive, Los Angeles, CA, USA
- Qian Li
  - Department of Biostatistics, University of California Los Angeles, 650 Charles E Young Drive, Los Angeles, CA, USA
- Catherine A Sugar
  - Department of Biostatistics, University of California Los Angeles, 650 Charles E Young Drive, Los Angeles, CA, USA
  - Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA, USA
- Charlotte Distefano
  - Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA, USA
- Shafali Jeste
  - Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA, USA
- Damla Şentürk
  - Department of Biostatistics, University of California Los Angeles, 650 Charles E Young Drive, Los Angeles, CA, USA
|
18
|
Martínez-Hernández I, Genton MG. Recent developments in complex and spatially correlated functional data. Braz J Probab Stat 2020. [DOI: 10.1214/20-bjps466] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]
|
19
|
Hu M, Crainiceanu C, Schindler MK, Dewey B, Reich DS, Shinohara RT, Eloyan A. Matrix decomposition for modeling lesion development processes in multiple sclerosis. Biostatistics 2020; 23:83-100. [PMID: 32318692 DOI: 10.1093/biostatistics/kxaa016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2019] [Revised: 03/11/2019] [Accepted: 03/12/2020] [Indexed: 11/14/2022] Open
Abstract
Our main goal is to study and quantify the evolution of multiple sclerosis lesions observed longitudinally over many years in multi-sequence structural magnetic resonance imaging (sMRI). To achieve that, we propose a class of functional models for capturing the temporal dynamics and spatial distribution of the voxel-specific intensity trajectories in all sMRI sequences. To accommodate the hierarchical data structure (observations nested within voxels, which are nested within lesions, which, in turn, are nested within study participants), we use structured functional principal component analysis. We propose and evaluate the finite sample properties of hypothesis tests of therapeutic intervention effects on lesion evolution while accounting for the multilevel structure of the data. Using this novel testing strategy, we found statistically significant differences in lesion evolution between treatment groups.
Affiliation(s)
- Menghan Hu
  - Department of Biostatistics, Brown University, Providence, RI 02903, USA
- Ciprian Crainiceanu
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Matthew K Schindler
  - Translational Neuroradiology Section, Division of Neuroimmunology and Neurovirology, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- Blake Dewey
  - Translational Neuroradiology Section, Division of Neuroimmunology and Neurovirology, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA and Department of Electrical and Computer Engineering, Johns Hopkins Whiting School of Engineering, Baltimore, MD 21218, USA
- Daniel S Reich
  - Translational Neuroradiology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- Russell T Shinohara
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA and Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ani Eloyan
  - Department of Biostatistics, Brown University, Providence, RI 02903, USA
|
20
|
Staicu AM, Islam MN, Dumitru R, van Heugten E. Longitudinal dynamic functional regression. J R Stat Soc Ser C Appl Stat 2020; 69:25-46. [PMID: 31929657 PMCID: PMC6953745 DOI: 10.1111/rssc.12376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
The paper develops a parsimonious modelling framework to study the time-varying association between scalar outcomes and functional predictors observed at many instances, in longitudinal studies. The methods enable us to reconstruct the full trajectory of the response and are applicable to Gaussian and non-Gaussian responses. The idea is to model the time-varying functional predictors by using orthogonal basis functions and to expand the time-varying regression coefficient by using the same basis. Numerical investigation through simulation studies and data analysis shows excellent performance in terms of accurate prediction and efficient computations, when compared with existing alternatives. The methods are inspired by and applied to an animal science application, where the interest is in studying the association between the feed intake of lactating sows and the minute-by-minute temperature throughout the 21 days of their lactation period. R code and an R illustration are provided.
|
21
|
Cao J, Soiaporn K, Carroll RJ, Ruppert D. Modeling and Prediction of Multiple Correlated Functional Outcomes. J Agric Biol Environ Stat 2019; 24:112-129. [PMID: 30956522 DOI: 10.1007/s13253-018-00344-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
We propose a copula-based approach for analyzing functional data with correlated multiple functional outcomes exhibiting heterogeneous shape characteristics. To accommodate the possibly large number of parameters due to having several functional outcomes, parameter estimation is performed in two steps: first, the parameters for the marginal distributions are estimated using the skew t family, and then the dependence structure both within and across outcomes is estimated using a Gaussian copula. We develop an estimation algorithm for the dependence parameters based on the Karhunen-Loève expansion and an EM algorithm that significantly reduces the dimension of the problem and is computationally efficient. We also demonstrate prediction of an unknown outcome when the other outcomes are known. We apply our methodology to diffusion tensor imaging data for multiple sclerosis (MS) patients with three outcomes and identify differences in both the marginal distributions and the dependence structure between the MS and control groups. Our proposed methodology is quite general and can be applied to other functional data with multiple outcomes in biology and other fields.
Affiliation(s)
- Jiguo Cao
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A1S6, Canada
- Raymond J Carroll
  - Department of Statistics, Texas A&M University, College Station, TX 77843, USA and School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, NSW 2007, Australia
- David Ruppert
  - Department of Statistical Science and School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14850, USA
|
22
|
Wang Y, Hu J, Do KA, Hobbs BP. An Efficient Nonparametric Estimate for Spatially Correlated Functional Data. Stat Biosci 2019. [DOI: 10.1007/s12561-019-09233-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
23
|
Fontanella L, Ippoliti L, Valentini P. Predictive functional ANOVA models for longitudinal analysis of mandibular shape changes. Biom J 2019; 61:918-933. [PMID: 30865334 DOI: 10.1002/bimj.201800228] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/06/2018] [Revised: 12/17/2018] [Accepted: 12/19/2018] [Indexed: 11/10/2022]
Abstract
In this paper, we introduce a Bayesian statistical model for the analysis of functional data observed at several time points. Examples of such data include the Michigan growth study where we wish to characterize the shape changes of human mandible profiles. The form of the mandible is often used by clinicians as an aid in predicting the mandibular growth. However, whereas many studies have demonstrated the changes in size that may occur during the period of pubertal growth spurt, shape changes have been less well investigated. Considering a group of subjects presenting normal occlusion, in this paper we thus describe a Bayesian functional ANOVA model that provides information about where and when the shape changes of the mandible occur during different stages of development. The model is developed by defining the notion of predictive process models for Gaussian process (GP) distributions used as priors over the random functional effects. We show that the predictive approach is computationally appealing and that it is useful to analyze multivariate functional data with unequally spaced observations that differ among subjects and times. Graphical posterior summaries show that our model is able to provide a biological interpretation of the morphometric findings and that they comprehensively describe the shape changes of the human mandible profiles. Compared with classical cephalometric analysis, this paper represents a significant methodological advance for the study of mandibular shape changes in two dimensions.
Affiliation(s)
- Lara Fontanella
  - Department of Legal and Social Sciences, University G. d'Annunzio, Chieti-Pescara, Italy
- Luigi Ippoliti
  - Department of Economics, University G. d'Annunzio, Chieti-Pescara, Italy
- Pasquale Valentini
  - Department of Economics, University G. d'Annunzio, Chieti-Pescara, Italy
|
24
|
Zhu H, Versace F, Cinciripini PM, Rausch P, Morris JS. Robust and Gaussian spatial functional regression models for analysis of event-related potentials. Neuroimage 2018; 181:501-512. [PMID: 30057352 DOI: 10.1016/j.neuroimage.2018.07.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/17/2018] [Revised: 06/01/2018] [Accepted: 07/03/2018] [Indexed: 10/28/2022] Open
Abstract
Event-related potentials (ERPs) summarize electrophysiological brain response to specific stimuli. They can be considered as correlated functions of time with both spatial correlation across electrodes and nested correlations within subjects. Commonly used analytical methods for ERPs often focus on pre-determined extracted components and/or ignore the correlation among electrodes or subjects, which can miss important insights, and tend to be sensitive to outlying subjects, time points or electrodes. Motivated by ERP data in a smoking cessation study, we introduce a Bayesian spatial functional regression framework that models the entire ERPs as spatially correlated functional responses and the stimulus types as covariates. This novel framework relies on mixed models to characterize the effects of stimuli while simultaneously accounting for the multilevel correlation structure. The spatial correlation among the ERP profiles is captured through basis-space Matérn assumptions that allow either separable or nonseparable spatial correlations over time. We induce both adaptive regularization over time and spatial smoothness across electrodes via a correlated normal-exponential-gamma (CNEG) prior on the fixed effect coefficient functions. Our proposed framework includes both Gaussian models as well as robust models using heavier-tailed distributions to make the regression automatically robust to outliers. We introduce predictive methods to select among Gaussian vs. robust models and models with separable vs. non-separable spatiotemporal correlation structures. Our proposed analysis produces global tests for stimuli effects across entire time (or time-frequency) and electrode domains, plus multiplicity-adjusted pointwise inference based on experiment-wise error rate or false discovery rate to flag spatiotemporal (or spatio-temporal-frequency) regions that characterize stimuli differences, and can also produce inference for any prespecified waveform components. Our analysis of the smoking cessation ERP data set reveals numerous effects across different types of visual stimuli.
Affiliation(s)
- Hongxiao Zhu
  - Department of Statistics, Virginia Tech, Blacksburg, VA, USA
- Francesco Versace
  - Department of Behavioral Science, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Paul M Cinciripini
  - Department of Behavioral Science, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Philip Rausch
  - Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
- Jeffrey S Morris
  - Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
|
25
|
Kapur K, Sanchez B, Pacheck A, Darras B, Rutkove SB, Selukar R. Functional Mixed-Effects Modeling of Longitudinal Duchenne Muscular Dystrophy Electrical Impedance Myography Data Using State-Space Approach. IEEE Trans Biomed Eng 2018; 66:1761-1768. [PMID: 30387720 DOI: 10.1109/tbme.2018.2879227] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
OBJECTIVE: Electrical impedance myography (EIM) is a quantitative and objective tool to evaluate muscle status. EIM offers the possibility to replace conventional physical functioning scores or quality of life measures, which depend on patient cooperation and mood. METHODS: Here, we propose a functional mixed-effects model using a state-space approach to describe the response trajectories of EIM data measured on 16 boys with Duchenne muscular dystrophy and 12 healthy controls, both groups measured over a period of two years. The modeling framework presented imposes a smoothing spline structure on EIM data collected at each visit and takes into account within-subject correlations of these curves along the longitudinal measurements. The modeling framework is recast in a state-space approach, thereby allowing for the employment of computationally efficient diffuse Kalman filtering and smoothing algorithms for the model estimation, as well as the estimates of the posterior variance-covariance matrix for the construction of the Bayesian [Formula: see text] confidence bands. RESULTS: The proposed model allows us to simultaneously adjust for baseline variables, differentiate the longitudinal changes in the smooth functional response and estimate the subject and subject-time specific deviations from the population-averaged response curves. The code is made publicly available in the supplementary material. SIGNIFICANCE: The modeling approach presented will potentially enhance EIM capability to serve as a biomarker for testing therapeutic efficacy in DMD and other clinical trials.
|
26
|
Park SY, Staicu AM, Xiao L, Crainiceanu CM. Simple fixed-effects inference for complex functional models. Biostatistics 2018; 19:137-152. [PMID: 29036541 PMCID: PMC5862370 DOI: 10.1093/biostatistics/kxx026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/16/2016] [Revised: 04/09/2017] [Accepted: 05/07/2017] [Indexed: 11/14/2022] Open
Abstract
We propose simple inferential approaches for the fixed effects in complex functional mixed effects models. We estimate the fixed effects under the independence of functional residuals assumption and then bootstrap independent units (e.g. subjects) to conduct inference on the fixed effects parameters. Simulations show excellent coverage probability of the confidence intervals and size of tests for the fixed effects model parameters. Methods are motivated by and applied to the Baltimore Longitudinal Study of Aging, though they are applicable to other studies that collect correlated functional data.
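The two-step idea in the abstract above (estimate the fixed effects as if residuals were independent, then resample whole subjects) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the only fixed effect is taken to be a population mean function, curves are assumed to be observed on a common grid, the interval is a pointwise percentile interval, and the function name `subject_bootstrap_ci` is hypothetical.

```python
import numpy as np

def subject_bootstrap_ci(curves, subject_ids, n_boot=500, alpha=0.05, seed=0):
    """Pointwise percentile interval for a mean function, resampling subjects.

    curves: (n_curves, n_grid) array, one row per observed curve.
    subject_ids: length n_curves array giving the subject of each curve.
    """
    rng = np.random.default_rng(seed)
    curves = np.asarray(curves, dtype=float)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rows_by_subject = {s: np.flatnonzero(subject_ids == s) for s in subjects}

    # Step 1: working-independence estimate (here, the pooled pointwise mean).
    estimate = curves.mean(axis=0)

    # Step 2: resample subjects with replacement, keeping each drawn subject's
    # curves together so within-subject correlation is preserved.
    boot = np.empty((n_boot, curves.shape[1]))
    for b in range(n_boot):
        drawn = rng.choice(subjects, size=subjects.size, replace=True)
        rows = np.concatenate([rows_by_subject[s] for s in drawn])
        boot[b] = curves[rows].mean(axis=0)

    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return estimate, lower, upper
```

Resampling at the subject level rather than the curve level is the point of the sketch: the estimator ignores the correlation, and the bootstrap accounts for it.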
Affiliation(s)
- So Young Park
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Ana-Maria Staicu
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Luo Xiao
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
|
27
|
Hasenstab K, Scheffler A, Telesca D, Sugar CA, Jeste S, DiStefano C, Şentürk D. A multi-dimensional functional principal components analysis of EEG data. Biometrics 2017; 73:999-1009. [PMID: 28072468 PMCID: PMC5517364 DOI: 10.1111/biom.12635] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 04/01/2016] [Revised: 11/01/2016] [Accepted: 11/01/2016] [Indexed: 11/28/2022]
Abstract
The electroencephalography (EEG) data created in event-related potential (ERP) experiments have a complex high-dimensional structure. Each stimulus presentation, or trial, generates an ERP waveform which is an instance of functional data. The experiments are made up of sequences of multiple trials, resulting in longitudinal functional data and moreover, responses are recorded at multiple electrodes on the scalp, adding an electrode dimension. Traditional EEG analyses involve multiple simplifications of this structure to increase the signal-to-noise ratio, effectively collapsing the functional and longitudinal components by identifying key features of the ERPs and averaging them across trials. Motivated by an implicit learning paradigm used in autism research in which the functional, longitudinal, and electrode components all have critical interpretations, we propose a multidimensional functional principal components analysis (MD-FPCA) technique which does not collapse any of the dimensions of the ERP data. The proposed decomposition is based on separation of the total variation into subject and subunit level variation which are further decomposed in a two-stage functional principal components analysis. The proposed methodology is shown to be useful for modeling longitudinal trends in the ERP functions, leading to novel insights into the learning patterns of children with Autism Spectrum Disorder (ASD) and their typically developing peers as well as comparisons between the two groups. Finite sample properties of MD-FPCA are further studied via extensive simulations.
Affiliation(s)
- Kyle Hasenstab
  - Department of Statistics, University of California, Los Angeles, CA 90095, U.S.A
- Aaron Scheffler
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, U.S.A
- Donatello Telesca
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, U.S.A
- Catherine A. Sugar
  - Department of Statistics, University of California, Los Angeles, CA 90095, U.S.A
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, U.S.A
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, CA 90095, U.S.A
- Shafali Jeste
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, CA 90095, U.S.A
- Charlotte DiStefano
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, CA 90095, U.S.A
- Damla Şentürk
  - Department of Statistics, University of California, Los Angeles, CA 90095, U.S.A
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, U.S.A
|
28
|
Peterson GCL, Li D, Reich BJ, Brenner D. Spatial prediction of crystalline defects observed in molecular dynamic simulations of plastic damage. J Appl Stat 2017. [DOI: 10.1080/02664763.2016.1221915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Dong Li
  - Department of Material Sciences, North Carolina State University, Raleigh, NC, USA
- Brian J. Reich
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Donald Brenner
  - Department of Material Sciences, North Carolina State University, Raleigh, NC, USA
|
29
|
Zhu H, Morris JS, Wei F, Cox DD. Multivariate functional response regression, with application to fluorescence spectroscopy in a cervical pre-cancer study. Comput Stat Data Anal 2017; 111:88-101. [PMID: 29051679 PMCID: PMC5642121 DOI: 10.1016/j.csda.2017.02.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 10/20/2022]
Abstract
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation: a basis expansion to each functional variable followed by principal component analysis for the concatenated basis coefficients. This transformation effectively reduces the intra- and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.
Affiliation(s)
- Hongxiao Zhu
  - Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Jeffrey S Morris
  - The University of Texas MD Anderson Cancer Center, Houston, TX 77230
- Fengrong Wei
  - Department of Mathematics, University of West Georgia, Carrollton, GA 30118
- Dennis D Cox
  - Department of Statistics, Rice University, Houston, TX 77005
|
30
|
Gromenko O, Kokoszka P, Sojka J. Evaluation of the cooling trend in the ionosphere using functional regression with incomplete curves. Ann Appl Stat 2017. [DOI: 10.1214/17-aoas1022] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
|
31
|
|
32
|
Zhang L, Baladandayuthapani V, Zhu H, Baggerly KA, Majewski T, Czerniak BA, Morris JS. Functional CAR models for large spatially correlated functional datasets. J Am Stat Assoc 2016; 111:772-786. [PMID: 28018013 DOI: 10.1080/01621459.2015.1042581] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 10/23/2022]
Abstract
We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations.
Affiliation(s)
- Lin Zhang
  - The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
- Keith A Baggerly
  - The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
- Tadeusz Majewski
  - The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
- Bogdan A Czerniak
  - The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
- Jeffrey S Morris
  - The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
|
33
|
Luo X, Zhu L, Zhu H. Single-index varying coefficient model for functional responses. Biometrics 2016; 72:1275-1284. [PMID: 27061414 DOI: 10.1111/biom.12526] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 08/01/2015] [Revised: 02/01/2016] [Accepted: 03/01/2016] [Indexed: 11/28/2022]
Abstract
Recently, massive functional data have been widely collected over space across a set of grid points in various imaging studies. It is interesting to correlate functional data with various clinical variables, such as age and gender, in order to address scientific questions of interest. The aim of this article is to develop a single-index varying coefficient (SIVC) model for establishing a varying association between functional responses (e.g., image) and a set of covariates. It enjoys several unique features of both varying-coefficient and single-index models. An estimation procedure is developed to estimate varying coefficient functions, the index function, and the covariance function of individual functions. The optimal integration of information across different grid points is systematically delineated and the asymptotic properties (e.g., consistency and convergence rate) of all estimators are examined. Simulation studies are conducted to assess the finite-sample performance of the proposed estimation procedure. Furthermore, our real data analysis of a white matter tract dataset obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study confirms the advantage and accuracy of the SIVC model over the popular varying coefficient model.
Affiliation(s)
- Xinchao Luo
- School of Finance and Statistics, East China Normal University, Shanghai, China; Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - Lixing Zhu
- Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
| | - Hongtu Zhu
- Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| |
|
34
|
Pomann GM, Staicu AM, Ghosh S. A Two Sample Distribution-Free Test for Functional Data with Application to a Diffusion Tensor Imaging Study of Multiple Sclerosis. J R Stat Soc Ser C Appl Stat 2016; 65:395-414. [PMID: 27041772 DOI: 10.1111/rssc.12130] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022]
Abstract
Motivated by an imaging study, this paper develops a nonparametric testing procedure for testing the null hypothesis that two samples of curves observed at discrete grids and with noise have the same underlying distribution. The objective is to formally compare white matter tract profiles between healthy individuals and multiple sclerosis patients, as assessed by conventional diffusion tensor imaging measures. We propose to decompose the curves using functional principal component analysis of a mixture process, which we refer to as marginal functional principal component analysis. This approach reduces the dimension of the testing problem in a way that enables the use of traditional nonparametric univariate testing procedures. The procedure is computationally efficient and accommodates different sampling designs. Numerical studies are presented to validate the size and power properties of the test in many realistic scenarios. In these cases, the proposed test has been found to be more powerful than its primary competitor. Application to the diffusion tensor imaging data reveals that all the tracts studied are associated with multiple sclerosis and the choice of the diffusion tensor image measurement is important when assessing axonal disruption.
Affiliation(s)
| | | | - Sujit Ghosh
- North Carolina State University, Raleigh and Statistical and Applied Mathematical Sciences Institute, RTP, NC, USA
| |
|
35
|
Cederbaum J, Pouplier M, Hoole P, Greven S. Functional linear mixed models for irregularly or sparsely sampled data. STAT MODEL 2015. [DOI: 10.1177/1471082x15617594] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
We propose an estimation approach to analyse correlated functional data, which are observed on unequal grids or even sparsely. The model we use is a functional linear mixed model, a functional analogue of the linear mixed model. Estimation is based on dimension reduction via functional principal component analysis and on mixed model methodology. Our procedure allows the decomposition of the variability in the data as well as the estimation of mean effects of interest, and borrows strength across curves. Confidence bands for mean effects can be constructed conditionally on estimated principal components. We provide R code implementing our approach in an online appendix. The method is motivated by and applied to data from speech production research.
Affiliation(s)
- Jona Cederbaum
- Department of Statistics, Faculty of Mathematics, Computer Science and Statistics, Ludwig-Maximilians-University, Munich, Germany
| | - Marianne Pouplier
- Department of Phonetics and Speech Processing, Faculty of Languages and Literature, Ludwig-Maximilians-University, Munich, Germany
| | - Phil Hoole
- Department of Phonetics and Speech Processing, Faculty of Languages and Literature, Ludwig-Maximilians-University, Munich, Germany
| | - Sonja Greven
- Department of Statistics, Faculty of Mathematics, Computer Science and Statistics, Ludwig-Maximilians-University, Munich, Germany
| |
|
36
|
Abstract
We consider dependent functional data that are correlated because of a longitudinal-based design: each subject is observed at repeated times and at each time a functional observation (curve) is recorded. We propose a novel parsimonious modeling framework for repeatedly observed functional observations that allows the extraction of low-dimensional features. The proposed methodology accounts for the longitudinal design, is designed to study the dynamic behavior of the underlying process, allows prediction of the full future trajectory, and is computationally fast. Theoretical properties of this framework are studied and numerical investigations confirm excellent behavior in finite samples. The proposed method is motivated by and applied to a diffusion tensor imaging study of multiple sclerosis.
Affiliation(s)
- So Young Park
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, USA
| | - Ana-Maria Staicu
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, USA
| |
|
37
|
Goldsmith J, Kitago T. Assessing systematic effects of stroke on motor control by using hierarchical function-on-scalar regression. J R Stat Soc Ser C Appl Stat 2015; 65:215-236. [PMID: 27546913 DOI: 10.1111/rssc.12115] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]
Abstract
This work is concerned with understanding common population-level effects of stroke on motor control while accounting for possible subject-level idiosyncratic effects. Upper extremity motor control for each subject is assessed through repeated planar reaching motions from a central point to eight pre-specified targets arranged on a circle. We observe the kinematic data for hand position as a bivariate function of time for each reach. Our goal is to estimate the bivariate function-on-scalar regression with subject-level random functional effects while accounting for potential correlation in residual curves; covariates of interest are severity of motor impairment and target number. We express fixed effects and random effects using penalized splines, and allow for residual correlation using a Wishart prior distribution. Parameters are jointly estimated in a Bayesian framework, and we implement a computationally efficient approximation algorithm using variational Bayes. Simulations indicate that the proposed method yields accurate estimation and inference, and application results suggest that the effect of stroke on motor control has a systematic component observed across subjects.
Affiliation(s)
- Jeff Goldsmith
- Department of Biostatistics, Mailman School of Public Health, Columbia University
| | - Tomoko Kitago
- Department of Neurology, Columbia University Medical Center
| |
|
38
|
Discussion of “analysis of spatio-temporal mobile phone data: a case study in the metropolitan area of Milan” by P. Secchi, S. Vantini, and V. Vitelli. STAT METHOD APPL-GER 2015. [DOI: 10.1007/s10260-015-0317-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
39
|
Abstract
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework are based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.
|
40
|
Zipunnikov V, Greven S, Shou H, Caffo B, Reich DS, Crainiceanu C. Longitudinal High-Dimensional Principal Components Analysis with Application to Diffusion Tensor Imaging of Multiple Sclerosis. Ann Appl Stat 2015; 8:2175-2202. [PMID: 25663955 PMCID: PMC4316386 DOI: 10.1214/14-aoas748] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
Abstract
We develop a flexible framework for modeling high-dimensional imaging data observed longitudinally. The approach decomposes the observed variability of repeatedly measured high-dimensional observations into three additive components: a subject-specific imaging random intercept that quantifies the cross-sectional variability, a subject-specific imaging slope that quantifies the dynamic irreversible deformation over multiple realizations, and a subject-visit specific imaging deviation that quantifies exchangeable effects between visits. The proposed method is very fast, scalable to studies including ultra-high dimensional data, and can easily be adapted to and executed on modest computing infrastructures. The method is applied to the longitudinal analysis of diffusion tensor imaging (DTI) data of the corpus callosum of multiple sclerosis (MS) subjects. The study includes 176 subjects observed at 466 visits. For each subject and visit the study contains a registered DTI scan of the corpus callosum at roughly 30,000 voxels.
Affiliation(s)
- Vadim Zipunnikov
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205
| | - Sonja Greven
- Department of Statistics, Ludwig-Maximilians-Universität München, 80539 Munich, Germany
| | | | - Brian Caffo
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205
| | - Daniel S. Reich
- Translational Neurology Unit, Neuroimmunology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
| | | |
|
41
|
Staicu AM, Lahiri SN, Carroll RJ. Significance tests for functional data with complex dependence structure. J Stat Plan Inference 2015; 156:1-13. [PMID: 26023253 DOI: 10.1016/j.jspi.2014.08.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
Abstract
We propose an L2-norm based global testing procedure for the null hypothesis that multiple group mean functions are equal, for functional data with complex dependence structure. Specifically, we consider the setting of functional data with a multilevel structure of the form groups-clusters or subjects-units, where the unit-level profiles are spatially correlated within the cluster, and the cluster-level data are independent. Orthogonal series expansions are used to approximate the group mean functions and the test statistic is estimated using the basis coefficients. The asymptotic null distribution of the test statistic is developed, under mild regularity conditions. To our knowledge this is the first work that studies hypothesis testing, when data have such complex multilevel functional and spatial structure. Two small-sample alternatives, including a novel block bootstrap for functional data, are proposed, and their performance is examined in simulation studies. The paper concludes with an illustration of a motivating experiment.
Affiliation(s)
- Ana-Maria Staicu
- Department of Statistics, North Carolina State University, United States
| | - Soumen N Lahiri
- Department of Statistics, North Carolina State University, United States
| | | |
|
42
|
Xiao L, Huang L, Schrack JA, Ferrucci L, Zipunnikov V, Crainiceanu CM. Quantifying the lifetime circadian rhythm of physical activity: a covariate-dependent functional approach. Biostatistics 2014; 16:352-67. [PMID: 25361695 DOI: 10.1093/biostatistics/kxu045] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]
Abstract
Objective measurement of physical activity using wearable devices such as accelerometers may provide tantalizing new insights into the association between activity and health outcomes. Accelerometers can record quasi-continuous activity information for many days and for hundreds of individuals. For example, in the Baltimore Longitudinal Study on Aging physical activity was recorded every minute for [Formula: see text] adults for an average of [Formula: see text] days per adult. An important scientific problem is to separate and quantify the systematic and random circadian patterns of physical activity as functions of time of day, age, and gender. To capture the systematic circadian pattern, we introduce a practical bivariate smoother and two crucial innovations: (i) estimating the smoothing parameter using leave-one-subject-out cross validation to account for within-subject correlation and (ii) introducing fast computational techniques that overcome problems both with the size of the data and with the cross-validation approach to smoothing. The age-dependent random patterns are analyzed by a new functional principal component analysis that incorporates both covariate dependence and multilevel structure. For the analysis, we propose a practical and very fast trivariate spline smoother to estimate covariate-dependent covariances and their spectra. Results reveal several interesting, previously unknown, circadian patterns associated with human aging and gender.
Affiliation(s)
- Luo Xiao
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Lei Huang
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Jennifer A Schrack
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21205, USA and Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD 21225, USA
| | - Luigi Ferrucci
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD 21225, USA
| | - Vadim Zipunnikov
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | | |
|
43
|
Shou H, Zipunnikov V, Crainiceanu CM, Greven S. Structured functional principal component analysis. Biometrics 2014; 71:247-257. [PMID: 25327216 DOI: 10.1111/biom.12236] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 05/01/2013] [Revised: 07/01/2013] [Accepted: 08/01/2014] [Indexed: 11/30/2022]
Abstract
Motivated by modern observational studies, we introduce a class of functional models that expand nested and crossed designs. These models account for the natural inheritance of the correlation structures from sampling designs in studies where the fundamental unit is a function or image. Inference is based on functional quadratics and their relationship with the underlying covariance structure of the latent processes. A computationally fast and scalable estimation procedure is developed for high-dimensional data. Methods are used in applications including high-frequency accelerometer data for daily activity, pitch linguistic data for phonetic analysis, and EEG data for studying electrical brain activity during sleep.
Affiliation(s)
- Haochang Shou
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania, U.S.A
| | - Vadim Zipunnikov
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, U.S.A
| | - Ciprian M Crainiceanu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, U.S.A
| | - Sonja Greven
- Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
| |
|
44
|
Rakêt LL, Markussen B. Approximate inference for spatial functional data on massively parallel processors. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.10.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
|
45
|
Staicu AM, Li Y, Crainiceanu CM, Ruppert D. Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis. Scand Stat Theory Appl 2014. [DOI: 10.1111/sjos.12075] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Affiliation(s)
| | - Yingxing Li
- The Wang Yanan Institute for Studies in Economics; Xiamen University
| | | | - David Ruppert
- Department of Statistical Science and School of Operations Research and Information Engineering; Cornell University
| |
|
46
|
Reiss PT, Huang L, Chen YH, Huo L, Tarpey T, Mennes M. Massively parallel nonparametric regression, with an application to developmental brain mapping. J Comput Graph Stat 2014; 23:232-248. [PMID: 24683303 DOI: 10.1080/10618600.2012.733549] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
Abstract
We propose a penalized spline approach to performing large numbers of parallel non-parametric analyses of either of two types: restricted likelihood ratio tests of a parametric regression model versus a general smooth alternative, and nonparametric regression. Compared with naïvely performing each analysis in turn, our techniques reduce computation time dramatically. Viewing the large collection of scatterplot smooths produced by our methods as functional data, we develop a clustering approach to summarize and visualize these results. Our approach is applicable to ultra-high-dimensional data, particularly data acquired by neuroimaging; we illustrate it with an analysis of developmental trajectories of functional connectivity at each of approximately 70000 brain locations. Supplementary materials, including an appendix and an R package, are available online.
Affiliation(s)
- Philip T Reiss
- Department of Child and Adolescent Psychiatry, New York University ; Nathan S. Kline Institute for Psychiatric Research
| | - Lei Huang
- Department of Biostatistics, Johns Hopkins University
| | - Yin-Hsiu Chen
- Department of Child and Adolescent Psychiatry, New York University
| | - Lan Huo
- Department of Child and Adolescent Psychiatry, New York University
| | - Thaddeus Tarpey
- Department of Mathematics and Statistics, Wright State University
| | - Maarten Mennes
- Department of Cognitive Neuroscience, Radboud University Nijmegen Medical Centre ; Department of Child and Adolescent Psychiatry, New York University
| |
|
47
|
Chen H, Wang Y, Paik MC, Choi HA. A marginal approach to reduced-rank penalized spline smoothing with application to multilevel functional data. J Am Stat Assoc 2013; 108:1216-1229. [PMID: 24497670 DOI: 10.1080/01621459.2013.826134] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/25/2022]
Abstract
Multilevel functional data is collected in many biomedical studies. For example, in a study of the effect of Nimodipine on patients with subarachnoid hemorrhage (SAH), patients underwent multiple 4-hour treatment cycles. Within each treatment cycle, subjects' vital signs were reported every 10 minutes. This data has a natural multilevel structure with treatment cycles nested within subjects and measurements nested within cycles. Most literature on nonparametric analysis of such multilevel functional data focuses on conditional approaches using functional mixed effects models. However, parameters obtained from the conditional models do not have direct interpretations as population average effects. When population effects are of interest, we may employ marginal regression models. In this work, we propose marginal approaches to fit multilevel functional data through penalized spline generalized estimating equation (penalized spline GEE). The procedure is effective for modeling multilevel correlated generalized outcomes as well as continuous outcomes without suffering from numerical difficulties. We provide a variance estimator robust to misspecification of correlation structure. We investigate the large sample properties of the penalized spline GEE estimator with multilevel continuous data and show that the asymptotics falls into two categories. In the small knots scenario, the estimated mean function is asymptotically efficient when the true correlation function is used and the asymptotic bias does not depend on the working correlation matrix. In the large knots scenario, both the asymptotic bias and variance depend on the working correlation. We propose a new method to select the smoothing parameter for penalized spline GEE based on an estimate of the asymptotic mean squared error (MSE). We conduct extensive simulation studies to examine the properties of the proposed estimator under different correlation structures and the sensitivity of the variance estimation to the choice of smoothing parameter. Finally, we apply the methods to the SAH study to evaluate a recent debate on discontinuing the use of Nimodipine in the clinical community.
Affiliation(s)
- Huaihou Chen
- Department of Child and Adolescent Psychiatry, New York University School of Medicine New York, NY 10016, U.S.A
| | - Yuanjia Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, U.S.A
| | - Myunghee Cho Paik
- Department of Statistics, Seoul National University, 1 Gwanakro, Gwanakgu, Seoul, Korea 151-742
| | - H Alex Choi
- Department of Neurosurgery and Neurology, The University of Texas Health Science Center at Houston Medical School, Houston, Texas 77030, U.S.A
| |
|
48
|
Serban N, Staicu AM, Carroll RJ. Multilevel cross-dependent binary longitudinal data. Biometrics 2013; 69:903-13. [PMID: 24131242 DOI: 10.1111/biom.12083] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 05/01/2012] [Revised: 06/01/2013] [Accepted: 07/01/2013] [Indexed: 11/30/2022]
Abstract
We provide insights into new methodology for the analysis of multilevel binary data observed longitudinally, when the repeated longitudinal measurements are correlated. The proposed model is logistic functional regression conditioned on three latent processes describing the within- and between-variability, and describing the cross-dependence of the repeated longitudinal measurements. We estimate the model components without employing mixed-effects modeling but assuming an approximation to the logistic link function. The primary objectives of this article are to highlight the challenges in the estimation of the model components, to compare two approximations to the logistic regression function, linear and exponential, and to discuss their advantages and limitations. The linear approximation is computationally efficient whereas the exponential approximation applies for rare events functional data. Our methods are inspired by and applied to a scientific experiment on spectral backscatter from long range infrared light detection and ranging (LIDAR) data. The models are general and relevant to many new binary functional data sets, with or without dependence between repeated functional measurements.
Affiliation(s)
- Nicoleta Serban
- H. Milton Stewart School of Industrial Systems and Engineering, Georgia Institute of Technology, 765 Ferst Drive, Atlanta, Georgia, 30318, U.S.A
| | | | | |
|
49
|
Miranda MF, Zhu H, Ibrahim JG. Bayesian spatial transformation models with applications in neuroimaging data. Biometrics 2013; 69:1074-83. [PMID: 24128143 DOI: 10.1111/biom.12085] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/01/2012] [Revised: 06/01/2013] [Accepted: 06/01/2013] [Indexed: 11/28/2022]
Abstract
The aim of this article is to develop a class of spatial transformation models (STM) to spatially model the varying association between imaging measures in a three-dimensional (3D) volume (or 2D surface) and a set of covariates. The proposed STM include a varying Box-Cox transformation model for dealing with the issue of non-Gaussian distributed imaging data and a Gaussian Markov random field model for incorporating spatial smoothness of the imaging data. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulations and real data analysis demonstrate that the STM significantly outperforms the voxel-wise linear model with Gaussian noise in recovering meaningful geometric patterns. Our STM is able to reveal important brain regions with morphological changes in children with attention deficit hyperactivity disorder.
Affiliation(s)
- Michelle F Miranda
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | | | | |
|
50
|
Crainiceanu CM, Staicu AM, Ray S, Punjabi N. Bootstrap-based inference on the difference in the means of two correlated functional processes. Stat Med 2012; 31:3223-40. [PMID: 22855258 DOI: 10.1002/sim.5439] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 02/01/2011] [Accepted: 04/18/2012] [Indexed: 11/06/2022]
Abstract
We propose nonparametric inference methods on the mean difference between two correlated functional processes. We compare methods that (1) incorporate different levels of smoothing of the mean and covariance; (2) preserve the sampling design; and (3) use parametric and nonparametric estimation of the mean functions. We apply our method to estimating the mean difference between average normalized δ power of sleep electroencephalograms for 51 subjects with severe sleep apnea and 51 matched controls in the first 4 h after sleep onset. We obtain data from the Sleep Heart Health Study, the largest community cohort study of sleep. Although methods are applied to a single case study, they can be applied to a large number of studies that have correlated functional data.
Affiliation(s)
- Ciprian M Crainiceanu
- Department of Biostatistics, Johns Hopkins University, 615 N. Wolfe St., Baltimore, MD 21205, U.S.A.
| | | | | | | |
|