1
Jiang S, Colditz GA. Modeling correlated pairs of mammogram images. Stat Med 2024; 43:1660-1668. [PMID: 38351511 DOI: 10.1002/sim.10002] [Received: 03/24/2022] [Revised: 10/30/2023] [Accepted: 12/10/2023] [Indexed: 03/16/2024]
Abstract
Mammography remains the primary screening strategy for breast cancer, which continues to be the most prevalent cancer diagnosis among women globally. Because screening mammograms capture both the left and right breast, there is a nonnegligible correlation between the pair of images. Previous studies have explored the concept of averaging between the pair of images after proper image registration; however, no comparison has been made in directly utilizing the paired images. In this paper, we extend the bivariate functional principal component analysis over triangulations to jointly characterize the pair of imaging data bounded in an irregular domain and then nest the extracted features within the survival model to predict the onset of breast cancer. The method is applied to our motivating data from the Joanne Knight Breast Health Cohort at Siteman Cancer Center. Our findings indicate that there was no statistically significant difference in model discrimination performance between averaging the pair of images and jointly modeling the two images. Although the breast cancer study did not reveal any significant difference, it is worth noting that the methods proposed here can be readily extended to other studies involving paired or multivariate imaging data.
Affiliation(s)
- Shu Jiang
  - Division of Public Health Sciences, Washington University School of Medicine, St. Louis, Missouri, USA
- Graham A Colditz
  - Division of Public Health Sciences, Washington University School of Medicine, St. Louis, Missouri, USA
2
Xie S, Ogden RT. Functional support vector machine. Biostatistics 2024:kxae007. [PMID: 38476094 DOI: 10.1093/biostatistics/kxae007] [Received: 04/03/2023] [Revised: 12/26/2023] [Accepted: 02/13/2024] [Indexed: 03/14/2024] Open
Abstract
Linear and generalized linear scalar-on-function models have been commonly used to understand the relationship between a scalar response variable (e.g. continuous, binary outcomes) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. On the other hand, support vector machines (SVMs) are among the most robust prediction models but do not take account of the high correlations between repeated measurements and cannot be used for irregular data. In this work, we propose a novel method to integrate functional principal component analysis with SVM techniques for classification and regression to account for the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our methods are especially advantageous when the measurement errors in functional predictors are relatively large.
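Note: the functional principal component analysis step named in this abstract (and in several entries below) is conventionally based on the truncated Karhunen-Loève expansion. The following is a generic textbook sketch, not the authors' specific estimator; the symbols $X_i$, $\mu$, $\phi_k$, $\xi_{ik}$, $K$, and the domain $\mathcal{T}$ are notation assumed here for illustration.

```latex
% Truncated Karhunen-Loeve expansion of the i-th curve
X_i(t) \approx \mu(t) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(t),
\qquad
\xi_{ik} = \int_{\mathcal{T}} \bigl(X_i(t) - \mu(t)\bigr)\,\phi_k(t)\,dt,
% where the eigenfunctions \phi_k solve
\int_{\mathcal{T}} C(s,t)\,\phi_k(s)\,ds = \lambda_k\,\phi_k(t),
\qquad
C(s,t) = \operatorname{Cov}\bigl(X_i(s), X_i(t)\bigr).
```

Under this reading, the score vector $(\xi_{i1}, \dots, \xi_{iK})$ is the finite-dimensional summary of a curve that could be passed to a downstream model such as an SVM or a Cox model.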
Affiliation(s)
- Shanghong Xie
  - School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
  - Department of Biostatistics, Columbia University, New York, NY, United States
- R Todd Ogden
  - Department of Biostatistics, Columbia University, New York, NY, United States
3
Koner S, Luo S. Projection-based two-sample inference for sparsely observed multivariate functional data. Biostatistics 2024:kxae004. [PMID: 38413051 DOI: 10.1093/biostatistics/kxae004] [Received: 02/20/2023] [Revised: 01/08/2024] [Accepted: 01/11/2024] [Indexed: 02/29/2024] Open
Abstract
Modern longitudinal studies collect multiple outcomes as the primary endpoints to understand the complex dynamics of the diseases. Oftentimes, especially in clinical trials, the joint variation among the multidimensional responses plays a significant role in assessing the differential characteristics between two or more groups, rather than drawing inferences based on a single outcome. We develop a projection-based two-sample significance test to identify the population-level difference between the multivariate profiles observed under a sparse longitudinal design. The methodology is built upon widely adopted multivariate functional principal component analysis to reduce the dimension of the infinite-dimensional multi-modal functions while preserving the dynamic correlation between the components. The test applies to a wide class of (non-stationary) covariance structures of the response, and it detects a significant group difference based on a single p-value, thereby overcoming the issue of adjusting for the multiple p-values that arise from comparing the means in each of the components separately. Finite-sample numerical studies demonstrate that the test maintains the type-I error and has high power to detect significant group differences, compared to the state-of-the-art testing procedures. The test is carried out on two significant longitudinal studies for Alzheimer's disease and Parkinson's disease (PD) patients, namely, the TOMMORROW study of individuals at high risk of mild cognitive impairment, to detect differences in the cognitive test scores between the pioglitazone and the placebo groups, and the Azillect study, to assess the efficacy of rasagiline as a potential treatment to slow down the progression of PD.
Affiliation(s)
- Salil Koner
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Sheng Luo
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
4
Gomon D, Putter H, Fiocco M, Signorelli M. Dynamic prediction of survival using multivariate functional principal component analysis: A strict landmarking approach. Stat Methods Med Res 2024; 33:256-272. [PMID: 38196243 DOI: 10.1177/09622802231224631] [Indexed: 01/11/2024]
Abstract
Dynamically predicting patient survival probabilities using longitudinal measurements has become of great importance with routine data collection becoming more common. Many existing models utilize a multi-step landmarking approach for this problem, mostly due to its ease of use and versatility but unfortunately most fail to do so appropriately. In this article we make use of multivariate functional principal component analysis to summarize the available longitudinal information, and employ a Cox proportional hazards model for prediction. Additionally, we consider a centred functional principal component analysis procedure in an attempt to remove the natural variation incurred by the difference in age of the considered subjects. We formalize the difference between a 'relaxed' landmarking approach where only validation data is landmarked and a 'strict' landmarking approach where both the training and validation data are landmarked. We show that a relaxed landmarking approach fails to effectively use the information contained in the longitudinal outcomes, thereby producing substantially worse prediction accuracy than a strict landmarking approach.
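Note: the distinction between 'relaxed' and 'strict' landmarking drawn in this abstract can be sketched in a few lines. This is a minimal illustration under the usual definition of landmarking (keep only subjects still at risk at the landmark time and discard measurements taken after it); the `Subject` class and the `landmark` and `prepare` functions are hypothetical names, not code from the article.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    visits: list        # list of (time, marker value) pairs
    event_time: float   # observed event or censoring time
    event: bool         # True if the event was observed


def landmark(subjects, t_lm):
    """Keep subjects still at risk at t_lm and drop measurements after t_lm."""
    kept = []
    for s in subjects:
        if s.event_time <= t_lm:
            continue  # no longer at risk at the landmark time
        visits = [(t, y) for (t, y) in s.visits if t <= t_lm]
        kept.append(Subject(visits, s.event_time, s.event))
    return kept


def prepare(train, valid, t_lm, strict=True):
    """Strict: landmark both sets. Relaxed: landmark the validation set only."""
    valid_lm = landmark(valid, t_lm)
    train_out = landmark(train, t_lm) if strict else list(train)
    return train_out, valid_lm
```

With `strict=False` the training curves still contain measurements taken after the landmark time, which is the mismatch between training and validation data that the abstract attributes to the relaxed approach.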
Affiliation(s)
- Daniel Gomon
  - Mathematical Institute, Leiden University, Leiden, the Netherlands
- Hein Putter
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
- Marta Fiocco
  - Mathematical Institute, Leiden University, Leiden, the Netherlands
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
- Mirko Signorelli
  - Mathematical Institute, Leiden University, Leiden, the Netherlands
5
Yue Y, Jang JH, Manatunga AK. Assessing intra- and inter-method agreement of functional data. Stat Methods Med Res 2024; 33:112-129. [PMID: 38155544 DOI: 10.1177/09622802231219862] [Indexed: 12/30/2023]
Abstract
Modern medical devices are increasingly producing complex data that could offer deeper insights into physiological mechanisms of underlying diseases. One type of complex data that arises frequently in medical imaging studies is functional data, whose sampling unit is a smooth continuous function. In this work, with the goal of establishing the scientific validity of experiments involving modern medical imaging devices, we focus on the problem of evaluating reliability and reproducibility of multiple functional data that are measured on the same subjects by different methods (i.e. different technologies or raters). Specifically, we develop a series of intraclass correlation coefficient and concordance correlation coefficient indices that can assess intra-method, inter-method, and total (intra + inter) agreement based on multivariate multilevel functional data consisting of replicated functional data measurements produced by each of the different methods. For efficient estimation, the proposed indices are expressed using variance components of a multivariate multilevel functional mixed effect model, which can be smoothly estimated by functional principal component analysis. Extensive simulation studies are performed to assess the finite-sample properties of the estimators. The proposed method is applied to evaluate the reliability and reproducibility of renogram curves produced by a high-tech radionuclide image scan used to non-invasively detect kidney obstruction.
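Note: for orientation, the classical scalar concordance correlation coefficient (Lin's CCC), which agreement indices of this kind generalize, can be computed as below. This is only the scalar analogue for paired measurements, not the functional or multilevel indices developed in the article; the function name `lin_ccc` is an assumption made here.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired scalar measurements.

    Uses 1/n moments: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).
    Undefined (division by zero) when both series are constant and equal.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

Unlike the Pearson correlation, this coefficient is penalized by a systematic shift between the two methods: perfectly correlated readings that differ by a constant offset score below 1.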
Affiliation(s)
- Ye Yue
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Jeong Hoon Jang
  - Quantitative Risk Management, Yonsei University, Incheon, Republic of Korea
- Amita K Manatunga
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
6
Chang X, Li Y, Li Y. Asynchronous and error-prone longitudinal data analysis via functional calibration. Biometrics 2023; 79:3374-3387. [PMID: 37042741 PMCID: PMC10567993 DOI: 10.1111/biom.13866] [Received: 10/09/2022] [Accepted: 03/22/2023] [Indexed: 04/13/2023]
Abstract
In many longitudinal settings, time-varying covariates may not be measured at the same time as responses and are often prone to measurement error. Naive last-observation-carried-forward methods incur estimation biases, and existing kernel-based methods suffer from slow convergence rates and large variations. To address these challenges, we propose a new functional calibration approach to efficiently learn longitudinal covariate processes based on sparse functional data with measurement error. Our approach, stemming from functional principal component analysis, calibrates the unobserved synchronized covariate values from the observed asynchronous and error-prone covariate values, and is broadly applicable to asynchronous longitudinal regression with time-invariant or time-varying coefficients. For regression with time-invariant coefficients, our estimator is asymptotically unbiased, root-n consistent, and asymptotically normal; for time-varying coefficient models, our estimator has the optimal varying coefficient model convergence rate with inflated asymptotic variance from the calibration. In both cases, our estimators present asymptotic properties superior to the existing methods. The feasibility and usability of the proposed methods are verified by simulations and an application to the Study of Women's Health Across the Nation, a large-scale multisite longitudinal study on women's health during midlife.
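Note: the naive last-observation-carried-forward (LOCF) alignment that this abstract uses as a point of comparison can be sketched as follows. This illustrates only the baseline, not the proposed functional calibration; the function name `locf` and its arguments are hypothetical.

```python
from bisect import bisect_right


def locf(cov_times, cov_values, response_times):
    """Align a covariate to response times by carrying the last observation forward.

    cov_times must be sorted in ascending order. For each response time, the
    most recent covariate value observed at or before that time is returned,
    or None when no covariate has been observed yet.
    """
    aligned = []
    for t in response_times:
        i = bisect_right(cov_times, t)  # number of covariate times <= t
        aligned.append(cov_values[i - 1] if i > 0 else None)
    return aligned
```

The carried-forward value ignores both the time elapsed since it was measured and its measurement error, which is the source of the estimation bias the abstract refers to.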
Affiliation(s)
- Xinyue Chang
  - Department of Statistics, Iowa State University, Ames, IA 50011, U.S.A
- Yehua Li
  - Department of Statistics, University of California, Riverside, CA 92521, U.S.A
- Yi Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, U.S.A
7
Arnone E, Negri L, Panzica F, Sangalli LM. Analyzing data in complicated 3D domains: Smoothing, semiparametric regression, and functional principal component analysis. Biometrics 2023; 79:3510-3521. [PMID: 36807198 DOI: 10.1111/biom.13845] [Received: 05/06/2022] [Accepted: 01/26/2023] [Indexed: 02/23/2023]
Abstract
In this work, we introduce a family of methods for the analysis of data observed at locations scattered in three-dimensional (3D) domains, with possibly complicated shapes. The proposed family of methods includes smoothing, regression, and functional principal component analysis for functional signals defined over (possibly nonconvex) 3D domains, appropriately complying with the nontrivial shape of the domain. This constitutes an important advance with respect to the literature, because the available methods to analyze data observed in 3D domains rely on Euclidean distances, which are inappropriate when the shape of the domain influences the phenomenon under study. The common building block of the proposed methods is a nonparametric regression model with differential regularization. We derive the asymptotic properties of the methods and show, through simulation studies, that they are superior to the available alternatives for the analysis of data in 3D domains, even when considering domains with simple shapes. We finally illustrate an application to a neurosciences study, with neuroimaging signals from functional magnetic resonance imaging, measuring neural activity in the gray matter, a nonconvex volume with a highly complicated structure.
Affiliation(s)
- Eleonora Arnone
  - Department of Statistical Sciences, University of Padova, Italy
  - Department of Management, University of Turin, Italy
- Luca Negri
  - MOX-Department of Mathematics, Politecnico di Milano, Italy
8
Jiang S, Cao J, Rosner B, Colditz GA. Supervised two-dimensional functional principal component analysis with time-to-event outcomes and mammogram imaging data. Biometrics 2023; 79:1359-1369. [PMID: 34854477 PMCID: PMC9160217 DOI: 10.1111/biom.13611] [Received: 03/21/2021] [Revised: 11/07/2021] [Accepted: 11/15/2021] [Indexed: 12/24/2022]
Abstract
Screening mammography aims to identify breast cancer early and secondarily measures breast density to classify women at higher or lower than average risk for future breast cancer in the general population. Despite the strong association of individual mammography features to breast cancer risk, the statistical literature on mammogram imaging data is limited. While functional principal component analysis (FPCA) has been studied in the literature for extracting image-based features, it is conducted independently of the time-to-event response variable. With the consideration of building a prognostic model for precision prevention, we present a set of flexible methods, supervised FPCA (sFPCA) and functional partial least squares (FPLS), to extract image-based features associated with the failure time while accommodating the added complication from right censoring. Throughout the article, we hope to demonstrate that one method is favored over the other under different clinical setups. The proposed methods are applied to the motivating data set from the Joanne Knight Breast Health cohort at Siteman Cancer Center. Our approaches not only obtain the best prediction performance compared to the benchmark model, but also reveal different risk patterns within the mammograms.
Affiliation(s)
- Shu Jiang
  - Division of Public Health Sciences, Washington University School of Medicine in St. Louis, Missouri
- Jiguo Cao
  - Department of Statistics and Actuarial Science, Simon Fraser University, Canada
- Bernard Rosner
  - Channing Division of Network Medicine, Harvard Medical School, Massachusetts
- Graham A Colditz
  - Division of Public Health Sciences, Washington University School of Medicine in St. Louis, Missouri
9
Zhu C, Chen Y, Müller HG, Wang JL, O'Muircheartaigh J, Bruchhage M, Deoni S. Trajectories of brain volumes in young children are associated with maternal education. Hum Brain Mapp 2023; 44:3168-3179. [PMID: 36896867 PMCID: PMC10171562 DOI: 10.1002/hbm.26271] [Received: 01/07/2023] [Revised: 02/16/2023] [Accepted: 02/22/2023] [Indexed: 03/11/2023] Open
Abstract
Brain growth in early childhood is reflected in the evolution of proportional cerebrospinal fluid volumes (pCSF), grey matter (pGM), and white matter (pWM). We study brain development as reflected in the relative fractions of these three tissues for a cohort of 388 children that were longitudinally followed between the ages of 18 and 96 months. We introduce statistical methodology (Riemannian Principal Analysis through Conditional Expectation, RPACE) that addresses major challenges that are of general interest for the analysis of longitudinal neuroimaging data, including the sparsity of the longitudinal observations over time and the compositional structure of the relative brain volumes. Applying the RPACE methodology, we find that longitudinal growth as reflected by tissue composition differs significantly for children of mothers with higher and lower maternal education levels.
Affiliation(s)
- Changbo Zhu
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, South Bend, Indiana, USA
- Yaqing Chen
  - Department of Statistics, Rutgers University, New Brunswick, New Jersey, USA
- Hans-Georg Müller
  - Department of Statistics, University of California, Davis, California, USA
- Jane-Ling Wang
  - Department of Statistics, University of California, Davis, California, USA
- Jonathan O'Muircheartaigh
  - Centre for the Developing Brain, School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
  - Department of Forensic and Neurodevelopmental Sciences, King's College London, London, UK
  - MRC Centre for Neurodevelopmental Disorders, King's College London, London, UK
- Muriel Bruchhage
  - Department of Diagnostic Imaging, Rhode Island Hospital, Providence, Rhode Island, USA
  - Department of Pediatrics, Warren Alpert Medical School at Brown University, Providence, Rhode Island, USA
  - Institute of Social Studies, University of Stavanger, Stavanger, Norway
- Sean Deoni
  - MNCH D&T, Bill & Melinda Gates Foundation, Seattle, Washington, USA
10
Ghosal R, Maity A. Variable selection in nonlinear function-on-scalar regression. Biometrics 2023; 79:292-303. [PMID: 34528237 DOI: 10.1111/biom.13564] [Received: 10/30/2020] [Revised: 07/26/2021] [Accepted: 09/03/2021] [Indexed: 11/28/2022]
Abstract
We develop a new method for variable selection in a nonlinear additive function-on-scalar regression (FOSR) model. Existing methods for variable selection in FOSR have focused on the linear effects of scalar predictors, which can be a restrictive assumption in the presence of multiple continuously measured covariates. We propose a computationally efficient approach for variable selection in existing linear FOSR using functional principal component scores of the functional response and extend this framework to a nonlinear additive function-on-scalar model. The proposed method provides a unified and flexible framework for variable selection in FOSR, allowing nonlinear effects of the covariates. Numerical analysis using a simulation study illustrates the advantages of the proposed method over existing variable selection methods in FOSR, even when the underlying covariate effects are all linear. The proposed procedure is demonstrated on accelerometer data from the 2003-2004 cohorts of the National Health and Nutrition Examination Survey (NHANES) in understanding the association between diurnal patterns of physical activity and demographic, lifestyle, and health characteristics of the participants.
Affiliation(s)
- Rahul Ghosal
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Arnab Maity
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina
11
Zhou J, Jiang X, Xia HA, Hobbs BP, Wei P. Landmark mediation survival analysis using longitudinal surrogate. Front Oncol 2023; 12:999324. [PMID: 36733365 PMCID: PMC9887328 DOI: 10.3389/fonc.2022.999324] [Received: 07/20/2022] [Accepted: 12/14/2022] [Indexed: 01/18/2023] Open
Abstract
Clinical cancer trials are designed to collect radiographic measurements of each patient's baseline and residual tumor burden at regular intervals over the course of study. For solid tumors, the extent of reduction in tumor size following treatment is used as a measure of a drug's antitumor activity. Statistical estimation of treatment efficacy routinely reduces the longitudinal assessment of tumor burden to a binary outcome describing the presence versus absence of an objective tumor response as defined by RECIST criteria. The objective response rate (ORR) is the predominant method for evaluating an experimental therapy in a single-arm trial. Additionally, ORR is routinely compared against a control therapy in phase III randomized controlled trials. The longitudinal assessments of tumor burden are seldom integrated into a formal statistical model, nor integrated into mediation analysis to characterize the relationships among treatment, residual tumor burden, and survival. This article presents a framework for landmark mediation survival analyses devised to incorporate longitudinal assessment of tumor burden. R² effect-size measures are developed to quantify the survival treatment mediation effects using longitudinal predictors. Analyses are demonstrated with applications to two colorectal cancer trials. Survival prediction is compared in the presence versus absence of longitudinal analysis. Simulation studies elucidate settings wherein patterns of tumor burden dynamics require longitudinal analysis.
Affiliation(s)
- Jie Zhou
  - Department of Biostatistics and Pharmacometrics, Neuroscience Global Drug Development, Novartis, East Hanover, NJ, United States
- Xun Jiang
  - Center for Design and Analysis, Amgen, Thousand Oaks, CA, United States
- H. Amy Xia
  - Center for Design and Analysis, Amgen, Thousand Oaks, CA, United States
- Brian P. Hobbs
  - Department of Population Health, The University of Texas, Austin, TX, United States
- Peng Wei (corresponding author)
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
12
Cui E, Li R, Crainiceanu CM, Xiao L. Fast Multilevel Functional Principal Component Analysis. J Comput Graph Stat 2022; 32:366-377. [PMID: 37313008 PMCID: PMC10260118 DOI: 10.1080/10618600.2022.2115500] [Received: 10/03/2021] [Accepted: 08/06/2022] [Indexed: 10/15/2022]
Abstract
We introduce fast multilevel functional principal component analysis (fast MFPCA), which scales up to high dimensional functional data measured at multiple visits. The new approach is orders of magnitude faster than the original MFPCA (Di et al., 2009) and achieves comparable estimation accuracy. Methods are motivated by the National Health and Nutrition Examination Survey (NHANES), which contains minute-level physical activity information of more than 10,000 participants over multiple days and 1440 observations per day. While MFPCA takes more than five days to analyze these data, fast MFPCA takes less than five minutes. A theoretical study of the proposed method is also provided. The associated function mfpca.face() is available in the R package refund.
Affiliation(s)
- Erjia Cui
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205
- Ruonan Li
  - Department of Statistics, North Carolina State University, 2311 Stinson Dr, Raleigh, NC 27607
- Ciprian M. Crainiceanu
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205
- Luo Xiao
  - Department of Statistics, North Carolina State University, 2311 Stinson Dr, Raleigh, NC 27607
13
Shi H, Jiang S, Cao J. Dynamic prediction with time-dependent marker in survival analysis using supervised functional principal component analysis. Stat Med 2022; 41:3547-3560. [PMID: 35574725 PMCID: PMC10025984 DOI: 10.1002/sim.9433] [Received: 03/18/2021] [Revised: 02/27/2022] [Accepted: 04/26/2022] [Indexed: 11/11/2022]
Abstract
Time-varying biomarkers reflect important information on disease progression over time. Dynamic prediction for event occurrence on a real-time basis, utilizing time-varying information, is crucial in making accurate clinical decisions. Functional principal component analysis (FPCA) has been widely adopted in the literature for extracting features from time-varying biomarker trajectories. However, feature extraction via FPCA is conducted independent of the time-to-event response, which may not produce optimal results when the goal lies in prediction. With this consideration, we propose a novel supervised FPCA, where the functional principal components are determined to optimize the association between the time-varying biomarker and time-to-event outcome. The proposed framework also accommodates irregularly spaced and sparse longitudinal data. Our method is empirically shown to retain better discrimination and calibration performance than the unsupervised FPCA method in simulation studies. Application of the proposed method is also illustrated in the Alzheimer's Disease Neuroimaging Initiative database.
Affiliation(s)
- Haolun Shi
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Shu Jiang
  - Division of Public Health Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
- Jiguo Cao
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
14
Lam KK, Wang B. Multipopulation mortality modelling and forecasting: the weighted multivariate functional principal component approaches. J Appl Stat 2022; 50:3177-3198. [PMID: 37969540 PMCID: PMC10631385 DOI: 10.1080/02664763.2022.2104228] [Received: 06/04/2021] [Accepted: 07/12/2022] [Indexed: 10/16/2022]
Abstract
Human mortality patterns and trajectories in closely related populations are likely linked together and share similarities. It is always desirable to model them simultaneously while taking their heterogeneity into account. This article introduces two new models for joint mortality modelling and forecasting multiple subpopulations using the multivariate functional principal component analysis techniques. The first model extends the independent functional data model to a multipopulation modelling setting. In the second one, we propose a novel multivariate functional principal component method for coherent modelling. Its design primarily fulfils the idea that when several subpopulation groups have similar socio-economic conditions or common biological characteristics such close connections are expected to evolve in a non-diverging fashion. We demonstrate the proposed methods by using sex-specific mortality data. Their forecast performances are further compared with several existing models, including the independent functional data model and the Product-Ratio model, through comparisons with mortality data of ten developed countries. The numerical examples show that the first proposed model maintains a comparable forecast ability with the existing methods. In contrast, the second proposed model outperforms the first model as well as the existing models in terms of forecast accuracy.
Affiliation(s)
- Ka Kin Lam
  - School of Mathematics and Actuarial Science, University of Leicester, Leicester, UK
- Bo Wang
  - School of Mathematics and Actuarial Science, University of Leicester, Leicester, UK
15
Wang G, Liu S, Han F, Di CZ. Robust functional principal component analysis via a functional pairwise spatial sign operator. Biometrics 2022. [PMID: 35583919 PMCID: PMC9672141 DOI: 10.1111/biom.13695] [Received: 02/01/2021] [Accepted: 05/03/2022] [Indexed: 11/30/2022]
Abstract
Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well if the data exhibit heavy-tailedness or outliers. To address this challenge, a new robust functional principal component analysis approach based on a functional pairwise spatial sign (PASS) operator, termed PASS FPCA, is introduced. We propose robust estimation procedures for eigenfunctions and eigenvalues. Theoretical properties of the PASS operator are established, showing that it adopts the same eigenfunctions as the standard covariance operator and also allows recovering ratios between eigenvalues. We also extend the proposed procedure to handle functional data measured with noise. Compared to existing robust FPCA approaches, the proposed PASS FPCA requires weaker distributional assumptions to conserve the eigenspace of the covariance function. Specifically, existing work is often built upon a class of functional elliptical distributions, which inherently requires symmetry. In contrast, we introduce a class of distributions called the weakly functional coordinate symmetry (weakly FCS), which allows for severe asymmetry and is much more flexible than the functional elliptical distribution family. The robustness of the PASS FPCA is demonstrated via extensive simulation studies, especially its advantages in scenarios with non-elliptical distributions. The proposed method was motivated by and applied to analysis of accelerometry data from the Objective Physical Activity and Cardiovascular Health Study, a large-scale epidemiological study to investigate the relationship between objectively measured physical activity and cardiovascular health among older women.
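To make the pairwise spatial sign idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that curves are observed on a common grid and that the sample operator averages outer products of pairwise differences, each scaled to unit norm; the function name `pass_matrix` and the toy curves are hypothetical. Because every pair contributes a unit-norm direction, a single extreme curve cannot dominate the estimate.

```python
from itertools import combinations


def pass_matrix(curves):
    """Sample pairwise spatial sign matrix for curves on a common grid.

    Assumption (not taken from the paper's text): each pair of curves
    contributes the outer product of its difference, scaled to unit norm.
    """
    m = len(curves[0])
    K = [[0.0] * m for _ in range(m)]
    n_pairs = 0
    for x, y in combinations(curves, 2):
        diff = [a - b for a, b in zip(x, y)]
        norm2 = sum(d * d for d in diff)
        if norm2 == 0.0:
            continue  # identical curves carry no direction
        n_pairs += 1
        for i in range(m):
            for j in range(m):
                K[i][j] += diff[i] * diff[j] / norm2
    if n_pairs == 0:
        raise ValueError("need at least two distinct curves")
    return [[v / n_pairs for v in row] for row in K]


# Toy data: three ordinary curves and one gross outlier (synthetic values).
curves = [
    [0.0, 1.0, 2.0],
    [1.0, 2.0, 3.5],
    [2.0, 2.0, 2.0],
    [100.0, -50.0, 7.0],
]
K = pass_matrix(curves)
trace = sum(K[i][i] for i in range(len(K)))
```

In this sketch the trace of the matrix is fixed by the normalization, which is consistent with the abstract's statement that only ratios between eigenvalues, not their absolute sizes, can be recovered from the operator.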
Affiliation(s)
- Guangxing Wang
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Sisheng Liu
- School of Mathematics and Statistics, Hunan Normal University, Changsha, Hunan, China
- Fang Han
- Department of Statistics, University of Washington, Seattle, WA, USA
- Chong-Zhi Di
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
16
Zhu H, Li Y, Liu B, Yao W, Zhang R. Extreme quantile estimation for partial functional linear regression models with heavy-tailed distributions. Can J Stat 2022; 50:267-286. [PMID: 38239624 PMCID: PMC10795494 DOI: 10.1002/cjs.11653] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2020] [Accepted: 03/15/2021] [Indexed: 11/10/2022]
Abstract
In this article, we propose a novel estimator of extreme conditional quantiles in partial functional linear regression models with heavy-tailed distributions. The conventional quantile regression estimators are often unstable at the extreme tails due to data sparsity, especially for heavy-tailed distributions. We first estimate the slope function and the partially linear coefficient using a functional quantile regression based on functional principal component analysis, which is a robust alternative to the ordinary least squares regression. The extreme conditional quantiles are then estimated by using a new extrapolation technique from extreme value theory. We establish the asymptotic normality of the proposed estimator and illustrate its finite sample performance by simulation studies and an empirical analysis of diffusion tensor imaging data from a cognitive disorder study.
Affiliation(s)
- Hanbing Zhu
- School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai, China
- Yehua Li
- Department of Statistics, University of California, Riverside, California, USA
- Baisen Liu
- School of Statistics, Dongbei University of Finance and Economics, Dalian, China
- Weixin Yao
- Department of Statistics, University of California, Riverside, California, USA
- Riquan Zhang
- School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai, China
17
Abstract
We consider estimation of mean and covariance functions of functional snippets, which are short segments of functions possibly observed irregularly on an individual specific subinterval that is much shorter than the entire study interval. Estimation of the covariance function for functional snippets is challenging since information for the far off-diagonal regions of the covariance structure is completely missing. We address this difficulty by decomposing the covariance function into a variance function component and a correlation function component. The variance function can be effectively estimated nonparametrically, while the correlation part is modeled parametrically, possibly with an increasing number of parameters, to handle the missing information in the far off-diagonal regions. Both theoretical analysis and numerical simulations suggest that this hybrid strategy is effective. In addition, we propose a new estimator for the variance of measurement errors and analyze its asymptotic properties. This estimator is required for the estimation of the variance function from noisy measurements.
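The decomposition described above can be written as C(s, t) = sigma(s) * sigma(t) * rho(s, t). The sketch below only illustrates that structure. The linear form of `sigma`, the exponential correlation family in `rho`, and the value of `theta` are assumptions chosen for the example and are not taken from the paper.

```python
import math


def sigma(t):
    """Hypothetical standard-deviation function (the variance component)."""
    return 1.0 + 0.5 * t


def rho(s, t, theta=2.0):
    """Hypothetical parametric correlation: exponential decay in |s - t|."""
    return math.exp(-abs(s - t) / theta)


def covariance(s, t, theta=2.0):
    """Covariance assembled from the variance and correlation components."""
    return sigma(s) * sigma(t) * rho(s, t, theta)


# Near-diagonal pair: both time points could fall inside one observed snippet.
near = covariance(1.0, 1.2)
# Far off-diagonal pair: never observed together within any snippet, yet the
# parametric correlation still assigns it a value.
far = covariance(0.0, 10.0)
```

The design point is that the parametric correlation supplies values in the far off-diagonal region, where snippets provide no direct information, while the variance function remains free to be estimated nonparametrically.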
18
Shi H, Ma D, Faisal Beg M, Cao J. A functional proportional hazard cure rate model for interval-censored data. Stat Methods Med Res 2021; 31:154-168. [PMID: 34806480 DOI: 10.1177/09622802211052972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Existing survival models involving functional covariates typically rely on the Cox proportional hazards structure and the assumption of right censorship. Motivated by the aim of predicting the time of conversion to Alzheimer's disease from sparse biomarker trajectories in patients with mild cognitive impairment, we propose a functional mixture cure rate model with both functional and scalar covariates for interval censoring and sparsely sampled functional data. To estimate the nonparametric coefficient function that depicts the effect of the shape of the trajectories on the survival outcome and cure probability, we utilize functional principal component analysis to extract the functional features from the sparsely and irregularly sampled trajectories. To obtain parameter estimates from the mixture cure rate model with interval censoring, we apply the expectation-maximization algorithm based on Poisson data augmentation. The estimation accuracy of our method is assessed via a simulation study, and we apply our model to the Alzheimer's Disease Neuroimaging Initiative data set.
Affiliation(s)
- Haolun Shi
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Da Ma
- School of Engineering, Simon Fraser University, Burnaby, BC, Canada
- Mirza Faisal Beg
- School of Engineering, Simon Fraser University, Burnaby, BC, Canada
- Jiguo Cao
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
19
Shi H, Ma D, Nie Y, Faisal Beg M, Pei J, Cao J, The Alzheimer's Disease Neuroimaging Initiative. Early diagnosis of Alzheimer's disease on ADNI data using novel longitudinal score based on functional principal component analysis. J Med Imaging (Bellingham) 2021; 8:024502. [PMID: 33898638 DOI: 10.1117/1.jmi.8.2.024502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/22/2020] [Accepted: 03/12/2021] [Indexed: 11/14/2022] Open
Abstract
Purpose: Alzheimer's disease (AD) is a worldwide prevalent age-related neurodegenerative disease with no available cure yet. Early prognosis is therefore crucial for planning proper clinical intervention. It is especially true for people diagnosed with mild cognitive impairment, for whom the prediction of whether and when the future disease onset would happen is particularly valuable. However, such prognostic prediction has proven to be challenging, and previous studies have only achieved limited success. Approach: In this study, we seek to extract the principal component of the longitudinal disease progression trajectory in the early stage of AD, measured as the magnetic resonance imaging (MRI)-derived structural volume, to predict the onset of AD for patients with mild cognitive impairment two years ahead. Results: Cross-validation results of LASSO regression using the longitudinal functional principal component (FPC) features show significantly improved predictive power compared to training using the baseline volume 12 months before AD conversion [area under the receiver operating characteristic curve (AUC) of 0.802 versus 0.732] and 24 months before AD conversion (AUC of 0.816 versus 0.717). Conclusions: We present a framework using the FPCA to extract features from MRI-derived information collected from multiple timepoints. The results of our study demonstrate the advantageous predictive power of the population-based, FPC-derived longitudinal features for predicting disease onset compared with using only cross-sectional data based on volumetric features extracted from a single timepoint.
Affiliation(s)
- Haolun Shi
- Simon Fraser University, Department of Statistics and Actuarial Science, Burnaby, BC, Canada
- Da Ma
- Simon Fraser University, School of Engineering Science, Burnaby, BC, Canada
- Yunlong Nie
- Simon Fraser University, Department of Statistics and Actuarial Science, Burnaby, BC, Canada
- Mirza Faisal Beg
- Simon Fraser University, School of Engineering Science, Burnaby, BC, Canada
- Jian Pei
- Simon Fraser University, Department of Statistics and Actuarial Science, Burnaby, BC, Canada; Simon Fraser University, School of Computing Science, Burnaby, BC, Canada
- Jiguo Cao
- Simon Fraser University, Department of Statistics and Actuarial Science, Burnaby, BC, Canada; Simon Fraser University, School of Computing Science, Burnaby, BC, Canada
- The Alzheimer's Disease Neuroimaging Initiative
- Simon Fraser University, Department of Statistics and Actuarial Science, Burnaby, BC, Canada; Simon Fraser University, School of Engineering Science, Burnaby, BC, Canada; Simon Fraser University, School of Computing Science, Burnaby, BC, Canada
20
Barua S, Elhalawani H, Volpe S, Al Feghali KA, Yang P, Ng SP, Elgohari B, Granberry RC, Mackin DS, Gunn GB, Hutcheson KA, Chambers MS, Court LE, Mohamed ASR, Fuller CD, Lai SY, Rao A. Computed Tomography Radiomics Kinetics as Early Imaging Correlates of Osteoradionecrosis in Oropharyngeal Cancer Patients. Front Artif Intell 2021; 4:618469. [PMID: 33898983 PMCID: PMC8063205 DOI: 10.3389/frai.2021.618469] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/17/2020] [Accepted: 03/04/2021] [Indexed: 01/08/2023] Open
Abstract
Osteoradionecrosis (ORN) is a major side effect of radiation therapy in oropharyngeal cancer (OPC) patients. In this study, we demonstrate that early prediction of ORN is possible by analyzing the temporal evolution of mandibular subvolumes receiving radiation. For our analysis, we use computed tomography (CT) scans from 21 OPC patients treated with Intensity Modulated Radiation Therapy (IMRT) with subsequent radiographically proven ≥ grade II ORN, at three different time points: pre-IMRT, 2 months post-IMRT, and 6 months post-IMRT. For each patient, radiomic features were extracted from a mandibular subvolume that developed ORN and a control subvolume that received the same dose but did not develop ORN. We used a Multivariate Functional Principal Component Analysis (MFPCA) approach to characterize the temporal trajectories of these features. The proposed MFPCA model performs the best at classifying ORN vs. control subvolumes, with an area under the curve (AUC) of 0.74 [95% confidence interval (C.I.): 0.61–0.90], significantly outperforming existing approaches such as a pre-IMRT features model or a delta model based on changes at intermediate time points, i.e., at 2- and 6-month follow-up. This suggests that temporal trajectories of radiomics features derived from sequential pre- and post-RT CT scans can provide markers that are correlates of RT-induced mandibular injury, and consequently aid in earlier management of ORN.
Affiliation(s)
- Souptik Barua
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, United States; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Hesham Elhalawani
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Stefania Volpe
- Department of Radiation Oncology, European Institute of Oncology IRCCS, Milan, Italy; Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Karine A Al Feghali
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Pei Yang
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Sweet Ping Ng
- Department of Radiation Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Baher Elgohari
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Robin C Granberry
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Dennis S Mackin
- Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- G Brandon Gunn
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Katherine A Hutcheson
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Mark S Chambers
- Department of Oncologic Dentistry and Prosthodontics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Laurence E Court
- Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Abdallah S R Mohamed
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Stephen Y Lai
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Arvind Rao
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, United States; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States; Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, United States
21
Sawikowska A, Piasecka A, Kachlicki P, Krajewski P. Separation of Chromatographic Co-Eluted Compounds by Clustering and by Functional Data Analysis. Metabolites 2021; 11:metabo11040214. [PMID: 33807374 PMCID: PMC8065729 DOI: 10.3390/metabo11040214] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/02/2021] [Revised: 03/25/2021] [Accepted: 03/29/2021] [Indexed: 11/26/2022] Open
Abstract
Peak overlapping is a common problem in chromatography, mainly in the case of complex biological mixtures, i.e., metabolites. Due to the existence of the phenomenon of co-elution of different compounds with similar chromatographic properties, peak separation becomes challenging. In this paper, two computational methods of separating peaks, applied, for the first time, to large chromatographic datasets, are described, compared, and experimentally validated. The methods lead from raw observations to data that can form inputs for statistical analysis. First, in both methods, data are normalized by the mass of sample, the baseline is removed, retention time alignment is conducted, and detection of peaks is performed. Then, in the first method, clustering is used to separate overlapping peaks, whereas in the second method, functional principal component analysis (FPCA) is applied for the same purpose. Simulated data and experimental results are used as examples to present both methods and to compare them. Real data were obtained in a study of metabolomic changes in barley (Hordeum vulgare) leaves under drought stress. The results suggest that both methods are suitable for separation of overlapping peaks, but the additional advantage of the FPCA is the possibility to assess the variability of individual compounds present within the same peaks of different chromatograms.
Affiliation(s)
- Aneta Sawikowska
- Department of Mathematical and Statistical Methods, Poznań University of Life Sciences, Wojska Polskiego 28, 60-637 Poznań, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Z. Noskowskiego 12/14, 61-704 Poznań, Poland
- Correspondence: Tel.: +48-61-848-75-45
- Anna Piasecka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Z. Noskowskiego 12/14, 61-704 Poznań, Poland
- Piotr Kachlicki
- Institute of Plant Genetics, Polish Academy of Sciences, Strzeszyńska 34, 60-479 Poznań, Poland
- Paweł Krajewski
- Institute of Plant Genetics, Polish Academy of Sciences, Strzeszyńska 34, 60-479 Poznań, Poland
22
Hong Y, Su L, Song S, Yan F. Dynamic prediction of disease processes based on recurrent history and functional principal component analysis of longitudinal biomarkers: Application for ovarian epithelial cancer. Stat Med 2021; 40:2006-2023. [PMID: 33484015 DOI: 10.1002/sim.8885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2019] [Revised: 12/28/2020] [Accepted: 01/06/2021] [Indexed: 11/09/2022]
Abstract
Ovarian epithelial cancer is a gynecological tumor with a high risk of recurrence and death. In the clinical diagnosis of ovarian epithelial cancer, CA125 has become an important indicator of disease burden. To account for patient recurrence and death, a proper method is needed to integrate information from biomarkers and recurrence simultaneously. In the past 10 years, many methods have been proposed for joint modeling of longitudinal biomarkers and survival data, but few of them are applicable to longitudinal data and disease processes, including recurrence and death. In this article, we proposed a new joint frailty model based on functional principal component analysis for dynamic prediction of survival probabilities on the total time scale, which took recurrent history and longitudinal data into account simultaneously. The estimation of the joint frailty model is achieved by maximizing the penalized log-likelihood function. The simulation results demonstrated the advantages of our method in both discrimination and accuracy under different scenarios. To indicate the method's practicality, it is applied to an actual dataset of patients with ovarian epithelial cancer to predict survival dynamically using longitudinal data of biomarker CA125 and recurrent history data.
Affiliation(s)
- Yizhou Hong
- Research Center of Biostatistics and Computational Pharmacy, China Pharmaceutical University, Nanjing, China
- Liwen Su
- Research Center of Biostatistics and Computational Pharmacy, China Pharmaceutical University, Nanjing, China
- Siyi Song
- Research Center of Biostatistics and Computational Pharmacy, China Pharmaceutical University, Nanjing, China
- Fangrong Yan
- Research Center of Biostatistics and Computational Pharmacy, China Pharmaceutical University, Nanjing, China
23
Zhu Y, Huang X, Li L. Dynamic prediction of time to a clinical event with sparse and irregularly measured longitudinal biomarkers. Biom J 2020; 62:1371-1393. [PMID: 32196728 PMCID: PMC7502505 DOI: 10.1002/bimj.201900112] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/12/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 12/21/2022]
Abstract
In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post-baseline time. A simple solution is the last-value-carry-forward method, but this may result in bias for the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high-dimensional integrals without a closed-form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time-to-event occurrence. The measurement schemes for multiple biomarkers are allowed to be different within subject and across subjects. Dynamic prediction can be performed in a real-time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal functions.
Affiliation(s)
- Yayuan Zhu
- The Department of Epidemiology and Biostatistics, University of Western Ontario, London, ON, Canada
- Xuelin Huang
- The Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Liang Li
- The Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
24
Wang S, Nie Y, Sutherland JM, Wang L. Pattern discovery of health curves using an ordered probit model with Bayesian smoothing and functional principal component analysis. Stat Methods Med Res 2020; 30:458-472. [PMID: 32976070 DOI: 10.1177/0962280220951834] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
This article is motivated by the need for discovering patterns of patients' health based on their daily settings of care to aid the health policy-makers to improve the effectiveness of distributing funding for health services. The hidden process of one's health status is assumed to be a continuous smooth function, called the health curve, ranging from perfectly healthy to dead. The health curves are linked to the categorical setting of care using an ordered probit model and are inferred through Bayesian smoothing. The challenges include the nontrivial constraints on the lower bound of the health status (death) and on the model parameters to ensure model identifiability. We use the Markov chain Monte Carlo method to estimate the parameters and health curves. The functional principal component analysis is applied to the patients' estimated health curves to discover common health patterns. The proposed method is demonstrated through an application to patients hospitalized from strokes in Ontario. Whilst this paper focuses on the method's application to a health care problem, the proposed model and its implementation have the potential to be applied to many application domains in which the response variable is ordinal and there is a hidden process. Our implementation is available at https://github.com/liangliangwangsfu/healthCurveCode.
Affiliation(s)
- Shijia Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, China
- Yunlong Nie
- Department of Statistics and Actuarial Science, Simon Fraser University, Canada
- Jason M Sutherland
- Centre for Health Services and Policy Research, School of Population and Public Health, University of British Columbia, Canada
- Liangliang Wang
- Department of Statistics and Actuarial Science, Simon Fraser University, Canada
25
Gecili E, Huang R, Khoury JC, King E, Altaye M, Bowers K, Szczesniak RD. Functional data analysis and prediction tools for continuous glucose-monitoring studies. J Clin Transl Sci 2020; 5:e51. [PMID: 33948272 DOI: 10.1017/cts.2020.545] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022] Open
Abstract
Introduction: To identify phenotypes of type 1 diabetes based on glucose curves from continuous glucose-monitoring (CGM) using functional data (FD) analysis to account for longitudinal glucose patterns. We present a reliable prediction model that can accurately predict glycemic levels based on past data collected from the CGM sensor and real-time risk of hypo-/hyperglycemia for individuals with type 1 diabetes. Methods: A longitudinal cohort study of 443 type 1 diabetes patients with CGM data from a completed trial. The FD analysis approach, sparse functional principal components (FPCs) analysis, was used to identify phenotypes of type 1 diabetes glycemic variation. We employed a nonstationary stochastic linear mixed-effects model (LME) that accommodates between-patient and within-patient heterogeneity to predict glycemic levels and real-time risk of hypo-/hyperglycemia by creating specific target functions for these excursions. Results: The majority of the variation (73%) in glucose trajectories was explained by the first two FPCs. Higher order variation in the CGM profiles occurred during weeknights, although variation was higher on weekends. The model has low prediction errors and yields accurate predictions for both glucose levels and real-time risk of glycemic excursions. Conclusions: By identifying these distinct longitudinal patterns as phenotypes, interventions can be targeted to optimize type 1 diabetes management for subgroups at the highest risk for compromised long-term outcomes such as cardiac disease or stroke. Further, the estimated change/variability in an individual's glucose trajectory can be used to establish clinically meaningful and patient-specific thresholds that, when coupled with probabilistic predictive inference, provide a useful medical-monitoring tool.
26
O'Connor JD, O'Connell MDL, Romero-Ortuno R, Hernández B, Newman L, Reilly RB, Kenny RA, Knight SP. Functional Analysis of Continuous, High-Resolution Measures in Aging Research: A Demonstration Using Cerebral Oxygenation Data From the Irish Longitudinal Study on Aging. Front Hum Neurosci 2020; 14:261. [PMID: 32765238 PMCID: PMC7379867 DOI: 10.3389/fnhum.2020.00261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2020] [Accepted: 06/12/2020] [Indexed: 12/16/2022] Open
Abstract
Background: A shift towards the dynamic measurement of physiologic resilience and improved technology incorporated into experimental paradigms in aging research is producing high-resolution data. Identifying the most appropriate analysis method for this type of data is a challenge. In this work, the functional principal component analysis (fPCA) was employed to demonstrate a data-driven approach to the analysis of high-resolution data in aging research. Methods: Cerebral oxygenation during standing was measured in a large cohort [The Irish Longitudinal Study on Aging (TILDA)]. FPCA was performed on tissue saturation index (TSI) data. A regression analysis was then conducted with the functional principal component (fPC) scores as the explanatory variables and transition time as the response. Results: The mean ± SD age of the analysis sample was 64 ± 8 years. Females made up 54% of the sample and overall, 43% had tertiary education. The first PC explained 96% of the variance in cerebral oxygenation upon standing and was related to a baseline shift. Subsequent components described the recovery to before-stand levels (fPC2), drop magnitude and initial recovery (fPC3 and fPC4) as well as a temporal shift in the location of the minimum TSI value (fPC5). Transition time was associated with components describing the magnitude and timing of the nadir. Conclusions: Application of fPCA showed utility in reducing a large amount of data to a small number of parameters which summarize the inter-participant variation in TSI upon standing. A demonstration of principal component regression was provided to allow for continued use and development of data-driven approaches to high-resolution data analysis in aging research.
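The two-step workflow in this abstract (fPCA on the curves, then regression on the fPC scores) can be sketched as follows. This is an illustrative toy version only. It assumes curves sampled on a common, equally spaced grid, skips smoothing and quadrature weights, extracts only the first component by power iteration, and uses synthetic numbers; the names `first_fpc`, `simple_ols`, `base`, `shifts`, and `transition_time` are hypothetical.

```python
import math


def first_fpc(curves, n_iter=200):
    """Leading eigenvector of the sample covariance, plus per-curve scores."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    v = [float(j + 1) for j in range(m)]  # arbitrary non-zero starting vector
    for _ in range(n_iter):  # power iteration towards the top eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return v, scores


def simple_ols(x, y):
    """Intercept and slope of a least-squares line of y on a single x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Synthetic curves that differ only by a baseline shift (made-up values).
base = [60.0, 58.0, 55.0, 57.0, 59.0]
shifts = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = [[b + s for b in base] for s in shifts]
transition_time = [3.0 + 0.5 * s for s in shifts]  # synthetic response

phi, scores = first_fpc(curves)
intercept, slope = simple_ols(scores, transition_time)
```

Because the toy curves differ only by a vertical offset, the recovered component is flat across the grid, which mirrors the "baseline shift" interpretation given for the first PC above. The sign of an eigenvector is arbitrary, so the sign of `slope` carries no meaning on its own.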
Collapse
Affiliation(s)
- John D O'Connor
- The Irish Longitudinal Study on Aging, Trinity College, The University of Dublin, Dublin, Ireland
| | - Matthew D L O'Connell
- The Irish Longitudinal Study on Aging, Trinity College, The University of Dublin, Dublin, Ireland.,Department of Population Health Sciences, King's College London, London, United Kingdom
| | - Roman Romero-Ortuno
- The Irish Longitudinal Study on Aging, Trinity College, The University of Dublin, Dublin, Ireland.,The Global Brain Health Institute, Trinity College, The University of Dublin, Dublin, Ireland
| | - Belinda Hernández
- The Irish Longitudinal Study on Aging, Trinity College, The University of Dublin, Dublin, Ireland
| | - Louise Newman
- The Irish Longitudinal Study on Aging, Trinity College, The University of Dublin, Dublin, Ireland
| | - Richard B Reilly
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
| | - Rose Anne Kenny
- The Irish Longitudinal Study on Aging, Trinity College, The University of Dublin, Dublin, Ireland
| | - Silvin P Knight
- The Irish Longitudinal Study on Aging, Trinity College, The University of Dublin, Dublin, Ireland
|
27
|
Abstract
Gaussian distributions are commonly assumed when clustering functional data. When the normality condition fails, the results are biased. A further challenge is that the number of clusters is often unknown a priori. This paper focuses on clustering non-Gaussian functional data without prior information on the number of clusters. We introduce a semiparametric mixed normal transformation model to accommodate non-Gaussian functional data, and propose a penalized approach to simultaneously estimate the parameters, the transformation function, and the number of clusters. The estimators are shown to be consistent and asymptotically normal. The practical utility of the methods is confirmed via simulations as well as an application to an Alzheimer's disease study. The proposed method yields much lower classification error than the existing methods. Data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative database.
Affiliation(s)
- Qingzhi Zhong
- Center of Statistical Research and School of Statistics, Southwestern University of Finance and Economics, Chengdu, China, 611130
| | - Huazhen Lin
- Center of Statistical Research and School of Statistics, Southwestern University of Finance and Economics, Chengdu, China, 611130
| | - Yi Li
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
|
28
|
Li F, Li K, Li C, Luo S. Predicting the Risk of Huntington's Disease with Multiple Longitudinal Biomarkers. J Huntingtons Dis 2020; 8:323-332. [PMID: 31256145 DOI: 10.3233/jhd-190345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
BACKGROUND Huntington's disease (HD) has gradually become a public health threat, and there is a growing interest in developing prognostic models to predict the time for HD diagnosis. OBJECTIVE This study aims to develop a novel prognostic model that leverages multiple longitudinal biomarkers to inform the risk of HD. METHODS The multivariate functional principal component analysis was used to summarize the essential information from multiple longitudinal markers and to obtain a set of prognostic scores. The prognostic scores were used as predictors in a Cox model to predict the right-censored time to diagnosis. We used cross-validation to determine the best model in PREDICT-HD (n = 1,039) and ENROLL-HD (n = 1,776); external validation was carried out in ENROLL-HD. RESULTS We considered six commonly measured longitudinal biomarkers in PREDICT-HD and ENROLL-HD (Total Motor Score, Symbol Digit Modalities Test, Stroop Word Test, Stroop Color Test, Stroop Interference Test, and Total Functional Capacity). The prognostic model utilizing these longitudinal biomarkers significantly improved the predictive performance over the model with baseline biomarker information. A new prognostic index was computed using the proposed model, and can be dynamically updated over time as new biomarker measurements become available. CONCLUSION Longitudinal measurements of commonly measured clinical biomarkers substantially improve the risk prediction of Huntington's disease diagnosis. Calculation of the prognostic index informs the patient's risk category and facilitates patient selection in future clinical trials.
Affiliation(s)
- Fan Li
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.,Duke Clinical Research Institute, Durham, NC, USA
| | - Kan Li
- Merck Research Lab, Merck & Co, North Wales, PA, USA
| | - Cai Li
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
| | - Sheng Luo
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.,Duke Clinical Research Institute, Durham, NC, USA
|
29
|
Wang Y, Wang G, Wang L, Ogden RT. Simultaneous confidence corridors for mean functions in functional data analysis of imaging data. Biometrics 2020; 76:427-437. [PMID: 31544958 PMCID: PMC7310608 DOI: 10.1111/biom.13156] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/18/2018] [Accepted: 09/09/2019] [Indexed: 11/30/2022]
Abstract
Motivated by recent work involving the analysis of biomedical imaging data, we present a novel procedure for constructing simultaneous confidence corridors for the mean of imaging data. We propose to use flexible bivariate splines over triangulations to handle an irregular domain of the images that is common in brain imaging studies and in other biomedical imaging applications. The proposed spline estimators of the mean functions are shown to be consistent and asymptotically normal under some regularity conditions. We also provide a computationally efficient estimator of the covariance function and derive its uniform consistency. The procedure is also extended to the two-sample case in which we focus on comparing the mean functions from two populations of imaging data. Through Monte Carlo simulation studies, we examine the finite sample performance of the proposed method. Finally, the proposed method is applied to analyze brain positron emission tomography data in two different studies. One data set used in preparation of this article was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
Affiliation(s)
- Yueying Wang
- Department of Statistics, Iowa State University, Ames, Iowa
| | - Guannan Wang
- Department of Mathematics, College of William and Mary, Williamsburg, Virginia
| | - Li Wang
- Department of Statistics, Iowa State University, Ames, Iowa
| | - R. Todd Ogden
- Department of Biostatistics, Columbia University, New York, New York
|
30
|
Zhang B, Zheng K, Huang Q, Feng S, Zhou S, Zhang Y. Aircraft Engine Prognostics Based on Informative Sensor Selection and Adaptive Degradation Modeling with Functional Principal Component Analysis. Sensors (Basel) 2020; 20:E920. [PMID: 32050483 DOI: 10.3390/s20030920] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/15/2019] [Revised: 02/03/2020] [Accepted: 02/06/2020] [Indexed: 11/17/2022]
Abstract
Engine prognostics are critical to improving the safety, reliability, and operational efficiency of an aircraft. With the development of sensor technology, multiple sensors are embedded or deployed to monitor the health condition of the aircraft engine. Thus, the challenge of engine prognostics lies in how to model and predict future health by making appropriate use of this sensor information. In this paper, a prognostic approach is developed based on informative sensor selection and adaptive degradation modeling with functional data analysis. The presented approach selects sensors based on metrics and constructs a health index to characterize engine degradation by fusing the selected informative sensors. Next, the engine degradation is adaptively modeled with the functional principal component analysis (FPCA) method and future health is predicted using Bayesian inference. The prognostic approach is applied to run-to-failure data sets from the C-MAPSS test-bed developed by NASA. Results show that the proposed method can effectively select the informative sensors and accurately predict the complex degradation of the aircraft engine.
|
31
|
Fortela DLB, Farmer K, Zappi A, Sharp WW, Revellame E, Gang D, Zappi M. A Methodology for Global Sensitivity Analysis of Activated Sludge Models: Case Study with Activated Sludge Model No. 3 (ASM3). Water Environ Res 2019; 91:865-876. [PMID: 31004529 DOI: 10.1002/wer.1127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/19/2018] [Revised: 04/11/2019] [Accepted: 04/13/2019] [Indexed: 06/09/2023]
Abstract
The main objective of this study was to demonstrate a computational approach to global sensitivity analysis (GSA) integrated with functional principal component analysis (fPCA) for activated sludge models, through aggregation of time-dependent model response patterns into time-independent coefficients of functional principal components (PCs). The proposed approach addresses the main issue of the time-varying character of GSA indices when they are calculated solely on the time-dependent model outputs. The GSA-fPCA methodology was implemented using the rigorous model Activated Sludge Model No. 3 (ASM3) as a case study. The approach transforms the time-dependent model outputs into functional PCs prior to calculation of GSA indices to remove the time-varying character of the calculated GSA indices. This work focused on the evaluation of the following key computational factors that may significantly influence the performance of the GSA-fPCA methodology: (a) model parameter sampling range, (b) model simulation period, (c) basis functions system, and (d) state of the system being modeled (batch or continuous activated sludge process). Results show that the first few functional PCs capture up to 100% of the curve patterns in the time-dependent model outputs. The sensitivity indices calculated from the PC scores via Morris' GSA technique elucidated parameter sensitivity patterns inherent to the complex mathematical structure of ASM3. PRACTITIONER POINTS: Functional principal components-mediated GSA technique to remove time-varying character of sensitivity indices derived from time-dependent dynamical models. Technique amenable to improving efficiency of capturing response patterns into few functional principal components through various basis functions. Identifying priority parameters for ASM3 model calibration requires specification of target model outputs to which parameter sensitivities are calculated. GSA-fPCA offers a comprehensive numerical approach to manipulating models depending on the intended applications: simple fast-responding models to complex models.
Affiliation(s)
- Dhan Lord B Fortela
- Energy Institute of Louisiana, University of Louisiana, Lafayette, Louisiana
- Department of Chemical Engineering, University of Louisiana, Lafayette, Louisiana
| | - Kyle Farmer
- Department of Chemical Engineering, University of Louisiana, Lafayette, Louisiana
| | - Alex Zappi
- Department of Chemical Engineering, University of Louisiana, Lafayette, Louisiana
| | - Wayne W Sharp
- Energy Institute of Louisiana, University of Louisiana, Lafayette, Louisiana
- Department of Civil Engineering, University of Louisiana, Lafayette, Louisiana
| | - Emmanuel Revellame
- Department of Industrial Technology, University of Louisiana, Lafayette, Louisiana
| | - Daniel Gang
- Department of Civil Engineering, University of Louisiana, Lafayette, Louisiana
| | - Mark Zappi
- Energy Institute of Louisiana, University of Louisiana, Lafayette, Louisiana
- Department of Chemical Engineering, University of Louisiana, Lafayette, Louisiana
|
32
|
Moreno-Oyervides A, Martín-Mateos P, Aguilera-Morillo MC, Ulisse G, Arriba MC, Durban M, Rio MD, Larcher F, Krozer V, Acedo P. Early, Non-Invasive Sensing of Sustained Hyperglycemia in Mice Using Millimeter-Wave Spectroscopy. Sensors (Basel) 2019; 19:E3347. [PMID: 31366169 DOI: 10.3390/s19153347] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/28/2019] [Revised: 07/12/2019] [Accepted: 07/27/2019] [Indexed: 11/22/2022]
Abstract
Diabetes is a very complex condition affecting millions of people around the world. Its occurrence, always accompanied by sustained hyperglycemia, leads to many medical complications that can be greatly mitigated when the disease is treated in its earliest stage. In this paper, a novel sensing approach for the early non-invasive detection and monitoring of sustained hyperglycemia is presented. The sensing principle is based on millimeter-wave transmission spectroscopy through the skin and subsequent statistical analysis of the amplitude data. A classifier based on functional principal components for sustained hyperglycemia prediction was validated on a sample of twelve mice, correctly classifying the condition in diabetic mice. Using the same classifier, sixteen mice with drug-induced diabetes were studied for two weeks. The proposed sensing approach was capable of assessing the glycemic states at different stages of induced diabetes, providing a clear transition from normoglycemia to hyperglycemia typically associated with diabetes. This is believed to be the first presentation of such evolution studies using non-invasive sensing. The results obtained indicate that gradual glycemic changes associated with diabetes can be accurately detected by non-invasively sensing the metabolism using a millimeter-wave spectral sensor, with an observed temporal resolution of around four days. This unprecedented detection speed and its non-invasive character could open new opportunities for the continuous control and monitoring of diabetics and the evaluation of response to treatments (including new therapies), enabling a much more appropriate control of the condition.
|
33
|
Walker C, Warmenhoven J, Sinclair PJ, Cobley S. The application of inertial measurement units and functional principal component analysis to evaluate movement in the forward 3½ pike somersault springboard dive. Sports Biomech 2019; 18:146-162. [PMID: 31042139 DOI: 10.1080/14763141.2019.1574887] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
Abstract
Based on technological and analytical advances, the capability to more accurately and finitely examine biomechanical and skill characteristics of movement has improved. The purpose of this study was to use Inertial Measurement Units (IMUs) and Functional Principal Components Analysis (fPCA) to examine the role of movement variability (assessed via angular velocity) in 2 divers (1 international level; 1 national) performing the forward 3½ pike somersault dive. Analysis of angular velocity curves during dive flight identified 5 fPCs, accounting for 96.5% of movement variability. The national diver's scatter plots and standard deviation of fPC scores illustrated larger magnitudes of angular velocity variability across dive flight. For fPC1 and fPC3, magnitudes of SD variability were 282.6 and 201.5, respectively. The international diver illustrated more consistent angular velocity profiles, with clustering of fPC scores (e.g., fPC1 and fPC3 SDs of 75.2 and 68.0). To account for lower variability in the international diver, the ability to better coordinate movement sequences and functionally utilise feedback in response to initiation of the somersault position is highlighted. Overall, findings highlight how both IMUs and fPCA can more holistically and finitely examine the biomechanical and skill characteristics of movement sequences with the capability to inform athlete development.
Affiliation(s)
- Cherie Walker
- Faculty of Health Sciences, The University of Sydney, Lidcombe, Australia.,Applied Research Program, New South Wales Institute of Sport, Sydney Olympic Park, Australia
| | - John Warmenhoven
- Faculty of Health Sciences, The University of Sydney, Lidcombe, Australia
| | - Peter J Sinclair
- Faculty of Health Sciences, The University of Sydney, Lidcombe, Australia
| | - Stephen Cobley
- Faculty of Health Sciences, The University of Sydney, Lidcombe, Australia
|
34
|
Abstract
HIV-1C is the most prevalent subtype of HIV-1 and accounts for over half of HIV-1 infections worldwide. Host genetic influence of HIV infection has been previously studied in HIV-1B, but little attention has been paid to the more prevalent subtype C. To understand the role of host genetics in HIV-1C disease progression, we perform a study to assess the association between longitudinally collected measures of disease and more than 100,000 genetic markers located on chromosome 6. The most common approach to analyzing longitudinal data in this context is linear mixed effects models, which may be overly simplistic in this case. On the other hand, existing flexible and nonparametric methods either require densely sampled points, restrict attention to a single SNP, lack testing procedures, or are cumbersome to fit on the genome-wide scale. We propose a functional principal variance component (FPVC) testing framework which captures the nonlinearity in the CD4 and viral load with low degrees of freedom and is fast enough to carry out thousands or millions of times. The FPVC testing unfolds in two stages. In the first stage, we summarize the markers of disease progression according to their major patterns of variation via functional principal components analysis (FPCA). In the second stage, we employ a simple working model and variance component testing to examine the association between the summaries of disease progression and a set of single nucleotide polymorphisms. We supplement this analysis with simulation results which indicate that FPVC testing can offer large power gains over the standard linear mixed effects model.
Affiliation(s)
- Denis Agniel
- RAND Corporation, 1776 Main St., Santa Monica, California 90401, USA
| | - Wen Xie
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, 655 Huntington Ave, Boston, Massachusetts 02115, USA
| | - Myron Essex
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, 655 Huntington Ave, Boston, Massachusetts 02115, USA
| | - Tianxi Cai
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, 655 Huntington Ave, Boston, Massachusetts 02115, USA
|
35
|
Sheppard T, Tamblyn R, Abrahamowicz M, Lunt M, Sperrin M, Dixon WG. A comparison of methods for estimating the temporal change in a continuous variable: Example of HbA1c in patients with diabetes. Pharmacoepidemiol Drug Saf 2017; 26:1474-1482. [PMID: 28812323 PMCID: PMC5724699 DOI: 10.1002/pds.4273] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/03/2016] [Revised: 05/12/2017] [Accepted: 06/15/2017] [Indexed: 11/12/2022]
Abstract
Purpose To compare the more complex technique, functional principal component analysis (FPCA), to simpler methods of estimating values of sparse and irregularly spaced continuous variables at given time points in longitudinal data using a diabetic patient cohort from UK primary care. Methods The setting for this study is the Clinical Practice Research Datalink (CPRD), a UK general practice research database. For 16,034 diabetic patients identified in CPRD, with at least 2 measures in a 30‐month period, HbA1c was estimated after temporarily omitting (i) the final and (ii) middle known values using linear interpolation, simple linear regression, arithmetic mean, random effects, and FPCA. Performance of each method was assessed using mean prediction error. The influence on predictive accuracy of (1) more homogeneous populations and (2) number and range of known HbA1c values was explored. Results When estimating the last observation, the predictive accuracy of FPCA was highest, with over half of predicted values within 0.4 units, equivalent to laboratory measurement error. Predictive accuracy improved when estimating the middle observation, with almost 60% of predicted values within 0.4 units for FPCA. These results were marginally better than those achieved by simpler approaches, such as last‐occurrence‐carried‐forward linear interpolation. This pattern persisted with more homogeneous populations as well as when variability in HbA1c measures coupled with frequency of data points were considered. Conclusions When estimating change from baseline to prespecified time points in electronic medical records data, a marginal benefit to using the more complex modelling approach of FPCA exists over more traditional methods.
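The hold-out comparison described in this abstract can be illustrated for the simpler estimators with a short sketch. It is an assumption-laden toy, not the study's code: the helper names `estimate_held_out` and `fraction_within`, the example measurement times (months) and HbA1c-like values are hypothetical, and FPCA and random-effects models are deliberately left out. It expects at least two earlier measurements, in line with the inclusion criterion above.

```python
import numpy as np

def estimate_held_out(times, values, target_time):
    """Estimate a held-out final value from the earlier ones with three simple methods."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(times, values, 1)   # simple linear regression
    return {
        "last_value": float(values[-1]),              # last value carried forward
        "mean": float(values.mean()),                 # arithmetic mean
        "linear_regression": float(slope * target_time + intercept),
    }

def fraction_within(predicted, observed, tolerance=0.4):
    """Share of predictions within +/- tolerance units of the observed value."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(predicted - observed) <= tolerance))

# Hypothetical patient: three known values, final value at month 18 held out.
estimates = estimate_held_out([0, 6, 12], [7.0, 7.5, 8.0], target_time=18)
print(estimates)
```

Applying `fraction_within` to the predictions of each method across a cohort gives the kind of "within 0.4 units" summary the abstract reports.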
Affiliation(s)
- Therese Sheppard
- Division of Musculoskeletal and Dermatological Sciences, Arthritis Research UK Centre for Epidemiology, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
| | - Robyn Tamblyn
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Quebec, Canada.,Department of Medicine, McGill University, Quebec, Canada.,Clinical and Health Informatics Research Group, McGill University, Quebec, Canada
| | - Michal Abrahamowicz
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Quebec, Canada
| | - Mark Lunt
- Division of Musculoskeletal and Dermatological Sciences, Arthritis Research UK Centre for Epidemiology, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
| | - Matthew Sperrin
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK.,Health e-Research Centre, Farr Institute, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
| | - William G Dixon
- Division of Musculoskeletal and Dermatological Sciences, Arthritis Research UK Centre for Epidemiology, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK.,Health e-Research Centre, Farr Institute, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
|
36
|
Li K, Luo S. Functional joint model for longitudinal and time-to-event data: an application to Alzheimer's disease. Stat Med 2017; 36:3560-3572. [PMID: 28664662 DOI: 10.1002/sim.7381] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/15/2016] [Revised: 04/14/2017] [Accepted: 05/30/2017] [Indexed: 11/09/2022]
Abstract
Functional data are increasingly collected in public health and medical studies to better understand many complex diseases. Besides the functional data, other clinical measures are often collected repeatedly. Investigating the association between these longitudinal data and time to a survival event is of great interest to these studies. In this article, we develop a functional joint model (FJM) to account for functional predictors in both longitudinal and survival submodels in the joint modeling framework. The parameters of FJM are estimated in a maximum likelihood framework via expectation maximization algorithm. The proposed FJM provides a flexible framework to incorporate many features both in joint modeling of longitudinal and survival data and in functional data analysis. The FJM is evaluated by a simulation study and is applied to the Alzheimer's Disease Neuroimaging Initiative study, a motivating clinical study testing whether serial brain imaging, clinical, and neuropsychological assessments can be combined to measure the progression of Alzheimer's disease. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Kan Li
- Department of Biostatistics, The University of Texas Health Science Center at Houston, Houston, 77030, TX, U.S.A
| | - Sheng Luo
- Department of Biostatistics, The University of Texas Health Science Center at Houston, Houston, 77030, TX, U.S.A
|
37
|
Choi JY, Hwang H, Yamamoto M, Jung K, Woodward TS. A Unified Approach to Functional Principal Component Analysis and Functional Multiple-Set Canonical Correlation. Psychometrika 2017; 82:427-441. [PMID: 26856725 DOI: 10.1007/s11336-015-9478-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/05/2015] [Indexed: 06/05/2023]
Abstract
Functional principal component analysis (FPCA) and functional multiple-set canonical correlation analysis (FMCCA) are data reduction techniques for functional data that are collected in the form of smooth curves or functions over a continuum such as time or space. In FPCA, low-dimensional components are extracted from a single functional dataset such that they explain the most variance of the dataset, whereas in FMCCA, low-dimensional components are obtained from each of multiple functional datasets in such a way that the associations among the components are maximized across the different sets. In this paper, we propose a unified approach to FPCA and FMCCA. The proposed approach subsumes both techniques as special cases. Furthermore, it permits a compromise between the techniques, such that components are obtained from each set of functional data to maximize their associations across different datasets, while accounting for the variance of the data well. We propose a single optimization criterion for the proposed approach, and develop an alternating regularized least squares algorithm to minimize the criterion in combination with basis function approximations to functions. We conduct a simulation study to investigate the performance of the proposed approach based on synthetic data. We also apply the approach for the analysis of multiple-subject functional magnetic resonance imaging data to obtain low-dimensional components of blood-oxygen level-dependent signal changes of the brain over time, which are highly correlated across the subjects as well as representative of the data. The extracted components are used to identify networks of neural activity that are commonly activated across the subjects while carrying out a working memory task.
Affiliation(s)
- Ji Yeh Choi
- Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, Montreal, QC, H3A 1B1, Canada
| | - Heungsun Hwang
- Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, Montreal, QC, H3A 1B1, Canada
| | | | - Kwanghee Jung
- University of Texas Health Science Center, San Antonio, TX, USA
| | - Todd S Woodward
- University of British Columbia and British Columbia Mental Health and Addiction Research Institute, Vancouver, Canada
|
38
|
Farneti B, Di Guardo M, Khomenko I, Cappellin L, Biasioli F, Velasco R, Costa F. Genome-wide association study unravels the genetic control of the apple volatilome and its interplay with fruit texture. J Exp Bot 2017; 68:1467-1478. [PMID: 28338794 PMCID: PMC5441895 DOI: 10.1093/jxb/erx018] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 05/18/2023]
Abstract
Fruit quality represents a fundamental factor guiding consumers' preferences. Among apple quality traits, volatile organic compounds and texture features play a major role. Proton Transfer Reaction-Time of Flight-Mass Spectrometry (PTR-ToF-MS), coupled with an artificial chewing device, was used to profile the entire apple volatilome of 162 apple accessions, while the fruit texture was dissected with a TAXT-AED texture analyzer. The array of volatile compounds was classed into seven major groups and used in a genome-wide association analysis carried out with 9142 single nucleotide polymorphisms (SNPs). Marker-trait associations were identified on seven chromosomes co-locating with important candidate genes for aroma, such as MdAAT1 and MdIGS. The integration of volatilome and fruit texture data conducted with a multiple factor analysis unraveled contrasting behavior, underlying opposite regulation of the two fruit quality aspects. The association analysis using the first two principal components identified two QTLs located on chromosomes 10 and 2, respectively. The distinction of the apple accessions on the basis of the allelic configuration of two functional markers, MdPG1 and MdACO1, shed light on the type of interplay existing between fruit texture and the production of volatile organic compounds.
Affiliation(s)
- Brian Farneti
- Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige, Trento, Italy
| | - Mario Di Guardo
- Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige, Trento, Italy
- Graduate School Experimental Plant Sciences, Wageningen University, PO Box 386, 6700 AJ Wageningen, The Netherlands
| | - Iuliia Khomenko
- Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige, Trento, Italy
- Institute for Ion Physics and Applied Physics, University of Innsbruck, Technikerstr. 25/3, 6020 Innsbruck, Austria
| | - Luca Cappellin
- Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige, Trento, Italy
| | - Franco Biasioli
- Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige, Trento, Italy
| | - Riccardo Velasco
- Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige, Trento, Italy
| | - Fabrizio Costa
- Research and Innovation Centre, Fondazione Edmund Mach, via Mach 1, 38010 San Michele all'Adige, Trento, Italy
|
39
|
Salvatore S, Røislien J, Baz-Lomba JA, Bramness JG. Assessing prescription drug abuse using functional principal component analysis (FPCA) of wastewater data. Pharmacoepidemiol Drug Saf 2016; 26:320-326. [PMID: 27862608 DOI: 10.1002/pds.4127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/06/2016] [Revised: 08/29/2016] [Accepted: 10/16/2016] [Indexed: 11/11/2022]
Abstract
BACKGROUND Wastewater-based epidemiology is an alternative method for estimating the collective drug use in a community. We applied functional data analysis, a statistical framework developed for analysing curve data, to investigate weekly temporal patterns in wastewater measurements of three prescription drugs with known abuse potential: methadone, oxazepam and methylphenidate, comparing them to positive and negative control drugs. METHODS Sewage samples were collected in February 2014 from a wastewater treatment plant in Oslo, Norway. The weekly pattern of each drug was extracted by fitting of generalized additive models, using trigonometric functions to model the cyclic behaviour. From the weekly component, the main temporal features were then extracted using functional principal component analysis. Results are presented through the functional principal components (FPCs) and corresponding FPC scores. RESULTS Clinically, the most important weekly feature of the wastewater-based epidemiology data was the second FPC, representing the difference between average midweek level and a peak during the weekend, representing possible recreational use of a drug in the weekend. Estimated scores on this FPC indicated recreational use of methylphenidate, with a high weekend peak, but not for methadone and oxazepam. CONCLUSION The functional principal component analysis uncovered clinically important temporal features of the weekly patterns of the use of prescription drugs detected from wastewater analysis. This may be used as a post-marketing surveillance method to monitor prescription drugs with abuse potential. Copyright © 2016 John Wiley & Sons, Ltd.
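The idea of modelling a weekly cycle with trigonometric functions can be sketched with a plain harmonic regression. This swaps in ordinary least squares for the generalized additive models used in the study, and the function name `fit_weekly_harmonics`, the number of harmonics, and the noise-free synthetic series are assumptions made purely for illustration.

```python
import numpy as np

def fit_weekly_harmonics(day, y, n_harmonics=2):
    """Least-squares fit of an intercept plus sine/cosine terms with a 7-day period."""
    day = np.asarray(day, dtype=float)
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(day)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.sin(2.0 * np.pi * k * day / 7.0))
        columns.append(np.cos(2.0 * np.pi * k * day / 7.0))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

# Hypothetical four weeks of daily loads with a pure weekly cycle.
day = np.arange(28)
y = 10.0 + 3.0 * np.sin(2.0 * np.pi * day / 7.0)
coef, fitted = fit_weekly_harmonics(day, y)
print(np.round(coef, 3))
```

The fitted weekly curves for several drugs could then be stacked as rows of a matrix and passed to an FPCA step to extract features such as a weekend peak.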
Affiliation(s)
- Stefania Salvatore
- Norwegian Centre for Addiction Research, University of Oslo, Oslo, Norway
- Jo Røislien
- Norwegian Centre for Addiction Research, University of Oslo, Oslo, Norway; Department of Health Studies, University of Stavanger, Stavanger, Norway
- Jose A Baz-Lomba
- Norwegian Centre for Addiction Research, University of Oslo, Oslo, Norway; Norwegian Institute for Water Research, Oslo, Norway
- Jørgen G Bramness
- Norwegian Centre for Addiction Research, University of Oslo, Oslo, Norway
|
40
|
Szczesniak RD, Li D, Duan LL, Altaye M, Miodovnik M, Khoury JC. Longitudinal Patterns of Glycemic Control and Blood Pressure in Pregnant Women with Type 1 Diabetes Mellitus: Phenotypes from Functional Data Analysis. Am J Perinatol 2016; 33:1282-1290. [PMID: 27490775 PMCID: PMC5294951 DOI: 10.1055/s-0036-1586507] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 10/21/2022]
Abstract
Objective To identify phenotypes of type 1 diabetes control and associations with maternal/neonatal characteristics based on blood pressure (BP), glucose, and insulin curves during gestation, using a novel functional data analysis approach that accounts for sparse longitudinal patterns of medical monitoring during pregnancy. Methods We performed a retrospective longitudinal cohort study of women with type 1 diabetes whose BP, glucose, and insulin requirements were monitored throughout gestation as part of a program-project grant. Scores from sparse functional principal component analysis (fPCA) were used to classify gestational profiles according to the degree of control for each monitored measure. Phenotypes created using fPCA were compared with respect to maternal and neonatal characteristics and outcome. Results Most of the gestational profile variation in the monitored measures was explained by the first principal component (82-94%). Profiles clustered into three subgroups of high, moderate, or low heterogeneity, relative to the overall mean response. Phenotypes were associated with baseline characteristics, longitudinal changes in glycohemoglobin A1 and weight, and pregnancy-related outcomes. Conclusion Three distinct longitudinal patterns of glucose, insulin, and BP control were found. By identifying these phenotypes, interventions can be targeted at the subgroups at highest risk of compromised outcome, to optimize diabetes management during pregnancy.
Affiliation(s)
- Rhonda D. Szczesniak
- Division of Biostatistics & Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH; Division of Pulmonary Medicine, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH. Address for correspondence: Rhonda Szczesniak, PhD, Division of Biostatistics & Epidemiology (MLC 5041), Cincinnati Children’s Hospital Medical Center, Cincinnati, OH; Phone: (513) 803-0563; Fax: (513) 636-7509
- Dan Li
- Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH
- Leo L. Duan
- Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH
- Mekibib Altaye
- Division of Biostatistics & Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH
- Menachem Miodovnik
- Pregnancy and Perinatology Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD
- Jane C Khoury
- Division of Biostatistics & Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH; Division of Endocrinology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH
|
41
|
Petersen A, Zhao J, Carmichael O, Müller HG. Quantifying Individual Brain Connectivity with Functional Principal Component Analysis for Networks. Brain Connect 2016; 6:540-7. [PMID: 27267074 DOI: 10.1089/brain.2016.0420] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/08/2023]
Abstract
In typical functional connectivity studies, connections between voxels or regions in the brain are represented as edges in a network. Networks for different subjects are constructed at a given graph density and are summarized by some network measure such as path length. Examining these summary measures for many density values yields samples of connectivity curves, one for each individual. This has led to the adoption of basic tools of functional data analysis, most commonly to compare control and disease groups through the average curves in each group. Such group differences, however, neglect the variability in the sample of connectivity curves. In this article, the use of functional principal component analysis (FPCA) is demonstrated to enrich functional connectivity studies by providing increased power and flexibility for statistical inference. Specifically, individual connectivity curves are related to individual characteristics such as age and measures of cognitive function, thus providing a tool to relate brain connectivity with these variables at the individual level. This individual level analysis opens a new perspective that goes beyond previous group level comparisons. Using a large data set of resting-state functional magnetic resonance imaging scans, relationships between connectivity and two measures of cognitive function (episodic memory and executive function) were investigated. The group-based approach was implemented by dichotomizing the continuous cognitive variable and testing for group differences, resulting in no statistically significant findings. To demonstrate the new approach, FPCA was implemented, followed by linear regression models with cognitive scores as responses, identifying significant associations of connectivity in the right middle temporal region with both cognitive scores.
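The individual-level step described here, FPCA followed by a linear regression with a cognitive score as the response, might look roughly like the sketch below. Everything in it is simulated and hypothetical: the curves, the generating coefficients and the variable names are assumptions chosen for illustration, and the FPCA is a simple SVD on a dense common grid rather than the estimator used in the article.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 30
x = np.linspace(0.0, 1.0, m)                 # stand-in for graph density values

# Simulated connectivity curves driven by two latent subject-level effects.
phi1, phi2 = np.sin(np.pi * x), np.sin(2.0 * np.pi * x)
a = 2.0 * rng.standard_normal(n)
b = rng.standard_normal(n)
curves = (
    (1.0 + x)
    + np.outer(a, phi1)
    + np.outer(b, phi2)
    + 0.1 * rng.standard_normal((n, m))
)

# Simulated cognitive score that depends on the first latent effect only.
y = 1.0 + 0.5 * a + 0.2 * rng.standard_normal(n)

# Step 1: FPCA via SVD of the centred curves; keep two FPC scores per subject.
centred = curves - curves.mean(axis=0)
_, s, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T

# Step 2: ordinary least squares of the cognitive score on the FPC scores.
X = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
```

Because the scores are continuous subject-level summaries, the regression uses the full cognitive variable instead of a dichotomized version, which is the contrast with the group-based approach that the abstract draws.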
Affiliation(s)
- Alexander Petersen
- Department of Statistics and Applied Probability, University of California, Santa Barbara, Santa Barbara, California
- Jianyang Zhao
- Department of Statistics and Applied Probability, University of California, Santa Barbara, Santa Barbara, California
- Owen Carmichael
- Pennington Biomedical Research Center, Louisiana State University, Baton Rouge, Louisiana
- Hans-Georg Müller
- Department of Statistics and Applied Probability, University of California, Santa Barbara, Santa Barbara, California
|
42
|
Lee E, Zhu H, Kong D, Wang Y, Giovanello KS, Ibrahim JG. BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER'S DISEASE. Ann Appl Stat 2015; 9:2153-2178. [PMID: 26900412 DOI: 10.1214/15-aoas879] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/29/2022]
Abstract
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.
Affiliation(s)
- Eunjee Lee
- Departments of Statistics and Operation Research, Biostatistics, and Psychology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu
- Departments of Statistics and Operation Research, Biostatistics, and Psychology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dehan Kong
- Departments of Statistics and Operation Research, Biostatistics, and Psychology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yalin Wang
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-8809
- Kelly Sullivan Giovanello
- Departments of Statistics and Operation Research, Biostatistics, and Psychology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Joseph G Ibrahim
- Departments of Statistics and Operation Research, Biostatistics, and Psychology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
43
|
Abstract
Integrative analysis of genomic and anatomical imaging data, which is emerging but not yet well developed, provides invaluable information for the holistic discovery of the genomic structure of disease and has the potential to open a new avenue for discovering novel disease susceptibility genes that cannot be identified when the two data types are analyzed separately. A key issue for the success of imaging and genomic data analysis is how to reduce their dimensions. Most previous methods for imaging information extraction and RNA-seq data reduction do not explore imaging spatial information and often ignore gene expression variation at the genomic positional level. To overcome these limitations, we extend functional principal component analysis from one dimension to two dimensions (2DFPCA) for representing imaging data and develop a multiple functional linear model (MFLM) in which functional principal scores of images are taken as multiple quantitative traits and the RNA-seq profile across a gene is taken as a function predictor for assessing the association of gene expression with images. The developed method has been applied to image and RNA-seq data from ovarian cancer and kidney renal clear cell carcinoma (KIRC) studies. We identified 24 and 84 genes whose expression was associated with imaging variations in the ovarian cancer and KIRC studies, respectively. Our results showed that many genes significantly associated with images were not differentially expressed, but revealed their morphological and metabolic functions. The results also demonstrated that the peaks of the estimated regression coefficient function in the MFLM often allowed the discovery of splicing sites and multiple isoforms of gene expression.
|
44
|
Abstract
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors, which may be observed with additional error, and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework are based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.
|
45
|
Abstract
We propose localized functional principal component analysis (LFPCA), looking for orthogonal basis functions with localized support regions that explain most of the variability of a random process. The LFPCA is formulated as a convex optimization problem through a novel Deflated Fantope Localization method and is implemented through an efficient algorithm to obtain the global optimum. We prove that the proposed LFPCA converges to the original FPCA when the tuning parameters are chosen appropriately. Simulation shows that the proposed LFPCA with tuning parameters chosen by cross validation can almost perfectly recover the true eigenfunctions and significantly improve the estimation accuracy when the eigenfunctions are truly supported on some subdomains. In the scenario where the original eigenfunctions are not localized, the proposed LFPCA also serves as a useful tool for finding orthogonal basis functions that balance interpretability against the capability of explaining the variability of the data. An analysis of country mortality data reveals interesting features that cannot be found by standard FPCA methods.
Affiliation(s)
- Kehui Chen
- University of Pittsburgh and Carnegie Mellon University
- Jing Lei
- University of Pittsburgh and Carnegie Mellon University
|
46
|
Abstract
We consider analysis of sparsely sampled multilevel functional data, where the basic observational unit is a function and data have a natural hierarchy of basic units. An example is when functions are recorded at multiple visits for each subject. Multilevel functional principal component analysis (MFPCA; Di et al. 2009) was proposed for such data when functions are densely recorded. Here we consider the case when functions are sparsely sampled and may contain only a few observations per function. We exploit the multilevel structure of covariance operators and achieve data reduction by principal component decompositions at both between and within subject levels. We address inherent methodological differences in the sparse sampling context to: 1) estimate the covariance operators; 2) estimate the functional principal component scores; 3) predict the underlying curves. Through simulations the proposed method is able to discover dominating modes of variations and reconstruct underlying curves well even in sparse settings. Our approach is illustrated by two applications, the Sleep Heart Health Study and eBay auctions.
Affiliation(s)
- Chongzhi Di
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, M2-B500, Seattle, WA 98115, USA
- Ciprian M Crainiceanu
- Department of Biostatistics, Johns Hopkins University, 615 North Wolfe Street, Baltimore, MD 21205, USA
- Wolfgang S Jank
- Department of Information Systems and Decision Sciences, University of South Florida, Tampa, FL 33620, USA
|
47
|
Sørensen H, Goldsmith J, Sangalli LM. An introduction with medical applications to functional data analysis. Stat Med 2013; 32:5222-40. [PMID: 24114808 DOI: 10.1002/sim.5989] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 11/30/2012] [Accepted: 08/27/2013] [Indexed: 11/11/2022]
Abstract
Functional data are data that can be represented by suitable functions, such as curves (potentially multi-dimensional) or surfaces. This paper gives an introduction to some basic but important techniques for the analysis of such data, and we apply the techniques to two datasets from biomedicine. One dataset is about white matter structures in the brain in multiple sclerosis patients; the other dataset is about three-dimensional vascular geometries collected for the study of cerebral aneurysms. The techniques described are smoothing, alignment, principal component analysis, and regression.
Affiliation(s)
- Helle Sørensen
- Laboratory for Applied Statistics, Department of Mathematical Sciences, University of Copenhagen, Denmark
|
48
|
Abstract
Functional and longitudinal data are becoming more and more common in practice. This paper focuses on sparse and irregular longitudinal data with a multicategory response. The predictor consists of sparse and irregular observations, potentially contaminated with measurement errors, on the predictor trajectory. To deal with this type of complicated predictor, we borrow the strength of large margin classifiers in statistical learning for classification of sparse and irregular longitudinal data. In particular, we propose functional robust truncated-hinge-loss support vector machines to perform multicategory classification with the aid of functional principal component analysis.
Affiliation(s)
- Yichao Wu
- Department of Statistics, North Carolina State University, Raleigh, NC 27695
- Yufeng Liu
- Department of Statistics and Operations Research, Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599
|
49
|
Abstract
BACKGROUND Functional data analysis (FDA) is increasingly being used to better analyze, model and predict time series data. Key aspects of FDA include the choice of smoothing technique, data reduction, adjustment for clustering, functional linear modeling and forecasting methods. METHODS A systematic review using 11 electronic databases was conducted to identify FDA application studies published in the peer-reviewed literature during 1995-2010. Papers reporting methodological considerations only were excluded, as were non-English articles. RESULTS In total, 84 FDA application articles were identified; 75.0% of the reviewed articles have been published since 2005. Applications of FDA have appeared in a large number of publications across various fields of science; the largest share relates to biomedical applications (21.4%). Overall, 72 studies (85.7%) provided information about the type of smoothing technique used, with B-spline smoothing (29.8%) being the most popular. Functional principal component analysis (FPCA) for extracting information from functional data was reported in 51 (60.7%) studies. One-quarter (25.0%) of the published studies used functional linear models to describe relationships between explanatory and outcome variables and only 8.3% used FDA for forecasting time series data. CONCLUSIONS Despite its clear benefits for analyzing time series data, full appreciation of the key features and value of FDA has been limited to date, though the applications show its relevance to many public health and biomedical problems. Wider application of FDA to all studies involving correlated measurements should allow better modeling of, and predictions from, such data in the future, especially as FDA makes no a priori assumptions about age and time effects.
Affiliation(s)
- Shahid Ullah
- Flinders Centre for Epidemiology and Biostatistics, School of Medicine, Faculty of Health Sciences, Flinders University, Adelaide, SA, 5001, Australia
- Caroline F Finch
- Centre for Healthy and Safe Sports (CHASS), University of Ballarat, SMB Campus, Ballarat, VIC, 3353, Australia
|