1. Kalogridis I, Van Aelst S. Robust penalized estimators for functional linear regression. J Multivar Anal 2022. [DOI: 10.1016/j.jmva.2022.105104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
2. Hesamian G, Akbari MG. A fuzzy functional linear regression model with functional predictors and fuzzy responses. Soft Comput 2022. [DOI: 10.1007/s00500-021-06435-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
3. Tan CW, Bergmeir C, Petitjean F, Webb GI. Time series extrinsic regression: Predicting numeric values from time series data. Data Min Knowl Discov 2021; 35:1032-1060. [PMID: 33727888 PMCID: PMC7951134 DOI: 10.1007/s10618-021-00745-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/22/2020] [Accepted: 02/17/2021] [Indexed: 12/02/2022]
Abstract
This paper studies time series extrinsic regression (TSER): a regression task of which the aim is to learn the relationship between a time series and a continuous scalar variable; a task closely related to time series classification (TSC), which aims to learn the relationship between a time series and a categorical class label. This task generalizes time series forecasting, relaxing the requirement that the value predicted be a future value of the input series or primarily depend on more recent values. In this paper, we motivate and study this task, and benchmark existing solutions and adaptations of TSC algorithms on a novel archive of 19 TSER datasets which we have assembled. Our results show that the state-of-the-art TSC algorithm Rocket, when adapted for regression, achieves the highest overall accuracy compared to adaptations of other TSC algorithms and state-of-the-art machine learning (ML) algorithms such as XGBoost, Random Forest and Support Vector Regression. More importantly, we show that much research is needed in this field to improve the accuracy of ML models. We also find evidence that further research has excellent prospects of improving upon these straightforward baselines.
Affiliation(s)
- Chang Wei Tan
  - Faculty of Information Technology, Monash University, 25 Exhibition Walk, Melbourne, VIC 3800 Australia
- Christoph Bergmeir
  - Faculty of Information Technology, Monash University, 25 Exhibition Walk, Melbourne, VIC 3800 Australia
- François Petitjean
  - Faculty of Information Technology, Monash University, 25 Exhibition Walk, Melbourne, VIC 3800 Australia
- Geoffrey I Webb
  - Faculty of Information Technology, Monash University, 25 Exhibition Walk, Melbourne, VIC 3800 Australia
4. Cai X, Xue L, Wang Z. Robust estimation with modified Huber's function for functional linear models. STATISTICS-ABINGDON 2020. [DOI: 10.1080/02331888.2020.1862114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Xiong Cai
  - College of Applied Sciences, Beijing University of Technology, Beijing, People's Republic of China
- Liugen Xue
  - College of Applied Sciences, Beijing University of Technology, Beijing, People's Republic of China
- Zhaoliang Wang
  - School of Mathematics and Information Science, Henan Polytechnic University, Jiaozuo, People's Republic of China
5. Regression models using shapes of functions as predictors. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2020.107017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
6. Luo R, Qi X. Functional Regression for Densely Observed Data With Novel Regularization. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2020.1807994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Ruiyan Luo
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA
- Xin Qi
  - Department of Mathematics and Statistics, Georgia State University, Atlanta, GA
7. Shang HL. Bayesian bandwidth estimation and semi-metric selection for a functional partial linear model with unknown error density. J Appl Stat 2020; 48:583-604. [PMID: 35706989 PMCID: PMC9041737 DOI: 10.1080/02664763.2020.1736527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2019] [Accepted: 02/23/2020] [Indexed: 10/24/2022]
Abstract
This study examines the optimal selections of bandwidth and semi-metric for a functional partial linear model. Our proposed method begins by estimating the unknown error density using a kernel density estimator of residuals, where the regression function, consisting of parametric and nonparametric components, can be estimated by functional principal component and functional Nadaraya-Watson estimators. The estimation accuracy of the regression function and error density crucially depends on the optimal estimations of bandwidth and semi-metric. A Bayesian method is utilized to simultaneously estimate the bandwidths in the regression function and kernel error density by minimizing the Kullback-Leibler divergence. For estimating the regression function and error density, a series of simulation studies demonstrate that the functional partial linear model gives improved estimation and forecast accuracies compared with the functional principal component regression and functional nonparametric regression. Using a spectroscopy dataset, the functional partial linear model yields better forecast accuracy than some commonly used functional regression models. As a by-product of the Bayesian method, a pointwise prediction interval can be obtained, and marginal likelihood can be used to select the optimal semi-metric.
Affiliation(s)
- Han Lin Shang
  - Research School of Finance, Actuarial Studies and Statistics, Australian National University, Canberra, Australia
  - Department of Actuarial Studies and Business Analytics, Macquarie University, Sydney, Australia
8. Kalogridis I, Van Aelst S. Robust functional regression based on principal components. J Multivar Anal 2019. [DOI: 10.1016/j.jmva.2019.04.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
9.
10. Happ C, Greven S, Schmid VJ. The impact of model assumptions in scalar-on-image regression. Stat Med 2018; 37:4298-4317. [PMID: 30132932 DOI: 10.1002/sim.7915] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/10/2018] [Revised: 06/20/2018] [Accepted: 06/27/2018] [Indexed: 11/11/2022]
Abstract
Complex statistical models such as scalar-on-image regression often require strong assumptions to overcome the issue of nonidentifiability. While in theory, it is well understood that model assumptions can strongly influence the results, this seems to be underappreciated, or played down, in practice. This article gives a systematic overview of the main approaches for scalar-on-image regression with a special focus on their assumptions. We categorize the assumptions and develop measures to quantify the degree to which they are met. The impact of model assumptions and the practical usage of the proposed measures are illustrated in a simulation study and in an application to neuroimaging data. The results show that different assumptions indeed lead to quite different estimates with similar predictive ability, raising the question of their interpretability. We give recommendations for making modeling and interpretation decisions in practice based on the new measures and simulations using hypothetic coefficient images and the observed data.
Affiliation(s)
- Clara Happ
  - Department of Statistics, LMU Munich, Munich, Germany
- Sonja Greven
  - Department of Statistics, LMU Munich, Munich, Germany
11. Reiss PT, Goldsmith J, Shang HL, Ogden RT. Methods for scalar-on-function regression. Int Stat Rev 2017; 85:228-249. [PMID: 28919663 PMCID: PMC5598560 DOI: 10.1111/insr.12163] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 06/08/2015] [Accepted: 12/28/2015] [Indexed: 01/16/2023]
Abstract
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.
Affiliation(s)
- Philip T. Reiss
  - Department of Child and Adolescent Psychiatry and Department of Population Health, New York University School of Medicine
  - Department of Statistics, University of Haifa
- Jeff Goldsmith
  - Department of Biostatistics, Columbia University Mailman School of Public Health
- Han Lin Shang
  - Research School of Finance, Actuarial Studies and Statistics, Australian National University
- R. Todd Ogden
  - Department of Biostatistics, Columbia University Mailman School of Public Health
  - New York State Psychiatric Institute
12. Goldstein BA, Pomann GM, Winkelmayer WC, Pencina MJ. A comparison of risk prediction methods using repeated observations: an application to electronic health records for hemodialysis. Stat Med 2017; 36:2750-2763. [PMID: 28464332 PMCID: PMC5494276 DOI: 10.1002/sim.7308] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 10/18/2016] [Revised: 03/16/2017] [Accepted: 03/22/2017] [Indexed: 12/12/2022]
Abstract
An increasingly important data source for the development of clinical risk prediction models is electronic health records (EHRs). One of their key advantages is that they contain data on many individuals collected over time. This allows one to incorporate more clinical information into a risk model. However, traditional methods for developing risk models are not well suited to these irregularly collected clinical covariates. In this paper, we compare a range of approaches for using longitudinal predictors in a clinical risk model. Using data from an EHR for patients undergoing hemodialysis, we incorporate five different clinical predictors into a risk model for patient mortality. We consider different approaches for treating the repeated measurements including use of summary statistics, machine learning methods, functional data analysis, and joint models. We follow up our empirical findings with a simulation study. Overall, our results suggest that simple approaches perform just as well as, if not better than, more complex analytic approaches. These results have important implications for the development of risk prediction models with EHRs. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Benjamin A Goldstein
  - Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, Durham, 27705, NC, U.S.A
  - Center for Predictive Medicine, Duke Clinical Research Institute, Durham, NC, 27705, U.S.A
- Gina Maria Pomann
  - Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, Durham, 27705, NC, U.S.A
- Michael J Pencina
  - Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, Durham, 27705, NC, U.S.A
  - Center for Predictive Medicine, Duke Clinical Research Institute, Durham, NC, 27705, U.S.A
13.
14. Simultaneous detection of multiple bioactive pollutants using a multiparametric biochip for water quality monitoring. Biosens Bioelectron 2015; 72:71-9. [PMID: 25957833 DOI: 10.1016/j.bios.2015.04.092] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/11/2015] [Revised: 04/26/2015] [Accepted: 04/27/2015] [Indexed: 01/08/2023]
Abstract
Water is a renewable yet finite resource. Its sustainable use and the maintenance of good quality are essential for an intact environment, human life and a stable economy. Emerging technologies aim for continuous monitoring of water quality, overcoming periodic analytical sampling, and providing information on the current state of inshore waters in real time. So does the cell-based sensor system presented here, which uses RLC-18 cells (rat liver cells) as the detection layer for water pollutants. The electrical read-out of the system (cellular metabolism, oxygen consumption and morphological integrity) detects small changes in water quality and indicates possible physiological damage. A generalized functional linear model was implemented in order to regress the chemicals present in the sample on the electrical read-out. The environmental pollutants chosen to test the system were chlorpyrifos, an organophosphate pesticide, and tetrabromobisphenol A, a flame retardant. Each chemical gives a very characteristic response, but the toxicity is mitigated if both chemicals are present at once. This will focus our attention on the statistical approach, which is able to discriminate between these pollutants.
15. Shang HL. Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density. J Nonparametr Stat 2014. [DOI: 10.1080/10485252.2014.916806] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 10/25/2022]