1
Tyralis H, Papacharalampous G, Dogulu N, Chun KP. Deep Huber quantile regression networks. Neural Netw 2025; 187:107364. [PMID: 40112635 DOI: 10.1016/j.neunet.2025.107364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 01/06/2025] [Accepted: 03/04/2025] [Indexed: 03/22/2025]
Abstract
Typical machine learning regression applications aim to report the mean or the median of the predictive probability distribution, via training with a squared or an absolute error scoring function. The importance of issuing predictions of more functionals of the predictive probability distribution (quantiles and expectiles) has been recognized as a means to quantify the uncertainty of the prediction. In deep learning (DL) applications, that is possible through quantile and expectile regression neural networks (QRNN and ERNN respectively). Here we introduce deep Huber quantile regression networks (DHQRN) that nest QRNN and ERNN as edge cases. DHQRN can predict Huber quantiles, which are more general functionals in the sense that they nest quantiles and expectiles as limiting cases. The main idea is to train a DL algorithm with the Huber quantile scoring function, which is consistent for the Huber quantile functional. As a proof of concept, DHQRN are applied to predict house prices in Melbourne, Australia and Boston, United States (US). In this context, predictive performances of three DL architectures are discussed along with evidential interpretation of results from two economic case studies. Additional simulation experiments and applications to real-world case studies using open datasets demonstrate a satisfactory absolute performance of DHQRN.
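The training objective described in this abstract can be illustrated with a small sketch. The NumPy functions below implement one common form of asymmetric Huber loss: a standard Huber function of the residual, weighted by |1(pred >= y) - tau|. The names `huber`, `huber_quantile_loss`, `tau` and `delta` are illustrative assumptions, and the paper's exact parameterization of the Huber quantile scoring function may differ.

```python
import numpy as np


def huber(u, delta):
    """Standard Huber function: quadratic near zero, linear in the tails."""
    a = np.abs(u)
    return np.where(a <= delta, 0.5 * u**2, delta * (a - 0.5 * delta))


def huber_quantile_loss(y, pred, tau, delta):
    """Mean asymmetric Huber loss (hypothetical helper, see lead-in).

    tau in (0, 1) sets the asymmetry, delta > 0 the quadratic/linear threshold.
    """
    u = np.asarray(pred, dtype=float) - np.asarray(y, dtype=float)
    # Weight 1 - tau when the prediction is at or above y, tau when it is below.
    w = np.abs((u >= 0).astype(float) - tau)
    return np.mean(w * huber(u, delta))
```

For large `delta` this loss equals half the asymmetric squared (expectile) loss, and after division by `delta` it approaches the pinball (quantile) loss as `delta` shrinks, which mirrors the limiting cases mentioned in the abstract.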
Affiliation(s)
- Hristos Tyralis
  - Department of Topography, School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, Iroon Polytechniou 5, Zografou 157 80, Greece; Construction Agency, Hellenic Air Force, Mesogion Avenue 227-231, Cholargos 15 561, Greece
- Georgia Papacharalampous
  - Department of Topography, School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, Iroon Polytechniou 5, Zografou 157 80, Greece
- Nilay Dogulu
  - Hydrology, Water Resources and Cryosphere Branch, World Meteorological Organisation (WMO), Geneva, Switzerland
- Kwok P Chun
  - Department of Geography and Environmental Management, University of the West of England, Bristol, United Kingdom

2
Liu X, Tan Z, Wu Y, Zhou Y. The Financial Risk Measurement EVaR Based on DTARCH Models. Entropy (Basel, Switzerland) 2023; 25:1204. [PMID: 37628234 PMCID: PMC10453247 DOI: 10.3390/e25081204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 08/01/2023] [Accepted: 08/09/2023] [Indexed: 08/27/2023]
Abstract
The value at risk based on expectile (EVaR) is a very useful method to measure financial risk, especially in measuring extreme financial risk. The double-threshold autoregressive conditional heteroscedastic (DTARCH) model is a valuable tool in assessing the volatility of a financial asset's return. A significant characteristic of DTARCH models is that their conditional mean and conditional variance functions are both piecewise linear, involving double thresholds. This paper proposes the weighted composite expectile regression (WCER) estimation of the DTARCH model based on expectile regression theory. Therefore, we can use EVaR to predict extreme financial risk, especially when the conditional mean and the conditional variance of asset returns are nonlinear. Unlike the existing papers on DTARCH models, we do not assume that the threshold and delay parameters are known. Using simulation studies, it has been demonstrated that the proposed WCER estimation exhibits adequate and promising performance in finite samples. Finally, the proposed approach is used to analyze the daily Hang Seng Index (HSI) and the Standard & Poor's 500 Index (SPI).
Affiliation(s)
- Xiaoqian Liu
  - Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada
- Zhenni Tan
  - Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada
- Yuehua Wu
  - Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada
- Yong Zhou
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE, and Academy of Statistics and Interdisciplinary Sciences and School of Statistics, East China Normal University, Shanghai 200062, China

3
Cui Y, Zheng S. Iteratively reweighted least square for kernel expectile regression with random features. J Stat Comput Sim 2023. [DOI: 10.1080/00949655.2023.2182304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Affiliation(s)
- Yue Cui
  - Department of Mathematics, Missouri State University, Springfield, MO, USA
- Songfeng Zheng
  - Department of Mathematics, Missouri State University, Springfield, MO, USA

4
Barry A, Bhagwat N, Misic B, Poline JB, Greenwood CMT. Asymmetric influence measure for high dimensional regression. Commun Stat-Theor M 2022. [DOI: 10.1080/03610926.2020.1841793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Amadou Barry
  - Departments of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Lady Davis Institute, Jewish General Hospital, Montreal, Québec, Canada
- Nikhil Bhagwat
  - Faculty of Medicine, Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McConnell Brain Imaging Centre, McGill University, Montreal, Québec, Canada
- Bratislav Misic
  - Faculty of Medicine, Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McConnell Brain Imaging Centre, McGill University, Montreal, Québec, Canada
- Jean-Baptiste Poline
  - Faculty of Medicine, Department of Neurology and Neurosurgery, Montreal Neurological Institute and Hospital, McConnell Brain Imaging Centre, McGill University, Montreal, Québec, Canada
  - Henry H. Wheeler Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
  - Ludmer Centre for Neuroinformatics & Mental Health, McGill University, Montreal, Québec, Canada
- Celia M. T. Greenwood
  - Departments of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Lady Davis Institute, Jewish General Hospital, Montreal, Québec, Canada
  - Ludmer Centre for Neuroinformatics & Mental Health, McGill University, Montreal, Québec, Canada
  - Departments of Oncology and Human Genetics, McGill University, Montreal, Québec, Canada

5
Girard S, Stupfler G, Usseglio-Carleve A. On automatic bias reduction for extreme expectile estimation. Statistics and Computing 2022; 32:64. [PMID: 35968040 PMCID: PMC9362073 DOI: 10.1007/s11222-022-10118-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2021] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
Expectiles induce a law-invariant risk measure that has recently gained popularity in actuarial and financial risk management applications. Unlike quantiles or the quantile-based Expected Shortfall, the expectile risk measure is coherent and elicitable. The estimation of extreme expectiles in the heavy-tailed framework, which is reasonable for extreme financial or actuarial risk management, is not without difficulties; currently available estimators of extreme expectiles are typically biased and hence may show poor finite-sample performance even in fairly large samples. We focus here on the construction of bias-reduced extreme expectile estimators for heavy-tailed distributions. The rationale for our construction hinges on a careful investigation of the asymptotic proportionality relationship between extreme expectiles and their quantile counterparts, as well as of the extrapolation formula motivated by the heavy-tailed context. We accurately quantify and estimate the bias incurred by the use of these relationships when constructing extreme expectile estimators. This motivates the introduction of classes of bias-reduced estimators whose asymptotic properties are rigorously shown, and whose finite-sample properties are assessed on a simulation study and three samples of real data from economics, insurance and finance. Supplementary Information: The online version contains supplementary material available at 10.1007/s11222-022-10118-x.
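For orientation, the two relationships this abstract refers to are usually written as follows in the extreme-value literature. The notation is an assumption and does not come from the abstract: $\xi_\tau$ denotes the expectile and $q_\tau$ the quantile at level $\tau$, and $\gamma \in (0,1)$ is the tail index of the heavy-tailed distribution. The first expression is the asymptotic proportionality, the second a Weissman-type extrapolation from an intermediate level $\tau$ to a more extreme level $\tau'$.

```latex
\frac{\xi_\tau}{q_\tau} \longrightarrow \left(\gamma^{-1} - 1\right)^{-\gamma}
\quad \text{as } \tau \to 1,
\qquad
q_{\tau'} \approx \left(\frac{1-\tau'}{1-\tau}\right)^{-\gamma} q_{\tau}
\quad (\tau < \tau' < 1).
```

Both statements hold only approximately at finite levels, which is the source of the bias the article sets out to reduce.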
Affiliation(s)
- Stéphane Girard
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Gilles Stupfler
  - Univ. Rennes, Ensai, CNRS, CREST, UMR 9194, 35000 Rennes, France

6
Padoan SA, Stupfler G. Joint inference on extreme expectiles for multivariate heavy-tailed distributions. Bernoulli 2022. [DOI: 10.3150/21-bej1375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Simone A. Padoan
  - Department of Decision Sciences, Bocconi University, via Roentgen 1, 20136 Milano, Italy

7
The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Statistics & Risk Modeling 2021. [DOI: 10.1515/strm-2019-0029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The main purpose of the present paper is to investigate the problem of the nonparametric estimation of the expectile regression in which the response variable is scalar while the covariate is a random function. More precisely, an estimator is constructed by using the k Nearest Neighbor procedures (kNN). The main contribution of this study is the establishment of the Uniform consistency in Number of Neighbors (UNN) of the constructed estimator. The usefulness of our result for the smoothing parameter automatic selection is discussed. Short simulation results show that the finite sample performance of the proposed estimator is satisfactory in moderate sample sizes. We finally examine the implementation of this model in practice with real data in financial risk analysis.

9
Xu Q, Ding X, Jiang C, Yu K, Shi L. An elastic-net penalized expectile regression with applications. J Appl Stat 2021; 48:2205-2230. [DOI: 10.1080/02664763.2020.1787355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Q.F. Xu
  - School of Management, Hefei University of Technology, Hefei, People's Republic of China
  - Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei, People's Republic of China
- X.H. Ding
  - School of Management, Hefei University of Technology, Hefei, People's Republic of China
- C.X. Jiang
  - School of Management, Hefei University of Technology, Hefei, People's Republic of China
- K.M. Yu
  - Department of Mathematics, Brunel University London, Uxbridge, UK
- L. Shi
  - School of Computer Science and Technology, Huaibei Normal University, Huaibei, People's Republic of China

10
Ji Y, Shi H. Shrinkage estimation of fixed and random effects in linear quantile mixed models. J Appl Stat 2021; 49:3693-3716. [DOI: 10.1080/02664763.2021.1962262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Yonggang Ji
  - School of Science, Civil Aviation University of China, Tianjin, People's Republic of China
- Haifang Shi
  - School of Science, Civil Aviation University of China, Tianjin, People's Republic of China

11
Seipp A, Uslar V, Weyhe D, Timmer A, Otto-Sobotka F. Weighted expectile regression for right-censored data. Stat Med 2021; 40:5501-5520. [PMID: 34272749 DOI: 10.1002/sim.9137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Revised: 06/04/2021] [Accepted: 06/29/2021] [Indexed: 01/01/2023]
Abstract
Expectile regression can be used to analyze the entire conditional distribution of a response, omitting all distributional assumptions. Among its benefits are computational simplicity, efficiency, and the possibility to incorporate a semiparametric predictor. Due to its advantages in full data settings, we propose an extension to right-censored data situations, where conventional methods typically focus only on mean effects. We propose to extend expectile regression with inverse probability weights. Estimates are easy to implement and computationally simple. Expectiles can be converted to more easily interpreted tail expectations, that is, the expected residual life. It provides a meaningful effect measure, similar to the hazard rate. The results from an extensive simulation study are presented, evaluating consistency and sensitivity to violations of assumptions. We use the proposed method to analyze survival times of colorectal cancer patients from a regional certified high volume cancer center.
Affiliation(s)
- Alexander Seipp
  - Division of Epidemiology and Biometry, Faculty of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
- Verena Uslar
  - University Hospital for General and Visceral Surgery, Pius-Hospital Oldenburg, Oldenburg, Germany
- Dirk Weyhe
  - University Hospital for General and Visceral Surgery, Pius-Hospital Oldenburg, Oldenburg, Germany
- Antje Timmer
  - Division of Epidemiology and Biometry, Faculty of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
- Fabian Otto-Sobotka
  - Division of Epidemiology and Biometry, Faculty of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany

12
Abstract
Expectiles have gained considerable attention in recent years due to wide applications in many areas. In this study, the k-nearest neighbours approach, together with the asymmetric least squares loss function, called ex-kNN, is proposed for computing expectiles. Firstly, the effect of various distance measures on ex-kNN in terms of test error and computational time is evaluated. It is found that Canberra, Lorentzian, and Soergel distance measures lead to minimum test error, whereas Euclidean, Canberra, and Average of (L1,L∞) lead to a low computational cost. Secondly, the performance of ex-kNN is compared with existing packages er-boost and ex-svm for computing expectiles that are based on nine real life examples. Depending on the nature of data, the ex-kNN showed two to 10 times better performance than er-boost and comparable performance with ex-svm regarding test error. Computationally, the ex-kNN is found two to five times faster than ex-svm and much faster than er-boost, particularly, in the case of high dimensional data.
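The idea behind ex-kNN can be sketched in a few lines. The NumPy code below is a minimal illustration and not the authors' implementation: it assumes Euclidean distance (one of the measures the abstract mentions) and computes the sample expectile of the neighbours' responses by a fixed-point iteration on the asymmetric least squares condition. The function names `expectile` and `knn_expectile` are hypothetical.

```python
import numpy as np


def expectile(y, tau, n_iter=100, tol=1e-10):
    """Sample tau-expectile: the minimizer of the asymmetric least squares loss."""
    y = np.asarray(y, dtype=float)
    e = y.mean()  # the 0.5-expectile is the mean, a natural starting point
    for _ in range(n_iter):
        # Observations above the current estimate get weight tau, the rest 1 - tau.
        w = np.where(y > e, tau, 1.0 - tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e


def knn_expectile(X_train, y_train, x_query, k, tau):
    """Conditional expectile estimate from the k nearest training points."""
    d = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    idx = np.argsort(d)[:k]                        # indices of the k nearest
    return expectile(y_train[idx], tau)
```

Swapping the distance line for another measure (for example Canberra) changes only how the neighbours are selected; the expectile step stays the same.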

13
Spiegel E, Kneib T, von Gablenz P, Otto-Sobotka F. Generalized expectile regression with flexible response function. Biom J 2021; 63:1028-1051. [PMID: 33734453 DOI: 10.1002/bimj.202000203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2020] [Revised: 12/06/2020] [Accepted: 01/20/2021] [Indexed: 11/09/2022]
Abstract
Expectile regression, in contrast to classical linear regression, allows for heteroscedasticity and omits a parametric specification of the underlying distribution. This model class can be seen as a quantile-like generalization of least squares regression. Similarly as in quantile regression, the whole distribution can be modeled with expectiles, while still offering the same flexibility in the use of semiparametric predictors as modern mean regression. However, even with no parametric assumption for the distribution of the response in expectile regression, the model is still constructed with a linear relationship between the fitted value and the predictor. If the true underlying relationship is nonlinear then severe biases can be observed in the parameter estimates as well as in quantities derived from them such as model predictions. We observed this problem during the analysis of the distribution of a self-reported hearing score with limited range. Classical expectile regression should in theory adhere to these constraints, however, we observed predictions that exceeded the maximum score. We propose to include a response function between the fitted value and the predictor similarly as in generalized linear models. However, including a fixed response function would imply an assumption on the shape of the underlying distribution function. Such assumptions would be counterintuitive in expectile regression. Therefore, we propose to estimate the response function jointly with the covariate effects. We design the response function as a monotonically increasing P-spline, which may also contain constraints on the target set. This results in valid estimates for a self-reported listening effort score through nonlinear estimates of the response function. We observed strong associations with the speech reception threshold.
Affiliation(s)
- Elmar Spiegel
  - Helmholtz Zentrum München GmbH, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
  - University of Goettingen, Chair of Statistics, Göttingen, Germany
- Thomas Kneib
  - University of Goettingen, Chair of Statistics, Göttingen, Germany
- Petra von Gablenz
  - Jade University of Applied Sciences, Institute for Hearing Technology and Audiology, Oldenburg, Germany
- Fabian Otto-Sobotka
  - Carl von Ossietzky University Oldenburg, Division of Epidemiology and Biometry, Oldenburg, Germany

14
Zhao J, Yan G, Zhang Y. Robust estimation and shrinkage in ultrahigh dimensional expectile regression with heavy tails and variance heterogeneity. Stat Pap (Berl) 2021. [DOI: 10.1007/s00362-021-01227-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]

15
Pan Y, Liu Z, Song G. Weighted expectile regression with covariates missing at random. Commun Stat-Simul C 2021. [DOI: 10.1080/03610918.2021.1873371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Yingli Pan
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Zhan Liu
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Guangyu Song
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China

16
Alfò M, Marino MF, Ranalli MG, Salvati N, Tzavidis N. M-quantile regression for multivariate longitudinal data with an application to the Millennium Cohort Study. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12452] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/27/2022]
Affiliation(s)
- Marco Alfò
  - Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Roma, Italy
- Maria Francesca Marino
  - Dipartimento di Statistica, Informatica, Applicazioni, Università degli Studi di Firenze, Firenze, Italy
- Nicola Salvati
  - Dipartimento di Economia e Management, Università di Pisa, Pisa, Italy
- Nikos Tzavidis
  - Department of Social Statistics and Demography, Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, UK

17
Zheng S. KLERC: kernel Lagrangian expectile regression calculator. Comput Stat 2020. [DOI: 10.1007/s00180-020-01003-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

18
Pan Y. Distributed optimization and statistical learning for large-scale penalized expectile regression. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-020-00074-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]

20
Chen T, Su Z, Yang Y, Ding S. Efficient estimation in expectile regression using envelope models. Electron J Stat 2020. [DOI: 10.1214/19-ejs1664] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]

21
Silbersdorff A, Schneider KS. Distributional Regression Techniques in Socioeconomic Research on the Inequality of Health with an Application on the Relationship between Mental Health and Income. International Journal of Environmental Research and Public Health 2019; 16:E4009. [PMID: 31635091 PMCID: PMC6843976 DOI: 10.3390/ijerph16204009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/06/2019] [Revised: 10/07/2019] [Accepted: 10/15/2019] [Indexed: 12/02/2022]
Abstract
This study addresses the much-discussed issue of the relationship between health and income. In particular, it focuses on the relation between mental health and household income by using generalized additive models of location, scale and shape and thus employing a distributional perspective. Furthermore, this study aims to give guidelines to applied researchers interested in taking a distributional perspective on health inequalities. In our analysis we use cross-sectional data of the German Socio-Economic Panel (SOEP). We find that the relationship between income and mental health is much more pronounced when looking not only at the expected mental health score of an individual but also at other distributional aspects, such as the risk of moderate and severe mental illness. We thus show that taking a distributional perspective can add to and indeed enrich the mostly mean-based assessment of existing health inequalities.
Affiliation(s)
- Kai Sebastian Schneider
  - Department of Clinical Psychology, PFH Private University of Applied Sciences, 37073 Göttingen, Germany

22
Wirsik N, Otto-Sobotka F, Pigeot I. Modeling physical activity data using L0-penalized expectile regression. Biom J 2019; 61:1371-1384. [PMID: 31172553 DOI: 10.1002/bimj.201800007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/07/2018] [Revised: 12/26/2018] [Accepted: 01/09/2019] [Indexed: 11/11/2022]
Abstract
In recent years accelerometers have become widely used to objectively assess physical activity. Usually intensity ranges are assigned to the measured accelerometer counts by simple cut points, disregarding the underlying activity pattern. Under the assumption that physical activity can be seen as a distinct sequence of distinguishable activities, the use of hidden Markov models (HMM) has been proposed to improve the modeling of accelerometer data. As further improvement we propose to use expectile regression utilizing a Whittaker smoother with an L0-penalty to better capture the intensity levels underlying the observed counts. Different expectile asymmetries beyond the mean allow the distinction of monotonous and more variable activities as expectiles effectively model the complete distribution of the counts. This new approach is investigated in a simulation study, where we simulated 1,000 days of accelerometer data with 1 and 5 s epochs, based on collected labeled data to resemble real-life data as closely as possible. The expectile regression is compared to HMMs and the commonly used cut point method with regard to misclassification rate, number of identified bouts and identified levels as well as the proportion of the estimate being in the range of ±10% of the true activity level. In summary, expectile regression utilizing a Whittaker smoother with an L0-penalty outperforms HMMs and the cut point method and is hence a promising approach to model accelerometer data.
Affiliation(s)
- Norman Wirsik
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Fabian Otto-Sobotka
  - School of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
- Iris Pigeot
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
  - Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany

24
Abstract
Spatio-temporal models are becoming increasingly popular in recent regression research. However, they usually rely on the assumption of a specific parametric distribution for the response and/or homoscedastic error terms. In this article, we propose to apply semiparametric expectile regression to model spatio-temporal effects beyond the mean. Besides the removal of the assumption of a specific distribution and homoscedasticity, with expectile regression the whole distribution of the response can be estimated. For the use of expectiles, we interpret them as weighted means and estimate them by established tools of (penalized) least squares regression. The spatio-temporal effect is set up as an interaction between time and space either based on trivariate tensor product P-splines or the tensor product of a Gaussian Markov random field and a univariate P-spline. Importantly, the model can easily be split up into main effects and interactions to facilitate interpretation. The method is presented along the analysis of spatio-temporal variation of temperatures in Germany from 1980 to 2014.
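The weighted-means interpretation leads to a simple fitting scheme, sketched here under simplifying assumptions. The NumPy function below fits a linear expectile regression by iteratively reweighted least squares with a ridge penalty. The ridge term is only a stand-in for the P-spline and Gaussian Markov random field penalties of the article, and `expectile_regression`, `tau` and `lam` are hypothetical names.

```python
import numpy as np


def expectile_regression(X, y, tau, lam=0.0, n_iter=50, tol=1e-10):
    """Linear tau-expectile regression via iteratively reweighted least squares.

    Returns the coefficient vector with the intercept first.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    P = lam * np.eye(X.shape[1])
    P[0, 0] = 0.0                              # leave the intercept unpenalized
    beta = np.linalg.solve(X.T @ X + P, X.T @ y)  # (penalized) least squares start
    for _ in range(n_iter):
        # Residuals above the current fit get weight tau, the rest 1 - tau.
        w = np.where(y > X @ beta, tau, 1.0 - tau)
        XtW = X.T * w                          # equivalent to X.T @ diag(w)
        beta_new = np.linalg.solve(XtW @ X + P, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With `tau=0.5` and `lam=0` the weights are constant and the fit coincides with ordinary least squares, which is the sense in which expectile regression generalizes mean regression.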
Affiliation(s)
- Elmar Spiegel
  - Chair of Statistics, University of Göttingen, Göttingen, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Thomas Kneib
  - Chair of Statistics, University of Göttingen, Göttingen, Germany
- Fabian Otto-Sobotka
  - Department of Health Services Research, Carl von Ossietzky University Oldenburg, Oldenburg, Germany

25
Zhao X, Cheng W, Zhang P. Extreme tail risk estimation with the generalized Pareto distribution under the peaks-over-threshold framework. Commun Stat-Theor M 2018. [DOI: 10.1080/03610926.2018.1549253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
Affiliation(s)
- Xu Zhao
  - College of Applied Sciences, Beijing University of Technology, Beijing, China
- Weihu Cheng
  - College of Applied Sciences, Beijing University of Technology, Beijing, China
- Pengyue Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, USA

29
Affiliation(s)
- Jun Zhao
  - School of Mathematical Sciences, Zhejiang University, Hangzhou, P. R. China
- Yi Zhang
  - School of Mathematical Sciences, Zhejiang University, Hangzhou, P. R. China

30
Abstract
Boosting algorithms were originally developed for machine learning but were later adapted to estimate statistical models—offering various practical advantages such as automated variable selection and implicit regularization of effect estimates. The interpretation of the resulting models, however, remains the same as if they had been fitted by classical methods. Boosting, hence, allows to use an advanced machine learning scheme to estimate various types of statistical models. This tutorial aims to highlight how boosting can be used for semi-parametric modelling, what practical implications follow from the design of the algorithm and what kind of drawbacks data analysts have to expect. We illustrate the application of boosting in the analysis of a stunting score from children in India and a high-dimensional dataset of tumour DNA to develop a biomarker for the occurrence of metastases in breast cancer patients.
Affiliation(s)
- Andreas Mayr
  - Institut für Statistik, Ludwig-Maximilians-Universität, München, Germany
  - Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany

31
Binder H, Gefeller O, Schmid M, Mayr A. Extending Statistical Boosting. Methods Inf Med 2018; 53:428-35. [DOI: 10.3414/me13-01-0123] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 11/11/2013] [Accepted: 05/02/2014] [Indexed: 11/09/2022]
Abstract
Summary. Background: Boosting algorithms to simultaneously estimate and select predictor effects in statistical models have gained substantial interest during the last decade. Objectives: This review highlights recent methodological developments regarding boosting algorithms for statistical modelling especially focusing on topics relevant for biomedical research. Methods: We suggest a unified framework for gradient boosting and likelihood-based boosting (statistical boosting) which have been addressed separately in the literature up to now. Results: The methodological developments on statistical boosting during the last ten years can be grouped into three different lines of research: i) efforts to ensure variable selection leading to sparser models, ii) developments regarding different types of predictor effects and how to choose them, iii) approaches to extend the statistical boosting framework to new regression settings. Conclusions: Statistical boosting algorithms have been adapted to carry out unbiased variable selection and automated model choice during the fitting process and can nowadays be applied in almost any regression setting in combination with a large amount of different types of predictor effects.

32
Usseglio-Carleve A. Estimation of conditional extreme risk measures from heavy-tailed elliptical random vectors. Electron J Stat 2018. [DOI: 10.1214/18-ejs1499] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
|
33
|
Daouia A, Girard S, Stupfler G. Estimation of tail risk based on extreme expectiles. J R Stat Soc Series B Stat Methodol 2017. [DOI: 10.1111/rssb.12254] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 11/30/2022]
Affiliation(s)
- Stéphane Girard
- Inria Grenoble Rhône-Alpes and Laboratoire Jean Kuntzmann; Grenoble France
- Gilles Stupfler
- Aix Marseille Université; Marseille France
- University of Nottingham; UK
|
34
|
|
35
|
Affiliation(s)
- Yi Yang
- Department of Mathematics and Statistics, McGill University, Montréal, QC, Canada
- Teng Zhang
- Department of Mathematics, University of Central Florida, Orlando, FL
- Hui Zou
- School of Statistics, University of Minnesota, Minneapolis, MN
|
36
|
Xing JJ, Qian XY. Bayesian expectile regression with asymmetric normal distribution. Commun Stat Theory Methods 2017. [DOI: 10.1080/03610926.2015.1088030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
37
|
|
38
|
Spiegel E, Sobotka F, Kneib T. Model selection in semiparametric expectile regression. Electron J Stat 2017. [DOI: 10.1214/17-ejs1307] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
|
39
|
Hepp T, Schmid M, Gefeller O, Waldmann E, Mayr A. Approaches to Regularized Regression - A Comparison between Gradient Boosting and the Lasso. Methods Inf Med 2016; 55:422-430. [PMID: 27626931 DOI: 10.3414/me16-01-0033] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 03/11/2016] [Accepted: 06/21/2016] [Indexed: 11/09/2022]
Abstract
BACKGROUND Penalization and regularization techniques for statistical modeling have attracted increasing attention in biomedical research due to their advantages in the presence of high-dimensional data. A special focus lies on algorithms that incorporate automatic variable selection like the least absolute shrinkage and selection operator (lasso) or statistical boosting techniques. OBJECTIVES Focusing on the linear regression framework, this article compares the two most common techniques for this task, the lasso and gradient boosting, both from a methodological and a practical perspective. METHODS We describe these methods, highlighting under which circumstances their results will coincide in low-dimensional settings. In addition, we carry out extensive simulation studies comparing the performance in settings with more predictors than observations and investigate multiple combinations of noise-to-signal ratio and number of true non-zero coefficients. Finally, we examine the impact of different tuning methods on the results. RESULTS Both methods carry out penalization and variable selection for possibly high-dimensional data, often resulting in very similar models. An advantage of the lasso is its faster run-time; a strength of the boosting concept is its modular nature, making it easy to extend to other regression settings. CONCLUSIONS Although following different strategies with respect to optimization and regularization, both methods imply similar constraints to the estimation problem, leading to a comparable performance regarding prediction accuracy and variable selection in practice.
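As an illustration of how gradient boosting can perform variable selection in the linear setting described above, here is a minimal sketch of componentwise L2 boosting. It is not the authors' implementation: the function name `componentwise_l2_boost`, the step length `nu`, and the fixed number of steps `n_steps` (standing in for a tuned stopping iteration) are assumptions made for this example.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Sketch of componentwise L2 boosting with linear base-learners.

    Each step fits every single predictor to the current residuals by
    least squares, keeps only the best-fitting one, and adds a fraction
    nu of its coefficient. Predictors never chosen keep a zero coefficient.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centred predictors
    col_ss = (Xc ** 2).sum(axis=0)       # per-column sum of squares
    beta = np.zeros(X.shape[1])
    resid = y - y.mean()                 # residuals of the offset-only model
    for _ in range(n_steps):
        coefs = Xc.T @ resid / col_ss    # univariate least-squares fits
        gain = coefs ** 2 * col_ss       # reduction in residual sum of squares
        j = int(np.argmax(gain))         # best single predictor this step
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * Xc[:, j]
    intercept = y.mean() - x_mean @ beta
    return intercept, beta

# Usage on simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 20))
y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 3] + 0.5 * rng.normal(size=100)
b0, b = componentwise_l2_boost(X_demo, y_demo)
print(np.flatnonzero(b))                 # indices of predictors that were selected
```

Stopping early leaves many coefficients at exactly zero and shrinks the rest, which is the sense in which boosting and the lasso impose similar constraints on the estimation problem.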
|
40
|
|
41
|
Klein N, Kneib T, Lang S, Sohn A. Bayesian structured additive distributional regression with an application to regional income inequality in Germany. Ann Appl Stat 2015. [DOI: 10.1214/15-aoas823] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Indexed: 11/19/2022]
|
42
|
Hofner B, Boccuto L, Göker M. Controlling false discoveries in high-dimensional situations: boosting with stability selection. BMC Bioinformatics 2015; 16:144. [PMID: 25943565 PMCID: PMC4464883 DOI: 10.1186/s12859-015-0575-3] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 11/07/2014] [Accepted: 04/16/2015] [Indexed: 12/17/2022]
Abstract
BACKGROUND Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n≪p). These data sets pose new challenges to statistical analysis: Variable selection becomes one of the most important tasks in this setting. Similar challenges arise in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds a finite sample error control to high-dimensional variable selection procedures such as Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. The interpretation of the used error bounds is elaborated and insights for practical data analysis are given. RESULTS Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. The dependence on various parameters such as the sample size, the number of truly influential variables or tuning parameters of the algorithm was investigated. The results were applied to investigate phenotype measurements in patients with autism spectrum disorders using a log-linear interaction model which was fitted by boosting. Stability selection identified five differentially expressed amino acid pathways. CONCLUSION Stability selection is implemented in the freely available R package stabs (http://CRAN.R-project.org/package=stabs). It proved to work well in high-dimensional settings with more predictors than observations for both linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative, though this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to appropriately specify the error bound.
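The resampling idea behind stability selection can be sketched in a few lines of Python. This is an illustration only and does not reproduce the stabs package or the error bounds discussed in the abstract: the base selector `top_q_by_correlation` is a hypothetical stand-in for boosting, and the values of `q`, `n_subsamples`, and `threshold` are assumptions chosen for the example.

```python
import numpy as np

def top_q_by_correlation(X, y, q):
    """Hypothetical base selector: the q predictors with the largest
    absolute correlation with y (a stand-in for boosting or the lasso)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(score)[-q:]

def stability_selection(X, y, q=3, n_subsamples=100, threshold=0.8, seed=0):
    """Run the base selector on random half-samples and keep predictors
    whose selection frequency reaches the threshold."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)   # subsample of size n/2
        counts[top_q_by_correlation(X[idx], y[idx], q)] += 1
    freq = counts / n_subsamples                          # selection frequencies
    return np.flatnonzero(freq >= threshold), freq

# Usage on simulated data (assumed for illustration only).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 50))
y_demo = 3.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(size=200)
stable, freq = stability_selection(X_demo, y_demo)
print(stable)                                             # stably selected predictors
```

The error control described in the abstract links `q`, the threshold, and the number of predictors to a bound on the expected number of false selections; that bound is not computed in this sketch.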
Affiliation(s)
- Benjamin Hofner
- Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nuremberg, Waldstraße 6, Erlangen, 91054, Germany.
- Luigi Boccuto
- Greenwood Genetic Center, 113 Gregor Mendel Circle, Greenwood, 29646, SC, USA.
- Markus Göker
- Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7b, Braunschweig, 38124, Germany.
|
43
|
Abstract
Recent interest in modern regression modelling has focused on extending available (mean) regression models by describing more general properties of the response distribution. An alternative approach is quantile regression, where regression effects on the conditional quantile function of the response are assumed. While quantile regression can be seen as a generalization of median regression, expectiles are an alternative that generalizes mean regression. Generally, quantiles provide a natural interpretation even beyond the 0.5 quantile, the median. A comparably simple interpretation is not available for expectiles beyond the 0.5 expectile, the mean. Nonetheless, expectiles have some interesting properties, some of which are discussed in this article. We contrast the two approaches and show how to get quantiles from a fine grid of expectiles. We compare such quantiles from expectiles with direct quantile estimates regarding efficiency. We also look at regression problems where both quantile and expectile curves have the undesirable property that neighbouring curves may cross each other. We propose a modified method to estimate non-crossing expectile curves based on splines. In an application, we look at the expected shortfall, a risk measure used in finance, which requires both expectiles and quantiles for estimation and which can be calculated easily with the proposed methods in the article.
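To make the notion of an expectile concrete, the following minimal sketch computes a sample expectile by asymmetric least squares, using an iteratively reweighted mean. It covers only the unconditional (no covariates) case and is not the spline-based method of the article; the function name `sample_expectile` and its tolerance settings are assumptions for this example.

```python
import numpy as np

def sample_expectile(y, tau, tol=1e-10, max_iter=1000):
    """Sample tau-expectile: the minimiser of the asymmetric squared loss
    sum_i |tau - 1{y_i <= e}| * (y_i - e)^2, found by fixed-point iteration."""
    y = np.asarray(y, dtype=float)
    e = y.mean()                                  # the 0.5 expectile is the mean
    for _ in range(max_iter):
        w = np.where(y > e, tau, 1.0 - tau)       # asymmetric weights
        e_new = np.sum(w * y) / np.sum(w)         # weighted mean under these weights
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e

# Usage: expectiles of a small sample at three asymmetry levels.
data = [1.0, 2.0, 3.0, 4.0]
for level in (0.1, 0.5, 0.9):
    print(level, sample_expectile(data, level))
```

With `tau = 0.5` the weights are symmetric and the result is the ordinary mean, which mirrors the abstract's description of expectiles as a generalized form of mean regression.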
|
44
|
Huang X, Shi L, Suykens JA. Asymmetric least squares support vector machine classifiers. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.09.015] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 11/30/2022]
|
45
|
Shim J, Bin O, Hwang C. Semiparametric spatial effects kernel minimum squared error model for predicting housing sales prices. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.07.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
|
46
|
Abstract
Usual exponential family regression models focus on only one designated quantity of the response distribution, namely the mean. While this entails easy interpretation of the estimated regression effects, it may often lead to incomplete analyses when more complex relationships are indeed present and also bears the risk of false conclusions about the significance/importance of covariates. We will therefore give an overview of extended types of regression models that allow us to go beyond mean regression. More specifically, we will consider generalized additive models for location, scale and shape as well as semiparametric quantile and expectile regression. We will review the basic properties of all three approaches and compare them with respect to the flexibility in terms of the supported types of predictor specification, the availability of software and the support for different types of inferential procedures. The considered model classes are illustrated using a data set on rents for flats in the City of Munich.
Affiliation(s)
- Thomas Kneib
- Chair of Statistics, Georg August University, Göttingen, Germany
|
47
|
|
48
|
Fung WK, He X, Hubert M, Portnoy S, Wang HJ. Editorial for the special issue on quantile regression and semiparametric methods. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.12.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/14/2022]
|