1
Raza A, Noor-Ul-Amin M, Ayari-Akkari A, Nabi M, Usman Aslam M. A redescending M-estimator approach for outlier-resilient modeling. Sci Rep 2024; 14:7131. [PMID: 38532107 DOI: 10.1038/s41598-024-57906-1] [Received: 01/19/2024] [Accepted: 03/22/2024] [Indexed: 03/28/2024]
Abstract
The OLS model is built on the assumption that the error terms are normally distributed. However, this assumption is easily violated, especially when there are outliers in the data. A single outlier can disrupt the normality of the error terms, making the OLS model less effective. In such situations, M-estimators (MEs) come into play to obtain reliable estimates. We introduce a redescending M-estimator (RME) for robust regression to handle datasets with outliers. The proposed RME produces more robust estimates by effectively managing the influence of outliers, even at lower values of the tuning constant. We compared the performance of this estimator with existing RMEs using real-life data examples and an extensive simulation study. The results show that our suggested RME is more efficient than the MEs it was compared with in various situations.
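The abstract does not give the proposed estimator's objective function, so the sketch below uses Tukey's biweight, a classical redescending M-estimator, as a stand-in. It is fitted by iteratively reweighted least squares to a straight line. The tuning constant `c = 4.685`, the MAD-based scale, and the toy data are illustrative assumptions, not values taken from the paper.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def tukey_biweight_fit(x, y, c=4.685, max_iter=50, tol=1e-10):
    """IRLS with Tukey's biweight: residuals beyond c * scale get weight 0."""
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(max_iter):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = median([abs(r) for r in resid]) / 0.6745
        if scale < 1e-12:  # most points are fitted exactly; nothing to reweight
            break
        cutoff = c * scale
        w = [(1 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
             for r in resid]
        new_b0, new_b1 = weighted_line_fit(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Toy data (made up): points on y = 1 + 2x with one gross outlier at x = 10.
x = list(range(20))
y = [1 + 2 * xi for xi in x]
y[10] += 50

ols_b0, ols_b1 = weighted_line_fit(x, y, [1.0] * len(x))
rob_b0, rob_b1 = tukey_biweight_fit(x, y)
print("OLS:    intercept=%.3f slope=%.3f" % (ols_b0, ols_b1))
print("Robust: intercept=%.3f slope=%.3f" % (rob_b0, rob_b1))
```

Because the weight function drops to exactly zero beyond the cutoff, the outlier stops influencing the fit altogether, which is what distinguishes a redescending estimator from a monotone one such as Huber's.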
Affiliation(s)
- Aamir Raza
- Govt. College Women University Sialkot, Sialkot, Pakistan
- Amel Ayari-Akkari
- Biology Department, College of Sciences in Abha, King Khalid University, P.O. Box 960, Abha, Saudi Arabia
2
Hird C, Barham KE, Franklin CE. Looking beyond the mean: quantile regression for comparative physiologists. J Exp Biol 2024; 227:jeb247122. [PMID: 38323449 PMCID: PMC10949063 DOI: 10.1242/jeb.247122] [Received: 11/30/2023] [Accepted: 02/01/2024] [Indexed: 02/08/2024]
Abstract
Statistical analyses that physiologists use to test hypotheses predominantly centre on means, but the tail ends of the response distribution can behave quite differently and underpin important scientific phenomena. We demonstrate that quantile regression (QR) offers a way to bypass some limitations of least squares regression (LSR) by building a picture of independent variable effects across the whole distribution of a dependent variable. We used LSR and QR with simulated and real datasets. With simulated data, LSR showed no change in the mean response but missed significant effects in the tails of the distribution found using QR. With real data, LSR showed a significant change in the mean response but missed a lack of response in the upper quantiles which was biologically revealing. Together, this highlights that QR can help to ask and answer more questions about variation in nature.
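To make the contrast between least squares and quantile regression concrete, here is a minimal sketch on made-up data whose spread grows with x while the mean stays flat. The quantile line is found by brute force over all lines through two data points, which works because an optimal quantile regression line can always be chosen to pass through at least two observations. This search is only sensible for tiny datasets, and the function names and data are illustrative assumptions.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def quantile_line(x, y, tau):
    """Return (intercept, slope) minimising total pinball loss.

    Brute force over lines through pairs of points; O(n^3), toy use only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(pinball_loss(yk - (intercept + slope * xk), tau)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_slope(x, y):
    """Slope of the ordinary least squares line."""
    xm = sum(x) / len(x)
    ym = sum(y) / len(y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    return sxy / sxx


# Made-up data: at each x the responses are symmetric around 0,
# but their spread is proportional to x.
x, y = [], []
for xi in range(1, 11):
    for m in (-1.0, -0.5, 0.0, 0.5, 1.0):
        x.append(xi)
        y.append(m * xi)

mean_slope = ols_slope(x, y)
upper_intercept, upper_slope = quantile_line(x, y, 0.9)
lower_intercept, lower_slope = quantile_line(x, y, 0.1)
print("OLS slope:", mean_slope)
print("tau=0.9 slope:", upper_slope)
print("tau=0.1 slope:", lower_slope)
```

On these data the mean response does not change with x, yet the upper and lower quantile lines diverge, which is the situation the simulated example in the abstract describes.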
Affiliation(s)
- Coen Hird
- School of the Environment, The University of Queensland, Brisbane (Magandjin), QLD 4072, Australia
- Kaitlin E. Barham
- School of the Environment, The University of Queensland, Brisbane (Magandjin), QLD 4072, Australia
- Craig E. Franklin
- School of the Environment, The University of Queensland, Brisbane (Magandjin), QLD 4072, Australia
3
Yuanyuan Z, Kumari S, Ilyas M, Bhayo MUR, Marwat J. Media coverage and stock market returns: Evidence from China Pakistan economic corridor (CPEC). Heliyon 2023; 9:e14204. [PMID: 36923889 PMCID: PMC10009534 DOI: 10.1016/j.heliyon.2023.e14204] [Received: 07/26/2022] [Revised: 02/18/2023] [Accepted: 02/23/2023] [Indexed: 03/05/2023]
Abstract
The primary source of investor interest that disrupts the financial markets is news that reflects the macroeconomy. This study intends to track changes in investors' positive and negative market attention and their effects on stock market returns by examining the print media portrayal of the China-Pakistan economic corridor (CPEC). We access the daily and weekly coverage of the CPEC by national and international newspapers from the Bloomberg database over the period from January 2015 to December 2019. Using the Harvard psychological dictionary, we categorize the news headlines into positive and negative news sentiments. We then relate the news sentiment to the stock market returns, using quintile analysis, ordinary least squares (OLS), and vector autoregressive (VAR) models. The results show that investors react quickly and significantly to positive news. They pay more for the same stock if the positive news stream increases; hence, the stock market return also increases. In contrast, investors do not react with the same passion to an increase in negative news. These findings are in line with the theoretical rationale of the disposition effect. These outcomes may be useful for active investors and practitioners to devise investment strategies in the presence of the hype surrounding the CPEC in the print media.
Affiliation(s)
- Sonia Kumari
- Department of Business Administration, Sukkur IBA University, Sukkur, 65200, Pakistan
- Muhammad Ilyas
- Department of Business Administration, Sukkur IBA University, Sukkur, 65200, Pakistan
- Mujeeb-U-Rehman Bhayo
- Department of Business Administration, Sukkur IBA University, Sukkur, 65200, Pakistan
4
Pan X, Chen Z, Zhai W, Dong L, Lin L, Li Y, Yang Y. Distribution of antibiotic resistance genes in the sediments of Erhai Lake, Yunnan-Kweichow Plateau, China: Their linear relations with nonpoint source pollution discharges from 26 tributaries. Environ Pollut 2023; 316:120471. [PMID: 36270570 DOI: 10.1016/j.envpol.2022.120471] [Received: 07/24/2022] [Revised: 10/11/2022] [Accepted: 10/15/2022] [Indexed: 06/16/2023]
Abstract
Erhai Lake, a typical deep-water plateau lake, has experienced long-term nonpoint source (NPS) pollution discharge from 26 tributaries, which significantly affected the abundance and spread of resistance genes. In this study, 25 antibiotic resistance genes (ARGs), classified into six types, and NPS pollution discharges were investigated throughout the Erhai basin. FCA (mexF) and sulfonamide resistance genes (sul1, sul2 and sul3) were the most common. Although the overall absolute abundance of ARGs was low so far, the abundance of individual genes, such as the sulfonamide resistance genes, was high. Regression analysis using an ordinary least squares (OLS) model showed that the discharge of NPS pollution into Erhai Lake had an obvious effect on the distribution of ARGs, and that the relations between them were linear. Specifically, the total nitrogen (TN) pollution input from tributaries was significantly correlated with increasing ARG abundance, while the total phosphorus (TP) pollution input showed the opposite correlation; both ultimately affected the distribution of ARGs. Moreover, the effect of TP on ARG distribution was more significant than that of TN. This study provides a geographical profile of ARG distribution in a subtropical deep lake on the Yunnan-Kweichow Plateau. The results are beneficial for predicting the distribution characteristics of ARGs and controlling their pollution in plateau lakes.
Affiliation(s)
- Xiong Pan
- Basin Water Environmental Research Department, Changjiang River Scientific Research Institute, Wuhan, 430010, China; Key Lab of Basin Water Resource and Eco-Environmental Science in Hubei Province, Wuhan, 430010, China
- Zeyu Chen
- School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430074, China
- Wenliang Zhai
- Basin Water Environmental Research Department, Changjiang River Scientific Research Institute, Wuhan, 430010, China; Key Lab of Basin Water Resource and Eco-Environmental Science in Hubei Province, Wuhan, 430010, China
- Lei Dong
- Basin Water Environmental Research Department, Changjiang River Scientific Research Institute, Wuhan, 430010, China; Key Lab of Basin Water Resource and Eco-Environmental Science in Hubei Province, Wuhan, 430010, China
- Li Lin
- Basin Water Environmental Research Department, Changjiang River Scientific Research Institute, Wuhan, 430010, China; Key Lab of Basin Water Resource and Eco-Environmental Science in Hubei Province, Wuhan, 430010, China.
- Yi Li
- Key Laboratory of Integrated Regulation and Resource Development on Shallow Lake of Ministry of Education, College of Environment, Hohai University, Nanjing, 210098, China
- Yuyi Yang
- Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
5
Pellekooren S, Ben ÂJ, Bosmans JE, Ostelo RWJG, van Tulder MW, Maas ET, Huygen FJPM, Oosterhuis T, Apeldoorn AT, van Hooff ML, van Dongen JM. Can EQ-5D-3L utility values of low back pain patients be validly predicted by the Oswestry Disability Index for use in cost-effectiveness analyses? Qual Life Res 2022; 31:2153-2165. [PMID: 35040002 PMCID: PMC9188530 DOI: 10.1007/s11136-022-03082-6] [Accepted: 01/07/2022] [Indexed: 11/26/2022]
Abstract
Purpose: To assess whether regression modeling can be used to predict EQ-5D-3L utility values from the Oswestry Disability Index (ODI) in low back pain (LBP) patients for use in cost-effectiveness analysis.
Methods: EQ-5D-3L utility values of LBP patients were estimated using their ODI scores as independent variables using regression analyses, while adjusting for case-mix variables. Six different models were estimated: (1) Ordinary Least Squares (OLS) regression, with total ODI score, (2) OLS, with ODI item scores as continuous variables, (3) OLS, with ODI item scores as ordinal variables, (4) Tobit model, with total ODI score, (5) Tobit model, with ODI item scores as continuous variables, and (6) Tobit model, with ODI item scores as ordinal variables. The models' performance was assessed using explained variance (R²) and root mean squared error (RMSE). The potential impact of using predicted instead of observed EQ-5D-3L utility values on cost-effectiveness outcomes was evaluated in two empirical cost-effectiveness analyses.
Results: Complete individual patient data of 18,692 low back pain patients were analyzed. All models had a broadly similar R² (range 45–52%) and RMSE (range 0.21–0.22). The two best performing models produced similar probabilities of cost-effectiveness for a range of willingness-to-pay (WTP) values compared to those based on the observed EQ-5D-3L values. For example, the difference in probabilities ranged from 2 to 5% at a WTP of 50,000 €/QALY gained.
Conclusion: Results suggest that the ODI can be validly used to predict low back pain patients' EQ-5D-3L utility values and QALYs for use in cost-effectiveness analyses.
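The two performance measures named in the abstract can be computed from observed and predicted utilities as sketched below. The four utility values are invented purely for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Explained variance: 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the utility scale."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Invented EQ-5D-3L utilities (observed) and model predictions.
observed_utility = [0.2, 0.5, 0.7, 0.9]
predicted_utility = [0.3, 0.4, 0.7, 0.8]

model_r2 = r_squared(observed_utility, predicted_utility)
model_rmse = rmse(observed_utility, predicted_utility)
print("R^2 = %.3f, RMSE = %.3f" % (model_r2, model_rmse))
```

R² is unitless, while RMSE is expressed on the utility scale itself, which is why the abstract reports one as a percentage and the other as a value around 0.2.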
Affiliation(s)
- Sylvia Pellekooren
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences Research Institute, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands.
- Department Human Movement Sciences, Faculty of Behavioral & Movement Sciences, Vrije Universiteit, Amsterdam, The Netherlands.
- Ângela J Ben
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Judith E Bosmans
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Raymond W J G Ostelo
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences Research Institute, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
- Department of Epidemiology and Data Science, Amsterdam UMC, Location VUmc, Amsterdam Movement Sciences Research Institute, Amsterdam, The Netherlands
- Maurits W van Tulder
- Department Human Movement Sciences, Faculty of Behavioral & Movement Sciences, Vrije Universiteit, Amsterdam, The Netherlands
- Esther T Maas
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences Research Institute, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
- Frank J P M Huygen
- Center of Pain Medicine Erasmusmc, Rotterdam, The Netherlands
- Center of Pain Medicine UMCU, Utrecht, The Netherlands
- Teddy Oosterhuis
- Netherlands Society of Occupational Medicine, Centre of Excellence, Utrecht, the Netherlands
- Coronel Institute of Occupational Health, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Adri T Apeldoorn
- Rehabilitation Departement, Noordwest Ziekenhuisgroep, Alkmaar, Netherlands, Breederode Hogeschool, Rotterdam, Netherlands
- Miranda L van Hooff
- Departement Research, Sint Maartenskliniek, Nijmegen, The Netherlands
- Department of Orthopedic Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Johanna M van Dongen
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences Research Institute, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
6
Abstract
Accounting for dependent observations in cluster-randomized trials (CRTs) using nested data is necessary in order to avoid misestimated standard errors resulting in questionable inferential statistics. Cluster-robust standard errors (CRSEs) are often used to address this issue. However, CRSEs are still well-known to underestimate standard errors for group-level variables when the number of clusters is low (e.g., < 50) and with CRTs, a small number of clusters, due to logistical or financial considerations, is the norm rather than the exception. Using a simulation with various conditions, we investigate the use of a small sample correction (i.e., CR2 estimator) proposed by Bell and McCaffrey (2002) together with empirically derived degrees of freedom estimates (dofBM). Findings indicate that even with as few as 10 clusters, the CR2 estimator used with dofBM yields generally unbiased results with acceptable type I error and coverage rates. Results show that coverage and type I error rates can be largely influenced by the choice of dof, not just the standard error adjustments. An applied example is provided together with R syntax to conduct the analysis. To facilitate the use of different CRSEs, a free graphical, menu-driven SPSS add-on to compute the various cluster-robust variance estimates can be downloaded from https://github.com/flh3/CR2/tree/master/SPSS.
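For orientation, the CR2 estimator is usually written as the sandwich form below. The notation is an assumption, since the abstract gives no formulas: OLS with an independence working model, clusters g = 1, ..., G, design matrix X_g and residual vector e_g for cluster g. It is a sketch of the standard form, not a transcription from the article.

```latex
% Uncorrected cluster-robust (CR0) sandwich estimator
\widehat{V}^{\mathrm{CR0}}
  = (X^\top X)^{-1}
    \Bigl( \sum_{g=1}^{G} X_g^\top e_g e_g^\top X_g \Bigr)
    (X^\top X)^{-1}

% CR2: residuals are rescaled cluster by cluster before forming the "meat"
\widehat{V}^{\mathrm{CR2}}
  = (X^\top X)^{-1}
    \Bigl( \sum_{g=1}^{G} X_g^\top A_g e_g e_g^\top A_g X_g \Bigr)
    (X^\top X)^{-1},
\qquad
A_g = (I_g - H_{gg})^{-1/2},
\quad
H_{gg} = X_g (X^\top X)^{-1} X_g^\top
```

Because OLS residuals are on average smaller than the true errors, the adjustment matrix A_g inflates them cluster by cluster, and the correction matters most when G is small and each cluster carries substantial leverage.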
7
Mahanty C, Kumar R, Mishra BK. Analyses the effects of COVID-19 outbreak on human sexual behaviour using ordinary least-squares based multivariate logistic regression. ACTA ACUST UNITED AC 2020; 55:1239-1259. [PMID: 33100406 PMCID: PMC7568844 DOI: 10.1007/s11135-020-01057-8] [Accepted: 10/10/2020] [Indexed: 02/07/2023]
Abstract
This study aimed to evaluate the impact of COVID-19 on sexual, mental and physical health. There were 262 respondents included in this study (38% female and 62% male) above 18 years of age from India. Statistical analysis was performed using Ordinary Least Squares (OLS) based on multivariate logistic regression analysis. The numerical tests were performed by using Python 3 engine and R-squared (coefficient of multiple determinations for multiple regressions) for prediction and P value > 0.5 is considered to be statistically significant. The study outcomes were obtained using a study-specific questionnaire to assess the quality of sex life, changes in sexual behavior and mental health. Frequency of sexual intercourse, frequency of watching porn, sexual hygiene, frequency of physical activity, depression, desire for parenthood in female respondents have more significant R² (0.903, 0.976, 0.973, 0.989, 0.985, 0.862) value respectively as compared to male respondents. Financial anxiety, smoking and drinking habits in male respondents have more significant R² (0.917, 0.964) value respectively as compared to female respondents. The aim of this study is to understand quality of sex life, sexual behavior, reproductive planning, mental health, physical health and adult coping during the COVID-19 pandemic, as well as how past experiences have affected. Many respondents had a broad variety of problems concerning their sexual and reproductive well-being. Measures should be set in order to safeguard the mental and sexual health of people during the pandemic.
Affiliation(s)
- Raghvendra Kumar
- Department of Computer Science and Engineering, GIET University, Gunupur, India
8
Huang Y, Li J, Ma Y. Determining optimum sampling numbers for survey of soil heavy metals in decision-making units: taking cadmium as an example. Environ Sci Pollut Res Int 2020; 27:24466-24479. [PMID: 32304065 DOI: 10.1007/s11356-020-08793-2] [Received: 01/22/2020] [Accepted: 04/06/2020] [Indexed: 06/11/2023]
Abstract
Optimum sampling number (OSN) is one critical issue to achieve credible results when surveying heavy metals in soil and undertaking risk assessment for sustainable land use or remediation decisions. Although traditional methods, such as classical statistics, geostatistics, and simulated annealing algorithm, have been used to determine OSN for surveying soil heavy metals, their usefulness is limited because the distribution of soil heavy metal concentration approximately follows a log-normal distribution. Furthermore, existing correction equations for the log-normal distribution may overestimate or underestimate the OSN, and they have not been applied to estimate the OSN of soil heavy metals. The objective of the present study was to find a simple model under the log-normal distribution that determined the OSN for surveying of soil heavy metals in decision-making units. To test the effectiveness and accuracy of this model, soil heavy metals in 17 contaminated areas generating 200 multiscale units were analyzed. Determining equations for OSN, including classical statistics and approximate correction equations, were compared. Results showed that the equation for determining OSN by ordinary least squares (OSN_OLS) was computationally simple and straightforward because of an adjustment of the classic log-normal equation without relying on consulting the adjusted Student t-tables for a noncentralized data distribution. Compared with other OSN determining equations, sampling numbers by OSN_OLS were closer to optimum numbers and effectively avoided the risk of overestimation or underestimation. Descriptive statistics indicated that the estimated pollution results by OSN_OLS in representative units were very similar to original sampling with more sampling information. Furthermore, compared with other OSN-determining equations, the mapping based on OSN_OLS not only described the trends of spatial variation but also improved mapping accuracy. 
We conclude that OSN_OLS is an effective, straightforward, and exact model to estimate the OSN for surveying of soil heavy metals in decision-making units.
Affiliation(s)
- Yajie Huang
- Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jumei Li
- Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yibing Ma
- Macau Environmental Research Institute, Macau University of Science and Technology, Macau, 999078, China.
9
Fellman J. Seasonality and multiple maternities: Comparisons between different models. Early Hum Dev 2020; 141:104870. [PMID: 31514989 DOI: 10.1016/j.earlhumdev.2019.104870] [Indexed: 11/29/2022]
Abstract
Seasonality of demographic data has been of great interest. The seasonality depends mainly on climatic conditions, and the findings may vary from study to study. Commonly, the studies are based on monthly data. The population at risk plays a central role. For births or deaths over short periods, the population at risk is proportional to the lengths of the months. Hence, one must analyse the number of births (deaths) per day. If one studies the seasonality of multiple maternities, the population at risk is the total monthly number of confinements and the number of multiple maternities in a given month must be compared with the monthly number of all maternities. Consequently, one considers the monthly rates of multiple maternities, the monthly number of births is eliminated and one obtains an unaffected seasonality measure of the rates. In general, comparisons between the seasonality of different data sets presuppose standardization of the data to indices with common means, mainly 100. When seasonal models are applied, one must pay special attention to how well the applied model fits the data. If the goodness of fit is poor, non-significant models obtained can erroneously lead to statements that the seasonality is slight, although the observed seasonal fluctuations are marked. The estimated monthly models chosen are approximately orthogonal and they have little influence on the parameter estimates. Exact orthogonality should be obtained if the data are equidistant, that is, if the months are of equal length (e.g. 30 days), corresponding to 30°. Exactly equidistant data can be observed when circadian rhythms (24 h) are studied. In this study, we compare seasonal models with models with exact orthogonality.
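The orthogonality point can be checked numerically. The sketch below assumes a simple trigonometric seasonal model with regressors 1, cos θ and sin θ, which is an assumption because the abstract does not spell out the model. It compares twelve equidistant angles (30° steps) with mid-month angles from real calendar month lengths in a 365-day year.

```python
from math import cos, sin, radians


def max_off_diagonal(angles_deg):
    """Largest absolute inner product between the columns 1, cos, sin.

    A value of zero means the three regressors are exactly orthogonal.
    """
    cols = [
        [1.0] * len(angles_deg),
        [cos(radians(a)) for a in angles_deg],
        [sin(radians(a)) for a in angles_deg],
    ]
    dots = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            dots.append(abs(sum(u * v for u, v in zip(cols[i], cols[j]))))
    return max(dots)


# Equal-length months: one observation every 30 degrees.
equal_month_angles = [30 * k for k in range(12)]

# Real months: angle of each month's midpoint within a 365-day year.
month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
real_month_angles = []
start_day = 0.0
for length in month_lengths:
    midpoint = start_day + length / 2
    real_month_angles.append(360.0 * midpoint / 365.0)
    start_day += length

equal_gap = max_off_diagonal(equal_month_angles)
real_gap = max_off_diagonal(real_month_angles)
print("equidistant months:", equal_gap)
print("calendar months:   ", real_gap)
```

With equal 30° steps the inner products vanish up to rounding error, while calendar months leave a small but non-zero overlap, matching the abstract's distinction between exact and approximate orthogonality.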
10
Mumford JA. A comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers? Neuroimage 2016; 147:658-668. [PMID: 28030782 DOI: 10.1016/j.neuroimage.2016.12.058] [Received: 06/03/2016] [Revised: 12/16/2016] [Accepted: 12/20/2016] [Indexed: 10/20/2022]
Abstract
Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifact and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers including models that use the first level variance or that use the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates has been shown to perform well, with better Type I error control, but with large sample sizes (500-1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model and has not been compared to other modeling approaches that handle outliers such as FSL's Flame 1 and FSL's outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degree of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL's Flame 1, Flame 1 with outlier de-weighting algorithm and Kendall's Tau. Additionally, subject omission using the Cook's Distance measure with OLS and nonparametric inference with the OLS statistic are studied. 
Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed.
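One of the strategies listed above, omitting subjects flagged by Cook's distance before refitting OLS, can be sketched for a single continuous covariate as follows. The 4/n cutoff is a common rule of thumb and an assumption here; the abstract does not state which threshold the study used, and the data are made up.

```python
def cooks_distances(x, y):
    """Cook's distance of each observation for the OLS line y = b0 + b1 * x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    xm = sum(x) / n
    ym = sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    b1 = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    b0 = ym - b1 * xm
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - p)  # residual variance estimate
    distances = []
    for xi, r in zip(x, resid):
        leverage = 1 / n + (xi - xm) ** 2 / sxx
        distances.append(r * r / (p * s2) * leverage / (1 - leverage) ** 2)
    return distances


# Made-up group-level data: a clean linear trend with one outlying subject
# at the largest covariate value, where leverage is highest.
covariate = list(range(10))
response = [1 + 2 * c for c in covariate]
response[9] = 40

distances = cooks_distances(covariate, response)
threshold = 4 / len(covariate)  # rule-of-thumb cutoff (assumption)
flagged = [i for i, d in enumerate(distances) if d > threshold]
print("flagged subjects:", flagged)
```

Cook's distance combines residual size with leverage, so an outlier at the edge of the covariate range, which is the case that biases the slope most, is the one most readily flagged.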
Affiliation(s)
- Jeanette A Mumford
- Center for Healthy Minds, University of Wisconsin, Madison, United States.
11
Chernyavskiy P, Kendall GM, Wakeford R, Little MP. Spatial prediction of naturally occurring gamma radiation in Great Britain. J Environ Radioact 2016; 164:300-311. [PMID: 27544074 PMCID: PMC5048584 DOI: 10.1016/j.jenvrad.2016.07.029] [Received: 05/25/2016] [Revised: 07/19/2016] [Accepted: 07/21/2016] [Indexed: 05/11/2023]
Abstract
Gamma radiation from natural sources is an important component of background radiation, and correlates with childhood leukaemia risk in Great Britain. The geographic variation of indoor gamma radiation dose-rates in Great Britain is explored using various geo-statistical methods. A multi-resolution Gaussian-process model using radial basis functions with 2, 4, or 8 components, is fitted via maximum likelihood, and a non-spatial model is also used, fitted by ordinary least squares. Because of the dataset size (N = 10,199), four other parametric spatial models are fitted by variogram-fitting. A randomly selected 70:30 split is used for fitting:validation. The models are evaluated based on their predictive performance as measured by Mean Absolute Error, Mean Squared Error, as well as Pearson correlation and rank-correlation between predicted and actual dose-rates. Each of the four parametric models (Matérn, Gaussian, Bessel, Spherical) fitted the empirical variogram well, and yielded similar predictions at >50 km separation, although with more substantial differences in predicted variograms at <50 km. The multi-resolution Gaussian-process model with 8 components had the best predictive accuracy among the models considered. The Spherical, Bessel, Matérn, Gaussian and ordinary least squares models had progressively worse predictive performance, the ordinary least squares model being particularly poor in this respect.
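The evaluation protocol described above (a random 70:30 fitting:validation split, then MAE, MSE, Pearson correlation and rank correlation between predicted and actual values) can be sketched as follows. The helper names, the seed, and the toy numbers are illustrative assumptions, and Spearman's coefficient is used as the rank correlation because the abstract does not say which one was computed.

```python
import random
from math import sqrt


def fitting_validation_split(n, fitting_fraction=0.7, seed=0):
    """Randomly split the indices 0..n-1 into fitting and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(fitting_fraction * n))
    return indices[:cut], indices[cut:]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson(u, v):
    um = sum(u) / len(u)
    vm = sum(v) / len(v)
    suv = sum((a - um) * (b - vm) for a, b in zip(u, v))
    suu = sum((a - um) ** 2 for a in u)
    svv = sum((b - vm) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(u, v):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(u), average_ranks(v))


fit_idx, val_idx = fitting_validation_split(10)

# Toy validation values: predictions are monotone in the actual dose-rates
# but not linear, so rank correlation is perfect while Pearson is not.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 4.0, 9.0, 16.0, 25.0]
mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
r = pearson(actual, predicted)
rho = spearman(actual, predicted)
print(mae, mse, r, rho)
```

Reporting both error measures and both correlations is useful because they penalise different failures: MSE is dominated by a few large misses, whereas rank correlation only asks whether high and low dose-rates are ordered correctly.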
Affiliation(s)
- P Chernyavskiy
- Radiation Epidemiology Branch, National Cancer Institute, DHHS, NIH, Division of Cancer Epidemiology and Genetics, Bethesda, MD 20892-9778, USA.
- G M Kendall
- Cancer Epidemiology Unit, University of Oxford, Richard Doll Building, Old Road Campus, Headington, Oxford, OX3 7LF, UK.
- R Wakeford
- Centre for Occupational and Environmental Health, Institute of Population Health, The University of Manchester, Ellen Wilkinson Building, Oxford Road, Manchester, M13 9PL, UK.
- M P Little
- Radiation Epidemiology Branch, National Cancer Institute, DHHS, NIH, Division of Cancer Epidemiology and Genetics, Bethesda, MD 20892-9778, USA.
12
Liu Y, Chiaromonte F, Li B. Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units. Biometrics 2016; 73:529-539. [PMID: 27649087 DOI: 10.1111/biom.12579] [Received: 09/01/2015] [Revised: 07/01/2016] [Accepted: 07/01/2016] [Indexed: 11/29/2022]
Abstract
In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.
Affiliation(s)
- Yang Liu
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A
- Francesca Chiaromonte
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A
- Bing Li
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A
13
Abstract
In the context of inverse or parameter estimation problems we demonstrate the use of statistically based model comparison tests in several examples of practical interest. In these examples we are interested in questions related to information content of a particular given data set and whether the data will support a more complicated model to describe it. In the first example we compare fits for several different models to describe simple decay in a size histogram for aggregates in amyloid fibril formation. In a second example we investigate whether the information content in data sets for the pest Lygus hesperus in cotton fields as it is currently collected is sufficient to support a model in which one distinguishes between nymphs and adults. Finally in a third example with data for patients having undergone an organ transplant, we question whether the data content is sufficient to estimate more than 5 of the fundamental parameters in a particular dynamic model.
Affiliation(s)
- H T Banks
- Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27695-8212 USA
- J E Banks
- Division of Sciences & Mathematics, School of Interdisciplinary Arts & Sciences, University of Washington, Tacoma, Tacoma, Washington 98402
- Kathryn Link
- Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27695-8212 USA
- J A Rosenheim
- Department of Entomology and Nematology, and Center for Population Biology, University of California, Davis, Davis, CA 95616
- Chelsea Ross
- Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27695-8212 USA
- K A Tillman
- Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27695-8212 USA