1
Halpin PF. Differential Item Functioning via Robust Scaling. Psychometrika 2024. [PMID: 38704430] [DOI: 10.1007/s11336-024-09957-6]
Abstract
This paper proposes a method for assessing differential item functioning (DIF) in item response theory (IRT) models. The method does not require pre-specification of anchor items, which is its main virtue. It is developed in two main steps: first by showing how DIF can be re-formulated as a problem of outlier detection in IRT-based scaling and then tackling the latter using methods from robust statistics. The proposal is a redescending M-estimator of IRT scaling parameters that is tuned to flag items with DIF at the desired asymptotic type I error rate. Theoretical results describe the efficiency of the estimator in the absence of DIF and its robustness in the presence of DIF. Simulation studies show that the proposed method compares favorably to currently available approaches for DIF detection, and a real data example illustrates its application in a research context where pre-specification of anchor items is infeasible. The focus of the paper is the two-parameter logistic model in two independent groups, with extensions to other settings considered in the conclusion.
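To make the mechanics concrete, here is a minimal, hypothetical sketch of redescending M-estimation applied to scaling (our illustration, not Halpin's estimator): a Tukey-biweight estimate of the common shift between two groups' item difficulties, where items whose residuals redescend to zero weight become DIF candidates. The tuning constant `c` and the 4-MAD flagging cutoff are assumptions for illustration; the paper instead tunes its cutoff to a target asymptotic type I error rate.

```python
import numpy as np

def biweight_shift(d, c=4.685, n_iter=50):
    """Tukey-biweight M-estimate of location for item-difficulty differences d."""
    mu = np.median(d)
    s = 1.4826 * np.median(np.abs(d - np.median(d)))  # robust scale (MAD)
    for _ in range(n_iter):
        u = (d - mu) / (c * s)
        # redescending weights: points beyond c*s get exactly zero weight
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        mu = np.sum(w * d) / np.sum(w)
    return mu, s

rng = np.random.default_rng(0)
d = rng.normal(0.5, 0.1, 20)   # DIF-free items share a common shift of 0.5
d[:3] += 2.0                   # three items with DIF
mu, s = biweight_shift(d)
flags = np.abs(d - mu) > 4 * s  # crude illustrative cutoff
```

The key property, visible above, is that the DIF items do not pull the estimated shift toward themselves the way a mean or least-squares fit would.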
Affiliation(s)
- Peter F Halpin
- University of North Carolina at Chapel Hill, 100 E Cameron Ave, Office 1070G, Chapel Hill, NC, 27514, USA.
2
Wang W, Shang Z, Li C. Brain-inspired semantic data augmentation for multi-style images. Front Neurorobot 2024; 18:1382406. [PMID: 38596181] [PMCID: PMC11002076] [DOI: 10.3389/fnbot.2024.1382406]
Abstract
Data augmentation is an effective technique for automatically expanding training data in deep learning. Brain-inspired methods draw on the functionality and structure of the human brain and apply its mechanisms and principles to artificial intelligence and computer science. When there is a large style difference between training and testing data, common data augmentation methods cannot effectively enhance the generalization performance of a deep model. To solve this problem, we improve upon modeling Domain Shifts with Uncertainty (DSU) and propose a new brain-inspired data augmentation method for computer vision with two key components: Robust statistics and control of the Coefficient of variance for DSU (RCDSU), and Feature Data Augmentation (FeatureDA). RCDSU computes feature statistics (mean and standard deviation) with robust statistics to weaken the influence of outliers, bringing the statistics closer to their true values and improving the robustness of deep learning models. By controlling the coefficient of variance, RCDSU shifts the feature statistics while preserving semantics and increases the shift range. FeatureDA likewise controls the coefficient of variance to generate augmented features with semantics unchanged and to increase their coverage. Together, RCDSU and FeatureDA perform style transfer and content transfer in the feature space, improving the generalization ability of the model at the style and content levels, respectively. On the Photo, Art Painting, Cartoon, and Sketch (PACS) multi-style classification task, RCDSU plus FeatureDA achieves competitive accuracy, and after Gaussian noise is added to the PACS dataset it shows strong robustness against outliers. FeatureDA also achieves excellent results on the CIFAR-100 image classification task. RCDSU plus FeatureDA can be applied as a novel brain-inspired semantic data augmentation method with implicit robot automation, suitable for datasets with large style differences between training and testing data.
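The robust-feature-statistics idea can be sketched as follows (our illustration in plain NumPy, not the paper's code): per-channel median and MAD stand in for mean and standard deviation, so a few outlying activations barely move the statistics that drive the uncertainty-based shift.

```python
import numpy as np

def robust_channel_stats(feats):
    """feats: (batch, channels) activations; robust location/spread per channel."""
    med = np.median(feats, axis=0)
    # 1.4826 makes MAD consistent with the standard deviation under normality
    mad = 1.4826 * np.median(np.abs(feats - med), axis=0)
    return med, mad

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, (500, 4))
x[:5, 0] = 50.0                      # inject a few outliers into channel 0
med, mad = robust_channel_stats(x)
mean, std = x.mean(axis=0), x.std(axis=0)  # classical statistics, for contrast
```

With outliers present, the classical mean and standard deviation of channel 0 are badly distorted while the median/MAD pair is essentially unchanged.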
Affiliation(s)
- Zhaowei Shang
- College of Computer Science, Chongqing University, Chongqing, China
3
Hadian-Jazi M, Sadri A. A Python package based on robust statistical analysis for serial crystallography data processing. Acta Crystallogr D Struct Biol 2023; 79:820-829. [PMID: 37584428] [PMCID: PMC10478633] [DOI: 10.1107/s2059798323005855]
Abstract
The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly effective when analysing mixtures of probability distributions. They therefore enable X-ray serial crystallography data to be separated into two groups: one comprising true data points (for example, the background intensities) and another comprising outliers (for example, Bragg peaks or bad pixels on an X-ray detector). These characteristics of robust statistical analysis are beneficial for the ever-increasing volume of serial crystallography (SX) data sets produced at synchrotron and X-ray free-electron laser (XFEL) sources. The key advantage of robust statistics for some applications in SX data analysis is that it requires minimal parameter tuning because of its insensitivity to the input parameters. In this paper, a software package called the Robust Gaussian Fitting library (RGFlib) is introduced, based on the concept of robust statistics. Two methods built on RGFlib are presented for two SX data-analysis tasks: (i) a robust peak-finding algorithm and (ii) an automated robust method to detect bad pixels on X-ray pixel detectors.
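As a hedged sketch of the robust-Gaussian-fitting idea (our construction, not the RGFlib API): fit the dominant background component by iteratively re-estimating its mean and width from MAD-clipped inliers, and treat everything outside the fit as outliers (Bragg peaks or bad pixels). The clipping factor `k` is an illustrative choice.

```python
import numpy as np

def robust_gaussian_fit(x, k=3.0, n_iter=10):
    """Fit mean/width of the dominant Gaussian component, ignoring outliers."""
    med = np.median(x)
    mu, sig = med, 1.4826 * np.median(np.abs(x - med))  # MAD start
    inliers = np.abs(x - mu) < k * sig
    for _ in range(n_iter):
        mu, sig = x[inliers].mean(), x[inliers].std()   # refit on inliers only
        inliers = np.abs(x - mu) < k * sig
    return mu, sig, ~inliers

rng = np.random.default_rng(2)
bg = rng.normal(100.0, 5.0, 2000)     # detector background
peaks = rng.normal(400.0, 20.0, 40)   # Bragg-peak intensities
x = np.concatenate([bg, peaks])
mu, sig, outliers = robust_gaussian_fit(x)
```

The recovered background parameters stay close to the true values even though 2% of the pixels belong to a completely different distribution.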
Affiliation(s)
- Marjan Hadian-Jazi
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Melbourne, Victoria 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Melbourne, Victoria 3052, Australia
- Alireza Sadri
- School of Physics and Astronomy, Monash University, Clayton, Victoria 3800, Australia
4
Abstract
Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions, such as the monotonicity and population independence of item functions, are present even in classical test theory but are stated more explicitly when item response theory or other latent variable models are used to assess item fit. The work presented here provides a robust approach to DIF detection that does not assume perfect model-data fit but instead uses Tukey's concept of contaminated distributions. The approach uses robust outlier detection to flag items for which adequate model-data fit cannot be established.
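Tukey's contamination model can be illustrated with a small simulation (ours, not the paper's): most item-fit statistics come from a clean distribution, a fraction epsilon from a contaminating one, and robust z-scores flag the contaminated items without letting them distort the cutoff. The 3.5 cutoff is an arbitrary illustrative choice, not the paper's criterion.

```python
import numpy as np

rng = np.random.default_rng(3)
eps, n = 0.1, 200
# (1 - eps) * clean + eps * contaminating: Tukey's contamination model
contaminated = rng.random(n) < eps
stats = np.where(contaminated,
                 rng.normal(6.0, 1.0, n),   # misfitting items
                 rng.normal(0.0, 1.0, n))   # well-fitting items

med = np.median(stats)
mad = 1.4826 * np.median(np.abs(stats - med))
z = (stats - med) / mad                      # robust z-scores
flagged = np.abs(z) > 3.5
```

Because median and MAD are computed from the mixture but dominated by the clean component, the flagging threshold is not dragged upward by the contaminated items themselves.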
5
Cumpston P. Blood volume estimation in cardiac surgery - A comparative analysis. Perfusion 2023; 38:455-463. [PMID: 35345934] [DOI: 10.1177/02676591211069920]
Abstract
INTRODUCTION This paper seeks to identify which of three published formulas for estimating the blood volume of normal human subjects correlates most closely with blood volumes measured in a published study in which erythrocyte volume was determined using 51Cr and plasma volume was determined using a nonradioactive dye. METHODS Blood volumes predicted by three published algorithms were compared with blood volume estimates from a study by Retzlaff et al. using the two-tailed Wilcoxon signed rank test and a robust version of the Bland-Altman test. RESULTS When applied to a sample of normal subjects selected from Mayo Clinic personnel and patients, the Nadler formula correlated more closely with blood volume measured using a radionuclide technique than did the Allen formula or one based on a saline haemodilution technique. CONCLUSIONS The Nadler formula correlated more closely with blood volume measurements derived from Retzlaff's study than the other formulas in a population whose height and weight distribution is more consistent with that seen in North America. It should be used in preference to the Allen formula for estimating blood volume in adult patients currently undergoing cardiac surgical procedures. Saline haemodilution techniques used to measure blood volume require validation against more recently developed nuclear medicine techniques, using statistical methods other than regression analysis; until validated, they should be used with caution for estimating blood volume in adult patients undergoing cardiac surgery. If a formula using height, weight and sex is used to estimate blood volume in the context of cardiac surgery, it must be derived from a much more comprehensive sample of the population to which it is applied than has occurred to date. In particular, it should include broader distributions of height, weight and the presence or absence and type of significant valvular disease.
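For reference, the Nadler formula compared above takes height, weight and sex as inputs (height in metres, weight in kilograms, volume in litres). The coefficients below are the commonly published ones; verify against the original source before any clinical use.

```python
def nadler_blood_volume(height_m, weight_kg, sex):
    """Estimated total blood volume in litres (commonly cited Nadler coefficients)."""
    if sex == "male":
        return 0.3669 * height_m**3 + 0.03219 * weight_kg + 0.6041
    return 0.3561 * height_m**3 + 0.03308 * weight_kg + 0.1833

bv_m = nadler_blood_volume(1.80, 80.0, "male")    # roughly 5.3 L
bv_f = nadler_blood_volume(1.65, 60.0, "female")  # roughly 3.8 L
```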
Affiliation(s)
- Philip Cumpston
- Visiting Senior Specialist Anaesthetist, Greenslopes Private Hospital, Greenslopes, QLD, Australia
- The University of Queensland, Saint Lucia, QLD, Australia
6
Poulsen VM, DeDeo S. Inferring Cultural Landscapes with the Inverse Ising Model. Entropy (Basel) 2023; 25:264. [PMID: 36832631] [PMCID: PMC9955041] [DOI: 10.3390/e25020264]
Abstract
The space of possible human cultures is vast, but some cultural configurations are more consistent with cognitive and social constraints than others. This leads to a "landscape" of possibilities that our species has explored over millennia of cultural evolution. However, what does this fitness landscape, which constrains and guides cultural evolution, look like? The machine-learning algorithms that can answer these questions are typically developed for large-scale datasets. Applications to the sparse, inconsistent, and incomplete data found in the historical record have received less attention, and standard recommendations can lead to bias against marginalized, under-studied, or minority cultures. We show how to adapt the minimum probability flow algorithm and the inverse Ising model, a physics-inspired workhorse of machine learning, to the challenge. A series of natural extensions, including dynamical estimation of missing data and cross-validation with regularization, enables reliable reconstruction of the underlying constraints. We demonstrate our methods on a curated subset of the Database of Religious History: records from 407 religious groups throughout human history, ranging from the Bronze Age to the present day. This reveals a complex, rugged landscape, with both sharp, well-defined peaks where state-endorsed religions tend to concentrate, and diffuse cultural floodplains where evangelical religions, non-state spiritual practices, and mystery religions can be found.
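The inverse Ising model at the core of this approach assigns each binary cultural configuration an energy built from pairwise couplings J and fields h; low-energy configurations are more probable. A toy version (illustrative only; the paper's pipeline estimates J and h from data via minimum probability flow):

```python
import numpy as np

def ising_energy(s, J, h):
    """Energy of configuration s in {-1,+1}^n; J symmetric with zero diagonal."""
    return -0.5 * s @ J @ s - h @ s

J = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # two mutually reinforcing cultural traits
h = np.zeros(2)

aligned = ising_energy(np.array([1.0, 1.0]), J, h)    # traits agree: low energy
opposed = ising_energy(np.array([1.0, -1.0]), J, h)   # traits conflict: high energy
```

With a positive coupling, configurations in which the two traits agree sit in a valley of the landscape, which is exactly the structure the fitted model encodes.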
Affiliation(s)
- Victor Møller Poulsen
- Department of Social and Decision Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
- Simon DeDeo
- Department of Social and Decision Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
7
Vitelli V, Fleischer T, Ankill J, Arjas E, Frigessi A, Kristensen VN, Zucknick M. Transcriptomic pan-cancer analysis using rank-based Bayesian inference. Mol Oncol 2022; 17:548-563. [PMID: 36562628] [PMCID: PMC10061294] [DOI: 10.1002/1878-0261.13354]
Abstract
The analysis of whole genomes of pan-cancer data sets provides a challenge for researchers, and we contribute to the literature concerning the identification of robust subgroups with clear biological interpretation. Specifically, we tackle this unsupervised problem via a novel rank-based Bayesian clustering method. The advantages of our method are the integration and quantification of all uncertainties related to both the input data and the model, the probabilistic interpretation of final results to allow straightforward assessment of the stability of clusters leading to reliable conclusions, and the transparent biological interpretation of the identified clusters, since each cluster is characterized by its top-ranked genomic features. We applied our method to RNA-seq data from cancer samples from 12 tumor types from the Cancer Genome Atlas. We identified a robust clustering that mostly reflects tissue of origin but also includes pan-cancer clusters. Importantly, we identified three pan-squamous clusters composed of a mix of lung squamous cell carcinoma, head and neck squamous carcinoma, and bladder cancer, with different biological functions over-represented in the top genes that characterize the three clusters. We also found two novel subtypes of kidney cancer that differ in prognosis, and we reproduced known subtypes of breast cancer. Taken together, our method allows the identification of robust and biologically meaningful clusters of pan-cancer samples.
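Rank-based clustering methods of this kind compare samples through distances between rankings; as a stand-in illustration (our choice, not necessarily the distance used in the paper), the Spearman footrule:

```python
import numpy as np

def footrule(r1, r2):
    """Spearman footrule: sum of absolute rank displacements between two rankings."""
    return int(np.abs(np.asarray(r1) - np.asarray(r2)).sum())

# rankings of four genomic features in two samples
d_same = footrule([1, 2, 3, 4], [1, 2, 3, 4])   # identical rankings
d_swap = footrule([1, 2, 3, 4], [2, 1, 3, 4])   # one adjacent swap
```

Working on ranks rather than raw expression values is what makes this kind of clustering insensitive to monotone distortions and heavy tails in RNA-seq measurements.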
Affiliation(s)
- Valeria Vitelli
- Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Norway
- Thomas Fleischer
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Norway
- Jørgen Ankill
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Norway
- Elja Arjas
- Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Norway
- Department of Mathematics and Statistics, University of Helsinki, Finland
- Arnoldo Frigessi
- Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Norway
- Vessela N Kristensen
- Department of Medical Genetics, Clinic for Laboratory Medicine, Oslo University Hospital, Norway
- Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Norway
- Manuela Zucknick
- Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Norway
8
Sadri A, Hadian-Jazi M, Yefanov O, Galchenkova M, Kirkwood H, Mills G, Sikorski M, Letrun R, de Wijn R, Vakili M, Oberthuer D, Komadina D, Brehm W, Mancuso AP, Carnis J, Gelisio L, Chapman HN. Automatic bad-pixel mask maker for X-ray pixel detectors with application to serial crystallography. J Appl Crystallogr 2022; 55:1549-1561. [PMID: 36570663] [PMCID: PMC9721322] [DOI: 10.1107/s1600576722009815]
Abstract
X-ray crystallography has witnessed massive development over the past decade, driven by large increases in the intensity and brightness of X-ray sources and enabled by high-frame-rate X-ray detectors. The analysis of large data sets is done via automatic algorithms that are vulnerable to imperfections in the detector and to noise inherent in the detection process. By improving the model of the detector's behaviour, data can be analysed more reliably and data storage costs can be significantly reduced. One major requirement is a software mask that identifies defective pixels in diffraction frames. This paper introduces a methodology and program based upon concepts of machine learning, called robust mask maker (RMM), for the generation of bad-pixel masks for large-area X-ray pixel detectors using modern robust statistics. It discriminates normally behaving pixels from abnormal pixels by analysing routine measurements made with and without X-ray illumination. Analysis software typically uses a Bragg peak finder to detect Bragg peaks and an indexing method to detect crystal lattices among those peaks. Without proper masking of the bad pixels, peak-finding methods often confuse the abnormal values of bad pixels with true Bragg peaks and flag such patterns as useful regardless, leading to the storage of enormous uninformative data sets. It is also computationally very expensive for indexing methods to search for crystal lattices among false peaks, and the solution may be biased. This paper shows how RMM vastly improves peak finders and prevents them from labelling bad pixels as Bragg peaks, demonstrating its effectiveness on several serial crystallography data sets.
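A hedged sketch of the bad-pixel idea (our simplification, not the RMM algorithm): over a stack of dark frames, flag pixels whose temporal median sits far from the detector-wide robust norm. The factor `k` is an illustrative assumption.

```python
import numpy as np

def bad_pixel_mask(frames, k=6.0):
    """frames: (n_frames, h, w) dark-run data; returns boolean (h, w) bad-pixel mask."""
    pix_med = np.median(frames, axis=0)               # per-pixel level over the run
    det_med = np.median(pix_med)                      # detector-wide robust centre
    det_mad = 1.4826 * np.median(np.abs(pix_med - det_med))
    return np.abs(pix_med - det_med) > k * det_mad    # stuck / hot pixels

rng = np.random.default_rng(4)
frames = rng.normal(10.0, 1.0, (50, 32, 32))  # simulated dark frames
frames[:, 3, 7] = 500.0                       # a stuck pixel
mask = bad_pixel_mask(frames)
```

Because both the centre and the spread are robust, the stuck pixel cannot inflate the threshold that is supposed to catch it.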
Affiliation(s)
- Alireza Sadri
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Marjan Hadian-Jazi
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- ARC Centre of Excellence in Advanced Molecular Imaging, La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Australia
- Australian Nuclear Science and Technology Organisation (ANSTO), Australia
- Oleksandr Yefanov
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Marina Galchenkova
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Henry Kirkwood
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- Grant Mills
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- Marcin Sikorski
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- Romain Letrun
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- Raphael de Wijn
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- Mohammad Vakili
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- Dominik Oberthuer
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Dana Komadina
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Wolfgang Brehm
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Adrian P. Mancuso
- European XFEL GmbH, Holzkoppel 4, 22869 Schenefeld, Germany
- Department of Chemistry and Physics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria, Australia
- Jerome Carnis
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Luca Gelisio
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Henry N. Chapman
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Department of Physics, Universität Hamburg, Luruper Chaussee 149, 22761 Hamburg, Germany
- The Hamburg Centre for Ultrafast Imaging, Luruper Chaussee 149, 22761 Hamburg, Germany
9
Cerasa A. Introducing Robust Statistics in the Uncertainty Quantification of Nuclear Safeguards Measurements. Entropy (Basel) 2022; 24:1160. [PMID: 36010824] [PMCID: PMC9407210] [DOI: 10.3390/e24081160]
Abstract
The monitoring of nuclear safeguards measurements consists of verifying the coherence between operator declarations and the corresponding inspector measurements on the same nuclear items. Significant deviations may be present in the data as a consequence of problems with the operator and/or inspector measurement systems, but they could also be the result of data falsification. In both cases, quantitative analysis and statistical outcomes may be negatively affected by their presence unless robust statistical methods are used. This article investigates the benefits of introducing robust procedures in the nuclear safeguards context. In particular, we introduce a robust estimator for the uncertainty components of the measurement error model. The analysis demonstrates the capacity of robust procedures to limit bias in simulated and empirical settings and to provide sounder statistical outcomes. For these reasons, the introduction of robust procedures may represent a step forward in the still-ongoing development of reliable uncertainty quantification methods for error variance estimation.
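A toy version of the setting (ours, not the article's estimator): paired operator and inspector measurements of the same items, where a robust scale estimate of the relative differences is barely moved by one anomalous declaration that badly inflates the classical standard deviation.

```python
import numpy as np

rng = np.random.default_rng(5)
true_val = rng.uniform(90.0, 110.0, 30)                  # true item quantities
operator = true_val * (1 + rng.normal(0.0, 0.01, 30))    # 1% operator error
inspector = true_val * (1 + rng.normal(0.0, 0.01, 30))   # 1% inspector error
operator[0] *= 1.30                                      # one anomalous declaration

rel_diff = (operator - inspector) / inspector
classical_sd = rel_diff.std(ddof=1)                      # inflated by the anomaly
robust_sd = 1.4826 * np.median(np.abs(rel_diff - np.median(rel_diff)))
```

The robust scale stays near the true paired-measurement uncertainty, which is what a sound error-variance estimate needs when falsification or instrument faults may be present.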
Affiliation(s)
- Andrea Cerasa
- European Commission, Joint Research Centre, Via E. Fermi 2479, 21027 Ispra, VA, Italy
10
Haelewyn J, Iversen PW, Weidner JR. Addressing Unusual Assay Variability with Robust Statistics. SLAS Discov 2021; 26:1291-1297. [PMID: 34474612] [DOI: 10.1177/24725552211038379]
Abstract
Well-behaved in vitro bioassays generally produce normally distributed values in their primary (efficacy) data, and best practices for their statistical analysis are accordingly well documented and understood. However, assays may occasionally display unusually high variability and fall outside the assumptions inherent in these standard analyses. Such assays may still be in the optimization phase, in which the source of variation can be identified and addressed, or they may represent the best available option for the biological process being examined. In these cases, robust statistical methods may provide a more appropriate set of tools for both data analysis and assay optimization. This article provides guidance on best practices for the use of robust statistical methods in the analysis of bioassay data as an alternative to standard methods, and discusses the impacts on experimental design and interpretation.
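One way to see the point (our formulation, not necessarily the article's): swap median/MAD into a familiar assay-quality metric such as the Z'-factor, so that a single aberrant control well cannot wreck the plate statistics.

```python
import numpy as np

def robust_z_prime(pos, neg):
    """Z'-factor computed from median/MAD instead of mean/SD of control wells."""
    loc_p, loc_n = np.median(pos), np.median(neg)
    s_p = 1.4826 * np.median(np.abs(pos - loc_p))
    s_n = 1.4826 * np.median(np.abs(neg - loc_n))
    return 1 - 3 * (s_p + s_n) / abs(loc_p - loc_n)

rng = np.random.default_rng(6)
pos = rng.normal(100.0, 5.0, 16)   # positive-control wells
neg = rng.normal(10.0, 5.0, 16)    # negative-control wells
zp_clean = robust_z_prime(pos, neg)

pos_bad = pos.copy()
pos_bad[0] = 10.0                  # one failed positive-control well
zp_bad = robust_z_prime(pos_bad, neg)
```

With classical mean/SD, the failed well would drag the quality score down sharply; with robust statistics the score is nearly unchanged.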
11
Wang T, Yang X, Guo Y, Li Z. Identification of outlying observations for large-dimensional data. J Appl Stat 2021; 50:370-386. [PMID: 36698547] [PMCID: PMC9870021] [DOI: 10.1080/02664763.2021.1993799]
Abstract
This work proposes a two-stage procedure for identifying outlying observations in a large-dimensional data set. In the first stage, an outlier identification measure is defined using a max-normal statistic, and a clean subset containing non-outliers is obtained. The identification of outliers can be viewed as a multiple hypothesis testing problem; in the second stage, we therefore explore the asymptotic distribution of the proposed measure and obtain a threshold for the outlying observations. Furthermore, to improve the identification power and better control the misjudgment rate, a one-step refined algorithm is proposed. Simulation results and two real data analyses show that, compared with other methods, the proposed procedure has great advantages in identifying outliers in various data situations.
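An illustrative reading of a max-type outlier measure (our sketch, not the authors' statistic or threshold): take each observation's largest componentwise robust z-score and flag observations that are extreme in at least one coordinate. The cutoff below is a crude max-of-normals heuristic, not the paper's asymptotic threshold.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(0.0, 1.0, (200, 50))    # 200 observations, 50 dimensions
X[:2, 0] = 12.0                        # two outliers, extreme in one coordinate only

med = np.median(X, axis=0)
mad = 1.4826 * np.median(np.abs(X - med), axis=0)
max_stat = np.abs((X - med) / mad).max(axis=1)   # max-type outlier measure

# crude threshold for the max of p robust z-scores (illustrative assumption)
cutoff = np.sqrt(2 * np.log(50)) + 2.0
flagged = max_stat > cutoff
```

The max statistic catches observations that are unremarkable in most coordinates but extreme in one, which a distance averaged over all 50 dimensions would dilute.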
Affiliation(s)
- Tao Wang
- School of Mathematics and Statistics, Huaiyin Normal University, Huaian City, People's Republic of China
- Xiaona Yang
- School of Mathematical Science, Heilongjiang University, Harbin City, People's Republic of China
- Yunfei Guo
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin City, People's Republic of China
- Department of Mathematics, Yanbian University, Yanji City, People's Republic of China
- Zhonghua Li
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin City, 300071, People's Republic of China
12
Hadian-Jazi M, Sadri A, Barty A, Yefanov O, Galchenkova M, Oberthuer D, Komadina D, Brehm W, Kirkwood H, Mills G, de Wijn R, Letrun R, Kloos M, Vakili M, Gelisio L, Darmanin C, Mancuso AP, Chapman HN, Abbey B. Data reduction for serial crystallography using a robust peak finder. J Appl Crystallogr 2021; 54:1360-1378. [PMID: 34667447] [PMCID: PMC8493619] [DOI: 10.1107/s1600576721007317]
Abstract
This article focuses on the challenges of hit finding and data reduction in serial crystallography (SX). A peak-finding algorithm for SX data analysis, called the robust peak finder (RPF), has been developed based on the principle of robust statistics. Methods which are statistically robust are generally insensitive to departures from model assumptions and are particularly effective when analysing mixtures of probability distributions. For example, these methods enable the discretization of data into a group comprising inliers (i.e. the background noise) and another group comprising outliers (i.e. Bragg peaks). Our robust statistics algorithm has two key advantages, which are demonstrated through testing on multiple SX data sets. First, it is relatively insensitive to the exact values of the input parameters and hence requires minimal optimization. This is critical for the algorithm to run unsupervised, allowing for automated selection or 'vetoing' of SX diffraction data. Secondly, the processing of individual diffraction patterns can be easily parallelized, so that data from multiple detector modules can be analysed simultaneously, making the method ideally suited to real-time data processing. These characteristics mean that the RPF algorithm will be particularly beneficial for the new class of MHz X-ray free-electron laser sources, which generate large amounts of data in a short period of time.
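The two advantages highlighted above, a robust background criterion that makes the exact threshold less critical and per-module parallelism, can be sketched schematically (our simplification of the idea, not the RPF algorithm):

```python
import numpy as np
from multiprocessing.dummy import Pool  # thread pool; a real pipeline might use processes

def find_peaks_module(module, k=8.0):
    """Candidate Bragg-peak pixels in one detector module, via robust background stats."""
    med = np.median(module)
    mad = 1.4826 * np.median(np.abs(module - med))
    return np.argwhere(module > med + k * mad)

rng = np.random.default_rng(8)
modules = [rng.normal(50.0, 3.0, (64, 64)) for _ in range(4)]  # four detector modules
modules[1][10, 20] = 500.0                                     # one bright peak on module 1

with Pool(4) as p:                      # each module is processed independently
    peaks = p.map(find_peaks_module, modules)
```

Because each module's background is estimated robustly and locally, the modules can be farmed out to workers with no shared state, which is the property that makes real-time processing at MHz sources plausible.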
Affiliation(s)
- Marjan Hadian-Jazi
- ARC Centre of Excellence in Advanced Molecular Imaging, La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Australia
- Australian Nuclear Science and Technology Organisation (ANSTO), Australia
- European XFEL, Holzkoppel 4, 22869 Schenefeld, Germany
- Alireza Sadri
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Anton Barty
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Oleksandr Yefanov
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Marina Galchenkova
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Dominik Oberthuer
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Dana Komadina
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Wolfgang Brehm
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Grant Mills
- European XFEL, Holzkoppel 4, 22869 Schenefeld, Germany
- Romain Letrun
- European XFEL, Holzkoppel 4, 22869 Schenefeld, Germany
- Marco Kloos
- European XFEL, Holzkoppel 4, 22869 Schenefeld, Germany
- Luca Gelisio
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Connie Darmanin
- ARC Centre of Excellence in Advanced Molecular Imaging, La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Australia
- Adrian P Mancuso
- European XFEL, Holzkoppel 4, 22869 Schenefeld, Germany
- Department of Chemistry and Physics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria, Australia
- Henry N Chapman
- Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron (DESY), Notkestrasse 85, 22607 Hamburg, Germany
- Department of Physics, Universität Hamburg, Luruper Chaussee 149, 22761 Hamburg, Germany
- The Hamburg Centre for Ultrafast Imaging, Luruper Chaussee 149, 22761 Hamburg, Germany
- Brian Abbey
- ARC Centre of Excellence in Advanced Molecular Imaging, La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Australia
13
Ren C, Wang X, Zheng B. [Evaluation of proficiency testing with a low number of participants: a case study on the testing of lead and nitrite in water]. Wei Sheng Yan Jiu 2021; 50:653-659. [PMID: 34311839] [DOI: 10.19813/j.cnki.weishengyanjiu.2021.04.019]
Abstract
OBJECTIVE To develop an assigned-value gross-error classical statistical method so that all participants in proficiency-testing activities can obtain an unbiased evaluation when the number of participants is low (< 17). METHODS The developed method was employed to evaluate the testing results for nitrite-N and lead in water from participants along the "Belt and Road", in an exercise organized by CNCA in 2019. RESULTS Using the assigned-value gross-error classical statistical method to evaluate the non-normally distributed results returned by 29 (15 + 14) participants, 4 of the 15 participants in "Lead in Water" and 5 of the 14 participants in "Nitrite in Water (as Nitrogen)" received an "unsatisfactory" evaluation. CONCLUSION When neither robust statistics nor the classical statistical method can provide an objective evaluation, this method can mildly remove outliers and avoid the extreme evaluations of "all satisfactory" or "all unsatisfactory", truly reflecting the ability of each participant; it can be applied to proficiency-testing activities with a low number of participants.
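For contrast, the standard robust approach alluded to can be sketched as z-scores computed from the median and the normalized IQR, following common proficiency-testing practice (the article's own gross-error method differs):

```python
import numpy as np

def robust_z_scores(results):
    """Proficiency-testing z-scores from a robust assigned value and spread."""
    x = np.asarray(results, dtype=float)
    assigned = np.median(x)                                   # robust assigned value
    niqr = 0.7413 * (np.percentile(x, 75) - np.percentile(x, 25))  # normalized IQR
    return (x - assigned) / niqr

# seven hypothetical participant results for one measurand
z = robust_z_scores([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.0])
satisfactory = np.abs(z) <= 2.0   # the conventional |z| <= 2 criterion
```

With only seven participants the one aberrant result still gets an unambiguous score, but, as the abstract notes, robust consensus values become fragile exactly when the participant count is this small.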
Collapse
Affiliation(s)
- Chunxiang Ren
- China National Accreditation Service for Conformity Assessment, Beijing 100062, China
| | - Xin Wang
- Key Laboratory of Drinking Water Science and Technology, Chinese Academy of Sciences, Beijing 100085, China
| | - Bei Zheng
- Key Laboratory of Drinking Water Science and Technology, Chinese Academy of Sciences, Beijing 100085, China
| |
Collapse
|
14
|
Hbid Y, Mohamed K, Wolfe CDA, Douiri A. Inverse problem approach to regularized regression models with application to predicting recovery after stroke. Biom J 2020; 62:1926-1938. [PMID: 33058244 DOI: 10.1002/bimj.201900283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Revised: 06/27/2020] [Accepted: 08/02/2020] [Indexed: 12/26/2022]
Abstract
Regression modelling is a powerful statistical tool often used in biomedical and clinical research. It can be formulated as an inverse problem that measures the discrepancy between the target outcome and the data produced by a representation of the modelled predictors. This approach can simultaneously perform variable selection and coefficient estimation. We focus particularly on the linear regression problem Y ∼ N(Xβ, σI_n), where β ∈ R^p is the parameter of interest and its components are the regression coefficients. The inverse problem finds an estimate for the parameter β, which is mapped by the linear operator (L : β ⟶ Xβ) to the observed outcome data Y = Xβ + ε. This problem can be expressed as finding a solution in the affine subspace L⁻¹(Y). However, in the presence of collinearity, high-dimensional data and a high conditioning number of the related covariance matrix, the solution may not be unique, so prior information must be introduced to reduce the subset L⁻¹(Y) and regularize the inverse problem. Informed by Huber's robust statistics framework, we propose an optimal regularizer for the regression problem. We compare results of the proposed method and other penalized regression regularization methods (ridge, lasso, adaptive lasso and elastic net) under different strong hypotheses, such as a high conditioning number of the covariance matrix and high error amplitude, on both simulated and real data from the South London Stroke Register. The proposed approach can be extended to mixed regression models. Our inverse problem framework coupled with robust statistics methodology offers new insights into statistical regression and learning, and could open new research directions for model fitting and learning.
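To make the ill-posedness concrete (a minimal sketch of ridge regularization, one of the comparison penalties mentioned in the abstract, not the authors' Huber-based regularizer; all data simulated), near-collinear predictors give X a high conditioning number, and adding a penalty term stabilizes the inverse problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned design: two nearly collinear predictors
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)   # high conditioning number
X = np.column_stack([x1, x2])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Unregularized inverse problem: argmin ||y - X beta||^2.
# With near-collinearity the LS coefficients can be wildly unstable.
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge prior shrinks the near-null direction and stabilizes the solution
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

Since x1 ≈ x2, only the sum of the two coefficients is identified (about 3 here); ridge recovers that identifiable combination while suppressing the unstable difference direction.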
Collapse
Affiliation(s)
- Youssef Hbid
- LMDP, Cadi Ayyad University, Marrakech, Morocco
- UMMISCO, IRD, France
- Laboratoire Jacques-Louis Lions, Sorbonne University, Paris, France
| | - Khaladi Mohamed
- LMDP, Cadi Ayyad University, Marrakech, Morocco
- UMMISCO, IRD, France
| | - Charles D A Wolfe
- School of Population Health and Environmental Sciences, King's College London, London, United Kingdom
- National Institute for Health Research Biomedical Research Centre, Guy's and St Thomas' NHS Foundation Trust and King's College London, London, United Kingdom
| | - Abdel Douiri
- School of Population Health and Environmental Sciences, King's College London, London, United Kingdom
- National Institute for Health Research Biomedical Research Centre, Guy's and St Thomas' NHS Foundation Trust and King's College London, London, United Kingdom
| |
Collapse
|
15
|
Deutelmoser H, Scherer D, Brenner H, Waldenberger M, Suhre K, Kastenmüller G, Lorenzo Bermejo J. Robust Huber-LASSO for improved prediction of protein, metabolite and gene expression levels relying on individual genotype data. Brief Bioinform 2020; 22:5924409. [PMID: 33063116 PMCID: PMC8293825 DOI: 10.1093/bib/bbaa230] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Revised: 08/22/2020] [Accepted: 08/25/2020] [Indexed: 12/22/2022] Open
Abstract
Least absolute shrinkage and selection operator (LASSO) regression is often applied to select the most promising set of single nucleotide polymorphisms (SNPs) associated with a molecular phenotype of interest. While the penalization parameter λ restricts the number of selected SNPs and the potential model overfitting, the least-squares loss function of standard LASSO regression translates into a strong dependence of statistical results on a small number of individuals with phenotypes or genotypes divergent from the majority of the study population—typically comprised of outliers and high-leverage observations. Robust methods have been developed to constrain the influence of divergent observations and generate statistical results that apply to the bulk of study data, but they have rarely been applied to genetic association studies. In this article, we review, for newcomers to the field of robust statistics, a novel version of standard LASSO that utilizes the Huber loss function. We conduct comprehensive simulations and analyze real protein, metabolite, mRNA expression and genotype data to compare the stability of penalization, the cross-iteration concordance of the model, the false-positive and true-positive rates and the prediction accuracy of standard and robust Huber-LASSO. Although the two methods showed controlled false-positive rates ≤2.1% and similar true-positive rates, robust Huber-LASSO outperformed standard LASSO in the accuracy of predicted protein, metabolite and gene expression levels using individual SNP data. The conducted simulations and real-data analyses show that robust Huber-LASSO represents a valuable alternative to standard LASSO in genetic studies of molecular phenotypes.
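Assuming simulated genotype-style data and a hand-rolled optimizer (the paper builds on existing robust-LASSO methodology; this is only an illustrative sketch of combining a Huber loss with an L1 penalty), a Huber-LASSO can be fit by proximal gradient descent:

```python
import numpy as np

def huber_lasso(X, y, lam=5.0, delta=1.345, n_iter=500):
    """Proximal-gradient (ISTA) sketch of LASSO with a Huber loss:
    minimize sum_i huber_delta(y_i - x_i'b) + lam * ||b||_1.
    Large residuals are clipped, limiting the pull of outliers."""
    n, p = X.shape
    b = np.zeros(p)
    L = np.linalg.eigvalsh(X.T @ X).max()      # Lipschitz const. of gradient
    for _ in range(n_iter):
        psi = np.clip(y - X @ b, -delta, delta)  # Huber influence function
        z = b + X.T @ psi / L                    # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))          # stand-in for standardized genotypes
beta = np.zeros(p); beta[:2] = [2.0, -1.5]   # two truly associated "SNPs"
y = X @ beta + rng.normal(size=n)
y[:5] += 15.0                        # a few divergent phenotype values
b_hat = huber_lasso(X, y)
```

Because the influence function is bounded at delta, the five gross outliers contribute only a bounded gradient term, so the two true effects are recovered close to their simulated values.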
Collapse
Affiliation(s)
- Heike Deutelmoser
- Statistical Genetics Research Group, Institute of Medical Biometry and Informatics, Heidelberg University, Germany
| | - Dominique Scherer
- Statistical Genetics Research Group, Institute of Medical Biometry and Informatics, Heidelberg University, Germany
| | - Hermann Brenner
- Division of Preventive Oncology and the Division of Clinical Epidemiology and Aging Research at the German Cancer Research Center, Heidelberg, Germany
| | - Melanie Waldenberger
- Research Unit Molecular Epidemiology and Institute of Epidemiology, Helmholtz Center Munich, Germany
| | | | - Karsten Suhre
- Weill Cornell Medicine and the Director of the Bioinformatics and Virtual Metabolomics Core at the Cornell campus in Doha, Qatar
| | - Gabi Kastenmüller
- Institute of Computational Biology, Helmholtz Center Munich, Germany
| | - Justo Lorenzo Bermejo
- Statistical Genetics Research Group at the Institute of Medical Biometry and Informatics, Heidelberg University, Germany
| |
Collapse
|
16
|
Wang W, Xu Y. A Modified Residual-Based RAIM Algorithm for Multiple Outliers Based on a Robust MM Estimation. Sensors (Basel) 2020; 20:s20185407. [PMID: 32967213 PMCID: PMC7570696 DOI: 10.3390/s20185407] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/03/2020] [Revised: 09/18/2020] [Accepted: 09/18/2020] [Indexed: 11/16/2022]
Abstract
The residual-based (RB) receiver autonomous integrity monitoring (RAIM) detector is a widely used receiver integrity enhancement technology with the ability to respond rapidly to outliers. However, the sensitivity and vulnerability of the residuals to the outliers are the weaknesses of the method, especially in the case of multi-outlier modes. Replacing least squares (LS) estimation with robust estimation is an effective way to enhance the validity of the residuals. In this paper, a modified RB RAIM detector based on a robust MM estimation, with higher detection performance under multi-outlier modes, is presented. A fast subset selection method based on the characteristic slope, which reduces the number of subsets to be calculated, is also presented. The experimental results show that the proposed algorithm maintains a more robust performance than RB RAIM detectors based on the LS estimator or on an M-estimator with an IGG III function, especially as the number of outliers increases. The proposed fast subset selection method can reduce the calculation time by at least 80%, demonstrating the practical application value of the algorithm.
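As a simplified illustration of the core idea (an M-estimator via iteratively reweighted least squares with Huber weights, not the authors' MM/IGG III detector; the measurement model is hypothetical), robust estimation keeps one faulty measurement from smearing into the entire residual vector the way plain LS does:

```python
import numpy as np

def irls_huber(A, y, delta=1.345, n_iter=50):
    """Iteratively reweighted least squares with Huber weights: a basic
    M-estimator that downweights observations with large standardized
    residuals, so faults stay visible in the residuals."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - A @ x
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # robust scale
        u = np.abs(r / s)
        w = np.where(u <= delta, 1.0, delta / u)   # Huber weights
        W = np.diag(w)
        x = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return x, y - A @ x

rng = np.random.default_rng(2)
A = rng.normal(size=(12, 4))          # e.g. 12 measurements, 4 unknowns
x_true = np.array([1.0, -2.0, 0.5, 3.0])
y = A @ x_true + 0.01 * rng.normal(size=12)
y[3] += 5.0                           # one faulty measurement
x_hat, r = irls_huber(A, y)
faulty = int(np.argmax(np.abs(r)))    # the robust residual isolates the fault
```

After downweighting, the state estimate stays near the truth and the fault's full bias reappears in its own residual, which is exactly what a residual-based detector needs.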
Collapse
Affiliation(s)
- Wenbo Wang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100864, China;
- School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100864, China
- Correspondence:
| | - Ying Xu
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100864, China;
| |
Collapse
|
17
|
Silalahi DD, Midi H, Arasan J, Mustafa MS, Caliman JP. Robust Wavelength Selection Using Filter-Wrapper Method and Input Scaling on Near Infrared Spectral Data. Sensors (Basel) 2020; 20:E5001. [PMID: 32899292 DOI: 10.3390/s20175001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/04/2020] [Revised: 08/28/2020] [Accepted: 08/30/2020] [Indexed: 12/25/2022]
Abstract
The extraction of relevant wavelengths from a large Near Infrared Spectroscopy (NIRS) dataset is a significant challenge in vibrational spectroscopy research. Nonetheless, this process improves chemical interpretability by emphasizing the chemical entities related to the chemical parameters of samples. Given the complexity of the dataset, irrelevant wavelengths may still be included in the multivariate calibration. This makes the computational process unnecessarily complex and decreases the accuracy and robustness of the model. In multivariate analysis, Partial Least Square Regression (PLSR) is a method commonly used to build a predictive model from NIR spectral data. However, neither the PLSR method nor common commercial chemometrics software applies a standard wavelength selection procedure to screen out irrelevant wavelengths. In this study, a new robust wavelength selection procedure called the modified VIP-MCUVE (mod-VIP-MCUVE), using a Filter-Wrapper method and an input scaling strategy, is introduced. The proposed method combines the modified Variable Importance in Projection (VIP) and modified Monte Carlo Uninformative Variable Elimination (MCUVE) to calculate the scale matrix of the input variable. The modified VIP uses the orthogonal components of Partial Least Square (PLS) to investigate the informative variables in the model by applying the amount of variation in both X and y, {SSX, SSY}, simultaneously. The modified MCUVE uses a robust reliability coefficient and a robust tolerance interval in the selection procedure. To evaluate the superiority of the proposed method, the classical VIP, MCUVE, and the autoscaling procedure in classical PLSR were also included in the evaluation. Using artificial data from Monte Carlo simulation and NIR spectral data of oil palm (Elaeis guineensis Jacq.) fruit mesocarp, the study shows that the proposed method improves model interpretability and produces better model accuracy, although it is computationally extensive.
Collapse
|
18
|
Di Lazzaro P, Atkinson AC, Iacomussi P, Riani M, Ricci M, Wadhams P. Statistical and Proactive Analysis of an Inter-Laboratory Comparison: The Radiocarbon Dating of the Shroud of Turin. Entropy (Basel) 2020; 22:e22090926. [PMID: 33286695 PMCID: PMC7597180 DOI: 10.3390/e22090926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Revised: 08/11/2020] [Accepted: 08/18/2020] [Indexed: 06/12/2023]
Abstract
We review the sampling and results of the radiocarbon dating of the archaeological cloth known as the Shroud of Turin, in the light of recent statistical analyses of both published and raw data. The statistical analyses highlight an inter-laboratory heterogeneity of the means and a monotone spatial variation of the ages of subsamples that suggest the presence of contaminants unevenly removed by the cleaning pretreatments. We consider the significance and overall impact of the statistical analyses on assessing the reliability of the dating results and the design of correct sampling. These analyses suggest that the 1988 radiocarbon dating does not match the current accuracy requirements. Should this be the case, it would be interesting to know the accurate age of the Shroud of Turin. Taking into account the whole body of scientific data, we discuss whether it makes sense to date the Shroud again.
Collapse
Affiliation(s)
- Paolo Di Lazzaro
- Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), Dipartimento FSN, Centro Ricerche Frascati, via E. Fermi 45, 00044 Frascati, Italy
| | | | - Paola Iacomussi
- Istituto Nazionale di Ricerca Metrologica (INRIM), 00135 Torino, Italy;
| | - Marco Riani
- Dipartimento di Scienze Economiche e Aziendale and Interdepartmental Centre for Robust Statistics, Università di Parma, 43125 Parma, Italy
| | - Marco Ricci
- Independent Researcher, Via Fra Dolcino 19, 28100 Novara, Italy;
| | - Peter Wadhams
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, Cambridge CB3 0WA, UK;
| |
Collapse
|
19
|
Zhao Q, Chen Y, Wang J, Small DS. Powerful three-sample genome-wide design and robust statistical inference in summary-data Mendelian randomization. Int J Epidemiol 2020; 48:1478-1492. [PMID: 31298269 DOI: 10.1093/ije/dyz142] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/19/2019] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND Summary-data Mendelian randomization (MR) has become a popular research design to estimate the causal effect of risk exposures. As the sample sizes of GWAS continue to increase, it is now possible to use genetic instruments that are only weakly associated with the exposure. DEVELOPMENT We propose a three-sample genome-wide design where typically 1000 independent genetic instruments across the whole genome are used. We develop an empirical partially Bayes statistical analysis approach where instruments are weighted according to their strength; thus, weak instruments bring less variation to the estimator. The estimator is highly efficient with many weak genetic instruments and is robust to balanced and/or sparse pleiotropy. APPLICATION We apply our method to estimate the causal effect of body mass index (BMI) and major blood lipids on cardiovascular disease outcomes, and obtain substantially shorter confidence intervals (CIs). In particular, the estimated causal odds ratio of BMI on ischaemic stroke is 1.19 (95% CI: 1.07-1.32, P-value <0.001); the estimated causal odds ratio of high-density lipoprotein cholesterol (HDL-C) on coronary artery disease (CAD) is 0.78 (95% CI: 0.73-0.84, P-value <0.001). However, the estimated effect of HDL-C attenuates and becomes statistically non-significant when we only use strong instruments. CONCLUSIONS A genome-wide design can greatly improve the statistical power of MR studies. Robust statistical methods may alleviate but not solve the problem of horizontal pleiotropy. Our empirical results suggest that the relationship between HDL-C and CAD is heterogeneous, and it may be too soon to completely dismiss the HDL hypothesis.
Collapse
Affiliation(s)
- Qingyuan Zhao
- Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - Yang Chen
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
| | - Jingshu Wang
- Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - Dylan S Small
- Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| |
Collapse
|
20
|
Abstract
M-estimation, or estimating equation, methods are widely applicable for point estimation and asymptotic inference. In this paper, we present an R package that can find roots and compute the empirical sandwich variance estimator for any set of user-specified, unbiased estimating equations. Examples from the M-estimation primer by Stefanski and Boos (2002) demonstrate use of the software. The package also includes a framework for finite sample, heteroscedastic, and autocorrelation variance corrections, and a website with an extensive collection of tutorials.
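The abstract describes an R package; as a language-neutral sketch of the same idea (the classic mean/variance estimating equations from the Stefanski and Boos primer, with simulated data, not the package's own API), the root of user-specified estimating equations and the empirical sandwich variance can be computed in a few lines:

```python
import numpy as np
from scipy import optimize

# Stacked unbiased estimating equations for the mean and variance:
# psi(theta; y_i) = [ y_i - mu, (y_i - mu)^2 - sigma2 ]
def psi(theta, y):
    mu, sig2 = theta
    return np.column_stack([y - mu, (y - mu) ** 2 - sig2])

def m_estimate(y):
    # Point estimate: root of the summed estimating equations
    sol = optimize.root(lambda t: psi(t, y).sum(axis=0), x0=[0.0, 1.0])
    theta = sol.x
    # Sandwich variance A^{-1} B A^{-T} / n, with A from central differences
    n, eps = len(y), 1e-6
    A = np.zeros((2, 2))
    for j in range(2):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += eps; tm[j] -= eps
        A[:, j] = -(psi(tp, y).mean(axis=0) - psi(tm, y).mean(axis=0)) / (2 * eps)
    B = psi(theta, y).T @ psi(theta, y) / n
    V = np.linalg.inv(A) @ B @ np.linalg.inv(A).T / n
    return theta, V

rng = np.random.default_rng(3)
y = rng.normal(5.0, 2.0, size=1000)
theta, V = m_estimate(y)
se_mu = np.sqrt(V[0, 0])   # matches sigma / sqrt(n) asymptotically
```

For these particular equations the M-estimates reduce to the sample mean and (biased) sample variance, and the sandwich standard error of the mean reduces to the usual plug-in estimate, which makes the example easy to check.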
Collapse
|
21
|
Schuster C, Lubbe D. A note on residual M-distances for identifying aberrant response patterns. Br J Math Stat Psychol 2020; 73:164-169. [PMID: 30756381 DOI: 10.1111/bmsp.12161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2018] [Revised: 12/20/2018] [Indexed: 06/09/2023]
Abstract
Although a statistical model might fit well to a large proportion of the individuals of a random sample, some individuals might give 'unusual' responses that are not well explained by the hypothesized model. If individual responses are given as continuous response vectors, M-distances can be used to produce real-valued indicators of how well an individual's response vector corresponds to a covariance structure implied by a psychometric model. In this note, we focus on the so-called one-factor model. Two M-distances, d_si and d_ri, which are sensitive to different aspects of the assumed factor model, have been proposed. While one of the M-distances, d_ri, has been derived based on Bartlett factor scores, in this note we show that the second M-distance, d_si, can be derived in an analogous fashion based on Thomson factor scores.
Collapse
|
22
|
Medina D, Li H, Vilà-Valls J, Closas P. Robust Statistics for GNSS Positioning under Harsh Conditions: A Useful Tool? Sensors (Basel) 2019; 19:E5402. [PMID: 31817922 DOI: 10.3390/s19245402] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/27/2019] [Revised: 12/04/2019] [Accepted: 12/05/2019] [Indexed: 12/03/2022]
Abstract
Navigation problems are generally solved by applying least-squares (LS) adjustments. Techniques based on LS can be shown to perform optimally when the system noise is Gaussian distributed and the parametric model is accurately known. Unfortunately, real-world problems usually contain unexpectedly large errors, so-called outliers, that violate the noise model assumption and degrade the solution estimate. In this work, the framework of robust statistics is explored to provide robust solutions to the global navigation satellite systems (GNSS) single point positioning (SPP) problem. Considering that GNSS observables may be contaminated by erroneous measurements, we survey the most popular approaches for robust regression (M-, S-, and MM-estimators) and how they can be adapted into a general methodology for robust GNSS positioning. We provide both theoretical insights and validation over experimental datasets, which serve as a basis for discussing the robust methods in detail.
Collapse
|
23
|
Abstract
The linear model often serves as a starting point for applying statistics in psychology. Often, formal training beyond the linear model is limited, creating a potential pedagogical gap because of the pervasiveness of data non-normality. We reviewed 61 recently published undergraduate and graduate textbooks on introductory statistics and the linear model, focusing on their treatment of non-normality. This review identified at least eight distinct methods suggested to address non-normality, which we organize into a new taxonomy according to whether the approach: (a) remains within the linear model, (b) changes the data, and (c) treats normality as informative or as a nuisance. Because textbook coverage of these methods was often cursory, and methodological papers introducing these approaches are usually inaccessible to non-statisticians, this review is designed to be the happy medium. We provide a relatively non-technical review of advanced methods which can address non-normality (and heteroscedasticity), thereby serving as a starting point to promote best practice in the application of the linear model. We also present three empirical examples to highlight distinctions between these methods' motivations and results. The paper also reviews the current state of methodological research in addressing non-normality within the linear modeling framework. It is anticipated that our taxonomy will provide a useful overview and starting place for researchers interested in extending their knowledge in approaches developed to address non-normality from the perspective of the linear model.
Collapse
Affiliation(s)
- Jolynn Pek
- Psychology, The Ohio State University, Columbus, OH, United States
| | - Octavia Wong
- Kinesiology and Health Sciences, York University, Toronto, ON, Canada
| | - Augustine C M Wong
- Kinesiology and Health Sciences, York University, Toronto, ON, Canada.,Mathematics and Statistics, York University, Toronto, ON, Canada
| |
Collapse
|
24
|
Monti GS, Filzmoser P, Deutsch RC. A Robust Approach to Risk Assessment Based on Species Sensitivity Distributions. Risk Anal 2018; 38:2073-2086. [PMID: 29723427 DOI: 10.1111/risa.13009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2015] [Revised: 02/01/2018] [Accepted: 03/23/2018] [Indexed: 06/08/2023]
Abstract
The guidelines for setting environmental quality standards are increasingly based on probabilistic risk assessment due to a growing general awareness of the need for probabilistic procedures. One of the commonly used tools in probabilistic risk assessment is the species sensitivity distribution (SSD), which represents the proportion of species affected belonging to a biological assemblage as a function of exposure to a specific toxicant. Our focus is on the inverse use of the SSD curve with the aim of estimating the concentration, HCp, of a toxic compound that is hazardous to p% of the biological community under study. Toward this end, we propose the use of robust statistical methods in order to take into account the presence of outliers or apparent skew in the data, which may occur without any ecological basis. A robust approach exploits the full neighborhood of a parametric model, enabling the analyst to account for the typical real-world deviations from ideal models. We examine two classic HCp estimation approaches and consider robust versions of these estimators. In addition, we also use data transformations in conjunction with robust estimation methods in case of heteroscedasticity. Different scenarios using real data sets as well as simulated data are presented in order to illustrate and compare the proposed approaches. These scenarios illustrate that the use of robust estimation methods enhances HCp estimation.
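As a rough sketch of inverse SSD use (one simple robust variant, median/MAD estimation of a log-normal SSD with hypothetical toxicity data, not the authors' estimators), the HCp is the p-th percentile of the fitted distribution:

```python
import numpy as np
from scipy import stats

def hcp_robust(toxicity, p=0.05):
    """HCp from a log-normal SSD with robustly estimated parameters:
    the median and scaled MAD of the log10 toxicity values replace the
    mean and standard deviation, limiting the influence of outlying species."""
    logx = np.log10(np.asarray(toxicity, dtype=float))
    loc = np.median(logx)
    scale = 1.4826 * np.median(np.abs(logx - loc))
    return 10 ** (loc + stats.norm.ppf(p) * scale)

# Hypothetical EC50 values (mg/L) for 10 species, one extreme outlier
ec50 = [1.2, 2.5, 0.8, 3.1, 1.9, 2.2, 1.1, 2.8, 1.5, 250.0]
hc5 = hcp_robust(ec50)   # concentration hazardous to 5% of species
hc5_classical = 10 ** (np.mean(np.log10(ec50))
                       + stats.norm.ppf(0.05) * np.std(np.log10(ec50), ddof=1))
```

A single insensitive species inflates the classical standard deviation and drives the classical HC5 far below the bulk of the data, whereas the robust fit is governed by the majority of species.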
Collapse
Affiliation(s)
- Gianna S Monti
- Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy
| | - Peter Filzmoser
- Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria
| | - Roland C Deutsch
- Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria
| |
Collapse
|
25
|
Wu H. Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study. Br J Math Stat Psychol 2018; 71:334-362. [PMID: 29086416 DOI: 10.1111/bmsp.12123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/13/2015] [Revised: 07/08/2017] [Indexed: 06/07/2023]
Abstract
In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ2 distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ2 distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations.
Collapse
Affiliation(s)
- Hao Wu
- Boston College, Chestnut Hill, Massachusetts, USA
| |
Collapse
|
26
|
Maalek R, Lichti DD, Ruwanpura JY. Robust Segmentation of Planar and Linear Features of Terrestrial Laser Scanner Point Clouds Acquired from Construction Sites. Sensors (Basel) 2018. [PMID: 29518062 PMCID: PMC5876591 DOI: 10.3390/s18030819] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Automated segmentation of planar and linear features of point clouds acquired from construction sites is essential for the automatic extraction of building construction elements such as columns, beams and slabs. However, many planar and linear segmentation methods use scene-dependent similarity thresholds that may not provide generalizable solutions for all environments. In addition, outliers exist in construction site point clouds due to data artefacts caused by moving objects, occlusions and dust. To address these concerns, a novel method for robust classification and segmentation of planar and linear features is proposed. First, coplanar and collinear points are classified through a robust principal components analysis procedure. The classified points are then grouped using a new robust clustering method, the robust complete linkage method. A robust method is also proposed to extract the points of flat-slab floors and/or ceilings independent of the aforementioned stages to improve computational efficiency. The applicability of the proposed method is evaluated in eight datasets acquired from a complex laboratory environment and two construction sites at the University of Calgary. The precision, recall, and accuracy of the segmentation at both construction sites were 96.8%, 97.7% and 95%, respectively. These results demonstrate the suitability of the proposed method for robust segmentation of planar and linear features of contaminated datasets, such as those collected from construction sites.
Collapse
Affiliation(s)
- Reza Maalek
- Department of Civil Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada.
| | - Derek D Lichti
- Department of Geomatics Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada.
| | - Janaka Y Ruwanpura
- Department of Civil Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada.
| |
Collapse
|
27
|
Rousselet GA, Pernet CR, Wilcox RR. Beyond differences in means: robust graphical methods to compare two groups in neuroscience. Eur J Neurosci 2017; 46:1738-1748. [PMID: 28544058 DOI: 10.1111/ejn.13610] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2016] [Revised: 05/02/2017] [Accepted: 05/16/2017] [Indexed: 12/18/2022]
Abstract
If many changes are necessary to improve the quality of neuroscience research, one relatively simple step could have great pay-offs: to promote the adoption of detailed graphical methods, combined with robust inferential statistics. Here, we illustrate how such methods can lead to a much more detailed understanding of group differences than bar graphs and t-tests on means. To complement the neuroscientist's toolbox, we present two powerful tools that can help us understand how groups of observations differ: the shift function and the difference asymmetry function. These tools can be combined with detailed visualisations to provide complementary perspectives about the data. We provide implementations in R and MATLAB of the graphical tools, and all the examples in the article can be reproduced using R scripts.
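A simplified sketch of the shift function follows (using ordinary sample deciles and pointwise percentile-bootstrap intervals on simulated data; the authors' R/MATLAB implementations use Harrell-Davis quantile estimators and control for multiple comparisons):

```python
import numpy as np

def shift_function(x, y, quantiles=np.arange(0.1, 1.0, 0.1),
                   n_boot=2000, seed=0):
    """Decile differences between two groups with percentile-bootstrap CIs:
    a basic version of the shift function."""
    rng = np.random.default_rng(seed)
    diff = np.quantile(x, quantiles) - np.quantile(y, quantiles)
    boot = np.empty((n_boot, len(quantiles)))
    for b in range(n_boot):
        bx = rng.choice(x, size=len(x), replace=True)
        by = rng.choice(y, size=len(y), replace=True)
        boot[b] = np.quantile(bx, quantiles) - np.quantile(by, quantiles)
    lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
    return diff, lo, hi

rng = np.random.default_rng(4)
g1 = rng.normal(0.0, 1.0, 200)
g2 = rng.normal(0.0, 2.0, 200)   # same centre, heavier spread
diff, lo, hi = shift_function(g1, g2)
```

With equal centres but unequal spreads, the two means barely differ while the shift function reveals the pattern: the lower deciles of g1 sit above those of g2 and the upper deciles sit below, exactly the kind of structure a bar graph of means would hide.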
Collapse
Affiliation(s)
- Guillaume A Rousselet
- Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, 58 Hillhead Street, G12 8QB, Glasgow, UK
| | - Cyril R Pernet
- Centre for Clinical Brain Sciences, Neuroimaging Sciences, University of Edinburgh, Edinburgh, UK
| | - Rand R Wilcox
- Department of Psychology, University of Southern California, Los Angeles, CA, USA
| |
Collapse
|
28
|
Velina M, Valeinis J, Greco L, Luta G. Empirical Likelihood-Based ANOVA for Trimmed Means. Int J Environ Res Public Health 2016; 13:ijerph13100953. [PMID: 27690063 PMCID: PMC5086692 DOI: 10.3390/ijerph13100953] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 05/17/2016] [Revised: 09/15/2016] [Accepted: 09/20/2016] [Indexed: 11/16/2022]
Abstract
In this paper, we introduce an alternative to Yuen's test for the comparison of several population trimmed means. This nonparametric ANOVA type test is based on the empirical likelihood (EL) approach and extends the results for one population trimmed mean from Qin and Tsao (2002). The results of our simulation study indicate that for skewed distributions, with and without variance heterogeneity, Yuen's test performs better than the new EL ANOVA test for trimmed means with respect to control over the probability of a type I error. This finding is in contrast with our simulation results for the comparison of means, where the EL ANOVA test for means performs better than Welch's heteroscedastic F test. The analysis of a real data example illustrates the use of Yuen's test and the new EL ANOVA test for trimmed means for different trimming levels. Based on the results of our study, we recommend the use of Yuen's test for situations involving the comparison of population trimmed means between groups of interest.
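For the two-group base case of Yuen's test recommended above (a sketch with simulated skewed data; SciPy ≥ 1.7 implements Yuen's trimmed t-test through the `trim` argument of `ttest_ind`, while the EL ANOVA test of the paper has no standard SciPy implementation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Two skewed (log-normal) samples, the second shifted upward
a = rng.lognormal(mean=0.0, sigma=0.8, size=100)
b = rng.lognormal(mean=0.0, sigma=0.8, size=100) + 1.5

tm_a = stats.trim_mean(a, 0.2)   # 20% trimmed means
tm_b = stats.trim_mean(b, 0.2)

# Yuen's test: trimmed, Welch-style two-sample t-test
res = stats.ttest_ind(a, b, trim=0.2, equal_var=False)
```

Trimming 20% from each tail discards the extreme observations that dominate the ordinary t-test under skewness, which is why Yuen's test keeps better control of the type I error probability for skewed distributions.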
Collapse
Affiliation(s)
- Mara Velina
- Department of Mathematics, Faculty of Physics and Mathematics, University of Latvia, Riga LV-1002, Latvia.
- Janis Valeinis
- Department of Mathematics, Faculty of Physics and Mathematics, University of Latvia, Riga LV-1002, Latvia.
- Luca Greco
- Department of Law, Economics, Management and Quantitative Methods, University of Sannio, Benevento 82100, Italy.
- George Luta
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA.
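The trimmed-mean comparison discussed in this entry can be sketched in a few lines. This is a generic illustration (not the authors' empirical likelihood implementation), using SciPy's `trim_mean` and the `trim` argument of `ttest_ind` (SciPy >= 1.7), which performs Yuen's trimmed-mean test; the sample sizes and distributions below are invented for the demo:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two skewed (lognormal) samples: the setting where trimmed means are useful
g1 = rng.lognormal(mean=0.0, sigma=1.0, size=60)
g2 = rng.lognormal(mean=0.5, sigma=1.0, size=60)

# 20% trimmed means discard the extreme 20% in each tail before averaging
tm1 = stats.trim_mean(g1, proportiontocut=0.2)
tm2 = stats.trim_mean(g2, proportiontocut=0.2)
print(f"20% trimmed means: {tm1:.3f} vs {tm2:.3f}")

# Yuen's trimmed-mean test (SciPy >= 1.7 exposes it via the trim argument)
t, p = stats.ttest_ind(g1, g2, equal_var=False, trim=0.2)
print(f"Yuen's test: t = {t:.3f}, p = {p:.4f}")
```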
29
Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol 2016; 40:304-14. [PMID: 27061298 PMCID: PMC4849733 DOI: 10.1002/gepi.21965]
Abstract
Developments in genome-wide association studies and the increasing availability of summary genetic association data have made application of Mendelian randomization relatively straightforward. However, obtaining reliable results from a Mendelian randomization investigation remains problematic, as the conventional inverse-variance weighted method only gives consistent estimates if all of the genetic variants in the analysis are valid instrumental variables. We present a novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate. This estimator is consistent even when up to 50% of the information comes from invalid instrumental variables. In a simulation analysis, it is shown to have better finite-sample Type 1 error rates than the inverse-variance weighted method, and is complementary to the recently proposed MR-Egger (Mendelian randomization-Egger) regression method. In analyses of the causal effects of low-density lipoprotein cholesterol and high-density lipoprotein cholesterol on coronary artery disease risk, the inverse-variance weighted method suggests a causal effect of both lipid fractions, whereas the weighted median and MR-Egger regression methods suggest a null effect of high-density lipoprotein cholesterol that corresponds with the experimental evidence. Both median-based and MR-Egger regression methods should be considered as sensitivity analyses for Mendelian randomization investigations with multiple genetic variants.
Collapse
Affiliation(s)
- Jack Bowden
- Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
- George Davey Smith
- Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
- Philip C Haycock
- Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
- Stephen Burgess
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
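The core idea of the weighted median estimator can be sketched as follows. This is a simplified stand-in, not the published estimator: the `weighted_median` helper and all the numbers are invented for illustration, uniform weights replace the inverse-variance weights, and Bowden et al. additionally interpolate between adjacent values:

```python
import numpy as np

def weighted_median(values, weights):
    """Simplified weighted median: sort the values, then return the value at
    which the normalized cumulative weight first reaches 0.5."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cum, 0.5)]

# Hypothetical per-variant causal estimates (ratio estimates): 7 valid
# instruments clustered near 0.4, 3 invalid instruments pulled away.
ratios  = np.array([0.38, 0.41, 0.40, 0.39, 0.42, 0.37, 0.43, 1.50, 1.80, -0.90])
weights = np.ones_like(ratios)   # inverse-variance weights in practice

print("simple mean:    ", ratios.mean())                       # dragged by invalid instruments
print("weighted median:", weighted_median(ratios, weights))    # stays near 0.4
```

Because the median only needs a majority of the weight to come from valid instruments, up to 50% invalid weight leaves the estimate essentially unaffected, which is the consistency property the paper proves.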
30
Hanke AT, Klijn ME, Verhaert PDEM, van der Wielen LAM, Ottens M, Eppink MHM, van de Sandt EJAX. Prediction of protein retention times in hydrophobic interaction chromatography by robust statistical characterization of their atomic-level surface properties. Biotechnol Prog 2016; 32:372-81. [PMID: 26698169 DOI: 10.1002/btpr.2219]
Abstract
The correlation between the dimensionless retention times (DRT) of proteins in hydrophobic interaction chromatography (HIC) and their surface properties was investigated. A ternary atomic-level hydrophobicity scale was used to calculate the distribution of local average hydrophobicity across the protein surfaces. These distributions were characterized by robust descriptive statistics to reduce their sensitivity to small changes in the three-dimensional structure. The applicability of these statistics for the prediction of protein retention behaviour was examined. A linear combination of robust statistics describing the central tendency, heterogeneity and frequency of highly hydrophobic clusters was found to have good predictive capability (R2 = 0.78) when combined with a factor accounting for protein size differences. The prediction error was 35% lower than for a similar model based on a description of the protein surface at the amino acid level. This indicates that a robust and mathematically simple model based on an atomic-level description of the protein surface can be used to predict the retention behaviour of conformationally stable globular proteins with a well-determined 3D structure in HIC.
Collapse
Affiliation(s)
- Alexander T Hanke
- Dept. of Biotechnology, TU Delft, Julianalaan 67, Delft, 2628 BC, The Netherlands
- Marieke E Klijn
- Dept. of Biotechnology, TU Delft, Julianalaan 67, Delft, 2628 BC, The Netherlands
- Peter D E M Verhaert
- Dept. of Biotechnology, TU Delft, Julianalaan 67, Delft, 2628 BC, The Netherlands
- Marcel Ottens
- Dept. of Biotechnology, TU Delft, Julianalaan 67, Delft, 2628 BC, The Netherlands
- Michel H M Eppink
- Synthon Biopharmaceuticals B.V., Microweg 22, 6503 GN Nijmegen, The Netherlands
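The descriptor idea in this entry (robust central tendency, heterogeneity, and frequency of highly hydrophobic clusters) can be illustrated generically. The hydrophobicity values and the 0.5 threshold below are invented for the demo, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical local average hydrophobicity over 500 surface patches:
# mostly moderate values, plus a few highly hydrophobic clusters.
h = np.concatenate([rng.normal(0.2, 0.05, 450), rng.normal(0.8, 0.05, 50)])

center = np.median(h)                    # robust central tendency
mad    = np.median(np.abs(h - center))   # robust heterogeneity (MAD)
frac_hydrophobic = np.mean(h > 0.5)      # frequency of hydrophobic patches

print(f"median = {center:.3f}, MAD = {mad:.3f}, "
      f"fraction above 0.5 = {frac_hydrophobic:.2f}")
```

Median and MAD barely move if a few surface patches change with a small conformational shift, which is why the paper prefers them to means and standard deviations.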
31
Abstract
We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.
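The invariance that motivates COCA can be demonstrated directly: a rank correlation is unchanged by monotone marginal transformations, while Pearson correlation is not. A generic sketch (not the authors' estimator); the data and the exp transform are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)
y_t = np.exp(y)   # an unspecified monotone marginal transform, as in the model

r_p,   _ = stats.pearsonr(x, y)
r_pt,  _ = stats.pearsonr(x, y_t)
rho,   _ = stats.spearmanr(x, y)
rho_t, _ = stats.spearmanr(x, y_t)

print(f"Pearson:  {r_p:.3f} -> {r_pt:.3f}  (changed by the transform)")
print(f"Spearman: {rho:.3f} -> {rho_t:.3f}  (invariant)")
# Under a Gaussian copula, the latent correlation can be recovered from the
# rank correlation as 2 * sin(pi / 6 * rho): the kind of rank-based bridge
# that semiparametric copula methods rely on.
```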
32
Abstract
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed.
Collapse
Affiliation(s)
- Jianqing Fan
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA.
33
Pernet CR, Wilcox R, Rousselet GA. Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox. Front Psychol 2013; 3:606. [PMID: 23335907 PMCID: PMC3541537 DOI: 10.3389/fpsyg.2012.00606]
Abstract
Pearson's correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet it remains the most commonly used measure of association in psychology research. Here we describe a free MATLAB-based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped correlations. After illustrating how to use the toolbox, we show that robust methods, in which outliers are down-weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and with normal data contaminated by marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand.
Collapse
Affiliation(s)
- Cyril R Pernet
- Brain Research Imaging Center, Division of Clinical Neurosciences, University of Edinburgh, Edinburgh, UK
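The outlier sensitivity described in this entry is easy to reproduce. A minimal sketch, not the toolbox itself: the `mad_keep` helper below is a simplified per-axis stand-in for the projection-based multivariate outlier detection that skipped correlations actually use, and the data are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=50)
y = rng.normal(size=50)        # truly uncorrelated with x
x[0], y[0] = 10.0, 10.0        # a single bivariate outlier

r_pearson, _  = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)

def mad_keep(v, k=3.0):
    """Keep points within k robust standard deviations of the median."""
    med = np.median(v)
    sigma = 1.4826 * np.median(np.abs(v - med))   # MAD rescaled to Gaussian sigma
    return np.abs(v - med) <= k * sigma

keep = mad_keep(x) & mad_keep(y)                  # drop the outlier...
r_skipped, _ = stats.pearsonr(x[keep], y[keep])   # ...then correlate the rest

print(f"Pearson {r_pearson:.2f}, Spearman {r_spearman:.2f}, skipped {r_skipped:.2f}")
```

One contaminated point out of fifty is enough to manufacture a sizable Pearson correlation where none exists; the skipped variant stays near zero.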
34
Abstract
Associations between two variables, for instance between brain and behavioral measurements, are often studied using correlations, and in particular Pearson correlation. However, Pearson correlation is not robust: outliers can introduce false correlations or mask existing ones. These problems are exacerbated in brain imaging by a widespread lack of control for multiple comparisons, and several issues with data interpretations. We illustrate these important problems associated with brain-behavior correlations, drawing examples from published articles. We make several propositions to alleviate these problems.
Collapse
Affiliation(s)
- Guillaume A Rousselet
- Centre for Cognitive Neuroimaging (CCNi), Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
35
Abstract
In high-dimensional model selection problems, penalized least-square approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with a weighted L(1)-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L(1)-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L(1)-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust composite L1-L2 method and an optimal composite quantile method, and evaluate their performance in both simulated and real data examples.
Collapse
Affiliation(s)
- Jelena Bradic
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, USA
- Jianqing Fan
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, USA
- Weiwei Wang
- Biostatistics/Epidemiology/Research Design (BERD) Core, Center for Clinical and Translational Sciences, The University of Texas Health Science Center at Houston, Houston, USA
36
Rousselet GA, Pernet CR. Quantifying the Time Course of Visual Object Processing Using ERPs: It's Time to Up the Game. Front Psychol 2011; 2:107. [PMID: 21779262 PMCID: PMC3132679 DOI: 10.3389/fpsyg.2011.00107]
Abstract
Hundreds of studies have investigated the early ERPs to faces and objects using scalp and intracranial recordings. The vast majority of these studies have used uncontrolled stimuli, inappropriate designs, peak measurements, poor figures, and poor inferential and descriptive group statistics. These problems, together with a tendency to discuss any effect with p < 0.05 rather than to report effect sizes, have led to a research field very much qualitative in nature, despite its quantitative inspirations, and in which predictions do not go beyond condition A > condition B. Here we describe the main limitations of face and object ERP research and suggest alternative strategies to move forward. The problems plague intracranial and surface ERP studies, but also studies using more advanced techniques, e.g., source space analyses and measurements of network dynamics, as well as many behavioral, fMRI, TMS, and LFP studies. In essence, it is time to stop amassing binary results and start using single-trial analyses to build models of visual perception.
Collapse
Affiliation(s)
- Guillaume A Rousselet
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
37
Abstract
In this paper, we present a new Adaptive-Scale Kernel Consensus (ASKC) robust estimator as a generalization of the popular and state-of-the-art robust estimators such as RANdom SAmple Consensus (RANSAC), Adaptive Scale Sample Consensus (ASSC), and Maximum Kernel Density Estimator (MKDE). The ASKC framework is grounded on and unifies these robust estimators using nonparametric kernel density estimation theory. In particular, we show that each of these methods is a special case of ASKC using a specific kernel. Like these methods, ASKC can tolerate more than 50 percent outliers, but it can also automatically estimate the scale of inliers. We apply ASKC to two important areas in computer vision, robust motion estimation and pose estimation, and show comparative results on both synthetic and real data.
Collapse
Affiliation(s)
- Hanzi Wang
- School of Computer Science, The University of Adelaide, Adelaide SA 5005, Australia.
- Daniel Mirota
- Department of Computer Science, The Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218.
- Gregory D. Hager
- Department of Computer Science, The Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218.
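ASKC generalizes consensus-based estimators such as RANSAC, so the family is easiest to see with a minimal RANSAC line fit. This is a generic sketch with invented data, not the ASKC algorithm itself, which instead scores candidate models by the kernel density of their residuals and estimates the inlier scale automatically:

```python
import numpy as np

rng = np.random.default_rng(3)

# 60 inliers on y = 2x + 1 with small noise, plus 40 gross outliers (40%)
n_in, n_out = 60, 40
x = rng.uniform(-5, 5, n_in + n_out)
y = 2 * x + 1 + rng.normal(0, 0.1, n_in + n_out)
y[n_in:] = rng.uniform(-20, 20, n_out)

def ransac_line(x, y, n_iter=500, tol=0.5):
    """Minimal RANSAC: fit a line to 2 random points, keep the model with
    the largest consensus set, then refit by least squares on that set."""
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < tol
        if inliers.sum() > best.sum():
            best = inliers
    A = np.vstack([x[best], np.ones(best.sum())]).T
    return np.linalg.lstsq(A, y[best], rcond=None)[0]

a, b = ransac_line(x, y)
print(f"recovered line: y = {a:.2f} x + {b:.2f}")
```

Like ASKC, this tolerates a large outlier fraction; unlike ASKC, the inlier threshold `tol` must be supplied by hand, which is exactly the limitation the adaptive-scale estimators address.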
38
Prieto JC, Croux C, Jiménez AR. RoPEUS: A New Robust Algorithm for Static Positioning in Ultrasonic Systems. Sensors (Basel) 2009; 9:4211-29. [PMID: 22408522 PMCID: PMC3291907 DOI: 10.3390/s90604211]
Abstract
A well-known problem for precise positioning in real environments is the presence of outliers in the measurement sample. Their impact is even greater in ultrasound-based systems, since this technology needs a direct line of sight between emitters and receivers. Standard techniques for outlier detection in range-based systems do not usually employ robust algorithms and fail when multiple outliers are present. The direct application of standard robust regression algorithms fails in static positioning (where only the current measurement sample is considered) in real ultrasound-based systems, mainly due to the limited number of measurements and geometry effects. This paper presents a new robust algorithm, called RoPEUS, based on MM-estimation, that follows a typical two-step strategy: 1) a high-breakdown-point algorithm to obtain a clean sample, and 2) a refinement algorithm to increase the accuracy of the solution. The main modifications proposed to the standard MM robust algorithm are a built-in check of partial solutions in the first step (rejecting bad geometries) and the off-line calculation of the scale of the measurements. The algorithm is tested with real samples obtained with the 3D-LOCUS ultrasound localization system in an ideal environment without obstacles. These measurements are corrupted with typical outlying patterns to numerically evaluate the algorithm's performance against the standard parity space algorithm. The algorithm proves to be robust under single or multiple outliers, providing similar accuracy figures in all cases.
Collapse
Affiliation(s)
- José Carlos Prieto
- LOPSI group, Instituto de Automática Industrial, Consejo Superior de Investigaciones Científicas (CSIC), Ctra. Campo Real Km 0.200, 28500 La Poveda-Arganda del Rey, Madrid, Spain
- Author to whom correspondence should be addressed; Tel.: (+34) 91 871 19 00; Fax: (+34) 91 871 70 50
- Christophe Croux
- Faculty of Business and Economics, K.U.Leuven, Naamsestraat 69, 3000 Leuven, Belgium
- Antonio Ramón Jiménez
- LOPSI group, Instituto de Automática Industrial, Consejo Superior de Investigaciones Científicas (CSIC), Ctra. Campo Real Km 0.200, 28500 La Poveda-Arganda del Rey, Madrid, Spain
39
Abstract
Mean values, traditionally used as a location parameter in the analysis of inter-comparisons, are known to lack stability against the effect of "outliers". It is therefore proposed to replace (or complement) them by the use of medians, which have better statistical "robustness". An estimate for the corresponding uncertainty is derived and the procedure is illustrated by a numerical example. The simplicity of the suggested robust approach should favor its practical use in a number of metrological applications.
Collapse
Affiliation(s)
- J W Müller
- Bureau International des Poids et Mesures, F-92312 Sevres, France
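The mean-versus-median contrast this entry describes is easy to demonstrate. A sketch with invented laboratory results; the uncertainty formula below is a standard MAD-based stand-in (1.4826 rescales the MAD to a Gaussian sigma, and ~1.2533 sigma/sqrt(n) is the asymptotic standard error of a median), not the estimate derived in the paper:

```python
import numpy as np

# Hypothetical inter-comparison results from 9 laboratories; one outlier.
results = np.array([10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 14.0])

mean, median = results.mean(), np.median(results)

mad_sigma = 1.4826 * np.median(np.abs(results - median))  # robust spread
se_median = 1.2533 * mad_sigma / np.sqrt(len(results))    # median's std. error

print(f"mean   = {mean:.3f}  (pulled up by the outlier)")
print(f"median = {median:.3f} +/- {se_median:.3f}")
```

A single discrepant laboratory shifts the mean by almost half a unit while leaving the median untouched, which is the stability argument for using medians as the location parameter.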