1
Cai J, Hu W, Yang Y, Yan H, Chen F. Outlier detection in spatial error models using modified thresholding-based iterative procedure for outlier detection approach. BMC Med Res Methodol 2024; 24:89. [PMID: 38622516 PMCID: PMC11323683 DOI: 10.1186/s12874-024-02208-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 03/26/2024] [Indexed: 04/17/2024]
Abstract
BACKGROUND Outliers, data points that significantly deviate from the norm, can have a substantial impact on statistical inference and provide valuable insights in data analysis. Multiple methods have been developed for outlier detection; however, almost all available approaches fail to consider the spatial dependence and heterogeneity in spatial data. Spatial data has diverse formats and semantics, requiring specialized outlier detection methodology to handle these unique properties. To date, limited research exists on robust spatial outlier detection methods designed specifically under the spatial error model (SEM) structure.
METHOD We propose the Spatial-Θ-Iterative Procedure for Outlier Detection (Spatial-Θ-IPOD), which utilizes a mean-shift vector to identify outliers within the SEM. Our method enables effective detection of spatial outliers while also providing robust coefficient estimates. To assess the performance of our approach, we conducted extensive simulations and applied it to a real-world empirical study using life expectancy data from multiple countries.
RESULTS Simulation results showed that the masking and JD (Joint Detection) indicators of our Spatial-Θ-IPOD method outperformed several commonly used methods, even in high-dimensional scenarios, demonstrating stable performance. Conversely, the Θ-IPOD method proved to be ineffective in detecting outliers when spatial correlation was present. Moreover, our model successfully provided reliable coefficient estimation alongside outlier detection. The proposed method consistently outperformed other models (both robust and non-robust) in most cases. In the empirical study, our proposed model successfully detected outliers and provided valuable insights in the modeling process.
CONCLUSIONS Our proposed Spatial-Θ-IPOD offers an effective solution for detecting spatial outliers in the SEM while providing robust coefficient estimates. Notably, our approach shows its relative superiority even in the presence of high leverage points. By successfully identifying outliers, our method enhances the overall understanding of the data and provides valuable insights for further analysis.
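The mean-shift idea mentioned in the abstract can be illustrated with a minimal sketch of the plain, non-spatial Θ-IPOD iteration with a hard-thresholding rule. This is not the authors' Spatial-Θ-IPOD: it ignores spatial correlation entirely, uses a single predictor, and the function names, the threshold `lam`, and the toy data are all assumptions made for illustration. Each observation gets its own shift term `gamma[i]`; the loop alternates between an ordinary least squares fit on the shift-adjusted response and thresholding the residuals, and observations with a nonzero shift are flagged.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def hard_threshold(r, lam):
    """Hard-thresholding rule: keep r when |r| > lam, otherwise return 0."""
    return r if abs(r) > lam else 0.0


def ipod(x, y, lam, max_iter=100, tol=1e-10):
    """Alternate OLS on (y - gamma) with thresholding of the residuals of y."""
    gamma = [0.0] * len(y)
    for _ in range(max_iter):
        a, b = fit_line(x, [yi - gi for yi, gi in zip(y, gamma)])
        new_gamma = [hard_threshold(yi - (a + b * xi), lam) for xi, yi in zip(x, y)]
        change = max(abs(g1 - g0) for g1, g0 in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:
            break
    # Final refit on the response with the estimated shifts removed.
    a, b = fit_line(x, [yi - gi for yi, gi in zip(y, gamma)])
    outliers = [i for i, g in enumerate(gamma) if g != 0.0]
    return a, b, gamma, outliers


# Hypothetical toy data: an exact line y = 2 + 3x with one shifted observation.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[4] += 50.0

a, b, gamma, outliers = ipod(x, y, lam=10.0)
print(outliers, round(a, 6), round(b, 6))
```

The result depends on the threshold and on the starting point: with a threshold that is too small, clean observations can be flagged in the first pass, which is one reason a robust initial fit is often preferred in practice.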
Affiliation(s)
- Jiaxin Cai
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, No. 76, Yanta Xilu Road, Xi'an, 710061, Shaanxi, China
- Weiwei Hu
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, No. 76, Yanta Xilu Road, Xi'an, 710061, Shaanxi, China
- Yuhui Yang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, No. 76, Yanta Xilu Road, Xi'an, 710061, Shaanxi, China
- Hong Yan
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, No. 76, Yanta Xilu Road, Xi'an, 710061, Shaanxi, China
  - Key Laboratory for Disease Prevention and Control and Health Promotion of Shaanxi Province, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
- Fangyao Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, No. 76, Yanta Xilu Road, Xi'an, 710061, Shaanxi, China
  - Key Laboratory for Disease Prevention and Control and Health Promotion of Shaanxi Province, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
  - Department of Radiology, First Affiliate Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
2
Malashin IP, Tynchenko VS, Nelyub VA, Borodulin AS, Gantimurov AP. Estimation and Prediction of the Polymers' Physical Characteristics Using the Machine Learning Models. Polymers (Basel) 2023; 16:115. [PMID: 38201778 PMCID: PMC10780762 DOI: 10.3390/polym16010115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 12/23/2023] [Accepted: 12/27/2023] [Indexed: 01/12/2024]
Abstract
This article investigates the utility of machine learning (ML) methods for predicting and analyzing the diverse physical characteristics of polymers. Leveraging a rich dataset of polymers' characteristics, the study encompasses an extensive range of polymer properties, spanning compressive and tensile strength to thermal and electrical behaviors. Various regression methods (ensemble, tree-based, regularization, and distance-based) are evaluated thoroughly using the most common quality metrics. A series of experiments on the selection of model parameters identified settings that provide a high-quality solution to the stated problem. The best results were achieved by Random Forest, with the highest R² scores of 0.71, 0.73, and 0.88 for glass transition, thermal decomposition, and melting temperatures, respectively. The outcomes are compared in detail, providing valuable insights into the efficiency of distinct ML approaches in predicting polymer properties. Unknown values for each characteristic were predicted, and the method was validated by training on the predicted values and comparing the results with the specified variance values of each characteristic. The research not only advances our comprehension of polymer physics but also contributes to informed model selection and optimization for materials science applications.
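For readers unfamiliar with the metric quoted in this abstract, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses only the standard library; the function name `r2_score` and the numbers are hypothetical and are not taken from the polymer dataset.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

r2 = r2_score(y_true, y_pred)
# Always predicting the mean of y_true yields a score of zero.
r2_mean = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])
print(round(r2, 4), r2_mean)
```

A score of 1 means the predictions match the targets exactly, and a score of 0 means the model does no better than always predicting the mean.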
Affiliation(s)
- Ivan Pavlovich Malashin
  - Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Vadim Sergeevich Tynchenko
  - Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
  - Information-Control Systems Department, Institute of Computer Science and Telecommunications, Reshetnev Siberian State University of Science and Technology, 660037 Krasnoyarsk, Russia
  - Department of Technological Machines and Equipment of Oil and Gas Complex, School of Petroleum and Natural Gas Engineering, Siberian Federal University, 660041 Krasnoyarsk, Russia
- Vladimir Aleksandrovich Nelyub
  - Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Aleksei Sergeevich Borodulin
  - Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Andrei Pavlovich Gantimurov
  - Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
3
Li G, Jung JJ. Entropy-based dynamic graph embedding for anomaly detection on multiple climate time series. Sci Rep 2021; 11:13819. [PMID: 34226612 PMCID: PMC8257856 DOI: 10.1038/s41598-021-92973-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/22/2020] [Accepted: 06/18/2021] [Indexed: 12/01/2022]
Abstract
An abnormal climate event occurs when some meteorological conditions are extreme over a certain time interval. Existing methods for detecting abnormal climate events use supervised learning models to learn the abnormal patterns, but they cannot detect patterns absent from the training data. To overcome this problem, we construct a dynamic graph by discovering the correlation among the climate time series and propose a novel dynamic graph embedding model based on graph entropy, called EDynGE, to discriminate anomalies. The graph entropy measurement quantifies the information of the graphs and constructs the embedding space. We conducted experiments on synthetic datasets and real-world meteorological datasets. The results showed that the EDynGE model outperformed the baselines in F1-score by 43.2%, and that the number of days of abnormal climate events has increased by 304.5 days in the past 30 years.
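As a rough illustration of the two ingredients named in this abstract, the sketch below builds a graph from pairwise correlations between time series and then computes a Shannon entropy over the normalized node degrees. Both choices are assumptions: the abstract does not state how EDynGE defines its edges or its graph entropy, and the degree-based entropy used here is only one common definition. The correlation threshold, the function names, and the toy series are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def correlation_graph(series, threshold):
    """Connect series i and j when |correlation| is at least the threshold."""
    edges = set()
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            if abs(pearson(series[i], series[j])) >= threshold:
                edges.add((i, j))
    return edges


def degree_entropy(n_nodes, edges):
    """Shannon entropy of the degree distribution p_i = d_i / sum(d)."""
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    total = sum(degree)
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log(d / total) for d in degree if d > 0)


# Hypothetical window of four short series; the last one is uncorrelated with the first.
series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, -1.0, 1.0, -1.0, 1.0],
]
edges = correlation_graph(series, threshold=0.9)
entropy = degree_entropy(len(series), edges)
print(sorted(edges), entropy)
```

Repeating this over sliding time windows would give a sequence of graphs and entropy values, which is one simple way to read the "dynamic graph" in the abstract.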
Affiliation(s)
- Gen Li
  - Department of Computer Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Jason J Jung
  - Department of Computer Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea