1
Yang X, Li J, Jiang X. Research on information leakage in time series prediction based on empirical mode decomposition. Sci Rep 2024; 14:28362. [PMID: 39550475 PMCID: PMC11569228 DOI: 10.1038/s41598-024-80018-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 11/14/2024] [Indexed: 11/18/2024]
Abstract
Time series analysis predicts the future from historical data and has a wide range of applications in finance, economics, meteorology, biology, engineering, and other fields. Although combining decomposition techniques with machine learning algorithms can effectively handle the prediction of nonstationary sequences, this decomposition-integration-prediction strategy has a serious defect. When the series is decomposed before it is divided into a training set and a test set, information from the test set leaks into the training data during decomposition, which ultimately produces an illusion of high prediction accuracy. This paper proposes three improvement strategies for this "information leakage" problem: sliding window decomposition (SW-EMD), single training and multiple decomposition (STMP-EMD), and multiple training and multiple decomposition (MTMP-EMD). They are combined with a bidirectional multiscale temporal convolutional network (MSBTCN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism (DMAttention) that introduces a dependency matrix based on cosine similarity, and the resulting model is applied to water quality prediction. The experimental results show that the model achieves good performance in the prediction of three water quality indicators (pH, DO and KMnO4), and the accuracies of the three models proposed in this paper are improved by 1.958% and 0.853% in terms of the RMSE and MAPE, respectively, compared with those of the mainstream LSTM models.
The key contributions of this study include the following: (1) three methods are proposed to improve EMD-like decomposition, which can effectively solve the "information leakage" problem that exists in current models built on EMD-like decomposition; (2) a CEEMDAN-MSBTCN-BiLSTM-DMAttention model structure is constructed by combining the improved EMD-like decomposition methods; and (3) the three improved decomposition methods optimize the prediction model while removing the leakage. This study provides an effective experimental method for water quality prediction and can effectively address the "overfitting" that EMD-like decomposition causes during model training and testing.
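The leakage mechanism described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SW-EMD implementation: a centered moving average stands in for EMD/CEEMDAN (an assumption made to keep the example dependency-free), and the names `moving_average_decompose`, `leaky_features`, and `sliding_window_features` are hypothetical. The sketch contrasts decomposing the whole series once, which lets later observations influence earlier features, with decomposing each window of past values separately.

```python
import numpy as np

def moving_average_decompose(x, width=5):
    """Stand-in decomposition (assumption, not EMD): trend + residual.

    The centered average uses neighbours on both sides of each point,
    so, like EMD, it mixes future values into earlier components.
    `width` is assumed to be odd.
    """
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    trend = np.convolve(padded, np.ones(width) / width, mode="valid")
    return trend, x - trend

def leaky_features(series, window):
    """Decompose the full series once, then slice windows (leaks the future)."""
    trend, resid = moving_average_decompose(series)
    rows = [np.concatenate([trend[t - window:t], resid[t - window:t]])
            for t in range(window, len(series))]
    return np.array(rows)

def sliding_window_features(series, window):
    """Decompose each window of past values on its own (no future data)."""
    rows = []
    for t in range(window, len(series)):
        past = series[t - window:t]          # only values before the target t
        trend, resid = moving_average_decompose(past)
        rows.append(np.concatenate([trend, resid]))
    return np.array(rows)

# Demo: perturb only the final observation and see which features react.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=60))
changed = series.copy()
changed[-1] += 100.0

leak_detected = not np.allclose(leaky_features(series, 8),
                                leaky_features(changed, 8))
window_safe = np.allclose(sliding_window_features(series, 8),
                          sliding_window_features(changed, 8))
print("whole-series decomposition reacts to the future value:", leak_detected)
print("per-window decomposition is unaffected:", window_safe)
```

The cost of the per-window variant is one decomposition per sample, which is the trade-off any sliding-window scheme accepts in exchange for features that depend only on the past.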
Affiliation(s)
- Xinyi Yang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
  - College of Finance and Statistics, Hunan University, Changsha, 410082, China
- Jingyi Li
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Xuchu Jiang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
  - Emergency Management Research Center, Zhongnan University of Economics and Law, Wuhan, 430073, China

2
Karbasi M, Ali M, Bateni SM, Jun C, Jamei M, Farooque AA, Yaseen ZM. Multi-step ahead forecasting of electrical conductivity in rivers by using a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model enhanced by Boruta-XGBoost feature selection algorithm. Sci Rep 2024; 14:15051. [PMID: 38951605 PMCID: PMC11217395 DOI: 10.1038/s41598-024-65837-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 06/24/2024] [Indexed: 07/03/2024]
Abstract
Electrical conductivity (EC) is widely recognized as one of the most essential water quality metrics for predicting salinity and mineralization. In the current research, the EC of two Australian rivers (Albert River and Barratta Creek) was forecasted for up to 10 days using a novel deep learning algorithm (Convolutional Neural Network combined with Long Short-Term Memory Model, CNN-LSTM). The Boruta-XGBoost feature selection method was used to determine the significant inputs (time series lagged data) to the model. To benchmark the Boruta-XGB-CNN-LSTM models, three machine learning approaches, multi-layer perceptron neural network (MLP), K-nearest neighbour (KNN), and extreme gradient boosting (XGBoost), were used. Different statistical metrics, such as correlation coefficient (R), root mean square error (RMSE), and mean absolute percentage error (MAPE), were used to assess the models' performance. Of the 10 years of data for both rivers, 7 years (2012-2018) were used as a training set, and 3 years (2019-2021) were used for testing the models. For one-day-ahead forecasting of EC, Boruta-XGB-CNN-LSTM outperformed the other machine learning models on the test dataset at both stations (R = 0.9429, RMSE = 45.6896, MAPE = 5.9749 for Albert River, and R = 0.9215, RMSE = 43.8315, MAPE = 7.6029 for Barratta Creek). Given its better performance in both rivers, this model was then used to forecast EC 3-10 days ahead. The results showed that the Boruta-XGB-CNN-LSTM model is very capable of forecasting the EC for the next 10 days, with performance decreasing slightly as the forecasting horizon increased from 3 to 10 days. The results of this study show that the Boruta-XGB-CNN-LSTM model can be used as a good soft computing method for accurately predicting how the EC will change in rivers.
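The evaluation setup in this abstract (lagged inputs, a chronological train/test split, and the R, RMSE, and MAPE metrics) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the series is synthetic, the 70/30 split fraction mimics the 7-year/3-year division, and a persistence forecast stands in for the CNN-LSTM model. The Boruta-XGBoost selection step is not reproduced, and the helper names `make_lagged`, `rmse`, `mape`, and `pearson_r` are hypothetical.

```python
import numpy as np

def make_lagged(series, n_lags, horizon=1):
    """Build (X, y): each row holds n_lags past values, y is `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be nonzero)."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Synthetic, strictly positive series (assumption: stands in for daily EC data).
t = np.arange(400)
rng = np.random.default_rng(0)
series = 300.0 + 40.0 * np.sin(2 * np.pi * t / 50) + rng.normal(scale=5.0, size=t.size)

X, y = make_lagged(series, n_lags=5, horizon=1)

# Chronological split: the earliest 70% trains, the latest 30% tests.
split = int(0.7 * len(y))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Persistence baseline (assumption): predict the most recent observed value.
y_hat = X_test[:, -1]
print("R    =", pearson_r(y_test, y_hat))
print("RMSE =", rmse(y_test, y_hat))
print("MAPE =", mape(y_test, y_hat))
```

Splitting by position rather than at random keeps every test target later in time than every training target, which is what a year-based division achieves.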
Affiliation(s)
- Masoud Karbasi
  - Water Engineering Department, Faculty of Agriculture, University of Zanjan, Zanjan, Iran
- Mumtaz Ali
  - UniSQ College, University of Southern Queensland, Springfield Campus, QLD, 4301, Australia
- Sayed M Bateni
  - Department of Civil, Environmental and Construction Engineering and Water Resources Research Center, University of Hawaii at Manoa, Honolulu, HI, 96822, USA
- Changhyun Jun
  - Department of Civil and Environmental Engineering, College of Engineering, Chung-Ang University, Seoul, Republic of Korea
- Mehdi Jamei
  - Faculty of Civil Engineering and Architecture, Shahid Chamran University of Ahvaz, Ahvaz, Iran
  - New Era and Development in Civil Engineering Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq
- Aitazaz Ahsan Farooque
  - Canadian Centre for Climate Change and Adaptation, University of Prince Edward Island, St Peters Bay, PE, Canada
  - Faculty of Sustainable Design Engineering, University of Prince Edward Island, Charlottetown, PE, C1A4P3, Canada
- Zaher Mundher Yaseen
  - Civil and Environmental Engineering Department, King Fahd University of Petroleum & Minerals, 31261, Dhahran, Saudi Arabia

3
Henríquez PA, Alessandri F. Analyzing digital societal interactions and sentiment classification in Twitter (X) during critical events in Chile. Heliyon 2024; 10:e32572. [PMID: 39668988 PMCID: PMC11637145 DOI: 10.1016/j.heliyon.2024.e32572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 06/04/2024] [Accepted: 06/05/2024] [Indexed: 12/14/2024]
Abstract
This study explores the influence of social media content on societal attitudes and actions during critical events, with a special focus on occurrences in Chile, such as the COVID-19 pandemic, the 2019 protests, and the wildfires in 2017 and 2023. By leveraging a novel tweet dataset, this study introduces new metrics for assessing sentiment, inclusivity, engagement, and impact, thereby providing a comprehensive framework for analyzing social media dynamics. The methodology employed enhances sentiment classification through the use of a Deep Random Vector Functional Link (D-RVFL) neural network, which demonstrates superior performance over traditional models such as Support Vector Machines (SVM), naive Bayes, and back propagation (BP) neural networks, achieving an overall average accuracy of 78.30% (0.17). This advancement is attributed to deep learning techniques with direct input-output connections that facilitate faster and more precise sentiment classification. This analysis differentiates the roles of influencers, press radio, and television handlers during crises, revealing how various social media actors affect information dissemination and audience engagement. By dissecting online behaviors and classifying sentiments using the RVFL network, this study sheds light on the effects of the digital landscape on societal attitudes and actions during emergencies. These findings underscore the importance of understanding the nuances of social media engagement to develop more effective crisis communication strategies.
Affiliation(s)
- Pablo A. Henríquez
  - Facultad de Administración y Economía, Universidad Diego Portales, Santiago, Chile

4
Xiao Y, Adegoke M, Leung CS, Leung KW. Robust noise-aware algorithm for randomized neural network and its convergence properties. Neural Netw 2024; 173:106202. [PMID: 38422835 DOI: 10.1016/j.neunet.2024.106202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 12/19/2023] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
The concept of randomized neural networks (RNNs), such as the random vector functional link network (RVFL) and extreme learning machine (ELM), is a widely accepted and efficient network method for constructing single-hidden layer feedforward networks (SLFNs). Due to its exceptional approximation capabilities, RNN is being extensively used in various fields. While the RNN concept has shown great promise, its performance can be unpredictable in imperfect conditions, such as weight noises and outliers. Thus, there is a need to develop more reliable and robust RNN algorithms. To address this issue, this paper proposes a new objective function that addresses the combined effect of weight noise and training data outliers for RVFL networks. Based on the half-quadratic optimization method, we then propose a novel algorithm, named noise-aware RNN (NARNN), to optimize the proposed objective function. The convergence of the NARNN is also theoretically validated. We also discuss the way to use the NARNN for ensemble deep RVFL (edRVFL) networks. Finally, we present an extension of the NARNN to concurrently address weight noise, stuck-at-fault, and outliers. The experimental results demonstrate that the proposed algorithm outperforms a number of state-of-the-art robust RNN algorithms.
Affiliation(s)
- Yuqi Xiao
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, HKSAR, China
  - State Key Laboratory of Terahertz and Millimeter Waves, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, HKSAR, China
  - Shenzhen Key Laboratory of Millimeter Wave and Wideband Wireless Communications, CityU Shenzhen Research Institute, Shenzhen, 518057, China
- Muideen Adegoke
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, HKSAR, China
- Chi-Sing Leung
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, HKSAR, China
- Kwok Wa Leung
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, HKSAR, China
  - State Key Laboratory of Terahertz and Millimeter Waves, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, HKSAR, China
  - Shenzhen Key Laboratory of Millimeter Wave and Wideband Wireless Communications, CityU Shenzhen Research Institute, Shenzhen, 518057, China

5
Gao R, Li R, Hu M, Suganthan PN, Yuen KF. Online dynamic ensemble deep random vector functional link neural network for forecasting. Neural Netw 2023; 166:51-69. [PMID: 37480769 DOI: 10.1016/j.neunet.2023.06.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2022] [Revised: 06/09/2023] [Accepted: 06/28/2023] [Indexed: 07/24/2023]
Abstract
This paper proposes a three-stage online deep learning model for time series based on the ensemble deep random vector functional link (edRVFL). The edRVFL stacks multiple randomized layers to enhance the single-layer RVFL's representation ability. Each hidden layer's representation is utilized for training an output layer, and the ensemble of all output layers forms the edRVFL's output. However, the original edRVFL is not designed for online learning, and the randomized nature of the features is harmful to extracting meaningful temporal features. In order to address the limitations and extend the edRVFL to an online learning mode, this paper proposes a dynamic edRVFL consisting of three online components, the online decomposition, the online training, and the online dynamic ensemble. First, an online decomposition is utilized as a feature engineering block for the edRVFL. Then, an online learning algorithm is designed to learn the edRVFL. Finally, an online dynamic ensemble method, which can measure the change in the distribution, is proposed for aggregating all layers' outputs. This paper evaluates and compares the proposed model with state-of-the-art methods on sixteen time series.
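The batch edRVFL structure summarized in this abstract (stacked randomized layers, one trained output layer per hidden layer, and an ensemble over all output layers) can be sketched as follows. This is a hedged illustration, not the paper's online dynamic model: the class name `EdRVFLSketch`, the tanh activation, the uniform weight range, the ridge penalty, and the plain averaging of layer outputs are all assumptions, and the online decomposition, online training, and dynamic ensemble components are not reproduced.

```python
import numpy as np

class EdRVFLSketch:
    """Minimal batch ensemble-deep RVFL for regression (illustrative only)."""

    def __init__(self, n_layers=3, n_hidden=32, lam=1e-2, seed=0):
        self.n_layers, self.n_hidden = n_layers, n_hidden
        self.lam, self.seed = lam, seed
        self.layers = []    # random (W, b) per hidden layer, never trained
        self.readouts = []  # ridge-regression weights, one vector per layer

    @staticmethod
    def _design(H, X):
        # Output-layer input: hidden features, direct link to raw input, bias.
        return np.hstack([H, X, np.ones((X.shape[0], 1))])

    @staticmethod
    def _layer_input(H_prev, X):
        # The first layer sees X; deeper layers see the previous features plus X.
        return X if H_prev is None else np.hstack([H_prev, X])

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.layers, self.readouts = [], []
        H = None
        for _ in range(self.n_layers):
            inp = self._layer_input(H, X)
            W = rng.uniform(-1.0, 1.0, size=(inp.shape[1], self.n_hidden))
            b = rng.uniform(-1.0, 1.0, size=self.n_hidden)
            self.layers.append((W, b))
            H = np.tanh(inp @ W + b)
            D = self._design(H, X)
            # Closed-form ridge solution for this layer's output weights.
            beta = np.linalg.solve(D.T @ D + self.lam * np.eye(D.shape[1]), D.T @ y)
            self.readouts.append(beta)
        return self

    def predict(self, X):
        preds, H = [], None
        for (W, b), beta in zip(self.layers, self.readouts):
            H = np.tanh(self._layer_input(H, X) @ W + b)
            preds.append(self._design(H, X) @ beta)
        # Assumption: simple mean over layers instead of a dynamic ensemble.
        return np.mean(preds, axis=0)

# Usage on synthetic data (assumption: a linear target, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3
model = EdRVFLSketch().fit(X, y)
print("training MSE:", float(np.mean((model.predict(X) - y) ** 2)))
```

Because only the output weights are fitted, each layer reduces to one linear solve, which is what makes an incremental, online update of the readouts plausible in the first place.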
Affiliation(s)
- Ruobin Gao
  - School of Civil & Environmental Engineering, Nanyang Technological University, Singapore
- Ruilin Li
  - School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore
- Minghui Hu
  - School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore
- P N Suganthan
  - KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
- Kum Fai Yuen
  - School of Civil & Environmental Engineering, Nanyang Technological University, Singapore

6
A multi-class classification model with parametrized target outputs for randomized-based feedforward neural networks. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]

7
Graph ensemble deep random vector functional link network for traffic forecasting. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

8
Mo J, Gao R, Liu J, Du L, Yuen KF. Annual dilated convolutional LSTM network for time charter rate forecasting. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]

9
Del Ser J, Casillas-Perez D, Cornejo-Bueno L, Prieto-Godino L, Sanz-Justo J, Casanova-Mateo C, Salcedo-Sanz S. Randomization-based machine learning in renewable energy prediction problems: Critical literature review, new results and perspectives. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108526] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/02/2022]

10
Liu Z, Jiang P, Wang J, Zhang L. Ensemble system for short term carbon dioxide emissions forecasting based on multi-objective tangent search algorithm. J Environ Manage 2022; 302:113951. [PMID: 34678540 DOI: 10.1016/j.jenvman.2021.113951] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/19/2021] [Revised: 10/08/2021] [Accepted: 10/14/2021] [Indexed: 06/13/2023]
Abstract
Carbon emissions play a crucial role in inducing global warming and climate change. Accurate and stable carbon emissions forecasting is beneficial for formulating emissions reduction schemes and achieving carbon neutrality as early as possible. Although previous studies have concentrated on employing one or several methods for carbon emissions forecasting, the improvement in forecasting performance is limited because they ignore the importance of objectively selecting the models and the necessity of interval forecasting. In this paper, a novel ensemble prediction system, composed of data decomposition, model selection, phase space reconstruction, ensemble point prediction, and interval prediction, is proposed to conduct both point and interval predictions, and it is shown to be effective in improving the accuracy and stability of carbon emissions forecasting. According to the empirical results, the mean MAPE results of our proposed forecasting strategy in point prediction are 1.1102% (in Dataset A) and 1.1382% (in Dataset B), and the mean CWC values in the interval forecasting are 0.3512 and 0.1572, respectively. Thus, the proposed forecasting system considerably improves forecasting performance relative to other models and can provide meaningful references for policymakers.
Affiliation(s)
- Zhenkun Liu
  - School of Statistics, Dongbei University of Finance and Economics, No. 217, Jianshan Road, Shahekou District, Dalian, Liaoning Province 116025, China
- Ping Jiang
  - School of Statistics, Dongbei University of Finance and Economics, No. 217, Jianshan Road, Shahekou District, Dalian, Liaoning Province 116025, China
- Jianzhou Wang
  - Institute of Systems Engineering, Macau University of Science and Technology, Taipa Street, Macau, China
- Lifang Zhang
  - School of Statistics, Dongbei University of Finance and Economics, No. 217, Jianshan Road, Shahekou District, Dalian, Liaoning Province 116025, China