1
Guo Q, Fang L, Wang R, Zhang C. Multivariate Time Series Forecasting Using Multiscale Recurrent Networks With Scale Attention and Cross-Scale Guidance. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:540-554. [PMID: 37903050 DOI: 10.1109/tnnls.2023.3326140] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/01/2023]
Abstract
Multivariate time series (MTS) forecasting is a challenging task because of the complex, nonlinear interdependencies between time steps and between series. With the advance of deep learning, significant effort has gone into modeling the long-term and short-term temporal patterns hidden in historical information using recurrent neural networks (RNNs) with a temporal attention mechanism. Although various forecasting models have been developed, most of them are single-scale oriented, which results in a loss of scale information. In this article, we integrate multiscale analysis into deep learning frameworks to build scale-aware recurrent networks and propose two multiscale recurrent network (MRN) models for MTS forecasting. The first model, MRN-SA, adopts a scale attention mechanism to dynamically select the most relevant information from different scales and simultaneously employs input attention and temporal attention to make predictions. The second, MRN-CSG, introduces a novel cross-scale guidance mechanism that uses information from the coarse scale to guide the decoding process at the fine scale, which results in a lightweight and more easily trained model without obvious loss of accuracy. Extensive experimental results demonstrate that both MRN-SA and MRN-CSG achieve state-of-the-art performance on five typical MTS datasets from different domains. The source code will be publicly available at https://github.com/qguo2010/MRN.
2
Gong Y, Li Z, Liu W, Lu X, Liu X, Tsang IW, Yin Y. Missingness-Pattern-Adaptive Learning With Incomplete Data. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:11053-11066. [PMID: 37030829 DOI: 10.1109/tpami.2023.3262784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Many real-world problems involve collections of data with missing values, e.g., RNA sequential analytics, image completion, and video processing. Such missing data is usually a serious impediment to good learning performance. Existing methods tend to use a universal model for all incomplete data, resulting in a suboptimal model for each missingness pattern. In this paper, we present a general model for learning with incomplete data. The proposed model can be adjusted to different missingness patterns, alleviating competition between data. Our model is based on observable features only, so it does not incur errors from data imputation. We further introduce a low-rank constraint to promote the generalization ability of our model. Analysis of the generalization error justifies our idea theoretically. In addition, a subgradient method is proposed to optimize our model with a proven convergence rate. Experiments on different types of data show that our method compares favorably with typical imputation strategies and other state-of-the-art models for incomplete data. More importantly, our method can be seamlessly incorporated into neural networks, where it achieves the best results. The source code is released at https://github.com/YS-GONG/missingness-patterns.
3
Yu Y, Zhou G, Zheng N, Qiu Y, Xie S, Zhao Q. Graph-Regularized Non-Negative Tensor-Ring Decomposition for Multiway Representation Learning. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:3114-3127. [PMID: 35468067 DOI: 10.1109/tcyb.2022.3157133] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
Tensor-ring (TR) decomposition is a powerful tool for exploiting the low-rank property of multiway data and has demonstrated great potential in a variety of important applications. In this article, non-negative TR (NTR) decomposition and graph-regularized NTR (GNTR) decomposition are proposed. The former equips TR decomposition with the ability to learn a parts-based representation by imposing non-negativity on the core tensors, and the latter additionally introduces graph regularization into the NTR model to capture manifold geometry information from tensor data. Both proposed models extend TR decomposition and can serve as powerful representation learning tools for non-negative multiway data. Optimization algorithms based on an accelerated proximal gradient are derived for NTR and GNTR. We also show empirically that the proposed methods provide more interpretable and physically meaningful representations. For example, they are able to extract parts-based components with meaningful color and line patterns from objects. Extensive experimental results demonstrate that the proposed methods perform better than state-of-the-art tensor-based methods in clustering and classification tasks.
4
Zheng W, Chen G. An Accurate GRU-Based Power Time-Series Prediction Approach With Selective State Updating and Stochastic Optimization. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:13902-13914. [PMID: 34731085 DOI: 10.1109/tcyb.2021.3121312] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/13/2023]
Abstract
Accurate power time-series prediction is an important application for building new industrialized smart cities. Gated recurrent unit (GRU) models have been successfully employed to learn temporal information for power time-series prediction, demonstrating their effectiveness. However, from a statistical perspective, these existing models are geometrically ergodic with short-term memory, which causes the learned temporal information to be quickly forgotten. Meanwhile, these existing approaches completely ignore the temporal dependencies in the gradient flow of the optimization algorithm, which greatly limits the prediction accuracy. To resolve these issues, we propose a novel GRU model that couples two new mechanisms, selective state updating and adaptive mixed gradient optimization (GRU-SSU-AMG), to improve prediction accuracy. Specifically, in the proposed selective GRU (SGRU), a tensor discriminator adaptively determines whether the hidden state information needs to be updated at each time step, so that extremely fluctuating information can be learned. In addition, an adaptive mixed gradient (AdaMG) optimization method that mixes the moment estimations is proposed to further improve the capability of learning temporal dependency information. The effectiveness of GRU-SSU-AMG has been extensively evaluated on five different real-world datasets. The experimental results show that GRU-SSU-AMG achieves a significant accuracy improvement over state-of-the-art approaches.
5
Li D, Zhou B, Lin C, Gao J, Gao W, Gao A. Supply forecasting and profiling of urban supermarket chains based on tensor quantization exponential regression for social governance. PeerJ Comput Sci 2022; 8:e1138. [PMID: 36426261 PMCID: PMC9680888 DOI: 10.7717/peerj-cs.1138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 10/05/2022] [Indexed: 06/16/2023]
Abstract
Background: During the COVID-19 pandemic, accurate forecasting and profiling of the supply of fresh commodities in urban supermarket chains may help the city government make better economic decisions, support activities of daily living, and optimize transportation to support social governance. In urban supermarket chains, the large variety and short shelf life of fresh commodities lead to poor performance of traditional fresh commodity supply forecasting algorithms.
Methods: Unlike the classic approach of forecasting a single type of fresh commodity, we proposed a third-order exponential regression algorithm incorporating the block Hankel tensor. First, a multi-way delay embedding transform was used to fuse the sales of multiple fresh commodities into a Hankel tensor, aggregating the correlation and mutual information of the whole category of fresh commodities. Second, high-order orthogonal iterations were performed for tensor decomposition, which effectively extracted the high-dimensional features of the sales time series of multiple related fresh commodities. Finally, a tensor quantization third-order exponential regression algorithm was employed to simultaneously predict the sales of multiple correlated fresh produce items.
Results: The experimental results showed that the proposed tensor quantization exponential regression method reduced the normalized root mean square error by 24% and the symmetric mean absolute percentage error by 22%, compared with state-of-the-art approaches.
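As an illustration of the delay-embedding step named in the Methods, the following is a minimal sketch and not the authors' implementation. It assumes the simplest form of the transform, in which each series x of length T is mapped to a tau-by-(T - tau + 1) Hankel matrix with entry (i, j) equal to x[i + j], and the per-series matrices are stacked into a three-way array. The function names `delay_embed` and `block_hankel` and the window length `tau` are hypothetical.

```python
def delay_embed(series, tau):
    """Return the tau x (T - tau + 1) Hankel matrix of a 1-D series.

    Entry (i, j) is series[i + j], so values are constant along
    anti-diagonals (the defining property of a Hankel matrix).
    """
    if not 1 <= tau <= len(series):
        raise ValueError("tau must be between 1 and len(series)")
    cols = len(series) - tau + 1
    return [[series[i + j] for j in range(cols)] for i in range(tau)]


def block_hankel(multi_series, tau):
    """Stack one Hankel matrix per series into an I x tau x cols nested list.

    Assumes every series in multi_series has the same length.
    """
    return [delay_embed(s, tau) for s in multi_series]


# Two short, made-up sales series embedded with a window of 3.
tensor = block_hankel([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]], tau=3)
print(tensor[0])  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

The embedding turns a short series into a larger, highly structured array, which is what makes a subsequent low-rank decomposition meaningful.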
Affiliation(s)
- Dazhou Li: College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang, China
- Bo Zhou: College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang, China
- Chuan Lin: Software College, Northeastern University, Shenyang, China
- Jian Gao: Shenyang University of Technology, Shenyang, China
- Wei Gao: College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang, China
- Aimin Gao: Liaoning Chain Operation Association, Shenyang, China
6
Chen X, Sun L. Bayesian Temporal Factorization for Multidimensional Time Series Prediction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:4659-4673. [PMID: 33729926 DOI: 10.1109/tpami.2021.3066551] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/12/2023]
Abstract
Large-scale and multidimensional spatiotemporal data sets are becoming ubiquitous in many real-world applications such as monitoring urban traffic and air quality. Making predictions on these time series has become a critical challenge due not only to their large-scale and high-dimensional nature but also to the considerable amount of missing data. In this paper, we propose a Bayesian temporal factorization (BTF) framework for modeling multidimensional time series, in particular spatiotemporal data, in the presence of missing values. By integrating low-rank matrix/tensor factorization and a vector autoregressive (VAR) process into a single probabilistic graphical model, this framework can characterize both global and local consistencies in large-scale time series data. The graphical model allows us to effectively perform probabilistic predictions and produce uncertainty estimates without imputing the missing values. We develop efficient Gibbs sampling algorithms for model inference and for model updating in real-time prediction, and we test the proposed BTF framework on several real-world spatiotemporal data sets for both missing data imputation and multi-step rolling prediction tasks. The numerical experiments demonstrate the superiority of the proposed BTF approaches over existing state-of-the-art methods.
7
Hoeltgebaum H, Adams N, Fernandes C. Estimation, Forecasting, and Anomaly Detection for Nonstationary Streams Using Adaptive Estimation. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:7956-7967. [PMID: 33705331 DOI: 10.1109/tcyb.2021.3054161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Streaming data poses substantial challenges for data analysis. From a computational standpoint, these challenges arise from constraints related to computer memory and processing speed. Statistically, the challenges relate to constructing procedures that can handle so-called concept drift, the tendency of future data to have different underlying properties from current and historic data. The issue of handling structure, such as trend and periodicity, remains a difficult problem for streaming estimation. We propose the real-time adaptive component (RAC), a penalized-regression modeling framework that satisfies the computational constraints of streaming data and provides the capability for dealing with concept drift. At the core of the estimation process are techniques from adaptive filtering. The RAC procedure adopts a specified basis to handle local structure, along with a least absolute shrinkage operator-like penalty procedure to handle overfitting. We enhance the RAC estimation procedure with a streaming anomaly detection capability. Experiments with simulated data suggest that the procedure can be considered a competitive tool for a variety of scenarios, and an illustration with real cyber-security data further demonstrates the promise of the method.
8
Feng S, Han M, Zhang J, Qiu T, Ren W. Learning Both Dynamic-Shared and Dynamic-Specific Patterns for Chaotic Time-Series Prediction. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:4115-4125. [PMID: 33119517 DOI: 10.1109/tcyb.2020.3017736] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
In the real world, multivariate time series from a dynamical system are correlated through deterministic relationships. Analyzing them separately instead of utilizing the shared pattern of the dynamical system is time consuming and cumbersome. Multitask learning (MTL) is an effective inductive bias method for utilizing latent shared features and discovering the structural relationships among related tasks. Based on this concept, we propose a novel MTL model for multivariate chaotic time-series prediction that can learn both dynamic-shared and dynamic-specific patterns. We implement the dynamic analysis of multiple time series through a special network structure design. The model can disentangle the complex relationships among multivariate chaotic time series and derive the common evolutionary trend of the multivariate chaotic dynamical system by inductive bias. We also develop an efficient Crank-Nicolson-like curvilinear update algorithm based on the alternating direction method of multipliers (ADMM) for the nonconvex nonsmooth Stiefel optimization problem. Simulation results and analysis demonstrate the model's effectiveness in dynamic-shared pattern discovery and in prediction performance.
9
Winter Wheat Yield Estimation Based on Optimal Weighted Vegetation Index and BHT-ARIMA Model. REMOTE SENSING 2022. [DOI: 10.3390/rs14091994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
This study aims to use remote sensing (RS) time-series data to explore the intrinsic relationship between crop growth and yield formation at different fertility stages and construct a high-precision winter wheat yield estimation model applicable to short time-series RS data. Sentinel-2 images were acquired in this study at six key phenological stages (rejuvenation stage, rising stage, jointing stage, heading stage, filling stage, filling-maturity stage) of winter wheat growth, and various vegetation indexes (VIs) at different fertility stages were calculated. Based on the characteristics of yield data continuity, the RReliefF algorithm was introduced to filter the optimal vegetation index combinations suitable for the yield estimation of winter wheat for all fertility stages. The Absolutely Objective Improved Analytic Hierarchy Process (AOIAHP) was innovatively proposed to determine the proportional contribution of crop growth to yield formation in six different phenological stages. The selected VIs consisting of MTCI(RE2), EVI, REP, MTCI(RE1), RECI(RE1), NDVI(RE1), NDVI(RE3), NDVI(RE2), NDVI, and MSAVI were then fused with the weights of different fertility periods to obtain time-series weighted data. For the characteristics of short time length and a small number of sequences of RS time-series data in yield estimation, this study applied the multiplexed delayed embedding transformation (MDT) technique to realize the data augmentation of the original short time series. Tucker decomposition was performed on the block Hankel tensor (BHT) obtained after MDT enhancement, and the core tensor was extracted while preserving the intrinsic connection of the time-series data. Finally, the resulting multidimensional core tensor was trained with the Autoregressive Integrated Moving Average (ARIMA) model to obtain the BHT-ARIMA model for wheat yield estimation. 
Compared with the BHT-ARIMA model using unweighted time-series data as input, the weighted time-series input significantly improves yield estimation accuracy. The coefficient of determination (R²) improved from 0.325 to 0.583, the root mean square error (RMSE) decreased from 492.990 to 323.637 kg/ha, the mean absolute error (MAE) dropped from 350.625 to 255.954, and the mean absolute percentage error (MAPE) decreased from 4.332% to 3.186%. BHT-ARMA and BHT-CNN models were also compared with BHT-ARIMA, and the results indicated that the BHT-ARIMA model still had the best yield prediction accuracy. The proposed method provides fast and accurate guidance for crop yield estimation and will be of great value for the processing and application of time-series RS data.
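For readers comparing the four reported scores, the sketch below computes them using their standard textbook definitions. It is an illustration only: the abstract does not state the exact formulas used in the study, and the function name `regression_metrics` and the sample yields are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (in percent) for paired observations.

    Assumes y_true contains no zeros (required by MAPE) and is not
    constant (required by R^2).
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up yields in kg/ha, not data from the study.
scores = regression_metrics([100.0, 200.0, 300.0, 400.0],
                            [110.0, 190.0, 310.0, 390.0])
print(scores)
```

RMSE and MAE carry the unit of the target variable, whereas R² and MAPE are unitless, which is why the four numbers are not directly comparable with one another.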
10
Liu M, Hu H, Li L, Yu Y, Guan W. Chinese Image Caption Generation via Visual Attention and Topic Modeling. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1247-1257. [PMID: 32568717 DOI: 10.1109/tcyb.2020.2997034] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]
Abstract
Automatic image captioning performs a cross-modal conversion from the visual content of an image to natural language text. Involving both computer vision (CV) and natural language processing (NLP), it has become one of the most sophisticated research issues in the artificial-intelligence area. Based on deep neural networks, the neural image caption (NIC) model has achieved remarkable performance in image captioning, yet some essential challenges remain, such as the deviation between the descriptive sentences generated by the model and the intrinsic content expressed by the image, the low accuracy of the image scene description, and the monotony of generated sentences. In addition, most current datasets and methods for image captioning are in English. Considering the differences between Chinese and English in syntax and semantics, it is necessary to develop specialized Chinese image caption generation methods. To solve these problems, we design the NICVATP2L model via visual attention and topic modeling, in which the visual attention mechanism reduces the deviation and the topic model improves the accuracy and diversity of generated sentences. Specifically, in the encoding phase, a convolutional neural network (CNN) and a topic model are used to extract the visual and topic features of the input images, respectively. In the decoding phase, an attention mechanism is applied to the image visual features to obtain image visual region features. Finally, the topic features and the visual region features are combined to guide a two-layer long short-term memory (LSTM) network in generating Chinese image captions. To validate our model, we have conducted experiments on the Chinese AIC-ICC image dataset. The experimental results show that our model can automatically generate more informative and descriptive captions in Chinese in a more natural way, and that it outperforms the existing NIC image captioning model.
11
Unifying tensor factorization and tensor nuclear norm approaches for low-rank tensor completion. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.06.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
12
Tan Q, Liu Y, Liu J. Demystifying Deep Learning in Predictive Spatiotemporal Analytics: An Information-Theoretic Framework. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:3538-3552. [PMID: 32877340 DOI: 10.1109/tnnls.2020.3015215] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Deep learning has achieved incredible success over the past years, especially in various challenging predictive spatiotemporal analytics (PSTA) tasks, such as disease prediction, climate forecast, and traffic prediction, where intrinsic dependence relationships among data exist and generally manifest at multiple spatiotemporal scales. However, given a specific PSTA task and the corresponding data set, how to appropriately determine the desired configuration of a deep learning model, theoretically analyze the model's learning behavior, and quantitatively characterize the model's learning capacity remains a mystery. In order to demystify the power of deep learning for PSTA in a theoretically sound and explainable way, in this article, we provide a comprehensive framework for deep learning model design and information-theoretic analysis. First, we develop and demonstrate a novel interactively and integratively connected deep recurrent neural network (I2DRNN) model. I2DRNN consists of three modules: an input module that integrates data from heterogeneous sources; a hidden module that captures the information at different scales while allowing the information to flow interactively between layers; and an output module that models the integrative effects of information from various hidden layers to generate the output predictions. Second, to theoretically prove that our designed model can learn multiscale spatiotemporal dependence in PSTA tasks, we provide an information-theoretic analysis to examine the information-based learning capacity (i-CAP) of the proposed model. In so doing, we can tackle an important open question in deep learning, that is, how to determine the necessary and sufficient configurations of a designed deep learning model with respect to the given learning data sets. Third, to validate the I2DRNN model and confirm its i-CAP, we systematically conduct a series of experiments involving both synthetic data sets and real-world PSTA tasks. 
The experimental results show that the I2DRNN model outperforms both classical and state-of-the-art models on all data sets and PSTA tasks. More importantly, as readily validated, the proposed model captures the multiscale spatiotemporal dependence, which is meaningful in the real-world context. Furthermore, the model configuration that corresponds to the best performance on a given data set always falls into the range between the necessary and sufficient configurations, as derived from the information-theoretic analysis.
13
Hsu NJ, Huang HC, Tsay RS. Matrix Autoregressive Spatio-Temporal Models. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1938587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Nan-Jung Hsu: Institute of Statistics, National Tsing-Hua University, Hsinchu, Taiwan
- Hsin-Cheng Huang: Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Ruey S. Tsay: Booth School of Business, University of Chicago, Chicago, IL
14
Wang X, Kang Q, Zhou M, Pan L, Abusorrah A. Multiscale Drift Detection Test to Enable Fast Learning in Nonstationary Environments. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:3483-3495. [PMID: 32544055 DOI: 10.1109/tcyb.2020.2989213] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
A model can be easily influenced by unseen factors in nonstationary environments and fail to fit a dynamic data distribution. In a classification scenario, this is known as concept drift. For instance, the shopping preferences of customers may change after they move from one city to another. Therefore, a shopping website or application should alter its recommendations based on its poorer predictions of such user patterns. In this article, we propose a novel approach called the multiscale drift detection test (MDDT) that efficiently localizes abrupt drift points when feature values fluctuate, meaning that the current model needs immediate adaptation. MDDT is based on a resampling scheme and a paired Student's t-test. It applies a detection procedure on two different scales. Initially, detection is performed on a broad scale to check whether recently gathered drift indicators remain stationary. If a drift is claimed, a narrow-scale detection is performed to trace the refined change time. This multiscale structure reduces the substantial time spent on constant checking and filters noise in the drift indicators. Experiments are performed to compare the proposed method with several algorithms on synthetic and real-world datasets. The results indicate that it outperforms the others on abrupt-shift datasets and achieves the highest recall score in localizing drift points.
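The broad-then-narrow idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the published MDDT: it omits the resampling scheme, pairs consecutive windows element by element, uses a fixed threshold `t_crit` in place of a proper critical value, and refines the change time by the largest gap between adjacent narrow-window means. The names `paired_t` and `locate_drift` and all default parameters are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0.0:
        # Degenerate case: every paired difference is identical.
        return 0.0 if diffs[0] == 0 else math.copysign(math.inf, diffs[0])
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def locate_drift(indicators, broad=20, narrow=5, t_crit=3.0):
    """Return the estimated index of an abrupt shift, or None if none is found."""
    for start in range(0, len(indicators) - 2 * broad + 1, broad):
        ref = indicators[start:start + broad]
        new = indicators[start + broad:start + 2 * broad]
        # Broad scale: do the two consecutive windows look alike?
        if abs(paired_t(new, ref)) <= t_crit:
            continue
        # Narrow scale: inside the flagged span, pick the split point whose
        # neighbouring narrow windows differ most in mean.
        candidates = range(start + narrow, start + 2 * broad - narrow + 1)
        return max(
            candidates,
            key=lambda k: abs(mean(indicators[k:k + narrow])
                              - mean(indicators[k - narrow:k])),
        )
    return None


# Synthetic drift indicator with an abrupt jump at index 50.
pattern = [0.10, 0.12, 0.08, 0.11, 0.09]
stream = [pattern[i % 5] + (0.5 if i >= 50 else 0.0) for i in range(100)]
print(locate_drift(stream))  # 50
```

The expensive fine-grained scan runs only after the cheap broad check fires, which is the cost saving the two-scale structure is meant to provide.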
15
Zhang L, Song L, Du B, Zhang Y. Nonlocal Low-Rank Tensor Completion for Visual Data. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:673-685. [PMID: 31021816 DOI: 10.1109/tcyb.2019.2910151] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 06/09/2023]
Abstract
In this paper, we propose a novel nonlocal patch tensor-based visual data completion algorithm and analyze its potential problems. Our algorithm consists of two steps: the first initializes the image with triangulation-based linear interpolation, and the second groups similar nonlocal patches into a tensor and then applies the proposed tensor completion technique. Specifically, treating a group of patch matrices as a tensor, we impose a low-rank constraint on the tensor through the recently proposed tensor nuclear norm. Moreover, we observe that after the first interpolation step the image becomes blurred, so the similar patches we have found may not exactly match the reference. We name this problem "Patch Mismatch," and to avoid the error it causes, we further decompose the patch tensor into a low-rank tensor and a sparse tensor, the latter accounting for the accepted horizontal strips in mismatched patches. Furthermore, our theoretical analysis shows that the error caused by Patch Mismatch can be decomposed into two components, one of which can be bounded under a reasonable assumption named local patch similarity, while the other is lower than that obtained using matrix completion. Extensive experimental results on real-world datasets verify our method's superiority over state-of-the-art tensor-based image inpainting methods.