1. Tang K, Ma Y, Miao D, Song P, Gu Z, Tian Z, Wang W. Decision Fusion Networks for Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3890-3903. [PMID: 35951567] [DOI: 10.1109/tnnls.2022.3196129]
Abstract
Convolutional neural networks, in which each layer receives features from the previous layer(s) and then aggregates/abstracts higher-level features from them, are widely adopted for image classification. To avoid information loss during feature aggregation/abstraction and to fully utilize lower-layer features, we propose a novel decision fusion module (DFM) that makes an intermediate decision based on the features in the current layer and then fuses its result with the original features before passing them to the next layers. This decision is devised to determine an auxiliary category corresponding to the category at a higher hierarchical level and can thus serve as category-coherent guidance for later layers. By stacking a collection of DFMs into a classification network, the resulting decision fusion network is explicitly formulated to progressively aggregate/abstract more discriminative features guided by these decisions and then refine the decisions based on the newly generated features in a layer-by-layer manner. Comprehensive results on four benchmarks validate that the proposed DFM brings significant improvements to various common classification networks at minimal additional computational cost and is superior to state-of-the-art decision fusion-based methods. In addition, we demonstrate that the DFM generalizes to object detection and semantic segmentation.
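The abstract does not spell out how the intermediate decision is fused back into the features, so the following PyTorch sketch is only an illustrative interpretation (the module name, global pooling, and additive fusion are assumptions, not the authors' exact design): an auxiliary coarse-category head is computed from the current feature map, and its softmax output is projected back and added to the features before they reach the next layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecisionFusionModule(nn.Module):
    """Illustrative decision-fusion step (assumed design, not the paper's code)."""
    def __init__(self, in_channels: int, num_coarse_classes: int):
        super().__init__()
        self.decision_head = nn.Linear(in_channels, num_coarse_classes)
        # Project the decision back to the feature space so it can be fused additively.
        self.fuse = nn.Linear(num_coarse_classes, in_channels)

    def forward(self, x: torch.Tensor):
        # x: (batch, channels, height, width)
        pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)              # (B, C)
        logits = self.decision_head(pooled)                           # auxiliary coarse decision
        decision = F.softmax(logits, dim=1)
        guidance = self.fuse(decision).unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        fused = x + guidance                                          # broadcast over spatial dims
        return fused, logits                                          # logits can carry an auxiliary loss
```

Several such modules would be stacked after successive stages of a backbone, with the auxiliary logits supervised by coarse (higher-hierarchy) labels so that later layers receive category-coherent guidance.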
2. Liu M, Wang J, Wang F, Xiang F, Chen J. Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:174-187. [PMID: 37824322] [DOI: 10.1109/tnnls.2023.3321076]
Abstract
Traditionally, speech quality evaluation relies on subjective assessments or intrusive methods that require reference signals or additional equipment. However, over recent years, non-intrusive speech quality assessment has emerged as a promising alternative, capturing much attention from researchers and industry professionals. This article presents a deep learning-based method that exploits large-scale intrusive simulated data to improve the accuracy and generalization of non-intrusive methods. The major contributions of this article are as follows. First, it presents a data simulation method, which generates degraded speech signals and labels their speech quality with the perceptual objective listening quality assessment (POLQA). The generated data is proven to be useful for pretraining the deep learning models. Second, it proposes to apply an adversarial speaker classifier to reduce the impact of speaker-dependent information on speech quality evaluation. Third, an autoencoder-based deep learning scheme is proposed following the principle of representation learning and adversarial training (AT) methods, which is able to transfer the knowledge learned from a large amount of simulated speech data labeled by POLQA. With the help of discriminative representations extracted from the autoencoder, the prediction model can be trained well on a relatively small amount of speech data labeled through subjective listening tests. Fourth, an end-to-end speech quality evaluation neural network is developed, which takes magnitude and phase spectral features as its inputs. This phase-aware model is more accurate than the model using only the magnitude spectral features. A large number of experiments are carried out with three datasets: one simulated with labels obtained using POLQA and two recorded with labels obtained using subjective listening tests. The results show that the presented phase-aware method improves the performance of the baseline model and the proposed model with latent representations extracted from the adversarial autoencoder (AAE) outperforms the state-of-the-art objective quality assessment methods, reducing the root mean square error (RMSE) by 10.5% and 12.2% on the Beijing Institute of Technology (BIT) dataset and Tencent Corpus, respectively. The code and supplementary materials are available at https://github.com/liushenme/AAE-SQA.
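As a small illustration of the phase-aware input representation described above, the NumPy sketch below computes magnitude and phase spectra from a framed waveform, the two feature streams such a model could take as input; the frame length, hop size, and windowing are assumptions, not the authors' preprocessing.

```python
import numpy as np

def magnitude_and_phase_frames(signal: np.ndarray, frame_len: int = 512, hop: int = 256):
    """Frame a waveform and return per-frame magnitude and phase spectra."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)           # (n_frames, frame_len // 2 + 1)
    return np.abs(spec), np.angle(spec)

# Example: one second of a 16 kHz test signal.
x = np.random.randn(16000)
mag, phase = magnitude_and_phase_frames(x)       # both streams feed the phase-aware model
```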
3. Luo H, Lin G, Shen F, Huang X, Yao Y, Shen H. Robust-EQA: Robust Learning for Embodied Question Answering With Noisy Labels. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12083-12094. [PMID: 37028297] [DOI: 10.1109/tnnls.2023.3251984]
Abstract
Embodied question answering (EQA) is a recently emerged research field in which an agent is asked to answer the user's questions by exploring the environment and collecting visual information. Many researchers have turned their attention to EQA because of its broad potential applications, such as in-home robots, self-driven mobile devices, and personal assistants. High-level visual tasks such as EQA are susceptible to noisy inputs because they involve complex reasoning processes, so EQA systems need good robustness against label noise before they can be deployed in practical applications. To tackle this problem, we propose a novel label-noise-robust learning algorithm for the EQA task. First, a joint-training co-regularization noise-robust learning method is proposed for noise filtering in the visual question answering (VQA) module, which trains two parallel network branches with one loss function. Then, a two-stage hierarchical robust learning algorithm is proposed to filter out noisy navigation labels at both the trajectory level and the action level. Finally, taking the purified labels as inputs, a joint robust learning mechanism is given to coordinate the work of the whole EQA system. Empirical results demonstrate that, under both extremely noisy conditions (45% noisy labels) and low-noise conditions (20% noisy labels), models trained with our algorithm are more robust than existing EQA models.
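The co-regularization idea above (two parallel branches trained with one loss) can be illustrated with a small PyTorch-style loss; the symmetric-KL agreement term and its weight are assumptions chosen for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def co_regularized_loss(logits_a: torch.Tensor, logits_b: torch.Tensor,
                        targets: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Joint loss for two parallel branches: each branch fits the (possibly noisy)
    labels while a symmetric KL term encourages the branches to agree, damping
    the influence of samples the branches disagree on."""
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)
    log_pa = F.log_softmax(logits_a, dim=1)
    log_pb = F.log_softmax(logits_b, dim=1)
    agree = F.kl_div(log_pa, log_pb.exp(), reduction="batchmean") + \
            F.kl_div(log_pb, log_pa.exp(), reduction="batchmean")
    return ce + alpha * agree
```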
4. Yao Y, Yu B, Gong C, Liu T. Understanding How Pretraining Regularizes Deep Learning Algorithms. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5828-5840. [PMID: 34890343] [DOI: 10.1109/tnnls.2021.3131377]
Abstract
Deep learning algorithms have led to a series of breakthroughs in computer vision, acoustical signal processing, and other areas. However, they were popularized only recently, thanks to groundbreaking techniques developed for training deep architectures. Understanding these training techniques is important if we want to improve them further. Through extensive experimentation, Erhan et al. (2010) empirically showed that unsupervised pretraining has a regularizing effect on deep learning algorithms, but theoretical justifications for this observation remain elusive. In this article, we provide theoretical support by analyzing how unsupervised pretraining regularizes deep learning algorithms. Specifically, we interpret deep learning algorithms as traditional Tikhonov-regularized batch learning algorithms that simultaneously learn predictors in the input feature space and the parameters of the neural networks that produce the Tikhonov matrices. We prove that unsupervised pretraining helps in learning meaningful Tikhonov matrices, which makes the deep learning algorithms uniformly stable so that the learned predictor generalizes quickly with respect to the sample size. Unsupervised pretraining can therefore be interpreted as a form of regularization.
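For reference, the classical Tikhonov-regularized objective that the abstract relates pretraining to can be written as follows (a standard textbook form, not a formula reproduced from the paper):

```latex
% Tikhonov-regularized empirical risk: \Gamma is the Tikhonov matrix, \lambda > 0.
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_{w}(x_i),\, y_i\big)
          \;+\; \lambda\,\lVert \Gamma\, w \rVert_2^{2}
```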
5. Li M, Zhang K, Li J, Zuo W, Timofte R, Zhang D. Learning Context-Based Nonlocal Entropy Modeling for Image Compression. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1132-1145. [PMID: 34428157] [DOI: 10.1109/tnnls.2021.3104974]
Abstract
The entropy of the codes usually serves as the rate loss in recent learned lossy image compression methods. Precise estimation of the probability distribution of the codes plays a vital role in reducing the entropy and boosting the joint rate-distortion performance. However, existing deep learning based entropy models generally assume that the latent codes are statistically independent or depend only on some side information or local context, which fails to take the global similarity within the context into account and thus hinders accurate entropy estimation. To address this issue, we propose a special nonlocal operation for context modeling that exploits the global similarity within the context. Specifically, because of the constraint imposed by the context (only previously decoded codes are available), the nonlocal operation cannot be computed directly in context modeling; we exploit the relationship between the code maps produced by deep neural networks and introduce proxy similarity functions as a workaround. We then combine the local and the global context via a nonlocal attention block and employ it in masked convolutional networks for entropy modeling. Considering that the width of the transforms is essential for training low-distortion models, we further introduce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity. Experiments on the Kodak and Tecnick datasets demonstrate the superiority of the proposed context-based nonlocal attention block in entropy modeling and of the U-Net block in low-distortion settings. Overall, our model performs favorably against existing image compression standards and recent deep image compression models.
6. Two-stage natural scene image classification with noise discovering and label-correlation mining. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110137]
7. Amrutha E, Arivazhagan S, Jebarani WSL. Deep Clustering Network for Steganographer Detection Using Latent Features Extracted from a Novel Convolutional Autoencoder. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10992-6]
8. Critic-observer-based decentralized force/position approximate optimal control for modular and reconfigurable manipulators with uncertain environmental constraints. Complex Intell Syst 2022. [DOI: 10.1007/s40747-021-00538-z]
Abstract
A critic-observer decentralized force/position approximate optimal control method is presented to address the joint trajectory and contact force tracking problem of modular and reconfigurable manipulators (MRMs) with uncertain environmental constraints. The dynamic model of the MRM system is formulated as an integration of joint subsystems via an extended state observer (ESO) that accounts for the effect of interconnected dynamic coupling (IDC). A radial basis function neural network (RBF-NN) is developed to deal with the IDC effects among the independent joint subsystems. Based on the adaptive dynamic programming (ADP) approach and the policy iteration (PI) algorithm, the Hamilton–Jacobi–Bellman (HJB) equation is approximately solved by establishing a critic NN structure, from which the approximate optimal control policy is derived. The closed-loop manipulator system is proved to be asymptotically stable using Lyapunov theory. Finally, simulation results are provided to demonstrate the effectiveness and advantages of the proposed control method.
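For orientation, the HJB equation that a critic network in ADP/policy-iteration schemes approximates can be written, for a generic input-affine system, as follows; this is a standard textbook form under assumed dynamics and quadratic utility, not the paper's exact cost or subsystem model.

```latex
% Dynamics \dot{x} = f(x) + g(x)\,u, utility U(x,u) = x^{\top}Q x + u^{\top}R u.
0 = \min_{u}\Big[\, U(x,u) + \big(\nabla V^{*}(x)\big)^{\top}\big(f(x) + g(x)\,u\big) \Big],
\qquad
u^{*}(x) = -\tfrac{1}{2}\,R^{-1} g(x)^{\top}\,\nabla V^{*}(x).
```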
9. Research on a Convolution Kernel Initialization Method for Speeding Up the Convergence of CNN. Applied Sciences-Basel 2022. [DOI: 10.3390/app12020633]
Abstract
This paper presents a convolution kernel initialization method based on the local binary patterns (LBP) algorithm and a sparse autoencoder, which can be applied to initialize the convolution kernels in a convolutional neural network (CNN). The main function of a convolution kernel is to extract local patterns of the image by template matching, which serve as target features for subsequent image recognition. In general, the Xavier and He initialization methods are used to initialize convolution kernels. In this paper, some typical sample images are first selected from the training set, and the LBP algorithm is applied to extract their texture information. The texture information is then divided into small blocks, which are fed into a sparse autoencoder (SAE) for pretraining. After training, the weights of the sparse autoencoder, which reflect the statistical features of the data set, are used as the initial values of the convolution kernels in the CNN. The experimental results indicate that the proposed method speeds up the convergence of the network during training and improves the recognition rate of the network to some extent.
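The pipeline above (LBP texture maps, small blocks, SAE pretraining, kernel initialization) could look roughly like the sketch below. The LBP parameters, patch size, and the simplified autoencoder with an L1 activation penalty are assumptions for illustration, not the paper's settings.

```python
import numpy as np
import torch
import torch.nn as nn
from skimage.feature import local_binary_pattern

def lbp_patches(images, k: int = 5, stride: int = 5) -> np.ndarray:
    """Extract LBP texture maps from sample images and cut them into k x k blocks."""
    patches = []
    for img in images:                                   # img: 2-D grayscale array
        lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
        H, W = lbp.shape
        for r in range(0, H - k + 1, stride):
            for c in range(0, W - k + 1, stride):
                patches.append(lbp[r:r + k, c:c + k].ravel())
    patches = np.asarray(patches, dtype=np.float32)
    return (patches - patches.mean()) / (patches.std() + 1e-8)

def pretrain_kernels(patches: np.ndarray, n_kernels: int = 32, k: int = 5,
                     sparsity_weight: float = 1e-3, epochs: int = 50) -> torch.Tensor:
    """Train a small sparse autoencoder on the patches; its encoder weights
    become the initial k x k convolution kernels."""
    x = torch.from_numpy(patches)
    enc, dec = nn.Linear(k * k, n_kernels), nn.Linear(n_kernels, k * k)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        h = torch.sigmoid(enc(x))
        loss = nn.functional.mse_loss(dec(h), x) + sparsity_weight * h.abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return enc.weight.detach().reshape(n_kernels, 1, k, k)   # Conv2d weight shape

# kernels = pretrain_kernels(lbp_patches(list_of_gray_images))
# conv = nn.Conv2d(1, 32, kernel_size=5); conv.weight.data.copy_(kernels)
```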
10. A Deep Autoencoder-Based Convolution Neural Network Framework for Bearing Fault Classification in Induction Motors. Sensors 2021; 21:8453. [PMID: 34960552] [PMCID: PMC8706012] [DOI: 10.3390/s21248453]
Abstract
Fault diagnosis and classification for machines are integral to condition monitoring in the industrial sector. However, in recent times, as sensor technology and artificial intelligence have developed, data-driven fault diagnosis and classification have been more widely investigated. The data-driven approach requires good-quality features to attain good fault classification accuracy, yet domain expertise and a fair amount of labeled data are needed to obtain such features. This paper proposes a deep autoencoder (DAE) and convolutional neural network (CNN) based bearing fault classification model using motor current signals of an induction motor (IM). Motor current signals can be collected easily and non-invasively from the motor. However, current signals collected from industrial sources are highly contaminated with noise, which makes feature calculation very challenging. The DAE is utilized to estimate the nonlinear function of the system from normal-state data, after which the residual signal is obtained. A subsequent CNN model then classifies the fault types from the residual signals. The proposed semi-supervised approach achieved very high classification accuracy (more than 99%). The inclusion of the DAE was found not only to improve the accuracy significantly but also to be potentially useful when the amount of labeled data is small. The experimental outcomes are compared with some existing works on the same dataset, and the performance of the proposed combined approach is found to be comparable to theirs. In terms of classification accuracy and other evaluation parameters, the overall method can be considered an effective approach for bearing fault classification using the motor current signal.
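The two-stage structure described above, an autoencoder fit on healthy-state current segments followed by a CNN that classifies the residuals, might be organized as in the PyTorch sketch below; the layer sizes, segment length, and number of fault classes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SignalAutoencoder(nn.Module):
    """Deep autoencoder trained only on healthy-motor current segments."""
    def __init__(self, seg_len: int = 1024):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(seg_len, 256), nn.ReLU(),
                                     nn.Linear(256, 64), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                                     nn.Linear(256, seg_len))
    def forward(self, x):
        return self.decoder(self.encoder(x))

class ResidualCNN(nn.Module):
    """1-D CNN that classifies fault types from the residual signal."""
    def __init__(self, n_classes: int = 4, seg_len: int = 1024):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4))
        self.classifier = nn.Linear(32 * (seg_len // 16), n_classes)
    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# residual = segment - autoencoder(segment)      # segment: (batch, seg_len)
# logits   = cnn(residual.unsqueeze(1))          # add the channel dimension
```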
11. MixNet: A Robust Mixture of Convolutional Neural Networks as Feature Extractors to Detect Stego Images Created by Content-Adaptive Steganography. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10661-0]
12. Metzger A, Toscani M, Akbarinia A, Valsecchi M, Drewing K. Deep neural network model of haptic saliency. Sci Rep 2021; 11:1395. [PMID: 33446756] [PMCID: PMC7809404] [DOI: 10.1038/s41598-020-80675-6]
Abstract
Haptic exploration usually involves stereotypical systematic movements that are adapted to the task. Here we tested whether exploration movements are also driven by physical stimulus features. We designed haptic stimuli whose surface relief varied locally in spatial frequency, height, orientation, and anisotropy. In Experiment 1, participants explored two stimuli one after the other in order to decide whether they were the same or different. We trained a variational autoencoder to predict the spatial distribution of touch duration from the surface relief of the haptic stimuli. The model successfully predicted where participants touched the stimuli. It could also predict participants' touch distribution from the stimulus' surface relief when tested with two new groups of participants who performed a different task (Exp. 2) or explored different stimuli (Exp. 3). We further generated a large number of virtual surface reliefs (each uniformly expressing a certain combination of features) and correlated the model's responses with stimulus properties to understand the model's preferences and thereby infer which stimulus features were preferentially touched by participants. Our results indicate that haptic exploratory behavior is to some extent driven by the physical features of the stimuli, with, e.g., edge-like structures, vertical and horizontal patterns, and rough regions being explored in more detail.
Affiliation(s)
- Anna Metzger, Justus-Liebig University Giessen, 35394, Giessen, Germany
- Matteo Toscani, Justus-Liebig University Giessen, 35394, Giessen, Germany
- Knut Drewing, Justus-Liebig University Giessen, 35394, Giessen, Germany
13. Deng WY, Dong YY, Liu GD, Wang Y, Men J. Multiclass heterogeneous domain adaptation via bidirectional ECOC projection. Neural Netw 2019; 119:313-322. [DOI: 10.1016/j.neunet.2019.08.005]
14. Gao X, Zhou C, Chao F, Yang L, Lin CM, Xu T, Shang C, Shen Q. A data-driven robotic Chinese calligraphy system using convolutional auto-encoder and differential evolution. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.06.010]
15. Zeng Z, Wang X, Yan F, Chen Y. Local adaptive learning for semi-supervised feature selection with group sparsity. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.05.030]
16. Stacked convolutional sparse denoising auto-encoder for identification of defect patterns in semiconductor wafer map. Comput Ind 2019. [DOI: 10.1016/j.compind.2019.04.015]
17.
Abstract
Image classification is an important problem in computer vision. The sparse coding spatial pyramid matching (ScSPM) framework is widely used in this field. However, sparse coding cannot effectively handle very large training sets because of its high computational complexity, and ignoring the mutual dependence among local features results in highly variable sparse codes even for similar features. To overcome these shortcomings of previous sparse coding algorithms, we present an image classification method that replaces the sparse dictionary with a stable dictionary learned via low-complexity clustering, more specifically a k-medoids clustering method seeded with k-means++. The proposed method reduces the learning complexity and improves the stability of the features. In the experiments, we compared the effectiveness of our method with the existing ScSPM method and its improved versions on two diverse datasets, Caltech-101 and UIUC-Sports. The results show that our method increases the accuracy of spatial pyramid matching, which suggests that it is capable of improving the performance of sparse coding features.
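A minimal sketch of the dictionary-learning step, under the assumption that k-means++ is used to seed a plain PAM-style k-medoids alternation over local descriptors (the paper's exact optimization is not reproduced here):

```python
import numpy as np

def kmeanspp_seeds(X: np.ndarray, k: int, rng) -> list:
    """k-means++ seeding: pick indices with probability proportional to the
    squared distance from the nearest already-chosen seed."""
    idx = [rng.integers(len(X))]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - X[idx]) ** 2).sum(-1), axis=1)
        idx.append(rng.choice(len(X), p=d2 / d2.sum()))
    return idx

def kmedoids_dictionary(X: np.ndarray, k: int = 256, n_iter: int = 20, seed: int = 0):
    """Minimal k-medoids over local descriptors X; the medoids serve as the dictionary."""
    rng = np.random.default_rng(seed)
    medoid_idx = kmeanspp_seeds(X, k, rng)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - X[medoid_idx]) ** 2).sum(-1)      # (n, k)
        labels = dist.argmin(axis=1)
        new_idx = []
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members) == 0:                  # keep the old medoid for empty clusters
                new_idx.append(medoid_idx[j]); continue
            intra = ((X[members][:, None, :] - X[members]) ** 2).sum(-1).sum(1)
            new_idx.append(members[intra.argmin()])
        if new_idx == medoid_idx:
            break
        medoid_idx = new_idx
    return X[medoid_idx]

# dictionary = kmedoids_dictionary(local_descriptors, k=256)   # used in place of the sparse dictionary
```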
18. Tsapanos N, Tefas A, Nikolaidis N, Pitas I. Neurons With Paraboloid Decision Boundaries for Improved Neural Network Classification Performance. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:284-294. [PMID: 29994277] [DOI: 10.1109/tnnls.2018.2839655]
Abstract
In mathematical terms, an artificial neuron computes the inner product of a $d$-dimensional input vector $x$ with its weight vector $w$, compares it with a bias value $w_0$, and fires based on the result of this comparison. Therefore, its decision boundary is given by the equation $w^{T}x + w_0 = 0$. In this paper, we propose replacing the linear hyperplane decision boundary of a neuron with a curved, paraboloid decision boundary. Thus, the decision boundary of the proposed paraboloid neuron is given by the equation $(h^{T}x + h_0)^2 - \lVert x - p \rVert_2^2 = 0$, where $h$ and $h_0$ denote the parameters of the directrix and $p$ denotes the coordinates of the focus. Such paraboloid neural networks are proven to have superior recognition accuracy in a number of applications.
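The pre-activation of such a neuron follows directly from the boundary equation; a tiny NumPy illustration (the example parameter values are arbitrary):

```python
import numpy as np

def paraboloid_neuron(x, h, h0, p):
    """Pre-activation of a paraboloid neuron: positive on one side of the
    boundary (h^T x + h0)^2 - ||x - p||_2^2 = 0 and negative on the other."""
    return (h @ x + h0) ** 2 - np.sum((x - p) ** 2)

# Example with a 2-D input, directrix parameters h, h0 and focus p.
x = np.array([1.0, 2.0])
h, h0, p = np.array([0.5, -0.3]), 0.2, np.array([0.0, 1.0])
print(paraboloid_neuron(x, h, h0, p))   # pass through a sigmoid/step to "fire"
```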
19. Angelov PP, Gu X. Deep rule-based classifier with human-level performance and characteristics. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.06.048]
20. Tang C, Liu X, Li M, Wang P, Chen J, Wang L, Li W. Robust unsupervised feature selection via dual self-representation and manifold regularization. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.01.009]
21. Li J, Chang H, Yang J, Luo W, Fu Y. Visual Representation and Classification by Learning Group Sparse Deep Stacking Network. IEEE Transactions on Image Processing 2018; 27:464-476. [PMID: 29989968] [DOI: 10.1109/tip.2017.2765833]
Abstract
Deep stacking networks (DSNs) have been successfully applied to classification tasks. Their architecture builds upon blocks of simplified neural network modules (SNNMs), in which the hidden units are assumed to be independent. However, this assumption prevents the SNNM from learning the local dependencies between hidden units and thus from better capturing the information in the input data for the classification task. In addition, in real-world classification applications the hidden representations of the input data in each class are expected to form a group. Therefore, we propose two kinds of group-sparse SNNM modules obtained by mixing two norms. The first module learns the local dependencies among hidden units by dividing them into non-overlapping groups. The second module splits the representations of samples in different classes into separate groups so as to cluster the samples within each class. A group sparse DSN (GS-DSN) is constructed by stacking the group-sparse SNNM modules. Experimental results further verify that our GS-DSN model outperforms the relevant classification methods. In particular, GS-DSN achieves state-of-the-art performance (99.1%) on 15-Scene.
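One common way to impose group sparsity over non-overlapping groups of hidden units is a group-lasso term (sum of per-group L2 norms); the sketch below shows that generic penalty as an illustration and is not necessarily the mixed-norm formulation used in the paper.

```python
import torch

def group_sparse_penalty(hidden: torch.Tensor, groups: list) -> torch.Tensor:
    """Sum of per-group L2 norms over hidden activations (group-lasso style).

    hidden: (batch, n_hidden); groups: list of index tensors, one per group.
    """
    return sum(hidden[:, g].norm(p=2, dim=1).mean() for g in groups)

# Example: 12 hidden units split into 3 non-overlapping groups of 4.
h = torch.randn(8, 12)
groups = [torch.arange(0, 4), torch.arange(4, 8), torch.arange(8, 12)]
penalty = group_sparse_penalty(h, groups)     # added to the module's training loss
```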