1
|
DeLong LN, Mir RF, Fleuriot JD. Neurosymbolic AI for Reasoning Over Knowledge Graphs: A Survey. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:7822-7842. [PMID: 39024082 DOI: 10.1109/tnnls.2024.3420218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Neurosymbolic artificial intelligence (AI) is an increasingly active area of research that combines symbolic reasoning methods with deep learning to leverage their complementary benefits. As knowledge graphs (KGs) are becoming a popular way to represent heterogeneous and multirelational data, methods for reasoning on graph structures have attempted to follow this neurosymbolic paradigm. Traditionally, such approaches have utilized either rule-based inference or generated representative numerical embeddings from which patterns could be extracted. However, several recent studies have attempted to bridge this dichotomy to generate models that facilitate interpretability, maintain competitive performance, and integrate expert knowledge. Therefore, we survey methods that perform neurosymbolic reasoning tasks on KGs and propose a novel taxonomy by which we can classify them. Specifically, we propose three major categories: 1) logically informed embedding approaches; 2) embedding approaches with logical constraints; and 3) rule-learning approaches. Alongside the taxonomy, we provide a tabular overview of the approaches and links to their source code, if available, for more direct comparison. Finally, we discuss the unique characteristics and limitations of these methods and then propose several prospective directions toward which this field of research could evolve.
|
2
|
Zhu Z, Zhou Y, Dong Y, Zhong Z. PWLU: Learning Specialized Activation Functions With the Piecewise Linear Unit. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:12269-12286. [PMID: 37314901 DOI: 10.1109/tpami.2023.3286109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The choice of activation functions is crucial to deep neural networks. ReLU is a popular hand-designed activation function. Swish, the automatically searched activation function, outperforms ReLU on many challenging datasets. However, the search method has two main drawbacks. First, the tree-based search space is highly discrete and restricted, which is difficult to search. Second, the sample-based search method is inefficient in finding specialized activation functions for each dataset or neural architecture. To overcome these drawbacks, we propose a new activation function called Piecewise Linear Unit (PWLU), incorporating a carefully designed formulation and learning method. PWLU can learn specialized activation functions for different models, layers, or channels. Besides, we propose a non-uniform version of PWLU, which maintains sufficient flexibility but requires fewer intervals and parameters. Additionally, we generalize PWLU to three-dimensional space to define a piecewise linear surface named 2D-PWLU, which can be treated as a non-linear binary operator. Experimental results show that PWLU achieves SOTA performance on various tasks and models, and 2D-PWLU is better than element-wise addition when aggregating features from different branches. The proposed PWLU and its variation are easy to implement and efficient for inference, which can be widely applied in real-world applications.
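To make the idea of a learnable piecewise linear activation concrete, here is a minimal sketch, not the authors' formulation. It assumes uniformly spaced breakpoints on a bounded region `[left, right]`, one learnable height per breakpoint, and a fixed slope on each side outside the region; the function name and all parameter names are hypothetical.

```python
def piecewise_linear_unit(x, left, right, heights, slope_left=0.0, slope_right=1.0):
    """Evaluate a piecewise linear function at a scalar x.

    heights holds the function values at len(heights) uniformly spaced
    breakpoints covering [left, right]; outside that region the function
    continues linearly with the given boundary slopes (an assumption).
    """
    n_intervals = len(heights) - 1
    if x < left:
        return heights[0] + slope_left * (x - left)
    if x >= right:
        return heights[-1] + slope_right * (x - right)
    width = (right - left) / n_intervals
    # Index of the interval containing x, clamped against rounding at the edge.
    idx = min(int((x - left) / width), n_intervals - 1)
    x0 = left + idx * width
    slope = (heights[idx + 1] - heights[idx]) / width
    return heights[idx] + slope * (x - x0)


# Heights chosen so the function reproduces ReLU on [-2, 2] (4 intervals).
relu_like_heights = [0.0, 0.0, 0.0, 1.0, 2.0]
outputs = [piecewise_linear_unit(v, -2.0, 2.0, relu_like_heights)
           for v in (-3.0, -1.0, 0.5, 3.0)]
```

In a trainable setting the heights and boundary slopes would be parameters updated by gradient descent, which is what lets each model, layer, or channel end up with its own activation shape.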
|
3
|
Liao M, Zhao JP, Tian J, Zheng CH. iEnhancer-DCLA: using the original sequence to identify enhancers and their strength based on a deep learning framework. BMC Bioinformatics 2022; 23:480. [PMCID: PMC9664816 DOI: 10.1186/s12859-022-05033-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/29/2022] [Accepted: 11/02/2022] [Indexed: 11/16/2022] Open
Abstract
Enhancers are small regions of DNA that bind to proteins, which enhance the transcription of genes. An enhancer may be located upstream or downstream of the gene. It is not necessarily close to the gene it acts on, because the entangled structure of chromatin allows positions far apart in the sequence to come into contact with each other. Therefore, identifying enhancers and their strength is a complex and challenging task. In this article, a new prediction method based on deep learning, called iEnhancer-DCLA, is proposed to identify enhancers and enhancer strength. Firstly, we use word2vec to convert k-mers into numeric vectors to construct an input matrix. Secondly, we use a convolutional neural network and a bidirectional long short-term memory network to extract sequence features, and finally use the attention mechanism to extract relatively important features. In the task of predicting enhancers and their strengths, this method improves on most evaluation metrics to a certain extent. In summary, we believe that this method provides new ideas for the analysis of enhancers.
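The first step described above, turning a DNA sequence into k-mers and then into an input matrix, can be sketched as follows. This is an illustrative example only: the overlapping stride-1 split, the choice `k=3`, and the toy `embedding` table (a stand-in for vectors that a trained word2vec model would supply) are assumptions, not details taken from the paper.

```python
def kmers(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def to_input_matrix(tokens, embedding, dim):
    """Stack one vector per k-mer; unknown k-mers map to a zero vector."""
    return [embedding.get(tok, [0.0] * dim) for tok in tokens]


tokens = kmers("acgtac", k=3)

# Hypothetical 2-dimensional vectors standing in for learned word2vec output.
embedding = {"ACG": [0.1, 0.2], "CGT": [0.3, 0.4], "GTA": [0.5, 0.6]}
matrix = to_input_matrix(tokens, embedding, dim=2)
```

The resulting matrix has one row per k-mer, which is the shape a convolutional or recurrent layer would consume along the sequence axis.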
|
4
|
Li J, Tao Z, Wu Y, Zhong B, Fu Y. Large-Scale Subspace Clustering by Independent Distributed and Parallel Coding. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:9090-9100. [PMID: 33635812 DOI: 10.1109/tcyb.2021.3052056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Subspace clustering is a popular method to discover underlying low-dimensional structures of high-dimensional multimedia data (e.g., images, videos, and texts). In this article, we consider a large-scale subspace clustering (LS2C) problem, that is, partitioning a million data points with a million dimensions. To address this, we explore an independent distributed and parallel framework by dividing big data/variable matrices and regularization by both columns and rows. Specifically, LS2C is independently decomposed into many subproblems by distributing those matrices across different machines by columns, since the regularization of the code matrix is equal to the sum of that of its submatrices (e.g., the squared Frobenius norm or the $\ell_1$-norm). Consensus optimization is designed to solve these subproblems in parallel to save communication costs. Moreover, we provide theoretical guarantees that LS2C can recover consensus subspace representations of high-dimensional data points under broad conditions. Compared with the state-of-the-art LS2C methods, our approach achieves better clustering results on public datasets, including a million images and videos.
|
5
|
On the effectiveness of binary emulation in malware classification. JOURNAL OF INFORMATION SECURITY AND APPLICATIONS 2022. [DOI: 10.1016/j.jisa.2022.103258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
6
|
Xu Y, Chen S, Li J, Han Z, Yang J. Autoencoder-Based Latent Block-Diagonal Representation for Subspace Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:5408-5418. [PMID: 33206621 DOI: 10.1109/tcyb.2020.3031666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Block-diagonal representation (BDR) is an effective subspace clustering method. The existing BDR methods usually obtain a self-expression coefficient matrix from the original features by a shallow linear model. However, the underlying structure of real-world data is often nonlinear, thus those methods cannot faithfully reflect the intrinsic relationship among samples. To address this problem, we propose a novel latent BDR (LBDR) model to perform the subspace clustering on a nonlinear structure, which jointly learns an autoencoder and a BDR matrix. The autoencoder, which consists of a nonlinear encoder and a linear decoder, plays an important role to learn features from the nonlinear samples. Meanwhile, the learned features are used as a new dictionary for a linear model with block-diagonal regularization, which can ensure good performances for spectral clustering. Moreover, we theoretically prove that the learned features are located in the linear space, thus ensuring the effectiveness of the linear model using self-expression. Extensive experiments on various real-world datasets verify the superiority of our LBDR over the state-of-the-art subspace clustering approaches.
|
7
|
Li J, Liu H, Tao Z, Zhao H, Fu Y. Learnable Subspace Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:1119-1133. [PMID: 33306473 DOI: 10.1109/tnnls.2020.3040379] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
This article studies the large-scale subspace clustering (LS2C) problem with millions of data points. Many popular subspace clustering methods cannot directly handle the LS2C problem although they have been considered to be state-of-the-art methods for small-scale data points. A simple reason is that these methods often choose all data points as a large dictionary to build huge coding models, which results in high time and space complexity. In this article, we develop a learnable subspace clustering paradigm to efficiently solve the LS2C problem. The key concept is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces instead of the computationally demanding classical coding models. Moreover, we propose a unified, robust, predictive coding machine (RPCM) to learn the parametric function, which can be solved by an alternating minimization algorithm. Besides, we provide a bounded contraction analysis of the parametric function. To the best of our knowledge, this article is the first work to efficiently cluster millions of data points among the subspace clustering methods. Experiments on million-scale data sets verify that our paradigm outperforms the related state-of-the-art methods in both efficiency and effectiveness.
|
8
|
Cheng C, Li H, Zhang L. Two-Branch Deconvolutional Network With Application in Stereo Matching. IEEE TRANSACTIONS ON IMAGE PROCESSING: A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 31:327-340. [PMID: 34871173 DOI: 10.1109/tip.2021.3131048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Deconvolutional networks have attracted extensive attention and have been successfully applied in the field of computer vision. In this paper we propose a novel two-branch deconvolutional network (TBDN) that can improve the performance of conventional deconvolutional networks and reduce the computational complexity. A feasible iterative algorithm is designed to solve the optimization problem for the TBDN model, and a theoretical analysis of the convergence and computational complexity for the algorithm is also provided. The application of the TBDN in stereo matching is presented by constructing a disparity estimation network. Extensive experimental results on four commonly used datasets demonstrate the efficiency and effectiveness of the proposed TBDN.
|
9
|
Wang T, Ng WWY, Pelillo M, Kwong S. LiSSA: Localized Stochastic Sensitive Autoencoders. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:2748-2760. [PMID: 31331899 DOI: 10.1109/tcyb.2019.2923756] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
The training of an autoencoder (AE) focuses on the selection of connection weights via minimization of both the training error and a regularization term. However, the ultimate goal of AE training is to autoencode future unseen samples correctly (i.e., good generalization). Minimizing the training error with different regularization terms only indirectly minimizes the generalization error. Moreover, the trained model may not be robust to small perturbations of inputs, which may lead to poor generalization capability. In this paper, we propose a localized stochastic sensitive AE (LiSSA) to enhance the robustness of the AE with respect to input perturbations. With the local stochastic sensitivity regularization, LiSSA reduces sensitivity to unseen samples with small differences (perturbations) from training samples. Meanwhile, LiSSA preserves the local connectivity from the original input space to the representation space, which lets it learn more robust features (intermediate representations) for unseen samples. The classifier using these learned features yields a better generalization capability. Extensive experimental results on 36 benchmarking datasets indicate that LiSSA outperforms several classical and recent AE training methods significantly on classification tasks.
|
10
|
Qin N, Liang K, Huang D, Ma L, Kemp AH. Multiple Convolutional Recurrent Neural Networks for Fault Identification and Performance Degradation Evaluation of High-Speed Train Bogie. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:5363-5376. [PMID: 32054588 DOI: 10.1109/tnnls.2020.2966744] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 05/27/2023]
Abstract
As an important part of high-speed train (HST), the mechanical performance of bogies imposes a direct impact on the safety and reliability of HST. It is a fact that, regardless of the potential mechanical performance degradation status, most existing fault diagnosis methods focus only on the identification of bogie fault types. However, for application scenarios such as auxiliary maintenance, identifying the performance degradation of bogie is critical in determining a particular maintenance strategy. In this article, by considering the intrinsic link between fault type and performance degradation of bogie, a novel multiple convolutional recurrent neural network (M-CRNN) that consists of two CRNN frameworks is proposed for simultaneous diagnosis of fault type and performance degradation state. Specifically, the CRNN framework 1 is designed to detect the fault types of the bogie. Meanwhile, CRNN framework 2, which is formed by CRNN Framework 1 and an RNN module, is adopted to further extract the features of fault performance degradation. It is worth highlighting that M-CRNN extends the structure of traditional neural networks and makes full use of the temporal correlation of performance degradation and model fault types. The effectiveness of the proposed M-CRNN algorithm is tested via the HST model CRH380A at different running speeds, including 160, 200, and 220 km/h. The overall accuracy of M-CRNN, i.e., the product of the accuracies for identifying the fault types and evaluating the fault performance degradation, is beyond 94.6% in all cases. This clearly demonstrates the potential applicability of the proposed method for multiple fault diagnosis tasks of HST bogie system.
|
11
|
Tao Z, Liu H, Li S, Ding Z, Fu Y. Marginalized Multiview Ensemble Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:600-611. [PMID: 30990450 DOI: 10.1109/tnnls.2019.2906867] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 06/09/2023]
Abstract
Multiview clustering (MVC), which aims to explore the underlying cluster structure shared by multiview data, has drawn more research efforts in recent years. To exploit the complementary information among multiple views, existing methods mainly learn a common latent subspace or develop a certain loss across different views, while ignoring higher level information such as the basic partitions (BPs) generated by a single-view clustering algorithm. In light of this, we propose a novel marginalized multiview ensemble clustering (M2VEC) method in this paper. Specifically, we solve MVC in an ensemble clustering (EC) way, which generates BPs for each view individually and seeks a consensus one. By this means, we naturally leverage the complementary information of multiview data upon the same partition space. In order to boost the robustness of our approach, a marginalized denoising process is adopted to mimic data corruption and noise, which provides robust partition-level representations for each view by training a single-layer autoencoder. A low-rank and sparse decomposition is seamlessly incorporated into the denoising process to explicitly capture the consistency information and meanwhile compensate for the distinctness between heterogeneous features. Spectral consensus graph partitioning is also incorporated into our model to make M2VEC a unified optimization framework. Moreover, a multilayer M2VEC is eventually delivered in a stacked fashion to encapsulate nonlinearity into partition-level representations for handling complex data. Experimental results on eight real-world data sets show the efficacy of our approach compared with several state-of-the-art multiview and EC methods. We also show that our method performs well with partial multiview data.
|
12
|
Jeyaraj PR, Nadar ERS. Deep Boltzmann machine algorithm for accurate medical image analysis for classification of cancerous region. COGNITIVE COMPUTATION AND SYSTEMS 2019. [DOI: 10.1049/ccs.2019.0004] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022] Open
Affiliation(s)
- Pandia Rajan Jeyaraj
- Department of Electrical and Electronics Engineering, Mepco Schlenk Engineering College (Autonomous), Sivakasi 626005, Tamil Nadu, India
- Edward Rajan Samuel Nadar
- Department of Electrical and Electronics Engineering, Mepco Schlenk Engineering College (Autonomous), Sivakasi 626005, Tamil Nadu, India
|
13
|
Auto Encoder Feature Learning with Utilization of Local Spatial Information and Data Distribution for Classification of PolSAR Image. REMOTE SENSING 2019. [DOI: 10.3390/rs11111313] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
The distribution of data plays a key role in the design of a machine learning model. Therefore, this paper proposes a novel auto encoder network based on the distribution of the polarimetric synthetic aperture radar (PolSAR) data matrix. Designed specifically for the PolSAR data matrix, the proposed mixture auto encoder (MAE) feature learning method defines the data error term in the loss function according to the data distribution. Instead of the pixel itself, all pixels in its neighborhood are used as input to train the proposed MAE. Then, a corresponding classification network is obtained by discarding the decoder of the proposed MAE and connecting the encoder to a Softmax classifier. The MAE is trained using unlabeled data, while the training of the classification network is completed with the help of a small number of labeled pixels. To address misclassification in the predicted result image, two post-processing steps acting on local spatial information are also given, each carried out by one of two proposed filters. Extensive experiments with four methods, including the proposed classification network, were conducted on three real PolSAR images. The experimental results show that introducing the data distribution into the auto encoder network leads to an average 4% improvement in overall accuracy across the three PolSAR images. Moreover, the post-processing steps with the proposed filters further improve the classification performance on PolSAR images.
|
14
|
Sridar P, Kumar A, Quinton A, Nanan R, Kim J, Krishnakumar R. Decision Fusion-Based Fetal Ultrasound Image Plane Classification Using Convolutional Neural Networks. ULTRASOUND IN MEDICINE & BIOLOGY 2019; 45:1259-1273. [PMID: 30826153 DOI: 10.1016/j.ultrasmedbio.2018.11.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 11/08/2017] [Revised: 11/26/2018] [Accepted: 11/29/2018] [Indexed: 06/09/2023]
Abstract
Machine learning for ultrasound image analysis and interpretation can be helpful in automated image classification in large-scale retrospective analyses to objectively derive new indicators of abnormal fetal development that are embedded in ultrasound images. Current approaches to automatic classification are limited to the use of either image patches (cropped images) or the global (whole) image. As many fetal organs have similar visual features, cropped images can misclassify certain structures such as the kidneys and abdomen. Also, the whole image does not encode sufficient local information about structures to identify different structures in different locations. Here we propose a method to automatically classify 14 different fetal structures in 2-D fetal ultrasound images by fusing information from both cropped regions of fetal structures and the whole image. Our method trains two feature extractors by fine-tuning pre-trained convolutional neural networks with the whole ultrasound fetal images and the discriminant regions of the fetal structures found in the whole image. The novelty of our method is in integrating the classification decisions made from the global and local features without relying on priors. In addition, our method can use the classification outcome to localize the fetal structures in the image. Our experiments on a data set of 4074 2-D ultrasound images (training: 3109, test: 965) achieved a mean accuracy of 97.05%, mean precision of 76.47% and mean recall of 75.41%. The Cohen κ of 0.72 revealed the highest agreement between the ground truth and the proposed method. The superiority of the proposed method over the other non-fusion-based methods is statistically significant (p < 0.05). We found that our method is capable of predicting images without ultrasound scanner overlays with a mean accuracy of 92%. The proposed method can be leveraged to retrospectively classify any ultrasound images in clinical research.
Affiliation(s)
- Pradeeba Sridar
- Department of Engineering Design, Indian Institute of Technology Madras, India; School of Computer Science, University of Sydney, Sydney, New South Wales, Australia
- Ashnil Kumar
- School of Computer Science, University of Sydney, Sydney, New South Wales, Australia
- Ann Quinton
- Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
- Ralph Nanan
- Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
- Jinman Kim
- School of Computer Science, University of Sydney, Sydney, New South Wales, Australia
|
15
|
Cai J, Huang X. Modified Sparse Linear-Discriminant Analysis via Nonconvex Penalties. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:4957-4966. [PMID: 29994754 DOI: 10.1109/tnnls.2017.2785324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
This paper considers the linear-discriminant analysis (LDA) problem in the undersampled situation, in which the number of features is very large and the number of observations is limited. Sparsity is often incorporated in the solution of LDA to make the results easier to interpret. However, most of the existing sparse LDA algorithms pursue sparsity by means of the $\ell_{1}$-norm. In this paper, we give an elaborate analysis of nonconvex penalties, including the $\ell_{0}$-based and the sorted $\ell_{1}$-based LDA methods. The latter can be regarded as a bridge between the $\ell_{0}$ and $\ell_{1}$ penalties. These nonconvex penalty-based LDA algorithms are evaluated on gene expression array and face databases, showing high classification accuracy on real-world problems.
|
16
|
Xing F, Xie Y, Su H, Liu F, Yang L. Deep Learning in Microscopy Image Analysis: A Survey. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:4550-4568. [PMID: 29989994 DOI: 10.1109/tnnls.2017.2766168] [Citation(s) in RCA: 168] [Impact Index Per Article: 24.0] [Indexed: 05/21/2023]
Abstract
Computerized microscopy image analysis plays an important role in computer aided diagnosis and prognosis. Machine learning techniques have powered many aspects of medical investigation and clinical practice. Recently, deep learning is emerging as a leading machine learning tool in computer vision and has attracted considerable attention in biomedical image analysis. In this paper, we provide a snapshot of this fast-growing field, specifically for microscopy image analysis. We briefly introduce the popular deep neural networks and summarize current deep learning achievements in various tasks, such as detection, segmentation, and classification in microscopy image analysis. In particular, we explain the architectures and the principles of convolutional neural networks, fully convolutional networks, recurrent neural networks, stacked autoencoders, and deep belief networks, and interpret their formulations or modelings for specific tasks on various microscopy images. In addition, we discuss the open challenges and the potential trends of future research in microscopy image analysis using deep learning.
|
17
|
Novel Cross-View Human Action Model Recognition Based on the Powerful View-Invariant Features Technique. FUTURE INTERNET 2018. [DOI: 10.3390/fi10090089] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022] Open
Abstract
One of the most important research topics nowadays is human action recognition, which is of significant interest to the computer vision and machine learning communities. Some of the factors that hamper it include changes in postures and shapes and the memory space and time required to gather, store, label, and process the pictures. During our research, we noted considerable difficulty in recognizing human actions from different viewpoints, which can be explained by the position and orientation of the viewer relative to the position of the subject. We attempted to address this issue in this paper by learning different special view-invariant facets that are robust to view variations. Moreover, we focused on providing a solution to this challenge by exploring view-specific as well as view-shared facets utilizing a novel deep model called the sample-affinity matrix (SAM). These models can accurately determine the similarities among video samples captured from diverse camera angles and enable us to precisely fine-tune transfer between various views and learn more detailed shared facets found in cross-view action identification. Additionally, we proposed a novel view-invariant facets algorithm that enabled us to better comprehend the internal processes of our project. Using a series of experiments on the INRIA Xmas Motion Acquisition Sequences (IXMAS) and the Northwestern–UCLA Multi-view Action 3D (NUMA) datasets, we were able to show that our technique performs much better than state-of-the-art techniques.
|
18
|
Luo W, Li J, Yang J, Xu W, Zhang J. Convolutional Sparse Autoencoders for Image Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:3289-3294. [PMID: 28682266 DOI: 10.1109/tnnls.2017.2712793] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 06/07/2023]
Abstract
Convolutional sparse coding (CSC) can model local connections between image content and reduce the code redundancy when compared with patch-based sparse coding. However, CSC needs a complicated optimization procedure to infer the codes (i.e., feature maps). In this brief, we proposed a convolutional sparse auto-encoder (CSAE), which leverages the structure of the convolutional AE and incorporates the max-pooling to heuristically sparsify the feature maps for feature learning. Together with competition over feature channels, this simple sparsifying strategy makes the stochastic gradient descent algorithm work efficiently for the CSAE training; thus, no complicated optimization procedure is involved. We employed the features learned in the CSAE to initialize convolutional neural networks for classification and achieved competitive results on benchmark data sets. In addition, by building connections between the CSAE and CSC, we proposed a strategy to construct local descriptors from the CSAE for classification. Experiments on Caltech-101 and Caltech-256 clearly demonstrated the effectiveness of the proposed method and verified the CSAE as a CSC model has the ability to explore connections between neighboring image content for classification tasks.
|
19
|
Tian Y, Kong Y, Ruan Q, An G, Fu Y. Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING: A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:1748-1762. [PMID: 29346092 DOI: 10.1109/tip.2017.2788196] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/07/2023]
Abstract
In this paper, we present a novel two-layer video representation for human action recognition employing hierarchical group sparse encoding technique and spatio-temporal structure. In the first layer, a new sparse encoding method named locally consistent group sparse coding (LCGSC) is proposed to make full use of motion and appearance information of local features. LCGSC method not only encodes global layouts of features within the same video-level groups, but also captures local correlations between them, which obtains expressive sparse representations of video sequences. Meanwhile, two kinds of efficient location estimation models, namely an absolute location model and a relative location model, are developed to incorporate spatio-temporal structure into LCGSC representations. In the second layer, action-level group is established, where a hierarchical LCGSC encoding scheme is applied to describe videos at different levels of abstractions. On the one hand, the new layer captures higher order dependency between video sequences; on the other hand, it takes label information into consideration to improve discrimination of videos' representations. The superiorities of our hierarchical framework are demonstrated on several challenging datasets.
|
20
|
Li J, Chang H, Yang J, Luo W, Fu Y. Visual Representation and Classification by Learning Group Sparse Deep Stacking Network. IEEE TRANSACTIONS ON IMAGE PROCESSING: A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:464-476. [PMID: 29989968 DOI: 10.1109/tip.2017.2765833] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Deep stacking networks (DSNs) have been successfully applied in classification tasks. Their architecture builds upon blocks of simplified neural network modules (SNNM). The hidden units are assumed to be independent in the SNNM module. However, this assumption prevents SNNM from learning the local dependencies between hidden units to better capture the information in the input data for the classification task. In addition, the hidden representations of input data in each class can be expected to fall into a group in real-world classification applications. Therefore, we propose two kinds of group sparse SNNM modules by mixing the $\ell_{1}$-norm and the $\ell_{2}$-norm. The first module learns the local dependencies among hidden units by dividing them into non-overlapping groups. The second module splits the representations of samples in different classes into separate groups to cluster the samples in each class. A group sparse DSN (GS-DSN) is constructed by stacking the group sparse SNNM modules. Experimental results further verify that our GS-DSN model outperforms the relevant classification methods. In particular, GS-DSN achieves state-of-the-art performance (99.1%) on 15-Scene.
|
21
|
Kong Y, Ding Z, Li J, Fu Y. Deeply Learned View-Invariant Features for Cross-View Action Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING: A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:3028-3037. [PMID: 28436876 DOI: 10.1109/tip.2017.2696786] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 06/07/2023]
Abstract
Classifying human actions from varied views is challenging due to huge data variations in different views. The key to this problem is to learn discriminative view-invariant features robust to view variations. In this paper, we address this problem by learning view-specific and view-shared features using novel deep models. View-specific features capture unique dynamics of each view while view-shared features encode common patterns across views. A novel sample-affinity matrix is introduced in learning shared features, which accurately balances information transfer within the samples from multiple views and limits the transfer across samples. This allows us to learn more discriminative shared features robust to view variations. In addition, the incoherence between the two types of features is encouraged to reduce information redundancy and exploit discriminative information in them separately. The discriminative power of the learned features is further improved by encouraging features in the same categories to be geometrically closer. Robust view-invariant features are finally learned by stacking several layers of features. Experimental results on three multi-view data sets show that our approaches outperform the state-of-the-art approaches.
|