1. Zhang L, Du G, Liu F, Tu H, Shu X. Global-Local Multiple Granularity Learning for Cross-Modality Visible-Infrared Person Reidentification. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4209-4219. [PMID: 34138719] [DOI: 10.1109/tnnls.2021.3085978]
Abstract
Cross-modality visible-infrared person reidentification (VI-ReID), which aims to retrieve pedestrian images captured by both visible and infrared cameras, is a challenging but essential task for smart surveillance systems. The large gap between visible and infrared images leads to substantial cross-modality discrepancy and intraclass variation. Most existing VI-ReID methods learn discriminative modality-sharable features from either global or part-based representations and lack effective optimization objectives. In this article, we propose a novel global-local multichannel (GLMC) network for VI-ReID that learns multigranularity representations based on both global and local features. The coarse- and fine-grained information complement each other to form a more discriminative feature descriptor. In addition, we propose a novel center loss function that simultaneously improves intraclass cross-modality similarity and enlarges interclass discrepancy, explicitly handling the cross-modality discrepancy and avoiding model fluctuation during training. Experimental results on two public datasets demonstrate the superiority of the proposed method over state-of-the-art approaches.
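As a rough illustration of the center-based idea, pulling same-identity visible and infrared features toward a shared center while pushing different identities apart, here is a minimal PyTorch sketch; the feature shapes, margin, and exact loss form are illustrative assumptions, not the authors' GLMC loss.

import torch

def cross_modality_center_loss(feat_vis, feat_ir, labels, margin=0.3):
    """Toy center-based objective: for each identity, pull its visible and
    infrared feature centers together; push centers of different identities
    apart by at least `margin`. Illustrative only, not the paper's exact loss."""
    centers = []
    for pid in labels.unique():
        mask = labels == pid
        c_vis = feat_vis[mask].mean(dim=0)
        c_ir = feat_ir[mask].mean(dim=0)
        # intraclass, cross-modality pull between the two modality centers
        centers.append(((c_vis + c_ir) / 2, (c_vis - c_ir).pow(2).sum()))
    pull = torch.stack([p for _, p in centers]).mean()

    # interclass push between identity centers (hinge on pairwise distances)
    C = torch.stack([c for c, _ in centers])            # (num_ids, d)
    dist = torch.cdist(C, C)
    n = C.size(0)
    off_diag = dist[~torch.eye(n, dtype=torch.bool)]
    push = torch.clamp(margin - off_diag, min=0).mean() if n > 1 else dist.sum() * 0
    return pull + push

# usage with random stand-in features
feat_vis = torch.randn(8, 128)
feat_ir = torch.randn(8, 128)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(cross_modality_center_loss(feat_vis, feat_ir, labels))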
2. Sun X, Yao F, Ding C. Modeling High-Order Relationships: Brain-Inspired Hypergraph-Induced Multimodal-Multitask Framework for Semantic Comprehension. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12142-12156. [PMID: 37028292] [DOI: 10.1109/tnnls.2023.3252359]
Abstract
Semantic comprehension aims to reasonably reproduce people's real intentions or thoughts, e.g., sentiment, humor, sarcasm, motivation, and offensiveness, from multiple modalities. It can be instantiated as a multimodal-oriented multitask classification problem and applied to scenarios such as online public opinion supervision and political stance analysis. Previous methods generally employ multimodal learning alone to deal with varied modalities or solely exploit multitask learning to solve various tasks; few unify both in an integrated framework. Moreover, multimodal-multitask cooperative learning inevitably encounters the challenge of modeling high-order relationships, i.e., intramodal, intermodal, and intertask relationships. Research in brain science shows that the human brain achieves multimodal perception and multitask cognition for semantic comprehension via decomposing, associating, and synthesizing processes. Thus, establishing a brain-inspired semantic comprehension framework to bridge the gap between multimodal and multitask learning is the primary motivation of this work. Motivated by the strength of the hypergraph in modeling high-order relations, in this article we propose a hypergraph-induced multimodal-multitask (HIMM) network for semantic comprehension. HIMM incorporates monomodal, multimodal, and multitask hypergraph networks to respectively mimic the decomposing, associating, and synthesizing processes, thereby tackling the intramodal, intermodal, and intertask relationships. Furthermore, temporal and spatial hypergraph constructions are designed to model relationships in modalities with sequential and spatial structures, respectively. We also design an alternating hypergraph updating algorithm in which vertices aggregate to update hyperedges and hyperedges in turn update their connected vertices. Experiments on a dataset with two modalities and five tasks verify the effectiveness of HIMM for semantic comprehension.
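To make the alternating vertex-hyperedge update concrete, here is a minimal NumPy sketch of vertex-to-hyperedge and hyperedge-to-vertex aggregation on an incidence matrix; the toy matrices and the mean-aggregation rule are illustrative assumptions, not the HIMM implementation.

import numpy as np

def hypergraph_alternating_update(X, H, steps=3):
    """X: (n_vertices, d) vertex features; H: (n_vertices, n_edges) incidence
    matrix with H[v, e] = 1 if vertex v belongs to hyperedge e.
    Alternately (1) aggregate vertices to update hyperedge features and
    (2) aggregate hyperedges to update vertex features."""
    Dv = H.sum(axis=1, keepdims=True) + 1e-8   # vertex degrees
    De = H.sum(axis=0, keepdims=True) + 1e-8   # hyperedge degrees
    for _ in range(steps):
        E = (H / De).T @ X    # vertices -> hyperedges (mean over member vertices)
        X = (H / Dv) @ E      # hyperedges -> vertices (mean over incident hyperedges)
    return X

# toy example: 5 vertices, 3 hyperedges
H = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0],
              [1, 1, 0]], dtype=float)
X = np.random.randn(5, 8)
print(hypergraph_alternating_update(X, H).shape)   # (5, 8)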
3. Zhou Q, Zhong B, Liu X, Ji R. Attention-Based Neural Architecture Search for Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6627-6639. [PMID: 34057899] [DOI: 10.1109/tnnls.2021.3082701]
Abstract
Recent years have witnessed significant progress in person reidentification (reID) driven by expert-designed deep neural network architectures. Despite the remarkable success, such architectures often suffer from high model complexity and a time-consuming pretraining process, as well as mismatches between image classification-driven backbones and the reID task. To address these issues, we introduce neural architecture search (NAS) into the automatic design of person reID backbones, i.e., reID-NAS, which automatically searches attention-based network architectures from scratch. Different from traditional NAS approaches that originated in image classification, we design a reID-specific search space and search objective to fit NAS to the reID task. In terms of the search space, reID-NAS includes a lightweight attention module that precisely locates arbitrary pedestrian bounding boxes and is automatically added as attention to the reID architectures. In terms of the search objective, reID-NAS introduces a new retrieval objective to search and train reID architectures from scratch. Finally, we propose a hybrid optimization strategy to improve search stability in reID-NAS. In our experiments, we validate the effectiveness of the different parts of reID-NAS and show that the searched architecture achieves a new state of the art, with one order of magnitude fewer parameters, on three person reID datasets. As a concomitant benefit, the reliance on pretraining is vastly reduced, which makes it practical to directly search and train a lightweight reID model from scratch.
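For intuition about the kind of lightweight attention module that could be inserted into a searched backbone, here is a small spatial-gating sketch in PyTorch; the layer sizes and gating form are generic assumptions rather than the actual reID-NAS module.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Predict a per-location gate from pooled channel statistics and
    reweight the feature map with it."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                       # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate                         # attended features

feat = torch.randn(2, 64, 24, 12)
print(SpatialAttention()(feat).shape)           # torch.Size([2, 64, 24, 12])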
4. Guo Y, Zhao L, Shi Y, Zhang X, Du S, Wang F. Adaptive weighted robust iterative closest point. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.08.047]
5. Search-Based Cost-Sensitive Hypergraph Learning for Anomaly Detection. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.029]
6. Wei Z, Yang X, Wang N, Gao X. Flexible Body Partition-Based Adversarial Learning for Visible Infrared Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4676-4687. [PMID: 33651699] [DOI: 10.1109/tnnls.2021.3059713]
Abstract
Person re-identification (Re-ID) aims to retrieve images of the same person across disjoint camera views. Most Re-ID studies focus on pedestrian images captured by visible cameras, without considering the infrared images obtained in dark scenarios. Person retrieval between the visible and infrared modalities is of great significance to public security. Current methods usually train a model to extract global feature descriptors and obtain discriminative representations for visible-infrared person Re-ID (VI-REID). Nevertheless, they ignore the detailed information in heterogeneous pedestrian images, which limits Re-ID performance. In this article, we propose a flexible body partition (FBP) model-based adversarial learning method (FBP-AL) for VI-REID. To learn finer-grained information, the FBP model automatically distinguishes part representations according to the feature maps of pedestrian images. Specifically, we design a modality classifier and introduce adversarial learning that attempts to discriminate features between the visible and infrared modalities. Adaptive weighting-based representation learning and threefold triplet loss-based metric learning compete with the modality classification to obtain more effective modality-sharable features, thus shrinking the cross-modality gap and enhancing feature discriminability. Extensive experimental results on two cross-modality person Re-ID datasets, SYSU-MM01 and RegDB, exhibit the superiority of the proposed method compared with state-of-the-art solutions.
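One common way to realize this competition between the feature extractor and a modality classifier is a gradient-reversal layer; the PyTorch sketch below uses placeholder dimensions and is a generic adversarial setup, not the FBP-AL architecture.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None    # flip the gradient flowing to the extractor

class ModalityClassifier(nn.Module):
    """Tries to tell visible from infrared features; the reversed gradient
    pushes the feature extractor toward modality-sharable representations."""
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feat, lamb=1.0):
        return self.fc(GradReverse.apply(feat, lamb))

feat = torch.randn(8, 128, requires_grad=True)     # features from both modalities
modality = torch.tensor([0, 1] * 4)                # 0 = visible, 1 = infrared
loss = nn.CrossEntropyLoss()(ModalityClassifier()(feat), modality)
loss.backward()
print(feat.grad.shape)                             # reversed gradients reach the features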
7. Research for an Adaptive Classifier Based on Dynamic Graph Learning. Neural Process Lett 2022. [DOI: 10.1007/s11063-021-10452-7]
8. Nie WZ, Liu AA, Zhao S, Gao Y. Deep Correlated Joint Network for 2-D Image-Based 3-D Model Retrieval. IEEE Transactions on Cybernetics 2022; 52:1862-1871. [PMID: 32603301] [DOI: 10.1109/tcyb.2020.2995415]
Abstract
In this article, we propose a novel deep correlated joint network (DCJN) approach for 2-D image-based 3-D model retrieval. First, the proposed method jointly trains two distinct deep neural networks, one per modality, to learn deep nonlinear transformations that extract visual features in a co-embedding feature space. Second, we propose a global loss function for the DCJN consisting of a discriminative loss and a correlation loss. The discriminative loss aims to minimize the intraclass distance of the extracted features and maximize the interclass distance by a large margin within each modality, while the correlation loss mitigates the distribution discrepancy across different modalities. Consequently, the proposed method realizes cross-modality feature extraction guided by the defined global loss function, which benefits the similarity measure between 2-D images and 3-D models. For comparison, we contribute the currently largest 2-D image-based 3-D model retrieval dataset. Moreover, the proposed method was further evaluated on three popular benchmarks: the 3-D Shape Retrieval Contest 2014, 2016, and 2018 benchmarks. The extensive comparison results demonstrate the superiority of this method over state-of-the-art methods.
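A toy version of such a two-term global objective, an intra/interclass margin term plus a crude cross-modality alignment term, might look like the PyTorch sketch below; the margin value and the mean-discrepancy form of the correlation loss are stand-ins, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def discriminative_loss(feat, labels, margin=1.0):
    """Pull same-class features together; push different classes beyond a margin."""
    d = torch.cdist(feat, feat)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pull = d[same & ~eye].pow(2).mean()
    push = F.relu(margin - d[~same]).pow(2).mean()
    return pull + push

def correlation_loss(feat_img, feat_model):
    """Crude distribution alignment between the 2-D image and 3-D model branches."""
    return (feat_img.mean(0) - feat_model.mean(0)).pow(2).sum()

feat_img, feat_model = torch.randn(8, 64), torch.randn(8, 64)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
total = discriminative_loss(torch.cat([feat_img, feat_model]), labels.repeat(2)) \
        + 0.5 * correlation_loss(feat_img, feat_model)
print(total)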
9. Liu Y, Sun Q, He X, Liu AA, Su Y, Chua TS. Generating Face Images With Attributes for Free. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2733-2743. [PMID: 32697723] [DOI: 10.1109/tnnls.2020.3007790]
Abstract
With face recognition reaching superhuman-level performance, attention is shifting to the recognition of fine-grained attributes, such as emotion, age, and gender. However, given that the label space is extremely large and follows a long-tail distribution, it is quite expensive to collect sufficient samples for fine-grained attributes. This results in imbalanced training samples and inferior attribute recognition models. To this end, we propose the use of arbitrary attribute combinations, without human effort, to synthesize face images. In particular, to bridge the semantic gap between the high-level attribute label space and low-level face images, we propose a novel neural-network-based approach that maps the target attribute labels to an embedding vector, which can be fed into a pretrained image decoder to synthesize a new face image. Furthermore, to regularize the attributes for image synthesis, we propose a perceptual loss that makes the new image explicitly faithful to the target attributes. Experimental results show that our approach can generate photorealistic face images from attribute labels and, more importantly, that these images, serving as augmented training samples, can significantly boost the performance of the attribute recognition model. The code is open-sourced at this link.
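Schematically, the pipeline maps attribute labels to a latent code, decodes an image with a pretrained (frozen) decoder, and regularizes with a perceptual loss. In the PyTorch sketch below, the decoder and perceptual network are tiny stand-in modules, not the authors' models, and all sizes are assumptions.

import torch
import torch.nn as nn

attr_dim, z_dim = 40, 128
mapper = nn.Sequential(nn.Linear(attr_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
decoder = nn.Sequential(nn.Linear(z_dim, 3 * 64 * 64), nn.Tanh())   # stand-in for a pretrained image decoder
perceptual_net = nn.Sequential(nn.Linear(3 * 64 * 64, 512))          # stand-in perceptual feature extractor

def perceptual_loss(fake, real):
    # match features of generated and reference images in the perceptual space
    return (perceptual_net(fake) - perceptual_net(real)).pow(2).mean()

attrs = torch.randint(0, 2, (4, attr_dim)).float()   # target attribute combinations
real = torch.rand(4, 3 * 64 * 64)                     # reference faces carrying those attributes
fake = decoder(mapper(attrs))
loss = perceptual_loss(fake, real)
loss.backward()                                        # trains the mapper; the decoder would stay frozen in practice
print(fake.shape, float(loss))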
10. Dong P, Guo Y, Gao Y, Liang P, Shi Y, Wu G. Multi-Atlas Segmentation of Anatomical Brain Structures Using Hierarchical Hypergraph Learning. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:3061-3072. [PMID: 31502994] [DOI: 10.1109/tnnls.2019.2935184]
Abstract
Accurate segmentation of anatomical brain structures is crucial for many neuroimaging applications, e.g., early brain development studies and the study of imaging biomarkers of neurodegenerative diseases. Although multi-atlas segmentation (MAS) has achieved many successes in medical imaging, this approach encounters limitations in segmenting anatomical structures with poor image contrast. To address this issue, we propose a new MAS method that uses a hypergraph learning framework to model the complex within-subject and subject-to-atlas image voxel relationships and to propagate labels from the atlas images to the target subject image. To alleviate the low-contrast issue, we equip our hypergraph learning framework with two strategies. First, we use a hierarchical strategy that exploits high-level context features for hypergraph construction. Because the context features are computed on the tentatively estimated probability maps, the hypergraph learning ultimately becomes a hierarchical model. Second, instead of only propagating labels from the atlas images to the target subject image, we use a dynamic label propagation strategy that gradually uses increasingly reliable labels identified on the subject image to aid in predicting the labels of difficult-to-label subject voxels. Compared with state-of-the-art label fusion methods, our results show that the hierarchical hypergraph learning framework substantially improves the robustness and accuracy of segmenting anatomical brain structures with low image contrast from magnetic resonance (MR) images.
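The dynamic label propagation idea, freezing confidently predicted subject voxels so they help label the rest, can be sketched over a generic affinity matrix (NumPy). This simplification drops the hypergraph and the hierarchical context features and is not the paper's algorithm; the threshold and iteration count are arbitrary assumptions.

import numpy as np

def dynamic_label_propagation(W, labels, n_iters=5, thresh=0.9):
    """W: (n, n) affinity between all voxels (atlas + subject); labels: (n, k)
    one-hot rows for atlas voxels, zero rows for unlabeled subject voxels.
    At each iteration, propagate labels, then freeze subject voxels whose
    prediction confidence exceeds `thresh` so they help label the rest."""
    Y = labels.astype(float).copy()
    fixed = Y.sum(axis=1) > 0                        # atlas voxels start fixed
    P = W / (W.sum(axis=1, keepdims=True) + 1e-8)    # row-normalized propagation matrix
    for _ in range(n_iters):
        F = P @ Y
        F[fixed] = Y[fixed]                          # keep already-decided labels
        conf = F.max(axis=1)
        newly = (~fixed) & (conf > thresh)
        Y[newly] = (F[newly] == F[newly].max(axis=1, keepdims=True)).astype(float)
        fixed |= newly
        Y[~fixed] = F[~fixed]                        # remaining voxels keep soft labels
    return Y.argmax(axis=1)

W = np.random.rand(6, 6); W = (W + W.T) / 2
labels = np.zeros((6, 2)); labels[0, 0] = labels[1, 1] = 1   # two labeled atlas voxels
print(dynamic_label_propagation(W, labels))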
11. Zhou R, Chang X, Shi L, Shen YD, Yang Y, Nie F. Person Reidentification via Multi-Feature Fusion With Adaptive Graph Learning. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:1592-1601. [PMID: 31283511] [DOI: 10.1109/tnnls.2019.2920905]
Abstract
The goal of person reidentification (Re-ID) is to identify a given pedestrian from a network of nonoverlapping surveillance cameras. Most existing works follow the supervised learning paradigm, which requires pairwise labeled training data for each pair of cameras. However, this limits their scalability to real-world applications where abundant unlabeled data are available. To address this issue, we propose a multi-feature fusion with adaptive graph learning model for unsupervised Re-ID. Our model seeks a comprehensive and consistent graph structure over pedestrians by exploiting the complementary information of multiple feature descriptors. Specifically, we incorporate multi-feature dictionary learning and adaptive multi-feature graph learning into a unified learning model such that the learned dictionaries are discriminative and the subsequent graph structure learning is accurate. An alternating optimization algorithm with proven convergence is developed to solve the final optimization objective. Extensive experiments on four benchmark datasets demonstrate the superiority and effectiveness of the proposed method.
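As a loose illustration of adaptive multi-feature graph fusion, the NumPy sketch below alternates between fusing per-feature affinity graphs and reweighting them by their agreement with the fused graph; the inverse-error weighting rule is an assumption for illustration, not the paper's joint dictionary/graph objective.

import numpy as np

def adaptive_graph_fusion(graphs, n_iters=10):
    """graphs: list of (n, n) affinity matrices, one per feature descriptor.
    Alternately (1) fuse the graphs with weights and (2) re-estimate each weight
    from how closely that graph agrees with the fused one (closer -> larger weight)."""
    m = len(graphs)
    w = np.ones(m) / m
    for _ in range(n_iters):
        S = sum(wi * Gi for wi, Gi in zip(w, graphs))           # fused graph
        err = np.array([np.linalg.norm(Gi - S) for Gi in graphs])
        w = 1.0 / (2.0 * err + 1e-8)                             # inverse-error reweighting
        w /= w.sum()
    return S, w

n = 5
graphs = [np.random.rand(n, n) for _ in range(3)]
graphs = [(G + G.T) / 2 for G in graphs]                         # symmetric affinities
S, w = adaptive_graph_fusion(graphs)
print(w)                                                          # learned per-feature weights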
12. Zero-Shot Classification Based on Multitask Mixed Attribute Relations and Attribute-Specific Features. IEEE Trans Cogn Dev Syst 2020. [DOI: 10.1109/tcds.2019.2902250]
13. Wu L, Wang Y, Shao L, Wang M. 3-D PersonVLAD: Learning Deep Global Representations for Video-Based Person Reidentification. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3347-3359. [PMID: 30716051] [DOI: 10.1109/tnnls.2019.2891244]
Abstract
We present global deep video representation learning for video-based person reidentification (re-ID) that aggregates local 3-D features across the entire video extent. Existing methods typically extract frame-wise deep features from 2-D convolutional networks (ConvNets), which are pooled temporally to produce video-level representations. However, 2-D ConvNets lose temporal priors immediately after the convolutions, and a separate temporal pooling is limited in capturing human motion in short sequences. In this paper, we present global video representation learning as a novel layer, complementary to 3-D ConvNets, that captures the appearance and motion dynamics in full-length videos. Nevertheless, encoding each video frame in its entirety and computing aggregate global representations across all frames is tremendously challenging due to occlusions and misalignments. To resolve this, our proposed network is further augmented with 3-D part alignment to learn local features through a soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3-D features achieve state-of-the-art results on three benchmark datasets: MARS, Imagery Library for Intelligent Detection Systems-Video Re-identification, and PRID2011.
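The aggregation step is in the spirit of a VLAD layer: softly assign local features to learned centers and accumulate residuals into one global descriptor. A compressed PyTorch sketch follows; the cluster count and dimensions are arbitrary assumptions, and this is not the 3-D PersonVLAD code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftVLAD(nn.Module):
    """Aggregate a set of local features into one global descriptor by
    softly assigning them to K centers and summing residuals."""
    def __init__(self, dim=64, k=8):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(k, dim))
        self.assign = nn.Linear(dim, k)

    def forward(self, x):                              # x: (B, N, dim) local features
        a = F.softmax(self.assign(x), dim=-1)          # (B, N, K) soft assignments
        resid = x.unsqueeze(2) - self.centers          # (B, N, K, dim) residuals to centers
        v = (a.unsqueeze(-1) * resid).sum(dim=1)       # (B, K, dim) accumulated residuals
        return F.normalize(v.flatten(1), dim=-1)       # (B, K*dim) global descriptor

clips = torch.randn(2, 100, 64)                        # e.g., 100 spatio-temporal local features
print(SoftVLAD()(clips).shape)                         # torch.Size([2, 512])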
14. Zhang Z, Xie Y, Zhang W, Tang Y, Tian Q. Tensor Multi-task Learning for Person Re-identification. IEEE Transactions on Image Processing 2019; 29:2463-2477. [PMID: 31689192] [DOI: 10.1109/tip.2019.2949929]
Abstract
This paper presents a tensor multi-task model for person re-identification (Re-ID). Due to the discrepancy among cameras, our approach regards Re-ID from multiple cameras as different but related classification tasks, each task corresponding to a specific camera. In each task, we formulate person identity recognition as a one-vs-all linear classification problem, where one classifier is associated with a specific person. By arranging all classifiers into a task-specific projection matrix, the proposed method utilizes all the matrices to form a tensor structure and jointly trains all the tasks in a uniform tensor space. In this space, by assuming that features of the same person under different cameras are generated from a latent subspace and that different identities under the same perspective share similar patterns, the high-order correlations, not only across different tasks but also within a certain task, can be captured through a new type of low-rank tensor constraint. Therefore, the learned classifiers transform the original feature vector into the latent space, where feature distributions across cameras can be well aligned. Moreover, this model can incorporate multiple visual features to boost performance and is easily extended to the unsupervised setting. Extensive experiments and comparisons with recent Re-ID methods demonstrate the competitive performance of our method.
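To illustrate the idea of a low-rank constraint on stacked per-camera classifiers, the NumPy sketch below applies singular-value soft-thresholding (the proximal operator of the nuclear norm) to one unfolding of the classifier tensor; the unfolding choice, sizes, and threshold are illustrative assumptions, not the paper's optimization.

import numpy as np

def svd_soft_threshold(M, tau):
    """Singular-value soft-thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

# toy setup: 4 cameras (tasks), 50 identities, 128-dim features
W = np.random.randn(4, 50, 128)            # per-camera one-vs-all classifier matrices

# regularize the camera-mode unfolding (cameras x (identities*features))
# as a stand-in for the paper's low-rank tensor constraint
unfolded = W.reshape(4, -1)
W_lowrank = svd_soft_threshold(unfolded, tau=5.0).reshape(W.shape)
print(np.linalg.matrix_rank(W_lowrank.reshape(4, -1)))   # rank of the regularized unfolding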
15. Shi H, Zhang Y, Zhang Z, Ma N, Zhao X, Gao Y, Sun J. Hypergraph-Induced Convolutional Networks for Visual Classification. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2963-2972. [PMID: 30295630] [DOI: 10.1109/tnnls.2018.2869747]
Abstract
At present, convolutional neural networks (CNNs) have become popular in visual classification tasks because of their superior performance. However, CNN-based methods do not consider the correlation among the visual data to be classified. Recently, graph convolutional networks (GCNs) have mitigated this problem by modeling the pairwise relationships in visual data. Real-world visual classification tasks, however, typically involve numerous complex relationships that are not well captured by the pairwise graph structure used in GCNs. Therefore, it is vital to explore the underlying correlation of visual data. Regarding this issue, we propose a framework called the hypergraph-induced convolutional network to explore the high-order correlation in visual data within deep neural networks. First, a hypergraph structure is constructed to formulate the relationships in the visual data. Then, the high-order correlation is optimized by a learning process based on the constructed hypergraph, and classification is performed by taking this correlation into account. Thus, the convolution of the hypergraph-induced convolutional network is based on the corresponding high-order relationships, and the network is optimized using each data point while considering the high-order correlation among the data. To evaluate the proposed framework, we conducted experiments on three visual datasets: the National Taiwan University 3-D model dataset, the Princeton Shape Benchmark, and a multiview RGB-depth object dataset. The experimental results and comparisons on all datasets demonstrate the effectiveness of the proposed hypergraph-induced convolutional network compared with state-of-the-art methods.
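For reference, a single hypergraph convolution in the commonly used normalized form can be written as follows (NumPy); this is a generic layer sketch and not necessarily the exact formulation used in the paper.

import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One hypergraph convolution in the usual normalized form
    X' = relu(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta).
    X: (n, d) vertex features, H: (n, e) incidence matrix, Theta: (d, d_out) weights."""
    n, e = H.shape
    w = np.ones(e) if edge_w is None else edge_w
    Dv = (H * w).sum(axis=1)                 # vertex degrees
    De = H.sum(axis=0)                       # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(Dv + 1e-8))
    A = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / (De + 1e-8)) @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0)      # ReLU

H = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]], dtype=float)
X = np.random.randn(4, 16)
Theta = np.random.randn(16, 8)
print(hypergraph_conv(X, H, Theta).shape)    # (4, 8)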
16. Yu Z, Yu J, Xiang C, Fan J, Tao D. Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5947-5959. [PMID: 29993847] [DOI: 10.1109/tnnls.2018.2817340]
Abstract
Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both the visual content of images and the textual content of questions. To support the VQA task, we need good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multimodal feature fusion that is able to capture the complex interactions between multimodal features; and 3) automatic answer prediction that is able to consider the complex correlations between multiple diverse answers for the same question. For fine-grained image and question representations, a "coattention" mechanism is developed using a deep neural network (DNN) architecture to jointly learn the attentions for both the image and the question, which allows us to reduce irrelevant features effectively and obtain more discriminative features for image and question representations. For multimodal feature fusion, a generalized multimodal factorized high-order pooling approach (MFH) is developed to achieve more effective fusion of multimodal features by sufficiently exploiting their correlations, which results in superior VQA performance compared with state-of-the-art approaches. For answer prediction, the Kullback-Leibler divergence is used as the loss function to precisely characterize the complex correlations between multiple diverse answers with the same or similar meaning, which yields a faster convergence rate and slightly better answer prediction accuracy. A DNN architecture is designed to integrate all of these modules into a unified model for achieving superior VQA performance. With an ensemble of our MFH models, we achieve state-of-the-art performance on the large-scale VQA datasets and the runner-up position in the VQA Challenge 2017.
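The core factorized pooling block can be sketched compactly: project both modalities, multiply elementwise, sum-pool over factors, then power- and L2-normalize. The PyTorch sketch below uses illustrative dimensions; MFH cascades such blocks, which is omitted here, and this is not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MFB(nn.Module):
    """Multimodal factorized bilinear pooling block: project both modalities,
    multiply elementwise, sum-pool over factors, then power/L2 normalize."""
    def __init__(self, dim_img=512, dim_q=256, out=64, factor=4):
        super().__init__()
        self.U = nn.Linear(dim_img, out * factor)
        self.V = nn.Linear(dim_q, out * factor)
        self.out, self.factor = out, factor

    def forward(self, img, q):
        joint = self.U(img) * self.V(q)                        # (B, out*factor)
        z = joint.view(-1, self.out, self.factor).sum(dim=2)   # sum pooling over factors
        z = torch.sign(z) * torch.sqrt(torch.abs(z) + 1e-8)    # power normalization
        return F.normalize(z, dim=-1)                          # L2 normalization

img_feat, q_feat = torch.randn(2, 512), torch.randn(2, 256)
print(MFB()(img_feat, q_feat).shape)                           # torch.Size([2, 64])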
17. Robust iterative closest point algorithm based on global reference point for rotation invariant registration. PLoS One 2017; 12:e0188039. [PMID: 29176780] [PMCID: PMC5703502] [DOI: 10.1371/journal.pone.0188039]
Abstract
The iterative closest point (ICP) algorithm is efficient and accurate for rigid registration, but it requires good initial parameters and easily fails when the rotation angle between the two point sets is large. To deal with this problem, a new objective function is proposed by introducing a rotation-invariant feature based on the Euclidean distance between each point and a global reference point, where the global reference point is itself rotation invariant. This optimization problem is then solved by a variant of the ICP algorithm, which is an iterative method. First, accurate correspondences are established by using the weighted rotation-invariant feature distance and position distance together. Second, the rigid transformation is solved by the singular value decomposition method. Third, the weight is adjusted to control the relative contribution of the positions and features. Finally, the new algorithm accomplishes registration in a coarse-to-fine manner regardless of the initial rotation angle, and it is demonstrated to converge monotonically. The experimental results validate that the proposed algorithm is more accurate and robust than the original ICP algorithm.
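A minimal NumPy sketch of the idea follows, using the distance to each cloud's centroid as the rotation-invariant feature and shifting the matching weight from features to positions over iterations; the weighting schedule, reference-point choice, and iteration count are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def rigid_svd(P, Q):
    """Best-fit rotation/translation mapping P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def feature_weighted_icp(src, dst, iters=30):
    """Distances to each cloud's centroid are rotation invariant, so use them to
    guide matching early on, then shift weight to positions (coarse-to-fine)."""
    cur = src.copy()
    for it in range(iters):
        alpha = 1.0 - it / iters                             # feature weight decays over iterations
        f_src = np.linalg.norm(cur - cur.mean(0), axis=1)    # rotation-invariant feature
        f_dst = np.linalg.norm(dst - dst.mean(0), axis=1)
        d_pos = np.linalg.norm(cur[:, None] - dst[None], axis=2)
        d_feat = np.abs(f_src[:, None] - f_dst[None])
        idx = np.argmin((1 - alpha) * d_pos + alpha * d_feat, axis=1)
        R, t = rigid_svd(cur, dst[idx])
        cur = cur @ R.T + t
    return cur

src = np.random.rand(100, 3)
theta = np.pi / 2                                            # large initial rotation
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
dst = src @ Rz.T + np.array([0.5, -0.2, 0.1])
aligned = feature_weighted_icp(src, dst)
print(np.abs(aligned - dst).max())                           # residual alignment error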