1
|
Qi J, Gong Z, Liu X, Chen C, Zhong P. Masked Spatial-Spectral Autoencoders Are Excellent Hyperspectral Defenders. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:3012-3026. [PMID: 38163309 DOI: 10.1109/tnnls.2023.3345734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
Deep learning (DL) has contributed substantially to the development of the hyperspectral image (HSI) analysis community. However, it also makes HSI analysis systems vulnerable to adversarial attacks. To this end, we propose a masked spatial-spectral autoencoder (MSSA) in this article under self-supervised learning theory to enhance the robustness of HSI analysis systems. First, a masked sequence attention learning (MSAL) module is introduced to promote the inherent robustness of HSI analysis systems along the spectral channel. Then, we develop a graph convolutional network (GCN) with a learnable graph structure to establish global pixel-wise combinations. In this way, the attack effect is dispersed across all the related pixels in each combination, and better defense performance is achievable in the spatial aspect. Finally, to improve the defense transferability and address the problem of limited labeled samples, MSSA employs spectra reconstruction as a pretext task and fits the datasets in a self-supervised manner. Comprehensive experiments on three benchmarks verify the effectiveness of MSSA in comparison with state-of-the-art hyperspectral classification methods and representative adversarial defense strategies.
|
2
|
Zhang C, Zhang Y, Chang Z, Li C. Sperm YOLOv8E-TrackEVD: A Novel Approach for Sperm Detection and Tracking. SENSORS (BASEL, SWITZERLAND) 2024; 24:3493. [PMID: 38894284 PMCID: PMC11175353 DOI: 10.3390/s24113493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/21/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024]
Abstract
Male infertility is a global health issue, with 40-50% attributed to sperm abnormalities. The subjectivity and irreproducibility of existing detection methods pose challenges to sperm assessment, making the design of automated semen analysis algorithms crucial for enhancing the reliability of sperm evaluations. This paper proposes a comprehensive sperm tracking algorithm (Sperm YOLOv8E-TrackEVD) that combines an enhanced YOLOv8 small object detection algorithm (SpermYOLOv8-E) with an improved DeepOCSORT tracking algorithm (SpermTrack-EVD) to detect human sperm in a microscopic field of view and track healthy sperm in a sample in a short period effectively. Firstly, we trained the improved YOLOv8 model on the VISEM-Tracking dataset for accurate sperm detection. To enhance the detection of small sperm objects, we introduced an attention mechanism, added a small object detection layer, and integrated the SPDConv and Detect_DyHead modules. Furthermore, we used a new distance metric method and chose IoU loss calculation. Ultimately, we achieved a 1.3% increase in precision, a 1.4% increase in recall rate, and a 2.0% improvement in mAP@0.5:0.95. We applied SpermYOLOv8-E combined with SpermTrack-EVD for sperm tracking. On the VISEM-Tracking dataset, we achieved 74.303% HOTA and 71.167% MOTA. These results show the effectiveness of the designed Sperm YOLOv8E-TrackEVD approach in sperm tracking scenarios.
Affiliation(s)
- Zhanyuan Chang
- College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China; (C.Z.); (Y.Z.); (C.L.)
|
3
|
Luan X, Ding Z, Liu L, Li W, Gao X. A Symmetrical Siamese Network Framework With Contrastive Learning for Pose-Robust Face Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:5652-5663. [PMID: 37824317 DOI: 10.1109/tip.2023.3322593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
Face recognition has achieved remarkable success owing to the development of deep learning. However, most existing face recognition models perform poorly against pose variations. We argue that this is primarily caused by pose-based long-tailed data, that is, an imbalanced distribution of training samples between profile faces and near-frontal faces. Additionally, self-occlusion and nonlinear warping of facial textures caused by large pose variations also increase the difficulty of learning discriminative features of profile faces. In this study, we propose a novel framework called Symmetrical Siamese Network (SSN), which can simultaneously overcome the limitation of pose-based long-tailed data and learn pose-invariant features. Specifically, two sub-modules are proposed in the SSN, i.e., Feature-Consistence Learning sub-Net (FCLN) and Identity-Consistence Learning sub-Net (ICLN). For FCLN, the inputs are all face images in the training dataset. Inspired by contrastive learning, we simulate pose variations of faces and constrain the model to focus on the consistent areas between the original face image and its corresponding virtual pose face images. For ICLN, only profile images are used as inputs, and we propose to adopt Identity Consistence Loss to minimize the intra-class feature variation across different poses. The collaborative learning of the two sub-modules guarantees that the network parameters are updated with roughly equal probability between near-frontal face images and profile images, so that the pose-based long-tailed problem can be effectively addressed. The proposed SSN achieves results comparable to state-of-the-art methods on several public datasets. In this study, LightCNN is selected as the backbone of SSN, and other popular networks can also be used in our framework for pose-robust face recognition.
|
4
|
Ma Y, He J, Yang D, Zhang T, Wu F. Adaptive Part Mining for Robust Visual Tracking. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:11443-11457. [PMID: 37192025 DOI: 10.1109/tpami.2023.3275034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2023]
Abstract
Visual tracking aims to estimate object state in a video sequence, which is challenging when facing drastic appearance changes. Most existing trackers conduct tracking with divided parts to handle appearance variations. However, these trackers commonly divide target objects into regular patches using a hand-designed splitting scheme, which is too coarse to align object parts well. Besides, a fixed part detector has difficulty partitioning targets with arbitrary categories and deformations. To address the above issues, we propose a novel adaptive part mining tracker (APMT) for robust tracking via a transformer architecture, including an object representation encoder, an adaptive part mining decoder, and an object state estimation decoder. The proposed APMT enjoys several merits. First, in the object representation encoder, object representation is learned by distinguishing the target object from background regions. Second, in the adaptive part mining decoder, we introduce multiple part prototypes to adaptively capture target parts through cross-attention mechanisms for arbitrary categories and deformations. Third, in the object state estimation decoder, we propose two novel strategies to effectively handle appearance variations and distractors. Extensive experimental results demonstrate that our APMT achieves promising results with high FPS. Notably, our tracker ranked first in the VOT-STb2022 challenge.
|
5
|
Shen J, Liu Y, Dong X, Lu X, Khan FS, Hoi S. Distilled Siamese Networks for Visual Tracking. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:8896-8909. [PMID: 34762585 DOI: 10.1109/tpami.2021.3127492] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/13/2023]
Abstract
In recent years, Siamese network based trackers have significantly advanced the state-of-the-art in real-time tracking. Despite their success, Siamese trackers tend to suffer from high memory costs, which restrict their applicability to mobile devices with tight memory budgets. To address this issue, we propose a distilled Siamese tracking framework to learn small, fast and accurate trackers (students), which capture critical knowledge from large Siamese trackers (teachers) by a teacher-students knowledge distillation model. This model is intuitively inspired by the one teacher versus multiple students learning method typically employed in schools. In particular, our model contains a single teacher-student distillation module and a student-student knowledge sharing mechanism. The former is designed using a tracking-specific distillation strategy to transfer knowledge from a teacher to students. The latter is utilized for mutual learning between students to enable in-depth knowledge understanding. Extensive empirical evaluations on several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, the results on five tracking benchmarks show that the proposed distilled trackers achieve compression rates of up to 18× and frame-rates of 265 FPS, while obtaining comparable tracking accuracy compared to base models.
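As a rough illustration of the teacher-student idea mentioned in this abstract, the sketch below computes a generic soft-target distillation loss (temperature-scaled KL divergence between teacher and student outputs). It is not the tracking-specific distillation strategy of the paper, which the abstract does not detail; the function names, the logits, and the temperature value are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,  # hypothetical value, not taken from the paper
) -> float:
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    The T^2 factor is the usual convention for keeping gradient magnitudes
    comparable across temperatures in soft-target distillation.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # made-up response logits from a large tracker
    student = [2.5, 1.5, 0.5]  # made-up logits from a small tracker
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, which is the signal a student model would be trained to minimize alongside its ordinary task loss.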
|
6
|
Wang X, Chen Z, Jiang B, Tang J, Luo B, Tao D. Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-Based Beam Search. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:6239-6254. [PMID: 36166563 DOI: 10.1109/tip.2022.3208437] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/16/2023]
Abstract
To track the target in a video, current visual trackers usually adopt greedy search for target object localization in each frame, that is, the candidate region with the maximum response score is selected as the tracking result of each frame. However, we found that this may not be an optimal choice, especially when encountering challenging tracking scenarios such as heavy occlusion and fast motion. In particular, if a tracker drifts, errors accumulate and further make the response scores estimated by the tracker unreliable in future frames. To address this issue, we propose to maintain multiple tracking trajectories and apply a beam search strategy for visual tracking, so that the trajectory with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement learning based beam search tracking strategy, termed BeamTracking. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions using the beam search algorithm. Accordingly, we formulate tracking as a sample selection problem fulfilled by multiple parallel decision-making processes, each of which aims at picking out one sample as its tracking result in each frame. Each maintained trajectory is associated with an agent to perform the decision-making and determine what actions should be taken to update related information. More specifically, using the classification-based tracker as the baseline, we first adopt a bi-GRU to encode the target feature, proposal feature, and its response score into a unified state representation. The state feature and greedy search result are then fed into the first agent for independent action selection. Afterwards, the output action and state features are fed into the subsequent agent for diverse results prediction. When all the frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets validated the effectiveness of the proposed algorithm.
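The contrast between per-frame greedy selection and keeping several trajectories alive can be sketched with a plain beam search. This is only a toy illustration, not BeamTracking itself: there are no agents, no reinforcement learning, and no bi-GRU. The 1-D candidate positions, the response scores, and the `jump_penalty` motion term are all hypothetical, introduced so that the score of a trajectory depends on more than one frame.

```python
from typing import List, Tuple

# Each frame is a list of candidates; a candidate is (position, response_score).
Frame = List[Tuple[float, float]]


def greedy_track(frames: List[Frame]) -> List[int]:
    """Pick the candidate with the maximum response score in every frame."""
    return [max(range(len(f)), key=lambda k: f[k][1]) for f in frames]


def beam_track(
    frames: List[Frame], beam_width: int = 2, jump_penalty: float = 0.1
) -> Tuple[List[int], float]:
    """Keep the `beam_width` best trajectories and return the top one.

    A trajectory's score is the sum of its response scores minus
    `jump_penalty` times the distance moved between consecutive frames
    (an assumed motion-smoothness term, not taken from the paper).
    """
    beams: List[Tuple[List[int], float]] = [([], 0.0)]
    for t, frame in enumerate(frames):
        expanded = []
        for path, total in beams:
            for k, (pos, score) in enumerate(frame):
                cost = 0.0
                if path:
                    prev_pos = frames[t - 1][path[-1]][0]
                    cost = jump_penalty * abs(pos - prev_pos)
                expanded.append((path + [k], total + score - cost))
        # Prune to the highest-scoring trajectories.
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


if __name__ == "__main__":
    # Made-up data: a distractor at position 10 briefly outscores the target.
    frames = [
        [(0.0, 0.9), (10.0, 0.8)],
        [(1.0, 0.6), (10.0, 0.7)],
        [(2.0, 0.9), (10.0, 0.5)],
    ]
    print(greedy_track(frames))  # jumps to the distractor in the middle frame
    print(beam_track(frames))    # stays on the smoothly moving target
```

On this made-up input, greedy selection switches to the distractor in the second frame because it looks only at the current response scores, while the beam keeps both hypotheses and ends up preferring the trajectory with the higher accumulated score.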
|
7
|
A location-aware siamese network for high-speed visual tracking. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03636-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
8
|
Glass Refraction Distortion Object Detection via Abstract Features. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:5456818. [PMID: 35371226 PMCID: PMC8970925 DOI: 10.1155/2022/5456818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Revised: 02/25/2022] [Accepted: 03/07/2022] [Indexed: 11/23/2022]
Abstract
Glass reflection and refraction lead to missing and distorted object feature data, affecting the accuracy of object detection. To solve these problems, this paper proposes a glass refraction distortion object detection method based on abstract features. The number of parameters of the algorithm is reduced by introducing skip connections and expansion modules with different expansion rates. The abstract feature information of the object is extracted by binary cross-entropy loss. Meanwhile, the abstract feature distance between the object domain and source domain is reduced by a loss function, which improves the accuracy of object detection under glass interference. To verify the effectiveness of the algorithm, the GRI dataset is produced and made public on GitHub. The algorithm is compared with the current state-of-the-art methods Deep Face, VGG Face, TBE-CNN, DA-GAN, PEN-3D, and LMZMPM; its average detection accuracy reaches up to 92.57%, and the number of parameters is only 5.13 M.
|
9
|
Karpagam M, Jeyavathana RB, Chinnappan SK, Kanimozhi KV, Sambath M. A novel face recognition model for fighting against human trafficking in surveillance videos and rescuing victims. Soft comput 2022. [DOI: 10.1007/s00500-022-06931-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
10
|
Abstract
This paper presents a novel iterative learning sliding mode controller (ILSMC) that can be applied to the trajectory tracking of quadrotor unmanned aerial vehicles (UAVs) subject to model uncertainties and external disturbances. Here, the proposed ILSMC is integrated in the outer loop of a controlled system. The control development, conducted in the discrete-time domain, does not require a priori information of the disturbance bound as with conventional SMC techniques. It only involves an equivalent control term for the desired dynamics in the closed loop and an iterative learning term to drive the system state toward the sliding surface to maintain robust performance. By learning from previous iterations, the ILSMC can yield very accurate tracking performance when a sliding mode is induced without control chattering. The design is then applied to the attitude control of a 3DR Solo UAV with a built-in PID controller. The simulation results and experimental validation with real-time data demonstrate the advantages of the proposed control scheme over existing techniques.
|
11
|
|
12
|
Hybrid Deep Learning Model Based Indoor Positioning Using Wi-Fi RSSI Heat Maps for Autonomous Applications. ELECTRONICS 2020. [DOI: 10.3390/electronics10010002] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/16/2022]
Abstract
Positioning using Wi-Fi received signal strength indication (RSSI) signals is an effective method for identifying the user positions in an indoor scenario. Wi-Fi RSSI signals in an autonomous system can be easily used for vehicle tracking in underground parking. In Wi-Fi RSSI signal based positioning, the positioning system estimates the signal strength of the access points (APs) to the receiver and identifies the user’s indoor positions. The existing Wi-Fi RSSI based positioning systems use raw RSSI signals obtained from APs and estimate the user positions. These raw RSSI signals can easily fluctuate and be interfered with by the indoor channel conditions. This signal interference in the indoor channel condition reduces localization performance of these existing Wi-Fi RSSI signal based positioning systems. To enhance their performance and reduce the positioning error, we propose a hybrid deep learning model (HDLM) based indoor positioning system. The proposed HDLM based positioning system uses RSSI heat maps instead of raw RSSI signals from APs. This results in better localization performance for Wi-Fi RSSI signal based positioning systems. When compared to the existing Wi-Fi RSSI based positioning technologies such as fingerprint, trilateration, and Wi-Fi fusion approaches, the proposed approach achieves reasonably better positioning results for indoor localization. The experiment results show that a combination of convolutional neural network and long short-term memory network (CNN-LSTM) used in the proposed HDLM outperforms other deep learning models and gives a smaller localization error than conventional Wi-Fi RSSI signal based localization approaches. From the experiment result analysis, the proposed system can be easily implemented for autonomous applications.
|