1
Cao X, Zheng Y, Yao Y, Qin H, Cao X, Guo S. TOPIC: A Parallel Association Paradigm for Multi-Object Tracking under Complex Motions and Diverse Scenes. IEEE Transactions on Image Processing 2025; PP:743-758. [PMID: 40031160] [DOI: 10.1109/tip.2025.3526066]
Abstract
Video data and algorithms have been driving advances in multi-object tracking (MOT). While existing MOT datasets focus on occlusion and appearance similarity, complex motion patterns are widespread yet overlooked. To address this issue, we introduce a new dataset called BEE24 to highlight complex motions. Identity association algorithms have long been the focus of MOT research. Existing trackers can be categorized into two association paradigms: the single-feature paradigm (based on either the motion or the appearance feature) and the serial paradigm (one feature serves as secondary while the other is primary). However, these paradigms are incapable of fully utilizing different features. In this paper, we propose a parallel paradigm and present the Two rOund Parallel matchIng meChanism (TOPIC) to implement it. TOPIC leverages both motion and appearance features and adaptively selects the preferable one as the assignment metric based on the motion level. Moreover, we provide an Attention-based Appearance Reconstruction Module (AARM) to reconstruct appearance feature embeddings, thus enhancing the representation of appearance features. Comprehensive experiments show that our approach achieves state-of-the-art performance on four public datasets and BEE24. Moreover, BEE24 challenges existing trackers to track multiple similar-appearing small objects with complex motions over long periods, which is critical in real-world applications such as beekeeping and drone swarm surveillance. Notably, our proposed parallel paradigm surpasses existing association paradigms by a large margin, e.g., reducing false negatives by 6% to 81% compared to the single-feature association paradigm. The dataset and association paradigm introduced in this work offer a fresh perspective for advancing the MOT field. The source code and dataset are available at https://github.com/holmescao/TOPICTrack.
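The two-round parallel idea can be illustrated with a minimal sketch (hypothetical helper names and a simplified conflict rule; the authors' actual implementation is in the linked repository): solve the assignment on the motion cost and on the appearance cost independently, then let each track keep the match proposed by the metric its estimated motion level favours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def parallel_match(motion_cost, app_cost, motion_level, tau=0.5):
    """Toy two-round parallel association (not the full TOPIC algorithm).

    Round 1: solve the assignment problem on the motion-cost and the
    appearance-cost matrices independently.
    Round 2: for each track, keep the match from the metric its motion
    level favours (high motion -> appearance, low motion -> motion),
    dropping detections that would otherwise be assigned twice.
    """
    m_rows, m_cols = linear_sum_assignment(motion_cost)
    a_rows, a_cols = linear_sum_assignment(app_cost)
    motion_map = dict(zip(m_rows, m_cols))
    app_map = dict(zip(a_rows, a_cols))

    matches, used = {}, set()
    for t in range(motion_cost.shape[0]):
        preferred = app_map if motion_level[t] > tau else motion_map
        d = preferred.get(t)
        if d is not None and d not in used:
            matches[t] = d
            used.add(d)
    return matches  # {track_index: detection_index}
```

Unlike a serial paradigm, which gates the secondary feature on the residuals of the primary one, both rounds here operate on the full cost matrices in parallel.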
2
Dai M, Zheng E, Feng Z, Qi L, Zhuang J, Yang W. Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments. IEEE Transactions on Image Processing 2024; 33:493-508. [PMID: 38157460] [DOI: 10.1109/tip.2023.3346279]
Abstract
Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning. However, due to limited satellite coverage or communication disruptions, UAVs may lose the signals needed for positioning. In such situations, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs. However, most existing datasets are developed for the geo-localization of objects captured by UAVs rather than for UAV self-positioning. Furthermore, existing UAV datasets apply discrete sampling to synthetic data, such as Google Maps, neglecting both dense sampling and the uncertainties commonly encountered in practical scenarios. To address these issues, this paper presents DenseUAV, the first publicly available dataset tailored to the UAV self-positioning task. DenseUAV adopts dense sampling of UAV images obtained in low-altitude urban areas. In total, over 27K UAV- and satellite-view images of 14 university campuses are collected and annotated. In terms of methodology, we first verify the superiority of Transformers over CNNs for the proposed task. We then incorporate metric learning into representation learning to enhance the model's discriminative capacity and to reduce the modality discrepancy. In addition, to facilitate joint learning from both the satellite and UAV views, we introduce a mutually supervised learning approach. Finally, we enhance the Recall@K metric and introduce a new measurement, SDM@K, to evaluate both retrieval and localization performance for the proposed task. The proposed baseline method achieves a Recall@1 score of 83.01% and an SDM@1 score of 86.50% on DenseUAV. The dataset and code are publicly available at https://github.com/Dmmm1997/DenseUAV.
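As a reference point for the retrieval side of this evaluation, here is a minimal Recall@K computation over L2-normalised embeddings (the SDM@K metric is defined in the paper itself and is not reproduced here):

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, query_ids, gallery_ids, k=1):
    """Recall@K for embedding-based retrieval: a query counts as a hit if
    any of its k nearest gallery items (by cosine similarity) shares its ID."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                              # (num_queries, num_gallery)
    topk = np.argsort(-sim, axis=1)[:, :k]     # indices of k best matches
    hits = [(gallery_ids[idx] == qid).any() for idx, qid in zip(topk, query_ids)]
    return float(np.mean(hits))
```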
3
Zheng A, Zhang C, Li C, Tang J, Tan C. Multi-Query Vehicle Re-Identification: Viewpoint-Conditioned Network, Unified Dataset and New Metric. IEEE Transactions on Image Processing 2023; 32:5948-5960. [PMID: 37889811] [DOI: 10.1109/tip.2023.3326691]
Abstract
Existing vehicle re-identification methods mainly rely on a single query, which carries limited information for vehicle representation and thus significantly hinders the performance of vehicle Re-ID in complicated surveillance networks. In this paper, we propose a more realistic and easily accessible task, called multi-query vehicle Re-ID, which leverages multiple queries to overcome the viewpoint limitation of a single one. Based on this task, we make three major contributions. First, we design a novel viewpoint-conditioned network (VCNet), which adaptively combines the complementary information from different vehicle viewpoints, for multi-query vehicle Re-ID. Moreover, to deal with the problem of missing vehicle viewpoints, we propose a cross-view feature recovery module that recovers the features of the missing viewpoints by learning the correlation between the features of available and missing viewpoints. Second, we create a unified benchmark dataset, captured by 6142 cameras in a real-life transportation surveillance system, with comprehensive viewpoints and a large number of cross-scene instances of each vehicle, for multi-query vehicle Re-ID evaluation. Finally, we design a new evaluation metric, called mean cross-scene precision (mCSP), which measures cross-scene recognition ability by suppressing positive samples with similar viewpoints from the same camera. Comprehensive experiments validate the superiority of the proposed method against other methods, as well as the effectiveness of the designed metric in the evaluation of multi-query vehicle Re-ID. The code and dataset are available at: https://github.com/zhangchaobin001/VCNet.
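For contrast with VCNet's viewpoint-conditioned fusion, a naive multi-query baseline is easy to state (this is not the VCNet method, just the simplest way to exploit several queries at once): average the L2-normalised query embeddings into one descriptor and rank the gallery by cosine similarity.

```python
import numpy as np

def multi_query_rank(query_embs, gallery_embs):
    """Naive multi-query retrieval baseline: fuse all query embeddings of
    one vehicle by mean-pooling, then rank the gallery against the fused
    descriptor. query_embs: (Q, d); gallery_embs: (G, d)."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    fused = q.mean(axis=0)
    fused /= np.linalg.norm(fused)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return np.argsort(-(g @ fused))   # gallery indices, best match first
```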
4
Li Q, Hu S, Shimasaki K, Ishii I. HFR-Video-Based Stereo Correspondence Using High Synchronous Short-Term Velocities. Sensors (Basel) 2023; 23:4285. [PMID: 37177489] [PMCID: PMC10181470] [DOI: 10.3390/s23094285]
Abstract
This study focuses on solving the correspondence problem for multiple moving objects with similar appearances in stereoscopic videos. Specifically, we address the multi-camera correspondence problem by taking into account pixel-level and feature-level stereo correspondence as well as object-level cross-camera multiple-object correspondence. Most correspondence algorithms rely on the texture and color information of the stereo images, making it challenging to distinguish between similar-looking objects, such as ballet dancers or employees wearing similar outfits, or farm animals such as chickens, ducks, and cows. However, by leveraging the low latency and high synchronization of high-speed cameras, we can perceive the phase and frequency differences between the movements of similar-looking objects. In this study, we propose using the short-term velocities (STVs) of objects as motion features and determining the correspondence of multiple objects by calculating the similarity of their STVs. To validate our approach, we conducted stereo correspondence experiments using markers attached to a metronome and natural hand movements to simulate simple and complex motion scenes. The experimental results demonstrate that our method achieves good performance in stereo correspondence.
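The STV idea reduces to a small amount of code. The following is a toy sketch under the assumption that per-camera tracks are already available as pixel-coordinate sequences over a short synchronised window; the matching cost is one minus the cosine similarity of the flattened velocity sequences.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def stv(track_positions):
    """Short-term velocities: frame-to-frame displacements over a short
    window. track_positions: (W, 2) pixel coordinates across W frames."""
    return np.diff(track_positions, axis=0).ravel()   # shape (2*(W-1),)

def match_by_stv(tracks_cam1, tracks_cam2):
    """Match objects across two synchronised cameras by STV similarity:
    cosine similarity between velocity sequences, solved as an assignment
    problem. A toy stand-in for the paper's correspondence step."""
    v1 = np.stack([stv(t) for t in tracks_cam1])
    v2 = np.stack([stv(t) for t in tracks_cam2])
    v1 /= np.linalg.norm(v1, axis=1, keepdims=True)
    v2 /= np.linalg.norm(v2, axis=1, keepdims=True)
    cost = 1.0 - v1 @ v2.T              # low cost = matching motion phase
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))        # [(idx_cam1, idx_cam2), ...]
```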
Affiliation(s)
- Qing Li, Smart Robotics Laboratory, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan
- Shaopeng Hu, Smart Robotics Laboratory, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan
- Kohei Shimasaki, Smart Robotics Laboratory, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan
- Idaku Ishii, Smart Robotics Laboratory, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan
5
Seo H, Lee K, Lee K. Investigating the Improvement of Autonomous Vehicle Performance through the Integration of Multi-Sensor Dynamic Mapping Techniques. Sensors (Basel) 2023; 23:2369. [PMID: 36904572] [PMCID: PMC10007208] [DOI: 10.3390/s23052369]
Abstract
The emergence of autonomous vehicles marks a shift in mobility. Conventional vehicles have been designed to prioritize the safety of drivers and passengers and to increase fuel efficiency, while autonomous vehicles are developing as convergence technologies with a focus on more than just transportation. Given the potential for autonomous vehicles to serve as office or leisure spaces, the accuracy and stability of their driving technology are of utmost importance. However, commercializing autonomous vehicles has been challenging due to the limitations of current technology. This paper proposes a method for building a precision map for multi-sensor-based autonomous driving to improve the accuracy and stability of autonomous vehicle technology. The proposed method leverages dynamic high-definition maps to enhance the recognition rates of objects in the vicinity of the vehicle and the recognition of the autonomous driving path, utilizing multiple sensors such as cameras, LiDAR, and radar.
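The abstract does not specify the fusion pipeline at the code level, so the following is only a generic illustration of accumulating multi-sensor detections into a dynamic map layer; the sensor weights and grid scheme are assumptions, not the paper's design.

```python
from collections import defaultdict

# Hypothetical per-sensor confidence weights for a toy dynamic-map layer.
SENSOR_WEIGHT = {"camera": 0.4, "lidar": 0.4, "radar": 0.2}

def update_dynamic_layer(detections, cell_size=0.5):
    """Fuse (sensor, x, y, confidence) detections into grid cells of a
    dynamic map layer by weighted confidence accumulation."""
    layer = defaultdict(float)
    for sensor, x, y, conf in detections:
        cell = (int(x // cell_size), int(y // cell_size))
        layer[cell] += SENSOR_WEIGHT[sensor] * conf
    return dict(layer)   # {grid cell: fused occupancy score}
```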
Affiliation(s)
- Hyoduck Seo, College of Electronics & Information, Kyunghee University, 1732 Deogyeong-daero, Giheung-gu, Yongin-si 17104, Gyeonggi-do, Republic of Korea
- Kyesan Lee, College of Electronics & Information, Kyunghee University, 1732 Deogyeong-daero, Giheung-gu, Yongin-si 17104, Gyeonggi-do, Republic of Korea
- Kyujin Lee, Department of Electronic Engineering, Semyung University, 65 Semyung-ro, Jecheon-si 27136, Chungcheongbuk-do, Republic of Korea
6
Jang J, Seon M, Choi J. Lightweight Indoor Multi-Object Tracking in Overlapping FOV Multi-Camera Environments. Sensors (Basel) 2022; 22:5267. [PMID: 35890945] [PMCID: PMC9325266] [DOI: 10.3390/s22145267]
Abstract
Multi-Target Multi-Camera Tracking (MTMCT), which aims to track multiple targets within a multi-camera network, has recently attracted considerable attention due to its wide range of applications. The main challenge of MTMCT is to match local tracklets (i.e., sub-trajectories) obtained by different cameras and to combine them into global trajectories across the multi-camera network. This paper addresses the cross-camera tracklet matching problem in scenarios with partially overlapping fields of view (FOVs), such as indoor multi-camera environments. We present a new lightweight matching method for the MTMCT task that employs similarity analysis of location features. The proposed approach comprises two steps: (i) extracting the motion information of targets based on a ground-projection method and (ii) matching the tracklets using similarity analysis based on the Dynamic Time Warping (DTW) algorithm. We use a Kanade-Lucas-Tomasi (KLT) algorithm-based frame-skipping method to reduce the computational overhead of object detection and to produce smooth estimates of the targets' local tracklets. To improve matching accuracy, we also investigate three different location features to determine the most appropriate one for similarity analysis. The effectiveness of the proposed method has been evaluated through real experiments, demonstrating its ability to accurately match local tracklets.
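The DTW similarity at the heart of step (ii) can be sketched directly; the ground-projection step (i) and the KLT frame skipping are omitted here, and the inputs are assumed to be 2D ground-plane location sequences.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two location sequences
    a: (n, d) and b: (m, d) -- the kind of similarity measure used to
    match ground-projected tracklets across overlapping cameras."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # Euclidean step cost
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return D[n, m]
```

Two tracklets from different cameras would then be merged into one global trajectory when their DTW distance falls below a chosen threshold.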
Affiliation(s)
- Jaehyuk Choi, School of Computing, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Korea; (J.J.); (M.S.)
7
Ren J, Guan F, Wang T, Qian B, Luo C, Cai G, Kan C, Li X. High Precision Calibration Algorithm for Binocular Stereo Vision Camera using Deep Reinforcement Learning. Computational Intelligence and Neuroscience 2022; 2022:6596868. [PMID: 35401726] [PMCID: PMC8989564] [DOI: 10.1155/2022/6596868]
Abstract
Camera calibration is a fundamental problem in computer vision research. To address the issue of insufficient calibration precision, a high-precision calibration algorithm for binocular stereo vision cameras using deep reinforcement learning is proposed. First, a binocular stereo camera model is established; camera calibration is divided into internal and external parameter calibration. Second, the internal parameter calibration is completed by solving for the anti-hidden point of the camera's optical center and the distortion values of the camera plane. Based on the internal parameters, a deep network is used to fit the value function: a target network is established to adjust the parameters of the value function, and the convergence of the value function is computed to optimize the reinforcement learning. The deep reinforcement learning fitting structure is then built, the camera data are fed in, and the external parameter calibration is completed through continuous updating until convergence. Finally, the high-precision calibration of the binocular stereo vision camera is achieved. The results show that the calibration error of the proposed algorithm on checkerboard calibration boards of two different sizes is only 0.36% and 0.35%, respectively; the calibration accuracy is high, the value function converges quickly, the parameter computation is accurate, the overall runtime is short, and the calibration results are stable.
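The deep-reinforcement-learning stage is specific to the paper, but the checkerboard-based intrinsic calibration it builds on is standard. Below is a minimal OpenCV sketch of that baseline (placeholder image paths; the RL refinement is not reproduced); its RMS reprojection error plays the role of the calibration error reported above.

```python
import cv2
import numpy as np

PATTERN = (9, 6)   # inner-corner count of the checkerboard (assumed size)
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

images = ["calib_01.png", "calib_02.png"]   # placeholder calibration shots
obj_points, img_points = [], []
for fname in images:
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:                                # keep views where the board is detected
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsic calibration: camera matrix K and distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```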
Affiliation(s)
- Jie Ren, College of Physical Education and Training, Harbin Sport University, Harbin 150008, China
- Fuyu Guan, College of Physical Education and Training, Harbin Sport University, Harbin 150008, China
- Tingting Wang, Party and Government Office, Harbin Sport University, Harbin 150008, China
- Baoshan Qian, Winter Olympic College, Harbin Sport University, Harbin 150008, China
- Chunlin Luo, College of Physical Education and Training, Harbin Sport University, Harbin 150008, China
- Guoliang Cai, College of Sports Human Science, Harbin Sport University, Harbin 150008, China
- Ce Kan, College of Physical Education and Training, Harbin Sport University, Harbin 150008, China
- Xiaofeng Li, Department of Information Engineering, Heilongjiang International University, Harbin 150025, China
8
Gao Z, Huang Z. Global-View Re-identification Tracking with Transformer. Artificial Intelligence 2022. [DOI: 10.1007/978-3-031-20497-5_51]