1
Li Z, Ma J. Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; PP:154-169. [PMID: 40030484 DOI: 10.1109/tip.2024.3512352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Accurately matching local features between a pair of images corresponding to the same 3D scene is a challenging computer vision task. Previous studies typically utilize attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images for visual and geometric information reasoning. However, in the context of local feature matching, a significant number of keypoints are non-repeatable due to factors like occlusion and failure of the detector, and thus irrelevant for message passing. The connectivity with non-repeatable keypoints not only introduces redundancy, resulting in limited efficiency (quadratic computational complexity w.r.t. the keypoint number), but also interferes with the representation aggregation process, leading to limited accuracy. Aiming for the best of both worlds in accuracy and efficiency, we propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide compact and meaningful message passing. More specifically, our Bilateral Context-Aware Sampling (BCAS) Module first dynamically samples two small sets of well-distributed keypoints with high matchability scores from the image pair. Then, our Matchable Keypoint-Assisted Context Aggregation (MKACA) Module regards sampled informative keypoints as message bottlenecks and thus constrains each keypoint only to retrieve favorable contextual information from intra- and inter-matchable keypoints, evading the interference of irrelevant and redundant connectivity with non-repeatable ones. Furthermore, considering the potential noise in initial keypoints and sampled matchable ones, the MKACA module adopts a matchability-guided attentional aggregation operation for purer data-dependent context propagation. By these means, MaKeGNN outperforms state-of-the-art methods on multiple highly challenging benchmarks, while significantly reducing computational and memory complexity compared to typical attentional GNNs.
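The bottleneck idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bottleneck_attention`, the plain top-k sampling (the abstract also asks for well-distributed keypoints) and the "reweight by score, then renormalise" form of matchability guidance are all assumptions made for illustration. The point it shows is that each of N keypoints attends to only k sampled keypoints, so the cost scales with N·k rather than N².

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bottleneck_attention(features, matchability, k):
    """Let every keypoint attend only to the k most matchable keypoints.

    features:     list of N descriptor vectors (lists of floats)
    matchability: list of N scores (assumed non-negative)
    Returns (aggregated features, indices of the sampled keypoints).
    """
    dim = len(features[0])
    # Assumption: plain top-k by score stands in for the sampling module.
    sampled = sorted(range(len(features)),
                     key=lambda i: matchability[i], reverse=True)[:k]
    out = []
    for q in features:
        # Scaled dot-product logits against the k sampled keypoints only.
        logits = [sum(a * b for a, b in zip(q, features[j])) / math.sqrt(dim)
                  for j in sampled]
        weights = softmax(logits)
        # Assumed guidance: scale attention by matchability, then renormalise.
        weights = [w * matchability[j] for w, j in zip(weights, sampled)]
        norm = sum(weights) or 1.0
        weights = [w / norm for w in weights]
        out.append([sum(w * features[j][d] for w, j in zip(weights, sampled))
                    for d in range(dim)])
    return out, sampled
```

With four keypoints and `k=2`, only the two highest-scoring keypoints act as message sources, and every output vector is a convex combination of their descriptors.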
2
Xiao G, Yu J, Ma J, Fan DP, Shao L. Latent Semantic Consensus for Deterministic Geometric Model Fitting. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:6139-6153. [PMID: 38478435 DOI: 10.1109/tpami.2024.3376731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Estimating reliable geometric model parameters from data with severe outliers is a fundamental and important task in computer vision. This paper attempts to sample high-quality subsets and select model instances to estimate parameters in multi-structural data. To address this, we propose an effective method called Latent Semantic Consensus (LSC). The principle of LSC is to preserve the latent semantic consensus in both data points and model hypotheses. Specifically, LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses, respectively. Then, LSC explores the distributions of points in the two latent semantic spaces, to remove outliers, generate high-quality model hypotheses, and effectively estimate model instances. Finally, LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting, due to its deterministic fitting nature and efficiency. Compared with several state-of-the-art model fitting methods, LSC is significantly superior in both accuracy and speed on synthetic data and real images.
3
Lin S, Chen X, Xiao G, Wang H, Huang F, Weng J. Multi-Stage Network With Geometric Semantic Attention for Two-View Correspondence Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:3031-3046. [PMID: 38656841 DOI: 10.1109/tip.2024.3391002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
The removal of outliers is crucial for establishing correspondence between two images. However, when the proportion of outliers reaches nearly 90%, the task becomes highly challenging. Existing methods face limitations in effectively utilizing geometric transformation consistency (GTC) information and incorporating geometric semantic neighboring information. To address these challenges, we propose a Multi-Stage Geometric Semantic Attention (MSGSA) network. The MSGSA network consists of three key modules: the multi-branch (MB) module, the GTC module, and the geometric semantic attention (GSA) module. The MB module, structured with a multi-branch design, facilitates diverse and robust spatial transformations. The GTC module captures transformation consistency information from the preceding stage. The GSA module categorizes input based on the prior stage's output, enabling efficient extraction of geometric semantic information through a graph-based representation and inter-category information interaction using Transformer. Extensive experiments on the YFCC100M and SUN3D datasets demonstrate that MSGSA outperforms current state-of-the-art methods in outlier removal and camera pose estimation, particularly in scenarios with a high prevalence of outliers. Source code is available at https://github.com/shuyuanlin.
4
Zhang Z, Song H, Fan J, Fu T, Li Q, Ai D, Xiao D, Yang J. Dual-correlate optimized coarse-fine strategy for monocular laparoscopic videos feature matching via multilevel sequential coupling feature descriptor. Comput Biol Med 2024; 169:107890. [PMID: 38168646 DOI: 10.1016/j.compbiomed.2023.107890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 12/13/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024]
Abstract
Feature matching of monocular laparoscopic videos is crucial for visualization enhancement in computer-assisted surgery, and the keys to conducting high-quality matches are accurate homography estimation, relative pose estimation, as well as sufficient matches and fast calculation. However, limited by various monocular laparoscopic imaging characteristics such as highlight noise, motion blur, texture interference and illumination variation, most existing feature matching methods struggle to produce high-quality matches efficiently and in sufficient number. To overcome these limitations, this paper presents a novel sequential coupling feature descriptor to extract and express multilevel feature maps efficiently, and a dual-correlate optimized coarse-fine strategy to establish dense matches at the coarse level and adjust pixel-wise matches at the fine level. Firstly, a novel sequential coupling swin transformer layer is designed in the feature descriptor to learn and extract multilevel feature representations richly without increasing complexity. Then, a dual-correlate optimized coarse-fine strategy is proposed to match coarse feature sequences under low resolution, and the correlated fine feature sequences are optimized to refine pixel-wise matches based on coarse matching priors. Finally, the sequential coupling feature descriptor and dual-correlate optimization are merged into the Sequential Coupling Dual-Correlate Network (SeCo DC-Net) to produce high-quality matches. The evaluation is conducted on two public laparoscopic datasets, Scared and EndoSLAM, and the experimental results show the proposed network outperforms state-of-the-art methods in homography estimation, relative pose estimation, reprojection error, number of matching pairs and inference runtime. The source code is publicly available at https://github.com/Iheckzza/FeatureMatching.
Affiliation(s)
- Ziang Zhang
  - The School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
  - The School of Computer Science & Technology, Beijing Institute of Technology, Beijing, 100081, China
- Jingfan Fan
  - The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Tianyu Fu
  - The School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Qiang Li
  - The School of Computer Science & Technology, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
  - The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Deqaing Xiao
  - The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
  - The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
5
Ding J, Zhang J, Ye L, Wu C. Kalman-Based Scene Flow Estimation for Point Cloud Densification and 3D Object Detection in Dynamic Scenes. SENSORS (BASEL, SWITZERLAND) 2024; 24:916. [PMID: 38339632 PMCID: PMC10856919 DOI: 10.3390/s24030916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/23/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024]
Abstract
Point cloud densification is essential for understanding the 3D environment. It provides crucial structural and semantic information for downstream tasks such as 3D object detection and tracking. However, existing registration-based methods struggle with dynamic targets due to the incompleteness and deformation of point clouds. To address this challenge, we propose a Kalman-based scene flow estimation method for point cloud densification and 3D object detection in dynamic scenes. Our method effectively tackles the issue of localization errors in scene flow estimation and enhances the accuracy and precision of shape completion. Specifically, we introduce a Kalman filter to correct the dynamic target's position while estimating long sequence scene flow. This approach helps eliminate the cumulative localization error during the scene flow estimation process. Extended experiments on the KITTI 3D tracking dataset demonstrate that our method significantly improves the performance of LiDAR-only detectors, achieving superior results compared to the baselines.
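As a rough illustration of how a Kalman filter can correct a moving target's position over a long sequence, here is a minimal one-dimensional constant-velocity sketch. It is not the method from the paper: the 1-D state, the function name `kalman_step`, and the noise values `q` and `r` are assumptions chosen only to show the predict/update cycle that keeps localization error from accumulating.

```python
def kalman_step(state, cov, z, dt=1.0, q=1e-4, r=0.25):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    state: (position, velocity)
    cov:   2x2 covariance as ((p00, p01), (p10, p11))
    z:     measured position for this frame
    q, r:  assumed process and measurement noise variances
    """
    p, v = state
    (p00, p01), (p10, p11) = cov

    # Predict: x' = F x, P' = F P F^T + Q with F = [[1, dt], [0, 1]].
    p_pred = p + v * dt
    v_pred = v
    c00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    c01 = p01 + dt * p11
    c10 = p10 + dt * p11
    c11 = p11 + q

    # Update: only the position is observed, H = [1, 0].
    s = c00 + r                  # innovation variance
    k0, k1 = c00 / s, c10 / s    # Kalman gain
    y = z - p_pred               # innovation (measurement residual)
    new_state = (p_pred + k0 * y, v_pred + k1 * y)
    new_cov = (((1 - k0) * c00, (1 - k0) * c01),
               (c10 - k1 * c00, c11 - k1 * c01))
    return new_state, new_cov
```

Feeding the filter one position measurement per frame and carrying `state` and `cov` forward yields a smoothed position and a velocity estimate; in a pipeline like the one described, the corrected position would replace the raw per-frame estimate.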
Affiliation(s)
- Jin Zhang
  - School of Rail Transportation, Soochow University, Suzhou 215500, China; (J.D.); (L.Y.)
- Cheng Wu
  - School of Rail Transportation, Soochow University, Suzhou 215500, China; (J.D.); (L.Y.)
6
Sun K, Pang X, Zheng M, Nie X, Li X, Zhou H, Yin Y. Heterogeneous context interaction network for vehicle re-identification. Neural Netw 2024; 169:293-306. [PMID: 37918272 DOI: 10.1016/j.neunet.2023.10.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/12/2023] [Accepted: 10/22/2023] [Indexed: 11/04/2023]
Abstract
Capturing global and subtle discriminative information using attention mechanisms is essential to address the challenge of high inter-class similarity in the vehicle re-identification (Re-ID) task. Mixing self-information of nodes or modeling context based on pairwise dependencies between nodes are the core ideas of current advanced attention mechanisms. This paper aims to explore how to utilize both dependency context and self-context efficiently so that attention learns more effectively. We propose a heterogeneous context interaction (HCI) attention mechanism that infers the weights of nodes from the interactions of global dependency contexts and local self-contexts to enhance the effect of attention learning. To reduce computational complexity, global dependency contexts are modeled by aggregating number-compressed pairwise dependencies, and the interactions of heterogeneous contexts are restricted to a certain range. Based on this mechanism, we propose a heterogeneous context interaction network (HCI-Net), which uses a channel heterogeneous context interaction module (CHCI) and a spatial heterogeneous context interaction module (SHCI), and introduces a rigid partitioning strategy to extract important global and fine-grained features. In addition, we design a non-similarity constraint (NSC) that forces the HCI-Net to learn diverse subtle discriminative information. Experimental results on two large datasets, VeRi-776 and VehicleID, show that our proposed HCI-Net achieves state-of-the-art performance. In particular, the mean average precision (mAP) reaches 83.8% on the VeRi-776 dataset.
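For readers unfamiliar with the mAP figure quoted above, the following sketch shows the standard retrieval definition of the metric. It is a generic illustration, not the benchmark's evaluation script: the function names are hypothetical, and Re-ID protocols usually apply extra filtering (for example, excluding gallery images from the query's own camera) that is omitted here.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans, one per gallery item in ranked
    order, True where the item shows the same vehicle as the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precisions."""
    return sum(average_precision(q) for q in queries) / len(queries)
```

For a ranking `[True, False, True]` the precisions at the two hits are 1/1 and 2/3, so the average precision is 5/6; mAP averages this value over all queries.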
Affiliation(s)
- Ke Sun
  - School of Information Science and Electrical Engineering, Shandong Jiaotong University, No. 5001, Haitang Road, Changqing District, Jinan, 250357, Shan Dong, China
- Xiyu Pang
  - School of Information Science and Electrical Engineering, Shandong Jiaotong University, No. 5001, Haitang Road, Changqing District, Jinan, 250357, Shan Dong, China; School of Software, Shandong University, No. 1500, Shunhua Road, High-tech Industrial Development Zone, Jinan, 250101, Shan Dong, China
- Meifeng Zheng
  - School of Information Science and Electrical Engineering, Shandong Jiaotong University, No. 5001, Haitang Road, Changqing District, Jinan, 250357, Shan Dong, China
- Xiushan Nie
  - School of Computer Science and Technology, Shandong Jianzhu University, No. 1000, Fengming Road, Lingang Development Zone, Jinan, 250101, Shan Dong, China
- Xi Li
  - School of Information Science and Electrical Engineering, Shandong Jiaotong University, No. 5001, Haitang Road, Changqing District, Jinan, 250357, Shan Dong, China
- Houren Zhou
  - School of Information Science and Electrical Engineering, Shandong Jiaotong University, No. 5001, Haitang Road, Changqing District, Jinan, 250357, Shan Dong, China
- Yilong Yin
  - School of Software, Shandong University, No. 1500, Shunhua Road, High-tech Industrial Development Zone, Jinan, 250101, Shan Dong, China
7
Song S, Li Y, Jia Z, Shi F. Salient Object Detection Based on Optimization of Feature Computation by Neutrosophic Set Theory. SENSORS (BASEL, SWITZERLAND) 2023; 23:8348. [PMID: 37896445 PMCID: PMC10610941 DOI: 10.3390/s23208348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 09/21/2023] [Accepted: 09/25/2023] [Indexed: 10/29/2023]
Abstract
In recent saliency detection research, algorithms use either too many or too few image features, and the processing of saliency map details is unsatisfactory, which significantly degrades the salient object detection result. To overcome these deficiencies and achieve better object detection results, we propose a salient object detection method based on feature optimization by neutrosophic set (NS) theory. First, prior object knowledge is built using foreground and background models, which include pixel-wise and super-pixel cues. Simultaneously, the feature maps are selected and extracted for feature computation, allowing the object and background features of the image to be separated as much as possible. Second, the salient object is obtained by fusing the features decomposed by the low-rank matrix recovery model with the object prior knowledge. Finally, we present a novel mathematical description of neutrosophic set theory for salient object detection, which reduces the uncertainty of the obtained saliency map and thereby improves the detection results. Extensive experiments on five public datasets demonstrate that the results are competitive and superior to previous state-of-the-art methods.
Affiliation(s)
- Sensen Song
  - Key Laboratory of Signal Detection and Processing, College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China; (S.S.); (Y.L.)
  - College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China
- Yue Li
  - Key Laboratory of Signal Detection and Processing, College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China; (S.S.); (Y.L.)
- Zhenhong Jia
  - Key Laboratory of Signal Detection and Processing, College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China; (S.S.); (Y.L.)
- Fei Shi
  - Key Laboratory of Signal Detection and Processing, College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China; (S.S.); (Y.L.)