1
Yang S, Xiao D, Geng H, Ai D, Fan J, Fu T, Song H, Duan F, Yang J. Real-Time 3D Instrument Tip Tracking Using 2D X-Ray Fluoroscopy With Vessel Deformation Correction Under Free Breathing. IEEE Trans Biomed Eng 2025; 72:1422-1436. [PMID: 40117137 DOI: 10.1109/tbme.2024.3508840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/23/2025]
Abstract
OBJECTIVE Accurate localization of the instrument tip within the hepatic vein is crucial for the success of transjugular intrahepatic portosystemic shunt (TIPS) procedures. Real-time tracking of the instrument tip in X-ray images is strongly affected by vessel deformation due to the patient's pose variation, respiratory motion, and puncture manipulation, frequently resulting in failed punctures. METHOD We propose a novel framework called deformable instrument tip tracking (DITT) to obtain real-time tip positioning within the 3D deformable vasculature. First, we introduce a pose alignment module to improve the rigid matching between the preoperative vessel centerline and the intraoperative instrument centerline, in which accurate matching of 3D/2D centerline features is implemented with an adaptive point sampling strategy. Second, a respiration compensation module using monoplane X-ray image sequences is constructed to provide a motion prior for predicting intraoperative liver movement. Third, a deformation correction module is proposed to rectify vessel deformation during the procedure, in which manifold regularization and a maximum likelihood-based acceleration are introduced to achieve accurate and fast deformation learning. RESULTS Experimental results on simulated and clinical datasets show average tracking errors of 1.59 ± 0.57 mm and 1.67 ± 0.54 mm, respectively. CONCLUSION Our framework can track the tip in the 3D vessel tree and dynamically overlay the branch roadmap onto X-ray images to provide real-time guidance. SIGNIFICANCE Accurate and fast (43 ms per frame) tip tracking with the proposed framework has good potential to improve the outcomes of TIPS treatment and to minimize the use of contrast agent.
2
Zhang Z, Chen S, Wang Z, Yang J. PlaneSeg: Building a Plug-In for Boosting Planar Region Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11486-11500. [PMID: 37027268 DOI: 10.1109/tnnls.2023.3262544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Existing methods for planar region segmentation suffer from vague boundaries and fail to detect small-sized regions. To address these issues, this study presents an end-to-end framework, named PlaneSeg, which can be easily integrated into various plane segmentation models. Specifically, PlaneSeg contains three modules, namely the edge feature extraction module, the multiscale module, and the resolution-adaptation module. First, the edge feature extraction module produces edge-aware feature maps for finer segmentation boundaries; the learned edge information acts as a constraint to mitigate inaccurate boundaries. Second, the multiscale module combines feature maps from different layers to harvest spatial and semantic information about planar objects. This diversity of object information helps recognize small-sized objects and produce more accurate segmentation results. Third, the resolution-adaptation module fuses the feature maps produced by the two aforementioned modules; here, pairwise feature fusion is adopted to resample the dropped pixels and extract more detailed features. Extensive experiments demonstrate that PlaneSeg outperforms other state-of-the-art approaches on three downstream tasks, including plane segmentation, 3-D plane reconstruction, and depth prediction. Code is available at https://github.com/nku-zhichengzhang/PlaneSeg.
3
Yang S, Wang Y, Ai D, Geng H, Zhang D, Xiao D, Song H, Li M, Yang J. Augmented Reality Navigation System for Biliary Interventional Procedures With Dynamic Respiratory Motion Correction. IEEE Trans Biomed Eng 2024; 71:700-711. [PMID: 38241137 DOI: 10.1109/tbme.2023.3316290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2024]
Abstract
OBJECTIVE Biliary interventional procedures require physicians to track the interventional instrument tip (Tip) precisely in X-ray images. However, Tip positioning relies heavily on the physicians' experience because of the limitations of X-ray imaging and respiratory interference, which leads to biliary damage, prolonged operation time, and increased X-ray radiation. METHODS We construct an augmented reality (AR) navigation system for biliary interventional procedures. It includes system calibration, respiratory motion correction, and fusion navigation. First, the magnetic and 3D computed tomography (CT) coordinate systems are aligned through system calibration. Second, a respiratory motion correction method based on manifold regularization is proposed to correct the misalignment between the two coordinate systems caused by respiratory motion. Third, the virtual biliary tract, liver, and Tip from CT are overlaid onto the corresponding positions on the patient for dynamic virtual-real fusion. RESULTS Our system achieved average alignment errors of 0.75 ± 0.17 mm and 2.79 ± 0.46 mm on phantoms and patients, respectively. Navigation experiments conducted on phantoms achieve an average Tip positioning error of 0.98 ± 0.15 mm and an average fusion error of 1.67 ± 0.34 mm after correction. CONCLUSION Our system can automatically register the Tip to the corresponding location in CT and dynamically overlay the 3D virtual model onto the patient to provide accurate and intuitive AR navigation. SIGNIFICANCE This study demonstrates the clinical potential of our system for assisting physicians during biliary interventional procedures. Our system enables dynamic visualization of the virtual model on the patient, reducing reliance on contrast agents and X-ray usage.
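To make the calibration step concrete, here is a minimal Python sketch of paired-point rigid registration (the standard SVD solution), the kind of computation used to align tracker and CT coordinates from corresponding fiducials. It is illustrative only, not the authors' implementation, and the fiducial values are invented.

```python
# Paired-point rigid registration via SVD (Arun/Umeyama): find R, t minimizing
# sum ||R @ src_i + t - dst_i||^2. Illustrative sketch, not the paper's code.
import numpy as np

def rigid_register(src, dst):
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                                   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Toy usage: recover a known transform from noisy "fiducials".
rng = np.random.default_rng(0)
tracker_pts = rng.uniform(-50, 50, size=(6, 3))           # hypothetical fiducials in tracker space
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
ct_pts = tracker_pts @ R_true.T + np.array([10.0, 5.0, -3.0]) + rng.normal(0, 0.2, (6, 3))
R, t = rigid_register(tracker_pts, ct_pts)
fre = np.linalg.norm(tracker_pts @ R.T + t - ct_pts, axis=1).mean()   # fiducial registration error
print(R, t, fre)
```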
4
Tang Y, Zhao C, Wang J, Zhang C, Sun Q, Zheng WX, Du W, Qian F, Kurths J. Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9604-9624. [PMID: 35482692 DOI: 10.1109/tnnls.2022.3167688] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Autonomous systems possess the features of inferring their own state, understanding their surroundings, and performing autonomous navigation. With the application of learning systems such as deep learning and reinforcement learning, the visual self-state estimation, environment perception, and navigation capabilities of autonomous systems have been effectively addressed, and many new learning-based algorithms have emerged for autonomous visual perception and navigation. In this review, we focus on the applications of learning-based monocular approaches to ego-motion perception, environment perception, and navigation in autonomous systems, which differs from previous reviews that discussed traditional methods. First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the need to integrate deep learning techniques. Second, we review visual environmental perception and understanding methods based on deep learning, including deep learning-based monocular depth estimation, monocular ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional vSLAM frameworks. Then, we focus on visual navigation based on learning systems, mainly including reinforcement learning and deep reinforcement learning. Finally, we examine several challenges and promising directions discussed in related research on learning systems in the era of computer science and robotics.
5
Jia T, Taylor ZA, Chen X. Density-adaptive registration of pointclouds based on Dirichlet Process Gaussian Mixture Models. Phys Eng Sci Med 2023; 46:719-734. [PMID: 37014577 DOI: 10.1007/s13246-023-01245-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2022] [Accepted: 03/12/2023] [Indexed: 04/05/2023]
Abstract
We propose an algorithm for rigid registration of pre- and intra-operative patient anatomy, represented as pointclouds, during minimally invasive surgery. This capability is essential for the development of augmented reality systems for guiding such interventions. Key challenges in this context are differences in point density between the pre- and intra-operative pointclouds and potentially low spatial overlap between the two; solutions must therefore be robust to both phenomena. We formulate a pointcloud registration approach which considers the pointclouds, after rigid transformation, to be observations of a global non-parametric probabilistic model named the Dirichlet Process Gaussian Mixture Model. The registration problem is solved by minimizing the Kullback-Leibler divergence in a variational Bayesian inference framework. By this means, all unknown parameters are recursively inferred, including, importantly, the optimal number of mixture components, which ensures the model complexity efficiently matches that of the observed data. By representing the pointclouds as KD-trees, both the data and the model are expanded in a coarse-to-fine style. The scanning weight of each point is estimated from its neighborhood, imparting the algorithm with robustness to point density variations. Experiments on several datasets with different levels of noise, outliers, and pointcloud overlap show that our method has comparable accuracy but higher efficiency than existing Gaussian Mixture Model methods, whose performance is sensitive to the number of model components.
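For a rough feel of the modelling idea, the sketch below fits a Dirichlet-process Gaussian mixture to one pointcloud with variational inference (scikit-learn's BayesianGaussianMixture) so the effective number of components adapts to the data, and then soft-assigns a second pointcloud to it. The paper's alternating rigid-pose update is omitted, and every parameter value here is an assumption.

```python
# Fit a DP-GMM to the denser cloud and compute soft correspondences for the sparser one.
# Illustrative only; not the authors' variational registration algorithm.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
intra_op = rng.normal(size=(2000, 3))              # e.g. dense intra-operative surface points
pre_op = rng.normal(size=(300, 3)) + 0.05          # e.g. sparse pre-operative points to align

dpgmm = BayesianGaussianMixture(
    n_components=50,                               # upper bound; unused components shrink away
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=200,
).fit(intra_op)

resp = dpgmm.predict_proba(pre_op)                 # soft assignment of each point to components
active = int((dpgmm.weights_ > 1e-3).sum())        # model complexity selected by the DP prior
print("active components:", active, "mean log-likelihood:", dpgmm.score(pre_op))
```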
Affiliation(s)
- Tingting Jia
- School of Biomedical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- School of Mechanical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Zeike A Taylor
- CISTIB Centre for Computational Imaging and Simulation Technologies in Biomedicine and the Institute of Medical and Biological Engineering, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK
- Xiaojun Chen
- School of Mechanical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
- Institute of Medical Robotics, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
6
Robust two-phase registration method for three-dimensional point set under the Bayesian mixture framework. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01673-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
7
Yuan X, Maharjan A. Non-rigid point set registration: recent trends and challenges. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10292-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
8
Wan T, Du S, Cui W, Yao R, Ge Y, Li C, Gao Y, Zheng N. RGB-D Point Cloud Registration Based on Salient Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:3547-3559. [PMID: 33556020 DOI: 10.1109/tnnls.2021.3053274] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
We propose a robust algorithm for aligning rigid, noisy, and partially overlapping red green blue-depth (RGB-D) point clouds. To address the problems of data degradation and uneven distribution, we offer three strategies to increase the robustness of the iterative closest point (ICP) algorithm. First, we introduce a salient object detection (SOD) method to extract a set of points with significant structural variation in the foreground, which avoids the unbalanced proportion of foreground and background points that can drive the registration toward a purely local alignment. Second, registration algorithms that rely only on structural information cannot establish correct correspondences for point sets with no significant structural variation; therefore, a bidirectional color distance (BCD) is designed to build precise correspondences with bidirectional search and color guidance. Third, the maximum correntropy criterion (MCC) and a trimmed strategy are introduced into our algorithm to handle noise and outliers. We experimentally validate that our algorithm is more robust than previous algorithms on simulated and real-world scene data in most scenarios and achieves satisfying 3-D reconstruction of indoor scenes.
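As a point of reference, the sketch below runs a single trimmed ICP iteration with a KD-tree: nearest-neighbour correspondences, dropping the worst residuals as a loose stand-in for the paper's robustness mechanisms, then a closed-form rigid update. The salient-object detection and bidirectional colour distance are not modelled, and the trimming ratio is arbitrary.

```python
# One trimmed ICP iteration: NN correspondences, keep the best fraction, solve R, t by SVD.
import numpy as np
from scipy.spatial import cKDTree

def trimmed_icp_step(src, dst, keep=0.8):
    d, idx = cKDTree(dst).query(src)               # nearest neighbour in dst for every src point
    order = np.argsort(d)[: int(keep * len(src))]  # keep the `keep` fraction with smallest residuals
    p, q = src[order], dst[idx[order]]
    pc, qc = p - p.mean(0), q - q.mean(0)
    U, _, Vt = np.linalg.svd(pc.T @ qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = q.mean(0) - R @ p.mean(0)
    return src @ R.T + t, R, t

rng = np.random.default_rng(2)
dst = rng.uniform(size=(500, 3))
src = dst[:400] + 0.02                             # partially overlapping, slightly offset copy
for _ in range(10):                                # a few iterations suffice for this toy case
    src, R, t = trimmed_icp_step(src, dst)
```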
9
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01644-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
10
Hu X, Zhang D, Chen J, Wu Y, Chen Y. NrtNet: An Unsupervised Method for 3D Non-Rigid Point Cloud Registration Based on Transformer. SENSORS (BASEL, SWITZERLAND) 2022; 22:5128. [PMID: 35890808 PMCID: PMC9324002 DOI: 10.3390/s22145128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 06/23/2022] [Accepted: 07/06/2022] [Indexed: 06/15/2023]
Abstract
Self-attention networks have revolutionized the field of natural language processing and have also made impressive progress in image analysis tasks. CorrNet3D introduced the idea of first obtaining the point cloud correspondence in point cloud registration. Inspired by these successes, we propose an unsupervised network for non-rigid point cloud registration, namely NrtNet, which is the first network to use a transformer for unsupervised large-deformation non-rigid point cloud registration. Specifically, NrtNet consists of a feature extraction module, a correspondence matrix generation module, and a reconstruction module. Given a pair of point clouds, our model first learns point-by-point features and feeds them to the transformer-based correspondence matrix generation module, which learns the correspondence probability between pairs of point sets; the correspondence probability matrix is then normalized to obtain the correct point set correspondence matrix. We then permute the point clouds and learn the relative drift of the point pairs to reconstruct the point clouds for registration. Extensive experiments on synthetic and real datasets of non-rigid 3D shapes show that NrtNet outperforms state-of-the-art methods, including methods that use grids as input and methods that directly compute point drift.
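The correspondence-matrix idea can be shown independently of the network: given per-point features for two clouds, a row-wise softmax over pairwise similarities gives a soft, row-stochastic correspondence matrix that can permute one cloud toward the other. The transformer feature extractor and the unsupervised reconstruction loss are omitted; the features and temperature below are illustrative.

```python
# Soft correspondence matrix from per-point features via a row-wise softmax.
import numpy as np

def soft_correspondence(feat_a, feat_b, temperature=0.1):
    sim = feat_a @ feat_b.T / temperature          # (N, M) pairwise feature similarities
    sim -= sim.max(axis=1, keepdims=True)          # numerical stability
    P = np.exp(sim)
    return P / P.sum(axis=1, keepdims=True)        # each row sums to 1

rng = np.random.default_rng(3)
feat_a = rng.normal(size=(128, 32))                # per-point features of cloud A (stand-in)
feat_b = rng.normal(size=(128, 32))                # per-point features of cloud B (stand-in)
points_b = rng.normal(size=(128, 3))
P = soft_correspondence(feat_a, feat_b)
points_b_reordered = P @ points_b                  # soft permutation of B toward A's ordering
```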
Affiliation(s)
- Xiaobo Hu
- School of Computer Science, China University of Geosciences, Wuhan 430074, China; (X.H.); (J.C.); (Y.W.)
- Dejun Zhang
- School of Computer Science, China University of Geosciences, Wuhan 430074, China; (X.H.); (J.C.); (Y.W.)
- Jinzhi Chen
- School of Computer Science, China University of Geosciences, Wuhan 430074, China; (X.H.); (J.C.); (Y.W.)
- Yiqi Wu
- School of Computer Science, China University of Geosciences, Wuhan 430074, China; (X.H.); (J.C.); (Y.W.)
- Yilin Chen
- Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205, China;
11
Castillon M, Ridao P, Siegwart R, Cadena C. Linewise Non-Rigid Point Cloud Registration. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3180038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Affiliation(s)
- Miguel Castillon
- Computer Vision and Robotics Research Institute (VICOROB), University of Girona, Girona, Spain
- Pere Ridao
- Computer Vision and Robotics Research Institute (VICOROB), University of Girona, Girona, Spain
- Roland Siegwart
- Autonomous Systems Lab (ASL), ETH Zurich, Zurich, Switzerland
- Cesar Cadena
- Autonomous Systems Lab (ASL), ETH Zurich, Zurich, Switzerland
12
Min Z, Liu J, Liu L, Meng MQH. Generalized Coherent Point Drift With Multi-Variate Gaussian Distribution and Watson Distribution. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3093011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
13
Hirose O. Acceleration of Non-Rigid Point Set Registration With Downsampling and Gaussian Process Regression. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:2858-2865. [PMID: 33301401 DOI: 10.1109/tpami.2020.3043769] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Non-rigid point set registration is the process of transforming a shape represented as a point set into a shape matching another shape. In this paper, we propose an acceleration method for solving non-rigid point set registration problems. We accelerate non-rigid registration by dividing it into three steps: i) downsampling of point sets; ii) non-rigid registration of downsampled point sets; and iii) interpolation of shape deformation vectors corresponding to points removed during downsampling. To register downsampled point sets, we use a registration algorithm based on a prior distribution, called motion coherence prior. Using the same prior, we derive an interpolation method interpreted as Gaussian process regression. Through numerical experiments, we demonstrate that our algorithm registers point sets containing over ten million points. We also show that our algorithm reduces computing time more radically than a state-of-the-art acceleration algorithm.
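Step (iii) can be illustrated with a small Gaussian-kernel (GP-style) regression that interpolates the displacement vectors of the points removed during downsampling from those of the retained points. The kernel width and regularisation below are arbitrary illustrative values, not the paper's settings.

```python
# Interpolate a smooth displacement field from a registered, downsampled subset.
import numpy as np

def gaussian_kernel(X, Y, beta=2.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * beta ** 2))

def interpolate_displacements(sub_pts, sub_disp, full_pts, beta=2.0, lam=1e-3):
    K = gaussian_kernel(sub_pts, sub_pts, beta)               # kernel among retained points
    W = np.linalg.solve(K + lam * np.eye(len(sub_pts)), sub_disp)
    return gaussian_kernel(full_pts, sub_pts, beta) @ W       # displacements for all points

rng = np.random.default_rng(4)
full = rng.uniform(0, 10, size=(5000, 3))                     # original dense point set
sub_idx = rng.choice(len(full), 500, replace=False)           # downsampled subset
sub_disp = np.repeat(0.1 * np.sin(full[sub_idx, :1]), 3, axis=1)  # toy displacements from "registration"
dense_disp = interpolate_displacements(full[sub_idx], sub_disp, full)
print(dense_disp.shape)                                       # (5000, 3): one vector per original point
```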
14
Yang L, Li Q, Song X, Cai W, Hou C, Xiong Z. An Improved Stereo Matching Algorithm for Vehicle Speed Measurement System Based on Spatial and Temporal Image Fusion. ENTROPY 2021; 23:e23070866. [PMID: 34356407 PMCID: PMC8305597 DOI: 10.3390/e23070866] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/14/2021] [Revised: 07/04/2021] [Accepted: 07/05/2021] [Indexed: 11/20/2022]
Abstract
This paper proposes an improved stereo matching algorithm for a vehicle speed measurement system based on spatial and temporal image fusion (STIF). First, matching point pairs in the license plate area with obviously abnormal distance to the camera are coarsely removed according to the license plate specification. Second, further mismatched point pairs are finely removed according to a local neighborhood consistency constraint (LNCC). Third, the optimum speed measurement point pairs are selected for successive stereo frame pairs by STIF of the binocular stereo video, so that the 3D points corresponding to the matching point pairs used for speed measurement in successive stereo frame pairs lie at the same position on the real vehicle, which significantly improves vehicle speed measurement accuracy. LNCC and STIF can be applied not only to the license plate but also to the vehicle logo, lights, mirrors, etc. Experimental results demonstrate that the vehicle speed measurement system with the proposed LNCC+STIF stereo matching algorithm significantly outperforms the state-of-the-art system in accuracy.
Affiliation(s)
- Lei Yang
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China; (Q.L.); (W.C.)
- Correspondence: (L.Y.); (X.S.)
- Qingyuan Li
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China; (Q.L.); (W.C.)
- Xiaowei Song
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China; (Q.L.); (W.C.)
- Dongjing Avenue Campus, Kaifeng University, Kaifeng 475004, China
- Correspondence: (L.Y.); (X.S.)
- Wenjing Cai
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China; (Q.L.); (W.C.)
- Chunping Hou
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China;
- Zixiang Xiong
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA;
15
Hirose O. A Bayesian Formulation of Coherent Point Drift. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:2269-2286. [PMID: 32031931 DOI: 10.1109/tpami.2020.2971687] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Coherent point drift is a well-known algorithm for solving point set registration problems, i.e., finding corresponding points between shapes represented as point sets. Despite its advantages over other state-of-the-art algorithms, theoretical and practical issues remain. Among theoretical issues, (1) it is unknown whether the algorithm always converges, and (2) the meaning of the parameters concerning motion coherence is unclear. Among practical issues, (3) the algorithm is relatively sensitive to target shape rotation, and (4) acceleration of the algorithm is restricted to the use of the Gaussian kernel. To overcome these issues and provide a different and more general perspective to the algorithm, we formulate coherent point drift in a Bayesian setting. The formulation brings the following consequences and advances to the field: convergence of the algorithm is guaranteed by variational Bayesian inference; the definition of motion coherence as a prior distribution provides a basis for interpretation of the parameters; rigid and non-rigid registration can be performed in a single algorithm, enhancing robustness against target rotation. We also propose an acceleration scheme for the algorithm that can be applied to non-Gaussian kernels and that provides greater efficiency than coherent point drift.
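For readers unfamiliar with the underlying mixture model, the sketch below shows the E-step of the original (non-Bayesian) coherent point drift: the soft correspondences between Gaussians centred on the source points and the target points, including the uniform outlier component. It is background for the Bayesian formulation rather than an implementation of it; all values are toy.

```python
# CPD E-step: posterior probability that source point y_m generated target point x_n,
# under a GMM centred on Y plus a uniform outlier component with weight w.
import numpy as np

def cpd_posteriors(X, Y, sigma2, w=0.1):
    N, D = X.shape
    M = Y.shape[0]
    d2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum(-1)          # (M, N) squared distances
    G = np.exp(-d2 / (2.0 * sigma2))
    denom = G.sum(axis=0, keepdims=True) \
        + (2.0 * np.pi * sigma2) ** (D / 2.0) * w * M / ((1.0 - w) * N)
    return G / denom                                             # column sums stay below 1

rng = np.random.default_rng(5)
Y = rng.normal(size=(80, 3))                # source (model) points
X = Y[:60] + 0.05                           # target points: a noisy subset of Y
P = cpd_posteriors(X, Y, sigma2=0.05)
print(P.shape, P.sum(axis=0).max())         # (80, 60); the missing mass is the outlier probability
```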
16
Chen X, Yang F, Zhang Z, Bai B, Guo L. Robust Surface-Matching Registration Based on the Structure Information for Image-Guided Neurosurgery System. J MECH MED BIOL 2021. [DOI: 10.1142/s0219519421400091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Image-to-patient registration establishes accurate alignment between the actual operating space and the image space. Although paired-point image-to-patient registration is used in some image-guided neurosurgery systems, the current paired-point registration method has drawbacks and usually cannot achieve the best registration result; surface-matching registration has therefore been proposed to solve this problem. This paper proposes a surface-matching method that accomplishes image-to-patient registration automatically. We represent the surface point clouds by a Gaussian Mixture Model (GMM), which can smoothly approximate the probability density distribution of an arbitrary point set. We also use mutual information as the similarity measure between the point clouds and take into account the structure information of the points. To analyze the registration error, we introduce a method for estimating the Target Registration Error (TRE) by generating simulated data. In the experiments, we used point sets of the cranium surface and a model of the human head acquired with CT and a laser scanner. The TRE was less than 2 mm, and accuracy was better in the frontal and posterior regions. Compared with the Iterative Closest Point algorithm, surface registration based on the GMM and the structure information of the points proved superior in robustness and in accurate implementation of image-to-patient registration.
Affiliation(s)
- Xinrong Chen
- Academy for Engineering and Technology, Fudan University, Shanghai 200433, P. R. China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Shanghai 200032, P. R. China
- Fuming Yang
- Huashan Hospital, Fudan University, Shanghai 200040, P. R. China
- Ziqun Zhang
- Information Center, Fudan University, Shanghai 200433, P. R. China
- Baodan Bai
- School of Medical Instruments, Shanghai University of Medicine & Health Science, Shanghai 201318, P. R. China
- Lei Guo
- School of Business Administration, Shanghai Lixin University of Accounting and Finance, Shanghai 201620, P. R. China
17
Liu Y, Li Y, Dai L, Yang C, Wei L, Lai T, Chen R. Robust feature matching via advanced neighborhood topology consensus. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.047] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
18
Wang G, Chen Y. SCM: Spatially Coherent Matching With Gaussian Field Learning for Nonrigid Point Set Registration. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:203-213. [PMID: 32275605 DOI: 10.1109/tnnls.2020.2978031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
While point set registration has been studied in many areas of computer vision for decades, registering point sets subject to different degradations remains a challenging problem. In this article, we introduce a robust point pattern matching method, termed spatially coherent matching (SCM). The SCM algorithm consists of recovering correspondences and learning nonrigid transformations between the given model and scene point sets while preserving the local neighborhood structure. Precisely, the proposed SCM starts from initial matches that are contaminated by degradations (e.g., deformation, noise, occlusion, rotation, multiview, and outliers), and the main task is to recover the underlying correspondences and learn the nonrigid transformation alternately. Based on unsupervised manifold learning, the challenging problem of point set registration can be formulated by the Gaussian fields criterion under a local preserving constraint, whereby the neighborhood structure is preserved in each transformation. Moreover, the nonrigid transformation is modeled in a reproducing kernel Hilbert space, and we use a kernel approximation strategy to boost efficiency. Experimental results demonstrate that the proposed approach robustly rejects mismatches and registers complex point set pairs containing large degradations.
19
Ultrarobust support vector registration. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01967-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
20
Jiang J, Yu Y, Wang Z, Tang S, Hu R, Ma J. Ensemble Super-Resolution With a Reference Dataset. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:4694-4708. [PMID: 30843812 DOI: 10.1109/tcyb.2018.2890149] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
By developing sophisticated image priors or designing deep(er) architectures, a variety of image super-resolution (SR) approaches have been proposed recently and have achieved very promising performance. A natural question that arises is whether these methods can be reformulated into a unifying framework and whether this framework assists in SR reconstruction. In this paper, we present a simple but effective single-image SR method based on ensemble learning, which can produce better performance than could be obtained from any of the SR methods being ensembled (called component super-resolvers). Based on the assumption that a better component super-resolver should have a larger ensemble weight when performing SR reconstruction, we present a maximum a posteriori (MAP) estimation framework for inferring the optimal ensemble weights. In particular, we introduce a reference dataset, composed of high-resolution (HR) and low-resolution (LR) image pairs, to measure the SR abilities (prior knowledge) of the different component super-resolvers. To obtain the optimal ensemble weights, we incorporate the reconstruction constraint, which states that the degraded HR estimate should equal the LR observation, together with the prior knowledge of the ensemble weights, into the MAP estimation framework. Moreover, the resulting optimization problem admits an analytical solution. We study the performance of the proposed method by comparing it with different competitive approaches, including four state-of-the-art non-deep-learning-based methods, four recent deep learning-based methods, and one ensemble learning-based method, and demonstrate its effectiveness and superiority on general image datasets and face image datasets.
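The ensemble idea can be caricatured with a ridge-style least-squares solution for the weights that combines the reconstruction constraint with a quadratic prior on the weights. This is a simplified stand-in for the paper's MAP formulation, not a reproduction of it; the downsampler, prior, and all values are invented for illustration.

```python
# Ensemble weights from (i) a reconstruction constraint and (ii) a prior centred on w_prior.
import numpy as np

def ensemble_weights(hr_estimates, lr_obs, downsample, w_prior, lam=1.0):
    A = np.stack([downsample(h).ravel() for h in hr_estimates], axis=1)   # one column per super-resolver
    b = lr_obs.ravel()
    # argmin_w ||A w - b||^2 + lam * ||w - w_prior||^2, solved in closed form
    lhs = A.T @ A + lam * np.eye(A.shape[1])
    rhs = A.T @ b + lam * w_prior
    return np.linalg.solve(lhs, rhs)

downsample = lambda img: img[::2, ::2]                    # toy 2x decimation operator
rng = np.random.default_rng(6)
truth = rng.uniform(size=(64, 64))
hr_estimates = [truth + rng.normal(0, s, truth.shape) for s in (0.02, 0.05, 0.10)]
lr_obs = downsample(truth)
w = ensemble_weights(hr_estimates, lr_obs, downsample, w_prior=np.full(3, 1 / 3))
fused = sum(wi * h for wi, h in zip(w, hr_estimates))     # ensembled HR estimate
print(w)                              # weights reflect how well each estimate explains the LR image
```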
21
General first-order target registration error model considering a coordinate reference frame in an image-guided surgical system. Med Biol Eng Comput 2020; 58:2989-3002. [PMID: 33029759 DOI: 10.1007/s11517-020-02265-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Accepted: 09/08/2020] [Indexed: 10/23/2022]
Abstract
Point-based rigid registration (PBRR) techniques are widely used in many aspects of image-guided surgery (IGS). Accurately estimating target registration error (TRE) statistics is of essential value for medical applications such as optical surgical tool-tip tracking and image registration. For example, knowing the TRE distribution statistics of the surgical tool tip can help the surgeon make the right decisions during surgery. Meanwhile, the pose of a surgical tool is usually reported relative to a second rigid body whose local frame is called the coordinate reference frame (CRF). In an n-ocular tracking system, fiducial localization error (FLE) should be considered inhomogeneous, meaning the FLE differs between fiducials, and anisotropic, meaning the FLE differs across directions. In this paper, we extend the TRE estimation algorithm relative to a CRF from homogeneous and anisotropic FLE to heterogeneous FLE cases. Arbitrary weightings can be assumed when solving the registration problems in the proposed TRE estimation algorithm. Monte Carlo simulation results demonstrate the proposed algorithm's effectiveness for both homogeneous and inhomogeneous FLE distributions. The results are further compared with those of two other algorithms. When the FLE distribution is anisotropic and homogeneous, the proposed TRE estimation algorithm performs comparably to the first; when the FLE distribution is heterogeneous, it outperforms the other two classical algorithms in all test cases when the ideal weighting scheme is adopted for the two registrations. Possible clinical applications include online estimation of surgical tool-tip tracking error with respect to a CRF in IGS. Graphical Abstract: This paper provides a target registration error model considering a coordinate reference frame in surgical navigation.
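The quantity being modelled can be illustrated by a brute-force Monte Carlo simulation: perturb the fiducials with inhomogeneous, anisotropic FLE, re-run rigid registration, and record the error at a target. This is the simulation baseline such first-order models are typically validated against, not the paper's closed-form model; the geometry and covariances are invented.

```python
# Monte Carlo estimate of TRE at a target point under per-fiducial anisotropic FLE.
import numpy as np

def rigid_register(src, dst):
    pc, qc = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(pc.T @ qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dst.mean(0) - R @ src.mean(0)

rng = np.random.default_rng(7)
fiducials = np.array([[0.0, 0, 0], [60, 0, 0], [0, 60, 0], [0, 0, 60]])
target = np.array([30.0, 30.0, 120.0])                    # e.g. a tool tip far from the fiducials
# One covariance per fiducial: inhomogeneous (differs per fiducial) and anisotropic (per axis).
fle_cov = [np.diag(s) for s in ([0.1, 0.1, 0.4], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.3, 0.1])]

tre = []
for _ in range(5000):
    measured = np.array([rng.multivariate_normal(f, c) for f, c in zip(fiducials, fle_cov)])
    R, t = rigid_register(measured, fiducials)            # register noisy fiducials to the truth
    tre.append(np.linalg.norm(R @ target + t - target))   # displacement of the (noise-free) target
print("RMS TRE at target: %.3f" % np.sqrt(np.mean(np.square(tre))))
```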
22
Wang Y, Mei X, Ma Y, Huang J, Fan F, Ma J. Learning to find reliable correspondences with local neighborhood consensus. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
23
Ma J, Jiang X, Fan A, Jiang J, Yan J. Image Matching from Handcrafted to Deep Features: A Survey. Int J Comput Vis 2020. [DOI: 10.1007/s11263-020-01359-2] [Citation(s) in RCA: 230] [Impact Index Per Article: 46.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
As a fundamental and critical task in various visual applications, image matching can identify and then correspond the same or similar structure/content between two or more images. Over the past decades, a growing number and diversity of methods have been proposed for image matching, particularly with the development of deep learning techniques in recent years. However, open questions remain about which method is a suitable choice for specific applications with respect to different scenarios and task requirements, and about how to design better image matching methods with superior accuracy, robustness, and efficiency. This encourages us to conduct a comprehensive and systematic review and analysis of these classical and latest techniques. Following the feature-based image matching pipeline, we first introduce feature detection, description, and matching techniques, from handcrafted methods to trainable ones, and provide an analysis of the development of these methods in theory and practice. Second, we briefly introduce several typical image matching-based applications to provide a comprehensive understanding of the significance of image matching. In addition, we provide a comprehensive and objective comparison of these classical and latest techniques through extensive experiments on representative datasets. Finally, we conclude with the current status of image matching technologies and deliver insightful discussions and prospects for future work. This survey can serve as a reference for (but not limited to) researchers and engineers in image matching and related fields.
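As one concrete instance of the classical detect-describe-match pipeline the survey reviews, the sketch below runs handcrafted ORB detection, brute-force Hamming matching, and Lowe's ratio test with OpenCV. The image file names are placeholders and must exist for the script to run.

```python
# Classical feature matching: ORB keypoints + descriptors, brute-force matching, ratio test.
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)      # placeholder paths
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)                      # detection + binary description
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)                 # Hamming distance suits binary descriptors
knn = matcher.knnMatch(des1, des2, k=2)                   # two nearest neighbours per descriptor
good = [m for m, n in knn if m.distance < 0.75 * n.distance]   # Lowe's ratio test prunes ambiguous matches
print(len(good), "putative matches after the ratio test")
```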
24
Feng XW, Feng DZ. A Robust Nonrigid Point Set Registration Method Based on Collaborative Correspondences. SENSORS 2020; 20:s20113248. [PMID: 32517316 PMCID: PMC7308981 DOI: 10.3390/s20113248] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/22/2020] [Revised: 06/04/2020] [Accepted: 06/05/2020] [Indexed: 11/16/2022]
Abstract
Nonrigid point set registration is a bottleneck problem with wide applications in computer vision, pattern recognition, image fusion, video processing, and so on. In a nonrigid point set registration problem, finding the point-to-point correspondences is challenging because of various image degradations. In this paper, a robust method is proposed to accurately determine the correspondences by fusing two complementary structural features: the spatial location of a point and the local structure around it. The former is used to define the absolute distance (AD), and the latter is exploited to define the relative distance (RD). AD-correspondences and RD-correspondences can be established based on AD and RD, respectively. Neighboring correspondence consistency is employed to assign a confidence to each RD-correspondence. The proposed heuristic method combines the AD-correspondences and the RD-correspondences to determine the corresponding relationship between two point sets, which significantly improves correspondence accuracy. Subsequently, the thin plate spline (TPS) is employed as the transformation function. At each step, closed-form solutions for the affine and nonaffine parts of the TPS can be obtained independently and robustly, which facilitates analyzing and controlling the registration process. Experimental results demonstrate that our method achieves better performance than several existing state-of-the-art methods.
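Once correspondences are fixed, the TPS transformation itself reduces to a small linear system in which the affine part and the nonaffine kernel weights appear as separate blocks. The sketch below solves that generic 2D system and does not reproduce the paper's AD/RD correspondence construction; the regularisation value is illustrative.

```python
# Standard 2D thin plate spline fit and evaluation, with kernel U(r) = r^2 log r.
import numpy as np

def tps_fit(src, dst, reg=1e-6):
    n = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = 0.5 * d2 * np.log(d2 + 1e-12)                      # r^2 log r written in terms of r^2
    P = np.hstack([np.ones((n, 1)), src])                  # affine basis [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K + reg * np.eye(n)
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    sol = np.linalg.solve(L, rhs)
    return sol[:n], sol[n:]                                # W (nonaffine weights), A (affine part)

def tps_apply(pts, src, W, A):
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    U = 0.5 * d2 * np.log(d2 + 1e-12)
    return np.hstack([np.ones((len(pts), 1)), pts]) @ A + U @ W

rng = np.random.default_rng(8)
src = rng.uniform(size=(40, 2))
dst = src + 0.05 * np.sin(4 * src[:, ::-1])                # smooth nonrigid deformation
W, A = tps_fit(src, dst)
print(np.abs(tps_apply(src, src, W, A) - dst).max())       # near zero: the spline interpolates the matches
```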
25
Zhou H, Ma J, Tan CC, Zhang Y, Ling H. Cross-Weather Image Alignment via Latent Generative Model with Intensity Consistency. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:5216-5228. [PMID: 32217476 DOI: 10.1109/tip.2020.2980210] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Image alignment/registration/correspondence is a critical prerequisite for many vision-based tasks and has been widely studied in computer vision. However, aligning images from different domains, such as cross-weather/season road scenes, remains a challenging problem. Inspired by the success of classic intensity-constancy-based image alignment methods and modern generative adversarial network (GAN) technology, we propose a cross-weather road scene alignment method called latent generative model with intensity constancy. From a novel perspective, the alignment problem is formulated as a constrained 2D flow optimization problem with latent encoding, which can be decoded into an intensity-constant image on the latent image manifold. The manifold is parameterized by a pre-trained GAN, which is able to capture statistical characteristics from large datasets. Moreover, we employ the learned manifold to constrain the warped latent image to be identical to the target image, thereby producing a realistic warping effect. Experimental results on several cross-weather/season road scene datasets demonstrate that our approach significantly outperforms the state-of-the-art methods.
26
Du X, He X, Yuan F, Tang J, Qin Z, Chua TS. Modeling Embedding Dimension Correlations via Convolutional Neural Collaborative Filtering. ACM T INFORM SYST 2019. [DOI: 10.1145/3357154] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Abstract
As the core of recommender systems, collaborative filtering (CF) models the affinity between a user and an item from historical user-item interactions, such as clicks, purchases, and so on. Benefiting from their strong representation power, neural networks have recently revolutionized recommendation research, setting a new standard for CF. However, existing neural recommender models do not explicitly consider the correlations among embedding dimensions, making them less effective in modeling the interaction function between users and items. In this work, we emphasize modeling the correlations among embedding dimensions in neural networks to pursue higher effectiveness for CF. We propose a novel and general neural collaborative filtering framework, namely ConvNCF, which features two designs: (1) applying the outer product on the user embedding and item embedding to explicitly model the pairwise correlations between embedding dimensions, and (2) employing a convolutional neural network on top of the outer product to learn the high-order correlations among embedding dimensions. To justify our proposal, we present three instantiations of ConvNCF using different inputs to represent a user and conduct experiments on two real-world datasets. Extensive results verify the utility of modeling embedding dimension correlations with ConvNCF, which outperforms several competitive CF methods.
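The first design choice is easy to visualise: the outer product of a user embedding and an item embedding is a 2D interaction map whose (i, j) entry couples embedding dimensions i and j, and a CNN is stacked on top of it. The toy sketch below builds such a map and applies one hand-written 3x3 convolution as a stand-in for the learned CNN tower; the embedding size and values are illustrative, not ConvNCF's trained parameters.

```python
# Outer-product interaction map plus one toy convolution pass over it.
import numpy as np

emb_dim = 64
rng = np.random.default_rng(9)
user_emb = rng.normal(size=emb_dim)                # stands in for a learned user embedding
item_emb = rng.normal(size=emb_dim)                # stands in for a learned item embedding

interaction_map = np.outer(user_emb, item_emb)     # (64, 64): explicit pairwise dimension correlations

kernel = rng.normal(size=(3, 3))                   # a real model would learn many such kernels
out = np.array([[(interaction_map[i:i + 3, j:j + 3] * kernel).sum()
                 for j in range(emb_dim - 2)] for i in range(emb_dim - 2)])
print(interaction_map.shape, out.shape)            # (64, 64) -> (62, 62) feature map after one conv layer
```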
Affiliation(s)
- Xiaoyu Du
- University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Xiangnan He
- University of Science and Technology of China, Anhui, China
- Fajie Yuan
- Platform and Content Group (PCG) of Tencent, Guangdong, China
- Jinhui Tang
- Nanjing University of Science and Technology, Jiangsu, China
- Zhiguang Qin
- University of Electronic Science and Technology of China, Chengdu, Sichuan, China
27
Xu Z, Hu R, Chen J, Chen C, Jiang J, Li J, Li H. Semisupervised Discriminant Multimanifold Analysis for Action Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:2951-2962. [PMID: 30762568 DOI: 10.1109/tnnls.2018.2886008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Although recent semisupervised approaches have proven their effectiveness when training data are limited, they assume that samples from different actions lie on a single data manifold in the feature space and try to uncover a common subspace for all samples. However, this assumption ignores intraclass compactness and interclass separability simultaneously. We believe that human actions should occupy a multimanifold subspace and therefore model samples of the same action as the same manifold and those of different actions as different manifolds. To obtain the optimal subspace projection matrix, current approaches may be mathematically imprecise owing to badly scaled matrices and improper convergence. To address these issues in unconstrained convex optimization, we introduce a nontrivial spectral projected gradient method and Karush-Kuhn-Tucker conditions without matrix inversion. By maximizing the separability between different classes using labeled data points and estimating the intrinsic geometric structure of the data distributions by exploring unlabeled data points, the proposed algorithm can learn global and local consistency and boost recognition performance. Extensive experiments conducted on realistic video datasets, including JHMDB, HMDB51, UCF50, and UCF101, demonstrate that our algorithm outperforms the compared algorithms, including a deep learning approach, when there are only a few labeled samples.
28
Ma J, Chow TWS. Label-specific feature selection and two-level label recovery for multi-label classification with missing labels. Neural Netw 2019; 118:110-126. [PMID: 31254766 DOI: 10.1016/j.neunet.2019.04.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2018] [Revised: 03/27/2019] [Accepted: 04/08/2019] [Indexed: 11/29/2022]
Abstract
In multi-label learning, each instance is assigned several nonexclusive labels. However, these labels are often incomplete, resulting in unsatisfactory performance in label-related applications. We design a two-level label recovery mechanism to perform label imputation in training sets. An instance-wise semantic relational graph and a label-wise semantic relational graph are used in this mechanism to recover the label matrix; these two graphs can capture reliable two-level semantic correlations. We also design a label-specific feature selection mechanism to perform label prediction on testing sets. Both local and global feature-label connections are exploited in this mechanism to learn an inductive classifier. By updating the matrix that represents the relevance between features and the predicted labels, the label-specific feature selection mechanism is robust to missing labels. Finally, extensive experimental results on nine datasets from different domains are presented to demonstrate the effectiveness of the proposed approach.
Affiliation(s)
- Jianghong Ma
- Department of Electronic Engineering, City University of Hong Kong Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region.
- Tommy W S Chow
- Department of Electronic Engineering, City University of Hong Kong Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region.
29
Ma J, Jiang X, Jiang J, Guo X. Robust Feature Matching Using Spatial Clustering with Heavy Outliers. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:736-746. [PMID: 31449018 DOI: 10.1109/tip.2019.2934572] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This paper focuses on removing mismatches from given putative feature matches, created typically based on descriptor similarity. To achieve this goal, existing attempts usually involve estimating the image transformation under a geometric constraint, where a pre-defined transformation model is required. This severely limits applicability, as the transformation can vary with different data and is complex and hard to model in many real-world tasks. From a novel perspective, this paper casts feature matching as a spatial clustering problem with outliers. The main idea is to adaptively cluster the putative matches into several motion-consistent clusters together with an outlier/mismatch cluster. To implement the spatial clustering, we customize the classic density-based spatial clustering of applications with noise (DBSCAN) in the context of feature matching, which enables our approach to achieve quasi-linear time complexity. We also design an iterative clustering strategy to promote matching performance in the case of severely degraded data. Extensive experiments on several datasets involving different types of image transformations demonstrate the superiority of our approach over state-of-the-art alternatives. Our approach is also applied to near-duplicate image retrieval and co-segmentation and achieves promising performance.
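The clustering view can be illustrated with off-the-shelf DBSCAN applied to the motion vectors of the putative matches, treating the noise label as the mismatch cluster. The paper's customised DBSCAN and iterative strategy are not reproduced, and the eps/min_samples values below are arbitrary.

```python
# Cluster putative matches by their 2D motion; DBSCAN noise points are treated as mismatches.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(10)
pts1 = rng.uniform(0, 500, size=(300, 2))                  # keypoints in image 1
pts2 = pts1 + np.array([40.0, -15.0]) + rng.normal(0, 1.0, pts1.shape)   # consistent motion
pts2[:80] = rng.uniform(0, 500, size=(80, 2))              # the first 80 matches are mismatches

motion = pts2 - pts1                                       # one motion vector per putative match
labels = DBSCAN(eps=5.0, min_samples=10).fit_predict(motion)
inliers = labels != -1                                     # members of any motion-consistent cluster
print("kept %d of %d putative matches" % (inliers.sum(), len(motion)))
```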
30
Ma J, Jiang X, Jiang J, Zhao J, Guo X. LMR: Learning a Two-Class Classifier for Mismatch Removal. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:4045-4059. [PMID: 30908218 DOI: 10.1109/tip.2019.2906490] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Feature matching, which refers to establishing reliable correspondences between two sets of features, is a critical prerequisite in a wide spectrum of vision-based tasks. Existing attempts typically involve removing mismatches from a set of putative matches based on estimating the underlying image transformation. However, the transformation can vary with different data, so a pre-defined transformation model is often demanded, which severely limits applicability. From a novel perspective, this paper casts mismatch removal as a two-class classification problem, learning a general classifier to determine the correctness of an arbitrary putative match, termed Learning for Mismatch Removal (LMR). The classifier is trained on a general match representation associated with each putative match, obtained by exploiting the consensus of local neighborhood structures with a multiple K-nearest neighbors strategy. With only ten training image pairs involving about 8000 putative matches, the learned classifier can generate promising matching results in linearithmic time complexity on arbitrary testing data. The generality and robustness of our approach are verified under several representative supervised learning techniques as well as on different training and testing data. Extensive experiments on feature matching, visual homing, and near-duplicate image retrieval reveal the superiority of our LMR over state-of-the-art competitors.
32
Laplacian Regularized Spatial-Aware Collaborative Graph for Discriminant Analysis of Hyperspectral Imagery. REMOTE SENSING 2018. [DOI: 10.3390/rs11010029] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Dimensionality reduction (DR) models are important for extracting low-dimensional features for hyperspectral image (HSI) data analysis, where there are many noisy and redundant spectral features. Among many DR techniques, the graph-embedding discriminant analysis framework has demonstrated its effectiveness for HSI feature reduction. Based on this framework, many representation-based models have been developed to learn the similarity graphs, but most of these methods ignore spatial information, resulting in unsatisfactory performance of the DR models. In this paper, we first propose a novel supervised DR algorithm termed Spatial-aware Collaborative Graph for Discriminant Analysis (SaCGDA) by introducing a simple but efficient spatial constraint into Collaborative Graph-based Discriminant Analysis (CGDA), inspired by the recently developed Spatial-aware Collaborative Representation (SaCR). To make the representation of samples on the data manifold smoother, i.e., so that similar pixels share similar representations, we further add spectral Laplacian regularization and propose the Laplacian-regularized SaCGDA (LapSaCGDA), in which the spectral and spatial constraints can efficiently exploit the intrinsic geometric structures embedded in HSIs. Experiments on three HSI data sets verify that the proposed SaCGDA and LapSaCGDA outperform other state-of-the-art methods.
33
Du Q, Xu H, Ma Y, Huang J, Fan F. Fusing Infrared and Visible Images of Different Resolutions via Total Variation Model. SENSORS 2018; 18:s18113827. [PMID: 30413066 PMCID: PMC6263655 DOI: 10.3390/s18113827] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/09/2018] [Revised: 11/03/2018] [Accepted: 11/05/2018] [Indexed: 11/23/2022]
Abstract
In infrared and visible image fusion, existing methods typically require that the source images share the same resolution. However, due to limitations of hardware devices and application environments, infrared images commonly suffer from markedly lower resolution than the corresponding visible images. In this case, current fusion methods inevitably lose texture information from the visible images or blur the thermal radiation information in the infrared images. Moreover, existing fusion rules typically focus on preserving texture details in the source images, which may be inappropriate for fusing infrared thermal radiation information, because such information is characterized by pixel intensities, and texture-oriented rules may neglect the prominence of targets in the fused images. Faced with these difficulties and challenges, we propose a novel method that fuses infrared and visible images of different resolutions and generates high-resolution, clear, and accurate fused images. Specifically, the fusion problem is formulated as a total variation (TV) minimization problem: the data fidelity term constrains the pixel-intensity similarity of the downsampled fused image with respect to the infrared image, and the regularization term enforces the gradient similarity of the fused image with respect to the visible image. The fast iterative shrinkage-thresholding algorithm (FISTA) framework is applied to improve the convergence rate. The resulting fused images resemble super-resolved infrared images sharpened by the texture information from the visible images. The advantages and innovations of our method are demonstrated by qualitative and quantitative comparisons with six state-of-the-art methods on publicly available datasets.
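A heavily simplified variant of the variational model fits in a few lines: keep the fused image close to the (upsampled) infrared intensities while pulling its gradients toward those of the visible image. The sketch below uses an L2 gradient penalty and plain gradient descent in place of the paper's TV term and FISTA solver, so it conveys only the flavour of the formulation; all data and values are toy.

```python
# Quadratic surrogate of the fusion model, solved by gradient descent with periodic differences.
import numpy as np

def dx(img):  return np.roll(img, -1, axis=1) - img        # periodic forward differences
def dy(img):  return np.roll(img, -1, axis=0) - img
def dxT(img): return np.roll(img, 1, axis=1) - img          # their exact adjoints under periodicity
def dyT(img): return np.roll(img, 1, axis=0) - img

rng = np.random.default_rng(11)
vis = rng.uniform(size=(64, 64))                   # high-resolution visible image (toy data)
ir_lr = rng.uniform(size=(32, 32))                 # low-resolution infrared image (toy data)
ir_up = np.kron(ir_lr, np.ones((2, 2)))            # crude upsampling to the visible resolution

lam, step = 2.0, 0.1                               # step < 2 / (1 + 8*lam) keeps the iteration stable
F = ir_up.copy()
for _ in range(300):
    # gradient of 0.5*||F - ir_up||^2 + 0.5*lam*(||dx F - dx vis||^2 + ||dy F - dy vis||^2)
    gradE = (F - ir_up) + lam * (dxT(dx(F) - dx(vis)) + dyT(dy(F) - dy(vis)))
    F -= step * gradE
print(F.shape)                                     # fused image: IR intensities with visible-image gradients
```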
Affiliation(s)
- Qinglei Du
- Electronic Information School, Wuhan University, Wuhan 430072, China.
- Air Force Early Warning Academy, Wuhan 430019, China.
- Han Xu
- Electronic Information School, Wuhan University, Wuhan 430072, China.
- Yong Ma
- Electronic Information School, Wuhan University, Wuhan 430072, China.
- Jun Huang
- Electronic Information School, Wuhan University, Wuhan 430072, China.
- Fan Fan
- Electronic Information School, Wuhan University, Wuhan 430072, China.