1. Sun K, Zhang J, Xu S, Zhao Z, Zhang C, Liu J, Hu J. CACNN: Capsule Attention Convolutional Neural Networks for 3D Object Recognition. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4091-4102. [PMID: 37934641] [DOI: 10.1109/tnnls.2023.3326606]
Abstract
Recently, view-based approaches, which recognize a 3D object through its projected 2D images, have been extensively studied and have achieved considerable success in 3D object recognition. Nevertheless, most of them use a pooling operation to aggregate view-wise features, which usually leads to loss of visual information. To tackle this problem, we propose a novel layer called the capsule attention layer (CAL), which uses an attention mechanism to fuse the features expressed by capsules. In detail, instead of the dynamic routing algorithm, we use an attention module to transmit information from lower-level capsules to higher-level capsules, which markedly improves the speed of capsule networks. In particular, the view pooling layer of the multiview convolutional neural network (MVCNN) becomes a special case of our CAL when the trainable weights are set to certain values. Furthermore, based on CAL, we propose a capsule attention convolutional neural network (CACNN) for 3D object recognition. Extensive experimental results on three benchmark datasets demonstrate the efficiency of our CACNN and show that it outperforms many state-of-the-art methods.
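The paper's CAL is not reproduced here, but its core idea — replacing pooled view aggregation with attention-weighted capsule fusion — can be sketched as below. The query vector `w_query` and the softmax weighting are illustrative assumptions, not the authors' exact parameterization:

```python
import numpy as np

def capsule_attention_fuse(view_capsules, w_query):
    """Fuse per-view capsule features with attention scores.

    view_capsules: (n_views, d) array of view-wise capsule vectors.
    w_query: (d,) trainable query vector; attention scores are its
    dot products with each capsule, softmax-normalized.
    """
    scores = view_capsules @ w_query                  # (n_views,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax
    return weights @ view_capsules                    # (d,) fused descriptor

# With a zero query, all views get equal weight and the fusion reduces
# to mean pooling over views -- one concrete instance of the "special
# case" claim above (MVCNN itself uses element-wise max view pooling;
# mean pooling is used here only to illustrate the reduction).
views = np.array([[1.0, 2.0], [3.0, 4.0]])
fused = capsule_attention_fuse(views, np.zeros(2))
```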
2. Chu M, De Maria GL, Dai R, Benenati S, Yu W, Zhong J, Kotronias R, Walsh J, Andreaggi S, Zuccarelli V, Chai J, Channon K, Banning A, Tu S. DCCAT: Dual-Coordinate Cross-Attention Transformer for thrombus segmentation on coronary OCT. Med Image Anal 2024; 97:103265. [PMID: 39029158] [DOI: 10.1016/j.media.2024.103265]
Abstract
Acute coronary syndromes (ACS) are one of the leading causes of mortality worldwide, with atherosclerotic plaque rupture and subsequent thrombus formation as the main underlying substrate. Thrombus burden evaluation is important for tailoring treatment therapy and predicting prognosis. Coronary optical coherence tomography (OCT) enables in-vivo visualization of thrombus that cannot otherwise be achieved by other imaging modalities. However, automatic quantification of thrombus on OCT has not been implemented. The main challenges are due to the variation in location, size, and irregularity of thrombus, in addition to the small dataset. In this paper, we propose a novel dual-coordinate cross-attention transformer network, termed DCCAT, to overcome the above challenges and achieve the first automatic segmentation of thrombus on OCT. Imaging features from both Cartesian and polar coordinates are encoded and fused based on long-range correspondence via a multi-head cross-attention mechanism. The dual-coordinate cross-attention block is hierarchically stacked amid convolutional layers at multiple levels, allowing comprehensive feature enhancement. The model was developed based on 5,649 OCT frames from 339 patients and tested using independent external OCT data from 548 frames of 52 patients. DCCAT achieved a Dice similarity coefficient (DSC) of 0.706 in segmenting thrombus, which is significantly higher than that of CNN-based (0.656) and Transformer-based (0.584) models. We show that the additional polar-image input not only leverages discriminative features from another coordinate system but also improves model robustness to geometric transformation. Experimental results show that DCCAT achieves competitive performance with only 10% of the total data, highlighting its data efficiency. The proposed dual-coordinate cross-attention design can be easily integrated into other developed Transformer models to boost performance.
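The fusion step described above — tokens from one coordinate system attending to tokens from the other — can be sketched with a single-head cross-attention. This is a simplification of the paper's multi-head, multi-level design; the toy token counts and dimensions are illustrative assumptions:

```python
import numpy as np

def cross_attention(q_feats, kv_feats):
    """Single-head cross-attention: queries from one coordinate branch
    attend to keys/values from the other branch. Keys and values are
    taken as the raw features here (no learned projections), which is
    a deliberate simplification of the full Transformer block."""
    d_k = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d_k)        # (Nq, Nkv)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ kv_feats                           # (Nq, d)

# Hypothetical toy tokens: 4 Cartesian-branch tokens attend to
# 3 polar-branch tokens; the fused output keeps the query token count,
# so it can be added back into the Cartesian feature map.
cart = np.random.default_rng(0).normal(size=(4, 8))
polar = np.random.default_rng(1).normal(size=(3, 8))
fused = cross_attention(cart, polar)
```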
Affiliation(s)
- Miao Chu
- Biomedical Instrument Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK
- Giovanni Luigi De Maria
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK; National Institute for Health Research, Oxford Biomedical Research Centre, UK
- Ruobing Dai
- Biomedical Instrument Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Stefano Benenati
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK; University of Genoa, Genoa, Italy
- Wei Yu
- Biomedical Instrument Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Jiaxin Zhong
- Biomedical Instrument Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Cardiology, Fujian Medical University Union Hospital, Fujian, China
- Rafail Kotronias
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK; National Institute for Health Research, Oxford Biomedical Research Centre, UK
- Jason Walsh
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK; National Institute for Health Research, Oxford Biomedical Research Centre, UK
- Stefano Andreaggi
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiology, Department of Medicine, University of Verona, Italy
- Jason Chai
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK
- Keith Channon
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK; National Institute for Health Research, Oxford Biomedical Research Centre, UK
- Adrian Banning
- Oxford Heart Centre, Oxford University Hospitals NHS Trust, UK; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK; National Institute for Health Research, Oxford Biomedical Research Centre, UK
- Shengxian Tu
- Biomedical Instrument Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, UK
3. Ma N, Wu Z, Feng Y, Wang C, Gao Y. Multi-View Time-Series Hypergraph Neural Network for Action Recognition. IEEE Transactions on Image Processing 2024; 33:3301-3313. [PMID: 38700958] [DOI: 10.1109/tip.2024.3391913]
Abstract
Recently, action recognition has attracted considerable attention in the field of computer vision. In dynamic circumstances and complicated backgrounds, problems such as object occlusion, insufficient light, and weak correlation of human body joints cause skeleton-based human action recognition accuracy to be very low. To address this issue, we propose a Multi-View Time-Series Hypergraph Neural Network (MV-TSHGNN) method. The framework is composed of two main parts: the construction of a multi-view time-series hypergraph structure and the learning process of multi-view time-series hypergraph convolutions. Specifically, given the multi-view video sequence frames, we first extract the joint features of actions from different views. Then, limb-component and adjacent-joint spatial hypergraphs are constructed from the joints of different views at the same time, and temporal hypergraphs are constructed from the joints of the same view at consecutive times; these hypergraphs establish high-order semantic relationships and cooperatively generate complementary action features. After that, we design a multi-view time-series hypergraph neural network to efficiently learn the features of spatial and temporal hypergraphs and effectively improve the accuracy of skeleton-based action recognition. To evaluate the effectiveness and efficiency of MV-TSHGNN, we conduct experiments on the NTU RGB+D, NTU RGB+D 120, and imitating traffic police gestures datasets. The experimental results indicate that our proposed model achieves new state-of-the-art performance.
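The hypergraph convolutions mentioned above follow the standard HGNN propagation rule; a minimal sketch with unit hyperedge weights is given below. This is the generic layer, not the paper's exact multi-view temporal variant, and the toy incidence matrix is an assumption:

```python
import numpy as np

def hypergraph_conv(x, h, theta):
    """One hypergraph convolution layer in the common HGNN form
    X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta, with unit
    hyperedge weights (W = I).

    x:     (n_nodes, d_in)  joint features
    h:     (n_nodes, n_edges) incidence matrix (1 if joint is in hyperedge)
    theta: (d_in, d_out)    trainable weights
    """
    dv = h.sum(axis=1)                       # node degrees
    de = h.sum(axis=0)                       # hyperedge degrees
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    de_inv = np.diag(1.0 / de)
    agg = dv_inv_sqrt @ h @ de_inv @ h.T @ dv_inv_sqrt
    return agg @ x @ theta

# Toy example: 3 joints joined by a single hyperedge (e.g., one limb
# component), so every joint averages the features of the whole edge.
h = np.ones((3, 1))
out = hypergraph_conv(np.eye(3), h, np.eye(3))
```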
4. Huang H, Zhou G, Zhao Q, He L, Xie S. Comprehensive Multiview Representation Learning via Deep Autoencoder-Like Nonnegative Matrix Factorization. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5953-5967. [PMID: 37672378] [DOI: 10.1109/tnnls.2023.3304626]
Abstract
Learning a comprehensive representation from multiview data is crucial in many real-world applications. Multiview representation learning (MRL) based on nonnegative matrix factorization (NMF) has been widely adopted because it projects high-dimensional data into a lower-dimensional space with great interpretability. However, most prior NMF-based MRL techniques are shallow models that ignore hierarchical information. Although deep matrix factorization (DMF)-based methods have been proposed recently, most of them focus only on the consistency of multiple views and have cumbersome clustering steps. To address the above issues, in this article, we propose a novel model termed deep autoencoder-like NMF for MRL (DANMF-MRL), which obtains the representation matrix through a deep encoding stage and decodes it back to the original data. In this way, through a DANMF-based framework, we can simultaneously consider multiview consistency and complementarity, allowing for a more comprehensive representation. We further propose a one-step DANMF-MRL, which learns the latent representation and the final clustering label matrix in a unified framework. In this approach, the two steps can negotiate with each other to fully exploit the latent clustering structure, avoid the previous tedious clustering steps, and achieve optimal clustering performance. Furthermore, two efficient iterative optimization algorithms are developed to solve the proposed models, both with theoretical convergence analysis. Extensive experiments on five benchmark datasets demonstrate the superiority of our approaches against other state-of-the-art MRL methods.
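The shallow building block underlying such models is NMF with multiplicative updates; a minimal sketch follows. The DANMF-MRL model stacks such factorizations per view and adds a decoder-style reconstruction term, which this sketch deliberately omits:

```python
import numpy as np

def nmf_multiplicative(x, rank, n_iter=500, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates, minimizing
    ||X - W H||_F^2 with W, H >= 0. The updates preserve nonnegativity
    because each factor is multiplied by a ratio of nonnegative terms."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    w = rng.random((n, rank)) + 0.1
    h = rng.random((rank, m)) + 0.1
    eps = 1e-9  # guards against division by zero
    for _ in range(n_iter):
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Toy nonnegative data: an exactly rank-1 matrix, so a rank-1
# factorization should drive the reconstruction error near zero.
x = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
w, h = nmf_multiplicative(x, rank=1)
err = np.linalg.norm(x - w @ h)
```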
5. Zhou L, Wu G, Zuo Y, Chen X, Hu H. A Comprehensive Review of Vision-Based 3D Reconstruction Methods. Sensors (Basel) 2024; 24:2314. [PMID: 38610525] [PMCID: PMC11014007] [DOI: 10.3390/s24072314]
Abstract
With the rapid development of 3D reconstruction, especially the emergence of algorithms such as NeRF and 3DGS, 3D reconstruction has become a popular research topic in recent years. 3D reconstruction technology provides crucial support for training extensive computer vision models and advancing the development of general artificial intelligence. With the development of deep learning and GPU technology, the demand for high-precision and high-efficiency 3D reconstruction is increasing, especially in the fields of unmanned systems, human-computer interaction, virtual reality, and medicine. This survey categorizes the various methods and technologies used in 3D reconstruction, exploring and classifying them along three lines: traditional static methods, dynamic methods, and machine-learning-based methods. It then compares and discusses these methods. The survey concludes with a detailed analysis of trends and challenges in the development of 3D reconstruction, aiming to provide a comprehensive introduction for individuals who are currently engaged in, or planning to conduct, research on 3D reconstruction and to help them gain a broad understanding of the relevant knowledge.
Affiliation(s)
- Guoxin Wu
- Key Laboratory of Modern Measurement and Control Technology Ministry of Education, Beijing Information Science and Technology University, Beijing 100080, China; (L.Z.); (Y.Z.); (X.C.); (H.H.)
6. Wang W, Wang X, Chen G, Zhou H. Multi-view SoftPool attention convolutional networks for 3D model classification. Front Neurorobot 2022; 16:1029968. [DOI: 10.3389/fnbot.2022.1029968]
Abstract
Introduction: Existing multi-view-based 3D model classification methods suffer from insufficient extraction of refined view features and poor generalization of the network model, which makes it difficult to further improve classification accuracy. To this end, this paper proposes a multi-view SoftPool attention convolutional network for 3D model classification tasks.
Methods: This method extracts multi-view features through ResNest and adaptive pooling modules, and the extracted features can better represent 3D models. The multi-view features processed by SoftPool are then used as the Query for the self-attention calculation, which enables subsequent refined extraction. We then feed the attention scores computed from the Query and Key in the self-attention calculation into a mobile inverted bottleneck convolution, which effectively improves the generalization of the network model. Based on our proposed method, a compact 3D global descriptor is finally generated, achieving high-accuracy 3D model classification.
Results: Experimental results show that our method achieves 96.96% OA and 95.68% AA on ModelNet40, and 98.57% OA and 98.42% AA on ModelNet10.
Discussion: Compared with a multitude of popular methods, our model achieves state-of-the-art classification accuracy.
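SoftPool itself (Stergiou et al.) is a well-defined operation: each activation in a pooling window is weighted by its softmax within the window, so strong activations dominate while weak ones still contribute, unlike max pooling. A 1-D sketch (the network above applies the 2-D analogue to view feature maps):

```python
import numpy as np

def softpool_1d(x, kernel=2):
    """SoftPool over non-overlapping 1-D windows: the output of each
    window is the softmax-weighted sum of its activations."""
    x = x.reshape(-1, kernel)                      # (n_windows, kernel)
    w = np.exp(x - x.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)              # softmax per window
    return (w * x).sum(axis=1)

out = softpool_1d(np.array([0.0, 0.0, 1.0, 3.0]))
# The first window [0, 0] pools to exactly 0; the second window [1, 3]
# pools to a value between its mean (2.0) and its max (3.0).
```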
9. Wu Q, Gao K, Mao Y, Li M, Jin X, Xiong J, Yu P. Three-dimensional reconstruction using variable exponential function regularization for wide-field polarization modulation imaging of surface texture of particles. Optics Letters 2021; 46:3998-4001. [PMID: 34388795] [DOI: 10.1364/ol.426395]
Abstract
Shape-from-diffuse-polarization methods realize three-dimensional (3D) reconstruction of an object surface by using the polarization information of diffusely reflected light. However, due to the nonconvexity of the particle surface, the reconstruction often falls into a local optimum. Moreover, the depth image obtained by a scanning electron microscope has severe stripe noise, which distorts the surface texture of the particle. In this Letter, a variable exponential function regularization method is proposed to realize 3D reconstruction despite the nonconvexity of the surface and the inclination of the particles. We focus on the gradient non-integrability caused by the skew and surface undulation of the specimen, and propose an adaptive 3D reconstruction method based on variable exponential function regularization to fit the surface function of the particle. Experimental results from finite-difference time-domain simulations and actual imaging demonstrate the effectiveness of the method.
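The underlying problem — recovering a surface from a non-integrable gradient field — is usually posed as regularized least squares. The 1-D sketch below uses a fixed quadratic smoothness penalty; the paper's contribution is to make the regularization exponent vary spatially, which this simplified version does not attempt:

```python
import numpy as np

def integrate_gradient_1d(g, lam=1e-6):
    """Recover a 1-D height profile z from gradients g by solving
    min ||D z - g||^2 + lam * ||S z||^2, where D is the forward
    difference and S a second-difference smoothness operator.
    Fixed-exponent (p = 2) regularization only; illustrative sketch."""
    n = len(g) + 1
    d = np.zeros((n - 1, n))                 # forward-difference operator
    idx = np.arange(n - 1)
    d[idx, idx] = -1.0
    d[idx, idx + 1] = 1.0
    s = d[:-1, :-1] @ d                      # second differences, (n-2, n)
    a = d.T @ d + lam * (s.T @ s)            # normal equations
    b = d.T @ g
    # The absolute height is unobservable from gradients alone, so fix
    # the gauge by pinning z[0] = 0 (makes the system nonsingular).
    a[0, :] = 0.0
    a[0, 0] = 1.0
    b[0] = 0.0
    return np.linalg.solve(a, b)

# A constant unit gradient integrates to a unit-slope ramp.
z = integrate_gradient_1d(np.ones(4))
```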