1. Dai Y, Feng Y, Ma N, Zhao X, Gao Y. Cross-Modal 3D Shape Retrieval via Heterogeneous Dynamic Graph Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:2370-2387. [PMID: 40030786] [DOI: 10.1109/tpami.2024.3524440]
Abstract
Cross-modal 3D shape retrieval is a crucial and widely applied task in the field of 3D vision. Its goal is to construct retrieval representations capable of measuring the similarity between instances of different 3D modalities. However, existing methods face challenges due to the performance bottlenecks of single-modal representation extractors and the modality gap across 3D modalities. To tackle these issues, we propose a Heterogeneous Dynamic Graph Representation (HDGR) network, which incorporates context-dependent dynamic relations within a heterogeneous framework. By capturing correlations among diverse 3D objects, HDGR overcomes the limitations of ambiguous representations obtained solely from instances. Within the context of varying mini-batches, dynamic graphs are constructed to capture proximal intra-modal relations, and dynamic bipartite graphs represent implicit cross-modal relations, effectively addressing the two challenges above. Subsequently, message passing and aggregation are performed using Dynamic Graph Convolution (DGConv) and Dynamic Bipartite Graph Convolution (DBConv), enhancing features through heterogeneous dynamic relation learning. Finally, intra-modal, cross-modal, and self-transformed features are redistributed and integrated into a heterogeneous dynamic representation for cross-modal 3D shape retrieval. HDGR establishes a stable, context-enhanced, structure-aware 3D shape representation by capturing heterogeneous inter-object relationships and adapting to varying contextual dynamics. Extensive experiments conducted on the ModelNet10, ModelNet40, and real-world ABO datasets demonstrate the state-of-the-art performance of HDGR in cross-modal and intra-modal retrieval tasks. Moreover, under the supervision of robust loss functions, HDGR achieves robust cross-modal retrieval under label noise on the 3D MNIST dataset. The comprehensive experimental results highlight the effectiveness and efficiency of HDGR for cross-modal 3D shape retrieval.
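The core mechanism, graphs rebuilt for every mini-batch rather than fixed over the dataset, can be illustrated compactly. Below is a minimal PyTorch sketch assuming pre-extracted per-instance features; the kNN rule, mean aggregation, and function names are illustrative assumptions, not the authors' DGConv/DBConv implementation.

```python
import torch

def knn_graph(x, k):
    # x: (N, D) features of one modality in the current mini-batch.
    # Each instance is linked to its k nearest neighbours in feature
    # space, so the graph changes with every mini-batch ("dynamic").
    dist = torch.cdist(x, x)                       # (N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices  # position 0 is the self-loop
    return idx[:, 1:]                              # (N, k) neighbour indices

def dynamic_graph_conv(x, k, weight):
    # One intra-modal message-passing step: mean-aggregate neighbour
    # features, then apply a shared linear transform (weight: (D, D_out)).
    neighbours = x[knn_graph(x, k)]                # (N, k, D)
    return torch.relu(neighbours.mean(dim=1) @ weight)

def dynamic_bipartite_conv(xa, xb, k, weight):
    # Cross-modal variant: neighbours of modality-A instances are
    # searched among modality-B features, forming a bipartite graph.
    idx = torch.cdist(xa, xb).topk(k, largest=False).indices  # (Na, k)
    return torch.relu(xb[idx].mean(dim=1) @ weight)
```

Because the neighbourhoods are recomputed from the current batch, the enhanced features adapt to the varying context the abstract refers to.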
2. Sun H, Wang Y, Wang P, Deng H, Cai X, Li D. VSFormer: Mining Correlations in Flexible View Set for Multi-View 3D Shape Understanding. IEEE Transactions on Visualization and Computer Graphics 2025; 31:2127-2141. [PMID: 38526893] [DOI: 10.1109/tvcg.2024.3381152]
Abstract
View-based methods have demonstrated promising performance in 3D shape understanding. However, they tend to make strong assumptions about the relations between views or learn the multi-view correlations indirectly, which limits the flexibility of exploring inter-view correlations and the effectiveness of target tasks. To overcome these problems, this article investigates flexible organization and explicit correlation learning for multiple views. In particular, we propose to incorporate different views of a 3D shape into a permutation-invariant set, referred to as View Set, which removes rigid relation assumptions and facilitates adequate information exchange and fusion among views. Based on that, we devise a nimble Transformer model, named VSFormer, to explicitly capture pairwise and higher-order correlations of all elements in the set. Meanwhile, we theoretically reveal a natural correspondence between the Cartesian product of a view set and the correlation matrix in the attention mechanism, which supports our model design. Comprehensive experiments suggest that VSFormer has better flexibility, higher inference efficiency, and superior performance. Notably, VSFormer reaches state-of-the-art results on various 3D recognition datasets, including ModelNet40, ScanObjectNN, and RGBD. It also establishes new records on the SHREC'17 retrieval benchmark.
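The set-plus-attention idea lends itself to a short sketch. The block below is a hypothetical simplification rather than the published VSFormer: it treats the V view features as an unordered set, and with no positional encoding, self-attention plus mean pooling is permutation-invariant by construction.

```python
import torch
import torch.nn as nn

class ViewSetBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        # No positional encoding: the views form a set, not a sequence.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, views):                    # views: (B, V, dim)
        ctx, _ = self.attn(views, views, views)  # pairwise view correlations
        return self.norm(views + ctx)

views = torch.randn(2, 12, 512)                  # 12 views per shape
desc = ViewSetBlock()(views).mean(dim=1)         # (2, 512) set descriptor
```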
3. Sun K, Zhang J, Xu S, Zhao Z, Zhang C, Liu J, Hu J. CACNN: Capsule Attention Convolutional Neural Networks for 3D Object Recognition. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4091-4102. [PMID: 37934641] [DOI: 10.1109/tnnls.2023.3326606]
Abstract
Recently, view-based approaches, which recognize a 3D object through its projected 2D images, have been extensively studied and have achieved considerable success in 3D object recognition. Nevertheless, most of them use a pooling operation to aggregate view-wise features, which usually leads to visual information loss. To tackle this problem, we propose a novel layer, called the capsule attention layer (CAL), which uses an attention mechanism to fuse the features expressed by capsules. In detail, instead of the dynamic routing algorithm, we use an attention module to transmit information from the lower-level capsules to higher-level capsules, which significantly improves the speed of capsule networks. In particular, the view pooling layer of the multiview convolutional neural network (MVCNN) becomes a special case of our CAL when the trainable weights take certain values. Furthermore, based on CAL, we propose a capsule attention convolutional neural network (CACNN) for 3D object recognition. Extensive experimental results on three benchmark datasets demonstrate the efficiency of our CACNN and show that it outperforms many state-of-the-art methods.
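As a rough sketch of the idea, attention weights replacing routing for capsule fusion, consider the layer below; the scalar scoring head and sizes are assumptions, not the published CAL.

```python
import torch
import torch.nn as nn

class CapsuleAttentionLayer(nn.Module):
    """Fuse V view-wise capsules with learned attention instead of
    dynamic routing; a minimal sketch, not the published CAL."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one scalar score per capsule

    def forward(self, caps):             # caps: (B, V, dim)
        w = torch.softmax(self.score(caps), dim=1)  # (B, V, 1) attention
        return (w * caps).sum(dim=1)     # weighted fusion -> (B, dim)
```

When the weights saturate towards one-hot values, the fusion degenerates to hard selection of a single view, which illustrates how a pooling layer can arise as a special case of the attention weights.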
4. Zhou W, Zheng F, Zhao Y, Pang Y, Yi J. MSDCNN: A multiscale dilated convolution neural network for fine-grained 3D shape classification. Neural Netw 2024; 172:106141. [PMID: 38301340] [DOI: 10.1016/j.neunet.2024.106141]
Abstract
Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multi-view data often lack content information and spatial relationships, which makes it difficult to identify the small variance among subcategories of the same category. To solve this problem, this paper proposes a novel multiscale dilated convolution neural network, termed MSDCNN, for multi-view fine-grained 3D shape classification. First, a sequence of views is rendered from 12 viewpoints around the input 3D shape by the sequential view capturing module. Then, the first 22 convolution layers of ResNeXt50 are employed to extract the semantic features of each view, and a global mixed feature map is obtained through an element-wise maximum operation over the 12 output feature maps. Furthermore, an attention dilated module (ADM), which combines four concatenated attention dilated blocks (ADBs), is designed to extract larger receptive-field features from the global mixed feature map and enhance the context information among views. Specifically, each ADB consists of an attention mechanism module and a dilated convolution with a different dilation rate. In addition, a prediction module with label smoothing, containing a 3 × 3 convolution and adaptive average pooling, is proposed to classify the features. The performance of our method is validated experimentally on the ModelNet10, ModelNet40, and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for fine-grained 3D shape classification.
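Two of the steps above are easy to make concrete: the element-wise maximum that builds the global mixed feature map, and an ADB combining attention with a dilated convolution. The sketch below is an assumption-laden illustration (the SE-style channel attention and layer sizes are guesses), not the published architecture.

```python
import torch
import torch.nn as nn

def global_mixed_map(view_maps):
    # view_maps: list of 12 (B, C, H, W) tensors from the shared backbone;
    # the element-wise maximum gives the global mixed feature map.
    return torch.stack(view_maps, dim=0).max(dim=0).values

class AttentionDilatedBlock(nn.Module):
    def __init__(self, c, dilation):
        super().__init__()
        # Channel attention (an assumption) followed by a dilated 3x3 conv
        # that enlarges the receptive field without extra parameters.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.conv = nn.Conv2d(c, c, 3, padding=dilation, dilation=dilation)

    def forward(self, x):
        return torch.relu(self.conv(x * self.attn(x)))
```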
Affiliation(s)
- Wei Zhou: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Fujian Zheng: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China; College of Optoelectronic Engineering, Chongqing University, Chongqing 400030, PR China.
- Yiheng Zhao: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Yiran Pang: Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, FL 33431, United States of America.
- Jun Yi: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
5. Xiang P, Wen X, Liu YS, Cao YP, Wan P, Zheng W, Han Z. Snowflake Point Deconvolution for Point Cloud Completion and Generation With Skip-Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:6320-6338. [PMID: 36282830] [DOI: 10.1109/tpami.2022.3217161]
Abstract
Most existing point cloud completion methods suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it difficult to reveal fine local geometric details. To resolve this issue, we propose SnowflakeNet with snowflake point deconvolution (SPD) to generate complete point clouds. SPD models the generation of point clouds as the snowflake-like growth of points, where child points are generated progressively by splitting their parent points after each SPD. Our insight into the detailed geometry is to introduce a skip-transformer in the SPD to learn the point splitting patterns that best fit the local regions. The skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer and produce the splitting in the current layer. The locally compact and structured point clouds generated by SPD precisely reveal the structural characteristics of the 3D shape in local patches, which enables us to predict highly detailed geometries. Moreover, since SPD is a general operation that is not limited to completion, we explore its applications in other generative tasks, including point cloud auto-encoding, generation, single-image reconstruction, and upsampling. Our experimental results show that SnowflakeNet outperforms state-of-the-art methods on widely used benchmarks.
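The snowflake-like growth reduces to one operation: each parent point spawns r children displaced by predicted offsets. The sketch below captures that splitting step; the offset MLP is a stand-in for the paper's skip-transformer-conditioned deconvolution, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class SplitStep(nn.Module):
    """One SPD-style splitting step (a sketch): every parent point
    produces r children by adding predicted offsets."""
    def __init__(self, dim=128, r=2):
        super().__init__()
        self.r = r
        self.offset = nn.Sequential(
            nn.Linear(dim + 3, dim), nn.ReLU(), nn.Linear(dim, 3 * r))

    def forward(self, xyz, feat):        # xyz: (B, N, 3), feat: (B, N, dim)
        delta = self.offset(torch.cat([feat, xyz], dim=-1))  # (B, N, 3r)
        children = xyz.unsqueeze(2) + delta.view(*xyz.shape[:2], self.r, 3)
        return children.reshape(xyz.shape[0], -1, 3)         # (B, N*r, 3)
```

Stacking such steps multiplies the point count each time, which is why the same operation serves completion, generation, and upsampling.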
6. Wen X, Xiang P, Han Z, Cao YP, Wan P, Zheng W, Liu YS. PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-Step Point Moving Paths. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:852-867. [PMID: 35290184] [DOI: 10.1109/tpami.2022.3159003]
Abstract
Point cloud completion concerns predicting the missing parts of incomplete 3D shapes. A common strategy is to generate the complete shape from the incomplete input. However, the unordered nature of point clouds degrades the generation of high-quality 3D shapes, as the detailed topology and structure of unordered points are hard to capture during the generative process using an extracted latent code. We address this problem by formulating completion as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net++, to mimic the behavior of an earth mover. It moves each point of the incomplete input to obtain a complete point cloud, where the total distance of the point moving paths (PMPs) should be shortest. Therefore, PMP-Net++ predicts a unique PMP for each point according to the constraint of point moving distances. The network learns a strict and unique point-level correspondence, and thus improves the quality of the predicted complete shape. Moreover, since moving points relies heavily on the per-point features learned by the network, we further introduce a transformer-enhanced representation learning network, which significantly improves the completion performance of PMP-Net++. We conduct comprehensive experiments on shape completion and further explore its application to point cloud up-sampling, demonstrating non-trivial improvements of PMP-Net++ over state-of-the-art point cloud completion and up-sampling methods.
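The deformation view of completion can be sketched as a displacement network trained with a matching term plus a path-length penalty, so each point travels the shortest route. Chamfer matching, the toy MLP, and the 0.1 weighting below are illustrative assumptions, not the published PMP-Net++.

```python
import torch
import torch.nn as nn

def chamfer(a, b):                        # a: (B, N, 3), b: (B, M, 3)
    d = torch.cdist(a, b)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

class PointMover(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, partial):           # (B, N, 3) incomplete cloud
        move = self.mlp(partial)          # one moving path per point
        return partial + move, move

model = PointMover()
moved, move = model(torch.randn(2, 1024, 3))
target = torch.randn(2, 1024, 3)          # complete ground truth
loss = chamfer(moved, target) + 0.1 * move.norm(dim=-1).mean()  # short paths
```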
7. Wang W, Wang X, Chen G, Zhou H. Multi-view SoftPool attention convolutional networks for 3D model classification. Front Neurorobot 2022; 16:1029968. [DOI: 10.3389/fnbot.2022.1029968]
Abstract
Introduction: Existing multi-view-based 3D model classification methods suffer from insufficient extraction of refined view features and poor generalization of the network model, which makes it difficult to further improve classification accuracy. To this end, this paper proposes a multi-view SoftPool attention convolutional network for 3D model classification tasks.
Methods: This method extracts multi-view features through ResNest and adaptive pooling modules, and the extracted features can better represent 3D models. The multi-view features processed by SoftPool are then used as the Query for the self-attention calculation, which enables subsequent refinement extraction. The attention scores calculated from Query and Key in the self-attention calculation are fed into a mobile inverted bottleneck convolution, which effectively improves the generalization of the network model. Based on the proposed method, a compact 3D global descriptor is finally generated, achieving high-accuracy 3D model classification.
Results: Experimental results show that our method achieves 96.96% OA and 95.68% AA on ModelNet40, and 98.57% OA and 98.42% AA on ModelNet10.
Discussion: Compared with a multitude of popular methods, our model achieves state-of-the-art classification accuracy.
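The role of SoftPool here, producing the Query for the self-attention step, can be shown in a few lines; the scaling and shapes are assumptions, not the paper's exact computation.

```python
import torch

def softpool(x):
    # Exponentially weighted average over the view axis: strong
    # activations dominate, but no view is discarded outright.
    w = torch.softmax(x, dim=1)               # x: (B, V, D)
    return (w * x).sum(dim=1, keepdim=True)   # (B, 1, D)

def refine(x):
    q = softpool(x)                            # pooled result as the Query
    scores = torch.softmax(q @ x.transpose(1, 2) / x.shape[-1] ** 0.5, dim=-1)
    return (scores @ x).squeeze(1)             # (B, D) refined descriptor
```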
8. Multi-Modal 3D Shape Clustering with Dual Contrastive Learning. Applied Sciences 2022. [DOI: 10.3390/app12157384]
Abstract
3D shape clustering is developing into an important research subject with the wide application of 3D shapes in the computer vision and multimedia fields. Since 3D shapes generally take on various modalities, how to comprehensively exploit multi-modal properties to boost clustering performance has become a key issue for the 3D shape clustering task. Taking into account the advantages of multiple views and point clouds, this paper proposes the first multi-modal 3D shape clustering method, named the dual contrastive learning network (DCL-Net), to discover the clustering partitions of unlabeled 3D shapes. First, by simultaneously performing cross-view contrastive learning within the multi-view modality and cross-modal contrastive learning between the point cloud and multi-view modalities in the representation space, a representation-level dual contrastive learning module is developed, which aims to capture discriminative 3D shape features for clustering. Meanwhile, an assignment-level dual contrastive learning module is designed by further ensuring the consistency of clustering assignments within the multi-view modality, as well as between the point cloud and multi-view modalities, thus obtaining more compact clustering partitions. Experiments on two commonly used 3D shape benchmarks demonstrate the effectiveness of the proposed DCL-Net.
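The representation-level cross-modal term is essentially an InfoNCE-style loss in which the two embeddings of the same shape form the positive pair. A minimal sketch, assuming (B, D) embeddings from a point-cloud branch and a multi-view branch; the temperature and naming are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_modal_nce(z_pc, z_mv, tau=0.1):
    # z_pc, z_mv: (B, D) embeddings of the same B shapes in two modalities.
    z_pc, z_mv = F.normalize(z_pc, dim=1), F.normalize(z_mv, dim=1)
    logits = z_pc @ z_mv.t() / tau            # (B, B) cross-modal similarities
    labels = torch.arange(z_pc.shape[0], device=z_pc.device)
    return F.cross_entropy(logits, labels)    # diagonal = positive pairs
```

The assignment-level module applies the same contrast to cluster-assignment vectors instead of representations.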
9. View-relation constrained global representation learning for multi-view-based 3D object recognition. Appl Intell 2022. [DOI: 10.1007/s10489-022-03949-8]
10. Principal views selection based on growing graph convolution network for multi-view 3D model recognition. Appl Intell 2022. [DOI: 10.1007/s10489-022-03775-y]
11. Multi-view 3D object retrieval leveraging the aggregation of view and instance attentive features. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108754]
12. Han XF, Huang XY, Sun SJ, Wang MJ. 3DDACNN: 3D dense attention convolutional neural network for point cloud based object recognition. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10165-w]
13. Xu Y, Zheng C, Xu R, Quan Y, Ling H. Multi-View 3D Shape Recognition via Correspondence-Aware Deep Learning. IEEE Transactions on Image Processing 2021; 30:5299-5312. [PMID: 34038361] [DOI: 10.1109/tip.2021.3082310]
Abstract
In recent years, multi-view learning has emerged as a promising approach for 3D shape recognition, which identifies a 3D shape based on its 2D views taken from different viewpoints. Usually, the correspondences inside a view or across different views encode the spatial arrangement of object parts and the symmetry of the object, which provide useful geometric cues for recognition. However, such view correspondences have not been explicitly and fully exploited in existing work. In this paper, we propose a correspondence-aware representation (CAR) module, which explicitly finds potential intra-view and cross-view correspondences via kNN search in semantic space and then aggregates the shape features from the correspondences via learned transforms. In particular, the spatial relations of correspondences, in terms of their viewpoint positions and intra-view locations, are taken into account for learning correspondence-aware features. Incorporating the CAR module into a ResNet-18 backbone, we propose an effective deep model called CAR-Net for 3D shape classification and retrieval. Extensive experiments have demonstrated the effectiveness of the CAR module as well as the excellent performance of CAR-Net.
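The correspondence search at the heart of the CAR module amounts to a kNN query in semantic space over all view locations at once. A sketch under stated assumptions (flattened view features, Euclidean distance); the learned aggregation transforms and viewpoint-position encoding are omitted.

```python
import torch

def semantic_knn(feats, k):
    # feats: (B, L, D) features of all locations across all views
    # (L = V*H*W), so one query finds intra-view and cross-view
    # correspondences alike.
    dist = torch.cdist(feats, feats)                  # (B, L, L)
    idx = dist.topk(k + 1, largest=False).indices     # self-match at slot 0
    return idx[..., 1:]                               # (B, L, k)
```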
14. Nie W, Zhao Y, Song D, Gao Y. DAN: Deep-Attention Network for 3D Shape Recognition. IEEE Transactions on Image Processing 2021; 30:4371-4383. [PMID: 33848247] [DOI: 10.1109/tip.2021.3071687]
Abstract
Due to its wide application in a rapidly increasing number of different fields, 3D shape recognition has become a hot topic in computer vision. Many approaches have been proposed in recent years, but huge challenges remain in two aspects: exploring effective representations of 3D shapes and reducing the redundant complexity of 3D shapes. In this paper, we propose a novel deep-attention network (DAN) for 3D shape representation based on multi-view information. More specifically, we introduce the attention mechanism to construct a deep multi-attention network with advantages in two aspects: 1) information selection, in which DAN utilizes self-attention to update the feature vector of each view, effectively reducing redundant information, and 2) information fusion, in which DAN applies an attention mechanism that preserves more effective information by considering the correlations among views. Meanwhile, the deep network structure fully considers these correlations to continuously fuse effective information. To validate the effectiveness of the proposed method, we conduct experiments on the public 3D shape datasets ModelNet40, ModelNet10, and ShapeNetCore55. Experimental results and comparisons with state-of-the-art methods demonstrate the superiority of our proposed method. Code is released at https://github.com/RiDang/DANN.
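The "deep" part of the design, stacking attention blocks so view vectors are repeatedly updated before fusion, can be sketched as follows; depth, dimensions, and the mean fusion at the end are assumptions (the paper uses attention for fusion as well).

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, v):                 # v: (B, V, dim) view features
        upd, _ = self.attn(v, v, v)       # information selection
        return self.norm(v + upd)

# Stacking blocks realises the deep multi-attention structure.
deep_attn = nn.Sequential(*[AttentionBlock() for _ in range(3)])
shape_desc = deep_attn(torch.randn(2, 20, 256)).mean(dim=1)   # (2, 256)
```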
15. Cheng S, Chen X, He X, Liu Z, Bai X. PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis. IEEE Transactions on Image Processing 2021; 30:4436-4448. [PMID: 33856993] [DOI: 10.1109/tip.2021.3072214]
Abstract
Learning intra-region contexts and inter-region relations are two effective strategies to strengthen feature representations for point cloud analysis. However, unifying the two strategies for point cloud representation is not fully emphasized in existing methods. To this end, we propose a novel framework named Point Relation-Aware Network (PRA-Net), which is composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module. The ISL module can dynamically integrate the local structural information into the point features, while the IRL module captures inter-region relations adaptively and efficiently via a differentiable region partition scheme and a representative point-based strategy. Extensive experiments on several 3D benchmarks covering shape classification, keypoint estimation, and part segmentation have verified the effectiveness and the generalization ability of PRA-Net. Code will be available at https://github.com/XiwuChen/PRA-Net.
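A rough sketch of the inter-region idea: partition the cloud into regions, summarise each region with a representative feature, and let regions exchange information through their affinities. The axis-quantised partition below is a crude stand-in for the paper's differentiable partition scheme, and the max-pooled representative is an assumption; empty regions simply contribute zeros.

```python
import torch

def inter_region_relations(points, feats, num_regions=8):
    # points: (B, N, 3) with coordinates in [-1, 1]; feats: (B, N, D).
    B, N, D = feats.shape
    bins = ((points[..., 0] + 1) / 2 * num_regions).long()
    bins = bins.clamp(0, num_regions - 1)              # region id per point
    reps = feats.new_zeros(B, num_regions, D)
    for r in range(num_regions):
        mask = (bins == r).unsqueeze(-1)               # points of region r
        reps[:, r] = (feats * mask).max(dim=1).values  # representative point
    rel = torch.softmax(reps @ reps.transpose(1, 2), dim=-1)  # affinities
    return rel @ reps                                  # (B, R, D) updated
```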
16. INCO-GAN: Variable-Length Music Generation Method Based on Inception Model-Based Conditional GAN. Mathematics 2021. [DOI: 10.3390/math9040387]
Abstract
Deep learning has made significant progress in the field of automatic music generation. At present, research on music generation via deep learning can be divided into two categories: predictive models and generative models. However, both categories share problems that need to be resolved. First, the length of the music must be determined artificially prior to generation. Second, although the convolutional neural network (CNN) is unexpectedly superior to the recurrent neural network (RNN), CNN still has several disadvantages. This paper proposes a conditional generative adversarial network approach using an inception model (INCO-GAN), which enables the automatic generation of complete, variable-length music. By adding a time distribution layer that considers sequential data, the CNN considers the time relationship in a manner similar to an RNN. In addition, the inception model obtains richer features, which improves the quality of the generated music. In the experiments conducted, the music generated by the proposed method was compared with that created by human composers. A cosine similarity of up to 0.987 was achieved between the frequency vectors, indicating that the music generated by the proposed method is very similar to that created by a human composer.
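The "time distribution layer" can be read as the familiar wrapper that applies one 2D operation independently at every timestep, letting a CNN consume sequential data the way an RNN would. A sketch, with all shapes assumed:

```python
import torch
import torch.nn as nn

class TimeDistributed(nn.Module):
    def __init__(self, module):
        super().__init__()
        self.module = module             # same weights at every timestep

    def forward(self, x):                # x: (B, T, C, H, W)
        B, T = x.shape[:2]
        y = self.module(x.reshape(B * T, *x.shape[2:]))
        return y.reshape(B, T, *y.shape[1:])

layer = TimeDistributed(nn.Conv2d(1, 16, 3, padding=1))
out = layer(torch.randn(4, 32, 1, 128, 16))   # -> (4, 32, 16, 128, 16)
```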
17. Liu AA, Zhou H, Nie W, Liu Z, Liu W, Xie H, Mao Z, Li X, Song D. Hierarchical multi-view context modelling for 3D object classification and retrieval. Inf Sci 2021. [DOI: 10.1016/j.ins.2020.09.057]
18. Liu X, Han Z, Liu YS, Zwicker M. Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention. IEEE Transactions on Image Processing 2021; 30:1744-1758. [PMID: 33417547] [DOI: 10.1109/tip.2020.3048623]
Abstract
Fine-grained 3D shape classification is important for shape understanding and analysis and poses a challenging research problem. However, fine-grained 3D shape classification has rarely been explored, due to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset (named the FG3D dataset) with fine-grained class labels, which consists of three categories: airplane, car, and chair. Each category consists of several subcategories at a fine-grained level. According to our experiments on this fine-grained dataset, we find that state-of-the-art methods are significantly limited by the small variance among subcategories of the same category. To resolve this problem, we further propose a novel fine-grained 3D shape classification method, named FG3D-Net, to capture the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect the generally semantic parts inside multiple views under the benchmark of generally semantic part detection. Then, we design a hierarchical part-view attention aggregation module to learn a global shape representation by aggregating generally semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of our features: part-level attention highlights the important parts in each view, while view-level attention highlights the discriminative views among all the views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods. The FG3D dataset is available at https://github.com/liuxinhai/FG3D-Net.
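The two-level aggregation reads directly as nested attention-weighted sums: part-level attention pools the detected parts inside each view, and view-level attention pools the resulting view features. A sketch with assumed learned score vectors wp and wv:

```python
import torch

def part_view_attention(part_feats, wp, wv):
    # part_feats: (B, V, P, D) features of P detected parts in V views;
    # wp, wv: (D, 1) learned scoring vectors (assumptions).
    pa = torch.softmax(part_feats @ wp, dim=2)    # important parts per view
    view_feats = (pa * part_feats).sum(dim=2)     # (B, V, D)
    va = torch.softmax(view_feats @ wv, dim=1)    # discriminative views
    return (va * view_feats).sum(dim=1)           # (B, D) global shape
```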
19. Sun K, Zhang J, Liu J, Yu R, Song Z. DRCNN: Dynamic Routing Convolutional Neural Network for Multi-View 3D Object Recognition. IEEE Transactions on Image Processing 2020; 30:868-877. [PMID: 33237859] [DOI: 10.1109/tip.2020.3039378]
Abstract
3D object recognition is one of the most important tasks in 3D data processing and has been extensively studied recently. Researchers have proposed various 3D recognition methods based on deep learning, among which view-based approaches form a typical class. However, in view-based methods, the view pooling layer commonly used to fuse multi-view features causes a loss of visual information. To alleviate this problem, in this paper we construct a novel layer, called the Dynamic Routing Layer (DRL), by modifying the dynamic routing algorithm of the capsule network to fuse the features of each view more effectively. Concretely, in DRL we use rearrangement and affine transformation to convert features, then leverage the modified dynamic routing algorithm to adaptively choose among the converted features, instead of ignoring all but the most active feature as the view pooling layer does. We also show that the view pooling layer is a special case of our DRL. In addition, based on DRL, we present a Dynamic Routing Convolutional Neural Network (DRCNN) for multi-view 3D object recognition. Our experiments on three 3D benchmark datasets show that the proposed DRCNN outperforms many state-of-the-art methods, which demonstrates the efficacy of our approach.
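Routing-by-agreement over converted view features is the crux of DRL; the sketch below shows the textbook routing loop it modifies, not the modified algorithm itself. If the coupling coefficients were forced to one-hot, only the single most active feature would survive, which mirrors the view-pooling special case noted above.

```python
import torch

def route_views(u, iters=3):
    # u: (B, V, D) affine-converted view features.
    b = torch.zeros(u.shape[:2], device=u.device)   # routing logits
    for _ in range(iters):
        c = torch.softmax(b, dim=1).unsqueeze(-1)   # coupling coefficients
        s = (c * u).sum(dim=1)                      # fused feature (B, D)
        b = b + (u * s.unsqueeze(1)).sum(dim=-1)    # agreement update
    return s
```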
20. Wen X, Han Z, Liu X, Liu YS. Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds using Spatial-aware Capsules. IEEE Transactions on Image Processing 2020; PP:8855-8869. [PMID: 32894715] [DOI: 10.1109/tip.2020.3019925]
Abstract
Learning discriminative shape representations directly on point clouds is still challenging in 3D shape analysis and understanding. Recent studies usually involve three steps: first splitting a point cloud into some local regions, then extracting the corresponding feature of each local region, and finally aggregating all individual local-region features into a global feature as the shape representation using simple max-pooling. However, such pooling-based feature aggregation methods do not adequately take the spatial relationships (e.g., the relative locations to other regions) between local regions into account, which greatly limits the ability to learn discriminative shape representations. To address this issue, we propose a novel deep learning network, named Point2SpatialCapsule, for aggregating features and spatial relationships of local regions on point clouds, which aims to learn more discriminative shape representations. Compared with traditional max-pooling based feature aggregation networks, Point2SpatialCapsule can explicitly learn not only the geometric features of local regions but also the spatial relationships among them. Point2SpatialCapsule consists of two main modules. To resolve the disorder problem of local regions, the first module, named geometric feature aggregation, is designed to aggregate the local-region features into learnable cluster centers, which explicitly encodes the spatial locations from the original 3D space. The second module, named spatial relationship aggregation, is proposed for further aggregating the clustered features and the spatial relationships among them in the feature space using the spatial-aware capsules developed in this paper. Compared to previous capsule-network-based methods, the feature routing on the spatial-aware capsules can learn more discriminative spatial relationships among local regions for point clouds, which establishes a direct mapping between log priors and the spatial locations through feature clusters. Experimental results demonstrate that Point2SpatialCapsule outperforms the state-of-the-art methods in 3D shape classification, retrieval, and segmentation tasks on the well-known ModelNet and ShapeNet datasets.
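The geometric feature aggregation step, soft assignment of local-region features to learnable cluster centres, resembles a NetVLAD-style residual encoding; the sketch below makes that reading concrete and is an assumption, not the published module.

```python
import torch
import torch.nn as nn

class ClusterAggregation(nn.Module):
    def __init__(self, dim=128, num_clusters=16):
        super().__init__()
        self.centres = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, feats):                 # feats: (B, N, dim)
        assign = torch.softmax(feats @ self.centres.t(), dim=-1)  # (B, N, K)
        res = feats.unsqueeze(2) - self.centres   # residual to each centre
        return (assign.unsqueeze(-1) * res).sum(dim=1)   # (B, K, dim)
```

Unlike max-pooling, the output keeps one slot per cluster centre, so relative-location information survives into the capsule stage.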
21. Han Z, Ma B, Liu YS, Zwicker M. Reconstructing 3D Shapes from Multiple Sketches using Direct Shape Optimization. IEEE Transactions on Image Processing 2020; PP:8721-8734. [PMID: 32870791] [DOI: 10.1109/tip.2020.3018865]
Abstract
3D shape reconstruction from multiple hand-drawn sketches is an intriguing approach to 3D shape modeling. Currently, state-of-the-art methods employ neural networks to learn a mapping from multiple sketches taken from arbitrary view angles to a 3D voxel grid. Because of the cubic complexity of 3D voxel grids, however, such neural networks are hard to train and limited to low-resolution reconstructions, which leads to a lack of geometric detail and low accuracy. To resolve this issue, we propose to reconstruct 3D shapes from multiple sketches using direct shape optimization (DSO), which does not involve deep learning models for direct voxel-based 3D shape generation. Specifically, we first leverage a conditional generative adversarial network (CGAN) to translate each sketch into an attenuance image that captures the predicted geometry from a given viewpoint. Then, DSO minimizes a project-and-compare loss to reconstruct the 3D shape such that it matches the predicted attenuance images from the view angles of all input sketches. Based on this, we further propose a progressive update approach to handle inconsistencies among a few hand-drawn sketches of the same 3D shape. Our experimental results show that our method significantly outperforms state-of-the-art methods under widely used benchmarks and produces intuitive results in an interactive application.
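Because DSO optimises the voxels directly, the whole method hinges on a differentiable project-and-compare loss. The sketch below uses axis-aligned orthographic projections for brevity; real sketches come from arbitrary view angles, so a rotation would precede each projection, and the soft ray model is an assumption.

```python
import torch

def project_and_compare(voxels, attenuance_imgs):
    # voxels: (R, R, R) occupancy in [0, 1];
    # attenuance_imgs: three (R, R) CGAN-predicted target images.
    loss = 0.0
    for axis, img in enumerate(attenuance_imgs):
        proj = 1 - (1 - voxels).prod(dim=axis)  # soft ray-hit probability
        loss = loss + ((proj - img) ** 2).mean()
    return loss

vox = torch.rand(32, 32, 32, requires_grad=True)
loss = project_and_compare(vox, [torch.rand(32, 32) for _ in range(3)])
loss.backward()    # gradients flow straight into the voxel grid
```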
22. Gong B, Yan C, Bai J, Zou C, Gao Y. Hamming Embedding Sensitivity Guided Fusion Network for 3D Shape Representation. IEEE Transactions on Image Processing 2020; PP:8381-8390. [PMID: 32755857] [DOI: 10.1109/tip.2020.3013138]
Abstract
Three-dimensional multi-modal data are used to represent 3D objects in the real world in different ways. Features separately extracted from multi-modality data are often poorly correlated. Recent solutions that leverage the attention mechanism to learn a joint network for the fusion of multi-modality features have weak generalization capability. In this paper, we propose a Hamming embedding sensitivity network to address the problem of effectively fusing multi-modality features. The proposed network, called HamNet, is the first end-to-end framework with the capacity to theoretically integrate data from all modalities in a unified architecture for 3D shape representation, which can be used for 3D shape retrieval and recognition. HamNet uses a feature concealment module to achieve effective deep feature fusion. The basic idea of the concealment module is to re-weight the features from each modality at an early stage with the Hamming embeddings of these modalities. The Hamming embedding also provides an effective solution for fast retrieval tasks on large-scale datasets. We have evaluated the proposed method on the large-scale ModelNet40 dataset for the tasks of 3D shape classification and single-modality and cross-modality retrieval. Comprehensive experiments and comparisons with state-of-the-art methods demonstrate that the proposed approach can achieve superior performance.
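One way to picture the concealment module: each modality is mapped to a near-binary (Hamming-style) code, and that code gates the other modality's features before fusion. This is a speculative sketch of the idea, with the gating rule and layer sizes assumed.

```python
import torch
import torch.nn as nn

class FeatureConcealment(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.code_a = nn.Linear(dim, dim)   # per-modality code heads
        self.code_b = nn.Linear(dim, dim)

    def forward(self, fa, fb):              # (B, D) features, two modalities
        ha = torch.tanh(self.code_a(fa))    # approximately binary embedding
        hb = torch.tanh(self.code_b(fb))
        fa = fa * torch.sigmoid(hb)         # re-weight each modality early,
        fb = fb * torch.sigmoid(ha)         # guided by the other's code
        return torch.cat([fa, fb], dim=1)   # fused representation
```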
24. A New Volumetric CNN for 3D Object Classification Based on Joint Multiscale Feature and Subvolume Supervised Learning Approaches. Computational Intelligence and Neuroscience 2020. [DOI: 10.1155/2020/5851465]
Abstract
The advancement of low-cost RGB-D and LiDAR three-dimensional (3D) sensors has made it easier to obtain 3D models in real time. However, extracting intricate 3D features is crucial for the advancement of 3D object classification. Existing volumetric voxel-based CNN approaches have achieved remarkable progress, but they incur huge computational overhead, which limits the extraction of global features at higher resolutions of 3D objects. In this paper, a low-cost 3D volumetric deep convolutional neural network is proposed for 3D object classification, based on joint multiscale hierarchical and subvolume supervised learning strategies. The proposed network takes 3D data preprocessed into a memory-efficient octree representation, and we limit the full-layer octree depth to a certain level, based on the predefined input volume resolution, to store high-precision contour features. Multiscale features from multiple octree depths are concatenated inside the network, aiming to adaptively generate high-level global features. The strategy of the subvolume supervision approach is to train the network on subparts of the 3D object in order to learn local features. Our framework has been evaluated on two publicly available 3D repositories. Experimental results demonstrate the effectiveness of the proposed method: classification accuracy is improved in comparison to existing volumetric approaches, while the memory consumption ratio and runtime are significantly reduced.
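The multiscale concatenation is the easiest piece to make concrete: features taken from several octree depths live at different resolutions, so they are resampled to a common grid and stacked along channels. Plain dense tensors stand in for octree nodes in this sketch, so it is an illustration of the fusion pattern only.

```python
import torch
import torch.nn.functional as F

def multiscale_concat(level_feats, target_size):
    # level_feats: list of (B, C, D, H, W) maps from different octree
    # depths; resample to one resolution, then concatenate channels.
    resized = [F.interpolate(f, size=target_size, mode='nearest')
               for f in level_feats]
    return torch.cat(resized, dim=1)

feats = [torch.randn(1, 8, 4, 4, 4), torch.randn(1, 8, 8, 8, 8)]
fused = multiscale_concat(feats, (16, 16, 16))    # (1, 16, 16, 16, 16)
```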
25. Li N, Li Q, Liu YS, Lu W, Wang W. BIMSeek++: Retrieving BIM components using similarity measurement of attributes. Comput Ind 2020. [DOI: 10.1016/j.compind.2020.103186]