1. Sun H, Wang Y, Wang P, Deng H, Cai X, Li D. VSFormer: Mining Correlations in Flexible View Set for Multi-View 3D Shape Understanding. IEEE Transactions on Visualization and Computer Graphics 2025; 31:2127-2141. PMID: 38526893. DOI: 10.1109/tvcg.2024.3381152.
Abstract
View-based methods have demonstrated promising performance in 3D shape understanding. However, they tend to make strong assumptions about the relations between views or learn multi-view correlations indirectly, which limits the flexibility of exploring inter-view correlations and the effectiveness of target tasks. To overcome these problems, this article investigates flexible organization and explicit correlation learning for multiple views. In particular, we propose to incorporate different views of a 3D shape into a permutation-invariant set, referred to as View Set, which removes rigid relation assumptions and facilitates adequate information exchange and fusion among views. Based on that, we devise a nimble Transformer model, named VSFormer, to explicitly capture pairwise and higher-order correlations of all elements in the set. Meanwhile, we theoretically reveal a natural correspondence between the Cartesian product of a view set and the correlation matrix in the attention mechanism, which supports our model design. Comprehensive experiments suggest that VSFormer has better flexibility, efficient inference and superior performance. Notably, VSFormer reaches state-of-the-art results on various 3D recognition datasets, including ModelNet40, ScanObjectNN and RGBD. It also establishes new records on the SHREC'17 retrieval benchmark.
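To make the set-plus-attention idea concrete, here is a minimal sketch (our own illustration, not the authors' code; the dimensions and the mean-pooling head are assumed) of a permutation-invariant Transformer over a set of view features:
```python
# Minimal "view set" sketch: no positional encoding, so self-attention treats the
# views as an unordered set; mean pooling keeps the output permutation-invariant.
import torch
import torch.nn as nn

class ViewSetEncoder(nn.Module):
    def __init__(self, dim=512, heads=8, layers=2, num_classes=40):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, view_feats):            # (B, V, dim), V views in any order
        fused = self.encoder(view_feats)      # attention mixes all view pairs
        pooled = fused.mean(dim=1)            # set-mean keeps permutation invariance
        return self.head(pooled)

feats = torch.randn(2, 12, 512)               # e.g. 12 rendered views per shape
print(ViewSetEncoder()(feats).shape)           # torch.Size([2, 40])
```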
2. Sun K, Zhang J, Xu S, Zhao Z, Zhang C, Liu J, Hu J. CACNN: Capsule Attention Convolutional Neural Networks for 3D Object Recognition. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4091-4102. PMID: 37934641. DOI: 10.1109/tnnls.2023.3326606.
Abstract
Recently, view-based approaches, which recognize a 3D object through its projected 2D images, have been extensively studied and have achieved considerable success in 3D object recognition. Nevertheless, most of them use a pooling operation to aggregate view-wise features, which usually leads to a loss of visual information. To tackle this problem, we propose a novel layer called the capsule attention layer (CAL), which uses an attention mechanism to fuse the features expressed by capsules. In detail, instead of the dynamic routing algorithm, we use an attention module to transmit information from lower-level capsules to higher-level capsules, which markedly improves the speed of capsule networks. In particular, the view pooling layer of the multi-view convolutional neural network (MVCNN) becomes a special case of our CAL when the trainable weights take certain values. Furthermore, based on CAL, we propose a capsule attention convolutional neural network (CACNN) for 3D object recognition. Extensive experimental results on three benchmark datasets demonstrate the efficiency of our CACNN and show that it outperforms many state-of-the-art methods.
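A hedged sketch of attention-style view fusion in this spirit (an assumed simplification, not the paper's exact layer): view features are mixed with learned attention weights, and a weight vector that is one-hot on the strongest view would recover MVCNN-style view pooling, mirroring the special-case claim above:
```python
# Attention-weighted fusion of per-view features instead of max-pooling.
import torch
import torch.nn as nn

class AttentionViewFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.score = nn.Linear(dim, 1)        # one scalar score per view

    def forward(self, view_feats):            # (B, V, dim)
        w = torch.softmax(self.score(view_feats), dim=1)   # (B, V, 1)
        return (w * view_feats).sum(dim=1)    # weighted fusion over the views

out = AttentionViewFusion()(torch.randn(4, 12, 512))
print(out.shape)                               # torch.Size([4, 512])
```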
3. Zhou W, Zheng F, Zhao Y, Pang Y, Yi J. MSDCNN: A multiscale dilated convolution neural network for fine-grained 3D shape classification. Neural Netw 2024; 172:106141. PMID: 38301340. DOI: 10.1016/j.neunet.2024.106141.
Abstract
Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multi-view data often lack content information and spatial relationships, which makes it difficult to identify the small variance among subcategories of the same category. To solve this problem, this paper proposes a novel multiscale dilated convolution neural network, termed MSDCNN, for multi-view fine-grained 3D shape classification. First, a sequence of views is rendered from 12 viewpoints around the input 3D shape by the sequential view-capturing module. Then, the first 22 convolution layers of ResNeXt50 are employed to extract the semantic features of each view, and a global mixed feature map is obtained through an element-wise maximum operation over the 12 output feature maps. Furthermore, an attention dilated module (ADM), which combines four concatenated attention dilated blocks (ADBs), is designed to extract larger receptive-field features from the global mixed feature map and enhance the context information among views. Specifically, each ADB consists of an attention mechanism module and a dilated convolution with a distinct dilation rate. In addition, a prediction module with label smoothing is proposed to classify the features; it contains a 3 × 3 convolution and adaptive average pooling. The performance of our method is validated experimentally on the ModelNet10, ModelNet40 and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for fine-grained 3D shape classification.
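Two of the ingredients above are easy to illustrate; the sketch below (hypothetical shapes, not the released code) shows the element-wise maximum over the 12 per-view feature maps and a dilated 3 × 3 convolution that enlarges the receptive field without extra parameters:
```python
import torch
import torch.nn as nn

view_maps = torch.randn(12, 1024, 7, 7)       # 12 views x (C, H, W) from a CNN backbone
global_mixed = view_maps.max(dim=0).values    # element-wise max over views -> (1024, 7, 7)

dilated = nn.Conv2d(1024, 1024, kernel_size=3, padding=2, dilation=2)
out = dilated(global_mixed.unsqueeze(0))      # padding = dilation keeps spatial size
print(out.shape)                               # torch.Size([1, 1024, 7, 7])
```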
Affiliation(s)
- Wei Zhou: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Fujian Zheng: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China; College of Optoelectronic Engineering, Chongqing University, Chongqing 400030, PR China.
- Yiheng Zhao: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Yiran Pang: Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, FL 33431, United States of America.
- Jun Yi: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
4. Gezawa AS, Liu C, Jia H, Nanehkaran YA, Almutairi MS, Chiroma H. An improved fused feature residual network for 3D point cloud data. Front Comput Neurosci 2023; 17:1204445. PMID: 37711504. PMCID: PMC10498464. DOI: 10.3389/fncom.2023.1204445.
Abstract
Point clouds have evolved into one of the most important data formats for 3D representation, and they are becoming ever more popular as acquisition equipment grows more affordable and usage spreads across a variety of fields. Volumetric grid-based approaches are among the most successful models for processing point clouds because they fully preserve data granularity while additionally exploiting point dependency. However, using lower-order local estimation functions to fit 3D objects, such as the piece-wise constant function, necessitates a high-resolution grid to capture detailed features, which demands vast computational resources. This study proposes an improved fused feature network as well as a comprehensive framework for solving shape classification and segmentation tasks using a two-branch technique and feature learning. We begin by designing a feature encoding network with two distinct building blocks: layer skips within each block, with batch normalization (BN) and rectified linear units (ReLU) in between. The purpose of the layer skips is to have fewer layers to propagate across, which speeds up the learning process and lowers the effect of vanishing gradients. Furthermore, we develop a robust grid feature extraction module that consists of multiple convolution blocks followed by max-pooling to build a hierarchical representation and extract features from an input grid. We overcome the grid size constraints by sampling a constant number of points in each grid using a simple K-nearest neighbor (KNN) search, which aids in learning higher-order approximation functions. The proposed method outperforms or is comparable to state-of-the-art approaches in point cloud segmentation and classification tasks. In addition, an ablation study is presented to show the effectiveness of the proposed method.
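The fixed-size KNN sampling per grid cell can be sketched as follows (hypothetical parameters; `centers` stands in for the occupied grid-cell centers):
```python
# Gather a constant number of nearest points per grid cell, so the downstream
# network sees a fixed-size input regardless of grid resolution.
import torch

def knn_sample(points, centers, k=32):
    # points: (N, 3) raw cloud; centers: (M, 3) grid-cell centers
    d = torch.cdist(centers, points)          # (M, N) pairwise distances
    idx = d.topk(k, largest=False).indices    # k nearest point indices per cell
    return points[idx]                        # (M, k, 3) fixed-size neighborhoods

pts = torch.rand(2048, 3)
cells = torch.rand(64, 3)
print(knn_sample(pts, cells).shape)           # torch.Size([64, 32, 3])
```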
Affiliation(s)
- Abubakar Sulaiman Gezawa: College of Information Engineering, Fujian Key Lab of Agriculture IOT Application, Sanming University, Sanming, Fujian, China.
- Chibiao Liu: College of Information Engineering, Fujian Key Lab of Agriculture IOT Application, Sanming University, Sanming, Fujian, China.
- Heming Jia: College of Information Engineering, Fujian Key Lab of Agriculture IOT Application, Sanming University, Sanming, Fujian, China.
- Y. A. Nanehkaran: Department of Software Engineering, School of Information Engineering, Yancheng Teachers University, Yancheng, Jiangsu, China.
- Mubarak S. Almutairi: College of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al Batin, Saudi Arabia.
- Haruna Chiroma: College of Computer Science and Engineering Technology, Applied College, University of Hafr Al-Batin, Hafar Al Batin, Saudi Arabia.
5. Xiang P, Wen X, Liu YS, Cao YP, Wan P, Zheng W, Han Z. Snowflake Point Deconvolution for Point Cloud Completion and Generation With Skip-Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:6320-6338. PMID: 36282830. DOI: 10.1109/tpami.2022.3217161.
Abstract
Most existing point cloud completion methods suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it difficult to reveal fine local geometric details. To resolve this issue, we propose SnowflakeNet with snowflake point deconvolution (SPD) to generate complete point clouds. SPD models the generation of point clouds as the snowflake-like growth of points, where child points are generated progressively by splitting their parent points after each SPD. Our insight into the detailed geometry is to introduce a skip-transformer in the SPD to learn the point splitting patterns that best fit the local regions. The skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer and produce the splitting in the current layer. The locally compact and structured point clouds generated by SPD precisely reveal the structural characteristics of the 3D shape in local patches, which enables us to predict highly detailed geometries. Moreover, since SPD is a general operation that is not limited to completion, we explore its applications in other generative tasks, including point cloud auto-encoding, generation, single-image reconstruction, and upsampling. Our method outperforms state-of-the-art methods under widely used benchmarks.
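A rough sketch of the splitting idea (our interpretation of SPD, not the released implementation): each parent point is duplicated into several children, and a small network predicts per-child offsets so the children spread out around the parent:
```python
import torch
import torch.nn as nn

class PointSplit(nn.Module):
    def __init__(self, dim=128, up=4):
        super().__init__()
        self.up = up
        self.offset = nn.Sequential(nn.Linear(dim + 3, 64), nn.ReLU(),
                                    nn.Linear(64, 3))

    def forward(self, xyz, feat):             # xyz: (B, N, 3), feat: (B, N, dim)
        xyz = xyz.repeat_interleave(self.up, dim=1)    # duplicate each parent
        feat = feat.repeat_interleave(self.up, dim=1)
        children = xyz + self.offset(torch.cat([feat, xyz], dim=-1))
        return children                        # (B, N*up, 3) denser cloud

pts, f = torch.rand(1, 256, 3), torch.rand(1, 256, 128)
print(PointSplit()(pts, f).shape)             # torch.Size([1, 1024, 3])
```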
6. Wen X, Xiang P, Han Z, Cao YP, Wan P, Zheng W, Liu YS. PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-Step Point Moving Paths. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:852-867. PMID: 35290184. DOI: 10.1109/tpami.2022.3159003.
Abstract
Point cloud completion concerns predicting the missing parts of incomplete 3D shapes. A common strategy is to generate the complete shape from the incomplete input. However, the unordered nature of point clouds degrades the generation of high-quality 3D shapes, as the detailed topology and structure of unordered points are hard to capture during the generative process using an extracted latent code. We address this problem by formulating completion as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net++, to mimic the behavior of an earth mover. It moves each point of the incomplete input to obtain a complete point cloud, where the total distance of the point moving paths (PMPs) should be shortest. Therefore, PMP-Net++ predicts a unique PMP for each point according to the constraint on point moving distances. The network learns a strict and unique point-level correspondence, and thus improves the quality of the predicted complete shape. Moreover, since moving points relies heavily on the per-point features learned by the network, we further introduce a transformer-enhanced representation learning network, which significantly improves the completion performance of PMP-Net++. We conduct comprehensive experiments on shape completion, and further explore its application to point cloud up-sampling, which demonstrates the non-trivial improvement of PMP-Net++ over state-of-the-art point cloud completion/up-sampling methods.
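A minimal sketch of the multi-step point-moving formulation (an assumed simplification; the real network conditions the displacements on learned per-point features rather than raw coordinates):
```python
# Completion cast as deformation: a small network predicts a displacement for
# every point at each of several steps, and the summed path length is penalized
# so the point moving paths stay short.
import torch
import torch.nn as nn

class PointMover(nn.Module):
    def __init__(self, steps=3):
        super().__init__()
        self.steps = steps
        self.delta = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, xyz):                    # (B, N, 3) incomplete cloud
        path_len = 0.0
        for _ in range(self.steps):            # points walk toward the full shape
            d = self.delta(xyz)
            path_len = path_len + d.norm(dim=-1).mean()   # total moving distance
            xyz = xyz + d
        return xyz, path_len                   # penalizing path_len favors short PMPs

out, reg = PointMover()(torch.rand(2, 2048, 3))
print(out.shape, float(reg) >= 0)
```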
7. Feng W, Zhang J, Zhou Y, Xin S. GDR-Net: A Geometric Detail Recovering Network for 3D Scanned Objects. IEEE Transactions on Visualization and Computer Graphics 2022; 28:3959-3973. PMID: 34495834. DOI: 10.1109/tvcg.2021.3110658.
Abstract
This article addresses the problem of mesh super-resolution, such that geometric details that are not well represented in low-resolution models can be recovered and well represented in the generated high-quality models. The main challenges of this problem are the non-regularity of the 3D mesh representation and the high complexity of 3D shapes. We propose a deep neural network called GDR-Net to solve this ill-posed problem, which resolves the two challenges simultaneously. First, to overcome the non-regularity, we regress a displacement in radial basis function parameter space instead of vertex-wise coordinates in Euclidean space. Second, to overcome the high complexity, we apply the detail recovery process to small surface patches extracted from the input surface and obtain the overall high-quality mesh by fusing the refined surface patches. To train the network, we constructed a dataset composed of both real-world and synthetic scanned models, including high/low-quality pairs. Our experimental results demonstrate that GDR-Net works well for general models and outperforms previous methods in recovering geometric details.
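A toy of the RBF-displacement idea (loosely after the abstract; the centers, width and coefficients are invented): a patch is refined by a displacement field expressed in radial-basis-function parameters rather than per-vertex offsets:
```python
import torch

centers = torch.rand(8, 3)                     # RBF centers on the patch (assumed)
weights = torch.randn(8, 3) * 0.01             # regressed RBF coefficients (assumed)
sigma = 0.2                                    # Gaussian width (assumed)

def displace(verts):                            # verts: (N, 3) low-res patch vertices
    d2 = torch.cdist(verts, centers) ** 2       # (N, 8) squared distances to centers
    phi = torch.exp(-d2 / (2 * sigma ** 2))     # Gaussian RBF activations
    return verts + phi @ weights                # detail-recovering vertex offsets

print(displace(torch.rand(100, 3)).shape)       # torch.Size([100, 3])
```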
8. Wang W, Wang X, Chen G, Zhou H. Multi-view SoftPool attention convolutional networks for 3D model classification. Front Neurorobot 2022; 16:1029968. DOI: 10.3389/fnbot.2022.1029968.
Abstract
Introduction: Existing multi-view-based 3D model classification methods suffer from insufficient extraction of refined view features and poor generalization of the network model, which makes it difficult to further improve classification accuracy. To this end, this paper proposes a multi-view SoftPool attention convolutional network for 3D model classification tasks.
Methods: This method extracts multi-view features through ResNest and adaptive pooling modules, and the extracted features better represent the 3D model. Then, the multi-view features processed by SoftPool are used as the Query of the self-attention calculation, which enables the subsequent refined extraction. We then feed the attention scores computed from Query and Key in the self-attention calculation into a mobile inverted bottleneck convolution, which effectively improves the generalization of the network model. Based on the proposed method, a compact 3D global descriptor is finally generated, achieving high-accuracy 3D model classification.
Results: Experimental results show that our method achieves 96.96% OA and 95.68% AA on ModelNet40, and 98.57% OA and 98.42% AA on ModelNet10.
Discussion: Compared with a multitude of popular methods, our model achieves state-of-the-art classification accuracy.
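A sketch of the SoftPool-as-Query step (our reading of the abstract, not the released code): features are pooled with softmax weights derived from their own activations, and the pooled vector queries the per-view Keys/Values:
```python
import torch

def softpool(x, dim):
    w = torch.softmax(x, dim=dim)              # activation-derived weights
    return (w * x).sum(dim=dim)

views = torch.randn(2, 12, 512)                # (B, V, C) per-view features
q = softpool(views, dim=1).unsqueeze(1)        # (B, 1, C) SoftPool result as Query
attn = torch.softmax(q @ views.transpose(1, 2) / 512 ** 0.5, dim=-1)  # (B, 1, V)
fused = (attn @ views).squeeze(1)               # attention-refined global descriptor
print(fused.shape)                              # torch.Size([2, 512])
```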
9. Mirbauer M, Krabec M, Krivanek J, Sikudova E. Survey and Evaluation of Neural 3D Shape Classification Approaches. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:8635-8656. PMID: 34406936. DOI: 10.1109/tpami.2021.3102676.
Abstract
Classification of 3D objects - the selection of the category to which each object belongs - is of great interest in the field of machine learning. Numerous researchers use deep neural networks to address this problem, altering the network architecture and the representation of the 3D shape used as input. To investigate the effectiveness of their approaches, we first conduct an extensive survey of existing methods and identify common ideas by which we categorize them into a taxonomy. Second, we evaluate 11 selected classification networks on two 3D object datasets, extending the evaluation to a larger dataset on which most of the selected approaches have not yet been tested. For this, we provide a framework for converting shapes from common 3D mesh formats into formats native to each network, and for training and evaluating different classification approaches on this data. Despite being partially unable to reach the accuracies reported in the original papers, we compare the relative performance of the approaches, as well as their performance when the dataset is the only variable changed, to provide valuable insights into performance on different kinds of data. We make our code available to simplify running training experiments with multiple neural networks with different prerequisites.
10. Wang J, Chakraborty R, Yu SX. Transformer for 3D Point Clouds. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:4419-4431. PMID: 33793397. DOI: 10.1109/tpami.2021.3070341.
Abstract
Deep neural networks are widely used for understanding 3D point clouds. At each point convolution layer, features are computed from local neighborhoods of 3D points and combined for subsequent processing in order to extract semantic information. Existing methods adopt the same individual point neighborhoods throughout the network layers, defined by the same metric on the fixed input point coordinates. This common practice is easy to implement but not necessarily optimal. Ideally, local neighborhoods should differ at different layers, as more latent information is extracted at deeper layers. We propose a novel end-to-end approach to learn different non-rigid transformations of the input point cloud so that optimal local neighborhoods can be adopted at each layer. We propose both linear (affine) and non-linear (projective and deformable) spatial transformers for 3D point clouds. With spatial transformers on the ShapeNet part segmentation dataset, the network achieves higher accuracy for all categories, with an 8 percent gain on earphones and rockets in particular. Our method also outperforms the state of the art on other point cloud tasks such as classification, detection, and semantic segmentation. Visualizations show that spatial transformers can learn features more efficiently by dynamically altering local neighborhoods according to the geometry and semantics of 3D shapes, in spite of their within-category variations.
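A minimal sketch of the linear (affine) case (a hypothetical layer in the paper's spirit; the actual model predicts transforms from features rather than learning a single global one): a learned 3 × 3 matrix and translation warp the coordinates in which later layers build their neighborhoods:
```python
import torch
import torch.nn as nn

class AffineTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.A = nn.Parameter(torch.eye(3))    # initialized to the identity warp
        self.t = nn.Parameter(torch.zeros(3))

    def forward(self, xyz):                    # (B, N, 3)
        return xyz @ self.A.T + self.t         # kNN neighborhoods follow the warp

warped = AffineTransformer()(torch.rand(2, 1024, 3))
print(warped.shape)                            # torch.Size([2, 1024, 3])
```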
11. View-relation constrained global representation learning for multi-view-based 3D object recognition. Appl Intell 2022. DOI: 10.1007/s10489-022-03949-8.
12. Wen S, Wang T, Tao S. Hybrid CNN-LSTM Architecture for LiDAR Point Clouds Semantic Segmentation. IEEE Robot Autom Lett 2022. DOI: 10.1109/lra.2022.3153899.
13. Principal views selection based on growing graph convolution network for multi-view 3D model recognition. Appl Intell 2022. DOI: 10.1007/s10489-022-03775-y.
14. Research on an Optimal Path Planning Method Based on A* Algorithm for Multi-View Recognition. Algorithms 2022. DOI: 10.3390/a15050171.
Abstract
In order to obtain the optimal perspectives of the recognition target, this paper combines the motion path of the manipulator arm and the camera, and proposes a path planning method for finding the optimal perspectives based on an A* algorithm. The quality of perspectives is assessed by means of multi-view recognition. A binary multi-view 2D kernel principal component analysis network (BM2DKPCANet) is built to extract features. A multi-view-angle classifier based on BM2DKPCANet + Softmax is established, whose output category posterior probability serves as the perspective recognition performance function. The path planning problem is transformed into a multi-objective optimization problem by taking optimal view recognition and the shortest path distance as the objective functions. To reduce computation, the multi-objective problem is reduced to a single-objective one by fusing the objective functions over the established directed graph of perspective observations. An A* algorithm is then used to solve the single-source shortest path problem on the fused directed graph. Path planning experiments with different numbers of view angles and different starting points demonstrate that the method can guide the camera to viewpoints with higher recognition accuracy and complete the optimal observation path planning.
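A toy A* search over a directed viewpoint graph (illustrative only; the node names, costs and zero heuristic are invented), with edge weights standing in for the fused distance-plus-recognition objective:
```python
import heapq

def a_star(graph, start, goal, h=lambda n: 0.0):
    # graph: {node: [(neighbor, fused_cost), ...]}; h: admissible heuristic
    frontier = [(h(start), 0.0, start, [start])]
    seen = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(frontier, (g + w + h(nxt), g + w, nxt, path + [nxt]))
    return None, float("inf")

g = {"v0": [("v1", 1.2), ("v2", 2.0)], "v1": [("v3", 0.8)], "v2": [("v3", 0.5)]}
print(a_star(g, "v0", "v3"))                  # (['v0', 'v1', 'v3'], 2.0)
```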
15. Multi-view 3D object retrieval leveraging the aggregation of view and instance attentive features. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.108754.
18. Liu J, Ren Y, Qin X. Study on 3D Clothing Color Application Based on Deep Learning-Enabled Macro-Micro Adversarial Network and Human Body Modeling. Computational Intelligence and Neuroscience 2021; 2021:9918175. PMID: 34539773. PMCID: PMC8443351. DOI: 10.1155/2021/9918175.
Abstract
In real life, people increasingly prefer convenience, so the ease of online shopping has prompted more and more research into optimizing the shopping experience, of which the virtual fitting system is one product. However, because virtual fitting systems are still immature, many problems remain, such as unclear or inaccurate rendering of clothing colors. In view of this, this paper proposes a deep-learning-based 3D clothing color display model driven by human body modeling. First, the macro-micro adversarial network (MMAN), based on deep learning, is used to parse the original image, and the results are then preprocessed. Finally, a 3D model carrying the original image's colors is constructed using UV mapping. The experimental results show that the accuracy of the MMAN algorithm reaches 0.972, the established three-dimensional model is sufficiently expressive, the clothing color is rendered clearly with a color difference from the original image within 0.01, and the subjective evaluation by volunteers exceeds 90 points. These results show that using deep learning to build a 3D model that preserves the clothing colors of the original picture is effective, which has great guiding significance for research on character model modeling and simulation.
Affiliation(s)
- Jingmiao Liu: General Graduate School, Keimyung University, Daegu 42601, Republic of Korea.
- Yu Ren: General Graduate School, Keimyung University, Daegu 42601, Republic of Korea; School of Design, Sichuan Fine Arts Institute, Chongqing 401331, China.
- Xiaotong Qin: School of Art, Yanching Institute of Technology, Sanhe 065201, China.
19. Chen Q, Huang J, Salehi HS, Zhu H, Lian L, Lai X, Wei K. Hierarchical CNN-based occlusal surface morphology analysis for classifying posterior tooth type using augmented images from 3D dental surface models. Computer Methods and Programs in Biomedicine 2021; 208:106295. PMID: 34329895. DOI: 10.1016/j.cmpb.2021.106295.
Abstract
Objective: 3D digitization of dental models is growing in popularity for dental applications. Classifying tooth type from a single 3D point cloud model, without the aid of relative position among teeth, is still a challenging task.
Methods: In this paper, 8-class posterior tooth type classification (first premolar, second premolar, first molar and second molar, in the maxilla and mandible respectively) was investigated by convolutional neural network (CNN)-based occlusal surface morphology analysis. The 3D occlusal surface was transformed into a depth image for basic CNN-based classification. Considering the logical hierarchy of tooth categories, a hierarchical classification structure was proposed to decompose the 8-class classification task into two-stage cascaded classification subtasks. Image augmentations, including traditional geometrical transformations and deep convolutional generative adversarial networks (DCGANs), were applied to each subnetwork and the cascaded network.
Results: The results indicate that combining traditional and DCGAN-based augmented images to train CNN models improves classification performance. We achieve an overall accuracy of 91.35%, macro-precision of 91.49%, macro-recall of 91.29%, and macro-F1 of 0.9139 for the 8-class posterior tooth type classification, outperforming other deep learning models. Meanwhile, Grad-CAM results demonstrate that a CNN model trained on our augmented images focuses on smaller important regions for better generality, and the anatomic landmarks of cusps, fossae, and grooves serve as important regions for the cascaded classification model.
Conclusion: The reported work proves that using basic CNNs in a two-stage hierarchical structure can achieve the best posterior tooth type classification performance on 3D models without relative position information. The proposed method has the advantages of easy training and a strong ability to learn discriminative features from small image regions.
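A hedged sketch of the two-stage cascade (the structure, group split and network sizes are assumed from the abstract): a first CNN picks a coarse group from the depth image, then a group-specific CNN refines the decision to one of the eight tooth types:
```python
import torch
import torch.nn as nn

def tiny_cnn(num_classes):
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, num_classes))

stage1 = tiny_cnn(2)                           # coarse group, e.g. maxilla/mandible (assumed split)
stage2 = nn.ModuleList([tiny_cnn(4), tiny_cnn(4)])  # 4 tooth types per group

depth = torch.rand(1, 1, 64, 64)               # depth image of the occlusal surface
group = stage1(depth).argmax(1).item()         # stage 1: pick the group
tooth = stage2[group](depth).argmax(1).item()  # stage 2: refine within the group
print(group, tooth)
```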
Affiliation(s)
- Qingguang Chen: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China.
- Junchao Huang: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China.
- Hassan S Salehi: Department of Electrical and Computer Engineering, California State University, Chico, 95929, United States.
- Haihua Zhu: Hospital of Stomatology of Zhejiang University, Hangzhou 310018, China.
- Luya Lian: Hospital of Stomatology of Zhejiang University, Hangzhou 310018, China.
- Xiaomin Lai: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China.
- Kaihua Wei: School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China.
20. Bai J, Gong B, Zhao Y, Lei F, Yan C, Gao Y. Multi-Scale Representation Learning on Hypergraph for 3D Shape Retrieval and Recognition. IEEE Transactions on Image Processing 2021; 30:5327-5338. PMID: 34043509. DOI: 10.1109/tip.2021.3082765.
Abstract
Effective 3D shape retrieval and recognition are challenging but important tasks in computer vision, and they have attracted much attention in recent decades. Although recent progress has shown significant improvement of deep learning methods on 3D shape retrieval and recognition performance, how to jointly learn an optimal representation of 3D shapes that accounts for their relationships is still under-investigated. To tackle this issue, we propose a multi-scale representation learning method on hypergraphs for 3D shape retrieval and recognition, called the multi-scale hypergraph neural network (MHGNN). In this method, the correlation among 3D shapes is formulated in a hypergraph, and a hypergraph convolution process is conducted to learn the representations. Here, multiple representations can be obtained through different convolution layers, leading to multi-scale representations of 3D shapes. A fusion module is then introduced to combine these representations for 3D shape retrieval and recognition. The main advantages of our method are that 1) the high-order correlation among 3D shapes can be investigated within the framework, and 2) the joint multi-scale representation is more robust for comparison. Comparisons with state-of-the-art methods on the public ModelNet40 dataset demonstrate remarkable performance improvement of our proposed method on the 3D shape retrieval task. Meanwhile, experiments on recognition tasks also show better results for our proposed method, which indicates its superiority in learning better representations for retrieval and recognition.
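One hypergraph convolution step follows the standard HGNN formula that such models build on, X' = D_v^{-1} H W D_e^{-1} H^T X Θ; in the sketch below the tiny incidence matrix is made up:
```python
import torch

N, E, C = 5, 3, 8                              # shapes, hyperedges, feature dim
H = torch.bernoulli(torch.full((N, E), 0.6))   # incidence: shape i in hyperedge j
W = torch.eye(E)                               # hyperedge weights
X = torch.randn(N, C)                          # shape features
Theta = torch.randn(C, C)                      # learnable projection

Dv = torch.diag(1.0 / (H @ W.diag()).clamp(min=1))   # inverse vertex degrees
De = torch.diag(1.0 / H.sum(0).clamp(min=1))         # inverse hyperedge degrees
X_next = Dv @ H @ W @ De @ H.T @ X @ Theta      # one propagation step
print(X_next.shape)                             # torch.Size([5, 8])
```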
21. Xu Y, Zheng C, Xu R, Quan Y, Ling H. Multi-View 3D Shape Recognition via Correspondence-Aware Deep Learning. IEEE Transactions on Image Processing 2021; 30:5299-5312. PMID: 34038361. DOI: 10.1109/tip.2021.3082310.
Abstract
In recent years, multi-view learning has emerged as a promising approach for 3D shape recognition, which identifies a 3D shape based on its 2D views taken from different viewpoints. Usually, the correspondences inside a view or across different views encode the spatial arrangement of object parts and the symmetry of the object, which provide useful geometric cues for recognition. However, such view correspondences have not been explicitly and fully exploited in existing work. In this paper, we propose a correspondence-aware representation (CAR) module, which explicitly finds potential intra-view and cross-view correspondences via kNN search in semantic space and then aggregates the shape features from the correspondences via learned transforms. In particular, the spatial relations of correspondences, in terms of their viewpoint positions and intra-view locations, are taken into account for learning correspondence-aware features. Incorporating the CAR module into a ResNet-18 backbone, we propose an effective deep model called CAR-Net for 3D shape classification and retrieval. Extensive experiments demonstrate the effectiveness of the CAR module as well as the excellent performance of CAR-Net.
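The core correspondence step can be sketched as a kNN search in feature space (our reading of the CAR module; the mean aggregation below is a simplification of the learned transforms):
```python
import torch

feats = torch.randn(12, 196, 64)               # V views x P patch features x C
flat = feats.reshape(-1, 64)                    # all positions across all views
d = torch.cdist(flat, flat)                     # semantic-space distances
knn = d.topk(k=4 + 1, largest=False).indices[:, 1:]   # 4 neighbors, skipping self
corr = flat[knn].mean(dim=1)                    # aggregate correspondence features
print(corr.shape)                               # torch.Size([2352, 64])
```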
22. ResSANet: Learning Geometric Information for Point Cloud Processing. Sensors 2021; 21:3227. PMID: 34066612. PMCID: PMC8124999. DOI: 10.3390/s21093227.
Abstract
Point clouds with rich local geometric information have potentially huge implications in several applications, especially in the areas of robotic manipulation and autonomous driving. However, most point cloud processing methods cannot extract enough geometric features from a raw point cloud, which restricts the performance of downstream tasks such as point cloud classification, shape retrieval and part segmentation. In this paper, the authors propose a new method in which a convolution based on geometric primitives is adopted to accurately represent the elusive shape of a point cloud and fully extract hidden geometric features. The key idea is to build a brand-new convolution network named ResSANet on the basis of geometric primitives to learn hierarchical geometry information. Two different modules, Res-SA and ResSA2, are devised to achieve feature fusion at different levels in ResSANet. This work achieves a classification accuracy of up to 93.2% on the ModelNet40 dataset and a shape retrieval score of 87.4%. The part segmentation experiment also achieves accuracies of 83.3% (class mIoU) and 85.3% (instance mIoU) on the ShapeNet dataset. It is worth mentioning that the number of parameters in this work is just 1.04 M while the network depth is minimal. Experimental results and comparisons with state-of-the-art methods demonstrate that our approach can achieve superior performance.
23. Zhou X, Li Y, Liang W. CNN-RNN Based Intelligent Recommendation for Online Medical Pre-Diagnosis Support. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:912-921. PMID: 32750846. DOI: 10.1109/tcbb.2020.2994780.
Abstract
The rapidly developed Health 2.0 technology has provided people with more opportunities to conduct online medical consultations than ever before. Understanding the contexts of different online medical communications and activities has become a significant issue in facilitating patients' medical decision-making. As a subcategory of machine learning, neural networks have drawn increasing attention in natural language processing applications. In this article, we focus on modeling and analyzing patient-physician-generated data within an integrated CNN-RNN framework, in order to deal with the situation that patients' online inquiries are usually not very long. A so-called DP-CRNN algorithm is developed with a newly designed neural network structure to extract and highlight the combination of semantic and sequential features of a patient's inquiries. An intelligent recommendation method is then proposed to provide patients with automatic clinic guidance and pre-diagnosis suggestions, in which a clustering mechanism is utilized to refine the learning process with a more precise diagnosis scope and more representative features. Experiments based on collected real-world data demonstrate the effectiveness of our proposed model and method for intelligent pre-diagnosis service in online medical environments.
24. Nie W, Zhao Y, Song D, Gao Y. DAN: Deep-Attention Network for 3D Shape Recognition. IEEE Transactions on Image Processing 2021; 30:4371-4383. PMID: 33848247. DOI: 10.1109/tip.2021.3071687.
Abstract
Due to its wide applications in a rapidly increasing number of different fields, 3D shape recognition has become a hot topic in computer vision, and many approaches have been proposed in recent years. However, huge challenges remain in two aspects: exploring an effective representation of 3D shapes and reducing their redundant complexity. In this paper, we propose a novel deep-attention network (DAN) for 3D shape representation based on multi-view information. More specifically, we introduce the attention mechanism to construct a deep multi-attention network that has advantages in two aspects: 1) information selection, in which DAN utilizes the self-attention mechanism to update the feature vector of each view, effectively reducing redundant information, and 2) information fusion, in which DAN applies an attention mechanism that can retain more effective information by considering the correlations among views. Meanwhile, the deep network structure can fully exploit these correlations to continuously fuse effective information. To validate the effectiveness of our proposed method, we conduct experiments on the public 3D shape datasets ModelNet40, ModelNet10, and ShapeNetCore55. Experimental results and comparison with state-of-the-art methods demonstrate the superiority of our proposed method. Code is released at https://github.com/RiDang/DANN.
25. Cheng S, Chen X, He X, Liu Z, Bai X. PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis. IEEE Transactions on Image Processing 2021; 30:4436-4448. PMID: 33856993. DOI: 10.1109/tip.2021.3072214.
Abstract
Learning intra-region contexts and inter-region relations are two effective strategies to strengthen feature representations for point cloud analysis. However, unifying the two strategies for point cloud representation is not fully emphasized in existing methods. To this end, we propose a novel framework named Point Relation-Aware Network (PRA-Net), which is composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module. The ISL module can dynamically integrate the local structural information into the point features, while the IRL module captures inter-region relations adaptively and efficiently via a differentiable region partition scheme and a representative point-based strategy. Extensive experiments on several 3D benchmarks covering shape classification, keypoint estimation, and part segmentation have verified the effectiveness and the generalization ability of PRA-Net. Code will be available at https://github.com/XiwuChen/PRA-Net.
27. Liu AA, Zhou H, Nie W, Liu Z, Liu W, Xie H, Mao Z, Li X, Song D. Hierarchical multi-view context modelling for 3D object classification and retrieval. Inf Sci (N Y) 2021. DOI: 10.1016/j.ins.2020.09.057.
28. Liu X, Han Z, Liu YS, Zwicker M. Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention. IEEE Transactions on Image Processing 2021; 30:1744-1758. PMID: 33417547. DOI: 10.1109/tip.2020.3048623.
Abstract
Fine-grained 3D shape classification is important for shape understanding and analysis, and it poses a challenging research problem. However, fine-grained 3D shape classification has rarely been explored, due to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset (named FG3D) with fine-grained class labels, which consists of three categories: airplane, car and chair. Each category consists of several subcategories at a fine-grained level. According to our experiments on this fine-grained dataset, we find that state-of-the-art methods are significantly limited by the small variance among subcategories in the same category. To resolve this problem, we further propose a novel fine-grained 3D shape classification method, named FG3D-Net, to capture the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect the generally semantic parts inside multiple views under the benchmark of generally semantic part detection. Then, we design a hierarchical part-view attention aggregation module to learn a global shape representation by aggregating generally semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of our features. The part-level attention highlights the important parts in each view, while the view-level attention highlights the discriminative views among all views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods. The FG3D dataset is available at https://github.com/liuxinhai/FG3D-Net.
29. Sun K, Zhang J, Liu J, Yu R, Song Z. DRCNN: Dynamic Routing Convolutional Neural Network for Multi-View 3D Object Recognition. IEEE Transactions on Image Processing 2020; 30:868-877. PMID: 33237859. DOI: 10.1109/tip.2020.3039378.
Abstract
3D object recognition is one of the most important tasks in 3D data processing and has been extensively studied recently. Researchers have proposed various 3D recognition methods based on deep learning, among which view-based approaches are a typical class. However, in view-based methods, the view pooling layer commonly used to fuse multi-view features causes a loss of visual information. To alleviate this problem, in this paper we construct a novel layer called the Dynamic Routing Layer (DRL) by modifying the dynamic routing algorithm of the capsule network, to fuse the features of each view more effectively. Concretely, in DRL, we use rearrangement and affine transformation to convert features, then leverage the modified dynamic routing algorithm to adaptively choose among the converted features, instead of ignoring all but the most active feature as in a view pooling layer. We also show that the view pooling layer is a special case of our DRL. In addition, based on DRL, we present a Dynamic Routing Convolutional Neural Network (DRCNN) for multi-view 3D object recognition. Our experiments on three 3D benchmark datasets show that the proposed DRCNN outperforms many state-of-the-art methods, which demonstrates the efficacy of our approach.
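For reference, the classic dynamic-routing loop that DRL modifies (the modification itself is not reproduced here) looks roughly like this: routing weights concentrate on views that agree with the emerging consensus instead of discarding all but one:
```python
import torch

def route(views, iters=3):                     # views: (V, C) converted view features
    b = torch.zeros(views.size(0))             # routing logits
    for _ in range(iters):
        c = torch.softmax(b, dim=0)            # coupling coefficients
        s = (c.unsqueeze(1) * views).sum(0)    # weighted fusion of the views
        v = s / (1 + s.norm())                 # squash-like normalization
        b = b + views @ v                      # agreement update
    return v

print(route(torch.randn(12, 512)).shape)       # torch.Size([512])
```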
30. Wen X, Han Z, Liu X, Liu YS. Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds using Spatial-aware Capsules. IEEE Transactions on Image Processing 2020; PP:8855-8869. PMID: 32894715. DOI: 10.1109/tip.2020.3019925.
Abstract
Learning discriminative shape representations directly on point clouds is still challenging in 3D shape analysis and understanding. Recent studies usually involve three steps: first splitting a point cloud into some local regions, then extracting the corresponding feature of each local region, and finally aggregating all the individual local region features into a global feature as the shape representation using simple max-pooling. However, such pooling-based feature aggregation methods do not adequately take the spatial relationships (e.g., the relative locations to other regions) between local regions into account, which greatly limits the ability to learn discriminative shape representations. To address this issue, we propose a novel deep learning network, named Point2SpatialCapsule, for aggregating features and spatial relationships of local regions on point clouds, which aims to learn more discriminative shape representations. Compared with traditional max-pooling-based feature aggregation networks, Point2SpatialCapsule can explicitly learn not only the geometric features of local regions but also the spatial relationships among them. Point2SpatialCapsule consists of two main modules. To resolve the disorder problem of local regions, the first module, named geometric feature aggregation, is designed to aggregate the local region features into learnable cluster centers, which explicitly encodes the spatial locations from the original 3D space. The second module, named spatial relationship aggregation, is proposed for further aggregating the clustered features and the spatial relationships among them in the feature space using the spatial-aware capsules developed in this paper. Compared to previous capsule-network-based methods, the feature routing on the spatial-aware capsules can learn more discriminative spatial relationships among local regions for point clouds, which establishes a direct mapping between log priors and the spatial locations through feature clusters. Experimental results demonstrate that Point2SpatialCapsule outperforms the state-of-the-art methods in 3D shape classification, retrieval and segmentation tasks on the well-known ModelNet and ShapeNet datasets.
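A sketch of aggregating local-region features into learnable cluster centers (a NetVLAD-like reading of the geometric feature aggregation module; the spatial-aware capsule routing is not reproduced):
```python
import torch
import torch.nn as nn

class ClusterAggregate(nn.Module):
    def __init__(self, dim=64, clusters=16):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(clusters, dim))

    def forward(self, region_feats):           # (B, R, dim) local-region features
        sim = region_feats @ self.centers.T    # (B, R, K) similarity to each center
        a = torch.softmax(sim, dim=-1)         # soft assignment of regions to clusters
        return a.transpose(1, 2) @ region_feats   # (B, K, dim) per-cluster features

out = ClusterAggregate()(torch.randn(2, 128, 64))
print(out.shape)                                # torch.Size([2, 16, 64])
```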
31. Han Z, Ma B, Liu YS, Zwicker M. Reconstructing 3D Shapes from Multiple Sketches using Direct Shape Optimization. IEEE Transactions on Image Processing 2020; PP:8721-8734. PMID: 32870791. DOI: 10.1109/tip.2020.3018865.
Abstract
3D shape reconstruction from multiple hand-drawn sketches is an intriguing approach to 3D shape modeling. Currently, state-of-the-art methods employ neural networks to learn a mapping from multiple sketches at arbitrary view angles to a 3D voxel grid. Because of the cubic complexity of 3D voxel grids, however, these neural networks are hard to train and limited to low-resolution reconstructions, which leads to a lack of geometric detail and low accuracy. To resolve this issue, we propose to reconstruct 3D shapes from multiple sketches using direct shape optimization (DSO), which does not involve deep learning models for direct voxel-based 3D shape generation. Specifically, we first leverage a conditional generative adversarial network (CGAN) to translate each sketch into an attenuance image that captures the predicted geometry from a given viewpoint. Then, DSO minimizes a project-and-compare loss to reconstruct the 3D shape such that it matches the predicted attenuance images from the view angles of all input sketches. Based on this, we further propose a progressive update approach to handle inconsistencies among the few hand-drawn sketches of the same 3D shape. Our experimental results show that our method significantly outperforms the state-of-the-art methods under widely used benchmarks and produces intuitive results in an interactive application.
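A toy project-and-compare step (an interpretation of DSO's loss; the axis-aligned sum projection below stands in for the paper's view-specific attenuance renderings): the voxel grid is optimized until its projections match the target images:
```python
import torch

vox = torch.rand(32, 32, 32, requires_grad=True)            # shape being optimized
targets = {0: torch.rand(32, 32), 1: torch.rand(32, 32)}    # per-axis target images (made up)
opt = torch.optim.Adam([vox], lr=0.05)

for _ in range(100):
    opt.zero_grad()
    # project the grid along each axis and compare against the target image
    loss = sum(((vox.sum(dim=ax) / 32) - img).pow(2).mean()
               for ax, img in targets.items())
    loss.backward()
    opt.step()
print(float(loss))                              # projections now match the targets
```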
33. Xu X, Caulfield S, Amaro J, Falcao G, Moloney D. 1.2 Watt Classification of 3D Voxel Based Point-clouds using a CNN on a Neural Compute Stick. Neurocomputing 2020. DOI: 10.1016/j.neucom.2018.10.114.
34. Li N, Li Q, Liu YS, Lu W, Wang W. BIMSeek++: Retrieving BIM components using similarity measurement of attributes. Comput Ind 2020. DOI: 10.1016/j.compind.2020.103186.