1. Lu T, Liu C, Chen Y, Wu G, Wang L. APP-Net: Auxiliary-Point-Based Push and Pull Operations for Efficient Point Cloud Recognition. IEEE Transactions on Image Processing 2023; 32:6500-6513. [PMID: 37988214] [DOI: 10.1109/tip.2023.3333191]
Abstract
Aggregating neighbor features is essential for point cloud neural networks. In existing work, each point in the cloud is inevitably selected as a neighbor of multiple aggregation centers, since every center gathers neighbor features from the whole point cloud independently. Each point therefore participates in the computation repeatedly, generating redundant duplicates in memory and incurring heavy computation and memory costs. Meanwhile, to pursue higher accuracy, previous methods often rely on a complex local aggregator to extract fine geometric representations, further slowing down the processing pipeline. To address these issues, we propose a new local aggregator of linear complexity for point cloud analysis, coined APP. Specifically, we introduce an auxiliary container as an anchor to exchange features between the source points and the aggregation centers. Each source point pushes its feature to only one auxiliary container, and each center point pulls features from only one auxiliary container, which avoids recomputing each source point. To facilitate learning of the local structure of the point cloud, we use an online normal estimation module to provide explainable geometric information that enhances the modeling capability of APP. The resulting network is more efficient than all previous baselines by a clear margin while consuming less memory. Experiments on classification and semantic segmentation demonstrate that APP-Net reaches accuracies comparable to other networks. In the classification task, it processes more than 10,000 samples per second with less than 10 GB of memory on a single GPU. The code will be released at https://github.com/MCG-NJU/APP-Net.
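The push-pull idea maps naturally onto scatter/gather primitives. Below is a minimal PyTorch sketch assuming a uniform voxel grid as the auxiliary-container assignment; the paper's actual anchor scheme and its normal estimation module are not reproduced, and the helper names `voxel_bins` and `app_aggregate` are hypothetical.

```python
import torch

def voxel_bins(xyz, voxel_size):
    """Assign each point to exactly one auxiliary container (a voxel cell)."""
    grid = torch.floor(xyz / voxel_size).long()
    grid = grid - grid.min(dim=0).values          # shift so indices start at 0
    dims = grid.max(dim=0).values + 1
    bins = (grid[:, 0] * dims[1] + grid[:, 1]) * dims[2] + grid[:, 2]
    return bins, int(dims.prod().item())

def app_aggregate(src_feats, src_xyz, ctr_xyz, voxel_size=0.1):
    """Push each source feature to one container, then let each center pull."""
    bins, num_bins = voxel_bins(torch.cat([src_xyz, ctr_xyz]), voxel_size)
    src_bins, ctr_bins = bins[:len(src_xyz)], bins[len(src_xyz):]
    containers = torch.zeros(num_bins, src_feats.size(1))
    containers.index_add_(0, src_bins, src_feats)                 # push: O(N)
    counts = torch.zeros(num_bins).index_add_(0, src_bins,
                                              torch.ones(len(src_bins)))
    containers = containers / counts.clamp(min=1).unsqueeze(1)    # mean feature
    return containers[ctr_bins]                                   # pull: O(M)

feats = app_aggregate(torch.rand(4096, 32), torch.rand(4096, 3), torch.rand(1024, 3))
```

Each source point writes once and each center reads once, so the aggregation cost grows linearly with the number of points instead of with the number of center-neighbor pairs.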
2. Xiang P, Wen X, Liu YS, Cao YP, Wan P, Zheng W, Han Z. Snowflake Point Deconvolution for Point Cloud Completion and Generation With Skip-Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:6320-6338. [PMID: 36282830] [DOI: 10.1109/tpami.2022.3217161]
Abstract
Most existing point cloud completion methods suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it difficult to reveal fine local geometric details. To resolve this issue, we propose SnowflakeNet with snowflake point deconvolution (SPD) to generate complete point clouds. SPD models the generation of point clouds as the snowflake-like growth of points, where child points are generated progressively by splitting their parent points after each SPD. Our insight into detailed geometry is to introduce a skip-transformer in the SPD to learn the point splitting patterns that best fit local regions. The skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer and produce the splitting in the current layer. The locally compact and structured point clouds generated by SPD precisely reveal the structural characteristics of the 3D shape in local patches, which enables us to predict highly detailed geometries. Moreover, since SPD is a general operation not limited to completion, we explore its applications in other generative tasks, including point cloud auto-encoding, generation, single-image reconstruction, and upsampling. Our experimental results show that SnowflakeNet outperforms state-of-the-art methods on widely used benchmarks.
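The rough shape of one SPD step can be sketched in PyTorch as follows; a generic `nn.MultiheadAttention` stands in for the paper's skip-transformer, and the layer sizes and `up` factor are illustrative assumptions rather than the authors' design.

```python
import torch
import torch.nn as nn

class SPD(nn.Module):
    """One splitting step: each parent point spawns `up` child points."""
    def __init__(self, feat_dim, up=2):
        super().__init__()
        self.up = up
        # stand-in for the skip-transformer over the previous layer's features
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.expand = nn.Linear(feat_dim, feat_dim * up)
        self.offset = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                    nn.Linear(feat_dim, 3))

    def forward(self, xyz, feats, prev_feats=None):
        ctx = feats if prev_feats is None else prev_feats
        h, _ = self.attn(feats, ctx, ctx)                    # summarize splitting context
        h = self.expand(h).reshape(feats.size(0), -1, feats.size(2))  # (B, N*up, C)
        parents = xyz.repeat_interleave(self.up, dim=1)      # duplicate parents
        children = parents + self.offset(h)                  # split via learned offsets
        return children, h

xyz, feats = torch.rand(2, 512, 3), torch.rand(2, 512, 64)
children, child_feats = SPD(64, up=2)(xyz, feats)            # 512 -> 1024 points
```

Stacking several such steps gives the progressive, snowflake-like growth the abstract describes, with each layer's features feeding the next layer's attention.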
3. Wen X, Xiang P, Han Z, Cao YP, Wan P, Zheng W, Liu YS. PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-Step Point Moving Paths. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:852-867. [PMID: 35290184] [DOI: 10.1109/tpami.2022.3159003]
Abstract
Point cloud completion concerns predicting the missing part of an incomplete 3D shape. A common strategy is to generate the complete shape from the incomplete input. However, the unordered nature of point clouds degrades the generation of high-quality 3D shapes, as the detailed topology and structure of unordered points are hard to capture during the generative process using an extracted latent code. We address this problem by formulating completion as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net++, to mimic the behavior of an earth mover. It moves each point of the incomplete input to obtain a complete point cloud, where the total distance of the point moving paths (PMPs) should be shortest. Therefore, PMP-Net++ predicts a unique PMP for each point under the constraint of point moving distances. The network learns a strict and unique point-level correspondence, and thus improves the quality of the predicted complete shape. Moreover, since moving points relies heavily on the per-point features learned by the network, we further introduce a transformer-enhanced representation learning network, which significantly improves the completion performance of PMP-Net++. We conduct comprehensive experiments on shape completion and further explore its application to point cloud up-sampling; the results demonstrate the non-trivial improvement of PMP-Net++ over state-of-the-art point cloud completion and up-sampling methods.
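The shortest-total-path constraint lends itself to a simple regularizer: accumulate the per-step displacements and penalize their total length alongside a reconstruction loss. A minimal sketch under that reading (the actual PMP-Net++ architecture, with its transformer-enhanced feature extraction, is far richer; the names here are hypothetical):

```python
import torch
import torch.nn as nn

class MoveStep(nn.Module):
    """One deformation step: predict a displacement for every point."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))

    def forward(self, xyz):
        delta = self.net(xyz)
        return xyz + delta, delta

steps = nn.ModuleList(MoveStep() for _ in range(3))   # multi-step moving
xyz = torch.rand(8, 2048, 3)                          # an incomplete input cloud
deltas = []
for step in steps:
    xyz, d = step(xyz)
    deltas.append(d)

# Path-length regularizer: prefer the shortest total moving distance.
# In training this would be added to a reconstruction term such as Chamfer distance.
path_loss = sum(d.norm(dim=-1).mean() for d in deltas)
```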
4. Wang W, Wang X, Chen G, Zhou H. Multi-view SoftPool attention convolutional networks for 3D model classification. Frontiers in Neurorobotics 2022; 16:1029968. [DOI: 10.3389/fnbot.2022.1029968]
Abstract
Introduction: Existing multi-view-based 3D model classification methods suffer from insufficient extraction of refined view features and poor generalization of the network model, which makes it difficult to further improve classification accuracy. To this end, this paper proposes a multi-view SoftPool attention convolutional network for 3D model classification tasks.
Methods: The method extracts multi-view features through ResNest and adaptive pooling modules, and the extracted features better represent the 3D model. The multi-view features processed by SoftPool are then used as the Query for the self-attention calculation, which enables the subsequent refined extraction. The attention scores computed from the Query and Key in the self-attention calculation are fed into a mobile inverted bottleneck convolution, which effectively improves the generalization of the network model. Based on the proposed method, a compact 3D global descriptor is finally generated, achieving high-accuracy 3D model classification.
Results: Experimental results show that our method achieves 96.96% OA and 95.68% AA on ModelNet40, and 98.57% OA and 98.42% AA on ModelNet10.
Discussion: Compared with many popular methods, our model achieves state-of-the-art classification accuracy.
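SoftPool itself is compact: activations are pooled with softmax weights, so strong responses dominate without being the sole survivors as in max pooling. A minimal sketch of using a SoftPool summary as the attention Query over view features; the dimensions and the single-head fusion are illustrative assumptions, not the paper's exact pipeline.

```python
import torch

def soft_pool(x, dim):
    """SoftPool: softmax-weighted average along `dim`."""
    w = torch.softmax(x, dim=dim)
    return (w * x).sum(dim=dim)

# view_feats: (B, V, C) -- features of V rendered views from a shared backbone
view_feats = torch.rand(4, 12, 256)
query = soft_pool(view_feats, dim=1).unsqueeze(1)              # (B, 1, C)
scores = torch.softmax(query @ view_feats.transpose(1, 2)      # (B, 1, V)
                       / view_feats.size(-1) ** 0.5, dim=-1)
descriptor = (scores @ view_feats).squeeze(1)                  # compact 3D descriptor
```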
5. 3DPointCaps++: Learning 3D Representations with Capsule Networks. International Journal of Computer Vision 2022; 130:2321-2336. [PMID: 35968252] [PMCID: PMC9362689] [DOI: 10.1007/s11263-022-01632-6]
Abstract
We present 3DPointCaps++ for learning robust, flexible, and generalizable 3D object representations without requiring heavy annotation effort or supervision. Unlike conventional 3D generative models, our algorithm aims to build a structured latent space in which certain factors of shape variation, such as object parts, can be disentangled into independent sub-spaces. Our novel decoder then acts on these individual latent sub-spaces (i.e., capsules) using deconvolution operators to reconstruct 3D points in a self-supervised manner. We further introduce a cluster loss ensuring that the points reconstructed by a single capsule remain local and do not spread across the object uncontrollably. These contributions allow our network to tackle the challenging tasks of part segmentation, part interpolation/replacement, and correspondence estimation across rigid/non-rigid shapes, both across and within categories. Extensive evaluations on ShapeNet objects and human scans demonstrate that our network learns generic representations that are robust and useful in many applications.
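The cluster loss has a particularly simple reading: penalize the spread of each capsule's reconstructed points around that capsule's own centroid. A minimal sketch under that interpretation; the function name and the mean-distance form are assumptions, not the paper's exact formulation.

```python
import torch

def cluster_loss(points):
    """points: (B, K, M, 3) -- M points reconstructed by each of K capsules.
    Keeps each capsule's points local by penalizing distance to the capsule centroid."""
    centroids = points.mean(dim=2, keepdim=True)        # (B, K, 1, 3)
    return (points - centroids).norm(dim=-1).mean()

loss = cluster_loss(torch.rand(2, 10, 64, 3))
```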
6. Liu X, Liu X, Liu YS, Han Z. SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction With Self-Projection Optimization. IEEE Transactions on Image Processing 2022; 31:4213-4226. [PMID: 35696479] [DOI: 10.1109/tip.2022.3182266]
Abstract
Point cloud upsampling aims to acquire dense and uniform point sets from sparse and irregular ones. Although deep learning models have made significant progress, state-of-the-art methods require ground-truth dense point sets as supervision, which limits them to training on synthetic paired data and makes them unsuitable for real-scanned sparse data. Obtaining large numbers of paired sparse-dense point sets from real scans as supervision is expensive and tedious. To address this problem, we propose a self-supervised point cloud upsampling network, named SPU-Net, to capture the inherent upsampling patterns of points lying on the underlying object surface. Specifically, we propose a coarse-to-fine reconstruction framework with two main components: point feature extraction and point feature expansion. In point feature extraction, we integrate a self-attention module with a graph convolution network (GCN) to capture context information inside and among local regions simultaneously. In point feature expansion, we introduce a hierarchically learnable folding strategy to generate upsampled point sets with learnable 2D grids. Moreover, to further refine the noisy points in the generated point sets, we propose a novel self-projection optimization, combined with uniformity and reconstruction terms as a joint loss, to facilitate self-supervised point cloud upsampling. We conduct experiments on both synthetic and real-scanned datasets, and the results demonstrate performance comparable to state-of-the-art supervised methods.
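In self-supervised upsampling, a joint loss of this flavor can be approximated with two cheap terms: a one-sided Chamfer distance pulling generated points toward the sparse input (a stand-in for the paper's self-projection optimization) and a uniformity term over k-nearest-neighbor spacing. A sketch under those assumptions:

```python
import torch

def one_sided_chamfer(pred, ref):
    """Mean distance from every predicted point to its nearest reference point."""
    d = torch.cdist(pred, ref)                          # (B, Np, Nr)
    return d.min(dim=2).values.mean()

def uniform_term(pred, k=8):
    """Encourage even spacing: variance of each point's k-NN distances."""
    d = torch.cdist(pred, pred)
    knn = d.topk(k + 1, largest=False).values[..., 1:]  # drop the self-distance
    return knn.var(dim=-1).mean()

def spu_loss(pred, sparse_input, alpha=1.0, beta=0.1):
    return alpha * one_sided_chamfer(pred, sparse_input) + beta * uniform_term(pred)

loss = spu_loss(torch.rand(2, 4096, 3), torch.rand(2, 1024, 3))
```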
7. A Lightweight Network for Point Cloud Analysis via the Fusion of Local Features and Distribution Characteristics. Sensors 2022; 22:4742. [PMID: 35808253] [PMCID: PMC9269399] [DOI: 10.3390/s22134742]
Abstract
Effectively integrating local features and their spatial distribution for point cloud analysis is a long-explored subject. Inspired by convolutional neural networks (CNNs), this paper studies the relationship between local features and their spatial characteristics and proposes a concise architecture to integrate them effectively, instead of designing more sophisticated feature extraction modules. In a 2D image, different positions in the convolution kernel's receptive field correspond to different weights, making the obtained features sensitive to the local distribution. Likewise, the spatial distribution of the input point features within the receptive field is critical for capturing abstract regional aggregated features. We design a lightweight structure that extracts local features by explicitly supplementing the distribution information of the input features, yielding distinctive features for point cloud analysis. Compared with the baseline, our model improves accuracy and convergence speed, and these advantages make it easy to apply snapshot ensembling. To remedy the shortcomings of the commonly used cosine annealing schedule, we design a new annealing schedule that can be flexibly adjusted for snapshot ensembling, which improves performance by a large margin. Extensive experiments on typical benchmarks verify that, although it adopts basic shared multi-layer perceptrons (MLPs) as feature extractors, the proposed lightweight model achieves on-par performance with previous state-of-the-art (SOTA) methods (e.g., ModelNet40 classification: 0.98 million parameters and 93.5% accuracy; S3DIS segmentation: 1.4 million parameters and 68.7% mIoU).
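Snapshot ensembling relies on a cyclic learning-rate schedule that repeatedly anneals and restarts, so each cycle ends in a distinct local minimum whose weights are saved. The paper's modified schedule is not specified in the abstract; below is a sketch of the standard cyclic cosine annealing it amends, with a hypothetical warm-up fraction as one example of an adjustable knob.

```python
import math

def cyclic_lr(step, steps_per_cycle, lr_max, lr_min=1e-5, warm_frac=0.1):
    """Cyclic schedule for snapshot ensembling: warm up, then cosine-anneal.
    A snapshot of the model weights is taken at the end of every cycle."""
    t = (step % steps_per_cycle) / steps_per_cycle       # position within the cycle
    if t < warm_frac:                                    # linear warm-up (assumed knob)
        return lr_min + (lr_max - lr_min) * t / warm_frac
    t = (t - warm_frac) / (1.0 - warm_frac)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```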
8. Effective Point Cloud Analysis Using Multi-Scale Features. Sensors 2021; 21:5574. [PMID: 34451016] [PMCID: PMC8402300] [DOI: 10.3390/s21165574]
Abstract
Fully exploring the correlation between local features and their spatial distribution in point clouds is essential for feature modeling. Inspired by convolutional neural networks (CNNs), this paper explores the relationship between local patterns and point coordinates from a novel perspective and proposes a lightweight structure based on multi-scale features and a two-step fusion strategy. Specifically, local features at multiple scales and their spatial distribution can be regarded as independent features corresponding to different levels of geometric significance; they are extracted by multiple parallel branches and then merged at multiple levels. In this way, the proposed model generates a shape-level representation that contains rich local characteristics and the spatial relationships between them. Moreover, with shared multi-layer perceptrons (MLPs) as the basic operators, the proposed structure is so concise that it converges rapidly, so we introduce snapshot ensembling to further improve performance. The model is evaluated on classification and part segmentation tasks, and the experiments show that it achieves on-par or better performance than previous state-of-the-art (SOTA) methods.
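One way to realize the parallel multi-scale branches is k-nearest-neighbor grouping at several values of k, each followed by a shared MLP over relative coordinates and max pooling; a first fusion step then concatenates the branch outputs. A minimal sketch, where the layer sizes and the use of plain kNN are assumptions:

```python
import torch
import torch.nn as nn

class ScaleBranch(nn.Module):
    """One branch: kNN grouping -> shared MLP on offsets -> max pool."""
    def __init__(self, k, out_dim):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(3, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, xyz):                                      # (B, N, 3)
        idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices
        neigh = torch.gather(
            xyz.unsqueeze(1).expand(-1, xyz.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, 3))             # (B, N, k, 3)
        offsets = neigh - xyz.unsqueeze(2)                       # local distribution
        return self.mlp(offsets).max(dim=2).values               # (B, N, out_dim)

branches = nn.ModuleList(ScaleBranch(k, 64) for k in (8, 16, 32))
xyz = torch.rand(2, 1024, 3)
fused = torch.cat([b(xyz) for b in branches], dim=-1)            # first fusion step
```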
9. Cheng S, Chen X, He X, Liu Z, Bai X. PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis. IEEE Transactions on Image Processing 2021; 30:4436-4448. [PMID: 33856993] [DOI: 10.1109/tip.2021.3072214]
Abstract
Learning intra-region context and inter-region relations are two effective strategies for strengthening feature representations in point cloud analysis. However, unifying the two strategies is not fully emphasized in existing methods. To this end, we propose a novel framework named Point Relation-Aware Network (PRA-Net), composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module. The ISL module dynamically integrates local structural information into point features, while the IRL module captures inter-region relations adaptively and efficiently via a differentiable region partition scheme and a representative-point-based strategy. Extensive experiments on several 3D benchmarks covering shape classification, keypoint estimation, and part segmentation verify the effectiveness and generalization ability of PRA-Net. Code will be available at https://github.com/XiwuChen/PRA-Net.
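A differentiable region partition can be mimicked with a soft assignment matrix: points vote for regions, region summaries exchange information through attention, and the result is broadcast back to the points. A sketch of that reading; the real IRL module uses a representative-point strategy not reproduced here, so this is an assumption-laden stand-in.

```python
import torch
import torch.nn as nn

class InterRegionRelation(nn.Module):
    """Soft-partition points into regions; relate region summaries; broadcast back."""
    def __init__(self, feat_dim, num_regions=8):
        super().__init__()
        self.assign = nn.Linear(feat_dim, num_regions)
        self.relate = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)

    def forward(self, feats):                        # feats: (B, N, C)
        a = torch.softmax(self.assign(feats), -1)    # (B, N, R): differentiable partition
        regions = torch.einsum('bnr,bnc->brc', a, feats)      # region summaries
        related, _ = self.relate(regions, regions, regions)   # inter-region relations
        return feats + torch.einsum('bnr,brc->bnc', a, related)

out = InterRegionRelation(64, num_regions=8)(torch.rand(2, 1024, 64))
```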
10. Liu X, Han Z, Liu YS, Zwicker M. Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention. IEEE Transactions on Image Processing 2021; 30:1744-1758. [PMID: 33417547] [DOI: 10.1109/tip.2020.3048623]
Abstract
Fine-grained 3D shape classification is important for shape understanding and analysis, yet it remains a challenging and rarely explored research problem due to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset (named FG3D) with fine-grained class labels, consisting of three categories: airplane, car, and chair. Each category contains several subcategories at a fine-grained level. Our experiments on this dataset show that state-of-the-art methods are significantly limited by the small variance among subcategories within the same category. To resolve this problem, we propose a fine-grained 3D shape classification method, named FG3D-Net, that captures the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect generally semantic parts inside multiple views under a generally semantic part detection benchmark. We then design a hierarchical part-view attention aggregation module to learn a global shape representation by aggregating generally semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of the features: part-level attention highlights the important parts in each view, while view-level attention highlights the discriminative views among all views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the fine-grained dataset show that our method outperforms other state-of-the-art methods. The FG3D dataset is available at https://github.com/liuxinhai/FG3D-Net.
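The hierarchical aggregation reduces to two attention-weighted pools: one over parts within each view, then one over views. A minimal sketch assuming part features are already extracted; the shapes and module names are illustrative, and the paper's RNN over sequential views is omitted.

```python
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Score-based weighted pooling over one axis."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x, dim):
        w = torch.softmax(self.score(x), dim=dim)
        return (w * x).sum(dim=dim)

class PartViewAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.part_pool = AttnPool(dim)   # highlights important parts in each view
        self.view_pool = AttnPool(dim)   # highlights discriminative views

    def forward(self, part_feats):                       # (B, V, P, C)
        view_feats = self.part_pool(part_feats, dim=2)   # (B, V, C)
        return self.view_pool(view_feats, dim=1)         # (B, C) global descriptor

global_feat = PartViewAttention(128)(torch.rand(2, 12, 6, 128))
```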