1. Miyauchi S, Morooka K, Kurazume R. Isomorphic Mesh Generation From Point Clouds With Multilayer Perceptrons. IEEE Transactions on Visualization and Computer Graphics 2025; 31:1647-1663. [PMID: 38376959] [DOI: 10.1109/tvcg.2024.3367855]
Abstract
A novel neural network called the isomorphic mesh generator (iMG) is proposed to generate isomorphic meshes from point clouds containing noise and missing parts. Isomorphic meshes of arbitrary objects exhibit a unified mesh structure, despite objects belonging to different classes. This unified representation enables various modern deep neural networks (DNNs) to easily handle surface models without requiring additional pre-processing. Additionally, the unified mesh structure of isomorphic meshes enables the application of the same process to all isomorphic meshes, unlike general mesh models, where processes need to be tailored to their mesh structures. Therefore, the use of isomorphic meshes can ensure efficient memory usage and reduce calculation time. Because the iMG is a data-free method, no point clouds or mesh models need to be prepared in advance as training data; only the point cloud of the target object is required as input. The iMG outputs an isomorphic mesh obtained by mapping a reference mesh to a given input point cloud. To stably estimate the mapping function, a step-by-step mapping strategy is introduced. This strategy enables flexible deformation while maintaining the structure of the reference mesh. Simulations and experiments conducted using a mobile phone have confirmed that the iMG reliably generates isomorphic meshes of given objects, even when the input point cloud includes noise and missing parts.
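As a rough illustration of the mapping idea described above (not the authors' iMG, and omitting its step-by-step mapping strategy and regularizers), the following PyTorch sketch deforms the vertices of a reference mesh onto an input point cloud by optimizing an MLP against a Chamfer objective; the names `VertexMapper` and `chamfer`, the toy data, and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

def chamfer(a, b):
    # Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).
    d = torch.cdist(a, b)                      # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

class VertexMapper(nn.Module):
    """MLP that maps reference-mesh vertex coordinates to deformed positions."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))
    def forward(self, v):
        return v + self.net(v)                 # predict per-vertex offsets

# Toy data: a unit-sphere reference mesh (vertices only) and a stand-in scan.
ref_vertices = torch.randn(500, 3)
ref_vertices = ref_vertices / ref_vertices.norm(dim=1, keepdim=True)
target_scan = torch.rand(2000, 3)

mapper = VertexMapper()
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
for step in range(200):                        # per-input optimization (data-free)
    opt.zero_grad()
    loss = chamfer(mapper(ref_vertices), target_scan)
    loss.backward()
    opt.step()
```

Because the deformed output keeps the reference mesh's vertex ordering and connectivity, every fitted result shares one mesh structure, which is the property the abstract calls isomorphic.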
2. Wang C, Zha Y, He J, Yang W, Zhang T. Rethinking Masked Representation Learning for 3D Point Cloud Understanding. IEEE Transactions on Image Processing 2024; PP:247-262. [PMID: 40030821] [DOI: 10.1109/tip.2024.3520008]
Abstract
Self-supervised point cloud representation learning aims to acquire robust and general feature representations from unlabeled data. Recently, masked point modeling-based methods have shown significant performance improvements for point cloud understanding, yet these methods rely on overlapping grouping strategies (the k-nearest neighbor algorithm), which cause early leakage of the structural information of masked groups, and they overlook the semantic modeling of object components, so parts with the same semantics exhibit obvious feature differences merely because of positional differences. In this work, we rethink grouping strategies and pretext tasks that are better suited to self-supervised point cloud representation learning and propose a novel hierarchical masked representation learning method comprising an optimal transport-based hierarchical grouping strategy, a prototype-based part modeling module, and a hierarchical attention encoder. The proposed method enjoys several merits. First, the proposed grouping strategy partitions the point cloud into non-overlapping groups, eliminating the early leakage of structural information from the masked groups. Second, the proposed prototype-based part modeling module dynamically models different object components, ensuring feature consistency on parts with the same semantics. Extensive experiments on four downstream tasks demonstrate that our method surpasses state-of-the-art 3D representation learning methods. Furthermore, comprehensive ablation studies and visualizations demonstrate the effectiveness of the proposed modules.
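The grouping contrast drawn in the abstract can be illustrated with a small NumPy sketch: overlapping k-NN groups versus a disjoint nearest-center partition, the latter being a simplified stand-in for the paper's optimal-transport grouping. All function names and sizes here are illustrative assumptions.

```python
import numpy as np

def knn_groups(points, centers, k):
    # Overlapping grouping: each center gathers its k nearest points, so one
    # point can appear in several groups (masked structure can "leak" through
    # neighboring visible groups).
    d = np.linalg.norm(points[None] - centers[:, None], axis=-1)   # (G, N)
    return np.argsort(d, axis=1)[:, :k]

def partition_groups(points, centers):
    # Non-overlapping grouping: each point is assigned to exactly one center,
    # a simplified stand-in for the balanced optimal-transport assignment.
    d = np.linalg.norm(points[:, None] - centers[None], axis=-1)   # (N, G)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
pts = rng.random((1024, 3))
ctr = pts[rng.choice(1024, size=32, replace=False)]

overlapping = knn_groups(pts, ctr, k=64)         # groups share points
labels = partition_groups(pts, ctr)              # disjoint groups
masked = rng.choice(32, size=16, replace=False)  # mask half of the groups
visible = pts[~np.isin(labels, masked)]          # masked structure is fully hidden
```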
3. Zheng Y, Lu J, Duan Y, Zhou J. Structural Relation Modeling of 3D Point Clouds. IEEE Transactions on Image Processing 2024; 33:4867-4881. [PMID: 39236129] [DOI: 10.1109/tip.2024.3451940]
Abstract
In this paper, we propose an effective plug-and-play module called the structural relation network (SRN) to model structural dependencies in 3D point clouds for feature representation. Existing network architectures such as PointNet++ and RS-CNN capture local structures individually and ignore the inner interactions between different sub-clouds. Motivated by the fact that structural relation modeling plays a critical role in how humans understand 3D objects, our SRN exploits local information by modeling structural relations in 3D space. For a given sub-cloud of point sets, the SRN first extracts its geometrical and locational relations with the other sub-clouds, maps them into an embedding space, and then aggregates both relational features with those of the other sub-clouds. As the variation of semantics embedded in different sub-clouds is ignored by the SRN, we further extend it to enable dynamic message passing between different sub-clouds. We propose a graph-based structural relation network (GSRN) in which sub-clouds and their pairwise relations are modeled as nodes and edges, respectively, so that node features are updated by the messages passed along the edges. Since the node features might not be well preserved when acquiring the global representation, we propose a Combined Entropy Readout (CER) function to adaptively aggregate them into a holistic representation, so that the GSRN simultaneously models local-local and local-global region-wise interactions. The proposed SRN and GSRN modules are simple, interpretable, and do not require any additional supervision signals, and they can be easily plugged into existing networks. Experimental results on benchmark datasets (ScanObjectNN, ModelNet40, ShapeNet Part, S3DIS, ScanNet, and SUN RGB-D) indicate promising boosts on the tasks of 3D point cloud classification, segmentation, and object detection.
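A minimal PyTorch sketch of the kind of relation modeling the abstract describes: embed pairwise centroid offsets and distances between sub-clouds and aggregate them onto each sub-cloud's feature. This is an illustrative toy, not the authors' SRN; the module name, feature sizes, and mean aggregation are assumptions.

```python
import torch
import torch.nn as nn

class StructuralRelation(nn.Module):
    """Toy relation module: for each sub-cloud, embed its locational relations
    (center offsets and distances) with all other sub-clouds and aggregate
    them onto the sub-cloud feature."""
    def __init__(self, feat_dim=64, rel_dim=32):
        super().__init__()
        self.rel_mlp = nn.Sequential(nn.Linear(4, rel_dim), nn.ReLU(),
                                     nn.Linear(rel_dim, feat_dim))
    def forward(self, centers, feats):
        # centers: (G, 3) sub-cloud centroids, feats: (G, C) sub-cloud features
        offsets = centers[:, None, :] - centers[None, :, :]          # (G, G, 3)
        dists = offsets.norm(dim=-1, keepdim=True)                   # (G, G, 1)
        rel = self.rel_mlp(torch.cat([offsets, dists], dim=-1))      # (G, G, C)
        return feats + rel.mean(dim=1)                               # aggregate relations

centers = torch.rand(16, 3)
feats = torch.rand(16, 64)
out = StructuralRelation()(centers, feats)       # (16, 64) relation-aware features
```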
4. Zhou W, Zheng F, Zhao Y, Pang Y, Yi J. MSDCNN: A multiscale dilated convolution neural network for fine-grained 3D shape classification. Neural Networks 2024; 172:106141. [PMID: 38301340] [DOI: 10.1016/j.neunet.2024.106141]
Abstract
Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multi-view data often lack content information and spatial relationships, which makes it difficult to identify the small variances among subcategories within the same category. To solve this problem, this paper proposes a novel multiscale dilated convolution neural network, termed MSDCNN, for multi-view fine-grained 3D shape classification. First, a sequence of views is rendered from 12 viewpoints around the input 3D shape by the sequential view-capturing module. Then, the first 22 convolution layers of ResNeXt50 are employed to extract the semantic features of each view, and a global mixed feature map is obtained through an element-wise maximum operation over the 12 output feature maps. Furthermore, an attention dilated module (ADM), which combines four concatenated attention dilated blocks (ADBs), is designed to extract larger-receptive-field features from the global mixed feature map and enhance the context information among the views. Specifically, each ADB consists of an attention mechanism module and a dilated convolution with a different dilation rate. In addition, a prediction module with label smoothing, containing a 3 × 3 convolution and adaptive average pooling, is proposed to classify the features. The performance of our method is validated experimentally on the ModelNet10, ModelNet40, and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for fine-grained 3D shape classification.
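To make the pipeline concrete, here is a toy PyTorch sketch of two ingredients named in the abstract: the element-wise maximum fusion of the 12 per-view feature maps and a stack of dilated convolutions with increasing dilation rates, followed by a label-smoothed classification head. It is not the MSDCNN implementation; the attention part of the ADB is omitted, and all shapes and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """One dilated-convolution block; the paper's ADB additionally contains an
    attention mechanism, which is omitted in this sketch."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, kernel_size=3, padding=dilation, dilation=dilation)
        self.act = nn.ReLU()
    def forward(self, x):
        return self.act(self.conv(x))

# Per-view features from a shared CNN backbone: 12 views, C channels, HxW maps.
views = torch.rand(12, 256, 7, 7)
mixed = views.max(dim=0, keepdim=True).values          # element-wise maximum across views

blocks = nn.Sequential(*[DilatedBlock(256, d) for d in (1, 2, 3, 4)])
context = blocks(mixed)                                # enlarged receptive field

# Classification head with label smoothing, as mentioned in the abstract.
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 40))
logits = head(context)
loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, torch.tensor([3]))
```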
Affiliation(s)
- Wei Zhou: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Fujian Zheng: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China; College of Optoelectronic Engineering, Chongqing University, Chongqing 400030, PR China.
- Yiheng Zhao: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
- Yiran Pang: Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, FL 33431, United States of America.
- Jun Yi: College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China.
5. Xiang P, Wen X, Liu YS, Cao YP, Wan P, Zheng W, Han Z. Snowflake Point Deconvolution for Point Cloud Completion and Generation With Skip-Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:6320-6338. [PMID: 36282830] [DOI: 10.1109/tpami.2022.3217161]
Abstract
Most existing point cloud completion methods suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it difficult to reveal fine local geometric details. To resolve this issue, we propose SnowflakeNet with snowflake point deconvolution (SPD) to generate complete point clouds. SPD models the generation of point clouds as the snowflake-like growth of points, where child points are generated progressively by splitting their parent points after each SPD. Our insight into the detailed geometry is to introduce a skip-transformer in the SPD to learn the point splitting patterns that best fit the local regions. The skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer and produce the splitting in the current layer. The locally compact and structured point clouds generated by SPD precisely reveal the structural characteristics of the 3D shape in local patches, which enables us to predict highly detailed geometries. Moreover, since SPD is a general operation that is not limited to completion, we explore its applications in other generative tasks, including point cloud auto-encoding, generation, single image reconstruction, and upsampling. Experimental results show that SnowflakeNet outperforms state-of-the-art methods on widely used benchmarks.
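A minimal sketch of the snowflake-like point splitting described above, assuming a simple offset-predicting MLP: each parent point spawns several child points conditioned on its feature. The skip-transformer and the multi-stage design are omitted; the class name, sizes, and the bounded-offset trick are assumptions, not the authors' SPD code.

```python
import torch
import torch.nn as nn

class PointSplit(nn.Module):
    """Toy point-splitting layer: every parent point spawns `up` child points by
    adding learned offsets conditioned on the parent feature."""
    def __init__(self, feat_dim=64, up=4):
        super().__init__()
        self.up = up
        self.offset = nn.Sequential(nn.Linear(feat_dim + 3, 64), nn.ReLU(),
                                    nn.Linear(64, 3 * up))
    def forward(self, xyz, feat):
        # xyz: (N, 3) parent points, feat: (N, C) per-point features
        off = self.offset(torch.cat([xyz, feat], dim=-1)).view(-1, self.up, 3)
        children = xyz[:, None, :] + 0.1 * torch.tanh(off)   # bounded local growth
        return children.reshape(-1, 3)                        # (N * up, 3)

coarse = torch.rand(128, 3)
feats = torch.rand(128, 64)
dense = PointSplit()(coarse, feats)       # 512 points after one splitting step
```

Stacking several such layers and letting each layer see the previous layer's splitting (via attention) is, at a high level, what the skip-transformer adds in the paper.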
6. Wen X, Xiang P, Han Z, Cao YP, Wan P, Zheng W, Liu YS. PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-Step Point Moving Paths. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:852-867. [PMID: 35290184] [DOI: 10.1109/tpami.2022.3159003]
Abstract
Point cloud completion concerns predicting the missing parts of incomplete 3D shapes. A common strategy is to generate a complete shape from the incomplete input. However, the unordered nature of point clouds degrades the generation of high-quality 3D shapes, as the detailed topology and structure of unordered points are hard to capture during the generative process using an extracted latent code. We address this problem by formulating completion as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net++, to mimic the behavior of an earth mover. It moves each point of the incomplete input to obtain a complete point cloud, where the total distance of the point moving paths (PMPs) should be the shortest. Therefore, PMP-Net++ predicts a unique PMP for each point according to the constraint of point moving distances. The network learns a strict and unique point-level correspondence, and thus improves the quality of the predicted complete shape. Moreover, since moving points relies heavily on per-point features learned by the network, we further introduce a transformer-enhanced representation learning network, which significantly improves the completion performance of PMP-Net++. We conduct comprehensive experiments on shape completion and further explore its application to point cloud up-sampling, which demonstrates the non-trivial improvement of PMP-Net++ over state-of-the-art point cloud completion/up-sampling methods.
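A toy single-step version of the point-moving idea: predict one displacement per input point and train with a completion loss plus a penalty on the total moving distance. This is a hedged illustration, not PMP-Net++ itself (which uses multiple moving steps and a transformer-enhanced feature extractor); the `chamfer` helper, the penalty weight, and the data are assumptions.

```python
import torch
import torch.nn as nn

def chamfer(a, b):
    # Symmetric Chamfer distance between two point sets.
    d = torch.cdist(a, b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Toy point-moving model: one displacement per input point.
move = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

partial = torch.rand(1024, 3)            # incomplete scan
complete = torch.rand(1024, 3)           # ground-truth complete shape (training only)

opt = torch.optim.Adam(move.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    paths = move(partial)                             # per-point moving paths
    moved = partial + paths
    # Completion loss plus a penalty that keeps the total moving distance short.
    loss = chamfer(moved, complete) + 0.01 * paths.norm(dim=1).mean()
    loss.backward()
    opt.step()
```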
7. Wei XS, Song YZ, Aodha OM, Wu J, Peng Y, Tang J, Yang J, Belongie S. Fine-Grained Image Analysis With Deep Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:8927-8948. [PMID: 34752384] [DOI: 10.1109/tpami.2021.3126648]
Abstract
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas - fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.
8. Zheng Y, Xu X, Zhou J, Lu J. PointRas: Uncertainty-Aware Multi-Resolution Learning for Point Cloud Segmentation. IEEE Transactions on Image Processing 2022; 31:6002-6016. [PMID: 36103438] [DOI: 10.1109/tip.2022.3205208]
Abstract
In this paper, we propose an uncertainty-aware multi-resolution learning method for point cloud segmentation, named PointRas. Most existing works on point cloud segmentation design encoder networks to obtain a better representation of local space in the point cloud. However, few of them investigate the utilization of features at the lower resolutions produced by the encoders or consider contextual learning between the various resolutions in the decoder network. To address this, we propose to utilize the descriptive characteristics of point clouds at the lower resolutions. Taking inspiration from the core steps of rasterization in 2D graphics, where the properties of densely sampled pixels are interpolated from a few primitive shapes during rendering, we use a similar strategy in which lower-resolution prediction maps are iteratively regressed and upsampled into higher resolutions. Moreover, to remedy the potential information deficiency of the lower-resolution point cloud, we refine the predictions at each resolution under the criterion of uncertainty selection, which notably enhances the representation ability of the point cloud at lower resolutions. Our proposed PointRas module can be incorporated into the backbones of various point cloud segmentation frameworks and brings only marginal computational cost. We evaluate the proposed method on challenging datasets including ScanNet, S3DIS, NPM3D, STPLS3D, and ScanObjectNN, and it consistently improves performance in comparison with state-of-the-art methods.
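A small sketch of the uncertainty-selection step described in the abstract, assuming prediction entropy as the uncertainty measure: pick the most uncertain points of a coarse prediction for refinement. The function name, the entropy criterion, and the sizes are illustrative assumptions, not the PointRas implementation.

```python
import torch

def select_uncertain(logits, k):
    """Pick the k points whose class predictions have the highest entropy,
    a simplified stand-in for uncertainty-based refinement selection."""
    probs = torch.softmax(logits, dim=-1)                       # (N, num_classes)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1)    # (N,)
    return entropy.topk(k).indices

# Coarse per-point predictions on a low-resolution point cloud.
coarse_logits = torch.randn(2048, 13)
refine_idx = select_uncertain(coarse_logits, k=256)
# Only these uncertain points would be re-predicted with finer-resolution features.
```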
9. Liu X, Liu X, Liu YS, Han Z. SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction With Self-Projection Optimization. IEEE Transactions on Image Processing 2022; 31:4213-4226. [PMID: 35696479] [DOI: 10.1109/tip.2022.3182266]
Abstract
The task of point cloud upsampling aims to acquire dense and uniform point sets from sparse and irregular point sets. Although significant progress has been made with deep learning models, state-of-the-art methods require ground-truth dense point sets as supervision, which limits them to training on synthetic paired data and makes them unsuitable for real-scanned sparse data. Moreover, it is expensive and tedious to obtain large numbers of paired sparse-dense point sets as supervision from real-scanned sparse data. To address this problem, we propose a self-supervised point cloud upsampling network, named SPU-Net, to capture the inherent upsampling patterns of points lying on the underlying object surface. Specifically, we propose a coarse-to-fine reconstruction framework, which contains two main components: point feature extraction and point feature expansion. In point feature extraction, we integrate a self-attention module with a graph convolution network (GCN) to capture context information inside and among local regions simultaneously. In point feature expansion, we introduce a hierarchically learnable folding strategy to generate upsampled point sets with learnable 2D grids. Moreover, to further optimize the noisy points in the generated point sets, we propose a novel self-projection optimization associated with uniform and reconstruction terms as a joint loss to facilitate self-supervised point cloud upsampling. We conduct various experiments on both synthetic and real-scanned datasets, and the results demonstrate that we achieve performance comparable to state-of-the-art supervised methods.
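A toy PyTorch sketch of a folding-style feature expansion in the spirit of the abstract: replicate each point feature, attach a learnable 2D grid code, and decode to 3D offsets. It omits SPU-Net's hierarchical grids, GCN/self-attention feature extractor, and self-projection optimization; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class FoldingExpansion(nn.Module):
    """Toy folding-style expansion: each point feature is copied `ratio` times,
    paired with a learnable 2D grid code, and decoded into a 3D offset."""
    def __init__(self, feat_dim=64, ratio=4):
        super().__init__()
        self.ratio = ratio
        self.grid = nn.Parameter(torch.rand(ratio, 2))          # learnable 2D codes
        self.fold = nn.Sequential(nn.Linear(feat_dim + 2, 64), nn.ReLU(),
                                  nn.Linear(64, 3))
    def forward(self, xyz, feat):
        n = xyz.shape[0]
        f = feat[:, None, :].expand(-1, self.ratio, -1)          # (N, r, C)
        g = self.grid[None].expand(n, -1, -1)                    # (N, r, 2)
        offsets = self.fold(torch.cat([f, g], dim=-1))           # (N, r, 3)
        return (xyz[:, None, :] + offsets).reshape(-1, 3)        # (N * r, 3)

sparse = torch.rand(256, 3)
feat = torch.rand(256, 64)
dense = FoldingExpansion()(sparse, feat)        # 1024 upsampled points
```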
10. Liu F, Deng X, Zou C, Lai YK, Chen K, Zuo R, Ma C, Liu YJ, Wang H. SceneSketcher-v2: Fine-Grained Scene-Level Sketch-Based Image Retrieval Using Adaptive GCNs. IEEE Transactions on Image Processing 2022; 31:3737-3751. [PMID: 35594232] [DOI: 10.1109/tip.2022.3175403]
Abstract
Sketch-based image retrieval (SBIR) is a long-standing research topic in computer vision. Existing methods mainly focus on category-level or instance-level image retrieval. This paper investigates the fine-grained scene-level SBIR problem, where a free-hand sketch depicting a scene is used to retrieve desired images. This problem is useful yet challenging mainly because of two entangled facts: 1) achieving an effective representation of the input query data and scene-level images is difficult, as it requires modeling the information across multiple modalities such as object layout, relative sizes, and visual appearance, and 2) there is a great domain gap between the query sketch input and the target images. We present SceneSketcher-v2, a Graph Convolutional Network (GCN)-based architecture to address these challenges. SceneSketcher-v2 employs a carefully designed graph convolution network to fuse the multi-modality information in the query sketch and target images and uses a triplet training process in an end-to-end training manner to alleviate the domain gap. Extensive experiments demonstrate that SceneSketcher-v2 outperforms state-of-the-art scene-level SBIR models by a significant margin.
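The triplet training mentioned in the abstract can be sketched in a few lines, assuming precomputed sketch and image embeddings (in the paper these come from the adaptive GCN encoders); the margin and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy triplet setup: embeddings of a query sketch, its matching scene image,
# and a non-matching image, for a batch of 8 queries.
sketch_emb = torch.rand(8, 128)     # anchor
pos_img_emb = torch.rand(8, 128)    # positive (matching scene)
neg_img_emb = torch.rand(8, 128)    # negative (non-matching scene)

triplet = nn.TripletMarginLoss(margin=0.3)
loss = triplet(sketch_emb, pos_img_emb, neg_img_emb)
# Training pulls each sketch toward its matching image and pushes non-matches
# away, which is how the triplet objective helps bridge the sketch-image gap.
```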