1. Du W, Wang H, Zhao C, Cui Z, Li J, Zhang W, Yu Y, Peng X. Postoperative facial prediction for mandibular defect based on surface mesh deformation. Journal of Stomatology, Oral and Maxillofacial Surgery 2024;125:101973. [PMID: 39089509] [DOI: 10.1016/j.jormas.2024.101973]
Abstract
OBJECTIVES This study introduces a novel predictive model for the postoperative facial contours of patients with mandibular defects, addressing limitations of current methodologies, which fail to preserve geometric features and lack interpretability. METHODS Utilizing surface mesh theory and deep learning, our model diverges from traditional point cloud approaches by employing surface triangular meshes. We extract latent variables using a Mesh Convolutional Restricted Boltzmann Machines (MCRBM) model to generate a three-dimensional deformation field, aiming to improve geometric information preservation and interpretability. RESULTS Experimental evaluations of our model demonstrate a prediction accuracy of 91.2%, a significant improvement over traditional machine learning-based methods. CONCLUSIONS The proposed model offers a promising new tool for preoperative planning in oral and maxillofacial surgery. It substantially improves the accuracy of postoperative facial contour predictions for mandibular defect reconstruction.
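The abstract gives no implementation details, so the following is only a minimal numpy sketch of the general pipeline it describes: a restricted Boltzmann machine extracts latent variables from (here, hypothetical per-vertex) features, and a linear decoder maps those latents to a per-vertex 3D deformation field added to the preoperative surface. All shapes and names are illustrative stand-ins, not the paper's MCRBM.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_feat, n_hidden = 2048, 16, 64

# Hypothetical stand-in for mesh features (the paper uses triangular-mesh
# convolutions; plain per-vertex features are used here for brevity).
v = rng.random((n_vertices, n_feat))

# RBM parameters: visible-hidden weights and hidden biases.
W = rng.normal(0, 0.01, (n_feat, n_hidden))
b_h = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Latent variables: posterior activation of the hidden units given the input.
h = sigmoid(v @ W + b_h)                      # (n_vertices, n_hidden)

# A linear decoder mapping latents to a per-vertex 3D deformation field.
W_dec = rng.normal(0, 0.01, (n_hidden, 3))
deformation = h @ W_dec                       # (n_vertices, 3)

# Predicted postoperative surface = preoperative vertices + deformation.
vertices = rng.random((n_vertices, 3))
predicted = vertices + deformation
```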
Affiliation(s)
- Wen Du, Hao Wang, Jiaqi Li, Wenbo Zhang, Yao Yu, Xin Peng: Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, National Center for Stomatology, National Clinical Research Center for Oral Diseases, Beijing Key Laboratory of Digital Stomatology, NHC Key Laboratory of Digital Stomatology, China
- Chenche Zhao: College of Engineering, Peking University, China
- Zhiming Cui: School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China
2. An Optimization-Linked Intelligent Security Algorithm for Smart Healthcare Organizations. Healthcare (Basel) 2023;11:580. [PMID: 36833114] [PMCID: PMC9956199] [DOI: 10.3390/healthcare11040580]
Abstract
IoT-enabled healthcare apps provide significant value to society by offering cost-effective patient monitoring solutions in IoT-enabled buildings. However, with a large number of users and sensitive personal information readily available in today's fast-paced, internet- and cloud-based environment, the security of these healthcare systems must be a top priority. Storing a patient's health data in electronic form raises issues of patient data privacy and security. Furthermore, processing large amounts of data is a difficult challenge for traditional classifiers, and several computational intelligence approaches are useful for effectively categorizing massive quantities of data. For these reasons, this study proposes a novel healthcare monitoring system that tracks disease processes and forecasts diseases based on data obtained from patients in distant communities. The proposed framework consists of three major stages: data collection, secured storage, and disease detection. The data are collected using IoT sensor devices. A homomorphic encryption (HE) model is then used for secured data storage. Finally, the disease detection framework is designed with the help of the Centered Convolutional Restricted Boltzmann Machines-based whale optimization (CCRBM-WO) algorithm. The experiment is conducted on a Python-based cloud tool. According to the experimental findings, the proposed system outperforms current e-healthcare solutions, achieving accuracy, precision, F1-measure, and recall of 96.87%, 97.45%, 97.78%, and 98.57%, respectively.
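The CCRBM-WO coupling is not detailed in the abstract; as a hedged illustration of the optimization half only, here is a minimal numpy sketch of the standard Whale Optimization Algorithm (encircling, exploration, and spiral bubble-net updates) applied to a toy objective rather than to classifier parameters.

```python
import numpy as np

def whale_optimize(f, dim=8, n_whales=20, iters=200, seed=1):
    """Minimal Whale Optimization Algorithm (WOA) sketch, minimizing f."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n_whales, dim))
    best = X[np.argmin([f(x) for x in X])].copy()
    for t in range(iters):
        a = 2 - 2 * t / iters                      # linearly decreases 2 -> 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            if rng.random() < 0.5:
                # Canonical WOA tests the scalar |A|; a vector test is a
                # simplification used here.
                if np.all(np.abs(A) < 1):          # encircle the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                              # explore toward a random whale
                    rand = X[rng.integers(n_whales)]
                    X[i] = rand - A * np.abs(C * rand - X[i])
            else:                                  # spiral bubble-net update
                l = rng.uniform(-1, 1, dim)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            if f(X[i]) < f(best):
                best = X[i].copy()
    return best

best = whale_optimize(lambda x: np.sum(x ** 2))    # toy sphere objective
```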
3. Multi-Modal 3D Shape Clustering with Dual Contrastive Learning. Applied Sciences (Basel) 2022;12:7384. [DOI: 10.3390/app12157384]
Abstract
3D shape clustering is developing into an important research subject with the wide applications of 3D shapes in computer vision and multimedia fields. Since 3D shapes generally take on various modalities, how to comprehensively exploit the multi-modal properties to boost clustering performance has become a key issue for the 3D shape clustering task. Taking into account the advantages of multiple views and point clouds, this paper proposes the first multi-modal 3D shape clustering method, named the dual contrastive learning network (DCL-Net), to discover the clustering partitions of unlabeled 3D shapes. First, by simultaneously performing cross-view contrastive learning within multi-view modality and cross-modal contrastive learning between the point cloud and multi-view modalities in the representation space, a representation-level dual contrastive learning module is developed, which aims to capture discriminative 3D shape features for clustering. Meanwhile, an assignment-level dual contrastive learning module is designed by further ensuring the consistency of clustering assignments within the multi-view modality, as well as between the point cloud and multi-view modalities, thus obtaining more compact clustering partitions. Experiments on two commonly used 3D shape benchmarks demonstrate the effectiveness of the proposed DCL-Net.
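As a rough sketch of the representation-level cross-modal term, the following PyTorch snippet computes a standard NT-Xent-style contrastive loss between point-cloud and multi-view embeddings of the same batch of shapes; DCL-Net's full design also includes cross-view and assignment-level terms not shown here, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def cross_modal_nt_xent(z_pc, z_mv, tau=0.1):
    """Contrastive loss between point-cloud and multi-view embeddings of the
    same shapes: matched pairs are positives, all other pairs negatives."""
    z_pc = F.normalize(z_pc, dim=1)
    z_mv = F.normalize(z_mv, dim=1)
    logits = z_pc @ z_mv.t() / tau            # (B, B) cosine-similarity logits
    targets = torch.arange(z_pc.size(0))      # shape i matches view-set i
    # Symmetrize over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = cross_modal_nt_xent(torch.randn(16, 128), torch.randn(16, 128))
```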
4. A Lightweight Network for Point Cloud Analysis via the Fusion of Local Features and Distribution Characteristics. Sensors 2022;22:4742. [PMID: 35808253] [PMCID: PMC9269399] [DOI: 10.3390/s22134742]
Abstract
Effectively integrating local features and their spatial distribution information for point cloud analysis is a subject that has been explored for a long time. Inspired by convolutional neural networks (CNNs), this paper studies the relationship between local features and their spatial characteristics and proposes a concise architecture to integrate them effectively, instead of designing more sophisticated feature extraction modules. Different positions in the feature map of a 2D image correspond to different weights in the convolution kernel, making the obtained features sensitive to local distribution characteristics. Thus, the spatial distribution of the input features of the point cloud within the receptive field is critical for capturing abstract regional aggregated features. We design a lightweight structure to extract local features by explicitly supplementing the distribution information of the input features, obtaining distinctive features for point cloud analysis. Compared with the baseline, our model shows improvements in accuracy and convergence speed, and these advantages facilitate the introduction of the snapshot ensemble. To address the shortcomings of the commonly used cosine annealing learning schedule, we design a new annealing schedule that can be flexibly adjusted for snapshot ensembling, which improves the performance by a large margin. Extensive experiments on typical benchmarks verify that, although it adopts basic shared multi-layer perceptrons (MLPs) as feature extractors, the proposed model with a lightweight structure achieves on-par performance with previous state-of-the-art (SOTA) methods (e.g., ModelNet40 classification: 0.98 million parameters and 93.5% accuracy; S3DIS segmentation: 1.4 million parameters and 68.7% mIoU).
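The paper's adjusted annealing schedule is not specified in the abstract; for reference, this is the standard cyclic cosine annealing that snapshot ensembles commonly use (and that the authors modify), written as a small self-contained Python function.

```python
import math

def cyclic_cosine_lr(step, total_steps, n_snapshots, lr_max=0.1):
    """Standard cyclic cosine annealing for snapshot ensembles: the learning
    rate restarts at lr_max at the start of each cycle and decays toward 0,
    with one model snapshot saved at each cycle's end."""
    cycle_len = total_steps // n_snapshots
    x = (step % cycle_len) / cycle_len            # position within the cycle
    return 0.5 * lr_max * (math.cos(math.pi * x) + 1.0)

# Example: 3 snapshots over 300 steps; the LR restarts at steps 0, 100, 200.
schedule = [cyclic_cosine_lr(s, 300, 3) for s in range(300)]
```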
5. Principal views selection based on growing graph convolution network for multi-view 3D model recognition. Applied Intelligence 2022. [DOI: 10.1007/s10489-022-03775-y]
6. Liu X, Han Z, Liu YS, Zwicker M. Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention. IEEE Transactions on Image Processing 2021;30:1744-1758. [PMID: 33417547] [DOI: 10.1109/tip.2020.3048623]
Abstract
Fine-grained 3D shape classification is important for shape understanding and analysis and poses a challenging research problem. However, it has rarely been explored, due to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset (named FG3D dataset) with fine-grained class labels, which consists of three categories: airplane, car, and chair. Each category consists of several subcategories at a fine-grained level. In our experiments on this fine-grained dataset, we find that state-of-the-art methods are significantly limited by the small variance among subcategories in the same category. To resolve this problem, we further propose a novel fine-grained 3D shape classification method, named FG3D-Net, to capture the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect the generally semantic parts inside multiple views under the benchmark of generally semantic part detection. Then, we design a hierarchical part-view attention aggregation module to learn a global shape representation by aggregating generally semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of our features: part-level attention highlights the important parts in each view, while view-level attention highlights the discriminative views among all views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods. The FG3D dataset is available at https://github.com/liuxinhai/FG3D-Net.
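A minimal PyTorch sketch of the hierarchical idea only: attention-pool part features within each view, then attention-pool the resulting per-view features into a global descriptor. The scoring heads and dimensions are hypothetical; FG3D-Net's actual module also conditions on shape classes and adds an RNN over views.

```python
import torch
import torch.nn as nn

class PartViewAttention(nn.Module):
    """Toy two-stage attention pooling over (views x parts) features."""
    def __init__(self, d=256):
        super().__init__()
        self.part_score = nn.Linear(d, 1)
        self.view_score = nn.Linear(d, 1)

    def forward(self, parts):                 # (B, V, P, d): V views, P parts
        a_p = torch.softmax(self.part_score(parts), dim=2)
        views = (a_p * parts).sum(dim=2)      # (B, V, d) per-view features
        a_v = torch.softmax(self.view_score(views), dim=1)
        return (a_v * views).sum(dim=1)       # (B, d) global shape feature

feat = PartViewAttention()(torch.randn(2, 12, 6, 256))
```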
7. Wen X, Han Z, Liu X, Liu YS. Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds Using Spatial-Aware Capsules. IEEE Transactions on Image Processing 2020;29:8855-8869. [PMID: 32894715] [DOI: 10.1109/tip.2020.3019925]
Abstract
Learning discriminative shape representations directly on point clouds is still challenging in 3D shape analysis and understanding. Recent studies usually involve three steps: first splitting a point cloud into local regions, then extracting the corresponding feature of each local region, and finally aggregating all individual local region features into a global shape representation using simple max-pooling. However, such pooling-based feature aggregation methods do not adequately account for the spatial relationships between local regions (e.g., their locations relative to other regions), which greatly limits the ability to learn discriminative shape representations. To address this issue, we propose a novel deep learning network, named Point2SpatialCapsule, for aggregating features and spatial relationships of local regions on point clouds, which aims to learn more discriminative shape representations. Compared with traditional max-pooling-based feature aggregation networks, Point2SpatialCapsule can explicitly learn not only the geometric features of local regions but also the spatial relationships among them. Point2SpatialCapsule consists of two main modules. To resolve the disorder problem of local regions, the first module, named geometric feature aggregation, is designed to aggregate local region features into learnable cluster centers, which explicitly encodes the spatial locations from the original 3D space. The second module, named spatial relationship aggregation, further aggregates the clustered features and the spatial relationships among them in the feature space using the spatial-aware capsules developed in this paper. Compared to previous capsule network-based methods, feature routing on the spatial-aware capsules can learn more discriminative spatial relationships among local regions, establishing a direct mapping between log priors and spatial locations through feature clusters. Experimental results demonstrate that Point2SpatialCapsule outperforms the state-of-the-art methods in 3D shape classification, retrieval, and segmentation tasks on the well-known ModelNet and ShapeNet datasets.
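A hedged PyTorch sketch of the first module's idea: local-region features are aggregated into learnable cluster centers by soft assignment (a NetVLAD-style stand-in). The paper's spatial-aware capsule routing in the second module is not reproduced here, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ClusterAggregation(nn.Module):
    """Soft-assign local-region features to learnable cluster centers and
    accumulate per-cluster residuals as the aggregated representation."""
    def __init__(self, d=128, k=16):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(k, d) * 0.1)

    def forward(self, x):                         # (B, N, d) region features
        c = self.centers.unsqueeze(0).expand(x.size(0), -1, -1)
        assign = torch.softmax(-torch.cdist(x, c), dim=2)     # (B, N, K)
        residual = x.unsqueeze(2) - self.centers              # (B, N, K, d)
        return (assign.unsqueeze(-1) * residual).sum(dim=1)   # (B, K, d)

clusters = ClusterAggregation()(torch.randn(4, 256, 128))
```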
8. Han Z, Ma B, Liu YS, Zwicker M. Reconstructing 3D Shapes from Multiple Sketches Using Direct Shape Optimization. IEEE Transactions on Image Processing 2020;29:8721-8734. [PMID: 32870791] [DOI: 10.1109/tip.2020.3018865]
Abstract
3D shape reconstruction from multiple hand-drawn sketches is an intriguing approach to 3D shape modeling. Currently, state-of-the-art methods employ neural networks to learn a mapping from multiple sketches taken from arbitrary view angles to a 3D voxel grid. Because of the cubic complexity of 3D voxel grids, however, neural networks are hard to train and limited to low-resolution reconstructions, which leads to a lack of geometric detail and low accuracy. To resolve this issue, we propose to reconstruct 3D shapes from multiple sketches using direct shape optimization (DSO), which does not involve deep learning models for direct voxel-based 3D shape generation. Specifically, we first leverage a conditional generative adversarial network (CGAN) to translate each sketch into an attenuance image that captures the predicted geometry from a given viewpoint. Then, DSO minimizes a project-and-compare loss to reconstruct the 3D shape such that it matches the predicted attenuance images from the view angles of all input sketches. Based on this, we further propose a progressive update approach to handle inconsistencies among a few hand-drawn sketches of the same 3D shape. Our experimental results show that our method significantly outperforms the state-of-the-art methods on widely used benchmarks and produces intuitive results in an interactive application.
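A minimal torch sketch of the project-and-compare principle: a voxel occupancy grid is optimized by gradient descent so that its projections match target images. For simplicity this uses axis-aligned max projections and random stand-in targets; the paper matches CGAN-predicted attenuance images from the input sketches' arbitrary viewpoints.

```python
import torch

res = 32
targets = [torch.rand(res, res) for _ in range(3)]   # stand-in predictions
logits = torch.zeros(res, res, res, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.05)

for step in range(200):
    occ = torch.sigmoid(logits)                      # soft occupancy in [0, 1]
    # Differentiable projection: max occupancy along each principal axis.
    proj = [occ.amax(dim=d) for d in range(3)]
    loss = sum(((p - t) ** 2).mean() for p, t in zip(proj, targets))
    opt.zero_grad()
    loss.backward()
    opt.step()
```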
9. A New Volumetric CNN for 3D Object Classification Based on Joint Multiscale Feature and Subvolume Supervised Learning Approaches. Computational Intelligence and Neuroscience 2020. [DOI: 10.1155/2020/5851465]
Abstract
The advancement of low-cost RGB-D and LiDAR three-dimensional (3D) sensors has made it easier to obtain 3D models in real time. However, extracting intricate 3D features is crucial for advancing 3D object classification. Existing volumetric voxel-based CNN approaches have achieved remarkable progress, but they incur huge computational overhead, which limits the extraction of global features at higher resolutions of 3D objects. In this paper, a low-cost 3D volumetric deep convolutional neural network is proposed for 3D object classification based on joint multiscale hierarchical and subvolume supervised learning strategies. Our proposed deep neural network takes as input 3D data preprocessed into a memory-efficient octree representation, and we propose to limit the full-layer octree depth to a certain level, based on the predefined input volume resolution, for storing high-precision contour features. Multiscale features are concatenated from multiple octree depths inside the network, aiming to adaptively generate high-level global features. The subvolume supervision strategy trains the network on subparts of the 3D object in order to learn local features. Our framework has been evaluated on two publicly available 3D repositories. Experimental results demonstrate the effectiveness of the proposed method: classification accuracy is improved in comparison to existing volumetric approaches, while the memory consumption ratio and run time are significantly reduced.
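A toy PyTorch analogue of the multiscale concatenation idea, on dense voxels rather than the paper's octrees: global features pooled from a fine (early) and a coarse (late) stage are concatenated before classification. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultiscaleVoxelNet(nn.Module):
    """Concatenate globally pooled features from two resolutions of a 3D CNN."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.MaxPool3d(2),
                                    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU())
        self.fc = nn.Linear(16 + 32, n_classes)

    def forward(self, x):                            # (B, 1, D, D, D) voxels
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        g1 = f1.mean(dim=(2, 3, 4))                  # fine-scale global feature
        g2 = f2.mean(dim=(2, 3, 4))                  # coarse-scale global feature
        return self.fc(torch.cat([g1, g2], dim=1))   # multiscale concatenation

logits = MultiscaleVoxelNet()(torch.rand(2, 1, 32, 32, 32))
```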
10. Han Z, Lu H, Liu Z, Vong CM, Liu YS, Zwicker M, Han J, Chen CLP. 3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation. IEEE Transactions on Image Processing 2019;28:3986-3999. [PMID: 30872228] [DOI: 10.1109/tip.2019.2904460]
Abstract
Learning 3D global features by aggregating multiple views is important. Pooling is widely used to aggregate views in deep learning models. However, pooling disregards a lot of content information within views and the spatial relationship among the views, which limits the discriminability of learned features. To resolve this issue, 3D to Sequential Views (3D2SeqViews) is proposed to more effectively aggregate the sequential views using convolutional neural networks with a novel hierarchical attention aggregation. Specifically, the content information within each view is first encoded. Then, the encoded view content information and the sequential spatiality among the views are simultaneously aggregated by the hierarchical attention aggregation, where view-level attention and class-level attention are proposed to hierarchically weight sequential views and shape classes. View-level attention is learned to indicate how much attention is paid to each view by each shape class, which subsequently weights sequential views through a novel recursive view integration. Recursive view integration learns the semantic meaning of view sequence, which is robust to the first view position. Furthermore, class-level attention is introduced to describe how much attention is paid to each shape class, which innovatively employs the discriminative ability of the fine-tuned network. 3D2SeqViews learns more discriminative features than the state-of-the-art, which leads to the outperforming results in shape classification and retrieval under three large-scale benchmarks.
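A hedged sketch of the aggregation idea only: view-level attention weights the sequential views, which are then folded recursively into one feature. The class-level attention and the CNN view encoder of 3D2SeqViews are omitted, and all module names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class RecursiveViewIntegration(nn.Module):
    """Recursively integrate attention-weighted sequential view features."""
    def __init__(self, d=256):
        super().__init__()
        self.score = nn.Linear(d, 1)
        self.merge = nn.Linear(2 * d, d)

    def forward(self, views):                       # (B, V, d) view features
        a = torch.softmax(self.score(views), dim=1) # view-level attention
        weighted = a * views
        h = weighted[:, 0]
        for t in range(1, views.size(1)):           # fold views in sequence
            h = torch.tanh(self.merge(torch.cat([h, weighted[:, t]], dim=1)))
        return h                                    # (B, d) global feature

global_feat = RecursiveViewIntegration()(torch.randn(2, 12, 256))
```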
11. Gu L, Huang J, Yang L. On the Representational Power of Restricted Boltzmann Machines for Symmetric Functions and Boolean Functions. IEEE Transactions on Neural Networks and Learning Systems 2019;30:1335-1347. [PMID: 30281484] [DOI: 10.1109/tnnls.2018.2868809]
Abstract
Restricted Boltzmann machines (RBMs) are used to build deep-belief networks, which are widely thought to be one of the first effective deep learning neural networks. This paper studies the ability of RBMs to represent distributions over {0,1}^n via softplus/hardplus RBM networks. It is shown that any distribution whose density depends only on the number of 1's in the input can be approximated with arbitrarily high accuracy by an RBM of size 2n+1, which improves the result of a previous study by reducing the size from n^2 to 2n+1. A theorem for representing partially symmetric Boolean functions by softplus RBM networks is established. Accordingly, the representational power of RBMs for distributions whose mass represents Boolean functions is investigated in comparison with that of threshold circuits and polynomial threshold functions. It is shown that a distribution over {0,1}^n whose mass represents a Boolean function can be computed with a given margin δ by an RBM of size and parameters bounded by polynomials in n if and only if it can be computed by a depth-2 threshold circuit with size and parameters bounded by polynomials in n.
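For concreteness, the object whose representational power is analyzed above can be written via the standard softplus free energy: log p*(v) = b·v + Σ_j softplus(c_j + (Wᵀv)_j). This numpy sketch enumerates the unnormalized distribution for a tiny n (the hidden-layer size 2n+1 merely mirrors the bound discussed; the weights here are random, purely for illustration).

```python
import itertools
import numpy as np

def rbm_unnormalized_probs(W, b, c):
    """Unnormalized p(v) of a binary RBM over all of {0,1}^n via the softplus
    free energy; exhaustive enumeration is feasible only for tiny n."""
    n = len(b)
    vs = np.array(list(itertools.product([0, 1], repeat=n)))
    act = vs @ W + c                                  # hidden pre-activations
    log_p = vs @ b + np.logaddexp(0.0, act).sum(axis=1)   # softplus(x)
    return vs, np.exp(log_p - log_p.max())

rng = np.random.default_rng(0)
n, m = 6, 2 * 6 + 1                                   # 2n+1 hidden units
vs, p = rbm_unnormalized_probs(rng.normal(size=(n, m)),
                               rng.normal(size=n), rng.normal(size=m))
p /= p.sum()                                          # normalize explicitly
```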
12. Han Z, Liu Z, Han J, Vong CM, Bu S, Chen CLP. Unsupervised Learning of 3-D Local Features From Raw Voxels Based on a Novel Permutation Voxelization Strategy. IEEE Transactions on Cybernetics 2019;49:481-494. [PMID: 29990288] [DOI: 10.1109/tcyb.2017.2778764]
Abstract
Effective 3-D local features are significant elements for 3-D shape analysis. Existing hand-crafted 3-D local descriptors are effective but usually involve intensive human intervention and prior knowledge, which burdens the subsequent processing procedures. An alternative resorts to the unsupervised learning of features from raw 3-D representations via popular deep learning models. However, this alternative suffers from several significant unresolved issues, such as irregular vertex topology, arbitrary mesh resolution, orientation ambiguity on the 3-D surface, and rigid and slightly nonrigid transformation invariance. To tackle these issues, we propose an unsupervised 3-D local feature learning framework based on a novel permutation voxelization strategy to learn high-level and hierarchical 3-D local features from raw 3-D voxels. Specifically, the proposed strategy first applies a novel voxelization which discretizes each 3-D local region with irregular vertex topology and arbitrary mesh resolution into regular voxels, and then, a novel permutation is applied to permute the voxels to simultaneously eliminate the effect of rotation transformation and orientation ambiguity on the surface. Based on the proposed strategy, the permuted voxels can fully encode the geometry and structure of each local region in regular, sparse, and binary vectors. These voxel vectors are highly suitable for the learning of hierarchical common surface patterns by stacked sparse autoencoder with hierarchical abstraction and sparse constraint. Experiments are conducted on three aspects for evaluating the learned local features: 1) global shape retrieval; 2) partial shape retrieval; and 3) shape correspondence. The experimental results show that the learned local features outperform the other state-of-the-art 3-D shape descriptors.
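A minimal numpy sketch of the voxelization half of the strategy: one local region with arbitrary point density is discretized into a regular binary grid and flattened into the kind of sparse binary vector a stacked sparse autoencoder can consume. The paper's permutation step, which removes rotation and surface-orientation ambiguity, is omitted here.

```python
import numpy as np

def voxelize_local_region(points, res=16):
    """Discretize a 3D local region into a regular binary voxel grid and
    flatten it to a regular, sparse, binary vector."""
    p = points - points.min(axis=0)
    p = p / (p.max() + 1e-9)                      # scale region into unit cube
    idx = np.clip((p * res).astype(int), 0, res - 1)
    grid = np.zeros((res, res, res), dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1     # mark occupied voxels
    return grid.reshape(-1)

vec = voxelize_local_region(np.random.default_rng(0).random((500, 3)))
```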
13. Han Z, Shang M, Liu Z, Vong CM, Liu YS, Zwicker M, Han J, Chen CLP. SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention. IEEE Transactions on Image Processing 2019;28:658-672. [PMID: 30183634] [DOI: 10.1109/tip.2018.2868426]
Abstract
Learning 3D global features by aggregating multiple views has been introduced as a successful strategy for 3D shape analysis. In recent deep learning models with end-to-end training, pooling is a widely adopted procedure for view aggregation. However, pooling merely retains the max or mean value over all views, which disregards the content information of almost all views and also the spatial information among the views. To resolve these issues, we propose Sequential Views To Sequential Labels (SeqViews2SeqLabels) as a novel deep learning model with an encoder-decoder structure based on recurrent neural networks (RNNs) with attention. SeqViews2SeqLabels consists of two connected parts, an encoder-RNN followed by a decoder-RNN, that aim to learn the global features by aggregating sequential views and then performing shape classification from the learned global features, respectively. Specifically, the encoder-RNN learns the global features by simultaneously encoding the spatial and content information of sequential views, which captures the semantics of the view sequence. With the proposed prediction of sequential labels, the decoder-RNN performs more accurate classification using the learned global features by predicting sequential labels step by step. Learning to predict sequential labels provides more and finer discriminative information among shape classes to learn, which alleviates the overfitting problem inherent in training using a limited number of 3D shapes. Moreover, we introduce an attention mechanism to further improve the discriminative ability of SeqViews2SeqLabels. This mechanism increases the weight of views that are distinctive to each shape class, and it dramatically reduces the effect of selecting the first view position. Shape classification and retrieval results under three large-scale benchmarks verify that SeqViews2SeqLabels learns more discriminative global features by more effectively aggregating sequential views than state-of-the-art methods.
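A toy PyTorch encoder-decoder in the same spirit: a GRU encodes the view sequence into a global feature, and a GRU cell decodes class logits step by step. The paper's attention mechanism, CNN view encoder, and sequential-label training targets are omitted; all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SeqViewEncoderDecoder(nn.Module):
    """GRU encoder over sequential view features plus step-wise decoder."""
    def __init__(self, d=256, n_classes=40, steps=3):
        super().__init__()
        self.steps = steps
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRUCell(d, d)
        self.classify = nn.Linear(d, n_classes)

    def forward(self, views):                      # (B, V, d) sequential views
        _, h = self.encoder(views)                 # global feature in h
        h, x = h.squeeze(0), views.mean(dim=1)
        logits = []
        for _ in range(self.steps):                # predict labels step by step
            h = self.decoder(x, h)
            logits.append(self.classify(h))
        return torch.stack(logits, dim=1)          # (B, steps, n_classes)

out = SeqViewEncoderDecoder()(torch.randn(2, 12, 256))
```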
14. Han Z, Liu Z, Vong CM, Liu YS, Bu S, Han J, Chen CLP. Deep Spatiality: Unsupervised Learning of Spatially-Enhanced Global and Local 3D Features by Deep Neural Network with Coupled Softmax. IEEE Transactions on Image Processing 2018;27:3049-3063. [PMID: 29993805] [DOI: 10.1109/tip.2018.2816821]
Abstract
The discriminability of Bag-of-Words representations can be increased by encoding the spatial relationships among virtual words on 3D shapes. However, this encoding task involves several issues, including arbitrary mesh resolutions, irregular vertex topology, orientation ambiguity on the 3D surface, and invariance to rigid and non-rigid shape transformations. To address these issues, a novel unsupervised spatial learning framework based on a deep neural network, deep spatiality (DS), is proposed. Specifically, DS employs two novel components: a spatial context extractor and a deep context learner. The spatial context extractor extracts the spatial relationships among virtual words in a local region into a raw spatial representation. Along a consistent circular direction, a directed circular graph is constructed to encode the relative positions between pairwise virtual words in each face ring into a relative spatial matrix. By decomposing each relative spatial matrix using SVD, the raw spatial representation is formed, from which the deep context learner conducts unsupervised learning of global and local features. The deep context learner is a deep neural network with a novel model structure to accommodate the proposed coupled softmax layer, which encodes not only the discriminative information among local regions but also that among global shapes. Experimental results show that DS outperforms state-of-the-art methods.
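A hedged numpy sketch of the spatial context extractor's final step: relative positions between virtual words in a region are collected into a matrix and compressed with SVD into a compact raw spatial representation. The directed circular-graph ordering and face-ring structure are not modeled here, and the descriptor layout is illustrative.

```python
import numpy as np

def raw_spatial_representation(positions, k=4):
    """Compress pairwise relative positions of virtual words with SVD."""
    rel = positions[:, None, :] - positions[None, :, :]     # pairwise offsets
    M = rel.reshape(len(positions), -1)                     # relative matrix
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return np.concatenate([S[:k], U[:, :k].ravel()])        # compact descriptor

desc = raw_spatial_representation(np.random.default_rng(0).random((8, 3)))
```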
15. Han Z, Liu Z, Han J, Vong CM, Bu S, Li X. Unsupervised 3D Local Feature Learning by Circle Convolutional Restricted Boltzmann Machine. IEEE Transactions on Image Processing 2016;25:5331-5344. [PMID: 28113374] [DOI: 10.1109/tip.2016.2605920]
Abstract
Extracting local features from 3D shapes is an important and challenging task that usually requires carefully designed 3D shape descriptors. However, these descriptors are hand-crafted and require intensive human intervention with prior knowledge. To tackle this issue, we propose a novel deep learning model, namely circle convolutional restricted Boltzmann machine (CCRBM), for unsupervised 3D local feature learning. CCRBM is specially designed to learn from raw 3D representations. It effectively overcomes obstacles such as irregular vertex topology, orientation ambiguity on the 3D surface, and rigid or slightly non-rigid transformation invariance in the hierarchical learning of 3D data that cannot be resolved by the existing deep learning models. Specifically, by introducing the novel circle convolution, CCRBM holds a novel ring-like multi-layer structure to learn 3D local features in a structure preserving manner. Circle convolution convolves across 3D local regions via rotating a novel circular sector convolution window in a consistent circular direction. In the process of circle convolution, extra points are sampled in each 3D local region and projected onto the tangent plane of the center of the region. In this way, the projection distances in each sector window are employed to constitute a novel local raw 3D representation called projection distance distribution (PDD). In addition, to eliminate the initial location ambiguity of a sector window, the Fourier transform modulus is used to transform the PDD into the Fourier domain, which is then conveyed to CCRBM. Experiments using the learned local features are conducted on three aspects: global shape retrieval, partial shape retrieval, and shape correspondence. The experimental results show that the learned local features outperform other state-of-the-art 3D shape descriptors.
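A rough numpy sketch of the projection distance distribution (PDD) as described: local points are projected onto the tangent plane at the region center, binned into angular sectors, and the per-sector projection distances are passed through the Fourier transform modulus to remove the sector window's starting-angle ambiguity. The sector count and in-plane basis construction are illustrative, not the paper's exact formulation.

```python
import numpy as np

def pdd_fourier(points, center, normal, n_sectors=8):
    """Per-sector mean projection distance to the tangent plane, made
    insensitive to the sector window's start angle via the FFT modulus."""
    normal = normal / np.linalg.norm(normal)
    d = (points - center) @ normal                 # signed distance to plane
    proj = points - center - d[:, None] * normal   # in-plane coordinates
    # Build an orthonormal in-plane basis to measure sector angles.
    u = np.cross(normal, [1.0, 0.0, 0.0])
    if np.linalg.norm(u) < 1e-6:
        u = np.cross(normal, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    ang = np.arctan2(proj @ v, proj @ u) % (2 * np.pi)
    sector = (ang / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    pdd = np.array([d[sector == s].mean() if np.any(sector == s) else 0.0
                    for s in range(n_sectors)])
    return np.abs(np.fft.fft(pdd))                 # rotation-insensitive input

rng = np.random.default_rng(0)
feat = pdd_fourier(rng.random((200, 3)), np.array([0.5, 0.5, 0.5]),
                   np.array([0.0, 0.0, 1.0]))
```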