1. Li J, Wu Z, Zhang S, Lu W. Joint identification of hydraulic conductivity and groundwater pollution sources using unscented Kalman smoother with multiple data assimilation and deep learning. Ecotoxicology and Environmental Safety 2025;295:118134. [PMID: 40187214] [DOI: 10.1016/j.ecoenv.2025.118134]
Abstract
Identification of groundwater pollution sources (IGPSs) is a prerequisite for pollution remediation and pollution risk prediction, and data assimilation approaches have been used extensively in the IGPSs field in recent years. One such approach, the unscented Kalman filter, is cumbersome to operate because the simulation model must be restarted repeatedly, and its identification accuracy needs further improvement before it can be applied to IGPSs with strongly nonlinear characteristics. Thus, to improve identification performance and enrich the available IGPSs techniques, a novel data assimilation approach, the unscented Kalman smoother with multiple data assimilation (UKS-MDA), was applied to identify hydraulic conductivity and groundwater pollution sources. To assess identification performance, the identification results (IRs) obtained with UKS-MDA were compared with those produced by the ensemble smoother with multiple data assimilation (ES-MDA) in terms of identification accuracy and computational efficiency. In addition, given the strong learning ability of deep belief neural networks for complex nonlinear systems, this study employs a deep belief neural network (DBNN) as a surrogate for the simulation model to reduce the computational load and the accuracy loss caused by iterative calculations. The results indicated that (1) the mean relative error (MRE) between the DBNN surrogate and the simulation model was 0.92%, and when applied to IGPSs, the surrogate saved approximately 99% of the computation time and load; (2) the MREs between the IRs obtained using UKS-MDA and the true values were 0.4% and 4.16% lower than those obtained using ES-MDA in the scenarios with smaller and larger concentration errors, respectively; and (3) compared with ES-MDA, UKS-MDA reduced the computation time of IGPSs by approximately 12%. The combination of DBNN and UKS-MDA can effectively identify groundwater pollution sources, which has guiding significance for the remediation and prediction of groundwater pollution.
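For orientation, the following is a minimal numpy sketch of the multiple-data-assimilation update that ES-MDA (the paper's comparison baseline) performs and that UKS-MDA builds on; a stand-in linear function takes the place of the trained DBNN surrogate, and all names, dimensions, and inflation coefficients are illustrative assumptions. The actual UKS-MDA differs in propagating deterministic unscented sigma points rather than a random ensemble.

```python
import numpy as np

def esmda_update(ensemble, d_obs, C_e, surrogate, alphas):
    """One pass of ensemble smoother with multiple data assimilation.
    ensemble: (Ne, Nm) parameter ensemble; d_obs: (Nd,) observations;
    C_e: (Nd, Nd) observation-error covariance; surrogate: maps (Nm,) -> (Nd,).
    The inflation coefficients must satisfy sum(1/alpha) = 1."""
    Ne, Nm = ensemble.shape
    for alpha in alphas:
        D = np.array([surrogate(m) for m in ensemble])   # (Ne, Nd) predictions
        dm = ensemble - ensemble.mean(axis=0)
        dd = D - D.mean(axis=0)
        C_md = dm.T @ dd / (Ne - 1)                      # cross-covariance
        C_dd = dd.T @ dd / (Ne - 1)                      # prediction covariance
        K = C_md @ np.linalg.inv(C_dd + alpha * C_e)     # Kalman-like gain
        # perturb observations with inflated error, then update each member
        d_pert = d_obs + np.random.multivariate_normal(
            np.zeros(len(d_obs)), alpha * C_e, size=Ne)
        ensemble = ensemble + (d_pert - D) @ K.T
    return ensemble

# toy usage: a linear "surrogate" standing in for the trained DBNN
rng = np.random.default_rng(0)
G = rng.normal(size=(5, 8))                              # hypothetical forward map
surrogate = lambda m: G @ m
truth = rng.normal(size=8)
d_obs = G @ truth
ens = rng.normal(size=(100, 8))
ens = esmda_update(ens, d_obs, 0.01 * np.eye(5), surrogate, alphas=[4, 4, 4, 4])
```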
Affiliations:
- Jiuhui Li: Key Laboratory of Geographical Processes and Ecological Security in Changbai Mountains, Ministry of Education, School of Geographical Sciences, Northeast Normal University, Changchun 130024, China
- Zhengfang Wu: Key Laboratory of Geographical Processes and Ecological Security in Changbai Mountains, Ministry of Education, School of Geographical Sciences, Northeast Normal University, Changchun 130024, China
- Shuo Zhang: Key Laboratory of Geographical Processes and Ecological Security in Changbai Mountains, Ministry of Education, School of Geographical Sciences, Northeast Normal University, Changchun 130024, China
- Wenxi Lu: College of New Energy and Environment, Jilin University, Changchun 130021, China
2. Chen B, Lv X, Zhao Y, Yu L. TPDC: Point Cloud Completion by Triangular Pyramid Features and Divide-and-Conquer in Complex Environments. IEEE Transactions on Neural Networks and Learning Systems 2025;36:6029-6040. [PMID: 38758619] [DOI: 10.1109/tnnls.2024.3397988]
Abstract
Point cloud completion recovers complete point clouds from partial ones, providing richer point cloud information for downstream tasks such as 3-D reconstruction and target detection. However, previous methods usually suffer from unstructured prediction of points in local regions and from the discrete nature of the point cloud. To resolve these problems, we propose a point cloud completion network called TPDC. Representing the point cloud as a set of unordered point features with local geometric information, we devise a Triangular Pyramid Extractor (TPE) that uses the simplest 3-D structure, a triangular pyramid, to convert the point cloud into a sequence of local geometric information. Our insight for revealing local geometric information in a complex environment is to design a Divide-and-Conquer Splitting Module in a Divide-and-Conquer Splitting Decoder (DCSD) to learn point-splitting patterns that best fit local regions. This module uses the divide-and-conquer approach to handle, in parallel, the tasks of fitting base points to ground-truth values and predicting the displacements of split points, so that the base points align more closely with the ground truth while the displacements of the split points relative to the base points are forecast. Furthermore, we propose a more realistic and challenging benchmark, ShapeNetMask, with more random point cloud input, more complex random item occlusion, and more realistic random environmental perturbations. The results show that our method outperforms previous methods on both widely used benchmarks and the new benchmark.
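A minimal sketch of the base-point-plus-displacement splitting idea described above (each child point is its base point plus a predicted offset); the module layout, feature sizes, and splitting factor are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SplitDecoder(nn.Module):
    """Minimal sketch of a point-splitting step: each base point spawns k
    children via predicted displacements (child = base + offset)."""
    def __init__(self, feat_dim=128, k=4):
        super().__init__()
        self.k = k
        self.offset_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, 3 * k))

    def forward(self, base_pts, feats):
        # base_pts: (B, N, 3); feats: (B, N, feat_dim)
        B, N, _ = base_pts.shape
        offsets = self.offset_mlp(torch.cat([base_pts, feats], dim=-1))
        offsets = offsets.view(B, N, self.k, 3)
        children = base_pts.unsqueeze(2) + offsets       # (B, N, k, 3)
        return children.reshape(B, N * self.k, 3)

dec = SplitDecoder()
pts, f = torch.rand(2, 256, 3), torch.rand(2, 256, 128)
print(dec(pts, f).shape)  # torch.Size([2, 1024, 3])
```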
3. Xiang P, Wen X, Liu YS, Cao YP, Wan P, Zheng W, Han Z. Snowflake Point Deconvolution for Point Cloud Completion and Generation With Skip-Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:6320-6338. [PMID: 36282830] [DOI: 10.1109/tpami.2022.3217161]
Abstract
Most existing point cloud completion methods suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it difficult to reveal fine local geometric details. To resolve this issue, we propose SnowflakeNet with snowflake point deconvolution (SPD) to generate complete point clouds. SPD models the generation of point clouds as the snowflake-like growth of points, where child points are generated progressively by splitting their parent points after each SPD. Our insight into the detailed geometry is to introduce a skip-transformer in the SPD to learn the point-splitting patterns that best fit the local regions. The skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer and produce the splitting in the current layer. The locally compact and structured point clouds generated by SPD precisely reveal the structural characteristics of the 3D shape in local patches, which enables us to predict highly detailed geometries. Moreover, since SPD is a general operation that is not limited to completion, we explore its applications in other generative tasks, including point cloud auto-encoding, generation, single-image reconstruction, and upsampling. Experimental results show that our method outperforms state-of-the-art methods on widely used benchmarks.
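A hedged sketch of the skip-transformer idea: cross-attention lets the current splitting step condition on the splitting pattern (displacement features) of the previous SPD layer. Dimensions and layer choices are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SkipTransformer(nn.Module):
    """Sketch of a skip-transformer: the current layer's features attend to
    the previous SPD layer's splitting features, with a residual connection."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cur_feat, prev_split_feat):
        # cur_feat, prev_split_feat: (B, N, dim)
        out, _ = self.attn(cur_feat, prev_split_feat, prev_split_feat)
        return self.norm(cur_feat + out)   # residual connection

st = SkipTransformer()
h = st(torch.rand(2, 512, 128), torch.rand(2, 512, 128))
print(h.shape)  # torch.Size([2, 512, 128])
```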
4. Li H, Liu M, Yu X, Zhu J, Wang C, Chen X, Feng C, Leng J, Zhang Y, Xu F. Coherence based graph convolution network for motor imagery-induced EEG after spinal cord injury. Frontiers in Neuroscience 2023;16:1097660. [PMID: 36711141] [PMCID: PMC9880407] [DOI: 10.3389/fnins.2022.1097660]
Abstract
Background: Spinal cord injury (SCI) may lead to impaired motor function, autonomic nervous system dysfunction, and other dysfunctions. A brain-computer interface (BCI) system based on motor imagery (MI) can provide more scientific and effective treatment solutions for SCI patients.
Methods: Based on the interactions between brain regions, a coherence-based graph convolutional network (C-GCN) method is proposed to extract the temporal-frequency-spatial features and functional connectivity information of EEG signals. The proposed algorithm constructs multi-channel EEG features based on coherence networks as graph signals and then classifies MI tasks. Unlike the traditional graph convolutional network (GCN), the C-GCN method uses the coherence network of EEG signals to determine MI-related functional connections, which represent the intrinsic connections between EEG channels across different rhythms and different MI tasks. EEG data from SCI patients and healthy subjects were analyzed, with the healthy subjects serving as the control group.
Results: The experimental results show that the C-GCN method achieves the best classification performance with reliability and stability; the highest classification accuracy is 96.85%.
Conclusion: The proposed framework can provide an effective theoretical basis for the rehabilitation treatment of SCI patients.
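A small sketch of the coherence-to-graph pipeline the Methods describe: band-averaged magnitude-squared coherence between channel pairs forms the adjacency matrix, over which a normalized graph convolution propagates features. Sampling rate, frequency band, and feature sizes are assumptions.

```python
import numpy as np
from scipy.signal import coherence

def coherence_adjacency(eeg, fs=250, band=(8, 30)):
    """eeg: (C, T) multichannel EEG. Returns a (C, C) adjacency whose edges
    are the mean magnitude-squared coherence in a band (e.g., mu/beta for MI)."""
    C = eeg.shape[0]
    A = np.eye(C)                       # self-loops on the diagonal
    for i in range(C):
        for j in range(i + 1, C):
            f, coh = coherence(eeg[i], eeg[j], fs=fs, nperseg=fs)
            m = (f >= band[0]) & (f <= band[1])
            A[i, j] = A[j, i] = coh[m].mean()
    return A

def gcn_layer(X, A, W):
    """One graph-convolution layer: ReLU(D^-1/2 A D^-1/2 X W)."""
    d = A.sum(axis=1)
    A_norm = A / np.sqrt(np.outer(d, d))
    return np.maximum(A_norm @ X @ W, 0.0)

rng = np.random.default_rng(0)
eeg = rng.normal(size=(8, 1000))        # 8 channels, 4 s at 250 Hz
A = coherence_adjacency(eeg)
H = gcn_layer(rng.normal(size=(8, 16)), A, rng.normal(size=(16, 8)))
print(A.shape, H.shape)                 # (8, 8) (8, 8)
```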
Affiliations:
- Han Li: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China
- Ming Liu: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China
- Xin Yu: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China
- JianQun Zhu: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China
- Chongfeng Wang: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China
- Xinyi Chen: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China
- Chao Feng: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China (corresponding author)
- Jiancai Leng: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China (corresponding author)
- Yang Zhang: Rehabilitation Center, Qilu Hospital of Shandong University, Jinan, China (corresponding author)
- Fangzhou Xu: International School for Optoelectronic Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China (corresponding author)
5. Principal views selection based on growing graph convolution network for multi-view 3D model recognition. Applied Intelligence 2022. [DOI: 10.1007/s10489-022-03775-y]
6. Qi X, Hu J, Zhang L, Bai S, Yi Z. Automated Segmentation of the Clinical Target Volume in the Planning CT for Breast Cancer Using Deep Neural Networks. IEEE Transactions on Cybernetics 2022;52:3446-3456. [PMID: 32833659] [DOI: 10.1109/tcyb.2020.3012186]
Abstract
3-D radiotherapy is an effective treatment modality for breast cancer. In 3-D radiotherapy, delineation of the clinical target volume (CTV) is an essential step in establishing treatment plans, but manual delineation is subjective and time-consuming. In this study, we propose an automated segmentation model based on deep neural networks for the breast cancer CTV in planning computed tomography (CT). Our model is composed of three stages that work in a cascade, making it applicable to real-world scenarios. The first stage determines which slices contain CTVs, as not all CT slices include breast lesions. The second stage detects the region of the human body in an entire CT slice, eliminating boundary areas that may adversely affect segmentation of the CTV. The third stage delineates the CTV. To allow the network to focus on the breast mass in the slice, a novel dynamically strided convolution operation, which performs better than standard convolution, is proposed. To train and evaluate the model, a large dataset containing 455 cases and 50,425 CT slices was constructed. The proposed model achieves an average Dice similarity coefficient (DSC) of 0.802 and 0.801 for the right- and left-sided breast, respectively, showing superior performance to previous state-of-the-art approaches.
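A control-flow sketch of the three-stage cascade, with stand-in callables in place of the three trained networks; the paper's dynamically strided convolution is a novel operation and is not reproduced here.

```python
import torch

def cascade_segment(ct_slices, slice_classifier, body_detector, segmenter):
    """Sketch of the cascade: (1) keep slices predicted to contain a CTV,
    (2) crop to the detected body region, (3) delineate the CTV on the crop."""
    masks = []
    for ct in ct_slices:                      # ct: (1, H, W) tensor
        if slice_classifier(ct) < 0.5:        # stage 1: no CTV in this slice
            masks.append(torch.zeros_like(ct))
            continue
        y0, y1, x0, x1 = body_detector(ct)    # stage 2: body bounding box
        m = torch.zeros_like(ct)
        m[:, y0:y1, x0:x1] = segmenter(ct[:, y0:y1, x0:x1])  # stage 3
        masks.append(m)
    return masks

# toy stand-ins for the trained networks
clf = lambda ct: ct.mean()                    # hypothetical slice classifier
det = lambda ct: (8, 120, 8, 120)             # hypothetical body box
seg = lambda crop: (crop > 0.5).float()       # hypothetical delineation
out = cascade_segment([torch.rand(1, 128, 128)], clf, det, seg)
print(out[0].shape)  # torch.Size([1, 128, 128])
```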
7. Xu Y, Du B, Zhang L. Self-Attention Context Network: Addressing the Threat of Adversarial Attacks for Hyperspectral Image Classification. IEEE Transactions on Image Processing 2021;30:8671-8685. [PMID: 34648444] [DOI: 10.1109/tip.2021.3118977]
Abstract
Deep learning models have shown great capability for the hyperspectral image (HSI) classification task in recent years. Nevertheless, their vulnerability to adversarial attacks cannot be neglected. In this study, we systematically analyze the influence of adversarial attacks on the HSI classification task for the first time. While existing research on adversarial attacks focuses on generating adversarial examples in the RGB domain, our experiments show that such adversarial examples also exist in the hyperspectral domain. Although the difference between the generated adversarial image and the original hyperspectral data is imperceptible to the human visual system, most existing state-of-the-art deep learning models can be fooled by the adversarial image into making wrong predictions. To address this challenge, a novel self-attention context network (SACNet) is proposed. We find that the global context information contained in HSI can significantly improve the robustness of deep neural networks confronted with adversarial attacks. Extensive experiments on three benchmark HSI datasets demonstrate that the proposed SACNet possesses stronger resistance to adversarial examples than existing state-of-the-art deep learning models.
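For concreteness, a sketch of crafting the kind of imperceptible perturbation the abstract describes, using FGSM as a representative attack (the paper's attack setup may differ); the per-pixel spectral classifier and sizes are stand-ins.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.01):
    """Fast Gradient Sign Method: perturb the input along the sign of the
    loss gradient, bounded elementwise by eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# toy usage on a stand-in pixel classifier over 100 spectral bands
model = torch.nn.Linear(100, 9)               # hypothetical HSI classifier
x = torch.rand(4, 100)                        # 4 pixels, 100 bands
y = torch.randint(0, 9, (4,))
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())                # perturbation bounded by eps
```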
8. Wen X, Han Z, Liu X, Liu YS. Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds using Spatial-aware Capsules. IEEE Transactions on Image Processing 2020;PP:8855-8869. [PMID: 32894715] [DOI: 10.1109/tip.2020.3019925]
Abstract
Learning discriminative shape representations directly on point clouds remains challenging in 3D shape analysis and understanding. Recent studies usually involve three steps: first splitting a point cloud into local regions, then extracting a feature for each local region, and finally aggregating all the individual local-region features into a global feature as the shape representation using simple max-pooling. However, such pooling-based feature aggregation does not adequately take the spatial relationships between local regions (e.g., their locations relative to other regions) into account, which greatly limits the ability to learn a discriminative shape representation. To address this issue, we propose a novel deep learning network, named Point2SpatialCapsule, for aggregating features and spatial relationships of local regions on point clouds, aiming to learn a more discriminative shape representation. Compared with traditional max-pooling-based feature aggregation networks, Point2SpatialCapsule can explicitly learn not only the geometric features of local regions but also the spatial relationships among them. Point2SpatialCapsule consists of two main modules. To resolve the disorder problem of local regions, the first module, named geometric feature aggregation, aggregates the local-region features into learnable cluster centers, which explicitly encode the spatial locations from the original 3D space. The second module, named spatial relationship aggregation, further aggregates the clustered features and the spatial relationships among them in the feature space using the spatial-aware capsules developed in this paper. Compared with previous capsule-network-based methods, the feature routing on the spatial-aware capsules can learn more discriminative spatial relationships among local regions, establishing a direct mapping between log priors and spatial locations through the feature clusters. Experimental results demonstrate that Point2SpatialCapsule outperforms state-of-the-art methods in 3D shape classification, retrieval, and segmentation on the well-known ModelNet and ShapeNet datasets.
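A sketch of the first module's core idea, soft-assigning local-region features to learnable cluster centers (a NetVLAD-style aggregation); the paper's module additionally encodes spatial locations from the original 3D space, which is omitted here, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ClusterAggregate(nn.Module):
    """Sketch of aggregating local-region features into learnable cluster
    centers via soft assignment of residuals."""
    def __init__(self, feat_dim=64, n_clusters=16):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_clusters, feat_dim))
        self.assign = nn.Linear(feat_dim, n_clusters)

    def forward(self, feats):
        # feats: (B, N, feat_dim) features of N local regions
        w = torch.softmax(self.assign(feats), dim=-1)        # (B, N, K)
        resid = feats.unsqueeze(2) - self.centers            # (B, N, K, D)
        return (w.unsqueeze(-1) * resid).sum(dim=1)          # (B, K, D)

agg = ClusterAggregate()
out = agg(torch.rand(2, 128, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```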
9. Han Z, Ma B, Liu YS, Zwicker M. Reconstructing 3D Shapes from Multiple Sketches using Direct Shape Optimization. IEEE Transactions on Image Processing 2020;PP:8721-8734. [PMID: 32870791] [DOI: 10.1109/tip.2020.3018865]
Abstract
3D shape reconstruction from multiple hand-drawn sketches is an intriguing approach to 3D shape modeling. Currently, state-of-the-art methods employ neural networks to learn a mapping from multiple sketches at arbitrary view angles to a 3D voxel grid. Because of the cubic complexity of 3D voxel grids, however, such neural networks are hard to train and limited to low-resolution reconstructions, which leads to a lack of geometric detail and low accuracy. To resolve this issue, we propose to reconstruct 3D shapes from multiple sketches using direct shape optimization (DSO), which does not involve deep learning models for the voxel-based 3D shape generation itself. Specifically, we first leverage a conditional generative adversarial network (CGAN) to translate each sketch into an attenuance image that captures the predicted geometry from a given viewpoint. Then, DSO minimizes a project-and-compare loss to reconstruct the 3D shape such that it matches the predicted attenuance images from the view angles of all input sketches. Based on this, we further propose a progressive update approach to handle inconsistencies among a few hand-drawn sketches of the same 3D shape. Our experimental results show that our method significantly outperforms state-of-the-art methods under widely used benchmarks and produces intuitive results in an interactive application.
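A toy sketch of direct shape optimization with a project-and-compare loss: gradient descent on the voxel grid itself so that its projections match target attenuance images. Axis-aligned projections and random targets stand in for the CGAN-predicted sketch views, and the rendering is deliberately crude.

```python
import torch

def project_and_compare(voxels, targets):
    """Average the (sigmoid-squashed) voxel occupancies along each axis as a
    crude attenuance rendering, and match the target images."""
    occ = torch.sigmoid(voxels)                      # (R, R, R) occupancy
    loss = 0.0
    for axis, target in enumerate(targets):          # one target per view
        proj = occ.mean(dim=axis)                    # (R, R) attenuance image
        loss = loss + ((proj - target) ** 2).mean()
    return loss

# direct shape optimization: no network, just gradients on the voxel grid
voxels = torch.zeros(32, 32, 32, requires_grad=True)
targets = [torch.rand(32, 32) for _ in range(3)]     # stand-in CGAN outputs
opt = torch.optim.Adam([voxels], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = project_and_compare(voxels, targets)
    loss.backward()
    opt.step()
print(float(loss))
```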
10. A New Volumetric CNN for 3D Object Classification Based on Joint Multiscale Feature and Subvolume Supervised Learning Approaches. Computational Intelligence and Neuroscience 2020. [DOI: 10.1155/2020/5851465]
Abstract
The advancement of low-cost RGB-D and LiDAR three-dimensional (3D) sensors has made it easier to obtain 3D models in real time. However, extracting intricate 3D features is crucial for advancing 3D object classification. Existing volumetric voxel-based CNN approaches have achieved remarkable progress, but they incur huge computational overhead, which limits the extraction of global features at higher resolutions of 3D objects. In this paper, a low-cost 3D volumetric deep convolutional neural network is proposed for 3D object classification based on joint multiscale hierarchical and subvolume supervised learning strategies. Our deep neural network takes 3D data preprocessed into a memory-efficient octree representation, and we propose to limit the full-layer octree depth to a certain level, based on the predefined input volume resolution, for storing high-precision contour features. Multiscale features from multiple octree depths are concatenated inside the network, aiming to adaptively generate high-level global features. The subvolume supervision strategy trains the network on subparts of the 3D object in order to learn local features. Our framework has been evaluated on two publicly available 3D repositories. Experimental results demonstrate the effectiveness of the proposed method: classification accuracy is improved in comparison with existing volumetric approaches, while the memory consumption ratio and runtime are significantly reduced.
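A dense-voxel sketch of the multiscale concatenation idea: feature maps are global-pooled at several network depths and concatenated before the classifier. The paper's memory-efficient octree representation is replaced by dense voxels here to keep the example self-contained, and all channel counts are assumptions.

```python
import torch
import torch.nn as nn

class MultiscaleVolumeNet(nn.Module):
    """Sketch of multiscale feature concatenation across network depths."""
    def __init__(self, n_classes=40):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.block2 = nn.Sequential(nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.block3 = nn.Sequential(nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.fc = nn.Linear(16 + 32 + 64, n_classes)

    def forward(self, x):                         # x: (B, 1, 32, 32, 32) voxels
        feats = []
        for block in (self.block1, self.block2, self.block3):
            x = block(x)
            feats.append(x.mean(dim=(2, 3, 4)))   # global average pool per depth
        return self.fc(torch.cat(feats, dim=1))   # multiscale concatenation

net = MultiscaleVolumeNet()
print(net(torch.rand(2, 1, 32, 32, 32)).shape)    # torch.Size([2, 40])
```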
11. Li N, Li Q, Liu YS, Lu W, Wang W. BIMSeek++: Retrieving BIM components using similarity measurement of attributes. Computers in Industry 2020. [DOI: 10.1016/j.compind.2020.103186]
12. Yang X, Guo H, Wang N, Song B, Gao X. A Novel Symmetry Driven Siamese Network for THz Concealed Object Verification. IEEE Transactions on Image Processing 2020;29:5447-5456. [PMID: 32248104] [DOI: 10.1109/tip.2020.2983554]
Abstract
Security inspection aims to achieve a high detection rate while keeping the false alarm rate low. However, it still faces two challenges that affect its robustness: 1) existing security inspection methods are mostly designed for natural images and do not reflect the uniqueness and imaging principles of THz images, and 2) existing methods are sensitive to noise interference and pose variations. This work revisits these challenges and presents a novel symmetry-driven Siamese network (SDSN) for THz concealed object verification. Our idea is to employ a network architecture specially designed for this task. First, to reflect the uniqueness and special properties of THz images, a Siamese network with a contrastive loss is used for feature extraction together with symmetrical prior information, which can learn symmetrical metrics from the same person. Second, to alleviate the impact of noise interference and pose variations, adaptive identity normalization (A-IDN) is proposed to normalize the symmetrical metrics for each person. Finally, to enhance the generalization of the network, an adaptive selective threshold based on a Gaussian mixture model (AST-GMM) is designed, which serves as the classifier producing the final results. Extensive experiments show that SDSN significantly improves accuracy; in particular, SDSN outperforms state-of-the-art methods, which lack symmetrical prior information, on a THz security dataset.
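A sketch of the Siamese-with-contrastive-loss ingredient, paired with an illustrative left/right symmetry pairing; the encoder, sizes, and pairing details are assumptions, and A-IDN and AST-GMM are not reproduced.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin=1.0):
    """Standard contrastive loss for a Siamese pair: pull matching pairs
    together, push non-matching pairs apart by a margin."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

# illustrative symmetry-driven pairing: embed an image half and its mirrored
# other half with a shared encoder; a concealed object breaks left/right
# symmetry and should push the pair apart.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 32, 32))
img = torch.rand(8, 64, 64)                       # stand-in THz images
left, right = img[:, :, :32], torch.flip(img[:, :, 32:], dims=[2])
labels = torch.randint(0, 2, (8,)).float()        # 1 = symmetric (no object)
loss = contrastive_loss(encoder(left), encoder(right), labels)
print(float(loss))
```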
13. Kan S, Cen Y, He Z, Zhang Z, Zhang L, Wang Y. Supervised Deep Feature Embedding With Handcrafted Feature. IEEE Transactions on Image Processing 2019;28:5809-5823. [PMID: 30802863] [DOI: 10.1109/tip.2019.2901407]
Abstract
Image representation methods based on deep convolutional neural networks (CNNs) have achieved state-of-the-art performance in various computer vision tasks, such as image retrieval and person re-identification. We recognize that more discriminative feature embeddings can be learned by combining supervised deep metric learning with handcrafted features for image retrieval and similar applications. In this paper, we propose a new supervised deep feature embedding with a handcrafted feature model. To fuse handcrafted feature information into CNNs and realize feature embeddings, a general fusion unit, called Fusion-Net, is proposed. We also define a network loss function with image label information to realize supervised deep metric learning. Our extensive experimental results on the Stanford Online Products dataset and the In-shop Clothes Retrieval dataset demonstrate that the proposed methods outperform the existing state-of-the-art image retrieval methods by a large margin. Moreover, we also explore applications of the proposed methods in person re-identification and vehicle re-identification; the experimental results demonstrate both the effectiveness and the efficiency of the proposed methods.
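One plausible instantiation of such a fusion unit: project the CNN feature and the handcrafted descriptor into a shared space, concatenate, and embed onto the unit sphere for metric learning. The actual Fusion-Net design may differ; all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    """Sketch of fusing a handcrafted descriptor (e.g., a color histogram)
    with a CNN feature before a metric embedding."""
    def __init__(self, cnn_dim=512, hand_dim=64, embed_dim=128):
        super().__init__()
        self.p_cnn = nn.Linear(cnn_dim, embed_dim)
        self.p_hand = nn.Linear(hand_dim, embed_dim)
        self.embed = nn.Sequential(nn.ReLU(), nn.Linear(2 * embed_dim, embed_dim))

    def forward(self, cnn_feat, hand_feat):
        z = torch.cat([self.p_cnn(cnn_feat), self.p_hand(hand_feat)], dim=1)
        return nn.functional.normalize(self.embed(z), dim=1)  # unit-norm embedding

fu = FusionUnit()
z = fu(torch.rand(4, 512), torch.rand(4, 64))
print(z.shape, z.norm(dim=1))   # torch.Size([4, 128]), each ~1.0
```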
14. Han Z, Lu H, Liu Z, Vong CM, Liu YS, Zwicker M, Han J, Chen CLP. 3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation. IEEE Transactions on Image Processing 2019;28:3986-3999. [PMID: 30872228] [DOI: 10.1109/tip.2019.2904460]
Abstract
Learning 3D global features by aggregating multiple views is important for 3D shape analysis. Pooling is widely used to aggregate views in deep learning models, but it disregards much of the content information within views and the spatial relationships among the views, which limits the discriminability of the learned features. To resolve this issue, 3D to Sequential Views (3D2SeqViews) is proposed to more effectively aggregate sequential views using convolutional neural networks with a novel hierarchical attention aggregation. Specifically, the content information within each view is first encoded. Then, the encoded view content information and the sequential spatiality among the views are simultaneously aggregated by the hierarchical attention aggregation, where view-level attention and class-level attention are proposed to hierarchically weight sequential views and shape classes. View-level attention is learned to indicate how much attention each shape class pays to each view, which subsequently weights sequential views through a novel recursive view integration. Recursive view integration learns the semantic meaning of the view sequence and is robust to the first view position. Furthermore, class-level attention is introduced to describe how much attention is paid to each shape class, which innovatively exploits the discriminative ability of the fine-tuned network. 3D2SeqViews learns more discriminative features than the state-of-the-art, which leads to superior results in shape classification and retrieval on three large-scale benchmarks.
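A minimal sketch of view-level attention in isolation: score each per-view feature, softmax the scores over the view sequence, and take the weighted sum instead of a plain pool. The paper's class-level attention and recursive view integration are omitted, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ViewAttention(nn.Module):
    """Sketch of view-level attention over a sequence of view features."""
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, views):                         # views: (B, V, dim)
        w = torch.softmax(self.score(views), dim=1)   # (B, V, 1) view weights
        return (w * views).sum(dim=1)                 # (B, dim) weighted sum

att = ViewAttention()
print(att(torch.rand(2, 12, 256)).shape)  # torch.Size([2, 256])
```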
15. Han Z, Shang M, Liu Z, Vong CM, Liu YS, Zwicker M, Han J, Chen CLP. SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention. IEEE Transactions on Image Processing 2019;28:658-672. [PMID: 30183634] [DOI: 10.1109/tip.2018.2868426]
Abstract
Learning 3D global features by aggregating multiple views has been introduced as a successful strategy for 3D shape analysis. In recent deep learning models with end-to-end training, pooling is a widely adopted procedure for view aggregation. However, pooling merely retains the max or mean value over all views, which disregards the content information of almost all views as well as the spatial information among the views. To resolve these issues, we propose Sequential Views To Sequential Labels (SeqViews2SeqLabels), a novel deep learning model with an encoder-decoder structure based on recurrent neural networks (RNNs) with attention. SeqViews2SeqLabels consists of two connected parts, an encoder-RNN followed by a decoder-RNN, which learn the global features by aggregating the sequential views and then perform shape classification from the learned global features, respectively. Specifically, the encoder-RNN learns the global features by simultaneously encoding the spatial and content information of the sequential views, which captures the semantics of the view sequence. With the proposed prediction of sequential labels, the decoder-RNN performs more accurate classification using the learned global features by predicting sequential labels step by step. Learning to predict sequential labels provides more, and finer, discriminative information among shape classes, which alleviates the overfitting problem inherent in training with a limited number of 3D shapes. Moreover, we introduce an attention mechanism to further improve the discriminative ability of SeqViews2SeqLabels. This mechanism increases the weight of views that are distinctive to each shape class, and it dramatically reduces the effect of the selected first view position. Shape classification and retrieval results under three large-scale benchmarks verify that SeqViews2SeqLabels learns more discriminative global features than state-of-the-art methods by more effectively aggregating sequential views.
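A sketch of the encoder half only: a GRU consumes per-view features in sequence, so the final hidden state carries both view content and view order, and serves as the global shape feature. The paper's decoder-RNN with step-by-step sequential label prediction and its attention mechanism are omitted; dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SeqViewEncoder(nn.Module):
    """Sketch of an encoder-RNN aggregating sequential views into a global
    feature, followed by a plain classifier head."""
    def __init__(self, view_dim=256, hidden=128, n_classes=40):
        super().__init__()
        self.rnn = nn.GRU(view_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, views):                  # views: (B, V, view_dim)
        _, h = self.rnn(views)                 # h: (1, B, hidden) final state
        return self.cls(h.squeeze(0))          # (B, n_classes)

enc = SeqViewEncoder()
print(enc(torch.rand(2, 12, 256)).shape)  # torch.Size([2, 40])
```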