1. Manocha A, Afaq Y, Bhatia M. Mapping of water bodies from Sentinel-2 images using a deep learning-based feature fusion approach. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08177-2]

3. Liu X, Han Z, Liu YS, Zwicker M. Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention. IEEE Trans Image Process 2021; 30:1744-1758. [PMID: 33417547] [DOI: 10.1109/tip.2020.3048623]
Abstract
Fine-grained 3D shape classification is important for shape understanding and analysis, yet it poses a challenging research problem. However, fine-grained 3D shape classification has rarely been explored, due to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset (named the FG3D dataset) with fine-grained class labels, covering three categories: airplane, car, and chair. Each category consists of several subcategories at a fine-grained level. In our experiments on this fine-grained dataset, we find that state-of-the-art methods are significantly limited by the small variance among subcategories in the same category. To resolve this problem, we further propose a novel fine-grained 3D shape classification method, named FG3D-Net, to capture the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect the generally semantic parts inside multiple views under the benchmark of generally semantic part detection. Then, we design a hierarchical part-view attention aggregation module to learn a global shape representation by aggregating generally semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of our features. The part-level attention highlights the important parts in each view, while the view-level attention highlights the discriminative views among all the views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods. The FG3D dataset is available at https://github.com/liuxinhai/FG3D-Net.
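
As a rough illustration of hierarchical part-view attention, the sketch below weights part features within each view and then weights the resulting view features across views. The tensor shapes, linear scoring layers, and softmax normalization are assumptions for illustration, not the authors' exact FG3D-Net design:

```python
# Minimal sketch of hierarchical part-view attention aggregation
# (assumed shapes and layers; not the exact FG3D-Net implementation).
import torch
import torch.nn as nn

class PartViewAttention(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.part_score = nn.Linear(feat_dim, 1)  # part-level attention
        self.view_score = nn.Linear(feat_dim, 1)  # view-level attention

    def forward(self, part_feats):
        # part_feats: (batch, n_views, n_parts, feat_dim), e.g. from an RPN.
        a_part = torch.softmax(self.part_score(part_feats), dim=2)
        view_feats = (a_part * part_feats).sum(dim=2)   # weight parts per view
        a_view = torch.softmax(self.view_score(view_feats), dim=1)
        return (a_view * view_feats).sum(dim=1)         # weight views per shape

shape_repr = PartViewAttention()(torch.randn(4, 12, 8, 256))
print(shape_repr.shape)  # torch.Size([4, 256])
```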

4. Wen X, Han Z, Liu X, Liu YS. Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds Using Spatial-Aware Capsules. IEEE Trans Image Process 2020; 29:8855-8869. [PMID: 32894715] [DOI: 10.1109/tip.2020.3019925]
Abstract
Learning discriminative shape representations directly from point clouds is still challenging in 3D shape analysis and understanding. Recent studies usually involve three steps: first splitting a point cloud into local regions, then extracting the corresponding feature of each local region, and finally aggregating all individual local-region features into a global feature as the shape representation using simple max-pooling. However, such pooling-based feature aggregation methods do not adequately take the spatial relationships between local regions (e.g., their locations relative to other regions) into account, which greatly limits the ability to learn discriminative shape representations. To address this issue, we propose a novel deep learning network, named Point2SpatialCapsule, for aggregating features and spatial relationships of local regions on point clouds, which aims to learn a more discriminative shape representation. Compared with traditional max-pooling based feature aggregation networks, Point2SpatialCapsule can explicitly learn not only the geometric features of local regions but also the spatial relationships among them. Point2SpatialCapsule consists of two main modules. To resolve the disorder problem of local regions, the first module, named geometric feature aggregation, is designed to aggregate the local-region features into learnable cluster centers, which explicitly encodes the spatial locations from the original 3D space. The second module, named spatial relationship aggregation, further aggregates the clustered features and the spatial relationships among them in the feature space using the spatial-aware capsules developed in this paper. Compared to previous capsule-network-based methods, the feature routing on the spatial-aware capsules can learn more discriminative spatial relationships among local regions of point clouds, which establishes a direct mapping between log priors and the spatial locations through feature clusters. Experimental results demonstrate that Point2SpatialCapsule outperforms the state-of-the-art methods in 3D shape classification, retrieval, and segmentation tasks on the well-known ModelNet and ShapeNet datasets.
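
A minimal sketch of the first module's idea follows: local-region features are soft-assigned to learnable cluster centers and aggregated as residuals. The shapes and the similarity-based assignment are illustrative assumptions, not the paper's exact geometric feature aggregation:

```python
# Sketch of aggregating local-region features into learnable cluster centers
# (soft assignment over hypothetical shapes; not the paper's exact module).
import torch
import torch.nn as nn

class ClusterAggregation(nn.Module):
    def __init__(self, n_clusters=16, feat_dim=128):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_clusters, feat_dim))

    def forward(self, region_feats):
        # region_feats: (batch, n_regions, feat_dim); assumed to already carry
        # the 3D locations of their regions, as the paper's module encodes.
        sim = torch.einsum('brd,kd->brk', region_feats, self.centers)
        assign = torch.softmax(sim, dim=-1)               # soft cluster assignment
        resid = region_feats.unsqueeze(2) - self.centers  # (b, r, k, d) residuals
        return (assign.unsqueeze(-1) * resid).sum(dim=1)  # (b, k, d) per cluster

clustered = ClusterAggregation()(torch.randn(2, 64, 128))
print(clustered.shape)  # torch.Size([2, 16, 128])
```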

5. Han Z, Ma B, Liu YS, Zwicker M. Reconstructing 3D Shapes From Multiple Sketches Using Direct Shape Optimization. IEEE Trans Image Process 2020; 29:8721-8734. [PMID: 32870791] [DOI: 10.1109/tip.2020.3018865]
Abstract
3D shape reconstruction from multiple hand-drawn sketches is an intriguing approach to 3D shape modeling. Currently, state-of-the-art methods employ neural networks to learn a mapping from multiple sketches taken from arbitrary view angles to a 3D voxel grid. Because of the cubic complexity of 3D voxel grids, however, such networks are hard to train and limited to low-resolution reconstructions, which leads to a lack of geometric detail and low accuracy. To resolve this issue, we propose to reconstruct 3D shapes from multiple sketches using direct shape optimization (DSO), which does not involve deep learning models for direct voxel-based 3D shape generation. Specifically, we first leverage a conditional generative adversarial network (CGAN) to translate each sketch into an attenuance image that captures the predicted geometry from a given viewpoint. Then, DSO minimizes a project-and-compare loss to reconstruct the 3D shape such that it matches the predicted attenuance images from the view angles of all input sketches. Based on this, we further propose a progressive update approach to handle inconsistencies among a few hand-drawn sketches of the same 3D shape. Our experimental results show that our method significantly outperforms the state-of-the-art methods on widely used benchmarks and produces intuitive results in an interactive application.
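
The project-and-compare loss lends itself to a compact sketch: a voxel occupancy grid is optimized by gradient descent so that its projections match target attenuance images. Axis-aligned views, the sigmoid parameterization, and the exponential attenuation model are assumptions for illustration:

```python
# Minimal sketch of a project-and-compare loss: optimize voxel occupancy so
# its projections match target attenuance images (assumed axis-aligned views;
# not the authors' exact rendering or optimization pipeline).
import torch

logits = torch.zeros(32, 32, 32, requires_grad=True)      # occupancy logits
targets = {0: torch.rand(32, 32), 1: torch.rand(32, 32)}  # per-view attenuance
opt = torch.optim.Adam([logits], lr=0.05)

for step in range(200):
    occ = torch.sigmoid(logits)
    loss = 0.0
    for axis, img in targets.items():
        proj = 1 - torch.exp(-occ.sum(dim=axis))  # attenuation along the ray
        loss = loss + ((proj - img) ** 2).mean()  # compare to predicted image
    opt.zero_grad()
    loss.backward()
    opt.step()
```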

6. A New Volumetric CNN for 3D Object Classification Based on Joint Multiscale Feature and Subvolume Supervised Learning Approaches. Comput Intell Neurosci 2020. [DOI: 10.1155/2020/5851465]
Abstract
The advancement of low-cost RGB-D and LiDAR three-dimensional (3D) sensors has made it easier to obtain 3D models in real time. However, extracting intricate 3D features is crucial for advancing 3D object classification. Existing volumetric voxel-based CNN approaches have achieved remarkable progress, but they incur huge computational overhead, which limits the extraction of global features at higher resolutions of 3D objects. In this paper, a low-cost 3D volumetric deep convolutional neural network is proposed for 3D object classification based on joint multiscale hierarchical and subvolume supervised learning strategies. The proposed deep neural network takes 3D data preprocessed into a memory-efficient octree representation, and we propose to limit the full-layer octree depth to a certain level, based on the predefined input volume resolution, for storing high-precision contour features. Multiscale features are concatenated from multiple octree depths inside the network, aiming to adaptively generate high-level global features. The subvolume supervision approach trains the network on subparts of the 3D object in order to learn local features. Our framework has been evaluated on two publicly available 3D repositories. Experimental results demonstrate the effectiveness of the proposed method: classification accuracy is improved in comparison to existing volumetric approaches, while memory consumption and run time are significantly reduced.
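
The joint multiscale idea can be sketched with a plain dense-voxel network in which globally pooled features from several depths are concatenated; the paper itself operates on a memory-efficient octree, so the layers and sizes below are only illustrative:

```python
# Sketch of joint multiscale feature concatenation in a volumetric CNN:
# features pooled at several depths form one global descriptor (dense voxels
# here for simplicity; the paper uses an octree representation).
import torch
import torch.nn as nn

class MultiscaleVoxNet(nn.Module):
    def __init__(self, n_classes=40):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.block2 = nn.Sequential(nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.fc = nn.Linear(16 + 32, n_classes)

    def forward(self, vox):                          # vox: (batch, 1, 32, 32, 32)
        f1 = self.block1(vox)                        # coarse scale
        f2 = self.block2(f1)                         # finer, more abstract scale
        g1 = f1.mean(dim=(2, 3, 4))                  # global pooling per scale
        g2 = f2.mean(dim=(2, 3, 4))
        return self.fc(torch.cat([g1, g2], dim=1))   # multiscale concatenation

logits = MultiscaleVoxNet()(torch.randn(2, 1, 32, 32, 32))
print(logits.shape)  # torch.Size([2, 40])
```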

7. Chu J, Wang H, Meng H, Jin P, Li T. Restricted Boltzmann Machines With Gaussian Visible Units Guided by Pairwise Constraints. IEEE Trans Cybern 2019; 49:4321-4334. [PMID: 30137021] [DOI: 10.1109/tcyb.2018.2863601]
Abstract
Restricted Boltzmann machines (RBMs) and their variants are usually trained by contrastive divergence (CD) learning, but the training procedure is unsupervised, without any guidance from background knowledge. To enhance the expressive ability of traditional RBMs, in this paper we propose a pairwise-constrained RBM with Gaussian visible units (pcGRBM), in which the learning procedure is guided by pairwise constraints (PCs) and the encoding process is conducted under this guidance. The PCs are encoded in the hidden-layer features of pcGRBM: some pairs of hidden features are pulled together while others are separated by the guidance. To deal with real-valued data, the binary visible units are replaced by linear units with Gaussian noise in the pcGRBM model. In the learning process of pcGRBM, the PCs guide the iterated transitions between visible and hidden units during the CD learning procedure. The proposed model is then inferred by an approximate gradient descent method, and the corresponding learning algorithm is designed. To compare the usefulness of pcGRBM and traditional RBMs with Gaussian visible units, the hidden-layer features of pcGRBM and RBMs are used as input "data" for K-means, spectral clustering (SP), and affinity propagation (AP) algorithms, respectively. We also use a tenfold cross-validation strategy to train and test the pcGRBM model, obtaining more meaningful results with PCs derived from incremental sampling procedures. A thorough experimental evaluation is performed on 12 image datasets from Microsoft Research Asia Multimedia. The experimental results show that the clustering performance of the K-means, SP, and AP algorithms based on the pcGRBM model is significantly better than that based on traditional RBMs. In addition, the pcGRBM model shows better performance on clustering tasks than some semi-supervised clustering algorithms.
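
As a rough sketch of the training idea, the code below runs one CD-1 step for an RBM with Gaussian visible units and adds a must-link penalty pulling a constrained pair's hidden probabilities together. The penalty form and all sizes are assumptions for illustration; the paper derives its own constrained update rules:

```python
# Sketch: one CD-1 update for a Gaussian-visible RBM plus a must-link
# pairwise-constraint pull on hidden probabilities (hypothetical penalty).
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr, lam = 8, 4, 0.01, 0.1
W = rng.normal(0, 0.1, (n_vis, n_hid))
v0 = rng.normal(size=(2, n_vis))            # two must-linked real-valued samples

def hidden_probs(v):
    return 1.0 / (1.0 + np.exp(-v @ W))     # sigmoid hidden activations

h0 = hidden_probs(v0)
v1 = h0 @ W.T + rng.normal(0, 1, v0.shape)  # Gaussian visible reconstruction
h1 = hidden_probs(v1)

grad_cd = v0.T @ h0 - v1.T @ h1             # contrastive divergence gradient
# Gradient of 0.5 * ||h0[0] - h0[1]||^2 w.r.t. W pulls the pair together.
diff = h0[0] - h0[1]
grad_pc = lam * (np.outer(v0[0], diff * h0[0] * (1 - h0[0]))
                 - np.outer(v0[1], diff * h0[1] * (1 - h0[1])))
W += lr * (grad_cd - grad_pc)
```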

9. Han Z, Lu H, Liu Z, Vong CM, Liu YS, Zwicker M, Han J, Chen CLP. 3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation. IEEE Trans Image Process 2019; 28:3986-3999. [PMID: 30872228] [DOI: 10.1109/tip.2019.2904460]
Abstract
Learning 3D global features by aggregating multiple views is important. Pooling is widely used to aggregate views in deep learning models, but it disregards much of the content information within views and the spatial relationships among the views, which limits the discriminability of learned features. To resolve this issue, 3D to Sequential Views (3D2SeqViews) is proposed to more effectively aggregate sequential views using convolutional neural networks with a novel hierarchical attention aggregation. Specifically, the content information within each view is first encoded. Then, the encoded view content and the sequential spatiality among the views are simultaneously aggregated by the hierarchical attention aggregation, where view-level attention and class-level attention are proposed to hierarchically weight sequential views and shape classes. View-level attention is learned to indicate how much attention each shape class pays to each view, which subsequently weights sequential views through a novel recursive view integration. Recursive view integration learns the semantic meaning of the view sequence and is robust to the choice of first view position. Furthermore, class-level attention is introduced to describe how much attention is paid to each shape class, which innovatively employs the discriminative ability of the fine-tuned network. 3D2SeqViews learns more discriminative features than the state-of-the-art, leading to outperforming results in shape classification and retrieval on three large-scale benchmarks.
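
A toy version of recursive view integration with view-level attention might look as follows; the gated update, shapes, and scoring layer are assumptions rather than the exact 3D2SeqViews cell, and class-level attention is omitted:

```python
# Sketch of recursive view integration: sequential view features are folded
# into a running state one step at a time, weighted by view-level attention.
import torch
import torch.nn as nn

class RecursiveViewIntegration(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)       # view-level attention score
        self.merge = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, views):                    # views: (batch, n_views, feat_dim)
        scores = torch.softmax(self.attn(views), dim=1)
        state = torch.zeros_like(views[:, 0])
        for i in range(views.size(1)):           # integrate views recursively
            weighted = scores[:, i] * views[:, i]
            state = torch.tanh(self.merge(torch.cat([state, weighted], dim=1)))
        return state

global_feat = RecursiveViewIntegration()(torch.randn(4, 12, 256))
print(global_feat.shape)  # torch.Size([4, 256])
```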

10. Zhang L, Yao Y, Lu Z, Shao L. Aesthetics-Guided Graph Clustering With Absent Modalities Imputation. IEEE Trans Image Process 2019; 28:3462-3476. [PMID: 30735995] [DOI: 10.1109/tip.2019.2897940]
Abstract
Accurately clustering Internet-scale user populations into multiple communities according to their aesthetic styles is a useful technique in image modeling and data mining. In this paper, we present a novel partially supervised model which seeks a sparse representation to capture photo aesthetics. It optimally fuses multi-channel features, i.e., human gaze behavior, quality scores, and semantic tags, any of which may be absent. Afterward, by leveraging the KL-divergence to measure the difference between the aesthetic distributions of photo sets, a large-scale graph is constructed to describe the aesthetic correlations between users. Finally, a dense subgraph mining algorithm which intrinsically supports outliers (i.e., unique users that do not belong to any community) is adopted to detect aesthetic communities. Comprehensive experimental results on a million-scale image set collected from Flickr demonstrate the superiority of our method. As a byproduct, the discovered aesthetic communities can substantially enhance photo retargeting and video summarization.
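
The graph-construction step can be sketched directly: a symmetric KL divergence between per-user aesthetic distributions is turned into edge weights. The histogram inputs and the exponential kernel are illustrative assumptions; the paper first fuses gaze, quality, and tag channels:

```python
# Sketch of building a user-affinity graph from symmetric KL divergence
# between per-user aesthetic distributions (hypothetical histogram inputs).
import numpy as np

def sym_kl(p, q, eps=1e-9):
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

rng = np.random.default_rng(0)
hists = rng.dirichlet(np.ones(16), size=5)   # 5 users, 16-bin distributions
n = len(hists)
affinity = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            # Smaller divergence -> stronger edge between users i and j.
            affinity[i, j] = np.exp(-sym_kl(hists[i], hists[j]))
```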

11. Han Z, Liu Z, Han J, Vong CM, Bu S, Chen CLP. Unsupervised Learning of 3-D Local Features From Raw Voxels Based on a Novel Permutation Voxelization Strategy. IEEE Trans Cybern 2019; 49:481-494. [PMID: 29990288] [DOI: 10.1109/tcyb.2017.2778764]
Abstract
Effective 3-D local features are essential for 3-D shape analysis. Existing hand-crafted 3-D local descriptors are effective but usually involve intensive human intervention and prior knowledge, which burdens subsequent processing procedures. An alternative resorts to the unsupervised learning of features from raw 3-D representations via popular deep learning models. However, this alternative suffers from several significant unresolved issues, such as irregular vertex topology, arbitrary mesh resolution, orientation ambiguity on the 3-D surface, and invariance to rigid and slightly nonrigid transformations. To tackle these issues, we propose an unsupervised 3-D local feature learning framework based on a novel permutation voxelization strategy to learn high-level and hierarchical 3-D local features from raw 3-D voxels. Specifically, the proposed strategy first applies a novel voxelization which discretizes each 3-D local region, with irregular vertex topology and arbitrary mesh resolution, into regular voxels; then, a novel permutation is applied to the voxels to simultaneously eliminate the effect of rotation transformations and orientation ambiguity on the surface. Based on the proposed strategy, the permuted voxels fully encode the geometry and structure of each local region in regular, sparse, binary vectors. These voxel vectors are highly suitable for learning hierarchical common surface patterns with a stacked sparse autoencoder, using hierarchical abstraction and a sparsity constraint. Experiments evaluate the learned local features on three tasks: 1) global shape retrieval; 2) partial shape retrieval; and 3) shape correspondence. The experimental results show that the learned local features outperform other state-of-the-art 3-D shape descriptors.
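
The voxelization half of the strategy reduces to a short sketch: a local vertex set of arbitrary resolution is discretized into a regular binary grid and flattened into a sparse binary vector for the autoencoder. The grid size and normalization are assumptions, and the rotation-normalizing permutation is omitted:

```python
# Sketch of voxelizing one local surface region into a regular binary grid
# (assumed 16^3 grid and unit-cube normalization; permutation step omitted).
import numpy as np

def voxelize(points, res=16):
    # points: (n, 3) vertices of one local region, arbitrary resolution.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scaled = (points - mins) / np.maximum(maxs - mins, 1e-9)  # fit unit cube
    idx = np.minimum((scaled * res).astype(int), res - 1)
    grid = np.zeros((res, res, res), dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1   # mark occupied voxels
    return grid.reshape(-1)                     # sparse binary vector for the SAE

vec = voxelize(np.random.rand(200, 3))
print(vec.sum(), vec.size)                      # occupied count, 4096
```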

12. Han Z, Shang M, Liu Z, Vong CM, Liu YS, Zwicker M, Han J, Chen CLP. SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention. IEEE Trans Image Process 2019; 28:658-672. [PMID: 30183634] [DOI: 10.1109/tip.2018.2868426]
Abstract
Learning 3D global features by aggregating multiple views has proven a successful strategy for 3D shape analysis. In recent deep learning models with end-to-end training, pooling is a widely adopted procedure for view aggregation. However, pooling merely retains the max or mean value over all views, which disregards the content information of almost all views as well as the spatial information among the views. To resolve these issues, we propose Sequential Views To Sequential Labels (SeqViews2SeqLabels), a novel deep learning model with an encoder-decoder structure based on recurrent neural networks (RNNs) with attention. SeqViews2SeqLabels consists of two connected parts, an encoder-RNN followed by a decoder-RNN, which learn global features by aggregating sequential views and then perform shape classification from the learned global features, respectively. Specifically, the encoder-RNN learns global features by simultaneously encoding the spatial and content information of sequential views, which captures the semantics of the view sequence. With the proposed prediction of sequential labels, the decoder-RNN performs more accurate classification using the learned global features by predicting sequential labels step by step. Learning to predict sequential labels provides more, and finer, discriminative information among shape classes, which alleviates the overfitting problem inherent in training on a limited number of 3D shapes. Moreover, we introduce an attention mechanism to further improve the discriminative ability of SeqViews2SeqLabels. This mechanism increases the weight of views that are distinctive to each shape class, and it dramatically reduces the effect of selecting the first view position. Shape classification and retrieval results on three large-scale benchmarks verify that SeqViews2SeqLabels learns more discriminative global features by more effectively aggregating sequential views than state-of-the-art methods.
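
A compact sketch of the encoder-decoder idea: a GRU encodes sequential view features into a global code, and a second GRU decodes it into step-by-step class logits. The layer sizes, the repeated decoder input, and the omission of the attention mechanism are simplifying assumptions:

```python
# Sketch of an encoder-RNN / decoder-RNN for sequential views and sequential
# labels (simplified; the paper additionally uses attention over views).
import torch
import torch.nn as nn

class SeqViewsClassifier(nn.Module):
    def __init__(self, feat_dim=256, hidden=256, n_classes=40, steps=4):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)
        self.steps = steps

    def forward(self, views):                    # views: (batch, n_views, feat_dim)
        _, h = self.encoder(views)               # h: (1, batch, hidden) global code
        dec_in = h.transpose(0, 1).repeat(1, self.steps, 1)
        dec_out, _ = self.decoder(dec_in, h)     # predict labels step by step
        return self.out(dec_out)                 # (batch, steps, n_classes)

logits = SeqViewsClassifier()(torch.randn(4, 12, 256))
print(logits.shape)  # torch.Size([4, 4, 40])
```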

13. Han Z, Liu Z, Vong CM, Liu YS, Bu S, Han J, Chen CLP. Deep Spatiality: Unsupervised Learning of Spatially-Enhanced Global and Local 3D Features by Deep Neural Network With Coupled Softmax. IEEE Trans Image Process 2018; 27:3049-3063. [PMID: 29993805] [DOI: 10.1109/tip.2018.2816821]
Abstract
The discriminability of Bag-of-Words representations can be increased by encoding the spatial relationships among virtual words on 3D shapes. However, this encoding task involves several issues, including arbitrary mesh resolutions, irregular vertex topology, orientation ambiguity on the 3D surface, and invariance to rigid and non-rigid shape transformations. To address these issues, a novel unsupervised spatial learning framework based on a deep neural network, called deep spatiality (DS), is proposed. Specifically, DS employs two novel components: a spatial context extractor and a deep context learner. The spatial context extractor captures the spatial relationships among virtual words in a local region as a raw spatial representation. Along a consistent circular direction, a directed circular graph is constructed to encode the relative positions between pairwise virtual words in each face ring into a relative spatial matrix. By decomposing each relative spatial matrix using SVD, the raw spatial representation is formed, from which the deep context learner conducts unsupervised learning of global and local features. The deep context learner is a deep neural network with a novel model structure that accommodates the proposed coupled softmax layer, which encodes not only the discriminative information among local regions but also that among global shapes. Experimental results show that DS outperforms state-of-the-art methods.
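
A toy sketch of the raw spatial representation: directed offsets between virtual words along a consistent circular order form relative spatial matrices that SVD compresses. The word count, per-coordinate matrices, and keeping only singular values are assumptions; the paper's exact construction over face rings differs:

```python
# Sketch: relative positions of virtual words along a circular order form
# matrices whose SVD yields a compact raw spatial representation.
import numpy as np

words = np.random.rand(6, 3)                       # virtual-word positions
n = len(words)
rel = np.zeros((n, n, 3))
for i in range(n):
    for j in range(n):
        rel[i, j] = words[(i + j) % n] - words[i]  # directed circular offsets

# One relative spatial matrix per coordinate; SVD keeps its dominant structure.
raw = np.concatenate([np.linalg.svd(rel[:, :, c], compute_uv=False)
                      for c in range(3)])
print(raw.shape)                                   # (18,) raw spatial representation
```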

14. Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes. Appl Sci (Basel) 2018. [DOI: 10.3390/app8040646]