1. Xia Q. DTV-CNN: Neural network based on depth and thickness views for efficient 3D shape classification. Heliyon 2023;9:e21515. DOI: 10.1016/j.heliyon.2023.e21515. PMID: 38027921; PMCID: PMC10665673.
Abstract
Fast and effective algorithms for deep learning on 3D shapes are key to innovating mechanical and electronic engineering design workflows. In this paper, an efficient 3D-shape-to-2D-image projection algorithm and a shallow 2.5D convolutional neural network architecture are proposed. A smaller convolutional neural network (CNN) model is achieved by enriching information at the preprocessing stage, i.e. the 3D geometry is compressed into a 2D "thickness view" and "depth view". Fusing the depth view and thickness view (DTV) from the same projection direction into a dual-channel grayscale image improves information locality for extracting geometry and topology features. This approach bridges the gap between mature image-based deep learning technologies and 3D shape applications. Enhanced by several essential scalar geometry properties and only three projection views, a mixed CNN and multi-layer perceptron (MLP) neural network model achieves a validation accuracy of 92% on the mesh-based ModelNet10 dataset, while the training time is one order of magnitude less than that of the original multi-view CNN approach. This study also creates new 3D shape datasets from two open-source CAD projects. Higher validation accuracy is obtained for realistic CAD datasets: 97% for FreeCAD's mechanical part library and 95% for the KiCAD electronic part library. The training cost is reduced to tens of minutes on a laptop CPU, given the smaller input data size and shallow neural network design. It is expected that this approach can be adapted to other machine learning scenarios involving CAD geometry.
Affiliation: Qingfeng Xia, Culham Centre for Fusion Energy, United Kingdom Atomic Energy Authority, OX14 3DB, United Kingdom.
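The depth-plus-thickness projection described in this abstract can be pictured with a short ray-casting sketch. The snippet below is a minimal illustration under assumptions, not the paper's implementation: it takes a watertight trimesh mesh, casts parallel rays along -Z, records the distance to the first hit as the depth view and the summed entry-exit gaps as the thickness view, and stacks the two maps as the channels of one dual-channel image.

```python
import numpy as np
import trimesh

def depth_thickness_view(mesh: trimesh.Trimesh, res: int = 64):
    """Project a watertight mesh along -Z into a (res, res, 2) depth/thickness image."""
    lo, hi = mesh.bounds                                   # axis-aligned bounding box
    xs = np.linspace(lo[0], hi[0], res)
    ys = np.linspace(lo[1], hi[1], res)
    gx, gy = np.meshgrid(xs, ys)
    origins = np.column_stack([gx.ravel(), gy.ravel(),
                               np.full(res * res, hi[2] + 1.0)])
    dirs = np.tile([0.0, 0.0, -1.0], (res * res, 1))       # parallel rays shooting down

    depth = np.zeros(res * res)
    thick = np.zeros(res * res)
    locs, ray_ids, _ = mesh.ray.intersects_location(origins, dirs, multiple_hits=True)
    for i in range(res * res):
        z = np.sort(locs[ray_ids == i, 2])[::-1]           # hit heights, top to bottom
        if z.size == 0:
            continue                                        # ray misses the shape
        depth[i] = (hi[2] + 1.0) - z[0]                     # distance to first surface
        pairs = z[: z.size - z.size % 2].reshape(-1, 2)     # entry/exit hit pairs
        thick[i] = np.sum(pairs[:, 0] - pairs[:, 1])        # summed material traversed
    # fuse the two single-channel maps into one dual-channel "image"
    return np.stack([depth.reshape(res, res), thick.reshape(res, res)], axis=-1)
```

A per-pixel loop is used here for clarity; a production version would vectorize the hit grouping and repeat the projection for each chosen view direction.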
2. Song D, Nie WZ, Li WH, Kankanhalli M, Liu AA. Monocular Image-Based 3-D Model Retrieval: A Benchmark. IEEE Trans Cybern 2022;52:8114-8127. DOI: 10.1109/tcyb.2021.3051016. PMID: 33531330.
Abstract
Monocular image-based 3-D model retrieval aims to search for relevant 3-D models in a dataset given one RGB image captured in the real world, which can significantly benefit applications such as self-service checkout and online shopping. To help advance this promising yet challenging research topic, we built a novel dataset and organized the first international contest on monocular image-based 3-D model retrieval. Moreover, we conduct a thorough analysis of the state-of-the-art methods. Existing methods can be classified into supervised and unsupervised approaches. The supervised methods can be analyzed along several important aspects, such as the strategies for domain adaptation, view fusion, loss function, and similarity measure. The unsupervised methods focus on solving this problem with unlabeled data and domain adaptation. Seven popular metrics are employed to evaluate performance, and accordingly we provide a detailed analysis and guidance for future work. To the best of our knowledge, this is the first benchmark for monocular image-based 3-D model retrieval, and it aims to support related research in multiview feature learning, domain adaptation, and information retrieval.
3. Principal views selection based on growing graph convolution network for multi-view 3D model recognition. Appl Intell 2022. DOI: 10.1007/s10489-022-03775-y.
4. Nie WZ, Liu AA, Zhao S, Gao Y. Deep Correlated Joint Network for 2-D Image-Based 3-D Model Retrieval. IEEE Trans Cybern 2022;52:1862-1871. DOI: 10.1109/tcyb.2020.2995415. PMID: 32603301.
Abstract
In this article, we propose a novel deep correlated joint network (DCJN) approach for 2-D image-based 3-D model retrieval. First, the proposed method jointly learns two distinct deep neural networks, trained on the individual modalities, to obtain two deep nonlinear transformations that extract visual features into a co-embedding feature space. Second, we propose a global loss function for the DCJN, consisting of a discriminative loss and a correlation loss. The discriminative loss aims to minimize the intra-class distance of the extracted features and to maximize the inter-class distance by a large margin within each modality, while the correlation loss focuses on mitigating the distribution discrepancy across the modalities. Consequently, the proposed method realizes cross-modality feature extraction guided by the defined global loss function, which benefits the similarity measure between 2-D images and 3-D models. For comparison experiments, we contribute the currently largest 2-D image-based 3-D model retrieval dataset. Moreover, the proposed method was further evaluated on three popular benchmarks, the 3-D Shape Retrieval Contest 2014, 2016, and 2018 benchmarks. Extensive comparison results demonstrate the superiority of this method over state-of-the-art methods.
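As a reading aid, here is a hedged sketch of the kind of joint objective this abstract describes. It is not the authors' code: the margin-based discriminative term, the simple paired-embedding correlation term, and the weight lam are illustrative assumptions standing in for the paper's exact formulations.

```python
import torch
import torch.nn.functional as F

def discriminative_loss(feats, labels, margin=1.0):
    """Pull same-class features together, push different classes beyond a margin."""
    d = torch.cdist(feats, feats)                      # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # class-agreement mask
    pos = d[same].pow(2).mean()                        # intra-class compactness (zero diagonal kept for brevity)
    neg = F.relu(margin - d[~same]).pow(2).mean()      # inter-class separation
    return pos + neg

def correlation_loss(img_feats, model_feats):
    """Shrink the gap between paired 2-D image and 3-D model embeddings."""
    return F.mse_loss(img_feats, model_feats)

def global_loss(img_feats, model_feats, labels, lam=0.5):
    # one discriminative term per modality plus a cross-modal correlation term
    return (discriminative_loss(img_feats, labels)
            + discriminative_loss(model_feats, labels)
            + lam * correlation_loss(img_feats, model_feats))
```

In practice the relative weight of the two terms would be tuned per dataset.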
5. Wang XL, Zhu ZF, Song Y, Fu HJ. GRNet: Graph-based remodeling network for multi-view semi-supervised classification. Pattern Recognit Lett 2021. DOI: 10.1016/j.patrec.2021.08.008.
6. A Deep Learning Method for 3D Object Classification and Retrieval Using the Global Point Signature Plus and Deep Wide Residual Network. Sensors 2021;21:2644. DOI: 10.3390/s21082644. PMID: 33918845; PMCID: PMC8070544.
Abstract
3D object classification and retrieval is a vital and challenging task in computer vision, with many practical applications such as intelligent robots, autonomous driving, multimedia content processing and retrieval, and augmented/mixed reality. Various deep learning methods have been introduced to solve classification and retrieval problems for 3D objects. Although view-based methods perform best among current techniques (view-based, voxelization, and point-cloud methods), almost all of them use many views to compensate for the loss of spatial information, and the many views make the network structure more complicated because of the parallel convolutional neural network (CNN) branches. In this paper, we propose a novel method that combines the Global Point Signature Plus with a Deep Wide Residual Network, namely GPSP-DWRN. Global Point Signature Plus (GPSPlus) is a novel descriptor that can capture more shape information of a 3D object from a single view. First, an original 3D model is converted into a colored one by applying GPSPlus. Then, the obtained 2D projection of this colored 3D model is stored in a 32 × 32 × 3 matrix, which is the input to a deep residual network with a single CNN structure. We evaluated GPSP-DWRN on a retrieval task using the ShapeNetCore55 dataset and on a classification task using two well-known datasets, ModelNet10 and ModelNet40. Based on our experimental results, our framework performed better than state-of-the-art methods.
7. Li WH, Yang S, Wang Y, Song D, Li XY. Multi-level similarity learning for image-text retrieval. Inf Process Manag 2021. DOI: 10.1016/j.ipm.2020.102432.
8. Gao Z, Xue H, Wan S. Multiple Discrimination and Pairwise CNN for view-based 3D object retrieval. Neural Netw 2020;125:290-302. DOI: 10.1016/j.neunet.2020.02.017. PMID: 32151916.
Abstract
With the rapid development and wide application of computer, camera, network, and hardware technology, 3D object (or model) retrieval has attracted widespread attention and has become a hot research topic in the computer vision domain. Deep learning features have been shown to outperform hand-crafted features in 3D object retrieval. However, most existing networks do not take into account the impact of multi-view image selection on network training, and using a contrastive loss alone only forces same-class samples to be as close as possible. In this work, a novel solution named Multiple Discrimination and Pairwise CNN (MDPCNN) for 3D object retrieval is proposed to tackle these issues. It can take multiple batches and multiple views as input simultaneously by adding a Slice layer and a Concat layer. Furthermore, a highly discriminative network is obtained by training on samples that clustering finds hard to classify. Lastly, we deploy the contrastive-center loss together with the contrastive loss as the optimization objective, which yields better intra-class compactness and inter-class separability. Large-scale experiments show that the proposed MDPCNN achieves a significant improvement over state-of-the-art algorithms in 3D object retrieval.
Affiliations
- Zan Gao: Qilu University of Technology (Shandong Academy of Sciences), Shandong Artificial Intelligence Institute, Shandong Computer Science Center (National Supercomputer Center in Jinan), Jinan 250014, PR China.
- Haixin Xue: Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, Tianjin 300384, PR China.
- Shaohua Wan: School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, PR China.
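For readers unfamiliar with the contrastive-center loss mentioned above, the following is a rough sketch of the commonly cited formulation, not the authors' code; the tensor names, the learnable centre matrix, and the small delta constant are illustrative assumptions. Each feature is pulled toward its own class centre and pushed away from all other centres.

```python
import torch

def contrastive_center_loss(feats, labels, centers, delta=1e-6):
    """feats: (N, D) features, labels: (N,) int class ids,
    centers: (num_classes, D) learnable class centres (e.g. an nn.Parameter)."""
    d2 = torch.cdist(feats, centers).pow(2)              # (N, num_classes) squared distances
    own = d2.gather(1, labels.view(-1, 1)).squeeze(1)    # distance to the sample's own centre
    others = d2.sum(dim=1) - own                         # distance to all other centres
    return 0.5 * (own / (others + delta)).mean()         # small ratio = compact and well separated
```

In MDPCNN-style training this term would be combined with a pairwise contrastive loss over view pairs.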
9. Leng B, Zhang C, Zhou X, Xu C, Xu K. Learning Discriminative 3D Shape Representations by View Discerning Networks. IEEE Trans Vis Comput Graph 2019;25:2896-2909. DOI: 10.1109/tvcg.2018.2865317. PMID: 30130227.
Abstract
In view-based 3D shape recognition, extracting a discriminative visual representation of 3D shapes from projected images is considered the core problem. Projections with low discriminative ability can adversely influence the final 3D shape representation, and under real-world conditions with background clutter and object occlusion the adverse effect is even more severe. To resolve this problem, we propose a novel deep neural network, the View Discerning Network, which learns to judge the quality of views and adjust their contributions to the shape representation. In this network, a Score Generation Unit is devised to evaluate the quality of each projected image with score vectors. These score vectors are used to weight the image features, and the weighted features perform much better than the original features in the 3D shape recognition task. In particular, we introduce two structures of the Score Generation Unit, a Channel-wise Score Unit and a Part-wise Score Unit, to assess the quality of feature maps from different perspectives. Our network aggregates features and scores in an end-to-end framework, so that final shape descriptors are obtained directly from its output. Our experiments on ModelNet and ShapeNet Core55 show that the View Discerning Network outperforms the state of the art on the retrieval task, with excellent robustness against background clutter and object occlusion.
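The score-weighted view aggregation idea can be illustrated with a small PyTorch module. This is a generic sketch under assumed layer sizes, not the authors' architecture: a tiny scoring head predicts one quality score per projected view, and the shape descriptor is the softmax-weighted sum of the view features.

```python
import torch
import torch.nn as nn

class ScoreWeightedAggregation(nn.Module):
    """Aggregate per-view CNN features into one shape descriptor."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score_unit = nn.Sequential(nn.Linear(feat_dim, 128),
                                        nn.ReLU(),
                                        nn.Linear(128, 1))   # one quality score per view

    def forward(self, view_feats):                 # (batch, n_views, feat_dim)
        scores = self.score_unit(view_feats)       # (batch, n_views, 1)
        weights = torch.softmax(scores, dim=1)     # low-quality views get low weight
        return (weights * view_feats).sum(dim=1)   # (batch, feat_dim) shape descriptor
```

The paper's channel-wise and part-wise variants refine this by scoring feature maps at a finer granularity than a single scalar per view.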
10. Shi H, Zhang Y, Zhang Z, Ma N, Zhao X, Gao Y, Sun J. Hypergraph-Induced Convolutional Networks for Visual Classification. IEEE Trans Neural Netw Learn Syst 2019;30:2963-2972. DOI: 10.1109/tnnls.2018.2869747. PMID: 30295630.
Abstract
Convolutional neural networks (CNNs) have become popular in visual classification tasks because of their superior performance, but CNN-based methods do not consider the correlation among the visual data to be classified. Recently, graph convolutional networks (GCNs) have mitigated this problem by modeling pairwise relationships in visual data. However, real-world visual classification tasks typically involve numerous complex relationships that do not fit the pairwise graph structure modeled by GCNs, so it is vital to explore the underlying higher-order correlation of visual data. To this end, we propose a framework called the hypergraph-induced convolutional network to exploit high-order correlations in visual data within deep neural networks. First, a hypergraph structure is constructed to formulate the relationships in the visual data. Then, the high-order correlation is optimized by a learning process based on the constructed hypergraph, and classification is performed while taking this high-order correlation into account. Thus, the convolution in the hypergraph-induced convolutional network is based on the corresponding high-order relationships, and the optimization of the network uses each data point while considering the high-order correlation among the data. To evaluate the proposed framework, we conducted experiments on three visual datasets: the National Taiwan University 3-D model dataset, the Princeton Shape Benchmark, and a multiview RGB-depth object dataset. The experimental results and comparisons on all datasets demonstrate the effectiveness of the proposed hypergraph-induced convolutional network compared with state-of-the-art methods.
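One propagation step over a hypergraph can be written compactly. The sketch below follows the widely used HGNN-style normalized operator X' = ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta) purely as an illustration of feature propagation node -> hyperedge -> node; it may differ from the exact operator used in this paper, and H, W, Dv, De, and Theta are the usual incidence, edge-weight, degree, and parameter matrices.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, w=None):
    """One hypergraph convolution step.
    X: (n_nodes, d_in) node features, H: (n_nodes, n_edges) incidence matrix,
    Theta: (d_in, d_out) learnable weights, w: optional hyperedge weights."""
    n_nodes, n_edges = H.shape
    w = np.ones(n_edges) if w is None else w
    Dv = (H * w).sum(axis=1)                              # weighted node degrees
    De = H.sum(axis=0)                                    # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    # normalized propagation: node -> hyperedge -> node
    G = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(G @ X @ Theta, 0.0)                 # ReLU nonlinearity
```

Stacking a few such layers and ending with a softmax classifier gives a minimal hypergraph classification network.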
11. IARNN-Based Semantic-Containing Double-Level Embedding Bi-LSTM for Question-and-Answer Matching. Comput Intell Neurosci 2019;2019:6074840. DOI: 10.1155/2019/6074840. PMID: 30944556; PMCID: PMC6421739.
Abstract
We propose a novel end-to-end approach, the semantic-containing double-level embedding Bi-LSTM model (SCDE-Bi-LSTM), to solve three key problems of question-and-answer (Q&A) matching in the Chinese medical field. For the similarity calculation in the Q&A core module, we propose a text similarity calculation method that incorporates semantic information, addressing the problem that previous Q&A methods do not bring the deep information of a sentence into the similarity calculation. For the sentence vector representation module, we present a double-level embedding sentence representation method to reduce the error caused by Chinese medical word segmentation. In addition, because the attention mechanism tends to cause backward deviation of the features, we propose an improved algorithm based on Bi-LSTM for the feature extraction stage. The Q&A framework proposed in this paper retains important temporal features while discarding low-frequency features and noise, and it is applicable to different domains. To verify the framework, extensive Chinese medical Q&A corpora were created. We run several state-of-the-art Q&A methods as comparison experiments on the medical corpora and the popular InsuranceQA dataset under different performance measures. The experimental results on the medical corpora show that our framework significantly outperforms several strong baselines and improves top-1 accuracy by up to 14%, reaching 79.15%.
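A minimal Bi-LSTM question-answer matcher in the spirit of this abstract looks roughly as follows; the layer sizes, max-pooling choice, and cosine scoring are assumptions for illustration, not the authors' SCDE-Bi-LSTM model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMMatcher(nn.Module):
    """Encode question and answer with a shared Bi-LSTM and score the pair."""
    def __init__(self, vocab_size, emb_dim=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)

    def encode(self, token_ids):                    # (batch, seq_len) int token ids
        out, _ = self.encoder(self.embed(token_ids))
        return out.max(dim=1).values                # max-pool over time steps

    def forward(self, question_ids, answer_ids):
        q = self.encode(question_ids)
        a = self.encode(answer_ids)
        return F.cosine_similarity(q, a, dim=-1)    # matching score per pair
```

Ranking candidate answers by this score and taking the argmax gives the top-1 prediction that the reported accuracy refers to.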
12. Liu AA, Nie WZ, Gao Y, Su YT. View-Based 3-D Model Retrieval: A Benchmark. IEEE Trans Cybern 2018;48:916-928. DOI: 10.1109/tcyb.2017.2664503. PMID: 28212106.
Abstract
View-based 3-D model retrieval is one of the most important techniques in numerous computer vision applications. While many methods have been proposed in recent years, to the best of our knowledge there is no benchmark for evaluating the state-of-the-art methods. To tackle this problem, we systematically investigate and evaluate the related methods by: 1) proposing a clique-graph-based method and 2) reimplementing six representative methods. Moreover, we concurrently evaluate both hand-crafted visual features and deep features on four popular datasets (NTU60, NTU216, PSB, and ETH) and one challenging real-world multiview model dataset (MV-RED) prepared by our group, using various evaluation criteria, to understand how these algorithms perform. By quantitatively analyzing the performance, we find that graph matching-based methods with deep features, especially the clique graph matching algorithm with convolutional neural network features, usually outperform the others. We further discuss future research directions in this field.
13. Li Y, Wang Y, Liu J, Hao W. Expression-insensitive 3D face recognition by the fusion of multiple subject-specific curves. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.09.070.
16. Kankanhalli M. Benchmarking a Multimodal and Multiview and Interactive Dataset for Human Action Recognition. IEEE Trans Cybern 2017;47:1781-1794. DOI: 10.1109/tcyb.2016.2582918. PMID: 27429453.
Abstract
Human action recognition is an active research area in both the computer vision and machine learning communities. Over the past decades, the machine learning problem has evolved from the conventional single-view learning problem to cross-view learning, cross-domain learning, and multitask learning, and a large number of algorithms have been proposed in the literature. Despite the large number of action recognition datasets, most are designed for only a subset of these four learning problems, and comparisons between algorithms can be further limited by variance within datasets, experimental configurations, and other factors. To the best of our knowledge, there exists no dataset that allows concurrent analysis of all four learning problems. In this paper, we introduce a novel multimodal, multiview, and interactive (M2I) dataset, designed for evaluating human action recognition methods under all four scenarios. This dataset consists of 1760 action samples from 22 action categories, including nine person-person interactive actions and 13 person-object interactive actions. We systematically benchmark state-of-the-art approaches on the M2I dataset for all four learning problems, evaluating 13 approaches with nine popular feature and descriptor combinations. Our comprehensive analysis demonstrates that the M2I dataset is challenging due to significant intra-class and view variations and multiple similar action categories, and that it provides a solid foundation for evaluating existing state-of-the-art algorithms.
17. Hong R, Hu Z, Wang R, Wang M, Tao D. Multi-View Object Retrieval via Multi-Scale Topic Models. IEEE Trans Image Process 2016;25:5814-5827. DOI: 10.1109/tip.2016.2614132. PMID: 28114066.
Abstract
The increasing number of 3D objects in various applications has raised the demand for effective and efficient 3D object retrieval methods, which have attracted extensive research efforts in recent years. Existing works mainly focus on how to extract features and conduct object matching, but as applications grow, 3D objects come from different areas, and in such circumstances how to conduct retrieval across them becomes more important. To address this issue, we propose a multi-view object retrieval method using multi-scale topic models. In our method, multiple views are first extracted from each object, and dense visual features are extracted to represent each view. To represent the 3D object, multi-scale topic models are employed to extract the hidden relationships among these features with respect to varied topic numbers, so that each object can be represented by a set of bags of topics. To compare objects, we first conduct topic clustering on the basic topics from the two datasets and generate a common topic dictionary for a new representation, so that two objects can be aligned to the same common feature space for comparison. To evaluate the performance of the proposed method, experiments are conducted on two datasets. The 3D object retrieval results and the comparison with existing methods demonstrate the effectiveness of the proposed method.
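The multi-scale "bag of topics" representation described above can be approximated with off-the-shelf topic models. The sketch below is illustrative only, not the authors' pipeline: topic models with several topic counts are fit on the visual-word histograms of all views, and each view is then described by its concatenated topic mixtures across scales.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def multi_scale_topic_features(corpus_histograms, topic_scales=(8, 16, 32)):
    """corpus_histograms: (n_views, vocab_size) visual-word counts for all views."""
    scales = []
    for k in topic_scales:
        lda = LatentDirichletAllocation(n_components=k, random_state=0)
        scales.append(lda.fit_transform(corpus_histograms))   # (n_views, k) topic mixtures
    return np.hstack(scales)            # one multi-scale topic vector per view
```

Pooling the per-view vectors of an object and comparing objects in this concatenated topic space stands in for the paper's common-topic-dictionary alignment step.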