51. Shen X, Tian X, Liu T, Xu F, Tao D. Continuous Dropout. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3926-3937. [PMID: 28981433] [DOI: 10.1109/tnnls.2017.2750679] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7]
Abstract
Dropout has proven to be an effective algorithm for training robust deep networks because of its ability to prevent overfitting by avoiding the co-adaptation of feature detectors. Current explanations of dropout include bagging, naive Bayes, regularization, and sex in evolution. According to the activation patterns of neurons in the human brain, when faced with different situations, the firing rates of neurons are random and continuous, not binary as in current dropout. Inspired by this phenomenon, we extend traditional binary dropout to continuous dropout. On the one hand, continuous dropout is considerably closer to the activation characteristics of neurons in the human brain than traditional binary dropout. On the other hand, we demonstrate that continuous dropout avoids the co-adaptation of feature detectors, which suggests that we can extract more independent feature detectors for model averaging at test time. We introduce the proposed continuous dropout into a feedforward neural network and comprehensively compare it with binary dropout, adaptive dropout, and DropConnect on MNIST, CIFAR-10, Street View House Numbers (SVHN), NORB, and ILSVRC-12. Thorough experiments demonstrate that our method performs better at preventing the co-adaptation of feature detectors and improves test performance.
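To make the contrast concrete, below is a minimal NumPy sketch of binary versus continuous dropout masks; the uniform mask distribution and the test-time rescaling by the mask's expectation are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    rng = np.random.default_rng(0)

    def binary_dropout(x, p=0.5):
        # Traditional dropout: each unit is fully kept (1) or dropped (0).
        mask = rng.binomial(1, p, size=x.shape)
        return x * mask / p  # inverted-dropout rescaling

    def continuous_dropout(x):
        # Continuous dropout: the mask is drawn from a continuous
        # distribution, e.g. U(0, 1), so "firing rates" are graded
        # rather than all-or-none; E[mask] = 0.5 here.
        mask = rng.uniform(0.0, 1.0, size=x.shape)
        return x * mask / 0.5  # rescale by the mask's expectation

    activations = rng.normal(size=(4, 8))
    print(binary_dropout(activations).shape)
    print(continuous_dropout(activations).shape)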

52. Li J, Zhang B, Zhang D. Shared Autoencoder Gaussian Process Latent Variable Model for Visual Classification. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4272-4286. [PMID: 29990089] [DOI: 10.1109/tnnls.2017.2761401] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6]
Abstract
Multiview learning reveals the latent correlation among different modalities and exploits their complementary information to achieve better performance in many applications. In this paper, we propose a novel multiview learning model based on the Gaussian process latent variable model (GPLVM) to learn a set of nonlinear, nonparametric mapping functions and a shared latent variable in the manifold space. Unlike previous work on the GPLVM, the proposed shared autoencoder Gaussian process (SAGP) latent variable model assumes an additional mapping from the observed data to the shared manifold space. Owing to this autoencoder framework, nonlinear projections both from and to the observations are considered simultaneously. Additionally, instead of the fully connected mappings used in a conventional autoencoder, the SAGP realizes the mappings with Gaussian processes, which markedly reduces the number of estimated parameters and avoids overfitting. To make the method suitable for classification, a discriminative regularization is embedded into it. For optimization, an efficient algorithm based on the alternating direction method and gradient descent techniques is designed to solve the encoder and decoder parts alternately. Experimental results on three real-world data sets substantiate the effectiveness and superiority of the proposed approach compared with the state of the art.

53. Chen X, Weng J, Lu W, Xu J, Weng J. Deep Manifold Learning Combined With Convolutional Neural Networks for Action Recognition. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3938-3952. [PMID: 28922128] [DOI: 10.1109/tnnls.2017.2740318] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3]
Abstract
Deep representation learning has been widely applied to action recognition. However, there have been few investigations into how the structural manifold information among different action videos can be exploited to improve recognition accuracy and efficiency. In this paper, we propose to incorporate the manifold of the training samples into deep learning, which we call deep manifold learning (DML). The proposed DML framework can be adapted to most existing deep networks to learn more discriminative features for action recognition. When applied to a convolutional neural network, DML embeds the manifold of the previous convolutional layer into the next convolutional layer, thereby promoting the discriminative capacity of the next layer. We also apply DML to a restricted Boltzmann machine, where it can alleviate overfitting. Experimental results on four standard action databases (UCF101, HMDB51, KTH, and UCF Sports) show that the proposed method outperforms state-of-the-art methods.
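One common way to embed a layer's manifold is a graph-Laplacian penalty on its features; the k-NN affinity and trace penalty below are a hedged sketch of that general idea, not the paper's exact DML construction.

    import numpy as np

    def laplacian_penalty(F, k=5):
        # F: (n, d) features of n training samples at one layer.
        # Build a k-NN affinity over samples, then penalize
        # trace(F^T L F), which is small when neighboring samples
        # stay close in the next layer's feature space.
        n = F.shape[0]
        d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
        W = np.zeros((n, n))
        for i in range(n):
            nn = np.argsort(d2[i])[1:k + 1]        # skip self
            W[i, nn] = np.exp(-d2[i, nn] / d2.mean())
        W = np.maximum(W, W.T)                      # symmetrize
        L = np.diag(W.sum(axis=1)) - W              # graph Laplacian
        return np.trace(F.T @ L @ F)

    F = np.random.default_rng(1).normal(size=(20, 10))
    print(laplacian_penalty(F))  # add as a regularizer to the layer loss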

54. Gan H, Huang R, Luo Z, Xi X, Gao Y. On using supervised clustering analysis to improve classification performance. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.04.080] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1]

55. Li X, Liu L, Lu X. Person Reidentification Based on Elastic Projections. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1314-1327. [PMID: 28422688] [DOI: 10.1109/tnnls.2016.2602855] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3]
Abstract
Person reidentification usually refers to matching people across camera views in nonoverlapping multicamera networks. Many existing methods learn a similarity measure by projecting raw features into a latent subspace where the distance between views of the same target is smaller than the distance between different targets. However, the same target captured in different camera views should retain the same intrinsic attributes, while different targets should hold different intrinsic attributes. Projecting all the data into a single subspace loses this information and yields comparatively poor discriminability. To address this problem, this paper proposes a method based on elastic projections to learn a pairwise similarity measure for person reidentification. The model learns two projections, a positive projection and a negative projection, which are both representative and discriminative. Representative means that, for the same target captured in two camera views, the positive projection bridges the corresponding appearance variation and represents the target's intrinsic attributes, while for different targets captured in two camera views, the negative projection explores and exploits their differing attributes. Discriminative means that the intraclass distance should become smaller than the original distance after projection, while the interclass distance becomes larger; this is the elastic property of the model. In this way, prior information from the original data space guides the learning phase; more importantly, similar (but not identical) targets are effectively separated by forcing the same targets to become more similar and different targets to become more distinct. The model is evaluated on three benchmark data sets, VIPeR, GRID, and CUHK, and achieves better performance than competing methods.
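The elastic property can be written as two hinge penalties anchored to each pair's original distance; a schematic NumPy sketch under assumed notation (P and N stand for the positive and negative projections, which is our labeling, not necessarily the paper's):

    import numpy as np

    def elastic_loss(P, N, pairs_pos, pairs_neg):
        # pairs_pos: (x, y) pairs of the same person in two views;
        # pairs_neg: (x, y) pairs of different people.
        # Intraclass pairs should get CLOSER than in the original space,
        # interclass pairs should get FARTHER -- the "elastic" property.
        loss = 0.0
        for x, y in pairs_pos:
            d_orig = np.sum((x - y) ** 2)
            d_proj = np.sum((P @ x - P @ y) ** 2)
            loss += max(0.0, d_proj - d_orig)   # shrink same-target distance
        for x, y in pairs_neg:
            d_orig = np.sum((x - y) ** 2)
            d_proj = np.sum((N @ x - N @ y) ** 2)
            loss += max(0.0, d_orig - d_proj)   # stretch different-target distance
        return loss

    rng = np.random.default_rng(2)
    P = rng.normal(size=(5, 10))
    N = rng.normal(size=(5, 10))
    pos = [(rng.normal(size=10), rng.normal(size=10))]
    neg = [(rng.normal(size=10), rng.normal(size=10))]
    print(elastic_loss(P, N, pos, neg))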

56. Liu J, Gong M, Qin K, Zhang P. A Deep Convolutional Coupling Network for Change Detection Based on Heterogeneous Optical and Radar Images. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:545-559. [PMID: 28026789] [DOI: 10.1109/tnnls.2016.2636227] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1]
Abstract
We propose an unsupervised deep convolutional coupling network for change detection based on two heterogeneous images acquired by optical sensors and radars on different dates. Most existing change detection methods assume homogeneous images, but the complementary properties of optical and radar sensors have generated increasing interest in change detection from heterogeneous images. The proposed network is symmetric, with each side consisting of one convolutional layer and several coupling layers. The two input images, connected to the two sides of the network, respectively, are transformed into a feature space where their representations become more consistent. In this feature space, a difference map is calculated and then converted into the final detection map by a thresholding algorithm. The network parameters are learned by optimizing a coupling function, and the learning process is unsupervised, unlike most existing change detection methods for heterogeneous images. Experimental results on both homogeneous and heterogeneous images demonstrate the promising performance of the proposed network compared with several existing approaches.
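The final detection step, a difference map in the learned feature space followed by thresholding, can be sketched as follows; the plain absolute difference and the classic Otsu threshold are stand-in assumptions for whatever the paper actually uses:

    import numpy as np

    def otsu_threshold(img, bins=256):
        # Classic Otsu: pick the threshold maximizing between-class variance.
        hist, edges = np.histogram(img, bins=bins)
        p = hist / hist.sum()
        centers = (edges[:-1] + edges[1:]) / 2
        w0 = np.cumsum(p)                # class-0 weight at each cut
        mu = np.cumsum(p * centers)      # class-0 running mean mass
        mu_t = mu[-1]                    # global mean
        with np.errstate(divide="ignore", invalid="ignore"):
            var_between = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
        return centers[np.nanargmax(var_between)]

    # f1, f2 stand in for the coupled feature maps of the two images.
    rng = np.random.default_rng(3)
    f1 = rng.normal(size=(64, 64))
    f2 = f1 + (rng.random((64, 64)) > 0.95)   # sparse simulated changes
    diff = np.abs(f1 - f2)                    # difference map
    change_map = diff > otsu_threshold(diff)  # binary change-detection map
    print(change_map.mean())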

57. Wang W, Shen J, Shao L. Video Salient Object Detection via Fully Convolutional Networks. IEEE Transactions on Image Processing 2018; 27:38-49. [PMID: 28945593] [DOI: 10.1109/tip.2017.2754941] [Citation(s) in RCA: 183] [Impact Index Per Article: 26.1]
Abstract
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) training a deep video saliency model in the absence of sufficiently large, pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules that capture spatial and temporal saliency information, respectively. The dynamic saliency model, which explicitly incorporates saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting given the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, producing accurate spatiotemporal saliency estimates. We advance the state of the art on the Densely Annotated Video Segmentation data set (MAE of .06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of .07), and do so with much improved speed (2 fps with all steps).
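The augmentation idea, turning annotated still images into synthetic clips, can be sketched by shifting an image and its saliency mask a few pixels per frame; the rigid np.roll shift below is an assumed simplification of the paper's technique:

    import numpy as np

    def simulate_video(image, mask, n_frames=8, max_shift=6):
        # Generate a fake clip by sliding an annotated still image a few
        # pixels per frame; the saliency mask moves consistently, giving
        # the temporal branch plausible motion cues at no labeling cost.
        rng = np.random.default_rng(4)
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        frames, masks = [], []
        for t in range(n_frames):
            sx = int(round(dx * t / n_frames))
            sy = int(round(dy * t / n_frames))
            frames.append(np.roll(image, (sy, sx), axis=(0, 1)))
            masks.append(np.roll(mask, (sy, sx), axis=(0, 1)))
        return np.stack(frames), np.stack(masks)

    img = np.random.default_rng(5).random((32, 32, 3))
    msk = np.zeros((32, 32))
    msk[10:20, 10:20] = 1
    clip, clip_masks = simulate_video(img, msk)
    print(clip.shape, clip_masks.shape)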
Affiliation(s)
- Wenguan Wang, Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China
- Jianbing Shen, Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China
- Ling Shao, School of Computing Sciences, University of East Anglia, Norwich, U.K.

58. Ding K, Huo C, Fan B, Xiang S, Pan C. In Defense of Locality-Sensitive Hashing. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:87-103. [PMID: 28113786] [DOI: 10.1109/tnnls.2016.2615085] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9]
Abstract
Hashing-based semantic similarity search is becoming increasingly important for building large-scale content-based retrieval systems. State-of-the-art supervised hashing techniques use a flexible two-step strategy to learn hash functions: the first step learns binary codes for the training data by solving binary optimization problems with millions of variables, which usually requires intensive computation. Despite its simplicity and efficiency, locality-sensitive hashing (LSH) has never been recognized as a good way to generate such codes, owing to its poor performance in traditional approximate nearest neighbor search. We claim in this paper that the true merit of LSH lies in transforming the semantic labels to obtain the binary codes, resulting in an effective and efficient two-step hashing framework. Specifically, we develop locality-sensitive two-step hashing (LS-TSH), which generates the binary codes through LSH rather than any complex optimization technique. Theoretically, under proper assumptions, LS-TSH is itself a useful LSH scheme, so it preserves label-based semantic similarity and possesses sublinear query complexity for hash lookup. Experimentally, LS-TSH obtains retrieval accuracy comparable to the state of the art with two to three orders of magnitude faster training.
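The paper's central move, generating the step-one binary codes by hashing the semantic label vectors with LSH, reduces to a few lines with the random-hyperplane family; the sign-based code below is one assumed instantiation, not necessarily the exact scheme used:

    import numpy as np

    def lsh_codes(labels, n_bits=32, seed=0):
        # labels: (n, c) binary label-membership matrix.
        # Random-hyperplane LSH: project label vectors onto random
        # directions and keep the signs. Instances with similar label
        # vectors collide in many bits, so label-based semantic
        # similarity is preserved in the codes.
        rng = np.random.default_rng(seed)
        R = rng.normal(size=(labels.shape[1], n_bits))
        return (labels @ R > 0).astype(np.uint8)

    rng = np.random.default_rng(6)
    Y = np.eye(10)[rng.integers(0, 10, size=100)]  # one-hot labels
    codes = lsh_codes(Y)
    print(codes.shape)  # (100, 32) binary codes for the second training step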

59. Dornaika F, Kejani MT, Bosaghzadeh A. Graph construction using adaptive Local Hybrid Coding scheme. Neural Netw 2017; 95:91-101. [DOI: 10.1016/j.neunet.2017.08.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1]

60. Han Z, Liu Z, Han J, Vong CM, Bu S, Chen CLP. Mesh Convolutional Restricted Boltzmann Machines for Unsupervised Learning of Features With Structure Preservation on 3-D Meshes. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:2268-2281. [PMID: 28113522] [DOI: 10.1109/tnnls.2016.2582532] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9]
Abstract
Discriminative features of 3-D meshes are significant for many 3-D shape analysis tasks. However, handcrafted descriptors and traditional unsupervised 3-D feature learning methods suffer from several significant weaknesses: 1) extensive human intervention is involved; 2) the local and global structure information of 3-D meshes, in fact an important source of discriminability, cannot be preserved; 3) the irregular vertex topology and arbitrary resolution of 3-D meshes do not allow the direct application of popular deep learning models; 4) orientation is ambiguous on the mesh surface; and 5) the effect of rigid and nonrigid transformations on 3-D meshes cannot be eliminated. As a remedy, we propose a deep learning model with a novel irregular structure, called the mesh convolutional restricted Boltzmann machine (MCRBM). MCRBM aims to simultaneously learn structure-preserving local and global features from a novel raw representation, the local function energy distribution. In addition, multiple MCRBMs can be stacked into a deeper model, called a mesh convolutional deep belief network (MCDBN). MCDBN employs a novel local structure preserving convolution (LSPC) strategy to convolve the geometry and the local structure learned by the lower MCRBM up to the upper MCRBM, which resolves the challenging issue of orientation ambiguity on the mesh surface. Experiments with the proposed MCRBM and MCDBN were conducted on three common tasks: global shape retrieval, partial shape retrieval, and shape correspondence. Results show that the features learned by the proposed methods outperform other state-of-the-art 3-D shape features.

61. Nimmy SF, Kamal MS, Hossain MI, Dey N, Ashour AS, Shi F. Neural Skyline Filtering for Imbalance Features Classification. International Journal of Computational Intelligence and Applications 2017. [DOI: 10.1142/s1469026817500195] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0]
Abstract
In the current digitalized era, large datasets play a vital role in feature extraction, information processing, and knowledge mining and management. Existing mining approaches are sometimes insufficient to handle large volumes of data, and biological data processing suffers from the same issue. In the present work, a classification process is carried out on a large volume of exons and introns from a set of raw data. The proposed work is designed in two parts: pre-processing and mapping-based classification. For pre-processing, three filtering techniques are used; however, these traditional techniques struggle with large datasets because of the long processing times and large memory they require. In this regard, a mapping-based neural skyline filtering approach is designed, in which a randomized algorithm maps the large dataset according to an objective function that determines the randomized size of the dataset based on homogeneity. Around 200 million DNA base pairs were used for the experimental analysis. Experimental results show that mapping-centric filtering outperforms the other filtering techniques in large-scale data processing.
Affiliation(s)
- Sonia Farhana Nimmy, Department of Computer Science and Engineering, Notre Dame University Bangladesh, Bangladesh
- Md. Sarwar Kamal, Department of Computer Science and Engineering, East West University Bangladesh, Bangladesh
- Muhammad Iqbal Hossain, Department of Computer Science and Engineering, BGC Trust University Bangladesh, Bangladesh
- Nilanjan Dey, Department of Information Technology, Techno India College of Technology, India
- Amira S. Ashour, Department of Electronics and Electrical Communications Engineering, Tanta University, Egypt
- Fuqian Shi, College of Information and Engineering, Wenzhou Medical University, Wenzhou, P. R. China

62.

63.

64. Liu F, Liang J, Shen L, Yang M, Zhang D, Lai Z. Case study of 3D fingerprints applications. PLoS One 2017; 12:e0175261. [PMID: 28399141] [PMCID: PMC5388323] [DOI: 10.1371/journal.pone.0175261] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0]
Abstract
Human fingers are 3D objects, so three-dimensional (3D) fingerprints provide more information than two-dimensional (2D) fingerprints. This paper first collects 3D finger point cloud data by the structured-light illumination method, then studies and extracts additional features from the 3D fingerprint images, and finally discusses the applications of these features. A series of experiments demonstrates how 3D information helps fingerprint recognition. Results show that quick alignment can easily be implemented under the guidance of the 3D finger shape feature, even though this feature does not work for fingerprint recognition directly. The newly defined, distinctive 3D shape ridge feature can be used for personal authentication with an Equal Error Rate (EER) of about 8.3%, and it also helps remove false core points. Furthermore, a promising EER of about 1.3% is achieved by combining this feature with 2D features for fingerprint recognition, which indicates the potential of 3D fingerprint recognition.
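Since the results are reported as Equal Error Rates, a generic sketch of how an EER is read off from genuine and impostor comparison scores may help; the score distributions here are synthetic, not the study's data:

    import numpy as np

    def equal_error_rate(genuine, impostor):
        # Sweep thresholds; the EER is where the false accept rate
        # (impostor scores above threshold) meets the false reject rate
        # (genuine scores below threshold).
        thresholds = np.sort(np.concatenate([genuine, impostor]))
        far = np.array([(impostor >= t).mean() for t in thresholds])
        frr = np.array([(genuine < t).mean() for t in thresholds])
        i = int(np.argmin(np.abs(far - frr)))
        return (far[i] + frr[i]) / 2

    rng = np.random.default_rng(7)
    gen = rng.normal(1.0, 0.5, 1000)   # genuine comparisons score higher
    imp = rng.normal(0.0, 0.5, 1000)
    print(f"EER ~ {equal_error_rate(gen, imp):.3f}")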
Affiliation(s)
- Feng Liu, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Jinrong Liang, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Linlin Shen, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Meng Yang, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- David Zhang, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
- Zhihui Lai, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong, China

65. Du B, Wang Z, Zhang L, Zhang L, Tao D. Robust and Discriminative Labeling for Multi-Label Active Learning Based on Maximum Correntropy Criterion. IEEE Transactions on Image Processing 2017; 26:1694-1707. [PMID: 28092540] [DOI: 10.1109/tip.2017.2651372] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3]
Abstract
Multi-label learning draws great interest in many real-world applications, but having an oracle assign many labels to each instance is highly costly, and it is also hard to build a good model without identifying discriminative labels. Can we reduce the labeling cost and improve the model for multi-label learning simultaneously? Active learning addresses the scarcity of training samples by querying the most valuable samples, achieving better performance at little cost. In multi-label active learning, prior work has queried the relevant labels with fewer training samples, or queried all labels without identifying the discriminative information; neither approach can effectively handle outlier labels when measuring uncertainty. Since the maximum correntropy criterion (MCC) provides a robust treatment of outliers in many machine learning and data mining algorithms, in this paper we derive a robust multi-label active learning algorithm based on the MCC that merges uncertainty and representativeness, and propose an efficient alternating optimization method to solve it. With the MCC, our method eliminates the influence of outlier labels that are not discriminative for measuring uncertainty. To further improve the information measurement, we merge uncertainty and representativeness with the predicted labels of unknown data, which not only enhances the uncertainty measure but also improves the similarity measurement of multi-label data using label information. Experiments on benchmark multi-label data sets show superior performance over state-of-the-art methods.
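The robustness claim rests on the shape of the correntropy-induced loss, which behaves like a squared loss for small errors but saturates for outliers; a minimal sketch (the kernel width sigma is an assumed free parameter):

    import numpy as np

    def correntropy_loss(pred, target, sigma=1.0):
        # MCC maximizes the mean Gaussian similarity between prediction
        # and target; written as 1 - similarity, it matches MSE for
        # small errors but is bounded by 1, so outlier labels cannot
        # dominate the objective.
        err2 = (pred - target) ** 2
        return np.mean(1.0 - np.exp(-err2 / (2 * sigma ** 2)))

    pred = np.array([0.1, 0.0, 0.2, 5.0])    # last entry is an outlier
    target = np.zeros(4)
    print(correntropy_loss(pred, target))     # outlier contributes at most 1
    print(np.mean((pred - target) ** 2))      # plain MSE is dominated by it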

66. Du B, Xiong W, Wu J, Zhang L, Zhang L, Tao D. Stacked Convolutional Denoising Auto-Encoders for Feature Representation. IEEE Transactions on Cybernetics 2017; 47:1017-1027. [PMID: 26992191] [DOI: 10.1109/tcyb.2016.2536638] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1]
Abstract
Deep networks have achieved excellent performance in learning representations from visual data. However, supervised deep models such as convolutional neural networks require large quantities of labeled data, which are expensive to obtain. To solve this problem, this paper proposes an unsupervised deep network, called stacked convolutional denoising auto-encoders, which maps images to hierarchical representations without any label information. The network, optimized by layer-wise training, is constructed by stacking layers of denoising auto-encoders in a convolutional way. In each layer, high-dimensional feature maps are generated by convolving features of the lower layer with kernels learned by a denoising auto-encoder, which is trained on patches extracted from the lower layer's feature maps to learn robust feature detectors. To better train the large network, a layer-wise whitening technique is introduced: before each convolutional layer, a whitening layer is embedded to sphere the input data. Through these layers of mapping, raw images are transformed into high-level feature representations that boost the performance of a subsequent support vector machine classifier. The proposed algorithm is evaluated by extensive experiments and demonstrates superior classification performance to state-of-the-art unsupervised networks.
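The whitening layer's job, "sphering" the inputs of each convolutional layer, can be illustrated with standard ZCA whitening of patch data; whether the paper uses exactly ZCA with this epsilon is an assumption:

    import numpy as np

    def zca_whiten(X, eps=1e-5):
        # X: (n, d) patches, one per row. ZCA whitening produces unit
        # covariance while staying close to the original basis, which
        # conditions the inputs fed to the next convolutional layer.
        Xc = X - X.mean(axis=0)
        C = Xc.T @ Xc / (len(Xc) - 1)
        U, S, _ = np.linalg.svd(C)
        W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
        return Xc @ W

    rng = np.random.default_rng(8)
    X = rng.normal(size=(500, 16)) * np.linspace(1.0, 5.0, 16)  # uneven scales
    Xw = zca_whiten(X)
    print(np.round(np.cov(Xw, rowvar=False)[:3, :3], 2))  # approximately identity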

67. Ding G, Guo Y, Zhou J, Gao Y. Large-Scale Cross-Modality Search via Collective Matrix Factorization Hashing. IEEE Transactions on Image Processing 2016; 25:5427-5440. [PMID: 27623584] [DOI: 10.1109/tip.2016.2607421] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9]
Abstract
By transforming data into binary representations, i.e., hashing, we can perform high-speed search with low storage cost; hashing has therefore attracted increasing research interest in recent years. With the rapid growth of multimodal data on the Web (e.g., images with textual tags, documents with photos), how to generate hash codes for multimodal data to support large-scale cross-modality search (e.g., retrieving semantically related images for a document query) has become an important research issue. To address it, we propose a novel multimodal hashing framework, termed Collective Matrix Factorization Hashing (CMFH). The key idea of CMFH is to learn unified hash codes for the different modalities of a multimodal instance in a shared latent semantic space in which the modalities can be effectively connected, thereby supporting accurate cross-modality search. Based on this general framework, we extend it to the unsupervised scenario, where it preserves the Euclidean structure, and to the supervised scenario, where it fully exploits the label information of the data. The corresponding theoretical analysis and optimization algorithms are given. We conducted comprehensive experiments on three benchmark data sets for cross-modality search; the results demonstrate that CMFH significantly outperforms several state-of-the-art cross-modality hashing methods, validating its effectiveness.
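The collective-factorization core can be sketched as alternating least squares on two modality-specific factorizations that share one latent code, binarized into hash bits; the ridge-regularized updates and sign binarization below are simplifying assumptions, not the paper's exact algorithm:

    import numpy as np

    def cmfh(X1, X2, n_bits=16, iters=30, lam=1e-2):
        # Minimize ||X1 - U1 V||^2 + ||X2 - U2 V||^2 over U1, U2 and a
        # latent code V shared by both modalities; binarize V into codes.
        n_bits_eye = lam * np.eye(n_bits)
        n = X1.shape[1]
        rng = np.random.default_rng(0)
        V = rng.normal(size=(n_bits, n))
        for _ in range(iters):
            U1 = X1 @ V.T @ np.linalg.inv(V @ V.T + n_bits_eye)
            U2 = X2 @ V.T @ np.linalg.inv(V @ V.T + n_bits_eye)
            V = np.linalg.inv(U1.T @ U1 + U2.T @ U2 + n_bits_eye) @ (
                U1.T @ X1 + U2.T @ X2)
        return np.sign(V).T  # one row of +-1 bits per instance

    rng = np.random.default_rng(9)
    X1 = rng.normal(size=(50, 200))   # e.g. image features, one column per item
    X2 = rng.normal(size=(30, 200))   # e.g. text features for the same items
    print(cmfh(X1, X2).shape)          # (200, 16) unified hash codes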

68.

69.

70. Learning a Transferable Change Rule from a Recurrent Neural Network for Land Cover Change Detection. Remote Sensing 2016. [DOI: 10.3390/rs8060506] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8]

71. Weng D, Wang Y, Gong M, Tao D, Wei H, Huang D. DERF: distinctive efficient robust features from the biological modeling of the P ganglion cells. IEEE Transactions on Image Processing 2015; 24:2287-2302. [PMID: 25769164] [DOI: 10.1109/tip.2015.2409739] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5]
Abstract
Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and its information representation supports vision tasks on both the ventral and dorsal pathways. In this paper, a new local image descriptor, termed distinctive efficient robust features (DERF), is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells in the primate retina. DERF features an exponential scale distribution, an exponential grid structure, and a circularly symmetric difference-of-Gaussian (DoG) function used as a convolution kernel, all of which are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames: the DoG is naturally a wavelet, and the structure of the grid-point array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we modulate the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly. This is verified by designing a tight-frame DoG, which leads to much better performance. Extensive experiments on the image matching task with the multiview stereo correspondence data set demonstrate that DERF outperforms state-of-the-art methods for both handcrafted and learned descriptors, while remaining robust and much faster to compute.
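The circularly symmetric DoG kernel at the heart of DERF can be generated in a few lines; the 1.6 sigma ratio below is an assumed, classic LoG-approximating choice rather than the paper's tuned value:

    import numpy as np

    def dog_kernel(size=21, sigma=2.0, ratio=1.6):
        # Difference of Gaussians: a center-surround, circularly
        # symmetric kernel resembling a ganglion cell's receptive field.
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax)
        r2 = xx ** 2 + yy ** 2
        g = lambda s: np.exp(-r2 / (2 * s ** 2)) / (2 * np.pi * s ** 2)
        k = g(sigma) - g(sigma * ratio)
        return k - k.mean()  # zero-mean, so flat regions give no response

    k = dog_kernel()
    print(k.shape, abs(k.sum()) < 1e-12)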