102
Anami BS, Bhandage VA. A Comparative Study of Suitability of Certain Features in Classification of Bharatanatyam Mudra Images Using Artificial Neural Network. Neural Process Lett 2018. [DOI: 10.1007/s11063-018-9921-6]
103
Wang Y, Zhu L, Qian X, Han J. Joint Hypergraph Learning for Tag-Based Image Retrieval. IEEE Trans Image Process 2018; 27:4437-4451. [PMID: 29897870] [DOI: 10.1109/tip.2018.2837219]
Abstract
As image sharing websites such as Flickr become increasingly popular, many researchers have turned to tag-based image retrieval, one of the main ways to find images contributed by social users. In this field, both tag information and diverse visual features have been investigated; however, most existing methods use these visual features separately or sequentially. In this paper, we propose an approach that fuses global and local visual features to learn the relevance of images through hypergraph learning. A hypergraph is first constructed from global visual features, local visual features, and tag information. We then propose a pseudo-relevance feedback mechanism to obtain pseudo-positive images. Finally, using the hypergraph and the pseudo-relevance feedback, we apply a hypergraph learning algorithm to compute the relevance score of each image with respect to the query. Experimental results demonstrate the effectiveness of the proposed approach.
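The hypergraph learning step this abstract outlines can be sketched with the standard transductive formulation (Zhou-style hypergraph regularization). The toy incidence matrix, hyperedge weights, and seed vector below are illustrative assumptions, not the authors' exact model:

```python
import numpy as np

# Toy hypergraph: 5 images (vertices), 3 hyperedges (e.g. groups of images
# sharing a tag or being visual neighbours).  H[v, e] = 1 if vertex v is in e.
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
w = np.array([1.0, 0.8, 1.2])                  # hyperedge weights

dv = H @ w                                      # vertex degrees
de = H.sum(axis=0)                              # hyperedge degrees
Dv_is = np.diag(1.0 / np.sqrt(dv))              # Dv^{-1/2}
# Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
Theta = Dv_is @ H @ np.diag(w / de) @ H.T @ Dv_is

y = np.array([1.0, 0.0, 0.0, 0.0, 1.0])        # pseudo-positive seed images
alpha, f = 0.9, y.copy()
for _ in range(200):                            # f <- alpha*Theta*f + (1-alpha)*y
    f = alpha * Theta @ f + (1 - alpha) * y

ranking = np.argsort(-f)                        # images ranked by relevance
```

The fixed-point iteration converges because the spectral radius of `alpha * Theta` is below one; the pseudo-positive images from relevance feedback enter through `y` and their relevance diffuses over shared hyperedges.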
105
Zhang C, Li C, Lu D, Cheng J, Tian Q. Birds of a feather flock together: Visual representation with scale and class consistency. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.05.048]
107
Cao G, Iosifidis A, Chen K, Gabbouj M. Generalized Multi-View Embedding for Visual Recognition and Cross-Modal Retrieval. IEEE Trans Cybern 2018; 48:2542-2555. [PMID: 28885168] [DOI: 10.1109/tcyb.2017.2742705]
Abstract
In this paper, the problem of multi-view embedding from different visual cues and modalities is considered. We propose a unified solution for subspace learning methods using the Rayleigh quotient, which is extensible to multiple views, supervised learning, and nonlinear embeddings. Numerous methods, including canonical correlation analysis, partial least squares regression, and linear discriminant analysis, are studied using specific intrinsic and penalty graphs within the same framework. Nonlinear extensions based on kernels and (deep) neural networks are derived and achieve better performance than their linear counterparts. Moreover, a novel multi-view modular discriminant analysis is proposed that takes the view difference into consideration. We demonstrate the effectiveness of the proposed multi-view embedding methods on visual object recognition and cross-modal image retrieval, obtaining superior results in both applications compared with related methods.
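As a minimal instance of the Rayleigh-quotient view this abstract describes, plain two-view linear CCA can be posed as a generalized symmetric eigenproblem; the synthetic data, dimensions, and ridge term below are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, dx, dy = 500, 6, 4
z = rng.standard_normal((n, 2))                 # shared latent signal
X = z @ rng.standard_normal((2, dx)) + 0.1 * rng.standard_normal((n, dx))
Y = z @ rng.standard_normal((2, dy)) + 0.1 * rng.standard_normal((n, dy))
X -= X.mean(0)
Y -= Y.mean(0)

Cxx, Cyy = X.T @ X / n, Y.T @ Y / n
Cxy = X.T @ Y / n
eps = 1e-6                                      # small ridge for stability

# Rayleigh quotient  w^T A w / w^T B w  with block matrices A, B:
A = np.block([[np.zeros((dx, dx)), Cxy],
              [Cxy.T, np.zeros((dy, dy))]])
B = np.block([[Cxx + eps * np.eye(dx), np.zeros((dx, dy))],
              [np.zeros((dy, dx)), Cyy + eps * np.eye(dy)]])

vals, vecs = eigh(A, B)                         # generalized eigenproblem
wx, wy = vecs[:dx, -1], vecs[dx:, -1]           # top correlation direction
corr = np.corrcoef(X @ wx, Y @ wy)[0, 1]        # first canonical correlation
```

Maximizing this Rayleigh quotient recovers the leading canonical pair; swapping in different `A`/`B` (intrinsic and penalty graphs) yields the other methods the paper unifies.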
108
Abstract
Due to the specific characteristics and complicated contents of remote sensing (RS) images, remote sensing image retrieval (RSIR) remains an open and challenging research topic in the RS community. There are two basic blocks in RSIR: feature learning and similarity matching. In this paper, we focus on developing an effective feature learning method for RSIR. With the help of deep learning, the proposed feature learning method is designed under the bag-of-words (BOW) paradigm; we therefore name the obtained feature deep BOW (DBOW). The learning process consists of two parts: image descriptor learning and feature construction. First, to explore the complex contents within an RS image, we extract image descriptors at the image patch level rather than for the whole image. In addition, instead of using handcrafted features to describe the patches, we propose a deep convolutional auto-encoder (DCAE) model to learn discriminative descriptors for the RS image. Second, the k-means algorithm is used to generate a codebook from the obtained deep descriptors, and the final histogram-based DBOW features are acquired by counting the frequency of each code word. Once the DBOW features are extracted, the similarities between RS images are measured using the L1-norm distance, and the retrieval results are obtained according to the similarity order. Encouraging experimental results on four public RS image archives demonstrate that our DBOW feature is effective for the RSIR task and achieves improved performance compared with existing RS image features.
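The BOW side of this pipeline (k-means codebook, code-word histograms, L1 matching) can be sketched as below. Raw flattened patches stand in for the learned DCAE descriptors, and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_descriptors(image, patch=8):
    """Stand-in for the DCAE descriptors: raw flattened non-overlapping
    patches (the paper learns these with a convolutional auto-encoder)."""
    H, W = image.shape
    return np.array([image[i:i + patch, j:j + patch].ravel()
                     for i in range(0, H - patch + 1, patch)
                     for j in range(0, W - patch + 1, patch)])

def kmeans(X, k, iters=20):
    """Tiny Lloyd's algorithm to build the visual codebook."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C

def bow_histogram(desc, C):
    """Count nearest code words -> L1-normalised BOW feature."""
    lab = ((desc[:, None] - C[None]) ** 2).sum(-1).argmin(1)
    h = np.bincount(lab, minlength=len(C)).astype(float)
    return h / h.sum()

images = [rng.random((32, 32)) for _ in range(4)]
all_desc = np.vstack([patch_descriptors(im) for im in images])
codebook = kmeans(all_desc, k=8)
feats = np.array([bow_histogram(patch_descriptors(im), codebook)
                  for im in images])

# Retrieval: rank database images by L1 distance to the query feature;
# the query itself has distance 0.
query = feats[0]
dists = np.abs(feats - query).sum(1)
```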
109
Supervoxel Segmentation and Bias Correction of MR Image with Intensity Inhomogeneity. Neural Process Lett 2018. [DOI: 10.1007/s11063-017-9704-5]
110
Singhal V, Majumdar A. Majorization Minimization Technique for Optimally Solving Deep Dictionary Learning. Neural Process Lett 2018. [DOI: 10.1007/s11063-017-9603-9]
112
Back projection: An effective postprocessing method for GAN-based face sketch synthesis. Pattern Recognit Lett 2018. [DOI: 10.1016/j.patrec.2017.06.012]
116
Baldominos A, Saez Y, Isasi P. Evolutionary convolutional neural networks: An application to handwriting recognition. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.049]
117
Mahfouf Z, Merouani HF, Bouchrika I, Harrati N. Investigating the use of motion-based features from optical flow for gait recognition. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.040]
118
Discriminant Analysis with Local Gaussian Similarity Preserving for Feature Extraction. Neural Process Lett 2018. [DOI: 10.1007/s11063-017-9630-6]
119
Zeng X, Huang H, Qi C. Expanding Training Data for Facial Image Super-Resolution. IEEE Trans Cybern 2018; 48:716-729. [PMID: 28166514] [DOI: 10.1109/tcyb.2017.2655027]
Abstract
The quality of training data is very important for learning-based facial image super-resolution (SR): the more similar the training data are to the testing input, the better the SR results. To generate a better set of low/high-resolution training facial images for a particular testing input, this paper is the first to propose expanding the training data to improve facial image SR. To this end, observing that facial images are highly structured, we propose three constraints (a local structure constraint, a correspondence constraint, and a similarity constraint) to generate new training data in which local patches are expanded with different expansion parameters. The expanded training data can be used for both patch-based and global facial SR methods. Extensive tests on benchmark databases and real-world images validate the effectiveness of training data expansion in improving SR quality.
120
Laplace Graph Embedding Class Specific Dictionary Learning for Face Recognition. J Electr Comput Eng 2018. [DOI: 10.1155/2018/2179049]
Abstract
The sparse representation based classification (SRC) and collaborative representation based classification (CRC) methods have attracted increasing attention in recent years due to their promising results and robustness. However, both SRC and CRC directly use the training samples as the dictionary, which leads to a large fitting error. In this paper, we propose the Laplace graph embedding class specific dictionary learning (LGECSDL) algorithm, which trains a weight matrix and embeds a Laplace graph to reconstruct the dictionary. First, it can increase the dimension of the dictionary matrix, which allows small-sample databases to be classified. Second, it assigns different weights to different dictionary atoms to improve classification accuracy. Additionally, when training each class dictionary, the LGECSDL algorithm introduces Laplace graph embedding into the objective function in order to preserve the local structure of each class; the proposed method thus improves face recognition performance through class specific dictionary learning and the Laplace graph embedding regularizer. Moreover, we extend the proposed method to an arbitrary kernel space. Extensive experimental results on several face recognition benchmark databases demonstrate the superior performance of our proposed algorithm.
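For context, the CRC baseline this abstract builds on (class-specific ridge coding, classify by smallest reconstruction residual) can be sketched as below. The prototypes and atoms are made-up data, and the sketch does not attempt LGECSDL's learned, Laplace graph regularized dictionary:

```python
import numpy as np

def class_residuals(x, dicts, lam=0.1):
    """CRC-style classification: code x with each class dictionary by
    ridge regression and return the per-class reconstruction residuals.
    (LGECSDL would replace the raw samples with a learned dictionary.)"""
    res = []
    for D in dicts:                                # D: d x n_c dictionary
        a = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ x)
        res.append(np.linalg.norm(x - D @ a))
    return np.array(res)

rng = np.random.default_rng(0)
d = 20
protos = [rng.standard_normal(d) for _ in range(3)]        # 3 classes
dicts = [np.stack([p + 0.1 * rng.standard_normal(d) for _ in range(8)], 1)
         for p in protos]                                  # 8 atoms per class

x = protos[1] + 0.1 * rng.standard_normal(d)               # query, class 1
pred = int(np.argmin(class_residuals(x, dicts)))           # smallest residual
```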
121
Jiang X, Pang Y, Li X, Pan J, Xie Y. Deep neural networks with Elastic Rectified Linear Units for object recognition. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.09.056]
122
Faghihi F, Moustafa AA. Sparse and burst spiking in artificial neural networks inspired by synaptic retrograde signaling. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.08.073]
123
Shelton CR. Event Detection in Continuous Video: An Inference in Point Process Approach. IEEE Trans Image Process 2017; 26:5680-5691. [PMID: 28858803] [DOI: 10.1109/tip.2017.2745209]
Abstract
We propose a novel approach toward event detection in real-world continuous video sequences. The method: 1) is able to model arbitrary-order non-Markovian dependences in videos to mitigate local visual ambiguities; 2) conducts simultaneous event segmentation and labeling; and 3) is time-window free. The idea is to represent a video as an event stream of both high-level semantic events and low-level video observations. In training, we learn a point process model called a piecewise-constant conditional intensity model (PCIM) that is able to capture complex non-Markovian dependences in the event streams. In testing, event detection can be modeled as the inference of high-level semantic events, given low-level image observations. We develop the first inference algorithm for PCIM and show it samples exactly from the posterior distribution. We then evaluate the video event detection task on real-world video sequences. Our model not only provides competitive results on the video event segmentation and labeling task, but also provides benefits, including being interpretable and efficient.
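The basic primitive behind such point-process models, sampling the next event from a piecewise-constant intensity by inverting the cumulative hazard, can be sketched as follows. The breakpoints and rates are illustrative assumptions, and this is not the paper's posterior inference algorithm for PCIMs:

```python
import numpy as np

def sample_next_event(t0, breakpoints, rates, rng):
    """Draw the next event time after t0 from a piecewise-constant intensity
    (rates[i] holds on [breakpoints[i], breakpoints[i+1])) by accumulating
    hazard until it exceeds an Exp(1) draw (inverse-CDF sampling)."""
    target = rng.exponential()
    hazard = 0.0
    for i in range(len(rates)):
        lo = max(t0, breakpoints[i])
        hi = breakpoints[i + 1]
        if hi <= t0:
            continue
        seg = rates[i] * (hi - lo)
        if hazard + seg >= target:
            return lo + (target - hazard) / rates[i]
        hazard += seg
    return None                       # no event before the final breakpoint

rng = np.random.default_rng(0)
bp = [0.0, 1.0, 2.0, 10.0]
rates = [0.1, 5.0, 0.5]               # intensity jumps up on [1, 2)
times = np.array([t for t in (sample_next_event(0.0, bp, rates, rng)
                              for _ in range(2000)) if t is not None])
frac_high = ((times >= 1.0) & (times < 2.0)).mean()   # most mass in [1, 2)
```

In a conditional intensity model the `rates` for each event type would themselves depend on the history of earlier events; the sampler above is the inner step such inference repeatedly invokes.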
125
Chen Z, Lin J, Liao N, Chen CW. Full Reference Quality Assessment for Image Retargeting Based on Natural Scene Statistics Modeling and Bi-Directional Saliency Similarity. IEEE Trans Image Process 2017; 26:5138-5148. [PMID: 28792899] [DOI: 10.1109/tip.2017.2736422]
Abstract
Image retargeting technology has been widely studied to adapt images to devices with heterogeneous screen resolutions. Meanwhile, effective objective quality assessment algorithms for retargeting are also very important for optimizing and selecting favorable retargeting methods. Unlike previous assessment algorithms, which rely on local image structure features and unidirectional prediction of information loss, we propose a bi-directional natural salient scene distortion (BNSSD) model comprising natural scene statistics (NSS) measurement, salient global structure distortion measurement, and bi-directional salient information loss measurement. First, we propose a new NSS model in the log-Gabor domain and verify its effectiveness in reflecting the natural scene statistical distortions introduced during retargeting. Second, the concept of salient global structure distortion is proposed to measure the global structure uniformity of the corresponding salient regions in the original and retargeted images. Finally, we propose a bi-directional salient information loss metric to measure the information loss between the salient areas of the original and retargeted images. The effectiveness of the BNSSD model is verified on two widely recognized public databases, and the experimental results show that our method outperforms state-of-the-art algorithms under different statistical assessment criteria.
127
Ou X, Ling H, Yu H, Li P, Zou F, Liu S. Adult Image and Video Recognition by a Deep Multicontext Network and Fine-to-Coarse Strategy. ACM Trans Intell Syst Technol 2017. [DOI: 10.1145/3057733]
Abstract
Adult image and video recognition is an important and challenging problem in the real world. Low-level feature cues alone do not provide enough information, especially when the dataset is very large and has varied data distributions, which poses a serious problem for conventional approaches. In this article, we tackle this problem by proposing a deep multicontext network with a fine-to-coarse strategy for adult image and video recognition. We employ deep convolutional networks to model fused features of sensitive objects in images. Global and local contexts are both taken into consideration and are jointly modeled in a unified multicontext deep learning framework. To make the model more discriminative for diverse target objects, we investigate a novel hierarchical method, and a task-specific fine-to-coarse strategy is designed to make the multicontext modeling more suitable for adult object recognition. Furthermore, several recently proposed deep models are investigated. Our approach is extensively evaluated on four different datasets: one is used for ablation experiments, and the others are used for generalization experiments. Results show significant and consistent improvements over state-of-the-art methods.
Affiliation(s)
- Xinyu Ou: Huazhong University of Science and Technology; Chinese Academy of Sciences; Yunnan Open University, Kunming, China
- Hefei Ling: Huazhong University of Science and Technology, Wuhan, China
- Han Yu: Chinese Academy of Sciences, Beijing, China
- Ping Li: Huazhong University of Science and Technology, Wuhan, China
- Fuhao Zou: Huazhong University of Science and Technology, Wuhan, China
- Si Liu: Chinese Academy of Sciences, Beijing, China
128
Zhang J, Li K, Liang Y, Li N. Learning 3D faces from 2D images via Stacked Contractive Autoencoder. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.062]
129
Zhang P, Zhuo T, Huang W, Chen K, Kankanhalli M. Online object tracking based on CNN with spatial-temporal saliency guided sampling. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.10.073]
130
Zheng S, Hao Y, Lu D, Bao H, Xu J, Hao H, Xu B. Joint entity and relation extraction based on a hybrid neural network. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.12.075]
131
Yang Y, Li Z, Wang W, Tao D. An adaptive semi-supervised clustering approach via multiple density-based information. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.061]
133
Guo D, Li W, Fang X. Capturing Temporal Structures for Video Captioning by Spatio-temporal Contexts and Channel Attention Mechanism. Neural Process Lett 2017. [DOI: 10.1007/s11063-017-9591-9]
134
A Simplified Architecture of the Zhang Neural Network for Toeplitz Linear Systems Solving. Neural Process Lett 2017. [DOI: 10.1007/s11063-017-9656-9]
136
Thanh ND, Ali M, Son LH. A Novel Clustering Algorithm in a Neutrosophic Recommender System for Medical Diagnosis. Cognit Comput 2017. [DOI: 10.1007/s12559-017-9462-8]
137
Wang Z, Wang L, Wang Y, Zhang B, Qiao Y. Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition. IEEE Trans Image Process 2017; 26:2028-2041. [PMID: 28207394] [DOI: 10.1109/tip.2017.2666739]
Abstract
Traditional feature encoding schemes (e.g., the Fisher vector) built on local descriptors (e.g., SIFT) and recent convolutional neural networks (CNNs) are two classes of successful methods for image recognition. In this paper, we propose a hybrid representation that leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schemes, with a focus on scene recognition. To this end, we make three main contributions. First, we propose a patch-level, end-to-end architecture to model the appearance of local patches, called PatchNet. PatchNet is essentially a customized network trained in a weakly supervised manner, which uses image-level supervision to guide patch-level feature extraction. Second, we present a hybrid visual representation, called VSAD, which utilizes the robust feature representations of PatchNet to describe local patches and exploits the semantic probabilities of PatchNet to aggregate these local patches into a global representation. Third, based on the VSAD representation, we propose a new state-of-the-art scene recognition approach, which achieves excellent performance on two standard benchmarks: MIT Indoor67 (86.2%) and SUN397 (73.0%).
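The aggregation idea (use per-patch semantic probabilities to pool patch descriptors into one global vector) can be sketched VLAD-style. The descriptors, probabilities, and centers below are random stand-ins for PatchNet's outputs, and this is a simplification of VSAD, not the paper's exact formulation:

```python
import numpy as np

def vsad_like(F, P, centers):
    """Aggregate patch descriptors F (N x D) into a global vector using
    per-patch semantic probabilities P (N x K): for each of the K semantic
    codewords, accumulate probability-weighted residuals to its center,
    then apply power and L2 normalisation (a VLAD-style simplification)."""
    K, D = centers.shape
    out = np.zeros((K, D))
    for k in range(K):
        out[k] = (P[:, k:k + 1] * (F - centers[k])).sum(axis=0)
    v = out.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))       # power normalisation
    return v / (np.linalg.norm(v) + 1e-12)    # L2 normalisation

rng = np.random.default_rng(0)
N, D, K = 50, 16, 5
F = rng.standard_normal((N, D))               # patch descriptors (CNN output)
logits = rng.standard_normal((N, K))
P = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)  # semantic probs
centers = rng.standard_normal((K, D))         # per-class descriptor means

g = vsad_like(F, P, centers)                  # global image representation
```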
138
Wu C, Shi X, Su J, Chen Y, Huang Y. Co-training for Implicit Discourse Relation Recognition Based on Manual and Distributed Features. Neural Process Lett 2017. [DOI: 10.1007/s11063-017-9582-x]
139
Liu X, Xue J. A Cluster Splitting Technique by Hopfield Networks and P Systems on Simplices. Neural Process Lett 2017. [DOI: 10.1007/s11063-016-9577-z]