1. Susan S. Neuroscientific insights about computer vision models: a concise review. Biological Cybernetics 2024; 118:331-348. [PMID: 39382577] [DOI: 10.1007/s00422-024-00998-9]
Abstract
The development of biologically inspired computational models has been the focus of study ever since the artificial neuron was introduced by McCulloch and Pitts in 1943. However, a scrutiny of the literature reveals that most attempts to replicate the highly efficient and complex biological visual system have been futile or have met with limited success. Recent state-of-the-art computer vision models, such as pre-trained deep neural networks and vision transformers, may not be biologically inspired per se. Nevertheless, certain aspects of biological vision are still found embedded, knowingly or unknowingly, in the architecture and functioning of these models. This paper explores several principles related to visual neuroscience and the biological visual pathway that resonate, in some manner, in the architectural design and functioning of contemporary computer vision models. The findings of this survey can provide useful insights for building future bio-inspired computer vision models. The survey is conducted from a historical perspective, tracing the biological connections of computer vision models from the basic artificial neuron to modern technologies such as deep convolutional neural networks (CNNs) and spiking neural networks (SNNs). One highlight of the survey is a discussion of biologically plausible neural networks and bio-inspired unsupervised learning mechanisms adapted for computer vision tasks in recent times.
Affiliation(s)
- Seba Susan, Department of Information Technology, Delhi Technological University, Delhi, India.
2. Wu G, Yang J. Randomized algorithms for large-scale dictionary learning. Neural Networks 2024; 179:106628. [PMID: 39168071] [DOI: 10.1016/j.neunet.2024.106628]
Abstract
Dictionary learning is an important sparse representation algorithm that has been widely used in machine learning and artificial intelligence. However, for the massive data of the big data era, classical dictionary learning algorithms are computationally expensive and can even be infeasible. To overcome this difficulty, we propose new dictionary learning methods based on randomized algorithms. The contributions of this work are as follows. First, we find that the dictionary matrix is often numerically low-rank. Based on this property, we apply randomized singular value decomposition (RSVD) to the dictionary matrix and propose a randomized algorithm for linear dictionary learning. Compared with the classical K-SVD algorithm, an advantage is that all the elements of the dictionary matrix can be updated simultaneously. Second, to the best of our knowledge, there are few theoretical results on why the matrix computation problems involved in dictionary learning can be solved inexactly. To fill this gap, we show the rationality of this randomized algorithm with inexact solving from a matrix perturbation analysis point of view. Third, based on the numerically low-rank property and a Nyström approximation of the kernel matrix, we propose a randomized kernel dictionary learning algorithm, and we bound the distance between the exact solution and the computed solution to show the effectiveness of the proposed algorithm. Fourth, we propose an efficient scheme for the testing stage of kernel dictionary learning. With this strategy, there is no need to form or store kernel matrices explicitly in either the training or the testing stage. Comprehensive numerical experiments on real-world data sets demonstrate the rationality of our strategies and show that the proposed algorithms are much more efficient than some state-of-the-art dictionary learning algorithms. The MATLAB codes of the proposed algorithms are publicly available from https://github.com/Jiali-yang/RALDL_RAKDL.
Affiliation(s)
- Gang Wu, School of Mathematics, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, PR China; School of Big Data, Fuzhou University of International Studies and Trade, Fuzhou, Fujian, PR China.
- Jiali Yang, School of Mathematics, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, PR China.
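The randomized SVD at the heart of this approach can be sketched in a few lines. Below is a generic Halko-style randomized SVD applied to a numerically low-rank dictionary matrix; it is not the authors' full RALDL/RAKDL pipeline (their MATLAB implementation is at the GitHub link above), and the matrix sizes, oversampling, and power-iteration counts are illustrative.

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=10, n_iter=2, seed=0):
    """Approximate truncated SVD of A via random range sampling."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, rank + n_oversample))
    Y = A @ Omega
    # Power iterations sharpen the approximation when singular values decay slowly.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)            # orthonormal basis for the sampled range
    B = Q.T @ A                       # small (rank + p) x n projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Example: compress a numerically low-rank "dictionary" matrix.
D = np.random.randn(512, 50) @ np.random.randn(50, 1024)   # rank <= 50
U, s, Vt = randomized_svd(D, rank=50)
print(np.linalg.norm(D - (U * s) @ Vt) / np.linalg.norm(D))  # near machine precision
```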
3. Wu Y, Gadsden SA. Machine learning algorithms in microbial classification: a comparative analysis. Frontiers in Artificial Intelligence 2023; 6:1200994. [PMID: 37928448] [PMCID: PMC10620803] [DOI: 10.3389/frai.2023.1200994]
Abstract
This research paper presents an overview of contemporary machine learning methodologies and their utilization in healthcare and the prevention of infectious diseases, specifically focusing on the classification and identification of bacterial species. As deep learning techniques have gained prominence in the healthcare sector, a diverse array of architectural models has emerged. Through a comprehensive review of pertinent literature, multiple studies employing machine learning algorithms for microbial diagnosis and classification are examined. Each investigation entails a tabulated presentation of data, encompassing details about the training and validation datasets, specifications of the machine learning and deep learning techniques employed, and the evaluation metrics used to gauge algorithmic performance. Notably, convolutional neural networks (CNNs) have been the predominant choice for image classification tasks among machine learning practitioners over the last decade, a preference that stems from their ability to autonomously extract pertinent and distinguishing features with minimal human intervention. A range of CNN architectures have been developed and effectively applied to image classification. However, to address the considerable data requirements of deep learning, recent advancements include the application of pre-trained models via transfer learning for the identification of microbial entities. This method repurposes the knowledge gleaned from solving other image classification challenges to accurately classify microbial images, significantly mitigating the need for extensive and varied training data. This study undertakes a comparative assessment of several popular pre-trained CNN architectures for the classification of bacteria. The dataset employed is composed of approximately 660 images representing 33 bacterial species. To enhance dataset diversity, data augmentation is implemented, followed by evaluation on multiple models including AlexNet, VGGNet, Inception networks, Residual Networks, and Densely Connected Convolutional Networks. The results indicate that the DenseNet-121 architecture yields the best performance, achieving a peak accuracy of 99.08%, precision of 99.06%, recall of 99.00%, and an F1-score of 98.99%. By demonstrating the proficiency of the DenseNet-121 model on a comparatively modest dataset, this study underscores the viability of transfer learning in the healthcare sector for precise and efficient microbial identification. These findings contribute to ongoing endeavors to harness machine learning techniques for enhancing healthcare methodologies and bolstering infectious disease prevention practices.
Affiliation(s)
- Yuandi Wu, Department of Mechanical Engineering, Intelligent and Cognitive Engineering Laboratory, McMaster University, Hamilton, ON, Canada
- S Andrew Gadsden, Department of Mechanical Engineering, Intelligent and Cognitive Engineering Laboratory, McMaster University, Hamilton, ON, Canada
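The transfer-learning recipe evaluated here follows a standard pattern: reuse a pre-trained backbone and retrain only a new classification head. A minimal PyTorch sketch for a 33-class bacterial dataset is shown below; the frozen-trunk strategy, optimizer, and learning rate are illustrative assumptions (assuming torchvision >= 0.13), not the study's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained DenseNet-121 backbone, re-headed for 33 bacterial species.
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
for p in model.parameters():           # freeze the convolutional trunk
    p.requires_grad = False
model.classifier = nn.Linear(model.classifier.in_features, 33)  # new trainable head

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One supervised step; only the new classifier head is updated."""
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```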
4. Lin W, Ding X, Huang Y, Zeng H. Self-Supervised Video-Based Action Recognition With Disturbances. IEEE Transactions on Image Processing 2023; 32:2493-2507. [PMID: 37099471] [DOI: 10.1109/tip.2023.3269228]
Abstract
Self-supervised video-based action recognition is a challenging task that needs to extract the principal information characterizing the action from content-diversified videos over large unlabeled datasets. Most existing methods exploit the natural spatio-temporal properties of video to obtain effective action representations from a visual perspective, while ignoring semantic information that is closer to human cognition. To this end, we propose VARD, a self-supervised Video-based Action Recognition method with Disturbances that extracts the principal visual and semantic information of the action. Specifically, according to cognitive neuroscience research, human recognition ability is activated by visual and semantic attributes. An intuitive impression is that minor changes to the actor or scene in a video do not affect one person's recognition of the action. On the other hand, different humans generally reach consistent judgments when recognizing the same action video. In other words, for an action video, the necessary information that remains constant despite disturbances in the visual stream or the semantic encoding process is sufficient to represent the action. Therefore, to learn such information, we construct a positive clip/embedding for each action video. Compared to the original clip/embedding, the positive clip/embedding is disturbed visually/semantically by Video Disturbance and Embedding Disturbance. The objective is to pull the positive closer to the original clip/embedding in the latent space. In this way, the network is driven to focus on the principal information of the action while the impact of sophisticated details and inconsequential variations is weakened. Notably, the proposed VARD does not require optical flow, negative samples, or pretext tasks. Extensive experiments on the UCF101 and HMDB51 datasets demonstrate that the proposed VARD effectively improves a strong baseline and outperforms multiple classical and advanced self-supervised action recognition methods.
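The positive-only objective described above can be sketched compactly: embed a clip and its disturbed counterpart and pull the pair together, with no negatives. The encoder and disturbance function below are placeholders, not the actual VARD modules.

```python
import torch
import torch.nn.functional as F

def positive_pull_loss(encoder, clip, disturb):
    """clip: a video tensor; disturb: a function producing a disturbed positive."""
    z_orig = F.normalize(encoder(clip), dim=-1)
    z_pos = F.normalize(encoder(disturb(clip)), dim=-1)
    # Maximize cosine similarity of the pair; no negatives, no optical flow.
    return (1.0 - (z_orig * z_pos).sum(dim=-1)).mean()
```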
5. Dornaika F, Hoang VT. Deep data representation with feature propagation for semi-supervised learning. International Journal of Machine Learning and Cybernetics 2022. [DOI: 10.1007/s13042-022-01701-9]
6. Dornaika F. Deep, Flexible Data Embedding with Graph-Based Feature Propagation for Semi-supervised Classification. Cognitive Computation 2022. [DOI: 10.1007/s12559-022-10056-w]
7. Gong Z, Hu W, Du X, Zhong P, Hu P. Deep Manifold Embedding for Hyperspectral Image Classification. IEEE Transactions on Cybernetics 2022; 52:10430-10443. [PMID: 33872180] [DOI: 10.1109/tcyb.2021.3069790]
Abstract
Deep learning methods have played an increasingly important role in hyperspectral image classification. However, general deep learning methods mainly take advantage of samplewise information to formulate the training loss while ignoring the intrinsic data structure of each class. Due to the high spectral dimension and the great redundancy between different spectral channels in hyperspectral images, such training losses usually do not work well for deep representations of the image. To tackle this problem, this work develops a novel deep manifold embedding method (DMEM) for deep learning in hyperspectral image classification. First, each class in the image is modeled as a specific nonlinear manifold, and the geodesic distance is used to measure the correlation between samples. Then, based on hierarchical clustering, the manifold structure of the data is captured and each nonlinear data manifold is divided into several subclasses. Finally, considering the distribution of each subclass and the correlation between different subclasses on the data manifold, DMEM is constructed as a novel training loss to incorporate this classwise information in the training process and obtain discriminative representations of the hyperspectral image. Experiments over four real-world hyperspectral image datasets demonstrate the effectiveness of the proposed method when compared with general sample-based losses and show its superiority when compared with state-of-the-art methods.
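One standard way to realize the geodesic-distance idea used by DMEM is Isomap-style: Euclidean edge weights on a k-nearest-neighbor graph followed by graph shortest paths. The sketch below illustrates that primitive only; the subclass clustering and the DMEM loss itself are not shown, and the neighborhood size is an arbitrary choice.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, k=10):
    """X: (n_samples, n_features). Returns an (n, n) geodesic distance matrix."""
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')   # sparse kNN graph
    return shortest_path(G, method='D', directed=False)       # Dijkstra on the graph

X = np.random.rand(200, 30)        # stand-in for per-class spectral samples
D_geo = geodesic_distances(X)
```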
8. Wang C, Peng G, De Baets B. Joint global metric learning and local manifold preservation for scene recognition. Information Sciences 2022. [DOI: 10.1016/j.ins.2022.07.188]
9. Zhang L, Su G, Yin J, Li Y, Lin Q, Zhang X, Shao L. Bioinspired Scene Classification by Deep Active Learning With Remote Sensing Applications. IEEE Transactions on Cybernetics 2022; 52:5682-5694. [PMID: 33635802] [DOI: 10.1109/tcyb.2020.2981480]
Abstract
Accurately classifying sceneries with different spatial configurations is an indispensable technique in computer vision and intelligent systems, for example, scene parsing, robot motion planning, and autonomous driving. Remarkable performance has been achieved by the deep recognition models in the past decade. As far as we know, however, these deep architectures are incapable of explicitly encoding the human visual perception, that is, the sequence of gaze movements and the subsequent cognitive processes. In this article, a biologically inspired deep model is proposed for scene classification, where the human gaze behaviors are robustly discovered and represented by a unified deep active learning (UDAL) framework. More specifically, to characterize objects' components with varied sizes, an objectness measure is employed to decompose each scenery into a set of semantically aware object patches. To represent each region at a low level, a local-global feature fusion scheme is developed which optimally integrates multimodal features by automatically calculating each feature's weight. To mimic the human visual perception of various sceneries, we develop the UDAL that hierarchically represents the human gaze behavior by recognizing semantically important regions within the scenery. Importantly, UDAL combines the semantically salient region detection and the deep gaze shifting path (GSP) representation learning into a principled framework, where only the partial semantic tags are required. Meanwhile, by incorporating the sparsity penalty, the contaminated/redundant low-level regional features can be intelligently avoided. Finally, the learned deep GSP features from the entire scene images are integrated to form an image kernel machine, which is subsequently fed into a kernel SVM to classify different sceneries. Experimental evaluations on six well-known scenery sets (including remote sensing images) have shown the competitiveness of our approach.
10. Sun Q, Tang Y, Zhang C, Zhao C, Qian F, Kurths J. Unsupervised Estimation of Monocular Depth and VO in Dynamic Environments via Hybrid Masks. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:2023-2033. [PMID: 34347607] [DOI: 10.1109/tnnls.2021.3100895]
Abstract
Deep learning-based methods have achieved remarkable performance in 3-D sensing since they perceive environments in a biologically inspired manner. Nevertheless, existing approaches trained on monocular sequences are still prone to fail in dynamic environments. In this work, we mitigate the negative influence of dynamic environments on the joint estimation of depth and visual odometry (VO) through hybrid masks. Since both the VO estimation and the view reconstruction process in the joint estimation framework are vulnerable to dynamic environments, we propose the cover mask and the filter mask to alleviate these adverse effects, respectively. As the depth and VO estimation are tightly coupled during training, the improved VO estimation promotes depth estimation as well. Besides, a depth-pose consistency loss is proposed to overcome the scale inconsistency between different training samples of monocular sequences. Experimental results show that both our depth prediction and globally consistent VO estimation are state of the art when evaluated on the KITTI benchmark. We also evaluate our depth prediction model on the Make3D dataset to prove the transferability of our method.
11. Robust scene text recognition: Using manifold regularized Twin-Support Vector Machine. Journal of King Saud University - Computer and Information Sciences 2022. [DOI: 10.1016/j.jksuci.2019.01.013]
13. Mishra P, Kumar S, Chaube MK. Classifying Chart Based on Structural Dissimilarities using Improved Regularized Loss Function. Neural Processing Letters 2022. [DOI: 10.1007/s11063-021-10735-z]
14. Improved graph-regularized deep belief network with sparse features learning for fault diagnosis. Neural Computing and Applications 2022. [DOI: 10.1007/s00521-022-06972-5]
15. Deep learning model construction for a semi-supervised classification with feature learning. Complex & Intelligent Systems 2022. [DOI: 10.1007/s40747-022-00641-9]
Abstract
Several deep models have been proposed for image processing, data interpretation, speech recognition, and video analysis. Most of these architectures need a massive number of training samples and use arbitrary configurations. This paper constructs a deep learning architecture with feature learning. Graph convolution networks (GCNs) have become increasingly popular as cost-effective and efficient methods for semi-supervised learning on graph data representations. Most existing schemes for merging node descriptions over the graph use stabilised neighbourhood knowledge, typically requiring a significant number of variables and a high degree of computational complexity. To address these concerns, this research presents DLM-SSC, a method for semi-supervised node classification that can combine knowledge from multiple neighbourhoods at the same time by integrating high-order convolution and feature learning. The paper employs two feature learning techniques for reducing the number of parameters and hidden layers: modified marginal Fisher analysis (MMFA) and kernel principal component analysis (KPCA). The MMFA and KPCA weight matrices are modified layer by layer when implementing the DLM, a supervised pretraining technique that does not require much information. Experiments on citation datasets (Citeseer, Pubmed, and Cora) and other data sets demonstrate that the suggested approaches outperform similar algorithms.
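The "knowledge from multiple neighbourhoods" ingredient can be illustrated with plain high-order propagation: smoothing features with successive powers of the normalized adjacency and concatenating the results. This is a generic sketch of that primitive, not DLM-SSC itself, which adds MMFA/KPCA-based feature learning on top.

```python
import numpy as np

def high_order_features(A, X, K=3):
    """A: (n, n) adjacency matrix, X: (n, d) features. Returns [X, SX, ..., S^K X]."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    S = A_hat / np.sqrt(np.outer(d, d))         # symmetric normalization D^-1/2 A D^-1/2
    outs, H = [X], X
    for _ in range(K):
        H = S @ H                               # one more hop of neighbourhood smoothing
        outs.append(H)
    return np.concatenate(outs, axis=1)         # (n, d * (K + 1))
```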
16. Zheng Q, Li Y, Zheng L, Shen Q. Progressively real-time video salient object detection via cascaded fully convolutional networks with motion attention. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.007]
17. Dornaika F. On the use of high-order feature propagation in Graph Convolution Networks with Manifold Regularization. Information Sciences 2022. [DOI: 10.1016/j.ins.2021.10.041]
18. Li H, Weng J, Mao Y, Wang Y, Zhan Y, Cai Q, Gu W. Adaptive Dropout Method Based on Biological Principles. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:4267-4276. [PMID: 33872159] [DOI: 10.1109/tnnls.2021.3070895]
Abstract
Dropout is one of the most widely used methods to avoid overfitting in neural networks. However, it rigidly and randomly activates neurons according to a fixed probability, which is not consistent with the activation mode of neurons in the human cerebral cortex. Inspired by gene theory and the activation mechanism of brain neurons, we propose a more intelligent adaptive dropout, in which a variational autoencoder (VAE) is overlaid on an existing neural network to regularize its hidden neurons by adaptively setting activities to zero. Through alternating iterative training, the discarding probability of each hidden neuron can be learned according to the weights, effectively avoiding the shortcomings of the standard dropout method. Experimental results on multiple data sets illustrate that this method suppresses overfitting in various neural networks better than standard dropout. Additionally, this adaptive dropout technique can reduce the number of neurons and improve training efficiency.
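As a toy contrast with fixed-rate dropout, the module below learns a per-neuron keep probability from a trainable logit. The paper learns these probabilities with an overlaid VAE and alternating training, so this is a deliberate simplification of the idea rather than the published method.

```python
import torch
import torch.nn as nn

class AdaptiveDropout(nn.Module):
    """Per-neuron learned keep probabilities (toy stand-in for the VAE-based scheme)."""
    def __init__(self, n_units):
        super().__init__()
        self.logit_keep = nn.Parameter(torch.zeros(n_units))   # p_keep = 0.5 at init

    def forward(self, x):
        p_keep = torch.sigmoid(self.logit_keep)
        if self.training:
            mask = torch.bernoulli(p_keep.expand_as(x))        # sample per-neuron masks
            return x * mask / p_keep.clamp_min(1e-6)           # inverted-dropout scaling
        return x  # expectation already matched by the scaling above
```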
19. Zhang L, Liang R, Yin J, Zhang D, Shao L. Scene Categorization by Deeply Learning Gaze Behavior in a Semisupervised Context. IEEE Transactions on Cybernetics 2021; 51:4265-4276. [PMID: 31144650] [DOI: 10.1109/tcyb.2019.2913016]
Abstract
Accurately recognizing different categories of sceneries with sophisticated spatial configurations is a useful technique in computer vision and intelligent systems, e.g., scene understanding and autonomous driving. Deep recognition models have recently achieved competitive accuracies. Nevertheless, these deep architectures cannot explicitly characterize human visual perception, that is, the sequence of gaze allocation and the subsequent cognitive processes when viewing each scenery. In this paper, a novel spatially aware aggregation network is proposed for scene categorization, where human gaze behavior is discovered in a semisupervised setting. In particular, as semantically labeling a large quantity of scene images is labor-intensive, a semisupervised and structure-preserving non-negative matrix factorization (NMF) is proposed to detect a set of visually/semantically salient regions from each scenery. Afterward, the gaze shifting path (GSP) is engineered to characterize the process of humans perceiving each scene picture. To deeply describe each GSP, a novel spatially aware CNN termed SA-Net is developed. It accepts input regions with various shapes and statistically aggregates all the salient regions along each GSP. Finally, the learned deep GSP features from the entire scene images are fused into an image kernel, which is subsequently integrated into a kernel SVM to categorize different sceneries. Comparative experiments on six scene image sets have shown the advantage of our method.
20. Wang YT, Zhao XL, Jiang TX, Deng LJ, Chang Y, Huang TZ. Rain Streaks Removal for Single Image via Kernel-Guided Convolutional Neural Network. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3664-3676. [PMID: 32822310] [DOI: 10.1109/tnnls.2020.3015897]
Abstract
Recently emerged deep learning methods have achieved great success in single-image rain streak removal. However, existing methods ignore an essential factor in the rain streak generation mechanism, i.e., the motion blur that leads to the line-pattern appearance, and thus generally produce over-deraining or under-deraining results. In this article, inspired by the generation mechanism, we propose a novel rain streak removal framework using a kernel-guided convolutional neural network (KGCNN), achieving state-of-the-art performance with a simple network architecture. More precisely, our framework consists of three steps. First, we learn the motion blur kernel with a plain neural network, termed the parameter network, from the detail layer of a rainy patch. Then, we stretch the learned motion blur kernel into a degradation map with the same spatial size as the rainy patch. Finally, we use the stretched degradation map together with the detail patches to train a deraining network with a typical ResNet architecture, which produces the rain streaks under the guidance of the learned motion blur kernel. Experiments conducted on extensive synthetic and real data demonstrate the effectiveness of the proposed KGCNN, in terms of rain streak removal and image detail preservation.
21. Flexible data representation with graph convolution for semi-supervised learning. Neural Computing and Applications 2021. [DOI: 10.1007/s00521-020-05462-w]
22. Zhu R, Dornaika F, Ruichek Y. Inductive semi-supervised learning with Graph Convolution based regression. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.084]
23. Dornaika F. Flexible data representation with feature convolution for semi-supervised learning. Applied Intelligence 2021. [DOI: 10.1007/s10489-021-02210-y]
24. Karabayir I, Akbilgic O, Tas N. A Novel Learning Algorithm to Optimize Deep Neural Networks: Evolved Gradient Direction Optimizer (EVGO). IEEE Transactions on Neural Networks and Learning Systems 2021; 32:685-694. [PMID: 32481228] [DOI: 10.1109/tnnls.2020.2979121]
Abstract
Gradient-based algorithms have been widely used in optimizing the parameters of deep neural network (DNN) architectures. However, the vanishing gradient remains one of the common issues in the parameter optimization of such networks. To cope with the vanishing gradient problem, in this article, we propose a novel algorithm, the evolved gradient direction optimizer (EVGO), which updates the weights of DNNs based on the first-order gradient and a novel hyperplane we introduce. We compare the EVGO algorithm with other gradient-based algorithms, such as gradient descent, RMSProp, Adagrad, momentum, and Adam, on the well-known MNIST data set for handwritten digit recognition by implementing deep convolutional neural networks. Furthermore, we present empirical evaluations of EVGO on the CIFAR-10 and CIFAR-100 data sets using the well-known AlexNet and ResNet architectures. Finally, we implement an empirical analysis for EVGO and the other algorithms to investigate the behavior of the loss functions. The results show that EVGO outperforms all compared algorithms in all experiments. We conclude that EVGO can be used effectively in the optimization of DNNs, and the proposed hyperplane may provide a basis for future optimization algorithms.
25. Chen H, Wang Y, Xu C, Xu C, Tao D. Learning Student Networks via Feature Embedding. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:25-35. [PMID: 32092018] [DOI: 10.1109/tnnls.2020.2970494]
Abstract
Deep convolutional neural networks have been widely used in numerous applications, but their demanding storage and computational resource requirements prevent their deployment on mobile devices. Knowledge distillation aims to optimize a portable student network by taking the knowledge from a well-trained heavy teacher network. Traditional teacher-student methods rely on additional fully connected layers to bridge intermediate layers of the teacher and student networks, which brings in a large number of auxiliary parameters. In contrast, this article aims to propagate information from teacher to student without introducing new variables that need to be optimized. We regard the teacher-student paradigm from a new perspective of feature embedding. By introducing a locality preserving loss, the student network is encouraged to generate low-dimensional features that inherit the intrinsic properties of their corresponding high-dimensional features from the teacher network. The resulting portable network can thus naturally maintain performance comparable to that of the teacher network. Theoretical analysis is provided to justify the lower computational complexity of the proposed method. Experiments on benchmark data sets and well-trained networks suggest that the proposed algorithm is superior to state-of-the-art teacher-student learning methods in terms of computational and storage complexity.
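A locality preserving loss of the kind described above can be sketched directly: neighbors under the teacher's features are encouraged to remain close in the student's embedding, with no bridging layers or new trainable variables. The neighborhood size, Gaussian weighting, and normalization below are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch

def locality_preserving_loss(t_feat, s_feat, k=5, sigma=1.0):
    """t_feat: (n, Dt) teacher features, s_feat: (n, Ds) student features."""
    with torch.no_grad():
        dist = torch.cdist(t_feat, t_feat)                     # teacher pairwise distances
        knn = dist.topk(k + 1, largest=False).indices[:, 1:]   # k neighbors, skip self
        W = torch.zeros_like(dist)
        W.scatter_(1, knn, torch.exp(-dist.gather(1, knn) ** 2 / sigma))
    pair_d2 = torch.cdist(s_feat, s_feat) ** 2                 # student pairwise distances
    # sum_ij W_ij * ||s_i - s_j||^2: neighbors in teacher space stay close in student space
    return (W * pair_d2).sum() / W.sum().clamp_min(1e-8)
```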
26. Li M, Wang D. 2-D Stochastic Configuration Networks for Image Data Analytics. IEEE Transactions on Cybernetics 2021; 51:359-372. [PMID: 31329148] [DOI: 10.1109/tcyb.2019.2925883]
Abstract
Stochastic configuration networks (SCNs), a class of randomized learner models, have been successfully employed in data analytics due to their universal approximation capability and fast modeling property. The technical essence lies in stochastically configuring the hidden nodes (or basis functions) based on a supervisory mechanism, rather than the data-independent randomization usually adopted for building randomized neural networks. For image data modeling tasks, the use of 1-D SCNs potentially discards the spatial information of images and may result in undesirable performance. This paper extends the original SCNs to a 2-D version, called 2DSCNs, for quickly building randomized learners with matrix inputs. Some theoretical analysis of the advantages of 2DSCNs over SCNs, including the complexity of the random parameter space and the superiority of generalization, is presented. Empirical results on one regression example, four benchmark handwritten digit classification tasks, two human face recognition datasets, and one natural image database demonstrate that the proposed 2DSCNs perform favorably and show good potential for image data analytics.
27. Yang Y, Wu QMJ, Feng X, Akilan T. Recomputation of the Dense Layers for Performance Improvement of DCNN. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:2912-2925. [PMID: 31107643] [DOI: 10.1109/tpami.2019.2917685]
Abstract
Gradient descent optimization has become a paradigm for training deep convolutional neural networks (DCNN). However, utilizing other learning strategies in the training process of the DCNN has rarely been explored by the deep learning (DL) community. This serves as the motivation to introduce a non-iterative learning strategy that retrains neurons at the top dense or fully connected (FC) layers of a DCNN, resulting in higher performance. The proposed method exploits the Moore-Penrose inverse to pull back the current residual error to each FC layer, generating well-generalized features. Further, the weights of each FC layer are recomputed according to the Moore-Penrose inverse. We evaluate the proposed approach on six widely accepted object recognition benchmark datasets: Scene-15, CIFAR-10, CIFAR-100, SUN-397, Places365, and ImageNet. The experimental results show that the proposed method obtains improvements over 30 state-of-the-art methods. Interestingly, it also indicates that any DCNN with the proposed method can provide better performance than the same network with its original backpropagation (BP)-based training.
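The core primitive, recomputing a dense layer's weights by a (regularized) Moore-Penrose solve against target activations, looks as follows in NumPy. This shows the least-squares step only; the paper's layer-by-layer residual pull-back scheme is not reproduced, and the ridge term is an assumption for numerical stability.

```python
import numpy as np

def recompute_fc_weights(H, T, ridge=1e-3):
    """H: (n_samples, n_hidden) activations, T: (n_samples, n_classes) targets.

    Solves min_W ||H W - T||^2 + ridge ||W||^2, i.e. a regularized
    Moore-Penrose pseudo-inverse solve: W = (H^T H + ridge I)^-1 H^T T.
    """
    n_hidden = H.shape[1]
    return np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)

# Toy usage: one-hot targets for a 10-class problem.
H = np.random.randn(1000, 256)
T = np.eye(10)[np.random.randint(0, 10, 1000)]
W = recompute_fc_weights(H, T)
pred = (H @ W).argmax(axis=1)
```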
28. Miao KC, Han TT, Yao YQ, Lu H, Chen P, Wang B, Zhang J. Application of LSTM for short term fog forecasting based on meteorological elements. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.12.129]
29. Qin J, Pan W, Xiang X, Tan Y, Hou G. A biological image classification method based on improved CNN. Ecological Informatics 2020. [DOI: 10.1016/j.ecoinf.2020.101093]
30. Visualization analysis for fault diagnosis in chemical processes using recurrent neural networks. Journal of the Taiwan Institute of Chemical Engineers 2020. [DOI: 10.1016/j.jtice.2020.06.016]
31. Nassour J, Duy Hoa T, Atoofi P, Hamker F. Concrete Action Representation Model: From Neuroscience to Robotics. IEEE Transactions on Cognitive and Developmental Systems 2020. [DOI: 10.1109/tcds.2019.2896300]
32. Kejani MT, Dornaika F, Talebi H. Graph Convolution Networks with manifold regularization for semi-supervised learning. Neural Networks 2020; 127:160-167. [PMID: 32361546] [DOI: 10.1016/j.neunet.2020.04.016]
Abstract
In recent times, Graph Convolution Networks (GCN) have been proposed as a powerful tool for graph-based semi-supervised learning. In this paper, we introduce a model that enhances the label propagation of GCNs. More precisely, we propose GCNs with Manifold Regularization (GCNMR). The objective function of the proposed GCNMR is composed of a supervised term and an unsupervised term. The supervised term enforces agreement between the predicted labels and the known labels. The unsupervised term imposes smoothness on the predicted labels of all data samples. By learning a Graph Convolution Network with the proposed objective function, we are able to derive a more powerful semi-supervised learner. The proposed model retains the advantages of the classic GCN, yet it can improve on it with no increase in time complexity. Experiments on three public image datasets show that the proposed model is superior to the GCN and several competing graph-based semi-supervised learning methods.
Affiliation(s)
- F Dornaika, University of the Basque Country UPV/EHU, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain.
- H Talebi, Amirkabir University of Technology, Tehran, Iran.
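The two-term objective described in the abstract translates almost line-for-line into code: a cross-entropy fitting term on labeled nodes plus a Laplacian smoothness penalty tr(F^T L F) over all predictions. The sketch below assumes a dense Laplacian and softmax outputs; both are simplifications for illustration.

```python
import torch
import torch.nn.functional as F

def gcnmr_loss(logits, labels, labeled_mask, L, lam=0.1):
    """logits: (n, c) GCN outputs; L: (n, n) graph Laplacian (dense here).

    labeled_mask: boolean mask of labeled nodes; lam: smoothness weight.
    """
    supervised = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])
    P = logits.softmax(dim=1)
    smoothness = torch.trace(P.t() @ L @ P) / P.shape[0]   # tr(F^T L F) over all nodes
    return supervised + lam * smoothness
```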
33. He N, Fang L, Li S, Plaza J, Plaza A. Skip-Connected Covariance Network for Remote Sensing Scene Classification. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:1461-1474. [PMID: 31295122] [DOI: 10.1109/tnnls.2019.2920374]
Abstract
This paper proposes a novel end-to-end learning model, called the skip-connected covariance (SCCov) network, for remote sensing scene classification (RSSC). The innovative contribution is to embed two novel modules into the traditional convolutional neural network (CNN) model, i.e., skip connections and covariance pooling. The advantages of the newly developed SCCov are twofold. First, by means of the skip connections, the multi-resolution feature maps produced by the CNN are combined, which provides important benefits in addressing the large-scale variance present in RSSC data sets. Second, by using covariance pooling, we can fully exploit the second-order information contained in these multi-resolution feature maps. This allows the CNN to achieve more representative feature learning when dealing with RSSC problems. Experimental results on three large-scale benchmark data sets demonstrate that the proposed SCCov network exhibits very competitive or superior classification performance compared with current state-of-the-art RSSC techniques, while using far fewer parameters. Specifically, our SCCov needs only 10% of the parameters used by its counterparts.
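Covariance pooling, the second module named above, replaces global average pooling with the channel-wise second-order statistics of the feature map. A minimal version is below; SCCov's skip-connection wiring around it is not shown.

```python
import torch

def covariance_pooling(feat):
    """feat: (B, C, H, W) CNN feature map -> (B, C, C) channel covariance per image."""
    B, C, H, W = feat.shape
    X = feat.reshape(B, C, H * W)               # treat the H*W positions as samples
    X = X - X.mean(dim=2, keepdim=True)         # center each channel
    return X @ X.transpose(1, 2) / (H * W - 1)  # sample covariance across positions
```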
34. Gowthami S, Harikumar R. Conventional neural network for blind image blur correction using latent semantics. Soft Computing 2020. [DOI: 10.1007/s00500-020-04859-y]
35. Li Z, Zhou A, Shen Y. An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment. Sensors (Basel) 2020; 20:E1556. [PMID: 32168843] [PMCID: PMC7147165] [DOI: 10.3390/s20061556]
Abstract
Scene recognition is an essential part of the vision-based robot navigation domain. The successful application of deep learning technology has triggered more extensive preliminary studies on scene recognition, which mostly use features extracted from networks trained for recognition tasks. In this paper, we interpret scene recognition as a region-based image retrieval problem and present a novel approach for scene recognition with an end-to-end trainable multi-column convolutional neural network (MCNN) architecture. The proposed MCNN utilizes filters with receptive fields of different sizes to achieve multi-level and multi-layer image perception, and consists of three components: front-end, middle-end, and back-end. The first seven layers of VGG16 are taken as the front-end for two-dimensional feature extraction, Inception-A is taken as the middle-end for deeper feature representation learning, and the Large-Margin Softmax Loss (L-Softmax) is taken as the back-end for enhancing intra-class compactness and inter-class separability. Extensive experiments have been conducted to compare the proposed network with existing state-of-the-art methods. Experimental results on three popular datasets demonstrate the robustness and accuracy of our approach. To the best of our knowledge, the presented approach has not previously been applied to scene recognition in the literature.
Affiliation(s)
- Zhenyu Li, School of Mechanical Engineering, Tongji University, Shanghai 201804, China
- Aiguo Zhou, School of Mechanical Engineering, Tongji University, Shanghai 201804, China
- Yong Shen, School of Automotive Studies, Tongji University, Shanghai 201804, China
36. Luo Y, Wong Y, Kankanhalli M, Zhao Q. G-Softmax: Improving Intraclass Compactness and Interclass Separability of Features. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:685-699. [PMID: 31094695] [DOI: 10.1109/tnnls.2019.2909737]
Abstract
Intraclass compactness and interclass separability are crucial indicators to measure the effectiveness of a model at producing discriminative features, where intraclass compactness indicates how close the features with the same label are to each other and interclass separability indicates how far away the features with different labels are. In this paper, we investigate the intraclass compactness and interclass separability of features learned by convolutional networks and propose a Gaussian-based softmax (G-softmax) function that can effectively improve both. The proposed function is simple to implement and can easily replace the softmax function. We evaluate the proposed G-softmax function on classification data sets (i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet) and on multilabel classification data sets (i.e., MS COCO and NUS-WIDE). The experimental results show that the proposed G-softmax function improves the state-of-the-art models across all evaluated data sets. In addition, the analysis of the intraclass compactness and interclass separability demonstrates the advantages of the proposed function over the softmax function, which is consistent with the performance improvement. More importantly, we observe that high intraclass compactness and interclass separability are linearly correlated with average precision on MS COCO and NUS-WIDE. This implies that improving intraclass compactness and interclass separability would lead to improved average precision.
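The two indicators this paper optimizes can be measured directly from features. Below is one common operationalization, distance to class centroids for compactness and distance between centroids for separability; the paper's exact definitions may differ.

```python
import numpy as np

def compactness_separability(feats, labels):
    """feats: (n, d) features, labels: (n,) integer class labels."""
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Intraclass compactness: mean distance of samples to their class centroid.
    intra = np.mean([np.linalg.norm(feats[labels == c] - centroids[i], axis=1).mean()
                     for i, c in enumerate(classes)])
    # Interclass separability: mean pairwise distance between class centroids.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    inter = np.linalg.norm(diffs, axis=-1)[np.triu_indices(len(classes), k=1)].mean()
    return intra, inter   # lower intra and higher inter indicate better features
```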
37. Yang Y, Wu QMJ. Features Combined From Hundreds of Midlayers: Hierarchical Networks With Subnetwork Nodes. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3313-3325. [PMID: 30703046] [DOI: 10.1109/tnnls.2018.2890787]
Abstract
In this paper, we posit that the mixed selectivity of neurons in the top layer encodes distributed information produced by other neurons, offering a significant computational advantage for recognition accuracy. Accordingly, this paper proposes a hierarchical network framework that learns features combined from hundreds of midlayers. First, a subnetwork neuron, which itself can be constructed from other nodes, functions as a subspace feature extractor. The top layer of the hierarchical network takes the subspace features produced by the subnetwork neurons to get rid of irrelevant factors and, at the same time, recasts the subspace features into a mapping space so that the hierarchical network can generate more reliable cognition. Second, this paper shows that with a noniterative learning strategy, the proposed method has a wider and shallower structure, which plays a significant role in generalization performance improvements. Hence, compared with other state-of-the-art methods, multiple channel features with the proposed method provide comparable or even better performance while dramatically boosting learning speed. Our experimental results show that our platform can provide much better generalization performance than 55 other state-of-the-art methods.
39. Chen J, Su M, Shen S, Xiong H, Zheng H. POBA-GA: Perturbation optimized black-box adversarial attacks via genetic algorithm. Computers & Security 2019. [DOI: 10.1016/j.cose.2019.04.014]
40. Yu JB. Evolutionary manifold regularized stacked denoising autoencoders for gearbox fault diagnosis. Knowledge-Based Systems 2019. [DOI: 10.1016/j.knosys.2019.04.022]
41. Wu X, Ling X, Liu J. Location Recognition Algorithm for Vision-Based Industrial Sorting Robot via Deep Learning. International Journal of Pattern Recognition and Artificial Intelligence 2019. [DOI: 10.1142/s0218001419550097]
Abstract
In this paper, a deep convolutional neural network (DCNN) is applied to automatically locating and recognizing complex workpieces for a vision-based sorting robot in the industrial production process. First, to obtain the locations of workpieces, the pixel projection algorithm (PPA), which consists of pre-processing and a pixel projection operation, is presented to eliminate uneven illumination and to locate and segment workpiece images. Then, we obtain the objective information and identify the object by training the DCNN, which is used to recognize the rotation degree and type of workpieces at high speed. Finally, experimental results prove the validity of the location-recognition algorithms for the vision-based sorting robot. The location error and recognition accuracy are significantly improved in the experimental environment.
Affiliation(s)
- Xiru Wu, College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, P. R. China; Guangxi Key Laboratory for Nonlinear Circuit and Optical Communication (Guangxi Normal University), Guilin, Guangxi 541004, P. R. China
- Xingyu Ling, College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, P. R. China; Guangxi Key Laboratory for Nonlinear Circuit and Optical Communication (Guangxi Normal University), Guilin, Guangxi 541004, P. R. China
- Jinxia Liu, Office of State Asset Management, Guilin University of Electronic Technology, Guilin 541004, P. R. China
42. Application of Deep Convolutional Neural Networks and Smartphone Sensors for Indoor Localization. Applied Sciences (Basel) 2019. [DOI: 10.3390/app9112337]
Abstract
Indoor localization systems are susceptible to high errors and do not meet current standards of indoor localization. Moreover, the performance of such approaches is limited by device dependence. The use of Wi-Fi makes the localization process vulnerable to dynamic factors and energy-hungry. A multi-sensor-fusion-based indoor localization approach is proposed to overcome these issues. The proposed approach predicts a pedestrian's current location with smartphone sensor data alone, and aims at mitigating the impact of device dependency on localization accuracy and lowering the localization error of magnetic-field-based localization systems. We trained a deep learning based convolutional neural network to recognize the indoor scene, which helps to lower the localization error. The recognized scene is used to identify a specific floor and narrow the search space. A database of magnetic field patterns helps to lower device dependence. A modified K-nearest-neighbor (mKNN) method is presented to calculate the pedestrian's current location. Data from pedestrian dead reckoning further refines this location, and an extended Kalman filter is implemented to this end. The performance of the proposed approach is tested with experiments on Galaxy S8 and LG G6 smartphones. The experimental results demonstrate that the proposed approach can achieve an accuracy of 1.04 m at the 50th percentile, regardless of the smartphone used for localization. The proposed mKNN outperforms the K-nearest-neighbor approach, and its mean, variance, and maximum errors are lower than those of KNN. Moreover, the proposed approach does not use Wi-Fi for localization and is more energy efficient than Wi-Fi based approaches. Experiments reveal that localization without scene recognition leads to higher errors.
43. Joint sparse graph and flexible embedding for graph-based semi-supervised learning. Neural Networks 2019; 114:91-95. [DOI: 10.1016/j.neunet.2019.03.002]
44. Passalis N, Tefas A. Unsupervised Knowledge Transfer Using Similarity Embeddings. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:946-950. [PMID: 30047908] [DOI: 10.1109/tnnls.2018.2851924]
Abstract
With the advent of deep neural networks, there is a growing interest in transferring the knowledge from a large and complex model to a smaller and faster one. In this brief, a method for unsupervised knowledge transfer (KT) between neural networks is proposed. To the best of our knowledge, the proposed method is the first to utilize similarity-induced embeddings to transfer knowledge between any two layers of neural networks, regardless of the number of neurons in each of them. In this way, the knowledge is transferred without using any lossy dimensionality reduction transformations or requiring any information about the complex model, except for the activations of the layer used for KT. This is in contrast with most existing approaches, which only generate soft targets for training the smaller neural network or directly use the weights of the larger model. The proposed method is evaluated using six image data sets, and it is demonstrated through extensive experiments that the knowledge of a neural network can be successfully transferred using different kinds of (synthetic or not) data, ranging from cross-domain data to just randomly generated data.
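Similarity-based transfer admits a very small sketch: compute pairwise similarity matrices in the teacher and student feature spaces and align them, which works for arbitrary and unequal layer widths. Cosine similarities with an MSE objective below are one simple instantiation of the idea, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def similarity_transfer_loss(t_feat, s_feat):
    """t_feat: (n, Dt) teacher activations, s_feat: (n, Ds); Dt and Ds may differ."""
    St = F.normalize(t_feat, dim=1) @ F.normalize(t_feat, dim=1).t()  # (n, n) teacher sims
    Ss = F.normalize(s_feat, dim=1) @ F.normalize(s_feat, dim=1).t()  # (n, n) student sims
    return F.mse_loss(Ss, St)   # align the two pairwise-similarity matrices
```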
45. Zhu R, Dornaika F, Ruichek Y. Learning a discriminant graph-based embedding with feature selection for image categorization. Neural Networks 2019; 111:35-46. [PMID: 30660101] [DOI: 10.1016/j.neunet.2018.12.008]
Abstract
Graph-based embedding methods are very useful for reducing the dimension of high-dimensional data and for extracting their relevant features. In this paper, we introduce a novel nonlinear method called Flexible Discriminant graph-based Embedding with Feature Selection (FDEFS). The proposed algorithm aims to classify image sample data in supervised and semi-supervised learning settings. Specifically, our method incorporates manifold smoothness, margin discriminant embedding, and sparse regression for feature selection. The weights add ℓ2,1-norm regularization for local linear approximation. The sparse regression implicitly performs feature selection on the original features of the data matrix and of the linear transform. We also provide an effective method to optimize the objective function. We apply the algorithm to six public image datasets, including scene, face, and object datasets. These experiments demonstrate the effectiveness of the proposed embedding method and show that it compares favorably with many competing embedding methods.
Affiliation(s)
- Ruifeng Zhu, Laboratory of Electronics, Information and Image (LE2i), CNRS, University of Bourgogne Franche-Comte, Belfort, France; Faculty of Computer Science, University of the Basque Country UPV/EHU, Spain
- Fadi Dornaika, Faculty of Computer Science, University of the Basque Country UPV/EHU, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Yassine Ruichek, Laboratory of Electronics, Information and Image (LE2i), CNRS, University of Bourgogne Franche-Comte, Belfort, France
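The ℓ2,1 norm mentioned above is the sum of row-wise ℓ2 norms of a projection matrix; penalizing it drives entire rows toward zero, which de-selects the corresponding input features. A short illustration:

```python
import numpy as np

def l21_norm(W):
    """W: (n_features, n_components) projection matrix; sum of row-wise l2 norms."""
    return np.linalg.norm(W, axis=1).sum()

# Row 2 is exactly zero, so feature 2 is effectively de-selected.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l21_norm(W))   # 5.0 + 0.0 + 1.0 = 6.0
```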
46. Zhao W, Tan S, Guan Z, Zhang B, Gong M, Cao Z, Wang Q. Learning to Map Social Network Users by Unified Manifold Alignment on Hypergraph. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5834-5846. [PMID: 29993666] [DOI: 10.1109/tnnls.2018.2812888]
Abstract
Nowadays, many people possess accounts on multiple online social networks, e.g., Facebook and Twitter. These networks overlap, but the correspondences between their users are not explicitly given. Mapping common users across these social networks can be beneficial for applications such as cross-network recommendation. In recent years, many mapping algorithms have been proposed that exploit social and/or profile relations between users from different networks. However, there is still a lack of a unified mapping framework that can well exploit high-order relational information in both social structures and profiles. In this paper, we propose a unified hypergraph learning framework named unified manifold alignment on hypergraph (UMAH) for this task. UMAH models social structures and user profile relations in a unified hypergraph where the relative weights of profile hyperedges are determined automatically. Given a set of training user correspondences, a common subspace is learned by preserving the hypergraph structure as well as the correspondence relations of labeled users. UMAH intrinsically performs semisupervised manifold alignment with profile information for calibration. For a target user in one network, UMAH ranks all users in the other network by their probability of being the corresponding user (measured by similarity in the subspace). In experiments, we evaluate UMAH on three real-world data sets and compare it to state-of-the-art baseline methods. The experimental results demonstrate the effectiveness of UMAH in mapping users across networks.
47.
Abstract
Place recognition is one of the most fundamental topics in the computer-vision and robotics communities, where the task is to accurately and efficiently recognize the location of a given query image. Despite years of knowledge accumulated in this field, place recognition still remains an open problem due to the various ways in which the appearance of real-world places may differ. This paper presents an overview of the place-recognition literature. Since condition-invariant and viewpoint-invariant features are essential factors to long-term robust visual place-recognition systems, we start with traditional image-description methodology developed in the past, which exploits techniques from the image-retrieval field. Recently, the rapid advances of related fields, such as object detection and image classification, have inspired a new technique to improve visual place-recognition systems, that is, convolutional neural networks (CNNs). Thus, we then introduce the recent progress of visual place-recognition systems based on CNNs to automatically learn better image representations for places. Finally, we close with discussions and mention of future work on place recognition.
48. Yang Z, Merrick K, Jin L, Abbass HA. Hierarchical Deep Reinforcement Learning for Continuous Action Control. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5174-5184. [PMID: 29994078] [DOI: 10.1109/tnnls.2018.2805379]
Abstract
Robotic control in a continuous action space has long been a challenging topic. This is especially true when controlling robots to solve compound tasks, as both basic skills and compound skills need to be learned. In this paper, we propose a hierarchical deep reinforcement learning algorithm to learn basic skills and compound skills simultaneously. In the proposed algorithm, compound skills and basic skills are learned by two levels of hierarchy. In the first level, each basic skill is handled by its own actor, overseen by a shared basic critic. Then, in the second level, compound skills are learned by a meta critic by reusing basic skills. The proposed algorithm was evaluated on a Pioneer 3AT robot in three different navigation scenarios with fully observable tasks. The simulations were built in Gazebo 2 in a Robot Operating System (ROS) Indigo environment. The results show that the proposed algorithm can learn both high-performance basic skills and compound skills through the same learning process. The learned compound skills outperform those learned by a discrete-action-space deep reinforcement learning algorithm.
49. Deep Learning Scene Recognition Method Based on Localization Enhancement. Sensors (Basel) 2018; 18:3376. [PMID: 30308964] [PMCID: PMC6209898] [DOI: 10.3390/s18103376]
Abstract
With the rapid development of indoor localization in recent years, signals of opportunity have become a reliable and convenient source for indoor localization. A mobile device can not only capture images of the indoor environment in real time but also obtain one or more different types of signals of opportunity. Based on this, we design a convolutional neural network (CNN) model that concatenates features of image data and signals of opportunity for localization, using indoor scene datasets and simulating the situation of indoor location probability. Using transfer learning on the Inception V3 network model, feature information is added to assist scene recognition. The experimental results show that, for two different experimental scenes, the accuracies of the prediction results are 97.0% and 96.6% using the proposed model, compared to 69.0% and 81.2% by the method of overlapping positioning information and the base map, and 73.3% and 77.7% by the fine-tuned Inception V3 model. The accuracy of indoor scene recognition is improved; in particular, the error rate at the spatial connection of different scenes is decreased, and the recognition rate of similar scenes is increased.
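The described model, image features concatenated with signals-of-opportunity features before classification, has this general shape; the tiny CNN branch and layer sizes below are placeholders rather than the paper's Inception-V3-based configuration.

```python
import torch
import torch.nn as nn

class FusionLocalizer(nn.Module):
    """Two-branch model: image features concatenated with signal features."""
    def __init__(self, n_signals=16, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(                        # tiny image branch (placeholder)
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())       # -> (B, 32)
        self.head = nn.Linear(32 + n_signals, n_classes)

    def forward(self, image, signals):
        # signals: (B, n_signals) vector of signals-of-opportunity measurements
        fused = torch.cat([self.cnn(image), signals], dim=1)
        return self.head(fused)
```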
50. Xing F, Xie Y, Su H, Liu F, Yang L. Deep Learning in Microscopy Image Analysis: A Survey. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4550-4568. [PMID: 29989994] [DOI: 10.1109/tnnls.2017.2766168]
Abstract
Computerized microscopy image analysis plays an important role in computer aided diagnosis and prognosis. Machine learning techniques have powered many aspects of medical investigation and clinical practice. Recently, deep learning is emerging as a leading machine learning tool in computer vision and has attracted considerable attention in biomedical image analysis. In this paper, we provide a snapshot of this fast-growing field, specifically for microscopy image analysis. We briefly introduce the popular deep neural networks and summarize current deep learning achievements in various tasks, such as detection, segmentation, and classification in microscopy image analysis. In particular, we explain the architectures and the principles of convolutional neural networks, fully convolutional networks, recurrent neural networks, stacked autoencoders, and deep belief networks, and interpret their formulations or modelings for specific tasks on various microscopy images. In addition, we discuss the open challenges and the potential trends of future research in microscopy image analysis using deep learning.