1
|
Diao S, Wan Y, Huang D, Huang S, Sadiq T, Khan MS, Hussain L, Alkahtani BS, Mazhar T. Optimizing Bi-LSTM networks for improved lung cancer detection accuracy. PLoS One 2025; 20:e0316136. [PMID: 39992919 PMCID: PMC11849851 DOI: 10.1371/journal.pone.0316136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Accepted: 12/05/2024] [Indexed: 02/26/2025]
Abstract
Lung cancer remains a leading cause of cancer-related deaths worldwide, with low survival rates often attributed to late-stage diagnosis. To address this critical health challenge, researchers have developed computer-aided diagnosis (CAD) systems that rely on feature extraction from medical images. However, accurately identifying the most informative image features for lung cancer detection remains a significant challenge. This study aimed to compare the effectiveness of both hand-crafted and deep learning-based approaches for lung cancer diagnosis. We employed traditional hand-crafted features, such as Gray Level Co-occurrence Matrix (GLCM) features, in conjunction with traditional machine learning algorithms. To explore the potential of deep learning, we also optimized and implemented a Bidirectional Long Short-Term Memory (Bi-LSTM) network for lung cancer detection. The results revealed that the highest performance using hand-crafted features was achieved by extracting GLCM features and utilizing Support Vector Machine (SVM) with different kernels, reaching an accuracy of 99.78% and an AUC of 0.999. However, the deep learning Bi-LSTM network surpassed both methods, achieving an accuracy of 99.89% and an AUC of 1.0000. These findings suggest that the proposed methodology, combining hand-crafted features and deep learning, holds significant promise for enhancing early lung cancer detection and ultimately improving diagnosis systems.
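The hand-crafted feature step mentioned in this abstract can be illustrated with a minimal, pure-Python sketch of a gray-level co-occurrence matrix and three common texture statistics. This is not the authors' pipeline: the function names, the single pixel offset, the number of gray levels, and the choice of statistics are assumptions made for illustration only.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Convert pair counts to joint probabilities.
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, angular second moment, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Hypothetical 4x4 image quantized to four gray levels.
example = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(glcm(example))
```

In a setup like the one the abstract describes, feature vectors of this kind (typically computed over several offsets and angles) would be passed to a conventional classifier such as an SVM.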
Affiliation(s)
- Su Diao
  - Department of Industrial & Systems Engineering, Auburn University, Auburn, Alabama, United States of America
- Yajie Wan
  - Department of Computer Science, Brown University, Providence, RI, United States of America
- Danyi Huang
  - Department of Chemical Engineering, Columbia University, New York City, NY, United States of America
- Shijia Huang
  - Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY, United States of America
- Touseef Sadiq
  - Department of Information and Communication Technology, Centre for Artificial Intelligence Research (CAIR), University of Agder, Grimstad, Norway
- Lal Hussain
  - Department of Computer Science and Information Technology, The University of Azad Jammu and Kashmir, Chattar Kalas Campus, Muzaffarabad, Pakistan
  - Department of Computer Science, Neelum Campus, The University of Azad Jammu and Kashmir, Azad Kashmir, Pakistan
- Badr S. Alkahtani
  - Department of Mathematics, King Saud University, Riyadh, Saudi Arabia
- Tehseen Mazhar
  - School of Computer Science, National College of Business Administration and Economics, Lahore, Pakistan
  - Department of Computer Science and Information Technology, School Education Department, Government of Punjab, Layyah, Pakistan
|
2
|
Yan X, Mao Y, Ye Y, Yu H. Cross-Modal Clustering With Deep Correlated Information Bottleneck Method. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:13508-13522. [PMID: 37220062 DOI: 10.1109/tnnls.2023.3269789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Cross-modal clustering (CMC) intends to improve the clustering accuracy (ACC) by exploiting the correlations across modalities. Although recent research has made impressive advances, it remains a challenge to sufficiently capture the correlations across modalities due to the high-dimensional nonlinear characteristics of individual modalities and the conflicts in heterogeneous modalities. In addition, the meaningless modality-private information in each modality might become dominant in the process of correlation mining, which also interferes with the clustering performance. To tackle these challenges, we devise a novel deep correlated information bottleneck (DCIB) method, which aims at exploring the correlation information between multiple modalities while eliminating the modality-private information in each modality in an end-to-end manner. Specifically, DCIB treats the CMC task as a two-stage data compression procedure, in which the modality-private information in each modality is eliminated under the guidance of the shared representation of multiple modalities. Meanwhile, the correlations between multiple modalities are preserved from the aspects of feature distributions and clustering assignments simultaneously. Finally, the objective of DCIB is formulated as an objective function based on a mutual information measurement, in which a variational optimization approach is proposed to ensure its convergence. Experimental results on four cross-modal datasets validate the superiority of the DCIB. Code is released at https://github.com/Xiaoqiang-Yan/DCIB.
|
3
|
Liu N, Sun X, Yu H, Yao F, Xu G, Fu K. M2DCapsN: Multimodal, Multichannel, and Dual-Step Capsule Network for Natural Language Moment Localization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11448-11462. [PMID: 37027272 DOI: 10.1109/tnnls.2023.3261927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Natural language moment localization aims to localize the target moment that matches a given natural language query in an untrimmed video. The key to this challenging task is to capture fine-grained video-language correlations to establish the alignment between the query and target moment. Most existing works establish a single-pass interaction schema to capture correlations between queries and moments. Considering the complex feature space of lengthy video and diverse information between frames, the weight distribution of information interaction flow is prone to dispersion or misalignment, which leads to redundant information flow affecting the final prediction. We address this issue by proposing a capsule-based approach to model the query-video interactions, termed the Multimodal, Multichannel, and Dual-step Capsule Network (M2DCapsN), which is derived from the intuition that "multiple people viewing multiple times is better than one person viewing one time." First, we introduce a multimodal capsule network, replacing the single-pass interaction schema of "one person viewing one time" with the iterative interaction schema of "one person viewing multiple times," which cyclically updates cross-modal interactions and modifies potential redundant interactions via its routing-by-agreement. Then, considering that the conventional routing mechanism only learns a single iterative interaction schema, we further propose a multichannel dynamic routing mechanism to learn multiple iterative interaction schemas, where each channel performs independent routing iteration to collectively capture cross-modal correlations from multiple subspaces, that is, "multiple people viewing." Moreover, we design a dual-step capsule network structure based on the multimodal, multichannel capsule network, bringing together the query and query-guided key moments to jointly enhance the original video, so as to select the target moments according to the enhanced part. Experimental results on three public datasets demonstrate the superiority of our approach in comparison with state-of-the-art methods, and comprehensive ablation and visualization analysis validate the effectiveness of each component of the proposed model.
|
4
|
Hu H, Liu A, Guan Q, Qian H, Li X, Chen S, Zhou Q. Adaptively Customizing Activation Functions for Various Layers. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:6096-6107. [PMID: 35007200 DOI: 10.1109/tnnls.2021.3133263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
To enhance the nonlinearity of neural networks and increase their mapping abilities between the inputs and response variables, activation functions play a crucial role to model more complex relationships and patterns in the data. In this work, a novel methodology is proposed to adaptively customize activation functions only by adding very few parameters to the traditional activation functions such as Sigmoid, Tanh, and rectified linear unit (ReLU). To verify the effectiveness of the proposed methodology, some theoretical and experimental analysis on accelerating the convergence and improving the performance is presented, and a series of experiments are conducted based on various network models (such as AlexNet, VggNet, GoogLeNet, ResNet and DenseNet), and various datasets (such as CIFAR10, CIFAR100, miniImageNet, PASCAL VOC, and COCO). To further verify the validity and suitability in various optimization strategies and usage scenarios, some comparison experiments are also implemented among different optimization strategies (such as SGD, Momentum, AdaGrad, AdaDelta, and ADAM) and different recognition tasks such as classification and detection. The results show that the proposed methodology is very simple but with significant performance in convergence speed, precision, and generalization, and it can surpass other popular methods such as ReLU and adaptive functions such as Swish in almost all experiments in terms of overall performance.
|
5
|
Bellamkonda S, Gopalan NP, Mala C, Settipalli L. Facial expression recognition on partially occluded faces using component based ensemble stacked CNN. Cogn Neurodyn 2023; 17:985-1008. [PMID: 37522034 PMCID: PMC10374495 DOI: 10.1007/s11571-022-09879-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 07/22/2022] [Accepted: 08/13/2022] [Indexed: 11/28/2022]
Abstract
Facial Expression Recognition (FER) is the basis for many applications including human-computer interaction and surveillance. While developing such applications, it is imperative to understand human emotions for better interaction with machines. Among many FER models developed so far, Ensemble Stacked Convolution Neural Networks (ES-CNN) showed an empirical impact in improving the performance of FER on static images. However, the existing ES-CNN based FER models, trained with features extracted from the entire face, are unable to address the issues of ambient parameters such as pose, illumination, and occlusions. To mitigate the problem of reduced performance of ES-CNN on partially occluded faces, a Component based ES-CNN (CES-CNN) is proposed. CES-CNN applies ES-CNN on action units of individual face components such as eyes, eyebrows, nose, cheek, mouth, and glabella as one subnet of the network. A Max-Voting based ensemble classifier is used to combine the decisions of the subnets in order to obtain the optimized recognition accuracy. The proposed CES-CNN is validated by conducting experiments on benchmark datasets and the performance is compared with the state-of-the-art models. It is observed from the experimental results that the proposed model has a significant enhancement in the recognition accuracy compared to the existing models.
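The max-voting step that combines the subnet decisions can be sketched in a few lines of Python. The function name, the example labels, and the tie-breaking rule (first label seen wins) are assumptions for illustration; the abstract does not say how ties are resolved.

```python
from collections import Counter


def max_vote(predictions):
    """Return the label predicted by the most subnets.

    Ties go to the label that appears first in `predictions`
    (an assumed rule, chosen only to make the result deterministic).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical per-component decisions (eyes, eyebrows, nose, cheek, mouth, glabella).
subnet_outputs = ["happy", "sad", "happy", "neutral", "happy", "sad"]
decision = max_vote(subnet_outputs)
```

Because each subnet sees only one face component, an occluded component can cast a wrong vote without necessarily changing the majority decision.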
Affiliation(s)
- Sivaiah Bellamkonda
  - Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015 India
- N. P. Gopalan
  - Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015 India
- C. Mala
  - Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015 India
- Lavanya Settipalli
  - Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015 India
|
6
|
Zhao H, Zhan ZH, Liu J. Outlier aware differential evolution for multimodal optimization problems. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
|
7
|
Lu Y, Chen X, Wu Z, Yu J. Decoupled Metric Network for Single-Stage Few-Shot Object Detection. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:514-525. [PMID: 35213322 DOI: 10.1109/tcyb.2022.3149825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Within the last few years, great efforts have been made to study few-shot learning. Although general object detection is advancing at a rapid pace, few-shot detection remains a very challenging problem. In this work, we propose a novel decoupled metric network (DMNet) for single-stage few-shot object detection. We design a decoupled representation transformation (DRT) and an image-level distance metric learning (IDML) to solve the few-shot detection problem. The DRT can eliminate the adverse effect of handcrafted prior knowledge by predicting objectness and anchor shape. Meanwhile, to alleviate the problem of representation disagreement between classification and location (i.e., translational invariance versus translational variance), the DRT adopts a decoupled manner to generate adaptive representations so that the model is easier to learn from only a few training data. As for a few-shot classification in the detection task, we design an IDML tailored to enhance the generalization ability. This module can perform metric learning for the whole visual feature, so it can be more efficient than traditional DML due to the merit of parallel inference for multiobjects. Based on the DRT and IDML, our DMNet efficiently realizes a novel paradigm for few-shot detection, called single-stage metric detection. Experiments are conducted on the PASCAL VOC dataset and the MS COCO dataset. As a result, our method achieves state-of-the-art performance in few-shot object detection. The codes are available at https://github.com/yrqs/DMNet.
|
8
|
Li Y, Wang F, Zheng Z. Adaptive Synchronization-Based Approach for Finite-Time Parameters Identification of Genetic Regulatory Networks. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10754-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
|
9
|
Zhu Z, Wang Z, Li D, Du W. Globalized Multiple Balanced Subsets With Collaborative Learning for Imbalanced Data. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:2407-2417. [PMID: 32609619 DOI: 10.1109/tcyb.2020.3001158] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
The skewed distribution of data makes it difficult to classify minority and majority samples in the imbalanced problem. Balanced bagging randomly undersamples majority samples several times and combines the selected majority samples with minority samples to form several balanced subsets, in which the numbers of minority and majority samples are roughly equal. However, balanced bagging lacks a unified learning framework. Moreover, it fails to consider the connections among all subsets and the global information of the entire data distribution. To this end, this article puts several balanced subsets into an effective learning framework with a criterion function. In the learning framework, one regularization term called RS establishes the connection and realizes the collaborative learning of all subsets by requiring the consistent outputs of the minority samples in different subsets. Besides, another regularization term called RW provides the global information to each basic classifier by reducing the difference between the direction of the solution vector in each subset and that in the entire dataset. The proposed learning framework is called globalized multiple balanced subsets with collaborative learning (GMBSCL). The experimental results validate the effectiveness of the proposed GMBSCL.
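The subset-construction step of balanced bagging described above can be sketched as follows. This covers only the undersampling that produces the balanced subsets, not the RS and RW regularization terms of GMBSCL; the function name, the fixed seed, and the toy data are assumptions for illustration.

```python
import random


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Build `n_subsets` balanced training sets by undersampling the majority class.

    Each subset holds every minority sample plus an equally sized random draw
    (without replacement) from the majority class.
    Assumes len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))
        subsets.append(sampled + list(minority))
    return subsets


# Hypothetical toy data: 20 majority samples, 4 minority samples.
majority = [("maj", i) for i in range(20)]
minority = [("min", i) for i in range(4)]
subsets = balanced_subsets(majority, minority, n_subsets=3)
```

Every minority sample appears in every subset, which is what lets a regularizer such as the RS term in the abstract require consistent outputs for those samples across subsets.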
|
10
|
Ran R, Feng J, Zhang S, Fang B. A General Matrix Function Dimensionality Reduction Framework and Extension for Manifold Learning. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:2137-2148. [PMID: 32697725 DOI: 10.1109/tcyb.2020.3003620] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Many dimensionality reduction methods in the manifold learning field have the so-called small-sample-size (SSS) problem. Starting from solving the SSS problem, we first summarize the existing dimensionality reduction methods and construct a unified criterion function of these methods. Then, combining the unified criterion with the matrix function, we propose a general matrix function dimensionality reduction framework. This framework is configurable, that is, one can select suitable functions to construct such a matrix transformation framework, and then a series of new dimensionality reduction methods can be derived from this framework. In this article, we discuss how to choose suitable functions from two aspects: 1) solving the SSS problem and 2) improving pattern classification ability. As an extension, with the inverse hyperbolic tangent function and linear function, we propose a new matrix function dimensionality reduction framework. Compared with the existing methods to solve the SSS problem, these new methods can obtain better pattern classification ability and have less computational complexity. The experimental results on handwritten digit, letters databases, and two face databases show the superiority of the new methods.
|
11
|
Li Y, Fan X, Gaussier E. Supervised Categorical Metric Learning With Schatten p-Norms. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:2059-2069. [PMID: 32697727 DOI: 10.1109/tcyb.2020.3004437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Metric learning has been successful in learning new metrics adapted to numerical datasets. However, its development for categorical data still needs further exploration. In this article, we propose a method, called CPML for categorical projected metric learning, which tries to efficiently (i.e., less computational time and better prediction accuracy) address the problem of metric learning in categorical data. We make use of the value distance metric to represent our data and propose new distances based on this representation. We then show how to efficiently learn new metrics. We also generalize several previous regularizers through the Schatten p-norm and provide a generalization bound for it that complements the standard generalization bound for metric learning. The experimental results show that our method provides state-of-the-art results while being faster.
|
12
|
Zhou Y, Luo S, Pan L, Liu L, Song D. Continuous temporal network embedding by modeling neighborhood propagation process. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
13
|
Heterogeneous information network embedding based on multiperspective metapath for question routing. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107842] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
|
14
|
Zhang J, Yang J, Yu J, Fan J. Semisupervised image classification by mutual learning of multiple self‐supervised models. INT J INTELL SYST 2022. [DOI: 10.1002/int.22814] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]
Affiliation(s)
- Jian Zhang
  - School of Science and Technology, Zhejiang International Studies University, Hangzhou, Zhejiang, China
- Jianing Yang
  - School of Science and Technology, Zhejiang International Studies University, Hangzhou, Zhejiang, China
- Jun Yu
  - Computer and Software School, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Jianping Fan
  - Department of Computer Science, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
|
15
|
|
16
|
Qiao S, Han N, Huang F, Yue K, Wu T, Yi Y, Mao R, Yuan CA. LMNNB: Two-in-One imbalanced classification approach by combining metric learning and ensemble learning. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02901-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
17
|
Zeng Y, Chen J, Huang GB. Slice-Based Online Convolutional Dictionary Learning. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:5116-5129. [PMID: 31443059 DOI: 10.1109/tcyb.2019.2931914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Convolutional dictionary learning (CDL) aims to learn a structured and shift-invariant dictionary to decompose signals into sparse representations. While yielding superior results compared to traditional sparse coding methods on various signal and image processing tasks, most CDL methods have difficulties handling large data, because they have to process all images in the dataset in a single pass. Therefore, recent research has focused on online CDL (OCDL), which updates the dictionary with sequentially incoming signals. In this article, a novel OCDL algorithm is proposed based on a local, slice-based representation of sparse codes. Such a representation has been found useful in batch CDL problems, where the convolutional sparse coding and dictionary learning problem could be handled in a local way similar to traditional sparse coding problems, but it has never been explored under online scenarios before. We show, in this article, that the proposed algorithm is a natural extension of the traditional patch-based online dictionary learning algorithm, and the dictionary is updated in a similarly memory-efficient way. On the other hand, it can be viewed as an improvement of existing second-order OCDL algorithms. Theoretical analysis shows that our algorithm converges and has lower time complexity than the existing counterpart that yields exactly the same output. Extensive experiments are performed on various benchmarking datasets, which show that our algorithm outperforms state-of-the-art batch and OCDL algorithms in terms of reconstruction objectives.
|
18
|
Dong LJ, Zhang HB, Shi Q, Lei Q, Du JX, Gao S. Learning and fusing multiple hidden substages for action quality assessment. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107388] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
|
19
|
Yu J, Yan X. Deep unLSTM network: Features with memory information extracted from unlabeled data and their application on industrial unsupervised industrial fault detection. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107382] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
|
20
|
Li T, Kou G, Peng Y, Yu PS. A fast diagonal distance metric learning approach for large-scale datasets. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.04.077] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
|
21
|
Yu J, Xu X, Gao F, Shi S, Wang M, Tao D, Huang Q. Toward Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:4350-4362. [PMID: 32149668 DOI: 10.1109/tcyb.2020.2972944] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
Face photo-sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It covers wide applications including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the restrictions on structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurred effects and great deformation over various facial components, leading to the unrealistic feeling of synthesized images. To tackle this challenge, in this article, we propose using facial composition information to help the synthesis of face sketch/photo. Especially, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. In CA-GAN, we utilize paired inputs, including a face photo/sketch and the corresponding pixelwise face labels for generating a sketch/photo. Next, to focus training on hard-generated components and delicate facial structures, we propose a compositional reconstruction loss. In addition, we employ a perceptual loss function to encourage the synthesized image and real image to be perceptually similar. Finally, we use stacked CA-GANs (SCA-GANs) to further rectify defects and add compelling details. The experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. In addition, our method significantly decreases the best previous Fréchet inception distance (FID) from 36.2 to 26.2 for sketch synthesis, and from 60.9 to 30.5 for photo synthesis. Besides, we demonstrate that the proposed method is of considerable generalization ability.
|
22
|
Jin X, Eom S, Shin S, Lee KH, Hong C. DORIC: discovering topological relations based on spatial link composition. Knowl Inf Syst 2021. [DOI: 10.1007/s10115-021-01603-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
23
|
Mousas C, Krogmeier C, Wang Z. Photo Sequences of Varying Emotion: Optimization with a Valence-Arousal Annotated Dataset. ACM T INTERACT INTEL 2021. [DOI: 10.1145/3458844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
Synthesizing photo products such as photo strips and slideshows using a database of images is a time-consuming and tedious process that requires significant manual work. To overcome this limitation, we developed a method that automatically synthesizes photo sequences based on several design parameters. Our method considers the valence and arousal ratings of images in conjunction with parameters related to both the visual consistency of the synthesized photo sequence and the progression of valence and arousal throughout the photo sequence. Our method encodes valence, arousal, and visual consistency parameters as cost terms into a total cost function while applying a Markov chain Monte Carlo optimization technique called simulated annealing to synthesize the photo sequence based on user-defined target objectives in a few seconds. As our method was developed for the synthesis of photo sequences using the valence-arousal emotional model, a user study was conducted to evaluate the efficacy of the synthesized photo sequences in triggering valence-arousal ratings as expected. Our results indicate that the proposed method synthesizes photo sequences in which valence and arousal dimensions are perceived as expected by participants; however, valence may be more appropriately perceived than arousal.
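The optimization loop described in this abstract can be illustrated with a minimal simulated-annealing sketch. It uses a single valence cost term only; the arousal and visual-consistency terms, the proposal move, the cooling schedule, and all names and parameter values here are assumptions for illustration, not the authors' formulation.

```python
import math
import random


def sequence_cost(seq, valence, target):
    """Squared deviation of each chosen image's valence from the target curve."""
    return sum((valence[i] - t) ** 2 for i, t in zip(seq, target))


def anneal(valence, target, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Pick len(target) distinct image indices whose valences follow `target`.

    Assumes len(valence) > len(target) so an unused image always exists.
    """
    rng = random.Random(seed)
    current = rng.sample(range(len(valence)), len(target))
    cost = sequence_cost(current, valence, target)
    best, best_cost = list(current), cost
    temp = t0
    for _ in range(steps):
        # Propose: swap one slot for an image not currently in the sequence.
        cand = list(current)
        pos = rng.randrange(len(cand))
        unused = [i for i in range(len(valence)) if i not in current]
        cand[pos] = rng.choice(unused)
        cand_cost = sequence_cost(cand, valence, target)
        delta = cand_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temp *= cooling
    return best, best_cost


# Hypothetical pool of 10 images with valence ratings in [0, 1], rising target curve.
valence = [i / 9 for i in range(10)]
target = [0.0, 0.5, 1.0]
best, best_cost = anneal(valence, target)
```

Additional cost terms, such as arousal progression or visual consistency between neighboring images, would be summed into `sequence_cost` with user-chosen weights.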
Affiliation(s)
- Christos Mousas
  - Department of Computer Graphics Technology, Purdue University, West Lafayette, IN, USA
- Claudia Krogmeier
  - Department of Computer Graphics Technology, Purdue University, West Lafayette, IN, USA
- Zhiquan Wang
  - Department of Computer Graphics Technology, Purdue University, West Lafayette, IN, USA
|
24
|
Yuan Y, Ning H, Lu X. Bio-Inspired Representation Learning for Visual Attention Prediction. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:3562-3575. [PMID: 31484145 DOI: 10.1109/tcyb.2019.2931735] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
Visual attention prediction (VAP) is a significant and imperative issue in the field of computer vision. Most of the existing VAP methods are based on deep learning. However, they do not fully take advantage of the low-level contrast features while generating the visual attention map. In this article, a novel VAP method is proposed to generate the visual attention map via bio-inspired representation learning. The bio-inspired representation learning combines both low-level contrast and high-level semantic features simultaneously, which are developed by the fact that the human eye is sensitive to the patches with high contrast and objects with high semantics. The proposed method is composed of three main steps: 1) feature extraction; 2) bio-inspired representation learning; and 3) visual attention map generation. First, the high-level semantic feature is extracted from the refined VGG16, while the low-level contrast feature is extracted by the proposed contrast feature extraction block in a deep network. Second, during bio-inspired representation learning, both the extracted low-level contrast and high-level semantic features are combined by the designed densely connected block, which is proposed to concatenate various features scale by scale. Finally, the weighted-fusion layer is exploited to generate the ultimate visual attention map based on the obtained representations after bio-inspired representation learning. Extensive experiments are performed to demonstrate the effectiveness of the proposed method.
|
25
|
Ren C, He X, Pu Y, Nguyen TQ. Learning Image Profile Enhancement and Denoising Statistics Priors for Single-Image Super-Resolution. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:3535-3548. [PMID: 31449041 DOI: 10.1109/tcyb.2019.2933257] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
Single-image super-resolution (SR) has been widely used in computer vision applications. The reconstruction-based SR methods are mainly based on certain prior terms to regularize the SR problem. However, it is very challenging to further improve the SR performance by the conventional design of explicit prior terms. Because of the powerful learning ability, deep convolutional neural networks (CNNs) have been widely used in single-image SR task. However, it is difficult to achieve further improvement by only designing the network architecture. In addition, most existing deep CNN-based SR methods learn a nonlinear mapping function to directly map low-resolution (LR) images to desirable high-resolution (HR) images, ignoring the observation models of input images. Inspired by the split Bregman iteration (SBI) algorithm, which is a powerful technique for solving the constrained optimization problems, the original SR problem is divided into two subproblems: 1) inversion subproblem and 2) denoising subproblem. Since the inversion subproblem can be regarded as an inversion step to reconstruct an intermediate HR image with sharper edges and finer structures, we propose to use deep CNN to capture low-level explicit image profile enhancement prior (PEP). Since the denoising subproblem aims to remove the noise in the intermediate image, we adopt a simple and effective denoising network to learn implicit image denoising statistics prior (DSP). Furthermore, the penalty parameter in SBI is adaptively tuned during the iterations for better performance. Finally, we also prove the convergence of our method. Thus, the deep CNNs are exploited to capture both implicit and explicit image statistics priors. Due to SBI, the SR observation model is also leveraged. Consequently, it bridges between two popular SR approaches: 1) learning-based method and 2) reconstruction-based method. Experimental results show that the proposed method achieves the state-of-the-art SR results.
|
26
|
Xing L, Chen B, Du S, Gu Y, Zheng N. Correntropy-Based Multiview Subspace Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:3298-3311. [PMID: 31794416 DOI: 10.1109/tcyb.2019.2952398] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Multiview subspace clustering, which aims to cluster the given data points with information from multiple sources or features into their underlying subspaces, has a wide range of applications in the communities of data mining and pattern recognition. Compared with the single-view subspace clustering, it is challenging to efficiently learn the structure of the representation matrix from each view and make use of the extra information embedded in multiple views. To address the two problems, a novel correntropy-based multiview subspace clustering (CMVSC) method is proposed in this article. The objective function of our model mainly includes two parts. The first part utilizes the Frobenius norm to efficiently estimate the dense connections between the points lying in the same subspace instead of following the standard compressive sensing approach. In the second part, the correntropy-induced metric (CIM) is introduced to characterize the noise in each view and utilize the information embedded in different views from an information-theoretic perspective. Furthermore, an efficient iterative algorithm based on the half-quadratic technique (HQ) and the alternating direction method of multipliers (ADMM) is developed to optimize the proposed joint learning problem, and extensive experimental results on six real-world multiview benchmarks demonstrate that the proposed methods can outperform several state-of-the-art multiview subspace clustering methods.
|
27
|
Huang F, Zhang X, Xu J, Zhao Z, Li Z. Multimodal Learning of Social Image Representation by Exploiting Social Relations. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:1506-1518. [PMID: 30843858 DOI: 10.1109/tcyb.2019.2896100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Learning the representation for social images has recently made remarkable achievements for many tasks, such as cross-modal retrieval and multilabel classification. However, since social images contain both multimodal contents (e.g., visual images and textual descriptions) and social relations among images, simply modeling the content information may lead to suboptimal embedding. In this paper, we propose a novel multimodal representation learning model for social images, that is, correlational multimodal variational autoencoder (CMVAE) via triplet network. Specifically, in order to mine the highly nonlinear correlation between the visual content and the textual content, a CMVAE is proposed to learn a unified representation for the multiple modalities of social images. Both common information in all modalities and private information in each modality are encoded for the representation learning. To incorporate the social relations among images, we employ the triplet network to embed multiple types of social links in the representation learning. Then, a joint embedding model is proposed to combine the social relations for representation learning of the multimodal contents. Comprehensive experiment results on four datasets confirm the effectiveness of our method in two tasks, namely, multilabel classification and cross-modal retrieval. Our method outperforms the state-of-the-art multimodal representation learning methods with significant improvement of performance.
|
28
|
Duan M, Li K, Li K, Tian Q. A Novel Multi-task Tensor Correlation Neural Network for Facial Attribute Prediction. ACM T INTEL SYST TEC 2021. [DOI: 10.1145/3418285] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]
Abstract
Multi-task learning plays an important role in face multi-attribute prediction. At present, most studies mine the shared information between attributes by sharing all convolutional layers. However, it is not appropriate to treat the low-level and high-level features of the face multi-attribute equally, because the high-level features are more biased toward the specific content of the category. In this article, a novel multi-attribute tensor correlation neural network (MTCN) is used to predict face attributes. MTCN shares all attribute features at the low-level layers, and then distinguishes each attribute feature at the high-level layers. To better mine the correlations among high-level attribute features, each sub-network explores useful information from other networks to enhance its original information. Then a tensor canonical correlation analysis method is used to seek the correlations among the highest-level attributes, which enhances the original information of each attribute. After that, these features are mapped into a highly correlated space through the correlation matrix. Finally, we conduct extensive experiments to verify the performance of MTCN on the CelebA and LFWA datasets; MTCN achieves the best performance compared with the latest multi-attribute recognition algorithms under the same settings.
Affiliation(s)
- Keqin Li
- State University of New York, USA
|
29
|
A Myocardial Segmentation Method Based on Adversarial Learning. BIOMED RESEARCH INTERNATIONAL 2021; 2021:6618918. [PMID: 33728334 PMCID: PMC7935602 DOI: 10.1155/2021/6618918] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/16/2020] [Revised: 12/09/2020] [Accepted: 02/02/2021] [Indexed: 12/03/2022]
Abstract
Congenital heart defects (CHD) are structural imperfections of the heart or large blood vessels that are detected around birth. Their symptoms vary widely: mild cases may show no obvious symptoms, while serious cases can be life-threatening. Using cardiovascular magnetic resonance imaging (CMRI) technology to create a patient-specific 3D heart model is an important prerequisite for surgical planning in children with CHD. Manually segmenting 3D images using existing tools is time-consuming and laborious, which greatly hinders the routine clinical application of 3D heart models. Therefore, automatic myocardial segmentation algorithms and related computer-aided diagnosis systems have emerged. Currently, the conventional methods for automatic myocardium segmentation are based on deep learning rather than on traditional machine learning. Although better results have been achieved, difficulties remain: CMRI often has inconsistent signal strength, low contrast, and indistinguishable thin-walled structures near the atrium, valves, and large blood vessels, which makes automatic myocardium segmentation challenging. Additionally, labeling 3D CMR images is time-consuming and laborious, which makes it hard to obtain enough accurately labeled data. To solve these problems, we propose applying adversarial learning to myocardial segmentation. A discriminant model provides additional supervision information as a guide to further improve the performance of the segmentation model. Experimental results on real-world datasets show that the proposed adversarial learning-based method outperforms the baseline segmentation model and achieves better results on the automatic myocardium segmentation problem.
|
30
|
Jerripothula KR, Cai J, Lu J, Yuan J. Image Co-Skeletonization via Co-Segmentation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2784-2797. [PMID: 33523810 DOI: 10.1109/tip.2021.3054464] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Recent advances in the joint processing of a set of images have shown its advantages over individual processing. Unlike the existing works geared towards co-segmentation or co-localization, in this article, we explore a new joint processing topic: image co-skeletonization, which is defined as joint skeleton extraction of the foreground objects in an image collection. It is well known that object skeletonization in a single natural image is challenging, because there is hardly any prior knowledge available about the object present in the image. Therefore, we resort to the idea of image co-skeletonization, hoping that the commonness prior that exists across the semantically similar images can be leveraged to have such knowledge, similar to other joint processing problems such as co-segmentation. Moreover, earlier research has found that augmenting a skeletonization process with the object's shape information is highly beneficial in capturing the image context. Having made these two observations, we propose a coupled framework for co-skeletonization and co-segmentation tasks to facilitate shape information discovery for our co-skeletonization process through the co-segmentation process. While image co-skeletonization is our primary goal, the co-segmentation process might also benefit, in turn, from exploiting skeleton outputs of the co-skeletonization process as central object seeds through such a coupled framework. As a result, both can benefit from each other synergistically. For evaluating image co-skeletonization results, we also construct a novel benchmark dataset by annotating nearly 1.8 K images and dividing them into 38 semantic categories. Although the proposed idea is essentially a weakly supervised method, it can also be employed in supervised and unsupervised scenarios. Extensive experiments demonstrate that the proposed method achieves promising results in all three scenarios.
|
31
|
Li Y, Lu H. Multi-modal constraint propagation via compatible conditional distribution reconstruction. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
32
|
Fernández-García ME, Sancho-Gómez JL, Ros-Ros A, Figueiras-Vidal AR. Complete Stacked Denoising Auto-Encoders for Regression. Neural Process Lett 2021. [DOI: 10.1007/s11063-020-10419-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
|
33
|
Jing XY, Zhang X, Zhu X, Wu F, You X, Gao Y, Shan S, Yang JY. Multiset Feature Learning for Highly Imbalanced Data Classification. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:139-156. [PMID: 31331881 DOI: 10.1109/tpami.2019.2929166] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 06/10/2023]
Abstract
As data volumes grow, imbalanced data have become increasingly common. When the imbalance ratio (IR) of data is high, the classification performance of most existing imbalanced learning methods declines seriously. In this paper, we systematically investigate the highly imbalanced data classification problem, and propose an uncorrelated cost-sensitive multiset learning (UCML) approach for it. Specifically, UCML first constructs multiple balanced subsets through random partition, and then employs the multiset feature learning (MFL) to learn discriminant features from the constructed multiset. To enhance the usability of each subset and deal with the non-linearity issue that exists in each subset, we further propose a deep metric based UCML (DM-UCML) approach. DM-UCML introduces the generative adversarial network technique into the multiset constructing process, such that each subset has a distribution similar to that of the original dataset. To cope with the non-linearity issue, DM-UCML integrates deep metric learning with MFL, such that more favorable performance can be achieved. In addition, DM-UCML designs a new discriminant term to enhance the discriminability of learned metrics. Experiments on eight traditional highly class-imbalanced datasets and two large-scale datasets indicate that the proposed approaches outperform state-of-the-art highly imbalanced learning methods and are more robust to high IR.
|
34
|
Gao H, Geng G, Zeng S. Approach for 3D Cultural Relic Classification Based on a Low-Dimensional Descriptor and Unsupervised Learning. ENTROPY 2020; 22:e22111290. [PMID: 33287058 PMCID: PMC7712925 DOI: 10.3390/e22111290] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/20/2020] [Revised: 11/05/2020] [Accepted: 11/11/2020] [Indexed: 11/22/2022]
Abstract
Computer-aided classification serves as the basis of virtual cultural relic management and display. The majority of the existing cultural relic classification methods require labelling of the samples of the dataset; however, in practical applications, there is often a lack of category labels of samples or an uneven distribution of samples of different categories. To solve this problem, we propose a 3D cultural relic classification method based on a low dimensional descriptor and unsupervised learning. First, the scale-invariant heat kernel signature (Si-HKS) was computed. The heat kernel signature denotes the heat flow of any two vertices across a 3D shape and the heat diffusion propagation is governed by the heat equation. Secondly, the Bag-of-Words (BoW) mechanism was utilized to transform the Si-HKS descriptor into a low-dimensional feature tensor, named a SiHKS-BoW descriptor that is related to entropy. Finally, we applied an unsupervised learning algorithm, called MKDSIF-FCM, to conduct the classification task. A dataset consisting of 3D models from 41 Tang tri-color Hu terracotta Eures was utilized to validate the effectiveness of the proposed method. A series of experiments demonstrated that the SiHKS-BoW descriptor along with the MKDSIF-FCM algorithm showed the best classification accuracy, up to 99.41%, which is a solution for an actual case with the absence of category labels and an uneven distribution of different categories of data. The present work promotes the application of virtual reality in digital projects and enriches the content of digital archaeology.
Affiliation(s)
- Hongjuan Gao
- School of Information Science & Technology, Northwest University, Xi’an 710127, China; (H.G.); (S.Z.)
- Xinhua College, Ningxia University, Yinchuan 750021, China
- Guohua Geng
- School of Information Science & Technology, Northwest University, Xi’an 710127, China; (H.G.); (S.Z.)
- Correspondence:
- Sheng Zeng
- School of Information Science & Technology, Northwest University, Xi’an 710127, China; (H.G.); (S.Z.)
|
35
|
|
36
|
Wei Y, Gong C, Chen S, Liu T, Yang J, Tao D. Harnessing Side Information for Classification Under Label Noise. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:3178-3192. [PMID: 31562108 DOI: 10.1109/tnnls.2019.2938782] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/10/2023]
Abstract
Practical data sets often contain the label noise caused by various human factors or measurement errors, which means that a fraction of training examples might be mistakenly labeled. Such noisy labels will mislead the classifier training and severely decrease the classification performance. Existing approaches to handle this problem are usually developed through various surrogate loss functions under the framework of empirical risk minimization. However, they are only suitable for binary classification and also require strong prior knowledge. Therefore, this article treats the example features as side information and formulates the noisy label removal problem as a matrix recovery problem. We denote our proposed method as "label noise handling via side information" (LNSI). Specifically, the observed label matrix is decomposed as the sum of two parts, in which the first part reveals the true labels and can be obtained by conducting a low-rank mapping on the side information; and the second part captures the incorrect labels and is modeled by a row-sparse matrix. The merits of such formulation lie in three aspects: 1) the strong recovery ability of this strategy has been sufficiently demonstrated by intensive theoretical works on side information; 2) multi-class situations can be directly handled with the aid of learned projection matrix; and 3) only very weak assumptions are required for model design, making LNSI applicable to a wide range of practical problems. Moreover, we theoretically derive the generalization bound of LNSI and show that the expected classification error of LNSI is upper bounded. The experimental results on a variety of data sets including UCI benchmark data sets and practical data sets confirm the superiority of LNSI to state-of-the-art approaches on label noise handling.
|
37
|
|
38
|
Zhang Y, Gao X, He L, Lu W, He R. Objective Video Quality Assessment Combining Transfer Learning With CNN. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2716-2730. [PMID: 30736007 DOI: 10.1109/tnnls.2018.2890310] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/09/2023]
Abstract
Video quality assessment (VQA) is essential to video compression technology applied to video transmission and storage. However, small-scale video quality databases with imbalanced samples and low-level feature representations for distorted videos impede the development of VQA methods. In this paper, we propose a full-reference (FR) VQA metric integrating transfer learning with a convolutional neural network (CNN). First, we imitate the feature-based transfer learning framework to transfer the distorted images as the related domain, which enriches the distorted samples. Second, to extract high-level spatiotemporal features of the distorted videos, a six-layer CNN with the acknowledged learning ability is pretrained and finetuned by the common features of the distorted image blocks (IBs) and video blocks (VBs), respectively. Notably, the labels of the distorted IBs and VBs are predicted by the classic FR metrics. Finally, based on saliency maps and the entropy function, we conduct a pooling stage to obtain the quality scores of the distorted videos by weighting the block-level scores predicted by the trained CNN. In particular, we introduce preprocessing and postprocessing steps to reduce the impact of inaccurate labels predicted by the FR-VQA metric. Because the proposed framework relies on feature learning, two kinds of experimental schemes are carried out: train-test iterative procedures on one database, and tests on one database after training on other databases. The experimental results demonstrate that the proposed method has high expansibility and is on a par with some state-of-the-art VQA metrics on two widely used VQA databases with various compression distortions.
|
39
|
A Context Based Deep Temporal Embedding Network in Action Recognition. Neural Process Lett 2020. [DOI: 10.1007/s11063-020-10248-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
40
|
Jiang X, Wang N, Xin J, Yang X, Yu Y, Gao X. Image super-resolution via multi-view information fusion networks. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.073] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
|
41
|
Zhao H, Zhan ZH, Lin Y, Chen X, Luo XN, Zhang J, Kwong S, Zhang J. Local Binary Pattern-Based Adaptive Differential Evolution for Multimodal Optimization Problems. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:3343-3357. [PMID: 31403453 DOI: 10.1109/tcyb.2019.2927780] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 06/10/2023]
Abstract
The multimodal optimization problem (MMOP) requires the algorithm to find multiple global optima of the problem simultaneously. In order to solve MMOP efficiently, a novel differential evolution (DE) algorithm based on the local binary pattern (LBP) is proposed in this paper. The LBP makes use of the neighbors' information for extracting relevant pattern information, so as to identify the multiple regions of interests, which is similar to finding multiple peaks in MMOP. Inspired by the principle of LBP, this paper proposes an LBP-based adaptive DE (LBPADE) algorithm. It enables the LBP operator to form multiple niches, and further to locate multiple peak regions in MMOP. Moreover, based on the LBP niching information, we develop a niching and global interaction (NGI) mutation strategy and an adaptive parameter strategy (APS) to fully search the niching areas and maintain multiple peak regions. The proposed NGI mutation strategy incorporates information from both the niching and the global areas for effective exploration, while APS adjusts the parameters of each individual based on its own LBP information and guides the individual to the promising direction. The proposed LBPADE algorithm is evaluated on the extensive MMOPs test functions. The experimental results show that LBPADE outperforms or at least remains competitive with some state-of-the-art algorithms.
|
42
|
Joint local constraint and fisher discrimination based dictionary learning for image classification. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.05.103] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/17/2022]
|
43
|
Hu Y, Xiong F, Lu D, Wang X, Xiong X, Chen H. Movie collaborative filtering with multiplex implicit feedbacks. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.03.098] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 10/26/2022]
|
44
|
Ghodratnama S, Abrishami Moghaddam H. Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-020-00887-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
|
45
|
Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation. Neural Process Lett 2020. [DOI: 10.1007/s11063-019-10148-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 10/25/2022]
|
46
|
|
47
|
Zhang C, Cheng J, Tian Q. Multiview Semantic Representation for Visual Recognition. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:2038-2049. [PMID: 30418893 DOI: 10.1109/tcyb.2018.2875728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Due to interclass and intraclass variations, images of different classes are often cluttered, which makes efficient classification difficult. The use of discriminative classification algorithms helps to alleviate this problem. However, it is still an open problem to accurately model the relationships between visual representations and human perception. To alleviate these problems, in this paper, we propose a novel multiview semantic representation (MVSR) algorithm for efficient visual recognition. First, we leverage visually based methods to get initial image representations. We then use both visual and semantic similarities to divide images into groups, which are then used for semantic representations. We treat different image representation strategies, partition methods, and numbers as different views. A graph is then used to combine the discriminative power of different views. The similarities between images can be obtained by measuring the similarities of graphs. Finally, we train classifiers to predict the categories of images. We evaluate the discriminative power of the proposed MVSR method for visual recognition on several public image datasets. Experimental results show the effectiveness of the proposed method.
|
48
|
Song W, Zheng Y, Fu C, Shan P. A novel batch image encryption algorithm using parallel computing. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.01.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 10/25/2022]
|
49
|
|
50
|
Zhang T, Liu Y, Hwang M, Hwang KS, Ma C, Cheng J. An end-to-end inverse reinforcement learning by a boosting approach with relative entropy. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.01.023] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
|