1. Zhu K, Guo H, Liu S, Wang J, Tang M. Learning Semantics-Consistent Stripes With Self-Refinement for Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2023;34:8531-8542. PMID: 35298384. DOI: 10.1109/tnnls.2022.3151487.
Abstract
Aligning human parts automatically is one of the most challenging problems in person re-identification (re-ID). Recently, stripe-based methods, which equally partition person images into fixed stripes for aligned representation learning, have achieved great success. However, stripes with fixed height and position cannot properly handle the misalignment caused by inaccurate detection and occlusion, and they may introduce much background noise. In this article, we aim to learn adaptive stripes with foreground refinement to achieve pixel-level part alignment using only person identity labels for person re-ID, and we make two contributions. 1) A semantics-consistent stripe learning method (SCS). Given an image, SCS partitions it into adaptive horizontal stripes, and each stripe corresponds to a specific semantic part. Specifically, SCS iterates between two processes: i) clustering the rows into human parts or background to generate pseudo-part labels for the rows and ii) learning a row classifier to partition a person image, supervised by the latest pseudo-labels. This iterative scheme guarantees the accuracy of the learned image partition. 2) A self-refinement method (SCS+) to remove the background noise in stripes. We employ the above row classifier to generate the probabilities of pixels belonging to human parts (foreground) or background, which is called the class activation map (CAM). Only the most confident areas of the CAM are assigned foreground/background labels to guide the human part refinement. Finally, by intersecting the semantics-consistent stripes with the foreground areas, SCS+ locates the human parts at the pixel level, obtaining a more robust part-aligned representation. Extensive experiments validate that SCS+ sets a new state of the art on three widely used datasets, including Market-1501, DukeMTMC-reID, and CUHK03-NP.
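The alternation between row clustering and row classification described above can be sketched as a small expectation-maximization-style loop. The following is a loose illustration only; the feature shapes, the use of k-means, and a logistic-regression row classifier are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def learn_row_partition(row_feats, n_parts=6, n_iters=5):
    """row_feats: (num_images, num_rows, dim) array of pooled per-row features."""
    n_imgs, n_rows, dim = row_feats.shape
    flat = row_feats.reshape(-1, dim)
    # (i) cluster rows into pseudo human parts / background to get initial pseudo-labels
    pseudo = KMeans(n_clusters=n_parts, n_init=10).fit_predict(flat)
    clf = LogisticRegression(max_iter=500)
    for _ in range(n_iters):
        # (ii) learn a row classifier supervised by the latest pseudo-labels
        clf.fit(flat, pseudo)
        # re-assign rows with the current classifier, refining the pseudo-labels
        pseudo = clf.predict(flat)
    # per-image partition: a part index for every row
    return pseudo.reshape(n_imgs, n_rows)
```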
2. Yi X, Qian C, Wu P, Maponde BT, Jiang T, Ge W. Research on Fine-Grained Image Recognition of Birds Based on Improved YOLOv5. Sensors (Basel) 2023;23:8204. PMID: 37837034. PMCID: PMC10575358. DOI: 10.3390/s23198204.
Abstract
Birds play a vital role in maintaining biodiversity. Accurate identification of bird species is essential for conducting biodiversity surveys. However, fine-grained image recognition of birds is challenging because of large within-class differences and small inter-class differences. To address this problem, our study took a part-based approach, dividing the identification task into two parts: part detection and identification classification. We proposed an improved bird part detection algorithm based on YOLOv5 that can handle partial overlap between part objects and complex environmental conditions. The backbone network incorporates the Res2Net-CBAM module to enlarge the receptive field of each network layer, strengthen the channel characteristics, and improve the model's sensitivity to important information. Additionally, to improve feature extraction and channel self-regulation, we integrated CBAM attention mechanisms into the neck. According to the experimental findings, the accuracy of the proposed model is 86.6%, 1.2% higher than that of the original model. Furthermore, compared with other algorithms, our model shows a noticeable improvement in accuracy. These results demonstrate that the proposed method is useful for quick and precise recognition of different bird species.
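CBAM itself is a published attention module (channel attention followed by spatial attention); a compact PyTorch sketch of the standard formulation is shown below for reference. The reduction ratio and kernel size are common defaults, not values taken from this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over global average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                      # channel re-weighting
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))            # spatial re-weighting

# Example: CBAM(64)(torch.randn(2, 64, 32, 32)) keeps the input shape (2, 64, 32, 32).
```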
Affiliation(s)
- Peng Wu
- College of Mathematics & Computer Science, Zhejiang A & F University, Hangzhou 311300, China; (X.Y.); (C.Q.); (B.T.M.); (T.J.); (W.G.)
3. Hu K, Mei S, Wang W, Martens KAE, Wang L, Lewis SJG, Feng DD, Wang Z. Multi-Level Adversarial Spatio-Temporal Learning for Footstep Pressure Based FoG Detection. IEEE J Biomed Health Inform 2023;27:4166-4177. PMID: 37227913. DOI: 10.1109/jbhi.2023.3272902.
Abstract
Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease, which is a neurodegenerative disorder of the central nervous system impacting millions of people around the world. To address the pressing need to improve the quality of treatment for FoG, it has become increasingly important to devise a computer-aided detection and quantification tool for FoG. As a non-invasive technique for collecting motion patterns, the footstep pressure sequences obtained from pressure-sensitive gait mats provide a great opportunity for evaluating FoG in the clinic and potentially in the home environment. In this study, FoG detection is formulated as a sequential modelling task, and a novel deep learning architecture, namely the Adversarial Spatio-temporal Network (ASTN), is proposed to learn FoG patterns across multiple levels. ASTN introduces a novel adversarial training scheme with a multi-level subject discriminator to obtain subject-independent FoG representations, which helps to reduce the over-fitting risk due to the high inter-subject variance. As a result, robust FoG detection can be achieved for unseen subjects. The proposed scheme also sheds light on improving subject-level clinical studies in other scenarios, as it can be integrated with many existing deep architectures. To the best of our knowledge, this is one of the first studies of footstep pressure-based FoG detection, and ASTN is the first deep neural network architecture in pursuit of subject-independent representations. In our experiments on 393 trials collected from 21 subjects, the proposed ASTN achieved an AUC of 0.85, clearly outperforming conventional learning methods.
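Subject-adversarial training of this kind is commonly implemented with a gradient-reversal layer in front of the subject discriminator; the generic PyTorch sketch below illustrates that pattern. It is not the authors' ASTN code, and the feature dimension, discriminator size, and lambda are placeholders.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class SubjectAdversarialHead(nn.Module):
    """Predicts subject identity from reversed features, pushing the shared encoder
    toward subject-independent representations."""
    def __init__(self, feat_dim=128, num_subjects=21, lam=0.1):
        super().__init__()
        self.lam = lam
        self.discriminator = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, num_subjects)
        )

    def forward(self, features):
        reversed_feat = GradReverse.apply(features, self.lam)
        return self.discriminator(reversed_feat)

# Training sketch: total_loss = fog_loss + cross_entropy(subject_head(features), subject_ids)
```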
4. Spatial Relationship Recognition via Heterogeneous Representation: A Review. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.02.053.
5. Yang B, Yu C, Yu JG, Gao C, Sang N. Pose-Guided Hierarchical Semantic Decomposition and Composition for Human Parsing. IEEE Transactions on Cybernetics 2023;53:1641-1652. PMID: 34506295. DOI: 10.1109/tcyb.2021.3107544.
Abstract
Human parsing is a fine-grained semantic segmentation task, which needs to understand human semantic parts. Most existing methods model human parsing as general semantic segmentation, which ignores the inherent relationship among hierarchical human parts. In this work, we propose a pose-guided hierarchical semantic decomposition and composition framework for human parsing. Specifically, our method includes a semantic maintained decomposition and composition (SMDC) module and a pose distillation (PC) module. SMDC progressively disassembles the human body to focus on the more concise regions of interest in the decomposition stage and then gradually assembles human parts under the guidance of pose information in the composition stage. Notably, SMDC maintains the atomic semantic labels during both stages to avoid the error propagation issue of the hierarchical structure. To further take advantage of the relationship of human parts, we introduce pose information as explicit guidance for the composition. However, the discrete structure prediction in pose estimation conflicts with the requirement for continuous regions in human parsing. To this end, we design a PC module to broadcast the maximum responses of pose estimation to form the continuous structure in the manner of knowledge distillation. The experimental results on the look-into-person (LIP) and PASCAL-Person-Part datasets demonstrate the superiority of our method over state-of-the-art methods, that is, 55.21% mean Intersection over Union (mIoU) on LIP and 69.88% mIoU on PASCAL-Person-Part.
6. Yuan L, Yang L, Zhang S, Xu Z, Qin J, Shi Y, Yu P, Wang Y, Bao Z, Xia Y, Sun J, He W, Chen T, Chen X, Hu C, Zhang Y, Dong C, Zhao P, Wang Y, Jiang N, Lv B, Xue Y, Jiao B, Gao H, Chai K, Li J, Wang H, Wang X, Guan X, Liu X, Zhao G, Zheng Z, Yan J, Yu H, Chen L, Ye Z, You H, Bao Y, Cheng X, Zhao P, Wang L, Zeng W, Tian Y, Chen M, You Y, Yuan G, Ruan H, Gao X, Xu J, Xu H, Du L, Zhang S, Fu H, Cheng X. Development of a tongue image-based machine learning tool for the diagnosis of gastric cancer: a prospective multicentre clinical cohort study. EClinicalMedicine 2023;57:101834. PMID: 36825238. PMCID: PMC9941057. DOI: 10.1016/j.eclinm.2023.101834.
Abstract
BACKGROUND: Tongue images (the colour, size and shape of the tongue and the colour, thickness and moisture content of the tongue coating), reflecting the health state of the whole body according to the theory of traditional Chinese medicine (TCM), have been widely used in China for thousands of years. Herein, we investigated the value of tongue images and the tongue coating microbiome in the diagnosis of gastric cancer (GC).
METHODS: From May 2020 to January 2021, we simultaneously collected tongue images and tongue coating samples from 328 patients with newly diagnosed GC and 304 non-gastric cancer (NGC) participants in China, and 16S rDNA was used to characterize the microbiome of the tongue coating samples. Then, artificial intelligence (AI) deep learning models were established to evaluate the value of tongue images and the tongue coating microbiome in the diagnosis of GC. Considering that tongue imaging is more convenient and economical as a diagnostic tool, we further conducted a prospective multicentre clinical study from May 2020 to March 2022 in China and recruited 937 patients with GC and 1911 participants with NGC from 10 centres across China to further evaluate the role of tongue images in the diagnosis of GC. Moreover, we verified this approach in another independent external validation cohort that included 294 patients with GC and 521 participants with NGC from 7 centres. This study is registered at ClinicalTrials.gov, NCT01090362.
FINDINGS: For the first time, we found that both tongue images and the tongue coating microbiome can be used as tools for the diagnosis of GC; the area under the curve (AUC) value of the tongue image-based diagnostic model was 0.89. The AUC values of the tongue coating microbiome-based model reached 0.94 using genus data and 0.95 using species data. The results of the prospective multicentre clinical study showed that the AUC values of the three tongue image-based models for GC reached 0.88-0.92 in the internal verification and 0.83-0.88 in the independent external verification, which were significantly superior to the combination of eight blood biomarkers.
INTERPRETATION: Our results suggest that tongue images can be used as a stable method for GC diagnosis and are significantly superior to conventional blood biomarkers. The three kinds of tongue image-based AI deep learning diagnostic models that we developed can adequately distinguish patients with GC from participants with NGC, even those with early GC and precancerous lesions such as atrophic gastritis (AG).
FUNDING: The National Key R&D Program of China (2021YFA0910100), Program of Zhejiang Provincial TCM Sci-tech Plan (2018ZY006), Medical Science and Technology Project of Zhejiang Province (2022KY114, WKJ-ZJ-2104), Zhejiang Provincial Research Center for Upper Gastrointestinal Tract Cancer (JBZX-202006), Natural Science Foundation of Zhejiang Province (HDMY22H160008), Science and Technology Projects of Zhejiang Province (2019C03049), National Natural Science Foundation of China (82074245, 81973634, 82204828), and Chinese Postdoctoral Science Foundation (2022M713203).
Key Words
- AFP, alpha fetoprotein
- AG, atrophic gastritis
- AI, artificial intelligence
- APINet, attentive pairwise interaction neural network
- AUC, area under the curve
- Artificial intelligence
- BC, breast cancer
- CA, carbohydrate antigen
- CEA, carcinoembryonic antigen
- CRC, colorectal cancer
- DT, decision tree learning
- EC, esophageal cancer
- GC, gastric cancer
- Gastric cancer
- HBPC, hepatobiliary pancreatic carcinoma
- HC, healthy control
- KNN, K-nearest neighbours
- LC, lung cancer
- NGC, non-gastric cancers
- PCoA, principal coordinates analysis
- SG, superficial gastritis
- SVM, support vector machine
- TCM, traditional Chinese medicine
- Tongue coating microbiome
- Tongue images
- Traditional Chinese medicine
- TransFG, transformer architecture for fine-grained recognition
Affiliation(s)
- Li Yuan
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Zhejiang Provincial Research Center for Upper Gastrointestinal Tract Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Zhejiang Key Lab of Prevention, Diagnosis and Therapy of Upper Gastrointestinal Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Lin Yang
- Artificial Intelligence and Biomedical Images Analysis Lab, School of Engineering, Westlake University, China
- Shichuan Zhang
- Artificial Intelligence and Biomedical Images Analysis Lab, School of Engineering, Westlake University, China
- Zhiyuan Xu
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Zhejiang Provincial Research Center for Upper Gastrointestinal Tract Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Zhejiang Key Lab of Prevention, Diagnosis and Therapy of Upper Gastrointestinal Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Jiangjiang Qin
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Zhejiang Provincial Research Center for Upper Gastrointestinal Tract Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Zhejiang Key Lab of Prevention, Diagnosis and Therapy of Upper Gastrointestinal Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Yunfu Shi
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Oncology Department, Tongde Hospital of Zhejiang Province, Hangzhou, 310012, China
- Pengcheng Yu
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Yi Wang
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Zhehan Bao
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Yuhang Xia
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Jiancheng Sun
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325099, China
- Weiyang He
- Department of Gastrointestinal Surgery, Sichuan Cancer Hospital, Chengdu, 610042, China
- Tianhui Chen
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Xiaolei Chen
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325099, China
- Can Hu
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Yunlong Zhang
- Artificial Intelligence and Biomedical Images Analysis Lab, School of Engineering, Westlake University, China
- Changwu Dong
- College of Traditional Chinese Medicine, Anhui University of Traditional Chinese Medicine, HeFei, 230038, China
- Ping Zhao
- Department of Gastrointestinal Surgery, Sichuan Cancer Hospital, Chengdu, 610042, China
- Yanan Wang
- College of Traditional Chinese Medicine, Anhui University of Traditional Chinese Medicine, HeFei, 230038, China
- Nan Jiang
- College of Traditional Chinese Medicine, Anhui University of Traditional Chinese Medicine, HeFei, 230038, China
- Bin Lv
- Department of Gastroenterology, First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Yingwei Xue
- Gastrointestinal Surgery, Harbin Medical University Cancer Hospital, Harbin, 150081, China
- Baoping Jiao
- Department of General Surgery, Shanxi Cancer Hospital, Taiyuan, 030013, China
- Hongyu Gao
- Gastrointestinal Surgery, Harbin Medical University Cancer Hospital, Harbin, 150081, China
- Kequn Chai
- Oncology Department, Tongde Hospital of Zhejiang Province, Hangzhou, 310012, China
- Jun Li
- Department of General Surgery, Shanxi Cancer Hospital, Taiyuan, 030013, China
- Hao Wang
- Gastrointestinal Surgery, Harbin Medical University Cancer Hospital, Harbin, 150081, China
- Xibo Wang
- Gastrointestinal Surgery, Harbin Medical University Cancer Hospital, Harbin, 150081, China
- Xiaoqing Guan
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Xu Liu
- Department of Gastrointestinal Surgery, RenJi Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200025, China
- Gang Zhao
- Department of Gastrointestinal Surgery, RenJi Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200025, China
- Zhichao Zheng
- Department of Gastric Surgery, Cancer Hospital of China Medical University (Liaoning Cancer Hospital and Institute), Shenyang, 110042, China
- Jie Yan
- Department of Gastric Surgery, Cancer Hospital of China Medical University (Liaoning Cancer Hospital and Institute), Shenyang, 110042, China
- Haiyue Yu
- Department of Gastric Surgery, Cancer Hospital of China Medical University (Liaoning Cancer Hospital and Institute), Shenyang, 110042, China
- Luchuan Chen
- Department of Gastrointestinal Surgery, Fujian Cancer Hospital, Fujian Medical University Cancer Hospital, Fuzhou, 350014, China
- Zaisheng Ye
- Department of Gastrointestinal Surgery, Fujian Cancer Hospital, Fujian Medical University Cancer Hospital, Fuzhou, 350014, China
- Huaqiang You
- Department of Gastroenterology, Yuhang District People's Hospital, Hangzhou, 311199, China
- Yu Bao
- Department of Gastrointestinal Surgery, Sichuan Cancer Hospital, Chengdu, 610042, China
- Xi Cheng
- Department of Gastrointestinal Surgery, Sichuan Cancer Hospital, Chengdu, 610042, China
- Peizheng Zhao
- Department of Health Management Center, Yueyang Central Hospital, Yueyang, 414000, China
- Liang Wang
- Department of Endoscopy Center, Kecheng District People's Hospital, Quzhou, 324000, China
- Wenting Zeng
- Department of General Surgery, Shanxi Cancer Hospital, Taiyuan, 030013, China
- Yanfei Tian
- Department of Gastric Surgery, Cancer Hospital of China Medical University (Liaoning Cancer Hospital and Institute), Shenyang, 110042, China
- Ming Chen
- Department of Endoscopy Center, Shandong Cancer Hospital, Shandong, 250117, China
- You You
- Department of Health Management Center, Zigong Fourth People's Hospital, Zigong, 643099, China
- Guihong Yuan
- Department of Gastroenterology, Hainan Cancer Hospital, Hainan, 570312, China
- Hua Ruan
- Department of Chinese Surgery, Linping District Hospital of Traditional Chinese Medicine, Hangzhou, 311100, China
- Xiaole Gao
- The First Affiliated Hospital of Henan University of Science and Technology, Zhengzhou, 450062, China
- Jingli Xu
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Handong Xu
- First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- Lingbin Du
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Shengjie Zhang
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Huanying Fu
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Xiangdong Cheng
- Department of Gastric Surgery, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institutes of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, 310022, China
- Zhejiang Provincial Research Center for Upper Gastrointestinal Tract Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Zhejiang Key Lab of Prevention, Diagnosis and Therapy of Upper Gastrointestinal Cancer, Zhejiang Cancer Hospital, Hangzhou, 310022, China
- Corresponding author. Department of Gastric surgery, Zhejiang Cancer Hospital, Banshan Road 1#, Hangzhou, Zhejiang, 310022, China.
7. Wang Y, Feng L, Song X, Xu D, Zhai Y. Zero-Shot Image Classification Method Based on Attention Mechanism and Semantic Information Fusion. Sensors (Basel) 2023;23:2311. PMID: 36850908. PMCID: PMC9966441. DOI: 10.3390/s23042311.
Abstract
Zero-shot image classification (ZSIC) is designed to solve the classification problem when samples are very scarce or a category is missing from training. A common method is to use attribute or word vectors as prior category features (auxiliary information) and to complete the domain transfer from training on seen classes to recognition of unseen classes by building a mapping between image features and these prior category features. However, feature extraction over the whole image lacks discrimination, and single attribute features or word-vector features of categories carry insufficient information, so the match between image features and prior class features is weak, which affects the accuracy of the ZSIC model. To this end, a spatial attention mechanism is designed, and an image feature extraction module based on this attention mechanism is constructed to select critical, discriminative features. A semantic information fusion method based on matrix decomposition is also proposed, which first decomposes the attribute features and then fuses them with the word-vector features extracted from a dataset to expand the available information. Through these two improvements, the classification accuracy of the ZSIC model on unseen images is improved. Experimental results on public datasets verify the effectiveness and superiority of the proposed methods.
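The decomposition-and-fusion step can be illustrated with a few lines of numpy; truncated SVD and simple concatenation are illustrative stand-ins here, and the class counts and dimensions are placeholders rather than the paper's exact formulation.

```python
import numpy as np

def fuse_semantics(attributes, word_vectors, rank=10):
    """attributes: (num_classes, num_attrs) class-attribute matrix.
    word_vectors: (num_classes, word_dim) class word embeddings."""
    # Low-rank decomposition of the attribute matrix (the matrix-factorisation step).
    u, s, vt = np.linalg.svd(attributes, full_matrices=False)
    attr_factors = u[:, :rank] * s[:rank]          # per-class latent attribute factors
    # Fuse the decomposed attribute factors with the word vectors to expand
    # the per-class semantic description.
    fused = np.concatenate([attr_factors, word_vectors], axis=1)
    # L2-normalise so both sources contribute on a comparable scale.
    return fused / (np.linalg.norm(fused, axis=1, keepdims=True) + 1e-8)

# Example: 50 classes with 85 attributes and 300-d word vectors -> (50, 10 + 300) fused features.
```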
Affiliation(s)
- Yaru Wang
- Department of Automation, North China Electric Power University, Baoding 071003, China
- Lilong Feng
- Department of Automation, North China Electric Power University, Baoding 071003, China
- Xiaoke Song
- Department of Automation, North China Electric Power University, Baoding 071003, China
- Dawei Xu
- Department of Automation, North China Electric Power University, Baoding 071003, China
- State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Yongjie Zhai
- Department of Automation, North China Electric Power University, Baoding 071003, China
8. Lu Y, Chen X, Wu Z, Yu J. Decoupled Metric Network for Single-Stage Few-Shot Object Detection. IEEE Transactions on Cybernetics 2023;53:514-525. PMID: 35213322. DOI: 10.1109/tcyb.2022.3149825.
Abstract
Within the last few years, great efforts have been made to study few-shot learning. Although general object detection is advancing at a rapid pace, few-shot detection remains a very challenging problem. In this work, we propose a novel decoupled metric network (DMNet) for single-stage few-shot object detection. We design a decoupled representation transformation (DRT) and an image-level distance metric learning (IDML) to solve the few-shot detection problem. The DRT can eliminate the adverse effect of handcrafted prior knowledge by predicting objectness and anchor shape. Meanwhile, to alleviate the problem of representation disagreement between classification and location (i.e., translational invariance versus translational variance), the DRT adopts a decoupled manner to generate adaptive representations so that the model is easier to learn from only a few training data. As for a few-shot classification in the detection task, we design an IDML tailored to enhance the generalization ability. This module can perform metric learning for the whole visual feature, so it can be more efficient than traditional DML due to the merit of parallel inference for multiobjects. Based on the DRT and IDML, our DMNet efficiently realizes a novel paradigm for few-shot detection, called single-stage metric detection. Experiments are conducted on the PASCAL VOC dataset and the MS COCO dataset. As a result, our method achieves state-of-the-art performance in few-shot object detection. The codes are available at https://github.com/yrqs/DMNet.
9. Zhou X, Shen K, Weng L, Cong R, Zheng B, Zhang J, Yan C. Edge-Guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Transactions on Cybernetics 2023;53:539-552. PMID: 35417369. DOI: 10.1109/tcyb.2022.3163152.
Abstract
Optical remote sensing images (RSIs) have been widely used in many applications, and one of the interesting issues in optical RSIs is salient object detection (SOD). However, due to diverse object types, various object scales, numerous object orientations, and cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades considerably. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds, while they neglect the importance of edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, where the key point lies in the edge-aware position attention unit (EPAU). First, the encoder is used to give salient objects a good representation, that is, multilevel deep features, which are then delivered into two parallel decoders: 1) an edge extraction part and 2) a feature fusion part. The edge extraction module and the encoder form a U-shape architecture, which not only provides accurate salient edge clues but also ensures the integrity of edge information by additionally deploying the intraconnection. That is to say, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module provides the position attention about salient objects, where position cues are sharpened by the effective edge information and are used to recurrently calibrate the misaligned decoding process. After that, we can obtain the final saliency map by fusing all position attention cues. Extensive experiments are conducted on two public optical RSI datasets, and the results show that the proposed ERPNet can accurately and completely pop out salient objects, consistently outperforming the state-of-the-art SOD models.
10. Zhou J, Liu Q. Emotion Analysis Based on Neural Network under the Big Data Environment. Journal of Environmental and Public Health 2022;2022:7123079. PMID: 36203508. PMCID: PMC9532102. DOI: 10.1155/2022/7123079.
Abstract
To address the problems of poor sentiment-tendency prediction and low utilization of syntactic information, this study proposes a big data sentiment analysis method based on neural networks. First, the BERT model is used to vectorize the input data, reducing the semantic loss incurred during vectorization; the word vectors are then fed into a bidirectional LSTM encoder to obtain data features. Finally, the representation produced by the attention layer is used as the final feature vector for sentiment classification, reducing the influence of irrelevant data. The experimental results show that the method achieves high accuracy, recall, and F1 values and can effectively improve the accuracy of fine-grained sentiment classification of ambiguous texts.
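A minimal PyTorch sketch of the described pipeline (pretrained contextual embeddings, a bidirectional LSTM encoder, and attention pooling before classification) is given below; the embedding step is stubbed out, and all sizes are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    def __init__(self, emb_dim=768, hidden=256, num_classes=3):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # scores each time step
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, emb_dim), e.g. the output of a BERT encoder.
        states, _ = self.encoder(token_embeddings)
        weights = torch.softmax(self.attn(states), dim=1)    # (batch, seq_len, 1)
        pooled = (weights * states).sum(dim=1)               # attention-weighted sentence vector
        return self.classifier(pooled)

# logits = BiLSTMAttentionClassifier()(torch.randn(4, 32, 768))  # -> shape (4, 3)
```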
Affiliation(s)
- Jing Zhou
- Department of Computer School, Huanggang Normal University, Huanggang, Hubei 438000, China
- Quanju Liu
- Department of Computer School, Huanggang Normal University, Huanggang, Hubei 438000, China
11. Chen J, Li H, Liang J, Su X, Zhai Z, Chai X. Attention-based cropping and erasing learning with coarse-to-fine refinement for fine-grained visual classification. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.06.041.
12. Cheng J, Wang L, Wu J, Hu X, Jeon G, Tao D, Zhou M. Visual Relationship Detection: A Survey. IEEE Transactions on Cybernetics 2022;52:8453-8466. PMID: 35077387. DOI: 10.1109/tcyb.2022.3142013.
Abstract
Visual relationship detection (VRD) is a newly developed computer vision task that aims to recognize relations or interactions between objects in an image. It is a further learning task after object recognition and is important for fully understanding images and even the visual world. It has numerous applications, such as image retrieval, machine vision in robotics, visual question answering (VQA), and visual reasoning. However, this problem is difficult since relationships are not definite and the number of possible relations is much larger than the number of objects, so complete annotation of visual relationships is much harder to obtain, making this task hard to learn. Many approaches have been proposed to tackle this problem, especially with the development of deep neural networks in recent years. In this survey, we first introduce the background of visual relations. Then, we present a categorization and the frameworks of deep learning models for visual relationship detection. High-level applications, benchmark datasets, and an empirical analysis are also introduced for a comprehensive understanding of this task.
13. Yu Q, Song S, Ma C, Wei J, Chen S, Tan KC. Temporal Encoding and Multispike Learning Framework for Efficient Recognition of Visual Patterns. IEEE Transactions on Neural Networks and Learning Systems 2022;33:3387-3399. PMID: 33531306. DOI: 10.1109/tnnls.2021.3052804.
Abstract
Biological systems, built on parallel and spike-based computation, endow individuals with the ability to respond promptly and reliably to different stimuli. Spiking neural networks (SNNs) have thus been developed to emulate their efficiency and to explore principles of spike-based processing. However, the design of a biologically plausible and efficient SNN for image classification still remains a challenging task. Previous efforts can be generally clustered into two major categories in terms of the coding scheme employed: rate and temporal. The rate-based schemes suffer from inefficiency, whereas the temporal-based ones typically end with relatively poor accuracy. It is intriguing and important to develop an SNN with both efficiency and efficacy in mind. In this article, we focus on the temporal-based approaches and advance their accuracy by a great margin while preserving their efficiency. A new temporal-based framework integrated with multispike learning is developed for efficient recognition of visual patterns. Different approaches to encoding and learning under our framework are evaluated with the MNIST and Fashion-MNIST data sets. Experimental results demonstrate the efficient and effective performance of our temporal-based approaches across a variety of conditions, improving accuracies to levels that are even comparable to rate-based ones but, importantly, with a lighter network structure and far fewer spikes. This article attempts to extend advanced multispike learning to the challenging task of image recognition and to bring the state of the art in temporal-based approaches to a new level. The experimental results could be potentially favorable to low-power and high-speed requirements in the field of artificial intelligence and contribute to attracting more effort toward brain-like computing.
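Temporal (latency) coding, one common way to realize the encoding step above, maps stronger pixel intensities to earlier spike times; the numpy toy below illustrates the idea (the linear mapping and time window are assumptions, not the specific encoder used in the paper).

```python
import numpy as np

def latency_encode(image, t_max=100.0):
    """Map pixel intensities in [0, 1] to spike times in [0, t_max]:
    the brighter the pixel, the earlier its (single) spike."""
    intensity = np.clip(np.asarray(image, dtype=float), 0.0, 1.0)
    spike_times = t_max * (1.0 - intensity)
    spike_times[intensity == 0.0] = np.inf      # silent pixels never fire
    return spike_times

# A 28x28 MNIST-like image becomes a 28x28 array of spike times, i.e. at most one
# spike per input neuron - the sparsity that temporal codes rely on for efficiency.
```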
14. Zhang Y, Cheng C, Wang S, Xia T. Emotion recognition using heterogeneous convolutional neural networks combined with multimodal factorized bilinear pooling. Biomed Signal Process Control 2022. DOI: 10.1016/j.bspc.2022.103877.
15. Liu X, Ji Z, Pang Y, Han J, Li X. DGIG-Net: Dynamic Graph-in-Graph Networks for Few-Shot Human-Object Interaction. IEEE Transactions on Cybernetics 2022;52:7852-7864. PMID: 33566778. DOI: 10.1109/tcyb.2021.3049537.
Abstract
Few-shot learning (FSL) for human-object interaction (HOI) aims at recognizing various relationships between human actions and surrounding objects only from a few samples. It is a challenging vision task, in which the diversity and interactivity of human actions result in great difficulty to learn an adaptive classifier to catch ambiguous interclass information. Therefore, traditional FSL methods usually perform unsatisfactorily in complex HOI scenes. To this end, we propose dynamic graph-in-graph networks (DGIG-Net), a novel graph prototypes framework to learn a dynamic metric space by embedding a visual subgraph to a task-oriented cross-modal graph for few-shot HOI. Specifically, we first build a knowledge reconstruction graph to learn latent representations for HOI categories by reconstructing the relationship among visual features, which generates visual representations under the category distribution of every task. Then, a dynamic relation graph integrates both reconstructible visual nodes and dynamic task-oriented semantic information to explore a graph metric space for HOI class prototypes, which applies the discriminative information from the similarities among actions or objects. We validate DGIG-Net on multiple benchmark datasets, on which it largely outperforms existing FSL approaches and achieves state-of-the-art results.
16. Ji Z, Yu X, Yu Y, Pang Y, Zhang Z. Semantic-Guided Class-Imbalance Learning Model for Zero-Shot Image Classification. IEEE Transactions on Cybernetics 2022;52:6543-6554. PMID: 34043516. DOI: 10.1109/tcyb.2020.3004641.
Abstract
In this article, we focus on the task of zero-shot image classification (ZSIC) that equips a learning system with the ability to recognize visual images from unseen classes. In contrast to the traditional image classification, ZSIC more easily suffers from the class-imbalance issue since it is more concerned with the class-level knowledge transferring capability. In the real world, the sample numbers of different categories generally follow a long-tailed distribution, and the discriminative information in the sample-scarce seen classes is hard to transfer to the related unseen classes in the traditional batch-based training manner, which degrades the overall generalization ability a lot. To alleviate the class-imbalance issue in ZSIC, we propose a sample-balanced training process to encourage all training classes to contribute equally to the learned model. Specifically, we randomly select the same number of images from each class across all training classes to form a training batch to ensure that the sample-scarce classes contribute equally as those classes with sufficient samples during each iteration. Considering that the instances from the same class differ in class representativeness, we further develop an efficient semantic-guided feature fusion model to obtain the discriminative class visual prototype for the following visual-semantic interaction process via distributing different weights to the selected samples based on their class representativeness. Extensive experiments on three imbalanced ZSIC benchmark datasets for both traditional ZSIC and generalized ZSIC tasks demonstrate that our approach achieves promising results, especially for the unseen categories that are closely related to the sample-scarce seen categories. Besides, the experimental results on two class-balanced datasets show that the proposed approach also improves the classification performance against the baseline model.
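The sample-balanced batch construction described above can be sketched in a few lines; the batch composition below (classes per batch, samples per class) is illustrative, not the paper's setting, and it assumes the label set contains at least the requested number of distinct classes.

```python
import numpy as np

def balanced_batch(labels, classes_per_batch=16, samples_per_class=4, rng=None):
    """Draw the same number of samples from each selected class, so that
    sample-scarce classes contribute as much as well-populated ones."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    classes = rng.choice(np.unique(labels), size=classes_per_batch, replace=False)
    batch = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        # Sample with replacement when a class has fewer samples than requested.
        batch.extend(rng.choice(idx, size=samples_per_class,
                                replace=len(idx) < samples_per_class))
    return np.array(batch)

# indices = balanced_batch(train_labels); feed images[indices] to one training iteration.
```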
17. Tang L, Chen K, Wu C, Hong Y, Jia K, Yang ZX. Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors. IEEE Transactions on Cybernetics 2022;52:4949-4959. PMID: 33095729. DOI: 10.1109/tcyb.2020.3025798.
Abstract
Existing deep learning algorithms for point cloud analysis mainly concern discovering semantic patterns from the global configuration of local geometries in a supervised learning manner. However, very few explore geometric properties revealing local surface manifolds embedded in 3-D Euclidean space to discriminate semantic classes or object parts as additional supervision signals. This article is the first attempt to propose a unique multitask geometric learning network to improve semantic analysis by auxiliary geometric learning with local shape properties, which can be either generated via physical computation from point clouds themselves as self-supervision signals or provided as privileged information. Owing to explicitly encoding local shape manifolds in favor of semantic analysis, the proposed geometric self-supervised and privileged learning algorithms can achieve superior performance to their backbone baselines and other state-of-the-art methods, which are verified in the experiments on the popular benchmarks.
18. Zhang L, Shang Y, Li P, Luo H, Shao L. Community-Aware Photo Quality Evaluation by Deeply Encoding Human Perception. IEEE Transactions on Cybernetics 2022;52:3136-3146. PMID: 32735541. DOI: 10.1109/tcyb.2019.2937319.
Abstract
Computational photo quality evaluation is a useful technique in many tasks of computer vision and graphics, for example, photo retargeting, 3-D rendering, and fashion recommendation. Conventional photo quality models are designed by characterizing the pictures from all communities (e.g., "architecture" and "colorful") indiscriminately, wherein community-specific features are not exploited explicitly. In this article, we develop a new community-aware photo quality evaluation framework. It uncovers the latent community-specific topics by a regularized latent topic model (LTM) and captures human visual quality perception by exploring multiple attributes. More specifically, given massive-scale online photographs from multiple communities, a novel ranking algorithm is proposed to measure the visual/semantic attractiveness of regions inside each photograph. Meanwhile, three attributes, namely: 1) photo quality scores; 2) weak semantic tags; and 3) inter-region correlations, are seamlessly and collaboratively incorporated during ranking. Subsequently, we construct the gaze shifting path (GSP) for each photograph by sequentially linking the top-ranking regions from each photograph, and an aggregation-based CNN calculates the deep representation for each GSP. Based on this, an LTM is proposed to model the GSP distribution from multiple communities in the latent space. To mitigate the overfitting problem caused by communities with very few photographs, a regularizer is incorporated into our LTM. Finally, given a test photograph, we obtain its deep GSP representation, and its quality score is determined by the posterior probability of the regularized LTM. Comparative studies on four image sets have shown the competitiveness of our method. Besides, the eye-tracking experiments have demonstrated that our ranking-based GSPs are highly consistent with real human gaze movements.
19. Zheng X, Gong T, Lu X, Li X. Human action recognition by multiple spatial clues network. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.01.091.
20. de Santana Correia A, Colombini EL. Attention, please! A survey of neural attention models in deep learning. Artif Intell Rev 2022. DOI: 10.1007/s10462-022-10148-x.
21. Liu M, Hu H, Li L, Yu Y, Guan W. Chinese Image Caption Generation via Visual Attention and Topic Modeling. IEEE Transactions on Cybernetics 2022;52:1247-1257. PMID: 32568717. DOI: 10.1109/tcyb.2020.2997034.
Abstract
Automatic image captioning is to conduct the cross-modal conversion from image visual content to natural language text. Involving computer vision (CV) and natural language processing (NLP), it has become one of the most sophisticated research issues in the artificial-intelligence area. Based on the deep neural network, the neural image caption (NIC) model has achieved remarkable performance in image captioning, yet there still remain some essential challenges, such as the deviation between descriptive sentences generated by the model and the intrinsic content expressed by the image, the low accuracy of the image scene description, and the monotony of generated sentences. In addition, most of the current datasets and methods for image captioning are in English. However, considering the distinction between Chinese and English in syntax and semantics, it is necessary to develop specialized Chinese image caption generation methods to accommodate the difference. To solve the aforementioned problems, we design the NICVATP2L model via visual attention and topic modeling, in which the visual attention mechanism reduces the deviation and the topic model improves the accuracy and diversity of generated sentences. Specifically, in the encoding phase, convolutional neural network (CNN) and topic model are used to extract visual and topic features of the input images, respectively. In the decoding phase, an attention mechanism is applied to processing image visual features for obtaining image visual region features. Finally, the topic features and the visual region features are combined to guide the two-layer long short-term memory (LSTM) network for generating Chinese image captions. To justify our model, we have conducted experiments over the Chinese AIC-ICC image dataset. The experimental results show that our model can automatically generate more informative and descriptive captions in Chinese in a more natural way, and it outperforms the existing image captioning NIC model.
22. Han J, Yao X, Cheng G, Feng X, Xu D. P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022;44:579-590. PMID: 31398107. DOI: 10.1109/tpami.2019.2933510.
Abstract
This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters are learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates part features and global feature into a joint feature for the final classification. In order to learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses on training hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.
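The focal weighting mentioned above (down-weighting easy examples so that training concentrates on hard ones) follows the standard focal-loss idea; the generic multi-class PyTorch sketch below shows that idea only and does not reproduce the paper's Duplex Focal Loss pairing with metric learning.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss: cross-entropy scaled by (1 - p_t)^gamma,
    so well-classified (easy) examples contribute little to the gradient."""
    log_probs = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_probs, targets, reduction="none")            # per-sample cross-entropy
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp() # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

# loss = focal_loss(model(images), labels)
```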
23.
Abstract
Cross-modal retrieval aims to search samples of one modality via queries of other modalities, which is a hot issue in the community of multimedia. However, two main challenges, i.e., heterogeneity gap and semantic interaction across different modalities, have not been solved efficaciously. Reducing the heterogeneous gap can improve the cross-modal similarity measurement. Meanwhile, modeling cross-modal semantic interaction can capture the semantic correlations more accurately. To this end, this paper presents a novel end-to-end framework, called Dual Attention Generative Adversarial Network (DA-GAN). This technique is an adversarial semantic representation model with a dual attention mechanism, i.e., intra-modal attention and inter-modal attention. Intra-modal attention is used to focus on the important semantic feature within a modality, while inter-modal attention is to explore the semantic interaction between different modalities and then represent the high-level semantic correlation more precisely. A dual adversarial learning strategy is designed to generate modality-invariant representations, which can reduce the cross-modal heterogeneity efficiently. The experiments on three commonly used benchmarks show the better performance of DA-GAN than these competitors.
24. Lv Z, Wang W, Xu Z, Zhang K, Fan Y, Song Y. Fine-grained object detection method using attention mechanism and its application in coal–gangue detection. Appl Soft Comput 2021. DOI: 10.1016/j.asoc.2021.107891.
25.
Abstract
Classifying fine-grained categories (e.g., bird species, car, and aircraft types) is a crucial problem in image understanding and is difficult due to intra-class and inter-class variance. Most of the existing fine-grained approaches individually utilize various parts and local information of objects to improve the classification accuracy but neglect the mechanism of the feature fusion between the object (global) and object’s parts (local) to reinforce fine-grained features. In this paper, we present a novel framework, namely object–part registration–fusion Net (OR-Net), which considers the mechanism of registration and fusion between an object (global) and its parts’ (local) features for fine-grained classification. Our model learns the fine-grained features from the object of global and local regions and fuses these features with the registration mechanism to reinforce each region’s characteristics in the feature maps. Precisely, OR-Net consists of: (1) a multi-stream feature extraction net, which generates features with global and various local regions of objects; (2) a registration–fusion feature module calculates the dimension and location relationships between global (object) regions and local (parts) regions to generate the registration information and fuses the local features into the global features with registration information to generate the fine-grained feature. Experiments execute symmetric GPU devices with symmetric mini-batch to verify that OR-Net surpasses the state-of-the-art approaches on CUB-200-2011 (Birds), Stanford-Cars, and Stanford-Aircraft datasets.
26. Ye H, Li H, Chen CLP. Adaptive Deep Cascade Broad Learning System and Its Application in Image Denoising. IEEE Transactions on Cybernetics 2021;51:4450-4463. PMID: 32203051. DOI: 10.1109/tcyb.2020.2978500.
Abstract
This article proposes a novel regularized deep cascade broad learning system (DCBLS) architecture, which includes one cascaded feature mapping nodes layer and one cascaded enhancement nodes layer. The transformation feature representation is then easily obtained by incorporating the enhancement nodes and the feature mapping nodes. Once such a representation is established, a final output layer is constructed by implementing a simple convex optimization model. Furthermore, a parallelization framework for the new method is designed to make it compatible with large-scale data. Simultaneously, an adaptive regularization parameter criterion is adopted under some conditions. Moreover, the stability and error estimate of this method are discussed and proved mathematically. The proposed method can extract sufficient available information from the raw data compared with the standard broad learning system and achieves compelling results in image denoising. The experimental results on benchmark datasets, including natural images as well as hyperspectral images, verify the effectiveness and superiority of the proposed method in comparison with state-of-the-art approaches for image denoising.
27. Liu X, Zhang L, Li T, Wang D, Wang Z. Dual attention guided multi-scale CNN for fine-grained image classification. Inf Sci (N Y) 2021. DOI: 10.1016/j.ins.2021.05.040.
28. Chen X, Han Z, Liu X, Li Z, Fang T, Huo H, Li Q, Zhu M, Liu M, Yuan H. Semantic boundary enhancement and position attention network with long-range dependency for semantic segmentation. Appl Soft Comput 2021. DOI: 10.1016/j.asoc.2021.107511.
29. Wang H, Peng J, Jiang G, Xu F, Fu X. Discriminative feature and dictionary learning with part-aware model for vehicle re-identification. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.06.148.
30.
31. Li Y, Zhang Y, Zhu Z. Error-Tolerant Deep Learning for Remote Sensing Image Scene Classification. IEEE Transactions on Cybernetics 2021;51:1756-1768. PMID: 32413949. DOI: 10.1109/tcyb.2020.2989241.
Abstract
Due to its various application potentials, the remote sensing image scene classification (RSSC) has attracted a broad range of interests. While the deep convolutional neural network (CNN) has recently achieved tremendous success in RSSC, its superior performances highly depend on a large number of accurately labeled samples which require lots of time and manpower to generate for a large-scale remote sensing image scene dataset. In contrast, it is not only relatively easy to collect coarse and noisy labels but also inevitable to introduce label noise when collecting large-scale annotated data in the remote sensing scenario. Therefore, it is of great practical importance to robustly learn a superior CNN-based classification model from the remote sensing image scene dataset containing non-negligible or even significant error labels. To this end, this article proposes a new RSSC-oriented error-tolerant deep learning (RSSC-ETDL) approach to mitigate the adverse effect of incorrect labels of the remote sensing image scene dataset. In our proposed RSSC-ETDL method, learning multiview CNNs and correcting error labels are alternatively conducted in an iterative manner. It is noted that to make the alternative scheme work effectively, we propose a novel adaptive multifeature collaborative representation classifier (AMF-CRC) that benefits from adaptively combining multiple features of CNNs to correct the labels of uncertain samples. To quantitatively evaluate the performance of error-tolerant methods in the remote sensing domain, we construct remote sensing image scene datasets with: 1) simulated noisy labels by corrupting the open datasets with varying error rates and 2) real noisy labels by deploying the greedy annotation strategies that are practically used to accelerate the process of annotating remote sensing image scene datasets. Extensive experiments on these datasets demonstrate that our proposed RSSC-ETDL approach outperforms the state-of-the-art approaches.
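Corrupting a clean label set at a controlled error rate, as in the simulated-noise evaluation described above, takes only a few lines of numpy; uniform corruption is an assumption here (the paper also considers noise produced by greedy annotation strategies).

```python
import numpy as np

def corrupt_labels(labels, error_rate, num_classes, rng=None):
    """Flip a fraction `error_rate` of labels to a different, uniformly chosen class."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels).copy()
    n_flip = int(round(error_rate * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in flip_idx:
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(wrong_classes)
    return labels

# noisy_labels = corrupt_labels(clean_labels, error_rate=0.2, num_classes=45)
```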
Collapse
|
32
|
Feng Y, Yuan Y, Lu X. Person Reidentification via Unsupervised Cross-View Metric Learning. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:1849-1859. [PMID: 31021787 DOI: 10.1109/tcyb.2019.2909480] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Person reidentification (Re-ID) aims to match observations of individuals across multiple nonoverlapping camera views. Recently, metric learning-based methods have played important roles in addressing this task. However, metrics are mostly learned in supervised manners, of which the performance relies heavily on the quantity and quality of manual annotations. Meanwhile, metric learning-based algorithms generally project person features into a common subspace, in which the extracted features are shared by all views. However, it may result in information loss since these algorithms neglect the view-specific features. Besides, they assume person samples of different views are taken from the same distribution. In reality, these samples are more likely to obey different distributions due to view condition changes. To this end, this paper proposes an unsupervised cross-view metric learning method based on the properties of data distributions. Specifically, person samples in each view are taken from a mixture of two distributions: one models common properties among camera views and the other focuses on view-specific properties. Based on this, we introduce a shared mapping to explore the shared features. Meanwhile, we construct view-specific mappings to extract and project view-related features into a common subspace. As a result, samples in the transformed subspace follow the same distribution and are equipped with comprehensive representations. In this paper, these mappings are learned in an unsupervised manner by clustering samples in the projected space. Experimental results on five cross-view datasets validate the effectiveness of the proposed method.
Collapse
|
33
|
Ying Y, Zhang N, Shan P, Miao L, Sun P, Peng S. PSigmoid: Improving squeeze-and-excitation block with parametric sigmoid. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02247-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
34
|
Jiang G, Wang H, Peng J, Chen D, Fu X. Graph-based Multi-view Binary Learning for image clustering. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.07.132] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
35
|
Wu L, Wang Y, Gao J, Wang M, Zha ZJ, Tao D. Deep Coattention-Based Comparator for Relative Representation Learning in Person Re-Identification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:722-735. [PMID: 32275611 DOI: 10.1109/tnnls.2020.2979190] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Person re-identification (re-ID) favors discriminative representations over unseen shots to recognize identities in disjoint camera views. Effective methods are developed via pair-wise similarity learning to detect a fixed set of region features, which can be mapped to compute the similarity value. However, relevant parts of each image are detected independently without referring to the correlation on the other image. Also, region-based methods spatially position local features for their aligned similarities. In this article, we introduce the deep coattention-based comparator (DCC) to fuse codependent representations of paired images so as to correlate the best relevant parts and produce their relative representations accordingly. The proposed approach mimics the human foveation to detect the distinct regions concurrently across images and alternatively attends to fuse them into the similarity learning. Our comparator is capable of learning representations relative to a test shot and well-suited to reidentifying pedestrians in surveillance. We perform extensive experiments to provide the insights and demonstrate the state of the arts achieved by our method in benchmark data sets: 1.2 and 2.5 points gain in mean average precision (mAP) on DukeMTMC-reID and Market-1501, respectively.
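The co-attention step described above can be illustrated with a short sketch in which each image's region features attend to the other's through a shared affinity matrix before the pair is scored. The region count, feature dimension, and the simple cosine-style score are assumptions; this is not the DCC architecture itself.

```python
# Co-attention between two sets of region features and a pairwise score.
import torch
import torch.nn.functional as F

def coattention_score(feat_a, feat_b):
    """feat_a, feat_b: (num_regions, dim) region features of two images."""
    # Affinity between every region of image A and every region of image B.
    affinity = feat_a @ feat_b.t()                      # (Na, Nb)
    # Attend A over B and B over A using the shared affinity matrix.
    attn_a = F.softmax(affinity, dim=1)                 # rows: A attends to B
    attn_b = F.softmax(affinity.t(), dim=1)             # rows: B attends to A
    ctx_a = attn_a @ feat_b                             # B-context for each A region
    ctx_b = attn_b @ feat_a                             # A-context for each B region
    # Relative representations: original features fused with cross-context.
    rel_a = F.normalize((feat_a + ctx_a).mean(dim=0), dim=0)
    rel_b = F.normalize((feat_b + ctx_b).mean(dim=0), dim=0)
    # Similarity of the pair (higher means more likely the same identity).
    return torch.dot(rel_a, rel_b)

# Toy usage with 6 regions of 128-D features per image.
a, b = torch.randn(6, 128), torch.randn(6, 128)
print(float(coattention_score(a, b)))
```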
Collapse
|
36
|
Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation. Neural Netw 2021; 137:188-199. [PMID: 33647536 DOI: 10.1016/j.neunet.2021.01.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2020] [Revised: 12/16/2020] [Accepted: 01/22/2021] [Indexed: 11/22/2022]
Abstract
The encoder-decoder structure has been introduced into semantic segmentation to improve the spatial accuracy of the network by fusing high- and low-level feature maps. However, recent state-of-the-art encoder-decoder-based methods can hardly attain the real-time requirement due to their complex and inefficient decoders. To address this issue, in this paper, we propose a lightweight bilateral attention decoder for real-time semantic segmentation. It consists of two blocks and can fuse different level feature maps via two steps, i.e., information refinement and information fusion. In the first step, we propose a channel attention branch to refine the high-level feature maps and a spatial attention branch for the low-level ones. The refined high-level feature maps can capture more exact semantic information and the refined low-level ones can capture more accurate spatial information, which significantly improves the information capturing ability of these feature maps. In the second step, we develop a new fusion module named pooling fusing block to fuse the refined high- and low-level feature maps. This fusion block can take full advantages of the high- and low-level feature maps, leading to high-quality fusion results. To verify the efficiency of the proposed bilateral attention decoder, we adopt a lightweight network as the backbone and compare our proposed method with other state-of-the-art real-time semantic segmentation methods on the Cityscapes and Camvid datasets. Experimental results demonstrate that our proposed method can achieve better performance with a higher inference speed. Moreover, we compare our proposed network with several state-of-the-art non-real-time semantic segmentation methods and find that our proposed network can also attain better segmentation performance.
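A minimal sketch of the two-branch refinement described above, assuming PyTorch and arbitrary channel sizes: channel attention re-weights the high-level map, spatial attention masks the low-level map, and the two are fused after upsampling. The additive fusion here is a simplification of the paper's pooling fusing block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilateralAttentionDecoder(nn.Module):
    def __init__(self, high_ch=256, low_ch=64, out_ch=64):
        super().__init__()
        # Channel attention branch for semantic (high-level) features.
        self.channel_fc = nn.Sequential(
            nn.Linear(high_ch, high_ch // 4), nn.ReLU(inplace=True),
            nn.Linear(high_ch // 4, high_ch), nn.Sigmoid())
        # Spatial attention branch for detailed (low-level) features.
        self.spatial_conv = nn.Conv2d(low_ch, 1, kernel_size=7, padding=3)
        self.high_proj = nn.Conv2d(high_ch, out_ch, kernel_size=1)
        self.low_proj = nn.Conv2d(low_ch, out_ch, kernel_size=1)

    def forward(self, high, low):
        # Refine the high-level map with per-channel weights from global pooling.
        w = self.channel_fc(F.adaptive_avg_pool2d(high, 1).flatten(1))
        high = high * w.unsqueeze(-1).unsqueeze(-1)
        # Refine the low-level map with a single-channel spatial attention mask.
        low = low * torch.sigmoid(self.spatial_conv(low))
        # Upsample the refined high-level map and fuse the two streams.
        high = F.interpolate(self.high_proj(high), size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        return high + self.low_proj(low)

# Toy usage: 1/16-resolution semantic features and 1/4-resolution details.
dec = BilateralAttentionDecoder()
fused = dec(torch.randn(2, 256, 16, 16), torch.randn(2, 64, 64, 64))
print(fused.shape)  # torch.Size([2, 64, 64, 64])
```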
Collapse
|
37
|
Ji Z, Liu X, Pang Y, Ouyang W, Li X. Few-Shot Human-Object Interaction Recognition With Semantic-Guided Attentive Prototypes Network. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:1648-1661. [PMID: 33382652 DOI: 10.1109/tip.2020.3046861] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Extreme instance imbalance among categories and combinatorial explosion make the recognition of Human-Object Interaction (HOI) a challenging task. Few studies have addressed both challenges directly. Motivated by the success of few-shot learning that learns a robust model from a few instances, we formulate HOI as a few-shot task in a meta-learning framework to alleviate the above challenges. Due to the fact that the intrinsical characteristic of HOI is diverse and interactive, we propose a Semantic-guided Attentive Prototypes Network (SAPNet) framework to learn a semantic-guided metric space where HOI recognition can be performed by computing distances to attentive prototypes of each class. Specifically, the model generates attentive prototypes guided by the category names of actions and objects, which highlight the commonalities of images from the same class in HOI. In addition, we design two alternative prototypes calculation methods, i.e., Prototypes Shift (PS) approach and Hallucinatory Graph Prototypes (HGP) approach, which explore to learn a suitable category prototypes representations in HOI. Finally, in order to realize the task of few-shot HOI, we reorganize 2 HOI benchmark datasets with 2 split strategies, i.e., HICO-NN, TUHOI-NN, HICO-NF, and TUHOI-NF. Extensive experimental results on these datasets have demonstrated the effectiveness of our proposed SAPNet approach.
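The prototype computation described above can be sketched as follows: attention weights derived from a class semantic vector re-weight the support embeddings, and queries are assigned to the nearest attentive prototype. The shared 64-dimensional embedding space and the dot-product attention are assumptions for illustration, not the SAPNet design.

```python
import torch
import torch.nn.functional as F

def attentive_prototypes(support, semantic):
    """support: (n_class, n_shot, dim); semantic: (n_class, dim)."""
    # Attention of each support sample to its class semantic vector.
    scores = torch.einsum("csd,cd->cs", support, semantic)    # (n_class, n_shot)
    weights = F.softmax(scores, dim=1).unsqueeze(-1)          # (n_class, n_shot, 1)
    return (weights * support).sum(dim=1)                     # (n_class, dim)

def classify(query, prototypes):
    """query: (n_query, dim) -> predicted class indices by nearest prototype."""
    dists = torch.cdist(query, prototypes)                    # (n_query, n_class)
    return dists.argmin(dim=1)

# Toy usage: a 5-way 3-shot episode with 64-D visual and semantic embeddings.
support = torch.randn(5, 3, 64)
semantic = torch.randn(5, 64)
query = torch.randn(10, 64)
protos = attentive_prototypes(support, semantic)
print(classify(query, protos))
```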
Collapse
|
38
|
Zhang M, Tian G, Zhang Y, Duan P. Service skill improvement for home robots: Autonomous generation of action sequence based on reinforcement learning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106605] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
|
39
|
Li X, Li Z, Yang D, Zhong L, Huang L, Lin J. Research on Finger Vein Image Segmentation and Blood Sampling Point Location in Automatic Blood Collection. SENSORS (BASEL, SWITZERLAND) 2020; 21:132. [PMID: 33379213 PMCID: PMC7795357 DOI: 10.3390/s21010132] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/27/2020] [Revised: 12/22/2020] [Accepted: 12/23/2020] [Indexed: 11/16/2022]
Abstract
In the automatic fingertip blood sampling process, placing the blood sampling point in the fingertip venous area greatly increases the amount of blood that can be collected without squeezing. In order to accurately locate the blood sampling point within the venous area, we propose a new finger vein image segmentation approach based on the Gabor transform and a Gaussian mixture model (GMM). First, the Gabor filter parameters are set adaptively according to the differential excitation of the image, and the local binary pattern (LBP) is used to fuse the same-scale, multi-orientation Gabor features of the image. Then, finger vein image segmentation is achieved by the Gabor-GMM system and optimized by a max-flow/min-cut method based on the relative entropy of the foreground and the background. Finally, the blood sampling point is localized with corner detection. The experimental results show that the proposed approach performs well in segmenting finger vein images, reaching an average segmentation accuracy of 91.6%.
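A rough sketch of the Gabor-plus-GMM pipeline, under simplifying assumptions: a fixed eight-orientation Gabor bank replaces the adaptively tuned filters, and a two-component Gaussian mixture labels pixels directly, without the LBP fusion or the max-flow/min-cut refinement used in the paper.

```python
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

def gabor_gmm_segment(gray):
    """gray: uint8 single-channel image -> boolean foreground mask."""
    gray = gray.astype(np.float32) / 255.0
    responses = []
    for theta in np.arange(0, np.pi, np.pi / 8):     # 8 orientations, one scale
        # (ksize, sigma, theta, lambd, gamma, psi); psi=0 gives the cosine Gabor.
        kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
    # Per-pixel feature vector = responses across orientations.
    features = np.stack(responses, axis=-1).reshape(-1, len(responses))
    gmm = GaussianMixture(n_components=2, random_state=0).fit(features)
    labels = gmm.predict(features).reshape(gray.shape)
    # Treat the lower-response component as "vein"; which component is the
    # vein depends on the imaging setup, so inspect gmm.means_ in practice.
    vein_component = int(np.argmin(gmm.means_.mean(axis=1)))
    return labels == vein_component

# Toy usage on a synthetic image with a dark vertical stripe.
img = np.full((64, 64), 200, np.uint8)
img[:, 28:36] = 60
mask = gabor_gmm_segment(img)
print("foreground pixels:", int(mask.sum()))
```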
Collapse
Affiliation(s)
- Xi Li
- School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; (X.L.); (D.Y.); (L.Z.); (L.H.)
- Foundation Department, Chongqing Medical and Pharmaceutical College, Chongqing 401331, China
| | - Zhangyong Li
- School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
| | - Dewei Yang
- School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; (X.L.); (D.Y.); (L.Z.); (L.H.)
| | - Lisha Zhong
- School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; (X.L.); (D.Y.); (L.Z.); (L.H.)
| | - Lian Huang
- School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; (X.L.); (D.Y.); (L.Z.); (L.H.)
| | - Jinzhao Lin
- School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; (X.L.); (D.Y.); (L.Z.); (L.H.)
- Correspondence:
| |
Collapse
|
40
|
Zheng Y, Zheng X, Lu X, Wu S. Spatial attention based visual semantic learning for action recognition in still images. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.07.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
41
|
Convolutional Attention Network with Maximizing Mutual Information for Fine-Grained Image Classification. Symmetry (Basel) 2020. [DOI: 10.3390/sym12091511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Fine-grained image classification has seen a great improvement benefiting from the advantages of deep learning techniques. Most fine-grained image classification methods focus on extracting discriminative features and combining the global features with the local ones. However, the accuracy is limited due to the inter-class similarity and the intra-class divergence, as well as the lack of enough labelled images to train a deep network which can generalize to fine-grained classes. To deal with these problems, we develop an algorithm which combines Maximizing the Mutual Information (MMI) with the Learning Attention (LA). We make use of MMI to distill knowledge from the image pairs which contain the same object. Meanwhile, we take advantage of the LA mechanism to find the salient region of the image to enhance the information distillation. Our model can extract more discriminative semantic features and improve the performance on fine-grained image classification. Our model has a symmetric structure, in which the paired images are inputted into the same network to extract the local and global features for the subsequent MMI and LA modules. We train the model by maximizing the mutual information and minimizing the cross-entropy alternately, stage by stage. Experiments show that our model can effectively improve the performance of fine-grained image classification.
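The two components named above, attention-based pooling and mutual-information maximization between paired images, can be sketched as follows. The backbone is omitted, InfoNCE is used as a tractable surrogate for the mutual-information objective, and the channel and embedding sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentivePool(nn.Module):
    def __init__(self, channels=256, dim=128):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # saliency scores
        self.proj = nn.Linear(channels, dim)

    def forward(self, fmap):                                 # fmap: (N, C, H, W)
        n, c, h, w = fmap.shape
        weights = F.softmax(self.attn(fmap).view(n, -1), dim=1)       # (N, H*W)
        pooled = torch.bmm(weights.unsqueeze(1),                       # weighted sum
                           fmap.view(n, c, -1).transpose(1, 2)).squeeze(1)
        return F.normalize(self.proj(pooled), dim=1)

def info_nce(z1, z2, temperature=0.1):
    """Treat (z1[i], z2[i]) as the positive pair; all other z2 are negatives."""
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage: feature maps of 8 paired images (same class within a pair).
pool = AttentivePool()
f1, f2 = torch.randn(8, 256, 14, 14), torch.randn(8, 256, 14, 14)
loss = info_nce(pool(f1), pool(f2))
print(float(loss))
```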
Collapse
|
42
|
Du C, Yuan J, Dong J, Li L, Chen M, Li T. GPU based parallel optimization for real time panoramic video stitching. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2019.06.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
43
|
Wan Z, Jiang C, Fahad M, Ni Z, Guo Y, He H. Robot-Assisted Pedestrian Regulation Based on Deep Reinforcement Learning. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:1669-1682. [PMID: 30475740 DOI: 10.1109/tcyb.2018.2878977] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Pedestrian regulation can prevent crowd accidents and improve crowd safety in densely populated areas. Recent studies use mobile robots to regulate pedestrian flows for desired collective motion through the effect of passive human-robot interaction (HRI). This paper formulates a robot motion planning problem for the optimization of two merging pedestrian flows moving through a bottleneck exit. To address the challenge of feature representation of complex human motion dynamics under the effect of HRI, we propose using a deep neural network to model the mapping from the image input of pedestrian environments to the output of robot motion decisions. The robot motion planner is trained end-to-end using a deep reinforcement learning algorithm, which avoids hand-crafted feature detection and extraction, thus improving the learning capability for complex dynamic problems. Our proposed approach is validated in simulated experiments, and its performance is evaluated. The results demonstrate that the robot is able to find optimal motion decisions that maximize the pedestrian outflow in different flow conditions, and the pedestrian-accumulated outflow increases significantly compared to cases without robot regulation and with random robot motion.
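The image-to-motion mapping trained by deep reinforcement learning can be illustrated with a minimal Q-learning sketch: a small CNN outputs Q-values for a handful of discrete robot motions and is updated from a single synthetic transition. The five-action set, the network sizes, and the reward are assumptions; replay memory, target networks, and the crowd simulator used in the paper are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, n_actions=5):                  # e.g. stop/forward/back/left/right
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU())
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(32 * 9 * 9, 128), nn.ReLU(),
                                  nn.Linear(128, n_actions))

    def forward(self, obs):                           # obs: (N, 1, 84, 84)
        return self.head(self.conv(obs))

q_net = QNetwork()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
gamma = 0.99

# One synthetic transition (state, action, reward, next_state).
state = torch.rand(1, 1, 84, 84)
next_state = torch.rand(1, 1, 84, 84)
action = torch.tensor([2])
reward = torch.tensor([0.5])                          # e.g. pedestrian outflow gain

# Temporal-difference target and Q-learning loss for the taken action.
with torch.no_grad():
    target = reward + gamma * q_net(next_state).max(dim=1).values
q_value = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
loss = F.smooth_l1_loss(q_value, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```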
Collapse
|
44
|
Qin C, Zhu H, Xu T, Zhu C, Ma C, Chen E, Xiong H. An Enhanced Neural Network Approach to Person-Job Fit in Talent Recruitment. ACM T INFORM SYST 2020. [DOI: 10.1145/3376927] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
The widespread use of online recruitment services has led to an information explosion in the job market. As a result, recruiters have to seek intelligent ways for Person-Job Fit, which is the bridge for adapting the right candidates to the right positions. Existing studies on Person-Job Fit usually focus on measuring the matching degree between talent qualification and job requirements mainly based on the manual inspection of human resource experts, which could be easily misguided by the subjective, incomplete, and inefficient nature of human judgment. To that end, in this article, we propose a novel end-to-end Topic-based Ability-aware Person-Job Fit Neural Network (TAPJFNN) framework, which has a goal of reducing the dependence on manual labor and can provide better interpretability about the fitting results. The key idea is to exploit the rich information available in abundant historical job application data. Specifically, we propose a word-level semantic representation for both job requirements and job seekers' experiences based on Recurrent Neural Network (RNN). Along this line, two hierarchical topic-based ability-aware attention strategies are designed to measure the different importance of job requirements for semantic representation, as well as measure the different contribution of each job experience to a specific ability requirement. In addition, we design a refinement strategy for Person-Job Fit prediction based on historical recruitment records. Furthermore, we introduce how to exploit our TAPJFNN framework for enabling two specific applications in talent recruitment: talent sourcing and job recommendation. Particularly, in the application of job recommendation, a novel training mechanism is designed for addressing the challenge of biased negative labels. Finally, extensive experiments on a large-scale real-world dataset clearly validate the effectiveness and interpretability of the TAPJFNN and its variants compared with several baselines.
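A compact sketch of the ability-aware attention idea, under illustrative assumptions (a single topic vector, GRU encoders, and dot-product scoring rather than the full TAPJFNN design): the topic vector re-weights the encoded requirement and experience words, and the attended summaries are matched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AbilityAwareMatcher(nn.Module):
    def __init__(self, vocab=5000, emb=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.topic = nn.Parameter(torch.randn(hidden))   # one ability/topic query

    def attend(self, token_ids):                          # (N, seq_len) word ids
        states, _ = self.encoder(self.embed(token_ids))   # (N, seq_len, hidden)
        scores = states @ self.topic                       # relevance to the ability
        weights = F.softmax(scores, dim=1).unsqueeze(-1)
        return (weights * states).sum(dim=1)               # (N, hidden)

    def forward(self, requirement_ids, experience_ids):
        req = F.normalize(self.attend(requirement_ids), dim=1)
        exp = F.normalize(self.attend(experience_ids), dim=1)
        return (req * exp).sum(dim=1)                       # person-job fit score

# Toy usage: a batch of 4 (requirement, experience) pairs of 20 tokens each.
model = AbilityAwareMatcher()
req = torch.randint(0, 5000, (4, 20))
exp = torch.randint(0, 5000, (4, 20))
print(model(req, exp))
```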
Collapse
Affiliation(s)
- Chuan Qin
- School of Computer Science, University of Science and Technology of China
| | | | - Tong Xu
- School of Computer Science, University of Science and Technology of China
| | - Chen Zhu
- Baidu Talent Intelligence Center, Baidu Inc
| | - Chao Ma
- Baidu Talent Intelligence Center, Baidu Inc
| | - Enhong Chen
- School of Computer Science, University of Science and Technology of China
| | - Hui Xiong
- School of Computer Science, University of Science and Technology of China
| |
Collapse
|
45
|
Gammulle H, Denman S, Sridharan S, Fookes C. Hierarchical Attention Network for Action Segmentation. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.01.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
46
|
Wu L, Wang Y, Shao L, Wang M. 3-D PersonVLAD: Learning Deep Global Representations for Video-Based Person Reidentification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:3347-3359. [PMID: 30716051 DOI: 10.1109/tnnls.2019.2891244] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
We present the global deep video representation learning to video-based person reidentification (re-ID) that aggregates local 3-D features across the entire video extent. Existing methods typically extract frame-wise deep features from 2-D convolutional networks (ConvNets) which are pooled temporally to produce the video-level representations. However, 2-D ConvNets lose temporal priors immediately after the convolutions, and a separate temporal pooling is limited in capturing human motion in short sequences. In this paper, we present global video representation learning, to be complementary to 3-D ConvNets as a novel layer to capture the appearance and motion dynamics in full-length videos. Nevertheless, encoding each video frame in its entirety and computing aggregate global representations across all frames is tremendously challenging due to the occlusions and misalignments. To resolve this, our proposed network is further augmented with the 3-D part alignment to learn local features through the soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3-D features are demonstrated to achieve the state-of-the-art results on three benchmark data sets: MARS, Imagery Library for Intelligent Detection Systems-Video Re-identification, and PRID2011.
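The global aggregation step can be sketched with a VLAD-style layer: local descriptors are softly assigned to learned cluster centers, and their residuals are accumulated into a single video-level vector. The cluster count, descriptor dimension, and the flat descriptor layout are assumptions; the paper builds this on 3-D ConvNet features with part alignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLADAggregation(nn.Module):
    def __init__(self, dim=128, clusters=8):
        super().__init__()
        self.assign = nn.Linear(dim, clusters)             # soft-assignment scores
        self.centers = nn.Parameter(torch.randn(clusters, dim))

    def forward(self, descriptors):                         # (N, num_local, dim)
        soft = F.softmax(self.assign(descriptors), dim=-1)  # (N, num_local, clusters)
        # Residuals of every descriptor to every center, weighted by assignment.
        residuals = descriptors.unsqueeze(2) - self.centers         # (N, L, K, D)
        vlad = (soft.unsqueeze(-1) * residuals).sum(dim=1)          # (N, K, D)
        vlad = F.normalize(vlad, dim=-1)                            # intra-normalization
        return F.normalize(vlad.flatten(1), dim=1)                  # (N, K*D)

# Toy usage: 2 videos, each with 300 local spatio-temporal descriptors.
agg = NetVLADAggregation()
video_descriptors = torch.randn(2, 300, 128)
print(agg(video_descriptors).shape)   # torch.Size([2, 1024])
```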
Collapse
|
47
|
Wu L, Wang Y, Yin H, Wang M, Shao L. Few-Shot Deep Adversarial Learning for Video-based Person Re-identification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:1233-1245. [PMID: 31535998 DOI: 10.1109/tip.2019.2940684] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Video-based person re-identification (re-ID) refers to matching people across camera views from arbitrary unaligned video footage. Existing methods rely on supervision signals to optimise a projected space under which the distances between inter/intra-videos are maximised/minimised. However, this demands exhaustively labelling people across camera views, rendering them unable to be scaled in large networked cameras. Also, it is noticed that learning effective video representations with view invariance is not explicitly addressed, for which features exhibit different distributions otherwise. Thus, matching videos for person re-ID demands flexible models to capture the dynamics in time-series observations and learn view-invariant representations with access to limited labeled training samples. In this paper, we propose a novel few-shot deep learning approach to video-based person re-ID, to learn comparable representations that are discriminative and view-invariant. The proposed method is developed on the variational recurrent neural networks (VRNNs) and trained adversarially to produce latent variables with temporal dependencies that are highly discriminative yet view-invariant in matching persons. Through extensive experiments conducted on three benchmark datasets, we empirically show the capability of our method in creating view-invariant temporal features and the state-of-the-art performance achieved by our method.
Collapse
|
48
|
Deep Visible and Thermal Image Fusion for Enhanced Pedestrian Visibility. SENSORS (BASEL, SWITZERLAND) 2019; 19:3727. [PMID: 31466378 PMCID: PMC6749306 DOI: 10.3390/s19173727] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/02/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 11/16/2022]
Abstract
Reliable vision in challenging illumination conditions is one of the crucial requirements of future autonomous automotive systems. In the last decade, thermal cameras have become more easily accessible to a larger number of researchers. This has resulted in numerous studies which confirmed the benefits of the thermal cameras in limited visibility conditions. In this paper, we propose a learning-based method for visible and thermal image fusion that focuses on generating fused images with high visual similarity to regular truecolor (red-green-blue or RGB) images, while introducing new informative details in pedestrian regions. The goal is to create natural, intuitive images that would be more informative than a regular RGB camera to a human driver in challenging visibility conditions. The main novelty of this paper is the idea to rely on two types of objective functions for optimization: a similarity metric between the RGB input and the fused output to achieve natural image appearance; and an auxiliary pedestrian detection error to help defining relevant features of the human appearance and blending them into the output. We train a convolutional neural network using image samples from variable conditions (day and night) so that the network learns the appearance of humans in the different modalities and creates more robust results applicable in realistic situations. Our experiments show that the visibility of pedestrians is noticeably improved especially in dark regions and at night. Compared to existing methods we can better learn context and define fusion rules that focus on the pedestrian appearance, while that is not guaranteed with methods that focus on low-level image quality metrics.
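A minimal sketch of the dual-objective training described above, with loudly simplified stand-ins: a small CNN fuses the RGB and thermal frames, an L1 term keeps the output close to the RGB input, and a second L1 term pulls a given pedestrian-mask region toward the thermal signal in place of the paper's detection-error objective and learned similarity metric.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())  # fused RGB-like image

    def forward(self, rgb, thermal):
        return self.net(torch.cat([rgb, thermal], dim=1))

def fusion_loss(fused, rgb, thermal, ped_mask, weight=2.0):
    """ped_mask: (N, 1, H, W) soft mask of pedestrian regions (assumed given)."""
    appearance = F.l1_loss(fused, rgb)                      # stay close to RGB
    pedestrian = F.l1_loss(fused * ped_mask, thermal.expand_as(fused) * ped_mask)
    return appearance + weight * pedestrian

# Toy usage on random 64x64 frames with a synthetic pedestrian mask.
net = FusionNet()
rgb, thermal = torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 20:50, 25:40] = 1.0
fused = net(rgb, thermal)
print(float(fusion_loss(fused, rgb, thermal, mask)))
```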
Collapse
|
49
|
Fu B, Li Y, Wang XH, Ren YG. Image super-resolution using TV priori guided convolutional network. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.06.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
50
|
|