1
|
Qian Y, Tang SK. Multi-Scale Contrastive Learning with Hierarchical Knowledge Synergy for Visible-Infrared Person Re-Identification. SENSORS (BASEL, SWITZERLAND) 2025; 25:192. [PMID: 39796982 PMCID: PMC11722841 DOI: 10.3390/s25010192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 12/30/2024] [Accepted: 12/30/2024] [Indexed: 01/13/2025]
Abstract
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality retrieval task to match a person across different spectral camera views. Most existing works focus on learning shared feature representations from the final embedding space of advanced networks to alleviate modality differences between visible and infrared images. However, exclusively relying on high-level semantic information from the network's final layers can restrict shared feature representations and overlook the benefits of low-level details. Different from these methods, we propose a multi-scale contrastive learning network (MCLNet) with hierarchical knowledge synergy for VI-ReID. MCLNet is a novel two-stream contrastive deep supervision framework designed to train low-level details and high-level semantic representations simultaneously. MCLNet utilizes supervised contrastive learning (SCL) at each intermediate layer to strengthen visual representations and enhance cross-modality feature learning. Furthermore, a hierarchical knowledge synergy (HKS) strategy for pairwise knowledge matching promotes explicit information interaction across multi-scale features and improves information consistency. Extensive experiments on three benchmarks demonstrate the effectiveness of MCLNet.
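The abstract above describes supervised contrastive learning applied at several intermediate layers. The following is a minimal sketch of that general idea, not MCLNet's actual loss: it assumes the common supervised contrastive formulation (same-identity samples are positives, all other samples in the batch are contrasted against), an unweighted sum over layers, and a hypothetical temperature of 0.1. The function names are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive."""
    z = [l2_normalize(f) for f in features]
    total, anchors = 0.0, 0
    for i in range(len(z)):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-identity sample contributes nothing
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in others
        }
        peak = max(logits.values())  # subtract the max for numerical stability
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def multi_scale_supcon(per_layer_features, labels, temperature=0.1):
    """Sum the loss over feature sets taken from several layers (assumed unweighted)."""
    return sum(supcon_loss(f, labels, temperature) for f in per_layer_features)
```

In a cross-modality setting, a batch would mix visible and infrared features of the same identities, so that positives span both modalities; how the paper balances the per-layer terms is not stated in the abstract.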
Affiliation(s)
- Yongheng Qian
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
  - Department of Mechatronics and Information Engineering, Zunyi Vocational and Technical College, Zunyi 563000, China
- Su-Kit Tang
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
|
2
|
Farah H, Bennour A, Kurdi NA, Hammami S, Al-Sarem M. Channel and Spatial Attention in Chest X-Ray Radiographs: Advancing Person Identification and Verification with Self-Residual Attention Network. Diagnostics (Basel) 2024; 14:2655. [PMID: 39682563 DOI: 10.3390/diagnostics14232655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 11/18/2024] [Accepted: 11/20/2024] [Indexed: 12/18/2024] Open
Abstract
BACKGROUND/OBJECTIVES In contrast to traditional biometric modalities, such as facial recognition, fingerprints, iris scans, or even DNA, research on chest X-ray recognition has been spurred by its remarkable recognition rates. Because they capture the intricate anatomical nuances of an individual's skeletal structure, ribcage, lungs, and heart, chest X-rays have emerged as a focal point for identification and verification, especially in the forensic field, even in scenarios where the human body is damaged or disfigured. Discriminative feature embedding is essential for large-scale image verification, especially when applying chest X-ray radiographs to identity identification and verification. This study introduced a self-residual attention-based convolutional neural network (SRAN) aimed at effective feature embedding, capturing long-range dependencies and emphasizing critical spatial features in chest X-rays. This method offers a novel approach to person identification and verification through chest X-ray categorization, relevant for biometric applications and patient care, particularly when traditional biometric modalities are ineffective. METHOD The SRAN architecture integrated a self-channel and a self-spatial attention module to minimize channel redundancy and enhance significant spatial elements. The attention modules worked by dynamically aggregating feature maps across channel and spatial dimensions to enhance feature differentiation. For the network backbone, a self-residual attention block (SRAB) was implemented within a ResNet50 framework, forming a Siamese network trained with triplet loss to improve feature embedding for identity identification and verification. RESULTS On the NIH ChestX-ray14 and CheXpert datasets, our method demonstrated notable improvements in accuracy for identity verification and identification based on chest X-ray images. This approach effectively captured the detailed anatomical characteristics of individuals, including skeletal structure, ribcage, lungs, and heart, highlighting chest X-rays as a viable biometric tool even in cases of body damage or disfigurement. CONCLUSIONS The proposed SRAN with self-residual attention provided a promising solution for biometric identification through chest X-ray imaging, showcasing its potential for accurate and reliable identity verification where traditional biometric approaches may fall short, especially in postmortem cases or forensic investigations. This methodology could play a transformative role in both biometric security and healthcare applications, offering a robust alternative modality for identity verification.
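The abstract states that the Siamese network is trained with triplet loss but gives no further detail. As a rough illustration of what that objective computes, here is a minimal sketch assuming Euclidean distance between embeddings and a hypothetical margin of 0.3; neither choice is confirmed by the source.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the same-identity sample is closer to the anchor
    than the different-identity sample by at least `margin` (an assumed value).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

For verification, two radiographs would then be declared the same person when the distance between their embeddings falls below a chosen threshold.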
Affiliation(s)
- Hazem Farah
  - Laboratory of Mathematics, Informatics and Systems (LAMIS), Echahid Chiekh Larbi Tebessi University, Tebessa 12002, Algeria
- Akram Bennour
  - Laboratory of Mathematics, Informatics and Systems (LAMIS), Echahid Chiekh Larbi Tebessi University, Tebessa 12002, Algeria
- Neesrin Ali Kurdi
  - College of Computer Science and Engineering, Taibah University, Medina 41477, Saudi Arabia
- Samir Hammami
  - Department of Management Information Systems, Dhofar University, Dhofar, Salalah 211, Oman
- Mohammed Al-Sarem
  - College of Computer Science and Engineering, Taibah University, Medina 41477, Saudi Arabia
|
3
|
Disagreement attention: Let us agree to disagree on computed tomography segmentation. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
|
4
|
Zhou Y, Liu P, Cui Y, Liu C, Duan W. Integration of Multi-Head Self-Attention and Convolution for Person Re-Identification. SENSORS (BASEL, SWITZERLAND) 2022; 22:6293. [PMID: 36016054 PMCID: PMC9414396 DOI: 10.3390/s22166293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/06/2022] [Revised: 08/14/2022] [Accepted: 08/17/2022] [Indexed: 06/12/2023]
Abstract
Person re-identification is essential to intelligent video analytics, whose results affect downstream tasks such as behavior and event analysis. However, most existing models only consider the accuracy, rather than the computational complexity, which is also an aspect to consider in practical deployment. We note that self-attention is a powerful technique for representation learning. It can work with convolution to learn more discriminative feature representations for re-identification. We propose an improved multi-scale feature learning structure, DM-OSNet, with better performance than the original OSNet. Our DM-OSNet replaces the 9×9 convolutional stream in OSNet with multi-head self-attention. To maintain model efficiency, we use double-layer multi-head self-attention to reduce the computational complexity of the original multi-head self-attention. The computational complexity is reduced from the original $O((H \times W)^2)$ to $O(H \times W \times G^2)$. To further improve the model performance, we use SpCL to perform unsupervised pre-training on the large-scale unlabeled pedestrian dataset LUPerson. Finally, our DM-OSNet achieves an mAP of 87.36%, 78.26%, 72.96%, and 57.13% on the Market1501, DukeMTMC-reID, CUHK03, and MSMT17 datasets.
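To make the complexity reduction quoted in the abstract concrete, the sketch below compares the two leading-order terms for one feature map. The abstract does not define G, so it is treated here as an opaque size parameter, and the feature-map dimensions in the usage note are hypothetical. Constant factors and channel dimensions are ignored.

```python
def full_attention_cost(height, width):
    """Leading-order cost of standard self-attention: (H*W)^2 pairwise scores."""
    return (height * width) ** 2


def double_layer_attention_cost(height, width, g):
    """Leading-order cost quoted for the double-layer variant: H*W*G^2."""
    return height * width * g ** 2


def reduction_factor(height, width, g):
    """Ratio of the two costs, which simplifies to (H*W) / G^2."""
    return full_attention_cost(height, width) / double_layer_attention_cost(height, width, g)
```

With an assumed 16×8 feature map and G = 4, `reduction_factor(16, 8, 4)` compares 128² = 16384 score computations against 128 × 16 = 2048, so the saving grows linearly with the number of spatial positions whenever G stays fixed.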
Affiliation(s)
- Yalei Zhou
  - School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
- Peng Liu
  - School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
  - School of Electrical and Electronic Engineering, North China Electric Power University, Beijing 102206, China
  - Yangzhong Intelligent Electric Research Center, North China Electric Power University, Yangzhong 212211, China
- Yue Cui
  - School of Electrical and Electronic Engineering, North China Electric Power University, Beijing 102206, China
- Chunguang Liu
  - Yangzhong Intelligent Electric Research Center, North China Electric Power University, Yangzhong 212211, China
- Wenli Duan
  - School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
|
5
|
LiDAR-based Detection, Tracking, and Property Estimation: A Contemporary Review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.07.087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
|
6
|
|
7
|
Mi JX, Feng J, Huang KY. Designing efficient convolutional neural network structure: A survey. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
|
8
|
Defect Detection for Metal Base of TO-Can Packaged Laser Diode Based on Improved YOLO Algorithm. ELECTRONICS 2022. [DOI: 10.3390/electronics11101561] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Defect detection is an important part of the manufacturing process of mechanical products. To detect appearance defects quickly and accurately, this study proposes a defect detection method for the metal base of TO-can packaged laser diodes (metal TO-base) based on an improved You Only Look Once (YOLO) algorithm named YOLO-SO. First, a convolutional block attention module (CBAM) was added to the convolutional layer of the backbone network. Then, a random-paste-mosaic (RPM) small-object data augmentation module was proposed on the basis of the Mosaic algorithm in YOLO-V5. Finally, the K-means++ clustering algorithm was applied to reduce the sensitivity to the initial clustering centers, making the positioning more accurate and reducing the network loss. The proposed YOLO-SO model was compared with other object detection algorithms such as YOLO-V3, YOLO-V4, and Faster R-CNN. Experimental results demonstrated that the YOLO-SO model reaches 84.0% mAP, 5.5% higher than the original YOLO-V5 algorithm. Moreover, the YOLO-SO model had the smallest weight size among the compared models and a detection speed of 25 FPS. These advantages make the YOLO-SO model more suitable for the real-time detection of metal TO-base appearance defects.
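The abstract credits K-means++ with reducing sensitivity to the initial cluster centers. The sketch below shows only the seeding step that distinguishes K-means++ from plain K-means: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. It assumes squared Euclidean distance on generic tuples; YOLO-style anchor clustering often uses an IoU-based distance instead, and the abstract does not say which YOLO-SO uses.

```python
import bisect
import itertools
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers using K-means++ seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # the first center is uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the closest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        cumulative = list(itertools.accumulate(weights))
        if cumulative[-1] == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        threshold = rng.random() * cumulative[-1]
        # First index whose cumulative weight exceeds the threshold.
        centers.append(points[bisect.bisect_right(cumulative, threshold)])
    return centers
```

Because already-chosen points carry zero weight, they cannot be drawn again while any other point remains, which spreads the initial centers across the data before the usual K-means iterations begin.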
|
9
|
Guo ZH, Chen ZH, You ZH, Wang YB, Yi HC, Wang MN. A learning-based method to predict LncRNA-disease associations by combining CNN and ELM. BMC Bioinformatics 2022; 22:622. [PMID: 35317723 PMCID: PMC8941737 DOI: 10.1186/s12859-022-04611-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 10/07/2021] [Indexed: 11/10/2022] Open
Abstract
Background lncRNAs play a critical role in numerous biological processes and life activities, especially diseases. Because traditional wet-lab experiments for identifying undiscovered lncRNA-disease associations are limited by time and labor costs, it is imperative to construct reliable and efficient computational models as a complement to them. Deep learning technologies have been shown to make impressive contributions in many areas, but their feasibility in bioinformatics has not been adequately verified. Results In this paper, a machine learning-based model called LDACE is proposed to predict potential lncRNA-disease associations by combining an Extreme Learning Machine (ELM) and a Convolutional Neural Network (CNN). Specifically, the representation vectors are constructed by integrating multiple types of biological information, including functional similarity and semantic similarity. Then, the CNN is applied to mine both local and global features. Finally, the ELM carries out the prediction task to detect potential lncRNA-disease associations. The proposed method achieved an Area Under the Receiver Operating Characteristic Curve of 0.9086 in leave-one-out cross-validation and 0.8994 in fivefold cross-validation. In addition, two case studies based on lung cancer and endometrial cancer indicate the robustness and efficiency of LDACE even in a real environment. Conclusions The results demonstrate that the proposed model can serve as an auxiliary tool to guide and assist biomedical research, and that the close integration of deep learning and biological big data will provide the life sciences with novel insights.
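The headline numbers in this abstract are areas under the ROC curve. As a reminder of what that metric measures, here is a minimal sketch that computes AUC as the fraction of (positive, negative) pairs ranked correctly, counting ties as half. It is a generic definition, not the authors' evaluation code, and the function name is illustrative.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative.

    `labels` holds 1 for a known association and 0 otherwise; both classes
    must be present. Tied scores count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

Under leave-one-out cross-validation, each known association is held out in turn and scored against the unlabeled pairs, and the resulting scores feed a computation of this kind.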
Affiliation(s)
- Zhen-Hao Guo
  - School of Electronics and Information Engineering, Tongji University, No. 4800 Cao'an Road, Shanghai, 201804, China
- Zhan-Heng Chen
  - College of Computer Science and Engineering, Shenzhen University, Shenzhen, 518060, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, China
- Yan-Bin Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
- Hai-Cheng Yi
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Mei-Neng Wang
  - School of Mathematics and Computer Science, Yichun University, Yichun, 336000, Jiangxi, China
|
10
|
Córdova M, Pinto A, Hellevik CC, Alaliyat SAA, Hameed IA, Pedrini H, Torres RDS. Litter Detection with Deep Learning: A Comparative Study. SENSORS 2022; 22:548. [PMID: 35062507 PMCID: PMC8812282 DOI: 10.3390/s22020548] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/11/2021] [Revised: 12/30/2021] [Accepted: 01/05/2022] [Indexed: 11/28/2022]
Abstract
Pollution in the form of litter in the natural environment is one of the great challenges of our times. Automated litter detection can help assess waste occurrences in the environment. Different machine learning solutions have been explored to develop litter detection tools, thereby supporting research, citizen science, and volunteer clean-up initiatives. However, to the best of our knowledge, no work has investigated the performance of state-of-the-art deep learning object detection approaches in the context of litter detection. In particular, no studies have assessed those methods for use on devices with low processing capabilities, e.g., mobile phones, which are typically employed in citizen science activities. In this paper, we fill this literature gap. We performed a comparative study involving state-of-the-art CNN architectures (e.g., Faster RCNN, Mask-RCNN, EfficientDet, RetinaNet and YOLO-v5), two litter image datasets and a smartphone. We also introduce a new dataset for litter detection, named PlastOPol, composed of 2418 images and 5300 annotations. The experimental results demonstrate that object detectors based on the YOLO family are promising for the construction of litter detection solutions, with superior performance in terms of detection accuracy, processing time, and memory footprint.
Affiliation(s)
- Manuel Córdova
  - Institute of Computing, University of Campinas, Avenue Albert Einstein, Campinas 13083-852, Brazil
- Allan Pinto
  - Brazilian Center for Research in Energy and Materials (CNPEM), Brazilian Synchrotron Light Laboratory (LNLS), Campinas 13083-100, Brazil
- Christina Carrozzo Hellevik
  - Department of International Business, NTNU, Norwegian University of Science and Technology, Larsgårdsvegen 2, 6009 Alesund, Norway
- Saleh Abdel-Afou Alaliyat
  - Department of ICT and Natural Sciences, NTNU, Norwegian University of Science and Technology, Larsgårdsvegen 2, 6009 Alesund, Norway
- Ibrahim A. Hameed
  - Department of ICT and Natural Sciences, NTNU, Norwegian University of Science and Technology, Larsgårdsvegen 2, 6009 Alesund, Norway
- Helio Pedrini
  - Institute of Computing, University of Campinas, Avenue Albert Einstein, Campinas 13083-852, Brazil
- Ricardo da S. Torres
  - Department of ICT and Natural Sciences, NTNU, Norwegian University of Science and Technology, Larsgårdsvegen 2, 6009 Alesund, Norway
  - Farm Technology Group and Wageningen Data Competence Center, Wageningen University and Research, 6708 PB Wageningen, The Netherlands
|
11
|
Self-Erasing Network for Person Re-Identification. SENSORS 2021; 21:4262. [PMID: 34206315 PMCID: PMC8271670 DOI: 10.3390/s21134262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2021] [Revised: 06/08/2021] [Accepted: 06/17/2021] [Indexed: 11/24/2022]
Abstract
Person re-identification (ReID) plays an important role in intelligent surveillance and receives widespread attention from academia and industry. Due to extreme changes in viewing angles, some discriminative local regions are suppressed. In addition, data with similar backgrounds collected by a fixed-viewing-angle camera also affect the model’s ability to distinguish a person. Therefore, we need to discover more fine-grained information to form the overall characteristics of each identity. The proposed self-erasing network structure is composed of three branches, which support the extraction of global information, the suppression of background noise, and the mining of local information. The two self-erasing strategies that we propose encourage the network to focus on foreground information and strengthen the model’s ability to encode weak features so as to form more effective and richer visual cues of a person. Extensive experiments show that the proposed method is competitive with advanced methods and achieves state-of-the-art performance on the DukeMTMC-ReID and CUHK-03(D) datasets. Furthermore, the activation maps show that the proposed method helps spread attention over the whole body. Both the metrics and the activation maps validate the effectiveness of our proposed method.
|
12
|
Multi-Resolution Supervision Network with an Adaptive Weighted Loss for Desert Segmentation. REMOTE SENSING 2021. [DOI: 10.3390/rs13112054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
Abstract
Desert segmentation of remote sensing images is the basis of desert area analysis. Desert images are usually characterized by large image size, large changes in scale, and irregular location distribution of surface objects. Multi-scale fusion is widely used in existing deep learning segmentation models to address these problems. Based on the idea of multi-scale feature extraction, this paper treats the segmentation result at each scale as an independent optimization task and proposes a multi-resolution supervision network (MrsSeg) to further improve the desert segmentation result. Because the branch tasks differ in optimization difficulty, we also propose an auxiliary adaptive weighted loss function (AWL) to automatically optimize the training process. MrsSeg first uses a lightweight backbone to extract features at different resolutions, then adopts a multi-resolution fusion module to fuse local and global information, and finally uses a multi-level fusion decoder to aggregate and merge the features at different levels to produce the desert segmentation result. In this method, each branch loss is treated as an independent task, and AWL calculates and adjusts the weight of each branch. By giving priority to the easy tasks, the improved loss function effectively improves the convergence speed of the model and the desert segmentation result. The experimental results show that MrsSeg-AWL effectively improves the learning ability of the model and achieves faster convergence, lower parameter complexity, and more accurate segmentation results.
|
13
|
Person Re-Identification Based on Attention Mechanism and Context Information Fusion. FUTURE INTERNET 2021. [DOI: 10.3390/fi13030072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022] Open
Abstract
Person re-identification (ReID) plays a significant role in video surveillance analysis. In real-world scenes affected by illumination changes, occlusion, and deformation, pedestrian feature extraction is the key to person ReID. Considering the shortcomings of existing methods in pedestrian feature extraction, a method based on an attention mechanism and context information fusion is proposed. A lightweight attention module with a small number of parameters is introduced into the ResNet50 backbone network; it enhances the salient characteristics of a person and suppresses irrelevant information. To address the loss of person context information caused by the excessive depth of the network, a context information fusion module is designed that samples the shallow pedestrian feature map and cascades it with the high-level feature map. To improve robustness, the model is trained by combining the margin sample mining loss with the cross-entropy loss. Experiments are carried out on the Market1501 and DukeMTMC-reID datasets. Our method achieves a rank-1 accuracy of 95.9% on Market1501 and 90.1% on DukeMTMC-reID, outperforming current mainstream methods that use only global features.
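The rank-1 accuracy reported above is the share of query images whose nearest gallery image carries the same identity. The sketch below shows that computation on a precomputed distance matrix. It is a simplification: standard re-ID protocols also discard gallery images taken by the same camera as the query, which is omitted here, and the function and argument names are illustrative.

```python
def rank1_accuracy(distances, query_ids, gallery_ids):
    """Fraction of queries whose closest gallery entry has the same identity.

    `distances[q][g]` is the distance between query q and gallery item g.
    """
    hits = 0
    for q, row in enumerate(distances):
        nearest = min(range(len(row)), key=row.__getitem__)
        if gallery_ids[nearest] == query_ids[q]:
            hits += 1
    return hits / len(distances)
```

For example, with three queries of which two retrieve the correct identity first, the function returns 2/3; mean average precision (mAP) additionally accounts for the ranks of all correct matches, not only the first.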
|