1
Lai S, Liu C, Wang D, Lu H. Refocus the Attention for Parameter-Efficient Thermal Infrared Object Tracking. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:9538-9549. [PMID: 39024085 DOI: 10.1109/tnnls.2024.3420928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Introducing deep trackers to thermal infrared (TIR) tracking is hampered by the scarcity of large training datasets. A common remedy is full fine-tuning (FFT) of pretrained RGB parameters. However, because FFT trains inefficiently and risks representation collapse, several parameter-efficient fine-tuning (PEFT) alternatives have recently been proposed. Existing PEFT algorithms typically work bottom-up: their attention relies solely on the input and lacks task-guided top-down attention, which, as in the human visual perception system, provides task-relevant representations. In this article, we introduce ReFocus, a new PEFT method that adapts a pretrained RGB foundation tracking model to the downstream TIR tracking task through the guidance of high-level, task-specific signals in a top-down attention manner. By freezing the entire foundation model and training only query-guided feature selection and top-down blocks, ReFocus achieves state-of-the-art (SOTA) TIR tracking performance while maintaining training efficiency. Extensive experiments on five TIR tracking benchmarks demonstrate that ReFocus significantly improves the performance of the foundation tracker. Further ablation studies show the effectiveness of the proposed method and its flexible adaptability to lighter foundation models and different tracking frameworks. Compared with FFT and other bottom-up PEFT paradigms, such as head probe, low-rank adaptation (LoRA), and adapter, our method achieves comparable or superior performance with fewer trainable parameters and shows an advantage in learning stability.
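The abstract compares ReFocus with bottom-up PEFT baselines such as LoRA. As a minimal sketch of why such methods train so few parameters, the snippet below shows a generic LoRA-style linear layer: the pretrained weight `W` stays frozen and only the low-rank factors `A` and `B` would be trained. It illustrates the LoRA baseline, not ReFocus itself; the function names, the `alpha / r` scaling convention, and the 768-wide layer in the example are assumptions chosen for illustration.

```python
# Generic LoRA-style layer in plain Python (hypothetical helper names).
# This sketches the LoRA baseline mentioned above, not the ReFocus method.


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha=16.0):
    """y = W x + (alpha / r) * B (A x), with W frozen and A, B trainable.

    W: d_out x d_in frozen pretrained weight
    A: r x d_in and B: d_out x r are the low-rank trainable factors
    """
    r = len(A)
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank trainable update
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_in, d_out, r):
    """Parameter counts: fine-tuning W fully vs. training only A and B."""
    full = d_in * d_out
    lora = r * (d_in + d_out)
    return full, lora


if __name__ == "__main__":
    # B is commonly zero-initialized, so the adapted layer starts out
    # identical to the frozen one.
    W = [[1.0, 2.0], [3.0, 4.0]]
    A = [[0.5, -0.5]]  # rank r = 1
    B = [[0.0], [0.0]]
    print(lora_forward(W, A, B, [1.0, 1.0]))  # [3.0, 7.0]
    print(trainable_params(768, 768, 8))       # (589824, 12288)
```

For a hypothetical 768 x 768 projection at rank 8, the trainable update is about 2% of the full matrix, which is the kind of saving the PEFT comparison in the abstract is about.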
2
Dash Y, Gupta V, Abraham A, Chandna S. Improving Object Detection in High-Altitude Infrared Thermal Images Using Magnitude-Based Pruning and Non-Maximum Suppression. J Imaging 2025; 11:69. [PMID: 40137181 PMCID: PMC11943301 DOI: 10.3390/jimaging11030069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Revised: 02/13/2025] [Accepted: 02/19/2025] [Indexed: 03/27/2025]
Abstract
Advances in technology have brought high-altitude infrared thermal object detection into remote sensing, leveraging the distinct advantages of high-altitude platforms. These technologies readily capture the thermal signatures of objects from an elevated vantage point, generally unmanned aerial vehicles (UAVs) or drones, and thus enhance the detection and monitoring of extensive areas. This study explores the application of YOLOv8's advanced architecture, together with dynamic magnitude-based pruning paired with non-maximum suppression, to high-altitude infrared thermal object detection using UAVs. The research addresses the complexities of processing high-resolution thermal imagery, where traditional methods fall short. We converted dataset annotations from the COCO and PASCAL VOC formats to the format YOLO requires, enabling efficient model training and inference. The results demonstrate the proposed architecture's superior speed and accuracy in handling thermal signatures and detecting objects. Precision-recall metrics indicate robust performance, though some misclassifications, particularly of persons, suggest areas for further refinement. This work highlights the potential of YOLOv8's advanced architecture for enhancing UAV-based thermal imaging applications, paving the way for more effective real-time object detection solutions.
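The two post-processing ideas named in the abstract can be sketched in plain Python. The snippet below shows greedy IoU-based non-maximum suppression and one-shot magnitude pruning of a flat weight list. It is a generic illustration under assumed conventions (boxes as `(x1, y1, x2, y2)`, a fixed sparsity fraction, hypothetical function names), not the authors' dynamic pruning schedule or YOLOv8's built-in NMS.

```python
# Generic greedy NMS and one-shot magnitude pruning (hypothetical helpers).


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(nms(boxes, [0.9, 0.8, 0.7]))                    # [0, 2]
    print(magnitude_prune([0.5, -0.1, 0.05, -0.8], 0.5))  # [0.5, 0.0, 0.0, -0.8]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at the default threshold of 0.5; a "dynamic" pruning scheme would typically vary `sparsity` over training rather than apply it once.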
Affiliation(s)
- Yajnaseni Dash
- School of Artificial Intelligence, Bennett University, Greater Noida 201310, India
- Vinayak Gupta
- School of Computer Science Engineering and Technology, Bennett University, Greater Noida 201310, India
- Ajith Abraham
- School of Artificial Intelligence, Sai University, Chennai 603104, India
- Swati Chandna
- Department of Information, Applied Data Science and Analytics, SRH University of Applied Sciences, 69123 Heidelberg, Germany
3
Seong G, Kim D. An Intelligent Ball Bearing Fault Diagnosis System Using Enhanced Rotational Characteristics on Spectrogram. SENSORS (BASEL, SWITZERLAND) 2024; 24:776. [PMID: 38339493 PMCID: PMC10857163 DOI: 10.3390/s24030776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 01/04/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
Faults in ball bearings are a major cause of failure in rotating machinery that uses them. Therefore, there is a growing demand for ball bearing fault diagnosis to prevent such failures. Although bearing fault diagnosis has been studied using temperature measurements and sound monitoring, these methods are limited because they are affected by external noise. Therefore, many researchers have studied vibration monitoring for bearing fault diagnosis. Among these approaches, mel-frequency cepstral coefficients (MFCCs) and 2D convolutional neural networks (CNNs) have attracted significant attention in vibration monitoring schemes. However, the MFCC in existing studies requires a high sampling rate and a wide frequency band. In addition, 2D CNNs are highly complex. In this study, a rotational characteristic emphasis (RCE) spectrogram process and an optimized CNN were proposed to solve these problems. The RCE spectrogram process analyzes a narrow frequency band and produces low-resolution images. The optimized CNN was designed with a shallow network structure. The experimental results showed an accuracy of 0.9974 for the proposed system. The optimized CNN model has a parameter size of 5.81 KB and 1.53×10^6 FLOPs. We demonstrate that the proposed ball bearing fault diagnosis system can achieve high accuracy with low complexity. Thus, we propose a ball bearing fault diagnosis scheme that is applicable to a low sampling rate and a changing rotation frequency.
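Complexity figures such as the parameter size and FLOPs quoted above are usually obtained by summing per-layer costs. The sketch below computes the parameter count and multiply-accumulate operations (MACs) of a single 2D convolution; the helper names and the layer shape in the example are hypothetical and do not describe the authors' optimized CNN. Conventions differ between tools: some report one MAC as one FLOP, others as two, and byte sizes depend on numeric precision.

```python
# Per-layer cost of a standard Conv2d (groups=1); hypothetical helper names.


def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    """Return (parameters, MACs) for a k x k convolution layer."""
    params = k * k * c_in * c_out + (c_out if bias else 0)
    # every output element needs k * k * c_in multiply-accumulates
    macs = k * k * c_in * c_out * h_out * w_out
    return params, macs


def model_size_bytes(num_params, bytes_per_param=4):
    """Weight storage, e.g. 4 bytes per parameter for float32."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    # hypothetical first layer: 1-channel 32x32 input, 8 filters, 'same' padding
    params, macs = conv2d_cost(1, 8, 3, 32, 32)
    print(params, macs)              # 80 73728
    print(model_size_bytes(params))  # 320
```

Summing these values over all layers of a shallow network, and fixing one FLOP convention, yields totals comparable in kind to the figures reported in the abstract.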
Affiliation(s)
- Dongwan Kim
- Department of Electronics Engineering, Dong-A University, Busan 49315, Republic of Korea
4
Yuan D, Shu X, Liu Q, Zhang X, He Z. Robust thermal infrared tracking via an adaptively multi-feature fusion model. Neural Comput Appl 2022; 35:3423-3434. [PMID: 36245795 PMCID: PMC9553631 DOI: 10.1007/s00521-022-07867-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2022] [Accepted: 09/21/2022] [Indexed: 01/31/2023]
Abstract
When dealing with complex thermal infrared (TIR) tracking scenarios, a single category of feature is not sufficient to portray the appearance of the target, which drastically affects the accuracy of TIR target tracking methods. To address this problem, we propose an adaptively multi-feature fusion model (AMFT) for the TIR tracking task. Specifically, our AMFT tracking method adaptively integrates hand-crafted features and deep convolutional neural network (CNN) features, taking advantage of the complementarity between different features to accurately locate the target position. Additionally, a simple but effective model update strategy is adopted to adapt the model to changes in the target during tracking. Ablation studies show that the adaptively multi-feature fusion model in our AMFT tracking method is very effective. Our AMFT tracker performs favorably on the PTB-TIR and LSOTB-TIR benchmarks compared with state-of-the-art trackers.
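A common way to realize adaptive fusion and a simple model update in correlation-style trackers is sketched below: each feature type produces a response map, the maps are weighted by their normalized peak values, and the appearance model is updated by linear interpolation. The peak-based weighting, the learning rate `lr`, and the function names are assumptions for illustration; the abstract does not specify AMFT's actual weighting or update rule.

```python
# Hypothetical peak-weighted response fusion and interpolation-based model update.
# Assumes every response map has a positive peak value.


def fuse_responses(responses):
    """Weight each response map by its peak value, normalized to sum to 1.

    responses: list of equal-length 1-D response maps, one per feature type
    (e.g. hand-crafted and CNN features). Returns (fused_map, weights).
    """
    peaks = [max(r) for r in responses]
    total = sum(peaks)
    weights = [p / total for p in peaks]
    length = len(responses[0])
    fused = [sum(w * r[i] for w, r in zip(weights, responses)) for i in range(length)]
    return fused, weights


def update_model(old, new, lr=0.01):
    """Linear-interpolation update: model <- (1 - lr) * old + lr * new."""
    return [(1 - lr) * o + lr * n for o, n in zip(old, new)]


if __name__ == "__main__":
    hand_crafted = [0.1, 0.9, 0.2]
    deep = [0.3, 0.3, 0.6]
    fused, weights = fuse_responses([hand_crafted, deep])
    target_index = max(range(len(fused)), key=fused.__getitem__)
    print(weights)       # approximately [0.6, 0.4]
    print(target_index)  # 1
```

The small `lr` lets the model follow gradual appearance changes while limiting drift from a single bad frame.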
Affiliation(s)
- Di Yuan
- Guangzhou Institute of Technology, Xidian University, Guangzhou, 510555 China
- Xiu Shu
- School of Science, Harbin Institute of Technology, Shenzhen, 518055 China
- Qiao Liu
- National Center for Applied Mathematics in Chongqing, Chongqing Normal University, Chongqing, 401331 China
- Xinming Zhang
- School of Science, Harbin Institute of Technology, Shenzhen, 518055 China
- Zhenyu He
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055 China
5
Zhao L, Liu X, Ren H, Xue L. Thermal Infrared Tracking Method Based on Efficient Global Information Perception. SENSORS (BASEL, SWITZERLAND) 2022; 22:7408. [PMID: 36236505 PMCID: PMC9570693 DOI: 10.3390/s22197408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/15/2022] [Accepted: 09/24/2022] [Indexed: 06/16/2023]
Abstract
To address the limited ability of current thermal infrared (TIR) tracking methods to resist occlusion and interference from similar targets, we propose a TIR tracking method based on efficient global information perception. To efficiently obtain the global semantic information of images, we use the Transformer structure for feature extraction and fusion. In the feature extraction process, the Focal Transformer structure is used to improve the efficiency of long-range information modeling, which is highly similar to the human attention mechanism. The feature fusion process adds relative position encoding to the standard Transformer structure, which allows the model to continuously consider the influence of positional relationships during learning. The encoding also generalizes to capture different positional information for different input sequences, so the Transformer structure models the semantic information contained in images more efficiently. To further improve tracking accuracy and robustness, a heterogeneous bi-prediction head is used in the object prediction process: a fully connected sub-network performs the foreground/background classification prediction, and a convolutional sub-network performs the bounding-box regression prediction. To alleviate the tension between the Transformer model's vast demand for training data and the limited scale of TIR tracking datasets, the LaSOT-TIR dataset is generated with a generative adversarial network for network training. Our method achieves the best performance compared with other state-of-the-art trackers on the VOT2015-TIR, VOT2017-TIR, PTB-TIR and LSOTB-TIR datasets, and performs especially well under severe occlusion or interference from similar objects.
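To make the idea of relative position encoding concrete, the sketch below adds a scalar bias, indexed by the offset `j - i` between query and key positions, to scaled dot-product attention logits. This is a generic 1-D relative position bias with a hand-set bias table and hypothetical function names, written in plain Python for clarity; it is not the exact encoding used in the paper's fusion module.

```python
import math

# Generic 1-D relative position bias in attention (illustrative sketch only).


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_relative_bias(q, k, v, bias_table):
    """Scaled dot-product attention with a bias that depends only on j - i.

    q, k, v: lists of n vectors; bias_table: dict {offset j - i: scalar bias}.
    In a real model the bias values would be learned; missing offsets get 0.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        logits = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  + bias_table.get(j - i, 0.0)
                  for j in range(n)]
        weights = softmax(logits)
        out.append([sum(weights[j] * v[j][c] for j in range(n))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = k = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # favour the immediate right-hand neighbour (offset +1)
    print(attention_with_relative_bias(q, k, v, {1: 5.0}))
```

Because the bias depends only on the offset, the same table applies to sequences of any length, which is one reason relative encodings generalize across inputs.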
Affiliation(s)
- Long Zhao
- Big Data Institute, East University of Heilongjiang, Harbin 150066, China
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Xiaoye Liu
- Big Data Institute, East University of Heilongjiang, Harbin 150066, China
- Honge Ren
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Forestry Intelligent Equipment Engineering Research Center, Harbin 150040, China
- Lingjixuan Xue
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
6
Xi T, Yuan L, Sun Q. A Combined Approach to Infrared Small-Target Detection with the Alternating Direction Method of Multipliers and an Improved Top-Hat Transformation. SENSORS (BASEL, SWITZERLAND) 2022; 22:7327. [PMID: 36236434 PMCID: PMC9571038 DOI: 10.3390/s22197327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2022] [Revised: 09/13/2022] [Accepted: 09/15/2022] [Indexed: 06/16/2023]
Abstract
In infrared small-target detection, methods based on the infrared patch image (IPI) model produce better results than other popular approaches (such as max-mean, top-hat, and human visual system methods), but in some extreme cases they suffer from long processing times and inconsistent performance. To overcome these issues, we propose a novel approach that divides the traditional target detection process into two stages: suppression of background noise and elimination of clutter. The workflow consists of four steps. After the images are imported, the second step applies the alternating direction method of multipliers (ADMM) to preliminarily remove the background. Unlike the IPI model, this step does not require sliding patches, which significantly reduces processing time. In step 3, residual noise and clutter are eliminated by processing the interim results with morphological filtering through an improved new top-hat transformation that uses a threefold structuring element. The final step is thresholding segmentation with an adaptive threshold algorithm. Compared with IPI and the new top-hat methods, as well as some other widely used methods, our approach detected infrared targets more efficiently (90% less computational time) and more consistently (no sudden performance drop).
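Steps 3 and 4 can be illustrated with the classic white top-hat transform (the image minus its morphological opening) followed by a mean-plus-k-standard-deviations threshold. This is a minimal sketch assuming a flat square structuring element, grayscale images stored as nested lists, and hypothetical function names. It is the standard top-hat, not the improved version with a threefold structuring element, and the `k` rule is an assumed stand-in for the paper's adaptive threshold algorithm.

```python
import statistics

# Classic white top-hat plus a mean + k*std threshold (illustrative sketch).


def _window_filter(img, size, op):
    """Apply op (min or max) over a size x size window, clipped at the borders."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = op(window)
    return out


def white_top_hat(img, size=3):
    """Image minus its opening (erosion followed by dilation)."""
    opened = _window_filter(_window_filter(img, size, min), size, max)
    return [[a - b for a, b in zip(row, orow)] for row, orow in zip(img, opened)]


def adaptive_threshold(img, k=3.0):
    """Binary mask of pixels above mean + k * (population) standard deviation."""
    values = [v for row in img for v in row]
    t = statistics.mean(values) + k * statistics.pstdev(values)
    return [[1 if v > t else 0 for v in row] for row in img]


if __name__ == "__main__":
    img = [[10.0] * 5 for _ in range(5)]  # flat background
    img[2][2] = 50.0                      # one bright small target
    residual = white_top_hat(img)
    print(residual[2][2])                 # 40.0
    print(adaptive_threshold(residual))   # only the centre pixel is 1
```

The opening removes structures smaller than the structuring element, so subtracting it leaves small bright targets while suppressing the background.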
Affiliation(s)
- Tengyan Xi
- Key Laboratory of Nondestructive Testing (Ministry of Education), Nanchang Hang Kong University, Nanchang 330031, China
- Lihua Yuan
- Key Laboratory of Nondestructive Testing (Ministry of Education), Nanchang Hang Kong University, Nanchang 330031, China
- Quanbin Sun
- School of Computing and Digital Technology, Birmingham City University, Birmingham B5 5JU, UK
7
Liu H, Pan W, Hu Y, Li C, Yuan X, Long T. A Detection and Tracking Method Based on Heterogeneous Multi-Sensor Fusion for Unmanned Mining Trucks. SENSORS (BASEL, SWITZERLAND) 2022; 22:5989. [PMID: 36015750 PMCID: PMC9415720 DOI: 10.3390/s22165989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 08/03/2022] [Accepted: 08/06/2022] [Indexed: 06/15/2023]
Abstract
Environmental perception for transportation in open-pit mines faces many difficulties, such as unpaved roads, dusty environments, and high requirements for stable detection and tracking of small irregular obstacles. To solve these problems, a new multi-target detection and tracking method is proposed based on the fusion of Lidar and millimeter-wave radar. It introduces a secondary segmentation algorithm suited to open-pit mine production scenarios to improve the detection distance and accuracy for small irregular obstacles on unpaved roads. In addition, the paper proposes an adaptive heterogeneous multi-source fusion strategy for filtering dust, which significantly improves the perception system's ability to detect and track various targets in dusty environments by adaptively adjusting the confidence of the output targets. Finally, tests in an open-pit mine show that the method can stably detect obstacles 30-40 cm in size at 60 m in front of the mining truck and effectively filter out false alarms caused by concentrated dust, demonstrating the reliability of the method.
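The abstract does not give the fusion rule, so the following is only a hypothetical sketch of how output confidence could be adjusted adaptively: a Lidar detection confirmed by a nearby radar return (millimeter-wave radar is generally less affected by dust than Lidar) receives a confidence boost, while an unconfirmed one is down-weighted in proportion to an assumed `dust_level` in [0, 1]. The function name, gate radius, boost, and threshold values are all illustrative assumptions.

```python
import math

# Hypothetical Lidar/radar confidence fusion for dust filtering (not the paper's rule).


def fuse_confidence(lidar_dets, radar_dets, dust_level=0.0,
                    gate=1.5, boost=0.2, min_conf=0.5):
    """Adjust Lidar detection confidence using radar confirmation.

    lidar_dets: list of (x, y, confidence), positions in metres
    radar_dets: list of (x, y) radar target positions in metres
    dust_level: assumed dust estimate in [0, 1]; 0 means clear air
    Returns the Lidar detections whose adjusted confidence >= min_conf.
    """
    kept = []
    for x, y, conf in lidar_dets:
        confirmed = any(math.hypot(x - rx, y - ry) <= gate for rx, ry in radar_dets)
        if confirmed:
            adjusted = min(1.0, conf + boost)     # both sensors agree
        else:
            adjusted = conf * (1.0 - dust_level)  # possibly a dust return
        if adjusted >= min_conf:
            kept.append((x, y, adjusted))
    return kept


if __name__ == "__main__":
    lidar = [(60.0, 0.0, 0.6), (20.0, 5.0, 0.7)]
    radar = [(60.4, 0.3)]
    print(fuse_confidence(lidar, radar, dust_level=0.5))
    # keeps only the radar-confirmed obstacle at (60.0, 0.0), confidence about 0.8
```

In clear air (`dust_level=0`) unconfirmed Lidar detections pass through unchanged, so the filter only becomes aggressive when dust is assumed to be present.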
8
Huang B, Xu T, Shen Z, Jiang S, Zhao B, Bian Z. SiamATL: Online Update of Siamese Tracking Network via Attentional Transfer Learning. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:7527-7540. [PMID: 33417585 DOI: 10.1109/tcyb.2020.3043520] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/12/2023]
Abstract
Visual object tracking with semantic deep features has recently attracted much attention in computer vision. In particular, Siamese trackers, which aim to learn a decision-making-based similarity evaluation, are widely used in the tracking community. However, online updating remains a tricky issue for Siamese trackers because of the tradeoff between model adaptation and degradation. To address this issue, in this article, we propose a novel attentional transfer learning-based Siamese network (SiamATL), which fully exploits previous knowledge to inspire the current tracker learning in the decision-making module. First, we explicitly model the template and surroundings with an attentional online update strategy to avoid template pollution. Then, we introduce an instance-transfer discriminative correlation filter (ITDCF) to enhance the discriminative ability of the tracker. Finally, we propose a mutual compensation mechanism that integrates cross-correlation matching and ITDCF detection into the decision-making subnetwork to achieve online tracking. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art tracking algorithms on multiple large-scale tracking datasets.
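The core of a discriminative correlation filter, the component ITDCF builds on, is a ridge regression solved in closed form in the Fourier domain. The sketch below trains a single-channel, 1-D MOSSE-style filter with a naive DFT and locates a cyclically shifted target. It is a generic DCF illustration under simplifying assumptions (one channel, one training sample, no cosine window, hypothetical function names), not the instance-transfer variant proposed in the paper.

```python
import cmath

# Minimal 1-D MOSSE-style correlation filter (generic DCF sketch, not ITDCF).


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def train_filter(patch, desired, lam=1e-2):
    """Closed-form filter H* = (G . conj(F)) / (F . conj(F) + lam), per frequency."""
    F, G = dft(patch), dft(desired)
    return [g * f.conjugate() / (f * f.conjugate() + lam) for f, g in zip(F, G)]


def detect(H, patch):
    """Correlation response of a new patch; its peak marks the target position."""
    Z = dft(patch)
    return [r.real for r in idft([h * z for h, z in zip(H, Z)])]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    patch = [0, 0, 1, 3, 1, 0, 0, 0]        # target centred at index 3
    desired = [0, 0, 0.1, 1, 0.1, 0, 0, 0]  # peaked label at index 3
    H = train_filter(patch, desired)
    print(argmax(detect(H, patch)))                    # 3
    print(argmax(detect(H, patch[-2:] + patch[:-2])))  # 5 (target moved by +2)
```

The regularizer `lam` keeps the division stable at frequencies where the training patch has little energy; because correlation is circular, a cyclic shift of the target shifts the response peak by the same amount.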
9
Li W, Lv L, Zhu J. Multigroup spatial shift models for thermal infrared tracking. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
10
Learning reliable modal weight with transformer for robust RGBT tracking. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
11
Hybrid neural networks for noise reductions of integrated navigation complexes. ARTIF INTELL 2022. [DOI: 10.15407/jai2022.01.288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
The necessity of constructing integrated navigation complexes (INC) is substantiated. It is proposed to include the following navigation systems in the complex: inertial, satellite, and visual, which helps to increase the accuracy of determining the coordinates of unmanned aerial vehicles. Unfavorable cases are considered, namely the suppression of the satellite navigation system by external noise; an increase in the errors of the inertial navigation system (INS), including those caused by accelerometers and gyroscopes manufactured using MEMS technology; and bad weather conditions, which complicate the operation of the visual navigation system. To ensure the operation of the navigation complex, interference (noise) must be suppressed. To improve the accuracy of the INS within the INC, it is proposed to extract the noise from the raw INS signal, predict it using neural networks, and suppress it. Two approaches are proposed to solve this problem: the first is based on a multi-row GMDH algorithm and single-layer networks with sigm_piecewise neurons, and the second on hybrid recurrent neural networks that include long short-term memory (LSTM) and gated recurrent units (GRU). Various types of noise inherent in video images in visual navigation systems are considered: Gaussian noise, salt-and-pepper noise, Poisson noise, fractional noise, and blind noise, with particular attention paid to blind noise. To improve the accuracy of the visual navigation system, it is proposed to use hybrid convolutional neural networks.
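Two of the noise models listed above are easy to simulate, which is useful when building training or test data for a denoiser. The sketch below adds zero-mean Gaussian noise and salt-and-pepper noise to a row of 8-bit grayscale pixels and applies a 3-tap median filter, a classical baseline that removes isolated salt-and-pepper spikes. The function names and parameters are assumptions, and the paper's GMDH, LSTM/GRU, and hybrid CNN denoisers are not reproduced here.

```python
import random

# Simulating two image-noise models and a classical median-filter baseline.


def clip8(v):
    """Round and clip a value to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(v))))


def add_gaussian(pixels, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return [clip8(p + rng.gauss(0.0, sigma)) for p in pixels]


def add_salt_pepper(pixels, amount, rng):
    """Replace a fraction `amount` of pixels with 0 (pepper) or 255 (salt)."""
    out = list(pixels)
    for i in range(len(out)):
        if rng.random() < amount:
            out[i] = 255 if rng.random() < 0.5 else 0
    return out


def median3(pixels):
    """3-tap median filter; the two border pixels are kept unchanged."""
    out = list(pixels)
    for i in range(1, len(pixels) - 1):
        out[i] = sorted(pixels[i - 1:i + 2])[1]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    row = [100] * 10
    print(add_gaussian(row, 5.0, rng))
    print(add_salt_pepper(row, 0.2, rng))
    noisy = list(row)
    noisy[5] = 255         # a single salt pixel
    print(median3(noisy))  # all 100 again
```

A median filter handles impulsive salt-and-pepper noise well but blurs fine detail, which is part of the motivation for learned denoisers.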
12
13
Wang K, Liu M. Toward Structural Learning and Enhanced YOLOv4 Network for Object Detection in Optical Remote Sensing Images. ADVANCED THEORY AND SIMULATIONS 2022. [DOI: 10.1002/adts.202200002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Affiliation(s)
- Kun Wang
- College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
- Maozhen Liu
- College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
14
HCDC-SRCF tracker: Learning an adaptively multi-feature fuse tracker in spatial regularized correlation filters framework. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107913] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
15
Zhao H, Sun X, Dong J, Dong Z, Li Q. Knowledge distillation via instance-level sequence learning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107519] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
16
Robust Data Association Using Fusion of Data-Driven and Engineered Features for Real-Time Pedestrian Tracking in Thermal Images. SENSORS 2021; 21:8005. [PMID: 34884016 PMCID: PMC8659910 DOI: 10.3390/s21238005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/29/2021] [Revised: 11/26/2021] [Accepted: 11/28/2021] [Indexed: 11/17/2022]
Abstract
Object tracking is an essential problem in computer vision that has been extensively researched for decades. Tracking objects in thermal images is particularly difficult because of the lack of color information, low image resolution, or high similarity between objects of the same class. One of the main challenges in multi-object tracking, also referred to as the data association problem, is finding the correct correspondences between measurements and tracks and adapting to changes in object appearance over time. We addressed this data association challenge for thermal images with three contributions. The first is a data-driven appearance score computed by five Siamese Networks that operate on the detection image and on parts of it. Second, we engineered an original edge-based descriptor that improves the data association process. Third, we proposed a dataset of pedestrian instances recorded in different scenarios, which is used to train the Siamese Networks. The data-driven part of the data association score offers robustness, while the engineered features offer adaptability to unknown scenarios, and their combination leads to a more powerful tracking solution. Our approach had a running time of 25 ms and achieved an average precision of 86.2% on publicly available benchmarks containing real-world scenarios, as shown in the evaluation section.
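The combination of a data-driven score and an engineered score can be sketched as a weighted sum of two track-by-detection score matrices followed by an optimal one-to-one assignment. The weight `alpha`, the gating threshold `min_score`, the function names, and the brute-force search are assumptions for illustration. The brute force is only viable for a handful of objects; practical trackers typically solve the same assignment with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations

# Score fusion plus exhaustive one-to-one assignment (illustrative sketch).


def fuse_scores(siamese, engineered, alpha=0.5):
    """Element-wise alpha * siamese + (1 - alpha) * engineered (track x detection)."""
    return [[alpha * s + (1 - alpha) * e for s, e in zip(row_s, row_e)]
            for row_s, row_e in zip(siamese, engineered)]


def associate(scores, min_score=0.3):
    """Maximum-total-score one-to-one matching by exhaustive search.

    scores[t][d] is the affinity of track t and detection d (higher is better).
    Pairs scoring below min_score are treated as non-matches.
    Returns a list of (track, detection) pairs.
    """
    n_t = len(scores)
    n_d = len(scores[0]) if n_t else 0
    n = max(n_t, n_d)

    def gated(t, d):
        # padded rows/columns and low scores count as "no match"
        if t < n_t and d < n_d and scores[t][d] >= min_score:
            return scores[t][d]
        return 0.0

    best_total, best_perm = -1.0, ()
    for perm in permutations(range(n)):  # O(n!): small n only
        total = sum(gated(t, perm[t]) for t in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return [(t, best_perm[t]) for t in range(n_t) if gated(t, best_perm[t]) > 0.0]


if __name__ == "__main__":
    siamese = [[0.9, 0.1], [0.2, 0.8]]
    edges = [[0.7, 0.3], [0.1, 0.6]]
    print(associate(fuse_scores(siamese, edges)))  # [(0, 0), (1, 1)]
```

Padding the score matrix to a square lets the same search handle more tracks than detections or vice versa; unmatched tracks or detections simply do not appear in the result.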
17
Zhao L, Zhu M, Ren H, Xue L. Channel Exchanging for RGB-T Tracking. SENSORS 2021; 21:5800. [PMID: 34502691 PMCID: PMC8434326 DOI: 10.3390/s21175800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2021] [Revised: 08/13/2021] [Accepted: 08/27/2021] [Indexed: 11/28/2022]
Abstract
It is difficult to achieve all-weather visual object tracking in an open environment using only single-modality data input. Because RGB and thermal infrared (TIR) data are complementary in various complex environments, a more robust object tracking framework can be obtained using video data of both modalities. The method of fusing RGB and TIR data is the core element that determines the performance of an RGB-T object tracking method, and existing RGB-T trackers have not solved this problem well. To address the current low utilization of information within a single modality in aggregation-based methods and between the two modalities in alignment-based methods, we used DiMP as the baseline tracker to design an RGB-T object tracking framework based on channel exchanging, channel exchanging DiMP (CEDiMP). CEDiMP achieves dynamic channel exchanging between the sub-networks of different modalities while adding hardly any parameters during the feature fusion process. The deep features generated by our channel-exchanging-based data fusion method have stronger expressive ability. At the same time, to address the poor generalization ability of existing RGB-T object tracking methods and their weakness in long-term object tracking, CEDiMP is additionally trained on the synthetic dataset LaSOT-RGBT. A large number of experiments demonstrate the effectiveness of the proposed model. CEDiMP achieves the best performance on two RGB-T object tracking benchmark datasets, GTOT and RGBT234, and performs outstandingly in the generalization testing.
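Channel exchanging can be sketched in a few lines: each modality's sub-network has batch-normalization scaling factors, and channels whose factor is close to zero are treated as uninformative and replaced by the corresponding channel of the other modality. The sketch below follows that general idea with an assumed threshold `thresh`, a hypothetical function name, and features stored as lists of channels; CEDiMP's exact exchange criterion may differ. The operation itself only copies existing activations and introduces no parameters of its own.

```python
# Generic BN-scaling-factor-based channel exchange between two modalities (sketch).


def exchange_channels(feat_rgb, feat_tir, gamma_rgb, gamma_tir, thresh=1e-2):
    """Swap in the other modality's channel where a BN scaling factor is ~0.

    feat_*: list of channels (each a list of activations), same shape for both
    gamma_*: per-channel BN scaling factors of each modality's sub-network
    """
    out_rgb, out_tir = [], []
    for c in range(len(feat_rgb)):
        # RGB channel c is replaced by TIR channel c if its scaling factor is tiny
        out_rgb.append(list(feat_tir[c]) if abs(gamma_rgb[c]) < thresh else list(feat_rgb[c]))
        # and vice versa for the TIR branch
        out_tir.append(list(feat_rgb[c]) if abs(gamma_tir[c]) < thresh else list(feat_tir[c]))
    return out_rgb, out_tir


if __name__ == "__main__":
    rgb = [[1, 1], [2, 2]]
    tir = [[5, 5], [6, 6]]
    # RGB channel 1 has a near-zero scaling factor, so it takes the TIR values
    print(exchange_channels(rgb, tir, [0.8, 0.001], [0.9, 0.7]))
    # ([[1, 1], [6, 6]], [[5, 5], [6, 6]])
```

In training, a sparsity penalty on part of the scaling factors is what drives some of them toward zero, so which channels get exchanged is learned rather than fixed.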
Affiliation(s)
- Long Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Big Data Institute, East University of Heilongjiang, Harbin 150066, China
- Meng Zhu
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Honge Ren
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Forestry Intelligent Equipment Engineering Research Center, Harbin 150040, China
- Lingjixuan Xue
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
18
A Review of Tracking and Trajectory Prediction Methods for Autonomous Driving. MATHEMATICS 2021. [DOI: 10.3390/math9060660] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022]
Abstract
This paper provides a literature review of some of the most important concepts, techniques, and methodologies used within autonomous car systems. Specifically, we focus on two aspects extensively explored in the related literature: tracking, i.e., identifying pedestrians, cars or obstacles from images, observations or sensor data, and prediction, i.e., anticipating the future trajectories and motion of other vehicles in order to facilitate navigating through various traffic conditions. Approaches based on deep neural networks and others, especially stochastic techniques, are reported.
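As a small example of the stochastic techniques such a review covers, the sketch below runs a constant-velocity Kalman filter on 1-D position measurements and extrapolates the trajectory a few steps ahead. The function names, the noise values `q` and `r`, the initial covariance, and the diagonal process-noise model are assumptions chosen for illustration, not parameters from the reviewed works.

```python
# Constant-velocity Kalman filter in 1-D (generic illustration).


def kalman_cv(measurements, dt=1.0, q=1e-4, r=1e-2):
    """Filter 1-D position measurements; return the final (position, velocity).

    State x = [position, velocity]; covariance P is a 2x2 nested list.
    Process model F = [[1, dt], [0, 1]] with Q = q * I; measurement H = [1, 0].
    """
    p, v = measurements[0], 0.0
    P = [[1.0, 0.0], [0.0, 100.0]]  # large initial uncertainty on velocity
    for z in measurements[1:]:
        # predict: x <- F x, P <- F P F^T + Q
        p = p + dt * v
        a = P[0][0] + dt * P[1][0]
        b = P[0][1] + dt * P[1][1]
        P = [[a + dt * b + q, b],
             [P[1][0] + dt * P[1][1], P[1][1] + q]]
        # update with the measured position z
        y = z - p                 # innovation
        s = P[0][0] + r           # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
        p, v = p + k0 * y, v + k1 * y
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return p, v


def predict_ahead(p, v, steps, dt=1.0):
    """Extrapolate future positions under the constant-velocity model."""
    return [p + v * dt * i for i in range(1, steps + 1)]


if __name__ == "__main__":
    track = [2.0 * t for t in range(10)]  # object moving 2 units per frame
    p, v = kalman_cv(track)
    print(round(v, 2), [round(x, 1) for x in predict_ahead(p, v, 3)])
    # velocity close to 2.0; predictions close to 20, 22, 24
```

Deep-learning predictors in the literature replace the fixed motion model with a learned one, but the predict/update structure shown here is the usual stochastic baseline.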
19
Abstract
Detection of small moving objects is an important research area with applications including monitoring of flying insects, studying their foraging behavior, using insect pollinators to monitor flowering and pollination of crops, surveillance of honeybee colonies, and tracking the movement of honeybees. However, due to the lack of distinctive shape and textural details on small objects, direct application of modern object detection methods based on convolutional neural networks (CNNs) shows considerably lower performance. In this paper, we propose a method for the detection of small moving objects in videos recorded using unmanned aerial vehicles equipped with standard video cameras. The main steps of the proposed method are video stabilization, background estimation and subtraction, frame segmentation using a CNN, and thresholding the segmented frame. However, training a CNN requires a large labeled dataset. Manual labelling of small moving objects in videos is very difficult and time-consuming, and such labeled datasets do not exist at the moment. To circumvent this problem, we propose training a CNN using synthetic videos generated by adding small blob-like objects to video sequences with real-world backgrounds. The experimental results on detection of flying honeybees show that by using a combination of classical computer vision techniques and CNNs, as well as synthetic training sets, the proposed approach overcomes the problems associated with direct application of CNNs to the given problem and achieves an average F1-score of 0.86 in tests on real-world videos.
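Two of the classical steps in this pipeline, background estimation/subtraction and thresholding, and the F1-score used for evaluation can be sketched as follows. Frames are assumed to be already stabilized and flattened to lists of grayscale values, the background is estimated as the per-pixel temporal median, and the threshold value and function names are assumptions; the CNN segmentation step between subtraction and thresholding is omitted.

```python
import statistics

# Median background subtraction, thresholding, and F1-score (illustrative sketch).


def median_background(frames):
    """Per-pixel temporal median of stabilized frames (lists of equal length)."""
    return [statistics.median(values) for values in zip(*frames)]


def foreground_mask(frame, background, thresh=25):
    """1 where the absolute difference from the background exceeds thresh."""
    return [1 if abs(f - b) > thresh else 0 for f, b in zip(frame, background)]


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    history = [[10, 10, 10, 10], [10, 12, 10, 10], [11, 10, 10, 10]]
    bg = median_background(history)
    print(foreground_mask([10, 60, 10, 10], bg))  # [0, 1, 0, 0]
    print(f1_score(tp=8, fp=2, fn=2))             # approximately 0.8
```

The temporal median tolerates a moving object passing through a pixel in a minority of frames, which is why it is a common background estimate for small movers.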
20
21
Zhang M, Tian G, Zhang Y, Duan P. Service skill improvement for home robots: Autonomous generation of action sequence based on reinforcement learning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106605] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
22
Wei C, Zhang J, Yuan X, He Z, Liu G, Wu J. NeuroTIS: Enhancing the prediction of translation initiation sites in mRNA sequences via a hybrid dependency network and deep learning framework. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106459] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
23
Abstract
Airborne target tracking in infrared imagery remains a challenging task. The airborne target usually has a low signal-to-noise ratio and shows different visual patterns. The features adopted in visual tracking algorithms are usually deep features pre-trained on ImageNet, which are not tightly coupled with the current video domain and therefore might not be optimal for infrared target tracking. To this end, we propose a new approach to learn domain-specific features, which can be adapted to the current video online without pre-training on a large dataset. Considering that only a few samples of the initial frame can be used for online training, general feature representations are encoded into the network for a better initialization. The feature learning module is flexible and can be integrated into tracking frameworks based on correlation filters to improve the baseline method. Experiments on airborne infrared imagery are conducted to demonstrate the effectiveness of our tracking algorithm.
24
Li X, Liu Q, Fan N, Zhou Z, He Z, Jing XY. Dual-regression model for visual tracking. Neural Netw 2020; 132:364-374. [DOI: 10.1016/j.neunet.2020.09.011] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/05/2020] [Revised: 09/02/2020] [Accepted: 09/10/2020] [Indexed: 01/07/2023]
25
Thermal infrared pedestrian tracking using joint siamese network and exemplar prediction model. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.09.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
26
Tian C, Fei L, Zheng W, Xu Y, Zuo W, Lin CW. Deep learning on image denoising: An overview. Neural Netw 2020; 131:251-275. [PMID: 32829002 DOI: 10.1016/j.neunet.2020.07.025] [Citation(s) in RCA: 197] [Impact Index Per Article: 39.4] [Received: 01/16/2020] [Revised: 06/17/2020] [Accepted: 07/21/2020] [Indexed: 01/19/2023]
Abstract
Deep learning techniques have received much attention in the area of image denoising. However, the various types of deep learning methods for image denoising differ substantially. Specifically, discriminative learning based on deep learning can effectively address Gaussian noise, and optimization models based on deep learning are effective in estimating real noise. However, little research has so far summarized the different deep learning techniques for image denoising. In this paper, we offer a comparative study of deep techniques in image denoising. We first classify deep convolutional neural networks (CNNs) into those for additive white noisy images, real noisy images, blind denoising, and hybrid noisy images (combinations of noisy, blurred, and low-resolution images). Then, we analyze the motivations and principles of the different types of deep learning methods. Next, we compare state-of-the-art methods on public denoising datasets through quantitative and qualitative analyses. Finally, we point out some potential challenges and directions for future research.
Affiliation(s)
- Chunwei Tian
- Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, 518055, Guangdong, China
- Lunke Fei
- School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Wenxian Zheng
- Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, Guangdong, China
- Yong Xu
- Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, 518055, Guangdong, China; Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Wangmeng Zuo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China; Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Chia-Wen Lin
- Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan
|
27
|
Kim C, Ko H. Weighted Kernel Filter Based Anti-Air Object Tracking for Thermal Infrared Systems. SENSORS (BASEL, SWITZERLAND) 2020; 20:s20154081. [PMID: 32707900 PMCID: PMC7435644 DOI: 10.3390/s20154081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2020] [Revised: 07/16/2020] [Accepted: 07/20/2020] [Indexed: 06/11/2023]
Abstract
Visual object tracking is an important component of surveillance systems and many high-performance methods have been developed. However, these tracking methods tend to be optimized for the Red/Green/Blue (RGB) domain and are thus not suitable for use with the infrared (IR) domain. To overcome this disadvantage, many researchers have constructed datasets for IR analysis, including those developed for The Thermal Infrared Visual Object Tracking (VOT-TIR) challenges. As a consequence, many state-of-the-art trackers for the IR domain have been proposed, but there remains a need for reliable IR-based trackers for anti-air surveillance systems, including the construction of a new IR dataset for this purpose. In this paper, we collect various anti-air thermal-wave IR (TIR) images from an electro-optical surveillance system to create a new dataset. We also present a framework based on an end-to-end convolutional neural network that learns object tracking in the IR domain for anti-air targets such as unmanned aerial vehicles (UAVs) and drones. More specifically, we adopt a Siamese network for feature extraction and three region proposal networks for the classification and regression branches. In the inference phase, the proposed network is formulated as a detection-by-tracking method, and kernel filters for the template branch that are continuously updated for every frame are introduced. The proposed network is able to learn robust structural information for the targets during offline training, and the kernel filters can robustly track the targets, demonstrating enhanced performance. Experimental results from the new IR dataset reveal that the proposed method achieves outstanding performance, with a real-time processing speed of 40 frames per second.
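The abstract describes a Siamese design in which template features act as a kernel over search-region features. As a generic illustration only, not the authors' network, the sketch below assumes plain 2D lists stand in for learned feature maps; `cross_correlate` and `peak_location` are hypothetical helper names. It computes a valid-mode cross-correlation response map and returns the location of its maximum, which is the basic matching step Siamese trackers build on.

```python
from typing import List, Tuple

Grid = List[List[float]]


def cross_correlate(search: Grid, template: Grid) -> Grid:
    """Valid-mode 2D cross-correlation: slide `template` over `search` and
    sum the elementwise products at every offset where it fits entirely."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            row.append(sum(search[y + i][x + j] * template[i][j]
                           for i in range(th) for j in range(tw)))
        response.append(row)
    return response


def peak_location(response: Grid) -> Tuple[int, int]:
    """Return (row, col) of the maximum value in the response map."""
    _, y, x = max((v, y, x) for y, row in enumerate(response)
                  for x, v in enumerate(row))
    return y, x


if __name__ == "__main__":
    # A 2x2 bright "target" sits at rows 1-2, columns 2-3 of the search region.
    search = [[0, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0]]
    template = [[1, 1], [1, 1]]
    print(peak_location(cross_correlate(search, template)))  # (1, 2)
```

In a real Siamese tracker both inputs would be feature maps produced by the shared backbone, and the response map would feed classification and regression heads rather than a bare argmax.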
Affiliation(s)
- Chuljoong Kim
- Department of Video Information Processing, Korea University, Seoul 136-713, Korea
- Hanwha Systems Co., Sungnam 461-140, Korea
- Hanseok Ko
- Department of Video Information Processing, Korea University, Seoul 136-713, Korea
|
29
|
Yuan D, Li X, He Z, Liu Q, Lu S. Visual object tracking with adaptive structural convolutional network. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105554] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 11/24/2022]
|
30
|
Yuan D, Fan N, He Z. Learning target-focusing convolutional regression model for visual object tracking. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105526] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 11/16/2022]
|
31
|
Global Motion-Aware Robust Visual Object Tracking for Electro Optical Targeting Systems. SENSORS 2020; 20:s20020566. [PMID: 31968620 PMCID: PMC7014512 DOI: 10.3390/s20020566] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/01/2019] [Revised: 01/14/2020] [Accepted: 01/16/2020] [Indexed: 11/24/2022]
Abstract
Although recently developed trackers have shown excellent performance even when tracking fast-moving and shape-changing objects with variable scale and orientation, trackers for electro-optical targeting systems (EOTS) still suffer from abrupt scene changes caused by frequent, fast camera motions under pan-tilt motor control or by dynamic distortions in field environments. Conventional context-aware (CA) and deep-learning-based trackers have been studied to tackle these problems, but they do not fully overcome them and carry a heavy computational burden. In this paper, a global motion-aware method is proposed to address the fast camera motion issue. The proposed method consists of two modules: (i) a motion detection module based on the change in image entropy, and (ii) a background tracking module that tracks a set of features across consecutive images, finds correspondences between them, and estimates global camera movement. A series of experiments on thermal infrared images shows that the proposed method can significantly improve the robustness of all trackers with minimal computational overhead. We show that the proposed method can be easily integrated into any visual tracking framework and can improve the performance of EOTS applications.
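The abstract does not say how the entropy change is computed or thresholded. A minimal sketch, assuming Shannon entropy of the 8-bit grayscale histogram and a fixed, hypothetical threshold (`image_entropy` and `detect_abrupt_motion` are illustrative names, not from the paper), flags frames whose entropy jumps relative to the previous frame:

```python
import math
from collections import Counter
from typing import List, Sequence


def image_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy in bits of a grayscale image given as a flat
    sequence of integer intensities (e.g. 0-255)."""
    if not pixels:
        return 0.0
    n = len(pixels)
    counts = Counter(pixels)  # intensity histogram
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def detect_abrupt_motion(frames: Sequence[Sequence[int]],
                         threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose entropy differs from the previous
    frame's entropy by more than `threshold` bits (hypothetical rule)."""
    flagged = []
    prev = None
    for i, frame in enumerate(frames):
        h = image_entropy(frame)
        if prev is not None and abs(h - prev) > threshold:
            flagged.append(i)
        prev = h
    return flagged


if __name__ == "__main__":
    frames = [
        [0] * 16,                # uniform frame: 0 bits
        [0] * 16,                # unchanged: 0 bits
        [0] * 8 + [255] * 8,     # two equally likely levels: 1 bit
        [0, 64, 128, 192] * 4,   # four equally likely levels: 2 bits
    ]
    print(detect_abrupt_motion(frames, threshold=0.5))  # [2, 3]
```

The threshold value here is arbitrary and would need tuning on real sensor data.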
|
32
|
Samaras S, Diamantidou E, Ataloglou D, Sakellariou N, Vafeiadis A, Magoulianitis V, Lalas A, Dimou A, Zarpalas D, Votis K, Daras P, Tzovaras D. Deep Learning on Multi Sensor Data for Counter UAV Applications-A Systematic Review. SENSORS 2019; 19:s19224837. [PMID: 31698862 PMCID: PMC6891421 DOI: 10.3390/s19224837] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/27/2019] [Revised: 10/23/2019] [Accepted: 11/01/2019] [Indexed: 12/02/2022]
Abstract
The use of Unmanned Aerial Vehicles (UAVs) is growing rapidly in a wide range of consumer applications, as they prove to be both autonomous and flexible across a variety of environments and tasks. However, this versatility and ease of use also bring a rapid evolution of threats from malicious actors who can use UAVs for criminal activities, turning them into passive or active threats. The need to protect critical infrastructure and important events from such threats has driven advances in counter-UAV (c-UAV) applications. Current c-UAV applications offer systems comprising a multi-sensory arsenal, often including electro-optical, thermal, acoustic, radar, and radio frequency sensors, whose information can be fused to increase confidence in threat identification. Real-time surveillance is a cumbersome process, but it is essential for promptly detecting adverse events or conditions. To that end, many challenging tasks arise, such as object detection, classification, multi-object tracking, and multi-sensor information fusion. In recent years, researchers have applied deep learning methodologies to these tasks for generic objects and made noteworthy progress, yet applying deep learning to UAV detection and classification is considered a novel concept. Therefore, a complete overview of deep learning technologies applied to c-UAV tasks on multi-sensor data is needed. The aim of this paper is to describe deep learning advances on c-UAV tasks applied to data originating from many different sensors, as well as multi-sensor information fusion. This survey may help in making recommendations and improvements for future c-UAV applications.
Affiliation(s)
- Stamatios Samaras
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Eleni Diamantidou
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Dimitrios Ataloglou
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Nikos Sakellariou
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Anastasios Vafeiadis
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Vasilis Magoulianitis
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Antonios Lalas
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Anastasios Dimou
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Dimitrios Zarpalas
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Konstantinos Votis
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Institute For the Future, University of Nicosia, Makedonitissis 46, 2417 Nicosia, Cyprus
- Petros Daras
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
- Dimitrios Tzovaras
- Centre for Research and Technology Hellas, Information Technologies Institute, 6th km Charilaou-Thermi, 57001 Thermi, Greece
|
33
|
Liu B, Liu Q, Zhang T, Yang Y. MSSTResNet-TLD: A robust tracking method based on tracking-learning-detection framework by using multi-scale spatio-temporal residual network feature model. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.07.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
|
35
|
Fast Face Tracking-by-Detection Algorithm for Secure Monitoring. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9183774] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
This work proposes a fast face tracking-by-detection (FFTD) algorithm that performs tracking, face detection, and discrimination tasks. Using the kernelized correlation filter (KCF) as the base tracker, multitask cascaded convolutional neural networks (CNNs) detect the face, and a new tracking update strategy is designed: the filter model is updated with the tracking result as corrected by the detector. When the tracker drifts or fails, the discriminator module invokes the detector to correct the tracking results, which ensures that an object that has moved out of view can still be tracked. Extensive experiments show that the proposed FFTD algorithm has good robustness and real-time performance in video monitoring scenes.
|
36
|
Mask Sparse Representation Based on Semantic Features for Thermal Infrared Target Tracking. REMOTE SENSING 2019. [DOI: 10.3390/rs11171967] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]
Abstract
Thermal infrared (TIR) target tracking is a challenging task, as it entails learning an effective model to identify the target under poor target visibility and cluttered backgrounds. Sparse representation, a typical appearance modeling approach, has been successfully exploited in TIR target tracking. However, the discriminative information of the target and its surrounding background is usually neglected in the sparse coding process. To address this issue, we propose a mask sparse representation (MaskSR) model, which combines sparse coding with high-level semantic features for TIR target tracking. We first obtain pixel-wise labeling results of the target and its surrounding background in the last frame, and then use these results to train target-specific deep networks in a supervised manner. From the output features of the deep networks, a high-level pixel-wise discriminative map of the target area is obtained. We introduce the binarized discriminative map as a mask template into the sparse representation and develop a novel algorithm that collaboratively represents the reliable and unreliable target parts partitioned by the mask template, which explicitly indicates their different discriminative capabilities with labels 1 and 0. The proposed MaskSR model emphasizes the reliable target part in the reconstruction process via a weighting scheme. We solve this multi-parameter constrained problem with a customized alternating direction method of multipliers (ADMM). The model is applied to TIR target tracking within the particle filter framework. To improve sampling effectiveness while decreasing computational cost, a discriminative particle selection strategy based on the kernelized correlation filter is proposed to replace the previous random sampling when searching for useful candidates. Our proposed tracking method was tested on the VOT-TIR2016 benchmark. The experimental results show that the proposed method significantly outperforms various state-of-the-art methods in TIR target tracking.
|
37
|
Rak M, Steffen J, Meyer A, Hansen C, Tönnies KD. Combining convolutional neural networks and star convex cuts for fast whole spine vertebra segmentation in MRI. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 177:47-56. [PMID: 31319960 DOI: 10.1016/j.cmpb.2019.05.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/16/2018] [Revised: 03/26/2019] [Accepted: 05/09/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVE We propose an automatic approach for fast vertebral body segmentation in three-dimensional magnetic resonance images of the whole spine. Previous works are limited to the lower thoracolumbar section and often take minutes to compute, which is problematic in clinical routine, for study data sets with numerous subjects, or when the cervical or upper thoracic spine is to be analyzed. METHODS We address these limitations with a novel graph cut formulation based on vertebra patches extracted along the spine. For each patch, our formulation incorporates appearance and shape information derived from a task-specific convolutional neural network, as well as star-convexity constraints that ensure a topologically correct segmentation of each vertebra. When vertebrae are segmented individually, ambiguities occur due to overlapping segmentations of adjacent vertebrae. We tackle this problem with novel non-overlap constraints between neighboring patches based on so-called encoding swaps, which allow us to obtain a globally optimal multi-label segmentation of all vertebrae in polynomial time. RESULTS We validated our approach on two data sets. The first contains T1- and T2-weighted whole spine images of 64 subjects with varying health conditions. The second comprises 23 T2-weighted thoracolumbar images of young healthy adults and is publicly available. Our method yielded Dice coefficients of 93.8 ± 2.6% and 96.0 ± 1.0% on the two data sets, respectively, with run times of 1.35 ± 0.08 s and 0.90 ± 0.03 s per vertebra on consumer hardware. A complete whole spine segmentation took 32.4 ± 1.92 s on average. CONCLUSIONS Our results are superior to those of previous works at a fraction of their run time, which illustrates the efficiency and effectiveness of our whole spine segmentation approach.
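The Dice coefficients reported above measure the overlap between a predicted segmentation A and a reference B as 2|A ∩ B| / (|A| + |B|). A minimal sketch of that standard formula, assuming binary masks flattened to equal-length 0/1 sequences (this is not the authors' evaluation code, and `dice_coefficient` is an illustrative name):

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity 2|A ∩ B| / (|A| + |B|) for two binary masks
    flattened to sequences of equal length (nonzero = foreground)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    pred = [1, 1, 1, 0, 0]
    truth = [0, 1, 1, 1, 0]
    # 2 shared foreground voxels, 3 + 3 foreground voxels in total.
    print(round(dice_coefficient(pred, truth), 3))  # 0.667
```

Values such as "93.8 ± 2.6%" are then typically a mean and standard deviation of this per-subject score over a data set, expressed as a percentage.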
Affiliation(s)
- Marko Rak
- Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, Magdeburg, 39106 Germany
- Johannes Steffen
- Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, Magdeburg, 39106 Germany
- Anneke Meyer
- Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, Magdeburg, 39106 Germany
- Christian Hansen
- Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, Magdeburg, 39106 Germany
- Klaus-Dietz Tönnies
- Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, Magdeburg, 39106 Germany
|
41
|
Chen WS, Liu J, Pan B, Chen B. Face recognition using nonnegative matrix factorization with fractional power inner product kernel. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.06.083] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
|
42
|
Wang S, Liu L, Qu L, Yu C, Sun Y, Gao F, Dong J. Accurate Ulva prolifera regions extraction of UAV images with superpixel and CNNs for ocean environment monitoring. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.06.088] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
|
44
|
Liao Q, Ding Y, Jiang ZL, Wang X, Zhang C, Zhang Q. Multi-task deep convolutional neural network for cancer diagnosis. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.06.084] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 11/26/2022]
|
47
|
Wu G, Zhang D, Chen W, Zuo W, Xia Z. Robust Deep Softmax Regression Against Label Noise for Unsupervised Domain Adaptation. INT J PATTERN RECOGN 2019. [DOI: 10.1142/s0218001419400020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Domain adaptation aims to generalize a classification model from a source domain to a different but related target domain. Recent studies have revealed the benefit of deep convolutional features trained on a large dataset (e.g., ImageNet) in alleviating domain discrepancy. However, the literature shows that feature transferability decreases as (i) the difference between the source and target domains increases, or (ii) the features are drawn from layers closer to the top of the network. Therefore, even with deep features, domain adaptation remains necessary. In this paper, we propose a novel unsupervised domain adaptation (UDA) model for deep neural networks, which is learned from labeled source samples and unlabeled target samples simultaneously. Target samples without labels are assigned pseudo labels according to their maximum classification scores during training of the UDA model. However, owing to domain discrepancy, label noise is generally inevitable and degrades the performance of the domain adaptation model. Thus, to utilize the target samples effectively, three specific robust deep softmax regression (RDSR) functions are applied to target samples with high, medium, and low classification confidence, respectively. Extensive experiments show that our method yields state-of-the-art results, demonstrating the effectiveness of the robust deep softmax regression classifier in UDA.
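The abstract describes assigning pseudo labels from each target sample's maximum classification score and then treating samples differently by confidence level. A minimal sketch of only that partitioning step, assuming softmax scores and hypothetical cut-offs of 0.9 and 0.5 (the paper's actual thresholds and the RDSR loss functions themselves are not specified here); `assign_pseudo_labels` is an illustrative name:

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_pseudo_labels(batch_logits: Sequence[Sequence[float]],
                         high: float = 0.9,
                         low: float = 0.5) -> List[Tuple[int, float, str]]:
    """For each unlabeled target sample, return (pseudo_label, confidence,
    tier), where the pseudo label is the arg-max class and tier is 'high',
    'medium', or 'low' according to the hypothetical thresholds."""
    results = []
    for logits in batch_logits:
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        conf = probs[label]
        if conf >= high:
            tier = "high"
        elif conf >= low:
            tier = "medium"
        else:
            tier = "low"
        results.append((label, conf, tier))
    return results


if __name__ == "__main__":
    batch = [[10.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
    for label, conf, tier in assign_pseudo_labels(batch):
        print(label, tier)  # 0 high / 1 medium / 2 low
```

Each tier would then be routed to its own loss during training; that routing and the losses are left out because the abstract does not define them.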
Affiliation(s)
- Guangbin Wu
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, P. R. China
- David Zhang
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong, P. R. China
- Weishan Chen
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, P. R. China
- Wangmeng Zuo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, P. R. China
- Zhuang Xia
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, P. R. China
|
49
|
Zhao Z, Chen Z, Voros S, Cheng X. Real-time tracking of surgical instruments based on spatio-temporal context and deep learning. Comput Assist Surg (Abingdon) 2019; 24:20-29. [PMID: 30760050 DOI: 10.1080/24699322.2018.1560097] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 10/27/2022]
Abstract
Real-time tool tracking in minimally invasive surgery (MIS) has numerous applications for computer-assisted interventions (CAIs). Visual tracking approaches are a promising solution for real-time surgical tool tracking; however, many approaches may fail to complete tracking when the tracker suffers from motion blur, adverse lighting, specular reflections, shadows, and occlusions. We propose an automatic real-time method for two-dimensional tool detection and tracking based on a spatial transformer network (STN) and spatio-temporal context (STC). Our method combines a convolutional neural network (CNN) with an in-house trained STN and STC to locate the tool accurately at high speed. We then experimentally compared our method with four other general visual tracking methods for CAIs using eight existing online and in-house datasets, covering in vivo abdominal, cardiac, and retinal clinical cases in which different surgical instruments were employed. The experiments demonstrate that our method performs well in both accuracy and speed. It can track a surgical tool without labels in real time in the most challenging cases, with accuracy equal to, and sometimes surpassing, that of most state-of-the-art tracking algorithms. Further improvements to our method will focus on occlusion and multiple instruments.
Affiliation(s)
- Zijian Zhao
- School of Control Science and Engineering, Shandong University, Jinan, China
- Zhaorui Chen
- School of Control Science and Engineering, Shandong University, Jinan, China
- Sandrine Voros
- University Grenoble-Alpes, CNRS, INSERM, TIMC-IMAG, Grenoble, France
- Xiaolin Cheng
- Lab of Laparoscopic Technique and Engineering, Qilu Hospital of Shandong University, Jinan, China
|
50
|
|