1
Xie S, Deng G, Lin B, Jing W, Li Y, Zhao X. Real-Time Object Detection from UAV Inspection Videos by Combining YOLOv5s and DeepStream. Sensors (Basel) 2024; 24:3862. [PMID: 38931645 PMCID: PMC11207608 DOI: 10.3390/s24123862] [Citation(s) in RCA: 0]
Abstract
The high-altitude real-time inspection of unmanned aerial vehicles (UAVs) has always been a very challenging task. High-altitude inspections are susceptible to interference from varying weather conditions and from communication signals, and the larger field of view leaves a smaller image area for each object to be identified. We adopted a method that combines a UAV scheduling platform with artificial-intelligence object detection to implement automatic UAV inspection. We trained the YOLOv5s model on a dataset covering five categories of vehicles, reaching an mAP50 of 93.2% and an mAP50-95 of 71.7%. The YOLOv5s model is only 13.76 MB, and detection on a single inspection photo takes 11.26 ms; it is a relatively lightweight model suited to deployment on edge devices for real-time detection. In the original DeepStream framework, we added an HTTP communication interface with fast start-up so that different users can call the service concurrently. In addition, we added asynchronous sending of intercepted alarm frames and auxiliary services that quickly resume the video stream after an interruption. We deployed the trained YOLOv5s model on this improved DeepStream framework to implement automatic UAV inspection.
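The inference side of such a pipeline can be illustrated with a short, self-contained sketch. This is a minimal sketch only, not the authors' DeepStream deployment: it loads a generic COCO-pretrained YOLOv5s checkpoint through the Ultralytics Torch Hub entry point rather than their five-class vehicle model, and the image file name is a placeholder.

```python
import torch

# Load a pretrained YOLOv5s model from the Ultralytics hub (weights download on first use).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.25  # confidence threshold for kept detections

# Run inference on a single inspection photo; the wrapper handles resizing and NMS internally.
results = model("inspection_frame.jpg")

# Each row of results.xyxy[0]: x1, y1, x2, y2, confidence, class index
for *xyxy, conf, cls in results.xyxy[0].tolist():
    print(f"class={int(cls)} conf={conf:.2f} box={[round(v, 1) for v in xyxy]}")
```

In the paper's setting, the trained weights would instead be exported and run inside DeepStream for stream processing; the snippet only shows the detection call itself.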
Affiliation(s)
- Shidun Xie
- Guangdong Engineering Technology Research Center of UAV Remote Sensing Network, Guangzhou iMapCloud Intelligent Technology Co., Ltd., Guangzhou 510095, China
- Guanghong Deng
- Guangdong Engineering Technology Research Center of UAV Remote Sensing Network, Guangzhou iMapCloud Intelligent Technology Co., Ltd., Guangzhou 510095, China
- Baihao Lin
- Guangdong Engineering Technology Research Center of UAV Remote Sensing Network, Guangzhou iMapCloud Intelligent Technology Co., Ltd., Guangzhou 510095, China
- Wenlong Jing
- Guangdong Province Engineering Laboratory for Geographic Spatiotemporal Big Data, Key Laboratory of Guangdong for Utilization of Remote Sensing and Geographical Information System, Guangdong Open Laboratory of Geospatial Information Technology and Application, Guangzhou Institute of Geography, Guangdong Academy of Sciences, Guangzhou 510070, China
- Yong Li
- Guangdong Province Engineering Laboratory for Geographic Spatiotemporal Big Data, Key Laboratory of Guangdong for Utilization of Remote Sensing and Geographical Information System, Guangdong Open Laboratory of Geospatial Information Technology and Application, Guangzhou Institute of Geography, Guangdong Academy of Sciences, Guangzhou 510070, China
- Xiaodan Zhao
- Guangdong Engineering Technology Research Center of UAV Remote Sensing Network, Guangzhou iMapCloud Intelligent Technology Co., Ltd., Guangzhou 510095, China
2
Yıldırım Ş, Ulu B. Deep Learning Based Apples Counting for Yield Forecast Using Proposed Flying Robotic System. Sensors (Basel) 2023; 23:6171. [PMID: 37448020 PMCID: PMC10346156 DOI: 10.3390/s23136171] [Citation(s) in RCA: 0]
Abstract
Nowadays, Convolutional Neural Network (CNN)-based deep learning methods are widely used to detect and classify fruits by defect, color, and size characteristics. In this study, two different neural network detectors are employed to detect apples, using the Single-Shot Multibox Detection (SSD) MobileNet and Faster Region-CNN (Faster R-CNN) architectures with a custom dataset generated from red apple varieties. Each neural network model is trained on the created dataset of 4000 apple images. With the trained models, apples are detected and counted autonomously using the developed Flying Robotic System (FRS) in a commercially operated apple orchard. The aim is to enable producers to make accurate yield forecasts before entering commercial agreements. In this paper, SSD-MobileNet and Faster R-CNN models trained on the COCO dataset, as referenced in many studies, are compared experimentally with SSD-MobileNet and Faster R-CNN models trained on the custom dataset with learning rates ranging from 0.015 to 0.04. In the experiments, the accuracy of the proposed models increased to the level of 93%. Consequently, the developed Faster R-CNN model makes highly accurate detections, lowering the loss value below 0.1.
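As an illustration of this kind of fine-tuning, the sketch below swaps the classification head of a COCO-pretrained Faster R-CNN for a two-class head (background plus apple). It uses PyTorch/torchvision rather than the authors' training pipeline, the dummy image and box are placeholders, and the 0.02 learning rate is simply a value inside the 0.015-0.04 range mentioned above.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a COCO-pretrained detector and replace its box classification head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)  # background + apple

optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9)
model.train()

# One dummy training step: `images` is a list of 3xHxW tensors, `targets` holds boxes and labels.
images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 180.0, 200.0]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)          # dict of classification and regression losses
sum(losses.values()).backward()
optimizer.step()
```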
Affiliation(s)
- Şahin Yıldırım
- Department of Mechatronic Engineering, Erciyes University, Kayseri 38039, Turkey
3
Intelligent mobility planning for a cost-effective object follower mobile robotic system with obstacle avoidance using robot vision and deep learning. Evolutionary Intelligence 2023. [DOI: 10.1007/s12065-023-00817-3] [Citation(s) in RCA: 0]
4
Legittimo M, Felicioni S, Bagni F, Tagliavini A, Dionigi A, Gatti F, Verucchi M, Costante G, Bertogna M. A benchmark analysis of data-driven and geometric approaches for robot ego-motion estimation. J Field Robot 2023. [DOI: 10.1002/rob.22151] [Citation(s) in RCA: 0]
Affiliation(s)
- Marco Legittimo
- Department of Engineering, University of Perugia, Perugia, Italy
- Fabio Bagni
- Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy
- Hipert S.r.l., Modena, Italy
- Alberto Dionigi
- Department of Engineering, University of Perugia, Perugia, Italy
- Micaela Verucchi
- Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy
- Marko Bertogna
- Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy
- Hipert S.r.l., Modena, Italy
5
Diwan T, Anirudh G, Tembhurne JV. Object detection using YOLO: challenges, architectural successors, datasets and applications. Multimedia Tools and Applications 2023; 82:9243-9275. [PMID: 35968414 PMCID: PMC9358372 DOI: 10.1007/s11042-022-13644-y] [Citation(s) in RCA: 23]
Abstract
Object detection is one of the predominant and challenging problems in computer vision. Over the past decade, with the expeditious evolution of deep learning, researchers have extensively experimented and contributed to the performance enhancement of object detection and related tasks such as object classification, localization, and segmentation using underlying deep models. Broadly, object detectors are classified into two categories, namely two-stage and single-stage object detectors. Two-stage detectors mainly focus on a selective region-proposal strategy via complex architectures, whereas single-stage detectors consider all spatial region proposals for the possible detection of objects in one shot via relatively simpler architectures. The performance of any object detector is evaluated through detection accuracy and inference time. Generally, two-stage detectors outperform single-stage detectors in detection accuracy, while single-stage detectors offer better inference times. Moreover, with the advent of YOLO (You Only Look Once) and its architectural successors, detection accuracy has improved significantly and is sometimes better than that of two-stage detectors. YOLOs are adopted in various applications mainly because of their faster inference rather than their detection accuracy. As an example, detection accuracies are 63.4 and 70 for YOLO and Fast R-CNN, respectively, while inference is around 300 times faster for YOLO. In this paper, we present a comprehensive review of single-stage object detectors, especially YOLOs, their regression formulation, architectural advancements, and performance statistics. Moreover, we summarize comparisons between two-stage and single-stage object detectors, among different versions of YOLO, and among applications based on two-stage detectors and different versions of YOLO, along with future research directions.
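The "regression formulation" referred to above can be made concrete with the standard YOLOv2/v3 box decoding, in which the raw network outputs for a grid cell and anchor are turned into an absolute box. The helper below is a generic restatement of those published equations, not code from the review; the example cell, anchor, and stride values are arbitrary.

```python
import math

def decode_yolo_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Decode one YOLO prediction into an absolute box (YOLOv2/v3-style regression).

    (tx, ty, tw, th): raw outputs for a cell; (cx, cy): cell offsets in grid units;
    (pw, ph): anchor (prior) size in pixels; stride: input pixels per grid cell.
    """
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = (sigmoid(tx) + cx) * stride   # box centre x in pixels
    by = (sigmoid(ty) + cy) * stride   # box centre y in pixels
    bw = pw * math.exp(tw)             # box width in pixels
    bh = ph * math.exp(th)             # box height in pixels
    return bx, by, bw, bh

print(decode_yolo_box(0.2, -0.1, 0.3, 0.1, cx=7, cy=4, pw=116, ph=90, stride=32))
```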
Affiliation(s)
- Tausif Diwan
- Department of Computer Science & Engineering, Indian Institute of Information Technology, Nagpur, India
- G. Anirudh
- Department of Data Science and Analytics, Central University of Rajasthan, Jaipur, Rajasthan, India
- Jitendra V. Tembhurne
- Department of Computer Science & Engineering, Indian Institute of Information Technology, Nagpur, India
6
Neural Modeling and Real-Time Environment Training of Human Binocular Stereo Visual Tracking. Cognit Comput 2022. [DOI: 10.1007/s12559-022-10091-7] [Citation(s) in RCA: 0]
7
Xie J, Gao C, Wu J, Shi Z, Chen J. Small Low-Contrast Target Detection: Data-Driven Spatiotemporal Feature Fusion and Implementation. IEEE Transactions on Cybernetics 2022; 52:11847-11858. [PMID: 34029202 DOI: 10.1109/tcyb.2021.3072311] [Citation(s) in RCA: 2]
Abstract
Detecting small low-contrast targets in the airspace is an essential and challenging task. This article proposes a simple and effective data-driven, support vector machine (SVM)-based spatiotemporal feature fusion detection method for small low-contrast targets. We design a novel pixel-level feature, called a spatiotemporal profile, to depict the discontinuity of each pixel in the spatial and temporal domains. The spatiotemporal profile is a local patch of the spatiotemporal feature maps obtained by channel-wise concatenation of the spatial and temporal feature maps, which are generated by a morphological black-hat filter and a ghost-free dark-focusing frame-difference method, respectively. Instead of the handcrafted feature fusion mechanisms of previous works, we use labeled spatiotemporal profiles to train an SVM classifier that learns the spatiotemporal feature fusion mechanism automatically. To speed up detection for high-resolution videos, the serial SVM classification process on central processing units (CPUs) is reformulated as parallel convolution operations on graphics processing units (GPUs), which yields a speedup of more than 1000× in our real experiments. Finally, blob analysis is applied to generate the final detection results. Elaborate experiments are conducted, and the results demonstrate that the proposed method outperforms 12 baseline methods for small low-contrast target detection. Field tests show that the parallel implementation of the proposed method achieves real-time detection at 15.3 FPS for videos at a resolution of 2048×1536, and the maximum detection distance reaches 1 km for drones in sunny weather.
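A minimal sketch of the feature side of this idea is given below: a morphological black-hat filter supplies the spatial map, a plain absolute frame difference stands in for the paper's ghost-free dark-focusing difference, and per-pixel patches of the stacked maps are fed to an SVM. The frame file names, patch size, and the two training samples are placeholders, and the GPU reformulation as parallel convolutions is not shown.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def spatiotemporal_maps(prev_gray, curr_gray, ksize=5):
    """Spatial map via morphological black-hat; temporal map via absolute frame difference."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (ksize, ksize))
    spatial = cv2.morphologyEx(curr_gray, cv2.MORPH_BLACKHAT, kernel)
    temporal = cv2.absdiff(curr_gray, prev_gray)
    return np.stack([spatial, temporal], axis=-1)          # H x W x 2 feature map

def pixel_profile(maps, y, x, half=2):
    """Local patch around (y, x) across both channels, flattened into one vector."""
    patch = maps[y - half:y + half + 1, x - half:x + half + 1, :]
    return patch.reshape(-1).astype(np.float32)

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
maps = spatiotemporal_maps(prev, curr)

# Train an SVM on labelled profiles (1 = target pixel, 0 = background) and classify a pixel.
X = np.array([pixel_profile(maps, 100, 200), pixel_profile(maps, 50, 60)])
y = np.array([1, 0])
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([pixel_profile(maps, 100, 200)]))
```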
8
Steckenrider JJ. Adaptive Aerial Localization Using Lissajous Search Patterns. IEEE Transactions on Robotics 2022. [DOI: 10.1109/tro.2021.3126225] [Citation(s) in RCA: 0]
Affiliation(s)
- J. Josiah Steckenrider
- Department of Civil and Mechanical Engineering, United States Military Academy, West Point, NY, USA
9
Design of Moving Target Detection System Using Lightweight Deep Learning Model and Its Impact on the Development of Sports Industry. Computational Intelligence and Neuroscience 2022; 2022:3252032. [PMID: 35909847 PMCID: PMC9328982 DOI: 10.1155/2022/3252032] [Citation(s) in RCA: 0]
Abstract
The intelligent tracking and detection of athletes' actions and the improvement of action standardization are of great practical significance for reducing sports injuries in the sports industry. To address nonstandard movement and single-movement modes, this study takes sports event videos as its object and combines general video feature extraction using a convolutional neural network (CNN) from the field of deep learning with a motion-trajectory filtering detection algorithm. A target detection and tracking system model is then proposed to track and detect targets in sports in real time. Moreover, the performance of the proposed system model is analyzed through experiments. After testing the model's detection capacity, response rate, data loss rate, and target detection accuracy, the results show that the model can track and monitor 50 targets with a loss rate of 3%, a response time of 4 s, and a target detection accuracy of 80%. It can play a useful role in sports events and post-game video analysis, and it provides a good basis and design ideas for target tracking in the sports industry.
10
Guo S, Li L, Guo T, Cao Y, Li Y. Research on Mask-Wearing Detection Algorithm Based on Improved YOLOv5. Sensors (Basel) 2022; 22:4933. [PMID: 35808418 PMCID: PMC9269836 DOI: 10.3390/s22134933] [Citation(s) in RCA: 2]
Abstract
COVID-19 is highly contagious, and properly wearing a mask can hinder the spread of the virus. However, complex factors in natural scenes, including occlusion and dense, small-scale targets, frequently lead to misdetections and missed detections. To address these issues, this paper proposes a YOLOv5-based mask-wearing detection algorithm, YOLOv5-CBD. First, the Coordinate Attention mechanism is introduced into the feature fusion process to stress critical features and decrease the impact of redundant features after feature fusion. Then, the original feature pyramid network in the feature fusion module is replaced with a weighted bidirectional feature pyramid network to achieve efficient bidirectional cross-scale connectivity and weighted feature fusion. Finally, Distance Intersection over Union is combined with Non-Maximum Suppression to reduce missed detections of overlapping targets. Experiments show that the average detection accuracy of the YOLOv5-CBD model is 96.7%, an improvement of 2.1% over the baseline model (YOLOv5).
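The Distance Intersection over Union used in the NMS step above penalizes the normalized distance between box centres as well as their overlap, which is why centre-separated but heavily overlapping (e.g., occluded) targets are less likely to be suppressed. The sketch below computes DIoU for two axis-aligned boxes; the example coordinates are illustrative, and the full DIoU-NMS loop is not reproduced.

```python
def diou(box_a, box_b):
    """Distance-IoU between two boxes given as (x1, y1, x2, y2)."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # Plain IoU.
    iw = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    ih = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = iw * ih
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between box centres.
    d2 = ((xa1 + xa2) / 2 - (xb1 + xb2) / 2) ** 2 + ((ya1 + ya2) / 2 - (yb1 + yb2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(xa2, xb2) - min(xa1, xb1)) ** 2 + (max(ya2, yb2) - min(ya1, yb1)) ** 2
    return iou - d2 / c2 if c2 > 0 else iou

# In DIoU-NMS, a lower-scoring box is suppressed only if diou(best_box, box) exceeds the threshold.
print(diou((0, 0, 10, 10), (2, 2, 12, 12)))
```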
11
Benchmarking Object Detection Deep Learning Models in Embedded Devices. Sensors (Basel) 2022; 22:4205. [PMID: 35684827 PMCID: PMC9185277 DOI: 10.3390/s22114205] [Citation(s) in RCA: 0]
Abstract
Object detection is an essential capability for performing complex tasks in robotic applications. Today, deep learning (DL) approaches are the basis of state-of-the-art solutions in computer vision, where they provide very high accuracy albeit with high computational costs. Due to the physical limitations of robotic platforms, embedded devices are not as powerful as desktop computers, and adjustments have to be made to deep learning models before transferring them to robotic applications. This work benchmarks deep learning object detection models in embedded devices. Furthermore, some hardware selection guidelines are included, together with a description of the most relevant features of the two boards selected for this benchmark. Embedded electronic devices integrate a powerful AI co-processor to accelerate DL applications. To take advantage of these co-processors, models must be converted to a specific embedded runtime format. Five quantization levels applied to a collection of DL models are considered; two of them allow the execution of models in the embedded general-purpose CPU and are used as the baseline to assess the improvements obtained when running the same models with the three remaining quantization levels in the AI co-processors. The benchmark procedure is explained in detail, and a comprehensive analysis of the collected data is presented. Finally, the feasibility and challenges of the implementation of embedded object detection applications are discussed.
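The conversion step described above, turning a trained model into an embedded runtime format at a chosen quantization level, can be sketched with one common option: full-integer post-training quantization in TensorFlow Lite. The paper evaluates five levels across two specific boards and their co-processor runtimes; this snippet is only a generic illustration, and the SavedModel path, input shape, and calibration data are placeholders.

```python
import numpy as np
import tensorflow as tf

def representative_data():
    # A small calibration set lets the converter pick integer ranges; random data is a stand-in.
    for _ in range(100):
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("detector_saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("detector_int8.tflite", "wb") as f:
    f.write(tflite_model)
```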
12
Abstract
Weather detection systems (WDS) have an indispensable role in supporting the decisions of autonomous vehicles, especially in severe and adverse circumstances. With deep learning techniques, autonomous vehicles can effectively identify outdoor weather conditions and thus make appropriate decisions to adapt easily to new conditions and environments. This paper proposes a deep learning (DL)-based detection framework to categorize weather conditions for autonomous vehicles in adverse or normal situations. The proposed framework leverages transfer learning techniques along with a powerful Nvidia GPU to characterize the performance of three deep convolutional neural networks (CNNs): SqueezeNet, ResNet-50, and EfficientNet. The developed models have been evaluated on two up-to-date weather imaging datasets, namely DAWN2020 and MCWRD2018. The combined dataset provides six weather classes: cloudy, rainy, snowy, sandy, shine, and sunrise. Experimentally, all models demonstrated superior classification capacity, with the best performance recorded for the ResNet-50-based weather detection model, scoring 98.48%, 98.51%, and 98.41% for detection accuracy, precision, and sensitivity, respectively. A short detection time was also noted for this model, averaging 5 ms per inference step on the GPU. Finally, comparison with related state-of-the-art models showed the superiority of our model, which improved classification accuracy for the six weather-condition classes by 0.5-21%. Consequently, the proposed framework can be effectively implemented in real-time environments to provide on-demand decisions for autonomous vehicles with quick, precise detection capacity.
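Transfer learning of the kind described here typically means reusing ImageNet-pretrained convolutional features and retraining only the classifier for the six weather classes. The sketch below does this for ResNet-50 in PyTorch; it is a generic illustration rather than the authors' exact setup, and the random tensors merely stand in for real training images.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                    # freeze the ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 6)      # cloudy, rainy, snowy, sandy, shine, sunrise

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.rand(8, 3, 224, 224)                # dummy mini-batch of weather images
labels = torch.randint(0, 6, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```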
13
Douklias A, Karagiannidis L, Misichroni F, Amditis A. Design and Implementation of a UAV-Based Airborne Computing Platform for Computer Vision and Machine Learning Applications. Sensors (Basel) 2022; 22:2049. [PMID: 35271196 PMCID: PMC8914740 DOI: 10.3390/s22052049] [Citation(s) in RCA: 0]
Abstract
Visual sensing of the environment is crucial for flying an unmanned aerial vehicle (UAV) and is a centerpiece of many related applications. The ability to run computer vision and machine learning algorithms onboard an unmanned aerial system (UAS) is becoming more of a necessity in an effort to alleviate the communication burden of high-resolution video streaming, to provide flying aids, such as obstacle avoidance and automated landing, and to create autonomous machines. Thus, there is a growing interest on the part of many researchers in developing and validating solutions that are suitable for deployment on a UAV system by following the general trend of edge processing and airborne computing, which transforms UAVs from moving sensors into intelligent nodes that are capable of local processing. In this paper, we present, in a rigorous way, the design and implementation of a 12.85 kg UAV system equipped with the necessary computational power and sensors to serve as a testbed for image processing and machine learning applications, explain the rationale behind our decisions, highlight selected implementation details, and showcase the usefulness of our system by providing an example of how a sample computer vision application can be deployed on our platform.
14
An overview of cluster-based image search result organization: background, techniques, and ongoing challenges. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-021-01650-9] [Citation(s) in RCA: 0]
15
Abdou MA. Literature review: efficient deep neural networks techniques for medical image analysis. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06960-9] [Citation(s) in RCA: 6]
16
Real-Time Automatic Investigation of Indian Roadway Animals by 3D Reconstruction Detection Using Deep Learning for R-3D-YOLOv3 Image Classification and Filtering. Electronics 2021. [DOI: 10.3390/electronics10243079] [Citation(s) in RCA: 2]
Abstract
Statistical reports state that, from 2011 to 2021, more than 11,915 stray animals, such as cats, dogs, goats, and cows, as well as wild animals, were wounded in road accidents. Most of these accidents occurred because of driver negligence and drowsiness. These issues can be addressed through analysis of stray- and wild-animal-vehicle interaction and through pedestrian awareness. This paper provides a detailed discussion of GPU-based embedded systems and real-time object detection and tracking (ODT) applications. Machine learning trains machines to recognize images more accurately than humans. This work provides a unique real-time solution using a deep-learning, real 3D motion-based YOLOv3 (DL-R-3D-YOLOv3) model for the ODT of images on moving platforms. Besides, it explores methods for handling multiple views of flexible objects using 3D reconstruction, especially for stray and wild animals. Computer vision-based IoT devices are also supported by this DL-R-3D-YOLOv3 model. It seeks solutions by forecasting image filters to find object properties and semantics for object recognition methods, leading to closed-loop ODT.
17
Abstract
Object tracking is a fundamental computer vision problem that refers to a set of methods proposed to precisely track the motion trajectory of an object in a video. Multiple Object Tracking (MOT) is a subclass of object tracking that has received growing interest due to its academic and commercial potential. Although numerous methods have been introduced to cope with this problem, many challenges remain to be solved, such as severe object occlusion and abrupt appearance changes. This paper focuses on giving a thorough review of the evolution of MOT in recent decades, investigating the recent advances in MOT, and showing some potential directions for future work. The primary contributions include: (1) a detailed description of the MOT’s main problems and solutions, (2) a categorization of the previous MOT algorithms into 12 approaches and discussion of the main procedures for each category, (3) a review of the benchmark datasets and standard evaluation methods for evaluating the MOT, (4) a discussion of various MOT challenges and solutions by analyzing the related references, and (5) a summary of the latest MOT technologies and recent MOT trends using the mentioned MOT categories.
18
Multiple Drone Navigation and Formation Using Selective Target Tracking-Based Computer Vision. Electronics 2021. [DOI: 10.3390/electronics10172125] [Citation(s) in RCA: 5]
Abstract
Autonomous unmanned aerial vehicles work seamlessly within GPS signal range, but their performance deteriorates in GPS-denied regions. This paper presents a unique collaborative computer-vision-based approach for tracking a target at a user-specified location of interest in the image. The proposed method tracks any object without considering properties such as shape, color, size, or pattern; it only requires that the target remain visible and within the line of sight during tracking. The method gives the user the freedom to select any target in the image to track and to form a formation around it. Parameters such as the distance and angle from the image center to the object are calculated for the individual drones. Among all the drones, the one with the strongest GPS signal or nearest to the target is chosen as the master drone, which calculates the relative angle and distance between the object and the other drones using approximate geolocation. Compared with actual measurements, tests conducted on a quadrotor UAV frame achieve 99% location accuracy in a robust environment within the same GPS longitude and latitude block as GPS-only navigation methods. The individual drones communicate with the ground station through a telemetry link, and the master drone calculates the parameters using data collected at the ground station. Various formation-flying methods help escort the other drones to meet the desired objective with a single high-resolution first-person-view (FPV) camera. The proposed method is tested on an Airborne Object Target Tracking (AOT) aerial vehicle model and achieves higher tracking accuracy.
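The distance-and-angle step can be illustrated with simple pinhole-camera geometry: the pixel offset from the image centre maps to bearing angles given the camera's field of view, and a crude ground distance follows from the flight altitude. The field-of-view values, altitude, and nadir-based distance model below are assumptions made for illustration, not parameters from the paper.

```python
import math

def offset_to_angles(u, v, width, height, hfov_deg, vfov_deg):
    """Horizontal/vertical angles (degrees) from the image centre to pixel (u, v),
    assuming a simple pinhole camera with known fields of view."""
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    yaw = math.degrees(math.atan2(u - width / 2, fx))
    pitch = math.degrees(math.atan2(v - height / 2, fy))
    return yaw, pitch

def ground_distance(altitude_m, angle_from_nadir_deg):
    """Approximate ground distance to the target for a downward-looking camera."""
    return altitude_m * math.tan(math.radians(angle_from_nadir_deg))

yaw, pitch = offset_to_angles(960, 400, width=1280, height=720, hfov_deg=82.0, vfov_deg=52.0)
print(yaw, pitch, ground_distance(30.0, 25.0))
```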
19
Abstract
Automated detection of objects in aerial imagery is the basis for many applications, such as search and rescue operations, activity monitoring, and mapping. However, in many cases it is beneficial to run a detector on board the aerial platform in order to avoid latency, make basic decisions within the platform, and save transmission bandwidth. In this work, we address the task of designing such an on-board aerial object detector that meets certain requirements in accuracy, inference speed, and power consumption. For this, we first outline a generally applicable design process for such on-board methods and then follow this process to develop our own set of models for the task. Specifically, we first optimize a baseline model with regard to accuracy without increasing runtime. We then propose a fast detection head that significantly improves runtime at little cost in accuracy. Finally, we discuss several aspects to consider during deployment and in the runtime environment. The resulting four models, which operate at 15, 30, 60, and 90 FPS on an embedded Jetson AGX device, are published for future benchmarking and comparison by the community.
20
Image-Based Visual Servo Tracking Control of a Ground Moving Target for a Fixed-Wing Unmanned Aerial Vehicle. J Intell Robot Syst 2021. [DOI: 10.1007/s10846-021-01425-y] [Citation(s) in RCA: 4]
21
Wei CC, Huang TH. Modular Neural Networks with Fully Convolutional Networks for Typhoon-Induced Short-Term Rainfall Predictions. Sensors (Basel) 2021; 21:4200. [PMID: 34207409 PMCID: PMC8235076 DOI: 10.3390/s21124200] [Citation(s) in RCA: 6]
Abstract
Taiwan is located at the edge of the northwestern Pacific Ocean and within a typhoon zone. After typhoons form, strong winds and heavy rains reach Taiwan and cause major natural disasters. This study employed fully convolutional networks (FCNs) to establish a forecast model for predicting hourly rainfall during the arrival of a typhoon. An FCN is an advanced technique for deep-learning-based image recognition through semantic segmentation. FCNs deepen the neural network layers and perform upsampling on the feature map of the final convolution layer. This process enables FCN models to restore the size of the output to that of the raw input image, so that the classification of each raw pixel becomes feasible. The study data were radar echo images and ground-station rainfall information for typhoon periods during 2013-2019 in southern Taiwan. Two model cases were designed. The ground rainfall image-based FCN (GRI_FCN) uses ground rain images to directly forecast ground rainfall. The GRI combined with rain-retrieval image-based modular convolutional neural network (GRI-RRI_MCNN) uses radar echo images to determine ground rainfall before predicting future ground rainfall. Moreover, RMMLP, a conventional multilayer perceptron neural network, was used as a benchmark model. Forecast horizons from 1 to 6 h were evaluated. The results revealed that the GRI-RRI_MCNN model enabled a complete understanding of future rainfall variation in southern Taiwan during typhoons and effectively improved the accuracy of typhoon rainfall forecasting.
22
Abstract
With the rise of Deep Learning approaches in computer vision applications, significant strides have been made towards vehicular autonomy. Research activity in autonomous drone navigation has increased rapidly in the past five years, and drones are moving fast towards the ultimate goal of near-complete autonomy. However, while much work in the area focuses on specific tasks in drone navigation, the contribution to the overall goal of autonomy is often not assessed, and a comprehensive overview is needed. In this work, a taxonomy of drone navigation autonomy is established by mapping the definitions of vehicular autonomy levels, as defined by the Society of Automotive Engineers, to specific drone tasks in order to create a clear definition of autonomy when applied to drones. A top–down examination of research work in the area is conducted, focusing on drone navigation tasks, in order to understand the extent of research activity in each area. Autonomy levels are cross-checked against the drone navigation tasks addressed in each work to provide a framework for understanding the trajectory of current research. This work serves as a guide to research in drone autonomy with a particular focus on Deep Learning-based solutions, indicating key works and areas of opportunity for development of this area in the future.
23
Zhang H, Zhu B, Li X, Jiang Y. A Framework for Long-Term Tracking Based on a Global Proposal Network. Int J Pattern Recogn 2021. [DOI: 10.1142/s0218001421550119] [Citation(s) in RCA: 1]
Abstract
Deep learning technology has greatly improved the performance of target tracking, but most recently developed tracking algorithms are short-term trackers, which cannot meet actual engineering needs. Based on the Siamese network structure, this paper proposes a long-term tracking framework with a persistent tracking capability. The global proposal module extends the search area globally through the construction of a feature pyramid. The local regression module is mainly responsible for the confidence evaluation of candidate regions and for performing more accurate bounding-box regression. To improve the discriminative ability of the regression network, error samples are eliminated in advance by exploiting temporal information and are then classified through a verification module. Experiments on the VOT long-term tracking dataset and the UAV20L aerial dataset show that the proposed algorithm achieves state-of-the-art performance.
Affiliation(s)
- Hongwei Zhang
- School of Electronic Countermeasure, National University of Defense Technology, No. 460, Huangshan Road, Hefei, Anhui 230037, P. R. China
- Bin Zhu
- School of Electronic Countermeasure, National University of Defense Technology, No. 460, Huangshan Road, Hefei, Anhui 230037, P. R. China
- Xiaoxia Li
- State Key Laboratory of Pulsed Power Laser Technology, National University of Defense Technology, No. 460, Huangshan Road, Hefei, Anhui 230037, P. R. China
- Yuchen Jiang
- School of Electronic Countermeasure, National University of Defense Technology, No. 460, Huangshan Road, Hefei, Anhui 230037, P. R. China
24
Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data 2021; 8:53. [PMID: 33816053 PMCID: PMC8010506 DOI: 10.1186/s40537-021-00444-8] [Citation(s) in RCA: 760]
Abstract
In the last few years, the deep learning (DL) computing paradigm has been deemed the gold standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, achieving outstanding results on several complex cognitive tasks and matching or even beating human performance. One of the benefits of DL is the ability to learn from massive amounts of data. The DL field has grown quickly in the last few years and has been used extensively to successfully address a wide range of traditional applications. More importantly, DL has outperformed well-known ML techniques in many domains, e.g., cybersecurity, natural language processing, bioinformatics, robotics and control, and medical information processing, among many others. Although several works have reviewed the state of the art in DL, each of them tackled only one aspect of DL, which leads to an overall lack of knowledge about it. Therefore, in this contribution, we propose a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including the enhancements recently added to the field. In particular, this paper outlines the importance of DL and presents the types of DL techniques and networks. It then presents convolutional neural networks (CNNs), the most utilized DL network type, and describes the development of CNN architectures together with their main features, starting with the AlexNet network and closing with the High-Resolution network (HR.Net). Finally, we present the challenges and suggested solutions to help researchers understand the existing research gaps, followed by a list of the major DL applications. Computational tools, including FPGAs, GPUs, and CPUs, are summarized along with a description of their influence on DL. The paper ends with the evolution matrix, benchmark datasets, and a summary and conclusion.
Affiliation(s)
- Laith Alzubaidi
- School of Computer Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
- AlNidhal Campus, University of Information Technology & Communications, Baghdad, 10001, Iraq
- Jinglan Zhang
- School of Computer Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
- Amjad J. Humaidi
- Control and Systems Engineering Department, University of Technology, Baghdad, 10001, Iraq
- Ayad Al-Dujaili
- Electrical Engineering Technical College, Middle Technical University, Baghdad, 10001, Iraq
- Ye Duan
- Faculty of Electrical Engineering & Computer Science, University of Missouri, Columbia, MO 65211, USA
- Omran Al-Shamma
- AlNidhal Campus, University of Information Technology & Communications, Baghdad, 10001, Iraq
- J. Santamaría
- Department of Computer Science, University of Jaén, 23071 Jaén, Spain
- Mohammed A. Fadhel
- College of Computer Science and Information Technology, University of Sumer, Thi Qar, 64005, Iraq
- Muthana Al-Amidie
- Faculty of Electrical Engineering & Computer Science, University of Missouri, Columbia, MO 65211, USA
- Laith Farhan
- School of Engineering, Manchester Metropolitan University, Manchester, M1 5GD, UK
25
Real-Time Human Detection and Gesture Recognition for On-Board UAV Rescue. Sensors (Basel) 2021; 21:2180. [PMID: 33804718 PMCID: PMC8003912 DOI: 10.3390/s21062180] [Citation(s) in RCA: 17]
Abstract
Unmanned aerial vehicles (UAVs) play an important role in numerous technical and scientific fields, especially in wilderness rescue. This paper carries out work on real-time UAV-based human detection and on the recognition of body and hand rescue gestures. We use body-feature-based solutions, such as YOLOv3-tiny for human detection, to establish biometric communication. When the presence of a person is detected, the system enters the gesture recognition phase, in which the user and the drone can communicate briefly and effectively, avoiding the drawbacks of speech communication. A dataset of ten body rescue gestures (i.e., Kick, Punch, Squat, Stand, Attention, Cancel, Walk, Sit, Direction, and PhoneCall) was created with a UAV on-board camera. The two most important gestures are the novel dynamic Attention and Cancel gestures, which represent the set and reset functions, respectively. When the body rescue gesture is recognized as Attention, the drone gradually approaches the user to capture higher-resolution images for hand gesture recognition. Using deep learning methods, the system achieves 99.80% accuracy on the body gesture test data and 94.71% accuracy on the hand gesture test data. Experiments conducted with real-time UAV cameras confirm that our solution can achieve the expected UAV rescue purpose.
26
The INUS Platform: A Modular Solution for Object Detection and Tracking from UAVs and Terrestrial Surveillance Assets. Computation 2021. [DOI: 10.3390/computation9020012] [Citation(s) in RCA: 4]
Abstract
Situational awareness is a critical aspect of the decision-making process in emergency response and civil protection and requires the availability of up-to-date information on the current situation. In this context, the related research should encompass not only developing innovative single solutions for (real-time) data collection, but also transforming data into information so that the latter can serve as a basis for action and decision making. Unmanned systems (UxV), as data acquisition platforms and autonomous or semi-autonomous measurement instruments, have become attractive for many applications in emergency operations. This paper proposes a multipurpose situational-awareness platform by exploiting advanced on-board processing capabilities and efficient computer vision, image processing, and machine learning techniques. The main pillars of the proposed platform are: (1) a modular architecture that exploits unmanned aerial vehicle (UAV) and terrestrial assets; (2) deployment of on-board data capturing and processing; (3) provision of geolocalized object detection and tracking events; and (4) a user-friendly operational interface for standalone deployment and seamless integration with external systems. Experimental results are provided using RGB and thermal video datasets and applying novel object detection and tracking algorithms. The results show the utility and potential of the proposed platform, and future directions for extension and optimization are presented.
27
Saponara S, Elhanashi A, Gagliardi A. Implementing a real-time, AI-based, people detection and social distancing measuring system for Covid-19. Journal of Real-Time Image Processing 2021. [PMID: 33500738 DOI: 10.1007/s11554-020-01044-0] [Citation(s) in RCA: 14]
Abstract
COVID-19 is a disease caused by a severe respiratory syndrome coronavirus. It was identified in December 2019 in Wuhan, China, and has resulted in an ongoing pandemic with many infected cases and deaths. The coronavirus is primarily spread between people during close contact. Motivated by this, this research proposes an artificial intelligence system for the social-distancing classification of persons using thermal images. By exploiting the YOLOv2 (You Only Look Once) approach, a novel deep learning detection technique is developed for detecting and tracking people in indoor and outdoor scenarios. An algorithm is also implemented for measuring and classifying the distance between persons and for automatically checking whether social-distancing rules are respected. Hence, this work aims to minimize the spread of the COVID-19 virus by evaluating if and how persons comply with social-distancing rules. The proposed approach is applied to images acquired through thermal cameras to establish a complete AI system for people tracking, social-distancing classification, and body temperature monitoring. The training phase uses two datasets captured from different thermal cameras, with the Ground Truth Labeler app used to label the persons in the images. The proposed technique has been deployed on a low-cost embedded system (Jetson Nano) with a fixed camera and is implemented in a distributed surveillance video system to visualize people from several cameras in one centralized monitoring system. The achieved results show that the proposed method is suitable for setting up a surveillance system in smart cities for people detection, social-distancing classification, and body temperature analysis.
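The distance-classification step reduces to checking every pair of detected persons against a threshold once their positions are expressed in metres. The sketch below assumes that projection has already been done (e.g., through a calibrated homography, which is not shown) and uses 1 m as the threshold; both are illustrative choices rather than the paper's exact parameters.

```python
import itertools
import numpy as np

def social_distance_violations(centroids_m, min_dist_m=1.0):
    """Flag every pair of detected persons closer than the distancing threshold.

    `centroids_m` holds person positions already projected to ground-plane metres.
    """
    violations = []
    for (i, a), (j, b) in itertools.combinations(enumerate(centroids_m), 2):
        dist = float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
        if dist < min_dist_m:
            violations.append((i, j, round(dist, 2)))
    return violations

people = [(0.0, 0.0), (0.7, 0.3), (4.0, 5.0)]
print(social_distance_violations(people))   # -> [(0, 1, 0.76)]
```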
Affiliation(s)
- Sergio Saponara
- Dip. Ingegneria Dell'Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy
- Abdussalam Elhanashi
- Dip. Ingegneria Dell'Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy
- Alessio Gagliardi
- Dip. Ingegneria Dell'Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy
28
Lightweight Semantic Segmentation Network for Real-Time Weed Mapping Using Unmanned Aerial Vehicles. Applied Sciences (Basel) 2020. [DOI: 10.3390/app10207132] [Citation(s) in RCA: 13]
Abstract
The timely and efficient generation of weed maps is essential for weed control tasks and precise spraying applications. Based on the general concept of site-specific weed management (SSWM), many researchers have used unmanned aerial vehicle (UAV) remote sensing technology to monitor weed distributions, which can provide decision-support information for precision spraying. However, image processing is mainly conducted offline, and the resulting time gap between image collection and spraying significantly limits the applications of SSWM. In this study, we conducted real-time image processing onboard a UAV to reduce the time gap between image collection and herbicide treatment. First, we established a hardware environment for real-time image processing that integrates map visualization, flight control, image collection, and real-time image processing onboard a UAV based on secondary development. Second, we used the proposed model design to develop a lightweight network architecture for weed mapping tasks. The proposed network architecture was evaluated and compared with mainstream semantic segmentation models. Results demonstrate that the proposed network outperforms contemporary networks in terms of efficiency with competitive accuracy. We also conducted optimization during the inference process: precision calibration was applied to both the desktop and embedded devices, and the precision was reduced from FP32 to FP16. Experimental results demonstrate that this precision calibration further improves inference speed while maintaining reasonable accuracy. Our modified network architecture achieved an accuracy of 80.9% on the testing samples, and its inference speed was 4.5 fps on a Jetson TX2 module (Nvidia Corporation, Santa Clara, CA, USA), which demonstrates its potential for practical agricultural monitoring and precise spraying applications.
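The FP32-to-FP16 step can be illustrated independently of the authors' toolchain. The PyTorch snippet below simply casts a segmentation model and its input to half precision on a CUDA device; it is a generic sketch of reduced-precision inference, not the precision calibration performed in the paper's inference engine on the Jetson TX2, and the model choice and input size are placeholders.

```python
import torch
from torchvision import models

model = models.segmentation.fcn_resnet50(pretrained=True).eval()

if torch.cuda.is_available():
    model = model.cuda().half()                    # cast weights from FP32 to FP16
    frame = torch.rand(1, 3, 512, 512, device="cuda").half()
else:
    frame = torch.rand(1, 3, 512, 512)             # FP16 convolutions are a GPU path; keep FP32 on CPU

with torch.no_grad():
    out = model(frame)["out"]                      # per-pixel class scores
print(out.shape, out.dtype)
```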
29
Multi-Camera Vehicle Tracking Using Edge Computing and Low-Power Communication. Sensors (Basel) 2020; 20:3334. [PMID: 32545370 PMCID: PMC7309172 DOI: 10.3390/s20113334] [Citation(s) in RCA: 11]
Abstract
Typical approaches to visual vehicle tracking across a large area require several cameras and complex algorithms to detect, identify, and track the vehicle route. Due to memory requirements, computational complexity, and hardware constraints, the video images are transmitted to a dedicated workstation equipped with powerful graphics processing units. However, this requires large volumes of data to be transmitted and may raise privacy issues. This paper presents dedicated deep learning detection and tracking algorithms that can be run directly on the cameras' embedded systems. This method significantly reduces the stream of data from the cameras, reduces the required communication bandwidth, and expands the range of usable communication technologies. Consequently, it allows short-range radio communication to be used to transmit vehicle-related information directly between the cameras and to implement multi-camera tracking directly in the cameras. The proposed solution includes detection and tracking algorithms and a dedicated low-power short-range communication scheme for multi-target multi-camera tracking systems that can be applied in parking and intersection scenarios. System components were evaluated in various scenarios, including different environmental and weather conditions.
30
Le MT, Tu CT, Guo SM, Lien JJJ. A PCB Alignment System Using RST Template Matching with CUDA on Embedded GPU Board. Sensors (Basel) 2020; 20:2736. [PMID: 32403333 PMCID: PMC7248842 DOI: 10.3390/s20092736] [Citation(s) in RCA: 3]
Abstract
The fiducial-marks-based alignment process is one of the most critical steps in printed circuit board (PCB) manufacturing. In the alignment process, a machine vision technique is used to detect the fiducial marks and then adjust the position of the vision system so that it is aligned with the PCB. The present study proposes an embedded PCB alignment system in which a rotation, scale, and translation (RST) template-matching algorithm is employed to locate the marks on the PCB surface. The coordinates and angles of the detected marks are compared with reference values set by users, and the difference between them is used to adjust the position of the vision system accordingly. To improve the positioning accuracy, the angle and location matching is performed in refinement processes. To reduce the matching time, the present study accelerates the rotation matching by eliminating weak features in the scanning process and converting the normalized cross-correlation (NCC) formula into a sum of products. Moreover, the scanning time is reduced by implementing the entire RST process in parallel on threads of a graphics processing unit (GPU) and by applying hash functions to find refined positions in the refinement matching process. The experimental results show that the resulting matching time is around 32× faster than that achieved on a conventional central processing unit (CPU) for a test image size of 1280 × 960 pixels. Furthermore, the alignment process achieves a tolerance of 36.4 μm.
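Stripped of the rotation/scale scan and the GPU refinement, the core of such a system is normalized cross-correlation template matching of a fiducial template against a board image. The OpenCV sketch below shows that single-scale, single-angle step; the image file names are placeholders and this is not the authors' CUDA implementation.

```python
import cv2

image = cv2.imread("pcb_view.png", cv2.IMREAD_GRAYSCALE)          # placeholder board image
template = cv2.imread("fiducial_mark.png", cv2.IMREAD_GRAYSCALE)  # placeholder mark template

# NCC score map over all template placements, then the best-scoring location.
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(scores)

h, w = template.shape
print(f"mark at x={best_loc[0]}, y={best_loc[1]}, score={best_score:.3f}, size={w}x{h}")
```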
Affiliation(s)
- Minh-Tri Le
- Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1 University Road, Tainan City 701, Taiwan
- Ching-Ting Tu
- Department of Applied Mathematics, National Chung Hsing University, No. 145, Xingda Road, Taichung City 402, Taiwan
- Shu-Mei Guo
- Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1 University Road, Tainan City 701, Taiwan
- Jenn-Jier James Lien
- Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1 University Road, Tainan City 701, Taiwan
- Correspondence: ; Tel.: +886-2757-5756-2540
31
Investigations of Object Detection in Images/Videos Using Various Deep Learning Techniques and Embedded Platforms—A Comprehensive Review. Applied Sciences (Basel) 2020. [DOI: 10.3390/app10093280] [Citation(s) in RCA: 35]
Abstract
In recent years there has been remarkable progress in one computer vision application area: object detection. One of the most challenging and fundamental problems in object detection is locating a specific object among the multiple objects present in a scene. Earlier, traditional detection methods were used for detecting objects before the introduction of convolutional neural networks; from 2012 onward, deep learning-based techniques were used for feature extraction, which led to remarkable breakthroughs in this area. This paper presents a detailed survey of recent advancements and achievements in object detection using various deep learning techniques. Several topics are included, such as Viola–Jones (VJ), the histogram of oriented gradients (HOG), one-shot and two-shot detectors, benchmark datasets, evaluation metrics, speed-up techniques, and current state-of-the-art object detectors. Detailed discussions of some important applications of object detection, including pedestrian detection, crowd detection, and real-time object detection on GPU-based embedded systems, are presented. Finally, we conclude by identifying promising future directions.
32
Mixed YOLOv3-LITE: A Lightweight Real-Time Object Detection Method. Sensors (Basel) 2020; 20:1861. [PMID: 32230867 PMCID: PMC7180807 DOI: 10.3390/s20071861] [Citation(s) in RCA: 36]
Abstract
Embedded and mobile smart devices face problems related to limited computing power and excessive power consumption. To address these problems, we propose Mixed YOLOv3-LITE, a lightweight real-time object detection network that can be used on devices without a graphics processing unit (GPU) and on mobile devices. Based on YOLO-LITE as the backbone network, Mixed YOLOv3-LITE supplements residual blocks (ResBlocks) and parallel high-to-low-resolution subnetworks, fully utilizes shallow network characteristics while increasing network depth, and uses a "shallow and narrow" convolution layer to build the detector, thereby achieving an optimal balance between detection precision and speed on non-GPU computers and portable terminal devices. The experimental results obtained in this study reveal that the size of the proposed Mixed YOLOv3-LITE network model is 20.5 MB, which is 91.70%, 38.07%, and 74.25% smaller than YOLOv3, tiny-YOLOv3, and SlimYOLOv3-spp3-50, respectively. The mean average precision (mAP) achieved using the PASCAL VOC 2007 dataset is 48.25%, which is 14.48% higher than that of YOLO-LITE. On the VisDrone 2018-Det dataset, the mAP achieved with the Mixed YOLOv3-LITE network model is 28.50%, which is 18.50% and 2.70% higher than tiny-YOLOv3 and SlimYOLOv3-spp3-50, respectively. These results show that Mixed YOLOv3-LITE can achieve higher efficiency and better performance on mobile terminals and other devices.
33
Real-Time Detection of Ground Objects Based on Unmanned Aerial Vehicle Remote Sensing with Deep Learning: Application in Excavator Detection for Pipeline Safety. Remote Sensing 2020. [DOI: 10.3390/rs12010182] [Citation(s) in RCA: 33]
Abstract
Unmanned aerial vehicle (UAV) remote sensing and deep learning provide a practical approach to object detection. However, most current approaches for processing UAV remote-sensing data cannot carry out object detection in real time for emergencies such as firefighting. This study proposes a new approach that integrates UAV remote sensing and deep learning for the real-time detection of ground objects. Excavators, which often threaten pipeline safety, are selected as the target object. A widely used deep learning algorithm, You Only Look Once V3, is first used to train the excavator detection model on a workstation, and the model is then deployed on an embedded board carried by a UAV. The recall rate of the trained excavator detection model is 99.4%, demonstrating that the trained model has very high accuracy. A UAV-based excavator detection system (UAV-ED) is then constructed for operational application. UAV-ED is composed of a UAV Control Module, a UAV Module, and a Warning Module. A UAV experiment with different scenarios was conducted to evaluate the performance of UAV-ED. The whole process, from the UAV observing an excavator to the Warning Module (350 km away from the testing area) receiving the detection results, took only about 1.15 s. Thus, the UAV-ED system performs well and would benefit the management of pipeline safety.