1. Liang L, Ma H, Zhao L, Xie X, Hua C, Zhang M, Zhang Y. Vehicle Detection Algorithms for Autonomous Driving: A Review. Sensors (Basel). 2024;24:3088. PMID: 38793942; PMCID: PMC11125132; DOI: 10.3390/s24103088.
Abstract
Autonomous driving, as a pivotal technology in modern transportation, is progressively transforming the modalities of human mobility. In this domain, vehicle detection is a significant research direction that involves the intersection of multiple disciplines, including sensor technology and computer vision. In recent years, many excellent vehicle detection methods have been reported, but few studies have focused on summarizing and analyzing these algorithms. This work provides a comprehensive review of existing vehicle detection algorithms and discusses their practical applications in the field of autonomous driving. First, we provide a brief description of the tasks, evaluation metrics, and datasets for vehicle detection. Second, more than 200 classical and recent vehicle detection algorithms are summarized in detail, including those based on machine vision, LiDAR, millimeter-wave radar, and sensor fusion. Finally, this article discusses the strengths and limitations of different algorithms and sensors, and outlines future trends.
Affiliation(s)
- Liang Liang
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
- Haihua Ma
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
- Key Laboratory of Grain Information Processing and Control of Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
- Le Zhao
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
- Key Laboratory of Grain Information Processing and Control of Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
- Xiaopeng Xie
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
- Chengxin Hua
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
- Miao Zhang
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
- Key Laboratory of Grain Information Processing and Control of Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
- Yonghui Zhang
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
2. Naich AY, Carrión JR. LiDAR-Based Intensity-Aware Outdoor 3D Object Detection. Sensors (Basel). 2024;24:2942. PMID: 38733047; PMCID: PMC11086319; DOI: 10.3390/s24092942.
Abstract
LiDAR-based 3D object detection and localization are crucial components of autonomous navigation systems, including autonomous vehicles and mobile robots. Most existing LiDAR-based 3D object detection and localization approaches primarily use geometric or structural feature abstractions from LiDAR point clouds. However, these approaches can be susceptible to environmental noise due to adverse weather conditions or the presence of highly scattering media. In this work, we propose an intensity-aware voxel encoder for robust 3D object detection. The proposed voxel encoder generates an intensity histogram that describes the distribution of point intensities within a voxel and is used to enhance the voxel feature set. We integrate this intensity-aware encoder into an efficient single-stage voxel-based detector for 3D object detection. Experimental results obtained using the KITTI dataset show that our method achieves results comparable to the state-of-the-art method for car objects in both 3D and bird's-eye-view detection, and superior results for pedestrian and cyclist objects. Furthermore, our model achieves a detection rate of 40.7 FPS at inference time, which is higher than that of state-of-the-art methods while incurring a lower computational cost.
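As a concrete illustration of the idea above (a sketch, not the paper's implementation), the snippet below builds a per-voxel intensity histogram and appends it to a simple centroid feature; the voxel size, bin count, and feature layout are illustrative assumptions.

```python
import numpy as np

def voxelize_with_intensity_hist(points, voxel_size=(0.16, 0.16, 4.0), n_bins=8):
    """points: (N, 4) array of [x, y, z, intensity], intensity scaled to [0, 1]."""
    coords = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
    keys, inverse = np.unique(coords, axis=0, return_inverse=True)  # group by voxel
    features = []
    for v in range(len(keys)):
        pts = points[inverse == v]
        hist, _ = np.histogram(pts[:, 3], bins=n_bins, range=(0.0, 1.0))
        hist = hist / max(len(pts), 1)       # normalized intensity distribution
        centroid = pts[:, :3].mean(axis=0)   # basic geometric feature per voxel
        features.append(np.concatenate([centroid, hist]))
    return keys, np.stack(features)          # (V, 3) voxel coords, (V, 3 + n_bins) features
```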
Affiliation(s)
- Ammar Yasir Naich
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
- Jesús Requena Carrión
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
3. Ma X, Ouyang W, Simonelli A, Ricci E. 3D Object Detection From Images for Autonomous Driving: A Survey. IEEE Trans Pattern Anal Mach Intell. 2024;46:3537-3556. PMID: 38145536; DOI: 10.1109/tpami.2023.3346386.
Abstract
3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years. Benefiting from the rapid development of deep learning technologies, image-based 3D detection has achieved remarkable progress. In particular, more than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications. However, to date, no recent survey exists to collect and organize this knowledge. In this paper, we fill this gap in the literature and provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components. Additionally, we propose two new taxonomies to organize the state-of-the-art methods into different categories, with the intent of providing a more systematic review of existing methods and facilitating fair comparisons with future works. Reflecting on what has been achieved so far, we also analyze the current challenges in the field and discuss future directions for image-based 3D detection research.
4. Lin C, Tian D, Duan X, Zhou J, Zhao D, Cao D. 3D-DFM: Anchor-Free Multimodal 3-D Object Detection With Dynamic Fusion Module for Autonomous Driving. IEEE Trans Neural Netw Learn Syst. 2023;34:10812-10822. PMID: 35560081; DOI: 10.1109/tnnls.2022.3171553.
Abstract
Recent advances in cross-modal 3D object detection rely heavily on anchor-based methods; however, intractable anchor parameter tuning and computationally expensive postprocessing severely impede embedded-system applications such as autonomous driving. In this work, we develop an anchor-free architecture for efficient camera-light detection and ranging (LiDAR) 3D object detection. To highlight the effect of foreground information from different modalities, we propose a dynamic fusion module (DFM) that lets image and point features interact adaptively via learnable filters. In addition, the 3D distance intersection-over-union (3D-DIoU) loss is explicitly formulated as a supervision signal for 3D-oriented box regression and optimization. We integrate these components into an end-to-end multimodal 3D detector termed 3D-DFM. Comprehensive experimental results on the widely used KITTI dataset demonstrate the superiority and universality of the 3D-DFM architecture, with competitive detection accuracy and real-time inference speed. To the best of our knowledge, this is the first work that incorporates an anchor-free pipeline with multimodal 3D object detection.
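The 3D-DIoU term combines an IoU loss with a normalized center-distance penalty. Below is a hedged sketch for axis-aligned boxes given as [cx, cy, cz, dx, dy, dz]; the paper's loss additionally handles box heading, which is omitted here for brevity.

```python
import numpy as np

def diou_loss_3d(b1, b2):
    """3D distance-IoU loss for two axis-aligned boxes [cx, cy, cz, dx, dy, dz]."""
    lo1, hi1 = b1[:3] - b1[3:] / 2, b1[:3] + b1[3:] / 2
    lo2, hi2 = b2[:3] - b2[3:] / 2, b2[:3] + b2[3:] / 2
    inter = np.prod(np.clip(np.minimum(hi1, hi2) - np.maximum(lo1, lo2), 0, None))
    union = np.prod(b1[3:]) + np.prod(b2[3:]) - inter
    iou = inter / union
    center_dist2 = np.sum((b1[:3] - b2[:3]) ** 2)                       # squared center distance
    diag2 = np.sum((np.maximum(hi1, hi2) - np.minimum(lo1, lo2)) ** 2)  # squared enclosing-box diagonal
    return 1.0 - iou + center_dist2 / diag2
```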
5. Chitta K, Prakash A, Jaeger B, Yu Z, Renz K, Geiger A. TransFuser: Imitation With Transformer-Based Sensor Fusion for Autonomous Driving. IEEE Trans Pattern Anal Mach Intell. 2023;45:12878-12895. PMID: 35984797; DOI: 10.1109/tpami.2022.3200245.
Abstract
How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
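In the same spirit (though not the authors' code), the sketch below fuses a perspective-image feature map and a LiDAR BEV feature map by flattening both into tokens and applying joint self-attention; TransFuser does this at multiple resolutions, while this sketch shows a single scale with assumed dimensions.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Joint self-attention over image and BEV tokens; channel count must equal dim."""
    def __init__(self, dim=128, heads=4, layers=1):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, img_feat, bev_feat):
        """img_feat: (B, C, Hi, Wi); bev_feat: (B, C, Hb, Wb)."""
        B, C, Hi, Wi = img_feat.shape
        _, _, Hb, Wb = bev_feat.shape
        tokens = torch.cat([img_feat.flatten(2), bev_feat.flatten(2)], dim=2)  # (B, C, Ni+Nb)
        tokens = self.encoder(tokens.transpose(1, 2))                          # attend across both views
        img_tok, bev_tok = tokens.split([Hi * Wi, Hb * Wb], dim=1)
        return (img_tok.transpose(1, 2).reshape(B, C, Hi, Wi),
                bev_tok.transpose(1, 2).reshape(B, C, Hb, Wb))
```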
6. Zhang X, He L, Chen J, Wang B, Wang Y, Zhou Y. Multiattention Mechanism 3D Object Detection Algorithm Based on RGB and LiDAR Fusion for Intelligent Driving. Sensors (Basel). 2023;23:8732. PMID: 37960432; PMCID: PMC10649988; DOI: 10.3390/s23218732.
Abstract
This paper proposes a multimodal fusion 3D target detection algorithm based on the attention mechanism to improve the performance of 3D target detection. The algorithm uses both point cloud data and camera information. For image feature extraction, a ResNet50 + FPN architecture extracts features at four levels. Point cloud feature extraction employs the voxel method and an FCN to extract point and voxel features. Image and point cloud features are fused through regional point fusion and voxel fusion. After information fusion, the Coordinate and SimAM attention mechanisms extract deep-level fusion features. The algorithm's performance is evaluated on the DAIR-V2X dataset. The results show that at IoU = 0.5 (cars) and IoU = 0.25 (pedestrians and cyclists), the proposed algorithm improves mAP by 7.9% in the BEV view and 7.8% in the 3D view compared to the Part-A2 algorithm; at IoU = 0.7 (cars) and IoU = 0.5 (pedestrians and cyclists), it improves mAP by 5.4% in the BEV view and 4.3% in the 3D view compared to the SECOND algorithm.
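Of the two attention mechanisms named above, SimAM is parameter-free and simple to reproduce. The sketch below follows the standard SimAM formulation (written independently of this paper's code); lam is the energy regularization constant from the SimAM paper.

```python
import torch

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a feature map x: (B, C, H, W)."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)) ** 2   # squared deviation per neuron
    v = d.sum(dim=(2, 3), keepdim=True) / n           # channel-wise variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5                 # inverse energy: higher = more salient
    return x * torch.sigmoid(e_inv)
```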
Affiliation(s)
- Lei He
- State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China; (X.Z.); (J.C.); (B.W.); (Y.W.); (Y.Z.)
7. Bavle H, Sanchez-Lopez JL, Cimarelli C, Tourani A, Voos H. From SLAM to Situational Awareness: Challenges and Survey. Sensors (Basel). 2023;23:4849. PMID: 37430762; DOI: 10.3390/s23104849.
Abstract
The capability of a mobile robot to efficiently and safely perform complex missions is limited by its knowledge of the environment, namely the situation. Advanced reasoning, decision-making, and execution skills enable an intelligent agent to act autonomously in unknown environments. Situational Awareness (SA) is a fundamental capability of humans that has been deeply studied in various fields, such as psychology, military, aerospace, and education. Nevertheless, it has yet to be considered holistically in robotics, which has focused on single, compartmentalized concepts such as sensing, spatial perception, sensor fusion, state estimation, and Simultaneous Localization and Mapping (SLAM). Hence, the present research aims to connect the broad existing multidisciplinary knowledge to pave the way for a complete SA system for mobile robotics, which we deem paramount for autonomy. To this aim, we define the principal components that structure a robotic SA and their areas of competence. Accordingly, this paper investigates each aspect of SA, surveying the state-of-the-art robotics algorithms that cover them, and discusses their current limitations. Remarkably, essential aspects of SA are still immature, since current algorithmic development restricts their performance to specific environments. Nevertheless, Artificial Intelligence (AI), particularly Deep Learning (DL), has brought new methods to bridge the gap that keeps these fields from deployment in real-world scenarios. Furthermore, an opportunity has been discovered to interconnect the vastly fragmented space of robotic comprehension algorithms through the mechanism of the Situational Graph (S-Graph), a generalization of the well-known scene graph. We therefore conclude by shaping our vision for the future of robotic situational awareness and discussing promising recent research directions.
Affiliation(s)
- Hriday Bavle
- Interdisciplinary Center for Security Reliability and Trust (SnT), University of Luxembourg, 1855 Luxembourg, Luxembourg
- Jose Luis Sanchez-Lopez
- Interdisciplinary Center for Security Reliability and Trust (SnT), University of Luxembourg, 1855 Luxembourg, Luxembourg
- Claudio Cimarelli
- Interdisciplinary Center for Security Reliability and Trust (SnT), University of Luxembourg, 1855 Luxembourg, Luxembourg
- Ali Tourani
- Interdisciplinary Center for Security Reliability and Trust (SnT), University of Luxembourg, 1855 Luxembourg, Luxembourg
- Holger Voos
- Interdisciplinary Center for Security Reliability and Trust (SnT), University of Luxembourg, 1855 Luxembourg, Luxembourg
- Department of Engineering, Faculty of Science, Technology, and Medicine (FSTM), University of Luxembourg, 1359 Luxembourg, Luxembourg
8. Wang L, Huang Y. Fast vehicle detection based on colored point cloud with bird's eye view representation. Sci Rep. 2023;13:7447. PMID: 37156868; PMCID: PMC10167367; DOI: 10.1038/s41598-023-34479-z.
Abstract
RGB cameras and LiDAR are crucial sensors for autonomous vehicles that provide complementary information for accurate detection. Recent early-level fusion approaches, which enrich LiDAR data with camera features, may not achieve promising performance owing to the immense difference between the two modalities. This paper presents a simple and effective vehicle detection method based on an early-fusion strategy, unified 2D BEV grids, and feature fusion. The proposed method first eliminates many null point clouds through co-calibration. It then augments the point cloud with color information to generate a 7D colored point cloud and unifies the augmented data into 2D BEV grids. The colored BEV maps can then be fed to any 2D convolution network. A dedicated Feature Fusion (2F) detection module is used to extract multi-scale features from the BEV images. Experiments on the KITTI public benchmark and the nuScenes dataset show that fusing RGB images with the point cloud, rather than using the raw point cloud alone, leads to better detection accuracy. Moreover, the inference time of the proposed method reaches 0.05 s/frame thanks to its simple and compact architecture.
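To make the colored-BEV idea concrete, here is a minimal sketch that rasterizes a 7D colored point cloud [x, y, z, r, g, b, intensity] into a fixed-size BEV grid that any 2D CNN can consume. The ranges, cell size, and keep-highest-point rule are illustrative assumptions, not the paper's exact encoding.

```python
import numpy as np

def colored_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0), cell=0.1):
    """points: (N, 7) array of [x, y, z, r, g, b, intensity]."""
    W = int((x_range[1] - x_range[0]) / cell)
    H = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((H, W, 5), dtype=np.float32)   # channels: height, r, g, b, intensity
    xi = ((points[:, 0] - x_range[0]) / cell).astype(int)
    yi = ((points[:, 1] - y_range[0]) / cell).astype(int)
    ok = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
    for x, y, p in zip(xi[ok], yi[ok], points[ok]):
        if p[2] > bev[y, x, 0]:                   # keep the highest point per cell
            bev[y, x, 0] = p[2]                   # (assumes ground near z = 0)
            bev[y, x, 1:] = p[3:7]
    return bev
```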
Affiliation(s)
- Lele Wang
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Yingping Huang
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
9. He T, Shen C, van den Hengel A. Dynamic Convolution for 3D Point Cloud Instance Segmentation. IEEE Trans Pattern Anal Mach Intell. 2023;45:5697-5711. PMID: 36279351; DOI: 10.1109/tpami.2022.3216926.
Abstract
In this paper, we present a simple yet effective and strongly robust approach for instance segmentation on 3D point clouds. Previous top-performing methods for this task adopt a bottom-up strategy, which often involves various inefficient operations or complex pipelines, such as grouping over-segmented components, introducing heuristic post-processing steps, and designing complex loss functions. As a result, the inevitable variation in instance sizes makes such methods vulnerable and sensitive to the values of pre-defined hyper-parameters. To this end, we instead propose a novel pipeline that applies dynamic convolution to generate instance-aware parameters in response to the characteristics of the instances. The representation capability of the parameters is greatly improved by gathering homogeneous points that have identical semantic categories and close votes for the geometric centroids. Instances are then decoded via several simple convolution layers, whose parameters are generated depending on the input. In addition, to introduce a large context while maintaining limited computational overhead, a lightweight transformer is built upon the bottleneck layer to capture long-range dependencies. With non-maximum suppression (NMS) as the only post-processing step, we demonstrate a simpler and more robust approach that achieves promising performance on various datasets: ScanNetV2, S3DIS, and PartNet. Consistent improvements on both voxel- and point-based architectures imply the effectiveness of the proposed method. Code is available at: https://git.io/DyCo3D.
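The core of the dynamic-convolution idea is that a small controller predicts per-instance filter parameters, which are then applied to a shared point-feature map to decode one mask per instance. A minimal sketch under assumed layer sizes (a single dynamic layer, whereas the paper stacks several):

```python
import torch
import torch.nn as nn

class DynamicMaskDecoder(nn.Module):
    """Decode instance masks with filters generated from instance features."""
    def __init__(self, feat_dim=32):
        super().__init__()
        # controller emits one weight vector (feat_dim) plus a bias per instance
        self.controller = nn.Linear(feat_dim, feat_dim + 1)

    def forward(self, point_feats, inst_feats):
        """point_feats: (N, C) per-point features; inst_feats: (K, C) per-instance."""
        params = self.controller(inst_feats)   # (K, C + 1) dynamic parameters
        w, b = params[:, :-1], params[:, -1]
        logits = point_feats @ w.t() + b       # (N, K): one mask logit per instance
        return torch.sigmoid(logits)
```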
10. Ali AM, Benjdira B, Koubaa A, El-Shafai W, Khan Z, Boulila W. Vision Transformers in Image Restoration: A Survey. Sensors (Basel). 2023;23:2385. PMID: 36904589; PMCID: PMC10006889; DOI: 10.3390/s23052385.
Abstract
The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNN) predominated in most computer vision tasks. Now, both CNN and ViT are efficient approaches that demonstrate powerful capabilities to restore a better version of an image given in a low-quality format. In this study, the efficiency of ViT in image restoration is studied extensively. The ViT architectures are classified for every task of image restoration. Seven image restoration tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, advantages, limitations, and possible areas for future research are detailed. Overall, it is noted that incorporating ViT in new architectures for image restoration is becoming the rule. This is due to some advantages compared to CNN, such as better efficiency, especially when more data are fed to the network, robustness in feature extraction, and a feature learning approach that better captures the variances and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data to show the benefits of ViT over CNN, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and the lack of interpretability. These drawbacks represent future research directions that should be targeted to increase the efficiency of ViT in the image restoration domain.
Affiliation(s)
- Anas M. Ali
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
- Bilel Benjdira
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- SE & ICT Laboratory, LR18ES44, ENICarthage, University of Carthage, Tunis 1054, Tunisia
- Anis Koubaa
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Walid El-Shafai
- Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
- Security Engineering Laboratory, Computer Science Department, Prince Sultan University, Riyadh 11586, Saudi Arabia
- Zahid Khan
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- Wadii Boulila
- Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia
- RIADI Laboratory, University of Manouba, Manouba 2010, Tunisia
11. Martínez-Otzeta JM, Rodríguez-Moreno I, Mendialdua I, Sierra B. RANSAC for Robotic Applications: A Survey. Sensors (Basel). 2022;23:327. PMID: 36616922; PMCID: PMC9824669; DOI: 10.3390/s23010327.
Abstract
Random Sample Consensus, most commonly abbreviated as RANSAC, is a robust estimation method for the parameters of a model contaminated by a sizable percentage of outliers. In its simplest form, the process starts with a sampling of the minimum data needed to perform an estimation, followed by an evaluation of its adequacy, and further repetitions of this process until some stopping criterion is met. Multiple variants have been proposed in which this workflow is modified, typically tweaking one or several of these steps for improvements in computing time or the quality of the estimation of the parameters. RANSAC is widely applied in the field of robotics, for example, for finding geometric shapes (planes, cylinders, spheres, etc.) in point clouds or for estimating the best transformation between different camera views. In this paper, we present a review of the current state of the art of RANSAC family methods with a special interest in applications in robotics.
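The sample-evaluate-repeat loop just described fits in a few lines. Below is a bare-bones RANSAC plane fit for a point cloud, with illustrative iteration count and inlier threshold:

```python
import numpy as np

def ransac_plane(points, n_iters=500, threshold=0.02, seed=0):
    """points: (N, 3). Returns (normal, d) with normal . p + d = 0, and inlier mask."""
    rng = np.random.default_rng(seed)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]  # minimal sample
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                                                # degenerate (collinear) sample
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < threshold           # evaluate adequacy
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model, best_inliers
```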
Affiliation(s)
- José María Martínez-Otzeta
- Department of Computer Science and Artificial Intelligence, University of the Basque Country, 20018 Donostia-San Sebastián, Spain
- Itsaso Rodríguez-Moreno
- Department of Computer Science and Artificial Intelligence, University of the Basque Country, 20018 Donostia-San Sebastián, Spain
- Iñigo Mendialdua
- Department of Languages and Information Systems, University of the Basque Country, 20018 Donostia-San Sebastián, Spain
- Basilio Sierra
- Department of Computer Science and Artificial Intelligence, University of the Basque Country, 20018 Donostia-San Sebastián, Spain
12. Zhu Y, Xu R, An H, Tao C, Lu K. Anti-Noise 3D Object Detection of Multimodal Feature Attention Fusion Based on PV-RCNN. Sensors (Basel). 2022;23:233. PMID: 36616829; PMCID: PMC9823336; DOI: 10.3390/s23010233.
Abstract
3D object detection methods based on camera and LiDAR fusion are susceptible to environmental noise. Due to the mismatched physical characteristics of the two sensors, the feature vectors encoded by the feature layers lie in different feature spaces. This leads to feature information deviation, which affects detection performance. To address this problem, a point-guided feature abstraction method is first presented to fuse the camera and LiDAR. The extracted image features and point cloud features are aggregated at keypoints to enhance information redundancy. Second, the proposed multimodal feature attention (MFA) mechanism is used to achieve adaptive fusion of point cloud features and image features with information from multiple feature spaces. Finally, a projection-based farthest point sampling (P-FPS) is proposed to downsample the raw point cloud, which projects more keypoints onto close objects and improves the sampling rate of the point-guided image features. The 3D bounding boxes of objects are obtained by a region of interest (ROI) pooling layer and a fully connected layer. The proposed 3D object detection algorithm is evaluated on three different datasets and achieves better detection performance and robustness when the image and point cloud data contain rain noise. Test results on a physical test platform further validate the effectiveness of the algorithm.
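For reference, plain farthest point sampling is sketched below; the P-FPS variant described above would run the same greedy loop on projected point coordinates rather than raw xyz. The seed and the absence of any speed-ups are illustrative simplifications.

```python
import numpy as np

def farthest_point_sampling(xyz, k, seed=0):
    """xyz: (N, 3); returns indices of k well-spread keypoints."""
    n = len(xyz)
    chosen = np.zeros(k, dtype=int)
    chosen[0] = np.random.default_rng(seed).integers(n)
    dist = np.full(n, np.inf)
    for i in range(1, k):
        # distance of every point to its nearest already-chosen keypoint
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[i - 1]], axis=1))
        chosen[i] = int(dist.argmax())   # pick the farthest remaining point
    return chosen
```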
Affiliation(s)
- Yuan Zhu
- School of Automotive Studies, Tongji University, Shanghai 201800, China
- Ruidong Xu
- School of Automotive Studies, Tongji University, Shanghai 201800, China
- Hao An
- School of Automotive Studies, Tongji University, Shanghai 201800, China
- Chongben Tao
- Suzhou Automotive Research Institute, Tsinghua University, Suzhou 215200, China
- Ke Lu
- School of Automotive Studies, Tongji University, Shanghai 201800, China
13. Zhou Y, He Y, Zhu H, Wang C, Li H, Jiang Q. MonoEF: Extrinsic Parameter Free Monocular 3D Object Detection. IEEE Trans Pattern Anal Mach Intell. 2022;44:10114-10128. PMID: 34932471; DOI: 10.1109/tpami.2021.3136899.
Abstract
Monocular 3D object detection is an important task in autonomous driving. It can easily become intractable when the ego-car pose changes with respect to the ground plane, which is common due to slight fluctuations in road smoothness and slope. Owing to the lack of insight from industrial applications, existing methods on open datasets neglect camera pose information, which inevitably makes the detector susceptible to camera extrinsic parameters. Such object perturbation is widespread in real autonomous driving scenarios for industrial products. To this end, we propose a novel method to capture the camera pose and keep the detector free from extrinsic perturbation. Specifically, the proposed framework predicts camera extrinsic parameters by detecting the vanishing point and horizon change. A converter is designed to rectify perturbative features in the latent space. By doing so, our 3D detector works independently of extrinsic parameter variations and produces accurate results in realistic cases, e.g., potholed and uneven roads, where almost all existing monocular detectors fail. Experiments demonstrate that our method outperforms the other state-of-the-art methods by a large margin on both the KITTI 3D and nuScenes datasets.
14. PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection. Int J Comput Vis. 2022. DOI: 10.1007/s11263-022-01710-9.
Abstract
3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolution Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel 3D detector, PV-RCNN, which boosts the 3D detection performance by deeply integrating the feature learning of both point-based set abstraction and voxel-based sparse convolution through two novel steps, i.e., the voxel-to-keypoint scene encoding and the keypoint-to-grid RoI feature abstraction. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: sectorized proposal-centric sampling for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about 3× faster than PV-RCNN, while also achieving better performance. The experiments demonstrate that our proposed PV-RCNN++ framework achieves state-of-the-art 3D detection performance on the large-scale and highly competitive Waymo Open Dataset with 10 FPS inference speed on a detection range of 150 m × 150 m.
15. Wang L, Song Z, Zhang X, Wang C, Zhang G, Zhu L, Li J, Liu H. SAT-GCN: Self-attention graph convolutional network-based 3D object detection for autonomous driving. Knowl Based Syst. 2022. DOI: 10.1016/j.knosys.2022.110080.
16. Du L, Ye X, Tan X, Johns E, Chen B, Ding E, Xue X, Feng J. AGO-Net: Association-Guided 3D Point Cloud Object Detection Network. IEEE Trans Pattern Anal Mach Intell. 2022;44:8097-8109. PMID: 34379590; DOI: 10.1109/tpami.2021.3104172.
Abstract
The human brain can effortlessly recognize and localize objects, whereas current 3D object detection methods based on LiDAR point clouds still report inferior performance for detecting occluded and distant objects: the point cloud appearance varies greatly due to occlusion and has inherent variance in point densities with distance to the sensor. Therefore, designing feature representations robust to such point clouds is critical. Inspired by human associative recognition, we propose a novel 3D detection framework that associates intact features for objects via domain adaptation. We bridge the gap between the perceptual domain, where features are derived from real scenes with sub-optimal representations, and the conceptual domain, where features are extracted from augmented scenes that consist of non-occluded objects with rich detailed information. A feasible method is investigated to construct conceptual scenes without external datasets. We further introduce an attention-based re-weighting module that adaptively strengthens the feature adaptation of more informative regions. The network's feature enhancement ability is exploited without introducing extra cost during inference, and the module is plug-and-play in various 3D detection frameworks. We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed. Experiments on the nuScenes and Waymo datasets also validate the versatility of our method.
17. Wang X, Ang MH, Lee GH. Cascaded Refinement Network for Point Cloud Completion With Self-Supervision. IEEE Trans Pattern Anal Mach Intell. 2022;44:8139-8150. PMID: 34460366; DOI: 10.1109/tpami.2021.3108410.
Abstract
Point clouds are often sparse and incomplete, which imposes difficulties for real-world applications. Existing shape completion methods tend to generate rough shapes without fine-grained details. Considering this, we introduce a two-branch network for shape completion. The first branch is a cascaded shape completion sub-network to synthesize complete objects, where we propose to use the partial input together with the coarse output to preserve object details during dense point reconstruction. The second branch is an auto-encoder to reconstruct the original partial input. The two branches share the same feature extractor to learn an accurate global feature for shape completion. Furthermore, we propose two strategies to enable the training of our network when ground truth data are not available. This mitigates the dependence of existing approaches on large amounts of ground truth training data, which are often difficult to obtain in real-world applications. Additionally, our proposed strategies are also able to improve the reconstruction quality for fully supervised learning. We verify our approach in self-supervised, semi-supervised, and fully supervised settings with superior performance. Quantitative and qualitative results on different datasets demonstrate that our method achieves more realistic outputs than state-of-the-art approaches on the point cloud completion task.
18. Cai Q, Pan Y, Yao T, Mei T. 3D Cascade RCNN: High Quality Object Detection in Point Clouds. IEEE Trans Image Process. 2022;31:5706-5719. PMID: 36040944; DOI: 10.1109/tip.2022.3201469.
Abstract
Recent progress on 2D object detection has featured Cascade RCNN, which capitalizes on a sequence of cascade detectors to progressively improve proposal quality towards high-quality object detection. However, there has been no evidence in support of building such cascade structures for 3D object detection, a challenging detection scenario with highly sparse LiDAR point clouds. In this work, we present a simple yet effective cascade architecture, named 3D Cascade RCNN, that allocates multiple detectors based on the voxelized point clouds in a cascade paradigm, progressively pursuing higher-quality 3D object detectors. Furthermore, we quantitatively define the sparsity level of the points within the 3D bounding box of each object as a point completeness score, which is exploited as the task weight for each proposal to guide the learning of each stage detector. The spirit behind this is to assign higher weights to high-quality proposals with relatively complete point distributions, while down-weighting proposals with extremely sparse points that often incur noise during training. This design of completeness-aware re-weighting elegantly upgrades the cascade paradigm to be better suited to sparse input data, without increasing any FLOP budget. Through extensive experiments on both the KITTI dataset and the Waymo Open Dataset, we validate the superiority of our proposed 3D Cascade RCNN against state-of-the-art 3D object detection techniques. The source code is publicly available at https://github.com/caiqi/Cascasde-3D.
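In essence, the completeness-aware re-weighting scores each proposal by how fully its box is populated with points and uses the score as a per-proposal loss weight. The normalization by an expected point count below is an illustrative assumption; the paper defines its own sparsity-based score.

```python
import numpy as np

def completeness_weights(points_in_box, expected_counts):
    """points_in_box: (K,) observed LiDAR points per proposal;
    expected_counts: (K,) rough expectation given box size and range.
    Usage: weighted_loss = (completeness_weights(n, e) * per_box_loss).mean()"""
    return np.clip(points_in_box / np.maximum(expected_counts, 1), 0.0, 1.0)
```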
19. Han R, Feng W, Zhang Y, Zhao J, Wang S. Multiple Human Association and Tracking From Egocentric and Complementary Top Views. IEEE Trans Pattern Anal Mach Intell. 2022;44:5225-5242. PMID: 33798068; DOI: 10.1109/tpami.2021.3070562.
Abstract
Crowded scene surveillance can significantly benefit from combining an egocentric-view camera with a complementary top-view camera. A typical setting is an egocentric-view camera, e.g., a wearable camera on the ground capturing rich local details, and a top-view camera, e.g., a drone-mounted one at high altitude providing a global picture of the scene. To collaboratively analyze such complementary-view videos, an important task is to associate and track multiple people across views and over time, which is challenging and differs from classical human tracking, since we need to not only track multiple subjects in each video, but also identify the same subjects across the two complementary views. This paper formulates the task as a constrained mixed integer programming problem, wherein a major challenge is how to effectively measure subject similarity over time in each video and across the two views. Although appearance and motion consistencies apply well to over-time association, they are not good at connecting two highly different complementary views. To this end, we present a spatial-distribution-based approach to reliable cross-view subject association. We also build a dataset to benchmark this new and challenging task. Extensive experiments verify the effectiveness of our method.
20. Meng Q, Wang W, Zhou T, Shen J, Jia Y, Van Gool L. Towards a Weakly Supervised Framework for 3D Point Cloud Object Detection and Annotation. IEEE Trans Pattern Anal Mach Intell. 2022;44:4454-4468. PMID: 33656990; DOI: 10.1109/tpami.2021.3063611.
Abstract
It is quite laborious and costly to manually label LiDAR point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised framework that allows learning 3D detection from a few weakly annotated examples. This is achieved by a two-stage architecture design. Stage 1 learns to generate cylindrical object proposals under inaccurate and inexact supervision, obtained by our proposed BEV center-click annotation strategy, where only the horizontal object centers are click-annotated in bird's-eye-view scenes. Stage 2 learns to predict cuboids and confidence scores in a coarse-to-fine, cascade manner, under incomplete supervision, i.e., only a small portion of object cuboids are precisely annotated. On the KITTI dataset, using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 86-97 percent of the performance of current top-leading, fully supervised detectors (which require 3,712 exhaustively annotated scenes with 15,654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, supporting both automatic and active (human-in-the-loop) working modes. The annotations generated by our model can be used to train 3D object detectors, achieving over 95 percent of their original performance (with manually labeled training data). Our experiments also show our model's potential for boosting performance when given more training data. The above designs make our approach highly practical and open up opportunities for learning 3D detection at reduced annotation cost.
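Since a BEV center click supplies only a horizontal center, the induced proposal is a cylinder unbounded in height. A hypothetical helper for collecting the points of such a proposal (the radius is an assumed hyper-parameter):

```python
import numpy as np

def points_in_cylinder(points, cx, cy, r=2.0):
    """points: (N, 3+) with x, y in the first two columns; returns a boolean mask."""
    return (points[:, 0] - cx) ** 2 + (points[:, 1] - cy) ** 2 < r ** 2
```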
21. Chen W, Li P, Zhao H. MSL3D: 3D object detection from monocular, stereo and point cloud for autonomous driving. Neurocomputing. 2022. DOI: 10.1016/j.neucom.2022.04.075.
22. Lin S, Wang Z, Ling Y, Tao Y, Yang C. E2EK: End-to-End Regression Network Based on Keypoint for 6D Pose Estimation. IEEE Robot Autom Lett. 2022. DOI: 10.1109/lra.2022.3174261.
Affiliation(s)
- Shifeng Lin
- School of Automation Science and Engineering, South China University of Technology, Guangdong, China
- Yidan Tao
- Shanghai Jiao Tong University, Shanghai, China
- Chenguang Yang
- Bristol Robotics Laboratory, University of the West of England, Bristol, U.K.
23. SRIF-RCNN: Sparsely represented inputs fusion of different sensors for 3D object detection. Appl Intell. 2022. DOI: 10.1007/s10489-022-03594-1.
24. Liu B, Tian B, Wang H, Qiao J, Wang Z. FuseNet: 3D Object Detection Network with Fused Information for Lidar Point Clouds. Neural Process Lett. 2022. DOI: 10.1007/s11063-022-10848-z.
25. MsDA: Multi-scale domain adaptation dehazing network. Appl Intell. 2022. DOI: 10.1007/s10489-022-03540-1.
27. Hao Z, Haiyang H, Tianci L. CrossGAN-Detection: A generative adversarial network with directly controllable fusion for target detection. J Intell Fuzzy Syst. 2022. DOI: 10.3233/jifs-213074.
Abstract
At present, most deep learning object detection methods based on multi-modal information fusion cannot directly control the quality of the fused images, because the fusion depends only on the detection results. This indirectness of control is, in principle, not conducive to the network's target detection. To address this problem, we propose a multimodal information cross-fusion detection method based on a generative adversarial network (CrossGAN-Detection), which is composed of a GAN and a target detection network, with the target detection network acting as the second discriminator of the GAN during training. Through the content loss function and the dual discriminator, directly controllable guidance is provided for the generator, which is designed to learn the relationship between different modalities adaptively through cross fusion. We conduct extensive experiments on the KITTI dataset, the prevalent dataset in the fusion-detection field. The experimental results show that the proposed method achieves an AP of 96.66%, 87.15%, and 78.46% for vehicle detection in the easy, moderate, and hard categories, respectively, an improvement of about 7% over state-of-the-art methods.
Affiliation(s)
- Zhang Hao
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang, China
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China
- University of Chinese Academy of Sciences, Beijing, China
- Hua Haiyang
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang, China
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
- Liu Tianci
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang, China
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
28. Ji T, Sivakumar AN, Chowdhary G, Driggs-Campbell K. Proactive Anomaly Detection for Robot Navigation With Multi-Sensor Fusion. IEEE Robot Autom Lett. 2022. DOI: 10.1109/lra.2022.3153989.
Affiliation(s)
- Tianchen Ji
- Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Girish Chowdhary
- Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Champaign, IL, USA
29. Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans Pattern Anal Mach Intell. 2021;43:4338-4364. PMID: 32750799; DOI: 10.1109/tpami.2020.3005434.
Abstract
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has been thriving, with numerous methods proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
30. Li Z, Du Y, Zhu M, Zhou S, Zhang L. A survey of 3D object detection algorithms for intelligent vehicles development. Artif Life Robotics. 2021;27:115-122. PMID: 34744502; PMCID: PMC8559424; DOI: 10.1007/s10015-021-00711-0.
Abstract
With the rapid development of Artificial Intelligence algorithms in Computer Vision, 2D object detection has greatly succeeded and been applied in various industrial products. In the past several years, the accuracy of 2D object detection has been dramatically improved, even beyond the detection ability of the human eye. However, 2D object detection is still limited for Intelligent Driving applications. A safe and reliable self-driving car needs to detect 3D models of surrounding objects so that it can perceive real driving situations. This paper systematically surveys the development of 3D object detection methods applied to intelligent driving technology. It also analyzes the shortcomings of existing 3D detection algorithms and the future development directions of 3D detection algorithms for intelligent driving.
Affiliation(s)
- Zhen Li
- Kyushu Institute of Technology, Kitakyushu, Japan
- Yuren Du
- Yangzhou University, Yangzhou, China
- Miaomiao Zhu
- Kyushu Institute of Technology, Kitakyushu, Japan
- Shi Zhou
- Kyushu Institute of Technology, Kitakyushu, Japan
- Lifeng Zhang
- Kyushu Institute of Technology, Kitakyushu, Japan
31. Wu K, Xu G, Liu Z, Liu H, Cai D, He X. PointCSE: Context-sensitive encoders for efficient 3D object detection from point cloud. Int J Mach Learn Cybern. 2021. DOI: 10.1007/s13042-021-01342-4.
32. Tao C, He H, Xu F, Cao J. Stereo priori RCNN based car detection on point level for autonomous driving. Knowl Based Syst. 2021. DOI: 10.1016/j.knosys.2021.107346.
33. Wang Q, Chen J, Deng J, Zhang X. PI-Net: An End-to-End Deep Neural Network for Bidirectionally and Directly Fusing Point Clouds With Images. IEEE Robot Autom Lett. 2021. DOI: 10.1109/lra.2021.3114429.
34. Duerr F, Weigel H, Beyerer J. Decoupled Iterative Deep Sensor Fusion for 3D Semantic Segmentation. Int J Semant Comput. 2021. DOI: 10.1142/s1793351x21400067.
Abstract
One of the key tasks for autonomous vehicles or robots is a robust perception of their 3D environment, which is why they are equipped with a wide range of different sensors. Building upon a robust sensor setup, understanding and interpreting the 3D environment is the next important step. Semantic segmentation of 3D sensor data, e.g., point clouds, provides valuable information for this task and is often seen as a key enabler for 3D scene understanding. This work presents an iterative deep fusion architecture for semantic segmentation of 3D point clouds, which builds upon a range-image representation of the point clouds and additionally exploits camera features to increase accuracy and robustness. In contrast to other approaches, which fuse lidar and camera features once, the proposed fusion strategy iteratively combines and refines lidar and camera features at different scales inside the network architecture. Additionally, the proposed approach can deal with camera failure and can jointly predict lidar and camera segmentation. We demonstrate the benefits of the presented iterative deep fusion approach on two challenging datasets, outperforming all range-image-based lidar and fusion approaches. An in-depth evaluation underlines the effectiveness of the proposed fusion strategy and the potential of camera features for 3D semantic segmentation.
Affiliation(s)
- Fabian Duerr
- AUDI AG, 85057 Ingolstadt, Germany
- Vision and Fusion Laboratory, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Jürgen Beyerer
- Vision and Fusion Laboratory, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB), Fraunhofer Center for Machine Learning, 76131 Karlsruhe, Germany
35. Blum H, Sarlin PE, Nieto J, Siegwart R, Cadena C. The Fishyscapes Benchmark: Measuring Blind Spots in Semantic Segmentation. Int J Comput Vis. 2021. DOI: 10.1007/s11263-021-01511-6.
Abstract
Deep learning has enabled impressive progress in the accuracy of semantic segmentation. Yet, the ability to estimate uncertainty and detect failure is key for safety-critical applications like autonomous driving. Existing uncertainty estimates have mostly been evaluated on simple tasks, and it is unclear whether these methods generalize to more complex scenarios. We present Fishyscapes, the first public benchmark for anomaly detection in a real-world task of semantic segmentation for urban driving. It evaluates pixel-wise uncertainty estimates towards the detection of anomalous objects. We adapt state-of-the-art methods to recent semantic segmentation models and compare uncertainty estimation approaches based on softmax confidence, Bayesian learning, density estimation, image resynthesis, as well as supervised anomaly detection methods. Our results show that anomaly detection is far from solved even for ordinary situations, while our benchmark allows measuring advancements beyond the state-of-the-art. Results, data and submission information can be found at https://fishyscapes.com/.
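Among the families compared in the benchmark, softmax confidence is the simplest baseline: flag pixels where the segmentation network's maximum softmax probability is low. A minimal sketch:

```python
import torch

def softmax_anomaly_map(logits):
    """logits: (B, num_classes, H, W) raw segmentation outputs."""
    probs = torch.softmax(logits, dim=1)
    return 1.0 - probs.max(dim=1).values   # (B, H, W); higher = more anomalous
```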
36. Li Y, Ma L, Zhong Z, Liu F, Chapman MA, Cao D, Li J. Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review. IEEE Trans Neural Netw Learn Syst. 2021;32:3412-3432. PMID: 32822311; DOI: 10.1109/tnnls.2020.3015992.
Abstract
Recently, the advancement of deep learning (DL) in discriminative feature learning from 3-D LiDAR data has led to rapid development in the field of autonomous driving. However, the automated processing of uneven, unstructured, noisy, and massive 3-D point clouds is a challenging and tedious task. In this article, we provide a systematic review of existing compelling DL architectures applied to LiDAR point clouds, detailing specific tasks in autonomous driving such as segmentation, detection, and classification. Although several published research articles focus on specific topics in computer vision for autonomous vehicles, to date, no general survey on DL applied to LiDAR point clouds for autonomous vehicles exists. Thus, the goal of this article is to narrow the gap in this topic. More than 140 key contributions from the recent five years are summarized in this survey, including milestone 3-D deep architectures; remarkable DL applications in 3-D semantic segmentation, object detection, and classification; and specific datasets, evaluation metrics, and state-of-the-art performance. Finally, we discuss the remaining challenges and future research directions.
37. Shi S, Wang Z, Shi J, Wang X, Li H. From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network. IEEE Trans Pattern Anal Mach Intell. 2021;43:2647-2664. PMID: 32142423; DOI: 10.1109/tpami.2020.2977026.
Abstract
3D object detection from LiDAR point clouds is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-A2 net). The whole framework consists of the part-aware stage and the part-aggregation stage. First, the part-aware stage, for the first time, fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high-quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our newly designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. The part-aggregation stage then learns to re-score the box and refine its location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-A2 net outperforms all existing 3D detection methods and achieves new state-of-the-art results on the KITTI 3D object detection dataset by utilizing only the LiDAR point cloud data.
38. Yasarla R, Sindagi VA, Patel VM. Semi-Supervised Image Deraining Using Gaussian Processes. IEEE Trans Image Process. 2021;30:6570-6582. PMID: 34270423; DOI: 10.1109/tip.2021.3096323.
Abstract
Recent CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality. However, these methods are limited in that they can be trained only on fully labeled data. Due to the various challenges of obtaining real-world fully labeled image deraining datasets, existing methods are trained only on synthetically generated data and hence generalize poorly to real-world images. The use of real-world data in training image deraining networks is relatively unexplored in the literature. We propose a Gaussian Process-based semi-supervised learning framework that enables the network to learn deraining from a synthetic dataset while generalizing better to unlabeled real-world images. More specifically, we model the latent space vectors of unlabeled data using Gaussian Processes, which are then used to compute pseudo-ground-truth for supervising the network on the unlabeled data. The pseudo-ground-truth is further used to supervise the network at an intermediate level on the unlabeled data. Through extensive experiments and ablations on several challenging datasets (such as Rain800, Rain200L, and DDN-SIRR), we show that the proposed method effectively leverages unlabeled data, resulting in significantly better performance compared to labeled-only training. Additionally, we demonstrate that using unlabeled real-world images in the proposed GP-based framework yields superior performance compared to existing methods. Code is available at: https://github.com/rajeevyasarla/Syn2Real.
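As a concrete illustration of the pseudo-labeling idea, the sketch below models labeled latent vectors with a GP and takes the posterior mean at an unlabeled latent as its pseudo-ground-truth. The RBF kernel and noise level are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_pseudo_targets(z_lab, y_lab, z_unlab, noise=1e-2):
    """z_lab: (N, D) labeled latents; y_lab: (N, D) their targets; z_unlab: (M, D)."""
    K = rbf(z_lab, z_lab) + noise * np.eye(len(z_lab))   # labeled-labeled kernel
    K_star = rbf(z_unlab, z_lab)                         # unlabeled-labeled kernel
    return K_star @ np.linalg.solve(K, y_lab)            # GP posterior mean as pseudo-target

z_l, y_l = np.random.randn(64, 128), np.random.randn(64, 128)
pseudo = gp_pseudo_targets(z_l, y_l, np.random.randn(16, 128))   # (16, 128) pseudo-targets
```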
Collapse
|
39
|
Lampinen S, Niu L, Hulttinen L, Niemi J, Mattila J. Autonomous robotic rock breaking using a real-time 3D visual perception system. J FIELD ROBOT 2021. [DOI: 10.1002/rob.22022] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Affiliation(s)
- Santeri Lampinen
- Faculty of Engineering and Natural Sciences, Unit of Automation Technology and Mechanical Engineering Tampere University Tampere Finland
| | - Longchuan Niu
- Faculty of Engineering and Natural Sciences, Unit of Automation Technology and Mechanical Engineering Tampere University Tampere Finland
| | - Lionel Hulttinen
- Faculty of Engineering and Natural Sciences, Unit of Automation Technology and Mechanical Engineering Tampere University Tampere Finland
| | | | - Jouni Mattila
- Faculty of Engineering and Natural Sciences, Unit of Automation Technology and Mechanical Engineering Tampere University Tampere Finland
| |
Collapse
|
40
|
Qiu K, Qin T, Pan J, Liu S, Shen S. Real-Time Temporal and Rotational Calibration of Heterogeneous Sensors Using Motion Correlation Analysis. IEEE T ROBOT 2021. [DOI: 10.1109/tro.2020.3033698] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
41
|
Lei H, Liu S, Elazab A, Gong X, Lei B. Attention-Guided Multi-Branch Convolutional Neural Network for Mitosis Detection From Histopathological Images. IEEE J Biomed Health Inform 2021; 25:358-370. [PMID: 32991296 DOI: 10.1109/jbhi.2020.3027566] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Mitotic count is an important indicator for assessing the invasiveness of breast cancers. Currently, mitoses are counted manually by pathologists, which is both tedious and time-consuming. To address this, we propose a fast and accurate method to automatically detect mitosis in histopathological images. The proposed method automatically identifies mitotic candidates in histological sections for mitosis screening. Specifically, our method exploits deep convolutional neural networks to extract high-level features of mitosis and detect mitotic candidates. Then, we use spatial attention modules to re-encode the mitotic features, which allows the model to learn more efficient features. Finally, we use multi-branch classification subnets to screen the mitoses. Compared to existing related methods in the literature, our method obtains the best detection results on the dataset of the International Pattern Recognition Conference (ICPR) 2012 Mitosis Detection Competition. Code is available at: https://github.com/liushaomin/MitosisDetection.
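The spatial attention re-encoding described above can be sketched with the common channel-pooled 2D attention design, shown below in PyTorch. The exact architecture is the authors'; this minimal module is an assumption in that spirit.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-pooled spatial attention: re-weights a feature map location-wise."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)          # channel-average map
        mx, _ = x.max(dim=1, keepdim=True)         # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                            # spatially re-encoded features

out = SpatialAttention()(torch.randn(2, 64, 32, 32))   # same shape as the input
```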
Collapse
|
42
|
Cen J, An P, Chen G, Liang J, Ma J. PSS: Point Semantic Saliency for 3D Object Detection. ARTIF INTELL 2021. [DOI: 10.1007/978-3-030-93046-2_35] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
43
|
|
44
|
Feng M, Gilani SZ, Wang Y, Zhang L, Mian A. Relation Graph Network for 3D Object Detection in Point Clouds. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:92-107. [PMID: 33085616 DOI: 10.1109/tip.2020.3031371] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Convolutional Neural Networks (CNNs) have emerged as a powerful tool for object detection in 2D images. However, their power has not been fully realised for detecting 3D objects directly in point clouds without conversion to regular grids. Moreover, existing state-of-the-art 3D object detection methods aim to recognize objects individually, without exploiting their relationships during learning or inference. In this article, we first propose a strategy that associates the predictions of direction vectors with pseudo geometric centers, leading to a win-win solution for regressing 3D bounding box candidates. Second, we propose point attention pooling to extract uniform appearance features for each 3D object proposal, benefiting from the learned direction features, semantic features, and spatial coordinates of the object points. Finally, the appearance features are used together with the position features to build 3D object-object relationship graphs over all proposals to model their co-existence. We explore the effect of relation graphs on enhancing proposals' appearance features under supervised and unsupervised settings. The proposed relation graph network comprises a 3D object proposal generation module and a 3D relation module, making it an end-to-end trainable network for detecting 3D objects in point clouds. Experiments on challenging benchmark point cloud datasets (SunRGB-D, ScanNet, and KITTI) show that our algorithm performs better than existing state-of-the-art methods.
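The relation-graph idea can be caricatured as attention-style message passing over proposals, where nearby, similar boxes exchange information. The sketch below is a loose illustration under that assumption, not the paper's actual relation module.

```python
import numpy as np

def relation_update(appearance, centers, tau=10.0):
    """appearance: (N, D) proposal features; centers: (N, 3) proposal centers."""
    sim = appearance @ appearance.T / np.sqrt(appearance.shape[1])  # appearance affinity
    dist = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    logits = sim - dist / tau                      # nearby, similar proposals interact more
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)              # row-softmax over all proposals
    return appearance + w @ appearance             # residual message passing on the graph

refined = relation_update(np.random.randn(8, 64), np.random.rand(8, 3) * 40.0)
```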
Collapse
|
45
|
Oertel A, Cieslewski T, Scaramuzza D. Augmenting Visual Place Recognition With Structural Cues. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.3009077] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
46
|
3D car-detection based on a Mobile Deep Sensor Fusion Model and real-scene applications. PLoS One 2020; 15:e0236947. [PMID: 32881926 PMCID: PMC7470372 DOI: 10.1371/journal.pone.0236947] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Accepted: 07/16/2020] [Indexed: 11/19/2022] Open
Abstract
Unmanned vehicles must comprehensively perceive the surrounding environment while driving, so the perception of automotive information is significant. In the field of automotive perception, stereovision-based car detection plays a vital role: stereovision can recover the length, width, and height of a car, making the detection more specific. However, with existing technology, accurate detection in a complex environment cannot be achieved by relying on a single sensor. It is therefore particularly important to study complex sensing technology based on multi-sensor fusion. Building on recent developments in deep learning for vision, this paper proposes and applies a mobile sensor-fusion method based on deep learning: the Mobile Deep Sensor Fusion Model (MDSFM). The work proceeds as follows. First, 3D data are projected onto 2D data to form a dataset suitable for the model, enabling more efficient training. In the LiDAR modules, a revised SqueezeNet structure lightens the model and reduces the number of parameters. In the camera modules, the detection module of R-CNN is improved with a Mobile Spatial Attention Module (MSAM). In the fusion part, a dual-view deep fusing structure is used. Images from the KITTI dataset are then selected to validate the model. Compared with other recognized methods, our model shows fairly good performance. Finally, a ROS program is implemented on an experimental car, where the model runs well. The results show that MDSFM significantly improves the detection of easy cars, increases the quality of the detected data, and improves the generalization ability of the car-detection model. It improves contextual relevance, preserves background information, remains stable in driverless environments, and demonstrates good practical value in a realistic scenario.
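The 3D-to-2D projection step mentioned above is standard: LiDAR points in the camera frame are mapped into the image with a 3x4 projection matrix, KITTI-style. A minimal sketch follows; the matrix values are placeholders, as real ones come from calibration files.

```python
import numpy as np

def project_to_image(points_cam, P):
    """points_cam: (N, 3) points in the camera frame; P: (3, 4) projection matrix."""
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    uvw = homo @ P.T
    front = uvw[:, 2] > 0                          # keep points in front of the camera
    uv = uvw[front, :2] / uvw[front, 2:3]          # perspective divide
    return uv, uvw[front, 2]

P = np.array([[721.5, 0.0, 609.6, 44.9],           # placeholder KITTI-like P2 matrix
              [0.0, 721.5, 172.9, 0.2],
              [0.0, 0.0, 1.0, 0.003]])
pts = np.random.uniform([-10, -2, 5], [10, 2, 40], size=(50, 3))
uv, depth = project_to_image(pts, P)               # pixel coordinates and depths
```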
Collapse
|
47
|
Mohamed IS, Capitanelli A, Mastrogiovanni F, Rovetta S, Zaccaria R. Detection, localisation and tracking of pallets using machine learning techniques and 2D range data. Neural Comput Appl 2020. [DOI: 10.1007/s00521-019-04352-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
48
|
Abstract
A point cloud is a set of points defined in a 3D metric space. Point clouds have become one of the most significant data formats for 3D representation; they are gaining popularity as acquisition devices become more widely available, and they see growing application in areas such as robotics, autonomous driving, and augmented and virtual reality. Deep learning is now the most powerful tool for data processing in computer vision and is becoming the preferred technique for tasks such as classification, segmentation, and detection. However, deep learning techniques are mainly applied to data with a structured grid, whereas the point cloud is unstructured; this unstructuredness makes the direct processing of point clouds with deep learning very challenging. This paper reviews recent state-of-the-art deep learning techniques, mainly focusing on raw point cloud data. Initial work on deep learning directly with raw point cloud data did not model local regions; subsequent approaches therefore model local regions through sampling and grouping. More recently, several approaches have been proposed that not only model the local regions but also explore the correlations between points within them. From the survey, we conclude that approaches that model local regions and take into account the correlations between their points perform better. In contrast to existing reviews, this paper provides a general structure for learning with raw point clouds, against which the various methods are compared. This work also introduces popular 3D point cloud benchmark datasets and discusses the application of deep learning to popular 3D vision tasks, including classification, segmentation, and detection.
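The "sampling and grouping" step that these local-region approaches share is easy to sketch: farthest point sampling picks well-spread centroids, and a ball query gathers each centroid's neighborhood. The following is a minimal NumPy illustration, not any particular paper's implementation.

```python
import numpy as np

def farthest_point_sampling(pts, k):
    """Greedy FPS: each new centroid is the point farthest from all chosen ones."""
    idx = [0]                                      # arbitrary seed point
    d = np.linalg.norm(pts - pts[0], axis=1)
    for _ in range(k - 1):
        idx.append(int(d.argmax()))
        d = np.minimum(d, np.linalg.norm(pts - pts[idx[-1]], axis=1))
    return np.array(idx)

def ball_query(pts, centroids, radius=0.2, max_pts=32):
    """Indices of up to max_pts neighbors within radius of each centroid."""
    return [np.where(np.linalg.norm(pts - c, axis=1) < radius)[0][:max_pts]
            for c in centroids]

cloud = np.random.rand(1024, 3)
centers = cloud[farthest_point_sampling(cloud, 64)]
regions = ball_query(cloud, centers)               # one local region per centroid
```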
Collapse
|
49
|
Rahman MM, Tan Y, Xue J, Lu K. Recent Advances in 3D Object Detection in the Era of Deep Neural Networks: A Survey. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:2947-2962. [PMID: 31796401 DOI: 10.1109/tip.2019.2955239] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
With the rapid development of deep learning technology and other powerful tools, 3D object detection has made great progress and become one of the fastest growing fields in computer vision. Many automated applications, such as robotic navigation, autonomous driving, and virtual or augmented reality systems, require the estimation of accurate 3D object locations and detections. To meet this requirement, many methods have been proposed to improve the performance of 3D object localization and detection. Despite recent efforts, 3D object detection remains a very challenging task due to occlusion, viewpoint variations, scale changes, and the limited information available in 3D scenes. In this paper, we present a comprehensive review of recent state-of-the-art approaches in 3D object detection technology. We start with some basic concepts, then describe some of the available datasets designed to facilitate the performance evaluation of 3D object detection algorithms. Next, we review the state-of-the-art technologies in this area, highlighting their contributions, importance, and limitations as a guide for future research. Finally, we provide a quantitative comparison of the results of the state-of-the-art methods on the popular public datasets.
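The quantitative comparisons such surveys report hinge on 3D IoU between predicted and ground-truth boxes. The sketch below handles only the axis-aligned case; benchmark evaluation (e.g., KITTI) uses rotated boxes, so treat it as a simplification.

```python
import numpy as np

def iou_3d_axis_aligned(a, b):
    """a, b: arrays (x_min, y_min, z_min, x_max, y_max, z_max)."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # zero if the boxes do not overlap
    vol = lambda box: np.prod(box[3:] - box[:3])
    return inter / (vol(a) + vol(b) - inter)

gt = np.array([0.0, 0.0, 0.0, 3.9, 1.6, 1.56])
pred = np.array([0.2, 0.1, 0.0, 4.0, 1.7, 1.5])
print(iou_3d_axis_aligned(gt, pred))               # overlap score in [0, 1]
```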
Collapse
|
50
|
Bao W, Xu B, Chen Z. MonoFENet: Monocular 3D Object Detection with Feature Enhancement Networks. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:2753-2765. [PMID: 31725382 DOI: 10.1109/tip.2019.2952201] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Monocular 3D object detection has the merit of low cost and can serve as an auxiliary module in autonomous driving systems, and it has attracted growing attention in recent years. In this paper, we present a monocular 3D object detection method with feature enhancement networks, which we call MonoFENet. Specifically, with the disparity estimated from the input monocular image, the features of both the 2D and 3D streams can be enhanced and utilized for accurate 3D localization. In the 2D stream, the input image is used to generate 2D region proposals and to extract appearance features. In the 3D stream, the estimated disparity is transformed into a dense 3D point cloud, which is then enhanced by the associated front-view maps. With the RoI Mean Pooling layer, the 3D geometric features of RoI point clouds are further enhanced by the proposed point feature enhancement (PointFE) network. The region-wise features of the image and the point cloud are fused for the final 2D and 3D bounding box regression. Experimental results on the KITTI benchmark reveal that our method achieves state-of-the-art performance for monocular 3D object detection.
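The disparity-to-point-cloud step described above follows standard stereo geometry: depth is recovered as f * baseline / disparity, then back-projected with the camera intrinsics. The sketch below uses placeholder intrinsics, not MonoFENet's actual setup.

```python
import numpy as np

def disparity_to_points(disp, fx, fy, cx, cy, baseline):
    """disp: (H, W) disparity in pixels; returns (M, 3) points where disp > 0."""
    v, u = np.nonzero(disp > 0)
    z = fx * baseline / disp[v, u]                 # depth from disparity
    x = (u - cx) * z / fx                          # back-project with the intrinsics
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

disp = np.clip(np.random.rand(64, 128) * 60.0, 1.0, None)   # synthetic disparity map
cloud = disparity_to_points(disp, fx=721.5, fy=721.5, cx=609.6, cy=172.9, baseline=0.54)
```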
Collapse
|