1. Ma X, Ouyang W, Simonelli A, Ricci E. 3D Object Detection From Images for Autonomous Driving: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3537-3556. [PMID: 38145536] [DOI: 10.1109/tpami.2023.3346386]
Abstract
3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years. Benefiting from the rapid development of deep learning technologies, image-based 3D detection has achieved remarkable progress: more than 200 works studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications. However, to date no recent survey exists to collect and organize this knowledge. In this paper, we fill this gap in the literature and provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components. Additionally, we propose two new taxonomies to organize the state-of-the-art methods into different categories, with the intent of providing a more systematic review of existing methods and facilitating fair comparisons with future works. Reflecting on what has been achieved so far, we also analyze the current challenges in the field and discuss future directions for image-based 3D detection research.
2. Contreras M, Jain A, Bhatt NP, Banerjee A, Hashemi E. A survey on 3D object detection in real time for autonomous driving. Front Robot AI 2024; 11:1212070. [PMID: 38510560] [PMCID: PMC10950960] [DOI: 10.3389/frobt.2024.1212070]
Abstract
This survey reviews advances in 3D object detection approaches for autonomous driving. A brief introduction to 2D object detection is first given, and drawbacks of the existing methodologies are identified for highly dynamic environments. Subsequently, this paper reviews the state-of-the-art 3D object detection techniques that utilize monocular and stereo vision for reliable detection in urban settings. Based on the depth inference approach, learning scheme, and internal representation, this work presents a taxonomy of three method classes: model-based and geometrically constrained approaches, end-to-end learning methodologies, and hybrid methods. A dedicated segment highlights the current trend toward multi-view detectors as end-to-end methods, owing to their improved robustness. Detectors from the latter two classes were specifically selected to exploit the autonomous driving context in terms of geometry, scene content, and instance distribution. To assess the effectiveness of each method, 3D object detection datasets for autonomous vehicles are described with their unique features, e.g., varying weather conditions, multi-modality, and multi-camera perspectives, along with their respective metrics associated with different difficulty categories. In addition, we include multi-modal visual datasets, i.e., V2X, that may tackle the problem of single-view occlusion. Finally, the current research trends in object detection are summarized, followed by a discussion of possible directions for future research in this domain.
Affiliation(s)
- Aayush Jain
- Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
3. Chen Y, Huang Y, Zeng J, Kang Y, Tan Y, Xie X, Wei B, Li C, Fang L, Jiang T. Energy-Efficient ReS2-Based Optoelectronic Synapse for 3D Object Reconstruction and Recognition. ACS Applied Materials & Interfaces 2023; 15:58631-58642. [PMID: 38054897] [DOI: 10.1021/acsami.3c14958]
Abstract
The neuromorphic vision system (NVS) equipped with optoelectronic synapses integrates perception, storage, and processing, and is expected to address the issues of traditional machine vision. However, owing to their lack of stereo vision, existing NVSs focus on 2D image processing, which makes it difficult to solve problems such as spatial cognition errors and low-precision interpretation. Consequently, inspired by the human visual system, an NVS with stereo vision is developed to achieve 3D object recognition, enabled by a ReS2 optoelectronic synapse with an ultralow power consumption of 12.12 fJ. This device exhibits excellent optical synaptic plasticity derived from the persistent photoconductivity effect. As the cornerstone of 3D vision, color planar information is successfully discriminated and stored in situ at the sensor end, benefiting from the device's wavelength-dependent plasticity in the visible region. Importantly, the dependence of the channel conductance on the target distance is experimentally revealed, implying that the structural information of the object can be directly captured and stored by the synapse. The 3D image of the object is successfully reconstructed via fusion of its planar and depth images. Therefore, the proposed 3D-NVS based on ReS2 synapses achieves a recognition accuracy of 97.0% on 3D objects, much higher than that on 2D objects (32.6%), demonstrating its strong ability to prevent 2D-photo spoofing in applications such as face-based payment and entrance guard systems.
Affiliation(s)
- Yabo Chen
- Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, P. R. China
- Yujie Huang
- Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, P. R. China
- Junwei Zeng
- Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, P. R. China
- Yan Kang
- College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, P. R. China
- Yinlong Tan
- College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, P. R. China
- Xiangnan Xie
- Institute of Quantum Information Science and Technology, College of Science, National University of Defense Technology, Changsha 410073, P. R. China
- Bo Wei
- Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, P. R. China
- Cheng Li
- Institute of Quantum Information Science and Technology, College of Science, National University of Defense Technology, Changsha 410073, P. R. China
- Liang Fang
- Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, P. R. China
- Tian Jiang
- Institute of Quantum Information Science and Technology, College of Science, National University of Defense Technology, Changsha 410073, P. R. China
4. Liu W, Zhang T, Ma Y, Wei L. 3D Street Object Detection from Monocular Images Using Deep Learning and Depth Information. Journal of Advanced Computational Intelligence and Intelligent Informatics 2023. [DOI: 10.20965/jaciii.2023.p0198]
Abstract
In this study, we present a three-dimensional (3D) object detection algorithm based on monocular images by constructing an end-to-end network that incorporates depth information. The entire network consists of three parts. The first part is the basic object detection neural network, which uses a region proposal network to obtain two-dimensional (2D) region proposals of the object. The second part is the depth estimation branch network, which obtains the depth of the object pixels and computes the corresponding 3D point cloud. In the last part, concatenated features from the two preceding parts are fed into fully connected layers, from which the 2D and 3D detection results are obtained. Compared with some existing methods, the proposed approach improves detection accuracy.
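A minimal sketch of the fusion step described above, with assumed layer sizes and names (not the authors' code): per-proposal 2D RoI features are concatenated with features derived from the depth branch's point cloud and passed through fully connected layers that output both 2D and 3D results.

```python
import torch
import torch.nn as nn

# roi_feats: (B, 256) per-proposal 2D RoI features; pc_feats: (B, 128)
# features pooled from the depth-derived point cloud. Sizes are assumptions.
class FusionHead(nn.Module):
    def __init__(self, roi_dim=256, pc_dim=128, hidden=512):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(roi_dim + pc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.head_2d = nn.Linear(hidden, 4)   # 2D box refinement
        self.head_3d = nn.Linear(hidden, 7)   # (x, y, z, w, h, l, yaw)

    def forward(self, roi_feats, pc_feats):
        h = self.fc(torch.cat([roi_feats, pc_feats], dim=1))
        return self.head_2d(h), self.head_3d(h)
```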
Affiliation(s)
- Wei Liu
- School of Automation, China University of Geosciences, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Tao Zhang
- School of Automation, China University of Geosciences, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Yun Ma
- School of Automation, China University of Geosciences, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Longsheng Wei
- School of Automation, China University of Geosciences, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China
5. Yao H, Chen J, Wang Z, Wang X, Chai X, Qiu Y, Han P. Vertex points are not enough: Monocular 3D object detection via intra- and inter-plane constraints. Neural Netw 2023; 162:350-358. [PMID: 36940495] [DOI: 10.1016/j.neunet.2023.02.038]
Abstract
Existing methods for 3D object detection in monocular images focus mainly on rigid-body classes such as cars, while more challenging categories such as cyclists are less studied. We therefore propose a novel monocular 3D object detection method that improves accuracy for objects with large deformation differences by introducing geometric constraints on the planes of the object's 3D bounding box. Considering the mapping between a projection plane and its keypoints, we first introduce an intra-plane constraint while regressing the position and offset of each keypoint, so that the keypoint's position and offset errors always remain within the error range of the projection plane. For the inter-plane geometric relationships of the 3D bounding box, prior knowledge is incorporated to optimize the keypoint regression, improving the accuracy of depth prediction. Experimental results show that the proposed method outperforms several state-of-the-art methods on the cyclist class and obtains competitive results in real-time monocular detection.
Affiliation(s)
- Hongdou Yao
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China; Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, China
- Jun Chen
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China; Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, China
- Zheng Wang
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China; Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, China
- Xiao Wang
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China; Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, China
- Xiaoyu Chai
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China; Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, China
- Yansheng Qiu
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China; Collaborative Innovation Center of Geospatial Technology, Wuhan 430072, China
- Pengfei Han
- School of Cybersecurity, Northwestern Polytechnical University, Xi'an 710000, China
6. A Review of Different Components of the Intelligent Traffic Management System (ITMS). Symmetry (Basel) 2023. [DOI: 10.3390/sym15030583]
Abstract
Traffic congestion is a serious challenge in urban areas, and the intelligent traffic management system (ITMS) is used to manage traffic on road networks to address it. Managing traffic helps to address environmental impacts as well as emergency situations. However, ITMS faces many challenges in analyzing complex traffic scenes. New technologies such as computer vision (CV) and artificial intelligence (AI) are being used to solve these challenges, and as a result have established a distinct identity in the surveillance industry, particularly for keeping a constant eye on traffic scenes. Many vehicle attributes and existing approaches, along with imaging technologies, are used in the development of ITMS. In this paper, we review the ITMS components that cover existing imaging technologies and existing approaches according to what is needed to develop an ITMS. The first component describes the traffic scene and imaging technologies. The second component describes vehicle attributes and their utilization in existing vehicle-based approaches. The third component explains vehicle behavior on the basis of the second component's outcome. The fourth component explains how traffic-related applications can assist in managing and monitoring traffic flow, reducing congestion, and enhancing road safety. The fifth component describes the different types of ITMS applications, and the sixth discusses existing methods of traffic signal control systems (TSCSs). Aside from these components, we also discuss existing vehicle-related tools, such as simulators that create realistic traffic scenes. In the final discussion section, we consider the future development of ITMS and draw conclusions. The main objective of this paper is to gather in one place possible solutions to the problems that arise during ITMS development, organized into components that help a developer build an efficient ITMS.
7. CenterLoc3D: monocular 3D vehicle localization network for roadside surveillance cameras. Complex Intell Syst 2023. [DOI: 10.1007/s40747-022-00962-9]
Abstract
Monocular 3D vehicle localization is an important task for vehicle behaviour analysis, traffic flow parameter estimation, and autonomous driving in the Intelligent Transportation System (ITS) and Cooperative Vehicle Infrastructure System (CVIS), and is usually achieved by monocular 3D vehicle detection. However, monocular cameras cannot obtain depth information directly due to their inherent imaging mechanism, which makes monocular 3D tasks more challenging. Currently, most monocular 3D vehicle detection methods still rely on 2D detectors and additional geometric constraint modules to recover 3D vehicle information, which reduces efficiency. At the same time, most research is based on datasets of onboard scenes rather than the roadside perspective, which limits large-scale 3D perception. Therefore, we focus on 3D vehicle detection without 2D detectors in roadside scenes. We propose CenterLoc3D, a 3D vehicle localization network for roadside monocular cameras, which directly predicts the centroid, the eight vertices in image space, and the dimensions of 3D bounding boxes without 2D detectors. To improve the precision of 3D vehicle localization, we embed a multi-scale weighted-fusion module and a loss with spatial constraints in CenterLoc3D. First, the transformation matrix between 2D image space and 3D world space is solved by camera calibration. Second, the vehicle type, centroid, eight vertices, and dimensions of the 3D vehicle bounding box are obtained by CenterLoc3D. Finally, the centroid in 3D world space is obtained by combining camera calibration and CenterLoc3D for 3D vehicle localization. To the best of our knowledge, this is the first application of 3D vehicle localization for roadside monocular cameras, so we also propose a benchmark for this application, including a dataset (SVLD-3D), an annotation tool (LabelImg-3D), and evaluation metrics. Experimental validation shows that the proposed method achieves high accuracy, with an AP3D of 51.30%, average 3D localization precision of 98%, and average 3D dimension precision of 85%, as well as real-time performance at 41.18 FPS.
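As a sketch of the calibration step above: if the vehicle centroid is assumed to lie on the road plane Z = 0, the calibrated 3x4 projection matrix induces a ground-plane homography whose inverse maps an image point to world coordinates. The code below is illustrative, not the authors' implementation.

```python
import numpy as np

# Back-project a pixel to the Z = 0 world plane using a calibrated
# projection matrix P = K [R | t] (3x4). Assumes the point lies on the
# ground; this is a sketch, not the CenterLoc3D code.
def backproject_to_ground(P: np.ndarray, uv: np.ndarray) -> np.ndarray:
    H = P[:, [0, 1, 3]]                      # ground-plane homography (3x3)
    xyw = np.linalg.inv(H) @ np.array([uv[0], uv[1], 1.0])
    return np.array([xyw[0] / xyw[2], xyw[1] / xyw[2], 0.0])
```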
8. Zheng J, Wang L, Liu J, Wang H, Wang S, Wang L, Zhang J. An inspection method of rail head surface defect via bimodal structured light sensors. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01736-y]
9. Zhou Y, He Y, Zhu H, Wang C, Li H, Jiang Q. MonoEF: Extrinsic Parameter Free Monocular 3D Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:10114-10128. [PMID: 34932471] [DOI: 10.1109/tpami.2021.3136899]
Abstract
Monocular 3D object detection is an important task in autonomous driving. It can easily become intractable when the ego-car pose changes with respect to the ground plane, which is common due to slight fluctuations in road smoothness and slope. Owing to a lack of insight from industrial applications, existing methods developed on open datasets neglect camera pose information, which inevitably makes the detector susceptible to camera extrinsic parameters; such perturbation is prevalent in industrial autonomous driving scenarios. To this end, we propose a novel method that captures the camera pose so as to make the detector free from extrinsic perturbation. Specifically, the proposed framework predicts camera extrinsic parameters by detecting the vanishing point and horizon change, and a converter is designed to rectify perturbed features in the latent space. By doing so, our 3D detector works independently of extrinsic parameter variations and produces accurate results in realistic cases, e.g., potholed and uneven roads, where almost all existing monocular detectors fail. Experiments demonstrate that our method yields the best performance compared with other state-of-the-art methods by a large margin on both the KITTI 3D and nuScenes datasets.
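One cue such methods rely on can be sketched in a few lines: once the horizon row is detected, the camera pitch follows from the intrinsics. This is a hedged illustration with an assumed sign convention, not MonoEF's actual module.

```python
import math

# Estimate camera pitch from a detected horizon line. fy and cy come from
# the intrinsic matrix; v_horizon is the image row where the horizon
# projects. The sign convention depends on the chosen axis orientation.
def pitch_from_horizon(v_horizon: float, cy: float, fy: float) -> float:
    return math.atan2(v_horizon - cy, fy)
```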
10. Wu X, Ma D, Qu X, Jiang X, Zeng D. Depth Dynamic Center Difference Convolutions for Monocular 3D Object Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.032]
11. Mouawad I, Brasch N, Manhardt F, Tombari F, Odone F. Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D Object Detection. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3188882]
12. Liu Y, Zhang F, Chen C, Wang S, Wang Y, Yu Y. Act Like a Radiologist: Towards Reliable Multi-View Correspondence Reasoning for Mammogram Mass Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:5947-5961. [PMID: 34061740] [DOI: 10.1109/tpami.2021.3085783]
Abstract
Mammogram mass detection is crucial for diagnosing and preventing breast cancer in clinical practice. The complementary effect of multi-view mammogram images provides valuable information about the breast's anatomical prior structure and is of great significance in digital mammography interpretation. However, unlike radiologists, who can use natural reasoning to identify masses based on multiple mammographic views, how to endow existing object detection models with multi-view reasoning capability is vital for decision-making in clinical diagnosis but remains largely unexplored. In this paper, we propose an anatomy-aware graph convolutional network (AGN), which is tailored for mammogram mass detection and endows existing detection methods with multi-view reasoning ability. The proposed AGN consists of three steps. First, we introduce a bipartite graph convolutional network (BGN) to model the intrinsic geometric and semantic relations of ipsilateral views. Second, considering that the visual asymmetry of bilateral views is widely adopted in clinical practice to assist the diagnosis of breast lesions, we propose an inception graph convolutional network (IGN) to model the structural similarities of bilateral views. Finally, based on the constructed graphs, multi-view information is propagated through the nodes methodically, which equips the features learned from the examined view with multi-view reasoning ability. Experiments on two standard benchmarks reveal that AGN significantly exceeds state-of-the-art performance, and visualization results show that AGN provides interpretable visual cues for clinical diagnosis.
13. Chu H, Mo L, Wang R, Hu T, Ma H. Visibility of points: Mining occlusion cues for monocular 3D object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.099]
14. Qin Z, Wang J, Lu Y. MonoGRNet: A General Framework for Monocular 3D Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:5170-5184. [PMID: 33877968] [DOI: 10.1109/tpami.2021.3074363]
Abstract
Detecting and localizing objects in real 3D space, which plays a crucial role in scene understanding, is particularly challenging given only a monocular image, due to the geometric information lost during imaging projection. We propose MonoGRNet for amodal 3D object detection from a monocular image via geometric reasoning in both the observed 2D projection and the unobserved depth dimension. MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks: 2D object detection, instance-level depth estimation, projected 3D center estimation, and local corner regression. This task decomposition significantly facilitates monocular 3D object detection, allowing the target 3D bounding boxes to be predicted efficiently in a single forward pass, without object proposals, post-processing, or the computationally expensive pixel-level depth estimation used by previous methods. In addition, MonoGRNet flexibly adapts to both fully and weakly supervised learning, which improves the feasibility of our framework in diverse settings. Experiments are conducted on the KITTI, Cityscapes, and MS COCO datasets. The results demonstrate the promising performance of our framework in various scenarios.
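The geometric step tying these sub-tasks together can be written down directly: given the predicted projected 3D center (u, v) and instance depth z, the metric 3D center follows from the pinhole model. A sketch with illustrative names, not the authors' API.

```python
import numpy as np

# Lift a predicted projected 3D center (u, v) with instance depth z to a
# camera-frame 3D center using the intrinsic matrix K (pinhole model).
def lift_center(K: np.ndarray, uv: np.ndarray, z: float) -> np.ndarray:
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(uv[0] - cx) * z / fx, (uv[1] - cy) * z / fy, z])
```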
15. Liu S, Huang W, Cao Y, Li D, Chen S. SMS-Net: Sparse multi-scale voxel feature aggregation network for LiDAR-based 3D object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.054]
16. Geng Q, Zhang H, Lu F, Huang X, Wang S, Zhou Z, Yang R. Part-Level Car Parsing and Reconstruction in Single Street View Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:4291-4305. [PMID: 33687835] [DOI: 10.1109/tpami.2021.3064837]
Abstract
Part information has been proven to be resistant to occlusions and viewpoint changes, which are the main difficulties in car parsing and reconstruction. However, in the absence of datasets and approaches incorporating car parts, few works have benefited from it. In this paper, we propose the first part-aware approach for joint part-level car parsing and reconstruction in single street view images. Without labor-intensive part annotations on real images, our approach simultaneously estimates the pose, shape, and semantic parts of cars. This paper makes two contributions. First, our network introduces dense part information to facilitate pose and shape estimation, which is further optimized with a novel 3D loss. To obtain part information in real images, a class-consistent method is introduced to implicitly transfer part knowledge from synthesized images. Second, we construct the first high-quality dataset containing 348 car models with physical dimensions and part annotations. Given these models, 60K synthesized images with randomized configurations are generated. Experimental results demonstrate that part knowledge can be effectively transferred with our class-consistent method, which significantly improves part segmentation performance on real street views. By fusing dense part information, our pose and shape estimation results achieve state-of-the-art performance on ApolloCar3D and outperform previous approaches by large margins in terms of both A3DP-Abs and A3DP-Rel.
17. Zhou Y, Yu Z, Ma Z. UAV Based Indoor Localization and Object Detection. Front Neurorobot 2022; 16:914353. [PMID: 35874109] [PMCID: PMC9305663] [DOI: 10.3389/fnbot.2022.914353]
Abstract
This article targets fast indoor positioning and 3D target detection for real-time unmanned aerial vehicle (UAV) tasks. Combining the direct method and the feature-based method, an approach is proposed for fast and accurate position estimation of the UAV. The camera pose is estimated by visual odometry via the photometric error between frames. ORB features are then extracted from the keyframes, and map consistency is improved by bundle adjustment with local and global optimization. A depth filter is also applied to assist the convergence of map points, with depth information updated from multiple frames. Moreover, a convolutional neural network is used to detect specific targets in unknown spaces, with YOLOv3 applied to obtain semantic information about targets in the images. The spatial map points of features in the keyframes can thus be associated with the target detection box, while a statistical outlier filter is simultaneously applied to eliminate noise points. Experiments on a public dataset and field experiments on the established UAV platform in indoor environments verify the efficacy of the proposed method for vision-based fast localization and real-time object detection.
Affiliation(s)
- Yimin Zhou
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- *Correspondence: Yimin Zhou
- Zhixiong Yu
- University of Chinese Academy of Sciences, Beijing, China
- Zhuang Ma
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
18. Chen W, Li P, Zhao H. MSL3D: 3D object detection from monocular, stereo and point cloud for autonomous driving. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.075]
19. SRIF-RCNN: Sparsely represented inputs fusion of different sensors for 3D object detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03594-1]
20. PIFNet: 3D Object Detection Using Joint Image and Point Cloud Features for Autonomous Driving. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12073686]
Abstract
Owing to its wide range of applications, 3D object detection has attracted increasing attention in computer vision. Most existing 3D object detection methods are based on LiDAR point cloud data, but these methods are limited in localization consistency and classification confidence due to the irregularity and sparsity of Light Detection and Ranging (LiDAR) point clouds. Inspired by the complementary characteristics of LiDAR and camera sensors, we propose a new end-to-end learnable framework named Point-Image Fusion Network (PIFNet) to integrate the LiDAR point cloud with camera images. To resolve the inconsistency between localization and classification, we designed an Encoder-Decoder Fusion (EDF) module to extract image features effectively while maintaining fine-grained object localization information. Furthermore, a new fusion module is proposed to integrate color and texture features from images with depth information from the point cloud; it alleviates the irregularity and sparsity of point cloud features by capitalizing on fine-grained information from camera images. In PIFNet, each intermediate feature map is fed into the fusion module to be integrated with its corresponding point-wise features, and point-wise features are used instead of voxel-wise features to reduce information loss. Extensive experiments on the KITTI dataset demonstrate the superiority of PIFNet over other state-of-the-art methods: our approach improves mean Average Precision (mAP) by 1.97% and Average Precision (AP) on the hard cases by 2.86% on the KITTI 3D object detection benchmark.
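The point-wise fusion idea can be sketched as follows: project each LiDAR point into the image, sample the image feature map at that pixel, and concatenate with the point-wise feature. Names and shapes are assumptions for illustration, not PIFNet's code.

```python
import torch

# points: (N, 3) LiDAR points; point_feats: (N, Cp); img_feats: (Ci, H, W)
# proj: 3x4 camera projection matrix; stride: feature-map downsampling.
def fuse_point_image(points, point_feats, img_feats, proj, stride=4):
    ones = torch.ones(points.shape[0], 1)
    uvw = (proj @ torch.cat([points, ones], dim=1).T).T      # (N, 3)
    uv = (uvw[:, :2] / uvw[:, 2:3]) / stride                 # pixel coords
    u = uv[:, 0].long().clamp(0, img_feats.shape[2] - 1)
    v = uv[:, 1].long().clamp(0, img_feats.shape[1] - 1)
    sampled = img_feats[:, v, u].T                           # (N, Ci)
    return torch.cat([point_feats, sampled], dim=1)          # (N, Cp + Ci)
```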
21. 3D Sensor Based Pedestrian Detection by Integrating Improved HHA Encoding and Two-Branch Feature Fusion. Remote Sensing 2022. [DOI: 10.3390/rs14030645]
Abstract
Pedestrian detection is vitally important in many computer vision tasks but still suffers from problems such as illumination variation and occlusion when only the RGB image is exploited, especially in outdoor and long-range scenes. Combining RGB with depth information acquired by 3D sensors may effectively alleviate these problems; therefore, how to utilize depth information and how to fuse RGB and depth features are the focus of RGB-D pedestrian detection. This paper first improves the most commonly used HHA method for depth encoding by optimizing the gravity-direction extraction and depth-value mapping, which generates a pseudo-color image from the depth information. Then, a two-branch feature fusion extraction module (TFFEM) is proposed to obtain the local and global features of both modalities. Based on TFFEM, an RGB-D pedestrian detection network is designed to locate people. In experiments, the improved HHA encoding method is twice as fast and achieves more accurate gravity-direction extraction on four publicly available datasets. The pedestrian detection performance of the proposed network is validated on the KITTI and EPFL datasets and achieves state-of-the-art performance; moreover, the method ranked third among all published works on the KITTI leaderboard. In general, the proposed method effectively fuses RGB and depth features and overcomes the effects of illumination and occlusion in pedestrian detection.
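The HHA idea itself is compact: encode each depth pixel as horizontal disparity, height above ground, and the angle between the surface normal and gravity, then rescale to a pseudo-color image. A simplified sketch assuming normals and heights are already computed (the paper's improvements to gravity extraction and value mapping are not reproduced here).

```python
import numpy as np

# depth: (H, W) metres; height: (H, W) metres above ground;
# normals: (H, W, 3) unit surface normals; gravity: unit vector.
def hha_encode(depth, height, normals, gravity=np.array([0.0, -1.0, 0.0])):
    disparity = 1.0 / np.clip(depth, 1e-3, None)
    angle = np.degrees(np.arccos(np.clip(normals @ gravity, -1.0, 1.0)))
    def norm8(x):  # rescale a channel to 0..255
        return np.uint8(255 * (x - x.min()) / (np.ptp(x) + 1e-6))
    return np.stack([norm8(disparity), norm8(height), norm8(angle)], axis=-1)
```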
23.
Abstract
Three-dimensional (3D) object detection is an important task in machine vision, and detecting 3D objects using monocular vision is even more challenging. We observe that most existing monocular methods focus on the design of the feature extraction framework or on embedded geometric constraints, but ignore possible errors in the intermediate stages of the detection pipeline, which may be further amplified in subsequent processing. After exploring the existing keypoint-based detection framework, we find that the accuracy of keypoint prediction seriously affects the recovery of the 3D object position. Therefore, we propose a novel keypoint uncertainty prediction network (KUP-Net) for monocular 3D object detection. In this work, we design an uncertainty prediction module to characterize the uncertainty in keypoint prediction, and the uncertainty is then used for joint optimization with the object position. In addition, we adopt position encoding to assist the uncertainty prediction and use a timing coefficient to optimize the learning process. The experiments on our detector are conducted on the KITTI benchmark. For the easy and moderate levels, we achieve accuracies of 17.26 and 11.78 in AP3D, and 23.59 and 16.63 in APBEV, which are higher than those of the latest method KM3D.
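The joint optimization described above is commonly realized as an uncertainty-attenuated regression loss; a hedged sketch under a Laplace error model follows, where log_b is an assumed per-keypoint log-scale output, not necessarily KUP-Net's exact formulation.

```python
import torch

# pred_kpts, gt_kpts: (N, K, 2) predicted and ground-truth keypoints;
# log_b: (N, K, 1) predicted log-scale (uncertainty) per keypoint.
def uncertainty_keypoint_loss(pred_kpts, gt_kpts, log_b):
    err = torch.abs(pred_kpts - gt_kpts).sum(dim=-1, keepdim=True)
    # Laplace negative log-likelihood: |err| / b + log b
    return (err * torch.exp(-log_b) + log_b).mean()
```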
24. Zhu C, Zhang H, Chen W, Tan M, Liu Q. An Occlusion Compensation Learning Framework for Improving the Rendering Quality of Light Field. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:5738-5752. [PMID: 33108291] [DOI: 10.1109/tnnls.2020.3027468]
Abstract
Occlusions are common phenomena in light field rendering (LFR) applications. The 3-D spatial structures of some features may be missing or incorrect when capturing samples, owing to occlusion discontinuities. Most prior works on LFR, however, have neglected occlusions caused by other objects in 3-D scenes that do not participate in the capture and rendering of the light field. To improve rendering quality, this work proposes an occlusion probability learning framework (OPLF) based on a deep Boltzmann machine (DBM) to compensate for occluded information. In the OPLF, an occlusion probability density model is applied to calculate visibility scores, which are modeled as hidden variables. The probability of occlusion is related to the visibility, the camera configuration (i.e., position and direction), and the relationship between the occluding object and the occluded object. Furthermore, a deep probability model based on the OPLF is used to learn the occlusion relationship between the camera and objects across multiple layers. The proposed OPLF can optimize LFR quality. Finally, to verify the claimed performance, we compare the OPLF with state-of-the-art occlusion theory and light field reconstruction algorithms. The experimental results show that the proposed OPLF outperforms other known occlusion quantization schemes.
26. Manzoor S, Joo SH, Kim EJ, Bae SH, In GG, Pyo JW, Kuc TY. 3D Recognition Based on Sensor Modalities for Robotic Systems: A Survey. Sensors (Basel) 2021; 21:7120. [PMID: 34770429] [PMCID: PMC8587961] [DOI: 10.3390/s21217120]
Abstract
3D visual recognition is a prerequisite for most autonomous robotic systems operating in the real world. It empowers robots to perform a variety of tasks, such as tracking, understanding the environment, and human-robot interaction. Autonomous robots equipped with 3D recognition capability can better perform their social roles through supportive task assistance in professional jobs and effective domestic services. For active assistance, social robots must recognize their surroundings, including objects and places, to perform tasks more efficiently. This article first highlights the value-centric role of social robots in society by presenting recently developed robots and describing their main features. Motivated by the recognition capability of social robots, we then analyze data representation methods based on sensor modalities for 3D object and place recognition using deep learning models. In this direction, we delineate the research gaps that need to be addressed, summarize 3D recognition datasets, and present performance comparisons. Finally, a discussion of future research directions concludes the article. This survey is intended to show how recent developments in 3D visual recognition based on sensor modalities and deep-learning-based approaches can lay the groundwork for further research, and it serves as a guide for those interested in vision-based robotics applications.
Affiliation(s)
- Tae-Yong Kuc
- Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Korea; (S.M.); (S.-H.J.); (E.-J.K.); (S.-H.B.); (G.-G.I.); (J.-W.P.)
27. Tao C, He H, Xu F, Cao J. Stereo priori RCNN based car detection on point level for autonomous driving. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107346]
28. Wu Y, Jiang X, Fang Z, Gao Y, Fujita H. Multi-modal 3D object detection by 2D-guided precision anchor proposal and multi-layer fusion. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107405]
29. Oil Well Detection via Large-Scale and High-Resolution Remote Sensing Images Based on Improved YOLO v4. Remote Sensing 2021. [DOI: 10.3390/rs13163243]
Abstract
Oil is an important resource for the development of modern society, and accurate detection of oil wells is of great significance for investigating oil exploitation status and formulating exploitation plans. However, detecting small objects such as oil wells in large-scale, high-resolution remote sensing images is challenging due to their large number, limited pixels, and complex backgrounds. To overcome this, we first create our own oil well dataset for experiments, given the lack of a public one. Second, we provide a comparative assessment of two state-of-the-art object detection algorithms, SSD and YOLO v4, for oil well detection on our image dataset. The results show that both perform well, but YOLO v4 is more accurate because of its better feature extraction capability for small objects. Given that small objects remain difficult to detect in large-scale, high-resolution remote sensing images, this article proposes an improved algorithm based on YOLO v4 with sliding slices and edge discarding. The algorithm effectively solves the problems of repeated detection and inaccurate positioning in oil well detection on large-scale, high-resolution remote sensing images, and the accuracy of the detection results increases considerably. In summary, this study investigates a suitable algorithm for oil well detection, improves it, and achieves excellent results on a large-scale, high-resolution satellite image, providing a new approach to small object detection in such imagery.
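The sliding-slice scheme with edge discarding can be sketched as tiling with overlap and keeping only detections whose centers fall inside each tile's inner region; `detect` is an assumed per-tile detector, and image-border handling is simplified.

```python
# detect(patch) is assumed to return boxes as (x1, y1, x2, y2, score)
# in tile coordinates; overlap should be at least twice the margin so
# every object center lies in some tile's inner region.
def sliced_detect(image, detect, tile=1024, overlap=256):
    h, w = image.shape[:2]
    stride, margin = tile - overlap, overlap // 2
    results = []
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            for (x1, y1, x2, y2, s) in detect(image[y:y + tile, x:x + tile]):
                cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
                # discard detections in the margin band near tile edges
                if margin <= cx <= tile - margin and margin <= cy <= tile - margin:
                    results.append((x1 + x, y1 + y, x2 + x, y2 + y, s))
    return results
```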
30. Li Y, Ma L, Zhong Z, Liu F, Chapman MA, Cao D, Li J. Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3412-3432. [PMID: 32822311] [DOI: 10.1109/tnnls.2020.3015992]
Abstract
Recently, the advancement of deep learning (DL) in discriminative feature learning from 3-D LiDAR data has led to rapid development in the field of autonomous driving. However, the automated processing of uneven, unstructured, noisy, and massive 3-D point clouds is a challenging and tedious task. In this article, we provide a systematic review of existing compelling DL architectures applied to LiDAR point clouds, detailing specific tasks in autonomous driving such as segmentation, detection, and classification. Although several published research articles focus on specific topics in computer vision for autonomous vehicles, to date no general survey on DL applied to LiDAR point clouds for autonomous vehicles exists; the goal of this article is to narrow that gap. More than 140 key contributions from the past five years are summarized in this survey, including milestone 3-D deep architectures; remarkable DL applications in 3-D semantic segmentation, object detection, and classification; specific datasets; evaluation metrics; and state-of-the-art performance. Finally, we conclude with the remaining challenges and future research directions.
31. Bescos B, Campos C, Tardos JD, Neira J. DynaSLAM II: Tightly-Coupled Multi-Object Tracking and SLAM. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3068640]
33. Li CJ, Qu Z, Wang SY, Liu L. A method of cross-layer fusion multi-object detection and recognition based on improved faster R-CNN model in complex traffic environment. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.02.003]
34. Li H, Zhao S, Zhao W, Zhang L, Shen J. One-Stage Anchor-Free 3D Vehicle Detection from LiDAR Sensors. Sensors (Basel) 2021; 21:2651. [PMID: 33918952] [PMCID: PMC8069010] [DOI: 10.3390/s21082651]
Abstract
Recent one-stage 3D detection methods generate anchor boxes with various sizes and orientations in the ground plane, then determine whether these anchor boxes contain any region of interest and adjust their edges to obtain accurate object bounding boxes. Such anchor-based algorithms must calculate classification and regression labels for every anchor box during training, which is inefficient and complicated. We propose a one-stage, anchor-free 3D vehicle detection algorithm based on LiDAR point clouds. The object position is encoded as a set of keypoints in the bird's-eye view (BEV) of the point cloud. We apply a voxel/pillar feature extractor and convolutional blocks to map the unstructured point cloud to a single-channel 2D heatmap, and the vehicle's Z-axis position, dimensions, and orientation angle are regressed as additional attributes of the keypoints. Our method combines SmoothL1 loss and IoU (Intersection over Union) loss, and applies (cosθ, sinθ) as the angle regression labels, achieving high average orientation similarity (AOS) without any direction classification tricks. During target assignment and bounding box decoding, our framework completely avoids any anchor-box calculations. It is trained end to end and performs on par with other one-stage anchor-based detectors.
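The (cosθ, sinθ) encoding mentioned above avoids the 2π wrap-around discontinuity of a raw angle target, and decoding is a single atan2. A minimal sketch:

```python
import math

def encode_yaw(theta: float):
    return math.cos(theta), math.sin(theta)   # regression targets

def decode_yaw(cos_t: float, sin_t: float) -> float:
    return math.atan2(sin_t, cos_t)           # yaw in (-pi, pi]
```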
Affiliation(s)
- Hao Li
- Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, China; (H.L.); (J.S.)
- Sanyuan Zhao
- Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, China; (H.L.); (J.S.)
- Correspondence:
- Wenjun Zhao
- State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System, Inner Mongolia No.2 Mailbox, Baotou City 014030, China; (W.Z.); (L.Z.)
- Libin Zhang
- State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System, Inner Mongolia No.2 Mailbox, Baotou City 014030, China; (W.Z.); (L.Z.)
- Jianbing Shen
- Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, China; (H.L.); (J.S.)
36. Díez-Pastor JF, Latorre-Carmona P, Arnaiz-González Á, Ruiz-Pérez J, Zurro D. "You Are Not My Type": An Evaluation of Classification Methods for Automatic Phytolith Identification. Microscopy and Microanalysis 2020; 26:1158-1167. [PMID: 33168124] [DOI: 10.1017/s1431927620024629]
Abstract
Phytoliths can be an important source of information related to environmental and climatic change, as well as to ancient plant use by humans, particularly within the disciplines of paleoecology and archaeology. Currently, phytolith identification and categorization are performed manually by researchers, a time-consuming task prone to misclassification. Automated classification of phytoliths would allow the standardization of identification processes, avoiding possible biases related to the classification ability of individual researchers. This paper presents a comparative analysis of six classification methods, using digitized microscopic images to examine the efficacy of different quantitative approaches for characterizing phytoliths. A comprehensive experiment on images of 429 phytoliths demonstrates that automatic phytolith classification is a promising area of research that will help researchers invest their time more efficiently and improve their recognition accuracy.
Affiliation(s)
- José-Francisco Díez-Pastor
- Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad de Burgos, Burgos, Spain
- Pedro Latorre-Carmona
- Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad de Burgos, Burgos, Spain
- Álvar Arnaiz-González
- Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad de Burgos, Burgos, Spain
- Javier Ruiz-Pérez
- CaSEs - Culture and Socio-Ecological Systems Research Group, Department of Humanities, Pompeu Fabra University, Barcelona, Spain
- Débora Zurro
- Institución Milá y Fontanals de Investigación en Humanidades - Consejo Superior de Investigaciones Científicas (IMF-CSIC), C. Egipciaques 15, 08001 Barcelona, Spain
38. Tian Y, Wang K, Wang Y, Tian Y, Wang Z, Wang FY. Adaptive and azimuth-aware fusion network of multimodal local features for 3D object detection. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.086]
39. Ren Z, Sudderth EB. Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:2670-2683. [PMID: 31217095] [DOI: 10.1109/tpami.2019.2923201]
Abstract
We develop new representations and algorithms for three-dimensional (3D) object detection and spatial layout prediction in cluttered indoor scenes. We first propose a clouds of oriented gradient (COG) descriptor that links the 2D appearance and 3D pose of object categories, and thus accurately models how perspective projection affects perceived image gradients. To better represent the 3D visual styles of large objects and provide contextual cues to improve the detection of small objects, we introduce latent support surfaces. We then propose a "Manhattan voxel" representation which better captures the 3D room layout geometry of common indoor environments. Effective classification rules are learned via a latent structured prediction framework. Contextual relationships among categories and layout are captured via a cascade of classifiers, leading to holistic scene hypotheses that exceed the state-of-the-art on the SUN RGB-D database.
40.
Abstract
Vision-based object detection technology plays a very important role in the field of computer vision and is widely used in many machine vision applications. However, in specific application scenarios, such as a solid waste sorting system, it is very difficult to obtain good accuracy because the color information of objects is badly degraded. In this work, we propose a novel multimodal convolutional neural network method for RGB-D solid waste object detection. Depth information is introduced as a new modality to improve object detection performance. Our method fuses the two individual feature streams at multiple scales, forming an end-to-end network. We evaluate our method on a self-constructed solid waste dataset. In comparison with single-modal detection and other popular cross-modal fusion neural networks, our method achieves remarkable results with high validity, reliability, and real-time detection speed.
Affiliation(s)
- Yan Yu
- Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Siyu Zou
- Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Kejie Yin
- Zhejiang University of Technology, Hangzhou, Zhejiang, China
41. Yang L, Huang Y, Hu X, Wei H, Wang Q. Multiclass obstacles detection and classification using stereovision and Bayesian network for intelligent vehicles. Int J Adv Robot Syst 2020. [DOI: 10.1177/1729881420947270]
Abstract
Intelligent vehicles should be able to detect various obstacles and also identify their types so that the vehicle can take an appropriate level of protection and intervention. This article presents a method for detecting and classifying multiclass obstacles for intelligent vehicles. A stereovision-based method is used to segment obstacles from the traffic background and measure three-dimensional geometric features. A Bayesian network (BN) model is established to further classify them into five classes: pedestrian, cyclist, car, van, and truck. The BN model is trained using substantial data samples. The optimized structure of the model is determined by the necessary path condition method with a presupposition constraint (NPC+PC). The conditional probability tables of the discrete nodes and the conditional probability distributions of the continuous nodes are determined using the expectation-maximization (EM) training algorithm, with prior domain knowledge taken into account. Experiments were conducted using the object detection dataset of the public KITTI benchmark, and the results show that the proposed BN model performs excellently in obstacle classification, while the full pipeline of the method, including detection and classification, ranks in the upper-middle level compared with other existing methods.
Affiliation(s)
- Lina Yang
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
- College of Mechanical and Electrical Engineering, Jiaxing University, Jiaxing, China
- Yingping Huang
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Xing Hu
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Hongjian Wei
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Qixiang Wang
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
|
42
|
KDA3D: Key-Point Densification and Multi-Attention Guidance for 3D Object Detection. REMOTE SENSING 2020. [DOI: 10.3390/rs12111895] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
In this paper, we propose a novel 3D object detector, KDA3D, which achieves high-precision and robust classification, segmentation, and localization with the help of key-point densification and multi-attention guidance. The proposed end-to-end neural network architecture takes LiDAR point clouds as its main input, optionally complemented by RGB images. It consists of three parts: part 1 segments 3D foreground points and generates reliable proposals; part 2 (optional) enhances point cloud density and reconstructs a more compact full-point feature map; part 3 refines the 3D bounding boxes and adds semantic segmentation as extra supervision. Our lightweight point-wise and channel-wise attention modules adaptively strengthen the “skeleton” and “distinctiveness” point features, helping the feature learning networks capture more representative or finer patterns. The proposed key-point densification component generates pseudo-point clouds containing target information from monocular images through a distance preference strategy and K-means clustering, so as to balance the density distribution and enrich sparse features. Extensive experiments on the KITTI and nuScenes 3D object detection benchmarks show that KDA3D produces state-of-the-art results while running in near real-time with a low memory footprint.
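A channel-wise attention gate of the kind described can be sketched as a squeeze-and-excitation block over per-point features; this is a generic SE-style module under my own naming, and the paper's exact design may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel gate for (B, N, C) per-point feature tensors."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, feats):
        w = self.gate(feats.mean(dim=1))   # squeeze over the N points -> (B, C)
        return feats * w.unsqueeze(1)      # re-weight each channel per sample

out = ChannelAttention(64)(torch.rand(2, 1024, 64))  # 1024 points, 64 channels
```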
Collapse
|
43
|
Siddiqui TA, Madhok R, O'Toole M. An Extensible Multi-Sensor Fusion Framework for 3D Imaging. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW) 2020. [DOI: 10.1109/cvprw50498.2020.00512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
|
44
|
Itu R, Danescu RG. A Self-Calibrating Probabilistic Framework for 3D Environment Perception Using Monocular Vision. SENSORS 2020; 20:s20051280. [PMID: 32120868 PMCID: PMC7085646 DOI: 10.3390/s20051280] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/27/2019] [Revised: 02/20/2020] [Accepted: 02/25/2020] [Indexed: 11/27/2022]
Abstract
Cameras are sensors that are available anywhere and to everyone, and can easily be placed inside vehicles. While stereovision setups of two or more synchronized cameras have the advantage of directly extracting 3D information, a single camera can easily be set up behind the windshield (like a dashcam) or above the dashboard, often as the internal camera of a mobile phone placed there for navigation assistance. This paper presents a framework for extracting and tracking 3D obstacle data from the surrounding environment of a vehicle in traffic, using a generic camera as the sensor. The system combines the strength of Convolutional Neural Network (CNN)-based segmentation with a generic probabilistic model of the environment, the dynamic occupancy grid. The main contributions of this paper are the following: a method for generating the probabilistic measurement model from monocular images, based on CNN segmentation, which takes into account the particularities, uncertainties, and limitations of monocular vision; a method for automatic calibration of the extrinsic and intrinsic parameters of the camera, without user assistance; and the integration of automatic calibration and measurement model generation into a scene tracking system that can work with any camera to perceive obstacles in real traffic. The presented system can easily be fitted to any vehicle, working standalone or together with other sensors, to enhance environment perception capabilities and improve traffic safety.
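The core occupancy-grid update can be illustrated with a standard per-cell Bayesian log-odds fusion; the sketch below covers only this static update under assumed clamping constants, while the paper's dynamic grid additionally tracks cell dynamics.

```python
import numpy as np

L_MIN, L_MAX = -4.0, 4.0   # assumed log-odds clamp, not the paper's values

def update_grid(logodds, meas_occ_prob):
    """logodds: HxW grid state; meas_occ_prob: HxW occupancy probabilities
    derived from CNN segmentation projected onto the ground plane."""
    eps = 1e-9
    l_meas = np.log((meas_occ_prob + eps) / (1.0 - meas_occ_prob + eps))
    return np.clip(logodds + l_meas, L_MIN, L_MAX)   # Bayesian fusion per cell

def occupancy(logodds):
    return 1.0 / (1.0 + np.exp(-logodds))            # back to probability
```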
Collapse
|
46
|
Bao W, Xu B, Chen Z. MonoFENet: Monocular 3D Object Detection with Feature Enhancement Networks. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:2753-2765. [PMID: 31725382 DOI: 10.1109/tip.2019.2952201] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Monocular 3D object detection has the merit of low cost and can serve as an auxiliary module in autonomous driving systems, and has therefore attracted growing attention in recent years. In this paper, we present a monocular 3D object detection method with feature enhancement networks, which we call MonoFENet. Specifically, with the disparity estimated from the input monocular image, the features of both the 2D and 3D streams can be enhanced and utilized for accurate 3D localization. For the 2D stream, the input image is used to generate 2D region proposals as well as to extract appearance features. For the 3D stream, the estimated disparity is transformed into a dense 3D point cloud, which is then enhanced by the associated front-view maps. With the RoI Mean Pooling layer, the 3D geometric features of RoI point clouds are further enhanced by the proposed point feature enhancement (PointFE) network. The region-wise features of the image and the point cloud are fused for the final 2D and 3D bounding box regression. Experimental results on the KITTI benchmark show that our method achieves state-of-the-art performance for monocular 3D object detection.
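The disparity-to-point-cloud step of the 3D stream follows standard stereo geometry, sketched below; the math is conventional, but the function and variable names are my own, and for a monocular method the baseline is whatever virtual baseline the disparity estimator was trained with.

```python
import numpy as np

def disparity_to_points(disp, f, cx, cy, baseline):
    """disp: HxW disparity in pixels; f, cx, cy: pinhole intrinsics;
    baseline in meters. Returns an Nx3 array of camera-frame points."""
    v, u = np.nonzero(disp > 0.5)          # drop near-zero disparities
    z = f * baseline / disp[v, u]          # depth from disparity
    x = (u - cx) * z / f                   # back-project pixel column
    y = (v - cy) * z / f                   # back-project pixel row
    return np.stack([x, y, z], axis=1)
```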
Collapse
|
47
|
Zhou X, Fang Y, Mu Y. Learning single-shot vehicle orientation estimation from large-scale street panoramas. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.07.060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
48
|
Li Y, Zheng H, Yan Z, Chen L. Detail preservation and feature refinement for object detection. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.086] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
49
|
Li X, Ma H, Luo X. Weaklier Supervised Semantic Segmentation With Only One Image Level Annotation per Category. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:128-141. [PMID: 31380759 DOI: 10.1109/tip.2019.2930874] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Image semantic segmentation methods based on weakly supervised conditions have been proposed and have achieved steadily improving performance in recent years. However, the purpose of these tasks is mainly to simplify the labeling work. In this paper, we establish a new and more challenging task condition: weaker supervision with only one image-level annotation per category, which provides only the prior knowledge a human would need to recognize a new object, and aims at pixel-level object semantic understanding. To solve this problem, a three-stage semantic segmentation framework is put forward, which learns image-level, pixel-level, and object common features from coarse to fine, and finally obtains semantic segmentation results with accurate and complete object regions. Experiments on the PASCAL VOC 2012 dataset demonstrate the effectiveness of the proposed method, which yields a clear improvement over baselines. Despite using far less supervision, the method also delivers satisfactory performance compared with weakly supervised methods trained with complete image-level annotations.
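A common way to bootstrap pixel-level cues from an image-level label (a generic technique, not necessarily the paper's exact first stage) is a class activation map, as in the sketch below.

```python
import torch

def class_activation_map(features, fc_weight, class_idx):
    """features: (B, C, H, W) last conv maps; fc_weight: (num_classes, C)
    weights of a global-average-pooling classifier."""
    cam = torch.einsum('c,bchw->bhw', fc_weight[class_idx], features)
    cam = cam - cam.amin(dim=(1, 2), keepdim=True)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)  # scaled to [0, 1]

# Pixels where the normalized CAM exceeds a threshold become foreground seeds.
seeds = class_activation_map(torch.rand(1, 512, 14, 14),
                             torch.rand(20, 512), class_idx=3) > 0.6
```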
Collapse
|