1. Wang H, Yang M, Zheng N. G2-MonoDepth: A General Framework of Generalized Depth Inference From Monocular RGB+X Data. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:3753-3771. [PMID: 38145531] [DOI: 10.1109/tpami.2023.3346466]
Abstract
Monocular depth inference is a fundamental problem in robot scene perception. A specific robot may be equipped with a camera plus an optional depth sensor of any type and may operate in scenes of different scales, whereas recent advances have addressed these settings as multiple individual sub-tasks. This leads to the additional burden of fine-tuning models for each specific robot and thereby to high-cost customization in large-scale industrialization. This article investigates a unified task of monocular depth inference that infers high-quality depth maps from all kinds of raw input data produced by various robots in unseen scenes. A basic benchmark, G2-MonoDepth, is developed for this task. It comprises four components: (a) a unified data representation, RGB+X, that accommodates RGB plus raw depth with diverse scene scales/semantics, depth sparsity ([0%, 100%]), and errors (holes/noise/blur); (b) a novel unified loss that adapts to the diverse depth sparsity/errors of input raw data and the diverse scales of output scenes; (c) an improved network that propagates diverse scene scales well from input to output; and (d) a data augmentation pipeline that simulates all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied to three sub-tasks, including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and it consistently outperforms SOTA baselines on both real-world and synthetic data.
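The RGB+X representation pairs an RGB image with raw depth of arbitrary sparsity and quality. As a rough illustration of component (d), the sketch below degrades a dense ground-truth depth map into a synthetic "raw" input; the noise model, hole placement, and default parameters are assumptions for illustration, not the paper's actual augmentation pipeline.

```python
import numpy as np

def simulate_raw_depth(depth, sparsity=0.95, noise_std=0.05, n_holes=5,
                       rng=None):
    """Degrade a dense ground-truth depth map into a synthetic 'raw' one.

    Illustrative only: the noise model, hole shapes, and defaults are
    assumptions, not the paper's actual augmentation pipeline.
    """
    rng = np.random.default_rng() if rng is None else rng
    raw = depth.astype(np.float64)
    h, w = raw.shape

    # Multiplicative noise roughly mimics depth-sensor measurement error.
    raw *= 1.0 + rng.normal(0.0, noise_std, size=raw.shape)

    # Rectangular holes mimic dropout regions (e.g., reflective surfaces).
    for _ in range(n_holes):
        y, x = rng.integers(0, h), rng.integers(0, w)
        raw[y:y + h // 10, x:x + w // 10] = 0.0

    # Random sparsification: keep only a fraction of valid pixels, spanning
    # the [0%, 100%] sparsity range the benchmark targets.
    keep = rng.random(raw.shape) < (1.0 - sparsity)
    raw[~keep] = 0.0

    return raw  # zeros mark missing measurements; stack with RGB as "RGB+X"
```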
2. Wang Z, Shen M, Chen Q. Eliminating Scale Ambiguity of Unsupervised Monocular Visual Odometry. Neural Processing Letters 2023. [DOI: 10.1007/s11063-023-11224-1]
3. Bian JW, Zhan H, Wang N, Chin TJ, Shen C, Reid I. Auto-Rectify Network for Unsupervised Indoor Depth Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:9802-9813. [PMID: 34919516] [DOI: 10.1109/tpami.2021.3136220]
Abstract
Single-view depth estimation using CNNs trained on unlabelled videos has shown significant promise. However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices. In this work, we establish that the complex ego-motions exhibited in handheld settings are a critical obstacle to learning depth. Our fundamental analysis suggests that rotation behaves as noise during training, whereas translation (the baseline) provides the supervision signal. To address this challenge, we propose a data pre-processing method that rectifies training images by removing their relative rotations, enabling effective learning. The significantly improved performance validates our motivation. Towards end-to-end learning without pre-processing, we propose an Auto-Rectify Network with novel loss functions that automatically learns to rectify images during training. Consequently, our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset. We also demonstrate the generalization of our trained model on ScanNet and Make3D, and the universality of our proposed learning method on the 7-Scenes and KITTI datasets.
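The core rectification idea, removing the relative rotation between frames so that only the translational motion remains as supervision, can be illustrated with a homography warp: for a pure rotation R between views with shared intrinsics K, pixels map through H = K R^T K^(-1). The sketch below is a minimal, assumed illustration of this pre-processing step, not the Auto-Rectify Network itself; the intrinsics and rotation shown are placeholder values.

```python
import cv2
import numpy as np

def remove_relative_rotation(img, R, K):
    """Warp `img` to undo a known relative camera rotation `R`.

    For a pure rotation, pixels map through H = K @ R.T @ inv(K), so
    warping with H cancels the rotation and leaves only the translational
    (baseline) motion between frames. A minimal sketch of the rectification
    idea; the paper's actual pipeline differs in detail.
    """
    H = K @ R.T @ np.linalg.inv(K)
    h, w = img.shape[:2]
    return cv2.warpPerspective(img, H, (w, h))

# Hypothetical usage: intrinsics K and relative rotation R would come from
# IMU readings or a pose estimate for the frame pair.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, _ = cv2.Rodrigues(np.array([0.0, 0.05, 0.0]))  # small yaw rotation
frame = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder frame
rectified = remove_relative_rotation(frame, R, K)
```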
4. Ma Z, Li K, Li Y. Self-supervised method for 3D human pose estimation with consistent shape and viewpoint factorization. Applied Intelligence 2022. [DOI: 10.1007/s10489-022-03714-x]
5. Kraus M, Pollok T, Miller M, Kilian T, Moritz T, Schweitzer D, Beyerer J, Keim D, Qu C, Jentner W. Toward Mass Video Data Analysis: Interactive and Immersive 4D Scene Reconstruction. Sensors (Basel, Switzerland) 2020; 20:E5426. [PMID: 32971822] [PMCID: PMC7570841] [DOI: 10.3390/s20185426]
Abstract
Technical progress over the last decades has made photo and video recording devices omnipresent. This change has a significant impact on, among other fields, police work. It is no longer unusual for a myriad of digital data to accumulate after a criminal act, which must be reviewed by criminal investigators to collect evidence or solve the crime. This paper presents the VICTORIA Interactive 4D Scene Reconstruction and Analysis Framework ("ISRA-4D" 1.0), an approach for the visual consolidation of heterogeneous video and image data within a 3D reconstruction of the corresponding environment. First, by reconstructing the environment in which the materials were created, a shared spatial context for all available materials is established. Second, all footage is spatially and temporally registered within this 3D reconstruction. Third, a visualization of the resulting 4D reconstruction (3D scene + time) is provided, which can be analyzed interactively. Additional information on video and image content is also extracted, displayed, and analyzable with supporting visualizations. The presented approach facilitates filtering, annotating, analyzing, and getting an overview of large amounts of multimedia material. The framework is evaluated in four case studies that demonstrate its broad applicability. Furthermore, the framework allows users to immerse themselves in the analysis by entering the scenario in virtual reality. This feature is qualitatively evaluated through interviews with criminal investigators and outlines potential benefits such as improved spatial understanding and the initiation of new fields of application.
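Spatially registering a single frame inside the 3D reconstruction can be illustrated with a standard perspective-n-point (PnP) pose estimate from 2D-3D correspondences. The sketch below shows the general technique under stated assumptions, not the framework's actual registration pipeline; the correspondence source and camera intrinsics are assumed inputs.

```python
import cv2
import numpy as np

def register_frame(points_3d, points_2d, K):
    """Estimate where a video frame was shot inside a reconstructed scene.

    A minimal sketch of spatial registration: given 2D-3D correspondences
    (e.g., from feature matching against the reconstruction), recover the
    camera pose with PnP + RANSAC. Temporal registration and the paper's
    full pipeline are not covered here.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),   # Nx3 scene points
        points_2d.astype(np.float64),   # Nx2 image points
        K, distCoeffs=None)
    if not ok:
        raise RuntimeError("pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)          # rotation vector -> matrix
    cam_center = -R.T @ tvec            # camera position in scene coordinates
    return R, tvec, cam_center
```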
Affiliation(s)
- Matthias Kraus
- Department of Computer and Information Science, Universität Konstanz, Universitätsstr. 10, 78465 Konstanz, Germany
- Thomas Pollok
- Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany
- Matthias Miller
- Department of Computer and Information Science, Universität Konstanz, Universitätsstr. 10, 78465 Konstanz, Germany
- Timon Kilian
- Department of Computer and Information Science, Universität Konstanz, Universitätsstr. 10, 78465 Konstanz, Germany
- Tobias Moritz
- Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany
- Daniel Schweitzer
- Department of Computer and Information Science, Universität Konstanz, Universitätsstr. 10, 78465 Konstanz, Germany
- Jürgen Beyerer
- Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany
- Vision and Fusion Lab (IES), Karlsruhe Institute of Technology (KIT), c/o Technologiefabrik, Haid-und-Neu-Str. 7, 76131 Karlsruhe, Germany
- Daniel Keim
- Department of Computer and Information Science, Universität Konstanz, Universitätsstr. 10, 78465 Konstanz, Germany
- Chengchao Qu
- Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany
- Wolfgang Jentner
- Department of Computer and Information Science, Universität Konstanz, Universitätsstr. 10, 78465 Konstanz, Germany