1. Wu H, Flynn C, Hall C, Che-Castaldo C, Samaras D, Schwaller M, Lynch HJ. Penguin colony georegistration using camera pose estimation and phototourism. PLoS One 2024; 19:e0311038. PMID: 39475845; PMCID: PMC11524458; DOI: 10.1371/journal.pone.0311038.
Abstract
Satellite-based remote sensing and uncrewed aerial imagery play increasingly important roles in the mapping of wildlife populations and wildlife habitat, but the availability of imagery has been limited in remote areas. At the same time, ecotourism is a rapidly growing industry and can yield a vast catalog of photographs that could be harnessed for monitoring purposes, but the inherently ad-hoc and unstructured nature of these images makes them difficult to use. To help address this, a subfield of computer vision known as phototourism has been developed to leverage a diverse collection of unstructured photographs to reconstruct a georeferenced three-dimensional scene capturing the environment at that location. Here we demonstrate the use of phototourism in an application involving Antarctic penguins, sentinel species whose dynamics are closely tracked as a measure of ecosystem functioning, and introduce a semi-automated pipeline for aligning and registering ground photographs using a digital elevation model (DEM) and satellite imagery. We employ the Segment Anything Model (SAM) for the interactive identification and segmentation of penguin colonies in these photographs. By creating a textured 3D mesh from the DEM and satellite imagery, we estimate camera poses to align ground photographs with the mesh and register the segmented penguin colony area to the mesh, achieving a detailed representation of the colony. Our approach has demonstrated promising performance, though challenges persist due to variations in image quality and the dynamic nature of natural landscapes. Nevertheless, our method offers a straightforward and effective tool for the georegistration of ad-hoc photographs in natural landscapes, with additional applications such as monitoring glacial retreat.
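At its core, the camera-pose step described here (aligning a ground photograph to a textured DEM/satellite mesh) reduces to a perspective-n-point problem once 2D-3D correspondences are available. A minimal sketch with OpenCV, where the correspondences, intrinsics, and coordinates are all illustrative placeholders, not the paper's data:

```python
# Minimal sketch of the pose-estimation step: given 2D-3D correspondences
# between a ground photograph and a georeferenced DEM/satellite mesh,
# recover the camera pose with PnP + RANSAC. All inputs are illustrative.
import numpy as np
import cv2

# Hypothetical correspondences: image pixels <-> 3D mesh points (UTM metres).
pts_2d = np.array([[512, 384], [120, 400], [800, 350], [400, 600],
                   [650, 500], [200, 550]], dtype=np.float64)
pts_3d = np.array([[417200.0, 3100450.0, 12.0],
                   [417180.5, 3100460.2, 15.1],
                   [417230.7, 3100455.9, 11.4],
                   [417205.3, 3100430.8, 9.7],
                   [417220.1, 3100440.0, 10.2],
                   [417190.9, 3100435.5, 13.3]])

# Assumed pinhole intrinsics (focal length in pixels, principal point).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 480.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts_3d, pts_2d, K, distCoeffs=None, reprojectionError=4.0)
if ok:
    R, _ = cv2.Rodrigues(rvec)            # world-to-camera rotation
    cam_center = (-R.T @ tvec).ravel()    # camera position in world frame
    print("estimated camera centre:", cam_center)
```

With the pose recovered, segmented colony pixels can be back-projected as rays and intersected with the mesh to georegister the colony outline.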
Affiliation(s)
- Haoyu Wu: Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America
- Clare Flynn: Department of Ecology & Evolution, Stony Brook University, Stony Brook, New York, United States of America
- Carole Hall: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, United States of America
- Christian Che-Castaldo: U.S. Geological Survey, Wisconsin Cooperative Wildlife Research Unit, Department of Forest and Wildlife Ecology, University of Wisconsin-Madison, Madison, WI, United States of America
- Dimitris Samaras: Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America
- Mathew Schwaller: Institute for Advanced Computational Science, Stony Brook University, Stony Brook, New York, United States of America
- Heather J. Lynch: Department of Ecology & Evolution and Institute for Advanced Computational Science, Stony Brook University, Stony Brook, New York, United States of America
2. Connor M, Olshausen B, Rozell C. Learning Internal Representations of 3D Transformations From 2D Projected Inputs. Neural Comput 2024; 36:2505-2539. PMID: 39141802; DOI: 10.1162/neco_a_01695.
Abstract
We describe a computational model for inferring 3D structure from the motion of projected 2D points in an image, with the aim of understanding how biological vision systems learn and internally represent 3D transformations from the statistics of their input. The model uses manifold transport operators to describe the action of 3D points in a scene as they undergo transformation. We show that the model can learn the generator of the Lie group for these transformations from purely 2D input, providing a proof-of-concept demonstration for how biological systems could adapt their internal representations based on sensory input. Focusing on a rotational model, we evaluate the ability of the model to infer depth from moving 2D projected points and to learn rotational transformations from 2D training stimuli. Finally, we compare the model performance to psychophysical performance on structure-from-motion tasks.
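A toy version of the inference problem sketched below: 3D points rotate rigidly, only their orthographic 2D projections are observed, and each point's depth can be recovered from two frames given the rotation. The rotation, generated via the Lie-group exponential map the abstract alludes to, and the points are made up for illustration; this is not the authors' learning model:

```python
import numpy as np
from scipy.linalg import expm

def skew(w):
    """Lie-algebra generator (skew-symmetric matrix) for rotation axis w."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20))              # ground-truth 3D points
R = expm(skew([0.05, 0.12, -0.03]))       # small rotation via exponential map

U, V = X[:2], (R @ X)[:2]                 # 2D projections before/after

# From V = R[:2,:2] U + R[:2,2] * z, solve for each point's depth z.
a = R[:2, 2]                              # 2-vector multiplying the depth
z_hat = a @ (V - R[:2, :2] @ U) / (a @ a) # least-squares depth estimate
print("max depth error:", np.abs(z_hat - X[2]).max())  # ~0 (noise-free)
```

The paper's contribution is learning the generator itself from such 2D data; the snippet only shows the forward model and the depth inference it makes possible.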
Affiliation(s)
- Marissa Connor: School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.
- Bruno Olshausen: Helen Wills Neuroscience Institute and School of Optometry, University of California, Berkeley, CA 94720, U.S.A.
- Christopher Rozell: School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.
3. Ruf B, Weinmann M, Hinz S. FaSS-MVS: Fast Multi-View Stereo with Surface-Aware Semi-Global Matching from UAV-Borne Monocular Imagery. Sensors (Basel) 2024; 24:6397. PMID: 39409439; PMCID: PMC11479275; DOI: 10.3390/s24196397.
Abstract
With FaSS-MVS, we present a fast, surface-aware semi-global optimization approach for multi-view stereo that allows for rapid depth and normal map estimation from monocular aerial video data captured by unmanned aerial vehicles (UAVs). The data estimated by FaSS-MVS, in turn, facilitate online 3D mapping, meaning that a 3D map of the scene is immediately and incrementally generated as the image data are acquired or being received. FaSS-MVS is composed of a hierarchical processing scheme in which depth and normal data, as well as corresponding confidence scores, are estimated in a coarse-to-fine manner, allowing efficient processing of large scene depths, such as those inherent in oblique images acquired by UAVs flying at low altitudes. The actual depth estimation uses a plane-sweep algorithm for dense multi-image matching to produce depth hypotheses, from which the actual depth map is extracted by means of a surface-aware semi-global optimization that reduces the fronto-parallel bias of Semi-Global Matching (SGM). Given the estimated depth map, the pixel-wise surface normal information is then computed by reprojecting the depth map into a point cloud and computing the normal vectors within a confined local neighborhood. In a thorough quantitative and ablative study, we show that the accuracy of the 3D information computed by FaSS-MVS is close to that of state-of-the-art offline multi-view stereo approaches, with the error less than an order of magnitude higher than that of COLMAP. At the same time, however, the average runtime of FaSS-MVS for estimating a single depth and normal map is less than 14% of that of COLMAP, allowing us to perform online and incremental processing of full HD images at 1-2 Hz.
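The plane-sweep stage mentioned above can be sketched compactly: for each fronto-parallel depth hypothesis, warp the neighbour view onto the reference view by the induced homography and record a photometric cost. A minimal sketch, assuming calibrated inputs (K, R, t) and grayscale images; the per-pixel absolute difference here is a stand-in for the paper's matching cost and surface-aware optimization:

```python
import numpy as np
import cv2

def plane_sweep_cost(ref, src, K, R, t, depths):
    """Cost volume of absolute intensity differences, one slice per depth.
    H_d = K (R + t n^T / d) K^{-1} with n = (0, 0, 1)^T, i.e. fronto-parallel
    planes at depth d in the reference camera frame."""
    Kinv = np.linalg.inv(K)
    n = np.array([[0.0, 0.0, 1.0]])
    volume = np.empty((len(depths),) + ref.shape, dtype=np.float32)
    for i, d in enumerate(depths):
        H = K @ (R + (t.reshape(3, 1) @ n) / d) @ Kinv
        warped = cv2.warpPerspective(src, H, ref.shape[::-1])
        volume[i] = np.abs(ref.astype(np.float32) - warped.astype(np.float32))
    return volume  # winner-take-all depth: depths[volume.argmin(axis=0)]
```

Semi-global optimization then replaces the naive winner-take-all readout by aggregating this volume along scanlines with smoothness penalties.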
Affiliation(s)
- Boitumelo Ruf: Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, 76131 Karlsruhe, Germany
- Martin Weinmann: Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
- Stefan Hinz: Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
4. Jiang Z, Monno Y, Okutomi M, Suzuki S, Miki K. Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-5. PMID: 40039536; DOI: 10.1109/embc53108.2024.10782186.
Abstract
Enabling the synthesis of arbitrary novel-viewpoint images within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods to achieve this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, thereby enabling the rendering of images from novel viewpoints. However, the existence of low-texture and non-Lambertian regions within the stomach often results in noisy and incomplete reconstructions of point clouds and meshes, hindering the attainment of high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data for synthesizing photo-realistic images for novel viewpoints. To address the performance degradation due to view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into the training of NeRF, introducing a novel geometry-based loss applied to both pre-captured observed views and generated unobserved views. Compared to other recent NeRF methods, our approach achieves high-fidelity image renderings from novel viewpoints within the stomach, both qualitatively and quantitatively.
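The paper's exact geometry-based loss is not reproduced here, but a generic depth-supervision term of the kind described, penalizing rendered ray depths that deviate from depths implied by a pre-reconstructed point cloud, might look like the following sketch (shapes and the weight are assumptions):

```python
import numpy as np

def depth_prior_loss(rendered_depth, prior_depth, prior_valid, weight=0.1):
    """rendered_depth: (N,) depths composited from the radiance field
    along sampled rays.
    prior_depth:  (N,) depths where point-cloud points project onto the
    same rays.
    prior_valid:  (N,) bool mask, True where the point cloud gives a prior."""
    if not prior_valid.any():
        return 0.0  # no geometric supervision available for this batch
    err = (rendered_depth - prior_depth) ** 2
    return weight * err[prior_valid].mean()
```

Such a term is simply added to the usual photometric rendering loss during training.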
5. Schmitt C, Antic B, Neculai A, Lee JH, Geiger A. Towards Scalable Multi-View Reconstruction of Geometry and Materials. IEEE Trans Pattern Anal Mach Intell 2023; 45:15850-15869. PMID: 37708017; DOI: 10.1109/tpami.2023.3314348.
Abstract
In this paper, we propose a novel method for joint recovery of camera pose, object geometry and the spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes that exceed object scale and hence cannot be captured with stationary light stages. The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. To facilitate scalability to large numbers of observation views and optimization variables, we introduce a distributed optimization algorithm that reconstructs 2.5D keyframe-based representations of the scene. A novel multi-view consistency regularizer effectively synchronizes neighboring keyframes such that the local optimization results allow for seamless integration into a globally consistent 3D model. We provide a study on the importance of each component in our formulation and show that our method compares favorably to baselines. We further demonstrate that our method accurately reconstructs various objects and materials and allows for expansion to spatially larger scenes. We believe that this work represents a significant step towards making geometry and material estimation from hand-held scanners scalable.
6. Xu Q, Kong W, Tao W, Pollefeys M. Multi-Scale Geometric Consistency Guided and Planar Prior Assisted Multi-View Stereo. IEEE Trans Pattern Anal Mach Intell 2023; 45:4945-4963. PMID: 35984800; DOI: 10.1109/tpami.2022.3200074.
Abstract
In this paper, we propose efficient multi-view stereo methods for accurate and complete depth map estimation. We first present our basic methods with Adaptive Checkerboard sampling and Multi-Hypothesis joint view selection (ACMH & ACMH+). Based on our basic models, we develop two frameworks to deal with the depth estimation of ambiguous regions (especially low-textured areas) from two different perspectives: multi-scale information fusion and planar geometric clue assistance. For the former, we propose a multi-scale geometric consistency guidance framework (ACMM) to obtain reliable depth estimates for low-textured areas at coarser scales and guarantee that they can be propagated to finer scales. For the latter, we propose a planar prior assisted framework (ACMP). We utilize a probabilistic graphical model to contribute a novel multi-view aggregated matching cost. Finally, by taking advantage of the above frameworks, we further design a multi-scale geometric consistency guided and planar prior assisted multi-view stereo (ACMMP). This greatly enhances the discrimination of ambiguous regions and helps recover their depth. Experiments on extensive datasets show our methods achieve state-of-the-art performance, recovering depth not only in low-textured areas but also in fine details. Related code is available at https://github.com/GhiXu.
7. Chai B, Wei Z. Stratified camera calibration algorithm based on the calibrating conic. Opt Express 2023; 31:1282-1302. PMID: 36785167; DOI: 10.1364/oe.480086.
Abstract
In computer vision, camera calibration is essential for photogrammetric measurement. We propose a new stratified camera calibration method based on geometric constraints. This paper proposes several new theorems in 2D projective transformation: (1) there exists a family of lines whose parallelism remains invariant under a 2D projective transformation; these lines are parallel to the image of the line at infinity. (2) There is only one line that remains perpendicular to this family of parallel lines under a 2D projective transformation, and the principal point lies on this line. With the image of the line at infinity and the dual conic of the circular points, the closed-form solution for the line passing through the principal point is deduced. The angle between the target board and the image plane, which influences camera calibration, is computed. We propose a new geometric interpretation of the target board's pose and a corresponding solution method. To obtain appropriate poses of the target board for camera calibration, we propose a visual pose guide (VPG) of the target board system that can guide a user to move the target board to obtain appropriate images for calibration. The expected homography is defined, and its solution method is deduced. Experimental results with synthetic and real data verify the correctness and validity of the proposed method.
8. Cui H, Tu D, Tang F, Xu P, Liu H, Shen S. VidSfM: Robust and Accurate Structure-From-Motion for Monocular Videos. IEEE Trans Image Process 2022; 31:2449-2462. PMID: 35263254; DOI: 10.1109/tip.2022.3156375.
Abstract
With the popularization of smartphones, large collections of high-quality videos have become available, dramatically increasing the scale of scene reconstruction. However, high-resolution video produces more match outliers, and high-frame-rate video brings more redundant images. To solve these problems, a tailor-made framework is proposed to realize accurate and robust structure-from-motion from monocular videos. The key ideas include two points: one is to use the spatial and temporal continuity of video sequences to improve the accuracy and robustness of reconstruction; the other is to use the redundancy of video sequences to improve the efficiency and scalability of the system. Our technical contributions include an adaptive way to identify accurate loop matching pairs, a cluster-based camera registration algorithm, a local rotation averaging scheme to verify the pose estimates, and a local image extension strategy to reboot the incremental reconstruction. In addition, our system can integrate data from different video sequences, allowing multiple videos to be reconstructed simultaneously. Extensive experiments on both indoor and outdoor monocular videos demonstrate that our method outperforms state-of-the-art approaches in robustness, accuracy and scalability.
9. Yu Y, Guan B, Sun X, Li Z. Self-calibration of cameras using affine correspondences and known relative rotation angle. Appl Opt 2021; 60:10785-10794. PMID: 35200837; DOI: 10.1364/ao.443607.
Abstract
This paper proposes a flexible method for camera self-calibration using affine correspondences and a known relative rotation angle, which applies when the camera and inertial measurement unit (IMU) are rigidly fixed together. An affine correspondence provides two more constraints for the self-calibration problem than a traditional point correspondence, and the relative rotation angle can be derived from the IMU. Therefore, calibrating the intrinsic camera parameters needs fewer correspondences, which reduces the iterations and improves the algorithm's robustness in the random sample consensus (RANSAC) framework. The proposed method does not require rotational alignment between the camera and the IMU. This advantage makes our method more convenient and flexible. The experimental results on both synthetic data and publicly available real datasets demonstrate that our method is effective and accurate for camera self-calibration.
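The robustness argument follows from the standard RANSAC iteration bound, which grows exponentially with the minimal sample size: fewer correspondences per hypothesis means far fewer iterations at the same confidence. A short worked example (inlier ratio and confidence values are illustrative):

```python
# Why fewer correspondences per hypothesis helps: the standard RANSAC
# iteration count N = log(1 - p) / log(1 - w^s) grows rapidly with the
# minimal sample size s (w = inlier ratio, p = success confidence).
import math

def ransac_iterations(p_success, inlier_ratio, sample_size):
    """Iterations needed to draw one all-inlier sample with prob. p_success."""
    return math.ceil(math.log(1 - p_success) /
                     math.log(1 - inlier_ratio ** sample_size))

# Illustrative numbers: 50% inliers, 99% confidence.
for s in (2, 3, 5, 8):
    print(f"sample size {s}: {ransac_iterations(0.99, 0.5, s)} iterations")
# sample size 2: 16, 3: 35, 5: 146, 8: 1177 iterations
```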
10. Xiang X, Jiang H, Zhang G, Yu Y, Li C, Yang X, Chen D, Bao H. Mobile3DScanner: An Online 3D Scanner for High-quality Object Reconstruction with a Mobile Device. IEEE Trans Vis Comput Graph 2021; 27:4245-4255. PMID: 34449377; DOI: 10.1109/tvcg.2021.3106491.
Abstract
We present a novel online 3D scanning system for high-quality object reconstruction with a mobile device, called Mobile3DScanner. Using a mobile device equipped with an embedded RGBD camera, our system provides online 3D object reconstruction capability for users to acquire high-quality textured 3D object models. Starting with a simultaneous pose tracking and TSDF fusion module, our system allows users to scan an object with a mobile device to get a 3D model for real-time preview. After the real-time scanning process is completed, the scanned 3D model is globally optimized and mapped with multi-view textures as an efficient post-process to get the final textured 3D model on the mobile device. Unlike most existing state-of-the-art systems, which can only scan homeware objects such as toys with small dimensions due to the limited computation and memory resources of mobile platforms, our system can reconstruct objects with large dimensions such as statues. We propose a novel visual-inertial ICP approach to achieve real-time accurate 6DoF pose tracking of each incoming frame on the front end, while maintaining a keyframe pool on the back end where the keyframe poses are optimized by local bundle adjustment (BA). Simultaneously, the keyframe depth maps are fused with the optimized poses into a TSDF model in real time. In particular, we propose a novel adaptive voxel resizing strategy to solve the out-of-memory problem of large-dimension TSDF fusion on mobile platforms. In the post-process, the keyframe poses are globally optimized and the keyframe depth maps are optimized and fused to obtain a final object model with more accurate geometry. Quantitative and qualitative experiments demonstrate the effectiveness of the proposed mobile 3D scanning system, which can successfully achieve online high-quality 3D reconstruction of natural objects with larger dimensions for efficient AR content creation.
11. Yang W, Zhang Y, Ye J, Ji Y, Li Z, Zhou M, Yu J. Structure From Motion on XSlit Cameras. IEEE Trans Pattern Anal Mach Intell 2021; 43:1691-1704. PMID: 31796390; DOI: 10.1109/tpami.2019.2957119.
Abstract
We present a structure-from-motion (SfM) framework based on a special type of multi-perspective camera called the cross-slit or XSlit camera. Traditional perspective-camera-based SfM suffers from the scale ambiguity inherent to pinhole camera geometry. In contrast, an XSlit camera captures rays passing through two oblique lines in 3D space, and we show that such ray geometry directly resolves the scale ambiguity when employed for SfM. To accommodate XSlit cameras, we develop tailored feature matching, camera pose estimation, triangulation, and bundle adjustment techniques. Specifically, we devise a SIFT feature variant using non-uniform Gaussian kernels to handle the distortions in XSlit images for reliable feature matching. Moreover, we demonstrate that the XSlit camera exhibits ambiguities in the pose estimation process which cannot be handled by existing work. Consequently, we propose a 14-point algorithm to properly handle the XSlit degeneracy and estimate the relative pose between XSlit cameras from feature correspondences. We further exploit the unique depth-dependent aspect ratio (DDAR) property to improve bundle adjustment for the XSlit camera. Synthetic and real experiments demonstrate that the proposed XSlit SfM can conduct reliable and high-fidelity 3D reconstruction at an absolute scale.
12. Xu J, Xie Q, Chen H, Wang J. Real-Time Plane Detection with Consistency from Point Cloud Sequences. Sensors (Basel) 2020; 21:140. PMID: 33379284; PMCID: PMC7796097; DOI: 10.3390/s21010140.
Abstract
Real-time consistent plane detection (RCPD) from structured point cloud sequences facilitates various high-level computer vision and robotic tasks. However, it remains a challenge. Existing techniques for plane detection suffer from long running times or imprecise detection results. Meanwhile, plane labels are not consistent over the whole image sequence due to plane loss in the detection stage. In order to resolve these issues, we propose a novel superpixel-based real-time plane detection approach that simultaneously keeps plane labels consistent across frames. In summary, our method has the following key contributions: (i) a real-time plane detection algorithm to extract planes from raw structured three-dimensional (3D) point clouds collected by depth sensors; (ii) a superpixel-based segmentation method to make the detected plane exactly match its actual boundary; and (iii) a robust strategy to recover missing planes by utilizing contextual correspondence information in adjacent frames. Extensive visual and numerical experiments demonstrate that our method outperforms state-of-the-art methods in terms of efficiency and accuracy.
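For reference, the primitive that any such plane-detection stage builds on is a robust plane fit to 3D points. A minimal RANSAC sketch (thresholds and iteration counts are illustrative, not the authors' settings):

```python
import numpy as np

def fit_plane_ransac(pts, n_iters=200, tol=0.01, seed=0):
    """pts: (N, 3) array. Returns ((unit normal n, offset d), inlier count),
    where inliers satisfy |n . p + d| < tol."""
    rng = np.random.default_rng(seed)
    best_count, best_model = 0, None
    for _ in range(n_iters):
        p0, p1, p2 = pts[rng.choice(len(pts), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                     # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ p0
        count = int((np.abs(pts @ n + d) < tol).sum())
        if count > best_count:
            best_count, best_model = count, (n, d)
    return best_model, best_count
```

The paper's contributions sit on top of this kind of fit: superpixel-accurate boundaries and cross-frame label propagation.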
Affiliation(s)
- Jinxuan Xu: College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
- Qian Xie: College of Mechanical & Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
- Honghua Chen: College of Mechanical & Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
- Jun Wang: College of Mechanical & Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
13. Yang X, Zhou L, Jiang H, Tang Z, Wang Y, Bao H, Zhang G. Mobile3DRecon: Real-time Monocular 3D Reconstruction on a Mobile Phone. IEEE Trans Vis Comput Graph 2020; 26:3446-3456. PMID: 32956060; DOI: 10.1109/tvcg.2020.3023634.
Abstract
We present a real-time monocular 3D reconstruction system on a mobile phone, called Mobile3DRecon. Using an embedded monocular camera, our system provides an online mesh generation capability on the back end together with real-time 6DoF pose tracking on the front end for users to achieve realistic AR effects and interactions on mobile phones. Unlike most existing state-of-the-art systems, which produce only point cloud based 3D models online or surface mesh offline, we propose a novel online incremental mesh generation approach to achieve fast online dense surface mesh reconstruction to satisfy the demand of real-time AR applications. For each keyframe of 6DoF tracking, we perform robust monocular depth estimation with a multi-view semi-global matching method followed by a depth refinement post-processing. The proposed mesh generation module incrementally fuses each estimated keyframe depth map into an online dense surface mesh, which is useful for achieving realistic AR effects such as occlusions and collisions. We verify our real-time reconstruction results on two mid-range mobile platforms. Quantitative and qualitative experiments demonstrate the effectiveness of the proposed monocular 3D reconstruction system, which can handle the occlusions and collisions between virtual objects and real scenes to achieve realistic AR effects.
14. Wang L, Wei H. Understanding of Curved Corridor Scenes Based on Projection of Spatial Right-angles. IEEE Trans Image Process 2020; 29:9345-9359. PMID: 32997629; DOI: 10.1109/tip.2020.3026628.
Abstract
Helping mobile robots understand curved corridor scenes has considerable value in computer vision. However, due to the diversity of curved corridor scenes, such as curved structures that do not satisfy the Manhattan assumption, understanding them remains a challenge. Curved non-Manhattan structures can be seen as compositions of spatial right angles, and their two-dimensional projections may help us estimate their original pose in 3D scenes. In this paper, we present an approach for mobile robots to understand curved corridor scenes, including both Manhattan and curved non-Manhattan structures, from a single image. Angle projections can be assigned to different clusters via geometric inference, from which coplanar structures can be estimated. Fold structures consisting of coplanar structures can then be estimated, and curved non-Manhattan structures can be approximately represented by fold structures. Based on this understanding of curved non-Manhattan structures, the method is practical and efficient for a mobile robot navigating curved corridor scenes. The algorithm requires no prior training or knowledge of the camera's internal parameters. With geometric features from a monocular camera, the method is robust to calibration errors and image noise. We compared the estimated curved layout against the ground truth and measured the percentage of pixels that were incorrectly classified. The experimental results showed that the algorithm can successfully understand curved corridor scenes including both Manhattan and curved non-Manhattan structures, meeting the requirements of robot navigation in a curved corridor environment.
15. Tanner M, Piniés P, Paz LM, Săftescu Ş, Bewley A, Jonasson E, Newman P. Large-scale outdoor scene reconstruction and correction with vision. Int J Rob Res 2020. DOI: 10.1177/0278364920937052.
Abstract
We provide the theory and the system needed to create large-scale dense reconstructions for mobile-robotics applications: this stands in contrast to the object-centric reconstructions dominant in the literature. Our BOR2G system fuses data from multiple sensor modalities (cameras, lidars, or both) and regularizes the resulting 3D model. We use a compressed 3D data structure, which allows us to operate over a large scale. In addition, because of the paucity of surface observations by the camera and lidar sensors, we regularize over both two (camera depth maps) and three dimensions (voxel grid) to provide a local contextual prior for the reconstruction. Our regularizer reduces the median error by between 27% and 36% in 7.3 km of dense reconstructions with a median accuracy between 4 and 8 cm. Our pipeline does not end with regularization. We take the unusual step of applying a learned correction mechanism that takes the global context of the reconstruction and adjusts the constructed mesh, addressing errors that are pathological to the first-pass camera-derived reconstruction. We evaluate our system using the Stanford Burghers of Calais, Imperial College ICL-NUIM, Oxford Broad Street (released with this paper), and KITTI datasets. These latter datasets see us operating at a combined scale and accuracy not seen in the literature. We provide statistics for the metric errors in all surfaces created, compared with those measured with 3D lidar as ground truth. We demonstrate our system in practice by reconstructing the inside of the EUROfusion Joint European Torus (JET) fusion reactor, located at the Culham Centre for Fusion Energy (UK Atomic Energy Authority) in Oxfordshire.
Affiliation(s)
- Pedro Piniés: Oxford Robotics Institute, University of Oxford, UK
- Alex Bewley: Oxford Robotics Institute, University of Oxford, UK
- Paul Newman: Oxford Robotics Institute, University of Oxford, UK
16. Gong D, He Z, Ye X, Fang Z. Visual Saliency Detection for Over-Temperature Regions in 3D Space via Dual-Source Images. Sensors (Basel) 2020; 20:3414. PMID: 32560453; PMCID: PMC7348710; DOI: 10.3390/s20123414.
Abstract
To allow mobile robots to visually observe the temperature of equipment in complex industrial environments and respond to temperature anomalies in time, it is necessary to accurately find the coordinates of temperature anomalies and obtain information on the surrounding obstacles. This paper proposes a visual saliency detection method for over-temperature regions in three-dimensional space using dual-source images. The key novelty of this method is that it can achieve accurate salient object detection without relying on high-performance hardware. First, redundant point clouds are removed through adaptive sampling to reduce the computational memory. Second, the original images are merged with infrared images, and the dense point clouds are surface-mapped to visually display the temperature of the reconstructed surface and to use infrared imaging characteristics to detect the plane coordinates of temperature anomalies. Finally, coordinate transformation mapping is performed according to the pose relationship to obtain the spatial position. Experimental results show that this method not only displays the temperature of the device directly but also accurately obtains the spatial coordinates of the heat source without relying on a high-performance computing platform.
17. Low-Cost iPhone-Assisted Processing to Obtain Radiotherapy Bolus Using Optical Surface Reconstruction and 3D-Printing. Sci Rep 2020; 10:8016. PMID: 32415217; PMCID: PMC7228923; DOI: 10.1038/s41598-020-64967-5.
Abstract
Patient-specific boluses improve the skin dose distribution for treating tumors located just beneath the skin with high-energy radiation, compared with a flat bolus. We introduce a low-cost, 3D-printed, patient-specific bolus made of commonly available materials and easily produced using "structure from motion" and a simple desktop 3D-printing technique. Nine pictures were acquired with an iPhone camera around a head phantom. The 3D surface of the phantom was generated from these pictures using the "structure from motion" algorithm, with a scale factor calculated by a sphere-fitting algorithm. A bolus of the requested position and shape, based on the generated surface, was 3D-printed using ABS material. Two intensity-modulated radiation therapy plans were designed to simulate clinical treatment for a tumor located under the skin surface, with a flat bolus and a printed bolus, respectively. The planned dose-volume histogram parameters, conformity index (CI) and homogeneity index (HI), were compared. The printed-bolus plan gave a dose coverage to the tumor with a CI of 0.817, compared with a CI of 0.697 for the flat-bolus plan. The HIs of the plans with the printed bolus and the flat bolus were 0.910 and 0.887, respectively.
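Structure-from-motion reconstructions are only defined up to scale, which is why a sphere of known radius is needed here. A minimal sketch of one common sphere-fitting approach (algebraic linear least squares); the point array is an assumed input, and the paper's exact fitting algorithm may differ:

```python
import numpy as np

def fit_sphere(pts):
    """Algebraic sphere fit. From |p - c|^2 = r^2 we get
    |p|^2 = 2 c.p + (r^2 - |c|^2), which is linear in the unknowns
    (c, k) with k = r^2 - |c|^2. pts: (N, 3) array."""
    A = np.hstack([2 * pts, np.ones((len(pts), 1))])
    b = (pts ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    c, k = sol[:3], sol[3]
    r = np.sqrt(k + c @ c)
    return c, r

# scale = true_radius / fitted_radius, then rescale the whole model:
# model_pts_metric = scale * model_pts
```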
18.
Abstract
Feature detection, description, and matching are crucial steps for many computer vision algorithms. These steps rely on feature descriptors to match image features across sets of images. Previous work has shown that our SYnthetic BAsis (SYBA) feature descriptor can offer superior performance to other binary descriptors. This paper focuses on various optimizations and a hardware implementation of the newer, optimized version. The hardware implementation on a field-programmable gate array (FPGA) is a high-throughput, low-latency solution, which is critical for applications such as high-speed object detection and tracking, stereo vision, visual odometry, structure from motion, and optical flow. We compared our solution to other hardware designs of binary descriptors. We demonstrated that our implementation of SYBA as a feature descriptor in hardware offered superior image feature matching performance and used fewer resources than most binary feature descriptor implementations.
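The operation such hardware accelerates is, in essence, Hamming-distance nearest-neighbour search over packed bit strings. A software reference point, useful for checking an FPGA design against; descriptor sizes are illustrative and this is not the SYBA algorithm itself:

```python
import numpy as np

def hamming_match(desc_a, desc_b):
    """desc_a: (Na, B) uint8, desc_b: (Nb, B) uint8 packed binary
    descriptors. Returns, for each row of desc_a, the index of its
    nearest row in desc_b and the Hamming distance to it."""
    x = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])  # XOR
    dist = np.unpackbits(x, axis=2).sum(axis=2)                 # popcount
    return dist.argmin(axis=1), dist.min(axis=1)

rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)  # 256-bit descriptors
idx, d = hamming_match(a, a)
print(idx, d)  # each descriptor matches itself at distance 0
```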
19. He Y, Zheng S, Zhu F, Huang X. Real-Time 3D Reconstruction of Thin Surface Based on Laser Line Scanner. Sensors (Basel) 2020; 20:534. PMID: 31963669; PMCID: PMC7014519; DOI: 10.3390/s20020534.
Abstract
The truncated signed distance field (TSDF) has been applied as a fast, accurate, and flexible geometric fusion method in the 3D reconstruction of industrial products based on a hand-held laser line scanner. However, this method has problems with the surface reconstruction of thin products: the surface mesh will collapse into the interior of the model, resulting in topological errors such as overlaps, intersections, or gaps. Meanwhile, the existing TSDF method ensures real-time performance through significant graphics processing unit (GPU) memory usage, which limits the scale of the reconstructed scene. In this work, we propose three improvements to existing TSDF methods: (i) a thin-surface attribution judgment method in real-time processing that solves the problem of interference between the opposite sides of a thin surface; we distinguish measurements originating from different parts of a thin surface by the angle between the surface normal and the observation line of sight; (ii) a post-processing method to automatically detect and repair topological errors in areas where thin-surface attribution may be misjudged; (iii) a framework that integrates central processing unit (CPU) and GPU resources to implement our 3D reconstruction approach, which ensures real-time performance and reduces GPU memory usage. The results show that this method provides more accurate 3D reconstruction of a thin surface, comparable to state-of-the-art laser line scanners with 0.02 mm accuracy. In terms of performance, the algorithm can guarantee a frame rate of more than 60 frames per second (FPS) with a GPU memory footprint under 500 MB. In total, the proposed method achieves real-time, high-precision 3D reconstruction of a thin surface.
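A sketch of the idea behind contribution (i): before fusing a depth sample into the TSDF, check the angle between the measured surface normal and the viewing ray, so that measurements from the far side of a thin wall are kept apart. The threshold and the simple weighted-average update rule below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def fuse_sample(tsdf, weight, voxel_idx, sdf, normal, view_dir,
                max_angle_deg=75.0, trunc=0.01):
    """Weighted-average TSDF update, gated by the normal/view angle.
    normal points out of the surface toward the sensor; view_dir points
    from the sensor toward the surface, so a frontal view gives
    -normal . view_dir close to 1."""
    cos_angle = -normal @ view_dir / (
        np.linalg.norm(normal) * np.linalg.norm(view_dir))
    if cos_angle < np.cos(np.deg2rad(max_angle_deg)):
        return  # grazing view or back side of a thin surface: skip fusion
    s = np.clip(sdf / trunc, -1.0, 1.0)   # truncate the signed distance
    w_old = weight[voxel_idx]
    tsdf[voxel_idx] = (w_old * tsdf[voxel_idx] + s) / (w_old + 1.0)
    weight[voxel_idx] = w_old + 1.0
```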
20. Ortiz-Coder P, Sánchez-Ríos A. A Self-Assembly Portable Mobile Mapping System for Archeological Reconstruction Based on VSLAM-Photogrammetric Algorithm. Sensors (Basel) 2019; 19:3952. PMID: 31547455; PMCID: PMC6766812; DOI: 10.3390/s19183952.
Abstract
Three-dimensional (3D) models are widely used in clinical applications, geosciences, cultural heritage preservation, and engineering; this, together with emerging needs such as building information modeling (BIM), drives the development of new data-capture techniques and devices with low cost and a reduced learning curve that allow non-specialized users to employ them. This paper presents a simple, self-assembly device for 3D point cloud data capture with an estimated base price under €2500; furthermore, a workflow for the calculations is described, which includes a Visual SLAM-photogrammetric threaded algorithm implemented in C++. Another purpose of this work is to validate the proposed system in BIM working environments. To achieve this, several 3D point clouds were obtained in outdoor tests, and the coordinates of 40 points were measured with this device at data capture distances ranging between 5 and 20 m. These were then compared to the coordinates of the same targets measured by a total station. The Euclidean average distance errors and root mean square errors (RMSEs) ranged between 12-46 mm and 8-33 mm respectively, depending on the data capture distance (5-20 m). Furthermore, the proposed system was compared with a commonly used photogrammetric methodology based on Agisoft Metashape software. The results obtained demonstrate that the proposed system satisfies (in each case) the tolerances of 'level 1' (51 mm) and 'level 2' (13 mm) for point cloud acquisition in urban design and historic documentation, according to the BIM Guide for 3D Imaging (U.S. General Services Administration).
Affiliation(s)
- Pedro Ortiz-Coder: Department of Graphic Expression, University Centre of Mérida, University of Extremadura, 06800 Mérida, Spain
- Alonso Sánchez-Ríos: Department of Graphic Expression, University Centre of Mérida, University of Extremadura, 06800 Mérida, Spain
21. Underwater photogrammetry in Antarctica: long-term observations in benthic ecosystems and legacy data rescue. Polar Biol 2019. DOI: 10.1007/s00300-019-02480-w.
22. Pei Z, Li Y, Ma M, Li J, Leng C, Zhang X, Zhang Y. Occluded-Object 3D Reconstruction Using Camera Array Synthetic Aperture Imaging. Sensors (Basel) 2019; 19:607. PMID: 30709046; PMCID: PMC6386989; DOI: 10.3390/s19030607.
Abstract
Object reconstruction aims to recover the shape and appearance of objects from the three-dimensional (3D) coordinates captured by a sequence of images taken from different views. Although great progress in object reconstruction has been made over the past few years, object reconstruction in occlusion situations remains a challenging problem. In this paper, we propose a novel method to reconstruct occluded objects based on synthetic aperture imaging. Unlike most existing methods, which either assume that there is no occlusion in the scene or remove the occlusion from the reconstructed result, our method uses the characteristic of synthetic aperture imaging that it can effectively reduce the influence of occlusion, in order to reconstruct the scene with occlusion. The proposed method labels occlusion pixels according to variance and reconstructs the 3D point cloud based on synthetic aperture imaging. The accuracy of the point cloud is tested by calculating the spatial difference between occlusion and non-occlusion conditions. The experimental results show that the proposed method handles occluded situations well and demonstrates promising performance.
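The core of the variance-based labelling can be sketched briefly: after warping the camera-array images onto a common focal plane, pixels whose intensity varies strongly across views (disagreeing because an occluder blocks some cameras) are marked, while consistent pixels are kept. The aligned image stack is an assumed input and the threshold is illustrative:

```python
import numpy as np

def label_occlusion(aligned_stack, var_thresh=200.0):
    """aligned_stack: (n_views, H, W) images warped to one focal plane.
    Returns (synthetic-aperture image, boolean occlusion mask)."""
    stack = aligned_stack.astype(np.float32)
    sa_image = stack.mean(axis=0)   # synthetic-aperture refocusing
    variance = stack.var(axis=0)    # high where views disagree (occlusion)
    return sa_image, variance > var_thresh
```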
Affiliation(s)
- Zhao Pei: Key Laboratory of Modern Teaching Technology, Ministry of Education, Xi'an 710119, China; School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Yawen Li: School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Miao Ma: School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Jun Li: School of Computer Science, Nanjing Normal University, Nanjing 210046, China; School of Automation, Southeast University, Nanjing 210096, China
- Chengcai Leng: School of Mathematics, Northwest University, Xi'an 710127, China
- Xiaoqiang Zhang: School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China
- Yanning Zhang: School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
23. Sun Z, Zhang Y. Accuracy Evaluation of Videogrammetry Using A Low-Cost Spherical Camera for Narrow Architectural Heritage: An Observational Study with Variable Baselines and Blur Filters. Sensors (Basel) 2019; 19:496. PMID: 30691033; PMCID: PMC6386977; DOI: 10.3390/s19030496.
Abstract
Three-dimensional (3D) reconstruction using video frames extracted from spherical cameras introduces an innovative measurement method in narrow scenes of architectural heritage, but the accuracy of the 3D models and its correlations with frame extraction ratios and blur filters are yet to be evaluated. This article addresses these issues for two narrow scenes of architectural heritage that are distinctive in layout, surface material, and lighting conditions. Frames are extracted from videos captured with a hand-held spherical camera (30 frames per second) at various ratios, starting from 10 and increasing in steps of 10 frames (10, 20, …, n). Two different blur assessment methods are employed for comparative analyses. Ground truth models obtained from terrestrial laser scanning and photogrammetry are employed for assessing the accuracy of 3D models from different groups. The results show that the relative accuracy (median absolute errors/object dimensions) of spherical-camera videogrammetry ranges from 1/500 to 1/2000, catering to the surveying and mapping of architectural heritage with medium accuracy and resolution. Sparser baselines (the length between neighboring image pairs) do not necessarily generate higher accuracy than denser baselines, and an optimal frame network should consider the essential completeness of complex components and potential degeneracy cases. Substituting blurred frames with adjacent sharp frames could reduce global errors by 5-15%.
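A sketch of frame extraction with a blur filter of the kind evaluated here: sample every n-th frame and substitute a blurred frame with its sharpest near neighbour. Sharpness is scored by variance of the Laplacian, one common blur metric, not necessarily either of the two metrics used in the paper; the sampling parameters are illustrative:

```python
import cv2

def sharpness(frame):
    """Variance of the Laplacian: higher means sharper."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def extract_frames(path, every_n=20, search=5):
    """Read a video and keep one frame per every_n, replacing each pick
    with the sharpest frame within +/- search frames of it."""
    cap = cv2.VideoCapture(path)
    frames, keep = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    for i in range(0, len(frames), every_n):
        window = frames[max(0, i - search): i + search + 1]
        keep.append(max(window, key=sharpness))  # sharpest nearby frame
    return keep
```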
Affiliation(s)
- Zheng Sun: School of Architecture, Nanjing Tech University, Nanjing 211800, China
- Yingying Zhang: School of Architecture, Southeast University, Nanjing 210096, China
24. Forensic 3D documentation of skin injuries using photogrammetry: photographs vs video and manual vs automatic measurements. Int J Legal Med 2018; 133:963-971. PMID: 30560492; DOI: 10.1007/s00414-018-1982-6.
Abstract
Accurate and precise documentation of lesions is an important aspect of the forensic pathologist's work. Photogrammetry provides a useful tool to take precise measurements from photographs. These photographs are normally acquired as single-camera photographs, but the procedure is quite time-consuming. Video recording has the potential to record a larger amount of image data faster. We documented 33 cadaveric skin lesions using photographs and video recordings. The dimensions of the lesions ranged between 0.27 and 21.8 cm. The measurements of the lesions were extracted with both manual and automatic point measurements from photographs and from video frames, respectively. Very small differences (mean and median < 1 mm) were found between measurements taken in photographs versus video frames. Video frames were often blurred, preventing clear demarcation of the edges of the lesions and presenting a larger amount of noise in the 3D models. The differences between the manual point and automatic point measurements were very small (mean and median < 1 mm), but the manual procedure is to be preferred, since automatic points were not always located on the edges of the lesions. The only aspect in which video frames were superior to photographs was the recording time: video recording was almost five times faster than the photo sessions. In conclusion, this study shows that precise and comparable measurements can be extracted both from photographs and from video frames. Video is the fastest method, but the use of photographs is still recommended. Manual measurements are more precise than automatic measurements and equally time-consuming.
25. Piazza E, Romanoni A, Matteucci M. Real-Time CPU-Based Large-Scale Three-Dimensional Mesh Reconstruction. IEEE Robot Autom Lett 2018. DOI: 10.1109/lra.2018.2800104.
26. Wei H, Wang L. Visual Navigation Using Projection of Spatial Right-Angle In Indoor Environment. IEEE Trans Image Process 2018; 27:3164-3177. PMID: 29641398; DOI: 10.1109/tip.2018.2818931.
Abstract
Helping robots understand indoor scenes has considerable value in computer vision. However, due to the diversity of indoor scenes, understanding them remains a big challenge. There are many spatial right angles in indoor scenes, and they are projected into diverse 2D projections. These projections can be considered a composition of a pair of lines (line-pairs). Given the vanishing points (VPs), line segments can be assigned to one of three main orthogonal directions. Line-pairs (intersections of two lines) whose lines converge to different VPs are likely to be projections of a spatial right angle onto the image plane. These projections may enable us to estimate their original orientation and position in 3D scenes. In this paper, we present a method to efficiently understand indoor scenes from a single image, without training or any knowledge of the camera's internal calibration. Through geometric inference on line-pairs, it is possible to find these spatial right-angle projections. Then, these projections can be assigned to different clusters, and the line that lies in the neighboring cluster helps us estimate the layout of the indoor scene. The proposed approach requires no prior training. We compared the room layout estimated by our algorithm against the room box ground truth, measuring the percentage of pixels that were correctly classified. These experiments showed that our method estimates not only the room layout, but also details of the indoor scene.
27. Cao Y, Xu B, Ye Z, Yang J, Cao Y, Tisse CL, Li X. Depth and thermal sensor fusion to enhance 3D thermographic reconstruction. Opt Express 2018; 26:8179-8193. PMID: 29715787; DOI: 10.1364/oe.26.008179.
Abstract
Three-dimensional geometrical models with incorporated surface temperature data provide important information for various applications such as medical imaging, energy auditing, and intelligent robots. In this paper we present a robust method for mobile, real-time 3D thermographic reconstruction through depth and thermal sensor fusion. A multimodal imaging device consisting of a thermal camera and an RGB-D sensor is calibrated geometrically and used for data capture. Based on the underlying principle that temperature information remains robust against illumination and viewpoint changes, we present a Thermal-guided Iterative Closest Point (T-ICP) methodology to facilitate reliable 3D thermal scanning applications. The pose of the sensing device is initially estimated using correspondences found by maximizing the thermal consistency between consecutive infrared images. The coarse pose estimate is further refined by finding the motion parameters that minimize a combined geometric and thermographic loss function. Experimental results demonstrate that complementary information captured by multimodal sensors can be utilized to improve the performance of 3D thermographic reconstruction. Through effective fusion of thermal and depth data, the proposed approach generates more accurate 3D thermal models using significantly less scanning data.
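The general shape of such a combined objective can be sketched as a geometric point-to-point term plus a thermal-consistency term comparing infrared intensities at reprojected points. The weighting, projection model, and sampling below are stand-ins, not the paper's exact formulation:

```python
import numpy as np

def combined_residual(R, t, src_pts, dst_pts, ir_src_vals, ir_dst, K, lam=0.1):
    """src_pts/dst_pts: (N, 3) matched 3D points; ir_src_vals: (N,) thermal
    intensities at the source points; ir_dst: destination thermal image;
    K: 3x3 intrinsics of the (assumed pinhole) thermal camera."""
    warped = src_pts @ R.T + t
    geo = np.linalg.norm(warped - dst_pts, axis=1) ** 2   # geometric term

    uvw = warped @ K.T                                    # project to pixels
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = ir_dst.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)          # in-image samples
    thermal = np.zeros_like(geo)
    thermal[ok] = (ir_dst[v[ok], u[ok]] - ir_src_vals[ok]) ** 2
    return geo.sum() + lam * thermal.sum()
```

Minimizing this over (R, t), for example with a nonlinear least-squares solver, refines the coarse thermally-initialized pose.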
28. Robust Stereo Visual Odometry Using Improved RANSAC-Based Methods for Mobile Robot Localization. Sensors (Basel) 2017; 17:2339. PMID: 29027935; PMCID: PMC5677260; DOI: 10.3390/s17102339.
Abstract
In this paper, we present a novel approach for stereo visual odometry with robust motion estimation that is faster and more accurate than standard RANSAC (Random Sample Consensus). Our method improves RANSAC in three aspects: first, hypotheses are preferentially generated by sampling the input feature points in order of the ages and similarities of the features; second, the evaluation of hypotheses is performed with the SPRT (Sequential Probability Ratio Test), which discards bad hypotheses very quickly without verifying all the data points; third, we aggregate the three best hypotheses to obtain the final estimate instead of selecting only the single best hypothesis. The first two aspects improve the speed of RANSAC by generating good hypotheses and discarding bad hypotheses in advance, respectively. The last aspect improves the accuracy of motion estimation. Our method was evaluated on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) and New Tsukuba datasets. Experimental results show that the proposed method achieves better results for both speed and accuracy than RANSAC.
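The first improvement can be sketched as biased minimal-sample selection: instead of uniform sampling, draw samples with probability proportional to a feature-quality score. The scoring function below, combining track age and descriptor similarity, is an assumed stand-in for the paper's ordering scheme:

```python
import numpy as np

def preferential_sample(ages, similarities, k, seed=0):
    """Draw a k-point minimal sample with probability proportional to a
    quality score combining track age and descriptor similarity."""
    rng = np.random.default_rng(seed)
    score = ages / ages.max() + similarities / similarities.max()
    prob = score / score.sum()
    return rng.choice(len(ages), size=k, replace=False, p=prob)

ages = np.array([1, 2, 8, 8, 7, 3, 1], dtype=float)
sims = np.array([0.5, 0.9, 0.95, 0.8, 0.9, 0.6, 0.4])
print(preferential_sample(ages, sims, k=3))  # long-tracked features favoured
```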
29. 3D Reconstruction of Space Objects from Multi-Views by a Visible Sensor. Sensors (Basel) 2017; 17:1689. PMID: 28737675; PMCID: PMC5539474; DOI: 10.3390/s17071689.
Abstract
In this paper, a novel 3D reconstruction framework is proposed to recover the 3D structural model of a space object from its multi-view images captured by a visible sensor. Given an image sequence, this framework first estimates the relative camera poses and recovers the depths of the surface points by the structure-from-motion (SFM) method, then the patch-based multi-view stereo (PMVS) algorithm is utilized to generate a dense 3D point cloud. To resolve the wrong matches arising from the symmetric structure and repeated textures of space objects, a new strategy is introduced in which images are added to SFM in imaging order. Meanwhile, a refining process is proposed and applied to the recovered point cloud, exploiting the structural prior knowledge that most sub-components of artificial space objects are composed of basic geometric shapes. The proposed reconstruction framework is tested on both simulated and real image datasets. Experimental results illustrate that the recovered point cloud models of space objects are accurate and completely cover the surface. Moreover, outliers and points with severe noise are effectively filtered out by the refinement, resulting in a distinct improvement in the structure and visualization of the recovered points.
30. Chebiyyam M, Chaudhury S, Kar IN. Recursive Structure from Motion. Computer Vision, Graphics, and Image Processing 2017:109-119. DOI: 10.1007/978-3-319-68124-5_10.
31. Huang AS, Bachrach A, Henry P, Krainin M, Maturana D, Fox D, Roy N. Visual Odometry and Mapping for Autonomous Flight Using an RGB-D Camera. Springer Tracts in Advanced Robotics 2017. DOI: 10.1007/978-3-319-29363-9_14.
32. Zhang G, Liu H, Dong Z, Jia J, Wong TT, Bao H. Efficient Non-Consecutive Feature Tracking for Robust Structure-From-Motion. IEEE Trans Image Process 2016; 25:5957-5970. PMID: 27623586; DOI: 10.1109/tip.2016.2607425.
Abstract
Structure-from-motion (SfM) largely relies on feature tracking. In image sequences, if disjointed tracks caused by objects moving in and out of the field of view, occasional occlusion, or image noise are not handled well, the corresponding SfM can be affected. This problem becomes more severe for large-scale scenes, which typically require capturing multiple sequences to cover the whole scene. In this paper, we propose an efficient non-consecutive feature tracking framework to match interrupted tracks distributed in different subsequences or even in different videos. Our framework consists of steps for solving the feature "dropout" problem when indistinctive structures, noise or large image distortion exist, and for rapidly recognizing and joining common features located in different subsequences. In addition, we contribute an effective segment-based coarse-to-fine SfM algorithm for robustly handling large data sets. Experimental results on challenging video data demonstrate the effectiveness of the proposed system.
Collapse
|
33
|
Yang C, Zhou F, Bai X, Cao L, Xiong X, Yu X. Three dimension reconstruction through measure-based image selection. THE IMAGING SCIENCE JOURNAL 2016. [DOI: 10.1080/13682199.2015.1104069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
34
|
Kwasnitschka T, Köser K, Sticklus J, Rothenbeck M, Weiß T, Wenzlaff E, Schoening T, Triebe L, Steinführer A, Devey C, Greinert J. DeepSurveyCam—A Deep Ocean Optical Mapping System. SENSORS 2016; 16:164. [PMID: 26828495 PMCID: PMC4801542 DOI: 10.3390/s16020164] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/01/2015] [Revised: 01/20/2016] [Accepted: 01/22/2016] [Indexed: 11/16/2022]
Abstract
Underwater photogrammetry, and in particular systematic visual surveying of the deep sea, is far less developed than comparable techniques on land or in space. The main challenges are the rough conditions with extremely high pressure, the accessibility of target areas (container and ship deployment of robust sensors, then diving for hours to the ocean floor), and the limitations of localization technologies (no GPS). The absence of natural light complicates energy budget considerations for deep-diving, flash-equipped drones. Refraction affects geometric image formation with respect to field of view and focus, while attenuation and scattering degrade radiometric image quality and limit effective visibility. To address these issues, we present an AUV-based optical system intended for autonomous visual mapping of large areas of the seafloor (square kilometers) in up to 6000 m water depth. We compare it to existing systems, discuss tradeoffs such as resolution versus mapped area, and present results from a recent deployment covering 90,000 square meters of deep ocean floor.
Collapse
Affiliation(s)
- Tom Kwasnitschka
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Kevin Köser
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Jan Sticklus
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Marcel Rothenbeck
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Tim Weiß
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Emanuel Wenzlaff
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Timm Schoening
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Lars Triebe
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Anja Steinführer
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Colin Devey
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| | - Jens Greinert
- GEOMAR Helmholtz Centre for Ocean Research Kiel, RD4/RD2, Wischhofstr. 1-3, 24148 Kiel, Germany.
| |
Collapse
|
35
|
Taneja A, Ballan L, Pollefeys M. Geometric change detection in urban environments using images. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2015; 37:2193-2206. [PMID: 26440261 DOI: 10.1109/tpami.2015.2404834] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We propose a method to detect changes in the geometry of a city using panoramic images captured by a car driving around the city. The proposed method can significantly optimize the process of updating the 3D model of an urban environment that changes over time, by restricting the update to only those areas where changes are detected. With this application in mind, we designed our algorithm to detect only structural changes in the environment, ignoring changes in appearance as well as changes that are irrelevant for update purposes, such as cars and people. The approach also accounts for the challenges of applying change detection at large scale: inaccuracies in the input geometry, errors in the geo-location data of the images, and the limited information available from sparse imagery. We evaluated our approach on a small-scale setup using high-resolution, densely captured images, and on a large-scale setup covering an entire city using the more realistic scenario of low-resolution, sparsely captured images. A quantitative evaluation was also conducted for the large-scale setup, which consists of 14,000 images.
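A common building block of such image-based change detection is warping one view into another through the existing 3D model and flagging persistent disagreement. The sketch below illustrates that step only (assumed pinhole cameras, grayscale images, and a model-rendered depth map; it is not the authors' pipeline):

```python
# Toy forward-warp through the current city model: regions where the warped
# appearance disagrees with the target view across many images are candidate
# geometric changes (appearance-only changes tend to be inconsistent).
import numpy as np

def warp_through_model(img_i, depth_i, K, T_ji):
    """Forward-warp view i into view j via model depth (nearest neighbor)."""
    h, w = depth_i.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], -1).reshape(-1, 3).T.astype(float)
    rays = (np.linalg.inv(K) @ pix) * depth_i.reshape(-1)   # back-project
    pts_j = T_ji[:3, :3] @ rays + T_ji[:3, 3:4]             # into view j's frame
    valid = pts_j[2] > 1e-6
    uv = (K @ pts_j)[:2, valid] / pts_j[2, valid]
    u, v = uv.round().astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros_like(img_i, dtype=float)
    out[v[ok], u[ok]] = img_i.reshape(-1)[valid][ok]
    return out  # compare with img_j; persistent residuals flag geometry changes
```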
Collapse
|
36
|
Cui H, Shen S, Gao W, Hu Z. Efficient Large-Scale Structure From Motion by Fusing Auxiliary Imaging Information. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2015; 24:3561-3573. [PMID: 26111397 DOI: 10.1109/tip.2015.2449557] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
One potentially effective means of large-scale 3D scene reconstruction is to reconstruct the scene in a global manner, rather than incrementally, by fully exploiting available auxiliary information on the imaging conditions, such as camera location from the Global Positioning System (GPS), orientation from an inertial measurement unit (or compass), and focal length from EXIF tags. However, such auxiliary information, though informative and valuable, is usually too noisy to be directly usable. In this paper, we present an approach that takes advantage of such noisy auxiliary information to improve structure-from-motion solving. More specifically, we introduce two effective iterative global optimization algorithms initialized with this noisy auxiliary information: a robust rotation averaging algorithm to deal with a contaminated epipolar graph, and a robust scene reconstruction algorithm to deal with noisy GPS data when initializing camera centers. We found that by exclusively focusing on the estimated inliers at the current iteration, the optimization process initialized by such noisy auxiliary information converges well and efficiently. Our proposed method is evaluated on real images captured by unmanned aerial vehicles, StreetView cars, and conventional digital cameras. Extensive experimental results show that our method performs comparably to or better than many state-of-the-art reconstruction approaches in terms of reconstruction accuracy and completeness, while being more efficient and scalable for large-scale image datasets.
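The inlier-focused spirit of the initialization can be conveyed with a Weiszfeld-style iteratively reweighted average, which progressively downweights outlying GPS measurements (a toy sketch of the robust-averaging idea, not the paper's algorithm):

```python
# Weiszfeld iteration (geometric median): an L1-robust alternative to the
# mean that tolerates grossly wrong GPS fixes when initializing a camera
# center. Outliers receive vanishing weight as the iteration proceeds.
import numpy as np

def robust_center(gps_points, iters=50, eps=1e-9):
    """gps_points: (N, 3) noisy position measurements for one camera."""
    x = gps_points.mean(axis=0)                  # start from the non-robust mean
    for _ in range(iters):
        d = np.linalg.norm(gps_points - x, axis=1) + eps
        w = 1.0 / d                              # L1 weights: outliers downweighted
        x = (w[:, None] * gps_points).sum(axis=0) / w.sum()
    return x
```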
Collapse
|
37
|
Ondrúška P, Kohli P, Izadi S. MobileFusion: real-time volumetric surface reconstruction and dense tracking on mobile phones. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2015; 21:1251-1258. [PMID: 26439826 DOI: 10.1109/tvcg.2015.2459902] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We present the first pipeline for real-time volumetric surface reconstruction and dense 6DoF camera tracking running purely on standard, off-the-shelf mobile phones. Using only the embedded RGB camera, our system allows users to scan objects of varying shape, size, and appearance in seconds, with real-time feedback during the capture process. Unlike existing state-of-the-art methods, which produce only point-based 3D models on the phone or require cloud-based processing, our hybrid GPU/CPU pipeline creates a connected 3D surface model directly on the device at 25 Hz. In each frame, we perform dense 6DoF tracking, which continuously registers the RGB input to the incrementally built 3D model by minimizing a noise-aware photoconsistency error metric. This is followed by efficient key-frame selection and dense per-frame stereo matching. The resulting depth maps are fused volumetrically using a method akin to KinectFusion, producing compelling surface models. For each frame, the implicit surface is extracted for live user feedback and pose estimation. We demonstrate scans of a variety of objects and compare to a Kinect-based baseline, showing an average error of ∼1.5 cm. We also qualitatively compare to a state-of-the-art point-based mobile phone method, demonstrating an order of magnitude faster scanning times and fully connected surface models.
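The KinectFusion-style fusion step mentioned above can be sketched as a running weighted average of truncated signed distances per voxel (a simplified CPU illustration with assumed grid conventions, not the authors' GPU pipeline):

```python
# Simplified TSDF fusion: each depth map updates a voxel grid by a running
# weighted average of truncated signed distances. Volumes must be
# C-contiguous float arrays so reshape(-1) yields writable views.
import numpy as np

def fuse_depth(tsdf, weight, depth, K, T_wc, origin, voxel, trunc=0.05):
    """Integrate one depth map into the (tsdf, weight) volume in place."""
    nx, ny, nz = tsdf.shape
    idx = np.stack(np.mgrid[0:nx, 0:ny, 0:nz], -1).reshape(-1, 3)
    pts_w = origin + voxel * idx                         # voxel centers (world)
    T_cw = np.linalg.inv(T_wc)
    pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]         # into camera frame
    z = pts_c[:, 2]
    zs = np.where(z > 1e-6, z, 1.0)                      # guard divide-by-zero
    u = np.round(K[0, 0] * pts_c[:, 0] / zs + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts_c[:, 1] / zs + K[1, 2]).astype(int)
    h, w = depth.shape
    ok = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[ok] = depth[v[ok], u[ok]]
    sdf = np.clip((d - z) / trunc, -1.0, 1.0)            # truncated SDF
    upd = ok & (d > 0) & ((d - z) > -trunc)              # skip far-behind voxels
    t, wt = tsdf.reshape(-1), weight.reshape(-1)
    t[upd] = (t[upd] * wt[upd] + sdf[upd]) / (wt[upd] + 1.0)
    wt[upd] += 1.0
```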
Collapse
|
38
|
Cloud-Based Geospatial 3D Image Spaces—A Powerful Urban Model for the Smart City. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2015. [DOI: 10.3390/ijgi4042267] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
39
|
Natour GE, Ait-Aider O, Rouveure R, Berry F, Faure P. Toward 3D reconstruction of outdoor scenes using an MMW radar and a monocular vision sensor. SENSORS 2015; 15:25937-67. [PMID: 26473874 PMCID: PMC4634451 DOI: 10.3390/s151025937] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/08/2015] [Revised: 09/10/2015] [Accepted: 09/22/2015] [Indexed: 12/01/2022]
Abstract
In this paper, we introduce a geometric method for 3D reconstruction of the exterior environment using a panoramic microwave radar and a camera. We rely on the complementarity of the two sensors: the radar's robustness to environmental conditions and ability to measure depth on the one hand, and the high spatial resolution of the vision sensor on the other. Firstly, geometric modeling of each sensor and of the entire system is presented. Secondly, we address the global calibration problem, which consists of finding the exact transformation between the sensors' coordinate systems. Two implementation methods are proposed and compared, based on the optimization of a non-linear criterion obtained from a set of radar-to-image target correspondences. Unlike existing methods, no special configuration of the 3D points is required for calibration, which makes the methods flexible and easy to use by a non-expert operator. Finally, we present a very simple yet robust 3D reconstruction method based on the sensors' geometry. This method reconstructs observed features in 3D from a single acquisition (static sensors), a capability not generally available in state-of-the-art outdoor scene reconstruction. The proposed methods have been validated with synthetic and real data.
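A minimal version of such a calibration criterion, finding the radar-to-camera rigid transform that minimizes the reprojection error of radar-detected targets, might look as follows (a sketch under assumed data layouts, not the authors' exact criterion):

```python
# Radar-to-camera extrinsic calibration as nonlinear least squares over a
# 6-DoF pose (rotation vector + translation), given radar target positions
# and their pixel observations in the image.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(x, pts_radar, pix, K):
    """Reprojection residuals for pose x = [rotvec(3), t(3)]."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    pc = pts_radar @ R.T + x[3:]                 # radar frame -> camera frame
    proj = pc @ K.T
    uv = proj[:, :2] / proj[:, 2:3]              # perspective division
    return (uv - pix).ravel()

def calibrate(pts_radar, pix, K):
    """pts_radar: (N, 3) targets in radar frame; pix: (N, 2) image observations."""
    sol = least_squares(residuals, np.zeros(6), args=(pts_radar, pix, K))
    return sol.x                                 # rotation vector and translation
```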
Collapse
Affiliation(s)
- Ghina El Natour
- Lasmea-UMR UBP-CNRS 6602, Université Blaise Pascal, Aubière 63170, France.
| | - Omar Ait-Aider
- Lasmea-UMR UBP-CNRS 6602, Université Blaise Pascal, Aubière 63170, France.
| | - Raphael Rouveure
- IRSTEA, Institut National de Recherche en Sciences et Technologies pour l'Environnement et l'Agriculture, Aubière 63170, France.
| | - François Berry
- Lasmea-UMR UBP-CNRS 6602, Université Blaise Pascal, Aubière 63170, France.
| | - Patrice Faure
- IRSTEA, Institut National de Recherche en Sciences et Technologies pour l'Environnement et l'Agriculture, Aubière 63170, France.
| |
Collapse
|
40
|
A Probabilistic Feature Map-Based Localization System Using a Monocular Camera. SENSORS 2015; 15:21636-59. [PMID: 26404284 PMCID: PMC4610567 DOI: 10.3390/s150921636] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/27/2015] [Revised: 08/23/2015] [Accepted: 08/24/2015] [Indexed: 11/17/2022]
Abstract
Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image datasets become available through the Internet, many studies have addressed estimating a location with a pre-built image-based 3D map. Most research groups use numerous image datasets that contain sufficient features. In contrast, this paper focuses on image-based localization when images and features are scarce. A more accurate localization method is proposed, based on a probabilistic map and 3D-to-2D matching correspondences between the map and a query image. The probabilistic feature map is generated in advance by probabilistically modeling the sensor system as well as the uncertainties of the camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm then refines this initial pose by minimizing the Mahalanobis distance errors between features from the query image and the map, improving accuracy. To verify that localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in both simulated and real environments.
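The two-stage scheme described here, PnP initialization followed by Mahalanobis-weighted refinement, can be sketched directly (the per-feature map covariances are assumed to be supplied as inverse square roots; names are illustrative):

```python
# PnP initialization plus covariance-weighted refinement: each 3D-to-2D
# reprojection residual is whitened by its inverse square-root covariance,
# so minimizing the sum of squares minimizes Mahalanobis distances.
import cv2
import numpy as np
from scipy.optimize import least_squares

def mahalanobis_residuals(x, pts3d, pts2d, cov_inv_sqrt, K):
    rvec, tvec = x[:3], x[3:]
    proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
    r = proj.reshape(-1, 2) - pts2d                     # raw pixel residuals
    return np.einsum('nij,nj->ni', cov_inv_sqrt, r).ravel()  # whitened

def localize(pts3d, pts2d, cov_inv_sqrt, K):
    """pts3d: (N, 3) map points; pts2d: (N, 2) query features; cov_inv_sqrt: (N, 2, 2)."""
    _, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, None)  # conventional PnP init
    x0 = np.concatenate([rvec.ravel(), tvec.ravel()])
    sol = least_squares(mahalanobis_residuals, x0,
                        args=(pts3d, pts2d, cov_inv_sqrt, K))
    return sol.x                                        # refined pose
```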
Collapse
|
41
|
Cheng F, Zhang H, Sun M, Yuan D. Cross-trees, Edge and Superpixel Priors-based Cost aggregation for Stereo matching. PATTERN RECOGNITION 2015; 48:2269-2278. [PMID: 26034314 PMCID: PMC4448781 DOI: 10.1016/j.patcog.2015.01.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
In this paper, we propose a novel cross-trees structure to perform non-local cost aggregation; the cross-trees consist of a horizontal tree and a vertical tree. Compared to other spanning trees, the key advantages of the cross-trees are that their construction is efficient and that the trees are exactly unique, since the construction is independent of any local or global property of the image itself. Additionally, two different priors, an edge prior and a superpixel prior, are proposed to suppress false cost aggregation across depth boundaries. Our method therefore comprises two algorithm variants, each combining the cross-trees with one prior. By traversing the two crossed trees successively, a fast non-local cost aggregation algorithm is performed twice to compute the aggregated cost volume. Performance evaluation on the 27 Middlebury datasets shows that both of our algorithms outperform the other two tree-based non-local methods, namely the minimum spanning tree (MST) and the segment-tree (ST).
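For intuition, exact non-local aggregation along a single chain (one row of a horizontal tree) reduces to a forward and a backward sweep; the sketch below applies it horizontally and then vertically, a deliberate simplification of the paper's trees that omits the edge and superpixel priors:

```python
# Simplified cross-trees aggregation: exact non-local aggregation along
# per-row chains (forward + backward sweep), applied on the horizontal and
# then the vertical axis. Costs and guide image are float arrays.
import numpy as np

def chain_aggregate(cost, guide, sigma=10.0):
    """S(i) = sum_j C(j) * prod(edge weights on the i..j path), per row."""
    fwd, bwd = cost.copy(), cost.copy()
    wgt = np.exp(-np.abs(np.diff(guide.astype(float), axis=1)) / sigma)
    for x in range(1, cost.shape[1]):            # sweep left to right
        fwd[:, x] += wgt[:, x - 1] * fwd[:, x - 1]
    for x in range(cost.shape[1] - 2, -1, -1):   # sweep right to left
        bwd[:, x] += wgt[:, x] * bwd[:, x + 1]
    return fwd + bwd - cost                      # both halves, center counted once

def cross_trees(cost_volume, guide, sigma=10.0):
    """Aggregate each disparity slice along the horizontal then vertical tree."""
    out = []
    for c in cost_volume:
        horiz = chain_aggregate(c, guide, sigma)
        out.append(chain_aggregate(horiz.T, guide.T, sigma).T)
    return np.stack(out)
```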
Collapse
Affiliation(s)
- Feiyang Cheng
- Image Research Center, Beihang University, Beijing, China
| | - Hong Zhang
- Image Research Center, Beihang University, Beijing, China
| | - Mingui Sun
- Department of Neurosurgery, University of Pittsburgh, Pittsburgh, USA
| | - Ding Yuan
- Image Research Center, Beihang University, Beijing, China
| |
Collapse
|
42
|
Automated 3D Scene Reconstruction from Open Geospatial Data Sources: Airborne Laser Scanning and a 2D Topographic Database. REMOTE SENSING 2015. [DOI: 10.3390/rs70606710] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
43
|
|
44
|
Zakharov AA, Barinov AE. An algorithm for 3D-object reconstruction from video using stereo correspondences. PATTERN RECOGNITION AND IMAGE ANALYSIS 2015. [DOI: 10.1134/s1054661815010228] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
45
|
Sensor fusion of cameras and a laser for city-scale 3D reconstruction. SENSORS 2014; 14:20882-909. [PMID: 25375758 PMCID: PMC4279516 DOI: 10.3390/s141120882] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/13/2014] [Revised: 09/18/2014] [Accepted: 09/18/2014] [Indexed: 11/17/2022]
Abstract
This paper presents a sensor fusion system of cameras and a 2D laser sensor for large-scale 3D reconstruction. The proposed system is designed to capture data from a fast-moving ground vehicle. The system consists of six cameras and one 2D laser sensor, synchronized by a hardware trigger. Reconstruction of 3D structures is done by estimating frame-by-frame motion and accumulating vertical laser scans, as in previous work. However, our approach does not assume near-2D motion; it estimates free motion (including absolute scale) in 3D space using both laser data and image features. To avoid the degeneracy associated with typical three-point algorithms, we present a new algorithm that selects 3D points from two frames captured by multiple cameras. The problem of error accumulation is solved by loop closing rather than by GPS. The experimental results show that the estimated path overlays accurately on satellite images, indicating that the reconstruction result is highly accurate.
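The scan-accumulation step is conceptually simple: each vertical 2D scan is lifted into 3D with the vehicle pose estimated for that frame (a sketch with assumed frame conventions; the scan plane is taken here as the sensor's x-z plane):

```python
# Accumulating vertical 2D laser scans along an estimated trajectory: each
# scan point is embedded in the sensor's scan plane and transformed by the
# per-frame world pose, yielding a metric city-scale point cloud.
import numpy as np

def accumulate_scans(scans, poses):
    """scans: list of (N, 2) in-plane points; poses: list of 4x4 T_world_sensor."""
    cloud = []
    for pts2d, T in zip(scans, poses):
        n = len(pts2d)
        pts = np.column_stack([pts2d[:, 0], np.zeros(n), pts2d[:, 1], np.ones(n)])
        cloud.append((pts @ T.T)[:, :3])         # homogeneous transform to world
    return np.vstack(cloud)
```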
Collapse
|
46
|
Kyriakaki G, Doulamis A, Doulamis N, Ioannides M, Makantasis K, Protopapadakis E, Hadjiprocopis A, Wenzel K, Fritsch D, Klein M, Weinlinger G. 4D Reconstruction of Tangible Cultural Heritage Objects from Web-Retrieved Images. INTERNATIONAL JOURNAL OF HERITAGE IN THE DIGITAL ERA 2014. [DOI: 10.1260/2047-4970.3.2.431] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]
Abstract
The number of digital images available online today has reached unprecedented levels. Recent statistics showed that by the end of 2013 there were over 250 billion photographs stored in just one of the major social media sites, with a daily average upload of 300 million photos. These photos, apart from documenting personal lives, often relate to experiences in well-known places of cultural interest across several periods of time. From the viewpoint of Cultural Heritage professionals, they thus constitute valuable and freely available digital cultural content. Advances in the fields of Photogrammetry and Computer Vision have led to significant breakthroughs such as the Structure from Motion algorithm, which creates 3D models of objects from their 2D photographs. The existence of powerful and affordable computational machinery enables the reconstruction not only of single structures such as artefacts, but also of entire cities. This paper presents an overview of our methodology for producing cost-effective 4D – i.e. in space and time – models of Cultural Heritage structures such as monuments and artefacts from 2D data (pictures, video) and semantic information freely available ‘in the wild’, i.e. in Internet repositories and social media. State-of-the-art methods from Computer Vision, Photogrammetry, 3D Reconstruction and Semantic representation are incorporated in an innovative workflow whose main goal is to enable historians, architects, archaeologists, urban planners and other cultural heritage professionals to reconstruct cost-effective views of historical structures out of the billions of free images floating around the web, and subsequently to interact with those reconstructions.
Collapse
Affiliation(s)
- Georgia Kyriakaki
- Technical University of Crete, University Campus, Kounoupidiana, Chania, Greece
| | - Anastasios Doulamis
- Technical University of Crete, University Campus, Kounoupidiana, Chania, Greece
| | - Nikolaos Doulamis
- Cyprus University of Technology, Dept. of Electrical and Computer Engineering, 30, Archbishop Kyprianou, Lemesos, Cyprus, 3036
| | - Marinos Ioannides
- Cyprus University of Technology, Dept. of Electrical and Computer Engineering, 30, Archbishop Kyprianou, Lemesos, Cyprus, 3036
| | | | | | - Andreas Hadjiprocopis
- Cyprus University of Technology, Dept. of Electrical and Computer Engineering, 30, Archbishop Kyprianou, Lemesos, Cyprus, 3036
| | - Konrad Wenzel
- Institute of Photogrammetry, University of Stuttgart, Stuttgart, Germany
| | - Dieter Fritsch
- Institute of Photogrammetry, University of Stuttgart, Stuttgart, Germany
| | - Michael Klein
- 7reasons Medien GmbH, Seefeldgasse 72, A-3462 Absdorf, Austria
| | | |
Collapse
|
47
|
Putting the User in the Loop for Image-Based Modeling. Int J Comput Vis 2014. [DOI: 10.1007/s11263-014-0704-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
48
|
Exploiting Visibility Information in Surface Reconstruction to Preserve Weakly Supported Surfaces. INTERNATIONAL SCHOLARLY RESEARCH NOTICES 2014; 2014:798595. [PMID: 27437454 PMCID: PMC4897344 DOI: 10.1155/2014/798595] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/12/2014] [Accepted: 05/03/2014] [Indexed: 11/18/2022]
Abstract
We present a novel method for 3D surface reconstruction from an input cloud of 3D points augmented with visibility information. We observe that it is possible to reconstruct surfaces that do not contain input points: instead of modeling the surface from the input points, we model free space from the visibility information of the input points. The complement of the modeled free space is considered full space, and the surface occurs at the interface between the free and the full space. We show that under certain conditions, a part of the full space surrounded by free space must contain a real object even when that object contains no input points; that is, an occluder reveals itself through occlusion. Our key contribution is a new interface classifier that can detect the occluder interface from the visibility of input points alone. We use the interface classifier to modify a state-of-the-art surface reconstruction method so that it gains the ability to reconstruct weakly supported surfaces. We evaluate the proposed method on datasets augmented with different levels of noise, undersampling, and amounts of outliers, and show that it outperforms other methods in accuracy and in its ability to reconstruct weakly supported surfaces.
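The core observation, that visibility rays carve out free space even where no input point lies, can be illustrated with a toy voxel-voting routine (grid layout and step size are assumed; the paper's actual classifier is considerably more sophisticated than this sketch):

```python
# Toy free-space carving from visibility: every voxel traversed by a
# camera-to-point ray receives a free-space vote, even if no reconstructed
# point falls inside it. The surface lies at the free/full interface.
import numpy as np

def free_space_votes(grid_shape, origin, voxel, cameras, points, step=0.5):
    """cameras, points: matched lists of (3,) camera centers and observed points."""
    votes = np.zeros(grid_shape, dtype=np.int32)
    for c, p in zip(cameras, points):            # one visibility ray per pair
        d = np.linalg.norm(p - c)
        for s in np.arange(0.0, d, step * voxel):
            q = ((c + s * (p - c) / d - origin) / voxel).astype(int)
            if np.all(q >= 0) and np.all(q < grid_shape):
                votes[tuple(q)] += 1             # ray traverses -> space is free
    return votes
```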
Collapse
|
49
|
Shen S, Hu Z. How to select good neighboring images in depth-map merging based 3D modeling. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2014; 23:308-318. [PMID: 24240002 DOI: 10.1109/tip.2013.2290597] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Depth-map merging based 3D modeling is an effective approach for reconstructing large-scale scenes from multiple images. In addition to generating a high-quality depth map for each image, selecting suitable neighboring images for each image is an important step in the reconstruction pipeline, yet one to which little attention has been paid in the literature. This paper tackles that issue for large-scale scene reconstruction, where many unordered images are captured under substantial variations of scale and view angle. We formulate neighboring image selection as a combinatorial optimization problem and use a quantum-inspired evolutionary algorithm to seek its optimal solution. Experimental results on a ground-truth dataset show that our approach significantly improves the quality of the depth maps as well as the final 3D reconstruction results, with high computational efficiency.
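A plain evolutionary variant of the combinatorial search conveys the idea (the paper uses a quantum-inspired formulation; `score` stands in for an assumed fitness such as the expected depth-map quality of a candidate neighbor set):

```python
# Simplified evolutionary search over k-of-n neighbor subsets, encoded as
# boolean masks: keep the fitter half each generation and mutate survivors
# by swapping one selected image for an unselected one.
import numpy as np

def select_neighbors(n_images, k, score, pop=40, gens=100, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    def random_mask():
        m = np.zeros(n_images, bool)
        m[rng.choice(n_images, k, replace=False)] = True
        return m
    population = [random_mask() for _ in range(pop)]
    for _ in range(gens):
        fitness = np.array([score(m) for m in population])
        keep = np.argsort(fitness)[-pop // 2:]           # keep the fitter half
        population = [population[i] for i in keep]
        while len(population) < pop:                     # mutate survivors
            child = population[rng.integers(len(population))].copy()
            on, off = np.flatnonzero(child), np.flatnonzero(~child)
            child[rng.choice(on)] = False                # swap one member out
            child[rng.choice(off)] = True                # and one in
            population.append(child)
    return population[int(np.argmax([score(m) for m in population]))]
```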
Collapse
|
50
|
Fang T, Wang Z, Zhang H, Quan L. Image-based modeling of unwrappable façades. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2013; 19:1720-1731. [PMID: 23929851 DOI: 10.1109/tvcg.2013.68] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
In this paper, we propose an unwrappable representation for image-based façade modeling from multiple registered images. An unwrappable façade is represented by a mutually orthogonal baseline and profile. We first reconstruct semi-dense 3D points from the images; the baseline and profile are then extracted from the point cloud to construct the base shape and to compose the textures of the building from the images. Through our unwrapping process, the reconstructed 3D points and composed textures are mapped to an unwrapped space parameterized by the baseline and profile. The unwrapped space thereby becomes equivalent to a planar space in which planar façade modeling techniques can be used to reconstruct the details of the building. Finally, the augmented details are wrapped back to the original 3D space to generate the final model. This newly introduced unwrappable representation extends state-of-the-art modeling for planar façades to a more general class of façades. We demonstrate the power of the unwrappable representation with several examples in which the façade is not planar.
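The unwrapping itself amounts to re-expressing each 3D point as (arc length along the baseline, height, offset from the baseline); below is a toy sketch under the assumption of a polyline baseline lying in the ground plane:

```python
# Toy unwrapping: project each point's ground-plane footprint onto the
# baseline polyline and re-express the point in (arc length, height, offset)
# coordinates, where the facade becomes approximately planar.
import numpy as np

def unwrap(points, baseline):
    """points: (N, 3) with z up; baseline: (M, 2) polyline in the ground plane."""
    seg = np.diff(baseline, axis=0)
    seg_len = np.linalg.norm(seg, axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])    # arc length at vertices
    out = np.empty_like(points, dtype=float)
    for i, p in enumerate(points):
        # project the footprint onto every segment, keep the closest one
        t = np.clip(np.einsum('ij,ij->i', p[:2] - baseline[:-1], seg)
                    / seg_len**2, 0.0, 1.0)
        foot = baseline[:-1] + t[:, None] * seg
        d = np.linalg.norm(p[:2] - foot, axis=1)
        j = np.argmin(d)
        out[i] = (cum[j] + t[j] * seg_len[j], p[2], d[j])
    return out
```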
Collapse
Affiliation(s)
- Tian Fang
- Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
| | | | | | | |
Collapse
|