1
Liu S, Fan J, Yang Y, Xiao D, Ai D, Song H, Wang Y, Yang J. Monocular endoscopy images depth estimation with multi-scale residual fusion. Comput Biol Med 2024; 169:107850. [PMID: 38145602 DOI: 10.1016/j.compbiomed.2023.107850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 11/16/2023] [Accepted: 12/11/2023] [Indexed: 12/27/2023]
Abstract
BACKGROUND Monocular depth estimation plays a fundamental role in clinical endoscopy surgery. However, the coherent illumination, smooth surfaces, and texture-less nature of endoscopy images present significant challenges to traditional depth estimation methods. Existing approaches struggle to accurately perceive depth in such settings. METHOD To overcome these challenges, this paper proposes a novel multi-scale residual fusion method for estimating the depth of monocular endoscopy images. Specifically, we address the issue of coherent illumination by leveraging image frequency domain component space transformation, thereby enhancing the stability of the scene's light source. Moreover, we employ an image radiation intensity attenuation model to estimate the initial depth map. Finally, to refine the accuracy of depth estimation, we utilize a multi-scale residual fusion optimization technique. RESULTS To evaluate the performance of our proposed method, extensive experiments were conducted on public datasets. The structural similarity measures for continuous frames in three distinct clinical data scenes reached impressive values of 0.94, 0.82, and 0.84, respectively. These results demonstrate the effectiveness of our approach in capturing the intricate details of endoscopy images. Furthermore, the depth estimation accuracy achieved remarkable levels of 89.3 % and 91.2 % for the two models' data, respectively, underscoring the robustness of our method. CONCLUSIONS Overall, the promising results obtained on public datasets highlight the significant potential of our method for clinical applications, facilitating reliable depth estimation and enhancing the quality of endoscopy surgical procedures.
Affiliation(s)
- Shiyuan Liu
  - Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China; China Center for Information Industry Development, Beijing, 100081, China
- Jingfan Fan
  - Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yun Yang
  - Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, National Clinical Research Center for Digestive Diseases, Beijing 100050, China
- Deqiang Xiao
  - Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
  - Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang
  - Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
  - Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
2
van Bokhorst QNE, Houwen BBSL, Hazewinkel Y, Fockens P, Dekker E. Advances in artificial intelligence and computer science for computer-aided diagnosis of colorectal polyps: current status. Endosc Int Open 2023; 11:E752-E767. [PMID: 37593158 PMCID: PMC10431975 DOI: 10.1055/a-2098-1999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2023] [Accepted: 05/08/2023] [Indexed: 08/19/2023] Open
Affiliation(s)
- Querijn N E van Bokhorst
  - Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, location Academic Medical Center, Amsterdam, the Netherlands
  - Amsterdam Gastroenterology Endocrinology Metabolism, Amsterdam, the Netherlands
- Britt B S L Houwen
  - Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, location Academic Medical Center, Amsterdam, the Netherlands
  - Amsterdam Gastroenterology Endocrinology Metabolism, Amsterdam, the Netherlands
- Yark Hazewinkel
  - Department of Gastroenterology and Hepatology, Tergooi Medical Center, Hilversum, the Netherlands
- Paul Fockens
  - Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, location Academic Medical Center, Amsterdam, the Netherlands
  - Amsterdam Gastroenterology Endocrinology Metabolism, Amsterdam, the Netherlands
- Evelien Dekker
  - Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, location Academic Medical Center, Amsterdam, the Netherlands
  - Amsterdam Gastroenterology Endocrinology Metabolism, Amsterdam, the Netherlands
3
Mathew A, Magerand L, Trucco E, Manfredi L. Self-supervised monocular depth estimation for high field of view colonoscopy cameras. Front Robot AI 2023; 10:1212525. [PMID: 37559569 PMCID: PMC10407791 DOI: 10.3389/frobt.2023.1212525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 06/26/2023] [Indexed: 08/11/2023] Open
Abstract
Optical colonoscopy is the gold standard procedure to detect colorectal cancer, the fourth most common cancer in the United Kingdom. Up to 22%-28% of polyps can be missed during the procedure that is associated with interval cancer. A vision-based autonomous soft endorobot for colonoscopy can drastically improve the accuracy of the procedure by inspecting the colon more systematically with reduced discomfort. A three-dimensional understanding of the environment is essential for robot navigation and can also improve the adenoma detection rate. Monocular depth estimation with deep learning methods has progressed substantially, but collecting ground-truth depth maps remains a challenge as no 3D camera can be fitted to a standard colonoscope. This work addresses this issue by using a self-supervised monocular depth estimation model that directly learns depth from video sequences with view synthesis. In addition, our model accommodates wide field-of-view cameras typically used in colonoscopy and specific challenges such as deformable surfaces, specular lighting, non-Lambertian surfaces, and high occlusion. We performed qualitative analysis on a synthetic data set, a quantitative examination of the colonoscopy training model, and real colonoscopy videos in near real-time.
Affiliation(s)
- Alwyn Mathew
  - Division of Imaging Science and Technology, School of Medicine, University of Dundee, Dundee, United Kingdom
- Ludovic Magerand
  - Discipline of Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Emanuele Trucco
  - Discipline of Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Luigi Manfredi
  - Division of Imaging Science and Technology, School of Medicine, University of Dundee, Dundee, United Kingdom
4
Liu Y, Zuo S. Self-supervised monocular depth estimation for gastrointestinal endoscopy. Computer Methods and Programs in Biomedicine 2023; 238:107619. [PMID: 37235969 DOI: 10.1016/j.cmpb.2023.107619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 04/26/2023] [Accepted: 05/18/2023] [Indexed: 05/28/2023]
Abstract
BACKGROUND AND OBJECTIVE Gastrointestinal (GI) endoscopy represents a promising tool for GI cancer screening. However, the limited field of view and the uneven skills of endoscopists make it difficult to accurately identify polyps and follow up on precancerous lesions under endoscopy. Estimating depth from GI endoscopic sequences is essential for a series of AI-assisted surgical techniques. Nonetheless, depth estimation for GI endoscopy is challenging because of the particularity of the environment and the limited datasets. In this paper, we propose a self-supervised monocular depth estimation method for GI endoscopy. METHODS A depth estimation network and a camera ego-motion estimation network are first constructed to obtain the depth and pose information of the sequence, respectively. The model is then trained in a self-supervised manner by computing a multi-scale structural similarity with L1 norm (MS-SSIM+L1) loss between the target frame and the reconstructed image as part of the training loss. The MS-SSIM+L1 loss function preserves high-frequency information and maintains invariance to brightness and color. Our model consists of a U-shaped convolutional network with a dual-attention mechanism, which helps capture multi-scale contextual information and greatly improves the accuracy of depth estimation. We compared our method qualitatively and quantitatively with different state-of-the-art methods. RESULTS AND CONCLUSIONS The experimental results show that our method generalizes better, achieving lower error metrics and higher accuracy metrics on both the UCL dataset and the Endoslam dataset. The proposed method has also been validated with clinical GI endoscopy, demonstrating the potential clinical value of the model.
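As a rough illustration of the loss described in the METHODS part of this abstract, the sketch below blends a structural-similarity term with an L1 term in plain Python. It is a simplification under stated assumptions, not the authors' implementation: SSIM is computed once from global image statistics instead of with local windows at multiple scales, the images are flat lists of intensities in [0, 1], and the names `ssim_global` and `ssim_l1_loss` and the weight `alpha=0.84` are hypothetical choices made for the example.

```python
from statistics import fmean


def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM from global statistics of two equal-length pixel lists.

    Assumption: one global window, not the multi-scale windowed SSIM of the abstract.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast/structure term
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def ssim_l1_loss(target, recon, alpha=0.84):
    """Weighted sum of (1 - SSIM) and mean absolute error; alpha is an assumed weight."""
    l1 = fmean([abs(a - b) for a, b in zip(target, recon)])
    return alpha * (1.0 - ssim_global(target, recon)) + (1.0 - alpha) * l1
```

In a self-supervised setup, `target` would be the observed frame and `recon` the frame synthesized from a neighbouring view using the predicted depth and pose; a perfect reconstruction drives both terms to zero.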
Affiliation(s)
- Yuying Liu
  - Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
- Siyang Zuo
  - Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
5
Alizadeh Naeini A, Sheikholeslami MM, Sohn G. An Adaptive Refinement Scheme for Depth Estimation Networks. Sensors (Basel, Switzerland) 2022; 22:9755. [PMID: 36560124 PMCID: PMC9786650 DOI: 10.3390/s22249755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 11/18/2022] [Accepted: 12/09/2022] [Indexed: 06/17/2023]
Abstract
Deep learning has proved to be a breakthrough in depth generation. However, the generalization ability of deep networks is still limited, and they cannot maintain satisfactory performance on some inputs. To address a similar problem in the segmentation field, a feature backpropagating refinement scheme (f-BRS) has been proposed to refine predictions at inference time. f-BRS adapts an intermediate activation function to each input by using user clicks as sparse labels. Given the similarity between user clicks and sparse depth maps, this paper aims to extend the application of f-BRS to depth prediction. Our experiments show that f-BRS, fused with a depth estimation baseline, is trapped in local optima and fails to improve the network predictions. To resolve that, we propose a double-stage adaptive refinement scheme (DARS). In the first stage, a Delaunay-based correction module significantly improves the depth generated by a baseline network. In the second stage, a particle swarm optimizer (PSO) refines the estimate by fine-tuning the f-BRS parameters, that is, scales and biases. DARS is evaluated on an outdoor benchmark, KITTI, and an indoor benchmark, NYUv2; in both cases, the network is pre-trained on KITTI. The proposed scheme was effective on both datasets.
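To make the second stage more concrete, the following sketch uses a small particle swarm to tune one scale and one bias so that a predicted depth list agrees with a few sparse depth samples. It is only a toy under stated assumptions: the scale and bias are applied directly to the output depth and not to intermediate activations as in f-BRS, the Delaunay-based first stage is omitted, and the function names, swarm size, search ranges, and coefficients (`w`, `c1`, `c2`) are all hypothetical.

```python
import random


def sparse_mse(params, pred, sparse):
    """Mean squared error of scale * pred + bias against sparse depth {index: depth}."""
    scale, bias = params
    return sum((scale * pred[i] + bias - d) ** 2 for i, d in sparse.items()) / len(sparse)


def pso_scale_bias(pred, sparse, n_particles=20, n_iters=100, seed=0):
    """Search (scale, bias) with a basic global-best particle swarm."""
    rng = random.Random(seed)
    # The first particle is the identity (scale=1, bias=0), so the result
    # can never be worse than the unrefined prediction.
    pos = [[1.0, 0.0]] + [
        [rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)] for _ in range(n_particles - 1)
    ]
    vel = [[0.0, 0.0] for _ in pos]
    best_pos = [p[:] for p in pos]
    best_val = [sparse_mse(p, pred, sparse) for p in pos]
    g = min(range(len(pos)), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (assumed)
    for _ in range(n_iters):
        for k, p in enumerate(pos):
            for d in range(2):
                vel[k][d] = (
                    w * vel[k][d]
                    + c1 * rng.random() * (best_pos[k][d] - p[d])
                    + c2 * rng.random() * (g_pos[d] - p[d])
                )
                p[d] += vel[k][d]
            val = sparse_mse(p, pred, sparse)
            if val < best_val[k]:  # update the particle's personal best
                best_pos[k], best_val[k] = p[:], val
                if val < g_val:  # update the swarm's global best
                    g_pos, g_val = p[:], val
    return g_pos, g_val
```

A population-based search like this needs no gradient, which is one plausible reason to prefer it when gradient-based refinement stalls in local optima, as the abstract reports for f-BRS.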
6
Gadipudi N, Elamvazuthi I, Lu CK, Paramasivam S, Su S. WPO-Net: Windowed Pose Optimization Network for Monocular Visual Odometry Estimation. Sensors 2021; 21:s21238155. [PMID: 34884156 PMCID: PMC8662456 DOI: 10.3390/s21238155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2021] [Revised: 11/16/2021] [Accepted: 11/21/2021] [Indexed: 12/26/2022]
Abstract
Visual odometry is the process of estimating incremental localization of the camera in 3-dimensional space for autonomous driving. There have been new learning-based methods which do not require camera calibration and are robust to external noise. In this work, a new method that does not require camera calibration, called the "windowed pose optimization network", is proposed to estimate the 6-degrees-of-freedom pose of a monocular camera. The architecture of the proposed network is based on supervised learning-based methods, with a feature encoder and a pose regressor that takes multiple consecutive stacks of two grayscale images at each step for training and enforces the composite pose constraints. The KITTI dataset is used to evaluate the performance of the proposed method. The proposed method yielded a rotational error of 3.12 deg/100 m; the training time is 41.32 ms, while the inference time is 7.87 ms. Experiments demonstrate that the proposed method is competitive with other state-of-the-art related works, which shows the novelty of the proposed technique.
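One way to read "composite pose constraints" is that relative poses estimated inside a window should chain consistently: the pose from frame 0 to 1 composed with the pose from frame 1 to 2 should match the pose estimated directly from frame 0 to 2. The sketch below measures that mismatch with 4x4 homogeneous matrices. It is an interpretation under stated assumptions, not the paper's loss: the helper names are hypothetical, rotation is restricted to yaw for brevity, and the multiplication order `t01 @ t12` depends on a pose convention the abstract does not specify.

```python
import math


def make_pose(yaw, tx, ty, tz):
    """4x4 homogeneous transform: rotation about z by yaw (radians), then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a @ b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def composite_residual(t01, t12, t02):
    """Frobenius norm of (t01 @ t12 - t02); zero when the three poses are consistent."""
    pred = compose(t01, t12)
    return math.sqrt(
        sum((pred[i][j] - t02[i][j]) ** 2 for i in range(4) for j in range(4))
    )
```

During training, a residual of this kind could be added to the per-pair pose error so that drift accumulated across the window is penalized as well.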
Affiliation(s)
- Nivesh Gadipudi
  - Smart Assistive and Rehabilitative Technology (SMART) Research Group, Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Bandar Seri Iskandar 32610, Malaysia
- Irraivan Elamvazuthi
  - Smart Assistive and Rehabilitative Technology (SMART) Research Group, Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Bandar Seri Iskandar 32610, Malaysia
  - Correspondence:
- Cheng-Kai Lu
  - Smart Assistive and Rehabilitative Technology (SMART) Research Group, Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Bandar Seri Iskandar 32610, Malaysia
- Steven Su
  - School of Biomedical Engineering, University of Technology Sydney, Ultimo 2007, Australia
7
Zhuang H, Zhang J, Liao F. A systematic review on application of deep learning in digestive system image processing. The Visual Computer 2021; 39:2207-2222. [PMID: 34744231 PMCID: PMC8557108 DOI: 10.1007/s00371-021-02322-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/30/2021] [Indexed: 05/07/2023]
Abstract
With the advent of the big data era, the application of artificial intelligence represented by deep learning in medicine has become a hot topic. In gastroenterology, deep learning has achieved remarkable results in endoscopy, imageology, and pathology. Artificial intelligence has been applied to benign gastrointestinal tract lesions, early cancer, tumors, inflammatory bowel diseases, and diseases of the liver, pancreas, and other organs. Computer-aided diagnosis significantly improves diagnostic accuracy, reduces physicians' workload, and provides evidence for clinical diagnosis and treatment. In the near future, artificial intelligence will have high application value in the field of medicine. This paper mainly summarizes the latest research on artificial intelligence in diagnosing and treating digestive system diseases and discusses the future of artificial intelligence in digestive system diseases. We sincerely hope that our work can become a stepping stone for gastroenterologists and computer experts in artificial intelligence research and facilitate the application and development of computer-aided image processing technology in gastroenterology.
Affiliation(s)
- Huangming Zhuang
  - Gastroenterology Department, Renmin Hospital of Wuhan University, Wuhan, 430060 Hubei China
- Jixiang Zhang
  - Gastroenterology Department, Renmin Hospital of Wuhan University, Wuhan, 430060 Hubei China
- Fei Liao
  - Gastroenterology Department, Renmin Hospital of Wuhan University, Wuhan, 430060 Hubei China
8
SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches. Sensors 2021; 21:s21165476. [PMID: 34450917 PMCID: PMC8398641 DOI: 10.3390/s21165476] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/30/2021] [Revised: 08/06/2021] [Accepted: 08/09/2021] [Indexed: 01/17/2023]
Abstract
Monocular depth estimation based on unsupervised learning has attracted great attention due to the rising demand for lightweight monocular vision sensors. Inspired by multi-task learning, semantic information has been used to improve the monocular depth estimation models. However, multi-task learning is still limited by multi-type annotations. As far as we know, there are scarcely any large public datasets that provide all the necessary information. Therefore, we propose a novel network architecture Semantic-Feature-Aided Monocular Depth Estimation Network (SFA-MDEN) to extract multi-resolution depth features and semantic features, which are merged and fed into the decoder, with the goal of predicting depth with the support of semantics. Instead of using loss functions to relate the semantics and depth, the fusion of feature maps for semantics and depth is employed to predict the monocular depth. Therefore, two accessible datasets with similar topics for depth estimation and semantic segmentation can meet the requirements of SFA-MDEN for training sets. We explored the performance of the proposed SFA-MDEN with experiments on different datasets, including KITTI, Make3D, and our own dataset BHDE-v1. The experimental results demonstrate that SFA-MDEN achieves competitive accuracy and generalization capacity compared to state-of-the-art methods.