101. Chu Y, Yang X, Li H, Ai D, Ding Y, Fan J, Song H, Yang J. Multi-level feature aggregation network for instrument identification of endoscopic images. Phys Med Biol 2020; 65:165004. [PMID: 32344381] [DOI: 10.1088/1361-6560/ab8dda]
Abstract
Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features at the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network: object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoints of instrument parts. The experiments are performed on laparoscopic images from the MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of pose estimation reach 67.1% and 55.7%, respectively. The experiments demonstrate that our method efficiently improves instrument recognition accuracy in endoscopic images and outperforms other state-of-the-art methods.
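A minimal sketch of the "global feature augmentation" idea described above, assuming a squeeze-and-broadcast style context injection on the top backbone level; the module and variable names are hypothetical and this is not the authors' MLFA-Net code.

```python
# Sketch only: global context from the top backbone level is broadcast back
# into the feature flow to strengthen high-level semantics.
import torch
import torch.nn as nn

class GlobalFeatureAugmentation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze spatial dims to 1x1
        self.proj = nn.Conv2d(channels, channels, 1)  # re-weight the global context per channel

    def forward(self, top_feature: torch.Tensor) -> torch.Tensor:
        context = self.proj(self.pool(top_feature))   # (N, C, 1, 1) global context vector
        return top_feature + context                  # broadcast-add back into the feature flow

feat = torch.randn(2, 256, 16, 16)                    # dummy top-level feature map
print(GlobalFeatureAugmentation(256)(feat).shape)     # torch.Size([2, 256, 16, 16])
```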
Affiliation(s)
- Yakui Chu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, People's Republic of China. Authors contributed equally to this article.
102. Ward TM, Hashimoto DA, Ban Y, Rattner DW, Inoue H, Lillemoe KD, Rus DL, Rosman G, Meireles OR. Automated operative phase identification in peroral endoscopic myotomy. Surg Endosc 2020; 35:4008-4015. [PMID: 32720177] [DOI: 10.1007/s00464-020-07833-9]
Abstract
BACKGROUND Artificial intelligence (AI) and computer vision (CV) have revolutionized image analysis. In surgery, CV applications have focused on surgical phase identification in laparoscopic videos. We proposed to apply CV techniques to identify phases in an endoscopic procedure, peroral endoscopic myotomy (POEM). METHODS POEM videos were collected from Massachusetts General and Showa University Koto Toyosu Hospitals. Videos were labeled by surgeons with the following ground truth phases: (1) Submucosal injection, (2) Mucosotomy, (3) Submucosal tunnel, (4) Myotomy, and (5) Mucosotomy closure. The deep learning CV model, a convolutional neural network (CNN) plus long short-term memory (LSTM) network, was trained on 30 videos to create POEMNet. We then used POEMNet to identify operative phases in the remaining 20 videos. The model's performance was compared to surgeon-annotated ground truth. RESULTS POEMNet's overall phase identification accuracy was 87.6% (95% CI 87.4-87.9%). When evaluated on a per-phase basis, the model performed well, with mean unweighted and prevalence-weighted F1 scores of 0.766 and 0.875, respectively. The model performed best with longer phases, with 70.6% accuracy for phases that had a duration under 5 min and 88.3% accuracy for longer phases. DISCUSSION A deep-learning-based approach to CV, previously successful in laparoscopic video phase identification, translates well to endoscopic procedures. With continued refinements, AI could contribute to intra-operative decision-support systems and post-operative risk prediction.
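A minimal sketch of the CNN-plus-LSTM phase classifier pattern described above: a per-frame CNN embedding is fed to an LSTM that emits one of the 5 POEM phases per frame. The backbone, hidden size and other hyper-parameters are illustrative assumptions, not the published POEMNet.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PhaseNet(nn.Module):
    def __init__(self, num_phases: int = 5, hidden: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                    # 512-d per-frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        n, t, c, h, w = clip.shape
        feats = self.cnn(clip.view(n * t, c, h, w)).view(n, t, -1)
        out, _ = self.lstm(feats)                      # temporal context across frames
        return self.head(out)                          # (N, T, num_phases) logits

logits = PhaseNet()(torch.randn(1, 8, 3, 224, 224))    # dummy 8-frame clip
print(logits.shape)                                    # torch.Size([1, 8, 5])
```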
Affiliation(s)
- Thomas M Ward
- Surgical AI and Innovation Laboratory, Massachusetts General Hospital, 15 Parkman St., WAC 460, Boston, MA, 02114, USA.
- Department of Surgery, Massachusetts General Hospital, Boston, MA, USA.
- Daniel A Hashimoto
- Surgical AI and Innovation Laboratory, Massachusetts General Hospital, 15 Parkman St., WAC 460, Boston, MA, 02114, USA
- Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Yutong Ban
- Surgical AI and Innovation Laboratory, Massachusetts General Hospital, 15 Parkman St., WAC 460, Boston, MA, 02114, USA
- Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- David W Rattner
- Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Haruhiro Inoue
- Digestive Disease Center, Showa University Koto Toyosu Hospital, Tokyo, Japan
- Keith D Lillemoe
- Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Daniela L Rus
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Guy Rosman
- Surgical AI and Innovation Laboratory, Massachusetts General Hospital, 15 Parkman St., WAC 460, Boston, MA, 02114, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ozanan R Meireles
- Surgical AI and Innovation Laboratory, Massachusetts General Hospital, 15 Parkman St., WAC 460, Boston, MA, 02114, USA
- Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
103. Tanzi L, Piazzolla P, Vezzetti E. Intraoperative surgery room management: A deep learning perspective. Int J Med Robot 2020; 16:1-12. [PMID: 32510857] [DOI: 10.1002/rcs.2136]
Abstract
PURPOSE The current study aimed to systematically review the literature addressing the use of deep learning (DL) methods in intraoperative surgery applications, focusing on the data collection, the objectives of these tools and, more technically, the DL-based paradigms utilized. METHODS A literature search with classic databases was performed: we identified, with the use of specific keywords, a total of 996 papers. Among them, we selected 52 for effective analysis, focusing on articles published after January 2015. RESULTS The preliminary results of the implementation of DL in the clinical setting are encouraging. Almost all surgery sub-fields have seen the advent of artificial intelligence (AI) applications, and the results outperformed previous techniques in the majority of cases. From these results, a conceptualization of an intelligent operating room (IOR) is also presented. CONCLUSION This evaluation outlined how AI and, in particular, DL are revolutionizing the surgical field, with numerous applications such as context detection and room management. This process is evolving year by year into the realization of an IOR, equipped with technologies perfectly suited to drastically improve the surgical workflow.
104. LRTD: long-range temporal dependency based active learning for surgical workflow recognition. Int J Comput Assist Radiol Surg 2020; 15:1573-1584. [PMID: 32588246] [DOI: 10.1007/s11548-020-02198-9]
Abstract
PURPOSE Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing deep learning approaches have achieved remarkable performance on the analysis of surgical videos; however, they rely heavily on large-scale labelled datasets. Unfortunately, annotations are often not available in abundance, because they require the domain knowledge of surgeons, and even for experts it is tedious and time-consuming to produce a sufficient amount of annotations. METHODS In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network, which introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within each clip. By ranking these scores across the unlabelled data pool, we select the clips with weak dependencies for annotation, as these are the most informative ones for network training. RESULTS We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task. With our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods that only consider neighbor-frame information. Using only up to 50% of samples, our approach can exceed the performance of full-data training. CONCLUSION By modeling the intra-clip dependency, our LRTD-based strategy shows a stronger capability to select informative video clips for annotation than other active learning methods, as evaluated on a popular public surgical dataset. The results also show the promising potential of our framework for reducing the annotation workload in clinical practice.
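A loose sketch of a non-local (self-attention) block over a clip's frame features together with a crude intra-clip dependency proxy; the names and the scoring rule are assumptions for illustration, not the authors' LRTD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalTemporal(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.theta = nn.Linear(dim, dim)
        self.phi = nn.Linear(dim, dim)
        self.g = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor):
        # x: (T, dim) per-frame features of one clip
        raw = self.theta(x) @ self.phi(x).t()            # (T, T) pairwise frame affinities
        attn = F.softmax(raw, dim=-1)
        out = x + attn @ self.g(x)                       # long-range temporal aggregation
        score = raw.mean().item()                        # crude intra-clip dependency proxy
        return out, score

feats = torch.randn(16, 128)                             # 16 frames, 128-d features (toy)
_, dependency = NonLocalTemporal(128)(feats)
print(dependency)                                        # low-dependency clips would be annotated first
```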
105. Chen JH, Tseng YJ. Different molecular enumeration influences in deep learning: an example using aqueous solubility. Brief Bioinform 2020; 22:bbaa092. [PMID: 32501508] [DOI: 10.1093/bib/bbaa092]
Abstract
Aqueous solubility is the key property driving many chemical and biological phenomena and impacts experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential and challenging, even with modern computational algorithms. Fingerprint-based, feature-based and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction. It has been clearly demonstrated that different molecular representations impact the model prediction and explainability. In this work, we reviewed different representations and also focused on using graph and line notations for modeling. In general, one canonical chemical structure is used to represent one molecule when computing its properties. We carefully examined the commonly used simplified molecular-input line-entry specification (SMILES) notation representing a single molecule and proposed to use the full enumeration of SMILES to achieve better accuracy. A convolutional neural network (CNN) was used. The full enumeration of SMILES can improve the representation of a molecule and describe the molecule from all possible angles. This CNN model can be very robust when dealing with large datasets, since no additional explicit chemistry knowledge is necessary to predict the solubility. Also, it is traditionally hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the parts of a molecule that are relevant to solubility, which can be used to explain the contribution from the CNN.
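A small sketch of SMILES enumeration using RDKit's randomized SMILES writer (an assumption about tooling; the paper's own enumeration pipeline may differ), showing how one molecule yields several equivalent, non-canonical strings that could all be fed to a CNN.

```python
from rdkit import Chem

def enumerate_smiles(smiles: str, n: int = 5) -> list[str]:
    mol = Chem.MolFromSmiles(smiles)
    # doRandom=True starts the SMILES walk from a random atom on each call,
    # producing different but chemically equivalent strings for the same molecule.
    variants = {Chem.MolToSmiles(mol, canonical=False, doRandom=True) for _ in range(n)}
    return sorted(variants)

print(enumerate_smiles("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin, several equivalent SMILES
```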
106. Kitaguchi D, Takeshita N, Matsuzaki H, Oda T, Watanabe M, Mori K, Kobayashi E, Ito M. Automated laparoscopic colorectal surgery workflow recognition using artificial intelligence: Experimental research. Int J Surg 2020; 79:88-94. [PMID: 32413503] [DOI: 10.1016/j.ijsu.2020.05.015]
Abstract
BACKGROUND Identifying laparoscopic surgical videos using artificial intelligence (AI) facilitates the automation of several currently time-consuming manual processes, including video analysis, indexing, and video-based skill assessment. This study aimed to construct a large annotated dataset comprising laparoscopic colorectal surgery (LCRS) videos from multiple institutions and evaluate the accuracy of automatic recognition for surgical phase, action, and tool by combining this dataset with AI. MATERIALS AND METHODS A total of 300 intraoperative videos were collected from 19 high-volume centers. The surgical workflow was classified into 9 phases and 3 actions, and the areas of 5 tools were annotated by painting. More than 82 million frames were annotated for the phase and action classification task, and 4000 frames were annotated for the tool segmentation task. Of these frames, 80% were used for the training dataset and 20% for the test dataset. A convolutional neural network (CNN) was used to analyze the videos. Intersection over union (IoU) was used as the evaluation metric for tool recognition. RESULTS The overall accuracies for the automatic surgical phase and action classification tasks were 81.0% and 83.2%, respectively. The mean IoU for the automatic tool segmentation task over the 5 tools was 51.2%. CONCLUSIONS A large annotated dataset of LCRS videos was constructed, and the phase, action, and tool were recognized with high accuracy using AI. Our dataset has potential uses in medical applications such as automatic video indexing and surgical skill assessment. Open research will assist in improving CNN models by making our dataset available in the field of computer vision.
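For reference, a minimal sketch of the intersection-over-union (IoU) metric used above for tool segmentation, computed on binary masks; illustrative code, not the study's evaluation script.

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:                        # both masks empty: define IoU as 1
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

a = np.zeros((4, 4)); a[:2, :2] = 1       # toy predicted mask
b = np.zeros((4, 4)); b[:2, :] = 1        # toy ground-truth mask
print(iou(a, b))                          # 0.5
```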
Affiliation(s)
- Daichi Kitaguchi
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan; Department of Colorectal Surgery, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan; Department of Gastrointestinal and Hepato-Biliary-Pancreatic Surgery, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8575, Japan
- Nobuyoshi Takeshita
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan; Department of Colorectal Surgery, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan.
- Hiroki Matsuzaki
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Tatsuya Oda
- Department of Gastrointestinal and Hepato-Biliary-Pancreatic Surgery, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8575, Japan
- Masahiko Watanabe
- Department of Surgery, Kitasato University School of Medicine, 1-15-1 Kitasato, Minami-ku, Sagamihara, Kanagawa, 252-0374, Japan
- Kensaku Mori
- Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
- Etsuko Kobayashi
- Institute of Advanced Biomedical Engineering and Science, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku-ku, Tokyo, 162-8666, Japan
- Masaaki Ito
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan; Department of Colorectal Surgery, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan.
107. Bano S, Vasconcelos F, Vander Poorten E, Vercauteren T, Ourselin S, Deprest J, Stoyanov D. FetNet: a recurrent convolutional network for occlusion identification in fetoscopic videos. Int J Comput Assist Radiol Surg 2020; 15:791-801. [PMID: 32350787] [PMCID: PMC7261278] [DOI: 10.1007/s11548-020-02169-0]
Abstract
PURPOSE Fetoscopic laser photocoagulation is a minimally invasive surgery for the treatment of twin-to-twin transfusion syndrome (TTTS). By using a lens/fibre-optic scope inserted into the amniotic cavity, the abnormal placental vascular anastomoses are identified and ablated to regulate blood flow to both fetuses. A limited field-of-view, occlusions due to the fetus and low visibility make it difficult to identify all vascular anastomoses. Automatic computer-assisted techniques may provide a better understanding of the anatomical structure during surgery for risk-free laser photocoagulation and may facilitate improving mosaics from fetoscopic videos. METHODS We propose FetNet, a combined convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural network architecture for the spatio-temporal identification of fetoscopic events. We adapt an existing CNN architecture for spatial feature extraction and integrate it with the LSTM network for end-to-end spatio-temporal inference. We introduce differential learning rates during model training to effectively utilise the pre-trained CNN weights. This may support computer-assisted interventions (CAI) during fetoscopic laser photocoagulation. RESULTS We performed a quantitative evaluation of our method using 7 in vivo fetoscopic videos captured from different human TTTS cases. The total duration of these videos was 5551 s (138,780 frames). To test the robustness of the proposed approach, we performed 7-fold cross-validation where each video was treated as a hold-out test set and training was performed using the remaining videos. CONCLUSION FetNet achieved superior performance compared with existing CNN-based methods and provided improved inference because of the spatio-temporal information modelling. Online testing of FetNet, using a Tesla V100-DGXS-32GB GPU, achieved a frame rate of 114 fps. These results show that our method could potentially provide a real-time solution for CAI and automate occlusion and photocoagulation identification during fetoscopic procedures.
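A minimal sketch of the "differential learning rates" idea: the pre-trained CNN encoder receives a smaller learning rate than the newly initialised recurrent layers. This is a generic PyTorch pattern with assumed values, not the FetNet training code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

cnn = resnet18(weights=None)
cnn.fc = nn.Identity()                               # stand-in for a pre-trained encoder
lstm = nn.LSTM(512, 128, batch_first=True)
head = nn.Linear(128, 4)                             # e.g. 4 fetoscopic event classes (assumed)

optimizer = torch.optim.Adam([
    {"params": cnn.parameters(),  "lr": 1e-5},       # gentle updates for pre-trained weights
    {"params": lstm.parameters(), "lr": 1e-3},       # larger steps for the new recurrent layers
    {"params": head.parameters(), "lr": 1e-3},
])
print([g["lr"] for g in optimizer.param_groups])     # [1e-05, 0.001, 0.001]
```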
Affiliation(s)
- Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Tom Vercauteren
- School of Biomedical Engineering and Imaging Sciences, King’s College London, London, UK
- Sebastien Ourselin
- School of Biomedical Engineering and Imaging Sciences, King’s College London, London, UK
- Jan Deprest
- Department of Development and Regeneration, University Hospital Leuven, Leuven, Belgium
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
108. Mohamadipanah H, Perrone KH, Peterson K, Nathwani J, Huang F, Garren A, Garren M, Witt A, Pugh C. Sensors and Psychomotor Metrics: A Unique Opportunity to Close the Gap on Surgical Processes and Outcomes. ACS Biomater Sci Eng 2020; 6:2630-2640. [DOI: 10.1021/acsbiomaterials.9b01019]
Affiliation(s)
- Hossein Mohamadipanah
- Department of Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, California 94305, United States
- Kenneth H. Perrone
- Department of Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, California 94305, United States
- Katherine Peterson
- Department of Surgery, School of Medicine and Public Health, University of Wisconsin-Madison, 600 Highland Avenue, Madison, Wisconsin 53726, United States
- Jay Nathwani
- Department of Surgery, School of Medicine and Public Health, University of Wisconsin-Madison, 600 Highland Avenue, Madison, Wisconsin 53726, United States
- Felix Huang
- Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, 710 North Lake Shore Drive, #1022, Chicago, Illinois 60611, United States
- Anna Garren
- Department of Surgery, School of Medicine and Public Health, University of Wisconsin-Madison, 600 Highland Avenue, Madison, Wisconsin 53726, United States
- Margaret Garren
- Department of Surgery, School of Medicine and Public Health, University of Wisconsin-Madison, 600 Highland Avenue, Madison, Wisconsin 53726, United States
- Anna Witt
- Department of Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, California 94305, United States
- Carla Pugh
- Department of Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, California 94305, United States
109. Yue Z, Ding S, Zhao W, Wang H, Ma J, Zhang Y, Zhang Y. Automatic CIN Grades Prediction of Sequential Cervigram Image Using LSTM With Multistate CNN Features. IEEE J Biomed Health Inform 2020; 24:844-854. [DOI: 10.1109/jbhi.2019.2922682]
110. Czempiel T, Paschali M, Keicher M, Simson W, Feussner H, Kim ST, Navab N. TeCNO: Surgical Phase Recognition with Multi-stage Temporal Convolutional Networks. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. [DOI: 10.1007/978-3-030-59716-0_33]
111. Jia X, Xing X, Yuan Y, Xing L, Meng MQH. Wireless Capsule Endoscopy: A New Tool for Cancer Screening in the Colon With Deep-Learning-Based Polyp Recognition. Proceedings of the IEEE 2020; 108:178-197. [DOI: 10.1109/jproc.2019.2950506]
112. Real-time automatic surgical phase recognition in laparoscopic sigmoidectomy using the convolutional neural network-based deep learning approach. Surg Endosc 2019; 34:4924-4931. [PMID: 31797047] [DOI: 10.1007/s00464-019-07281-0]
Abstract
BACKGROUND Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted surgery (CA-CAS) systems. However, automatic surgical phase recognition focused on colorectal surgery has not been reported. We aimed to develop a deep learning model for automatic surgical phase recognition based on laparoscopic sigmoidectomy (Lap-S) videos, which could be used for real-time phase recognition, and to clarify the accuracies of automatic surgical phase and action recognition using visual information. METHODS The dataset used contained 71 cases of Lap-S. The video data were divided into static images at 1/30-s intervals. Every Lap-S video was manually divided into 11 surgical phases (Phases 0-10) and manually annotated for each surgical action on every frame. The model was generated based on the training data, and validation of the model was performed on a set of unseen test data. Convolutional neural network (CNN)-based deep learning was used. RESULTS The average surgical time was 175 min (± 43 min SD), with the individual surgical phases also showing high variation in duration between cases. Each surgery started in the first phase (Phase 0) and ended in the last phase (Phase 10), and phase transitions occurred 14 (± 2 SD) times per procedure on average. The accuracy of automatic surgical phase recognition was 91.9%, and the accuracies of automatic surgical action recognition for extracorporeal action and irrigation were 89.4% and 82.5%, respectively. Moreover, this system could perform real-time automatic surgical phase recognition at 32 fps. CONCLUSIONS The CNN-based deep learning approach enabled the recognition of surgical phases and actions in 71 Lap-S cases based on manually annotated data. This system could perform automatic surgical phase recognition and automatic target surgical action recognition with high accuracy. Moreover, this study showed the feasibility of real-time automatic surgical phase recognition with a high frame rate.
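A minimal sketch of per-frame, real-time phase inference on a laparoscopic video stream, in the spirit of the system described above; the model, the 11-class head and the video path are placeholders, not the study's actual implementation.

```python
import cv2
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.models import resnet50

model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 11)      # 11 phases (Phases 0-10)
model.eval()

prep = transforms.Compose([
    transforms.ToPILImage(), transforms.Resize((224, 224)), transforms.ToTensor()
])

cap = cv2.VideoCapture("lap_s_case.mp4")            # hypothetical video file
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    x = prep(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).unsqueeze(0)
    with torch.no_grad():
        phase = model(x).argmax(1).item()           # predicted phase for this frame
    print("phase:", phase)
cap.release()
```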
113. Jin Y, Li H, Dou Q, Chen H, Qin J, Fu CW, Heng PA. Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med Image Anal 2019; 59:101572. [PMID: 31639622] [DOI: 10.1016/j.media.2019.101572]
Abstract
Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, as well as essential components of various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice, as the surgical process is typically well-defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method by developing a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) to exploit their relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting specific tasks. Given that temporal information is crucial for phase recognition, long short-term memory (LSTM) is explored to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of predictions from the two branches. Mutually leveraging both low-level feature sharing and high-level prediction correlating, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, and hence can bring benefits to each other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate the outstanding performance of our proposed method, consistently exceeding the state-of-the-art methods by a large margin, e.g., 89.1% vs. 81.0% for mAP in tool presence detection and 87.4% vs. 84.5% for F1 score in phase recognition.
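A loose sketch of the two-branch multi-task idea with a correlation-style term: a shared encoder, a tool-presence head, an LSTM phase head, and a penalty that pushes directly predicted tool probabilities and tool probabilities derived from the phase logits (via a small learned mapping) to agree. All shapes, sizes and the exact form of the penalty are assumptions, not the published MTRCNet-CL.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, feat_dim=512, num_tools=7, num_phases=7):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU())
        self.tool_head = nn.Linear(feat_dim, num_tools)         # multi-label tool presence
        self.lstm = nn.LSTM(feat_dim, 128, batch_first=True)
        self.phase_head = nn.Linear(128, num_phases)
        self.phase_to_tool = nn.Linear(num_phases, num_tools)   # learned phase-to-tool mapping

    def forward(self, clip):                                    # clip: (N, T, 3, 64, 64)
        n, t = clip.shape[:2]
        feats = self.encoder(clip.view(n * t, -1)).view(n, t, -1)
        tool_logits = self.tool_head(feats)
        phase_logits = self.phase_head(self.lstm(feats)[0])
        corr = F.mse_loss(torch.sigmoid(tool_logits),
                          torch.sigmoid(self.phase_to_tool(phase_logits)))
        return tool_logits, phase_logits, corr                  # corr is added to the task losses

tools, phases, corr = MultiTaskNet()(torch.randn(2, 4, 3, 64, 64))
print(tools.shape, phases.shape, corr.item())
```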
Affiliation(s)
- Yueming Jin
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Huaxia Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Qi Dou
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China.
- Hao Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Jing Qin
- Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, China
- Chi-Wing Fu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
114. Novel evaluation of surgical activity recognition models using task-based efficiency metrics. Int J Comput Assist Radiol Surg 2019; 14:2155-2163. [PMID: 31267333] [DOI: 10.1007/s11548-019-02025-w]
Abstract
PURPOSE Surgical task-based metrics (rather than entire procedure metrics) can be used to improve surgeon training and, ultimately, patient care through focused training interventions. Machine learning models to automatically recognize individual tasks or activities are needed to overcome the otherwise manual effort of video review. Traditionally, these models have been evaluated using frame-level accuracy. Here, we propose evaluating surgical activity recognition models by their effect on task-based efficiency metrics. In this way, we can determine when models have achieved adequate performance for providing surgeon feedback via metrics from individual tasks. METHODS We propose a new CNN-LSTM model, RP-Net-V2, to recognize the 12 steps of robotic-assisted radical prostatectomy (RARP). We evaluated our model both with conventional methods (e.g., Jaccard Index, task boundary accuracy) and in novel ways, such as the accuracy of efficiency metrics computed from instrument movements and system events. RESULTS Our proposed model achieves a Jaccard Index of 0.85, thereby outperforming previous models on RARP. Additionally, we show that metrics computed from tasks automatically identified using RP-Net-V2 correlate well with metrics from tasks labeled by clinical experts. CONCLUSION We demonstrate that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies. We believe this approach and our results illustrate the potential for fully automated, postoperative efficiency reports.
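A minimal sketch of the Jaccard index on per-frame step labels, the kind of metric reported above; toy data, not the study's evaluation code.

```python
import numpy as np

def jaccard_per_step(pred: np.ndarray, gt: np.ndarray, num_steps: int) -> float:
    scores = []
    for s in range(num_steps):
        p, g = pred == s, gt == s
        union = np.logical_or(p, g).sum()
        if union:                                     # skip steps absent from both sequences
            scores.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(scores))

gt   = np.array([0, 0, 1, 1, 1, 2, 2, 2])             # toy ground-truth step labels per frame
pred = np.array([0, 1, 1, 1, 1, 2, 2, 0])             # toy predictions
print(round(jaccard_per_step(pred, gt, 3), 3))        # 0.583
```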
115. Bodenstedt S, Rivoir D, Jenke A, Wagner M, Breucha M, Müller-Stich B, Mees ST, Weitz J, Speidel S. Active learning using deep Bayesian networks for surgical workflow analysis. Int J Comput Assist Radiol Surg 2019; 14:1079-1087. [PMID: 30968355] [DOI: 10.1007/s11548-019-01963-9]
Abstract
PURPOSE For many applications in the field of computer-assisted surgery, such as providing the position of a tumor, specifying the most probable tool required next by the surgeon or determining the remaining duration of surgery, methods for surgical workflow analysis are a prerequisite. Machine learning-based approaches often serve as the basis for analyzing the surgical workflow. In general, machine learning algorithms, such as convolutional neural networks (CNN), require large amounts of labeled data. While data is often available in abundance, many tasks in surgical workflow analysis need annotations by domain experts, making it difficult to obtain a sufficient amount of annotations. METHODS The aim of using active learning to train a machine learning model is to reduce the annotation effort. Active learning methods determine which unlabeled data points would provide the most information according to some metric, such as prediction uncertainty. Experts are then asked to annotate only these data points. The model is then retrained with the new data and used to select further data for annotation. Recently, active learning has been applied to CNNs by means of deep Bayesian networks (DBN), which make it possible to assign uncertainties to predictions. In this paper, we present a DBN-based active learning approach adapted for image-based surgical workflow analysis tasks. Furthermore, by using a recurrent architecture, we extend this network to video-based surgical workflow analysis. To decide which data points should be labeled next, we explore and compare different metrics for expressing uncertainty. RESULTS We evaluate these approaches and compare different metrics on the Cholec80 dataset by performing instrument presence detection and surgical phase segmentation. We show that a DBN-based active learning approach for selecting which data points to annotate next can significantly outperform a baseline based on randomly selecting data points. In particular, metrics such as entropy and variation ratio perform consistently across the different tasks. CONCLUSION We show that DBN-based active learning strategies make it possible to selectively annotate data, thereby reducing the required amount of labeled training data in surgical workflow-related tasks.
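A minimal sketch of Monte Carlo dropout uncertainty scoring with the two metrics highlighted above (predictive entropy and variation ratio); the toy model and feature dimensions are assumptions, not the authors' DBN implementation.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.5), nn.Linear(256, 7))
model.train()                                    # keep dropout stochastic at "test" time

def mc_uncertainty(x: torch.Tensor, passes: int = 20):
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(passes)])   # (P, N, C)
    mean = probs.mean(0)
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(-1)                  # predictive entropy
    modal = probs.argmax(-1).mode(0).values                                  # most frequent class
    variation_ratio = 1 - (probs.argmax(-1) == modal).float().mean(0)        # disagreement rate
    return entropy, variation_ratio

ent, vr = mc_uncertainty(torch.randn(4, 512))    # 4 unlabeled frame features (toy)
print(ent, vr)                                   # frames with the highest scores get annotated
```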
Affiliation(s)
- Sebastian Bodenstedt
- Department for Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany.
- Dominik Rivoir
- Department for Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
- Alexander Jenke
- Department for Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
- Martin Wagner
- Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg, Germany
- Michael Breucha
- Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany
- Beat Müller-Stich
- Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg, Germany
- Sören Torge Mees
- Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany
- Jürgen Weitz
- Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany
- Stefanie Speidel
- Department for Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
116.
Abstract
Recent years have seen tremendous progress in artificial intelligence (AI), such as with the automatic and real-time recognition of objects and activities in videos in the field of computer vision. Due to its increasing digitalization, the operating room (OR) promises to directly benefit from this progress in the form of new assistance tools that can enhance the abilities and performance of surgical teams. Key for such tools is the recognition of the surgical workflow, because efficient assistance by an AI system requires this system to be aware of the surgical context, namely of all activities taking place inside the operating room. We present here how several recent techniques relying on machine and deep learning can be used to analyze the activities taking place during surgery, using videos captured from either endoscopic or ceiling-mounted cameras. We also present two potential clinical applications that we are developing at the University of Strasbourg with our clinical partners.
Affiliation(s)
- Nicolas Padoy
- ICube, IHU Strasbourg, CNRS, University of Strasbourg, Strasbourg, France
117. Hard Frame Detection and Online Mapping for Surgical Phase Recognition. Lecture Notes in Computer Science 2019. [DOI: 10.1007/978-3-030-32254-0_50]
118. Graph Convolutional Nets for Tool Presence Detection in Surgical Videos. Lecture Notes in Computer Science 2019. [DOI: 10.1007/978-3-030-20351-1_36]
119. Nakawala H, Bianchi R, Pescatori LE, De Cobelli O, Ferrigno G, De Momi E. “Deep-Onto” network for surgical workflow and context recognition. Int J Comput Assist Radiol Surg 2018; 14:685-696. [DOI: 10.1007/s11548-018-1882-8]
120. Temporal Coherence-based Self-supervised Learning for Laparoscopic Workflow Analysis. Lecture Notes in Computer Science 2018. [DOI: 10.1007/978-3-030-01201-4_11]