51. Golany T, Aides A, Freedman D, Rabani N, Liu Y, Rivlin E, Corrado GS, Matias Y, Khoury W, Kashtan H, Reissman P. Artificial intelligence for phase recognition in complex laparoscopic cholecystectomy. Surg Endosc 2022; 36:9215-9223. [PMID: 35941306] [PMCID: PMC9652206] [DOI: 10.1007/s00464-022-09405-5]
Abstract
BACKGROUND The potential role and benefits of artificial intelligence (AI) in surgery have yet to be determined. This study is a first step in developing an AI system for minimizing adverse events and improving patient safety. We developed an AI algorithm and evaluated its performance in recognizing surgical phases of laparoscopic cholecystectomy (LC) videos spanning a range of complexities. METHODS A set of 371 LC videos with various complexity levels and containing adverse events was collected from five hospitals. Two expert surgeons segmented each video into 10 phases, including Calot's triangle dissection and clipping and cutting. For each video, adverse events were annotated when present (major bleeding, gallbladder perforation, major bile leakage, and incidental finding), and the complexity level (on a scale of 1-5) was recorded. The dataset was then split in an 80:20 ratio (294 and 77 videos), stratified by complexity, hospital, and adverse events, to train and test the AI model, respectively. The AI-surgeon agreement was then compared to the agreement between surgeons. RESULTS The mean accuracy of the AI model for surgical phase recognition was 89% [95% CI 87.1%, 90.6%], comparable to the mean inter-annotator agreement of 90% [95% CI 89.4%, 90.5%]. The model's accuracy was inversely associated with procedure complexity, decreasing from 92% (complexity level 1) to 88% (complexity level 3) to 81% (complexity level 5). CONCLUSION The AI model successfully identified surgical phases in both simple and complex LC procedures. Further validation and system training are warranted to evaluate its potential applications, such as increasing patient safety during surgery.
Affiliation(s)
- Yun Liu
- Google Health, Tel Aviv, Israel
- Wisam Khoury
- Department of Surgery, Rappaport Faculty of Medicine, Carmel Medical Center, Technion, Haifa, Israel
- Hanoch Kashtan
- Department of Surgery, Rabin Medical Center, The Sackler School of Medicine, Tel-Aviv University, Petah Tikva, Israel
- Petachia Reissman
- Department of Surgery, The Hebrew University School of Medicine, Shaare Zedek Medical Center, Jerusalem, Israel
- Digestive Disease Institute, Shaare Zedek Medical Center, The Hebrew University School of Medicine, P.O. Box 3235, 91031, Jerusalem, Israel
52. Fang L, Mou L, Gu Y, Hu Y, Chen B, Chen X, Wang Y, Liu J, Zhao Y. Global-local multi-stage temporal convolutional network for cataract surgery phase recognition. Biomed Eng Online 2022; 21:82. [PMID: 36451164] [PMCID: PMC9710114] [DOI: 10.1186/s12938-022-01048-w]
Abstract
BACKGROUND Surgical video phase recognition is an essential technique in computer-assisted surgical systems for monitoring surgical procedures, which can assist surgeons in standardizing procedures and enhancing postsurgical assessment and indexing. However, the high similarity between phases and the temporal variations of cataract videos remain the greatest challenges for video phase recognition. METHODS In this paper, we introduce a global-local multi-stage temporal convolutional network (GL-MSTCN) to explore the subtle differences between highly similar surgical phases and mitigate the temporal variations of surgical videos. The presented work consists of a triple-stream network (i.e., pupil stream, instrument stream, and video frame stream) and a multi-stage temporal convolutional network. The triple-stream network first detects the pupil and surgical instrument regions in the frame separately and then obtains the fine-grained semantic features of the video frames. The proposed multi-stage temporal convolutional network improves the surgical phase recognition performance by capturing longer time-series features through dilated convolutional layers with varying receptive fields. RESULTS Our method is thoroughly validated on the CSVideo dataset with 32 cataract surgery videos and the public Cataract101 dataset with 101 cataract surgery videos, outperforming state-of-the-art approaches with 95.8% and 96.5% accuracy, respectively. CONCLUSIONS The experimental results show that the use of global and local feature information can effectively help the model explore fine-grained features and mitigate temporal and spatial variations, thus improving the surgical phase recognition performance of the proposed GL-MSTCN.
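The multi-stage temporal convolutional design described above follows the general MS-TCN pattern of stacked dilated 1-D convolutions over per-frame features. The sketch below is a minimal, generic single stage of such a network; it is not the authors' GL-MSTCN code, and the layer count, channel width, and feature dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedResidualLayer(nn.Module):
    """One dilated temporal convolution with a residual connection."""
    def __init__(self, channels, dilation):
        super().__init__()
        # padding = dilation keeps the temporal length unchanged for kernel_size=3
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.relu = nn.ReLU()
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):                  # x: (batch, channels, frames)
        return x + self.out(self.relu(self.conv(x)))

class TCNStage(nn.Module):
    """A stack of dilated layers with an exponentially growing receptive field."""
    def __init__(self, in_dim, channels, num_classes, num_layers=10):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, channels, kernel_size=1)
        self.layers = nn.ModuleList(
            [DilatedResidualLayer(channels, dilation=2 ** i) for i in range(num_layers)])
        self.classifier = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, feats):               # feats: (batch, in_dim, frames)
        x = self.proj(feats)
        for layer in self.layers:
            x = layer(x)
        return self.classifier(x)            # per-frame phase logits

# Example: 2048-d frame features for a 900-frame video, 10 hypothetical phases.
logits = TCNStage(in_dim=2048, channels=64, num_classes=10)(torch.randn(1, 2048, 900))
```

In multi-stage variants, several such stages are chained so that later stages refine the per-frame predictions of earlier ones.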
Affiliation(s)
- Lixin Fang
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou, 310014, China; Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Lei Mou
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Yuanyuan Gu
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315300, China
- Yan Hu
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Bang Chen
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Xu Chen
- Department of Ophthalmology, Shanghai Aier Eye Hospital, Shanghai, China; Department of Ophthalmology, Shanghai Aier Qingliang Eye Hospital, Shanghai, China; Aier Eye Hospital, Jinan University, No. 601, Huangpu Road West, Guangzhou, China; Aier School of Ophthalmology, Central South University, Changsha, Hunan, China
- Yang Wang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
- Jiang Liu
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Yitian Zhao
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315300, China
53. Zou X, Liu W, Wang J, Tao R, Zheng G. ARST: auto-regressive surgical transformer for phase recognition from laparoscopic videos. Comput Methods Biomech Biomed Eng Imaging Vis 2022. [DOI: 10.1080/21681163.2022.2145238]
Affiliation(s)
- Xiaoyang Zou
- Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wenyong Liu
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Junchen Wang
- School of Mechanical Engineering and Automation, Beihang University, Beijing, China
- Rong Tao
- Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Guoyan Zheng
- Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
54. Anticipation for surgical workflow through instrument interaction and recognized signals. Med Image Anal 2022; 82:102611. [PMID: 36162336] [DOI: 10.1016/j.media.2022.102611]
Abstract
Surgical workflow anticipation is an essential task for computer-assisted intervention (CAI) systems. It aims at predicting future surgical phase and instrument occurrence, providing support for intra-operative decision-support systems. Recent studies have promoted the development of the anticipation task by transforming it into a remaining-time prediction problem, but without factoring the surgical instruments' behaviors and their interactions with surrounding anatomies into the network design. In this paper, we propose an Instrument Interaction Aware Anticipation Network (IIA-Net) to overcome this deficiency while retaining the merits of two-stage models, using a spatial feature extractor and a temporal model. Spatially, the feature extractor utilizes the tooltips' movement to extract instrument-instrument interactions, which helps the model concentrate on the surgeon's actions. It also introduces a segmentation map to capture rich features about the instrument surroundings. Temporally, the temporal model applies a causal dilated multi-stage temporal convolutional network to capture long-term dependencies in long, untrimmed surgical videos with a large receptive field. IIA-Net enforces online inference with reliable predictions even with severe noise and artifacts in the recorded videos and presence signals. Extensive experiments on the Cholec80 dataset demonstrate that the performance of our proposed method exceeds the state-of-the-art method by a large margin (1.03 vs. 1.12 for MAEw, 1.40 vs. 1.75 for MAEin, and 2.14 vs. 2.68 for MAEe). For reproduction purposes, all the original code is publicly available at https://github.com/Flaick/Surgical-Workflow-Anticipation.
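As a rough illustration of the remaining-time formulation mentioned above, a per-frame anticipation target can be built by clipping the time until the next occurrence of an event to a fixed horizon and scoring predictions with a mean absolute error. This is a hedged sketch only: the horizon value is an assumption, and the paper's specific MAEw/MAEin/MAEe variants are not reproduced here.

```python
import numpy as np

def remaining_time_targets(event_frames, num_frames, fps=1.0, horizon=5 * 60):
    """Remaining time (seconds) until the next event, clipped to a horizon."""
    targets = np.full(num_frames, float(horizon))
    nxt = np.inf
    for t in range(num_frames - 1, -1, -1):      # walk backwards through the video
        if t in event_frames:
            nxt = t
        if np.isfinite(nxt):
            targets[t] = min((nxt - t) / fps, horizon)
    return targets

def mae(pred, target):
    """Mean absolute error over all frames."""
    return float(np.mean(np.abs(pred - target)))

# Hypothetical example: an instrument appears at frames 120 and 400 of a 600-frame clip.
y = remaining_time_targets({120, 400}, num_frames=600, fps=1.0, horizon=300)
print(mae(np.zeros_like(y), y))
```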
55. Sasaki K, Ito M, Kobayashi S, Kitaguchi D, Matsuzaki H, Kudo M, Hasegawa H, Takeshita N, Sugimoto M, Mitsunaga S, Gotohda N. Automated surgical workflow identification by artificial intelligence in laparoscopic hepatectomy: Experimental research. Int J Surg 2022; 105:106856. [PMID: 36031068] [DOI: 10.1016/j.ijsu.2022.106856]
Abstract
BACKGROUND To perform accurate laparoscopic hepatectomy (LH) without injury, novel intraoperative systems of computer-assisted surgery (CAS) for LH are expected. Automated surgical workflow identification is a key component for developing CAS systems. This study aimed to develop a deep-learning model for automated surgical step identification in LH. MATERIALS AND METHODS We constructed a dataset comprising 40 cases of pure LH videos; 30 and 10 cases were used for the training and testing datasets, respectively. Each video was divided into static images at 30 frames per second. LH was divided into nine surgical steps (Steps 0-8), and each frame in the training set was annotated as belonging to one of these steps. After extracorporeal actions (Step 0) were excluded from the videos, two deep-learning models for automated surgical step identification, an 8-step and a 6-step model, were developed using a convolutional neural network (Models 1 & 2). Each frame in the testing dataset was classified in real time using the constructed models. RESULTS More than 8 million frames were annotated for surgical step identification from the pure LH videos. The overall accuracy of Model 1 was 0.891, which increased to 0.947 in Model 2. The median and mean accuracies per case in Model 2 were 0.927 (range, 0.884-0.997) and 0.937 ± 0.04 (standard deviation), respectively. Real-time automated surgical step identification was performed at 21 frames per second. CONCLUSIONS We developed a highly accurate deep-learning model for surgical step identification in pure LH. Our model could be applied to intraoperative CAS systems.
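The frame-by-frame inference pipeline such a step-recognition model implies can be sketched generically as below. This is not the authors' implementation: the classifier object, its predict interface, the input size, and the preprocessing are all placeholder assumptions.

```python
import cv2
import numpy as np

def classify_video_frames(video_path, model, input_size=(224, 224)):
    """Decode a surgical video and classify every frame into a surgical step."""
    cap = cv2.VideoCapture(video_path)
    predictions = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        img = cv2.resize(frame, input_size).astype(np.float32) / 255.0
        probs = model.predict(img[None])      # assumed API: (1, H, W, 3) -> (1, num_steps)
        predictions.append(int(np.argmax(probs)))
    cap.release()
    return predictions                         # one step label per decoded frame
```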
Affiliation(s)
- Kimimasa Sasaki
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan; Department of Hepatobiliary and Pancreatic Surgery, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan; Course of Advanced Clinical Research of Cancer, Juntendo University Graduate School of Medicine, 2-1-1, Hongo, Bunkyo-Ward, Tokyo, 113-8421, Japan
- Masaaki Ito
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Shin Kobayashi
- Department of Hepatobiliary and Pancreatic Surgery, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Daichi Kitaguchi
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Hiroki Matsuzaki
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Masashi Kudo
- Department of Hepatobiliary and Pancreatic Surgery, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Hiro Hasegawa
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Nobuyoshi Takeshita
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Motokazu Sugimoto
- Department of Hepatobiliary and Pancreatic Surgery, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Shuichi Mitsunaga
- Course of Advanced Clinical Research of Cancer, Juntendo University Graduate School of Medicine, 2-1-1, Hongo, Bunkyo-Ward, Tokyo, 113-8421, Japan; Department of Hepatobiliary and Pancreatic Oncology, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan
- Naoto Gotohda
- Department of Hepatobiliary and Pancreatic Surgery, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa-City, Chiba, 277-8577, Japan; Course of Advanced Clinical Research of Cancer, Juntendo University Graduate School of Medicine, 2-1-1, Hongo, Bunkyo-Ward, Tokyo, 113-8421, Japan
56. Loukas C, Gazis A, Schizas D. Multiple instance convolutional neural network for gallbladder assessment from laparoscopic images. Int J Med Robot 2022; 18:e2445. [DOI: 10.1002/rcs.2445]
Affiliation(s)
- Constantinos Loukas
- Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Athanasios Gazis
- Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Dimitrios Schizas
- 1st Department of Surgery, Medical School, Laikon General Hospital, National and Kapodistrian University of Athens, Athens, Greece
57. Chen HB, Li Z, Fu P, Ni ZL, Bian GB. Spatio-Temporal Causal Transformer for Multi-Grained Surgical Phase Recognition. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:1663-1666. [PMID: 36086459] [DOI: 10.1109/embc48229.2022.9871004]
Abstract
Automatic surgical phase recognition plays a key role in surgical workflow analysis and overall optimization in clinical work. In complicated surgical procedures, similar inter-class appearance and drastic variability in phase duration make this a challenging task. In this paper, a spatio-temporal transformer is proposed for online surgical phase recognition at different granularities. To extract rich spatial information, a spatial transformer is used to model global spatial dependencies at each time index. To overcome the variability in phase duration, a temporal transformer captures the multi-scale temporal context of different time indexes with a dual pyramid pattern. Our method is thoroughly validated on the public Cholec80 dataset with 7 coarse-grained phases and the CATARACTS2020 dataset with 19 fine-grained phases, outperforming state-of-the-art approaches with 91.4% and 84.2% accuracy, respectively, while using only 24.5M parameters.
58. Gumbs AA, Grasso V, Bourdel N, Croner R, Spolverato G, Frigerio I, Illanes A, Abu Hilal M, Park A, Elyan E. The Advances in Computer Vision That Are Enabling More Autonomous Actions in Surgery: A Systematic Review of the Literature. Sensors (Basel) 2022; 22:4918. [PMID: 35808408] [PMCID: PMC9269548] [DOI: 10.3390/s22134918]
Abstract
This is a review focused on advances and current limitations of computer vision (CV) and how CV can help us achieve more autonomous actions in surgery. It is a follow-up article to one that we previously published in Sensors entitled, "Artificial Intelligence Surgery: How Do We Get to Autonomous Actions in Surgery?" As opposed to that article, which also discussed issues of machine learning, deep learning and natural language processing, this review will delve deeper into the field of CV. Additionally, non-visual forms of data that can aid computerized robots in the performance of more autonomous actions, such as instrument priors and audio haptics, will also be highlighted. Furthermore, the current existential crisis for surgeons, endoscopists and interventional radiologists regarding more autonomy during procedures will be discussed. In summary, this paper will discuss how to harness the power of CV to keep doctors who do interventions in the loop.
Affiliation(s)
- Andrew A. Gumbs
- Departement de Chirurgie Digestive, Centre Hospitalier Intercommunal de Poissy/Saint-Germain-en-Laye, 78300 Poissy, France
- Department of Surgery, University of Magdeburg, 39106 Magdeburg, Germany
- Vincent Grasso
- Family Christian Health Center, 31 West 155th St., Harvey, IL 60426, USA
- Nicolas Bourdel
- Gynecological Surgery Department, CHU Clermont-Ferrand, 1 Place Lucie-Aubrac, 63100 Clermont-Ferrand, France
- EnCoV, Institut Pascal, UMR6602 CNRS, UCA, Clermont-Ferrand University Hospital, 63000 Clermont-Ferrand, France
- SurgAR-Surgical Augmented Reality, 63000 Clermont-Ferrand, France
- Roland Croner
- Department of Surgery, University of Magdeburg, 39106 Magdeburg, Germany
- Gaya Spolverato
- Department of Surgical, Oncological and Gastroenterological Sciences, University of Padova, 35122 Padova, Italy
- Isabella Frigerio
- Department of Hepato-Pancreato-Biliary Surgery, Pederzoli Hospital, 37019 Peschiera del Garda, Italy
- Alfredo Illanes
- INKA-Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- Mohammad Abu Hilal
- Unità Chirurgia Epatobiliopancreatica, Robotica e Mininvasiva, Fondazione Poliambulanza Istituto Ospedaliero, Via Bissolati, 57, 25124 Brescia, Italy
- Adrian Park
- Anne Arundel Medical Center, Johns Hopkins University, Annapolis, MD 21401, USA
- Eyad Elyan
- School of Computing, Robert Gordon University, Aberdeen AB10 7JG, UK
59. Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval. Electronics 2022. [DOI: 10.3390/electronics11091353]
Abstract
In the medical field, due to their economic and clinical benefits, there is growing interest in minimally invasive and microscopic surgeries. These types of surgeries are often recorded during operations, and these recordings have become a key resource for education, patient disease analysis, surgical error analysis, and surgical skill assessment. However, manual searching in this collection of long surgical videos is an extremely labor-intensive and time-consuming task, requiring an effective content-based video analysis system. Previous methods for surgical video retrieval were based on handcrafted features, which do not represent the video effectively. On the other hand, deep learning-based solutions were found to be effective in both surgical image and video analysis, where CNN-, LSTM- and CNN-LSTM-based methods were proposed in most surgical video analysis tasks. In this paper, we propose a hybrid spatiotemporal embedding method to enhance spatiotemporal representations using an adaptive fusion layer on top of LSTM and temporal causal convolutional modules. To learn surgical video representations, we explore a supervised contrastive learning approach to leverage label information in addition to augmented versions. By validating our approach on a video retrieval task on two datasets, Surgical Actions 160 and Cataract-101, we significantly improve on previous results in terms of mean average precision, 30.012 ± 1.778 vs. 22.54 ± 1.557 for Surgical Actions 160 and 81.134 ± 1.28 vs. 33.18 ± 1.311 for Cataract-101. We also validate the proposed method's suitability for the surgical phase recognition task using the benchmark Cholec80 surgical dataset, where our approach outperforms the state of the art (with 90.2% accuracy).
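The supervised contrastive objective referred to above is commonly implemented along the lines of the SupCon loss of Khosla et al.: embeddings with the same label are pulled together, all others are pushed apart. The sketch below is a generic version on L2-normalised clip embeddings, not the authors' code; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.07):
    """SupCon-style loss: pull same-label clips together, push others apart."""
    z = F.normalize(embeddings, dim=1)                    # (N, d)
    sim = z @ z.t() / temperature                          # pairwise similarities
    n = z.size(0)
    mask_self = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float('-inf'))        # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self
    pos_counts = positives.sum(dim=1).clamp(min=1)
    loss = -(log_prob * positives).sum(dim=1) / pos_counts
    return loss.mean()

# Example: 8 clip embeddings of dimension 128 with phase labels.
loss = supervised_contrastive_loss(torch.randn(8, 128), torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
```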
60. Ban Y, Rosman G, Eckhoff JA, Ward TM, Hashimoto DA, Kondo T, Iwaki H, Meireles OR, Rus D. SUPR-GAN: SUrgical PRediction GAN for Event Anticipation in Laparoscopic and Robotic Surgery. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3156856]
Affiliation(s)
- Yutong Ban
- Distributed Robotics Laboratory, CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- Guy Rosman
- Distributed Robotics Laboratory, CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- Daniela Rus
- Distributed Robotics Laboratory, CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
61. Das A, Bano S, Vasconcelos F, Khan DZ, Marcus HJ, Stoyanov D. Reducing prediction volatility in the surgical workflow recognition of endoscopic pituitary surgery. Int J Comput Assist Radiol Surg 2022; 17:1445-1452. [PMID: 35362848] [PMCID: PMC9307536] [DOI: 10.1007/s11548-022-02599-y]
Abstract
Purpose: Workflow recognition can aid surgeons before an operation when used as a training tool, during an operation by increasing operating room efficiency, and after an operation in the completion of operation notes. Although several methods have been applied to this task, they have been tested on few surgical datasets. Therefore, their generalisability is not well tested, particularly for surgical approaches utilising smaller working spaces which are susceptible to occlusion and necessitate frequent withdrawal of the endoscope. This leads to rapidly changing predictions, which reduces the clinical confidence of the methods, and hence limits their suitability for clinical translation. Methods: Firstly, the optimal neural network is found using established methods, using endoscopic pituitary surgery as an exemplar. Then, prediction volatility is formally defined as a new evaluation metric as a proxy for uncertainty, and two temporal smoothing functions are created. The first (modal, M_n) mode-averages over the previous n predictions, and the second (threshold, T_n) ensures a class is only changed after being continuously predicted for n predictions. Both functions are independently applied to the predictions of the optimal network. Results: The methods are evaluated on a 50-video dataset using fivefold cross-validation, and the optimised evaluation metric is the weighted-F1 score. The optimal model is ResNet-50+LSTM, achieving 0.84 in 3-phase classification and 0.74 in 7-step classification. Applying threshold smoothing further improves these results, achieving 0.86 in 3-phase classification and 0.75 in 7-step classification, while also drastically reducing the prediction volatility. Conclusion: The results confirm the established methods generalise to endoscopic pituitary surgery, and show simple temporal smoothing not only reduces prediction volatility, but actively improves performance.
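The two smoothing functions are simple enough to sketch directly from the textual description in the abstract (mode over the previous n predictions, and a class change only after n consecutive identical predictions). This is written from that description, not from the authors' released code.

```python
from collections import Counter

def modal_smoothing(preds, n):
    """M_n: replace each prediction by the mode of the previous n predictions."""
    smoothed = []
    for t in range(len(preds)):
        window = preds[max(0, t - n + 1):t + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

def threshold_smoothing(preds, n):
    """T_n: only switch class after it has been predicted n times in a row."""
    if not preds:
        return []
    smoothed = [preds[0]]
    candidate, run = preds[0], 0
    for p in preds[1:]:
        if p == candidate:
            run += 1
        else:
            candidate, run = p, 1
        if candidate != smoothed[-1] and run >= n:
            smoothed.append(candidate)
        else:
            smoothed.append(smoothed[-1])
    return smoothed

# Example on a volatile phase sequence (n = 3).
raw = [0, 0, 1, 0, 1, 1, 1, 2, 1, 1, 2, 2, 2, 2]
print(modal_smoothing(raw, 3))
print(threshold_smoothing(raw, 3))
```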
Affiliation(s)
- Adrito Das
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Danyal Z Khan
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Hani J Marcus
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
62. Real-Time Multi-Label Upper Gastrointestinal Anatomy Recognition from Gastroscope Videos. Appl Sci (Basel) 2022. [DOI: 10.3390/app12073306]
Abstract
Esophagogastroduodenoscopy (EGD) is a critical step in the diagnosis of upper gastrointestinal disorders. However, due to inexperience or high workload, there is wide variation in EGD performance among endoscopists. Variations in performance may result in exams that do not completely cover all anatomical locations of the stomach, leading to a potential risk of missed diagnosis of gastric diseases. Numerous guidelines and expert consensus statements have been proposed to assess and optimize the quality of endoscopy. However, there is a lack of mature and robust methods that can be applied accurately to real-time clinical video environments. In this paper, we innovatively define the problem of recognizing anatomical locations in videos as a multi-label recognition task. This is more consistent with learning the image-to-label mapping relationship. We propose a combined deep learning model (GL-Net) that couples a graph convolutional network (GCN) with long short-term memory (LSTM) networks to both extract label features and correlate temporal dependencies for accurate real-time anatomical location identification in gastroscopy videos. Our evaluation dataset is based on complete videos of real clinical examinations. A total of 29,269 images from 49 videos were collected as a dataset for model training and validation. Another 1736 clinical videos were retrospectively analyzed and evaluated for the application of the proposed model. Our method achieves 97.1% mean average precision (mAP), 95.5% mean per-class accuracy and 93.7% average overall accuracy in the multi-label classification task, and is able to process these videos in real time at 29.9 FPS. In addition, based on our approach, we designed a system to monitor routine EGD videos in detail and perform statistical analysis of the operating habits of endoscopists, which can be a useful tool for improving the quality of clinical endoscopy.
63. Shinozuka K, Turuda S, Fujinaga A, Nakanuma H, Kawamura M, Matsunobu Y, Tanaka Y, Kamiyama T, Ebe K, Endo Y, Etoh T, Inomata M, Tokuyasu T. Artificial intelligence software available for medical devices: surgical phase recognition in laparoscopic cholecystectomy. Surg Endosc 2022; 36:7444-7452. [PMID: 35266049] [PMCID: PMC9485170] [DOI: 10.1007/s00464-022-09160-7]
Abstract
Background Surgical process modeling automatically identifies surgical phases, and further improvement in recognition accuracy is expected with deep learning. Surgical tool or time-series information has been used to improve the recognition accuracy of a model. However, it is difficult to collect this information continuously intraoperatively. The present study aimed to develop a deep convolutional neural network (CNN) model that correctly identifies the surgical phase during laparoscopic cholecystectomy (LC). Methods We divided LC into six surgical phases (P1-P6) and one redundant phase (P0). We prepared 115 LC videos and converted them to image frames at 3 fps. Three experienced doctors labeled the surgical phases in all image frames. Our deep CNN model was trained with 106 of the 115 annotated datasets and was evaluated with the remaining datasets. By relying on both the prediction probability and its frequency over a certain period, we aimed for highly accurate surgical phase recognition in the operating room. Results Nine full LC videos were converted into image frames and fed to our deep CNN model. The average accuracy, precision, and recall were 0.970, 0.855, and 0.863, respectively. Conclusion The deep CNN model in this study successfully identified both the six surgical phases and the redundant phase, P0, which may increase the versatility of the surgical process recognition model for clinical use. We believe that this model can be used in artificial intelligence for medical devices. The degree of recognition accuracy is expected to improve with developments in advanced deep learning algorithms.
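A decision rule that relies "on both the prediction probability and frequency for a certain period" can be illustrated, in a heavily simplified form, as below. The window length and both thresholds are illustrative assumptions, not values from the paper.

```python
from collections import Counter, deque

def stabilised_phase(frame_probs, window=30, prob_threshold=0.6, freq_threshold=0.7):
    """Accept a phase only if it is both confident and frequent in the recent window."""
    recent = deque(maxlen=window)
    current = None
    outputs = []
    for probs in frame_probs:                 # probs: list of per-phase probabilities
        phase = max(range(len(probs)), key=lambda k: probs[k])
        if probs[phase] >= prob_threshold:    # keep only confident frame predictions
            recent.append(phase)
        if recent:
            candidate, count = Counter(recent).most_common(1)[0]
            if count / window >= freq_threshold:
                current = candidate           # switch only when the window agrees
        outputs.append(current)
    return outputs
```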
Affiliation(s)
- Ken'ichi Shinozuka
- Faculty of Information Engineering, Department of Information and Systems Engineering, Fukuoka Institute of Technology, 1-30-1 Wajiro higashi, Higashi-ku, Fukuoka, Fukuoka, 811-0295, Japan
- Sayaka Turuda
- Faculty of Information Engineering, Department of Information and Systems Engineering, Fukuoka Institute of Technology, 1-30-1 Wajiro higashi, Higashi-ku, Fukuoka, Fukuoka, 811-0295, Japan
- Atsuro Fujinaga
- Faculty of Medicine, Department of Gastroenterological and Pediatric Surgery, Oita University, Oita, Japan
- Hiroaki Nakanuma
- Faculty of Medicine, Department of Gastroenterological and Pediatric Surgery, Oita University, Oita, Japan
- Masahiro Kawamura
- Faculty of Medicine, Department of Gastroenterological and Pediatric Surgery, Oita University, Oita, Japan
- Yusuke Matsunobu
- Faculty of Information Engineering, Department of Information and Systems Engineering, Fukuoka Institute of Technology, 1-30-1 Wajiro higashi, Higashi-ku, Fukuoka, Fukuoka, 811-0295, Japan
- Yuki Tanaka
- Customer Solutions Development, Platform Technology, Olympus Technologies Asia, Olympus Corporation, Tokyo, Japan
- Toshiya Kamiyama
- Customer Solutions Development, Platform Technology, Olympus Technologies Asia, Olympus Corporation, Tokyo, Japan
- Kohei Ebe
- Customer Solutions Development, Platform Technology, Olympus Technologies Asia, Olympus Corporation, Tokyo, Japan
- Yuichi Endo
- Faculty of Medicine, Department of Gastroenterological and Pediatric Surgery, Oita University, Oita, Japan
- Tsuyoshi Etoh
- Faculty of Medicine, Department of Gastroenterological and Pediatric Surgery, Oita University, Oita, Japan
- Masafumi Inomata
- Faculty of Medicine, Department of Gastroenterological and Pediatric Surgery, Oita University, Oita, Japan
- Tatsushi Tokuyasu
- Faculty of Information Engineering, Department of Information and Systems Engineering, Fukuoka Institute of Technology, 1-30-1 Wajiro higashi, Higashi-ku, Fukuoka, Fukuoka, 811-0295, Japan
64. Junger D, Frommer SM, Burgert O. State-of-the-art of situation recognition systems for intraoperative procedures. Med Biol Eng Comput 2022; 60:921-939. [PMID: 35178622] [PMCID: PMC8933302] [DOI: 10.1007/s11517-022-02520-4]
Abstract
One of the key challenges for automatic assistance is the support of actors in the operating room depending on the status of the procedure. Therefore, context information collected in the operating room is used to gain knowledge about the current situation. In literature, solutions already exist for specific use cases, but it is doubtful to what extent these approaches can be transferred to other conditions. We conducted a comprehensive literature research on existing situation recognition systems for the intraoperative area, covering 274 articles and 95 cross-references published between 2010 and 2019. We contrasted and compared 58 identified approaches based on defined aspects such as used sensor data or application area. In addition, we discussed applicability and transferability. Most of the papers focus on video data for recognizing situations within laparoscopic and cataract surgeries. Not all of the approaches can be used online for real-time recognition. Using different methods, good results with recognition accuracies above 90% could be achieved. Overall, transferability is less addressed. The applicability of approaches to other circumstances seems to be possible to a limited extent. Future research should place a stronger focus on adaptability. The literature review shows differences within existing approaches for situation recognition and outlines research trends. Applicability and transferability to other conditions are less addressed in current work.
Affiliation(s)
- D Junger
- School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762, Reutlingen, Germany
- S M Frommer
- School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762, Reutlingen, Germany
- O Burgert
- School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762, Reutlingen, Germany
65. Artificial Intelligence in Surgery: A Research Team Perspective. Curr Probl Surg 2022; 59:101125. [DOI: 10.1016/j.cpsurg.2022.101125]
66. Zhang Y, Bano S, Page AS, Deprest J, Stoyanov D, Vasconcelos F. Large-scale surgical workflow segmentation for laparoscopic sacrocolpopexy. Int J Comput Assist Radiol Surg 2022; 17:467-477. [PMID: 35050468] [PMCID: PMC8873061] [DOI: 10.1007/s11548-021-02544-5]
Abstract
Purpose Laparoscopic sacrocolpopexy is the gold standard procedure for the management of vaginal vault prolapse. Studying surgical skills and different approaches to this procedure requires an analysis at the level of each of its individual phases, thus motivating investigation of automated surgical workflow for expediting this research. Phase durations in this procedure are significantly larger and more variable than commonly available benchmarks such as Cholec80, and we assess these differences. Methodology We introduce sequence-to-sequence (seq2seq) models for coarse-level phase segmentation in order to deal with highly variable phase durations in sacrocolpopexy. Multiple architectures (LSTM and transformer), configurations (time-shifted, time-synchronous), and training strategies are tested with this novel framework to explore its flexibility. Results We perform 7-fold cross-validation on a dataset with 14 complete videos of sacrocolpopexy. We perform both a frame-based (accuracy, F1-score) and an event-based (Ward metric) evaluation of our algorithms and show that different architectures present a trade-off between a higher number of accurate frames (LSTM, mode average) or a more consistent ordering of phase transitions (transformer). We compare the implementations on the widely used Cholec80 dataset and verify that relative performances are different to those in sacrocolpopexy. Conclusions We show that workflow segmentation of sacrocolpopexy videos has specific challenges that are different to the widely used benchmark Cholec80 and require dedicated approaches to deal with the significantly larger phase durations. We demonstrate the feasibility of seq2seq models in sacrocolpopexy, a broad framework that can be further explored with new configurations. We show that an event-based evaluation metric is useful to evaluate workflow segmentation algorithms and provides complementary insight to the more commonly used metrics such as accuracy or F1-score. Supplementary Information The online version contains supplementary material available at 10.1007/s11548-021-02544-5.
Affiliation(s)
- Yitong Zhang
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Ann-Sophie Page
- Department of Development and Regeneration, University Hospital Leuven, Leuven, Belgium
- Jan Deprest
- Department of Development and Regeneration, University Hospital Leuven, Leuven, Belgium
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
67. Uncharted Waters of Machine and Deep Learning for Surgical Phase Recognition in Neurosurgery. World Neurosurg 2022; 160:4-12. [PMID: 35026457] [DOI: 10.1016/j.wneu.2022.01.020]
Abstract
Recent years have witnessed artificial intelligence (AI) make meteoric leaps in both medicine and surgery, bridging the gap between the capabilities of humans and machines. Digitization of operating rooms and the creation of massive quantities of data have paved the way for machine learning and computer vision applications in surgery. Surgical phase recognition (SPR) is a newly emerging technology that uses data derived from operative videos to train machine and deep learning algorithms to identify the phases of surgery. Advancement of this technology will be key in establishing context-aware surgical systems in the future. By automatically recognizing and evaluating the current surgical scenario, these intelligent systems are able to provide intraoperative decision support, improve operating room efficiency, assess surgical skills, and aid in surgical training and education. Still in its infancy, SPR has been mainly studied in laparoscopic surgeries, with a contrasting stark lack of research within neurosurgery. Given the high-tech and rapidly advancing nature of neurosurgery, we believe SPR has a tremendous untapped potential in this field. Herein, we present an overview of the SPR technology, its potential applications in neurosurgery, and the challenges that lie ahead.
68. Kitaguchi D, Takeshita N, Hasegawa H, Ito M. Artificial intelligence-based computer vision in surgery: Recent advances and future perspectives. Ann Gastroenterol Surg 2022; 6:29-36. [PMID: 35106412] [PMCID: PMC8786689] [DOI: 10.1002/ags3.12513]
Abstract
Technology has advanced surgery, especially minimally invasive surgery (MIS), including laparoscopic surgery and robotic surgery. It has led to an increase in the number of technologies in the operating room. They can provide further information about a surgical procedure, e.g. instrument usage and trajectories. Among these surgery-related technologies, the amount of information extracted from a surgical video captured by an endoscope is especially great. Therefore, the automation of data analysis is essential in surgery to reduce the complexity of the data while maximizing its utility to enable new opportunities for research and development. Computer vision (CV) is the field of study that deals with how computers can understand digital images or videos and seeks to automate tasks that can be performed by the human visual system. Because this field deals with all the processes of real-world information acquisition by computers, the terminology "CV" is extensive, and ranges from hardware for image sensing to AI-based image recognition. AI-based image recognition for simple tasks, such as recognizing snapshots, has advanced and is comparable to humans in recent years. Although surgical video recognition is a more complex and challenging task, if we can effectively apply it to MIS, it leads to future surgical advancements, such as intraoperative decision-making support and image navigation surgery. Ultimately, automated surgery might be realized. In this article, we summarize the recent advances and future perspectives of AI-related research and development in the field of surgery.
Affiliation(s)
- Daichi Kitaguchi
- Surgical Device Innovation Office, National Cancer Center Hospital East, Kashiwa, Japan
- Nobuyoshi Takeshita
- Surgical Device Innovation Office, National Cancer Center Hospital East, Kashiwa, Japan
- Hiro Hasegawa
- Surgical Device Innovation Office, National Cancer Center Hospital East, Kashiwa, Japan
- Masaaki Ito
- Surgical Device Innovation Office, National Cancer Center Hospital East, Kashiwa, Japan
69. Namazi B, Sankaranarayanan G, Devarajan V. A contextual detector of surgical tools in laparoscopic videos using deep learning. Surg Endosc 2022; 36:679-688. [PMID: 33559057] [PMCID: PMC8349373] [DOI: 10.1007/s00464-021-08336-x]
Abstract
BACKGROUND The complexity of laparoscopy requires special training and assessment. Analyzing the streaming videos during the surgery can potentially improve surgical education. The tedium and cost of such an analysis can be dramatically reduced using an automated tool detection system, among other things. We propose a new multilabel classifier, called LapTool-Net, to detect the presence of surgical tools in each frame of a laparoscopic video. METHODS The novelty of LapTool-Net is the exploitation of the correlations among the usage of different tools and between the tools and tasks, i.e., the context of the tools' usage. Towards this goal, the pattern in the co-occurrence of the tools is utilized for designing a decision policy for the multilabel classifier based on a Recurrent Convolutional Neural Network (RCNN), which is trained in an end-to-end manner. In the post-processing step, the predictions are corrected by modeling the long-term order of tasks with an RNN. RESULTS LapTool-Net was trained using publicly available datasets of laparoscopic cholecystectomy, viz., M2CAI16 and Cholec80. For M2CAI16, our exact match accuracies (when all the tools in one frame are predicted correctly) in online and offline modes were 80.95% and 81.84%, with per-class F1-scores of 88.29% and 90.53%. For Cholec80, the accuracies were 85.77% and 91.92%, with F1-scores of 93.10% and 96.11% for online and offline, respectively. CONCLUSIONS The results show that LapTool-Net outperformed state-of-the-art methods significantly, even while using fewer training samples and a shallower architecture. Our context-aware model does not require an expert's domain-specific knowledge, and the simple architecture can potentially improve all existing methods.
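For orientation, a multilabel tool-presence head reduces to independent per-tool decisions, and the exact-match accuracy quoted above counts a frame as correct only when every tool decision in that frame is right. The sketch below shows these two metrics generically; the threshold and the example numbers are assumptions, not the LapTool-Net implementation.

```python
import numpy as np

def exact_match_accuracy(pred_probs, labels, threshold=0.5):
    """Fraction of frames whose entire tool-presence vector is predicted correctly."""
    preds = (pred_probs >= threshold).astype(int)
    return float(np.mean(np.all(preds == labels, axis=1)))

def per_class_f1(pred_probs, labels, threshold=0.5):
    """F1-score computed separately for each tool."""
    preds = (pred_probs >= threshold).astype(int)
    scores = []
    for c in range(labels.shape[1]):
        tp = np.sum((preds[:, c] == 1) & (labels[:, c] == 1))
        fp = np.sum((preds[:, c] == 1) & (labels[:, c] == 0))
        fn = np.sum((preds[:, c] == 0) & (labels[:, c] == 1))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 1.0)
    return scores

# Example with 4 frames and 3 tools (hypothetical outputs).
probs = np.array([[0.9, 0.2, 0.1], [0.8, 0.7, 0.1], [0.1, 0.6, 0.2], [0.2, 0.1, 0.9]])
truth = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]])
print(exact_match_accuracy(probs, truth), per_class_f1(probs, truth))
```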
Affiliation(s)
- Babak Namazi
- Baylor Scott & White Research Institute, Dallas, Texas
- Venkat Devarajan
- Electrical Engineering Department, University of Texas at Arlington, Texas
70. Machine Learning in Laparoscopic Surgery. Artif Intell Med 2022. [DOI: 10.1007/978-981-19-1223-8_8]
71. Zhang Y, Marsic I, Burd RS. Real-time medical phase recognition using long-term video understanding and progress gate method. Med Image Anal 2021; 74:102224. [PMID: 34543914] [PMCID: PMC8560574] [DOI: 10.1016/j.media.2021.102224]
Abstract
We introduce a real-time system for recognizing five phases of the trauma resuscitation process, the initial management of injured patients in the emergency department. We used depth videos as input to preserve the privacy of the patients and providers. The depth videos were recorded using a Kinect-v2 mounted on the sidewall of the room. Our dataset consisted of 183 depth videos of trauma resuscitations. The model was trained on 150 cases, each longer than 30 minutes, and tested on the remaining 33 cases. We introduced a reduced long-term operation (RLO) method for extracting features from long segments of video and combined it with the regular model that uses short-term information only. The model with RLO outperformed the regular short-term model by 5% in accuracy. We also introduced a progress gate (PG) method to distinguish visually similar phases using video progress. The final system achieved 91% accuracy and significantly outperformed previous systems for phase recognition in this setting.
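The idea of using how far the video has progressed to separate visually similar phases can be illustrated, in a heavily simplified form, by suppressing phases that are implausible at the current normalised progress. The phase windows and the gating-by-masking mechanism below are made-up illustrations, not the progress gate actually learned in the paper.

```python
import numpy as np

# Hypothetical plausible progress intervals (fraction of elapsed video) per phase.
PHASE_WINDOWS = {0: (0.0, 0.3), 1: (0.1, 0.5), 2: (0.3, 0.8), 3: (0.5, 0.95), 4: (0.8, 1.0)}

def progress_gated_prediction(frame_logits, frame_index, total_frames):
    """Down-weight phases whose typical progress range excludes the current frame."""
    progress = frame_index / max(total_frames - 1, 1)
    gated = frame_logits.copy()
    for phase, (lo, hi) in PHASE_WINDOWS.items():
        if not (lo <= progress <= hi):
            gated[phase] -= 1e3               # effectively removes implausible phases
    return int(np.argmax(gated))

# Example: at 70% progress, only phases 2 and 3 remain candidates.
print(progress_gated_prediction(np.array([2.0, 1.5, 1.9, 0.2, 0.1]), frame_index=700, total_frames=1000))
```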
Affiliation(s)
- Yanyi Zhang
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Ivan Marsic
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Randall S Burd
- Division of Trauma and Burn Surgery, Children's National Medical Center, Washington, DC 20010, USA
72. Hisey R, Camire D, Erb J, Howes D, Fichtinger G, Ungi T. System for central venous catheterization training using computer vision-based workflow feedback. IEEE Trans Biomed Eng 2021; 69:1630-1638. [PMID: 34727022] [PMCID: PMC9118169] [DOI: 10.1109/tbme.2021.3124422]
Abstract
OBJECTIVE To develop a system for training central venous catheterization that does not require an expert observer. We propose a training system that uses video-based workflow recognition and electromagnetic tracking to provide trainees with real-time instruction and feedback. METHODS The system provides trainees with prompts about upcoming tasks and visual cues about workflow errors. Most tasks are recognized from a webcam video using a combination of a convolutional neural network and a recurrent neural network. We evaluate the system's ability to recognize tasks in the workflow by computing the percentage of tasks that were recognized and the average signed transitional delay between the system and reviewers. We also evaluate the usability of the system using a participant questionnaire. RESULTS The system was able to recognize 86.2% of tasks in the workflow. The average signed transitional delay was -0.7 ± 8.7 s. The average score on the questionnaire was 4.7 out of 5 for the system overall. The participants found the interactive task list to be the most useful component of the system, with an average score of 4.8 out of 5. CONCLUSION Overall, the participants were happy with the system and felt that it would improve central venous catheterization training. Our system provides trainees with meaningful instruction and feedback without needing an expert observer to be present. SIGNIFICANCE We are able to provide trainees with more opportunities to access instruction and meaningful feedback by using workflow recognition.
73. Pradeep CS, Sinha N. Spatio-Temporal Features Based Surgical Phase Classification Using CNNs. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:3332-3335. [PMID: 34891953] [DOI: 10.1109/embc46164.2021.9630829]
Abstract
In this paper, we propose a novel encoder-decoder based surgical phase classification technique leveraging the spatio-temporal features extracted from videos of laparoscopic cholecystectomy surgery. We use a combined margin loss function to train the computationally efficient PeleeNet architecture to extract features that exhibit: (1) intra-phase similarity, (2) inter-phase dissimilarity. Using these features, we propose to encapsulate sequential feature embeddings, 64 at a time, and classify the surgical phase based on a customized efficient residual factorized CNN architecture (ST-ERFNet). We obtained a surgical phase classification accuracy of 86.07% on the publicly available Cholec80 dataset, which consists of 7 surgical phases. The number of parameters required for the computation is reduced by approximately 84%, yet the model achieves performance comparable to the state of the art. Clinical relevance: Autonomous surgical phase classification sets the platform for automatically analyzing the entire surgical workflow. Additionally, it could streamline the assessment of a surgery in terms of efficiency and the early detection of errors or deviations from usual practice. This would potentially result in improved patient care.
74. Huaulmé A, Sarikaya D, Le Mut K, Despinoy F, Long Y, Dou Q, Chng CB, Lin W, Kondo S, Bravo-Sánchez L, Arbeláez P, Reiter W, Mitsuishi M, Harada K, Jannin P. MIcro-surgical anastomose workflow recognition challenge report. Comput Methods Programs Biomed 2021; 212:106452. [PMID: 34688174] [DOI: 10.1016/j.cmpb.2021.106452]
Abstract
BACKGROUND AND OBJECTIVE Automatic surgical workflow recognition is an essential step in developing context-aware computer-assisted surgical systems. Video recordings of surgeries are becoming widely accessible, as the operational field view is captured during laparoscopic surgeries. Head- and ceiling-mounted cameras are also increasingly being used to record videos in open surgeries. This makes videos a common choice in surgical workflow recognition. Additional modalities, such as kinematic data captured during robot-assisted surgeries, could also improve workflow recognition. This paper presents the design and results of the MIcro-Surgical Anastomose Workflow recognition on training sessions (MISAW) challenge, whose objective was to develop workflow recognition models based on kinematic data and/or videos. METHODS The MISAW challenge provided a data set of 27 sequences of micro-surgical anastomosis on artificial blood vessels. This data set was composed of videos, kinematics, and workflow annotations. The latter described the sequences at three different granularity levels: phase, step, and activity. Four tasks were proposed to the participants: three of them were related to the recognition of surgical workflow at three different granularity levels, while the last one addressed the recognition of all granularity levels in the same model. We used the average application-dependent balanced accuracy (AD-Accuracy) as the evaluation metric. This takes unbalanced classes into account and is more clinically relevant than a frame-by-frame score. RESULTS Six teams participated in at least one task. All models employed deep learning, such as convolutional neural networks (CNN), recurrent neural networks (RNN), or a combination of both. The best models achieved accuracy above 95%, 80%, 60%, and 75%, respectively, for recognition of phases, steps, activities, and multi-granularity. The RNN-based models outperformed the CNN-based ones, and the models dedicated to a single granularity outperformed the multi-granularity model, except for activity recognition. CONCLUSION For high levels of granularity, the best models had a recognition rate that may be sufficient for applications such as prediction of remaining surgical time. However, for activities, the recognition rate was still too low for applications that can be employed clinically. The MISAW data set is publicly available at http://www.synapse.org/MISAW to encourage further research in surgical workflow recognition.
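The evaluation metric named above is built on balanced accuracy, i.e., the average of per-class recalls, which is robust to strongly unbalanced phase durations. The sketch below shows that generic quantity only; the application-dependent weighting used in MISAW is not reproduced here.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred, num_classes):
    """Mean per-class recall over the classes present in the ground truth."""
    recalls = []
    for c in range(num_classes):
        mask = (y_true == c)
        if mask.any():
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

# Example: a long phase dominates the frames, but the short phase still counts equally.
truth = np.array([0] * 90 + [1] * 10)
pred = np.array([0] * 90 + [0] * 5 + [1] * 5)
print(balanced_accuracy(truth, pred, num_classes=2))   # 0.75, versus 0.95 frame accuracy
```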
Affiliation(s)
- Arnaud Huaulmé
- Univ Rennes,INSERM, LTSI - UMR 1099, Rennes, F35000, France.
| | - Duygu Sarikaya
- Gazi University, Faculty of Engineering; Department of Computer Engineering, Ankara, Turkey
| | - Kévin Le Mut
- Univ Rennes,INSERM, LTSI - UMR 1099, Rennes, F35000, France
| | | | - Yonghao Long
- Department of Computer Science & Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
| | - Qi Dou
- Department of Computer Science & Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
| | - Chin-Boon Chng
- National University of Singapore(NUS), Singapore, Singapore; Southern University of Science and Technology (SUSTech), Shenzhen, China
| | - Wenjun Lin
- National University of Singapore(NUS), Singapore, Singapore; Southern University of Science and Technology (SUSTech), Shenzhen, China
| | | | - Laura Bravo-Sánchez
- Center for Research and Formation in Artificial Intelligence, Department of Biomedical Engineering, Universidad de los Andes, Bogotá, Colombia
| | - Pablo Arbeláez
- Center for Research and Formation in Artificial Intelligence, Department of Biomedical Engineering, Universidad de los Andes, Bogotá, Colombia
| | - Mamoru Mitsuishi
- Department of Mechanical Engineering, the University of Tokyo,Tokyo 113-8656, Japan
| | - Kanako Harada
- Department of Mechanical Engineering, the University of Tokyo,Tokyo 113-8656, Japan
| | - Pierre Jannin
- Univ Rennes,INSERM, LTSI - UMR 1099, Rennes, F35000, France.
| |
|
75
|
Wang J, Jin Y, Cai S, Xu H, Heng PA, Qin J, Wang L. Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network. Med Image Anal 2021; 75:102291. [PMID: 34753019 DOI: 10.1016/j.media.2021.102291] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Revised: 10/22/2021] [Accepted: 10/25/2021] [Indexed: 10/19/2022]
Abstract
We propose a novel shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection (ESD) surgery. This task is of great clinical significance but extremely challenging due to bleeding, lighting reflection, and motion blur in the complicated surgical environment. Compared with existing solutions, which either neglect geometric relationships among targeting objects or capture the relationships by using complicated aggregation schemes, the proposed network is capable of achieving satisfactory accuracy while maintaining real-time performance by taking full advantage of the spatial relations among landmarks. We first devise an algorithm to automatically generate relation keypoint heatmaps, which are able to intuitively represent the prior knowledge of spatial relations among landmarks without using any extra manual annotation efforts. We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process. While one scheme introduces pixel-level regularization by multi-task learning, the other integrates global-level regularization by harnessing a newly designed grouped consistency evaluator, which adds relation constraints to the proposed network in an adversarial manner. Both schemes are beneficial to the model in training, and can be readily unloaded in inference to achieve real-time detection. We establish a large in-house dataset of ESD surgery for esophageal cancer to validate the effectiveness of our proposed method. Extensive experimental results demonstrate that our approach outperforms state-of-the-art methods in terms of accuracy and efficiency, achieving better detection results faster. Promising results on two downstream applications further corroborate the great potential of our method in ESD clinical practice.
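As a rough illustration of the relation-keypoint-heatmap idea (the paper's exact construction may differ), the sketch below renders a Gaussian heatmap at the midpoint of a landmark pair; image size, coordinates, and sigma are arbitrary.

# Illustrative sketch only: a Gaussian "relation" heatmap at the midpoint of
# two landmarks, encoding their spatial relation without extra annotation.
import numpy as np

def relation_heatmap(p1, p2, shape=(256, 256), sigma=8.0):
    cy, cx = (p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

hm = relation_heatmap((60, 80), (120, 200))   # peak near row 90, column 140
print(hm.shape, hm.max())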
Affiliation(s)
- Jiacheng Wang
- Department of Computer Science at School of Informatics, Xiamen University, Xiamen 361005, China
| | - Yueming Jin
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
| | - Shuntian Cai
- Department of Gastroenterology, Zhongshan Hospital affiliated to Xiamen University, Xiamen, China
| | - Hongzhi Xu
- Department of Gastroenterology, Zhongshan Hospital affiliated to Xiamen University, Xiamen, China
| | - Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
| | - Jing Qin
- Center for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong
| | - Liansheng Wang
- Department of Computer Science at School of Informatics, Xiamen University, Xiamen 361005, China.
| |
|
76
|
Qin Y, Allan M, Burdick JW, Azizian M. Autonomous Hierarchical Surgical State Estimation During Robot-Assisted Surgery Through Deep Neural Networks. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3091728] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
77
|
Choi J, Cho S, Chung JW, Kim N. Video recognition of simple mastoidectomy using convolutional neural networks: Detection and segmentation of surgical tools and anatomical regions. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 208:106251. [PMID: 34271262 DOI: 10.1016/j.cmpb.2021.106251] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Accepted: 06/20/2021] [Indexed: 06/13/2023]
Abstract
A simple mastoidectomy is used to remove inflammation of the mastoid cavity and to create a route to the skull base and middle ear. However, due to the complexity and difficulty of the simple mastoidectomy, implementing robot vision for assisted surgery is a challenge. To address this with a convolutional neural network architecture in the surgical environment, each surgical instrument and anatomical region must be distinguishable in real time. To meet this condition, we used the latest instance segmentation architecture, YOLACT. In this study, a data set comprising 5,319 frames extracted from 70 simple mastoidectomy surgery videos was used. Six surgical tools and five anatomic regions were identified for training. The YOLACT-based model was trained and evaluated for real-time object detection and semantic segmentation in the surgical environment. Detection accuracies of surgical tools and anatomic regions were 91.2% and 56.5% in mean average precision, respectively. Additionally, the dice similarity coefficient for segmentation of the five anatomic regions was 48.2%. The mean frame rate of this model was 32.3 frames per second, which is sufficient for real-time robotic applications.
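For reference, the dice similarity coefficient reported for the anatomic regions can be computed as in this minimal sketch on binary masks (toy arrays, not the study's data).

# Minimal sketch: Dice similarity coefficient between two binary masks.
import numpy as np

def dice(pred, gt, eps=1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

a = np.zeros((4, 4)); a[1:3, 1:3] = 1
b = np.zeros((4, 4)); b[1:4, 1:4] = 1
print(dice(a, b))   # 2*4 / (4 + 9) ≈ 0.615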
Affiliation(s)
- Joonmyeong Choi
- University of Ulsan College of Medicine, Convergence Medicine, 388-1 pungnap2-dong, Radiology, East bld 2nd fl Seoul, Songpa-gu, 05505 Korea
| | - Sungman Cho
- University of Ulsan College of Medicine, Convergence Medicine, 388-1 pungnap2-dong, Radiology, East bld 2nd fl Seoul, Songpa-gu, 05505 Korea
| | - Jong Woo Chung
- University of Ulsan College of Medicine, Convergence Medicine, 388-1 pungnap2-dong, Radiology, East bld 2nd fl Seoul, Songpa-gu, 05505 Korea.
| | - Namkug Kim
- University of Ulsan College of Medicine, Convergence Medicine, 388-1 pungnap2-dong, Radiology, East bld 2nd fl Seoul, Songpa-gu, 05505 Korea.
| |
|
78
|
Zhang B, Ghanem A, Simes A, Choi H, Yoo A. Surgical workflow recognition with 3DCNN for Sleeve Gastrectomy. Int J Comput Assist Radiol Surg 2021; 16:2029-2036. [PMID: 34415503 PMCID: PMC8589754 DOI: 10.1007/s11548-021-02473-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2021] [Accepted: 08/04/2021] [Indexed: 01/07/2023]
Abstract
PURPOSE Surgical workflow recognition is a crucial and challenging problem when building a computer-assisted surgery system. Current techniques focus on utilizing a convolutional neural network and a recurrent neural network (CNN-RNN) to solve the surgical workflow recognition problem. In this paper, we attempt to use a deep 3DCNN to solve this problem. METHODS In order to tackle the surgical workflow recognition problem and the imbalanced data problem, we implement a 3DCNN workflow referred to as I3D-FL-PKF. We utilize focal loss (FL) to train a 3DCNN architecture known as Inflated 3D ConvNet (I3D) for surgical workflow recognition. We use prior knowledge filtering (PKF) to filter the recognition results. RESULTS We evaluate our proposed workflow on a large sleeve gastrectomy surgical video dataset. We show that focal loss can help to address the imbalanced data problem. We show that our PKF can be used to generate smoothed prediction results and improve the overall accuracy. We show that the proposed workflow achieves 84.16% frame-level accuracy and reaches a weighted Jaccard score of 0.7327 which outperforms traditional CNN-RNN design. CONCLUSION The proposed workflow can obtain consistent and smooth predictions not only within the surgical phases but also for phase transitions. By utilizing focal loss and prior knowledge filtering, our implementation of deep 3DCNN has great potential to solve surgical workflow recognition problems for clinical practice.
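The focal loss used to counter class imbalance can be sketched as below for the multi-class case; the gamma value is illustrative rather than the paper's setting.

# Hedged sketch of a multi-class focal loss; gamma is illustrative.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()              # down-weights easy frames

logits = torch.randn(8, 7, requires_grad=True)   # e.g., 7 workflow classes
targets = torch.randint(0, 7, (8,))
focal_loss(logits, targets).backward()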
Affiliation(s)
- Bokai Zhang
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA.
| | - Amer Ghanem
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
| | - Alexander Simes
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
| | - Henry Choi
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
| | - Andrew Yoo
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
| |
|
79
|
Can Deep Learning Algorithms Help Identify Surgical Workflow and Techniques? J Surg Res 2021; 268:318-325. [PMID: 34399354 DOI: 10.1016/j.jss.2021.07.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 12/21/2022]
Abstract
BACKGROUND Surgical videos are now being used for performance review and educational purposes; however, broad use is still limited due to time constraints. To make video review more efficient, we implemented Artificial Intelligence (AI) algorithms to detect surgical workflow and technical approaches. METHODS Participants (N = 200) performed a simulated open bowel repair. The operation included two major phases: (1) Injury Identification and (2) Suture Repair. Accordingly, a phase detection algorithm (MobileNetV2+GRU) was implemented to automatically detect the two phases using video data. In addition, participants were noted to use three different technical approaches when running the bowel: (1) use of both hands, (2) use of one hand and one tool, or (3) use of two tools. To discern the three technical approaches, an object detection (YOLOv3) algorithm was implemented to recognize objects that were commonly used during the Injury Identification phase (hands versus tools). RESULTS The phase detection algorithm achieved high precision (recall) when segmenting the two phases: Injury Identification (86 ± 9% [81 ± 12%]) and Suture Repair (81 ± 6% [81 ± 16%]). When evaluating three technical approaches in running the bowel, the object detection algorithm achieved high average precisions (Hands [99.32%] and Tools [94.47%]). The three technical approaches showed no difference in execution time (Kruskal-Wallis Test: P= 0.062) or injury identification (not missing an injury) (Chi-squared: P= 0.998). CONCLUSIONS The AI algorithms showed high precision when segmenting surgical workflow and identifying technical approaches. Automation of these techniques for surgical video databases has great potential to facilitate efficient performance review.
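A MobileNetV2+GRU phase detector of the kind named above can be sketched as follows; the hidden size, sequence length, and two-phase head are illustrative assumptions, not the study's configuration.

# Sketch of a per-frame MobileNetV2 feature extractor followed by a GRU for
# phase detection; sizes are illustrative.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class PhaseDetector(nn.Module):
    def __init__(self, num_phases=2, hidden=256):
        super().__init__()
        self.backbone = mobilenet_v2().features          # per-frame CNN features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gru = nn.GRU(1280, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, clip):                    # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.backbone(clip.flatten(0, 1))   # (B*T, 1280, h, w)
        f = self.pool(f).flatten(1).view(b, t, -1)
        out, _ = self.gru(f)                    # temporal context across frames
        return self.head(out)                   # per-frame phase logits

logits = PhaseDetector()(torch.randn(1, 4, 3, 224, 224))   # (1, 4, 2)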
|
80
|
|
81
|
Using deep learning to identify the recurrent laryngeal nerve during thyroidectomy. Sci Rep 2021; 11:14306. [PMID: 34253767 PMCID: PMC8275665 DOI: 10.1038/s41598-021-93202-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2021] [Accepted: 06/22/2021] [Indexed: 11/16/2022] Open
Abstract
Surgeons must visually distinguish soft-tissues, such as nerves, from surrounding anatomy to prevent complications and optimize patient outcomes. An accurate nerve segmentation and analysis tool could provide useful insight for surgical decision-making. Here, we present an end-to-end, automatic deep learning computer vision algorithm to segment and measure nerves. Unlike traditional medical imaging, our unconstrained setup with accessible handheld digital cameras, along with the unstructured open surgery scene, makes this task uniquely challenging. We investigate one common procedure, thyroidectomy, during which surgeons must avoid damaging the recurrent laryngeal nerve (RLN), which is responsible for human speech. We evaluate our segmentation algorithm on a diverse dataset across varied and challenging settings of operating room image capture, and show strong segmentation performance in the optimal image capture condition. This work lays the foundation for future research in real-time tissue discrimination and integration of accessible, intelligent tools into open surgery to provide actionable insights.
|
82
|
Shi X, Jin Y, Dou Q, Heng PA. Semi-supervised learning with progressive unlabeled data excavation for label-efficient surgical workflow recognition. Med Image Anal 2021; 73:102158. [PMID: 34325149 DOI: 10.1016/j.media.2021.102158] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2021] [Revised: 06/04/2021] [Accepted: 06/29/2021] [Indexed: 11/16/2022]
Abstract
Surgical workflow recognition is a fundamental task in computer-assisted surgery and a key component of various applications in operating rooms. Existing deep learning models have achieved promising results for surgical workflow recognition, but they rely heavily on a large amount of annotated videos. However, obtaining annotations is time-consuming and requires the domain knowledge of surgeons. In this paper, we propose a novel two-stage Semi-Supervised Learning method for label-efficient Surgical workflow recognition, named SurgSSL. Our proposed SurgSSL progressively leverages the inherent knowledge held in the unlabeled data to a larger extent: from implicit unlabeled data excavation via motion knowledge excavation, to explicit unlabeled data excavation via pre-knowledge pseudo labeling. Specifically, we first propose a novel intra-sequence Visual and Temporal Dynamic Consistency (VTDC) scheme for implicit excavation. It enforces prediction consistency of the same data under perturbations in both the spatial and temporal spaces, encouraging the model to capture rich motion knowledge. We further perform explicit excavation by optimizing the model towards our pre-knowledge pseudo label. This label is naturally generated by the VTDC-regularized model with prior knowledge of the unlabeled data encoded, and it demonstrates superior reliability for model supervision compared with labels generated by existing methods. We extensively evaluate our method on two public surgical datasets, Cholec80 and the M2CAI challenge dataset. Our method surpasses state-of-the-art semi-supervised methods by a large margin, e.g., improving accuracy by 10.5% under the most limited annotation regime of the M2CAI dataset. Using only 50% of the labeled videos on Cholec80, our approach achieves performance competitive with the full-data training method.
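A much-simplified sketch of a prediction-consistency term on unlabeled clips (two augmented views of the same input) is shown below; the actual VTDC scheme couples spatial and temporal perturbations in a more elaborate way.

# Simplified sketch of a consistency loss between predictions for two
# augmented views of the same unlabeled clip; not the paper's exact VTDC.
import torch
import torch.nn.functional as F

def consistency_loss(logits_view1, logits_view2):
    p1 = F.log_softmax(logits_view1, dim=-1)
    p2 = F.softmax(logits_view2.detach(), dim=-1)   # stop-gradient on target view
    return F.kl_div(p1, p2, reduction="batchmean")

l1 = torch.randn(16, 7, requires_grad=True)
l2 = torch.randn(16, 7)
consistency_loss(l1, l2).backward()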
Affiliation(s)
- Xueying Shi
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
| | - Yueming Jin
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.
| | - Qi Dou
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; T Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong
| | - Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; T Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong
| |
|
83
|
Cheng K, You J, Wu S, Chen Z, Zhou Z, Guan J, Peng B, Wang X. Artificial intelligence-based automated laparoscopic cholecystectomy surgical phase recognition and analysis. Surg Endosc 2021; 36:3160-3168. [PMID: 34231066 DOI: 10.1007/s00464-021-08619-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2021] [Accepted: 06/14/2021] [Indexed: 02/08/2023]
Abstract
BACKGROUND Artificial intelligence and computer vision have revolutionized laparoscopic surgical video analysis. However, there is no multi-center study focused on deep learning-based laparoscopic cholecystectomy phase recognition. This work aims to apply artificial intelligence to recognize and analyze phases in laparoscopic cholecystectomy videos from multiple centers. METHODS This observational cohort study included 163 laparoscopic cholecystectomy videos collected from four medical centers. Videos were labeled by surgeons, and a deep-learning model was developed based on 90 videos. Thereafter, the performance of the model was tested on an additional ten videos by comparing it with the surgeons' annotated ground truth. Deep-learning models were trained to identify laparoscopic cholecystectomy phases, and their performance was measured using precision, recall, F1 score, and overall accuracy. Given the model's high overall accuracy, an additional 63 videos (the analysis set) were analyzed by the model to identify the different phases. RESULTS The mean concordance correlation coefficient for the surgeons' annotations across all operative phases was 92.38%. The overall phase recognition accuracy of the model was 91.05%. In the analysis set, the average surgery time was 2195 ± 896 s, with large individual variance across surgical phases. Notably, laparoscopic cholecystectomy in acute cholecystitis cases had prolonged overall durations, and surgeons spent more time in the hepatocystic triangle mobilization phase. CONCLUSION A deep-learning model based on multi-center data can identify phases of laparoscopic cholecystectomy with a high degree of accuracy. With continued refinement, artificial intelligence could be utilized in large-scale surgical video analysis to achieve clinically relevant future applications.
Affiliation(s)
- Ke Cheng
- West China School of Medicine, West China Hospital of Sichuan University, Chengdu, China.,Department of Pancreatic Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, Sichuan Province, China
| | - Jiaying You
- West China School of Medicine, West China Hospital of Sichuan University, Chengdu, China.,Department of Pancreatic Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, Sichuan Province, China
| | - Shangdi Wu
- West China School of Medicine, West China Hospital of Sichuan University, Chengdu, China.,Department of Pancreatic Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, Sichuan Province, China
| | - Zixin Chen
- West China School of Medicine, West China Hospital of Sichuan University, Chengdu, China.,Department of Pancreatic Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, Sichuan Province, China
| | - Zijian Zhou
- West China School of Medicine, West China Hospital of Sichuan University, Chengdu, China.,Department of Pancreatic Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, Sichuan Province, China
| | - Jingye Guan
- ChengDu Withai Innovations Technology Company, Chengdu, China
| | - Bing Peng
- West China School of Medicine, West China Hospital of Sichuan University, Chengdu, China. .,Department of Pancreatic Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, Sichuan Province, China.
| | - Xin Wang
- West China School of Medicine, West China Hospital of Sichuan University, Chengdu, China. .,Department of Pancreatic Surgery, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Chengdu, 610041, Sichuan Province, China.
| |
|
84
|
Ramesh S, Dall’Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. Multi-task temporal convolutional networks for joint recognition of surgical phases and steps in gastric bypass procedures. Int J Comput Assist Radiol Surg 2021; 16:1111-1119. [PMID: 34013464 PMCID: PMC8260406 DOI: 10.1007/s11548-021-02388-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Accepted: 04/27/2021] [Indexed: 12/31/2022]
Abstract
PURPOSE Automatic segmentation and classification of surgical activity is crucial for providing advanced support in computer-assisted interventions and autonomous functionalities in robot-assisted surgeries. Prior works have focused on recognizing either coarse activities, such as phases, or fine-grained activities, such as gestures. This work aims at jointly recognizing two complementary levels of granularity directly from videos, namely phases and steps. METHODS We introduce two correlated surgical activities, phases and steps, for the laparoscopic gastric bypass procedure. We propose a multi-task multi-stage temporal convolutional network (MTMS-TCN) along with a multi-task convolutional neural network (CNN) training setup to jointly predict the phases and steps and benefit from their complementarity to better evaluate the execution of the procedure. We evaluate the proposed method on a large video dataset consisting of 40 surgical procedures (Bypass40). RESULTS We present experimental results from several baseline models for both phase and step recognition on the Bypass40 dataset. The proposed MTMS-TCN method outperforms single-task methods in both phase and step recognition by 1-2% in accuracy, precision, and recall. Furthermore, for step recognition, MTMS-TCN improves on LSTM-based models by 3-6% on all metrics. CONCLUSION In this work, we present a multi-task multi-stage temporal convolutional network for surgical activity recognition, which shows improved results compared to single-task models on a gastric bypass dataset with multi-level annotations. The proposed method shows that the joint modeling of phases and steps is beneficial to improve the overall recognition of each type of activity.
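In the spirit of a multi-task temporal convolutional network (not the authors' MTMS-TCN), one dilated-residual stage with separate phase and step heads might look like the sketch below; channel counts and class counts are illustrative assumptions.

# Sketch of a single dilated temporal-convolution stage with two task heads
# (phase and step); sizes and class counts are illustrative.
import torch
import torch.nn as nn

class DilatedStage(nn.Module):
    def __init__(self, feat_dim=2048, ch=64, layers=6, n_phases=11, n_steps=44):
        super().__init__()
        self.inp = nn.Conv1d(feat_dim, ch, 1)
        self.blocks = nn.ModuleList(
            [nn.Conv1d(ch, ch, 3, padding=2 ** i, dilation=2 ** i) for i in range(layers)]
        )
        self.phase_head = nn.Conv1d(ch, n_phases, 1)
        self.step_head = nn.Conv1d(ch, n_steps, 1)

    def forward(self, x):                        # x: (B, feat_dim, T) per-frame features
        h = self.inp(x)
        for conv in self.blocks:
            h = h + torch.relu(conv(h))          # residual dilated convolution
        return self.phase_head(h), self.step_head(h)

phase_logits, step_logits = DilatedStage()(torch.randn(1, 2048, 100))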
Affiliation(s)
- Sanat Ramesh
- Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| | - Diego Dall’Alba
- Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy
| | | | - Tong Yu
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| | - Pietro Mascagni
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
- Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
| | - Didier Mutter
- University Hospital of Strasbourg, IHU Strasbourg, France
- IRCAD, Strasbourg, France
| | | | - Paolo Fiorini
- Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| |
|
85
|
Jin Y, Long Y, Chen C, Zhao Z, Dou Q, Heng PA. Temporal Memory Relation Network for Workflow Recognition From Surgical Video. IEEE TRANSACTIONS ON MEDICAL IMAGING 2021; 40:1911-1923. [PMID: 33780335 DOI: 10.1109/tmi.2021.3069471] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted systems in the operating theatre. Previous works either jointly modeled the spatial features with short fixed-range temporal information, or separately learned visual and long temporal cues. In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features. We establish a long-range memory bank to serve as a memory cell storing the rich supportive information. Through our designed temporal variation layer, the supportive cues are further enhanced by multi-scale temporal-only convolutions. To effectively incorporate the two types of cues without disturbing the joint learning of spatio-temporal features, we introduce a non-local bank operator to attentively relate the past to the present. In this regard, our TMRNet enables the current feature to view the long-range temporal dependency, as well as tolerate complex temporal extents. We have extensively validated our approach on two benchmark surgical video datasets, the M2CAI challenge dataset and the Cholec80 dataset. Experimental results demonstrate the outstanding performance of our method, consistently exceeding the state-of-the-art methods by a large margin (e.g., 67.0% vs. 78.9% Jaccard on the Cholec80 dataset).
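The non-local read from a long-range memory bank can be approximated by scaled dot-product attention, as in this hedged sketch; the dimensions are illustrative and this is not the TMRNet configuration.

# Hedged sketch: attend from the current frame feature to a bank of past
# features with scaled dot-product attention (a basic non-local read).
import math
import torch

def memory_read(query, memory):            # query: (B, D), memory: (B, L, D)
    attn = torch.softmax(
        torch.einsum("bd,bld->bl", query, memory) / math.sqrt(query.shape[-1]), dim=-1
    )
    return torch.einsum("bl,bld->bd", attn, memory)    # memory-augmented feature

q, mem = torch.randn(2, 512), torch.randn(2, 30, 512)
augmented = torch.cat([q, memory_read(q, mem)], dim=-1)   # (2, 1024)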
|
86
|
Tanzi L, Piazzolla P, Porpiglia F, Vezzetti E. Real-time deep learning semantic segmentation during intra-operative surgery for 3D augmented reality assistance. Int J Comput Assist Radiol Surg 2021; 16:1435-1445. [PMID: 34165672 PMCID: PMC8354939 DOI: 10.1007/s11548-021-02432-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2021] [Accepted: 05/10/2021] [Indexed: 01/16/2023]
Abstract
Purpose The current study aimed to propose a Deep Learning (DL) and Augmented Reality (AR) based solution for in-vivo robot-assisted radical prostatectomy (RARP), to improve the precision of a published work from our group. We implemented a two-step automatic system to align a 3D virtual ad-hoc model of a patient's organ with its 2D endoscopic image, to assist surgeons during the procedure. Methods This approach was carried out using a Convolutional Neural Network (CNN) based structure for semantic segmentation and a subsequent elaboration of the obtained output, which produced the parameters needed for attaching the 3D model. We used a dataset obtained from 5 endoscopic videos (A, B, C, D, E), selected and tagged by our team's specialists. We then evaluated the best-performing pairing of segmentation architecture and neural network and tested the overlay performance. Results U-Net stood out as the most effective architecture for segmentation. ResNet and MobileNet obtained similar Intersection over Union (IoU) results, but MobileNet was able to perform almost twice as many operations per second. This segmentation technique outperformed the results from the former work, obtaining an average IoU for the catheter of 0.894 (σ = 0.076) compared to 0.339 (σ = 0.195). These modifications also led to an improvement in the 3D overlay performance, in particular in the Euclidean distance between the predicted and actual model's anchor point, from 12.569 (σ = 4.456) to 4.160 (σ = 1.448), and in the geodesic distance between the predicted and actual model's rotations, from 0.266 (σ = 0.131) to 0.169 (σ = 0.073). Conclusion This work is a further step towards the adoption of DL and AR in the surgery domain. In future works, we will overcome the limits of this approach and further improve every step of the surgical procedure.
Affiliation(s)
- Leonardo Tanzi
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy.
| | - Pietro Piazzolla
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy
| | - Francesco Porpiglia
- Division of Urology, Department of Oncology, School of Medicine, University of Turin, Turin, Italy
| | - Enrico Vezzetti
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy
| |
|
87
|
Reiter W. Co-occurrence balanced time series classification for the semi-supervised recognition of surgical smoke. Int J Comput Assist Radiol Surg 2021; 16:2021-2027. [PMID: 34032964 DOI: 10.1007/s11548-021-02411-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Accepted: 05/14/2021] [Indexed: 12/14/2022]
Abstract
PURPOSE Automatic recognition and removal of smoke in surgical procedures can reduce risks to the patient by supporting the surgeon. Surgical smoke changes its visibility over time, impacting the vision depending on its amount and the volume of the body cavity. While modern deep learning algorithms for computer vision require large amounts of data, annotations for training are scarce. This paper investigates the use of unlabeled training data with a modern time-based deep learning algorithm. METHODS We propose to improve the state of the art in smoke recognition by enhancing an image classifier based on convolutional neural networks with a recurrent architecture, thereby providing temporal context to the algorithm. We enrich the training with unlabeled recordings from similar procedures. The influence of surgical tools on the smoke recognition task is studied to reduce a possible bias. RESULTS The evaluations show that smoke recognition benefits from the additional temporal information during training. The use of unlabeled data from the same domain in a semi-supervised training procedure yields additional improvements, reaching an accuracy of 86.8%. The proposed balancing policy is shown to have a positive impact on learning the discrimination of co-occurring surgical tools. CONCLUSIONS This study presents, to the best of our knowledge, the first use of a time series algorithm for the recognition of surgical smoke and the first use of this algorithm in the described semi-supervised setting. We show that the performance improvements with unlabeled data can be enhanced by integrating temporal context. We also show that adapting the data distribution is beneficial to avoid learning biases.
|
88
|
Xia T, Jia F. Against spatial-temporal discrepancy: contrastive learning-based network for surgical workflow recognition. Int J Comput Assist Radiol Surg 2021; 16:839-848. [PMID: 33950398 DOI: 10.1007/s11548-021-02382-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Accepted: 04/16/2021] [Indexed: 11/27/2022]
Abstract
PURPOSE Automatic workflow recognition from surgical videos is fundamental and significant for developing context-aware systems in modern operating rooms. Although many approaches have been proposed to tackle challenges in this complex task, there are still many problems such as the fine-grained characteristics and spatial-temporal discrepancies in surgical videos. METHODS We propose a contrastive learning-based convolutional recurrent network with multi-level prediction to tackle these problems. Specifically, split-attention blocks are employed to extract spatial features. Through a mapping function in the step-phase branch, the current workflow can be predicted on two mutual-boosting levels. Furthermore, a contrastive branch is introduced to learn the spatial-temporal features that eliminate irrelevant changes in the environment. RESULTS We evaluate our method on the Cataract-101 dataset. The results show that our method achieves an accuracy of 96.37% with only surgical step labels, which outperforms other state-of-the-art approaches. CONCLUSION The proposed convolutional recurrent network based on step-phase prediction and contrastive learning can leverage fine-grained characteristics and alleviate spatial-temporal discrepancies to improve the performance of surgical workflow recognition.
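A standard InfoNCE-style contrastive loss between two views of the same frames, shown below as an illustration, conveys the general idea behind a contrastive branch; the paper's exact formulation may differ.

# Illustrative InfoNCE-style contrastive loss between two embedded views of
# the same frames; temperature and embedding size are arbitrary.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(z1.shape[0])          # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))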
Affiliation(s)
- Tong Xia
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Fucang Jia
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. .,University of Chinese Academy of Sciences, Beijing, China.
| |
|
89
|
Garrow CR, Kowalewski KF, Li L, Wagner M, Schmidt MW, Engelhardt S, Hashimoto DA, Kenngott HG, Bodenstedt S, Speidel S, Müller-Stich BP, Nickel F. Machine Learning for Surgical Phase Recognition: A Systematic Review. Ann Surg 2021; 273:684-693. [PMID: 33201088 DOI: 10.1097/sla.0000000000004425] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
OBJECTIVE To provide an overview of ML models and data streams utilized for automated surgical phase recognition. BACKGROUND Phase recognition identifies different steps and phases of an operation. ML is an evolving technology that allows analysis and interpretation of huge data sets. Automation of phase recognition based on data inputs is essential for optimization of workflow, surgical training, intraoperative assistance, patient safety, and efficiency. METHODS A systematic review was performed according to the Cochrane recommendations and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. PubMed, Web of Science, IEEExplore, GoogleScholar, and CiteSeerX were searched. Literature describing phase recognition based on ML models and the capture of intraoperative signals during general surgery procedures was included. RESULTS A total of 2254 titles/abstracts were screened, and 35 full-texts were included. Most commonly used ML models were Hidden Markov Models and Artificial Neural Networks with a trend towards higher complexity over time. Most frequently used data types were feature learning from surgical videos and manual annotation of instrument use. Laparoscopic cholecystectomy was used most commonly, often achieving accuracy rates over 90%, though there was no consistent standardization of defined phases. CONCLUSIONS ML for surgical phase recognition can be performed with high accuracy, depending on the model, data type, and complexity of surgery. Different intraoperative data inputs such as video and instrument type can successfully be used. Most ML models still require significant amounts of manual expert annotations for training. The ML models may drive surgical workflow towards standardization, efficiency, and objectiveness to improve patient outcome in the future. REGISTRATION PROSPERO CRD42018108907.
Affiliation(s)
- Carly R Garrow
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
| | - Karl-Friedrich Kowalewski
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
- Department of Urology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany
| | - Linhong Li
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
| | - Martin Wagner
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
| | - Mona W Schmidt
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
| | - Sandy Engelhardt
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
| | - Daniel A Hashimoto
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
| | - Hannes G Kenngott
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
| | - Sebastian Bodenstedt
- Division of Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), TU Dresden, Dresden, Germany
| | - Stefanie Speidel
- Division of Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), TU Dresden, Dresden, Germany
| | - Beat P Müller-Stich
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
| | - Felix Nickel
- Department of General, Visceral, and Transplantation Surgery, University Hospital of Heidelberg, Heidelberg, Germany
| |
|
90
|
Beyersdorffer P, Kunert W, Jansen K, Miller J, Wilhelm P, Burgert O, Kirschniak A, Rolinger J. Detection of adverse events leading to inadvertent injury during laparoscopic cholecystectomy using convolutional neural networks. ACTA ACUST UNITED AC 2021; 66:413-421. [PMID: 33655738 DOI: 10.1515/bmt-2020-0106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Accepted: 02/16/2021] [Indexed: 01/17/2023]
Abstract
Uncontrolled movements of laparoscopic instruments can lead to inadvertent injury of adjacent structures. The risk becomes evident when the dissecting instrument is located outside the field of view of the laparoscopic camera. Technical solutions to ensure patient safety are appreciated. The present work evaluated the feasibility of an automated binary classification of laparoscopic image data using Convolutional Neural Networks (CNN) to determine whether the dissecting instrument is located within the laparoscopic image section. A unique record of images was generated from six laparoscopic cholecystectomies in a surgical training environment to configure and train the CNN. By using a temporary version of the neural network, the annotation of the training image files could be automated and accelerated. A combination of oversampling and selective data augmentation was used to enlarge the fully labeled image data set and prevent loss of accuracy due to imbalanced class volumes. Subsequently the same approach was applied to the comprehensive, fully annotated Cholec80 database. The described process led to the generation of extensive and balanced training image data sets. The performance of the CNN-based binary classifiers was evaluated on separate test records from both databases. On our recorded data, an accuracy of 0.88 with regard to the safety-relevant classification was achieved. The subsequent evaluation on the Cholec80 data set yielded an accuracy of 0.84. The presented results demonstrate the feasibility of a binary classification of laparoscopic image data for the detection of adverse events in a surgical training environment using a specifically configured CNN architecture.
Affiliation(s)
| | - Wolfgang Kunert
- Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
| | - Kai Jansen
- Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
| | - Johanna Miller
- Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
| | - Peter Wilhelm
- Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
| | - Oliver Burgert
- Department of Medical Informatics, Reutlingen University, Reutlingen, Germany
| | - Andreas Kirschniak
- Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
| | - Jens Rolinger
- Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
| |
|
91
|
Guzmán-García C, Gómez-Tome M, Sánchez-González P, Oropesa I, Gómez EJ. Speech-Based Surgical Phase Recognition for Non-Intrusive Surgical Skills' Assessment in Educational Contexts. SENSORS (BASEL, SWITZERLAND) 2021; 21:1330. [PMID: 33668544 PMCID: PMC7918578 DOI: 10.3390/s21041330] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/08/2021] [Revised: 02/09/2021] [Accepted: 02/09/2021] [Indexed: 12/12/2022]
Abstract
Surgeons' procedural skills and intraoperative decision making are key elements of clinical practice. However, the objective assessment of these skills remains a challenge to this day. Surgical workflow analysis (SWA) is emerging as a powerful tool to solve this issue in surgical educational environments in real time. Typically, SWA makes use of video signals to automatically identify the surgical phase. We hypothesize that the analysis of surgeons' speech using natural language processing (NLP) can provide deeper insight into the surgical decision-making processes. As a preliminary step, this study proposes to use audio signals registered in the educational operating room (OR) to classify the phases of a laparoscopic cholecystectomy (LC). To do this, we firstly created a database with the transcriptions of audio recorded in surgical educational environments and their corresponding phase. Secondly, we compared the performance of four feature extraction techniques and four machine learning models to find the most appropriate model for phase recognition. The best resulting model was a support vector machine (SVM) coupled to a hidden-Markov model (HMM), trained with features obtained with Word2Vec (82.95% average accuracy). The analysis of this model's confusion matrix shows that some phrases are misplaced due to the similarity in the words used. The study of the model's temporal component suggests that further attention should be paid to accurately detect surgeons' normal conversation. This study proves that speech-based classification of LC phases can be effectively achieved. This lays the foundation for the use of audio signals for SWA, to create a framework of LC to be used in surgical training, especially for the training and assessment of procedural and decision-making skills (e.g., to assess residents' procedural knowledge and their ability to react to adverse situations).
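A toy sketch of the Word2Vec + SVM portion of such a pipeline is shown below, with made-up utterances and phase labels; the HMM smoothing stage and real transcripts are omitted.

# Toy sketch: average Word2Vec vectors per transcribed utterance and fit an
# SVM phase classifier; utterances and labels are invented for illustration.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC

utterances = [["dissect", "the", "cystic", "duct"], ["place", "the", "clip"],
              ["retract", "the", "gallbladder"], ["cut", "the", "duct"]]
phases = [0, 1, 0, 1]                                   # toy phase labels

w2v = Word2Vec(utterances, vector_size=32, min_count=1, epochs=50, seed=0)
X = np.array([np.mean([w2v.wv[w] for w in u], axis=0) for u in utterances])

clf = SVC(kernel="linear").fit(X, phases)
print(clf.predict(X))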
Affiliation(s)
- Carmen Guzmán-García
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Centre for Biomedical Technology, Universidad Politécnica de Madrid, Madrid 28040, Spain; (M.G.-T.); (P.S.-G.); (I.O.); (E.J.G.)
| | - Marcos Gómez-Tome
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Centre for Biomedical Technology, Universidad Politécnica de Madrid, Madrid 28040, Spain; (M.G.-T.); (P.S.-G.); (I.O.); (E.J.G.)
| | - Patricia Sánchez-González
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Centre for Biomedical Technology, Universidad Politécnica de Madrid, Madrid 28040, Spain; (M.G.-T.); (P.S.-G.); (I.O.); (E.J.G.)
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina, Madrid 28029, Spain
| | - Ignacio Oropesa
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Centre for Biomedical Technology, Universidad Politécnica de Madrid, Madrid 28040, Spain; (M.G.-T.); (P.S.-G.); (I.O.); (E.J.G.)
| | - Enrique J. Gómez
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Centre for Biomedical Technology, Universidad Politécnica de Madrid, Madrid 28040, Spain; (M.G.-T.); (P.S.-G.); (I.O.); (E.J.G.)
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina, Madrid 28029, Spain
| |
|
92
|
Wang H, Pan X, Zhao H, Gao C, Liu N. Hard frame detection for the automated clipping of surgical nasal endoscopic video. Int J Comput Assist Radiol Surg 2021; 16:231-240. [PMID: 33459977 DOI: 10.1007/s11548-021-02311-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2020] [Accepted: 01/04/2021] [Indexed: 10/22/2022]
Abstract
PURPOSE The automated clipping of surgical nasal endoscopic video is a challenging task because many hard frames have indiscriminative visual features that lead to misclassification. Prior works mainly aim to classify these hard frames along with all other frames, which seriously degrades classification performance. METHODS We propose a hard frame detection method using a convolutional LSTM network (called HFD-ConvLSTM) to remove invalid video frames automatically. First, a new separator based on a coarse-grained classifier is defined to remove the invalid frames. Meanwhile, hard frames are detected by measuring the blurring score of each video frame. Then, squeeze-and-excitation is used to select the informative spatial-temporal features of endoscopic videos and further classify the video frames with a fine-grained ConvLSTM learned from the training set reconstructed with hard frames. RESULTS We justify the proposed solution through extensive experiments using 12 surgical videos (duration: 8501 s). The experiments cover both hard frame detection and video frame classification. Nearly 88.3% of fuzzy frames can be detected, and the classification accuracy is boosted to 95.2%. HFD-ConvLSTM achieves superior performance compared to other methods. CONCLUSION HFD-ConvLSTM provides a new paradigm for video clipping by breaking the complex clipping problem into smaller, more easily managed binary classification problems. Our investigation reveals that hard frame detection based on blurring score calculation is effective for nasal endoscopic video clipping.
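One common blurring score is the variance of the Laplacian; the sketch below uses it to flag potentially hard (blurry) frames, though the paper's exact score and threshold may differ, and the file name is hypothetical.

# Hedged sketch: flag potentially hard (blurry) frames via the variance of
# the Laplacian; threshold and usage are illustrative only.
import cv2

def is_hard_frame(frame_bgr, threshold=100.0):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blur_score = cv2.Laplacian(gray, cv2.CV_64F).var()   # low variance => blurry
    return blur_score < threshold, blur_score

# frame = cv2.imread("endoscopic_frame.png")   # hypothetical file name
# hard, score = is_hard_frame(frame)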
Affiliation(s)
- Hongyu Wang
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China. .,Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China.
| | - Xiaoying Pan
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China.,Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China
| | - Hao Zhao
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China
| | - Cong Gao
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China.,Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China
| | - Ni Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi, 710121, China
| |
|
93
|
Esteva A, Chou K, Yeung S, Naik N, Madani A, Mottaghi A, Liu Y, Topol E, Dean J, Socher R. Deep learning-enabled medical computer vision. NPJ Digit Med 2021; 4:5. [PMID: 33420381 PMCID: PMC7794558 DOI: 10.1038/s41746-020-00376-2] [Citation(s) in RCA: 296] [Impact Index Per Article: 74.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2020] [Accepted: 12/01/2020] [Indexed: 02/07/2023] Open
Abstract
A decade of unprecedented progress in artificial intelligence (AI) has demonstrated the potential for many fields, including medicine, to benefit from the insights that AI techniques can extract from data. Here we survey recent progress in the development of modern computer vision techniques, powered by deep learning, for medical applications, focusing on medical imaging, medical video, and clinical deployment. We start by briefly summarizing a decade of progress in convolutional neural networks, including the vision tasks they enable, in the context of healthcare. Next, we discuss several example medical imaging applications that stand to benefit, including cardiology, pathology, dermatology, and ophthalmology, and propose new avenues for continued work. We then expand into general medical video, highlighting ways in which clinical workflows can integrate computer vision to enhance care. Finally, we discuss the challenges and hurdles required for real-world clinical deployment of these technologies.
Affiliation(s)
| | | | | | - Nikhil Naik
- Salesforce AI Research, San Francisco, CA, USA
| | - Ali Madani
- Salesforce AI Research, San Francisco, CA, USA
| | | | - Yun Liu
- Google Research, Mountain View, CA, USA
| | - Eric Topol
- Scripps Research Translational Institute, La Jolla, CA, USA
| | - Jeff Dean
- Google Research, Mountain View, CA, USA
| | | |
|
94
|
Ward TM, Mascagni P, Ban Y, Rosman G, Padoy N, Meireles O, Hashimoto DA. Computer vision in surgery. Surgery 2020; 169:1253-1256. [PMID: 33272610 DOI: 10.1016/j.surg.2020.10.039] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2020] [Revised: 10/09/2020] [Accepted: 10/10/2020] [Indexed: 12/17/2022]
Abstract
The fields of computer vision (CV) and artificial intelligence (AI) have undergone rapid advancements in the past decade, many of which have been applied to the analysis of intraoperative video. These advances are driven by wide-spread application of deep learning, which leverages multiple layers of neural networks to teach computers complex tasks. Prior to these advances, applications of AI in the operating room were limited by our relative inability to train computers to accurately understand images with traditional machine learning (ML) techniques. The development and refining of deep neural networks that can now accurately identify objects in images and remember past surgical events has sparked a surge in the applications of CV to analyze intraoperative video and has allowed for the accurate identification of surgical phases (steps) and instruments across a variety of procedures. In some cases, CV can even identify operative phases with accuracy similar to surgeons. Future research will likely expand on this foundation of surgical knowledge using larger video datasets and improved algorithms with greater accuracy and interpretability to create clinically useful AI models that gain widespread adoption and augment the surgeon's ability to provide safer care for patients everywhere.
Affiliation(s)
- Thomas M Ward
- Surgical Artificial Intelligence and Innovation Laboratory, Massachusetts General Hospital, Harvard Medical School, Boston, MA
| | - Pietro Mascagni
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France; Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
| | - Yutong Ban
- Surgical Artificial Intelligence and Innovation Laboratory, Massachusetts General Hospital, Harvard Medical School, Boston, MA; Distributed Robotics Laboratory, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA
| | - Guy Rosman
- Surgical Artificial Intelligence and Innovation Laboratory, Massachusetts General Hospital, Harvard Medical School, Boston, MA; Distributed Robotics Laboratory, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| | - Ozanan Meireles
- Surgical Artificial Intelligence and Innovation Laboratory, Massachusetts General Hospital, Harvard Medical School, Boston, MA
| | - Daniel A Hashimoto
- Surgical Artificial Intelligence and Innovation Laboratory, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
| |
|
95
|
Patch-based classification of gallbladder wall vascularity from laparoscopic images using deep learning. Int J Comput Assist Radiol Surg 2020; 16:103-113. [PMID: 33146850 DOI: 10.1007/s11548-020-02285-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2020] [Accepted: 10/23/2020] [Indexed: 12/13/2022]
Abstract
PURPOSE In this study, we propose a deep learning approach for assessment of gallbladder (GB) wall vascularity from images of laparoscopic cholecystectomy (LC). Difficulty in the visualization of GB wall vessels may be the result of fatty infiltration or increased thickening of the GB wall, potentially as a result of cholecystitis or other diseases. METHODS The dataset included 800 patches and 181 region outlines of the GB wall extracted from 53 operations of the Cholec80 video collection. The GB regions and patches were annotated by two expert surgeons using two labeling schemes: 3 classes (low, medium and high vascularity) and 2 classes (low vs. high). Two convolutional neural network (CNN) architectures were investigated. Preprocessing (vessel enhancement) and post-processing (late fusion of CNN output) techniques were applied. RESULTS The best model yielded accuracy 94.48% and 83.77% for patch classification into 2 and 3 classes, respectively. For the GB wall regions, the best model yielded accuracy 91.16% (2 classes) and 80.66% (3 classes). The inter-observer agreement was 91.71% (2 classes) and 78.45% (3 classes). Late fusion analysis allowed the computation of spatial probability maps, which provided a visual representation of the probability for each vascularity class across the GB wall region. CONCLUSIONS This study is the first significant step forward to assess the vascularity of the GB wall from intraoperative images based on computer vision and deep learning techniques. The classification performance of the CNNs was comparable to the agreement of two expert surgeons. The approach may be used for various applications such as for classification of LC operations and context-aware assistance in surgical education and practice.
|
96
|
Li Y, Chen J, Xue P, Tang C, Chang J, Chu C, Ma K, Li Q, Zheng Y, Qiao Y. Computer-Aided Cervical Cancer Diagnosis Using Time-Lapsed Colposcopic Images. IEEE TRANSACTIONS ON MEDICAL IMAGING 2020; 39:3403-3415. [PMID: 32406830 DOI: 10.1109/tmi.2020.2994778] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Cervical cancer is the fourth leading cause of cancer-related death in women worldwide. Early detection of cervical intraepithelial neoplasia (CIN) can significantly increase the survival rate of patients. In this paper, we propose a deep learning framework for the accurate identification of LSIL+ (including CIN and cervical cancer) using time-lapsed colposcopic images. The proposed framework involves two main components, i.e., key-frame feature encoding networks and a feature fusion network. The features of the original (pre-acetic-acid) image and the colposcopic images captured at around 60s, 90s, 120s and 150s during the acetic acid test are encoded by the feature encoding networks. Several fusion approaches are compared, all of which outperform existing automated cervical cancer diagnosis systems that use a single time slot. A graph convolutional network with edge features (E-GCN) is found to be the most suitable fusion approach in our study, owing to its excellent explainability consistent with clinical practice. A large-scale dataset, containing time-lapsed colposcopic images from 7,668 patients, was collected from the collaborating hospital to train and validate our deep learning framework. Colposcopists were invited to compete with our computer-aided diagnosis system. The proposed deep learning framework achieves a classification accuracy of 78.33%, comparable to that of an in-service colposcopist, which demonstrates its potential to provide assistance in realistic clinical scenarios.
Collapse
|
97
|
Chadebecq F, Vasconcelos F, Mazomenos E, Stoyanov D. Computer Vision in the Surgical Operating Room. Visc Med 2020; 36:456-462. [PMID: 33447601 DOI: 10.1159/000511934] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Accepted: 09/30/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Multiple types of surgical cameras are used in modern surgical practice and provide a rich visual signal that surgeons use to visualize the clinical site and make clinical decisions. This signal can also be used by artificial intelligence (AI) methods to support the identification of instruments, structures, or activities, both in real time during procedures and postoperatively for analytics and understanding of surgical processes. SUMMARY In this paper, we provide a succinct perspective on the use of AI, and especially computer vision, to power solutions for the surgical operating room (OR). The synergy between data availability and technical advances in computational power and AI methodology has led to rapid developments in the field and promising advances. KEY MESSAGES With the increasing availability of surgical video sources and the convergence of technologies around video storage, processing, and understanding, we believe clinical solutions and products leveraging vision will become an important component of modern surgical capabilities. However, both technical and clinical challenges remain to be overcome to efficiently translate vision-based approaches into the clinic.
Collapse
Affiliation(s)
- François Chadebecq
- Department of Computer Science, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, United Kingdom
| | - Francisco Vasconcelos
- Department of Computer Science, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, United Kingdom
| | - Evangelos Mazomenos
- Department of Computer Science, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, United Kingdom
| | - Danail Stoyanov
- Department of Computer Science, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, United Kingdom
| |
Collapse
|
98
|
Surgical phase recognition by learning phase transitions. CURRENT DIRECTIONS IN BIOMEDICAL ENGINEERING 2020. [DOI: 10.1515/cdbme-2020-0037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Automatic recognition of surgical phases is an important component for developing an intraoperative context-aware system. Prior work in this area focuses on recognizing short-term tool usage patterns within surgical phases. However, the difference between intra- and inter-phase tool usage patterns has not been investigated for automatic phase recognition. We developed a Recurrent Neural Network (RNN), in particular a state-preserving Long Short-Term Memory (LSTM) architecture, to utilize the long-term evolution of tool usage within complete surgical procedures. For fully automatic tool presence detection from surgical video frames, a Convolutional Neural Network (CNN)-based architecture, namely ZIBNet, is employed. Our proposed approach outperformed EndoNet by 8.1% in overall precision for phase detection and by 12.5% in mean AP for tool recognition.
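A minimal sketch of the state-preserving idea, assuming Cholec80-style tool and phase counts (7 each) and illustrative layer sizes: an LSTM consumes binary tool-presence vectors and its hidden state is carried across chunks of a procedure so that long-term tool-usage evolution is not lost at chunk boundaries. This is only the recurrent phase-recognition stage, not the authors' ZIBNet pipeline.

```python
# Hedged sketch: phase recognition from tool-presence signals with an LSTM
# whose hidden state is preserved across an entire procedure. Tool and phase
# counts and the hidden size are illustrative assumptions.
import torch
import torch.nn as nn

NUM_TOOLS, NUM_PHASES, HIDDEN = 7, 7, 64

class PhaseLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=NUM_TOOLS, hidden_size=HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, NUM_PHASES)

    def forward(self, tools, state=None):
        # tools: (batch, frames, NUM_TOOLS) binary tool-presence vectors
        out, state = self.lstm(tools, state)      # state is carried between chunks
        return self.head(out), state

if __name__ == "__main__":
    model = PhaseLSTM()
    state = None
    # Process a long procedure in chunks while keeping the LSTM state alive,
    # so inter-phase tool-usage evolution is not lost at chunk boundaries.
    for chunk in torch.rand(4, 250, NUM_TOOLS).split(50, dim=1):
        logits, state = model(chunk, state)
    print(logits.shape)   # (4, 50, NUM_PHASES) per-frame phase logits
```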
Collapse
|
99
|
Abdulbaki Alshirbaji T, Jalal NA, Möller K. A convolutional neural network with a two-stage LSTM model for tool presence detection in laparoscopic videos. CURRENT DIRECTIONS IN BIOMEDICAL ENGINEERING 2020. [DOI: 10.1515/cdbme-2020-0002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Surgical tool presence detection in laparoscopic videos is a challenging problem that plays a critical role in developing context-aware systems in operating rooms (ORs). In this work, we propose a deep learning-based approach for detecting surgical tools in laparoscopic images using a convolutional neural network (CNN) in combination with two long short-term memory (LSTM) models. A pre-trained CNN model was trained to learn visual features from images. The first LSTM was then employed to incorporate temporal information across a video clip of neighbouring frames. Finally, the second LSTM was utilized to model temporal dependencies across the whole surgical video. Experimental evaluation was conducted on the Cholec80 dataset to validate our approach. Results show that the most notable improvement is achieved after employing the two-stage LSTM model, and the proposed approach achieved better or similar performance compared with state-of-the-art methods.
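The two-stage design can be sketched as follows: a frame-level CNN encoder, a first LSTM over short clips of neighbouring frames, and a second LSTM across the whole video. The backbone choice, hidden sizes, and clip summarization below are illustrative assumptions rather than the published configuration.

```python
# Hedged sketch of a CNN + two-stage LSTM pipeline for multi-label tool
# presence, loosely following the description above. Layer sizes are assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_TOOLS = 7

class TwoStageLSTMTool(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        cnn = models.resnet18(weights=None)
        feat_dim = cnn.fc.in_features
        cnn.fc = nn.Identity()
        self.cnn = cnn
        self.clip_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)   # stage 1: within clips
        self.video_lstm = nn.LSTM(hidden, hidden, batch_first=True)    # stage 2: across the video
        self.head = nn.Linear(hidden, NUM_TOOLS)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (clips, frames_per_clip, 3, H, W) for a single procedure
        n, f, c, h, w = video.shape
        feats = self.cnn(video.view(n * f, c, h, w)).view(n, f, -1)
        clip_out, _ = self.clip_lstm(feats)
        clip_summary = clip_out[:, -1, :].unsqueeze(0)          # last step of each clip
        video_out, _ = self.video_lstm(clip_summary)
        return torch.sigmoid(self.head(video_out.squeeze(0)))   # per-clip tool probabilities

if __name__ == "__main__":
    model = TwoStageLSTMTool()
    dummy = torch.rand(3, 4, 3, 224, 224)    # 3 clips of 4 frames
    print(model(dummy).shape)                # (3, NUM_TOOLS)
```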
Collapse
Affiliation(s)
| | - Nour Aldeen Jalal
- Institute of Technical Medicine, Furtwangen University , Villingen-Schwenningen , Germany
| | - Knut Möller
- Institute of Technical Medicine, Furtwangen University , Villingen-Schwenningen , Germany
| |
Collapse
|
100
|
Bencteux V, Saibro G, Shlomovitz E, Mascagni P, Perretta S, Hostettler A, Marescaux J, Collins T. Automatic task recognition in a flexible endoscopy benchtop trainer with semi-supervised learning. Int J Comput Assist Radiol Surg 2020; 15:1585-1595. [PMID: 32592068 DOI: 10.1007/s11548-020-02208-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2019] [Accepted: 06/01/2020] [Indexed: 12/17/2022]
Abstract
PURPOSE Inexpensive benchtop training systems offer significant advantages in meeting the increasing demand for training surgeons and gastroenterologists in flexible endoscopy. Established scoring systems exist, based on task duration and mistake evaluation. However, they require trained human raters, which limits broad and low-cost adoption. There is an unmet and important need to automate rating with machine learning. METHOD We present a general and robust approach for recognizing training tasks from endoscopic training video, which consequently automates task duration computation. Our main technical novelty is to show that the performance of state-of-the-art CNN-based approaches can be improved significantly with a novel semi-supervised learning approach, using both labelled and unlabelled videos. In the latter case, we assume only the task execution order is known a priori. RESULTS Two video datasets are presented: the first has 19 videos recorded in examination conditions, where the participants complete their tasks in a predetermined order. The second has 17 h of videos recorded in self-assessment conditions, where participants complete one or more tasks in any order. For the first dataset, we obtain a mean task duration estimation error of 3.65 s, with a mean task duration of 159 s (a relative error of approximately 2.3%). For the second dataset, we obtain a mean task duration estimation error of 3.67 s. Our semi-supervised learning approach reduces the average error from 5.63% to 3.67%. CONCLUSION This work is a first significant step toward automating the rating of flexible endoscopy students using a low-cost benchtop trainer. Thanks to our semi-supervised learning approach, we can scale easily to much larger unlabelled training datasets. The approach can also be used for other phase recognition tasks.
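Since the reported metric is task duration error, a minimal sketch of how durations can be derived from per-frame task predictions may be helpful. The frame rate, smoothing window, and majority-vote smoothing are illustrative assumptions; the semi-supervised training itself is not shown.

```python
# Hedged sketch: turning per-frame task predictions into task durations, the
# quantity whose estimation error is reported above. FPS and the smoothing
# window are illustrative assumptions.
from collections import Counter
from typing import Dict, List

FPS = 25          # assumed video frame rate
WINDOW = 51       # assumed smoothing window (frames), odd so it is centred

def smooth(labels: List[int], window: int = WINDOW) -> List[int]:
    """Majority-vote smoothing to remove spurious single-frame task switches."""
    half = window // 2
    out = []
    for i in range(len(labels)):
        votes = Counter(labels[max(0, i - half): i + half + 1])
        out.append(votes.most_common(1)[0][0])
    return out

def task_durations(labels: List[int], fps: int = FPS) -> Dict[int, float]:
    """Return total time in seconds spent in each task, after smoothing."""
    counts = Counter(smooth(labels))
    return {task: n / fps for task, n in counts.items()}

if __name__ == "__main__":
    # Stand-in predictions: ~10 s of task 0 followed by ~5.6 s of task 1, with one noisy frame.
    preds = [0] * 250 + [1] + [0] * 9 + [1] * 140
    print(task_durations(preds))
```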
Collapse
|