1
González-Cebrián Á, Bordonaba S, Pascau J, Paredes I, Lagares A, de Toledo P. Attention in surgical phase recognition for endoscopic pituitary surgery: Insights from real-world data. Comput Biol Med 2025; 191:110222. [PMID: 40249993 DOI: 10.1016/j.compbiomed.2025.110222]
Abstract
BACKGROUND AND OBJECTIVE Surgical Phase Recognition systems are used to support the automated documentation of a procedure and to provide the surgical team with real-time feedback, potentially improving surgical outcomes and reducing adverse events. The objective of this work is to develop a model for endoscopic pituitary surgery, a challenging procedure for phase recognition due to the high variability in the order of surgical phases. METHODS A dataset of 69 pituitary endoscopic videos was collected and labelled by two surgeons into seven phases. The proposed architecture comprises a Convolutional Neural Network to identify spatial features in individual frames, and a Segment Attentive Hierarchical Consistency Network (which combines Temporal Convolutional Networks with attention mechanisms) to learn temporal relationships between frames and segments at different temporal scales. Finally, predictions are refined with an adaptive mode window. RESULTS We have built and made publicly available the largest pituitary endoscopic surgery database to date, named PituPhase. We have built a model with 73 % accuracy (75 % using a 10 s relaxed boundary). This result is comparable to other state-of-the-art methods in this surgical domain despite the challenges of the dataset (only 10 % of the videos are complete and only 3 % present all phases in the same order, versus 90 % and 50 % respectively in other studies). CONCLUSIONS Attention mechanisms in combination with Temporal Convolutional Networks and adaptive mode windows improve the performance of Surgical Phase Recognition systems and are robust to missing video sections and high variability in phase order.
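The adaptive mode window mentioned above is essentially a majority-vote smoother over the frame-level phase predictions. A minimal sketch of that idea (not the authors' implementation; the window size and phase labels are made up):

```python
# Illustrative sketch: smoothing frame-level phase predictions with a sliding
# mode (majority-vote) window to suppress isolated misclassifications.
from collections import Counter

def mode_window_smooth(phase_preds, window=25):
    """Replace each frame's predicted phase by the most frequent phase
    in a centered window of `window` frames."""
    half = window // 2
    smoothed = []
    for i in range(len(phase_preds)):
        lo, hi = max(0, i - half), min(len(phase_preds), i + half + 1)
        smoothed.append(Counter(phase_preds[lo:hi]).most_common(1)[0][0])
    return smoothed

# Example: a spurious jump to phase 3 inside a run of phase 2 is removed.
print(mode_window_smooth([2, 2, 2, 3, 2, 2, 2, 4, 4, 4, 4], window=5))
```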
Affiliation(s)
- Ángela González-Cebrián
- Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain.
- Sara Bordonaba
- Instituto de Investigación Sanitaria Hospital 12 de Octubre (imas12), Madrid, Spain.
- Javier Pascau
- Bioengineering Department, Universidad Carlos III de Madrid, Madrid, Spain.
- Igor Paredes
- Instituto de Investigación Sanitaria Hospital 12 de Octubre (imas12), Madrid, Spain; Department of Surgery, Faculty of Medicine, Universidad Complutense de Madrid, Madrid, Spain; NeuroSurgery Department, Hospital Universitario 12 de Octubre, Madrid, Spain.
- Alfonso Lagares
- Instituto de Investigación Sanitaria Hospital 12 de Octubre (imas12), Madrid, Spain; Department of Surgery, Faculty of Medicine, Universidad Complutense de Madrid, Madrid, Spain; NeuroSurgery Department, Hospital Universitario 12 de Octubre, Madrid, Spain.
- Paula de Toledo
- Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain.
2
Wang Z, Liu C, Zhu L, Wang T, Zhang S, Dou Q. Improving Foundation Model for Endoscopy Video Analysis via Representation Learning on Long Sequences. IEEE J Biomed Health Inform 2025; 29:3526-3536. [PMID: 40031835 DOI: 10.1109/jbhi.2025.3532311]
Abstract
Recent advancements in endoscopy video analysis have relied on the utilization of relatively short video clips extracted from longer videos or millions of individual frames. However, these approaches tend to neglect the domain-specific characteristics of endoscopy data, which are typically presented as long streams containing valuable spatial and temporal semantic information. To address this limitation, we propose EndoFM-LV, a foundation model developed with a minute-level pre-training framework on long endoscopy video sequences. Specifically, we propose a novel masked token modeling scheme within a teacher-student framework for self-supervised video pre-training, which is tailored for learning representations from long video sequences. For pre-training, we construct a large-scale long endoscopy video dataset comprising 6,469 long endoscopic video samples, each longer than 1 minute and totaling over 13 million frames. Our EndoFM-LV is evaluated on four types of endoscopy tasks, namely classification, segmentation, detection, and workflow recognition, serving as the backbone or temporal module. Extensive experimental results demonstrate that our framework outperforms previous state-of-the-art video-based and frame-based approaches by a significant margin, surpassing Endo-FM (by 5.6% F1, 9.3% Dice, 8.4% F1, and 3.3% accuracy for classification, segmentation, detection, and workflow recognition, respectively) and EndoSSL (by 5.0% F1, 8.1% Dice, 9.3% F1, and 3.1% accuracy for the same tasks).
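A minimal sketch of the teacher-student masked token modeling idea described above, under assumed dimensions and momentum (this is not the EndoFM-LV code):

```python
# Sketch: a student transformer must predict the teacher's embeddings at
# masked token positions; the teacher is an EMA copy of the student.
import torch, torch.nn as nn, torch.nn.functional as F

def make_encoder(dim=256):
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=2)

student, teacher = make_encoder(), make_encoder()
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)

tokens = torch.randn(2, 128, 256)              # (batch, frame tokens, dim)
mask = torch.rand(2, 128) < 0.5                # randomly mask half the tokens
masked_in = tokens.masked_fill(mask.unsqueeze(-1), 0.0)

with torch.no_grad():
    target = teacher(tokens)                   # teacher sees the full sequence
pred = student(masked_in)                      # student sees the masked one
loss = F.mse_loss(pred[mask], target[mask])    # reconstruct masked positions
loss.backward()

# EMA teacher update (momentum value is a hypothetical choice)
with torch.no_grad():
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(0.999).add_(ps, alpha=0.001)
```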
3
Power D, Burke C, Madden MG, Ullah I. Automated assessment of simulated laparoscopic surgical skill performance using deep learning. Sci Rep 2025; 15:13591. [PMID: 40253514 PMCID: PMC12009314 DOI: 10.1038/s41598-025-96336-5]
Abstract
Artificial intelligence (AI) has the potential to improve healthcare and patient safety and is currently being adopted across various fields of medicine and healthcare. AI, and in particular computer vision (CV), is well suited to the analysis of minimally invasive surgical simulation videos for training and performance improvement. CV techniques have improved rapidly in recent years, from accurately recognizing objects, instruments, and gestures, to recognizing phases of surgery, and more recently to remembering past surgical steps. Lack of labeled data is a particular problem in surgery given its complexity, as human annotation and manual assessment are expensive in both time and cost and in most cases require the direct involvement of clinical experts. In this study, we introduce a newly collected simulated Laparoscopic Surgical Performance Dataset (LSPD) specifically designed to address these challenges. Unlike existing datasets that focus on instrument tracking or anatomical structure recognition, the LSPD is tailored for evaluating simulated laparoscopic surgical skill performance at various expertise levels. We provide detailed statistical analyses to identify and compare poorly performed and well-executed operations across different skill levels (novice, trainee, expert) for three specific skills: stack, bands, and tower. We employ a 3-dimensional convolutional neural network (3DCNN) with a weakly-supervised approach to classify the experience levels of surgeons. Our results show that the 3DCNN effectively distinguishes between novices, trainees, and experts, achieving an F1 score of 0.91 and an AUC of 0.92. This study highlights the value of the LSPD dataset and demonstrates the potential of leveraging 3DCNN-based and weakly-supervised approaches to automate the evaluation of surgical performance, reducing reliance on manual expert annotation and assessment. These advancements contribute to improving surgical training and performance analysis.
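For readers unfamiliar with the architecture class, a toy 3D CNN clip classifier of the kind the abstract describes (all layer sizes are hypothetical stand-ins, not the LSPD model):

```python
# Sketch: a small 3D CNN mapping a video clip (channels, frames, H, W) to one
# of three experience levels (novice / trainee / expert).
import torch, torch.nn as nn

class TinySkill3DCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):                  # clip: (B, 3, T, H, W)
        x = self.features(clip).flatten(1)
        return self.classifier(x)             # logits over skill levels

model = TinySkill3DCNN()
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)                           # torch.Size([2, 3])
```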
Affiliation(s)
- David Power
- ASSERT Centre, College of Medicine and Health, University College Cork, Cork, Ireland.
- Cathy Burke
- Cork University Maternity Hospital, Cork, Ireland
- Michael G Madden
- School of Computer Science and Data Science Institute, University of Galway, Galway, Ireland
- Insight Research Ireland Centre for Data Analytics and Data Science Institute, University of Galway, Galway, Ireland
- Ihsan Ullah
- School of Computer Science and Data Science Institute, University of Galway, Galway, Ireland
- Insight Research Ireland Centre for Data Analytics and Data Science Institute, University of Galway, Galway, Ireland
4
Tu M, Jung H, Kim J, Kyme A. Head-Mounted Displays in Context-Aware Systems for Open Surgery: A State-of-the-Art Review. IEEE J Biomed Health Inform 2025; 29:1165-1175. [PMID: 39466871 DOI: 10.1109/jbhi.2024.3485023]
Abstract
Surgical context-aware systems (SCAS), which leverage real-time data and analysis from the operating room to inform surgical activities, can be enhanced through the integration of head-mounted displays (HMDs). Rather than user-agnostic data derived from conventional, and often static, external sensors, HMD-based SCAS relies on dynamic user-centric sensing of the surgical context. The analyzed context-aware information is then augmented directly into a user's field of view via augmented reality (AR) to directly improve their task and decision-making capability. This state-of-the-art review complements previous reviews by exploring the advancement of HMD-based SCAS, including their development and impact on enhancing situational awareness and surgical outcomes in the operating room. The survey demonstrates that this technology can mitigate risks associated with gaps in surgical expertise, increase procedural efficiency, and improve patient outcomes. We also highlight key limitations still to be addressed by the research community, including improving prediction accuracy, robustly handling data heterogeneity, and reducing system latency.
5
Kabir MM, Rahman A, Hasan MN, Mridha MF. Computer vision algorithms in healthcare: Recent advancements and future challenges. Comput Biol Med 2025; 185:109531. [PMID: 39675214 DOI: 10.1016/j.compbiomed.2024.109531]
Abstract
Computer vision has emerged as a promising technology with numerous applications in healthcare. This systematic review provides an overview of advancements and challenges associated with computer vision in healthcare. The review highlights the application areas where computer vision has made significant strides, including medical imaging, surgical assistance, remote patient monitoring, and telehealth. Additionally, it addresses the challenges related to data quality, privacy, model interpretability, and integration with existing healthcare systems. Ethical and legal considerations, such as patient consent and algorithmic bias, are also discussed. The review concludes by identifying future directions and opportunities for research, emphasizing the potential impact of computer vision on healthcare delivery and outcomes. Overall, this systematic review underscores the importance of understanding both the advancements and challenges in computer vision to facilitate its responsible implementation in healthcare.
Affiliation(s)
- Md Mohsin Kabir
- School of Innovation, Design and Engineering, Mälardalens University, Västerås, 722 20, Sweden.
- Ashifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka, 1216, Bangladesh.
- Md Nahid Hasan
- Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, United States.
- M F Mridha
- Department of Computer Science, American International University-Bangladesh, Dhaka, 1229, Dhaka, Bangladesh.
6
Hla DA, Hindin DI. Generative AI & machine learning in surgical education. Curr Probl Surg 2025; 63:101701. [PMID: 39922636 DOI: 10.1016/j.cpsurg.2024.101701]
Affiliation(s)
- Diana A Hla
- Mayo Clinic Alix School of Medicine, Rochester, MN
- David I Hindin
- Division of General Surgery, Department of Surgery, Stanford University, Stanford, CA.
7
Liu Y, Boels M, Garcia-Peraza-Herrera LC, Vercauteren T, Dasgupta P, Granados A, Ourselin S. LoViT: Long Video Transformer for surgical phase recognition. Med Image Anal 2025; 99:103366. [PMID: 39418831 PMCID: PMC11876726 DOI: 10.1016/j.media.2024.103366]
Abstract
Online surgical phase recognition plays a significant role towards building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints which can affect the analysis of long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT), emphasizing the development of a temporally-rich spatial feature extractor and a phase transition map. The temporally-rich spatial feature extractor is designed to capture critical temporal information within the surgical video frames. The phase transition map provides essential insights into the dynamic transitions between different surgical phases. LoViT combines these innovations with a multiscale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multi-scale temporal head then leverages the temporally-rich spatial features and phase transition map to classify surgical phases using phase transition-aware supervision. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently. Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance of surgical phase recognition on two datasets of different surgical procedures and temporal sequencing characteristics. The project page is available at https://github.com/MRUIL/LoViT.
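A hedged sketch of the generic two-stage recipe that LoViT builds on: precomputed per-frame spatial features fed to a temporal transformer whose causal mask restricts attention to past frames, so predictions can be made online. Dimensions are assumed; this is not the published architecture.

```python
# Sketch: causal temporal transformer over precomputed frame features.
import torch, torch.nn as nn

num_frames, feat_dim, num_phases = 200, 512, 7
frame_feats = torch.randn(1, num_frames, feat_dim)   # from a spatial backbone

layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
temporal = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(feat_dim, num_phases)

# Causal mask: frame t may only attend to frames <= t (True = blocked).
causal = torch.triu(torch.ones(num_frames, num_frames, dtype=torch.bool), 1)
phase_logits = head(temporal(frame_feats, mask=causal))
print(phase_logits.shape)                             # (1, 200, 7)
```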
Affiliation(s)
- Yang Liu
- Department of Surgical & Interventional Engineering, King's College London, United Kingdom.
- Maxence Boels
- Department of Surgical & Interventional Engineering, King's College London, United Kingdom
- Tom Vercauteren
- Department of Surgical & Interventional Engineering, King's College London, United Kingdom
- Prokar Dasgupta
- Peter Gorer Department of Immunobiology, King's College London, United Kingdom
- Alejandro Granados
- Department of Surgical & Interventional Engineering, King's College London, United Kingdom
- Sébastien Ourselin
- Department of Surgical & Interventional Engineering, King's College London, United Kingdom.
8
Yoshida M, Kitaguchi D, Takeshita N, Matsuzaki H, Ishikawa Y, Yura M, Akimoto T, Kinoshita T, Ito M. Surgical step recognition in laparoscopic distal gastrectomy using artificial intelligence: a proof-of-concept study. Langenbecks Arch Surg 2024; 409:213. [PMID: 38995411 DOI: 10.1007/s00423-024-03411-y]
Abstract
PURPOSE Laparoscopic distal gastrectomy (LDG) is a difficult procedure for early career surgeons. Artificial intelligence (AI)-based surgical step recognition is crucial for establishing context-aware computer-aided surgery systems. In this study, we aimed to develop an automatic recognition model for LDG using AI and evaluate its performance. METHODS Patients who underwent LDG at our institution in 2019 were included in this study. Surgical video data were classified into the following nine steps: (1) Port insertion; (2) Lymphadenectomy on the left side of the greater curvature; (3) Lymphadenectomy on the right side of the greater curvature; (4) Division of the duodenum; (5) Lymphadenectomy of the suprapancreatic area; (6) Lymphadenectomy on the lesser curvature; (7) Division of the stomach; (8) Reconstruction; and (9) From reconstruction to completion of surgery. Two gastric surgeons manually assigned all annotation labels. Convolutional neural network (CNN)-based image classification was then employed to identify surgical steps. RESULTS The dataset comprised 40 LDG videos. Over 1,000,000 frames with annotated labels of the LDG steps were used to train the deep-learning model, with 30 and 10 surgical videos for training and validation, respectively. The developed model achieved a precision of 0.88, recall of 0.87, F1 score of 0.88, and overall accuracy of 0.89. The inference speed of the proposed model was 32 fps. CONCLUSION The developed CNN model automatically recognized the LDG surgical process with relatively high accuracy. Adding more data to this model could provide a fundamental technology that could be used in the development of future surgical instruments.
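The frame-level metrics reported above can be reproduced from per-frame labels as in this small sketch (synthetic labels, classes macro-averaged):

```python
# Sketch: per-class precision/recall/F1 (macro-averaged) and overall accuracy
# for frame-level step recognition.
import numpy as np

def frame_metrics(y_true, y_pred, num_classes):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precision, recall, f1 = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p); recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    acc = np.mean(y_true == y_pred)
    return np.mean(precision), np.mean(recall), np.mean(f1), acc

print(frame_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))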
Affiliation(s)
- Mitsumasa Yoshida
- Gastric Surgery Division, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Course of Advanced Clinical Research of Cancer, Juntendo University Graduate School of Medicine, 2-1-1, Hongo, Bunkyo-Ward, Tokyo, 113-8421, Japan
- Daichi Kitaguchi
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Nobuyoshi Takeshita
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Hiroki Matsuzaki
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Yuto Ishikawa
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Masahiro Yura
- Gastric Surgery Division, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Tetsuo Akimoto
- Course of Advanced Clinical Research of Cancer, Juntendo University Graduate School of Medicine, 2-1-1, Hongo, Bunkyo-Ward, Tokyo, 113-8421, Japan
- Takahiro Kinoshita
- Gastric Surgery Division, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Masaaki Ito
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan.
9
Zhang B, Meng J, Cheng B, Biskup D, Petculescu S, Chapman A. Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-6. [PMID: 40039574 DOI: 10.1109/embc53108.2024.10782887]
Abstract
Automatic surgical phase recognition is a core technology for modern operating rooms and online surgical video assessment platforms. Current state-of-the-art methods use both spatial and temporal information to tackle the surgical phase recognition task. Building on this idea, we propose the Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Segmentation Causal Transformer (MS-ASCT) for online surgical phase recognition. We use ResNet50 or EfficientNetV2-M for spatial feature extraction. Our MS-AST and MS-ASCT can model temporal information at different scales with multi-scale temporal self-attention and multi-scale temporal cross-attention, which enhances the capture of temporal relationships between frames and segments. We demonstrate that our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset for online and offline surgical phase recognition, respectively, which achieves new state-of-the-art results. Our method can also achieve state-of-the-art results on non-medical datasets in the video action segmentation domain.
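A rough sketch of the multi-scale temporal attention idea (not the MS-AST code): self-attention is run on temporally downsampled copies of the frame-feature sequence and the results are fused back at frame rate. The scales and dimensions below are assumptions.

```python
# Sketch: self-attention at several temporal scales, upsampled and fused.
import torch, torch.nn as nn, torch.nn.functional as F

dim, T = 256, 400
feats = torch.randn(1, T, dim)                       # per-frame features
attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

outputs = []
for stride in (1, 4, 16):                            # temporal scales (assumed)
    x = feats[:, ::stride, :]                        # downsample in time
    y, _ = attn(x, x, x)                             # self-attention at scale
    y = F.interpolate(y.transpose(1, 2), size=T,     # back to frame rate
                      mode="linear", align_corners=False).transpose(1, 2)
    outputs.append(y)

fused = torch.stack(outputs).mean(0)                 # simple fusion
print(fused.shape)                                   # (1, 400, 256)
```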
10
Tran B, Kulic D, Harrison SM, Pilgrim C, Abdi E. PCNet: Accurate Surgical Workflow Recognition with Phase Consistency. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. [PMID: 40039170 DOI: 10.1109/embc53108.2024.10782329]
Abstract
Real-time surgical workflow recognition is a prerequisite for autonomous robot-assisted surgery. Existing approaches to this problem mainly focus on output accuracy without considering the issue of spurious phase changes, which is crucial for online applications in the operating room. In this work, we employ a multi-module model that incorporates both spatial and temporal features for accurate inference via a transformer-like architecture. In particular, we introduce a feedback loop at the output stage to enforce phase consistency. Through thorough experiments on the Cholec80 dataset, our model is shown to be comparable with the state of the art in terms of accuracy, while consistently outperforming it in output continuity. In addition, we validate the benefits of using ResNeSt over the conventional ResNet for feature extraction in this field of research. However, we acknowledge the trade-off between inference accuracy and the number of phase changes, which needs further study.
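One simple way to enforce the phase consistency discussed above is an online hysteresis filter that only commits a phase change after it persists for several frames; the paper's feedback mechanism differs, so this is only an illustrative sketch with made-up parameters:

```python
# Sketch: commit a new phase only after it has persisted for `min_run`
# consecutive frames, suppressing spurious single-frame phase changes.
def consistent_stream(raw_preds, min_run=10):
    committed, candidate, run = None, None, 0
    out = []
    for p in raw_preds:
        if committed is None:
            committed = p
        if p == candidate:
            run += 1
        else:
            candidate, run = p, 1
        if candidate != committed and run >= min_run:
            committed = candidate
        out.append(committed)
    return out

# A 3-frame blip back to phase 0 is ignored; the sustained change to 2 is kept.
print(consistent_stream([1]*20 + [0]*3 + [1]*20 + [2]*15, min_run=10))
```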
11
Power D, Ullah I. Automated Assessment of Simulated Laparoscopic Surgical Performance using 3DCNN. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. [PMID: 40039681 DOI: 10.1109/embc53108.2024.10782160]
Abstract
Artificial intelligence and computer vision have the potential to improve surgical training, especially for minimally invasive surgery, by analyzing intraoperative and simulation videos for training or performance improvement. Among these, techniques based on deep learning have rapidly improved, from recognizing objects, instruments, and gestures, to remembering past surgical steps and phases of surgery. However, data scarcity is a problem, particularly in surgery, where complex datasets and human annotation are expensive and time-consuming and in most cases require the direct involvement of clinical experts. Laparoscopic surgical performance assessment traditionally relies on direct observation or video analysis by human experts, a costly and time-consuming undertaking. A newly collected simulated laparoscopic surgical performance dataset (LSPD) is presented to initiate research on automating this assessment and avoiding manual expert evaluation. Statistical analyses of the LSPD are provided to show similarities and differences between expertise levels on the Stack, Bands, and Tower skills. Finally, a convolutional neural network is used to predict the experience level of the surgeons, achieving good discriminative results. The proposed work offers the potential to automate performance assessment and to learn important features that can discriminate between the performance of novice, trainee, and expert levels.
12
Zhang B, Sarhan MH, Goel B, Petculescu S, Ghanem A. SF-TMN: SlowFast temporal modeling network for surgical phase recognition. Int J Comput Assist Radiol Surg 2024; 19:871-880. [PMID: 38512588 DOI: 10.1007/s11548-024-03095-1]
Abstract
PURPOSE Automatic surgical phase recognition is crucial for video-based assessment systems in surgical education. Utilizing temporal information is crucial for surgical phase recognition; hence, various recent approaches extract frame-level features to conduct full video temporal modeling. METHODS For better temporal modeling, we propose SlowFast temporal modeling network (SF-TMN) for offline surgical phase recognition that can achieve not only frame-level full video temporal modeling but also segment-level full video temporal modeling. We employ a feature extraction network, pretrained on the target dataset, to extract features from video frames as the training data for SF-TMN. The Slow Path in SF-TMN utilizes all frame features for frame temporal modeling. The Fast Path in SF-TMN utilizes segment-level features summarized from frame features for segment temporal modeling. The proposed paradigm is flexible regarding the choice of temporal modeling networks. RESULTS We explore MS-TCN and ASFormer as temporal modeling networks and experiment with multiple combination strategies for Slow and Fast Paths. We evaluate SF-TMN on Cholec80 and Cataract-101 surgical phase recognition tasks and demonstrate that SF-TMN can achieve state-of-the-art results on all considered metrics. SF-TMN with ASFormer backbone outperforms the state-of-the-art Swin BiGRU by approximately 1% in accuracy and 1.5% in recall on Cholec80. We also evaluate SF-TMN on action segmentation datasets including 50salads, GTEA, and Breakfast, and achieve state-of-the-art results. CONCLUSION The improvement in the results shows that combining temporal information from both frame level and segment level by refining outputs with temporal refinement stages is beneficial for the temporal modeling of surgical phases.
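The slow (frame-level) and fast (segment-level) streams can be sketched as frame features plus average-pooled segment features broadcast back to frame rate; the segment length and dimensions below are assumptions, not the SF-TMN configuration:

```python
# Sketch: build segment-level features by pooling frame features, then fuse
# both temporal resolutions at frame rate.
import torch
import torch.nn.functional as F

frame_feats = torch.randn(1, 512, 1000)      # (batch, feat_dim, frames)
segment_len = 25

# Segment-level stream: pool every `segment_len` frames into one feature.
segment_feats = F.avg_pool1d(frame_feats, kernel_size=segment_len)

# Broadcast segment features back to frame resolution and fuse both streams.
segment_up = F.interpolate(segment_feats, size=frame_feats.shape[-1],
                           mode="nearest")
fused = torch.cat([frame_feats, segment_up], dim=1)   # (1, 1024, 1000)
print(segment_feats.shape, fused.shape)
```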
Affiliation(s)
- Bokai Zhang
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA.
- Mohammad Hasan Sarhan
- Johnson & Johnson MedTech, Robert-Koch-Straße 1, 22851, Norderstedt, Schleswig-Holstein, Germany
- Bharti Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Svetlana Petculescu
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
- Amer Ghanem
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
13
Rivoir D, Funke I, Speidel S. On the pitfalls of Batch Normalization for end-to-end video learning: A study on surgical workflow analysis. Med Image Anal 2024; 94:103126. [PMID: 38452578 DOI: 10.1016/j.media.2024.103126]
Abstract
Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequence modeling. Yet, BN-related issues are hardly studied for long video understanding, despite the ubiquitous use of BN in CNNs (Convolutional Neural Networks) for feature extraction. Especially in surgical workflow analysis, where the lack of pretrained feature extractors has led to complex, multi-stage training pipelines, limited awareness of BN issues may have hidden the benefits of training CNNs and temporal models end to end. In this paper, we analyze pitfalls of BN in video learning, including issues specific to online tasks such as a 'cheating' effect in anticipation. We observe that BN's properties create major obstacles for end-to-end learning. However, using BN-free backbones, even simple CNN-LSTMs beat the state of the art on three surgical workflow benchmarks by utilizing adequate end-to-end training strategies which maximize temporal context. We conclude that awareness of BN's pitfalls is crucial for effective end-to-end learning in surgical tasks. By reproducing results on natural-video datasets, we hope our insights will benefit other areas of video learning as well. Code is available at: https://gitlab.com/nct_tso_public/pitfalls_bn.
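A minimal sketch of the paper's practical takeaway: a BN-free backbone (here BatchNorm swapped for GroupNorm in a torchvision ResNet) feeding an LSTM so the CNN and temporal model can be trained end to end. This is not the authors' training setup; phase count and sizes are assumed.

```python
# Sketch: BN-free CNN backbone + LSTM for end-to-end phase recognition.
import torch, torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None,
                    norm_layer=lambda ch: nn.GroupNorm(32, ch))  # no BatchNorm
backbone.fc = nn.Identity()                    # expose 512-d frame features
lstm = nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
head = nn.Linear(256, 7)                       # e.g. 7 surgical phases

clip = torch.randn(1, 16, 3, 224, 224)         # (batch, frames, C, H, W)
feats = backbone(clip.flatten(0, 1)).view(1, 16, 512)
hidden, _ = lstm(feats)
print(head(hidden).shape)                      # (1, 16, 7) phase logits
```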
Affiliation(s)
- Dominik Rivoir
- Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany.
- Isabel Funke
- Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany
- Stefanie Speidel
- Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany
14
Al Abbas AI, Namazi B, Radi I, Alterio R, Abreu AA, Rail B, Polanco PM, Zeh HJ, Hogg ME, Zureikat AH, Sankaranarayanan G. The development of a deep learning model for automated segmentation of the robotic pancreaticojejunostomy. Surg Endosc 2024; 38:2553-2561. [PMID: 38488870 DOI: 10.1007/s00464-024-10725-x]
Abstract
BACKGROUND Minimally invasive surgery provides an unprecedented opportunity to review video for assessing surgical performance. Surgical video analysis is time-consuming and expensive. Deep learning provides an alternative for analysis. Robotic pancreaticoduodenectomy (RPD) is a complex and morbid operation. Surgeon technical performance of the pancreaticojejunostomy (PJ) has been associated with postoperative pancreatic fistula. In this work, we aimed to utilize deep learning to automatically segment PJ videos from RPD. METHODS This was a retrospective review of prospectively collected videos from 2011 to 2022 held in libraries at tertiary referral centers, including 111 PJ videos. Each frame of a robotic PJ video was categorized based on 6 tasks. A 3D convolutional neural network was trained for frame-level visual feature extraction and classification. All the videos were manually annotated for the start and end of each task. RESULTS Of the 100 videos assessed, 60 were used for training the model, 10 for hyperparameter optimization, and 30 for testing performance. All the frames were extracted (6 frames/second) and annotated. The accuracy and mean per-class F1 score for task classification were 88.01% and 85.34%, respectively. CONCLUSION The deep learning model performed well for automated segmentation of PJ videos. Future work will focus on skills assessment and outcome prediction.
Affiliation(s)
- Amr I Al Abbas
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Babak Namazi
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Imad Radi
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Rodrigo Alterio
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Andres A Abreu
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Benjamin Rail
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Patricio M Polanco
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Herbert J Zeh
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA
- Amer H Zureikat
- University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Ganesh Sankaranarayanan
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9169, USA.
15
Gui S, Wang Z, Chen J, Zhou X, Zhang C, Cao Y. MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition. IEEE Trans Med Imaging 2024; 43:1628-1639. [PMID: 38127608 DOI: 10.1109/tmi.2023.3345736]
Abstract
The recognition of surgical triplets plays a critical role in the practical application of surgical videos. It involves the sub-tasks of recognizing instruments, verbs, and targets, while establishing precise associations between them. Existing methods face two significant challenges in triplet recognition: 1) the imbalanced class distribution of surgical triplets may lead to spurious task association learning, and 2) the feature extractors cannot reconcile local and global context modeling. To overcome these challenges, this paper presents a novel multi-teacher knowledge distillation framework for multi-task triplet learning, known as MT4MTL-KD. MT4MTL-KD leverages teacher models trained on less imbalanced sub-tasks to assist multi-task student learning for triplet recognition. Moreover, we adopt different categories of backbones for the teacher and student models, facilitating the integration of local and global context modeling. To further align the semantic knowledge between the triplet task and its sub-tasks, we propose a novel feature attention module (FAM). This module utilizes attention mechanisms to assign multi-task features to specific sub-tasks. We evaluate the performance of MT4MTL-KD on both the 5-fold cross-validation and the CholecTriplet challenge splits of the CholecT45 dataset. The experimental results consistently demonstrate the superiority of our framework over state-of-the-art methods, achieving significant improvements of up to 6.4% on the cross-validation split.
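A hedged sketch of generic multi-teacher knowledge distillation (temperature, weights, and class count are made up; MT4MTL-KD itself combines its losses differently):

```python
# Sketch: student loss = supervised cross-entropy + averaged KL divergence to
# several frozen teachers' softened predictions.
import torch, torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          T=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)       # hard-label term
    kd = 0.0
    for t_logits in teacher_logits_list:               # soft-label terms
        kd += F.kl_div(F.log_softmax(student_logits / T, dim=1),
                       F.softmax(t_logits / T, dim=1),
                       reduction="batchmean") * (T * T)
    kd /= len(teacher_logits_list)
    return alpha * ce + (1 - alpha) * kd

student = torch.randn(4, 100, requires_grad=True)       # 100 triplet classes
teachers = [torch.randn(4, 100), torch.randn(4, 100)]   # two frozen teachers
labels = torch.randint(0, 100, (4,))
loss = multi_teacher_kd_loss(student, teachers, labels)
loss.backward()
```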
16
Loukas C, Seimenis I, Prevezanou K, Schizas D. Prediction of remaining surgery duration in laparoscopic videos based on visual saliency and the transformer network. Int J Med Robot 2024; 20:e2632. [PMID: 38630888 DOI: 10.1002/rcs.2632]
Abstract
BACKGROUND Real-time prediction of the remaining surgery duration (RSD) is important for optimal scheduling of resources in the operating room. METHODS We focus on the intraoperative prediction of RSD from laparoscopic video. An extensive evaluation of seven common deep learning models, a proposed model based on the Transformer architecture (TransLocal), and four baseline approaches is presented. The proposed pipeline includes a CNN-LSTM for feature extraction from salient regions within short video segments and a Transformer with local attention mechanisms. RESULTS Using the Cholec80 dataset, TransLocal yielded the best performance (mean absolute error (MAE) = 7.1 min). For long and short surgeries, the MAE was 10.6 and 4.4 min, respectively. Thirty minutes before the end of surgery, the MAE was 6.2 min overall, and 7.2 and 5.5 min for long and short surgeries, respectively. CONCLUSIONS The proposed technique achieves state-of-the-art results. In the future, we aim to incorporate intraoperative indicators and pre-operative data.
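The RSD targets and MAE figures quoted above come from simple per-frame bookkeeping, sketched below with a trivial placeholder predictor (sampling rate and durations are made up; this is not the TransLocal model):

```python
# Sketch: build remaining-surgery-duration targets per frame and score a
# prediction with mean absolute error (in minutes).
import numpy as np

fps = 1.0                                   # frames sampled at 1 Hz (assumed)
n_frames = 3600                             # a 60-minute surgery
t = np.arange(n_frames) / fps               # elapsed seconds per frame
rsd_true = (n_frames / fps - t) / 60.0      # remaining minutes at each frame

# Placeholder predictor: assumes every surgery lasts 45 minutes in total.
rsd_pred = np.clip(45.0 - t / 60.0, 0.0, None)

mae = np.mean(np.abs(rsd_pred - rsd_true))
mae_last30 = np.abs(rsd_pred - rsd_true)[rsd_true <= 30.0].mean()
print(f"MAE={mae:.1f} min, MAE over final 30 min={mae_last30:.1f} min")
```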
Affiliation(s)
- Constantinos Loukas
- Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Ioannis Seimenis
- Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Konstantina Prevezanou
- Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Dimitrios Schizas
- 1st Department of Surgery, Laikon General Hospital, Medical School, National and Kapodistrian University of Athens, Athens, Greece
17
Abiyev RH, Altabel MZ, Darwish M, Helwan A. A Multimodal Transformer Model for Recognition of Images from Complex Laparoscopic Surgical Videos. Diagnostics (Basel) 2024; 14:681. [PMID: 38611594 PMCID: PMC11011728 DOI: 10.3390/diagnostics14070681]
Abstract
The potential role and advantages of artificial intelligence-based models in the field of surgery remain to be determined. This research marks an initial stride towards creating a multimodal model, inspired by the Video-Audio-Text Transformer, that aims to reduce negative occurrences and enhance patient safety. The model employs state-of-the-art image and text embedding models (ViT and BERT) to assess their efficacy in extracting hidden and distinct features from surgical video frames. These features are then used as inputs for convolution-free Transformer architectures to extract comprehensive multidimensional representations. A joint space is then used to combine the text and image features extracted from both Transformer encoders. This joint space ensures that the relationships between the different modalities are preserved during the combination process. The entire model was trained and tested on laparoscopic cholecystectomy (LC) videos encompassing various levels of complexity. Experimentally, the model reached a mean accuracy of 91.0%, a precision of 81%, and a recall of 83% when tested on 30 videos out of 80 from the Cholec80 dataset.
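A minimal sketch of the joint-space fusion described above: separate image and text embeddings projected into a shared space and concatenated for classification. Dimensions are stand-ins for typical ViT/BERT outputs, not the paper's configuration.

```python
# Sketch: project image and text embeddings into a joint space and fuse them.
import torch, torch.nn as nn

class JointSpaceFusion(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, joint_dim=256, num_classes=7):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)
        self.classifier = nn.Linear(2 * joint_dim, num_classes)

    def forward(self, img_emb, txt_emb):
        zi = torch.relu(self.img_proj(img_emb))    # image branch -> joint space
        zt = torch.relu(self.txt_proj(txt_emb))    # text branch  -> joint space
        return self.classifier(torch.cat([zi, zt], dim=-1))

model = JointSpaceFusion()
logits = model(torch.randn(2, 768), torch.randn(2, 768))
print(logits.shape)                                # torch.Size([2, 7])
```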
Affiliation(s)
- Rahib H. Abiyev
- Applied Artificial Intelligence Research Centre, Department of Computer Engineering, Near East University, 99132 North Cyprus, Turkey
- Mohamad Ziad Altabel
- Applied Artificial Intelligence Research Centre, Department of Computer Engineering, Near East University, 99132 North Cyprus, Turkey
- Manal Darwish
- Applied Artificial Intelligence Research Centre, Department of Computer Engineering, Near East University, 99132 North Cyprus, Turkey
- Abdulkader Helwan
- Department of Health, Medicine and Caring Sciences, Linköping University, 581 85 Linköping, Sweden
18
Deol ES, Tollefson MK, Antolin A, Zohar M, Bar O, Ben-Ayoun D, Mynderse LA, Lomas DJ, Avant RA, Miller AR, Elliott DS, Boorjian SA, Wolf T, Asselmann D, Khanna A. Automated surgical step recognition in transurethral bladder tumor resection using artificial intelligence: transfer learning across surgical modalities. Front Artif Intell 2024; 7:1375482. [PMID: 38525302 PMCID: PMC10958784 DOI: 10.3389/frai.2024.1375482]
Abstract
Objective Automated surgical step recognition (SSR) using AI has been a catalyst in the "digitization" of surgery. However, progress has been limited to laparoscopy, with relatively few SSR tools in endoscopic surgery. This study aimed to create a SSR model for transurethral resection of bladder tumors (TURBT), leveraging a novel application of transfer learning to reduce video dataset requirements. Materials and methods Retrospective surgical videos of TURBT were manually annotated with the following steps of surgery: primary endoscopic evaluation, resection of bladder tumor, and surface coagulation. Manually annotated videos were then utilized to train a novel AI computer vision algorithm to perform automated video annotation of TURBT surgical video, utilizing a transfer-learning technique to pre-train on laparoscopic procedures. Accuracy of AI SSR was determined by comparison to human annotations as the reference standard. Results A total of 300 full-length TURBT videos (median 23.96 min; IQR 14.13-41.31 min) were manually annotated with sequential steps of surgery. One hundred and seventy-nine videos served as a training dataset for algorithm development, 44 for internal validation, and 77 as a separate test cohort for evaluating algorithm accuracy. Overall accuracy of AI video analysis was 89.6%. Model accuracy was highest for the primary endoscopic evaluation step (98.2%) and lowest for the surface coagulation step (82.7%). Conclusion We developed a fully automated computer vision algorithm for high-accuracy annotation of TURBT surgical videos. This represents the first application of transfer-learning from laparoscopy-based computer vision models into surgical endoscopy, demonstrating the promise of this approach in adapting to new procedure types.
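The transfer-learning recipe can be sketched as reusing a pretrained video backbone, swapping its classification head for the three TURBT steps, and fine-tuning; the backbone below is a generic torchvision model standing in for the laparoscopy-pretrained network, not theator's model.

```python
# Sketch: freeze a pretrained video backbone, replace its head with the three
# TURBT steps, and fine-tune only the new head at first.
import torch, torch.nn as nn
from torchvision.models.video import r3d_18

backbone = r3d_18(weights=None)           # stand-in for a pretrained network
for p in backbone.parameters():           # freeze pretrained weights
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 3)   # 3 TURBT steps

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
clips = torch.randn(2, 3, 16, 112, 112)   # (batch, C, T, H, W)
labels = torch.tensor([0, 2])             # e.g. evaluation, coagulation
loss = nn.functional.cross_entropy(backbone(clips), labels)
loss.backward()
optimizer.step()
```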
Affiliation(s)
- Ekamjit S. Deol
- Department of Urology, Mayo Clinic, Rochester, MN, United States
- Maya Zohar
- theator.io, Palo Alto, CA, United States
- Omri Bar
- theator.io, Palo Alto, CA, United States
- Derek J. Lomas
- Department of Urology, Mayo Clinic, Rochester, MN, United States
- Ross A. Avant
- Department of Urology, Mayo Clinic, Rochester, MN, United States
- Adam R. Miller
- Department of Urology, Mayo Clinic, Rochester, MN, United States
- Tamir Wolf
- theator.io, Palo Alto, CA, United States
- Abhinav Khanna
- Department of Urology, Mayo Clinic, Rochester, MN, United States
19
Burguete-Lopez A, Makarenko M, Bonifazi M, Menezes de Oliveira BN, Getman F, Tian Y, Mazzone V, Li N, Giammona A, Liberale C, Fratalocchi A. Real-time simultaneous refractive index and thickness mapping of sub-cellular biology at the diffraction limit. Commun Biol 2024; 7:154. [PMID: 38321111 PMCID: PMC10847501 DOI: 10.1038/s42003-024-05839-w]
Abstract
Mapping the cellular refractive index (RI) is a central task for research involving the composition of microorganisms and the development of models providing automated medical screenings with accuracy beyond 95%. These models require significantly enhancing the state-of-the-art RI mapping capabilities to provide large amounts of accurate RI data at high throughput. Here, we present a machine-learning-based technique that obtains a biological specimen's real-time RI and thickness maps from a single image acquired with a conventional color camera. This technology leverages a suitably engineered nanostructured membrane that stretches a biological analyte over its surface and absorbs transmitted light, generating complex reflection spectra from each sample point. The technique does not need pre-existing sample knowledge. It achieves an RI sensitivity of 10⁻⁴ and sub-nanometer thickness resolution on diffraction-limited spatial areas. We illustrate practical application by performing sub-cellular segmentation of HCT-116 colorectal cancer cells, obtaining complete three-dimensional reconstruction of the cellular regions with a characteristic length of 30 μm. These results can facilitate the development of real-time label-free technologies for biomedical studies on microscopic multicellular dynamics.
Affiliation(s)
- Arturo Burguete-Lopez
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Maksim Makarenko
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Marcella Bonifazi
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Physik-Institut, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Barbara Nicoly Menezes de Oliveira
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Fedor Getman
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Yi Tian
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Valerio Mazzone
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Physik-Institut, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Ning Li
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Alessandro Giammona
- Biological and Environmental Science and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Segrate, Italy
- Carlo Liberale
- Biological and Environmental Science and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Andrea Fratalocchi
- PRIMALIGHT, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
20
Zhai Y, Chen Z, Zheng Z, Wang X, Yan X, Liu X, Yin J, Wang J, Zhang J. Artificial intelligence for automatic surgical phase recognition of laparoscopic gastrectomy in gastric cancer. Int J Comput Assist Radiol Surg 2024; 19:345-353. [PMID: 37914911 DOI: 10.1007/s11548-023-03027-5]
Abstract
PURPOSE This study aimed to classify the phases of laparoscopic gastric cancer surgery, to develop a transformer-based artificial intelligence (AI) model for automatic surgical phase recognition, and to evaluate the model's performance using laparoscopic gastric cancer surgical videos. METHODS One hundred patients who underwent laparoscopic surgery for gastric cancer were included in this study. All surgical videos were labeled and classified into eight phases (P0, preparation; P1, separation of the greater gastric curvature; P2, separation of the distal stomach; P3, separation of the lesser gastric curvature; P4, dissection of the superior margin of the pancreas; P5, separation of the proximal stomach; P6, digestive tract reconstruction; P7, end of operation). This study proposed an AI phase recognition model consisting of a convolutional neural network-based visual feature extractor and a temporal relational transformer. RESULTS A visual and temporal relationship network was proposed to automatically perform accurate surgical phase prediction. The average duration of the surgical videos was 9114 ± 2571 s. The longest phase was P1 (3388 s). The final accuracy, F1 score, recall, and precision were 90.128%, 87.04%, 87.04%, and 87.32%, respectively. The phase with the highest recognition accuracy was P1, and that with the lowest accuracy was P2. CONCLUSION An AI model based on convolutional and transformer networks was developed in this study. This model can accurately identify the phases of laparoscopic surgery for gastric cancer. AI can be used as an analytical tool for gastric cancer surgical videos.
Affiliation(s)
- Yuhao Zhai
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China
- State Key Lab of Digestive Health, Beijing, China
- Zhen Chen
- Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China
- Zhi Zheng
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China
- State Key Lab of Digestive Health, Beijing, China
- Xi Wang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China
- State Key Lab of Digestive Health, Beijing, China
- Xiaosheng Yan
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China
- State Key Lab of Digestive Health, Beijing, China
- Xiaoye Liu
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China
- State Key Lab of Digestive Health, Beijing, China
- Jie Yin
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China
- State Key Lab of Digestive Health, Beijing, China
- Jinqiao Wang
- Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China.
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Haidian District, Beijing, China.
- Wuhan AI Research, Wuhan, China.
- Jun Zhang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China.
- State Key Lab of Digestive Health, Beijing, China.
21
Kostiuchik G, Sharan L, Mayer B, Wolf I, Preim B, Engelhardt S. Surgical phase and instrument recognition: how to identify appropriate dataset splits. Int J Comput Assist Radiol Surg 2024. [PMID: 38285380 DOI: 10.1007/s11548-024-03063-9]
Abstract
PURPOSE Machine learning approaches can only be reliably evaluated if training, validation, and test data splits are representative and not affected by the absence of classes. Surgical workflow and instrument recognition are two tasks that are complicated in this manner because of heavy data imbalances resulting from the different lengths of phases and their potentially erratic occurrence. Furthermore, sub-properties like instrument (co-)occurrence are usually not particularly considered when defining the split. METHODS We present a publicly available data visualization tool that enables interactive exploration of dataset partitions for surgical phase and instrument recognition. The application focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. In particular, it facilitates the assessment of dataset splits and the identification of sub-optimal ones. RESULTS We analyzed the datasets Cholec80, CATARACTS, CaDIS, M2CAI-workflow, and M2CAI-tool using the proposed application. We were able to uncover phase transitions, individual instruments, and combinations of surgical instruments that were not represented in one of the sets. Addressing these issues, we identify possible improvements in the splits using our tool. A user study with ten participants demonstrated that the participants were able to successfully solve a selection of data exploration tasks. CONCLUSION In highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate dataset split because it can greatly influence the assessments of machine learning approaches. Our interactive tool allows for determination of better splits to improve current practices in the field. The live application is available at https://cardio-ai.github.io/endovis-ml/.
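A minimal sketch of the kind of split check the tool supports: verifying that every phase and phase transition present in the dataset also occurs in each split (the toy annotations are made up):

```python
# Sketch: report phases and phase transitions missing from each split.
def phases_and_transitions(video_phase_sequences):
    phases, transitions = set(), set()
    for seq in video_phase_sequences:
        phases.update(seq)
        transitions.update(zip(seq, seq[1:]))
    return phases, transitions

splits = {
    "train": [[0, 1, 2, 3], [0, 1, 3]],
    "val":   [[0, 1, 2]],
    "test":  [[0, 2, 3]],
}
all_p, all_t = phases_and_transitions(
    [seq for vids in splits.values() for seq in vids])
for name, vids in splits.items():
    p, t = phases_and_transitions(vids)
    print(name, "missing phases:", all_p - p, "missing transitions:", all_t - t)
```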
Affiliation(s)
- Georgii Kostiuchik
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany.
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany.
- Lalith Sharan
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Benedikt Mayer
- Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Ivo Wolf
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
- Bernhard Preim
- Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Sandy Engelhardt
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
22
Balu A, Kugener G, Pangal DJ, Lee H, Lasky S, Han J, Buchanan I, Liu J, Zada G, Donoho DA. Simulated outcomes for durotomy repair in minimally invasive spine surgery. Sci Data 2024; 11:62. [PMID: 38200013 PMCID: PMC10781746 DOI: 10.1038/s41597-023-02744-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Accepted: 11/13/2023] [Indexed: 01/12/2024] Open
Abstract
Minimally invasive spine surgery (MISS) is increasingly performed using endoscopic and microscopic visualization, and the captured video can be used for surgical education and development of predictive artificial intelligence (AI) models. Video datasets depicting adverse event management are also valuable, as predictive models not exposed to adverse events may exhibit poor performance when these occur. Given that no dedicated spine surgery video datasets for AI model development are publicly available, we introduce Simulated Outcomes for Durotomy Repair in Minimally Invasive Spine Surgery (SOSpine). A validated MISS cadaveric dural repair simulator was used to educate neurosurgery residents, and surgical microscope video recordings were paired with outcome data. Objects including durotomy, needle, grasper, needle driver, and nerve hook were then annotated. Altogether, SOSpine contains 15,698 frames with 53,238 annotations and associated durotomy repair outcomes. For validation, an AI model was fine-tuned on SOSpine video and detected surgical instruments with a mean average precision of 0.77. In summary, SOSpine depicts spine surgeons managing a common complication, providing opportunities to develop surgical AI models.
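As a rough illustration of how a detector could be fine-tuned on frame-level box annotations such as those in SOSpine, the sketch below uses an off-the-shelf torchvision Faster R-CNN rather than the authors' model; the random frames and the single example box are placeholders.

```python
# Minimal sketch (not the authors' model): fine-tuning a torchvision detector
# on frames annotated with the five SOSpine object classes. Dummy data only.
import torch
import torchvision

CLASSES = ["durotomy", "needle", "grasper", "needle driver", "nerve hook"]
# num_classes includes the background class required by Faster R-CNN.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, weights_backbone=None, num_classes=len(CLASSES) + 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
images = [torch.rand(3, 480, 640) for _ in range(2)]          # two dummy frames
targets = [{"boxes": torch.tensor([[50., 60., 150., 160.]]),  # one xyxy box per frame
            "labels": torch.tensor([1])} for _ in images]     # toy label: class 1
loss_dict = model(images, targets)   # detector returns a dict of losses in train mode
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
print({k: float(v) for k, v in loss_dict.items()})
```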
Collapse
Affiliation(s)
- Alan Balu
- Department of Neurosurgery, Georgetown University School of Medicine, 3900 Reservoir Rd NW, Washington, D.C., 20007, USA.
| | - Guillaume Kugener
- Department of Neurological Surgery, Keck School of Medicine of University of Southern California, 1200 North State St., Suite 3300, Los Angeles, CA, 90033, USA
| | - Dhiraj J Pangal
- Department of Neurological Surgery, Keck School of Medicine of University of Southern California, 1200 North State St., Suite 3300, Los Angeles, CA, 90033, USA
| | - Heewon Lee
- University of Southern California, 3709 Trousdale Pkwy., Los Angeles, CA, 90089, USA
| | - Sasha Lasky
- University of Southern California, 3709 Trousdale Pkwy., Los Angeles, CA, 90089, USA
| | - Jane Han
- University of Southern California, 3709 Trousdale Pkwy., Los Angeles, CA, 90089, USA
| | - Ian Buchanan
- Department of Neurological Surgery, Keck School of Medicine of University of Southern California, 1200 North State St., Suite 3300, Los Angeles, CA, 90033, USA
| | - John Liu
- Department of Neurological Surgery, Keck School of Medicine of University of Southern California, 1200 North State St., Suite 3300, Los Angeles, CA, 90033, USA
| | - Gabriel Zada
- Department of Neurological Surgery, Keck School of Medicine of University of Southern California, 1200 North State St., Suite 3300, Los Angeles, CA, 90033, USA
| | - Daniel A Donoho
- Department of Neurosurgery, Children's National Hospital, 111 Michigan Avenue NW, Washington, DC, 20010, USA
| |
Collapse
|
23
|
Men Y, Zhao Z, Chen W, Wu H, Zhang G, Luo F, Yu M. Research on workflow recognition for liver rupture repair surgery. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:1844-1856. [PMID: 38454663 DOI: 10.3934/mbe.2024080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/09/2024]
Abstract
Liver rupture repair surgery is one treatment for liver rupture and is especially beneficial in cases of mild hemorrhage. Liver rupture can precipitate critical conditions such as hemorrhage and shock. Surgical workflow recognition in liver rupture repair surgery videos is a significant task aimed at reducing surgical mistakes and enhancing the quality of surgery. A liver rupture repair simulation surgical dataset is proposed in this paper, consisting of 45 videos collaboratively completed by nine surgeons. Furthermore, an end-to-end SA-RLNet, a self-attention-based recurrent convolutional neural network, is introduced. The self-attention mechanism is used to automatically identify the importance of input features in various instances and to model the relationships between them. The surgical phase classification accuracy of the SA-RLNet approach is 90.6%. The present study demonstrates that SA-RLNet generalizes well on this dataset and is advantageous in capturing subtle variations between surgical phases. Surgical workflow recognition is therefore a promising and feasible application in liver rupture repair surgery.
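The abstract does not specify the exact layer configuration, so the following sketch is only loosely inspired by the description of SA-RLNet: per-frame CNN features, self-attention across the frame sequence, and a recurrent layer feeding a phase classifier. All layer sizes and the 7-phase toy setting are assumptions.

```python
# Minimal sketch of a self-attention-based recurrent CNN for phase
# classification, inspired by (but not identical to) the described SA-RLNet.
import torch
import torch.nn as nn
import torchvision

class AttnRecurrentClassifier(nn.Module):
    def __init__(self, num_phases=7, feat_dim=512, hidden=256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()                 # per-frame 512-d features
        self.backbone = backbone
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, clips):                       # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        attended, _ = self.attn(feats, feats, feats)  # self-attention over time
        out, _ = self.rnn(attended)
        return self.head(out)                         # per-frame phase logits (B, T, C)

model = AttnRecurrentClassifier()
logits = model(torch.rand(2, 8, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 8, 7])
```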
Collapse
Affiliation(s)
- Yutao Men
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
| | - Zixian Zhao
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
| | - Wei Chen
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
| | - Hang Wu
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
| | - Guang Zhang
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
| | - Feng Luo
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
| | - Ming Yu
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
| |
Collapse
|
24
|
Feng X, Zhang X, Shi X, Li L, Wang S. ST-ITEF: Spatio-Temporal Intraoperative Task Estimating Framework to recognize surgical phase and predict instrument path based on multi-object tracking in keratoplasty. Med Image Anal 2024; 91:103026. [PMID: 37976868 DOI: 10.1016/j.media.2023.103026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2022] [Revised: 08/22/2023] [Accepted: 11/03/2023] [Indexed: 11/19/2023]
Abstract
Computer-assisted cognitive guidance for surgical robots based on computer vision could improve both operative accuracy and the level of autonomy. In this paper, multi-object segmentation and feature extraction from the resulting masks are combined to recognize and predict surgical manipulation. A novel three-stage Spatio-Temporal Intraoperative Task Estimating Framework is proposed, with a quantitative formulation derived from ophthalmologists' visual information processing and with multi-object tracking of the surgical instruments and human corneas involved in keratoplasty. In the estimation of the intraoperative workflow, quantifying the operation parameters is still an open challenge. This problem is tackled by extracting key geometric properties from the multi-object segmentation and calculating the relative positions of instruments and corneas. A decision framework is further proposed, based on these prior geometric properties, to recognize the current surgical phase and predict the instrument path for each phase. Our framework is tested and evaluated on real human keratoplasty videos. The optimized DeepLabV3 with image filtration achieved competitive class IoU in the segmentation task, and the mean phase Jaccard reached 55.58% for phase recognition. Both the qualitative and quantitative results indicate that our framework can achieve accurate segmentation and surgical phase recognition under complex disturbances. The Intraoperative Task Estimating Framework therefore has high potential to guide surgical robots in clinical practice.
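A minimal sketch of the kind of geometric-property extraction described here (assumed, not the paper's implementation): centroids are computed from binary masks and the instrument's position relative to the cornea is expressed as a distance and an angle.

```python
# Minimal sketch: simple geometric properties from binary segmentation masks.
import numpy as np

def centroid(mask):
    """Centroid (row, col) of a binary mask; None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return np.array([ys.mean(), xs.mean()])

def relative_pose(instrument_mask, cornea_mask):
    """Distance (pixels) and angle (radians) of the instrument w.r.t. the cornea."""
    c_inst, c_cor = centroid(instrument_mask), centroid(cornea_mask)
    if c_inst is None or c_cor is None:
        return None
    offset = c_inst - c_cor
    return float(np.linalg.norm(offset)), float(np.arctan2(offset[0], offset[1]))

# Toy 64x64 masks: a square "cornea" in the centre and a small "instrument" blob.
cornea = np.zeros((64, 64), dtype=bool); cornea[20:44, 20:44] = True
instrument = np.zeros((64, 64), dtype=bool); instrument[10:14, 50:54] = True
print(relative_pose(instrument, cornea))   # (distance_px, angle_rad)
```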
Collapse
Affiliation(s)
- Xiaojing Feng
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China.
| | - Xiaodong Zhang
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China.
| | - Xiaojun Shi
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
| | - Li Li
- Department of Ophthalmology at the First Affiliated Hospital of Xi'an Jiaotong University, 277 Yanta West Road, Xi'an 710061, China
| | - Shaopeng Wang
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
| |
Collapse
|
25
|
Komatsu M, Kitaguchi D, Yura M, Takeshita N, Yoshida M, Yamaguchi M, Kondo H, Kinoshita T, Ito M. Automatic surgical phase recognition-based skill assessment in laparoscopic distal gastrectomy using multicenter videos. Gastric Cancer 2024; 27:187-196. [PMID: 38038811 DOI: 10.1007/s10120-023-01450-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/20/2023] [Accepted: 10/31/2023] [Indexed: 12/02/2023]
Abstract
BACKGROUND Gastric surgery involves numerous surgical phases; however, its steps can be clearly defined. Deep learning-based surgical phase recognition can promote stylization of gastric surgery with applications in automatic surgical skill assessment. This study aimed to develop a deep learning-based surgical phase-recognition model using multicenter videos of laparoscopic distal gastrectomy, and to examine the feasibility of automatic surgical skill assessment using the developed model. METHODS Surgical videos from 20 hospitals were used. Laparoscopic distal gastrectomy was defined and annotated into nine phases, and a deep learning-based image classification model was developed for phase recognition. We examined whether the developed model's outputs, including the number of frames in each phase and the adequacy of surgical field development during the phase of supra-pancreatic lymphadenectomy, correlated with the manually assigned skill assessment score. RESULTS The overall accuracy of phase recognition was 88.8%. Regarding surgical skill assessment based on the number of frames during the phases of lymphadenectomy of the left greater curvature and reconstruction, the number of frames in the high-score group was significantly lower than in the low-score group (829 vs. 1,152, P < 0.01; 1,208 vs. 1,586, P = 0.01, respectively). The model's output score for the adequacy of surgical field development was significantly higher in the high-score group than in the low-score group (0.975 vs. 0.970, P = 0.04). CONCLUSION The developed model had high accuracy in phase-recognition tasks and has potential for application in automatic surgical skill assessment systems.
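A minimal sketch of the frame-counting idea behind this skill assessment, under assumed inputs (per-frame phase predictions produced by any recognition model; the phase names and toy counts are illustrative only):

```python
# Minimal sketch: derive frames-per-phase counts from per-frame predictions,
# the quantity the study relates to manually assigned skill scores.
from collections import Counter

PHASES = ["lymphadenectomy_left_greater_curvature", "reconstruction"]

def frames_per_phase(predicted_labels):
    """Count how many frames the model assigned to each phase in one video."""
    return Counter(predicted_labels)

# Toy predictions for two videos (one per skill group).
high_score_video = ["reconstruction"] * 800 + ["lymphadenectomy_left_greater_curvature"] * 900
low_score_video = ["reconstruction"] * 1500 + ["lymphadenectomy_left_greater_curvature"] * 1200

for name, preds in [("high-score", high_score_video), ("low-score", low_score_video)]:
    counts = frames_per_phase(preds)
    print(name, {p: counts[p] for p in PHASES})
```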
Collapse
Affiliation(s)
- Masaru Komatsu
- Gastric Surgery Division, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Course of Advanced Clinical Research of Cancer, Juntendo University Graduate School of Medicine, 2-1-1, Hongo, Bunkyo-Ward, Tokyo, 113-8421, Japan
| | - Daichi Kitaguchi
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
| | - Masahiro Yura
- Gastric Surgery Division, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
| | - Nobuyoshi Takeshita
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
| | - Mitsumasa Yoshida
- Gastric Surgery Division, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
| | - Masayuki Yamaguchi
- Course of Advanced Clinical Research of Cancer, Juntendo University Graduate School of Medicine, 2-1-1, Hongo, Bunkyo-Ward, Tokyo, 113-8421, Japan
| | - Hibiki Kondo
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
| | - Takahiro Kinoshita
- Gastric Surgery Division, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
| | - Masaaki Ito
- Department for the Promotion of Medical Device Innovation, National Cancer Center Hospital East, 6-5-1 Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan.
- Surgical Device Innovation Office, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan.
| |
Collapse
|
26
|
Ban Y, Eckhoff JA, Ward TM, Hashimoto DA, Meireles OR, Rus D, Rosman G. Concept Graph Neural Networks for Surgical Video Understanding. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:264-274. [PMID: 37498757 DOI: 10.1109/tmi.2023.3299518] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Analysis of relations between objects and comprehension of abstract concepts in surgical video are important in AI-augmented surgery. However, building models that integrate our knowledge and understanding of surgery remains a challenging endeavor. In this paper, we propose a novel way to integrate conceptual knowledge into temporal analysis tasks using temporal concept graph networks. In the proposed networks, a knowledge graph is incorporated into the temporal video analysis of surgical notions, learning the meaning of concepts and relations as they apply to the data. We demonstrate results on surgical video data for tasks such as verification of the critical view of safety, estimation of the Parkland grading scale, and recognition of instrument-action-tissue triplets. The results show that our method improves recognition and detection on complex benchmarks and enables other analytic applications of interest.
Collapse
|
27
|
Hegde SR, Namazi B, Iyengar N, Cao S, Desir A, Marques C, Mahnken H, Dumas RP, Sankaranarayanan G. Automated segmentation of phases, steps, and tasks in laparoscopic cholecystectomy using deep learning. Surg Endosc 2024; 38:158-170. [PMID: 37945709 DOI: 10.1007/s00464-023-10482-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2023] [Accepted: 09/17/2023] [Indexed: 11/12/2023]
Abstract
BACKGROUND Video-based review is paramount for operative performance assessment but can be laborious when performed manually. Hierarchical Task Analysis (HTA) is a well-known method that divides any procedure into phases, steps, and tasks. HTA requires large datasets of videos with consistent definitions at each level. Our aim was to develop an AI model for automated segmentation of phases, steps, and tasks for laparoscopic cholecystectomy videos using a standardized HTA. METHODS A total of 160 laparoscopic cholecystectomy videos were collected from a publicly available dataset known as cholec80 and from our own institution. All videos were annotated for the beginning and ending of a predefined set of phases, steps, and tasks. Deep learning models were then separately developed and trained for the three levels using a 3D Convolutional Neural Network architecture. RESULTS Four phases, eight steps, and nineteen tasks were defined through expert consensus. The training set for our deep learning models contained 100 videos, with an additional 20 videos for hyperparameter optimization and tuning. The remaining 40 videos were used for testing. The overall accuracies for phases, steps, and tasks were 0.90, 0.81, and 0.65, with average F1 scores of 0.86, 0.76, and 0.48, respectively. Control of bleeding and bile spillage tasks were the most variable in definition, operative management, and clinical relevance. CONCLUSION The use of hierarchical task analysis for surgical video analysis has numerous applications in AI-based automated systems. Our results show that our tiered method of task analysis can successfully be used to train a deep learning model.
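The study trains separate models for phases, steps, and tasks; for brevity, the sketch below collapses this into a single 3D CNN backbone with three classification heads, which is an assumption rather than the authors' exact design (the class counts follow the abstract: 4 phases, 8 steps, 19 tasks).

```python
# Minimal sketch: one 3D CNN backbone with phase/step/task heads.
import torch
import torch.nn as nn
import torchvision

class HierarchicalClipClassifier(nn.Module):
    def __init__(self, n_phases=4, n_steps=8, n_tasks=19):
        super().__init__()
        backbone = torchvision.models.video.r3d_18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()            # clip-level 512-d features
        self.backbone = backbone
        self.phase_head = nn.Linear(feat_dim, n_phases)
        self.step_head = nn.Linear(feat_dim, n_steps)
        self.task_head = nn.Linear(feat_dim, n_tasks)

    def forward(self, clip):                   # clip: (B, 3, T, H, W)
        f = self.backbone(clip)
        return self.phase_head(f), self.step_head(f), self.task_head(f)

model = HierarchicalClipClassifier()
phase_logits, step_logits, task_logits = model(torch.rand(1, 3, 16, 112, 112))
print(phase_logits.shape, step_logits.shape, task_logits.shape)
```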
Collapse
Affiliation(s)
- Shruti R Hegde
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Babak Namazi
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Niyenth Iyengar
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Sarah Cao
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Alexis Desir
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Carolina Marques
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Heidi Mahnken
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Ryan P Dumas
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA
| | - Ganesh Sankaranarayanan
- Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390-9159, USA.
| |
Collapse
|
28
|
Demir KC, Schieber H, Weise T, Roth D, May M, Maier A, Yang SH. Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition. IEEE J Biomed Health Inform 2023; 27:5405-5417. [PMID: 37665700 DOI: 10.1109/jbhi.2023.3311628] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/06/2023]
Abstract
OBJECTIVE In the last two decades, there has been a growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems, which can assist physicians during operations, evaluate procedures afterward, or help the management team to effectively utilize the operating room. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review the state-of-the-art deep learning methods that have been published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. METHODS Three databases, IEEE Xplore, Scopus, and PubMed, were searched, and additional studies were added through a manual search. After the database search, 343 studies were screened and a total of 44 studies were selected for this review. CONCLUSION The use of temporal information is essential for identifying the next surgical action. Contemporary methods mainly used RNNs, hierarchical CNNs, and Transformers to preserve long-distance temporal relations. The lack of large publicly available datasets for various procedures is a great challenge for the development of new and robust models. While supervised learning strategies are used to show proof-of-concept, self-supervised, semi-supervised, or active learning methods are used to mitigate the dependency on annotated data. SIGNIFICANCE The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures and datasets, and discusses challenges.
Collapse
|
29
|
Zhang J, Zhou S, Wang Y, Shi S, Wan C, Zhao H, Cai X, Ding H. Laparoscopic Image-Based Critical Action Recognition and Anticipation With Explainable Features. IEEE J Biomed Health Inform 2023; 27:5393-5404. [PMID: 37603480 DOI: 10.1109/jbhi.2023.3306818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/23/2023]
Abstract
Surgical workflow analysis integrates perception, comprehension, and prediction of the surgical workflow, which helps real-time surgical support systems provide proper guidance and assistance for surgeons. This article promotes the idea of critical actions, the essential surgical actions that advance the operation toward completion. Fine-grained workflow analysis involves recognizing the current critical action and previewing the moving tendency of instruments in the early stage of critical actions. To this end, we propose a framework that incorporates operational experience to improve the robustness and interpretability of action recognition in in-vivo situations. High-dimensional images are mapped into an experience-based, explainable feature space with low dimensionality to achieve critical action recognition through a hierarchical classification structure. To forecast the instrument's motion tendency, we model motion primitives in the polar coordinate system (PCS) to represent patterns of complex trajectories. Given the variance across laparoscopies, the adaptive pattern recognition (APR) method, which adapts to uncertain trajectories by modifying model parameters, is designed to improve prediction accuracy. Validation on an in-vivo dataset shows that our framework fulfilled the surgical awareness tasks with exceptional accuracy and real-time performance.
Collapse
|
30
|
Tao R, Zou X, Zheng G. LAST: LAtent Space-Constrained Transformers for Automatic Surgical Phase Recognition and Tool Presence Detection. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:3256-3268. [PMID: 37227905 DOI: 10.1109/tmi.2023.3279838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
When developing context-aware systems, automatic surgical phase recognition and tool presence detection are two essential tasks. There have been previous attempts to develop methods for both tasks, but the majority of existing methods utilize a frame-level loss function (e.g., cross-entropy) that does not fully leverage the underlying semantic structure of a surgery, leading to sub-optimal results. In this paper, we propose multi-task learning-based, LAtent Space-constrained Transformers, referred to as LAST, for automatic surgical phase recognition and tool presence detection. Our design features a two-branch transformer architecture with a novel and generic way to leverage video-level semantic information during network training. This is done by learning a non-linear, compact representation of the underlying semantic structure of surgical videos through a transformer variational autoencoder (VAE) and by encouraging models to follow the learned statistical distributions. In other words, LAST is structure-aware and favors predictions that lie on the extracted low-dimensional data manifold. Validated on two public datasets of cholecystectomy surgery, i.e., the Cholec80 dataset and the M2cai16 dataset, our method achieves better results than other state-of-the-art methods. Specifically, on the Cholec80 dataset, our method achieves an average accuracy of 93.12±4.71%, an average precision of 89.25±5.49%, an average recall of 90.10±5.45% and an average Jaccard of 81.11±7.62% for phase recognition, and an average mAP of 95.15±3.87% for tool presence detection. Similar superior performance is also observed when LAST is applied to the M2cai16 dataset.
Collapse
|
31
|
Nwoye CI, Yu T, Sharma S, Murali A, Alapatt D, Vardazaryan A, Yuan K, Hajek J, Reiter W, Yamlahi A, Smidt FH, Zou X, Zheng G, Oliveira B, Torres HR, Kondo S, Kasai S, Holm F, Özsoy E, Gui S, Li H, Raviteja S, Sathish R, Poudel P, Bhattarai B, Wang Z, Rui G, Schellenberg M, Vilaça JL, Czempiel T, Wang Z, Sheet D, Thapa SK, Berniker M, Godau P, Morais P, Regmi S, Tran TN, Fonseca J, Nölke JH, Lima E, Vazquez E, Maier-Hein L, Navab N, Mascagni P, Seeliger B, Gonzalez C, Mutter D, Padoy N. CholecTriplet2022: Show me a tool and tell me the triplet - An endoscopic vision challenge for surgical action triplet detection. Med Image Anal 2023; 89:102888. [PMID: 37451133 DOI: 10.1016/j.media.2023.102888] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Revised: 06/23/2023] [Accepted: 06/28/2023] [Indexed: 07/18/2023]
Abstract
Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of ‹instrument, verb, target› triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery.
Collapse
Affiliation(s)
| | - Tong Yu
- ICube, University of Strasbourg, CNRS, France
| | | | | | | | | | - Kun Yuan
- ICube, University of Strasbourg, CNRS, France; Technical University Munich, Germany
| | | | | | - Amine Yamlahi
- Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Finn-Henri Smidt
- Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Xiaoyang Zou
- Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, China
| | - Guoyan Zheng
- Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, China
| | - Bruno Oliveira
- 2Ai School of Technology, IPCA, Barcelos, Portugal; Life and Health Science Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal; Algoritimi Center, School of Engineering, University of Minho, Guimeraes, Portugal
| | - Helena R Torres
- 2Ai School of Technology, IPCA, Barcelos, Portugal; Life and Health Science Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal; Algoritimi Center, School of Engineering, University of Minho, Guimeraes, Portugal
| | | | | | | | - Ege Özsoy
- Technical University Munich, Germany
| | | | - Han Li
- Southern University of Science and Technology, China
| | | | | | | | | | | | | | - Melanie Schellenberg
- Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
| | | | | | - Zhenkun Wang
- Southern University of Science and Technology, China
| | | | - Shrawan Kumar Thapa
- Nepal Applied Mathematics and Informatics Institute for research (NAAMII), Nepal
| | | | - Patrick Godau
- Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
| | - Pedro Morais
- 2Ai School of Technology, IPCA, Barcelos, Portugal
| | - Sudarshan Regmi
- Nepal Applied Mathematics and Informatics Institute for research (NAAMII), Nepal
| | - Thuy Nuong Tran
- Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Jaime Fonseca
- Algoritimi Center, School of Engineering, University of Minho, Guimeraes, Portugal
| | - Jan-Hinrich Nölke
- Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
| | - Estevão Lima
- Life and Health Science Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal
| | | | - Lena Maier-Hein
- Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | | | - Pietro Mascagni
- Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
| | - Barbara Seeliger
- ICube, University of Strasbourg, CNRS, France; University Hospital of Strasbourg, France; IHU Strasbourg, France
| | | | - Didier Mutter
- University Hospital of Strasbourg, France; IHU Strasbourg, France
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France
| |
Collapse
|
32
|
Li Y, Xia T, Luo H, He B, Jia F. MT-FiST: A Multi-Task Fine-Grained Spatial-Temporal Framework for Surgical Action Triplet Recognition. IEEE J Biomed Health Inform 2023; 27:4983-4994. [PMID: 37498758 DOI: 10.1109/jbhi.2023.3299321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Surgical action triplet recognition plays a significant role in helping surgeons facilitate scene analysis and decision-making in computer-assisted surgeries. Compared to traditional context-aware tasks such as phase recognition, surgical action triplets, comprising the instrument, verb, and target, can offer more comprehensive and detailed information. However, current triplet recognition methods fall short in distinguishing fine-grained subclasses and disregard temporal correlation in action triplets. In this article, we propose a multi-task fine-grained spatial-temporal framework for surgical action triplet recognition named MT-FiST. The proposed method utilizes a multi-label mutual channel loss, which consists of diversity and discriminative components. This loss function decouples global task features into class-aligned features, enabling the learning of more local details from the surgical scene. The proposed framework utilizes partially shared-parameter LSTM units to capture temporal correlations between adjacent frames. We conducted experiments on the CholecT50 dataset proposed in the MICCAI 2021 Surgical Action Triplet Recognition Challenge. Our framework is evaluated on the private test set of the challenge to ensure fair comparisons. Our model outperformed state-of-the-art models in instrument, verb, target, and action triplet recognition tasks, with mAPs of 82.1% (+4.6%), 51.5% (+4.0%), 45.5% (+7.8%), and 35.8% (+3.1%), respectively. The proposed MT-FiST boosts the recognition of surgical action triplets in a context-aware surgical assistant system, further solving multi-task recognition by effective temporal aggregation and fine-grained features.
Collapse
|
33
|
Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:2592-2602. [PMID: 37030859 DOI: 10.1109/tmi.2023.3262847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. This data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
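The abstract does not give the form of the step-phase dependency loss, so the sketch below shows one plausible way to use phase labels as weak supervision for a step classifier: probability mass is pushed onto the steps compatible with the annotated phase. The step-to-phase mapping is hypothetical, and the paper's actual loss may differ.

```python
# Minimal sketch: phase labels as weak supervision for step recognition.
import torch
import torch.nn.functional as F

# Hypothetical mapping: 6 steps grouped into 3 phases.
STEP_TO_PHASE = torch.tensor([0, 0, 1, 1, 1, 2])

def phase_dependency_loss(step_logits, phase_labels):
    """step_logits: (N, n_steps); phase_labels: (N,) weak per-frame phase ids."""
    step_probs = F.softmax(step_logits, dim=1)                  # (N, n_steps)
    # Boolean mask of the steps compatible with each frame's phase label.
    allowed = STEP_TO_PHASE.unsqueeze(0) == phase_labels.unsqueeze(1)
    in_phase_mass = (step_probs * allowed).sum(dim=1).clamp_min(1e-8)
    return -in_phase_mass.log().mean()          # low when mass stays within the phase

step_logits = torch.randn(4, 6, requires_grad=True)
phase_labels = torch.tensor([0, 1, 1, 2])
loss = phase_dependency_loss(step_logits, phase_labels)
loss.backward()
print(float(loss))
```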
Collapse
|
34
|
Yu T, Mascagni P, Verde J, Marescaux J, Mutter D, Padoy N. Live laparoscopic video retrieval with compressed uncertainty. Med Image Anal 2023; 88:102866. [PMID: 37356320 DOI: 10.1016/j.media.2023.102866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Revised: 04/14/2023] [Accepted: 06/07/2023] [Indexed: 06/27/2023]
Abstract
Searching through large volumes of medical data to retrieve relevant information is a challenging yet crucial task for clinical care. However, the primitive and most common approach to retrieval, text in the form of keywords, is severely limited when dealing with complex media formats. Content-based retrieval offers a way to overcome this limitation by using rich media as the query itself. Surgical video-to-video retrieval in particular is a new and largely unexplored research problem with high clinical value, especially in the real-time case: using real-time video hashing, search can be performed directly inside the operating room. Indeed, hashing converts large data entries into compact binary arrays, or hashes, enabling large-scale search operations at a very fast rate. However, due to fluctuations over the course of a video, not all bits in a given hash are equally reliable. In this work, we propose a method capable of mitigating this uncertainty while maintaining a light computational footprint. We present superior retrieval results (3%-4% in top-10 mean average precision) on a multi-task evaluation protocol for surgery, covering cholecystectomy phases, bypass phases, and, from an entirely new dataset introduced here, surgical events across six different surgery types. Success on this multi-task benchmark shows the generalizability of our approach for surgical video retrieval.
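A minimal sketch of the retrieval mechanics underlying this approach (the paper's uncertainty handling and hash compression are omitted): binary hashes are compared with the Hamming distance and the nearest database entries are returned.

```python
# Minimal sketch: Hamming-distance retrieval over binary video hashes.
import numpy as np

rng = np.random.default_rng(0)
n_bits = 64
database = rng.integers(0, 2, size=(1000, n_bits), dtype=np.uint8)  # stored hashes
query = database[42] ^ (rng.random(n_bits) < 0.05)                  # noisy copy of entry 42

def top_k(query_hash, db, k=10):
    """Indices of the k database hashes with the smallest Hamming distance."""
    distances = (db != query_hash).sum(axis=1)
    order = np.argsort(distances)
    return order[:k], distances[order[:k]]

indices, dists = top_k(query.astype(np.uint8), database)
print(indices[:5], dists[:5])   # entry 42 should rank first with a small distance
```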
Collapse
Affiliation(s)
- Tong Yu
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France.
| | - Pietro Mascagni
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France; Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
| | | | | | - Didier Mutter
- IHU Strasbourg, France; University Hospital of Strasbourg, France
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France
| |
Collapse
|
35
|
De Carvalho T, Kader R, Brandao P, González-Bueno Puyal J, Lovat LB, Mountney P, Stoyanov D. Automated colonoscopy withdrawal phase duration estimation using cecum detection and surgical tasks classification. BIOMEDICAL OPTICS EXPRESS 2023; 14:2629-2644. [PMID: 37342682 PMCID: PMC10278633 DOI: 10.1364/boe.485069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/06/2023] [Revised: 04/19/2023] [Accepted: 04/20/2023] [Indexed: 06/23/2023]
Abstract
Colorectal cancer is the third most common type of cancer, with almost two million new cases worldwide. It develops from neoplastic polyps, most commonly adenomas, which can be removed during colonoscopy to prevent colorectal cancer from occurring. Unfortunately, up to a quarter of polyps are missed during colonoscopies. Studies have shown that polyp detection during a procedure correlates with the time spent searching for polyps, called the withdrawal time. The different phases of the procedure (cleaning, therapeutic, and exploration phases) make it difficult to precisely measure the withdrawal time, which should only include the exploration phase. Separating this phase from the others requires manual time measurement during the procedure, which is rarely performed. In this study, we propose a method to automatically detect the cecum, which marks the start of the withdrawal phase, and to classify the different phases of the colonoscopy, allowing precise estimation of the final withdrawal time. This is achieved using a ResNet for both detection and classification, trained with two public datasets and a private dataset composed of 96 full procedures. Out of 19 test procedures, 18 had their withdrawal time correctly estimated, with a mean error of 5.52 seconds per minute per procedure.
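A minimal sketch of the withdrawal-time computation implied by this method, under assumed inputs (per-frame phase labels and a per-frame cecum flag, both produced by the classifier; the frame rate and toy sequences are illustrative):

```python
# Minimal sketch: withdrawal time = exploration time after the cecum is first reached.
def withdrawal_time_seconds(phase_per_frame, cecum_detected, fps=25.0):
    """phase_per_frame: list of 'cleaning'/'therapeutic'/'exploration' labels;
    cecum_detected: list of bools of the same length."""
    try:
        start = cecum_detected.index(True)        # first frame showing the cecum
    except ValueError:
        return None                               # cecum never reached
    exploration_frames = sum(
        1 for label in phase_per_frame[start:] if label == "exploration")
    return exploration_frames / fps

# Toy 10-frame procedure at 1 fps for readability.
phases = ["exploration", "cleaning", "exploration", "exploration", "therapeutic",
          "exploration", "exploration", "cleaning", "exploration", "exploration"]
cecum = [False, False, True, False, False, False, False, False, False, False]
print(withdrawal_time_seconds(phases, cecum, fps=1.0))   # 6.0 seconds
```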
Collapse
Affiliation(s)
- Thomas De Carvalho
- Odin Vision, London, UK
- Wellcome/EPSRC Centre for Interventional & Surgical Sciences, University College London, London, UK
| | - Rawen Kader
- Division of Surgery & Interventional Science, University College London, London, UK
- Gastrointestinal Services, University College London Hospital, London, UK
| | | | - Juana González-Bueno Puyal
- Odin Vision, London, UK
- Wellcome/EPSRC Centre for Interventional & Surgical Sciences, University College London, London, UK
| | - Laurence B. Lovat
- Division of Surgery & Interventional Science, University College London, London, UK
- Gastrointestinal Services, University College London Hospital, London, UK
| | | | - Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional & Surgical Sciences, University College London, London, UK
| |
Collapse
|
36
|
Zang C, Turkcan MK, Narasimhan S, Cao Y, Yarali K, Xiang Z, Szot S, Ahmad F, Choksi S, Bitner DP, Filicori F, Kostic Z. Surgical Phase Recognition in Inguinal Hernia Repair-AI-Based Confirmatory Baseline and Exploration of Competitive Models. Bioengineering (Basel) 2023; 10:654. [PMID: 37370585 DOI: 10.3390/bioengineering10060654] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Revised: 05/18/2023] [Accepted: 05/23/2023] [Indexed: 06/29/2023] Open
Abstract
Video-recorded robotic-assisted surgeries allow the use of automated computer vision and artificial intelligence/deep learning methods for quality assessment and workflow analysis in surgical phase recognition. We considered a dataset of 209 videos of robotic-assisted laparoscopic inguinal hernia repair (RALIHR) collected from 8 surgeons, defined rigorous ground-truth annotation rules, then pre-processed and annotated the videos. We deployed seven deep learning models to establish the baseline accuracy for surgical phase recognition and explored four advanced architectures. For rapid execution of the studies, we initially engaged three dozen MS-level engineering students in a competitive classroom setting, followed by focused research. We unified the data processing pipeline in a confirmatory study, and explored a number of scenarios which differ in how the DL networks were trained and evaluated. For the scenario with 21 validation videos of all surgeons, the Video Swin Transformer model achieved ~0.85 validation accuracy, and the Perceiver IO model achieved ~0.84. Our studies affirm the necessity of close collaborative research between medical experts and engineers for developing automated surgical phase recognition models deployable in clinical settings.
Collapse
Affiliation(s)
- Chengbo Zang
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
| | - Mehmet Kerem Turkcan
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
| | - Sanjeev Narasimhan
- Department of Computer Science, Columbia University, New York, NY 10027, USA
| | - Yuqing Cao
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
| | - Kaan Yarali
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
| | - Zixuan Xiang
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
| | - Skyler Szot
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
| | - Feroz Ahmad
- Department of Computer Science, Columbia University, New York, NY 10027, USA
| | - Sarah Choksi
- Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA
| | - Daniel P Bitner
- Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA
| | - Filippo Filicori
- Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA
- Zucker School of Medicine at Hofstra/Northwell Health, Hempstead, NY 11549, USA
| | - Zoran Kostic
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
| |
Collapse
|
37
|
Ramesh S, Srivastav V, Alapatt D, Yu T, Murali A, Sestini L, Nwoye CI, Hamoud I, Sharma S, Fleurentin A, Exarchakis G, Karargyris A, Padoy N. Dissecting self-supervised learning methods for surgical computer vision. Med Image Anal 2023; 88:102844. [PMID: 37270898 DOI: 10.1016/j.media.2023.102844] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Revised: 05/08/2023] [Accepted: 05/15/2023] [Indexed: 06/06/2023]
Abstract
The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and underexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - and improves over state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.
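As one concrete example of the SSL objectives benchmarked here, the sketch below implements the NT-Xent contrastive loss used by SimCLR; the encoder is omitted and random embeddings stand in for its outputs, so this is illustrative only.

```python
# Minimal sketch of the NT-Xent (SimCLR) contrastive loss.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, d) embeddings of two augmented views of the same N frames."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2N, d), unit norm
    sim = z @ z.t() / temperature                          # scaled cosine similarities
    # Exclude self-similarities so a sample is never its own positive or negative.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))
    # The positive of sample i is its other augmented view: index i+N or i-N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Toy usage with random "embeddings" standing in for encoder outputs.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(float(nt_xent(z1, z2)))
```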
Collapse
Affiliation(s)
- Sanat Ramesh
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Altair Robotics Lab, Department of Computer Science, University of Verona, Verona 37134, Italy
| | - Vinkle Srivastav
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France.
| | - Deepak Alapatt
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
| | - Tong Yu
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
| | - Aditya Murali
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
| | - Luca Sestini
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano 20133, Italy
| | | | - Idris Hamoud
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
| | - Saurav Sharma
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
| | | | - Georgios Exarchakis
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
| | - Alexandros Karargyris
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
| |
Collapse
|
38
|
Zhang B, Goel B, Sarhan MH, Goel VK, Abukhalil R, Kalesan B, Stottler N, Petculescu S. Surgical workflow recognition with temporal convolution and transformer for action segmentation. Int J Comput Assist Radiol Surg 2023; 18:785-794. [PMID: 36542253 DOI: 10.1007/s11548-022-02811-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022]
Abstract
PURPOSE Automatic surgical workflow recognition enabled by computer vision algorithms plays a key role in enhancing the learning experience of surgeons. It also supports building context-aware systems that allow better surgical planning and decision making, which may in turn improve outcomes. Utilizing temporal information is crucial for recognizing context; hence, various recent approaches use recurrent neural networks or transformers to recognize actions. METHODS We design and implement a two-stage method for surgical workflow recognition. We utilize R(2+1)D for video clip modeling in the first stage. We propose the Action Segmentation Temporal Convolutional Transformer (ASTCFormer) network for full video modeling in the second stage. ASTCFormer utilizes action segmentation transformers (ASFormers) and temporal convolutional networks (TCNs) to build a temporally aware surgical workflow recognition system. RESULTS We compare the proposed ASTCFormer with recurrent neural networks, multi-stage TCN, and ASFormer approaches. The comparison is done on a dataset comprising 207 robotic and laparoscopic cholecystectomy surgical videos annotated for 7 surgical phases. The proposed method outperforms the compared methods, achieving a [Formula: see text] relative improvement in the average segmental F1-score over the state-of-the-art ASFormer method. Moreover, our proposed method achieves state-of-the-art results on the publicly available Cholec80 dataset. CONCLUSION The improvement in results when using the proposed method suggests that temporal context is better captured when information from the TCN is added to the ASFormer paradigm. This addition leads to better surgical workflow recognition.
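A minimal two-stage sketch in the spirit of the described pipeline, not the authors' code: stage one extracts clip-level features with an R(2+1)D backbone, stage two runs a dilated temporal convolutional network over the feature sequence; the ASFormer refinement is omitted and all layer sizes are assumptions.

```python
# Minimal sketch: clip features (stage 1) + dilated TCN over the sequence (stage 2).
import torch
import torch.nn as nn
import torchvision

class ClipFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        net = torchvision.models.video.r2plus1d_18(weights=None)
        self.dim = net.fc.in_features
        net.fc = nn.Identity()
        self.net = net

    def forward(self, clips):                  # (B, 3, T, H, W) -> (B, 512)
        return self.net(clips)

class DilatedTCN(nn.Module):
    def __init__(self, in_dim=512, hidden=64, n_phases=7, n_layers=4):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, 1)
        self.layers = nn.ModuleList(
            nn.Conv1d(hidden, hidden, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(n_layers))
        self.out = nn.Conv1d(hidden, n_phases, 1)

    def forward(self, feats):                  # (B, T, 512) -> (B, T, n_phases)
        x = self.inp(feats.transpose(1, 2))
        for conv in self.layers:
            x = x + torch.relu(conv(x))        # residual dilated convolutions
        return self.out(x).transpose(1, 2)

extractor, tcn = ClipFeatureExtractor(), DilatedTCN()
clips = torch.rand(6, 3, 8, 112, 112)          # six consecutive clips of one video
feats = extractor(clips).unsqueeze(0)          # (1, 6, 512) feature sequence
print(tcn(feats).shape)                        # torch.Size([1, 6, 7])
```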
Collapse
Affiliation(s)
- Bokai Zhang
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA.
| | - Bharti Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Mohammad Hasan Sarhan
- Johnson & Johnson MedTech, Robert-Koch-Straße 1, 22851, Norderstedt, Schleswig-Holstein, Germany
| | - Varun Kejriwal Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Rami Abukhalil
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Bindu Kalesan
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Natalie Stottler
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
| | - Svetlana Petculescu
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
| |
Collapse
|
39
|
Perumalla C, Kearse L, Peven M, Laufer S, Goll C, Wise B, Yang S, Pugh C. AI-Based Video Segmentation: Procedural Steps or Basic Maneuvers? J Surg Res 2023; 283:500-506. [PMID: 36436286 PMCID: PMC10368211 DOI: 10.1016/j.jss.2022.10.069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Revised: 10/14/2022] [Accepted: 10/19/2022] [Indexed: 11/25/2022]
Abstract
INTRODUCTION Video-based review of surgical procedures has proven to be useful in training by enabling efficiency in the qualitative assessment of surgical skill and intraoperative decision-making. Current video segmentation protocols focus largely on procedural steps. Although some operations are more complex than others, many of the steps in any given procedure involve an intricate choreography of basic maneuvers such as suturing, knot tying, and cutting. The use of these maneuvers at certain procedural steps can convey information that aids in the assessment of the complexity of the procedure, surgical preference, and skill. Our study aims to develop and evaluate an algorithm to identify these maneuvers. METHODS A standard deep learning architecture was used to differentiate between suture throws, knot ties, and suture cutting on a data set comprised of videos from practicing clinicians (N = 52) who participated in a simulated enterotomy repair. Perception of the added value to traditional artificial intelligence segmentation was explored by qualitatively examining the utility of identifying maneuvers in a subset of steps for an open colon resection. RESULTS An accuracy of 84% was reached in differentiating maneuvers. The precision in detecting the basic maneuvers was 87.9%, 60%, and 90.9% for suture throws, knot ties, and suture cutting, respectively. The qualitative concept mapping confirmed realistic scenarios that could benefit from basic maneuver identification. CONCLUSIONS Basic maneuvers can indicate error management activity or safety measures and allow for the assessment of skill. Our deep learning algorithm identified basic maneuvers with reasonable accuracy. Such models can aid in artificial intelligence-assisted video review by providing additional information that can complement traditional video segmentation protocols.
Collapse
Affiliation(s)
- Calvin Perumalla
- Stanford School of Medicine, Department of Surgery, Stanford, California.
| | - LaDonna Kearse
- Stanford School of Medicine, Department of Surgery, Stanford, California
| | - Michael Peven
- John Hopkins University, Department of Computer Science, Baltimore, Maryland
| | - Shlomi Laufer
- Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa, Israel
| | - Cassidi Goll
- Stanford School of Medicine, Department of Surgery, Stanford, California
| | - Brett Wise
- Stanford School of Medicine, Department of Surgery, Stanford, California
| | - Su Yang
- Stanford School of Medicine, Department of Surgery, Stanford, California
| | - Carla Pugh
- Stanford School of Medicine, Department of Surgery, Stanford, California
| |
Collapse
|
40
|
Generating Rare Surgical Events Using CycleGAN: Addressing Lack of Data for Artificial Intelligence Event Recognition. J Surg Res 2023; 283:594-605. [PMID: 36442259 DOI: 10.1016/j.jss.2022.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Revised: 10/16/2022] [Accepted: 11/06/2022] [Indexed: 11/26/2022]
Abstract
INTRODUCTION Artificial Intelligence (AI) has shown promise in facilitating surgical video review through automatic recognition of surgical activities/events. Few public video data sources demonstrate critical yet rare events, and these are insufficient to train AI for reliable video event recognition. We suggest that a generative AI algorithm can create artificial massive-bleeding images for minimally invasive lobectomy that can be used to compensate for the current lack of data in this field. MATERIALS AND METHODS A generative adversarial network (GAN) algorithm, CycleGAN, was used to generate artificial massive-bleeding event images. To train CycleGAN, six videos of minimally invasive lobectomies were utilized, from which 1819 frames of non-bleeding instances and 3178 frames of massive-bleeding instances were used. RESULTS The performance of the CycleGAN algorithm was tested on a new video that was not used during the training process. The trained CycleGAN was able to alter the laparoscopic lobectomy images according to their corresponding massive-bleeding images, such that the content of the original images was preserved (e.g., the location of tools in the scene) and the style of each image was changed to massive bleeding (i.e., blood was automatically added to appropriate locations in the images). CONCLUSIONS These results suggest a promising approach to supplement the scarce data for the rare massive-bleeding events that can occur during minimally invasive lobectomy. Future work could be dedicated to developing AI algorithms to identify surgical strategies and actions that potentially lead to massive bleeding and to warn surgeons prior to this event occurring.
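A minimal sketch of the cycle-consistency idea at the core of CycleGAN, with tiny stand-in generators and random tensors in place of real frames; the adversarial losses and discriminators of the full model are omitted for brevity.

```python
# Minimal sketch: cycle-consistency loss between two toy image-to-image generators.
import torch
import torch.nn as nn

def tiny_generator():
    # A toy image-to-image network standing in for a real ResNet generator.
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

G = tiny_generator()   # non-bleeding -> bleeding
F_ = tiny_generator()  # bleeding -> non-bleeding
l1 = nn.L1Loss()
opt = torch.optim.Adam(list(G.parameters()) + list(F_.parameters()), lr=2e-4)

real_a = torch.rand(2, 3, 64, 64) * 2 - 1   # toy non-bleeding frames in [-1, 1]
real_b = torch.rand(2, 3, 64, 64) * 2 - 1   # toy massive-bleeding frames

fake_b = G(real_a)                          # add "bleeding style"
fake_a = F_(real_b)                         # remove it
# Cycle consistency: translating there and back should reproduce the input,
# which is what preserves scene content such as tool locations.
cycle_loss = l1(F_(fake_b), real_a) + l1(G(fake_a), real_b)
opt.zero_grad()
cycle_loss.backward()
opt.step()
print(float(cycle_loss))
```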
Collapse
|
41
|
Wagner M, Müller-Stich BP, Kisilenko A, Tran D, Heger P, Mündermann L, Lubotsky DM, Müller B, Davitashvili T, Capek M, Reinke A, Reid C, Yu T, Vardazaryan A, Nwoye CI, Padoy N, Liu X, Lee EJ, Disch C, Meine H, Xia T, Jia F, Kondo S, Reiter W, Jin Y, Long Y, Jiang M, Dou Q, Heng PA, Twick I, Kirtac K, Hosgor E, Bolmgren JL, Stenzel M, von Siemens B, Zhao L, Ge Z, Sun H, Xie D, Guo M, Liu D, Kenngott HG, Nickel F, Frankenberg MV, Mathis-Ullrich F, Kopp-Schneider A, Maier-Hein L, Speidel S, Bodenstedt S. Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark. Med Image Anal 2023; 86:102770. [PMID: 36889206 DOI: 10.1016/j.media.2023.102770] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2021] [Revised: 02/03/2023] [Accepted: 02/08/2023] [Indexed: 02/23/2023]
Abstract
PURPOSE Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open-data, single-center video dataset. In this work we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. METHODS To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 h was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS F1-scores between 23.9% and 67.7% were achieved for phase recognition (n = 9 teams) and between 38.5% and 63.8% for instrument presence detection (n = 8 teams), but only between 21.8% and 23.3% for action recognition (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). CONCLUSION Surgical workflow and skill analysis are promising technologies to support the surgical team, but there is still room for improvement, as shown by our comparison of machine learning algorithms. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.
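For readers reproducing such comparisons, a minimal sketch of a framewise macro-averaged F1 computation over phases follows; the challenge's exact evaluation protocol (e.g., per-video averaging or relaxed boundaries) may differ, so this is illustrative only.

```python
# Minimal sketch: macro-averaged F1 over phase classes from per-frame labels.
def macro_f1(y_true, y_pred, n_classes):
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / n_classes

# Toy per-frame phase labels (7 phases -> ids 0..6).
y_true = [0, 0, 1, 1, 2, 3, 3, 4, 5, 6]
y_pred = [0, 1, 1, 1, 2, 3, 4, 4, 5, 6]
print(round(macro_f1(y_true, y_pred, n_classes=7), 3))
```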
Affiliation(s)
- Martin Wagner
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany.
| | - Beat-Peter Müller-Stich
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Anna Kisilenko
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Duc Tran
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Patrick Heger
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany
| | - Lars Mündermann
- Data Assisted Solutions, Corporate Research & Technology, KARL STORZ SE & Co. KG, Dr. Karl-Storz-Str. 34, 78332 Tuttlingen
| | - David M Lubotsky
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Benjamin Müller
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Tornike Davitashvili
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Manuela Capek
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT) Heidelberg, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Annika Reinke
- Div. Computer Assisted Medical Interventions, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120 Heidelberg, Germany; HIP Helmholtz Imaging Platform, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120 Heidelberg, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg
| | - Carissa Reid
- Division of Biostatistics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, Germany
| | - Tong Yu
- ICube, University of Strasbourg, CNRS, France. 300 bd Sébastien Brant - CS 10413, F-67412 Illkirch Cedex, France; IHU Strasbourg, France. 1 Place de l'hôpital, 67000 Strasbourg, France
| | - Armine Vardazaryan
- ICube, University of Strasbourg, CNRS, France. 300 bd Sébastien Brant - CS 10413, F-67412 Illkirch Cedex, France; IHU Strasbourg, France. 1 Place de l'hôpital, 67000 Strasbourg, France
| | - Chinedu Innocent Nwoye
- ICube, University of Strasbourg, CNRS, France. 300 bd Sébastien Brant - CS 10413, F-67412 Illkirch Cedex, France; IHU Strasbourg, France. 1 Place de l'hôpital, 67000 Strasbourg, France
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, France. 300 bd Sébastien Brant - CS 10413, F-67412 Illkirch Cedex, France; IHU Strasbourg, France. 1 Place de l'hôpital, 67000 Strasbourg, France
| | - Xinyang Liu
- Sheikh Zayed Institute for Pediatric Surgical Innovation, Children's National Hospital, 111 Michigan Ave NW, Washington, DC 20010, USA
| | - Eung-Joo Lee
- University of Maryland, College Park, 2405 A V Williams Building, College Park, MD 20742, USA
| | - Constantin Disch
- Fraunhofer Institute for Digital Medicine MEVIS, Max-von-Laue-Str. 2, 28359 Bremen, Germany
| | - Hans Meine
- Fraunhofer Institute for Digital Medicine MEVIS, Max-von-Laue-Str. 2, 28359 Bremen, Germany; University of Bremen, FB3, Medical Image Computing Group, ℅ Fraunhofer MEVIS, Am Fallturm 1, 28359 Bremen, Germany
| | - Tong Xia
- Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Fucang Jia
- Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Satoshi Kondo
- Konica Minolta, Inc., 1-2, Sakura-machi, Takatsuki, Osaka 569-8503, Japan
| | - Wolfgang Reiter
- Wintegral GmbH, Ehrenbreitsteiner Str. 36, 80993 München, Germany
| | - Yueming Jin
- Department of Computer Science and Engineering, Ho Sin-Hang Engineering Building, The Chinese University of Hong Kong, Sha Tin, NT, Hong Kong
| | - Yonghao Long
- Department of Computer Science and Engineering, Ho Sin-Hang Engineering Building, The Chinese University of Hong Kong, Sha Tin, NT, Hong Kong
| | - Meirui Jiang
- Department of Computer Science and Engineering, Ho Sin-Hang Engineering Building, The Chinese University of Hong Kong, Sha Tin, NT, Hong Kong
| | - Qi Dou
- Department of Computer Science and Engineering, Ho Sin-Hang Engineering Building, The Chinese University of Hong Kong, Sha Tin, NT, Hong Kong
| | - Pheng Ann Heng
- Department of Computer Science and Engineering, Ho Sin-Hang Engineering Building, The Chinese University of Hong Kong, Sha Tin, NT, Hong Kong
| | - Isabell Twick
- Caresyntax GmbH, Komturstr. 18A, 12099 Berlin, Germany
| | - Kadir Kirtac
- Caresyntax GmbH, Komturstr. 18A, 12099 Berlin, Germany
| | - Enes Hosgor
- Caresyntax GmbH, Komturstr. 18A, 12099 Berlin, Germany
| | - Long Zhao
- Hikvision Research Institute, Hangzhou, China
| | - Zhenxiao Ge
- Hikvision Research Institute, Hangzhou, China
| | - Haiming Sun
- Hikvision Research Institute, Hangzhou, China
| | - Di Xie
- Hikvision Research Institute, Hangzhou, China
| | - Mengqi Guo
- School of Computing, National University of Singapore, Computing 1, No.13 Computing Drive, 117417, Singapore
| | - Daochang Liu
- National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, Beijing, China
| | - Hannes G Kenngott
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany
| | - Felix Nickel
- Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany
| | - Moritz von Frankenberg
- Department of Surgery, Salem Hospital of the Evangelische Stadtmission Heidelberg, Zeppelinstrasse 11-33, 69121 Heidelberg, Germany
| | - Franziska Mathis-Ullrich
- Health Robotics and Automation Laboratory, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Geb. 40.28, KIT Campus Süd, Engler-Bunte-Ring 8, 76131 Karlsruhe, Germany
| | - Annette Kopp-Schneider
- Division of Biostatistics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, Germany
| | - Lena Maier-Hein
- Div. Computer Assisted Medical Interventions, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120 Heidelberg, Germany; HIP Helmholtz Imaging Platform, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120 Heidelberg, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg; Medical Faculty, Heidelberg University, Im Neuenheimer Feld 672, 69120 Heidelberg
| | - Stefanie Speidel
- Div. Translational Surgical Oncology, National Center for Tumor Diseases Dresden, Fetscherstraße 74, 01307 Dresden, Germany; Cluster of Excellence "Centre for Tactile Internet with Human-in-the-Loop" (CeTI) of Technische Universität Dresden, 01062 Dresden, Germany
| | - Sebastian Bodenstedt
- Div. Translational Surgical Oncology, National Center for Tumor Diseases Dresden, Fetscherstraße 74, 01307 Dresden, Germany; Cluster of Excellence "Centre for Tactile Internet with Human-in-the-Loop" (CeTI) of Technische Universität Dresden, 01062 Dresden, Germany
| |
|
42
|
Jalal NA, Alshirbaji TA, Docherty PD, Arabian H, Laufer B, Krueger-Ziolek S, Neumuth T, Moeller K. Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches. SENSORS (BASEL, SWITZERLAND) 2023; 23:1958. [PMID: 36850554 PMCID: PMC9964851 DOI: 10.3390/s23041958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/01/2023] [Revised: 02/06/2023] [Accepted: 02/07/2023] [Indexed: 06/18/2023]
Abstract
Adapting intelligent context-aware systems (CAS) to future operating rooms (ORs) aims to improve situational awareness and provide surgical decision support to medical teams. CAS analyzes data streams from available devices during surgery and communicates real-time knowledge to clinicians. Recent advances in computer vision and machine learning, particularly deep learning, have paved the way for extensive research to develop CAS. In this work, a deep learning approach was proposed for analyzing laparoscopic videos for surgical phase recognition, tool classification, and weakly supervised tool localization. The ResNet-50 convolutional neural network (CNN) architecture was adapted by adding attention modules and fusing features from multiple stages to generate better-focused, generalized, and well-representative features. Then, a multi-map convolutional layer followed by tool-wise and spatial pooling operations was utilized to perform tool localization and generate tool presence confidences. Finally, a long short-term memory (LSTM) network was employed to model temporal information and perform tool classification and phase recognition. The proposed approach was evaluated on the Cholec80 dataset. The experimental results (i.e., 88.5% and 89.0% mean precision and recall for phase recognition, respectively, 95.6% mean average precision for tool presence detection, and a 70.1% F1-score for tool localization) demonstrated the ability of the model to learn discriminative features for all tasks. The results revealed the importance of integrating attention modules and multi-stage feature fusion for more robust and precise detection of surgical phases and tools.
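The following sketch illustrates the general CNN-plus-attention-plus-LSTM pattern described above: a ResNet-50 backbone extracts per-frame feature maps, a lightweight spatial-attention layer pools them, and an LSTM models the frame sequence before phase and tool heads. The attention design, layer sizes, and output heads are simplifying assumptions, not the exact modules of the cited paper.

```python
# CNN backbone + spatial attention pooling + LSTM for phase and tool outputs.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class AttentiveFrameEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, 7, 7)
        self.attn = nn.Conv2d(2048, 1, kernel_size=1)                   # spatial attention logits

    def forward(self, x):                        # x: (B, 3, 224, 224)
        fmap = self.features(x)                  # (B, 2048, 7, 7)
        weights = torch.softmax(self.attn(fmap).flatten(2), dim=-1)     # (B, 1, 49)
        return (fmap.flatten(2) * weights).sum(-1)                      # (B, 2048)

class PhaseAndToolModel(nn.Module):
    def __init__(self, n_phases=7, n_tools=7):
        super().__init__()
        self.encoder = AttentiveFrameEncoder()
        self.lstm = nn.LSTM(2048, 512, batch_first=True)
        self.phase_head = nn.Linear(512, n_phases)
        self.tool_head = nn.Linear(512, n_tools)  # multi-label tool presence

    def forward(self, clip):                     # clip: (B, T, 3, 224, 224)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
        h, _ = self.lstm(feats)
        return self.phase_head(h), self.tool_head(h)

phases, tools = PhaseAndToolModel()(torch.rand(1, 4, 3, 224, 224))
print(phases.shape, tools.shape)    # (1, 4, 7) phase logits, (1, 4, 7) tool logits
```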
Affiliation(s)
- Nour Aldeen Jalal
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
| | - Tamer Abdulbaki Alshirbaji
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
| | - Paul David Docherty
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
| | - Herag Arabian
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
| | - Bernhard Laufer
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
| | - Sabine Krueger-Ziolek
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
| | - Thomas Neumuth
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
| | - Knut Moeller
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
- Department of Microsystems Engineering, University of Freiburg, 79110 Freiburg, Germany
| |
|
43
|
Fer D, Zhang B, Abukhalil R, Goel V, Goel B, Barker J, Kalesan B, Barragan I, Gaddis ML, Kilroy PG. An artificial intelligence model that automatically labels roux-en-Y gastric bypasses, a comparison to trained surgeon annotators. Surg Endosc 2023:10.1007/s00464-023-09870-6. [PMID: 36658282 DOI: 10.1007/s00464-023-09870-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 01/04/2023] [Indexed: 01/21/2023]
Abstract
INTRODUCTION Artificial intelligence (AI) can automate certain tasks to improve data collection. Models have been created to annotate the steps of Roux-en-Y Gastric Bypass (RYGB). However, model performance has not been compared with the performance of individual surgeon annotators. We developed a model that automatically labels RYGB steps and compared its performance to that of surgeons. METHODS AND PROCEDURES 545 videos (17 surgeons) of laparoscopic RYGB procedures were collected. An annotation guide (12 steps, 52 tasks) was developed. Steps were annotated by 11 surgeons: each video was annotated by two surgeons, and a third reconciled the differences. A convolutional AI model was trained to identify steps and compared with manual annotation. For modeling, we used 390 videos for training, 95 for validation, and 60 for testing. The comparison between the AI model and manual annotation was performed using analysis of variance (ANOVA) in the subset of 60 testing videos. We assessed the performance of the model at each step, and poor performance was defined as an F1-score below 80%. RESULTS The convolutional model identified the 12 steps of the RYGB architecture. Model performance varied by step (F1 > 90% for 7 steps and > 80% for 2 steps). The reconciled manual annotation data (F1 > 80% for more than 5 steps) performed better than the trainees' annotations (F1 > 80% for 2-5 steps for 4 annotators, and for fewer than 2 steps for 4 annotators). In the testing subset, certain steps had low performance, indicating potential ambiguities in surgical landmarks. Additionally, some videos were easier to annotate than others, suggesting variability. After controlling for this variability, the AI algorithm was comparable to manual annotation (p < 0.0001). CONCLUSION AI can be used to identify surgical landmarks in RYGB comparably to the manual process, and it recognized some landmarks more accurately than surgeons. This technology has the potential to improve surgical training by assessing the learning curves of surgeons at scale.
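As a hypothetical illustration of the statistical comparison described above, the snippet below applies a one-way ANOVA to per-video agreement scores from the model and from two groups of human annotators; all score values are invented and the grouping is an assumption, not data from the cited study.

```python
# One-way ANOVA comparing per-video agreement scores (synthetic data).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
ai_scores = rng.normal(0.85, 0.05, size=60)        # per-video F1 of the model
expert_scores = rng.normal(0.86, 0.04, size=60)    # reconciled manual annotation
trainee_scores = rng.normal(0.80, 0.07, size=60)   # single-annotator labels

f_stat, p_value = f_oneway(ai_scores, expert_scores, trainee_scores)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```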
Affiliation(s)
- Danyal Fer
- University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA; Johnson & Johnson MedTech, New Brunswick, NJ, USA
| | - Bokai Zhang
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
| | - Rami Abukhalil
- Johnson & Johnson MedTech, New Brunswick, NJ, USA; 5490 Great America Parkway, Santa Clara, CA, 95054, USA.
| | - Varun Goel
- University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA; Johnson & Johnson MedTech, New Brunswick, NJ, USA
| | - Bharti Goel
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
| |
|
44
|
Kawamura K, Ebata R, Nakamura R, Otori N. Improving situation recognition using endoscopic videos and navigation information for endoscopic sinus surgery. Int J Comput Assist Radiol Surg 2023; 18:9-16. [PMID: 36151349 DOI: 10.1007/s11548-022-02754-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Accepted: 09/09/2022] [Indexed: 02/01/2023]
Abstract
PURPOSE Endoscopic sinus surgery (ESS) is widely used to treat chronic sinusitis. However, it involves the use of surgical instruments in a narrow surgical field in close proximity to vital organs, such as the brain and eyes. Thus, an advanced level of surgical skill is expected of surgeons performing this surgery. In a previous study, endoscopic images and surgical navigation information were used to develop an automatic situation recognition method for ESS. In this study, we aimed to develop a more accurate automatic surgical situation recognition method for ESS by improving the method proposed in our previous study and adding post-processing to remove incorrect recognitions. METHOD We examined the training model parameters and the number of long short-term memory (LSTM) units, modified the input data augmentation method, and added post-processing. We also evaluated the modified method using clinical data. RESULT The proposed modifications improved the overall scene recognition accuracy compared with the previous study, although phase recognition did not exhibit significant improvement. In addition, applying a one-dimensional median filter to the time-series results significantly reduced short-duration false recognitions. Furthermore, post-processing that constrains scene transitions was required to further improve recognition accuracy. CONCLUSION Scene recognition could be improved by tuning the model parameters and by adding the one-dimensional median filter and post-processing. However, the scene recognition accuracy remained unsatisfactory; thus, more accurate scene recognition and an appropriate post-processing method are required.
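The post-processing idea is easy to illustrate: a one-dimensional median filter over the per-frame predictions removes isolated, short-lived misclassifications. The window length and the example label sequence below are illustrative choices, not values from the cited study.

```python
# Median filtering of a per-frame scene-label sequence to suppress short spikes.
import numpy as np
from scipy.signal import medfilt

pred = np.array([0, 0, 0, 2, 0, 0, 1, 1, 1, 0, 1, 1, 2, 2, 2])  # per-frame scene IDs
smoothed = medfilt(pred, kernel_size=5)  # odd window; the local majority label wins
print(pred)
print(smoothed)   # single-frame "2" and "0" spikes are removed
```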
Affiliation(s)
- Kazuya Kawamura
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan.
| | - Ryu Ebata
- Graduate School of Science and Engineering, Chiba University, Chiba, Japan
| | - Ryoichi Nakamura
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan; Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University, Tokyo, Japan
| | - Nobuyoshi Otori
- Department of Otorhinolaryngology, The Jikei University School of Medicine, Tokyo, Japan
| |
|
45
|
Temporal-based Swin Transformer network for workflow recognition of surgical video. Int J Comput Assist Radiol Surg 2023; 18:139-147. [PMID: 36331795 DOI: 10.1007/s11548-022-02785-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Accepted: 10/21/2022] [Indexed: 11/06/2022]
Abstract
PURPOSE Surgical workflow recognition has emerged as an important part of computer-assisted intervention systems for the modern operating room, yet it remains a very challenging problem. Although CNN-based approaches achieve excellent performance, they do not learn global, long-range semantic interactions well due to the inductive bias inherent in convolution. METHODS In this paper, we propose a temporal-based Swin Transformer network (TSTNet) for the surgical video workflow recognition task. TSTNet contains two main parts: the Swin Transformer and the LSTM. The Swin Transformer incorporates the attention mechanism to encode remote dependencies and learn highly expressive representations. The LSTM is capable of learning long-range dependencies and is used to extract temporal information. TSTNet organically combines the two components to extract spatiotemporal features that contain more contextual information. In particular, based on a full understanding of the natural characteristics of surgical video, we propose a priori revision algorithm (PRA) that uses prior information about the sequence of surgical phases. This strategy optimizes the output of TSTNet and further improves recognition performance. RESULTS We conduct extensive experiments on the Cholec80 dataset to validate the effectiveness of the TSTNet-PRA method. Our method achieves excellent performance on Cholec80, with accuracy of up to 92.8%, greatly exceeding state-of-the-art methods. CONCLUSION By modelling remote temporal information and multi-scale visual information, the proposed TSTNet-PRA method, evaluated on a large public dataset, shows a recognition capability superior to other spatiotemporal networks.
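A minimal sketch of the Swin-features-plus-LSTM pattern is shown below, using the Swin-T backbone available in torchvision followed by an LSTM and a phase classifier. Hidden sizes, sequence length, and the omission of the priori revision step are simplifying assumptions, not the published TSTNet configuration.

```python
# Swin-T frame encoder + LSTM temporal model for per-frame phase logits.
import torch
import torch.nn as nn
from torchvision.models import swin_t

class SwinLSTMPhaseNet(nn.Module):
    def __init__(self, n_phases=7):
        super().__init__()
        self.backbone = swin_t(weights=None)
        self.backbone.head = nn.Identity()          # keep the 768-d pooled features
        self.lstm = nn.LSTM(768, 256, batch_first=True)
        self.classifier = nn.Linear(256, n_phases)

    def forward(self, clip):                        # clip: (B, T, 3, 224, 224)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).view(b, t, -1)   # (B, T, 768)
        h, _ = self.lstm(feats)                                    # temporal context
        return self.classifier(h)                                  # (B, T, n_phases)

logits = SwinLSTMPhaseNet()(torch.rand(1, 8, 3, 224, 224))
print(logits.shape)
```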
|
46
|
Park M, Oh S, Jeong T, Yu S. Multi-Stage Temporal Convolutional Network with Moment Loss and Positional Encoding for Surgical Phase Recognition. Diagnostics (Basel) 2022; 13:107. [PMID: 36611399 PMCID: PMC9818879 DOI: 10.3390/diagnostics13010107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 12/28/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022] Open
Abstract
In recent times, many studies concerning surgical video analysis have been conducted due to its growing importance in many medical applications. In particular, it is very important to be able to recognize the current surgical phase because the phase information can be utilized in various ways both during and after surgery. This paper proposes an efficient phase recognition network, called MomentNet, for cholecystectomy endoscopic videos. Unlike LSTM-based networks, MomentNet is based on a multi-stage temporal convolutional network. In addition, to improve the phase prediction accuracy, the proposed method adopts a new loss function to supplement the general cross-entropy loss function. The new loss function significantly improves the performance of the phase recognition network by constraining undesirable phase transitions and preventing over-segmentation. MomentNet also effectively applies positional encoding techniques, which are commonly used in transformer architectures, to the multi-stage temporal convolutional network; the positional encoding provides important temporal context, resulting in higher phase prediction accuracy. Furthermore, MomentNet applies a label smoothing technique to suppress overfitting and replaces the backbone network for feature extraction to further improve performance. As a result, MomentNet achieves 92.31% accuracy on the phase recognition task with the Cholec80 dataset, which is 4.55% higher than that of the baseline architecture.
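The sketch below illustrates the ingredients named above: a dilated temporal-convolution stage, sinusoidal positional encoding added to the frame features, and a loss that combines label-smoothed cross-entropy with a generic temporal-smoothing penalty against over-segmentation. The smoothing term is a common stand-in and is not the paper's moment loss; all sizes and weights are assumptions.

```python
# Dilated TCN stage + positional encoding + label-smoothed CE with a smoothing penalty.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def positional_encoding(t, d):
    pos = torch.arange(t).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d, 2).float() * (-math.log(10000.0) / d))
    pe = torch.zeros(t, d)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                   # (T, d)

class TCNStage(nn.Module):
    def __init__(self, d=64, n_phases=7, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d, d, 3, padding=2 ** i, dilation=2 ** i) for i in range(layers))
        self.out = nn.Conv1d(d, n_phases, 1)

    def forward(self, x):                       # x: (B, d, T)
        for conv in self.convs:
            x = x + F.relu(conv(x))             # residual dilated convolutions
        return self.out(x)                      # (B, n_phases, T)

B, T, d, n_phases = 2, 100, 64, 7
feats = torch.rand(B, T, d) + positional_encoding(T, d)    # inject temporal position info
logits = TCNStage(d, n_phases)(feats.transpose(1, 2))      # (B, n_phases, T)
targets = torch.randint(0, n_phases, (B, T))

ce = F.cross_entropy(logits, targets, label_smoothing=0.1)
log_p = F.log_softmax(logits, dim=1)
smooth = torch.clamp((log_p[:, :, 1:] - log_p[:, :, :-1]) ** 2, max=16).mean()
loss = ce + 0.15 * smooth                       # penalize abrupt frame-to-frame changes
print(float(loss))
```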
Affiliation(s)
- Minyoung Park
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
| | - Seungtaek Oh
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
| | - Taikyeong Jeong
- School of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, Republic of Korea
| | - Sungwook Yu
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
| |
|
47
|
Bastian L, Czempiel T, Heiliger C, Karcz K, Eck U, Busam B, Navab N. Know your sensors — a modality study for surgical action classification. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2152377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Affiliation(s)
- Lennart Bastian
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
| | - Tobias Czempiel
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
| | - Christian Heiliger
- Department of General, Visceral, and Transplant Surgery, University Hospital, LMU Munich, Munich, Germany
| | - Konrad Karcz
- Department of General, Visceral, and Transplant Surgery, University Hospital, LMU Munich, Munich, Germany
| | - Ulrich Eck
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
| | - Benjamin Busam
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
| | - Nassir Navab
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, Maryland, USA
| |
|
48
|
Zhang B, Sturgeon D, Shankar AR, Goel VK, Barker J, Ghanem A, Lee P, Milecky M, Stottler N, Petculescu S. Surgical instrument recognition for instrument usage documentation and surgical video library indexing. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2152371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
- Bokai Zhang
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
| | - Darrick Sturgeon
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
| | - Jocelyn Barker
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
| | - Amer Ghanem
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
| | - Philip Lee
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
| | - Meghan Milecky
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
| |
|
49
|
Guzmán-García C, Sánchez-González P, Oropesa I, Gómez EJ. Automatic Assessment of Procedural Skills Based on the Surgical Workflow Analysis Derived from Speech and Video. Bioengineering (Basel) 2022; 9:753. [PMID: 36550959 PMCID: PMC9774809 DOI: 10.3390/bioengineering9120753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2022] [Revised: 11/18/2022] [Accepted: 11/29/2022] [Indexed: 12/03/2022] Open
Abstract
Automatic surgical workflow analysis (SWA) plays an important role in the modelling of surgical processes. Current automatic approaches for SWA use videos (with accuracies varying from 0.8 to 0.9), but they do not incorporate speech, which is inherently linked to the ongoing cognitive process. The approach followed in this study uses both video and speech to classify the phases of laparoscopic cholecystectomy, based on neural networks and machine learning. The automatic application implemented in this study uses this information to calculate the total time spent in surgery, the time spent in each phase, the number of occurrences of each phase, the minimal, maximal and average time whenever there is more than one occurrence, the timeline of the surgery, and the transition probabilities between phases. This information can be used as an assessment method for surgical procedural skills.
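The workflow statistics listed above can be derived directly from a per-frame phase sequence, as in the illustrative computation below; the label sequence, frame rate, and phase count are made-up examples rather than values from the cited study.

```python
# Per-phase durations, occurrence counts, and transition probabilities from
# a per-frame phase sequence (synthetic example).
import numpy as np

phases = np.array([0, 0, 0, 1, 1, 2, 2, 2, 1, 1, 3, 3])   # predicted phase per frame
fps = 1.0                                                  # frames per second

# Collapse runs of identical labels into (phase, duration) segments.
change = np.flatnonzero(np.diff(phases)) + 1
segments = [(seg[0], len(seg) / fps) for seg in np.split(phases, change)]
print("segments (phase, seconds):", segments)

n = phases.max() + 1
time_per_phase = [sum(d for p, d in segments if p == c) for c in range(n)]
occurrences = [sum(1 for p, _ in segments if p == c) for c in range(n)]
print("total seconds per phase:", time_per_phase)
print("occurrences per phase:", occurrences)

# Transition probabilities between consecutive segments.
trans = np.zeros((n, n))
for (a, _), (b, _) in zip(segments[:-1], segments[1:]):
    trans[a, b] += 1
trans = trans / np.clip(trans.sum(axis=1, keepdims=True), 1, None)
print(trans)
```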
Affiliation(s)
- Carmen Guzmán-García
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Center for Biomedical Technology, Universidad Politécnica de Madrid, 28040 Madrid, Spain
| | - Patricia Sánchez-González
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Center for Biomedical Technology, Universidad Politécnica de Madrid, 28040 Madrid, Spain
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina, 28029 Madrid, Spain
| | - Ignacio Oropesa
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Center for Biomedical Technology, Universidad Politécnica de Madrid, 28040 Madrid, Spain
| | - Enrique J. Gómez
- Biomedical Engineering and Telemedicine Centre, ETSI Telecomunicación, Center for Biomedical Technology, Universidad Politécnica de Madrid, 28040 Madrid, Spain
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina, 28029 Madrid, Spain
| |
|
50
|
Takeuchi M, Collins T, Ndagijimana A, Kawakubo H, Kitagawa Y, Marescaux J, Mutter D, Perretta S, Hostettler A, Dallemagne B. Automatic surgical phase recognition in laparoscopic inguinal hernia repair with artificial intelligence. Hernia 2022; 26:1669-1678. [PMID: 35536371 DOI: 10.1007/s10029-022-02621-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Accepted: 04/21/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Because of the complexity of the intra-abdominal anatomy in the posterior approach, a longer learning curve has been observed in laparoscopic transabdominal preperitoneal (TAPP) inguinal hernia repair. Consequently, automatic tools using artificial intelligence (AI) to monitor TAPP procedures and assess learning curves are required. The primary objective of this study was to establish a deep learning-based automated surgical phase recognition system for TAPP. A secondary objective was to investigate the relationship between surgical skills and phase duration. METHODS This study enrolled 119 patients who underwent the TAPP procedure. The surgical videos were annotated (delineated in time) and split into seven surgical phases (preparation, peritoneal flap incision, peritoneal flap dissection, hernia dissection, mesh deployment, mesh fixation, peritoneal flap closure, and additional closure). An AI model was trained to automatically recognize the surgical phases from the videos. The relationship between phase duration and surgical skills was also evaluated. RESULTS A fourfold cross-validation was used to assess the performance of the AI model. The accuracy was 88.81% and 85.82% in unilateral and bilateral cases, respectively. In unilateral hernia cases, the durations of peritoneal incision (p = 0.003) and hernia dissection (p = 0.014) detected via AI were significantly shorter for experts than for trainees. CONCLUSION An automated surgical phase recognition system was established for TAPP using deep learning, with high accuracy. Our AI-based system can be useful for automatic monitoring of surgical progress, improving OR efficiency, evaluating surgical skills, and video-based surgical education. Specific phase durations detected via the AI model were significantly associated with the surgeons' learning curves.
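As a small illustration of the evaluation protocol, the snippet below groups videos into a four-fold cross-validation with scikit-learn's KFold; the video identifiers and the random seed are placeholders, and the actual study's fold assignment may differ.

```python
# Four-fold cross-validation split over video identifiers (placeholders).
from sklearn.model_selection import KFold

video_ids = [f"tapp_{i:03d}" for i in range(119)]   # one entry per recorded case
kfold = KFold(n_splits=4, shuffle=True, random_state=42)

for fold, (train_idx, test_idx) in enumerate(kfold.split(video_ids)):
    train_videos = [video_ids[i] for i in train_idx]
    test_videos = [video_ids[i] for i in test_idx]
    print(f"fold {fold}: {len(train_videos)} train videos, {len(test_videos)} test videos")
    # a phase-recognition model would be trained on train_videos and
    # evaluated on test_videos here
```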
Affiliation(s)
- M Takeuchi
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France.
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan.
| | - T Collins
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
| | - A Ndagijimana
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
| | - H Kawakubo
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | - Y Kitagawa
- Department of Surgery, Keio University School of Medicine, Tokyo, Japan
| | - J Marescaux
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
| | - D Mutter
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France
- Department of Digestive and Endocrine Surgery, University Hospital, Strasbourg, France
| | - S Perretta
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France
- Department of Digestive and Endocrine Surgery, University Hospital, Strasbourg, France
| | - A Hostettler
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
| | - B Dallemagne
- IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France
- Department of Digestive and Endocrine Surgery, University Hospital, Strasbourg, France
| |
|