1. Renz-Kiefel L, Lünse S, Mantke R, Eisert P, Hilsmann A, Wisotzky EL. Inter-hospital transferability of AI: A case study on phase recognition in cholecystectomy. Comput Biol Med 2025; 192:110235. [PMID: 40328029] [DOI: 10.1016/j.compbiomed.2025.110235]
Abstract
BACKGROUND Identifying surgical phases is a crucial component of surgical workflow analysis, facilitating the automated evaluation of surgical procedures' performance and efficiency. A significant challenge in developing neural networks for surgical phase recognition lies in the scarcity of training data and the large variation in surgical techniques among surgeons. Consequently, it is imperative for these networks to possess generalization capabilities across diverse datasets. In this paper, we analyze the transferability of trained phase recognition models, using cholecystectomy as a case study. METHODS We employed datasets comprising 104 publicly available surgeries from three different centers for training and conducted multiple experiments using 21 videos of surgeries we recorded ourselves for evaluation. A two-stage deep learning architecture was employed, using a ResNet50 backbone followed by a multi-stage Temporal Convolutional Network (MS-TCN). Several experiments were conducted, including training solely on MHB (Brandenburg Medical School) data, training exclusively on public data, and training on a combination of both with an additional fine-tuning approach. RESULTS Models trained solely on MHB data achieved an accuracy of approximately 79.7%, while those trained on public data alone performed significantly worse when applied to MHB data. The best performance was obtained by retraining on a combined dataset. The results indicate that it is possible to transfer models to new environments (operating rooms or clinics) and surgeons by using public data, and incorporating site-specific data improves model transferability. CONCLUSION The results demonstrate that leveraging diverse training data, including institution-specific videos, is crucial to develop robust and transferable AI models for surgical phase recognition, thereby enhancing the potential of automated decision-support systems across different clinical environments.
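For orientation, the two-stage design summarized above (a frame-wise ResNet50 encoder whose features are refined by a multi-stage temporal convolutional network) can be sketched roughly as follows. This is a minimal PyTorch illustration under assumed settings (number of stages, hidden width, a seven-phase output), not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DilatedStage(nn.Module):
    """One TCN stage: stacked dilated 1-D convolutions over the frame-feature sequence."""
    def __init__(self, in_dim, hidden, n_classes, n_layers=10):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, 1)
        self.layers = nn.ModuleList(
            nn.Conv1d(hidden, hidden, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(n_layers)
        )
        self.out = nn.Conv1d(hidden, n_classes, 1)

    def forward(self, x):                       # x: (B, in_dim, T)
        x = self.inp(x)
        for conv in self.layers:
            x = x + torch.relu(conv(x))         # residual dilated block, length preserved
        return self.out(x)                      # (B, n_classes, T)

class TwoStagePhaseModel(nn.Module):
    """Frame-wise ResNet50 features refined by a multi-stage TCN (MS-TCN-style)."""
    def __init__(self, n_classes=7, n_stages=3, hidden=64):
        super().__init__()
        backbone = resnet50(weights=None)       # pretrained weights would be used in practice
        backbone.fc = nn.Identity()             # expose 2048-d frame embeddings
        self.backbone = backbone
        self.stages = nn.ModuleList(
            [DilatedStage(2048, hidden, n_classes)]
            + [DilatedStage(n_classes, hidden, n_classes) for _ in range(n_stages - 1)]
        )

    def forward(self, frames):                  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)  # (B, T, 2048)
        x = feats.permute(0, 2, 1)              # (B, 2048, T)
        outputs = []
        for stage in self.stages:
            x = stage(x)
            outputs.append(x)
            x = torch.softmax(x, dim=1)         # later stages refine earlier predictions
        return outputs                          # per-stage logits; the last is used at inference

logits = TwoStagePhaseModel()(torch.randn(1, 8, 3, 224, 224))
print(logits[-1].shape)                         # torch.Size([1, 7, 8])
```

The first stage predicts phases from visual features alone; each subsequent stage smooths those predictions with increasingly dilated temporal convolutions, which is what gives the temporal model its long receptive field.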
Affiliation(s)
- Lasse Renz-Kiefel: Fraunhofer Heinrich-Hertz-Institute, Vision & Imaging Technologies, Berlin, Germany
- Sebastian Lünse: Brandenburg Medical School, Department of Surgery, University Hospital Brandenburg, Germany
- Rene Mantke: Brandenburg Medical School, Department of Surgery, University Hospital Brandenburg, Germany; Faculty of Health Sciences, Joint Faculty of the Brandenburg University of Technology Cottbus - Senftenberg, the Brandenburg Medical School Theodor Fontane and the University of Potsdam & Department of Surgery, University Hospital Brandenburg, Germany
- Peter Eisert: Fraunhofer Heinrich-Hertz-Institute, Vision & Imaging Technologies, Berlin, Germany; Humboldt-University Berlin, Visual Computing, Berlin, Germany
- Anna Hilsmann: Fraunhofer Heinrich-Hertz-Institute, Vision & Imaging Technologies, Berlin, Germany
- Eric L Wisotzky: Fraunhofer Heinrich-Hertz-Institute, Vision & Imaging Technologies, Berlin, Germany; Humboldt-University Berlin, Visual Computing, Berlin, Germany; Rostock University Medical Center, Klinik und Poliklinik für Hals-Nasen-Ohrenheilkunde, Kopf- und Halschirurgie "Otto Körner", Rostock, Germany
2. Liao W, Zhu Y, Zhang H, Wang D, Zhang L, Chen T, Zhou R, Ye Z. Artificial intelligence-assisted phase recognition and skill assessment in laparoscopic surgery: a systematic review. Front Surg 2025; 12:1551838. [PMID: 40292408] [PMCID: PMC12021839] [DOI: 10.3389/fsurg.2025.1551838]
Abstract
With the widespread adoption of minimally invasive surgery, laparoscopic surgery has become an essential component of modern surgical procedures. As key technologies, laparoscopic phase recognition and skill evaluation aim to identify different stages of the surgical process and assess surgeons' operational skills using automated methods. This, in turn, can improve the quality of surgery and the skill of surgeons. This review summarizes the progress of research in laparoscopic surgery, phase recognition, and skill evaluation. First, the importance of laparoscopic surgery is introduced, clarifying the relationship between phase recognition, skill evaluation, and other surgical tasks. The publicly available surgical datasets for laparoscopic phase recognition tasks are then detailed. The review highlights the research methods that have exhibited superior performance on these public datasets and identifies common characteristics of these high-performing methods. Based on the insights obtained, the commonly used phase recognition and surgical skill evaluation methods and models in this field are summarized. In addition, this study briefly outlines the standards and methods for evaluating laparoscopic surgical skills. Finally, an analysis of the difficulties researchers face and potential future development directions is presented. Overall, this paper aims to provide valuable references for researchers, promoting further advancements in this domain.
Affiliation(s)
- Wenqiang Liao: Department of General Surgery, RuiJin Hospital LuWan Branch, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Ying Zhu: Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
- Hanwei Zhang: Institute of Intelligent Software, Guangzhou, China
- Dan Wang: Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
- Lijun Zhang: Institute of Software Chinese Academy of Sciences, Beijing, China
- Tianxiang Chen: School of Cyber Space and Technology, University of Science and Technology of China, Hefei, China
- Ru Zhou: Department of General Surgery, RuiJin Hospital LuWan Branch, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Zi Ye: Institute of Intelligent Software, Guangzhou, China
3. Alabi O, Vercauteren T, Shi M. Multitask learning in minimally invasive surgical vision: A review. Med Image Anal 2025; 101:103480. [PMID: 39938343] [DOI: 10.1016/j.media.2025.103480]
Abstract
Minimally invasive surgery (MIS) has revolutionized many procedures and led to reduced recovery time and risk of patient injury. However, MIS poses additional complexity and burden on surgical teams. Data-driven surgical vision algorithms are thought to be key building blocks in the development of future MIS systems with improved autonomy. Recent advancements in machine learning and computer vision have led to successful applications in analysing videos obtained from MIS with the promise of alleviating challenges in MIS videos. Surgical scene and action understanding encompasses multiple related tasks that, when solved individually, can be memory-intensive, inefficient, and fail to capture task relationships. Multitask learning (MTL), a learning paradigm that leverages information from multiple related tasks to improve performance and aid generalization, is well-suited for fine-grained and high-level understanding of MIS data. This review provides a narrative overview of the current state-of-the-art MTL systems that leverage videos obtained from MIS. Beyond listing published approaches, we discuss the benefits and limitations of these MTL systems. Moreover, this manuscript presents an analysis of the literature for various application fields of MTL in MIS, including those with large models, highlighting notable trends, new directions of research, and developments.
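Most of the MTL systems the review covers follow hard parameter sharing: one shared visual encoder feeds several task-specific heads trained with a weighted sum of task losses. A minimal sketch of that pattern is shown below; the task pair (phase classification and instrument presence), the ResNet18 encoder, and the loss weight are illustrative assumptions rather than any specific reviewed system.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskSurgicalNet(nn.Module):
    """Hard parameter sharing: one shared encoder, one head per task."""
    def __init__(self, n_phases=7, n_tools=7):
        super().__init__()
        encoder = resnet18(weights=None)
        encoder.fc = nn.Identity()                  # 512-d shared features
        self.encoder = encoder
        self.phase_head = nn.Linear(512, n_phases)  # multi-class surgical phase
        self.tool_head = nn.Linear(512, n_tools)    # multi-label instrument presence

    def forward(self, x):
        z = self.encoder(x)
        return self.phase_head(z), self.tool_head(z)

model = MultiTaskSurgicalNet()
images = torch.randn(4, 3, 224, 224)               # a toy batch of frames
phase_labels = torch.randint(0, 7, (4,))
tool_labels = torch.randint(0, 2, (4, 7)).float()

phase_logits, tool_logits = model(images)
loss = nn.CrossEntropyLoss()(phase_logits, phase_labels) \
     + 0.5 * nn.BCEWithLogitsLoss()(tool_logits, tool_labels)  # weighted multitask loss
loss.backward()
print(float(loss))
```

Sharing the encoder amortizes memory and computation across tasks and lets related tasks regularize one another, which is the efficiency and generalization argument the review makes for MTL in MIS.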
Affiliation(s)
- Oluwatosin Alabi: School of Biomedical Engineering & Imaging Sciences, King's College London, United Kingdom
- Tom Vercauteren: School of Biomedical Engineering & Imaging Sciences, King's College London, United Kingdom
- Miaojing Shi: College of Electronic and Information Engineering, Tongji University, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, China
4. Liu Z, Chen K, Wang S, Xiao Y, Zhang G. Deep learning in surgical process modeling: A systematic review of workflow recognition. J Biomed Inform 2025; 162:104779. [PMID: 39832608] [DOI: 10.1016/j.jbi.2025.104779]
Abstract
OBJECTIVE The application of artificial intelligence (AI) in health care has led to a surge of interest in surgical process modeling (SPM). The objective of this study is to investigate the role of deep learning in recognizing surgical workflows and extracting reliable patterns from datasets used in minimally invasive surgery, thereby advancing the development of context-aware intelligent systems in endoscopic surgeries. METHODS We conducted a comprehensive search of articles related to SPM from 2018 to April 2024 in the PubMed, Web of Science, Google Scholar, and IEEE Xplore databases. We selected articles that used annotated surgical videos to describe a surgical process model and focused on the specific methods and results of each study. RESULTS The search initially yielded 2937 articles. After filtering on the basis of the relevance of titles, abstracts, and content, 59 articles were selected for full-text review. These studies highlight the widespread adoption of neural networks and transformers for surgical workflow analysis (SWA). They focus on minimally invasive surgeries performed with laparoscopes and microscopes. However, the process of surgical annotation lacks detailed description, and there are significant differences in the annotation process for different surgical procedures. CONCLUSION Temporal and spatial sequences are key factors in identifying surgical phases. RNN, TCN, and transformer networks are commonly used to extract long-distance temporal relationships. Multimodal data input is beneficial, as it combines information from surgical instruments. However, publicly available datasets often lack clinical knowledge, and establishing large annotated datasets for surgery remains a challenge. To reduce annotation costs, methods such as semi-supervised learning, self-supervised learning, contrastive learning, transfer learning, and active learning are commonly used.
Affiliation(s)
- Zhenzhong Liu, Kelong Chen, Shuai Wang, Yijun Xiao, Guobin Zhang: Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), China
5. Lavanchy JL, Ramesh S, Dall'Alba D, Gonzalez C, Fiorini P, Müller-Stich BP, Nett PC, Marescaux J, Mutter D, Padoy N. Challenges in multi-centric generalization: phase and step recognition in Roux-en-Y gastric bypass surgery. Int J Comput Assist Radiol Surg 2024; 19:2249-2257. [PMID: 38761319] [PMCID: PMC11541311] [DOI: 10.1007/s11548-024-03166-3]
Abstract
PURPOSE Most studies on surgical activity recognition utilizing artificial intelligence (AI) have focused mainly on recognizing one type of activity from small and mono-centric surgical video datasets. It remains speculative whether those models would generalize to other centers. METHODS In this work, we introduce a large multi-centric multi-activity dataset consisting of 140 surgical videos (MultiBypass140) of laparoscopic Roux-en-Y gastric bypass (LRYGB) surgeries performed at two medical centers, i.e., the University Hospital of Strasbourg, France (StrasBypass70) and Inselspital, Bern University Hospital, Switzerland (BernBypass70). The dataset has been fully annotated with phases and steps by two board-certified surgeons. Furthermore, we assess the generalizability and benchmark different deep learning models for the task of phase and step recognition in 7 experimental studies: (1) Training and evaluation on BernBypass70; (2) Training and evaluation on StrasBypass70; (3) Training and evaluation on the joint MultiBypass140 dataset; (4) Training on BernBypass70, evaluation on StrasBypass70; (5) Training on StrasBypass70, evaluation on BernBypass70; and training on MultiBypass140 with (6) evaluation on BernBypass70 and (7) evaluation on StrasBypass70. RESULTS The model's performance is markedly influenced by the training data. The worst results were obtained in experiments (4) and (5), confirming the limited generalization capabilities of models trained on mono-centric data. The use of multi-centric training data, experiments (6) and (7), improves the generalization capabilities of the models, bringing them beyond the level of independent mono-centric training and validation (experiments (1) and (2)). CONCLUSION MultiBypass140 shows considerable variation in surgical technique and workflow of LRYGB procedures between centers. Therefore, generalization experiments demonstrate a remarkable difference in model performance. These results highlight the importance of multi-centric datasets for AI model generalization to account for variance in surgical technique and workflows. The dataset and code are publicly available at https://github.com/CAMMA-public/MultiBypass140.
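The seven configurations listed above form a small experiment matrix. One hypothetical way to organize such a study programmatically is sketched below, with train_model and evaluate as invented placeholders for the actual training and scoring pipeline.

```python
# Hypothetical layout of the seven train/evaluate configurations described above.
# `train_model` and `evaluate` are stand-ins for a real training and scoring pipeline.

def train_model(train_sets):
    return {"trained_on": tuple(train_sets)}                  # stand-in for a fitted model

def evaluate(model, test_set):
    return f"metrics of {model['trained_on']} on {test_set}"  # stand-in metric report

EXPERIMENTS = [
    (1, ["BernBypass70"],                  "BernBypass70"),
    (2, ["StrasBypass70"],                 "StrasBypass70"),
    (3, ["BernBypass70", "StrasBypass70"], "MultiBypass140"),
    (4, ["BernBypass70"],                  "StrasBypass70"),   # cross-center transfer
    (5, ["StrasBypass70"],                 "BernBypass70"),    # cross-center transfer
    (6, ["BernBypass70", "StrasBypass70"], "BernBypass70"),
    (7, ["BernBypass70", "StrasBypass70"], "StrasBypass70"),
]

for exp_id, train_sets, test_set in EXPERIMENTS:
    model = train_model(train_sets)
    print(f"Experiment {exp_id}: {evaluate(model, test_set)}")
```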
Affiliation(s)
- Joël L Lavanchy: University Digestive Health Care Center - Clarunis, 4002, Basel, Switzerland; Department of Biomedical Engineering, University of Basel, 4123, Allschwil, Switzerland; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Sanat Ramesh: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France; Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Diego Dall'Alba: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Cristians Gonzalez: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; University Hospital of Strasbourg, 67000, Strasbourg, France
- Paolo Fiorini: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Beat P Müller-Stich: University Digestive Health Care Center - Clarunis, 4002, Basel, Switzerland; Department of Biomedical Engineering, University of Basel, 4123, Allschwil, Switzerland
- Philipp C Nett: Department of Visceral Surgery and Medicine, Inselspital Bern University Hospital, 3010, Bern, Switzerland
- Didier Mutter: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; University Hospital of Strasbourg, 67000, Strasbourg, France
- Nicolas Padoy: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
6. Hossain I, Madani A, Laplante S. Machine learning perioperative applications in visceral surgery: a narrative review. Front Surg 2024; 11:1493779. [PMID: 39539511] [PMCID: PMC11557547] [DOI: 10.3389/fsurg.2024.1493779]
Abstract
Artificial intelligence in surgery has seen an expansive rise in research and clinical implementation in recent years, with many of the models being driven by machine learning. In the preoperative setting, machine learning models have been utilized to guide indications for surgery, appropriate timing of operations, calculation of risks and prognostication, along with improving estimations of time and resources required for surgeries. Intraoperative applications that have been demonstrated are visual annotations of the surgical field, automated classification of surgical phases and prediction of intraoperative patient decompensation. Postoperative applications have been studied the most, with most efforts put towards prediction of postoperative complications, recurrence patterns of malignancy, enhanced surgical education and assessment of surgical skill. Challenges to implementation of these models in clinical practice include the need for more quantity and quality of standardized data to improve model performance, sufficient resources and infrastructure to train and use machine learning, along with addressing ethical and patient acceptance considerations.
Affiliation(s)
- Intekhab Hossain: Department of Surgery, University of Toronto, Toronto, ON, Canada; Surgical Artificial Intelligence Research Academy, University Health Network, Toronto, ON, Canada
- Amin Madani: Department of Surgery, University of Toronto, Toronto, ON, Canada; Surgical Artificial Intelligence Research Academy, University Health Network, Toronto, ON, Canada
- Simon Laplante: Surgical Artificial Intelligence Research Academy, University Health Network, Toronto, ON, Canada; Department of Surgery, Mayo Clinic, Rochester, MN, United States
7. Vogel R, Mück B. Artificial Intelligence-What to Expect From Machine Learning and Deep Learning in Hernia Surgery. Journal of Abdominal Wall Surgery (JAWS) 2024; 3:13059. [PMID: 39310669] [PMCID: PMC11412881] [DOI: 10.3389/jaws.2024.13059]
Abstract
This mini-review explores the integration of Artificial Intelligence (AI) within hernia surgery, highlighting the role of Machine Learning (ML) and Deep Learning (DL). The term AI incorporates various technologies including ML, Neural Networks (NN), and DL. Classical ML algorithms depend on structured, labeled data for predictions, requiring significant human oversight. In contrast, DL, a subset of ML, generally leverages unlabeled, raw data such as images and videos to autonomously identify patterns and make intricate deductions. This process is enabled by neural networks used in DL, where hidden layers between the input and output capture complex data patterns. These layers' configuration and weighting are pivotal in developing effective models for various applications, such as image and speech recognition, natural language processing, and more specifically, surgical procedures and outcomes in hernia surgery. Significant advancements have been achieved with DL models in surgical settings, particularly in predicting the complexity of abdominal wall reconstruction (AWR) and other postoperative outcomes, which are elaborated in detail within the context of this mini-review. The review method involved analyzing relevant literature from databases such as PubMed and Google Scholar, focusing on studies related to preoperative planning, intraoperative techniques, and postoperative management within hernia surgery. Only recent, peer-reviewed publications in English that directly relate to the topic were included, highlighting the latest advancements in the field to depict potential benefits and current limitations of AI technologies in hernia surgery, advocating for further research and application in this evolving field.
Affiliation(s)
- Robert Vogel: Klinikum Kempten - Klinikverbund Allgäu, Kempten, Germany
- Björn Mück: Klinikum Kempten - Klinikverbund Allgäu, Kempten, Germany
8. Lin W, Hu Y, Fu H, Yang M, Chng CB, Kawasaki R, Chui C, Liu J. Instrument-Tissue Interaction Detection Framework for Surgical Video Understanding. IEEE Trans Med Imaging 2024; 43:2803-2813. [PMID: 38530715] [DOI: 10.1109/tmi.2024.3381209]
Abstract
The instrument-tissue interaction detection task, which helps understand surgical activities, is vital for constructing computer-assisted surgery systems but presents many challenges. Firstly, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and lacks the ability to automatically detect instruments and tissues. Secondly, existing works do not fully consider relations between intra- and inter-frame instruments and tissues. In this paper, we propose to represent instrument-tissue interaction as a 〈instrument class, instrument bounding box, tissue class, tissue bounding box, action class〉 quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships of proposals in the current frame using global context information in the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model the temporal information for the same instance. For evaluation, we build a cataract surgery video (PhacoQ) dataset and a cholecystectomy surgery video (CholecQ) dataset. Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.
9. Knudsen JE, Ghaffar U, Ma R, Hung AJ. Clinical applications of artificial intelligence in robotic surgery. J Robot Surg 2024; 18:102. [PMID: 38427094] [PMCID: PMC10907451] [DOI: 10.1007/s11701-024-01867-0]
Abstract
Artificial intelligence (AI) is revolutionizing nearly every aspect of modern life. In the medical field, robotic surgery is the sector with some of the most innovative and impactful advancements. In this narrative review, we outline recent contributions of AI to the field of robotic surgery with a particular focus on intraoperative enhancement. AI modeling is allowing surgeons to have advanced intraoperative metrics such as force and tactile measurements, enhanced detection of positive surgical margins, and even allowing for the complete automation of certain steps in surgical procedures. AI is also revolutionizing the field of surgical education. AI modeling applied to intraoperative surgical video feeds and instrument kinematics data is allowing for the generation of automated skills assessments. AI also shows promise for the generation and delivery of highly specialized intraoperative surgical feedback for training surgeons. Although the adoption and integration of AI show promise in robotic surgery, it raises important, complex ethical questions. Frameworks for thinking through ethical dilemmas raised by AI are outlined in this review. AI enhancements in robotic surgery are among the most groundbreaking research happening today, and the studies outlined in this review represent some of the most exciting innovations in recent years.
Affiliation(s)
- J Everett Knudsen: Keck School of Medicine, University of Southern California, Los Angeles, USA
- Runzhuo Ma: Cedars-Sinai Medical Center, Los Angeles, USA
10. Yang Y, Wang H, Wang J, Dong K, Ding S. Semantic-Preserving Surgical Video Retrieval With Phase and Behavior Coordinated Hashing. IEEE Trans Med Imaging 2024; 43:807-819. [PMID: 37788194] [DOI: 10.1109/tmi.2023.3321382]
Abstract
Medical professionals rely on surgical video retrieval to discover relevant content within large numbers of videos for surgical education and knowledge transfer. However, the existing retrieval techniques often fail to obtain user-expected results since they ignore valuable semantics in surgical videos. The incorporation of rich semantics into video retrieval is challenging in terms of the hierarchical relationship modeling and coordination between coarse- and fine-grained semantics. To address these issues, this paper proposes a novel semantic-preserving surgical video retrieval (SPSVR) framework, which incorporates surgical phase and behavior semantics using a dual-level hashing module to capture their hierarchical relationship. This module preserves the semantics in binary hash codes by transforming the phase and behavior similarities into high- and low-level similarities in a shared Hamming space. The binary codes are optimized by performing a reconstruction task, a high-level similarity preservation task, and a low-level similarity preservation task, using a coordinated optimization strategy for efficient learning. A self-supervised learning scheme is adopted to capture behavior semantics from video clips so that the indexing of behaviors is unencumbered by fine-grained annotation and recognition. Experiments on four surgical video datasets for two different disciplines demonstrate the robust performance of the proposed framework. In addition, the results of the clinical validation experiments indicate the ability of the proposed method to retrieve the results expected by surgeons. The code can be found at https://github.com/trigger26/SPSVR.
11. Zhai Y, Chen Z, Zheng Z, Wang X, Yan X, Liu X, Yin J, Wang J, Zhang J. Artificial intelligence for automatic surgical phase recognition of laparoscopic gastrectomy in gastric cancer. Int J Comput Assist Radiol Surg 2024; 19:345-353. [PMID: 37914911] [DOI: 10.1007/s11548-023-03027-5]
Abstract
PURPOSE This study aimed to classify laparoscopic gastric cancer surgery phases. We also aimed to develop a transformer-based artificial intelligence (AI) model for automatic surgical phase recognition and to evaluate the model's performance using laparoscopic gastric cancer surgical videos. METHODS One hundred patients who underwent laparoscopic surgery for gastric cancer were included in this study. All surgical videos were labeled and classified into eight phases (P0, preparation; P1, separation of the greater gastric curvature; P2, separation of the distal stomach; P3, separation of the lesser gastric curvature; P4, dissection of the superior margin of the pancreas; P5, separation of the proximal stomach; P6, digestive tract reconstruction; P7, end of operation). This study proposed an AI phase recognition model consisting of a convolutional neural network-based visual feature extractor and a temporal relational transformer. RESULTS A visual and temporal relationship network was proposed to automatically perform accurate surgical phase prediction. The average duration of the surgical videos was 9114 ± 2571 s. The longest phase was P1 (3388 s). The final accuracy, F1, recall, and precision were 90.128%, 87.04%, 87.04%, and 87.32%, respectively. The phase with the highest recognition accuracy was P1, and that with the lowest accuracy was P2. CONCLUSION An AI model based on convolutional and transformer networks was developed in this study. This model can accurately identify the phases of laparoscopic surgery for gastric cancer. AI can be used as an analytical tool for gastric cancer surgical videos.
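The general pattern described (a CNN that extracts per-frame features, followed by a transformer that models temporal relations and outputs per-frame phase predictions) can be sketched as follows. Only the eight-phase output follows the abstract; the ResNet18 backbone, embedding size, and layer counts are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CnnTransformerPhaseModel(nn.Module):
    """Per-frame CNN features + transformer encoder over the temporal axis."""
    def __init__(self, n_phases=8, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()                        # 512-d frame features
        self.cnn = cnn
        self.proj = nn.Linear(512, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=512, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_phases)

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        feats = self.proj(feats)                      # (B, T, d_model)
        feats = self.temporal(feats)                  # temporal self-attention across frames
        return self.classifier(feats)                 # (B, T, n_phases) per-frame logits

logits = CnnTransformerPhaseModel()(torch.randn(1, 8, 3, 224, 224))
print(logits.shape)                                   # torch.Size([1, 8, 8])
```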
Affiliation(s)
- Yuhao Zhai: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Zhen Chen: Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China
- Zhi Zheng: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xi Wang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xiaosheng Yan: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xiaoye Liu: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Jie Yin: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Jinqiao Wang: Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Haidian District, Beijing, China; Wuhan AI Research, Wuhan, China
- Jun Zhang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
12. Kostiuchik G, Sharan L, Mayer B, Wolf I, Preim B, Engelhardt S. Surgical phase and instrument recognition: how to identify appropriate dataset splits. Int J Comput Assist Radiol Surg 2024. [PMID: 38285380] [DOI: 10.1007/s11548-024-03063-9]
Abstract
PURPOSE Machine learning approaches can only be reliably evaluated if training, validation, and test data splits are representative and not affected by the absence of classes. Surgical workflow and instrument recognition are two tasks that are complicated in this manner because of heavy data imbalances resulting from the different lengths of phases and their potentially erratic occurrence. Furthermore, sub-properties like instrument (co-)occurrence are usually not particularly considered when defining the split. METHODS We present a publicly available data visualization tool that enables interactive exploration of dataset partitions for surgical phase and instrument recognition. The application focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. Particularly, it facilitates assessment of dataset splits, especially regarding identification of sub-optimal dataset splits. RESULTS We performed analysis of the datasets Cholec80, CATARACTS, CaDIS, M2CAI-workflow, and M2CAI-tool using the proposed application. We were able to uncover phase transitions, individual instruments, and combinations of surgical instruments that were not represented in one of the sets. Addressing these issues, we identify possible improvements in the splits using our tool. A user study with ten participants demonstrated that the participants were able to successfully solve a selection of data exploration tasks. CONCLUSION In highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate dataset split because it can greatly influence the assessments of machine learning approaches. Our interactive tool allows for determination of better splits to improve current practices in the field. The live application is available at https://cardio-ai.github.io/endovis-ml/ .
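The kind of check the tool supports can be illustrated with a small, self-contained example: gather the phases, phase transitions, and instruments present in each split and report anything that exists in the overall dataset but is missing from a split. The toy annotations below are invented for illustration.

```python
# Toy per-video annotations: ordered phase labels and the set of instruments used.
videos = {
    "video01": {"phases": ["prep", "dissection", "clipping"], "tools": {"grasper", "hook"}},
    "video02": {"phases": ["prep", "clipping", "cleaning"],   "tools": {"grasper", "clipper"}},
    "video03": {"phases": ["prep", "dissection", "cleaning"], "tools": {"hook", "irrigator"}},
}
split = {"train": ["video01", "video02"], "test": ["video03"]}

def summarize(video_ids):
    """Collect phases, phase transitions, and instruments present in a set of videos."""
    phases, transitions, tools = set(), set(), set()
    for vid in video_ids:
        p = videos[vid]["phases"]
        phases.update(p)
        transitions.update(zip(p, p[1:]))
        tools.update(videos[vid]["tools"])
    return {"phases": phases, "transitions": transitions, "tools": tools}

overall = summarize(videos)                     # everything that occurs in the full dataset
for name, vids in split.items():
    present = summarize(vids)
    for key in overall:
        missing = overall[key] - present[key]
        if missing:
            print(f"{name} split is missing {key}: {sorted(map(str, missing))}")
```

A split that leaves any phase, transition, or instrument entirely out of one partition makes the evaluation of models on that partition misleading, which is the central argument of the study.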
Affiliation(s)
- Georgii Kostiuchik: Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Lalith Sharan: Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Benedikt Mayer: Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Ivo Wolf: Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
- Bernhard Preim: Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Sandy Engelhardt: Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
13. Demir KC, Schieber H, Weise T, Roth D, May M, Maier A, Yang SH. Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition. IEEE J Biomed Health Inform 2023; 27:5405-5417. [PMID: 37665700] [DOI: 10.1109/jbhi.2023.3311628]
Abstract
OBJECTIVE In the last two decades, there has been a growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems, which can assist the physicians during operations, evaluate procedures afterward or help the management team to effectively utilize the operating room. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review the state-of-the-art deep learning methods that have been published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. METHODS Three databases, IEEE Xplore, Scopus, and PubMed were searched, and additional studies are added through a manual search. After the database search, 343 studies were screened and a total of 44 studies are selected for this review. CONCLUSION The use of temporal information is essential for identifying the next surgical action. Contemporary methods used mainly RNNs, hierarchical CNNs, and Transformers to preserve long-distance temporal relations. The lack of large publicly available datasets for various procedures is a great challenge for the development of new and robust models. As supervised learning strategies are used to show proof-of-concept, self-supervised, semi-supervised, or active learning methods are used to mitigate dependency on annotated data. SIGNIFICANCE The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures, datasets, and discusses challenges.
14. Zhang J, Zhou S, Wang Y, Shi S, Wan C, Zhao H, Cai X, Ding H. Laparoscopic Image-Based Critical Action Recognition and Anticipation With Explainable Features. IEEE J Biomed Health Inform 2023; 27:5393-5404. [PMID: 37603480] [DOI: 10.1109/jbhi.2023.3306818]
Abstract
Surgical workflow analysis integrates perception, comprehension, and prediction of the surgical workflow, which helps real-time surgical support systems provide proper guidance and assistance for surgeons. This article promotes the idea of critical actions, which refer to the essential surgical actions that progress towards the fulfillment of the operation. Fine-grained workflow analysis involves recognizing current critical actions and previewing the moving tendency of instruments in the early stage of critical actions. Aiming at this, we propose a framework that incorporates operational experience to improve the robustness and interpretability of action recognition in in-vivo situations. High-dimensional images are mapped into an experience-based explainable feature space with low dimensions to achieve critical action recognition through a hierarchical classification structure. To forecast the instrument's motion tendency, we model the motion primitives in the polar coordinate system (PCS) to represent patterns of complex trajectories. Given the laparoscopy variance, the adaptive pattern recognition (APR) method, which adapts to uncertain trajectories by modifying model parameters, is designed to improve prediction accuracy. The in-vivo dataset validations show that our framework fulfilled the surgical awareness tasks with exceptional accuracy and real-time performance.
15. Tao R, Zou X, Zheng G. LAST: LAtent Space-Constrained Transformers for Automatic Surgical Phase Recognition and Tool Presence Detection. IEEE Trans Med Imaging 2023; 42:3256-3268. [PMID: 37227905] [DOI: 10.1109/tmi.2023.3279838]
Abstract
When developing context-aware systems, automatic surgical phase recognition and tool presence detection are two essential tasks. There exist previous attempts to develop methods for both tasks, but the majority of existing methods utilize a frame-level loss function (e.g., cross-entropy) which does not fully leverage the underlying semantic structure of a surgery, leading to sub-optimal results. In this paper, we propose multi-task learning-based, LAtent Space-constrained Transformers, referred to as LAST, for automatic surgical phase recognition and tool presence detection. Our design features a two-branch transformer architecture with a novel and generic way to leverage video-level semantic information during network training. This is done by learning a non-linear compact representation of the underlying semantic structure information of surgical videos through a transformer variational autoencoder (VAE) and by encouraging models to follow the learned statistical distributions. In other words, LAST is structure-aware and favors predictions that lie on the extracted low-dimensional data manifold. Validated on two public datasets of cholecystectomy surgery, i.e., the Cholec80 dataset and the M2cai16 dataset, our method achieves better results than other state-of-the-art methods. Specifically, on the Cholec80 dataset, our method achieves an average accuracy of 93.12±4.71%, an average precision of 89.25±5.49%, an average recall of 90.10±5.45% and an average Jaccard of 81.11±7.62% for phase recognition, and an average mAP of 95.15±3.87% for tool presence detection. Similar superior performance is also observed when LAST is applied to the M2cai16 dataset.
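The exact LAST architecture is not reproduced here; the sketch below only illustrates the general idea of adding a video-level latent constraint on top of a frame-level loss: a small sequence encoder summarizes the prediction sequence into a latent Gaussian, and its KL divergence from a standard normal prior is added to the cross-entropy term. The GRU encoder, dimensions, and loss weight are assumptions, not the published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentRegularizer(nn.Module):
    """Encodes a whole prediction sequence into (mu, logvar) and penalizes deviation
    from a standard normal prior, a VAE-style constraint on video-level structure."""
    def __init__(self, n_classes=7, latent_dim=32):
        super().__init__()
        self.encoder = nn.GRU(n_classes, 64, batch_first=True)
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)

    def forward(self, frame_logits):                     # (B, T, n_classes)
        _, h = self.encoder(frame_logits.softmax(dim=-1))
        h = h[-1]                                        # last-layer hidden state, (B, 64)
        mu, logvar = self.mu(h), self.logvar(h)
        return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)

frame_logits = torch.randn(2, 100, 7, requires_grad=True)   # stand-in phase predictions
labels = torch.randint(0, 7, (2, 100))

reg = LatentRegularizer()
loss = F.cross_entropy(frame_logits.transpose(1, 2), labels) + 0.1 * reg(frame_logits)
loss.backward()
print(float(loss))
```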
16. Cao J, Yip HC, Chen Y, Scheppach M, Luo X, Yang H, Cheng MK, Long Y, Jin Y, Chiu PWY, Yam Y, Meng HML, Dou Q. Intelligent surgical workflow recognition for endoscopic submucosal dissection with real-time animal study. Nat Commun 2023; 14:6676. [PMID: 37865629] [PMCID: PMC10590425] [DOI: 10.1038/s41467-023-42451-8]
Abstract
Recent advancements in artificial intelligence have witnessed human-level performance; however, AI-enabled cognitive assistance for therapeutic procedures has not been fully explored nor pre-clinically validated. Here we propose AI-Endo, an intelligent surgical workflow recognition suite for endoscopic submucosal dissection (ESD). Our AI-Endo is trained on high-quality ESD cases from an expert endoscopist, spanning a decade and consisting of 201,026 labeled frames. The learned model demonstrates outstanding performance on validation data, including cases from relatively junior endoscopists with various skill levels, procedures conducted with different endoscopy systems and therapeutic skills, and cohorts from multiple international centers. Furthermore, we integrate our AI-Endo with the Olympus endoscopic system and validate the AI-enabled cognitive assistance system with animal studies in live ESD training sessions. Dedicated data analysis from surgical phase recognition results is summarized in an automatically generated report for skill assessment.
Affiliation(s)
- Jianfeng Cao: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Hon-Chi Yip: Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China
- Yueyao Chen: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Markus Scheppach: Internal Medicine III-Gastroenterology, University Hospital of Augsburg, Augsburg, Germany
- Xiaobei Luo: Guangdong Provincial Key Laboratory of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Hongzheng Yang: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Ming Kit Cheng: Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Yonghao Long: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Yueming Jin: Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- Philip Wai-Yan Chiu: Multi-scale Medical Robotics Center and The Chinese University of Hong Kong, Hong Kong, China
- Yeung Yam: Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China; Multi-scale Medical Robotics Center and The Chinese University of Hong Kong, Hong Kong, China; Centre for Perceptual and Interactive Intelligence and The Chinese University of Hong Kong, Hong Kong, China
- Helen Mei-Ling Meng: Centre for Perceptual and Interactive Intelligence and The Chinese University of Hong Kong, Hong Kong, China
- Qi Dou: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
17. Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition. IEEE Trans Med Imaging 2023; 42:2592-2602. [PMID: 37030859] [DOI: 10.1109/tmi.2023.3262847]
Abstract
Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. This data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
18. Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos. Int J Comput Assist Radiol Surg 2023; 18:1665-1672. [PMID: 36944845] [PMCID: PMC10491694] [DOI: 10.1007/s11548-023-02864-8]
Abstract
PURPOSE Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve the generalization of these methods. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges as they are composed of multiple, interconnected, and long-duration activities. METHODS This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos, that treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN. RESULTS The effectiveness of the proposed method is demonstrated on two surgical video datasets, namely Bypass40 and CATARACTS, and two tasks, surgical phase and step recognition. TRandAugment adds a performance boost of 1-6% over previous state-of-the-art methods that use manually designed augmentations. CONCLUSION This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
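The core of the strategy as described, splitting a video into temporal segments and applying one randomly sampled transformation consistently to every frame within a segment, can be sketched as follows. The transform list, magnitude range, and segment count are illustrative, not the published configuration.

```python
import random
import torch
import torchvision.transforms.functional as TF

TRANSFORMS = [
    lambda img, m: TF.adjust_brightness(img, 1.0 + m),
    lambda img, m: TF.adjust_contrast(img, 1.0 + m),
    lambda img, m: TF.rotate(img, 30.0 * m),
]

def t_rand_augment(frames, n_segments=4, magnitude=0.3):
    """Apply a consistent, randomly sampled transform to each temporal segment.

    frames: (T, C, H, W) tensor holding the frames of one video or clip.
    """
    t = frames.shape[0]
    boundaries = torch.linspace(0, t, n_segments + 1).long()
    out = frames.clone()
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        op = random.choice(TRANSFORMS)               # one transform per segment ...
        m = random.uniform(-magnitude, magnitude)    # ... with one random magnitude ...
        for i in range(int(start), int(end)):        # ... applied identically to its frames
            out[i] = op(out[i], m)
    return out

video = torch.rand(16, 3, 224, 224)
augmented = t_rand_augment(video)
print(augmented.shape)                               # torch.Size([16, 3, 224, 224])
```

Keeping the transform fixed within a segment preserves short-range temporal consistency while still varying appearance across the long video, which is the property that manually designed video augmentations usually aim for.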
Affiliation(s)
- Sanat Ramesh: Altair Robotics Lab, University of Verona, 37134, Verona, Italy; ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
- Diego Dall'Alba: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Cristians Gonzalez: University Hospital of Strasbourg, 67000, Strasbourg, France; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Tong Yu: ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
- Pietro Mascagni: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; Fondazione Policlinico Universitario Agostino Gemelli IRCCS, 00168, Rome, Italy
- Didier Mutter: University Hospital of Strasbourg, 67000, Strasbourg, France; IRCAD, 67000, Strasbourg, France; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Paolo Fiorini: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
19. Yu T, Mascagni P, Verde J, Marescaux J, Mutter D, Padoy N. Live laparoscopic video retrieval with compressed uncertainty. Med Image Anal 2023; 88:102866. [PMID: 37356320] [DOI: 10.1016/j.media.2023.102866]
Abstract
Searching through large volumes of medical data to retrieve relevant information is a challenging yet crucial task for clinical care. However, the primitive and most common approach to retrieval, involving text in the form of keywords, is severely limited when dealing with complex media formats. Content-based retrieval offers a way to overcome this limitation, by using rich media as the query itself. Surgical video-to-video retrieval in particular is a new and largely unexplored research problem with high clinical value, especially in the real-time case: using real-time video hashing, search can be achieved directly inside of the operating room. Indeed, the process of hashing converts large data entries into compact binary arrays or hashes, enabling large-scale search operations at a very fast rate. However, due to fluctuations over the course of a video, not all bits in a given hash are equally reliable. In this work, we propose a method capable of mitigating this uncertainty while maintaining a light computational footprint. We present superior retrieval results (3%-4% top-10 mean average precision) on a multi-task evaluation protocol for surgery, using cholecystectomy phases, bypass phases, and surgical events across six different surgery types from an entirely new dataset introduced here. Success on this multi-task benchmark shows the generalizability of our approach for surgical video retrieval.
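As background, hashing-based retrieval in its simplest form (without the uncertainty modeling proposed in the paper) reduces to binarizing clip embeddings and ranking a database by Hamming distance. A toy NumPy sketch with random stand-in embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned clip embeddings (e.g., produced by a video encoder).
database = rng.normal(size=(1000, 128))
query = rng.normal(size=(128,))

# A fixed random projection plays the role of a learned hashing layer here.
projection = rng.normal(size=(128, 64))

def to_hash(features):
    """Map real-valued features to a compact 64-bit binary code."""
    return (features @ projection > 0).astype(np.uint8)

db_codes = to_hash(database)               # (1000, 64) binary codes
query_code = to_hash(query)                # (64,)

# Hamming distance = number of differing bits; smaller distance = more similar content.
distances = np.count_nonzero(db_codes != query_code, axis=1)
top10 = np.argsort(distances)[:10]
print("closest clips:", top10, "distances:", distances[top10])
```

Because the codes are binary, search reduces to bit counting, which is what makes querying directly from the operating room in real time plausible; the paper's contribution addresses the fact that not all bits of such codes are equally reliable.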
Affiliation(s)
- Tong Yu: ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France
- Pietro Mascagni: ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France; Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
- Didier Mutter: IHU Strasbourg, France; University Hospital of Strasbourg, France
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France
20. Lavanchy JL, Vardazaryan A, Mascagni P, Mutter D, Padoy N. Preserving privacy in surgical video analysis using a deep learning classifier to identify out-of-body scenes in endoscopic videos. Sci Rep 2023; 13:9235. [PMID: 37286660] [PMCID: PMC10247775] [DOI: 10.1038/s41598-023-36453-1]
Abstract
Surgical video analysis facilitates education and research. However, video recordings of endoscopic surgeries can contain privacy-sensitive information, especially if the endoscopic camera is moved out of the body of patients and out-of-body scenes are recorded. Therefore, identification of out-of-body scenes in endoscopic videos is of major importance to preserve the privacy of patients and operating room staff. This study developed and validated a deep learning model for the identification of out-of-body images in endoscopic videos. The model was trained and evaluated on an internal dataset of 12 different types of laparoscopic and robotic surgeries and was externally validated on two independent multicentric test datasets of laparoscopic gastric bypass and cholecystectomy surgeries. Model performance was evaluated compared to human ground truth annotations measuring the receiver operating characteristic area under the curve (ROC AUC). The internal dataset consisting of 356,267 images from 48 videos and the two multicentric test datasets consisting of 54,385 and 58,349 images from 10 and 20 videos, respectively, were annotated. The model identified out-of-body images with 99.97% ROC AUC on the internal test dataset. Mean ± standard deviation ROC AUC on the multicentric gastric bypass dataset was 99.94 ± 0.07% and 99.71 ± 0.40% on the multicentric cholecystectomy dataset, respectively. The model can reliably identify out-of-body images in endoscopic videos and is publicly shared. This facilitates privacy preservation in surgical video analysis.
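The frame-level evaluation described (scoring out-of-body probabilities against human annotations with ROC AUC) can be reproduced in miniature with synthetic scores; the actual model is a deep image classifier, which is not shown here, and the numbers below only illustrate the metric.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Synthetic ground truth: 1 = out-of-body frame, 0 = inside-the-body frame.
labels = rng.integers(0, 2, size=5000)

# Synthetic classifier scores standing in for the sigmoid outputs of an image model:
# higher on average for out-of-body frames, with noise so the classes overlap.
scores = np.clip(0.6 * labels + rng.normal(0.2, 0.2, size=5000), 0.0, 1.0)

print(f"frame-level ROC AUC: {roc_auc_score(labels, scores):.4f}")

# A threshold on the score would then flag frames to blur or drop before sharing a video.
flagged = scores >= 0.5
print(f"{flagged.mean():.1%} of frames flagged as out-of-body")
```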
Affiliation(s)
- Joël L Lavanchy: IHU Strasbourg, 1 Place de l'Hôpital, 67091, Strasbourg Cedex, France; Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland; Division of Surgery, Clarunis-University Center for Gastrointestinal and Liver Diseases, St Clara and University Hospital of Basel, Basel, Switzerland
- Armine Vardazaryan: IHU Strasbourg, 1 Place de l'Hôpital, 67091, Strasbourg Cedex, France; ICube, University of Strasbourg, CNRS, Strasbourg, France
- Pietro Mascagni: IHU Strasbourg, 1 Place de l'Hôpital, 67091, Strasbourg Cedex, France; Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
- Didier Mutter: IHU Strasbourg, 1 Place de l'Hôpital, 67091, Strasbourg Cedex, France; University Hospital of Strasbourg, Strasbourg, France
- Nicolas Padoy: IHU Strasbourg, 1 Place de l'Hôpital, 67091, Strasbourg Cedex, France; ICube, University of Strasbourg, CNRS, Strasbourg, France
21. Nyangoh Timoh K, Huaulme A, Cleary K, Zaheer MA, Lavoué V, Donoho D, Jannin P. A systematic review of annotation for surgical process model analysis in minimally invasive surgery based on video. Surg Endosc 2023. [PMID: 37157035] [DOI: 10.1007/s00464-023-10041-w]
Abstract
BACKGROUND Annotated data are foundational to applications of supervised machine learning. However, there seems to be a lack of common language used in the field of surgical data science. The aim of this study is to review the annotation process and the semantics used in the creation of surgical process models (SPMs) for minimally invasive surgery videos. METHODS For this systematic review, we reviewed articles indexed in the MEDLINE database from January 2000 until March 2022. We selected articles using surgical video annotations to describe a surgical process model in the field of minimally invasive surgery. We excluded studies focusing only on instrument detection or recognition of anatomical areas. The risk of bias was evaluated with the Newcastle-Ottawa Quality Assessment tool. Data from the studies were presented in tables using the SPIDER tool. RESULTS Of the 2806 articles identified, 34 were selected for review. Twenty-two were in the field of digestive surgery, six in ophthalmologic surgery only, one in neurosurgery, three in gynecologic surgery, and two in mixed fields. Thirty-one studies (88.2%) were dedicated to phase, step, or action recognition and mainly relied on a very simple formalization (29, 85.2%). Clinical information in the datasets was lacking for studies using available public datasets. The annotation process for surgical process models was poorly described, and descriptions of the surgical procedures were highly variable between studies. CONCLUSION Surgical video annotation lacks a rigorous and reproducible framework. This leads to difficulties in sharing videos between institutions and hospitals because of the different languages used. There is a need to develop and use a common ontology to improve libraries of annotated surgical videos.
Collapse
Affiliation(s)
- Krystel Nyangoh Timoh
- Department of Gynecology and Obstetrics and Human Reproduction, CHU Rennes, Rennes, France.
- INSERM, LTSI - UMR 1099, University Rennes 1, Rennes, France.
- Laboratoire d'Anatomie et d'Organogenèse, Faculté de Médecine, Centre Hospitalier Universitaire de Rennes, 2 Avenue du Professeur Léon Bernard, 35043, Rennes Cedex, France.
- Department of Obstetrics and Gynecology, Rennes Hospital, Rennes, France.
| | - Arnaud Huaulme
- INSERM, LTSI - UMR 1099, University Rennes 1, Rennes, France
| | - Kevin Cleary
- Sheikh Zayed Institute for Pediatric Surgical Innovation, Children's National Hospital, Washington, DC, 20010, USA
| | - Myra A Zaheer
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
| | - Vincent Lavoué
- Department of Gynecology and Obstetrics and Human Reproduction, CHU Rennes, Rennes, France
| | - Dan Donoho
- Division of Neurosurgery, Center for Neuroscience, Children's National Hospital, Washington, DC, 20010, USA
| | - Pierre Jannin
- INSERM, LTSI - UMR 1099, University Rennes 1, Rennes, France
| |
Collapse
|
22
|
Sharma S, Nwoye CI, Mutter D, Padoy N. Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition. Int J Comput Assist Radiol Surg 2023:10.1007/s11548-023-02914-1. [PMID: 37097518 DOI: 10.1007/s11548-023-02914-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Accepted: 04/07/2023] [Indexed: 04/26/2023]
Abstract
PURPOSE One of the recent advances in surgical AI is the recognition of surgical activities as triplets of ⟨instrument, verb, target⟩. Albeit providing detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. METHODS In this paper, we propose Rendezvous in Time (RiT)-a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, our RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. RESULTS We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and triplet along with other interactions involving the verb such as ⟨instrument, verb⟩. Qualitative results show that RiT produces smoother predictions than the state of the art for most triplet instances. CONCLUSION We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploit their benefits for surgical triplet recognition.
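As a rough illustration of temporal fusion of frame features, the sketch below lets the current frame's feature vector attend over a window of past frames with scaled dot-product attention. It is a simplified, hypothetical stand-in written in plain NumPy, not the RiT architecture itself; the window size and feature dimension are arbitrary.

# Minimal sketch of attention-based temporal fusion: the current frame's
# feature vector attends over a short window of past frames, and the
# fused feature is an attention-weighted sum plus the current feature.
import numpy as np

def temporal_attention_fusion(past_feats: np.ndarray, current_feat: np.ndarray) -> np.ndarray:
    """past_feats: (T, D) features of previous frames; current_feat: (D,)."""
    d = current_feat.shape[0]
    # Scaled dot-product attention scores of the current frame (query)
    # against past frames (keys).
    scores = past_feats @ current_feat / np.sqrt(d)   # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over past frames
    context = weights @ past_feats                    # (D,) attention-weighted past
    # Fuse past context with the current frame (simple residual sum).
    return current_feat + context

rng = np.random.default_rng(0)
past = rng.normal(size=(8, 512))    # 8 previous frame features
now = rng.normal(size=512)          # current frame feature
fused = temporal_attention_fusion(past, now)
print(fused.shape)                  # (512,)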
Collapse
Affiliation(s)
- Saurav Sharma
- ICube, University of Strasbourg, CNRS, Strasbourg, France.
| | | | - Didier Mutter
- IHU Strasbourg, Strasbourg, France
- University Hospital of Strasbourg, Strasbourg, France
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, Strasbourg, France
- IHU Strasbourg, Strasbourg, France
| |
Collapse
|
23
|
Zhang B, Goel B, Sarhan MH, Goel VK, Abukhalil R, Kalesan B, Stottler N, Petculescu S. Surgical workflow recognition with temporal convolution and transformer for action segmentation. Int J Comput Assist Radiol Surg 2023; 18:785-794. [PMID: 36542253 DOI: 10.1007/s11548-022-02811-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022]
Abstract
PURPOSE Automatic surgical workflow recognition enabled by computer vision algorithms plays a key role in enhancing the learning experience of surgeons. It also supports building context-aware systems that allow better surgical planning and decision making which may in turn improve outcomes. Utilizing temporal information is crucial for recognizing context; hence, various recent approaches use recurrent neural networks or transformers to recognize actions. METHODS We design and implement a two-stage method for surgical workflow recognition. We utilize R(2+1)D for video clip modeling in the first stage. We propose Action Segmentation Temporal Convolutional Transformer (ASTCFormer) network for full video modeling in the second stage. ASTCFormer utilizes action segmentation transformers (ASFormers) and temporal convolutional networks (TCNs) to build a temporally aware surgical workflow recognition system. RESULTS We compare the proposed ASTCFormer with recurrent neural networks, multi-stage TCN, and ASFormer approaches. The comparison is done on a dataset comprised of 207 robotic and laparoscopic cholecystectomy surgical videos annotated for 7 surgical phases. The proposed method outperforms the compared methods achieving a [Formula: see text] relative improvement in the average segmental F1-score over the state-of-the-art ASFormer method. Moreover, our proposed method achieves state-of-the-art results on the publicly available Cholec80 dataset. CONCLUSION The improvement in the results when using the proposed method suggests that temporal context could be better captured when adding information from TCN to the ASFormer paradigm. This addition leads to better surgical workflow recognition.
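The segmental F1-score reported above is an IoU-based segment-matching metric commonly used in action segmentation. The sketch below is one plausible NumPy-free implementation under that common definition (a predicted segment counts as a true positive if it overlaps a same-label ground-truth segment above an IoU threshold); it is illustrative and not the authors' evaluation code, and the example label sequences are synthetic.

# Minimal sketch of the segmental F1@k metric: frame-wise label sequences
# are split into segments, and a predicted segment is a true positive if its
# IoU with an unmatched same-label ground-truth segment exceeds a threshold.
def segments(labels):
    """Split a frame-wise label sequence into (label, start, end) segments."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs

def segmental_f1(gt, pred, iou_threshold=0.5):
    gt_segs, pred_segs = segments(gt), segments(pred)
    used = [False] * len(gt_segs)
    tp = 0
    for p_lab, p_s, p_e in pred_segs:
        best, best_j = 0.0, -1
        for j, (g_lab, g_s, g_e) in enumerate(gt_segs):
            if g_lab != p_lab or used[j]:
                continue
            inter = max(0, min(p_e, g_e) - max(p_s, g_s))
            union = max(p_e, g_e) - min(p_s, g_s)
            if inter / union > best:
                best, best_j = inter / union, j
        if best >= iou_threshold:
            tp += 1
            used[best_j] = True
    fp = len(pred_segs) - tp
    fn = len(gt_segs) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

gt   = [0] * 50 + [1] * 30 + [2] * 40   # ground-truth phases per frame
pred = [0] * 45 + [1] * 40 + [2] * 35   # predicted phases per frame
print(f"segmental F1@0.5: {segmental_f1(gt, pred):.3f}")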
Collapse
Affiliation(s)
- Bokai Zhang
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA.
| | - Bharti Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Mohammad Hasan Sarhan
- Johnson & Johnson MedTech, Robert-Koch-Straße 1, 22851, Norderstedt, Schleswig-Holstein, Germany
| | - Varun Kejriwal Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Rami Abukhalil
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Bindu Kalesan
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
| | - Natalie Stottler
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
| | - Svetlana Petculescu
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
| |
Collapse
|
24
|
Lavanchy JL, Gonzalez C, Kassem H, Nett PC, Mutter D, Padoy N. Proposal and multicentric validation of a laparoscopic Roux-en-Y gastric bypass surgery ontology. Surg Endosc 2023; 37:2070-2077. [PMID: 36289088 PMCID: PMC10017621 DOI: 10.1007/s00464-022-09745-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2022] [Accepted: 10/14/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Phase and step annotation in surgical videos is a prerequisite for surgical scene understanding and for downstream tasks like intraoperative feedback or assistance. However, most ontologies are applied on small monocentric datasets and lack external validation. To overcome these limitations, an ontology for phases and steps of laparoscopic Roux-en-Y gastric bypass (LRYGB) is proposed and validated on a multicentric dataset in terms of inter- and intra-rater reliability (inter-/intra-RR). METHODS The proposed LRYGB ontology consists of 12 phase and 46 step definitions that are hierarchically structured. Two board-certified surgeons (raters) with > 10 years of clinical experience applied the proposed ontology on two datasets: (1) StraBypass40 consists of 40 LRYGB videos from Nouvel Hôpital Civil, Strasbourg, France and (2) BernBypass70 consists of 70 LRYGB videos from Inselspital, Bern University Hospital, Bern, Switzerland. To assess inter-RR, the two raters' annotations of ten randomly chosen videos from each of StraBypass40 and BernBypass70 were compared. To assess intra-RR, ten randomly chosen videos were annotated twice by the same rater and the annotations were compared. Inter-RR was calculated using Cohen's kappa. Additionally, accuracy, precision, recall, F1-score, and application-dependent metrics were computed for inter- and intra-RR. RESULTS The mean ± SD video duration was 108 ± 33 min and 75 ± 21 min in StraBypass40 and BernBypass70, respectively. The proposed ontology shows an inter-RR of 96.8 ± 2.7% for phases and 85.4 ± 6.0% for steps on StraBypass40 and 94.9 ± 5.8% for phases and 76.1 ± 13.9% for steps on BernBypass70. The overall Cohen's kappa of inter-RR was 95.9 ± 4.3% for phases and 80.8 ± 10.0% for steps. Intra-RR showed an accuracy of 98.4 ± 1.1% for phases and 88.1 ± 8.1% for steps. CONCLUSION The proposed ontology shows excellent inter- and intra-RR and should therefore be implemented routinely in phase and step annotation of LRYGB.
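For readers who want to reproduce this kind of agreement analysis, the following minimal sketch computes Cohen's kappa, accuracy, and macro F1 between two raters' frame-wise phase labels with scikit-learn. The label sequences and disagreement rate are synthetic placeholders rather than the StraBypass40/BernBypass70 annotations.

# Minimal sketch: inter-rater reliability of frame-wise phase annotations
# from two raters, measured with Cohen's kappa plus accuracy and macro F1.
import numpy as np
from sklearn.metrics import cohen_kappa_score, accuracy_score, f1_score

rng = np.random.default_rng(0)
phases = list(range(12))                       # 12 LRYGB phases

rater_a = rng.choice(phases, size=5000)        # frame-wise labels, rater A
# Rater B agrees most of the time; disagreements are injected at random.
disagree = rng.random(5000) < 0.05
rater_b = np.where(disagree, rng.choice(phases, size=5000), rater_a)

print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.3f}")
print(f"Accuracy:      {accuracy_score(rater_a, rater_b):.3f}")
print(f"Macro F1:      {f1_score(rater_a, rater_b, average='macro'):.3f}")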
Collapse
Affiliation(s)
- Joël L Lavanchy
- IHU Strasbourg, 1 Place de l'Hôpital, 67000, Strasbourg, France.
- Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
| | - Cristians Gonzalez
- IHU Strasbourg, 1 Place de l'Hôpital, 67000, Strasbourg, France
- University Hospital of Strasbourg, Strasbourg, France
| | - Hasan Kassem
- ICube, CNRS, University of Strasbourg, Strasbourg, France
| | - Philipp C Nett
- Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Didier Mutter
- IHU Strasbourg, 1 Place de l'Hôpital, 67000, Strasbourg, France
- University Hospital of Strasbourg, Strasbourg, France
| | - Nicolas Padoy
- IHU Strasbourg, 1 Place de l'Hôpital, 67000, Strasbourg, France
- ICube, CNRS, University of Strasbourg, Strasbourg, France
| |
Collapse
|
25
|
Zhao Y, Wang X, Che T, Bao G, Li S. Multi-task deep learning for medical image computing and analysis: A review. Comput Biol Med 2023; 153:106496. [PMID: 36634599 DOI: 10.1016/j.compbiomed.2022.106496] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 12/06/2022] [Accepted: 12/27/2022] [Indexed: 12/29/2022]
Abstract
The renaissance of deep learning has provided promising solutions to various tasks. While conventional deep learning models are constructed for a single specific task, multi-task deep learning (MTDL), which is capable of accomplishing at least two tasks simultaneously, has attracted research attention. MTDL is a joint learning paradigm that harnesses the inherent correlation of multiple related tasks to achieve reciprocal benefits in improving performance, enhancing generalizability, and reducing the overall computational cost. This review focuses on the advanced applications of MTDL for medical image computing and analysis. We first summarize four popular MTDL network architectures (i.e., cascaded, parallel, interacted, and hybrid). Then, we review representative MTDL-based networks for eight application areas, including the brain, eye, chest, cardiac, abdomen, musculoskeletal, pathology, and other human body regions. While MTDL-based medical image processing has flourished and demonstrated outstanding performance in many tasks, performance gaps remain in others; accordingly, we discuss the open challenges and prospective trends. For instance, in the 2018 Ischemic Stroke Lesion Segmentation challenge, the reported top Dice score of 0.51 and top recall of 0.55 achieved by the cascaded MTDL model indicate that further research efforts are in high demand to escalate the performance of current models.
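To give the "parallel" architecture a concrete shape, here is a minimal PyTorch sketch of a shared encoder feeding two task-specific classification heads trained with a weighted sum of losses. The layer sizes, task names, and loss weighting are illustrative assumptions and are not taken from the review.

# Minimal sketch of a parallel multi-task network: one shared encoder,
# two task-specific heads, and a joint loss as a weighted sum.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim=256, hidden=128, n_classes_a=4, n_classes_b=7):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, n_classes_a)   # e.g. lesion type (hypothetical)
        self.head_b = nn.Linear(hidden, n_classes_b)   # e.g. anatomical region (hypothetical)

    def forward(self, x):
        z = self.shared(x)
        return self.head_a(z), self.head_b(z)

model = MultiTaskNet()
x = torch.randn(8, 256)                       # a batch of precomputed image features
ya = torch.randint(0, 4, (8,))                # task A labels
yb = torch.randint(0, 7, (8,))                # task B labels
logits_a, logits_b = model(x)
# Joint loss: a weighted sum of the per-task losses.
loss = nn.functional.cross_entropy(logits_a, ya) + 0.5 * nn.functional.cross_entropy(logits_b, yb)
loss.backward()
print(float(loss))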
Collapse
Affiliation(s)
- Yan Zhao
- Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100083, China
| | - Xiuying Wang
- School of Computer Science, The University of Sydney, Sydney, NSW, 2008, Australia.
| | - Tongtong Che
- Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100083, China
| | - Guoqing Bao
- School of Computer Science, The University of Sydney, Sydney, NSW, 2008, Australia
| | - Shuyu Li
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, 100875, China.
| |
Collapse
|
26
|
Fer D, Zhang B, Abukhalil R, Goel V, Goel B, Barker J, Kalesan B, Barragan I, Gaddis ML, Kilroy PG. An artificial intelligence model that automatically labels roux-en-Y gastric bypasses, a comparison to trained surgeon annotators. Surg Endosc 2023:10.1007/s00464-023-09870-6. [PMID: 36658282 DOI: 10.1007/s00464-023-09870-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 01/04/2023] [Indexed: 01/21/2023]
Abstract
INTRODUCTION Artificial intelligence (AI) can automate certain tasks to improve data collection. Models have been created to annotate the steps of Roux-en-Y gastric bypass (RYGB). However, model performance has not been compared with the performance of individual surgeon annotators. We developed a model that automatically labels RYGB steps and compared its performance to surgeons. METHODS AND PROCEDURES 545 videos (17 surgeons) of laparoscopic RYGB procedures were collected. An annotation guide (12 steps, 52 tasks) was developed. Steps were annotated by 11 surgeons. Each video was annotated by two surgeons and a third reconciled the differences. A convolutional AI model was trained to identify steps and compared with manual annotation. For modeling, we used 390 videos for training, 95 for validation, and 60 for testing. The performance of the AI model was compared with that of manual annotation using analysis of variance (ANOVA) in the subset of 60 testing videos. We assessed model performance at each step, with poor performance defined as an F1-score < 80%. RESULTS The convolutional model identified the 12 steps of the RYGB architecture. Model performance varied across steps (F1 > 90% for 7 steps and > 80% for 2 further steps). The reconciled manual annotation data (F1 > 80% for > 5 steps) performed better than the trainees' annotations (F1 > 80% for 2-5 steps for 4 annotators, and for < 2 steps for the other 4 annotators). In the testing subset, certain steps had low performance, indicating potential ambiguities in surgical landmarks. Additionally, some videos were easier to annotate than others, suggesting variability. After controlling for this variability, the AI algorithm was comparable to manual annotation (p < 0.0001). CONCLUSION AI can be used to identify surgical landmarks in RYGB comparably to the manual process, and the model recognized some landmarks more accurately than surgeons. This technology has the potential to improve surgical training by assessing the learning curves of surgeons at scale.
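A simple way to reproduce the per-step reporting described above is to compute an F1-score per step and flag those below the 80% threshold. The sketch below does this with scikit-learn on synthetic labels standing in for the model output and the reconciled manual annotation.

# Minimal sketch: per-step F1-scores for automatic RYGB step labels versus
# reconciled manual annotations, flagging steps below the 80% threshold.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_steps = 12
manual = rng.integers(0, n_steps, size=20000)            # reconciled annotation (placeholder)
noise = rng.random(20000) < 0.12                         # synthetic model errors
model = np.where(noise, rng.integers(0, n_steps, size=20000), manual)

per_step_f1 = f1_score(manual, model, average=None, labels=range(n_steps))
for step, f1 in enumerate(per_step_f1):
    flag = "  <-- below 80%" if f1 < 0.80 else ""
    print(f"step {step:2d}: F1 = {f1:.3f}{flag}")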
Collapse
Affiliation(s)
- Danyal Fer
- University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA.
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
| | - Bokai Zhang
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
| | - Rami Abukhalil
- Johnson & Johnson MedTech, New Brunswick, NJ, USA.
- 5490 Great America Parkway, Santa Clara, CA, 95054, USA.
| | - Varun Goel
- University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA.
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
| | - Bharti Goel
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
| | | | | | | | | | | |
Collapse
|
27
|
Bombieri M, Rospocher M, Ponzetto SP, Fiorini P. Machine understanding surgical actions from intervention procedure textbooks. Comput Biol Med 2023; 152:106415. [PMID: 36527782 DOI: 10.1016/j.compbiomed.2022.106415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2022] [Revised: 11/23/2022] [Accepted: 12/04/2022] [Indexed: 12/12/2022]
Abstract
The automatic extraction of procedural surgical knowledge from surgery manuals, academic papers, or other high-quality textual resources is of the utmost importance for developing knowledge-based clinical decision support systems, for automatically executing procedural steps, or for summarizing the procedural information spread throughout the texts in a structured form usable as a study resource by medical students. In this work, we propose a first benchmark on extracting detailed surgical actions from available intervention procedure textbooks and papers. We frame the problem as a Semantic Role Labeling task. Exploiting a manually annotated dataset, we apply different Transformer-based information extraction methods. Starting from RoBERTa and BioMedRoBERTa pre-trained language models, we first investigate a zero-shot scenario and compare the obtained results with a full fine-tuning setting. We then introduce a new ad hoc surgical language model, named SurgicBERTa, pre-trained on a large collection of surgical materials, and compare it with the previous ones. In the assessment, we explore different dataset splits (one in-domain and two out-of-domain) and also investigate the effectiveness of the approach in a few-shot learning scenario. Performance is evaluated on three correlated sub-tasks: predicate disambiguation, semantic argument disambiguation, and predicate-argument disambiguation. Results show that fine-tuning a pre-trained domain-specific language model achieves the highest performance on all splits and on all sub-tasks. All models are publicly released.
Collapse
Affiliation(s)
- Marco Bombieri
- Department of Computer Science, University of Verona, Verona, Italy.
| | - Marco Rospocher
- Department of Foreign Languages and Literatures, University of Verona, Verona, Italy
| | | | - Paolo Fiorini
- Department of Computer Science, University of Verona, Verona, Italy
| |
Collapse
|
28
|
Becker M, Dai J, Chang AL, Feyaerts D, Stelzer IA, Zhang M, Berson E, Saarunya G, De Francesco D, Espinosa C, Kim Y, Marić I, Mataraso S, Payrovnaziri SN, Phongpreecha T, Ravindra NG, Shome S, Tan Y, Thuraiappah M, Xue L, Mayo JA, Quaintance CC, Laborde A, King LS, Dhabhar FS, Gotlib IH, Wong RJ, Angst MS, Shaw GM, Stevenson DK, Gaudilliere B, Aghaeepour N. Revealing the impact of lifestyle stressors on the risk of adverse pregnancy outcomes with multitask machine learning. Front Pediatr 2022; 10:933266. [PMID: 36582513 PMCID: PMC9793100 DOI: 10.3389/fped.2022.933266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Accepted: 11/14/2022] [Indexed: 12/15/2022] Open
Abstract
Psychosocial and stress-related factors (PSFs), defined as internal or external stimuli that induce biological changes, are potentially modifiable factors and accessible targets for interventions that are associated with adverse pregnancy outcomes (APOs). Although individual APOs have been shown to be connected to PSFs, they are biologically interconnected, relatively infrequent, and therefore challenging to model. In this context, multi-task machine learning (MML) is an ideal tool for exploring the interconnectedness of APOs on the one hand and building on joint combinatorial outcomes to increase predictive power on the other hand. Additionally, by integrating single-cell immunological profiling of underlying biological processes, the effects of stress-based therapeutics may be measurable, facilitating the development of precision medicine approaches. Objectives The primary objectives were to jointly model multiple APOs and their connection to stress early in pregnancy, and to explore the underlying biology to guide the development of accessible and measurable interventions. Materials and Methods In a prospective cohort study, PSFs were assessed during the first trimester with an extensive self-filled questionnaire for 200 women. We used MML to simultaneously model and predict APOs (severe preeclampsia, superimposed preeclampsia, gestational diabetes, and early gestational age) as well as several risk factors (BMI, diabetes, hypertension) for these patients based on PSFs. Strongly interrelated stressors were categorized to identify potential therapeutic targets. Furthermore, for a subset of 14 women, we modeled the connection of PSFs to the maternal immune system and, in turn, to APOs by building corresponding ML models based on an extensive single-cell immune dataset generated by mass cytometry by time of flight (CyTOF). Results Jointly modeling APOs in a MML setting significantly increased modeling capabilities and yielded a highly predictive integrated model of APOs, underscoring their interconnectedness. Most APOs were associated with mental health, life stress, and perceived health risks. Biologically, stressors were associated with specific immune characteristics revolving around CD4/CD8 T cells. Immune characteristics predicted based on stress were in turn found to be associated with APOs. Conclusions Elucidating connections among stress, multiple APOs simultaneously, and immune characteristics has the potential to facilitate the implementation of ML-based, individualized, integrative models of pregnancy in clinical decision making. The modifiable nature of stressors may enable the development of accessible interventions, with success tracked through immune characteristics.
Collapse
Affiliation(s)
- Martin Becker
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Chair for Intelligent Data Analytics, Institute for Visual and Analytic Computing, Department of Computer Science and Electrical Engineering, University of Rostock, Rostock, Germany
| | - Jennifer Dai
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Alan L. Chang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Dorien Feyaerts
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
| | - Ina A. Stelzer
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
| | - Miao Zhang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Eloise Berson
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Pathology, Stanford University, Palo Alto, CA, United States
| | - Geetha Saarunya
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Davide De Francesco
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Camilo Espinosa
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Yeasul Kim
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Ivana Marić
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Samson Mataraso
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Seyedeh Neelufar Payrovnaziri
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Thanaphong Phongpreecha
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Department of Pathology, Stanford University, Palo Alto, CA, United States
| | - Neal G. Ravindra
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Sayane Shome
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Yuqi Tan
- Department of Microbiology & Immunology, Stanford University, Palo Alto, CA, United States
- Baxter Laboratory for Stem Cell Biology, Stanford University, Palo Alto, CA, United States
| | - Melan Thuraiappah
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Lei Xue
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Jonathan A. Mayo
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
| | | | - Ana Laborde
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
| | - Lucy S. King
- Department of Psychology, Stanford University, Palo Alto, CA, United States
| | - Firdaus S. Dhabhar
- Department of Psychiatry & Behavioral Science, University of Miami, Miami, FL, United States
- Department of Microbiology & Immunology, University of Miami, Miami, FL, United States
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, United States
- Miller School of Medicine, University of Miami, Miami, FL, United States
| | - Ian H. Gotlib
- Department of Psychology, Stanford University, Palo Alto, CA, United States
| | - Ronald J. Wong
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| | - Martin S. Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
| | - Gary M. Shaw
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
| | - David K. Stevenson
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
| | - Brice Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
| |
Collapse
|
29
|
Zhang B, Sturgeon D, Shankar AR, Goel VK, Barker J, Ghanem A, Lee P, Milecky M, Stottler N, Petculescu S. Surgical instrument recognition for instrument usage documentation and surgical video library indexing. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2022. [DOI: 10.1080/21681163.2022.2152371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
- Bokai Zhang
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
| | - Darrick Sturgeon
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
| | | | | | - Jocelyn Barker
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
| | - Amer Ghanem
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
| | - Philip Lee
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
| | - Meghan Milecky
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
| | | | | |
Collapse
|
30
|
Mascagni P, Alapatt D, Urade T, Vardazaryan A, Mutter D, Marescaux J, Costamagna G, Dallemagne B, Padoy N. Response to Comments on: A Computer Vision Platform to Automatically Locate Critical Events in Surgical Videos: Documenting Safety in Laparoscopic Cholecystectomy. Ann Surg 2022; 276:e637-e638. [PMID: 35129513 DOI: 10.1097/sla.0000000000005267] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Affiliation(s)
- Pietro Mascagni
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
- Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
| | - Deepak Alapatt
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| | - Takeshi Urade
- IHU-Strasbourg, Institute of Image-Guided Surgery, Strasbourg, France
| | | | - Didier Mutter
- IHU-Strasbourg, Institute of Image-Guided Surgery, Strasbourg, France
- Institute for Research against Digestive Cancer (IRCAD), Strasbourg, France
- Department of Digestive and Endocrine Surgery, University of Strasbourg, Strasbourg, France
| | - Jacques Marescaux
- Institute for Research against Digestive Cancer (IRCAD), Strasbourg, France
| | - Guido Costamagna
- Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
| | - Bernard Dallemagne
- Institute for Research against Digestive Cancer (IRCAD), Strasbourg, France
- Department of Digestive and Endocrine Surgery, University of Strasbourg, Strasbourg, France
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| |
Collapse
|
31
|
Quero G, Mascagni P, Kolbinger FR, Fiorillo C, De Sio D, Longo F, Schena CA, Laterza V, Rosa F, Menghi R, Papa V, Tondolo V, Cina C, Distler M, Weitz J, Speidel S, Padoy N, Alfieri S. Artificial Intelligence in Colorectal Cancer Surgery: Present and Future Perspectives. Cancers (Basel) 2022; 14:3803. [PMID: 35954466 PMCID: PMC9367568 DOI: 10.3390/cancers14153803] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Revised: 07/29/2022] [Accepted: 08/03/2022] [Indexed: 02/05/2023] Open
Abstract
Artificial intelligence (AI) and computer vision (CV) are beginning to impact medicine. While evidence on the clinical value of AI-based solutions for the screening and staging of colorectal cancer (CRC) is mounting, CV and AI applications to enhance the surgical treatment of CRC are still in their early stage. This manuscript introduces key AI concepts to a surgical audience, illustrates fundamental steps to develop CV for surgical applications, and provides a comprehensive overview on the state-of-the-art of AI applications for the treatment of CRC. Notably, studies show that AI can be trained to automatically recognize surgical phases and actions with high accuracy even in complex colorectal procedures such as transanal total mesorectal excision (TaTME). In addition, AI models were trained to interpret fluorescent signals and recognize correct dissection planes during total mesorectal excision (TME), suggesting CV as a potentially valuable tool for intraoperative decision-making and guidance. Finally, AI could have a role in surgical training, providing automatic surgical skills assessment in the operating room. While promising, these proofs of concept require further development, validation in multi-institutional data, and clinical studies to confirm AI as a valuable tool to enhance CRC treatment.
Collapse
Affiliation(s)
- Giuseppe Quero
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
| | - Pietro Mascagni
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
- Institute of Image-Guided Surgery, IHU-Strasbourg, 67000 Strasbourg, France
| | - Fiona R. Kolbinger
- Department for Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Claudio Fiorillo
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
| | - Davide De Sio
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
| | - Fabio Longo
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
| | - Carlo Alberto Schena
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
| | - Vito Laterza
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
| | - Fausto Rosa
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
| | - Roberta Menghi
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
| | - Valerio Papa
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
| | - Vincenzo Tondolo
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
| | - Caterina Cina
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
| | - Marius Distler
- Department for Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Juergen Weitz
- Department for Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Stefanie Speidel
- National Center for Tumor Diseases (NCT), Partner Site Dresden, 01307 Dresden, Germany
| | - Nicolas Padoy
- Institute of Image-Guided Surgery, IHU-Strasbourg, 67000 Strasbourg, France
- ICube, Centre National de la Recherche Scientifique (CNRS), University of Strasbourg, 67000 Strasbourg, France
| | - Sergio Alfieri
- Digestive Surgery Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Largo Agostino Gemelli 8, 00168 Rome, Italy
- Faculty of Medicine, Università Cattolica del Sacro Cuore di Roma, Largo Francesco Vito 1, 00168 Rome, Italy
| |
Collapse
|
32
|
Surgical Tool Datasets for Machine Learning Research: A Survey. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01640-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
This paper is a comprehensive survey of datasets for surgical tool detection and related surgical data science and machine learning techniques and algorithms. The survey offers a high-level perspective of current research in this area, analyses the taxonomy of approaches adopted by researchers using surgical tool datasets, and addresses key areas of research, such as the datasets used, evaluation metrics applied and deep learning techniques utilised. Our presentation and taxonomy provide a framework that facilitates greater understanding of current work, and highlight the challenges and opportunities for further innovative and useful research.
Collapse
|
33
|
Nwoye CI, Yu T, Gonzalez C, Seeliger B, Mascagni P, Mutter D, Marescaux J, Padoy N. Rendezvous: Attention mechanisms for the recognition of surgical action triplets in endoscopic videos. Med Image Anal 2022; 78:102433. [PMID: 35398658 DOI: 10.1016/j.media.2022.102433] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 02/25/2022] [Accepted: 03/21/2022] [Indexed: 10/18/2022]
Abstract
Out of all existing frameworks for surgical workflow analysis in endoscopic videos, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities. This information, presented as 〈instrument, verb, target〉 combinations, is highly challenging to identify accurately. Triplet components can be difficult to recognize individually, and the task requires not only recognizing all three components simultaneously but also correctly establishing the data association between them. To achieve this task, we introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels. We first introduce a new form of spatial attention to capture individual action triplet components in a scene, called the Class Activation Guided Attention Mechanism (CAGAM). This technique focuses on the recognition of verbs and targets using activations resulting from instruments. To solve the association problem, our RDV model adds a new form of semantic attention inspired by Transformer networks, called Multi-Head of Mixed Attention (MHMA). This technique uses several cross- and self-attentions to effectively capture relationships between instruments, verbs, and targets. We also introduce CholecT50 - a dataset of 50 endoscopic videos in which every frame has been annotated with labels from 100 triplet classes. Our proposed RDV model significantly improves the triplet prediction mAP by over 9% compared to the state-of-the-art methods on this dataset.
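Multi-label triplet recognition of this kind is typically scored with mean average precision (mAP) over the triplet classes. The sketch below shows one plausible way to compute that with scikit-learn; the frame-level labels and scores are random placeholders, not CholecT50 results.

# Minimal sketch: mAP for multi-label triplet recognition, where each frame
# can contain several of the 100 triplet classes simultaneously.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n_frames, n_triplets = 2000, 100            # frames and triplet classes

# Sparse multi-label ground truth: several triplets may be active per frame.
y_true = (rng.random((n_frames, n_triplets)) < 0.03).astype(int)
# Hypothetical model scores, loosely correlated with the labels.
y_score = np.clip(y_true * 0.6 + rng.random((n_frames, n_triplets)) * 0.5, 0, 1)

# Average precision per triplet class (skipping classes without positives),
# then the mean over classes gives the mAP.
aps = [average_precision_score(y_true[:, c], y_score[:, c])
       for c in range(n_triplets) if y_true[:, c].any()]
print(f"mAP over {len(aps)} triplet classes: {np.mean(aps):.3f}")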
Collapse
Affiliation(s)
| | - Tong Yu
- ICube, University of Strasbourg, CNRS, France
| | | | - Barbara Seeliger
- IHU Strasbourg, France; University Hospital of Strasbourg, France
| | - Pietro Mascagni
- ICube, University of Strasbourg, CNRS, France; Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
| | - Didier Mutter
- IHU Strasbourg, France; University Hospital of Strasbourg, France
| | | | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France.
| |
Collapse
|
34
|
Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval. Electronics 2022. [DOI: 10.3390/electronics11091353] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
In the medical field, due to their economic and clinical benefits, there is a growing interest in minimally invasive and microscopic surgeries. These types of surgeries are often recorded, and the recordings have become a key resource for education, patient disease analysis, surgical error analysis, and surgical skill assessment. However, manually searching through these collections of lengthy surgical videos is an extremely labor-intensive and time-consuming task, requiring an effective content-based video analysis system. Previous methods for surgical video retrieval are based on handcrafted features, which do not represent the video effectively. On the other hand, deep learning-based solutions have been found to be effective in both surgical image and video analysis, where CNN-, LSTM- and CNN-LSTM-based methods have been proposed for most surgical video analysis tasks. In this paper, we propose a hybrid spatiotemporal embedding method to enhance spatiotemporal representations using an adaptive fusion layer on top of the LSTM and temporal causal convolutional modules. To learn surgical video representations, we explore a supervised contrastive learning approach to leverage label information in addition to augmented versions. By validating our approach on a video retrieval task on two datasets, Surgical Actions 160 and Cataract-101, we significantly improve on previous results in terms of mean average precision: 30.012 ± 1.778 vs. 22.54 ± 1.557 for Surgical Actions 160 and 81.134 ± 1.28 vs. 33.18 ± 1.311 for Cataract-101. We also validate the proposed method's suitability for the surgical phase recognition task using the benchmark Cholec80 surgical dataset, where our approach outperforms the state of the art (with 90.2% accuracy).
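Retrieval quality in this setting is usually reported as mean average precision over queries. The following NumPy sketch ranks a gallery of video embeddings by cosine similarity for each query and averages the per-query average precision. The embeddings and class labels are synthetic placeholders, and the relevance convention (relevant = same action class) is an assumption made for illustration.

# Minimal sketch: retrieval mAP for content-based video retrieval from
# embedding vectors, with cosine-similarity ranking per query.
import numpy as np

def average_precision(relevant_ranked: np.ndarray) -> float:
    """relevant_ranked: boolean array over ranked gallery items."""
    if relevant_ranked.sum() == 0:
        return 0.0
    hits = np.cumsum(relevant_ranked)
    precision_at_k = hits / (np.arange(len(relevant_ranked)) + 1)
    return float((precision_at_k * relevant_ranked).sum() / relevant_ranked.sum())

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=200)                 # surgical action classes (synthetic)
emb = rng.normal(size=(200, 128))
emb += labels[:, None] * 0.5                           # give classes some structure
emb /= np.linalg.norm(emb, axis=1, keepdims=True)      # unit norm for cosine similarity

aps = []
for q in range(len(emb)):
    sims = emb @ emb[q]
    order = np.argsort(-sims)
    order = order[order != q]                          # drop the query itself
    aps.append(average_precision(labels[order] == labels[q]))
print(f"retrieval mAP: {np.mean(aps):.3f}")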
Collapse
|
35
|
Das A, Bano S, Vasconcelos F, Khan DZ, Marcus HJ, Stoyanov D. Reducing Prediction volatility in the surgical workflow recognition of endoscopic pituitary surgery. Int J Comput Assist Radiol Surg 2022; 17:1445-1452. [PMID: 35362848 PMCID: PMC9307536 DOI: 10.1007/s11548-022-02599-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Accepted: 03/08/2022] [Indexed: 11/25/2022]
Abstract
Purpose: Workflow recognition can aid surgeons before an operation when used as a training tool, during an operation by increasing operating room efficiency, and after an operation in the completion of operation notes. Although several methods have been applied to this task, they have been tested on few surgical datasets. Therefore, their generalisability is not well tested, particularly for surgical approaches utilising smaller working spaces which are susceptible to occlusion and necessitate frequent withdrawal of the endoscope. This leads to rapidly changing predictions, which reduces the clinical confidence of the methods, and hence limits their suitability for clinical translation. Methods: Firstly, the optimal neural network is found using established methods, using endoscopic pituitary surgery as an exemplar. Then, prediction volatility is formally defined as a new evaluation metric as a proxy for uncertainty, and two temporal smoothing functions are created. The first (modal, M_n) mode-averages over the previous n predictions, and the second (threshold, T_n) ensures a class is only changed after being continuously predicted for n predictions. Both functions are independently applied to the predictions of the optimal network. Results: The methods are evaluated on a 50-video dataset using fivefold cross-validation, and the optimised evaluation metric is the weighted F1 score. The optimal model is ResNet-50+LSTM, achieving 0.84 in 3-phase classification and 0.74 in 7-step classification. Applying threshold smoothing further improves these results, achieving 0.86 in 3-phase classification and 0.75 in 7-step classification, while also drastically reducing the prediction volatility. Conclusion: The results confirm the established methods generalise to endoscopic pituitary surgery, and show simple temporal smoothing not only reduces prediction volatility, but actively improves performance.
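The two smoothing functions are simple enough to sketch directly. Below is a minimal Python interpretation of modal smoothing (M_n, the mode of the last n predictions) and threshold smoothing (T_n, switch class only after n consecutive identical predictions); it follows the verbal definitions above rather than the authors' released code, so details such as how the first frames are handled are assumptions.

# Minimal sketch of the two temporal smoothing functions described above.
from collections import Counter, deque

def modal_smoothing(preds, n):
    """M_n: replace each prediction with the mode of the last n predictions."""
    window, out = deque(maxlen=n), []
    for p in preds:
        window.append(p)
        out.append(Counter(window).most_common(1)[0][0])
    return out

def threshold_smoothing(preds, n):
    """T_n: only switch class after it has been predicted n times in a row."""
    out, current, candidate, run = [], preds[0], preds[0], 0
    for p in preds:
        if p == candidate:
            run += 1
        else:
            candidate, run = p, 1
        if candidate != current and run >= n:
            current = candidate          # switch only after n consistent predictions
        out.append(current)
    return out

raw = [0, 0, 1, 0, 0, 2, 2, 0, 2, 2, 2, 2, 1, 2, 2]   # volatile raw predictions
print(modal_smoothing(raw, 3))
print(threshold_smoothing(raw, 3))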
Collapse
Affiliation(s)
- Adrito Das
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom.
| | - Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
| | - Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
| | - Danyal Z Khan
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
| | - Hani J Marcus
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
| | - Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
| |
Collapse
|
36
|
Video-based fully automatic assessment of open surgery suturing skills. Int J Comput Assist Radiol Surg 2022; 17:437-448. [PMID: 35103921 PMCID: PMC8805431 DOI: 10.1007/s11548-022-02559-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2021] [Accepted: 01/03/2022] [Indexed: 01/09/2023]
Abstract
Purpose The goal of this study was to develop a new reliable open surgery suturing simulation system for training medical students in situations where resources are limited or in a domestic setup. Specifically, we developed an algorithm for tool and hand localization and for identifying the interactions between them based on simple webcam video data, calculating motion metrics for the assessment of surgical skill. Methods Twenty-five participants performed multiple suturing tasks using our simulator. The YOLO network was modified into a multi-task network for tool localization and tool–hand interaction detection. This was accomplished by splitting the YOLO detection heads so that they supported both tasks with minimal addition to computer run-time. Furthermore, based on the outcome of the system, motion metrics were calculated. These metrics included traditional metrics such as time and path length as well as new metrics assessing the technique participants use for holding the tools. Results The dual-task network's performance was similar to that of two separate networks, while its computational load was only slightly greater than that of a single network. In addition, the motion metrics showed significant differences between experts and novices. Conclusion While video capture is an essential part of minimally invasive surgery, it is not an integral component of open surgery. Thus, new algorithms focusing on the unique challenges open surgery videos present are required. In this study, a dual-task network was developed to solve both a localization task and a hand–tool interaction task. The dual network may easily be expanded to a multi-task network, which may be useful for images with multiple layers and for evaluating the interaction between these different layers. Supplementary Information The online version contains supplementary material available at 10.1007/s11548-022-02559-6.
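Motion metrics such as task time and path length can be derived directly from the per-frame tool positions a detector produces. The sketch below computes these two classic metrics (plus mean speed) from a synthetic tool-tip trajectory; the frame rate, pixel units, and trajectory itself are illustrative assumptions, not the study's data.

# Minimal sketch: task time and path length from a detected tool-tip trajectory,
# of the kind used for surgical skill assessment.
import numpy as np

def motion_metrics(xy: np.ndarray, fps: float = 30.0) -> dict:
    """xy: (T, 2) tool-tip positions in pixels, one row per video frame."""
    step_lengths = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    duration = len(xy) / fps
    return {
        "time_s": duration,                            # task duration
        "path_length_px": float(step_lengths.sum()),   # total distance travelled
        "mean_speed_px_s": float(step_lengths.sum() / duration),
    }

rng = np.random.default_rng(0)
trajectory = np.cumsum(rng.normal(0, 2.0, size=(900, 2)), axis=0)  # ~30 s of motion
print(motion_metrics(trajectory))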
Collapse
|
37
|
An Interaction-Based Bayesian Network Framework for Surgical Workflow Segmentation. Int J Environ Res Public Health 2021; 18:ijerph18126401. [PMID: 34199188 PMCID: PMC8296226 DOI: 10.3390/ijerph18126401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/03/2021] [Revised: 06/03/2021] [Accepted: 06/08/2021] [Indexed: 11/25/2022]
Abstract
Recognizing and segmenting surgical workflow is important for assessing surgical skills as well as hospital effectiveness, and plays a crucial role in maintaining and improving surgical and healthcare systems. Most evidence supporting this remains signal-, video-, and/or image-based. Furthermore, causal evidence of the interaction between surgical staff remains challenging to gather and is largely absent. Here, we collected real-time movement data of the surgical staff during a neurosurgical procedure to explore cooperation networks among different surgical roles, namely surgeon, assistant nurse, scrub nurse, and anesthetist, and to segment surgical workflows to further assess surgical effectiveness. We installed a zone position system (ZPS) in an operating room (OR) to effectively record high-frequency, high-resolution movements of all surgical staff. Measuring individual interactions in a small, closed area is difficult, and surgical workflow classification has uncertainties associated with the surgical staff in terms of their varied training and operating skills, with patients in terms of their initial states and biological differences, and with surgical procedures in terms of their complexities. We proposed an interaction-based framework to recognize the surgical workflow and integrated a Bayesian network (BN) to address these uncertainties. Our results suggest that the proposed BN method demonstrates good performance with a high accuracy of 70%. Furthermore, it semantically explains the interaction and cooperation among surgical staff.
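As a loose, deliberately simplified stand-in for probabilistic reasoning over interaction features, the sketch below classifies workflow phases from discretized staff-interaction features with a categorical naive Bayes model. The features, phases, and their coupling are synthetic assumptions; the framework described above uses a richer Bayesian network rather than naive Bayes.

# Minimal sketch: classifying surgical workflow phases from discretized
# staff-interaction features (e.g. binned proximities between roles) with a
# simple categorical naive Bayes model. All data below are synthetic.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)
n_windows, n_phases = 600, 4          # time windows, workflow phases
phases = rng.integers(0, n_phases, size=n_windows)

# Three categorical interaction features per window, loosely tied to the phase:
# surgeon-scrub nurse proximity bin, surgeon-assistant proximity bin,
# anesthetist activity bin (each encoded as 0..2).
features = np.stack([
    (phases + rng.integers(0, 2, n_windows)) % 3,
    (phases + rng.integers(0, 2, n_windows)) % 3,
    rng.integers(0, 3, n_windows),
], axis=1)

train, test = slice(0, 450), slice(450, None)
clf = CategoricalNB().fit(features[train], phases[train])
print(f"workflow phase accuracy: {clf.score(features[test], phases[test]):.3f}")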
Collapse
|