1. Feng X, Zhang X, Shi X, Li L, Wang S. ST-ITEF: Spatio-Temporal Intraoperative Task Estimating Framework to recognize surgical phase and predict instrument path based on multi-object tracking in keratoplasty. Med Image Anal 2024;91:103026. [PMID: 37976868] [DOI: 10.1016/j.media.2023.103026]
Abstract
Computer-assisted cognitive guidance for surgical robots based on computer vision is a promising future direction that could improve both operative accuracy and the level of surgical autonomy. In this paper, multi-object segmentation and feature extraction from the resulting masks are combined to recognize and predict surgical manipulation. A novel three-stage Spatio-Temporal Intraoperative Task Estimating Framework is proposed, with a quantitative formulation derived from ophthalmologists' visual information processing and with multi-object tracking of the surgical instruments and human corneas involved in keratoplasty. In intraoperative workflow estimation, quantifying operation parameters remains an open challenge. This problem is tackled by extracting key geometric properties from the multi-object segmentation and computing the relative positions of instruments and corneas. A decision framework based on these geometric properties is further proposed to recognize the current surgical phase and predict the instrument path for each phase. The framework is evaluated on real human keratoplasty videos. The optimized DeepLabV3 with image filtration achieved competitive per-class IoU in the segmentation task, and the mean phase Jaccard reached 55.58% for phase recognition. Both qualitative and quantitative results indicate that the framework achieves accurate segmentation and surgical phase recognition under complex disturbances. The Intraoperative Task Estimating Framework therefore shows strong potential to guide surgical robots in clinical practice.
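To make the geometric-feature stage concrete, the sketch below (not the authors' code) shows how such properties might be extracted from a multi-object segmentation mask with OpenCV; the label ids and the nearest-point tip heuristic are assumptions made for illustration.

```python
# Illustrative sketch: geometric properties from a multi-object segmentation mask.
# Label ids, names, and the tip heuristic are assumptions, not the paper's code.
import numpy as np
import cv2

CORNEA, INSTRUMENT = 1, 2  # hypothetical label ids in the mask

def geometric_features(mask: np.ndarray) -> dict:
    """Cornea centre/radius, instrument tip, and their relative position."""
    cornea = (mask == CORNEA).astype(np.uint8)
    instrument = (mask == INSTRUMENT).astype(np.uint8)

    # Fit the cornea with its minimum enclosing circle.
    contours, _ = cv2.findContours(cornea, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    (cx, cy), radius = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))

    # Crude "tip": the instrument pixel closest to the cornea centre.
    ys, xs = np.nonzero(instrument)
    d = np.hypot(xs - cx, ys - cy)
    tip = (int(xs[d.argmin()]), int(ys[d.argmin()]))

    return {
        "cornea_center": (cx, cy),
        "cornea_radius": radius,
        "instrument_tip": tip,
        "tip_to_center_dist": float(d.min()),
        "tip_inside_cornea": bool(d.min() < radius),
    }
```

Per-frame dictionaries of this kind, stacked over time, are the sort of quantitative input a downstream phase-recognition and path-prediction stage could consume.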
Affiliation(s)
- Xiaojing Feng, School of Mechanical Engineering, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
- Xiaodong Zhang, School of Mechanical Engineering, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
- Xiaojun Shi, School of Mechanical Engineering, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
- Li Li, Department of Ophthalmology, The First Affiliated Hospital of Xi'an Jiaotong University, 277 Yanta West Road, Xi'an 710061, China
- Shaopeng Wang, School of Mechanical Engineering, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
2. Demir KC, Schieber H, Weise T, Roth D, May M, Maier A, Yang SH. Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition. IEEE J Biomed Health Inform 2023;27:5405-5417. [PMID: 37665700] [DOI: 10.1109/jbhi.2023.3311628]
Abstract
OBJECTIVE In the last two decades, there has been growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems that can assist physicians during operations, evaluate procedures afterward, or help the management team utilize the operating room effectively. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review state-of-the-art deep learning methods published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. METHODS Three databases, IEEE Xplore, Scopus, and PubMed, were searched, and additional studies were added through a manual search. After the database search, 343 studies were screened and a total of 44 studies were selected for this review. CONCLUSION The use of temporal information is essential for identifying the next surgical action. Contemporary methods mainly use RNNs, hierarchical CNNs, and Transformers to preserve long-range temporal relations. The lack of large publicly available datasets for various procedures is a major challenge for the development of new and robust models. While supervised learning strategies are used to show proof of concept, self-supervised, semi-supervised, and active learning methods are used to mitigate the dependency on annotated data. SIGNIFICANCE The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures and datasets, and discusses open challenges.
3. Yasrab R, Fu Z, Zhao H, Lee LH, Sharma H, Drukker L, Papageorghiou AT, Noble JA. A Machine Learning Method for Automated Description and Workflow Analysis of First Trimester Ultrasound Scans. IEEE Trans Med Imaging 2023;42:1301-1313. [PMID: 36455084] [DOI: 10.1109/tmi.2022.3226274]
Abstract
Obstetric ultrasound assessment of fetal anatomy in the first trimester of pregnancy is one of the less explored fields in obstetric sonography because of the paucity of guidelines on anatomical screening and the limited availability of data. This paper, for the first time, examines imaging proficiency and practices of first trimester ultrasound scanning through analysis of full-length ultrasound video scans. Findings from this study provide insights to inform the development of more effective user-machine interfaces and targeted assistive technologies, as well as improvements in workflow protocols for first trimester scanning. Specifically, this paper presents an automated framework to model operator clinical workflow from full-length routine first-trimester fetal ultrasound scan videos. The 2D+t convolutional neural network-based architecture proposed for video annotation incorporates transfer learning and spatio-temporal (2D+t) modelling to automatically partition an ultrasound video into semantically meaningful temporal segments based on the fetal anatomy detected in the video. The model achieves a cross-validation A1 accuracy of 96.10%, F1 = 0.95, precision = 0.94, and recall = 0.95. Automated semantic partitioning of unlabelled video scans (n = 250) achieves a high correlation with expert annotations (ρ = 0.95, p = 0.06). Clinical workflow patterns, operator skill, and its variability can be derived from the resulting representation using the detected anatomy labels, their order, and their distribution. It is shown that nuchal translucency (NT) is the most difficult standard plane to acquire, and most operators struggle to localize high-quality frames. Furthermore, newly qualified operators are found to spend 25.56% more time on key biometry tasks than experienced operators.
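As a rough illustration of the 2D+t recipe described above, the following PyTorch sketch combines a pretrained per-frame 2D backbone (transfer learning) with a temporal convolution over frame features; the class count, backbone, and layer sizes are placeholders, not the paper's configuration.

```python
# Minimal 2D+t sketch (assumed configuration, not the authors' model):
# a pretrained 2D CNN per frame, then a 1D temporal convolution (the "+t").
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoDPlusT(nn.Module):
    def __init__(self, num_classes: int = 14, feat_dim: int = 512):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")            # transfer learning
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop fc head
        self.temporal = nn.Conv1d(feat_dim, feat_dim, kernel_size=5, padding=2)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W) -> per-frame class logits
        b, t = clip.shape[:2]
        f = self.cnn(clip.flatten(0, 1)).flatten(1)   # (b*t, feat_dim)
        f = f.view(b, t, -1).transpose(1, 2)          # (b, feat_dim, t)
        f = torch.relu(self.temporal(f)).transpose(1, 2)
        return self.head(f)                           # (b, t, num_classes)
```

Per-frame logits of this shape are what a temporal-partitioning step would then group into semantically meaningful video segments.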
4. Junger D, Frommer SM, Burgert O. State-of-the-art of situation recognition systems for intraoperative procedures. Med Biol Eng Comput 2022;60:921-939. [PMID: 35178622] [PMCID: PMC8933302] [DOI: 10.1007/s11517-022-02520-4]
Abstract
One of the key challenges for automatic assistance is supporting actors in the operating room according to the status of the procedure. To this end, context information collected in the operating room is used to gain knowledge about the current situation. Solutions already exist in the literature for specific use cases, but it is unclear to what extent these approaches can be transferred to other conditions. We conducted a comprehensive literature review of existing situation recognition systems for the intraoperative area, covering 274 articles and 95 cross-references published between 2010 and 2019. We contrasted and compared 58 identified approaches based on defined aspects such as the sensor data used or the application area, and discussed their applicability and transferability. Most of the papers focus on video data for recognizing situations within laparoscopic and cataract surgeries. Not all of the approaches can be used online for real-time recognition. Using different methods, good results with recognition accuracies above 90% could be achieved. Overall, transferability is less frequently addressed, and the applicability of approaches to other circumstances appears possible only to a limited extent. Future research should place a stronger focus on adaptability. The review highlights differences among existing approaches for situation recognition and outlines research trends.
Affiliation(s)
- D Junger, School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
- S M Frommer, School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
- O Burgert, School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
5. Sharma H, Drukker L, Chatelain P, Droste R, Papageorghiou AT, Noble JA. Knowledge representation and learning of operator clinical workflow from full-length routine fetal ultrasound scan videos. Med Image Anal 2021;69:101973. [PMID: 33550004] [DOI: 10.1016/j.media.2021.101973]
Abstract
Ultrasound is a widely used imaging modality, yet it is well-known that scanning can be highly operator-dependent and difficult to perform, which limits its wider use in clinical practice. The literature on understanding what makes clinical sonography hard to learn and how sonography varies in the field is sparse, restricted to small-scale studies on the effectiveness of ultrasound training schemes, the role of ultrasound simulation in training, and the effect of introducing scanning guidelines and standards on diagnostic image quality. The Big Data era, and the recent and rapid emergence of machine learning as a more mainstream large-scale data analysis technique, presents a fresh opportunity to study sonography in the field at scale for the first time. Large-scale analysis of video recordings of full-length routine fetal ultrasound scans offers the potential to characterise differences between the scanning proficiency of experts and trainees that would be tedious and time-consuming to identify manually due to the vast amounts of data. Such research would help us better understand operator clinical workflow when conducting ultrasound scans, supporting skills training, optimising scan times, and informing the design of better user-machine interfaces. This paper is, to our knowledge, the first to address sonography data science, which we consider in the context of second-trimester fetal sonography screening. Specifically, we present a fully-automatic framework to analyse operator clinical workflow solely from full-length routine second-trimester fetal ultrasound scan videos. An ultrasound video dataset containing more than 200 hours of scan recordings was generated for this study. We developed an original deep learning method to temporally segment the ultrasound video into semantically meaningful segments (the video description). The resulting semantic annotation was then used to depict operator clinical workflow (the knowledge representation). Machine learning was applied to the knowledge representation to characterise operator skills and assess operator variability. For video description, our best-performing deep spatio-temporal network shows favourable results in cross-validation (accuracy: 91.7%), statistical analysis (correlation: 0.98, p < 0.05) and retrospective manual validation (accuracy: 76.4%). For knowledge representation of operator clinical workflow, a three-level abstraction scheme was introduced, consisting of a Subject-specific Timeline Model (STM), a Summary of Timeline Features (STF), and an Operator Graph Model (OGM), which led to a significant decrease in dimensionality and computational complexity compared to raw video data. The workflow representations were learnt to discriminate between operator skills, where a proposed convolutional neural network-based model showed the most promising performance (cross-validation accuracy: 98.5%, accuracy on unseen operators: 76.9%). These were further used to derive operator-specific scanning signatures and operator variability in terms of the type, order, and time distribution of constituent tasks.
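The timeline-style knowledge representation can be illustrated in a few lines of Python: the sketch below collapses per-frame anatomy labels into timed segments and summarises time and visits per task, loosely in the spirit of the paper's STM/STF abstractions (the function names and fields are assumptions, not the authors' implementation).

```python
# Hedged sketch of a timeline-style workflow representation: per-frame labels
# -> timed segments -> simple summary features. Names/fields are illustrative.
from itertools import groupby
from collections import Counter

def timeline(frame_labels, fps=30.0):
    """Per-frame labels -> list of (task, start_s, end_s) segments."""
    segments, i = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((label, i / fps, (i + n) / fps))
        i += n
    return segments

def summary_features(segments):
    """Crude summary of timeline features: total time and visit count per task."""
    total, visits = Counter(), Counter()
    for task, start, end in segments:
        total[task] += end - start
        visits[task] += 1
    return {t: {"seconds": total[t], "visits": visits[t]} for t in total}

# Example: labels predicted by a video-description network for 8 frames.
print(summary_features(timeline(["head", "head", "heart", "heart",
                                 "heart", "head", "bg", "bg"])))
```

A representation of this form is orders of magnitude smaller than the raw video, which is what makes downstream skill classification tractable.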
Affiliation(s)
- Harshita Sharma, Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, United Kingdom
- Lior Drukker, Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom
- Pierre Chatelain, Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, United Kingdom
- Richard Droste, Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, United Kingdom
- Aris T Papageorghiou, Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom
- J Alison Noble, Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, United Kingdom
6. Loukas C, Sgouros NP. Multi-instance multi-label learning for surgical image annotation. Int J Med Robot 2020;16:e2058. [DOI: 10.1002/rcs.2058]
Affiliation(s)
- Constantinos Loukas, Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Nicholas P. Sgouros, Department of Informatics, National and Kapodistrian University of Athens, Athens, Greece
7. Zhang Z, Liu Z, Singapogu R. Extracting Subtask-specific Metrics Toward Objective Assessment of Needle Insertion Skill for Hemodialysis Cannulation. J Med Robot Res 2019;4:1942006. [PMID: 33681506] [PMCID: PMC7932179] [DOI: 10.1142/s2424905x19420066]
Abstract
About 80% of all in-hospital patients require vascular access cannulation for treatment. However, these procedures fail at a high rate, with several studies estimating failure rates of up to 50%. Hemodialysis cannulation (HDC) is arguably one of the most difficult of these procedures, with a steep learning curve and an extremely high failure rate. In light of this, there is a critical need to ensure that clinicians performing HDC have the requisite skills. In this work, we present a method that combines the strengths of simulator-based objective skill quantification and task segmentation for needle insertion skill assessment at the subtask level. The results from our experimental study with seven novice nursing students on the cannulation simulator demonstrate that the simulator was able to segment needle insertion into subtask phases. In addition, most metrics differed significantly between the two phases, indicating that there may be value in evaluating participants' behavior at the subtask level. Further, the outcome metric (risk of infiltrating the simulated blood vessel) was successfully predicted by the process metrics in both phases. The implications of these results for skill assessment and training are discussed; with more extensive validation, this approach could potentially lead to improved patient outcomes.
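A minimal sketch of this subtask-level idea, under the assumption that the simulator logs needle depth and force over time, might split a trial into two phases at vessel entry and compute simple process metrics per phase; the threshold, signals, and metric choices below are illustrative only.

```python
# Hedged sketch: split an insertion trial into two subtask phases and compute
# per-phase process metrics. Signals and the 5 mm threshold are assumptions.
import numpy as np

def split_phases(depth: np.ndarray, lumen_depth: float = 5.0):
    """Phase 1 = approach/tissue traversal, phase 2 = intravascular advancement."""
    entry = int(np.argmax(depth >= lumen_depth))  # first sample past the vessel wall
    return slice(0, entry), slice(entry, len(depth))

def phase_metrics(depth: np.ndarray, force: np.ndarray, dt: float) -> dict:
    speed = np.gradient(depth, dt)
    jerk = np.gradient(np.gradient(speed, dt), dt)  # motion-smoothness proxy
    return {"mean_force": float(force.mean()),
            "peak_force": float(force.max()),
            "mean_speed": float(np.abs(speed).mean()),
            "jerk_rms": float(np.sqrt((jerk ** 2).mean()))}

# Toy trial: 3 s at 100 Hz, needle advancing at 3 mm/s toward an 8 mm target.
t = np.arange(0, 3, 0.01)
depth = np.clip(3.0 * t, 0.0, 8.0)
force = 0.5 + 0.1 * np.random.rand(t.size)
approach, in_vessel = split_phases(depth)
print(phase_metrics(depth[approach], force[approach], 0.01))
print(phase_metrics(depth[in_vessel], force[in_vessel], 0.01))
```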
Affiliation(s)
- Ziyang Zhang, Department of Bioengineering, Clemson University, 301 Rhodes Research Center, Clemson, SC 29634, USA
- Zhanhe Liu, Department of Bioengineering, Clemson University, 301 Rhodes Research Center, Clemson, SC 29634, USA
- Ravikiran Singapogu, Department of Bioengineering, Clemson University, 301 Rhodes Research Center, Clemson, SC 29634, USA
8. Sugino T, Nakamura R, Kuboki A, Honda O, Yamamoto M, Ohtori N. Comparative analysis of surgical processes for image-guided endoscopic sinus surgery. Int J Comput Assist Radiol Surg 2019;14:93-104. [PMID: 30196337] [DOI: 10.1007/s11548-018-1855-y]
Abstract
PURPOSE This study proposes a method to analyze surgical performance by modeling, aligning, and comparing surgical processes, intended to support the enhancement of surgical skills for endoscopic sinus surgery (ESS). We focus on surgical navigation systems used in image-guided ESS and aim to construct a comparative analysis method for surgical processes based on information about surgical instrument motion obtained from the navigation system. METHODS The proposed method consists of three parts: quantification of surgical features, modeling of surgical processes, and alignment and comparison of surgical process models (SPMs). First, we defined time-series parameters using the navigation-based surgical data. Second, we created SPMs by applying the defined parameters and the position of the instruments relative to the patient's anatomy. Third, we constructed a method to align and compare SPMs based on dynamic time warping with barycenter averaging. RESULTS The proposed method was validated on a dataset containing surgical data obtained by an optical tracking system from 14 clinical ESS cases. We evaluated the validity of the comparative analysis by aligning and comparing SPMs between experts and residents. The results suggested that the proposed method achieves proper alignment of the SPMs and clarifies the differences in surgical processes between experts and residents. CONCLUSION We developed a method that enables time-series comparative analysis of surgical processes based on data from the navigation system. It allows surgeons to identify differences between their own procedures and reference procedures, such as those of experts.
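The alignment step rests on dynamic time warping (DTW); a self-contained numpy sketch of the pairwise DTW core is given below. The paper's method additionally uses barycenter averaging (libraries such as tslearn provide full DBA implementations), and the example data here are placeholders, not navigation recordings.

```python
# Pairwise DTW core as a plain-numpy sketch (the DBA averaging step is omitted).
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two (T, d) time-series of surgical parameters."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# e.g. compare an expert's and a resident's parameter profiles (placeholder data)
expert = np.random.rand(100, 2)
resident = np.random.rand(120, 2)
print(dtw(expert, resident))
```

DTW tolerates the different pacing of expert and resident procedures, which is exactly why it suits aligning process models of unequal length.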
Affiliation(s)
- Takaaki Sugino, Graduate School of Engineering, Chiba University, Chiba, Japan
- Ryoichi Nakamura, Graduate School of Engineering, Chiba University, Chiba, Japan; Center for Frontier Medical Engineering, Chiba University, Chiba, Japan; Faculty of Engineering, Chiba University, Chiba, Japan; PRESTO, Japan Science and Technology Agency, Saitama, Japan
- Akihito Kuboki, Department of Otorhinolaryngology, The Jikei University School of Medicine, Tokyo, Japan
- Osamu Honda, Faculty of Engineering, Chiba University, Chiba, Japan
- Nobuyoshi Ohtori, Department of Otorhinolaryngology, The Jikei University School of Medicine, Tokyo, Japan
9. Jin Y, Dou Q, Chen H, Yu L, Qin J, Fu CW, Heng PA. SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network. IEEE Trans Med Imaging 2018;37:1114-1126. [PMID: 29727275] [DOI: 10.1109/tmi.2017.2787657]
Abstract
We propose a novel recurrent convolutional network (SV-RCNet) for automatic online workflow recognition from surgical videos, a key component for developing context-aware computer-assisted intervention systems. Different from previous methods, which harness visual and temporal information separately, the proposed SV-RCNet seamlessly integrates a convolutional neural network (CNN) and a recurrent neural network (RNN) into a recurrent convolutional architecture that takes full advantage of the complementary visual and temporal features learned from surgical videos. We train the SV-RCNet in an end-to-end manner so that the visual representations and sequential dynamics are jointly optimized. To produce more discriminative spatio-temporal features, we exploit a deep residual network (ResNet) to extract visual features and a long short-term memory (LSTM) network to model temporal dependencies, and integrate them into the SV-RCNet. Moreover, based on the phase-transition-sensitive predictions of the SV-RCNet, we propose a simple yet effective inference scheme, prior knowledge inference (PKI), that leverages the natural characteristics of surgical video. This strategy further improves the consistency of the results and largely boosts recognition performance. Extensive experiments on the MICCAI 2016 Modeling and Monitoring of Computer Assisted Interventions Workflow Challenge dataset and the Cholec80 dataset validate the SV-RCNet. Our approach not only achieves superior performance on these two datasets but also outperforms state-of-the-art methods by a significant margin.
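The abstract pins down the overall architecture (ResNet features fed to an LSTM, trained end to end), which the PyTorch sketch below mirrors at a high level; the backbone variant, hidden size, and phase count are assumptions, and the PKI inference step is omitted.

```python
# High-level sketch of the ResNet+LSTM recurrent convolutional idea.
# Hyperparameters are assumptions; PKI post-processing is not shown.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SVRCNetSketch(nn.Module):
    def __init__(self, num_phases: int = 7, hidden: int = 512):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V2")
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # (N, 2048, 1, 1)
        self.lstm = nn.LSTM(2048, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_phases)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W) -> per-frame phase logits
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.lstm(feats)     # temporal dependencies across frames
        return self.fc(out)
```

Because the CNN and LSTM sit in one module, a single cross-entropy loss over per-frame logits optimizes the visual representation and the sequential dynamics jointly, as the abstract describes.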
10. Dergachyova O, Bouget D, Huaulmé A, Morandi X, Jannin P. Automatic data-driven real-time segmentation and recognition of surgical workflow. Int J Comput Assist Radiol Surg 2016;11:1081-1089. [PMID: 26995598] [DOI: 10.1007/s11548-016-1371-x]
Abstract
PURPOSE With the intention of extending the perception and action of surgical staff inside the operating room, the medical community has expressed growing interest in context-aware systems. Requiring an accurate identification of the surgical workflow, such systems make use of data from a diverse set of available sensors. In this paper, we propose a fully data-driven, real-time method for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge. We also introduce new validation metrics for the assessment of workflow detection. METHODS The segmentation and recognition are based on a four-stage process. First, during training, a Surgical Process Model is automatically constructed from data annotations to guide the subsequent stages. Second, data samples are described using a combination of low-level visual cues and instrument information. Third, these descriptions are used to train a set of AdaBoost classifiers, each capable of distinguishing one surgical phase from the others. Finally, the AdaBoost responses are used as input to a Hidden semi-Markov Model to obtain the final decision. RESULTS On the MICCAI EndoVis challenge laparoscopic dataset we achieved a precision and recall of 91% in the classification of 7 phases. CONCLUSION Compared to analysis based on a single data type, the combination of visual features and instrument signals allows better segmentation, reduces detection delay, and recovers the correct phase order.
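The third stage can be sketched with scikit-learn as one-vs-rest AdaBoost classifiers whose per-phase scores could feed the subsequent Hidden semi-Markov Model (not shown); the feature layout and hyperparameters below are assumptions made for illustration.

```python
# Sketch of the third stage only: one-vs-rest AdaBoost phase classifiers.
# The HSMM smoothing stage and the real feature pipeline are omitted.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

NUM_PHASES = 7

def train_phase_classifiers(X: np.ndarray, phases: np.ndarray):
    """One AdaBoost classifier per phase, each separating it from the rest."""
    clfs = []
    for p in range(NUM_PHASES):
        clf = AdaBoostClassifier(n_estimators=100)
        clf.fit(X, (phases == p).astype(int))
        clfs.append(clf)
    return clfs

def phase_scores(clfs, X: np.ndarray) -> np.ndarray:
    """Per-sample score for each phase, usable as HSMM emission input."""
    return np.column_stack([c.predict_proba(X)[:, 1] for c in clfs])

# Toy usage: 500 samples of 32 visual features + 10 instrument-usage bits.
X = np.random.rand(500, 42)
y = np.random.randint(0, NUM_PHASES, 500)
scores = phase_scores(train_phase_classifiers(X, y), X)
```

Feeding per-frame scores through a semi-Markov smoother is what enforces plausible phase durations and ordering in the final decision.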
Affiliation(s)
- Olga Dergachyova, INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France
- David Bouget, INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France
- Arnaud Huaulmé, INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France; Université Joseph Fourier, TIMC-IMAG UMR 5525, Grenoble, 38041, France
- Xavier Morandi, INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France; CHU Rennes, Département de Neurochirurgie, Rennes, 35000, France
- Pierre Jannin, INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France
11. Despinoy F, Bouget D, Forestier G, Penet C, Zemiti N, Poignet P, Jannin P. Unsupervised Trajectory Segmentation for Surgical Gesture Recognition in Robotic Training. IEEE Trans Biomed Eng 2016;63:1280-1291. [PMID: 26513773] [DOI: 10.1109/tbme.2015.2493100]
Abstract
Dexterity and procedural knowledge are two critical skills that surgeons must master to perform accurate and safe surgical interventions. However, current training systems do not provide an in-depth analysis of surgical gestures that would allow these skills to be precisely assessed. Our objective is to develop a method for the automatic and quantitative assessment of surgical gestures. To reach this goal, we propose a new unsupervised algorithm that can automatically segment kinematic data from robotic training sessions. Without relying on any prior information or model, the algorithm detects critical points in the kinematic data that define relevant spatio-temporal segments. By associating these segments, we obtain an accurate recognition of the gestures involved in the surgical training task. We then perform an advanced analysis and assess the algorithm using datasets recorded during real expert training sessions. Comparing our approach with manual annotations of the surgical gestures, we observe 97.4% accuracy for the learning purpose and an average matching score of 81.9% for the fully automated gesture recognition process. Our results show that trainees' workflow can be followed and that surgical gestures can be automatically evaluated against an expert database. This approach could improve training efficiency by shortening the learning curve.
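A hedged sketch of the critical-point idea is to treat local minima of tool-tip speed as candidate gesture boundaries in the kinematic stream; the speed signal and peak-detection parameters below are illustrative, and the paper's detector is more elaborate.

```python
# Illustrative critical-point detector: candidate gesture boundaries at local
# minima of tool-tip speed. Parameters and the toy trajectory are assumptions.
import numpy as np
from scipy.signal import find_peaks

def segment_boundaries(positions: np.ndarray, dt: float, min_gap: int = 30):
    """positions: (T, 3) tool-tip trajectory -> indices of candidate boundaries."""
    speed = np.linalg.norm(np.gradient(positions, dt, axis=0), axis=1)
    # Minima of speed are peaks of -speed; enforce a minimum spacing between cuts.
    minima, _ = find_peaks(-speed, distance=min_gap)
    return minima

traj = np.cumsum(np.random.randn(1000, 3) * 0.01, axis=0)  # toy trajectory
print(segment_boundaries(traj, dt=1 / 100))
```

Grouping and matching the resulting segments against an expert database is then what turns unsupervised segmentation into gesture recognition.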