1. Shafiei SB, Shadpour S, Mohler JL. An Integrated Electroencephalography and Eye-Tracking Analysis Using eXtreme Gradient Boosting for Mental Workload Evaluation in Surgery. Hum Factors 2025;67:464-484. PMID: 39325959; PMCID: PMC11936844; DOI: 10.1177/00187208241285513.
Abstract
Objective: We aimed to develop advanced machine learning models using electroencephalogram (EEG) and eye-tracking data to predict the mental workload associated with engaging in various surgical tasks. Background: Traditional methods of evaluating mental workload often involve self-report scales, which are subject to individual biases. Due to the multidimensional nature of mental workload, there is a pressing need to identify factors that contribute to mental workload across different surgical tasks. Method: EEG and eye-tracking data from 26 participants performing the Matchboard and Ring Walk tasks from the da Vinci simulator and the pattern cut and suturing tasks from the Fundamentals of Laparoscopic Surgery (FLS) program were used to develop an eXtreme Gradient Boosting (XGBoost) model for mental workload evaluation. Results: The developed XGBoost models demonstrated strong predictive performance, with R2 values of 0.82, 0.81, 0.82, and 0.83 for the Matchboard, Ring Walk, pattern cut, and suturing tasks, respectively. Key features for predicting mental workload included task average pupil diameter, complexity level, average functional connectivity strength at the temporal lobe, and the total trajectory length of the nondominant eye's pupil. Integrating features from both EEG and eye-tracking data significantly enhanced the performance of mental workload evaluation models, as evidenced by repeated-measures t-tests yielding p-values less than 0.05; however, this enhancement was not observed in the pattern cut task (repeated-measures t-tests; p > 0.05). Conclusion: The findings underscore the potential of machine learning and multidimensional feature integration to predict mental workload and thereby improve task design and surgical training. Application: The advanced mental workload prediction models could serve as instrumental tools to enhance our understanding of surgeons' cognitive demands and significantly improve the effectiveness of surgical training programs.
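To make the modality-combination comparison concrete, here is a minimal sketch, not the authors' pipeline: the feature arrays, their dimensions, and the XGBoost hyperparameters are synthetic stand-ins. It compares fold-wise R2 for eye-tracking-only versus combined EEG plus eye-tracking features with a paired t-test, loosely mirroring the repeated-measures comparison described above.

```python
# Minimal sketch (requires numpy, scipy, scikit-learn, and the xgboost package).
import numpy as np
from scipy.stats import ttest_rel
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 200                                       # synthetic stand-in for task windows
eeg = rng.normal(size=(n, 10))                # e.g., functional-connectivity strengths
eye = rng.normal(size=(n, 6))                 # e.g., pupil diameter, trajectory length
y = eeg[:, 0] + eye[:, 0] + rng.normal(scale=0.5, size=n)  # workload score (toy)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)

r2_eye = cross_val_score(model, eye, y, cv=cv, scoring="r2")
r2_both = cross_val_score(model, np.hstack([eeg, eye]), y, cv=cv, scoring="r2")

# Paired comparison across the same folds, loosely analogous to the
# repeated-measures t-test on model performance reported in the abstract.
t, p = ttest_rel(r2_both, r2_eye)
print(f"R2 eye-only {r2_eye.mean():.2f}, combined {r2_both.mean():.2f}, p = {p:.3f}")
```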
2. Kankanamge D, Wijeweera C, Ong Z, Preda T, Carney T, Wilson M, Preda V. Artificial intelligence based assessment of minimally invasive surgical skills using standardised objective metrics - A narrative review. Am J Surg 2025;241:116074. PMID: 39561477; DOI: 10.1016/j.amjsurg.2024.116074.
Abstract
INTRODUCTION: Many studies display significant heterogeneity in the reliability of artificial intelligence (AI) assessment of minimally invasive surgical (MIS) skills. Our objective was to investigate whether AI systems that use standardised objective metrics (SOMs) as the basis of skill assessment can provide a clearer understanding of the current state of such technology. METHODS: We systematically searched Medline, Embase, Scopus, CENTRAL, and Web of Science from March 2023 to September 2023. Results were compiled as a narrative review. RESULTS: Twenty-four citations were analysed. The overall accuracy of AI systems in predicting the overall SOM score of a procedure ranged from 63% to 100%. The most frequently used SOMs were the Objective Structured Assessment of Technical Skills (OSATS) (8/24) and the Global Evaluative Assessment of Robotic Skills (GEARS) (8/24). CONCLUSIONS: Stratifying for AI studies that employed SOMs to assess surgical skill did not reduce the heterogeneity of reported reliability. Our study identifies key issues within the current literature which, once addressed, could allow more meaningful comparisons between studies.
Affiliation(s)
- D Kankanamge: Faculty of Medicine, Health and Human Sciences, Macquarie University, Macquarie Park, Sydney, Australia
- C Wijeweera: Faculty of Medicine, Health and Human Sciences, Macquarie University, Macquarie Park, Sydney, Australia
- Z Ong: Faculty of Medicine, Health and Human Sciences, Macquarie University, Macquarie Park, Sydney, Australia
- T Preda: Department of Surgery, School of Medicine, University of Notre Dame Australia (Sydney), Australia; Royal North Shore Virtual Care Service, St Leonards, Sydney, Australia
- T Carney: Surgical XR, 2 Technology Place, Macquarie Park, Sydney, Australia
- M Wilson: Faculty of Medicine, Health and Human Sciences, Macquarie University, Macquarie Park, Sydney, Australia; Surgical XR, 2 Technology Place, Macquarie Park, Sydney, Australia
- V Preda: Faculty of Medicine, Health and Human Sciences, Macquarie University, Macquarie Park, Sydney, Australia
3. Shafiei SB, Shadpour S, Mohler JL, Kauffman EC, Holden M, Gutierrez C. Classification of subtask types and skill levels in robot-assisted surgery using EEG, eye-tracking, and machine learning. Surg Endosc 2024;38:5137-5147. PMID: 39039296; PMCID: PMC11362185; DOI: 10.1007/s00464-024-11049-6.
Abstract
BACKGROUND: Objective and standardized evaluation of surgical skills in robot-assisted surgery (RAS) holds critical importance for both surgical education and patient safety. This study introduces machine learning (ML) techniques that use features derived from electroencephalogram (EEG) and eye-tracking data to identify surgical subtasks and classify skill levels. METHOD: The efficacy of this approach was assessed using a comprehensive dataset encompassing nine distinct classes, each representing a unique combination of surgical subtask and skill level, recorded while surgeons performed operations on pigs. Four ML models were used for multi-class classification: logistic regression, random forest, gradient boosting, and extreme gradient boosting (XGB). To develop the models, 20% of data samples were randomly allocated to a test set, with the remaining 80% used for training and validation. Hyperparameters were optimized through grid search with fivefold stratified cross-validation repeated five times. Model reliability was ensured by repeating the train-test split over 30 iterations and reporting averaged measurements. RESULTS: The findings revealed that the proposed approach outperformed existing methods for classifying RAS subtasks and skills; the XGB and random forest models yielded high accuracy rates (88.49% and 88.56%, respectively) that were not significantly different (two-sample t-test; p-value = 0.9). CONCLUSION: These results underscore the potential of ML models to augment the objectivity and precision of RAS subtask and skill evaluation. Future research should explore ways to optimize these models, focusing on the classes identified as challenging in this study. Ultimately, this study marks a significant step towards a more refined, objective, and standardized approach to RAS training and competency assessment.
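The evaluation protocol described here (an 80/20 split repeated 30 times, with grid search under five-times-repeated stratified fivefold cross-validation) can be sketched as follows. The data, parameter grid, and classifier settings are illustrative stand-ins, not the study's actual features or tuning ranges.

```python
# Illustrative sketch of the protocol; synthetic data stand in for the
# nine-class EEG/eye-tracking dataset. Compute-heavy as written.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     train_test_split)

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)  # stand-in for 9 classes

accs = []
for seed in range(30):                         # 30 random train-test splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              stratify=y, random_state=seed)
    # Grid search with fivefold stratified CV repeated five times.
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=seed)
    search = GridSearchCV(RandomForestClassifier(random_state=seed),
                          param_grid={"n_estimators": [100, 300],
                                      "max_depth": [None, 10]},
                          cv=cv, n_jobs=-1)
    search.fit(X_tr, y_tr)
    accs.append(search.score(X_te, y_te))      # accuracy on the held-out 20%

print(f"mean accuracy over 30 iterations: {np.mean(accs):.3f}")
```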
Affiliation(s)
- Somayeh B Shafiei: The Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Saeed Shadpour: Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada
- James L Mohler: Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Eric C Kauffman: Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Matthew Holden: School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, ON, K1S 5B6, Canada
- Camille Gutierrez: Obstetrics and Gynecology Residency Program, Sisters of Charity Health System, Buffalo, NY, 14214, USA
4. Shafiei SB, Shadpour S, Mohler JL, Rashidi P, Toussi MS, Liu Q, Shafqat A, Gutierrez C. Prediction of Robotic Anastomosis Competency Evaluation (RACE) metrics during vesico-urethral anastomosis using electroencephalography, eye-tracking, and machine learning. Sci Rep 2024;14:14611. PMID: 38918593; PMCID: PMC11199555; DOI: 10.1038/s41598-024-65648-3.
Abstract
Residents learn the vesico-urethral anastomosis (VUA), a key step in robot-assisted radical prostatectomy (RARP), early in their training. VUA assessment and training significantly impact patient outcomes and have high educational value. This study aimed to develop objective prediction models for the Robotic Anastomosis Competency Evaluation (RACE) metrics using electroencephalogram (EEG) and eye-tracking data. Data were recorded from 23 participants performing robot-assisted VUA (henceforth 'anastomosis') on plastic models and animal tissue using the da Vinci surgical robot. EEG and eye-tracking features were extracted, and participants' anastomosis subtask performance was assessed by three raters using the RACE tool and operative videos. Random forest regression (RFR) and gradient boosting regression (GBR) models were developed to predict RACE scores using extracted features, while linear mixed models (LMM) identified associations between features and RACE scores. Overall performance scores significantly differed among inexperienced, competent, and experienced skill levels (P value < 0.0001). For plastic anastomoses, R2 values for predicting unseen test scores were: needle positioning (0.79), needle entry (0.74), needle driving and tissue trauma (0.80), suture placement (0.75), and tissue approximation (0.70). For tissue anastomoses, the values were 0.62, 0.76, 0.65, 0.68, and 0.62, respectively. The models could enhance RARP anastomosis training by offering objective performance feedback to trainees.
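As a rough illustration of the regression setup described here, the sketch below fits random forest and gradient boosting regressors to synthetic stand-in features and reports held-out R2 for a single hypothetical RACE subtask score; the feature matrix, score construction, and split are all assumptions rather than the study's data.

```python
# Sketch with synthetic stand-ins for the EEG/eye-tracking feature matrix
# and a single rater-derived RACE subtask score.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 24))                # hypothetical features per clip
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=150)  # toy RACE score

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
for name, model in [("RFR", RandomForestRegressor(random_state=1)),
                    ("GBR", GradientBoostingRegressor(random_state=1))]:
    model.fit(X_tr, y_tr)
    print(name, f"test R2 = {r2_score(y_te, model.predict(X_te)):.2f}")
```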
Affiliation(s)
- Somayeh B Shafiei: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Elm and Carlton Streets, Buffalo, NY, 14263, USA
- Saeed Shadpour: Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada
- James L Mohler: Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Parisa Rashidi: Department of Biomedical Engineering, University of Florida, Gainesville, FL, 32611, USA
- Mehdi Seilanian Toussi: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Elm and Carlton Streets, Buffalo, NY, 14263, USA
- Qian Liu: Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Ambreen Shafqat: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Elm and Carlton Streets, Buffalo, NY, 14263, USA
- Camille Gutierrez: Obstetrics and Gynecology Residency Program, Sisters of Charity Health System, Buffalo, NY, 14214, USA
5. Shafiei SB, Shadpour S, Sasangohar F, Mohler JL, Attwood K, Jing Z. Development of performance and learning rate evaluation models in robot-assisted surgery using electroencephalography and eye-tracking. NPJ Sci Learn 2024;9:3. PMID: 38242909; PMCID: PMC10799032; DOI: 10.1038/s41539-024-00216-y.
Abstract
The existing performance evaluation methods in robot-assisted surgery (RAS) are mainly subjective, costly, and affected by shortcomings such as inconsistent results and dependency on raters' opinions. The aim of this study was to develop models for objective evaluation of performance and of the learning rate of RAS skills during practice on surgical simulator tasks. Electroencephalogram (EEG) and eye-tracking data were recorded from 26 subjects performing the Tubes, Suture Sponge, and Dots and Needles tasks. Performance scores were generated by the simulator program. Functional brain networks were extracted from the EEG data using coherence analysis. These networks, together with community detection analysis, were used to extract average search information and average temporal flexibility features at 21 Brodmann areas (BA) and four frequency bands. Twelve eye-tracking features were extracted and used to develop linear random intercept models for performance evaluation and multivariate linear regression models for learning rate evaluation. Results showed that subject-wise standardization of features improved the R2 of the models. Average pupil diameter and saccade rate were associated with performance in the Tubes task (multivariate analysis; p-value = 0.01 and p-value = 0.04, respectively). Entropy of pupil diameter was associated with performance in the Dots and Needles task (multivariate analysis; p-value = 0.01). Average temporal flexibility and search information in several BAs and frequency bands were associated with performance and learning rate. The models may be used to objectify performance and learning rate evaluation in RAS once validated with a larger sample and a broader set of tasks.
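A linear random-intercept model of the kind described here, including the subject-wise feature standardization the abstract reports as beneficial, might look like the following sketch; the column names, subject counts, and effect sizes are invented for illustration.

```python
# Sketch using statsmodels; subjects, trials, and column names are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(26), 10),   # 26 subjects, 10 trials each
    "pupil_diameter": rng.normal(4.0, 0.6, 260),
})
df["score"] = 50 + 5 * df["pupil_diameter"] + rng.normal(0, 3, 260)

# Subject-wise standardization of the feature before modeling.
df["pupil_z"] = (df.groupby("subject")["pupil_diameter"]
                   .transform(lambda x: (x - x.mean()) / x.std()))

# Random intercept per subject; fixed effect of the standardized feature.
fit = smf.mixedlm("score ~ pupil_z", df, groups=df["subject"]).fit()
print(fit.summary())
```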
Affiliation(s)
- Somayeh B Shafiei: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Saeed Shadpour: Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Farzan Sasangohar: Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, 77843, USA
- James L Mohler: Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Kristopher Attwood: Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Zhe Jing: Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
6. Shafiei SB, Shadpour S, Mohler JL, Sasangohar F, Gutierrez C, Seilanian Toussi M, Shafqat A. Surgical skill level classification model development using EEG and eye-gaze data and machine learning algorithms. J Robot Surg 2023;17:2963-2971. PMID: 37864129; PMCID: PMC10678814; DOI: 10.1007/s11701-023-01722-8.
Abstract
The aim of this study was to develop machine learning classification models using electroencephalogram (EEG) and eye-gaze features to predict the level of surgical expertise in robot-assisted surgery (RAS). EEG and eye-gaze data were recorded from 11 participants who performed cystectomy, hysterectomy, and nephrectomy using the da Vinci robot. Skill level was evaluated by an expert RAS surgeon using the modified Global Evaluative Assessment of Robotic Skills (GEARS) tool, and data from three subtasks were extracted to classify skill levels using three classification models: multinomial logistic regression (MLR), random forest (RF), and gradient boosting (GB). The GB algorithm was used with a combination of EEG and eye-gaze data to classify skill levels, and differences between the models were tested using two-sample t-tests. The GB model using EEG features showed the best performance for blunt dissection (83% accuracy), retraction (85% accuracy), and burn dissection (81% accuracy). The combination of EEG and eye-gaze features using the GB algorithm improved the accuracy of skill level classification to 88% for blunt dissection, 93% for retraction, and 86% for burn dissection. The implementation of objective skill classification models in clinical settings may enhance the RAS surgical training process by providing objective feedback about performance to surgeons and their teachers.
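The central comparison here, EEG-only versus combined EEG and eye-gaze features under gradient boosting, judged with a two-sample t-test, can be outlined as below; the synthetic data, toy binary skill label (the study used GEARS-based levels), and number of runs are assumptions rather than the study's design.

```python
# Sketch with synthetic features and a toy binary skill label; runs and
# sizes are assumptions.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
eeg = rng.normal(size=(300, 12))
gaze = rng.normal(size=(300, 8))
skill = (eeg[:, 0] + gaze[:, 0] > 0).astype(int)

def accuracies(X, y, n_runs=10):
    """Test accuracy of gradient boosting over repeated random splits."""
    out = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                  random_state=seed)
        clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
        out.append(clf.score(X_te, y_te))
    return np.array(out)

acc_eeg = accuracies(eeg, skill)
acc_both = accuracies(np.hstack([eeg, gaze]), skill)
t, p = ttest_ind(acc_both, acc_eeg)            # two-sample comparison
print(f"EEG {acc_eeg.mean():.2f} vs EEG+gaze {acc_both.mean():.2f}, p = {p:.3f}")
```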
Affiliation(s)
- Somayeh B Shafiei: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Saeed Shadpour: Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada
- James L Mohler: Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Farzan Sasangohar: Wm Michael Barnes '64 Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, 77843, USA
- Camille Gutierrez: Obstetrics and Gynecology Residency Program, Sisters of Charity Health System, Buffalo, NY, 14214, USA
- Mehdi Seilanian Toussi: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Ambreen Shafqat: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
7. Shafiei SB, Shadpour S, Intes X, Rahul R, Toussi MS, Shafqat A. Performance and learning rate prediction models development in FLS and RAS surgical tasks using electroencephalogram and eye gaze data and machine learning. Surg Endosc 2023;37:8447-8463. PMID: 37730852; PMCID: PMC10615961; DOI: 10.1007/s00464-023-10409-y.
Abstract
OBJECTIVE: This study explored the use of electroencephalogram (EEG) and eye-gaze features, experience-related features, and machine learning to evaluate performance and learning rates in fundamentals of laparoscopic surgery (FLS) and robot-assisted surgery (RAS). METHODS: EEG and eye-tracking data were collected from 25 participants performing three FLS tasks and 22 participants performing two RAS tasks. Generalized linear mixed models with L1-penalized estimation were developed to objectify performance evaluation from EEG and eye-gaze features, and linear models were developed to objectify learning rate evaluation from these features together with performance scores at the first attempt. Experience metrics were added to evaluate their role in learning robotic surgery. Differences in performance across experience levels were tested using analysis of variance. RESULTS: EEG, eye-gaze, and experience-related features were important for evaluating performance in FLS and RAS tasks, yielding reasonable model performance. Residents outperformed faculty in FLS peg transfer (p-value = 0.04), while faculty and residents both excelled over pre-medical students in the FLS pattern cut (p-value = 0.01 and p-value < 0.001, respectively). Fellows outperformed pre-medical students in FLS suturing (p-value = 0.01). In RAS tasks, both faculty and fellows surpassed pre-medical students (p-values for the RAS pattern cut were 0.001 for faculty and 0.003 for fellows; for RAS tissue dissection, the p-value was less than 0.001 for both groups), with residents also showing superior skills in tissue dissection (p-value = 0.03). CONCLUSION: The findings could be used to develop training interventions for improving surgical skills and have implications for understanding motor learning and designing interventions to enhance learning outcomes.
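For the L1-penalized modeling described here, a plain Lasso can stand in for the paper's L1-penalized generalized linear mixed models (the mixed-effects structure is omitted); the sketch below selects predictors of a hypothetical learning-rate outcome, with all dimensions and names invented.

```python
# Sketch: Lasso stands in for the paper's L1-penalized GLMMs (no random
# effects here); features, outcome, and dimensions are invented.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(25, 15))                 # features + first-attempt score
learning_rate = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=25)

X_std = StandardScaler().fit_transform(X)     # L1 penalties need scaled inputs
lasso = LassoCV(cv=5).fit(X_std, learning_rate)
kept = np.flatnonzero(lasso.coef_)            # features surviving the penalty
print("selected feature indices:", kept)
```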
Affiliation(s)
- Somayeh B Shafiei: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Saeed Shadpour: Department of Animal Biosciences, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Xavier Intes: Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180, USA
- Rahul Rahul: Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180, USA
- Mehdi Seilanian Toussi: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Ambreen Shafqat: Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
8. Pedrett R, Mascagni P, Beldi G, Padoy N, Lavanchy JL. Technical skill assessment in minimally invasive surgery using artificial intelligence: a systematic review. Surg Endosc 2023;37:7412-7424. PMID: 37584774; PMCID: PMC10520175; DOI: 10.1007/s00464-023-10335-z.
Abstract
BACKGROUND: Technical skill assessment in surgery relies on expert opinion; therefore, it is time-consuming, costly, and often lacks objectivity. Analysis of intraoperative data by artificial intelligence (AI) has the potential for automated technical skill assessment. The aim of this systematic review was to analyze the performance, external validity, and generalizability of AI models for technical skill assessment in minimally invasive surgery. METHODS: A systematic search of Medline, Embase, Web of Science, and IEEE Xplore was performed to identify original articles reporting the use of AI in the assessment of technical skill in minimally invasive surgery. Risk of bias (RoB) and quality of the included studies were analyzed according to the Quality Assessment of Diagnostic Accuracy Studies criteria and the modified Joanna Briggs Institute checklists, respectively. Findings were reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. RESULTS: In total, 1958 articles were identified; 50 articles met the eligibility criteria and were analyzed. Motion data extracted from surgical videos (n = 25) or kinematic data from robotic systems or sensors (n = 22) were the most frequent input data for AI. Most studies used deep learning (n = 34) and predicted technical skills using an ordinal assessment scale (n = 36), with good accuracies in simulated settings. However, all proposed models were at the development stage; only 4 studies were externally validated, and 8 showed a low RoB. CONCLUSION: AI showed good performance in technical skill assessment in minimally invasive surgery. However, the models often lacked external validity and generalizability. Therefore, models should be benchmarked using predefined performance metrics and tested in clinical implementation studies.
Affiliation(s)
- Romina Pedrett: Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Pietro Mascagni: IHU Strasbourg, Strasbourg, France; Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Guido Beldi: Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Nicolas Padoy: IHU Strasbourg, Strasbourg, France; ICube, CNRS, University of Strasbourg, Strasbourg, France
- Joël L Lavanchy: Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland; IHU Strasbourg, Strasbourg, France; University Digestive Health Care Center Basel - Clarunis, PO Box, 4002, Basel, Switzerland