1
Ma R, Kiyasseh D, Laca JA, Kocielnik R, Wong EY, Chu TN, Cen S, Yang CH, Dalieh IS, Haque TF, Goldenberg MG, Huang X, Anandkumar A, Hung AJ. Artificial Intelligence-Based Video Feedback to Improve Novice Performance on Robotic Suturing Skills: A Pilot Study. J Endourol 2024. [PMID: 37905524] [DOI: 10.1089/end.2023.0328]
Abstract
Introduction: Automated skills assessment can provide surgical trainees with objective, personalized feedback during training. Here, we measure the efficacy of artificial intelligence (AI)-based feedback on a robotic suturing task. Materials and Methods: Forty-two participants with no robotic surgical experience were randomized to a control or feedback group and video-recorded while completing two rounds (R1 and R2) of suturing tasks on a da Vinci surgical robot. Participants were assessed on needle handling and needle driving, and feedback was provided via a visual interface after R1. For the feedback group, participants were informed of their AI-based skill assessment and presented with specific video clips from R1. For the control group, participants were presented with randomly selected video clips from R1 as a placebo. Participants from each group were further labeled as underperformers or innate-performers based on a median split of their technical skill scores from R1. Results: Demographic features were similar between the control (n = 20) and feedback (n = 22) groups (p > 0.05). Comparing improvement from R1 to R2, the feedback group had a significantly larger improvement in needle handling score than the control group (0.30 vs -0.02, p = 0.018), whereas the improvement in needle driving score was not significant (0.17 vs -0.40, p = 0.074). All innate-performers exhibited similar improvements across rounds, regardless of feedback (p > 0.05). In contrast, underperformers in the feedback group improved more than those in the control group in needle handling (p = 0.02). Conclusion: AI-based feedback facilitates surgical trainees' acquisition of robotic technical skills, especially for underperformers. Future research will extend AI-based feedback to additional suturing skills, surgical tasks, and experience groups.
Affiliation(s)
- Runzhuo Ma
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Dani Kiyasseh
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, USA
- Jasper A Laca
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Rafal Kocielnik
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, USA
- Elyssa Y Wong
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Timothy N Chu
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Steven Cen
- Radiology Department, University of Southern California, Los Angeles, California, USA
- Cherine H Yang
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Istabraq S Dalieh
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Taseen F Haque
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Mitch G Goldenberg
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Xiuzhen Huang
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
- Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Anima Anandkumar
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, USA
- Andrew J Hung
- Catherine & Joseph Aresty Department of Urology, Center for Robotic Simulation and Education, USC Institute of Urology, University of Southern California, Los Angeles, California, USA
2
Wong EY, Chu TN, Ma R, Dalieh IS, Yang CH, Ramaswamy A, Medina LG, Kocielnik R, Ladi-Seyedian SS, Shtulman A, Cen SY, Goldenberg MG, Hung AJ. Development of a Classification System for Live Surgical Feedback. JAMA Netw Open 2023; 6:e2320702. [PMID: 37378981] [DOI: 10.1001/jamanetworkopen.2023.20702]
Abstract
Importance Live feedback in the operating room is essential in surgical training. Despite the role this feedback plays in developing surgical skills, an accepted methodology to characterize the salient features of feedback has not been defined. Objective To quantify the intraoperative feedback provided to trainees during live surgical cases and propose a standardized deconstruction for feedback. Design, Setting, and Participants In this qualitative study using a mixed methods analysis, surgeons at a single academic tertiary care hospital were audio and video recorded in the operating room from April to October 2022. Urological residents, fellows, and faculty attending surgeons involved in robotic teaching cases during which trainees had active control of the robotic console for at least some portion of a surgery were eligible to voluntarily participate. Feedback was time stamped and transcribed verbatim. An iterative coding process was performed using recordings and transcript data until recurring themes emerged. Exposure Feedback in audiovisual recorded surgery. Main Outcomes and Measures The primary outcomes were the reliability and generalizability of a feedback classification system in characterizing surgical feedback. Secondary outcomes included assessing the utility of our system. Results In 29 surgical procedures that were recorded and analyzed, 4 attending surgeons, 6 minimally invasive surgery fellows, and 5 residents (postgraduate years, 3-5) were involved. For the reliability of the system, 3 trained raters achieved moderate to substantial interrater reliability in coding cases using 5 types of triggers, 6 types of feedback, and 9 types of responses (prevalence-adjusted and bias-adjusted κ range: a 0.56 [95% CI, 0.45-0.68] minimum for triggers to a 0.99 [95% CI, 0.97-1.00] maximum for feedback and responses). 
For the generalizability of the system, 6 types of surgical procedures and 3711 instances of feedback were analyzed and coded with types of triggers, feedback, and responses. Significant differences in triggers, feedback, and responses reflected surgeon experience level and surgical task being performed. For example, as a response, attending surgeons took over for safety concerns more often for fellows than residents (prevalence rate ratio [RR], 3.97 [95% CI, 3.12-4.82]; P = .002), and suturing involved more errors that triggered feedback than dissection (RR, 1.65 [95% CI, 1.03-3.33]; P = .007). For the utility of the system, different combinations of trainer feedback had associations with rates of different trainee responses. For example, technical feedback with a visual component was associated with an increased rate of trainee behavioral change or verbal acknowledgment responses (RR, 1.11 [95% CI, 1.03-1.20]; P = .02). Conclusions and Relevance These findings suggest that identifying different types of triggers, feedback, and responses may be a feasible and reliable method for classifying surgical feedback across several robotic procedures. Outcomes suggest that a system that can be generalized across surgical specialties and for trainees of different experience levels may help galvanize novel surgical education strategies.
Affiliation(s)
- Elyssa Y Wong
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Timothy N Chu
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Runzhuo Ma
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Istabraq S Dalieh
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Cherine H Yang
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Ashwin Ramaswamy
- Department of Urology, Weill Cornell Medicine, New York, New York
- Luis G Medina
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Rafal Kocielnik
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena
- Seyedeh-Sanam Ladi-Seyedian
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Andrew Shtulman
- Thinking Lab, Department of Psychology, Occidental College, Los Angeles, California
- Steven Y Cen
- Department of Radiology, University of Southern California, Los Angeles
- Mitchell G Goldenberg
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
- Andrew J Hung
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles
3
Laca JA, Kocielnik R, Nguyen JH, You J, Tsang R, Wong EY, Shtulman A, Anandkumar A, Hung AJ. Using Real-time Feedback To Improve Surgical Performance on a Robotic Tissue Dissection Task. Eur Urol Open Sci 2022; 46:15-21. [PMID: 36506257] [PMCID: PMC9732447] [DOI: 10.1016/j.euros.2022.09.015]
Abstract
Background There is no standard for the feedback that an attending surgeon provides to a training surgeon, which may lead to variable outcomes in teaching cases. Objective To create and administer standardized feedback to medical students in an attempt to improve performance and learning. Design, setting, and participants A cohort of 45 medical students was recruited from a single medical school. Participants were randomly assigned to two groups. Both completed two rounds of a robotic surgical dissection task on a da Vinci Xi surgical system. The first round was the baseline assessment. In the second round, one group received feedback and the other served as the control (no feedback). Outcome measurements and statistical analysis Video from each round was retrospectively reviewed by four blinded raters and given a total error tally (primary outcome) and a technical skills score (Global Evaluative Assessment of Robotic Surgery [GEARS]). Generalized linear models were used for statistical modeling. According to their initial performance, each participant was categorized as an innate performer if their error tally was below the median or as an underperformer if it was above. Results and limitations In round 2, the intervention group had a larger decrease in error rate than the control group, with a risk ratio (RR) of 1.51 (95% confidence interval [CI] 1.07-2.14; p = 0.02). The intervention group also had a greater increase in GEARS score in comparison to the control group, with a mean group difference of 2.15 (95% CI 0.81-3.49; p < 0.01). The interaction effect between innate performers versus underperformers and the intervention was statistically significant for the error rates, at F(1,38) = 5.16 (p = 0.03). Specifically, the intervention had a statistically significant effect on the error rate for underperformers (RR 2.23, 95% CI 1.37-3.62; p < 0.01) but not for innate performers (RR 1.03, 95% CI 0.63-1.68; p = 0.91).
Conclusions Real-time feedback improved performance globally compared to the control. The benefit of real-time feedback was stronger for underperformers than for trainees with innate skill. Patient summary We found that real-time feedback during a training task using a surgical robot improved the performance of trainees when the task was repeated. This feedback approach could help in training doctors in robotic surgery.
Affiliation(s)
- Jasper A. Laca
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Rafal Kocielnik
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Jessica H. Nguyen
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Jonathan You
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Ryan Tsang
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Elyssa Y. Wong
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA
- Andrew Shtulman
- Thinking Lab, Department of Psychology, Occidental College, Los Angeles, CA, USA
- Anima Anandkumar
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Andrew J. Hung
- Center for Robotic Simulation and Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA. Corresponding author: University of Southern California Institute of Urology, 1441 Eastlake Avenue, Los Angeles, CA 90089, USA. Tel. +1 323 865 3700; Fax +1 323 865 0120.
4
Inouye DA, Ma R, Nguyen JH, Laca J, Kocielnik R, Anandkumar A, Hung AJ. Assessing the efficacy of dissection gestures in robotic surgery. J Robot Surg 2022; 17:597-603. [PMID: 36149590] [DOI: 10.1007/s11701-022-01458-x]
Abstract
Our group previously defined a dissection gesture classification system that deconstructs robotic tissue dissection into its most elemental yet meaningful movements. The purpose of this study was to expand upon this framework by adding an assessment of gesture efficacy (ineffective, effective, or erroneous) and to analyze dissection patterns between groups of surgeons of varying experience. We defined three possible gesture efficacies: ineffective (no meaningful effect on the tissue), effective (intended effect on the tissue), and erroneous (unintended disruption of the tissue). Novices (0 prior robotic cases), intermediates (1-99 cases), and experts (≥ 100 cases) completed a robotic dissection task in a dry-lab training environment. Video recordings were reviewed to classify each gesture and determine its efficacy, then dissection patterns between groups were analyzed. 23 participants completed the task: 9 novices, 8 intermediates with median caseload 60 (IQR 41-80), and 6 experts with median caseload 525 (IQR 413-900). For gesture selection, we found increasing experience associated with an increasing proportion of overall dissection gestures (p = 0.009) and a decreasing proportion of retraction gestures (p = 0.009). For gesture efficacy, novices performed the greatest proportion of ineffective gestures (9.8%, p < 0.001), intermediates committed the greatest proportion of erroneous gestures (26.8%, p < 0.001), and the three groups performed similar proportions of overall effective gestures, though experts performed the greatest proportion of effective retraction gestures (85.6%, p < 0.001). Between experience groups, we found significant differences in gesture selection and gesture efficacy. These relationships may provide insight into further improving surgical training.
Affiliation(s)
- Daniel A Inouye
- Center for Robotic Simulation & Education, Catherine & Joseph Aresty Department of Urology, University of Southern California Institute of Urology, Los Angeles, CA, USA
- Runzhuo Ma
- Center for Robotic Simulation & Education, Catherine & Joseph Aresty Department of Urology, University of Southern California Institute of Urology, Los Angeles, CA, USA
- Jessica H Nguyen
- Center for Robotic Simulation & Education, Catherine & Joseph Aresty Department of Urology, University of Southern California Institute of Urology, Los Angeles, CA, USA
- Jasper Laca
- Center for Robotic Simulation & Education, Catherine & Joseph Aresty Department of Urology, University of Southern California Institute of Urology, Los Angeles, CA, USA
- Rafal Kocielnik
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Anima Anandkumar
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Andrew J Hung
- Center for Robotic Simulation & Education, Catherine & Joseph Aresty Department of Urology, University of Southern California Institute of Urology, Los Angeles, CA, USA
5
Kocaballi AB, Sezgin E, Clark L, Carroll JM, Huang Y, Huh-Yoo J, Kim J, Kocielnik R, Lee YC, Mamykina L, Mitchell EG, Moore RJ, Murali P, Mynatt ED, Park SY, Pasta A, Richards D, Silva LM, Smriti D, Spillane B, Zhang Z, Zubatiy T. Design and Evaluation Challenges of Conversational Agents in Healthcare and Wellbeing. J Med Internet Res 2022; 24:e38525. [DOI: 10.2196/38525]
6
Roberts SI, Cen SY, Nguyen J, Perez LC, Medina LG, Ma R, Marshall S, Kocielnik R, Anandkumar A, Hung AJ. The Relationship of Technical Skills and Cognitive Workload to Errors During Robotic Surgical Exercises. J Endourol 2021; 36:712-720. [PMID: 34913734] [PMCID: PMC9145254] [DOI: 10.1089/end.2021.0790]
Abstract
Purpose We attempt to understand the relationship between surgeon technical skills, cognitive workload, and errors during a simulated robotic dissection task. Materials and Methods Participant surgeons performed a robotic surgery dissection exercise and were grouped based on surgical experience. Technical skills were evaluated using the validated Global Evaluative Assessment of Robotic Skills (GEARS) assessment tool. The dissection task was evaluated for errors during active dissection or passive retraction maneuvers. We quantified the cognitive workload of surgeon participants as an Index of Cognitive Activity (ICA), derived from task-evoked pupillary response metrics; ICA ranged from 0 to 1, with 1 representing maximum ICA. Generalized estimating equations (GEE) were used for all modeling to establish relationships between surgeon technical skills, cognitive workload, and errors. Results We found a strong association between technical skills as measured by multiple GEARS domains (depth perception, force sensitivity, and robotic control) and passive errors, with higher GEARS scores associated with a lower relative risk of errors (all p < 0.01). For novice surgeons, as average GEARS scores increased, the average estimated ICA decreased. In contrast, as average GEARS scores increased for expert surgeons, the average estimated ICA increased. When exhibiting optimal technical skill (maximal GEARS scores), novices and experts reached a similar range of ICA scores (ICA 0.47 and 0.42, respectively). Conclusions This study found that there is an optimal cognitive workload level for surgeons of all experience levels during our robotic surgical exercise. Select technical skill domains were strong predictors of errors. Future research will explore whether an ideal cognitive workload range truly optimizes surgical training and reduces surgical errors.
Affiliation(s)
- Sidney I Roberts
- USC Keck School of Medicine, Urology, Los Angeles, California, United States
- Steven Yong Cen
- University of Southern California, Los Angeles, California, United States
- Jessica Nguyen
- Catherine & Joseph Aresty Department of Urology, University of Southern California, Los Angeles, California, United States
- Laura C Perez
- Catherine & Joseph Aresty Department of Urology, University of Southern California, Los Angeles, California, United States
- Luis G Medina
- Catherine & Joseph Aresty Department of Urology, University of Southern California, Los Angeles, California, United States
- Runzhuo Ma
- Center for Robotic Simulation & Education, Catherine and Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, California, United States
- Sandra Marshall
- Eyetracking, Inc., Solana Beach, California, United States
- Rafal Kocielnik
- California Institute of Technology, Pasadena, California, United States
- Anima Anandkumar
- California Institute of Technology, Pasadena, California, United States
- Andrew J Hung
- Catherine and Joseph Aresty Department of Urology, University of Southern California, 1516 San Pablo St, Los Angeles, CA 90033, United States
7
Kocielnik R, Agapie E, Argyle A, Hsieh DT, Yadav K, Taira B, Hsieh G. HarborBot: A Chatbot for Social Needs Screening. AMIA Annu Symp Proc 2020; 2019:552-561. [PMID: 32308849] [PMCID: PMC7153089]
Abstract
Assessing patients' social needs is a critical challenge at emergency departments (EDs). However, most EDs do not have extra staff to administer screeners, and without personnel administration, response rates are low, especially for patients with low health literacy. To facilitate engagement with such patients, we designed a chatbot, HarborBot, for social needs screening. Through a study with 30 participants, each of whom took a social needs screener both via a traditional survey platform and via HarborBot, we found that the two platforms produced comparable data (equivalent in 87% of the responses). We also found that while participants with high health literacy preferred the traditional survey platform because of its efficiency (allowing them to proceed at their own pace), participants with low health literacy preferred HarborBot because it was more engaging, personal, and understandable. We conclude with a discussion of design implications for chatbots for social needs screening.
8
Affiliation(s)
- Os Keyes
- University of Washington, Seattle, WA, USA
- Gary Hsieh
- University of Washington, Seattle, WA, USA
9
Abstract
Machine learning (ML) has become increasingly influential to human society, yet the primary advancements and applications of ML are driven by research in only a few computational disciplines. Even applications that affect or analyze human behaviors and social structures are often developed with limited input from experts outside of computational fields. Social scientists, experts trained to examine and explain the complexity of human behavior and interactions in the world, have considerable expertise to contribute to the development of ML applications for human-generated data, and their analytic practices could benefit from more human-centered ML methods. Although a few researchers have highlighted some gaps between ML and social sciences [51, 57, 70], most discussions only focus on quantitative methods. Yet many social science disciplines rely heavily on qualitative methods to distill patterns that are challenging to discover through quantitative data. One common analysis method for qualitative data is qualitative coding. In this article, we highlight three challenges of applying ML to qualitative coding. Additionally, we utilize our experience of designing a visual analytics tool for collaborative qualitative coding to demonstrate the potential of using ML to support qualitative coding by shifting the focus to identifying ambiguity. We illustrate dimensions of ambiguity and discuss the relationship between disagreement and ambiguity. Finally, we propose three research directions to ground ML applications for social science as part of the progression toward human-centered machine learning.