1
|
Analysis of Missingness Scenarios for Observational Health Data. J Pers Med 2024; 14:514. [PMID: 38793096 PMCID: PMC11122060 DOI: 10.3390/jpm14050514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 04/29/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024] Open
Abstract
Despite the extensive literature on missing data theory and cautionary articles emphasizing the importance of realistic analysis for healthcare data, a critical gap persists in incorporating domain knowledge into missing data methods. In this paper, we argue that the remedy is to identify the key scenarios that lead to data missingness and investigate their theoretical implications. Based on this proposal, we first introduce an analysis framework in which we investigate how different observation agents, such as physicians, influence data availability, and then scrutinize each scenario with respect to the steps in the missing data analysis. We apply this framework to the case study of observational data in healthcare facilities. We identify ten fundamental missingness scenarios and show how they influence the identification step for missing data graphical models, inverse probability weighting estimation, and exponential tilting sensitivity analysis. To emphasize how domain-informed analysis can improve method reliability, we conduct simulation studies under the influence of various missingness scenarios. We compare the results of three common methods in medical data analysis: complete-case analysis, MissForest imputation, and inverse probability weighting estimation. The experiments are conducted for two objectives: variable mean estimation and classification accuracy. We advocate for our analysis approach as a reference for observational health data analysis. Beyond that, we also posit that the proposed analysis framework is applicable to other medical domains.
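As a rough illustration of why the abstract contrasts complete-case analysis with inverse probability weighting for mean estimation, the following Python sketch compares the two estimators on a toy dataset. The data, the function names, and the assumption that the observation probabilities are known exactly are all hypothetical; the paper's own simulations are not reproduced here.

```python
def complete_case_mean(values, observed):
    """Mean over the observed entries only (ignores why others are missing)."""
    seen = [v for v, r in zip(values, observed) if r]
    return sum(seen) / len(seen)


def ipw_mean(values, observed, obs_prob):
    """Normalized inverse-probability-weighted mean.

    Each observed value is weighted by 1 / P(observed), so a record that
    was unlikely to be observed also stands in for similar missing records.
    """
    num = sum(v / p for v, r, p in zip(values, observed, obs_prob) if r)
    den = sum(1.0 / p for r, p in zip(observed, obs_prob) if r)
    return num / den


# Hypothetical toy data: four low-value records observed with probability
# 0.25 and four high-value records observed with probability 0.75.
# The true mean over all eight records would be 2.0.
values = [1.0, None, None, None, 3.0, 3.0, None, 3.0]
observed = [1, 0, 0, 0, 1, 1, 0, 1]
obs_prob = [0.25] * 4 + [0.75] * 4

cc = complete_case_mean(values, observed)   # pulled toward the high group
ipw = ipw_mean(values, observed, obs_prob)  # reweighted toward the truth
```

In this toy case the complete-case mean overstates the true mean because high values are observed more often, while the weighted estimate corrects for it. In practice the observation probabilities are unknown and must be estimated, which is where the missingness scenario matters.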
|
2
|
Delineating morbidity patterns in preterm infants at near-term age using a data-driven approach. BMC Pediatr 2024; 24:249. [PMID: 38605404 PMCID: PMC11010410 DOI: 10.1186/s12887-024-04702-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 03/11/2024] [Indexed: 04/13/2024] Open
Abstract
BACKGROUND Long-term survival after premature birth is significantly determined by development of morbidities, primarily affecting the cardio-respiratory or central nervous system. Existing studies are limited to pairwise morbidity associations, thereby lacking a holistic understanding of morbidity co-occurrence and respective risk profiles. METHODS Our study, for the first time, aimed at delineating and characterizing morbidity profiles at near-term age and investigated the most prevalent morbidities in preterm infants: bronchopulmonary dysplasia (BPD), pulmonary hypertension (PH), mild cardiac defects, perinatal brain pathology and retinopathy of prematurity (ROP). For analysis, we employed two independent, prospective cohorts, comprising a total of 530 very preterm infants: AIRR ("Attention to Infants at Respiratory Risks") and NEuroSIS ("Neonatal European Study of Inhaled Steroids"). Using a data-driven strategy, we successfully characterized morbidity profiles of preterm infants in a stepwise approach and (1) quantified pairwise morbidity correlations, (2) assessed the discriminatory power of BPD (complemented by imaging-based structural and functional lung phenotyping) in relation to these morbidities, (3) investigated collective co-occurrence patterns, and (4) identified infant subgroups who share similar morbidity profiles using machine learning techniques. RESULTS First, we showed that, in line with pathophysiologic understanding, BPD and ROP have the highest pairwise correlation, followed by BPD and PH as well as BPD and mild cardiac defects. Second, we revealed that BPD exhibits only limited capacity in discriminating morbidity occurrence, despite its prevalence and clinical indication as a driver of comorbidities. Further, we demonstrated that structural and functional lung phenotyping did not exhibit higher association with morbidity severity than BPD. 
Lastly, we identified patient clusters that share similar morbidity patterns using machine learning in AIRR (n=6 clusters) and NEuroSIS (n=8 clusters). CONCLUSIONS By capturing correlations as well as more complex morbidity relations, we provided a comprehensive characterization of morbidity profiles at discharge, linked to shared disease pathophysiology. Future studies could benefit from identifying risk profiles to thereby develop personalized monitoring strategies. TRIAL REGISTRATION AIRR: DRKS.de, DRKS00004600, 28/01/2013. NEuroSIS: ClinicalTrials.gov, NCT01035190, 18/12/2009.
|
3
|
Assessable and interpretable sensitivity analysis in the pattern graph framework for nonignorable missingness mechanisms. Stat Med 2023; 42:5419-5450. [PMID: 37759370 DOI: 10.1002/sim.9920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 06/12/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
The pattern graph framework solves a wide range of missing data problems with nonignorable mechanisms. However, it faces two challenges of assessability and interpretability, particularly important in safety-critical problems such as clinical diagnosis: (i) How can one assess the validity of the framework's a priori assumption and make necessary adjustments to accommodate known information about the problem? (ii) How can one interpret the process of exponential tilting used for sensitivity analysis in the pattern graph framework and choose the tilt perturbations based on meaningful real-world quantities? In this paper, we introduce Informed Sensitivity Analysis, an extension of the pattern graph framework that enables us to incorporate substantive knowledge about the missingness mechanism. Our extension allows us to examine the validity of assumptions underlying pattern graphs and interpret sensitivity analysis results in terms of realistic problem characteristics. We apply our method to a prevalent nonignorable missing data scenario in clinical research. We validate our method and compare its results with a number of widely used missing data methods, including Unweighted CCA, KNN Imputer, MICE, and MissForest. The validation uses both bootstrapped simulated experiments and real-world clinical observations in the MIMIC-III public dataset.
|
4
|
Correlation Between Early Trends of a Prognostic Biomarker and Overall Survival in Non-Small-Cell Lung Cancer Clinical Trials. JCO Clin Cancer Inform 2023; 7:e2300062. [PMID: 37922432 PMCID: PMC10730042 DOI: 10.1200/cci.23.00062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/13/2023] [Accepted: 09/07/2023] [Indexed: 11/05/2023] Open
Abstract
PURPOSE Overall survival (OS) is the primary end point in phase III oncology trials. Given low success rates, surrogate end points, such as progression-free survival or objective response rate, are used in early go/no-go decision making. Here, we investigate whether early trends of OS prognostic biomarkers, such as the ROPRO and DeepROPRO, can also be used for this purpose. METHODS Using real-world data, we emulated a series of 12 advanced non-small-cell lung cancer (aNSCLC) clinical trials, originally conducted by six different sponsors and evaluating four different mechanisms, in a total of 19,920 individuals. We evaluated early trends (until 6 months) of the OS biomarker alongside early OS within the joint model (JM) framework. Study-level estimates of early OS and ROPRO trends were correlated against the actual final OS hazard ratios (HRs). RESULTS We observed a strong correlation between the JM estimates and final OS HR at 3 months (adjusted R² = 0.88) and at 6 months (adjusted R² = 0.85). In the leave-one-out analysis, there was a low overall prediction error of the OS HR at both 3 months (root-mean-square error [RMSE] = 0.11) and 6 months (RMSE = 0.12). In addition, at 3 months, the absolute prediction error of the OS HR was lower than 0.05 for three trials. CONCLUSION We describe a pipeline to predict trial OS HRs using emulated aNSCLC studies and their early OS and OS biomarker trends. The method has the potential to accelerate and improve decision making in drug development.
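The leave-one-out RMSE reported above can be illustrated with a small Python sketch. It assumes, purely for illustration, that a study-level predictor is mapped to the final hazard ratio by ordinary simple linear regression; the abstract does not state the regression form, and the function names and numbers below are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_rmse(xs, ys):
    """Refit without each study in turn, predict it, and pool the errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((prediction - ys[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical study-level values: early estimate (x) vs. final outcome (y).
early = [0.0, 1.0, 2.0]
final = [0.0, 1.0, 4.0]
rmse = leave_one_out_rmse(early, final)
```

Each held-out study is predicted by a model that never saw it, so the pooled error is a less optimistic measure than the in-sample fit.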
|
5
|
Proteomics reveals antiviral host response and NETosis during acute COVID-19 in high-risk patients. Biochim Biophys Acta Mol Basis Dis 2023; 1869:166592. [PMID: 36328146 PMCID: PMC9622026 DOI: 10.1016/j.bbadis.2022.166592] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/18/2022] [Revised: 10/27/2022] [Accepted: 10/27/2022] [Indexed: 11/05/2022]
Abstract
SARS-CoV-2 remains an acute threat to human health, endangering hospital capacities worldwide. Previous studies have aimed at informing pathophysiologic understanding and identification of disease indicators for risk assessment, monitoring, and therapeutic guidance. While findings start to emerge in the general population, observations in high-risk patients with complex pre-existing conditions are limited. We addressed the gap of existing knowledge with regard to a differentiated understanding of disease dynamics in SARS-CoV-2 infection while specifically considering disease stage and severity. We biomedically characterized quantitative proteomics in a hospitalized cohort of COVID-19 patients with mild to severe symptoms suffering from different (co)-morbidities in comparison to both healthy individuals and patients with non-COVID related inflammation. Deep clinical phenotyping enabled the identification of individual disease trajectories in COVID-19 patients. By the use of the individualized disease phase assignment, proteome analysis revealed a severity dependent general type-2-centered host response side-by-side with a disease specific antiviral immune reaction in early disease. The identification of phenomena such as neutrophil extracellular trap (NET) formation and a pro-coagulatory response characterizing severe disease was successfully validated in a second cohort. Together with the regulation of proteins related to SARS-CoV-2-specific symptoms identified by proteome screening, we not only confirmed results from previous studies but provide novel information for biomarker and therapy development.
|
6
|
Improved Macro- and Micronutrient Supply for Favorable Growth and Metabolomic Profile with Standardized Parenteral Nutrition Solutions for Very Preterm Infants. Nutrients 2022; 14:3912. [PMID: 36235563 PMCID: PMC9572167 DOI: 10.3390/nu14193912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 09/09/2022] [Accepted: 09/10/2022] [Indexed: 11/16/2022] Open
Abstract
Very preterm infants are at high risk for suboptimal nutrition in the first weeks of life leading to insufficient weight gain and complications arising from metabolic imbalances such as insufficient bone mineral accretion. We investigated the use of a novel set of standardized parenteral nutrition (PN; MUC PREPARE) solutions regarding improving nutritional intake, accelerating termination of parenteral feeding, and positively affecting growth in comparison to individually prescribed and compounded PN solutions. We studied the effect of MUC PREPARE on macro- and micronutrient intake, metabolism, and growth in 58 very preterm infants and compared results to a historic reference group of 58 very preterm infants matched for clinical characteristics. Infants receiving MUC PREPARE demonstrated improved macro- and micronutrient intake resulting in balanced electrolyte levels and stable metabolomic profiles. Subsequently, improved energy supply was associated with up to 1.5 weeks earlier termination of parenteral feeding, while simultaneously reaching up to 1.9 times higher weight gain at day 28 in extremely immature infants (<27 GA weeks) as well as overall improved growth at 2 years of age for all infants. The use of the new standardized PN solution MUC PREPARE improved nutritional supply and short- and long-term growth and reduced PN duration in very preterm infants and is considered a superior therapeutic strategy.
|
7
|
Leveraging Machine Learning to Predict 30-Day Hospital Readmission after Cardiac Surgery. Ann Thorac Surg 2021; 114:2173-2179. [PMID: 34890575 DOI: 10.1016/j.athoracsur.2021.11.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2021] [Revised: 10/15/2021] [Accepted: 11/15/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND Hospital readmission within 30 days of discharge is a well-studied outcome. Predicting readmission after cardiac surgery, however, is notoriously challenging; the best-performing models in the literature have AUCs around .65. A reliable predictive model would enable clinicians to identify patients at risk for readmission and develop prevention strategies. METHODS We analyzed our institution's STS Adult Cardiac Surgery Database (STS), augmented with electronic medical record (EMR) data. Predictors included demographics, pre-operative comorbidities, proxies for intra-operative risk, indicators of post-operative complications, and timeseries-derived variables. We trained several machine learning models, evaluating each on a held-out test set. RESULTS Our analysis cohort consisted of 4,924 cases from 2011-2016, of which 723 (14.7%) were readmitted within 30 days of discharge. Our models included 141 STS-derived and 24 EMR-derived variables. A random forest model performed best, with test AUC .76 (95% CI: .73, .79). Using exclusively pre-operative variables, as in STS calculated risk scores, degraded the AUC to .64 (95% CI: .60, .68). Key predictors included length of stay (12.5x more important than the average variable) and whether the patient was discharged to a rehab facility (11.2x). CONCLUSIONS Our approach, augmenting STS variables with EMR data and employing flexible machine learning modeling, yielded state-of-the-art performance for predicting 30-day readmission. Separately, the importance of variables not directly related to in-patient care, such as discharge location, amplifies questions about the efficacy of assessing care quality via readmissions.
|
8
|
Artificial Intelligence for Prognostic Scores in Oncology: a Benchmarking Study. Front Artif Intell 2021; 4:625573. [PMID: 33937744 PMCID: PMC8086599 DOI: 10.3389/frai.2021.625573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/03/2020] [Accepted: 01/19/2021] [Indexed: 01/22/2023] Open
Abstract
Introduction: Prognostic scores are important tools in oncology to facilitate clinical decision-making based on patient characteristics. To date, classic survival analysis using Cox proportional hazards regression has been employed in the development of these prognostic scores. With the advance of analytical models, this study aimed to determine if more complex machine-learning algorithms could outperform classical survival analysis methods. Methods: In this benchmarking study, two datasets were used to develop and compare different prognostic models for overall survival in pan-cancer populations: a nationwide EHR-derived de-identified database for training and in-sample testing and the OAK (phase III clinical trial) dataset for out-of-sample testing. A real-world database comprised 136K first-line treated cancer patients across multiple cancer types and was split into a 90% training and 10% testing dataset, respectively. The OAK dataset comprised 1,187 patients diagnosed with non-small cell lung cancer. To assess the effect of the covariate number on prognostic performance, we formed three feature sets with 27, 44 and 88 covariates. In terms of methods, we benchmarked ROPRO, a prognostic score based on the Cox model, against eight complex machine-learning models: regularized Cox, Random Survival Forests (RSF), Gradient Boosting (GB), DeepSurv (DS), Autoencoder (AE) and Super Learner (SL). The C-index was used as the performance metric to compare different models. Results: For in-sample testing on the real-world database the resulting C-index [95% CI] values for RSF 0.720 [0.716, 0.725], GB 0.722 [0.718, 0.727], DS 0.721 [0.717, 0.726] and lastly, SL 0.723 [0.718, 0.728] showed significantly better performance as compared to ROPRO 0.701 [0.696, 0.706]. Similar results were derived across all feature sets. However, for the out-of-sample validation on OAK, the stronger performance of the more complex models was not apparent anymore. 
Consistently, the increase in the number of prognostic covariates did not lead to an increase in model performance. Discussion: The stronger performance of the more complex models did not generalize when applied to an out-of-sample dataset. We hypothesize that future research may benefit by adding multimodal data to exploit advantages of more complex models.
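For readers unfamiliar with the C-index used as the performance metric above, here is a minimal Python sketch of Harrell's concordance index for right-censored survival data. This is a generic textbook formulation with hypothetical toy inputs, not the benchmarking code of the study, and it ignores refinements such as tied event times.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the model gave i the higher risk score; score ties count half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: follow-up time, event indicator (1 = death
# observed, 0 = censored), and a model's risk score per patient.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
perfect = concordance_index(times, events, [3.0, 2.0, 1.0])
reversed_ranking = concordance_index(times, events, [1.0, 2.0, 3.0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported differences between roughly 0.70 and 0.72 into perspective.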
|
9
|
Association Between Surgical Trainee Daytime Sleepiness and Intraoperative Technical Skill When Performing Septoplasty. JAMA FACIAL PLAST SU 2020; 21:104-109. [PMID: 30325993 DOI: 10.1001/jamafacial.2018.1171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
Importance Daytime sleepiness in surgical trainees can impair intraoperative technical skill and thus affect their learning and pose a risk to patient safety. Objective To determine the association between daytime sleepiness of surgeons in residency and fellowship training and their intraoperative technical skill during septoplasty. Design, Setting, and Participants This prospective cohort study included 19 surgical trainees in otolaryngology-head and neck surgery programs at 2 academic institutions (Johns Hopkins University School of Medicine and MedStar Georgetown University Hospital). The physicians were recruited from June 13, 2016, to April 20, 2018. The analysis includes data that were captured between June 27, 2016, and April 20, 2018. Main Outcomes and Measures Attending physician and surgical trainee self-rated intraoperative technical skill using the Septoplasty Global Assessment Tool (SGAT) and visual analog scales. Daytime sleepiness reported by surgical trainees was measured using the Epworth Sleepiness Scale (ESS). Results Of 19 surgical trainees, 17 resident physicians (9 female [53%]) and 2 facial plastic surgery fellowship physicians (1 female and 1 male) performed a median of 3.00 septoplasty procedures (range, 1-9 procedures) under supervision by an attending physician. Of the 19 surgical trainees, 10 (53%) were aged 25 to 30 years and 9 (47%) were 31 years or older. The mean ESS score overall was 6.74 (95% CI, 5.96-7.52), and this score did not differ between female and male trainees. The mean ESS score was 7.57 (95% CI, 6.58-8.56) in trainees aged 25 to 30 years and 5.44 (95% CI, 4.32-6.57) in trainees aged 31 years or older. 
In regression models adjusted for sex, age, postgraduate year, and technical complexity of the procedure, there was a statistically significant inverse association between ESS scores and attending physician-rated technical skill for both SGAT (-0.41; 95% CI, -0.55 to -0.27; P < .001) and the visual analog scale (-0.75; 95% CI, -1.40 to -0.07; P = .03). The association between ESS scores and technical skill was not statistically significant for trainee self-rated SGAT (0.04; 95% CI, -0.17 to 0.24; P = .73) and the self-rated visual analog scale (0.19; 95% CI, -0.79 to 1.2; P = .70). Conclusions and Relevance The findings suggest that daytime sleepiness of surgical trainees is inversely associated with attending physician-rated intraoperative technical skill when performing septoplasty. Thus, surgical trainees' ability to learn technical skill in the operating room may be influenced by their daytime sleepiness. Level of Evidence NA.
|
10
|
P5706 Finding predictors and causes of cardiac surgery ICU readmission using machine learning and causal inference. Eur Heart J 2019. [DOI: 10.1093/eurheartj/ehz746.0647] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Abstract
Background
“Bounce-back” to the intensive care unit (ICU) occurs when patients return to the ICU for critical changes in clinical status within the same hospital admission. Bounce-backs after cardiac surgery increase resource utilisation and total cost of care, and are associated with higher mortality and morbidity. However, prediction of bounce-back has proved to be challenging. Previous work addressed the feasibility of predicting bounce-back, but these models required significant physician input to design and calibrate the predictive variables.
Purpose
We aimed to develop an automated machine learning model that would identify patients at risk of bounce-back by selecting the most relevant variables from those available before onset of bounce-back. Additionally, we highlight the differences between predictive and causal inference, to demonstrate that purely associative methods of prediction can mislead clinical decision-making.
Methods
Clinical records of adult cardiac surgery patients between 2011 and 2016 were collected from our institutional Society of Thoracic Surgeons (STS) database and our institutional electronic health record (EHR) system. For bounce-back prediction, an L1 regularised logistic regression model was applied, which also automatically determined the important variables with the highest prediction effect from the initial 151 variables. For causal inference, the g-computation algorithm was used to compare the differences between causal and predictive regression effects. We quantified the performance of our system on clinically relevant metrics such as specificity, sensitivity, and area under the ROC curve (AUC).
Results
Of the 6189 patients, 357 (5.7%) bounced back to the ICU. The prediction model achieved an AUC score of 0.75 (0.03) and 22% specificity at 95% sensitivity. Further analysis showed that 79% of the false positive patients had faced other severe postoperative complications, but none of the false negative patients had downstream complications. Subsequent causal analysis revealed that the actual causal effects of treatments differed from the predictive model estimates, e.g. administration of intra-operative tranexamic acid increased the probability of bounce-back by 13% but its causal effect on bounce-back after removing confounders was negligible (an increase of only 0.5%).
Conclusions
Our predictive machine-learning model can successfully predict patients at risk of ICU bounce-backs, using STS registry data linked with the comprehensive electronic health record. The prediction model automatically detects an important subset of variables. In addition, we note that causal and predictive model estimates of the same parameters differed, indicating that reliance on predictive models for interventional clinical decision-making may not be appropriate.
Acknowledgement/Funding
National Institutes of Health, Office of Naval Research, Defense Advanced Research Projects Agency
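The "specificity at 95% sensitivity" figure reported in the Results can be read as an operating point on the ROC curve. The Python sketch below shows one plausible way to compute such a figure from predicted scores; the thresholding rule, function name, and toy data are assumptions for illustration, not the authors' procedure.

```python
import math


def specificity_at_sensitivity(labels, scores, target_sensitivity=0.95):
    """Pick the highest threshold reaching the target sensitivity.

    Patients with score >= threshold are flagged as at risk. Returns the
    specificity achieved at that threshold together with the threshold.
    """
    positives = sorted(
        (s for y, s in zip(labels, scores) if y == 1), reverse=True
    )
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Number of true cases that must be flagged; the small epsilon guards
    # against floating-point noise in the product.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    threshold = positives[needed - 1]
    true_negatives = sum(1 for s in negatives if s < threshold)
    return true_negatives / len(negatives), threshold


# Hypothetical toy data: 1 = bounced back to the ICU, 0 = did not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05]

spec_75, thr_75 = specificity_at_sensitivity(labels, scores, 0.75)
spec_100, thr_100 = specificity_at_sensitivity(labels, scores, 1.0)
```

Demanding that nearly every true case be caught forces a low threshold, which is why specificity can drop sharply at high sensitivity, as in the 22% figure above.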
|
11
|
Abstract
OBJECTIVE State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. METHODS In this paper, we address two major problems for surgical data analysis: First, lack of uniform-shared datasets and benchmarks, and second, lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model, sparse hidden Markov model (HMM), Markov semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based ones that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. RESULTS Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. CONCLUSION Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. SIGNIFICANCE The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
|
12
|
Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment. JOURNAL OF SURGICAL EDUCATION 2016; 73:482-489. [PMID: 26896147 DOI: 10.1016/j.jsurg.2015.11.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 05/22/2015] [Revised: 09/21/2015] [Accepted: 11/08/2015] [Indexed: 06/05/2023]
Abstract
OBJECTIVE Task-level metrics of time and motion efficiency are valid measures of surgical technical skill. Metrics may be computed for segments (maneuvers and gestures) within a task after hierarchical task decomposition. Our objective was to compare task-level and segment (maneuver and gesture)-level metrics for surgical technical skill assessment. DESIGN Our analyses include predictive modeling using data from a prospective cohort study. We used a hierarchical semantic vocabulary to segment a simple surgical task of passing a needle across an incision and tying a surgeon's knot into maneuvers and gestures. We computed time, path length, and movements for the task, maneuvers, and gestures using tool motion data. We fit logistic regression models to predict experience-based skill using the quantitative metrics. We compared the area under a receiver operating characteristic curve (AUC) for task-level, maneuver-level, and gesture-level models. SETTING Robotic surgical skills training laboratory. PARTICIPANTS In total, 4 faculty surgeons with experience in robotic surgery and 14 trainee surgeons with no or minimal experience in robotic surgery. RESULTS Experts performed the task in shorter time (49.74s; 95% CI = 43.27-56.21 vs. 81.97; 95% CI = 69.71-94.22), with shorter path length (1.63m; 95% CI = 1.49-1.76 vs. 2.23; 95% CI = 1.91-2.56), and with fewer movements (429.25; 95% CI = 383.80-474.70 vs. 728.69; 95% CI = 631.84-825.54) than novices. Experts differed from novices on metrics for individual maneuvers and gestures. The AUCs were 0.79; 95% CI = 0.62-0.97 for task-level models, 0.78; 95% CI = 0.6-0.96 for maneuver-level models, and 0.7; 95% CI = 0.44-0.97 for gesture-level models. There was no statistically significant difference in AUC between task-level and maneuver-level (p = 0.7) or gesture-level models (p = 0.17). 
CONCLUSIONS Maneuver-level and gesture-level metrics are discriminative of surgical skill and can be used to provide targeted feedback to surgical trainees.
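To make the distinction between task-level and segment-level metrics concrete, the following Python sketch computes path length from sampled tool-tip positions, once for the whole task and once per labelled segment. The segment labels, coordinates, and the choice to leave the step across a segment boundary out of both segments are illustrative assumptions, not the study's definitions.

```python
import math
from itertools import groupby


def path_length(positions):
    """Total distance travelled along consecutive 3D positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def segment_path_lengths(positions, labels):
    """Path length within each run of identically labelled samples.

    The step that crosses a segment boundary is not attributed to
    either segment in this sketch.
    """
    results = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segment = positions[start:start + length]
        results.append((label, path_length(segment)))
        start += length
    return results


# Hypothetical tool-tip samples (metres) with hypothetical gesture labels.
positions = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 3, 0)]
labels = ["reach", "reach", "reach", "pull", "pull"]

task_level = path_length(positions)
segment_level = segment_path_lengths(positions, labels)
```

Segment-level values localize where along the task the extra motion occurs, which is what makes targeted feedback possible.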
|
13
|
Analysis of the Structure of Surgical Activity for a Suturing and Knot-Tying Task. PLoS One 2016; 11:e0149174. [PMID: 26950551 PMCID: PMC4780814 DOI: 10.1371/journal.pone.0149174] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/10/2015] [Accepted: 01/07/2016] [Indexed: 11/17/2022] Open
Abstract
Background Surgical tasks are performed in a sequence of steps, and technical skill evaluation includes assessing task flow efficiency. Our objective was to describe differences in task flow for expert and novice surgeons for a basic surgical task. Methods We used a hierarchical semantic vocabulary to decompose and annotate maneuvers and gestures for 135 instances of a surgeon’s knot performed by 18 surgeons. We compared counts of maneuvers and gestures, and analyzed task flow by skill level. Results Experts used fewer gestures to perform the task (26.29; 95% CI = 25.21 to 27.38 for experts vs. 31.30; 95% CI = 29.05 to 33.55 for novices) and made fewer errors in gestures than novices (1.00; 95% CI = 0.61 to 1.39 vs. 2.84; 95% CI = 2.3 to 3.37). Transitions among maneuvers, and among gestures within each maneuver for expert trials were more predictable than novice trials. Conclusions Activity segments and state flow transitions within a basic surgical task differ by surgical skill level, and can be used to provide targeted feedback to surgical trainees.
|
14
|
An objective and automated method for assessing surgical skill in endoscopic sinus surgery using eye-tracking and tool-motion data. Int Forum Allergy Rhinol 2012; 2:507-15. [PMID: 22696449 DOI: 10.1002/alr.21053] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 05/08/2011] [Revised: 03/08/2012] [Accepted: 03/31/2012] [Indexed: 11/07/2022]
Abstract
BACKGROUND Assessment of surgical skill plays a crucial role in determining competency, monitoring educational programs, and providing trainee feedback. With the changing health care environment, it will likely play an important role in credentialing and maintenance of certification. The ideal skill assessment tool should be unbiased, objective, and accurate. We hypothesize that tool-motion data (how a surgeon moves his/her instruments) and eye-gaze data (what a surgeon looks at when he/she operates) contain sufficient information to quantitatively and objectively evaluate surgical skill. We investigate this hypothesis by developing a statistical model of surgery and testing the model experimentally in the context of endoscopic sinus surgery (ESS). METHODS A total of 378 trials were recorded from 7 expert and 13 novice surgeons while they were performing a series of 9 different ESS tasks. Data were collected using an electromagnetic tracker to record the surgeon's tool and endoscope motions. In addition, the location of the surgeon's eye gaze was recorded using an infrared eye tracker camera. These data were fit to the statistical model and used to test the accuracy of skill assessment. RESULTS The skill of expert surgeons was identified correctly for 94.6% of tasks. For surgeries performed by novice surgeons, the proposed model recognized the skill level with 88.6% accuracy. CONCLUSION We present an objective and unbiased method for assessing the skill of endoscopic sinus surgeons. Experimental results show that the proposed method successfully identifies the skill levels of both expert and novice surgeons.
|
15
|
Surgical Task and Skill Classification from Eye Tracking and Tool Motion in Minimally Invasive Surgery. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION – MICCAI 2010 2010; 13:295-302. [DOI: 10.1007/978-3-642-15711-0_37] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/05/2022]
|
16
|
Intra-operative Localization of Brachytherapy Implants Using Intensity-based Registration. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2009; 7261. [PMID: 21152376 DOI: 10.1117/12.812447] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
In prostate brachytherapy, a transrectal ultrasound (TRUS) will show the prostate boundary but not all the implanted seeds, while fluoroscopy will show all the seeds clearly but not the boundary. We propose an intensity-based registration between TRUS images and the implant reconstructed from fluoroscopy as a means of achieving accurate intra-operative dosimetry. The TRUS images are first filtered and compounded, and then registered to the fluoroscopy model via mutual information. A training phantom was implanted with 48 seeds and imaged. Various ultrasound filtering techniques were analyzed, and the best results were achieved with the Bayesian combination of adaptive thresholding, phase congruency, and compensation for the non-uniform ultrasound beam profile in the elevation and lateral directions. The average registration error between corresponding seeds relative to the ground truth was 0.78 mm. The effects of false positives and false negatives in ultrasound were investigated by masking true seeds in the fluoroscopy volume or adding false seeds. The registration error remained below 1.01 mm when the false positive rate was 31%, and 0.96 mm when the false negative rate was 31%. This fully automated method delivers excellent registration accuracy and robustness in phantom studies, and promises to demonstrate clinically adequate performance on human data as well. Keywords: Prostate brachytherapy, Ultrasound, Fluoroscopy, Registration.
|