1. Hasan E, Duhaime E, Trueblood JS. Boosting wisdom of the crowd for medical image annotation using training performance and task features. Cogn Res Princ Implic 2024; 9:31. [PMID: 38763994 PMCID: PMC11102897 DOI: 10.1186/s41235-024-00558-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 04/29/2024] [Indexed: 05/21/2024]
Abstract
A crucial bottleneck in medical artificial intelligence (AI) is the shortage of high-quality labeled medical datasets. In this paper, we test a large variety of wisdom of the crowd algorithms to label medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into 7 different categories. There was a large dispersion in the geographical location, experience, training, and performance of the recruited individuals. We tested several wisdom of the crowd algorithms of varying complexity, from a simple unweighted average to more complex Bayesian models that account for individual patterns of errors. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and taking into account the task environment. These algorithms far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI.
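The aggregation ideas named in this abstract (unweighted voting, weighting by training accuracy, and selecting top performers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `aggregate_labels`, the rater IDs, the accuracy values, and the example labels are all hypothetical, and the Bayesian models mentioned in the abstract are not covered.

```python
from collections import defaultdict


def aggregate_labels(votes, training_accuracy, top_k=None):
    """Combine crowd labels for a single image (illustrative only).

    votes: dict mapping rater id -> label chosen for this image
    training_accuracy: dict mapping rater id -> accuracy on training items
    top_k: if given, keep only the k raters with the highest training accuracy
    """
    # Rank raters by training accuracy, best first.
    raters = sorted(votes, key=lambda r: training_accuracy[r], reverse=True)
    if top_k is not None:
        raters = raters[:top_k]

    # Each retained rater adds their training accuracy to the chosen label.
    scores = defaultdict(float)
    for r in raters:
        scores[votes[r]] += training_accuracy[r]

    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical data: five raters labelling one lesion image.
votes = {"a": "nevus", "b": "nevus", "c": "melanoma", "d": "melanoma", "e": "nevus"}
accuracy = {"a": 0.55, "b": 0.50, "c": 0.90, "d": 0.85, "e": 0.52}
uniform = {r: 1.0 for r in votes}

unweighted = aggregate_labels(votes, uniform)          # plain majority vote
weighted = aggregate_labels(votes, accuracy)           # accuracy-weighted vote
top_two = aggregate_labels(votes, accuracy, top_k=2)   # top performers only
```

In this toy case the plain majority and the accuracy-weighted vote disagree, which is the kind of situation where weighting by training performance can change the crowd's answer.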
Affiliation(s)
- Eeshan Hasan
  - Department of Psychological and Brain Sciences, Indiana University, 1101 E. 10th St., Bloomington, IN, 47405-7007, USA
  - Cognitive Science Program, Indiana University, Bloomington, USA
- Jennifer S Trueblood
  - Department of Psychological and Brain Sciences, Indiana University, 1101 E. 10th St., Bloomington, IN, 47405-7007, USA
  - Cognitive Science Program, Indiana University, Bloomington, USA
2. Friche P, Moulis L, Du Thanh A, Dereure O, Duflos C, Carbonnel F. Training Family Medicine Residents in Dermoscopy Using an e-Learning Course: Pilot Interventional Study. JMIR Form Res 2024; 8:e56005. [PMID: 38739910 DOI: 10.2196/56005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 02/05/2024] [Accepted: 03/20/2024] [Indexed: 05/16/2024]
Abstract
BACKGROUND Skin cancers are the most common group of cancers diagnosed worldwide. Aging and sun exposure increase their risk. The decline in the number of dermatologists is pushing the issue of dermatological screening back onto family doctors. Dermoscopy is an easy-to-use tool that increases the sensitivity of melanoma diagnosis by 60% to 90%, but its use is limited due to lack of training. The characteristics of "ideal" dermoscopy training have yet to be established. We created a Moodle (Moodle HQ)-based e-learning course to train family medicine residents in dermoscopy.
OBJECTIVE This study aimed to evaluate the evolution of dermoscopy knowledge among family doctors immediately after e-learning training and at 1 and 3 months afterward.
METHODS We conducted a prospective interventional study between April and November 2020 to evaluate an educational program intended for family medicine residents at the University of Montpellier-Nîmes, France. They were asked to complete an e-learning course consisting of 2 modules, with an assessment quiz repeated at 1 month (M1) and 3 months (M3). The course was based on a 2-step algorithm, an internationally accepted method of dermoscopic analysis of pigmented skin lesions. The objectives of modules 1 and 2 were to differentiate melanocytic lesions from nonmelanocytic lesions and to precisely identify skin lesions by looking for dermoscopic morphological criteria specific to each lesion. Each module consisted of 15 questions with immediate feedback after each question.
RESULTS In total, 134 residents were included, and 66.4% (n=89) and 47% (n=63) of trainees fully participated in the evaluation of module 1 and module 2, respectively. This study showed a significant score improvement 3 months after the training course in 92.1% (n=82) of participants for module 1 and 87.3% (n=55) of participants for module 2 (P<.001). The majority of the participants expressed satisfaction (n=48, 90.6%) with the training course, and 96.3% (n=51) planned to use a dermatoscope in their future practice. Regarding final scores, the only statistically significant variable was the resident's initial score (P=.003) for module 1. No measured variable was found to be associated with retention (midtraining or final evaluation) for module 2. Residents who had completed at least 1 dermatology rotation during medical school had significantly higher initial scores in module 1 at M0 (P=.03). Residents who reported having completed at least 1 dermatology rotation during their family medicine training had significantly higher scores at M1 for module 1 and at M3 for module 2 (P=.01 and P=.001).
CONCLUSIONS The integration of an e-learning training course in dermoscopy into the curriculum of family medicine residents results in a significant improvement in their diagnostic skills and meets their expectations. Developing a program combining an e-learning course and face-to-face training for residents is likely to result in more frequent and effective dermoscopy use by family doctors.
Affiliation(s)
- Pauline Friche
  - University Department of Family Medicine, University of Montpellier, Montpellier, France
- Lionel Moulis
  - Clinical Research and Epidemiology Unit, Department of Public Health, Montpellier University Hospital, Montpellier, France
  - Pathogenesis and Control of Chronic and Emerging Infections, University of Montpellier, Institut national de la santé et de la recherche médicale, Etablissement français du sang, University of Antilles, Montpellier, France
- Aurélie Du Thanh
  - Pathogenesis and Control of Chronic and Emerging Infections, University of Montpellier, Institut national de la santé et de la recherche médicale, Etablissement français du sang, University of Antilles, Montpellier, France
  - Department of Dermatology, Montpellier University Hospital, Montpellier, France
  - Department of Dermatology, University of Montpellier, Montpellier, France
- Olivier Dereure
  - Department of Dermatology, Montpellier University Hospital, Montpellier, France
  - Department of Dermatology, University of Montpellier, Montpellier, France
- Claire Duflos
  - Clinical Research and Epidemiology Unit, Department of Public Health, Montpellier University Hospital, Montpellier, France
  - Department of Public Health, University of Montpellier, Montpellier, France
- Francois Carbonnel
  - University Department of Family Medicine, University of Montpellier, Montpellier, France
  - Desbrest Institute of Epidemiology and Public Health, Unité Mixte de Recherche, Unité d'accueil 11, University of Montpellier, Institut national de la santé et de la recherche médicale, Montpellier, France
  - University Multiprofessional Health Center Avicenne, Montpellier, France
3. Skinner G, Chen T, Jentis G, Liu Y, McCulloh C, Harzman A, Huang E, Kalady M, Kim P. Real-time near infrared artificial intelligence using scalable non-expert crowdsourcing in colorectal surgery. NPJ Digit Med 2024; 7:99. [PMID: 38649447 PMCID: PMC11035672 DOI: 10.1038/s41746-024-01095-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 03/29/2024] [Indexed: 04/25/2024]
Abstract
Surgical artificial intelligence (AI) has the potential to improve patient safety and clinical outcomes. To date, training such AI models to identify tissue anatomy requires annotations by expensive and rate-limiting surgical domain experts. Herein, we demonstrate and validate a methodology to obtain high-quality surgical tissue annotations through crowdsourcing of non-experts, and real-time deployment of a multimodal surgical anatomy AI model in colorectal surgery.
Affiliation(s)
- Garrett Skinner
  - Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
  - Activ Surgical, University at Buffalo, Buffalo, NY, USA
- Tina Chen
  - Activ Surgical, University at Buffalo, Buffalo, NY, USA
- Yao Liu
  - Activ Surgical, University at Buffalo, Buffalo, NY, USA
  - Warren Alpert Medical School of Brown University, Providence, RI, USA
- Alan Harzman
  - The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Emily Huang
  - The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Matthew Kalady
  - The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Peter Kim
  - Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
  - Activ Surgical, University at Buffalo, Buffalo, NY, USA
4. McNeil AJ, Parks K, Liu X, Jiang B, Coco J, McCool K, Fabbri D, Duhaime EP, Dawant BM, Tkaczyk ER. Crowdsourcing Skin Demarcations of Chronic Graft-Versus-Host Disease in Patient Photographs: Training Versus Performance Study. JMIR Dermatology 2023; 6:e48589. [PMID: 38147369 PMCID: PMC10777279 DOI: 10.2196/48589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 10/02/2023] [Accepted: 10/24/2023] [Indexed: 12/27/2023]
Abstract
BACKGROUND Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from nonexpert participants has been used for numerous medical applications, including image labeling and segmentation tasks.
OBJECTIVE This study aimed to assess the ability of crowds of nonexpert raters (individuals without any prior training in identifying or marking cGVHD) to demarcate photos of cGVHD-affected skin. We also studied the effect of training and feedback on crowd performance.
METHODS Using a Canfield Vectra H1 3D camera, 360 photographs of the skin of 36 patients with cGVHD were taken. Ground truth demarcations were provided in 3D by a trained expert and reviewed by a board-certified dermatologist. In total, 3000 2D images (projections from various angles) were created for crowd demarcation through the DiagnosUs mobile app. Raters were split into high and low feedback groups. The performances of 4 different crowds of nonexperts were analyzed, including 17 raters per image for the low and high feedback groups, 32-35 raters per image for the low feedback group, and the top 5 performers for each image from the low feedback group.
RESULTS Across 8 demarcation competitions, 130 raters were recruited to the high feedback group and 161 to the low feedback group. This resulted in a total of 54,887 individual demarcations from the high feedback group and 78,967 from the low feedback group. The nonexpert crowds achieved good overall performance for segmenting cGVHD-affected skin with minimal training, achieving a median surface area error of less than 12% of skin pixels for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly worse than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters from the low feedback group for each image recovered a performance similar to that of the high feedback crowd. Higher variability between raters for a given image was not found to correlate with lower performance of the crowd consensus demarcation and therefore cannot be used as a measure of reliability. No significant learning was observed during the task as more photos and feedback were seen.
CONCLUSIONS Crowds of nonexpert raters can demarcate cGVHD images with good overall performance. Tracking the top 5 most reliable raters provided optimal results, obtaining the best performance with the lowest number of expert demarcations required for adequate training. However, the agreement amongst individual nonexperts does not help predict whether the crowd has provided an accurate result. Future work should explore the performance of crowdsourcing in standard clinical photos and further methods to estimate the reliability of consensus demarcations.
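As a rough illustration of what a crowd consensus demarcation and a surface area error could look like, here is a minimal Python sketch. It rests on assumptions the abstract does not confirm: the consensus is taken as a pixel-wise majority vote over binary masks, and the error is read as the absolute difference in marked area divided by the total pixel count. The function names and the tiny example masks are hypothetical.

```python
def consensus_mask(masks, threshold=0.5):
    """Pixel-wise vote over binary masks (lists of rows of 0/1).

    A pixel is marked as affected when the fraction of raters who
    marked it is strictly greater than `threshold`.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[i][j] for m in masks) / n > threshold else 0
         for j in range(cols)]
        for i in range(rows)
    ]


def surface_area_error(pred, truth):
    """Absolute difference in marked area, as a fraction of all pixels.

    Assumed definition for illustration; the study may define it differently.
    """
    total = sum(len(row) for row in truth)
    pred_area = sum(sum(row) for row in pred)
    true_area = sum(sum(row) for row in truth)
    return abs(pred_area - true_area) / total


# Hypothetical 2x3 masks from three raters, plus an expert ground truth.
rater_masks = [
    [[1, 1, 0], [0, 0, 0]],
    [[1, 1, 1], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0]],
]
ground_truth = [[1, 1, 1], [0, 0, 0]]

consensus = consensus_mask(rater_masks)
error = surface_area_error(consensus, ground_truth)
```

Restricting `rater_masks` to a subset of raters judged most reliable on training images would mimic the "top 5 performers" crowd described in the abstract, under the same assumptions.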
Affiliation(s)
- Andrew J McNeil
  - Dermatology Service and Research Service, Department of Veterans Affairs, Tennessee Valley Healthcare System, Nashville, TN, United States
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, United States
  - Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, United States
- Kelsey Parks
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, United States
- Xiaoqi Liu
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, United States
  - Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, United States
- Bohan Jiang
  - Dermatology Service and Research Service, Department of Veterans Affairs, Tennessee Valley Healthcare System, Nashville, TN, United States
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, United States
  - Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, United States
- Joseph Coco
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Daniel Fabbri
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Benoit M Dawant
  - Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, United States
- Eric R Tkaczyk
  - Dermatology Service and Research Service, Department of Veterans Affairs, Tennessee Valley Healthcare System, Nashville, TN, United States
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, United States
  - Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, United States
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States