1
Kopka M, Napierala H, Privoznik M, Sapunova D, Zhang S, Feufel MA. The RepVig framework for designing use-case specific representative vignettes and evaluating triage accuracy of laypeople and symptom assessment applications. Sci Rep 2024; 14:30614. [PMID: 39715767 DOI: 10.1038/s41598-024-83844-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 12/17/2024] [Indexed: 12/25/2024]
Abstract
Most studies evaluating symptom-assessment applications (SAAs) rely on a common set of case vignettes that are authored by clinicians and devoid of context, which may be representative of clinical settings but not of situations where patients use SAAs. Assuming the use case of self-triage, we used representative design principles to sample case vignettes from online platforms where patients describe their symptoms to obtain professional advice and compared triage performance of laypeople, SAAs (e.g., WebMD or NHS 111), and Large Language Models (LLMs, e.g., GPT-4 or Claude) on representative versus standard vignettes. We found performance differences in all three groups depending on vignette type: When using representative vignettes, accuracy was higher (OR = 1.52 to 2.00, p < .001 to .03 in binary decisions, i.e., correct or incorrect), safety was higher (OR = 1.81 to 3.41, p < .001 to .002 in binary decisions, i.e., safe or unsafe), and the inclination to overtriage was also higher (OR = 1.80 to 2.66, p < .001 to p = .035 in binary decisions, overtriage or undertriage error). Additionally, we found changed rankings of best-performing SAAs and LLMs. Based on these results, we argue that our representative vignette sampling approach (that we call the RepVig Framework) should replace the practice of using a fixed vignette set as standard for SAA evaluation studies.
Affiliation(s)
- Marvin Kopka
  - Division of Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
- Hendrik Napierala
  - Institute of General Practice and Family Medicine, Charité - Universitätsmedizin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Martin Privoznik
  - Emergency and Acute Medicine and Health Services Research in Emergency Medicine, Charité - Universitätsmedizin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Desislava Sapunova
  - Division of Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
- Sizhuo Zhang
  - Division of Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
- Markus A Feufel
  - Division of Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
2
Grossmann I, Rotella A, Hutcherson CA, Sharpinskyi K, Varnum MEW, Achter S, Dhami MK, Guo XE, Kara-Yakoubian M, Mandel DR, Raes L, Tay L, Vie A, Wagner L, Adamkovic M, Arami A, Arriaga P, Bandara K, Baník G, Bartoš F, Baskin E, Bergmeir C, Białek M, Børsting CK, Browne DT, Caruso EM, Chen R, Chie BT, Chopik WJ, Collins RN, Cong CW, Conway LG, Davis M, Day MV, Dhaliwal NA, Durham JD, Dziekan M, Elbaek CT, Shuman E, Fabrykant M, Firat M, Fong GT, Frimer JA, Gallegos JM, Goldberg SB, Gollwitzer A, Goyal J, Graf-Vlachy L, Gronlund SD, Hafenbrädl S, Hartanto A, Hirshberg MJ, Hornsey MJ, Howe PDL, Izadi A, Jaeger B, Kačmár P, Kim YJ, Krenzler R, Lannin DG, Lin HW, Lou NM, Lua VYQ, Lukaszewski AW, Ly AL, Madan CR, Maier M, Majeed NM, March DS, Marsh AA, Misiak M, Myrseth KOR, Napan JM, Nicholas J, Nikolopoulos K, O J, Otterbring T, Paruzel-Czachura M, Pauer S, Protzko J, Raffaelli Q, Ropovik I, Ross RM, Roth Y, Røysamb E, Schnabel L, Schütz A, Seifert M, Sevincer AT, Sherman GT, Simonsson O, Sung MC, Tai CC, Talhelm T, Teachman BA, Tetlock PE, Thomakos D, Tse DCK, Twardus OJ, Tybur JM, Ungar L, Vandermeulen D, Vaughan Williams L, Vosgerichian HA, Wang Q, Wang K, Whiting ME, Wollbrant CE, Yang T, Yogeeswaran K, Yoon S, Alves VR, Andrews-Hanna JR, Bloom PA, Boyles A, Charis L, Choi M, Darling-Hammond S, Ferguson ZE, Kaiser CR, Karg ST, Ortega AL, Mahoney L, Marsh MS, Martinie MFRC, Michaels EK, Millroth P, Naqvi JB, Ng W, Rutledge RB, Slattery P, Smiley AH, Strijbis O, Sznycer D, Tsukayama E, van Loon A, Voelkel JG, Wienk MNA, Wilkening T. Insights into the accuracy of social scientists' forecasts of societal change. Nat Hum Behav 2023; 7:484-501. [PMID: 36759585 PMCID: PMC10192018 DOI: 10.1038/s41562-022-01517-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/26/2022] [Accepted: 12/19/2022] [Indexed: 02/11/2023]
Abstract
How well can social scientists predict societal change, and what processes underlie their predictions? To answer these questions, we ran two forecasting tournaments testing the accuracy of predictions of societal change in domains commonly studied in the social sciences: ideological preferences, political polarization, life satisfaction, sentiment on social media, and gender-career and racial bias. After we provided them with historical trend data on the relevant domain, social scientists submitted pre-registered monthly forecasts for a year (Tournament 1; N = 86 teams and 359 forecasts), with an opportunity to update forecasts on the basis of new data six months later (Tournament 2; N = 120 teams and 546 forecasts). Benchmarking forecasting accuracy revealed that social scientists' forecasts were on average no more accurate than those of simple statistical models (historical means, random walks or linear regressions) or the aggregate forecasts of a sample from the general public (N = 802). However, scientists were more accurate if they had scientific expertise in a prediction domain, were interdisciplinary, used simpler models and based predictions on prior data.