1
Woelfle T, Hirt J, Janiaud P, Kappos L, Ioannidis JPA, Hemkens LG. Benchmarking Human-AI collaboration for common evidence appraisal tools. J Clin Epidemiol 2024; 175:111533. [PMID: 39277058 DOI: 10.1016/j.jclinepi.2024.111533] [Received: 05/14/2024] [Revised: 06/26/2024] [Accepted: 09/09/2024] [Indexed: 09/17/2024]
Abstract
BACKGROUND AND OBJECTIVE It is unknown whether large language models (LLMs) may facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus in appraisal of scientific reporting (Preferred Reporting Items for Systematic reviews and Meta-Analyses [PRISMA]) and methodological rigor (A MeaSurement Tool to Assess systematic Reviews [AMSTAR]) of systematic reviews and design of clinical trials (PRagmatic Explanatory Continuum Indicator Summary 2 [PRECIS-2]) and to identify areas where collaboration between humans and artificial intelligence (AI) would outperform the traditional consensus process of human raters in efficiency. STUDY DESIGN AND SETTING Five LLMs (Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B) assessed 112 systematic reviews applying the PRISMA and AMSTAR criteria and 56 randomized controlled trials applying PRECIS-2. We quantified the agreement between human consensus and (1) individual human raters; (2) individual LLMs; (3) a combined LLMs approach; (4) human-AI collaboration. Ratings were marked as deferred (undecided) in case of inconsistency between combined LLMs or between the human rater and the LLM. RESULTS Individual human rater accuracy was 89% for PRISMA and AMSTAR, and 75% for PRECIS-2. Individual LLM accuracy ranged from 63% (GPT-3.5) to 70% (Claude-3-Opus) for PRISMA, 53% (GPT-3.5) to 74% (Claude-3-Opus) for AMSTAR, and 38% (GPT-4) to 55% (GPT-3.5) for PRECIS-2. Combined LLM ratings led to accuracies of 75%-88% for PRISMA (4%-74% deferred), 74%-89% for AMSTAR (6%-84% deferred), and 64%-79% for PRECIS-2 (29%-88% deferred). Human-AI collaboration resulted in the best accuracies: 89%-96% for PRISMA (25/35% deferred), 91%-95% for AMSTAR (27/30% deferred), and 80%-86% for PRECIS-2 (76/71% deferred). CONCLUSION Current LLMs alone appraised evidence worse than humans. Human-AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.
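The deferral rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy ratings, and in particular the choice to compute accuracy only over non-deferred ratings are assumptions made for illustration; the abstract does not state how accuracy was calculated.

```python
from typing import List, Optional, Sequence, Tuple

DEFERRED = None  # hypothetical sentinel for an "undecided" rating


def combine(rater_a: Sequence[str], rater_b: Sequence[str]) -> List[Optional[str]]:
    """Keep a rating where both raters agree; mark it deferred otherwise."""
    return [a if a == b else DEFERRED for a, b in zip(rater_a, rater_b)]


def summarize(combined: Sequence[Optional[str]],
              consensus: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy among decided ratings, share of deferred ratings).

    Assumption: accuracy is measured against the human consensus and only
    over ratings that were not deferred.
    """
    decided = [(c, ref) for c, ref in zip(combined, consensus) if c is not DEFERRED]
    deferred_share = 1 - len(decided) / len(combined)
    if not decided:
        return float("nan"), deferred_share
    accuracy = sum(c == ref for c, ref in decided) / len(decided)
    return accuracy, deferred_share


# Toy data, invented for illustration only (not taken from the study).
human = ["yes", "no", "yes", "yes"]
llm = ["yes", "no", "no", "yes"]
consensus = ["yes", "no", "yes", "no"]

combined = combine(human, llm)
accuracy, deferred_share = summarize(combined, consensus)
print(combined, accuracy, deferred_share)
```

Under this reading, deferred items would go back to a second human rater, which is where any workload reduction would come from.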
Affiliation(s)
- Tim Woelfle
- Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland; Department of Neurology, University Hospital Basel, Basel, Switzerland; Translational Imaging in Neurology (ThINk), Department of Biomedical Engineering, University Hospital and University of Basel, Basel, Switzerland.
- Julian Hirt
- Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland; Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland; Institute of Nursing Science, Department of Health, Eastern Switzerland University of Applied Sciences, St. Gallen, Basel, Switzerland
- Perrine Janiaud
- Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland; Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Ludwig Kappos
- Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland
- John P A Ioannidis
- Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA; Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, Stanford University, Stanford, CA, USA
- Lars G Hemkens
- Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland; Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland; Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA; Meta-Research Innovation Center Berlin (METRIC-B), Berlin Institute of Health, Berlin, Germany
2
Hohenschurz-Schmidt D, Cherkin D, Rice AS, Dworkin RH, Turk DC, McDermott MP, Bair MJ, DeBar LL, Edwards RR, Evans SR, Farrar JT, Kerns RD, Rowbotham MC, Wasan AD, Cowan P, Ferguson M, Freeman R, Gewandter JS, Gilron I, Grol-Prokopczyk H, Iyengar S, Kamp C, Karp BI, Kleykamp BA, Loeser JD, Mackey S, Malamut R, McNicol E, Patel KV, Schmader K, Simon L, Steiner DJ, Veasley C, Vollert J. Methods for pragmatic randomized clinical trials of pain therapies: IMMPACT statement. Pain 2024; 165:2165-2183. [PMID: 38723171 PMCID: PMC11404339 DOI: 10.1097/j.pain.0000000000003249] [Received: 08/09/2023] [Revised: 01/30/2024] [Accepted: 03/08/2024] [Indexed: 09/18/2024]
Abstract
Pragmatic, randomized, controlled trials hold the potential to directly inform clinical decision making and health policy regarding the treatment of people experiencing pain. Pragmatic trials are designed to replicate or are embedded within routine clinical care and are increasingly valued to bridge the gap between trial research and clinical practice, especially in multidimensional conditions such as pain and in nonpharmacological intervention research. To maximize the potential of pragmatic trials in pain research, each methodological decision requires careful consideration. Trials aligned with routine practice pose several challenges, such as determining and enrolling appropriate study participants, deciding on the appropriate level of flexibility in treatment delivery, integrating information on concomitant treatments and adherence, and choosing comparator conditions and outcome measures. Ensuring data quality in real-world clinical settings is another challenging goal. Furthermore, current trials in the field would benefit from analysis methods that allow for a differentiated understanding of effects across patient subgroups and improved reporting of methods and context, which is required to assess the generalizability of findings. At the same time, a range of novel methodological approaches provide opportunities for enhanced efficiency and relevance of pragmatic trials to stakeholders and clinical decision making. In this study, best-practice considerations for these and other concerns in pragmatic trials of pain treatments are offered and a number of promising solutions are discussed. The basis of these recommendations was an Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) meeting organized by the Analgesic, Anesthetic, and Addiction Clinical Trial Translations, Innovations, Opportunities, and Networks.
Affiliation(s)
- David Hohenschurz-Schmidt
- Pain Research, Department of Surgery & Cancer, Faculty of Medicine, Imperial College London, United Kingdom
- Research Department, University College of Osteopathy, London, United Kingdom
- Dan Cherkin
- Osher Center for Integrative Health, Department of Family Medicine, University of Washington, Seattle, WA, United States
- Andrew S.C. Rice
- Pain Research, Department of Surgery & Cancer, Faculty of Medicine, Imperial College London, United Kingdom
- Robert H. Dworkin
- Department of Anesthesiology and Perioperative Medicine, University of Rochester Medical Center, Rochester, NY, United States
- Dennis C. Turk
- Anesthesiology and Pain Medicine, University of Washington, Seattle, WA, United States
- Michael P. McDermott
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States
- Matthew J. Bair
- VA Center for Health Information and Communication, Regenstrief Institute, and Indiana University School of Medicine, Indianapolis, IN, United States
- Lynn L. DeBar
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, United States
- Scott R. Evans
- Biostatistics Center and the Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Rockville, MD, United States
- John T. Farrar
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, United States
- Robert D. Kerns
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Michael C. Rowbotham
- Department of Anesthesia, University of California San Francisco School of Medicine, San Francisco, CA, United States
- Ajay D. Wasan
- Departments of Anesthesiology & Perioperative Medicine, and Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Penney Cowan
- American Chronic Pain Association, Rocklin, CA, United States
- McKenzie Ferguson
- Department of Pharmacy Practice, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Roy Freeman
- Department of Neurology, Harvard Medical School, Boston, MA, United States
- Jennifer S. Gewandter
- Department of Anesthesiology and Perioperative Medicine, University of Rochester, Rochester, NY, United States
- Ian Gilron
- Departments of Anesthesiology & Perioperative Medicine, Biomedical & Molecular Sciences, Centre for Neuroscience Studies, and School of Policy Studies, Queen's University, Kingston Health Sciences Centre, Kingston, ON, Canada
- Hanna Grol-Prokopczyk
- Department of Sociology, University at Buffalo, State University of New York, Buffalo, NY, United States
- Cornelia Kamp
- Center for Health and Technology (CHeT), Clinical Materials Services Unit (CMSU), University of Rochester Medical Center, Rochester, NY, United States
- Bethea A. Kleykamp
- University of Maryland, School of Medicine, Baltimore, MD, United States
- John D. Loeser
- Departments of Neurological Surgery and Anesthesia and Pain Medicine, University of Washington, Seattle, WA, United States
- Sean Mackey
- Stanford University School of Medicine, Department of Anesthesiology, Perioperative, and Pain Medicine, Neurosciences and Neurology, Palo Alto, CA, United States
- Ewan McNicol
- Department of Pharmacy Practice, Massachusetts College of Pharmacy and Health Sciences University, Boston, MA, United States
- Kushang V. Patel
- Department of Anesthesiology and Pain Medicine, University of Washington, Seattle, WA, United States
- Kenneth Schmader
- Department of Medicine-Geriatrics, Center for the Study of Aging, Duke University Medical Center, and Geriatrics Research Education and Clinical Center, Durham VA Medical Center, Durham, NC, United States
- Lee Simon
- SDG, LLC, Cambridge, MA, United States
- Jan Vollert
- Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
3
Hirt J, Janiaud P, Düblin P, Nicoletti GJ, Dembowska K, Nguyen TVT, Woelfle T, Axfors C, Yaldizli Ö, Granziera C, Kuhle J, Kappos L, Hemkens LG. Use of pragmatic randomized trials in multiple sclerosis: A systematic overview. Mult Scler 2024; 30:463-478. [PMID: 38253528 PMCID: PMC11010556 DOI: 10.1177/13524585231221938] [Received: 10/26/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/24/2024]
Abstract
BACKGROUND Pragmatic trials are increasingly recognized for providing real-world evidence on treatment choices. OBJECTIVE The objective of this study is to investigate the use and characteristics of pragmatic trials in multiple sclerosis (MS). METHODS Systematic literature search and analysis of pragmatic trials on any intervention published up to 2022. Pragmatism was assessed with PRECIS-2 (PRagmatic Explanatory Continuum Indicator Summary-2). RESULTS We identified 48 pragmatic trials published between 1967 and 2022 that included a median of 82 participants (interquartile range (IQR) = 42-160) and typically assessed supportive care interventions (n = 41; 85%). Only seven trials assessed drugs (15%). Only three trials (6%) included >500 participants. Trials were mostly from the United Kingdom (n = 18; 38%), Italy (n = 6; 13%), the United States and Denmark (each n = 5; 10%). Primary outcomes were diverse, for example, quality-of-life, physical functioning, or disease activity. Only 1 trial (2%) used routinely collected data for outcome ascertainment. No trial was very pragmatic in all design aspects, but 14 trials (29%) were widely pragmatic (i.e. PRECIS-2 score ⩾ 4/5 in all domains). CONCLUSION Only a few, mostly small pragmatic trials exist in MS, and they rarely assess drugs. Despite the widely available routine data infrastructures, very few trials utilize them. There is an urgent need to leverage the potential of this pioneering study design to provide useful randomized real-world evidence.
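The "widely pragmatic" criterion in this abstract (PRECIS-2 score of at least 4 out of 5 in all domains) can be written as a short check. This is a minimal sketch, not the authors' analysis code. The function name, the dictionary layout, and the example scores are hypothetical; the nine domain labels follow the published PRECIS-2 tool.

```python
from typing import Mapping

# The nine PRECIS-2 domains, each scored from 1 (very explanatory)
# to 5 (very pragmatic). The key spellings are this sketch's own choice.
PRECIS2_DOMAINS = (
    "eligibility", "recruitment", "setting", "organisation",
    "flexibility_delivery", "flexibility_adherence",
    "follow_up", "primary_outcome", "primary_analysis",
)


def is_widely_pragmatic(scores: Mapping[str, int], threshold: int = 4) -> bool:
    """True if every PRECIS-2 domain is scored at or above the threshold."""
    missing = [d for d in PRECIS2_DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing PRECIS-2 domains: {missing}")
    return all(scores[d] >= threshold for d in PRECIS2_DOMAINS)


# Invented example trials, for illustration only.
trial_a = {d: 4 for d in PRECIS2_DOMAINS}
trial_b = {**trial_a, "eligibility": 3}

print(is_widely_pragmatic(trial_a))
print(is_widely_pragmatic(trial_b))
```

Requiring every domain to meet the threshold, rather than averaging across domains, matches the abstract's "in all domains" wording: one explanatory design choice is enough to exclude a trial.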
Affiliation(s)
- Julian Hirt
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland/Department of Health, Eastern Switzerland University of Applied Sciences, St. Gallen, Switzerland
- Perrine Janiaud
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Pascal Düblin
- Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Kinga Dembowska
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/MSc program in epidemiology, Swiss TPH, University of Basel, Basel, Switzerland
- Thao Vy Thi Nguyen
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/MSc program in epidemiology, Swiss TPH, University of Basel, Basel, Switzerland
- Tim Woelfle
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland
- Cathrine Axfors
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Özgür Yaldizli
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Cristina Granziera
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Jens Kuhle
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Ludwig Kappos
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland
- Lars G Hemkens
- Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland/Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland