1
Rinderknecht E, von Winning D, Kravchuk A, Schäfer C, Schnabel MJ, Siepmann S, Mayr R, Grassinger J, Goßler C, Pohl F, Siska PJ, Zeman F, Breyer J, Schmelzer A, Gilfrich C, Brookman-May SD, Burger M, Haas M, May M. Modification and Validation of the System Causability Scale Using AI-Based Therapeutic Recommendations for Urological Cancer Patients: A Basis for the Development of a Prospective Comparative Study. Curr Oncol 2024;31:7061-7073. PMID: 39590151; PMCID: PMC11593082; DOI: 10.3390/curroncol31110520.
Abstract
The integration of artificial intelligence, particularly Large Language Models (LLMs), has the potential to significantly enhance therapeutic decision-making in clinical oncology. Initial studies across various disciplines have demonstrated that LLM-based treatment recommendations can rival those of multidisciplinary tumor boards (MTBs); however, such data are currently lacking for urological cancers. This preparatory study establishes a robust methodological foundation for the forthcoming CONCORDIA trial, including the validation of the System Causability Scale (SCS) and its modified version (mSCS) using recommendations from ChatGPT-4 and an MTB for 40 urological cancer scenarios, as well as the selection of LLMs for urological cancer treatment recommendations. Both scales demonstrated strong validity, reliability (all aggregated Cohen's kappa > 0.74), and internal consistency (all Cronbach's alpha > 0.9), with the mSCS showing superior reliability, internal consistency, and clinical applicability (p < 0.01). Two Delphi processes were used to define the LLMs to be tested in the CONCORDIA study (ChatGPT-4 and Claude 3.5 Sonnet) and to establish the acceptable non-inferiority margin for LLM recommendations compared to MTB recommendations. The forthcoming ethics-approved and registered CONCORDIA non-inferiority trial will require 110 urological cancer scenarios, with an mSCS difference threshold of 0.15, a Bonferroni-corrected alpha of 0.025, and a beta of 0.1. Blinded mSCS assessments of MTB recommendations will then be compared to those of the LLMs. In summary, this work establishes the necessary prerequisites for initiating the CONCORDIA study and validates a modified score with high applicability and reliability for this and future trials.
Affiliation(s)
- Emily Rinderknecht
- Department of Urology, Caritas St. Josef Hospital, University of Regensburg, 93053 Regensburg, Germany
- Dominik von Winning
- Department of Urology, St. Elisabeth Hospital Straubing, 94315 Straubing, Germany
- Anton Kravchuk
- Department of Urology, St. Elisabeth Hospital Straubing, 94315 Straubing, Germany
- Christof Schäfer
- Department of Radiotherapy, Straubing Hospital Medical Care Centre, 94315 Straubing, Germany
- Marco J. Schnabel
- Department of Urology, Caritas St. Josef Hospital, University of Regensburg, 93053 Regensburg, Germany
- Stephan Siepmann
- Department of Urology, St. Elisabeth Hospital Straubing, 94315 Straubing, Germany
- Roman Mayr
- Department of Urology, Caritas St. Josef Hospital, University of Regensburg, 93053 Regensburg, Germany
- Jochen Grassinger
- Department of Hematology and Oncology, Straubing Hospital Medical Care Centre, 94315 Straubing, Germany
- Christopher Goßler
- Department of Urology, Caritas St. Josef Hospital, University of Regensburg, 93053 Regensburg, Germany
- Fabian Pohl
- Department of Radiotherapy, University Hospital Regensburg, 93053 Regensburg, Germany
- Peter J. Siska
- Department of Internal Medicine III, University Hospital Regensburg, 93053 Regensburg, Germany
- Florian Zeman
- Center for Clinical Studies, University Hospital Regensburg, 93053 Regensburg, Germany
- Johannes Breyer
- Department of Urology, Caritas St. Josef Hospital, University of Regensburg, 93053 Regensburg, Germany
- Anna Schmelzer
- Department of Urology, St. Elisabeth Hospital Straubing, 94315 Straubing, Germany
- Christian Gilfrich
- Department of Urology, St. Elisabeth Hospital Straubing, 94315 Straubing, Germany
- Maximilian Burger
- Department of Urology, Caritas St. Josef Hospital, University of Regensburg, 93053 Regensburg, Germany
- Maximilian Haas
- Department of Urology, Caritas St. Josef Hospital, University of Regensburg, 93053 Regensburg, Germany
- Matthias May
- Department of Urology, St. Elisabeth Hospital Straubing, 94315 Straubing, Germany
2
Labinsky H, Nagler LK, Krusche M, Griewing S, Aries P, Kroiß A, Strunz PP, Kuhn S, Schmalzing M, Gernert M, Knitza J. Vignette-based comparative analysis of ChatGPT and specialist treatment decisions for rheumatic patients: results of the Rheum2Guide study. Rheumatol Int 2024;44:2043-2053. PMID: 39126460; PMCID: PMC11392980; DOI: 10.1007/s00296-024-05675-5.
Abstract
BACKGROUND The complex nature of rheumatic diseases poses considerable challenges for clinicians when developing individualized treatment plans. Large language models (LLMs) such as ChatGPT could enable treatment decision support. OBJECTIVE To compare treatment plans generated by ChatGPT-3.5 and GPT-4 to those of a clinical rheumatology board (RB). DESIGN/METHODS Fictional patient vignettes were created, and GPT-3.5, GPT-4, and the RB were queried to provide respective first- and second-line treatment plans with underlying justifications. Four rheumatologists from different centers, blinded to the origin of the treatment plans, selected the overall preferred treatment concept and rated each plan's safety, EULAR guideline adherence, medical adequacy, overall quality, justification, and completeness, as well as patient vignette difficulty, on a 5-point Likert scale. RESULTS 20 fictional vignettes covering various rheumatic diseases and varying difficulty levels were assembled, and a total of 160 ratings were assessed. In 68.8% (110/160) of cases, raters preferred the RB's treatment plans over those generated by GPT-4 (16.3%; 26/160) and GPT-3.5 (15.0%; 24/160). GPT-4's plans were chosen more frequently for first-line treatments than GPT-3.5's. No significant safety differences were observed between the RB's and GPT-4's first-line treatment plans. Rheumatologists' plans received significantly higher ratings for guideline adherence, medical appropriateness, completeness, and overall quality. Ratings did not correlate with vignette difficulty. LLM-generated plans were notably longer and more detailed. CONCLUSION GPT-4 and GPT-3.5 generated safe, high-quality treatment plans for rheumatic diseases, demonstrating promise in clinical decision support. Future research should investigate detailed standardized prompts and the impact of LLM usage on clinical decisions.
Affiliation(s)
- Hannah Labinsky
- Department of Internal Medicine 2, Rheumatology/Clinical Immunology, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080 Würzburg, Germany
- Lea-Kristin Nagler
- Department of Internal Medicine 2, Rheumatology/Clinical Immunology, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080 Würzburg, Germany
- Martin Krusche
- Division of Rheumatology and Systemic Inflammatory Diseases, III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Sebastian Griewing
- Institute for Digital Medicine, University Hospital Giessen-Marburg, Philipps University, Baldingerstrasse, Marburg, Germany
- Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Palo Alto, CA, USA
- Peer Aries
- Department of Rheumatology, Immunologikum, Hamburg, Germany
- Anja Kroiß
- Department of Internal Medicine 2, Rheumatology/Clinical Immunology, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080 Würzburg, Germany
- Patrick-Pascal Strunz
- Department of Internal Medicine 2, Rheumatology/Clinical Immunology, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080 Würzburg, Germany
- Sebastian Kuhn
- Institute for Digital Medicine, University Hospital Giessen-Marburg, Philipps University, Baldingerstrasse, Marburg, Germany
- Marc Schmalzing
- Department of Internal Medicine 2, Rheumatology/Clinical Immunology, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080 Würzburg, Germany
- Michael Gernert
- Department of Internal Medicine 2, Rheumatology/Clinical Immunology, University Hospital Würzburg, Oberdürrbacher Straße 6, 97080 Würzburg, Germany
- Johannes Knitza
- Institute for Digital Medicine, University Hospital Giessen-Marburg, Philipps University, Baldingerstrasse, Marburg, Germany
- AGEIS, Université Grenoble Alpes, Grenoble, France
3
Zhou J, Liu Y, Yang Y, Fang P, Chen L, Yuan Y. The rise of ChatGPT-4: exploring its efficacy as a decision support tool in esophageal surgery - a research letter. Int J Surg 2024;110:5928-5930. PMID: 38814307; PMCID: PMC11392133; DOI: 10.1097/js9.0000000000001696.
Affiliation(s)
- Yong Yuan
- Department of Thoracic Surgery and Institute of Thoracic Oncology, West China Hospital of Sichuan University, Chengdu, Sichuan, People's Republic of China
4
Du X, Zhou Z, Wang Y, Chuang YW, Yang R, Zhang W, Wang X, Zhang R, Hong P, Bates DW, Zhou L. Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review. medRxiv [Preprint] 2024:2024.08.11.24311828. PMID: 39228726; PMCID: PMC11370524; DOI: 10.1101/2024.08.11.24311828.
Abstract
Background Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges. Objective This study aims to systematically review the use of generative LLMs and the effectiveness of relevant techniques in patient-care-related topics involving EHRs, summarize the challenges faced, and suggest future directions. Methods A Boolean search for peer-reviewed articles was conducted on May 19, 2024 using PubMed and Web of Science to include research articles published since 2023, which was one month after the release of ChatGPT. The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation metrics. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions. Results The initial search identified 6,328 unique studies, with 76 studies included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting; five of them reported 100% accuracy on five specific clinical tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and performance improvement. Eight studies explored fine-tuning generative LLMs; all reported performance improvements on specific tasks, but three of them noted potential performance degradation after fine-tuning on certain tasks. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate rare disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, with one detecting no bias and the other finding that male patients received more appropriate clinical decision-making suggestions. Six studies identified hallucinations, such as fabricating patient names in structured thyroid ultrasound reports. Additional challenges included, but were not limited to, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses. Conclusion Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to challenges such as bias, hallucinations, and impersonal responses.
Affiliation(s)
- Xinsong Du
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
- Zhengyang Zhou
- Department of Computer Science, Brandeis University, Waltham, MA 02453
- Yifei Wang
- Department of Computer Science, Brandeis University, Waltham, MA 02453
- Ya-Wen Chuang
- Division of Nephrology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan, 407219
- Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan, 402202
- School of Medicine, College of Medicine, China Medical University, Taichung, Taiwan, 404328
- Richard Yang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Wenyu Zhang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
- Xinyi Wang
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
- Rui Zhang
- Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455
- Pengyu Hong
- Department of Computer Science, Brandeis University, Waltham, MA 02453
- David W. Bates
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Li Zhou
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
5
Aghamaliyev U, Karimbayli J, Giessen-Jung C, Matthias I, Unger K, Andrade D, Hofmann FO, Weniger M, Angele MK, Benedikt Westphalen C, Werner J, Renz BW. ChatGPT's Gastrointestinal Tumor Board Tango: A limping dance partner? Eur J Cancer 2024;205:114100. PMID: 38729055; DOI: 10.1016/j.ejca.2024.114100.
Abstract
OBJECTIVES This study aimed to assess the consistency and replicability of treatment recommendations provided by ChatGPT 3.5 for gastrointestinal tumor cases presented at multidisciplinary tumor boards (MTBs). It also aimed to distinguish between general and case-specific responses and investigated the precision of ChatGPT's recommendations in replicating exact treatment plans, particularly regarding chemotherapy regimens and follow-up protocols. MATERIAL AND METHODS A retrospective study was carried out on 115 cases of gastrointestinal malignancies, selected from 448 patients reviewed in MTB meetings. A senior resident fed patient data into ChatGPT 3.5 to produce treatment recommendations, which were then evaluated against the tumor board's decisions by senior oncology fellows. RESULTS In 19% of the examined cases, ChatGPT 3.5 provided only general information about the malignancy without considering individual patient characteristics. In the remaining 81% of cases, ChatGPT generated responses specific to the individual clinical scenario. In the subset of case-specific responses, 83% of recommendations exhibited overall treatment-strategy concordance between ChatGPT and the MTB. However, exact treatment concordance dropped to 65% and was notably lower for specific chemotherapy regimens. Cases recommended for surgery showed the highest concordance rates, while those involving chemotherapy recommendations faced challenges in precision. CONCLUSIONS ChatGPT 3.5 demonstrates potential in aligning conceptual treatment strategies with MTB guidelines. However, it falls short in accurately duplicating specific treatment plans, especially concerning chemotherapy regimens and follow-up procedures. Ethical concerns and challenges in achieving exact replication necessitate prudence when considering ChatGPT 3.5 for direct clinical decision-making in MTBs.
Affiliation(s)
- Ughur Aghamaliyev
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Javad Karimbayli
- Division of Molecular Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, National Cancer Institute, Aviano, Italy
- Clemens Giessen-Jung
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany
- Ilmer Matthias
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Kristian Unger
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; Department of Radiation Oncology, University Hospital, LMU Munich, 81377; Bavarian Cancer Research Center (BZKF), Munich, Germany
- Dorian Andrade
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Felix O Hofmann
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Maximilian Weniger
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Martin K Angele
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- C Benedikt Westphalen
- Comprehensive Cancer Center Munich & Department of Medicine III, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Jens Werner
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany
- Bernhard W Renz
- Department of General, Visceral and Transplantation Surgery, LMU University Hospital, LMU Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
6
Bragazzi NL, Garbarino S. Toward Clinical Generative AI: Conceptual Framework. JMIR AI 2024;3:e55957. PMID: 38875592; PMCID: PMC11193080; DOI: 10.2196/55957.
Abstract
Clinical decision-making is a crucial aspect of health care, involving the balanced integration of scientific evidence, clinical judgment, ethical considerations, and patient involvement. This process is dynamic and multifaceted, relying on clinicians' knowledge, experience, and intuitive understanding to achieve optimal patient outcomes through informed, evidence-based choices. The advent of generative artificial intelligence (AI) presents a revolutionary opportunity in clinical decision-making. AI's advanced data analysis and pattern recognition capabilities can significantly enhance the diagnosis and treatment of diseases, processing vast medical data to identify patterns, tailor treatments, predict disease progression, and aid in proactive patient management. However, the incorporation of AI into clinical decision-making raises concerns regarding the reliability and accuracy of AI-generated insights. To address these concerns, 11 "verification paradigms" are proposed in this paper, with each paradigm being a unique method to verify the evidence-based nature of AI in clinical decision-making. This paper also frames the concept of "clinically explainable, fair, and responsible, clinician-, expert-, and patient-in-the-loop AI." This model focuses on ensuring AI's comprehensibility, collaborative nature, and ethical grounding, advocating for AI to serve as an augmentative tool, with its decision-making processes being transparent and understandable to clinicians and patients. The integration of AI should enhance, not replace, the clinician's judgment and should involve continuous learning and adaptation based on real-world outcomes and ethical and legal compliance. In conclusion, while generative AI holds immense promise in enhancing clinical decision-making, it is essential to ensure that it produces evidence-based, reliable, and impactful knowledge. Using the outlined paradigms and approaches can help the medical and patient communities harness AI's potential while maintaining high patient care standards.
Affiliation(s)
- Nicola Luigi Bragazzi
- Human Nutrition Unit, Department of Food and Drugs, University of Parma, Parma, Italy
- Sergio Garbarino
- Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal/Child Sciences, University of Genoa, Genoa, Italy
7
Rodler S, Ganjavi C, De Backer P, Magoulianitis V, Ramacciotti LS, De Castro Abreu AL, Gill IS, Cacciamani GE. Generative artificial intelligence in surgery. Surgery 2024;175:1496-1502. PMID: 38582732; DOI: 10.1016/j.surg.2024.02.019.
Abstract
Generative artificial intelligence (GAI) is able to collect, extract, digest, and generate information in an understandable way for humans. As the first surgical applications of generative artificial intelligence emerge, this perspective paper aims to provide a comprehensive overview of current applications and future perspectives for generative artificial intelligence in surgery, from preoperative planning to training. Generative artificial intelligence can be used before surgery for planning and decision support, by extracting patient information and providing patients with information and simulation regarding the procedure. Intraoperatively, it can document data that are normally not captured, such as intraoperative adverse events, or provide information to aid decision-making. Postoperatively, GAIs can help with patient discharge and follow-up. The ability to provide real-time feedback and store it for later review is an important capability of GAIs. GAI applications are emerging as highly specialized, task-specific tools for tasks such as data extraction, synthesis, presentation, and communication within the realm of surgery. GAIs have the potential to play a pivotal role in facilitating interaction between surgeons and artificial intelligence.
Affiliation(s)
- Severin Rodler
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA; Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA; Department of Urology, University Hospital of LMU Munich, Germany; Young Academic Working Group in Urologic Technology of the European Association of Urology, Arnhem, The Netherlands
- Conner Ganjavi
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA; Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA
- Pieter De Backer
- Young Academic Working Group in Urologic Technology of the European Association of Urology, Arnhem, The Netherlands; Department of Urology, Onze-Lieve-Vrouwziekenhuis Hospital, Aalst, Belgium; ORSI Academy, Ghent, Belgium
- Vasileios Magoulianitis
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA
- Lorenzo Storino Ramacciotti
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA; Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA
- Andre Luis De Castro Abreu
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA; Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA
- Inderbir S Gill
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA; Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA
- Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA; Artificial Intelligence Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA; Young Academic Working Group in Urologic Technology of the European Association of Urology, Arnhem, The Netherlands
8
Griewing S, Gremke N, Wagner U, Lingenfelder M, Kuhn S, Boekhoff J. Challenging ChatGPT 3.5 in Senology-An Assessment of Concordance with Breast Cancer Tumor Board Decision Making. J Pers Med 2023;13:1502. PMID: 37888113; PMCID: PMC10608120; DOI: 10.3390/jpm13101502.
Abstract
With the recent broadening of access to publicly available large language models (LLMs), common interest in generative-artificial-intelligence-based applications for medical purposes has skyrocketed. The increased use of these models by tech-savvy patients for personal health issues calls for a scientific evaluation of whether LLMs provide a satisfactory level of accuracy for treatment decisions. This observational study compares the concordance of treatment recommendations from the popular LLM ChatGPT 3.5 with those of a multidisciplinary tumor board for breast cancer (MTB). The study design builds on previous findings by combining an extended input model with patient profiles reflecting the patho- and immunomorphological diversity of primary breast cancer, including primary metastasis and precancerous tumor stages. Overall concordance between the LLM and the MTB is reached for half of the patient profiles, including precancerous lesions. For invasive breast cancer profiles, concordance amounts to 58.8%. Nevertheless, as the LLM at times makes seriously erroneous decisions, we do not consider the current development status of publicly available LLMs adequate as a support tool for tumor boards. Gynecological oncologists should familiarize themselves with the capabilities of LLMs in order to understand and utilize their potential while keeping in mind potential risks and limitations.
Affiliation(s)
- Sebastian Griewing
- Institute for Digital Medicine, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Institute for Healthcare Management, Chair of General Business Administration, Philipps-University Marburg, Universitätsstraße 24, 35037 Marburg, Germany
- Niklas Gremke
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Uwe Wagner
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Michael Lingenfelder
- Institute for Healthcare Management, Chair of General Business Administration, Philipps-University Marburg, Universitätsstraße 24, 35037 Marburg, Germany
- Sebastian Kuhn
- Institute for Digital Medicine, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany
- Jelena Boekhoff
- Department of Gynecology and Obstetrics, University Hospital Marburg, Philipps-University Marburg, Baldingerstraße, 35043 Marburg, Germany