151. Inojosa H, Voigt I, Wenk J, Ferber D, Wiest I, Antweiler D, Weicken E, Gilbert S, Kather JN, Akgün K, Ziemssen T. Integrating large language models in care, research, and education in multiple sclerosis management. Mult Scler 2024; 30:1392-1401. [PMID: 39308156; PMCID: PMC11514324; DOI: 10.1177/13524585241277376]
Abstract
The use of techniques derived from generative artificial intelligence (AI), specifically large language models (LLMs), offers transformative potential for the management of multiple sclerosis (MS). Recent LLMs have exhibited remarkable skill in producing and understanding human-like text. The integration of AI in imaging applications and the deployment of foundation models for the classification and prognosis of disease course, including disability progression and even therapy response, have received considerable attention. However, the use of LLMs within the context of MS remains relatively underexplored. LLMs could support several activities related to MS management: clinical decision support systems could help select appropriate disease-modifying therapies, AI-based tools could leverage unstructured real-world data for research, and virtual tutors may provide adaptive education materials for neurologists and people with MS in the foreseeable future. In this focused review, we explore practical applications of LLMs across the continuum of MS management as an initial scope for future analyses, reflecting on regulatory hurdles and the indispensable role of human supervision.
Affiliation(s)
- Hernan Inojosa
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus Dresden, Technical University Dresden, Dresden, Germany
- Isabel Voigt
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus Dresden, Technical University Dresden, Dresden, Germany
- Judith Wenk
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus Dresden, Technical University Dresden, Dresden, Germany
- Dyke Ferber
- Else Kröner Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
- Isabella Wiest
- Else Kröner Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
- Dario Antweiler
- Fraunhofer Institute for Intelligent Analysis and Information Systems, Sankt Augustin, Germany
- Eva Weicken
- Else Kröner Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
- Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, HHI, Berlin, Germany
- Stephen Gilbert
- Else Kröner Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
- Jakob Nikolas Kather
- Else Kröner Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
- Katja Akgün
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus Dresden, Technical University Dresden, Dresden, Germany
- Tjalf Ziemssen
- Center of Clinical Neuroscience, Department of Neurology, University Hospital Carl Gustav Carus Dresden, Technical University Dresden, Dresden, Germany
152. AlSaad R, Abd-Alrazaq A, Boughorbel S, Ahmed A, Renault MA, Damseh R, Sheikh J. Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook. J Med Internet Res 2024; 26:e59505. [PMID: 39321458; PMCID: PMC11464944; DOI: 10.2196/59505]
Abstract
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data-driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.
Affiliation(s)
- Rawan AlSaad
- Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Sabri Boughorbel
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Arfan Ahmed
- Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Rafat Damseh
- Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain, United Arab Emirates
- Javaid Sheikh
- Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
153. Seong D, Mataraso S, Espinosa C, Berson E, Reincke SM, Xue L, Kashiwagi C, Kim Y, Shu CH, Chung P, Ghanem M, Xie F, Wong RJ, Angst MS, Gaudilliere B, Shaw GM, Stevenson DK, Aghaeepour N. Generating pregnant patient biological profiles by deconvoluting clinical records with electronic health record foundation models. Brief Bioinform 2024; 25:bbae574. [PMID: 39545787; PMCID: PMC11565587; DOI: 10.1093/bib/bbae574]
Abstract
Translational biology posits a strong bi-directional link between clinical phenotypes and a patient's biological profile. By leveraging this bi-directional link, we can efficiently deconvolute pre-existing clinical information into biological profiles. However, traditional computational tools are limited in their ability to resolve this link because of the relatively small sizes of paired clinical-biological datasets for training and the high dimensionality/sparsity of tabular clinical data. Here, we use state-of-the-art foundation models (FMs) for electronic health record (EHR) data to generate proteomics profiles of pregnant patients, thereby deconvoluting pre-existing clinical information into biological profiles without the cost and effort of running large-scale traditional omics studies. We show that FM-derived representations of a patient's EHR data coupled with a fully connected neural network prediction head can generate 206 blood protein expression levels. Interestingly, these proteins were enriched for developmental pathways, while proteins not able to be generated from EHR data were enriched for metabolic pathways. Finally, we show a proteomic signature of gestational diabetes that includes proteins with established and novel links to gestational diabetes. These results showcase the power of FM-derived EHR representations in efficiently generating biological states of pregnant patients. This capability can revolutionize disease understanding and therapeutic development, offering a cost-effective, time-efficient, and less invasive alternative to traditional methods of generating proteomics.
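To make the described setup concrete, the sketch below shows a fully connected prediction head mapping a foundation-model-derived EHR representation to protein expression levels. It is a minimal illustration only: the embedding size, hidden width, and synthetic tensors are assumptions, not the authors' architecture or data.

```python
import torch
import torch.nn as nn

class ProteomicsHead(nn.Module):
    """Fully connected head mapping an EHR foundation-model embedding to protein levels."""
    def __init__(self, embed_dim: int = 768, n_proteins: int = 206, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_proteins),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

head = ProteomicsHead()
ehr_embeddings = torch.randn(8, 768)       # synthetic stand-in for FM-derived patient representations
predicted_proteins = head(ehr_embeddings)  # shape (8, 206), one value per blood protein
loss = nn.MSELoss()(predicted_proteins, torch.randn(8, 206))  # regression against measured proteomics
```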
Affiliation(s)
- David Seong
- Immunology Program, Stanford University School of Medicine, 240 Pasteur Drive, Palo Alto CA, 94304, United States
- Medical Scientist Training Program, Stanford University School of Medicine, 1265 Welch Road, Stanford CA, 94305, United States
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Samson Mataraso
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
- Camilo Espinosa
- Immunology Program, Stanford University School of Medicine, 240 Pasteur Drive, Palo Alto CA, 94304, United States
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
- Eloise Berson
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- S Momsen Reincke
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
- Lei Xue
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
- Chloe Kashiwagi
- Immunology Program, Stanford University School of Medicine, 240 Pasteur Drive, Palo Alto CA, 94304, United States
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Yeasul Kim
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
- Chi-Hung Shu
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Philip Chung
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Marc Ghanem
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Feng Xie
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
- Ronald J Wong
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Martin S Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Brice Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Gary M Shaw
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- David K Stevenson
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Nima Aghaeepour
- Immunology Program, Stanford University School of Medicine, 240 Pasteur Drive, Palo Alto CA, 94304, United States
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States
154. Wiest IC, Ferber D, Zhu J, van Treeck M, Meyer SK, Juglan R, Carrero ZI, Paech D, Kleesiek J, Ebert MP, Truhn D, Kather JN. Privacy-preserving large language models for structured medical information retrieval. NPJ Digit Med 2024; 7:257. [PMID: 39304709; DOI: 10.1038/s41746-024-01233-2]
Abstract
Most clinical information is encoded as free text, not accessible for quantitative analysis. This study presents an open-source pipeline using the local large language model (LLM) "Llama 2" to extract quantitative information from clinical text and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, with predictions compared against ground truth from three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. High sensitivities and specificities were also yielded for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%) using the 70 billion parameter model, which outperformed smaller versions. Our study successfully demonstrates the capability of locally deployed LLMs to extract clinical information from free text with low hardware requirements.
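A minimal sketch of this kind of zero-shot extraction with a locally run model is shown below; the model identifier, prompt wording, and feature list are illustrative assumptions rather than the authors' exact pipeline, and access to the Llama 2 weights on Hugging Face is gated.

```python
from transformers import pipeline

# Illustrative zero-shot extraction of cirrhosis-related features from a clinical note.
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf", device_map="auto")

note = "Patient admitted with tense ascites and new-onset confusion; denies abdominal pain or dyspnea."
prompt = (
    "Read the clinical note and answer in JSON with the keys "
    '"liver_cirrhosis", "ascites", "confusion", "abdominal_pain", "shortness_of_breath", '
    "each set to true or false.\n\n"
    f"Note: {note}\nJSON:"
)
result = generator(prompt, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"])  # parse the JSON tail and compare against expert labels
```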
Affiliation(s)
- Isabella Catharina Wiest
- Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Dyke Ferber
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany
- Jiefu Zhu
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Marko van Treeck
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Sonja K Meyer
- Department of Surgery I, University Hospital Würzburg, Würzburg, Germany
- Radhika Juglan
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Zunamys I Carrero
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Daniel Paech
- German Cancer Research Center, Division of Radiology, Heidelberg, Germany
- University Hospital Bonn, Clinic for Neuroradiology, Bonn, Germany
- Jens Kleesiek
- Institut für KI in der Medizin (IKIM), Universitätsmedizin Essen, Girardetstr. 2, 45131, Essen, Germany
- Cancer Research Center Cologne Essen (CCCE), West German Cancer Center Essen (WTZ), 45122, Essen, Germany
- TU Dortmund University, Department of Physics, Otto-Hahn-Straße 4, 44227, Dortmund, Germany
- Matthias P Ebert
- Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- DKFZ Hector Cancer Institute at the University Medical Center, Mannheim, Germany
- Molecular Medicine Partnership Unit, EMBL, Heidelberg, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
- Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany.
- Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307, Dresden, Germany.
155. Sevgi M, Antaki F, Keane PA. Medical education with large language models in ophthalmology: custom instructions and enhanced retrieval capabilities. Br J Ophthalmol 2024; 108:1354-1361. [PMID: 38719344; PMCID: PMC11503072; DOI: 10.1136/bjo-2023-325046]
Abstract
Foundation models are the next generation of artificial intelligence, with the potential to provide novel use cases for healthcare. Large language models (LLMs), a type of foundation model, are capable of comprehending language and generating human-like text. Researchers and developers have been tuning LLMs to optimise their performance in specific tasks, such as medical challenge problems. Until recently, tuning required technical programming expertise, but the release of custom generative pre-trained transformers (GPTs) by OpenAI has allowed users to tune their own GPTs with natural language. This has the potential to democratise access to high-quality bespoke LLMs globally. In this review, we provide an overview of LLMs, how they are tuned and how custom GPTs work. We provide three use cases of custom GPTs in ophthalmology to demonstrate the versatility and effectiveness of these tools. First, we present 'EyeTeacher', an educational aid that generates questions from clinical guidelines to facilitate learning. Second, we built 'EyeAssistant', a clinical support tool that is tuned with clinical guidelines to respond to various physician queries. Lastly, we designed 'The GPT for GA', which offers clinicians a comprehensive summary of emerging management strategies for geographic atrophy by analysing peer-reviewed documents. The review underscores the significance of custom instructions and information retrieval in tuning GPTs for specific tasks in ophthalmology. We also discuss the evaluation of LLM responses and address critical aspects such as privacy and accountability in their clinical application. Finally, we discuss their potential in ophthalmic education and clinical practice.
Affiliation(s)
- Mertcan Sevgi
- Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital NHS Foundation Trust, London, UK
- Fares Antaki
- Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital NHS Foundation Trust, London, UK
- The CHUM School of Artificial Intelligence in Healthcare, Montreal, Quebec, Canada
- Pearse A Keane
- Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital NHS Foundation Trust, London, UK
- NIHR Moorfields Biomedical Research Centre, London, Greater London, UK
156. März M, Himmelbauer M, Boldt K, Oksche A. Legal aspects of generative artificial intelligence and large language models in examinations and theses. GMS J Med Educ 2024; 41:Doc47. [PMID: 39415812; PMCID: PMC11474642; DOI: 10.3205/zma001702]
Abstract
The high performance of generative artificial intelligence (AI) and large language models (LLM) in examination contexts has triggered an intense debate about their applications, effects and risks. What legal aspects need to be considered when using LLM in teaching and assessment? What possibilities do language models offer? The use of LLM is assessed against the following statutes and laws: university statutes, state higher education laws and licensing regulations for doctors; the Copyright Act (UrhG); the General Data Protection Regulation (GDPR); and the AI Regulation (EU AI Act). LLM and AI offer opportunities but require clear university frameworks. These should define legitimate uses and areas where use is prohibited. Cheating and plagiarism violate good scientific practice and copyright law. Cheating is difficult to detect, and plagiarism by AI is possible; users of these products remain responsible. LLM are effective tools for generating exam questions. Nevertheless, careful review is necessary, as even apparently high-quality products may contain errors. The risk of copyright infringement with AI-generated exam questions is low, however, as copyright law allows up to 15% of protected works to be used for teaching and exams. The grading of exam content is subject to higher education laws and regulations and to the GDPR; exclusively computer-based assessment without human review is not permitted. For high-risk applications in education, the EU's AI Regulation will apply in the future. When dealing with LLM in assessments, evaluation criteria for existing assessments can be adapted, as can assessment programmes, e.g. to reduce the motivation to cheat. LLM can also themselves become the subject of examination. Teachers should undergo further training in AI and consider LLM as a supplement.
Affiliation(s)
- Maren März
- Charité – University Medicine Berlin, AG Progress Test Medicine, Teaching Division, Berlin, Germany
- Kevin Boldt
- The State Commissioner for Data Protection and Freedom of Information Rhineland-Palatinate, Mainz, Germany
- Alexander Oksche
- Institut für medizinische und pharmazeutische Prüfungsfragen (IMPP), Mainz, Germany
- Justus Liebig University Giessen, Rudolf Buchheim Institute for Pharmacology, Giessen, Germany
157. Bhattacharya M, Pal S, Chatterjee S, Lee SS, Chakraborty C. Large language model to multimodal large language model: A journey to shape the biological macromolecules to biological sciences and medicine. Mol Ther Nucleic Acids 2024; 35:102255. [PMID: 39377065; PMCID: PMC11456558; DOI: 10.1016/j.omtn.2024.102255]
Abstract
Since the release of ChatGPT, large language models (LLMs) have become increasingly popular. Academicians use ChatGPT and other LLMs for different purposes, and their use is spreading from medical science to many other areas. Recently, the multimodal LLM (MLLM) has also gained popularity. We therefore comprehensively illustrate LLM and MLLM models for a complete understanding, aiming for a simple yet extended review of LLMs and MLLMs for a broad readership, including researchers, students in diverse fields, and other academicians. The review illustrates LLM and MLLM models, their working principles, and their applications in diverse fields. First, we describe the technical concept of LLMs, their working principle, the black-box concept, and the evolution of LLMs. To explain the working principle, we discuss the tokenization process, token representation, and token relationships. We also extensively demonstrate the application of LLMs to biological macromolecules, medical science, biological science, and other areas. We then illustrate the multimodal applications of LLMs and MLLMs. Finally, we discuss the limitations, challenges, and future prospects of LLMs. The review acts as a booster dose for clinicians, a primer for molecular biologists, and a catalyst for scientists, and also benefits academicians in diverse fields.
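As a concrete illustration of the tokenization step discussed in the review, the short sketch below encodes a sentence with an off-the-shelf tokenizer; the GPT-2 tokenizer is chosen purely for convenience and is an assumption, not a model used by the authors.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Insulin regulates glucose uptake in muscle and adipose tissue."
ids = tokenizer.encode(text)                    # token IDs the model actually sees
tokens = tokenizer.convert_ids_to_tokens(ids)   # human-readable sub-word pieces
print(list(zip(tokens, ids)))
```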
Affiliation(s)
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha 756020, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon, Gangwon-Do 24252, Republic of Korea
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon, Gangwon-Do 24252, Republic of Korea
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal 700126, India
158. Weissman G, Mankowitz T, Kanter G. Large language model non-compliance with FDA guidance for clinical decision support devices. Res Sq 2024:rs.3.rs-4868925. [PMID: 39315257; PMCID: PMC11419185; DOI: 10.21203/rs.3.rs-4868925/v1]
Abstract
Large language models (LLMs) show considerable promise for clinical decision support (CDS) but none is currently authorized by the Food and Drug Administration (FDA) as a CDS device. We evaluated whether two popular LLMs could be induced to provide unauthorized, devicelike CDS, in violation of FDA's requirements. We found that LLM output readily produced devicelike decision support across a range of scenarios despite instructions to remain compliant with FDA guidelines.
Affiliation(s)
- Toni Mankowitz
- Leonard D. Schaeffer Center for Health Policy and Economics, University of Southern California, Los Angeles, California, USA
159. Pham TD, Teh MT, Chatzopoulou D, Holmes S, Coulthard P. Artificial Intelligence in Head and Neck Cancer: Innovations, Applications, and Future Directions. Curr Oncol 2024; 31:5255-5290. [PMID: 39330017; PMCID: PMC11430806; DOI: 10.3390/curroncol31090389]
Abstract
Artificial intelligence (AI) is revolutionizing head and neck cancer (HNC) care by providing innovative tools that enhance diagnostic accuracy and personalize treatment strategies. This review highlights the advancements in AI technologies, including deep learning and natural language processing, and their applications in HNC. The integration of AI with imaging techniques, genomics, and electronic health records is explored, emphasizing its role in early detection, biomarker discovery, and treatment planning. Despite noticeable progress, challenges such as data quality, algorithmic bias, and the need for interdisciplinary collaboration remain. Emerging innovations like explainable AI, AI-powered robotics, and real-time monitoring systems are poised to further advance the field. Addressing these challenges and fostering collaboration among AI experts, clinicians, and researchers is crucial for developing equitable and effective AI applications. The future of AI in HNC holds significant promise, offering potential breakthroughs in diagnostics, personalized therapies, and improved patient outcomes.
Affiliation(s)
- Tuan D. Pham
- Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Turner Street, London E1 2AD, UK; (M.-T.T.); (D.C.); (S.H.); (P.C.)
160. Kang K, Yang Y, Wu Y, Luo R. Integrating Large Language Models in Bioinformatics Education for Medical Students: Opportunities and Challenges. Ann Biomed Eng 2024; 52:2311-2315. [PMID: 38839663; DOI: 10.1007/s10439-024-03554-5]
Abstract
Large language models (LLMs) offer transformative opportunities in bioinformatics education for medical students by creating interactive experiences. The integration of LLMs enhances educational outcomes through providing accessible code templates, clarifying the function of coding elements, and assisting in error troubleshooting. Here, we demonstrate the practical applications of LLMs with a case study on transcriptome sequencing data processing, a vital component of medical research. However, the reliability of the content that LLMs generate requires rigorous validation. Ensuring the accuracy and appropriateness of the LLM-generated information requires integrating innovative LLMs with traditional educational methods to prepare medical students effectively for future challenges in bioinformatics.
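The kind of "code template" an LLM tutor might supply for a first transcriptome-processing step is sketched below; the file name, matrix layout, and normalisation choice are assumptions for illustration, and any LLM-generated template of this sort still needs the validation the authors call for.

```python
import numpy as np
import pandas as pd

# Template: counts-per-million (CPM) normalisation of an RNA-seq count matrix (genes x samples).
counts = pd.read_csv("gene_counts.csv", index_col=0)   # hypothetical raw count table
cpm = counts / counts.sum(axis=0) * 1e6                # scale each sample's library to one million
log_cpm = np.log2(cpm + 1)                             # log-transform for downstream analysis
print(log_cpm.head())
```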
Affiliation(s)
- Kai Kang
- Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Yuqi Yang
- West China School of Medicine, Sichuan University, Chengdu, Sichuan, China
- Yijun Wu
- Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Ren Luo
- Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China.
161. Lareyre F, Nasr B, Poggi E, Lorenzo GD, Ballaith A, Sliti I, Chaudhuri A, Raffort J. Large language models and artificial intelligence chatbots in vascular surgery. Semin Vasc Surg 2024; 37:314-320. [PMID: 39277347; DOI: 10.1053/j.semvascsurg.2024.06.001]
Abstract
Natural language processing is a subfield of artificial intelligence that aims to analyze human oral or written language. The development of large language models has brought innovative perspectives in medicine, including the potential use of chatbots and virtual assistants. Nevertheless, the benefits and pitfalls of such technology need to be carefully evaluated before their use in health care. The aim of this narrative review was to provide an overview of potential applications of large language models and artificial intelligence chatbots in the field of vascular surgery, including clinical practice, research, and education. In light of the results, we discuss current limits and future directions.
Affiliation(s)
- Fabien Lareyre
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France; Université Côte d'Azur, Centre National de la Recherche Scientifique (CNRS), UMR7370, Laboratoire de Physiomédecine Moléculaire (LP2M), Nice, France; Fédération Hospitalo-Universitaire FHU Plan & Go, Nice, France
- Bahaa Nasr
- University of Brest, Institut National de la Santé et de la Recherche Médicale (INSERM), IMT-Atlantique, UMR 1011 LaTIM, Vascular and Endovascular Surgery Department, CHU Cavale Blanche, Brest, France
- Elise Poggi
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France
- Gilles Di Lorenzo
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France
- Ali Ballaith
- Department of Cardiovascular Surgery, Zayed Military Hospital, Abu Dhabi, United Arab Emirates
- Imen Sliti
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France
- Arindam Chaudhuri
- Bedfordshire - Milton Keynes Vascular Centre, Bedfordshire Hospitals, National Health Service Foundation Trust, Bedford, UK
- Juliette Raffort
- Université Côte d'Azur, Centre National de la Recherche Scientifique (CNRS), UMR7370, Laboratoire de Physiomédecine Moléculaire (LP2M), Nice, France; Fédération Hospitalo-Universitaire FHU Plan & Go, Nice, France; Clinical Chemistry Laboratory, University Hospital of Nice, France; Institute 3IA Côte d'Azur, Université Côte d'Azur, France; Department of Clinical Biochemistry, Hôpital Pasteur, Pavillon J, 30, Avenue de la Voie Romaine, 06001 Nice cedex 1, France.
162. Madrid-García A, Merino-Barbancho B, Freites-Núñez D, Rodríguez-Rodríguez L, Menasalvas-Ruíz E, Rodríguez-González A, Peñas A. From Web to RheumaLpack: Creating a Linguistic Corpus for Exploitation and Knowledge Discovery in Rheumatology. Comput Biol Med 2024; 179:108920. [PMID: 39047506; DOI: 10.1016/j.compbiomed.2024.108920]
Abstract
This study introduces RheumaLinguisticpack (RheumaLpack), the first specialised linguistic web corpus designed for the field of musculoskeletal disorders. By combining web mining (i.e., web scraping) and natural language processing (NLP) techniques, as well as clinical expertise, RheumaLpack systematically captures and curates structured and unstructured data across a spectrum of web sources including clinical trials registers (i.e., ClinicalTrials.gov), bibliographic databases (i.e., PubMed), medical agencies (i.e. European Medicines Agency), social media (i.e., Reddit), and accredited health websites (i.e., MedlinePlus, Harvard Health Publishing, and Cleveland Clinic). Given the complexity of rheumatic and musculoskeletal diseases (RMDs) and their significant impact on quality of life, this resource can be proposed as a useful tool to train algorithms that could mitigate the diseases' effects. Therefore, the corpus aims to improve the training of artificial intelligence (AI) algorithms and facilitate knowledge discovery in RMDs. The development of RheumaLpack involved a systematic six-step methodology covering data identification, characterisation, selection, collection, processing, and corpus description. The result is a non-annotated, monolingual, and dynamic corpus, featuring almost 3 million records spanning from 2000 to 2023. RheumaLpack represents a pioneering contribution to rheumatology research, providing a useful resource for the development of advanced AI and NLP applications. This corpus highlights the value of web data to address the challenges posed by musculoskeletal diseases, illustrating the corpus's potential to improve research and treatment paradigms in rheumatology. Finally, the methodology shown can be replicated to obtain data from other medical specialities. The code and details on how to build RheumaLpack are also provided to facilitate the dissemination of such resource.
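A minimal sketch of the bibliographic-database collection step is given below, using the public NCBI E-utilities endpoint; the query string and parameters are illustrative assumptions, and RheumaLpack's actual pipeline covers several further sources (ClinicalTrials.gov, EMA, Reddit, and accredited health websites) plus additional processing steps.

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": "rheumatoid arthritis[Title/Abstract]",   # hypothetical query
    "datetype": "pdat", "mindate": "2000", "maxdate": "2023",
    "retmax": 20,
    "retmode": "json",
}
response = requests.get(ESEARCH, params=params, timeout=30)
pmids = response.json()["esearchresult"]["idlist"]
print(f"{len(pmids)} PMIDs retrieved, e.g. {pmids[:5]}")  # records can then be fetched with efetch
```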
Affiliation(s)
- Alfredo Madrid-García
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Prof. Martin Lagos s/n, Madrid, 28040, Spain.
- Beatriz Merino-Barbancho
- Escuela Técnica Superior de Ingenieros de Telecomunicación Universidad Politécnica de Madrid, Avenida Complutense, 30, Madrid, 28040, Spain
- Dalifer Freites-Núñez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Prof. Martin Lagos s/n, Madrid, 28040, Spain
- Luis Rodríguez-Rodríguez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Prof. Martin Lagos s/n, Madrid, 28040, Spain
- Ernestina Menasalvas-Ruíz
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain; Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain
- Alejandro Rodríguez-González
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain; Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain
- Anselmo Peñas
- UNED NLP & IR Group Universidad Nacional de Educación a Distancia, Juan del Rosal 16, 28040, Madrid, Spain
163. Yang J, Walker KC, Bekar-Cesaretli AA, Hao B, Bhadelia N, Joseph-McCarthy D, Paschalidis IC. Automating biomedical literature review for rapid drug discovery: Leveraging GPT-4 to expedite pandemic response. Int J Med Inform 2024; 189:105500. [PMID: 38815316; DOI: 10.1016/j.ijmedinf.2024.105500]
Abstract
OBJECTIVE: The rapid expansion of the biomedical literature challenges traditional review methods, especially during outbreaks of emerging infectious diseases when quick action is critical. Our study aims to explore the potential of ChatGPT to automate the biomedical literature review for rapid drug discovery.
MATERIALS AND METHODS: We introduce a novel automated pipeline helping to identify drugs for a given virus in response to a potential future global health threat. Our approach can be used to select PubMed articles identifying a drug target for the given virus. We tested our approach on two known pathogens: SARS-CoV-2, where the literature is vast, and Nipah, where the literature is sparse. Specifically, a panel of three experts reviewed a set of PubMed articles and labeled them as either describing a drug target for the given virus or not. The same task was given to the automated pipeline and its performance was based on whether it labeled the articles similarly to the human experts. We applied a number of prompt engineering techniques to improve the performance of ChatGPT.
RESULTS: Our best configuration used GPT-4 by OpenAI and achieved an out-of-sample validation performance with accuracy/F1-score/sensitivity/specificity of 92.87%/88.43%/83.38%/97.82% for SARS-CoV-2 and 87.40%/73.90%/74.72%/91.36% for Nipah.
CONCLUSION: These results highlight the utility of ChatGPT in drug discovery and development and reveal its potential to enable rapid drug target identification during a pandemic-level health emergency.
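A sketch of the screening step is shown below; the prompt wording, model name, and YES/NO output format are assumptions about how such a classification call might look, not the authors' exact configuration.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

abstract = "..."   # a PubMed abstract retrieved elsewhere
virus = "SARS-CoV-2"
response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,
    messages=[
        {"role": "system", "content": "You screen biomedical abstracts for drug-discovery evidence."},
        {"role": "user", "content": (
            f"Does the following abstract identify a drug target for {virus}? "
            f"Answer strictly YES or NO.\n\n{abstract}"
        )},
    ],
)
print(response.choices[0].message.content)  # compare against the expert panel's label
```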
Affiliation(s)
- Jingmei Yang
- Department of Electrical & Computer Engineering and Division of Systems Engineering, Boston University, Boston, MA, United States of America
- Kenji C Walker
- Department of Biomedical Engineering, Boston University, Boston, MA, United States of America
- Boran Hao
- Department of Electrical & Computer Engineering and Division of Systems Engineering, Boston University, Boston, MA, United States of America
- Nahid Bhadelia
- Chobanian & Avedisian School of Medicine and Center for Emerging Infectious Diseases Policy and Research, Boston University, Boston, MA, United States of America
- Diane Joseph-McCarthy
- Department of Biomedical Engineering, Boston University, Boston, MA, United States of America
- Ioannis Ch Paschalidis
- Department of Electrical & Computer Engineering and Division of Systems Engineering, Boston University, Boston, MA, United States of America; Department of Biomedical Engineering, Boston University, Boston, MA, United States of America; Faculty of Computing & Data Sciences, Boston University, Boston, MA, United States of America.
164. Freyer O, Wiest IC, Kather JN, Gilbert S. A future role for health applications of large language models depends on regulators enforcing safety standards. Lancet Digit Health 2024; 6:e662-e672. [PMID: 39179311; DOI: 10.1016/s2589-7500(24)00124-9]
Abstract
Amid the rapid integration of artificial intelligence in clinical settings, large language models (LLMs), such as Generative Pre-trained Transformer-4, have emerged as multifaceted tools with potential for health-care delivery, diagnosis, and patient care. However, deployment of LLMs raises substantial regulatory and safety concerns. Due to their high output variability, poor inherent explainability, and the risk of so-called AI hallucinations, LLM-based health-care applications that serve a medical purpose face regulatory challenges for approval as medical devices under US and EU laws, including the recently passed EU Artificial Intelligence Act. Despite unaddressed risks for patients, including misdiagnosis and unverified medical advice, such applications are available on the market. The regulatory ambiguity surrounding these tools creates an urgent need for frameworks that accommodate their unique capabilities and limitations. Alongside the development of these frameworks, existing regulations should be enforced. If regulators fear enforcing the regulations in a market dominated by supply or development by large technology companies, the consequences of layperson harm will force belated action, damaging the potential of LLM-based applications for layperson medical advice.
Affiliation(s)
- Oscar Freyer
- Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Isabella Catharina Wiest
- Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany; Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Jakob Nikolas Kather
- Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany; Department of Medicine, University Hospital Dresden, Dresden, Germany; Medical Oncology, National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany
- Stephen Gilbert
- Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany.
165. Stahl D. New horizons in prediction modelling using machine learning in older people's healthcare research. Age Ageing 2024; 53:afae201. [PMID: 39311424; PMCID: PMC11417961; DOI: 10.1093/ageing/afae201]
Abstract
Machine learning (ML) and prediction modelling have become increasingly influential in healthcare, providing critical insights and supporting clinical decisions, particularly in the age of big data. This paper serves as an introductory guide for health researchers and readers interested in prediction modelling and explores how these technologies support clinical decisions, particularly with big data, and covers all aspects of the development, assessment and reporting of a model using ML. The paper starts with the importance of prediction modelling for precision medicine. It outlines different types of prediction and machine learning approaches, including supervised, unsupervised and semi-supervised learning, and provides an overview of popular algorithms for various outcomes and settings. It also introduces key theoretical ML concepts. The importance of data quality, preprocessing and unbiased model performance evaluation is highlighted. Concepts of apparent, internal and external validation will be introduced along with metrics for discrimination and calibration for different types of outcomes. Additionally, the paper addresses model interpretation, fairness and implementation in clinical practice. Finally, the paper provides recommendations for reporting and identifies common pitfalls in prediction modelling and machine learning. The aim of the paper is to help readers understand and critically evaluate research papers that present ML models and to serve as a first guide for developing, assessing and implementing their own.
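To illustrate the discrimination and calibration metrics the paper introduces, the sketch below fits a simple model on synthetic data and evaluates it on a held-out split; the data and model choice are assumptions for demonstration only.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                                          # synthetic predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)   # synthetic binary outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]

print("Discrimination (AUC):", roc_auc_score(y_te, p))
print("Brier score:", brier_score_loss(y_te, p))
obs, pred = calibration_curve(y_te, p, n_bins=10)   # coordinates for a calibration plot
```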
Affiliation(s)
- Daniel Stahl
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK
166. Zhou H, Li M, Xiao Y, Yang H, Zhang R. LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction. J Am Med Inform Assoc 2024; 31:2010-2018. [PMID: 38904416; PMCID: PMC11339510; DOI: 10.1093/jamia/ocae147]
Abstract
OBJECTIVE: To investigate the demonstration in large language models (LLMs) for biomedical relation extraction. This study introduces a framework comprising three types of adaptive tuning methods to assess their impacts and effectiveness.
MATERIALS AND METHODS: Our study was conducted in two phases. Initially, we analyzed a range of demonstration components vital for LLMs' biomedical data capabilities, including task descriptions and examples, experimenting with various combinations. Subsequently, we introduced the LLM instruction-example adaptive prompting (LEAP) framework, including instruction adaptive tuning, example adaptive tuning, and instruction-example adaptive tuning methods. This framework aims to systematically investigate both adaptive task descriptions and adaptive examples within the demonstration. We assessed the performance of the LEAP framework on the DDI, ChemProt, and BioRED datasets, employing LLMs such as Llama2-7b, Llama2-13b, and MedLLaMA_13B.
RESULTS: Our findings indicated that Instruction + Options + Example and its expanded form substantially improved F1 scores over the standard Instruction + Options mode for zero-shot LLMs. The LEAP framework, particularly through its example adaptive prompting, demonstrated superior performance over conventional instruction tuning across all models. Notably, the MedLLaMA_13B model achieved an exceptional F1 score of 95.13 on the ChemProt dataset using this method. Significant improvements were also observed in the DDI 2013 and BioRED datasets, confirming the method's robustness in sophisticated data extraction scenarios.
CONCLUSION: The LEAP framework offers a compelling strategy for enhancing LLM training strategies, steering away from extensive fine-tuning towards more dynamic and contextually enriched prompting methodologies, as showcased here in biomedical relation extraction.
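The sketch below assembles an Instruction + Options + Example demonstration in the spirit of the framework; the wording, label set, and example sentences are illustrative assumptions, not the paper's actual templates.

```python
# Build an "Instruction + Options + Example" prompt for drug-drug relation extraction.
instruction = "Classify the relation between the two drugs marked as @DRUG$ in the sentence."
options = "Options: mechanism, effect, advise, int, no-relation."
example = (
    "Sentence: @DRUG$ may increase the serum concentration of @DRUG$.\n"
    "Relation: mechanism"
)
query = (
    "Sentence: @DRUG$ should not be combined with @DRUG$ because of bleeding risk.\n"
    "Relation:"
)

prompt = "\n\n".join([instruction, options, example, query])
print(prompt)  # pass the string to an LLM such as Llama 2 for zero-/few-shot inference
```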
Affiliation(s)
- Huixue Zhou
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Mingchen Li
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Yongkang Xiao
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Han Yang
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Rui Zhang
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
167. Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large language models facilitate the generation of electronic health record phenotyping algorithms. J Am Med Inform Assoc 2024; 31:1994-2001. [PMID: 38613820; PMCID: PMC11339509; DOI: 10.1093/jamia/ocae072]
Abstract
OBJECTIVES: Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts.
MATERIALS AND METHODS: We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (ie, type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.
RESULTS: GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values).
CONCLUSION: GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
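A sketch of the drafting step is given below; the prompt text and model name are assumptions about how such a request might be phrased, and, as the authors stress, any returned SQL still requires expert review before use.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompt = (
    "Write a SQL query against the OMOP Common Data Model that returns person_id for patients "
    "with type 2 diabetes mellitus, using the condition_occurrence table and standard concepts. "
    "Return only the SQL."
)
response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,
    messages=[{"role": "user", "content": prompt}],
)
draft_sql = response.choices[0].message.content
print(draft_sql)  # reviewed and refined by phenotyping experts before execution
```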
Affiliation(s)
- Chao Yan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Henry H Ong
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Monika E Grabowska
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Matthew S Krantz
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Wu-Chen Su
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Alyson L Dickson
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Josh F Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- QiPing Feng
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Dan M Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- C Michael Stein
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- V Eric Kerchberger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States
168. Dipaola F, Gebska MA, Gatti M, Levra AG, Parker WH, Menè R, Lee S, Costantino G, Barsotti EJ, Shiffer D, Johnston SL, Sutton R, Olshansky B, Furlan R. Will Artificial Intelligence Be "Better" Than Humans in the Management of Syncope? JACC Adv 2024; 3:101072. [PMID: 39372450; PMCID: PMC11450913; DOI: 10.1016/j.jacadv.2024.101072]
Abstract
Clinical decision-making regarding syncope poses challenges, with risk of physician error due to the elusive nature of syncope pathophysiology, diverse presentations, heterogeneity of risk factors, and limited therapeutic options. Artificial intelligence (AI)-based techniques, including machine learning (ML), deep learning (DL), and natural language processing (NLP), can uncover hidden and nonlinear connections among syncope risk factors, disease features, and clinical outcomes. ML, DL, and NLP models can analyze vast amounts of data effectively and assist physicians to help distinguish true syncope from other types of transient loss of consciousness. Additionally, short-term adverse events and length of hospital stay can be predicted by these models. In syncope research, AI-based models shift the focus from causality to correlation analysis between entities. This prompts the search for patterns rather than defining a hypothesis to be tested a priori. Furthermore, education of students, doctors, and health care providers engaged in continuing medical education may benefit from clinical cases of syncope interacting with NLP-based virtual patient simulators. Education may be of benefit to patients. This article explores potential strengths, weaknesses, and proposed solutions associated with utilization of ML and DL in syncope diagnosis and management. Three main topics regarding syncope are addressed: 1) clinical decision-making; 2) clinical research; and 3) education. Within each domain, we question whether "AI will be better than humans," seeking evidence to support our objective inquiry.
Collapse
Affiliation(s)
- Franca Dipaola
- Internal Medicine, IRCCS Humanitas Research Hospital, Rozzano, Italy
| | - Milena A. Gebska
- Division of Cardiovascular Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| | | | | | - William H. Parker
- Division of Cardiovascular Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| | - Roberto Menè
- Cardiac Arrhythmia Department, Bordeaux University Hospital, INSERM, Bordeaux, France
- IHU LIRYC, Electrophysiology and Heart Modeling Institute, Bordeaux, France
| | - Sangil Lee
- Department of Emergency Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| | - Giorgio Costantino
- Emergency Department, IRCCS Ca’ Granda, Ospedale Maggiore, Milano, Italy
| | - E. John Barsotti
- Department of Epidemiology, College of Public Health, University of Iowa, Iowa City, Iowa, USA
| | - Dana Shiffer
- Department of Biomedical Sciences, Humanitas University, Milan, Italy
| | - Samuel L. Johnston
- Division of Cardiovascular Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| | - Richard Sutton
- Department of Cardiology, Hammersmith Hospital Campus, National Heart & Lung Institute, Imperial College, London, United Kingdom
| | - Brian Olshansky
- Division of Cardiovascular Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| | - Raffaello Furlan
- Internal Medicine, IRCCS Humanitas Research Hospital, Rozzano, Italy
- Department of Biomedical Sciences, Humanitas University, Milan, Italy
| |
Collapse
|
169
|
Ritoré Á, Jiménez CM, González JL, Rejón-Parrilla JC, Hervás P, Toro E, Parra-Calderón CL, Celi LA, Túnez I, Armengol de la Hoz MÁ. The role of Open Access Data in democratizing healthcare AI: A pathway to research enhancement, patient well-being and treatment equity in Andalusia, Spain. PLOS DIGITAL HEALTH 2024; 3:e0000599. [PMID: 39283912 PMCID: PMC11404816 DOI: 10.1371/journal.pdig.0000599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Figures] [Subscribe] [Scholar Register] [Indexed: 09/20/2024]
Affiliation(s)
- Álvaro Ritoré
- Big Data Department, PMC, Fundación Progreso y Salud, Seville, Spain
| | - Claudia M Jiménez
- Big Data Department, PMC, Fundación Progreso y Salud, Seville, Spain
| | | | | | - Pablo Hervás
- Department of Technology Transfer, Fundación Progreso y Salud, Seville, Spain
| | - Esteban Toro
- Department of Information Systems, Fundación Progreso y Salud, Seville, Spain
| | - Carlos Luis Parra-Calderón
- Department of Technological Innovation, Virgen del Rocío University Hospital, Seville, Spain
- Group of Innovation in Biomedical Informatics, Biomedical Engineering and Health Economics, Institute of Biomedicine of Seville, Seville, Spain
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Isaac Túnez
- Department of Biochemistry and Molecular Biology, University of Córdoba, Córdoba, Spain
- Reina Sofía University Hospital, Córdoba, Spain
- Maimónides Institute of Biomedical Research of Córdoba, Córdoba, Spain
- General Secretariat of Public Health and Research, Development and Innovation in Health, Regional Ministry of Health and Consumer Affairs, Regional Government of Andalusia, Seville, Spain
| | | |
Collapse
|
170
|
Hwai H, Ho YJ, Wang CH, Huang CH. Large language model application in emergency medicine and critical care. J Formos Med Assoc 2024:S0929-6646(24)00400-5. [PMID: 39198112 DOI: 10.1016/j.jfma.2024.08.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2024] [Revised: 08/13/2024] [Accepted: 08/23/2024] [Indexed: 09/01/2024] Open
Abstract
In the rapidly evolving healthcare landscape, artificial intelligence (AI), particularly large language models (LLMs) such as OpenAI's Chat Generative Pretrained Transformer (ChatGPT), has shown transformative potential in emergency medicine and critical care. This review article highlights the advancements and applications of ChatGPT, from diagnostic assistance to clinical documentation and patient communication, demonstrating its ability to perform comparably to human professionals in medical examinations. ChatGPT could assist clinical decision-making and medication selection in critical care, showcasing its potential to optimize patient care management. However, integrating LLMs into healthcare raises legal, ethical, and privacy concerns, including data protection and the necessity for informed consent. Finally, we address the challenges related to the accuracy of LLMs, such as the risk of providing incorrect medical advice. These concerns underscore the importance of ongoing research and regulation to ensure their ethical and practical use in healthcare.
Collapse
Affiliation(s)
- Haw Hwai
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
| | - Yi-Ju Ho
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
| | - Chih-Hung Wang
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
| | - Chien-Hua Huang
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
| |
Collapse
|
171
|
Shah K, Xu AY, Sharma Y, Daher M, McDonald C, Diebo BG, Daniels AH. Large Language Model Prompting Techniques for Advancement in Clinical Medicine. J Clin Med 2024; 13:5101. [PMID: 39274316 PMCID: PMC11396764 DOI: 10.3390/jcm13175101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2024] [Revised: 08/23/2024] [Accepted: 08/26/2024] [Indexed: 09/16/2024] Open
Abstract
Large Language Models (LLMs) have the potential to revolutionize clinical medicine by enhancing healthcare access, diagnosis, surgical planning, and education. However, their utilization requires careful prompt engineering to mitigate challenges like hallucinations and biases. Proper utilization of LLMs involves understanding foundational concepts such as tokenization, embeddings, and attention mechanisms, alongside strategic prompting techniques to ensure accurate outputs. For innovative healthcare solutions, it is essential to maintain ongoing collaboration between AI technology and medical professionals. Ethical considerations, including data security and bias mitigation, are critical to their application. By leveraging LLMs as supplementary resources in research and education, we can enhance learning and support knowledge-based inquiries, ultimately advancing the quality and accessibility of medical care. Continued research and development are necessary to fully realize the potential of LLMs in transforming healthcare.
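To make the prompting techniques discussed above concrete, here is a minimal sketch contrasting a zero-shot prompt with a few-shot prompt; the clinical wording is hypothetical and no particular LLM API is assumed.

```python
# Minimal sketch of two prompting strategies: zero-shot vs. few-shot.
# The clinical content is a hypothetical example; sending the prompt to a
# specific LLM API is deliberately left out.
ZERO_SHOT = (
    "You are assisting a clinician. Summarize the key risks of NSAID use "
    "in a patient with chronic kidney disease in plain language."
)

FEW_SHOT = "\n\n".join([
    "Rewrite clinical advice at a 6th-grade reading level.",
    "Input: Hypertension increases cardiovascular risk.\n"
    "Output: High blood pressure makes heart problems more likely.",
    "Input: NSAIDs may worsen renal function in chronic kidney disease.\n"
    "Output:",  # the model is expected to complete this final example
])

if __name__ == "__main__":
    print(ZERO_SHOT)
    print("---")
    print(FEW_SHOT)
```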
Collapse
Affiliation(s)
- Krish Shah
- Warren Alpert Medical School, Brown University, East Providence, RI 02914, USA
| | - Andrew Y Xu
- Warren Alpert Medical School, Brown University, East Providence, RI 02914, USA
| | - Yatharth Sharma
- Warren Alpert Medical School, Brown University, East Providence, RI 02914, USA
| | - Mohammed Daher
- Department of Orthopedics, Warren Alpert Medical School, Brown University, Providence, RI 02912, USA
| | - Christopher McDonald
- Department of Orthopedics, Warren Alpert Medical School, Brown University, Providence, RI 02912, USA
| | - Bassel G Diebo
- Department of Orthopedics, Warren Alpert Medical School, Brown University, Providence, RI 02912, USA
| | - Alan H Daniels
- Department of Orthopedics, Warren Alpert Medical School, Brown University, Providence, RI 02912, USA
| |
Collapse
|
172
|
Łaszkiewicz J, Krajewski W, Tomczak W, Chorbińska J, Nowak Ł, Chełmoński A, Krajewski P, Sójka A, Małkiewicz B, Szydełko T. Performance of ChatGPT in providing patient information about upper tract urothelial carcinoma. Contemp Oncol (Pozn) 2024; 28:172-181. [PMID: 39421706 PMCID: PMC11480910 DOI: 10.5114/wo.2024.141567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2024] [Accepted: 05/14/2024] [Indexed: 10/19/2024] Open
Abstract
Introduction The aim was to evaluate ChatGPT-generated responses to patient-important questions regarding upper tract urothelial carcinoma (UTUC). Material and methods Fifteen common inquiries asked by patients regarding UTUC were assigned to 4 categories: general information; symptoms and diagnosis; treatment; and prognosis. These questions were entered into ChatGPT and its responses were recorded. In every answer, 5 criteria (adequate length, comprehensible language, precision in addressing the question, compliance with European Association of Urology guidelines and safety of the response for the patient) were assessed by the urologists using a numerical scale of 1-5 (a score of 5 being the best). Results Sixteen questionnaires were included. A score of five was assigned 336 times (28.0%); 4 - 527 times (43.9%); 3 - 268 times (22.3%); 2 - 53 times (4.4%); and 1 - 16 times (1.3%). The average overall score was 3.93. Responses to each question received average scores within the range 3.34-4.18. Answers regarding "general information" were graded the highest - mean score 4.14. Artificial intelligence scored the lowest in the "treatment" category - mean score 3.68. A mean score of 4.02 was given for the safety of the response. However, a few urologists considered several answers as unsafe for the patient, grading them 1 or 2 on this criterion. Conclusions ChatGPT does not provide fully adequate information on UTUC, and inquiries regarding treatment can be misleading for patients. In particular cases, patients might receive potentially unsafe answers. However, ChatGPT can be used with caution to provide basic information regarding epidemiology and risk factors of UTUC.
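The reported score distribution can be checked arithmetically: 16 questionnaires covering 15 questions and 5 criteria yield 1,200 graded items, and the weighted mean of the reported counts reproduces the stated overall score of 3.93, as in the short sketch below.

```python
# Quick arithmetic check of the overall score reported in the abstract:
# 16 questionnaires x 15 questions x 5 criteria = 1,200 graded items, and the
# weighted mean of the reported score counts matches the stated 3.93.
score_counts = {5: 336, 4: 527, 3: 268, 2: 53, 1: 16}

total_items = sum(score_counts.values())                         # 1200
weighted_sum = sum(score * n for score, n in score_counts.items())
mean_score = weighted_sum / total_items

print(total_items, round(mean_score, 2))  # 1200 3.93
```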
Collapse
Affiliation(s)
- Jan Łaszkiewicz
- University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Wojciech Krajewski
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Wojciech Tomczak
- University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Joanna Chorbińska
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Łukasz Nowak
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Adam Chełmoński
- University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Piotr Krajewski
- Department of Dermatology, Venereology and Allergology, Wroclaw Medical University, Wrocław, Poland
| | - Aleksandra Sójka
- University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Bartosz Małkiewicz
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| | - Tomasz Szydełko
- Department of Minimally Invasive and Robotic Urology, University Center of Excellence in Urology, Wrocław Medical University, Wrocław, Poland
| |
Collapse
|
173
|
Pan G, Ni J. A cross sectional investigation of ChatGPT-like large language models application among medical students in China. BMC MEDICAL EDUCATION 2024; 24:908. [PMID: 39180023 PMCID: PMC11342543 DOI: 10.1186/s12909-024-05871-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/22/2024] [Accepted: 08/07/2024] [Indexed: 08/26/2024]
Abstract
OBJECTIVE To investigate the level of understanding and trust of medical students towards ChatGPT-like large language models, as well as their utilization and attitudes towards these models. METHODS Data collection was conducted from December 2023 to mid-January 2024, utilizing a self-designed questionnaire to assess the use of large language models among undergraduate medical students at Anhui Medical University. The normality of the data was confirmed with Shapiro-Wilk tests. We used Chi-square tests for comparisons of categorical variables, Mann-Whitney U tests for comparisons of ordinal variables and non-normal continuous variables between two groups, Kruskal-Wallis H tests for comparisons of ordinal variables between multiple groups, and Bonferroni tests for post hoc comparisons. RESULTS A total of 1774 questionnaires were distributed and 1718 valid questionnaires were collected, with an effective rate of 96.84%. Among these students, 34.5% had heard of and used large language models. There were statistically significant differences in the understanding of large language models between genders (p < 0.001), grade levels (junior-level students and senior-level students) (p = 0.03), and majors (p < 0.001). Male students, junior-level students, and public health management students had a higher level of understanding of these models. Gender and major had statistically significant effects on the degree of trust in large language models (p = 0.004; p = 0.02). Male students and nursing students exhibited a higher degree of trust in large language models. As for usage, male and junior-level students showed a significantly higher proportion of using these models for assisted learning (p < 0.001). Neutral sentiments were held by over two-thirds of the students (66.7%) regarding large language models, with only 51 (3.0%) expressing pessimism. There were significant gender-based disparities in attitudes towards large language models, with male students exhibiting a more optimistic attitude towards these models (p < 0.001). Notably, among students with different levels of knowledge and trust in large language models, statistically significant differences were observed in their perceptions of the shortcomings and benefits of these models. CONCLUSION Our study identified gender, grade level, and major as influential factors in students' understanding and utilization of large language models. This also suggests the feasibility of integrating large language models with traditional medical education to further enhance teaching effectiveness in the future.
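For readers unfamiliar with the tests named in the methods, the sketch below runs the same families of comparisons (chi-square, Mann-Whitney U, Kruskal-Wallis) on small synthetic survey-style data; the numbers are placeholders, not the study's data.

```python
# Minimal sketch of the statistical comparisons named in the abstract,
# run on hypothetical survey-style data (not the study's data).
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu, kruskal

# Chi-square: gender (rows) vs. has-used-LLM (columns), hypothetical counts
contingency = np.array([[120, 300],   # male: used / not used
                        [150, 420]])  # female: used / not used
chi2, p_chi, dof, _ = chi2_contingency(contingency)

# Mann-Whitney U: ordinal trust scores (1-5) in two groups
trust_group_a = [4, 3, 5, 4, 3, 4, 2, 5]
trust_group_b = [3, 3, 4, 2, 3, 4, 3, 2]
u_stat, p_u = mannwhitneyu(trust_group_a, trust_group_b)

# Kruskal-Wallis: understanding scores across three majors
p_kw = kruskal([3, 4, 4, 5], [2, 3, 3, 4], [4, 4, 5, 5]).pvalue

print(f"chi2 p={p_chi:.3f}, Mann-Whitney p={p_u:.3f}, Kruskal-Wallis p={p_kw:.3f}")
```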
Collapse
Affiliation(s)
- Guixia Pan
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Meishan Road 81, Hefei, 230032, Anhui, China.
| | - Jing Ni
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Meishan Road 81, Hefei, 230032, Anhui, China
| |
Collapse
|
174
|
Li KD, Fernandez AM, Schwartz R, Rios N, Carlisle MN, Amend GM, Patel HV, Breyer BN. Comparing GPT-4 and Human Researchers in Health Care Data Analysis: Qualitative Description Study. J Med Internet Res 2024; 26:e56500. [PMID: 39167785 PMCID: PMC11375389 DOI: 10.2196/56500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Revised: 05/31/2024] [Accepted: 07/09/2024] [Indexed: 08/23/2024] Open
Abstract
BACKGROUND Large language models including GPT-4 (OpenAI) have opened new avenues in health care and qualitative research. Traditional qualitative methods are time-consuming and require expertise to capture nuance. Although large language models have demonstrated enhanced contextual understanding and inferencing compared with traditional natural language processing, their performance in qualitative analysis versus that of humans remains unexplored. OBJECTIVE We evaluated the effectiveness of GPT-4 versus human researchers in qualitative analysis of interviews with patients with adult-acquired buried penis (AABP). METHODS Qualitative data were obtained from semistructured interviews with 20 patients with AABP. Human analysis involved a structured 3-stage process-initial observations, line-by-line coding, and consensus discussions to refine themes. In contrast, artificial intelligence (AI) analysis with GPT-4 underwent two phases: (1) a naïve phase, where GPT-4 outputs were independently evaluated by a blinded reviewer to identify themes and subthemes and (2) a comparison phase, where AI-generated themes were compared with human-identified themes to assess agreement. We used a general qualitative description approach. RESULTS The study population (N=20) comprised predominantly White (17/20, 85%), married (12/20, 60%), heterosexual (19/20, 95%) men, with a mean age of 58.8 years and BMI of 41.1 kg/m2. Human qualitative analysis identified "urinary issues" in 95% (19/20) and GPT-4 in 75% (15/20) of interviews, with the subtheme "spray or stream" noted in 60% (12/20) and 35% (7/20), respectively. "Sexual issues" were prominent (19/20, 95% humans vs 16/20, 80% GPT-4), although humans identified a wider range of subthemes, including "pain with sex or masturbation" (7/20, 35%) and "difficulty with sex or masturbation" (4/20, 20%). Both analyses similarly highlighted "mental health issues" (11/20, 55%, both), although humans coded "depression" more frequently (10/20, 50% humans vs 4/20, 20% GPT-4). Humans frequently cited "issues using public restrooms" (12/20, 60%) as impacting social life, whereas GPT-4 emphasized "struggles with romantic relationships" (9/20, 45%). "Hygiene issues" were consistently recognized (14/20, 70% humans vs 13/20, 65% GPT-4). Humans uniquely identified "contributing factors" as a theme in all interviews. There was moderate agreement between human and GPT-4 coding (κ=0.401). Reliability assessments of GPT-4's analyses showed consistent coding for themes including "body image struggles," "chronic pain" (10/10, 100%), and "depression" (9/10, 90%). Other themes like "motivation for surgery" and "weight challenges" were reliably coded (8/10, 80%), while less frequent themes were variably identified across multiple iterations. CONCLUSIONS Large language models including GPT-4 can effectively identify key themes in analyzing qualitative health care data, showing moderate agreement with human analysis. While human analysis provided a richer diversity of subthemes, the consistency of AI suggests its use as a complementary tool in qualitative research. With AI rapidly advancing, future studies should iterate analyses and circumvent token limitations by segmenting data, furthering the breadth and depth of large language model-driven qualitative analyses.
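The human-versus-GPT-4 agreement reported above is a Cohen's kappa; the minimal sketch below shows how such a coefficient is computed from binary theme codes, using hypothetical per-interview codes rather than the study's data.

```python
# Minimal sketch of computing Cohen's kappa between human and GPT-4 coding
# of a theme across interviews (1 = theme present, 0 = absent).
# The code vectors are hypothetical, for illustration only.
from sklearn.metrics import cohen_kappa_score

human_codes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
gpt4_codes  = [1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]

kappa = cohen_kappa_score(human_codes, gpt4_codes)
print(f"Cohen's kappa = {kappa:.3f}")  # values around 0.4 indicate moderate agreement
```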
Collapse
Affiliation(s)
- Kevin Danis Li
- Department of Urology, University of California San Francisco, San Francisco, CA, United States
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, United States
| | - Adrian M Fernandez
- Department of Urology, University of California San Francisco, San Francisco, CA, United States
| | - Rachel Schwartz
- Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA, United States
- Division of General Internal Medicine, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
| | - Natalie Rios
- Department of Urology, University of California San Francisco, San Francisco, CA, United States
| | | | - Gregory M Amend
- Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Hiren V Patel
- Department of Urology, University of California San Francisco, San Francisco, CA, United States
| | - Benjamin N Breyer
- Department of Urology, University of California San Francisco, San Francisco, CA, United States
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, United States
| |
Collapse
|
175
|
Du X, Zhou Z, Wang Y, Chuang YW, Yang R, Zhang W, Wang X, Zhang R, Hong P, Bates DW, Zhou L. Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.08.11.24311828. [PMID: 39228726 PMCID: PMC11370524 DOI: 10.1101/2024.08.11.24311828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2024]
Abstract
Background Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges. Objective This study aims to systematically review the use of generative LLMs, and the effectiveness of relevant techniques in patient care-related topics involving EHRs, summarize the challenges faced, and suggest future directions. Methods A Boolean search for peer-reviewed articles was conducted on May 19th, 2024 using PubMed and Web of Science to include research articles published since 2023, which was one month after the release of ChatGPT. The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation metrics. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions. Results The initial search identified 6,328 unique studies, with 76 studies included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting, five of them reported 100% accuracy on five specific clinical tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and performance improvement. Eight studies explored fine-tuning generative LLMs, all reported performance improvements on specific tasks, but three of them noted potential performance degradation after fine-tuning on certain tasks. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate rare disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, with one detecting no bias and the other finding that male patients received more appropriate clinical decision-making suggestions. Six studies identified hallucinations, such as fabricating patient names in structured thyroid ultrasound reports. Additional challenges included but were not limited to the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses. Conclusion Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to challenges such as bias, hallucinations, and impersonal responses.
Collapse
Affiliation(s)
- Xinsong Du
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
| | - Zhengyang Zhou
- Department of Computer Science, Brandeis University, Waltham, MA 02453
| | - Yifei Wang
- Department of Computer Science, Brandeis University, Waltham, MA 02453
| | - Ya-Wen Chuang
- Division of Nephrology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan, 407219
- Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan, 402202
- School of Medicine, College of Medicine, China Medical University, Taichung, Taiwan, 404328
| | - Richard Yang
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
| | - Wenyu Zhang
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
| | - Xinyi Wang
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
| | - Rui Zhang
- Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455
| | - Pengyu Hong
- Department of Computer Science, Brandeis University, Waltham, MA 02453
| | - David W. Bates
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, MA 02115
| | - Li Zhou
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
| |
Collapse
|
176
|
Young CC, Enichen E, Rao A, Hilker S, Butler A, Laird-Gion J, Succi MD. Pilot Study of Large Language Models as an Age-Appropriate Explanatory Tool for Chronic Pediatric Conditions. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.08.06.24311544. [PMID: 39148860 PMCID: PMC11326333 DOI: 10.1101/2024.08.06.24311544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 08/17/2024]
Abstract
There exists a gap in existing patient education resources for children with chronic conditions. This pilot study assesses large language models' (LLMs) capacity to deliver developmentally appropriate explanations of chronic conditions to pediatric patients. Two commonly used LLMs generated responses that accurately, appropriately, and effectively communicate complex medical information, making them a potentially valuable tool for enhancing patient understanding and engagement in clinical settings.
Collapse
Affiliation(s)
- Cameron C. Young
- Harvard Medical School, Boston, MA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA
| | - Elizabeth Enichen
- Harvard Medical School, Boston, MA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA
| | - Arya Rao
- Harvard Medical School, Boston, MA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA
| | - Sidney Hilker
- Harvard Medical School, Boston, MA
- Boston Children’s Hospital, Boston, MA
| | - Alex Butler
- Harvard Medical School, Boston, MA
- Boston Children’s Hospital, Boston, MA
| | - Jessica Laird-Gion
- Harvard Medical School, Boston, MA
- Boston Children’s Hospital, Boston, MA
| | - Marc D. Succi
- Harvard Medical School, Boston, MA
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA
- Department of Radiology, Massachusetts General Hospital, Boston, MA
| |
Collapse
|
177
|
Kaczmarczyk R, Wilhelm TI, Martin R, Roos J. Evaluating multimodal AI in medical diagnostics. NPJ Digit Med 2024; 7:205. [PMID: 39112822 PMCID: PMC11306783 DOI: 10.1038/s41746-024-01208-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2024] [Accepted: 07/29/2024] [Indexed: 08/10/2024] Open
Abstract
This study evaluates multimodal AI models' accuracy and responsiveness in answering NEJM Image Challenge questions, juxtaposed with human collective intelligence, underscoring AI's potential and current limitations in clinical diagnostics. Anthropic's Claude 3 family demonstrated the highest accuracy among the evaluated AI models, surpassing the average human accuracy, while collective human decision-making outperformed all AI models. GPT-4 Vision Preview exhibited selectivity, responding more to easier questions with smaller images and longer questions.
Collapse
Affiliation(s)
- Robert Kaczmarczyk
- Department of Dermatology and Allergy, School of Medicine, Technical University of Munich, Munich, Germany
| | | | - Ron Martin
- Clinic of Plastic, Hand and Aesthetic Surgery, Burn Center, BG Clinic Bergmannstrost, Halle (Saale), Germany
| | - Jonas Roos
- Department of Orthopedics and Trauma Surgery, University Hospital of Bonn, Bonn, Germany
| |
Collapse
|
178
|
Yao JJ, Aggarwal M, Lopez RD, Namdari S. Large Language Models in Orthopaedics: Definitions, Uses, and Limitations. J Bone Joint Surg Am 2024; 106:1411-1418. [PMID: 38896652 DOI: 10.2106/jbjs.23.01417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 06/21/2024]
Abstract
➤ Large language models are a subset of artificial intelligence. Large language models are powerful tools that excel in natural language text processing and generation. ➤ There are many potential clinical, research, and educational applications of large language models in orthopaedics, but the development of these applications needs to be focused on patient safety and the maintenance of high standards. ➤ There are numerous methodological, ethical, and regulatory concerns with regard to the use of large language models. Orthopaedic surgeons need to be aware of the controversies and advocate for an alignment of these models with patient and caregiver priorities.
Collapse
Affiliation(s)
- Jie J Yao
- Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
| | | | - Ryan D Lopez
- Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Surena Namdari
- Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
| |
Collapse
|
179
|
Geantă M, Bădescu D, Chirca N, Nechita OC, Radu CG, Rascu S, Rădăvoi D, Sima C, Toma C, Jinga V. The Potential Impact of Large Language Models on Doctor-Patient Communication: A Case Study in Prostate Cancer. Healthcare (Basel) 2024; 12:1548. [PMID: 39120251 PMCID: PMC11311818 DOI: 10.3390/healthcare12151548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2024] [Revised: 07/16/2024] [Accepted: 08/03/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND In recent years, the integration of large language models (LLMs) into healthcare has emerged as a revolutionary approach to enhancing doctor-patient communication, particularly in the management of diseases such as prostate cancer. METHODS Our paper evaluated the effectiveness of three prominent LLMs-ChatGPT (3.5), Gemini (Pro), and Co-Pilot (the free version)-against the official Romanian Patient's Guide on prostate cancer. Employing a randomized and blinded method, our study engaged eight medical professionals to assess the responses of these models based on accuracy, timeliness, comprehensiveness, and user-friendliness. RESULTS The primary objective was to explore whether LLMs, when operating in Romanian, offer comparable or superior performance to the Patient's Guide, considering their potential to personalize communication and enhance the informational accessibility for patients. Results indicated that LLMs, particularly ChatGPT, generally provided more accurate and user-friendly information compared to the Guide. CONCLUSIONS The findings suggest a significant potential for LLMs to enhance healthcare communication by providing accurate and accessible information. However, variability in performance across different models underscores the need for tailored implementation strategies. We highlight the importance of integrating LLMs with a nuanced understanding of their capabilities and limitations to optimize their use in clinical settings.
Collapse
Affiliation(s)
- Marius Geantă
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Center for Innovation in Medicine, 42J Theodor Pallady Bvd., 032266 Bucharest, Romania
- United Nations University—Maastricht Economic and Social Research Institute on Innovation and Technology, Boschstraat 24, 6211 AX Maastricht, The Netherlands
| | - Daniel Bădescu
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Narcis Chirca
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Ovidiu Cătălin Nechita
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Cosmin George Radu
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Stefan Rascu
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Daniel Rădăvoi
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Cristian Sima
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Cristian Toma
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
| | - Viorel Jinga
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Academy of Romanian Scientists, 3 Ilfov, 050085 Bucharest, Romania
| |
Collapse
|
180
|
Almansour M, Alfhaid FM. Generative artificial intelligence and the personalization of health professional education: A narrative review. Medicine (Baltimore) 2024; 103:e38955. [PMID: 39093806 PMCID: PMC11296413 DOI: 10.1097/md.0000000000038955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/18/2024] [Accepted: 06/26/2024] [Indexed: 08/04/2024] Open
Abstract
This narrative review examined the intersection of generative artificial intelligence (GAI) and the personalization of health professional education (PHE). This review aims to elucidate the current state of GAI technologies and their particular uses in the field of PHE. Data were extracted and analyzed from studies focusing on the demographics and professional development preferences of healthcare workers, the competencies required for personalized precision medicine, and the current and potential applications of artificial intelligence (AI) in PHE. The review also addressed the ethical implications of AI implementation in this context. Findings indicated a gender-balanced healthcare workforce with a predisposition toward continuous professional development and digital tool utilization. A need for a comprehensive educational framework was identified to include a spectrum of skills crucial for precision medicine, emphasizing the importance of patient involvement and bioethics. AI was found to enhance educational experiences and research in PHE, with an increasing trend in AI applications, particularly in surgical education since 2018. Ethical challenges associated with AI integration in PHE were highlighted, with an emphasis on the need for ethical design and diverse development teams. Core concepts in AI research were established, with a spotlight on emerging areas such as data science and learning analytics. The application of AI in PHE was recognized for its current benefits and potential for future advancements, with a call for ethical vigilance. GAI holds significant promise for personalizing PHE, with an identified need for ethical frameworks and diverse developer teams to address bias and equity in educational AI applications.
Collapse
Affiliation(s)
- Mohammed Almansour
- Department of Medical Education, College of Medicine, King Saud University, Riyadh, Saudi Arabia
| | - Fahad Mohammad Alfhaid
- Department of family and community medicine, College of medicine, Majmaah University, Majmaah, Saudi Arabia
| |
Collapse
|
181
|
Calderaro J, Žigutytė L, Truhn D, Jaffe A, Kather JN. Artificial intelligence in liver cancer - new tools for research and patient management. Nat Rev Gastroenterol Hepatol 2024; 21:585-599. [PMID: 38627537 DOI: 10.1038/s41575-024-00919-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 03/11/2024] [Indexed: 07/31/2024]
Abstract
Liver cancer has high incidence and mortality globally. Artificial intelligence (AI) has advanced rapidly, influencing cancer care. AI systems are already approved for clinical use in some tumour types (for example, colorectal cancer screening). Crucially, research demonstrates that AI can analyse histopathology, radiology and natural language in liver cancer, and can replace manual tasks and access hidden information in routinely available clinical data. However, for liver cancer, few of these applications have translated into large-scale clinical trials or clinically approved products. Here, we advocate for the incorporation of AI in all stages of liver cancer management. We present a taxonomy of AI approaches in liver cancer, highlighting areas with academic and commercial potential, and outline a policy for AI-based liver cancer management, including interdisciplinary training of researchers, clinicians and patients. The potential of AI in liver cancer is immense, but effort is required to ensure that AI can fulfil expectations.
Collapse
Affiliation(s)
- Julien Calderaro
- Département de Pathologie, Assistance Publique Hôpitaux de Paris, Groupe Hospitalier Henri Mondor, Créteil, France
- Institut Mondor de Recherche Biomédicale, MINT-HEP Mondor Integrative Hepatology, Université Paris Est Créteil, Créteil, France
| | - Laura Žigutytė
- Else Kroener Fresenius Center for Digital Health (EKFZ), Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
| | - Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
| | - Ariel Jaffe
- Mayo Clinic, Rochester, MN, USA
- Department of Internal Medicine, Section of Digestive Diseases, Yale School of Medicine, New Haven, CT, USA
| | - Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health (EKFZ), Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
- Department of Medicine I, University Hospital Dresden, Dresden, Germany.
- Medical Oncology, National Center for Tumour Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.
| |
Collapse
|
182
|
Sridharan K, Sivaramakrishnan G. Enhancing readability of USFDA patient communications through large language models: a proof-of-concept study. Expert Rev Clin Pharmacol 2024; 17:731-741. [PMID: 38823007 DOI: 10.1080/17512433.2024.2363840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Accepted: 05/31/2024] [Indexed: 06/03/2024]
Abstract
BACKGROUND The US Food and Drug Administration (USFDA) communicates new drug safety concerns through drug safety communications (DSCs) and medication guides (MGs), which often challenge patients with average reading abilities due to their complexity. This study assesses whether large language models (LLMs) can enhance the readability of these materials. METHODS We analyzed the latest DSCs and MGs, using ChatGPT 4.0© and Gemini© to simplify them to a sixth-grade reading level. Outputs were evaluated for readability, technical accuracy, and content inclusiveness. RESULTS Original materials were difficult to read (DSCs grade level 13, MGs 22). LLMs significantly improved readability, reducing the grade levels to more accessible reading levels (Single prompt - DSCs: ChatGPT 4.0© 10.1, Gemini© 8; MGs: ChatGPT 4.0© 7.1, Gemini© 6.5. Multiple prompts - DSCs: ChatGPT 4.0© 10.3, Gemini© 7.5; MGs: ChatGPT 4.0© 8, Gemini© 6.8). LLM outputs retained technical accuracy and key messages. CONCLUSION LLMs can significantly simplify complex health-related information, making it more accessible to patients. Future research should extend these findings to other languages and patient groups in real-world settings.
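The grade levels quoted above are typically computed with standard readability formulas; the sketch below implements the common Flesch-Kincaid grade-level formula with a naive syllable counter as a rough illustration. It is an assumption, not a claim from the abstract, that the authors used this particular index or tooling.

```python
# Minimal sketch of readability grading via the Flesch-Kincaid grade-level
# formula, with a very rough syllable counter. The sample sentence is a
# hypothetical example, not text from the study.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (rough heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

sample = ("This medicine can cause serious liver problems. "
          "Call your doctor if your skin or eyes turn yellow.")
print(f"Approximate grade level: {flesch_kincaid_grade(sample):.1f}")
```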
Collapse
Affiliation(s)
- Kannan Sridharan
- Department of Pharmacology & Therapeutics, College of Medicine & Medical Sciences, Arabian Gulf University, Manama, Kingdom of Bahrain
| | - Gowri Sivaramakrishnan
- Speciality Dental Residency Program, Primary Health Care Centers, Manama, Kingdom of Bahrain
| |
Collapse
|
183
|
Benson R, Elia M, Hyams B, Chang JH, Hong JC. A Narrative Review on the Application of Large Language Models to Support Cancer Care and Research. Yearb Med Inform 2024; 33:90-98. [PMID: 40199294 PMCID: PMC12020524 DOI: 10.1055/s-0044-1800726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/10/2025] Open
Abstract
OBJECTIVES The emergence of large language models has resulted in a significant shift in informatics research and carries promise in clinical cancer care. Here we provide a narrative review of the recent use of large language models (LLMs) to support cancer care, prevention, and research. METHODS We performed a search of the Scopus database for studies on the application of bidirectional encoder representations from transformers (BERT) and generative-pretrained transformer (GPT) LLMs in cancer care published between the start of 2021 and the end of 2023. We present salient and impactful papers related to each of these themes. RESULTS Studies identified focused on aspects of clinical decision support (CDS), cancer education, and support for research activities. The use of LLMs for CDS primarily focused on aspects of treatment and screening planning, treatment response, and the management of adverse events. Studies using LLMs for cancer education typically focused on question-answering, assessing cancer myths and misconceptions, and text summarization and simplification. Finally, studies using LLMs to support research activities focused on scientific writing and idea generation, cohort identification and extraction, clinical data processing, and NLP-centric tasks. CONCLUSIONS The application of LLMs in cancer care has shown promise across a variety of diverse use cases. Future research should utilize quantitative metrics, qualitative insights, and user insights in the development and evaluation of LLM-based cancer care tools. The development of open-source LLMs for use in cancer care research and activities should also be a priority.
Collapse
Affiliation(s)
- Ryzen Benson
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
| | - Marianna Elia
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
| | - Benjamin Hyams
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- School of Medicine, University of California, San Francisco, San Francisco, California
| | - Ji Hyun Chang
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- Department of Radiation Oncology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
| | - Julian C. Hong
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- UCSF UC Berkeley Joint Program in Computational Precision Health (CPH), San Francisco, CA
| |
Collapse
|
184
|
Luo X, Deng Z, Yang B, Luo MY. Pre-trained language models in medicine: A survey. Artif Intell Med 2024; 154:102904. [PMID: 38917600 DOI: 10.1016/j.artmed.2024.102904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 04/15/2024] [Accepted: 06/03/2024] [Indexed: 06/27/2024]
Abstract
With the rapid progress in Natural Language Processing (NLP), Pre-trained Language Models (PLM) such as BERT, BioBERT, and ChatGPT have shown great potential in various medical NLP tasks. This paper surveys the cutting-edge achievements in applying PLMs to various medical NLP tasks. Specifically, we first briefly introduce PLMs and outline the research on PLMs in medicine. Next, we categorise and discuss the types of tasks in medical NLP, covering text summarisation, question-answering, machine translation, sentiment analysis, named entity recognition, information extraction, medical education, relation extraction, and text mining. For each type of task, we first provide an overview of the basic concepts, the main methodologies, the advantages of applying PLMs, the basic steps of applying PLMs, the datasets for training and testing, and the metrics for task evaluation. Subsequently, a summary of recent important research findings is presented, analysing their motivations, strengths versus weaknesses, similarities versus differences, and discussing potential limitations. Also, we assess the quality and influence of the research reviewed in this paper by comparing the citation count of the papers reviewed and the reputation and impact of the conferences and journals where they are published. Through these indicators, we further identify the research topics currently attracting the most attention. Finally, we look forward to future research directions, including enhancing models' reliability, explainability, and fairness, to promote the application of PLMs in clinical practice. In addition, this survey also collects download links for model code and the relevant datasets, which are valuable references for researchers applying NLP techniques in medicine and medical professionals seeking to enhance their expertise and healthcare service through AI technology.
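As one concrete example of the PLM-based tasks surveyed above (here, named entity recognition), the sketch below uses the Hugging Face transformers pipeline; the model identifier is an assumed placeholder rather than a checkpoint endorsed by the survey, and the example sentence is hypothetical.

```python
# Minimal sketch of applying a pre-trained language model to medical NER
# via the Hugging Face `transformers` pipeline API. The model identifier is
# an assumed placeholder; substitute any biomedical NER checkpoint.
from transformers import pipeline

MODEL_ID = "d4data/biomedical-ner-all"  # assumed example checkpoint; swap as needed

ner = pipeline("token-classification", model=MODEL_ID, aggregation_strategy="simple")
text = "The patient was started on metformin for type 2 diabetes mellitus."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```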
Collapse
Affiliation(s)
- Xudong Luo
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Zhiqi Deng
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Binxia Yang
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
| | - Michael Y Luo
- Emmanuel College, Cambridge University, Cambridge, CB2 3AP, UK.
| |
Collapse
|
185
|
Vassis S, Powell H, Petersen E, Barkmann A, Noeldeke B, Kristensen KD, Stoustrup P. Large-Language Models in Orthodontics: Assessing Reliability and Validity of ChatGPT in Pretreatment Patient Education. Cureus 2024; 16:e68085. [PMID: 39347180 PMCID: PMC11437517 DOI: 10.7759/cureus.68085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/29/2024] [Indexed: 10/01/2024] Open
Abstract
BACKGROUND Patients seeking orthodontic treatment may use large language models (LLMs) such as Chat-GPT for self-education, thereby impacting their decision-making process. This study assesses the reliability and validity of Chat-GPT prompts aimed at informing patients about orthodontic side effects and examines patients' perceptions of this information. MATERIALS AND METHODS To assess reliability, n = 28 individuals were asked to generate information from GPT-3.5 and Generative Pretrained Transformer 4 (GPT-4) about side effects related to orthodontic treatment using both self-formulated and standardized prompts. Three experts evaluated the content generated based on these prompts regarding its validity. We asked a cohort of 46 orthodontic patients about their perceptions after reading an AI-generated information text about orthodontic side effects and compared it with the standard text from the postgraduate orthodontic program at Aarhus University. RESULTS Although the GPT-generated answers mentioned several relevant side effects, the replies were diverse. The experts rated the AI-generated content generally as "neither deficient nor satisfactory," with GPT-4 achieving higher scores than GPT-3.5. The patients perceived the GPT-generated information as more useful and more comprehensive and experienced less nervousness when reading the GPT-generated information. Nearly 80% of patients preferred the AI-generated information over the standard text. CONCLUSIONS Although patients generally prefer AI-generated information regarding the side effects of orthodontic treatment, the tested prompts fall short of providing thoroughly satisfactory and high-quality education to patients.
Collapse
Affiliation(s)
- Stratos Vassis
- Section of Orthodontics, Department of Dentistry and Oral Health, Aarhus University, Aarhus, DNK
| | - Harriet Powell
- Section of Orthodontics, Department of Dentistry and Oral Health, Aarhus University, Aarhus, DNK
| | - Emma Petersen
- Department of Dentistry and Oral Health, Aarhus University, Aarhus, DNK
| | - Asta Barkmann
- Department of Dentistry and Oral Health, Aarhus University, Aarhus, DNK
| | - Beatrice Noeldeke
- Department of Oral and Maxillofacial Surgery, Aarhus University Hospital, Aarhus, DNK
| | - Kasper D Kristensen
- Section of Orthodontics, Department of Dentistry and Oral Health, Aarhus University, Aarhus, DNK
| | - Peter Stoustrup
- Section of Orthodontics, Department of Dentistry and Oral Health, Aarhus University, Aarhus, DNK
| |
Collapse
|
186
|
Viswanathan VS, Parmar V, Madabhushi A. Towards equitable AI in oncology. Nat Rev Clin Oncol 2024; 21:628-637. [PMID: 38849530 DOI: 10.1038/s41571-024-00909-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/21/2024] [Indexed: 06/09/2024]
Abstract
Artificial intelligence (AI) stands at the threshold of revolutionizing clinical oncology, with considerable potential to improve early cancer detection and risk assessment, and to enable more accurate personalized treatment recommendations. However, a notable imbalance exists in the distribution of the benefits of AI, which disproportionately favour those living in specific geographical locations and in specific populations. In this Perspective, we discuss the need to foster the development of equitable AI tools that are both accurate in and accessible to a diverse range of patient populations, including those in low-income to middle-income countries. We also discuss some of the challenges and potential solutions in attaining equitable AI, including addressing the historically limited representation of diverse populations in existing clinical datasets and the use of inadequate clinical validation methods. Additionally, we focus on extant sources of inequity including the type of model approach (such as deep learning, and feature engineering-based methods), the implications of dataset curation strategies, the need for rigorous validation across a variety of populations and settings, and the risk of introducing contextual bias that comes with developing tools predominantly in high-income countries.
Collapse
Affiliation(s)
| | - Vani Parmar
- Department of Breast Surgical Oncology, Punyashlok Ahilyadevi Holkar Head & Neck Cancer Institute of India, Mumbai, India
| | - Anant Madabhushi
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA.
- Atlanta Veterans Administration Medical Center, Atlanta, GA, USA.
| |
Collapse
|
187
|
Ajmera P, Nischal N, Ariyaratne S, Botchu B, Bhamidipaty KDP, Iyengar KP, Ajmera SR, Jenko N, Botchu R. Validity of ChatGPT-generated musculoskeletal images. Skeletal Radiol 2024; 53:1583-1593. [PMID: 38438538 DOI: 10.1007/s00256-024-04638-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/06/2024]
Abstract
OBJECTIVE In the evolving landscape of medical research and radiology, effective communication of intricate ideas is imperative, with visualizations playing a crucial role. This study explores the transformative potential of ChatGPT4, a powerful Large Language Model (LLM), in automating the creation of schematics and figures for radiology research papers, specifically focusing on its implications for musculoskeletal studies. MATERIALS AND METHODS Deploying ChatGPT4, the study aimed to assess the model's ability to generate anatomical images of six large joints-shoulder, elbow, wrist, hip, knee, and ankle. Four variations of a text prompt were utilized to generate a coronal illustration with annotations for each joint. Evaluation parameters included anatomical correctness, correctness of annotations, aesthetic nature of illustrations, usability of figures in research papers, and cost-effectiveness. Four panellists performed the assessment using a 5-point Likert Scale. RESULTS Overall analysis of the 24 illustrations encompassing the six joints of interest (4 of each) revealed significant limitations in ChatGPT4's performance. The anatomical design ranged from poor to good; all of the illustrations received a below-average rating for annotation, with the majority assessed as poor. All of them ranked below average for usability in research papers. There was good agreement between raters across all domains (ICC = 0.61). CONCLUSION While LLMs like ChatGPT4 present promising prospects for rapid figure generation, their current capabilities fall short of meeting the rigorous standards demanded by musculoskeletal radiology research. Future developments should focus on iterative refinement processes to enhance the realism of LLM-generated musculoskeletal schematics.
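The inter-rater agreement reported above (ICC = 0.61) can be computed from a long-format table of Likert ratings; the sketch below shows one way to do this with the pingouin package, using randomly generated placeholder scores for four raters and 24 illustrations rather than the study's data.

# Minimal sketch (placeholder data): intraclass correlation for four raters scoring
# 24 illustrations on a 5-point Likert scale.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "illustration": list(range(1, 25)) * 4,                 # 24 figures, once per rater
    "rater": ["R1"] * 24 + ["R2"] * 24 + ["R3"] * 24 + ["R4"] * 24,
    "score": rng.integers(1, 6, size=96),                   # placeholder Likert scores (1-5)
})

icc = pg.intraclass_corr(data=ratings, targets="illustration",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])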
Collapse
Affiliation(s)
- P Ajmera
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
| | - N Nischal
- Department of Radiology, Holy Family Hospital, New Delhi, India
| | - S Ariyaratne
- Department of Musculoskeletal Radiology, Royal Orthopedic Hospital, Bristol Road South, Northfield, Birmingham, UK
| | | | | | - K P Iyengar
- Department of Orthopedics, Southport and Ormskirk Hospital, Mersey and West Lancashire NHS Trust, Southport, UK
| | - S R Ajmera
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
| | - N Jenko
- Department of Musculoskeletal Radiology, Royal Orthopedic Hospital, Bristol Road South, Northfield, Birmingham, UK
| | - R Botchu
- Department of Musculoskeletal Radiology, Royal Orthopedic Hospital, Bristol Road South, Northfield, Birmingham, UK.
| |
Collapse
|
188
|
Su Z, Tang G, Huang R, Qiao Y, Zhang Z, Dai X. Based on Medicine, The Now and Future of Large Language Models. Cell Mol Bioeng 2024; 17:263-277. [PMID: 39372551 PMCID: PMC11450117 DOI: 10.1007/s12195-024-00820-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2024] [Accepted: 09/08/2024] [Indexed: 10/08/2024] Open
Abstract
OBJECTIVES This review explores the potential applications of large language models (LLMs) such as ChatGPT, GPT-3.5, and GPT-4 in the medical field, aiming to encourage their prudent use, provide professional support, and develop accessible medical AI tools that adhere to healthcare standards. METHODS This paper examines the impact of technologies such as OpenAI's Generative Pre-trained Transformers (GPT) series, including GPT-3.5 and GPT-4, and other large language models (LLMs) in medical education, scientific research, clinical practice, and nursing. Specifically, it includes supporting curriculum design, acting as personalized learning assistants, creating standardized simulated patient scenarios in education; assisting with writing papers, data analysis, and optimizing experimental designs in scientific research; aiding in medical imaging analysis, decision-making, patient education, and communication in clinical practice; and reducing repetitive tasks, promoting personalized care and self-care, providing psychological support, and enhancing management efficiency in nursing. RESULTS LLMs, including ChatGPT, have demonstrated significant potential and effectiveness in the aforementioned areas, yet their deployment in healthcare settings is fraught with ethical complexities, potential lack of empathy, and risks of biased responses. CONCLUSION Despite these challenges, significant medical advancements can be expected through the proper use of LLMs and appropriate policy guidance. Future research should focus on overcoming these barriers to ensure the effective and ethical application of LLMs in the medical field.
Collapse
Affiliation(s)
- Ziqing Su
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
| | - Guozhang Tang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The Second Clinical College of Anhui Medical University, Hefei, 230032 Anhui P.R. China
| | - Rui Huang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
| | - Yang Qiao
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
| | - Zheng Zhang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
| | - Xingliang Dai
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Research & Development, East China Institute of Digital Medical Engineering, Shangrao, 334000 P.R. China
| |
Collapse
|
189
|
Doo FX, Savani D, Kanhere A, Carlos RC, Joshi A, Yi PH, Parekh VS, Atzen S. Optimal Large Language Model Characteristics to Balance Accuracy and Energy Use for Sustainable Medical Applications. Radiology 2024; 312:e240320. [PMID: 39189909 PMCID: PMC11366671 DOI: 10.1148/radiol.240320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 08/28/2024]
Abstract
Background Large language models (LLMs) for medical applications use unknown amounts of energy, which contribute to the overall carbon footprint of the health care system. Purpose To investigate the tradeoffs between accuracy and energy use when using different LLM types and sizes for medical applications. Materials and Methods This retrospective study evaluated five different billion (B)-parameter sizes of two open-source LLMs (Meta's Llama 2, a general-purpose model, and LMSYS Org's Vicuna 1.5, a specialized fine-tuned model) using chest radiograph reports from the National Library of Medicine's Indiana University Chest X-ray Collection. Reports with missing demographic information and missing or blank files were excluded. Models were run on local compute clusters with visual computing graphic processing units. A single-task prompt explained clinical terminology and instructed each model to confirm the presence or absence of each of the 13 CheXpert disease labels. Energy use (in kilowatt-hours) was measured using an open-source tool. Accuracy was assessed with 13 CheXpert reference standard labels for diagnostic findings on chest radiographs, where overall accuracy was the mean of the individual accuracies of all 13 labels. Efficiency ratios (accuracy per kilowatt-hour) were calculated for each model type and size. Results A total of 3665 chest radiograph reports were evaluated. The Vicuna 1.5 7B and 13B models had higher efficiency ratios (737.28 and 331.40, respectively) and higher overall labeling accuracy (93.83% [3438.69 of 3665 reports] and 93.65% [3432.38 of 3665 reports], respectively) than the Llama 2 models (7B: efficiency ratio of 13.39, accuracy of 7.91% [289.76 of 3665 reports]; 13B: efficiency ratio of 40.90, accuracy of 74.08% [2715.15 of 3665 reports]; 70B: efficiency ratio of 22.30, accuracy of 92.70% [3397.38 of 3665 reports]). Vicuna 1.5 7B had the highest efficiency ratio (737.28 vs 13.39 for Llama 2 7B). The larger Llama 2 70B model used more than seven times the energy of its 7B counterpart (4.16 kWh vs 0.59 kWh) without an accuracy advantage over the smaller fine-tuned Vicuna models, resulting in an efficiency ratio of only 22.30. Conclusion Smaller fine-tuned LLMs were more sustainable than larger general-purpose LLMs, using less energy without compromising accuracy, highlighting the importance of LLM selection for medical applications. © RSNA, 2024 Supplemental material is available for this article.
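The efficiency ratio used above is simply overall accuracy (in percent) divided by energy consumed (in kWh); the short check below recovers the reported Llama 2 values from the figures quoted in the abstract.

# Recompute efficiency ratio = overall accuracy (%) / energy use (kWh)
# from the numbers quoted in the abstract (values are approximate).
results = {
    "Llama 2 7B":  {"accuracy_pct": 7.91,  "energy_kwh": 0.59},
    "Llama 2 70B": {"accuracy_pct": 92.70, "energy_kwh": 4.16},
}
for model, r in results.items():
    print(f"{model}: efficiency ratio = {r['accuracy_pct'] / r['energy_kwh']:.1f}")
# Prints roughly 13.4 and 22.3, matching the reported 13.39 and 22.30.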
Collapse
Affiliation(s)
| | | | - Adway Kanhere
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201 (F.X.D., D.S., A.K., P.H.Y., V.S.P.); Department of Radiology, University of Michigan, Ann Arbor, Mich (R.C.C.); and Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Md (A.J.)
| | - Ruth C. Carlos
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201 (F.X.D., D.S., A.K., P.H.Y., V.S.P.); Department of Radiology, University of Michigan, Ann Arbor, Mich (R.C.C.); and Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Md (A.J.)
| | - Anupam Joshi
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201 (F.X.D., D.S., A.K., P.H.Y., V.S.P.); Department of Radiology, University of Michigan, Ann Arbor, Mich (R.C.C.); and Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Md (A.J.)
| | - Paul H. Yi
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201 (F.X.D., D.S., A.K., P.H.Y., V.S.P.); Department of Radiology, University of Michigan, Ann Arbor, Mich (R.C.C.); and Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Md (A.J.)
| | - Vishwa S. Parekh
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201 (F.X.D., D.S., A.K., P.H.Y., V.S.P.); Department of Radiology, University of Michigan, Ann Arbor, Mich (R.C.C.); and Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Md (A.J.)
| | - Sarah Atzen
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 S Greene St, Baltimore, MD 21201 (F.X.D., D.S., A.K., P.H.Y., V.S.P.); Department of Radiology, University of Michigan, Ann Arbor, Mich (R.C.C.); and Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Md (A.J.)
| |
Collapse
|
190
|
Blake RM, Khusid JA. Artificial Intelligence for Urology Research: The Holy Grail of Data Science or Pandora's Box of Misinformation? J Endourol 2024; 38:741-747. [PMID: 38545764 DOI: 10.1089/end.2023.0703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/11/2024] Open
Abstract
Introduction: Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Utilization of these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. Materials and Methods: We obtained reference values from two published studies, which used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate customized computer code that could perform the calculation on downloaded data sets. Results: While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard provided either accurate results or inaccurate and inconsistent results. For example, Bard's "calculations" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published number was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations; however, when prompted further, it admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a de novo calculation. Both LLMs were able to produce Python code to run on the downloaded NHANES data sets; however, this code would not readily execute. Conclusions: ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading, and results were inconsistent.
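For contrast, the survey-weighted prevalence calculation the authors requested follows a standard pattern once the data are downloaded; the sketch below illustrates it with hypothetical file and column names (a full NHANES analysis would also use the survey's strata and PSU design variables for variance estimation, which this sketch omits).

# Minimal sketch of a survey-weighted prevalence estimate on a downloaded
# NHANES-style dataset. File and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("nhanes_kidney_stones.csv")        # hypothetical local export
# has_stone: 1 = ever had kidney stones, 0 = never; exam_weight: survey exam weight
df = df.dropna(subset=["has_stone", "exam_weight"])

weighted_prevalence = (df["has_stone"] * df["exam_weight"]).sum() / df["exam_weight"].sum()
print(f"Weighted prevalence: {weighted_prevalence:.1%}")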
Collapse
Affiliation(s)
- Ryan M Blake
- Department of Urology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Johnathan A Khusid
- Department of Urology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| |
Collapse
|
191
|
Sarker A, Zhang R, Wang Y, Xiao Y, Das S, Schutte D, Oniani D, Xie Q, Xu H. Natural Language Processing for Digital Health in the Era of Large Language Models. Yearb Med Inform 2024; 33:229-240. [PMID: 40199310 PMCID: PMC12020548 DOI: 10.1055/s-0044-1800750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/10/2025] Open
Abstract
OBJECTIVES Large language models (LLMs) are revolutionizing the natural language processing (NLP) landscape within healthcare, prompting the need to synthesize the latest advancements and their diverse medical applications. We attempt to summarize the current state of research in this rapidly evolving space. METHODS We conducted a review of the most recent studies on biomedical NLP facilitated by LLMs, sourcing literature from PubMed, the Association for Computational Linguistics Anthology, IEEE Xplore, and Google Scholar (the latter particularly for preprints). Given the ongoing exponential growth in LLM-related publications, our survey was inherently selective. We attempted to abstract key findings in terms of (i) LLMs customized for medical texts, and (ii) the type of medical text being leveraged by LLMs, namely medical literature, electronic health records (EHRs), and social media. In addition to technical details, we touch upon topics such as privacy, bias, interpretability, and equitability. RESULTS We observed that while general-purpose LLMs (e.g., GPT-4) are most popular, there is a growing trend in training or customizing open-source LLMs for specific biomedical texts and tasks. Several promising open-source LLMs are currently available, and applications involving EHRs and biomedical literature are more prominent relative to noisier data sources such as social media. For supervised classification and named entity recognition tasks, traditional (encoder-only) transformer-based models still outperform new-age LLMs, and the latter are typically suited for few-shot settings and generative tasks such as summarization. There is still a paucity of research on evaluation, bias, privacy, reproducibility, and equitability of LLMs. CONCLUSIONS LLMs have the potential to transform NLP tasks within the broader medical domain. While technical progress continues, biomedical application-focused research must prioritize aspects not necessarily related to performance, such as task-oriented evaluation, bias, and equitable use.
Collapse
Affiliation(s)
| | - Rui Zhang
- University of Minnesota, Minneapolis, MN, USA
| | | | | | | | | | | | | | - Hua Xu
- Yale University, New Haven, CT, USA
| |
Collapse
|
192
|
Singh SP, Jamal A, Qureshi F, Zaidi R, Qureshi F. Leveraging Generative Artificial Intelligence Models in Patient Education on Inferior Vena Cava Filters. Clin Pract 2024; 14:1507-1514. [PMID: 39194925 DOI: 10.3390/clinpract14040121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2024] [Revised: 06/13/2024] [Accepted: 07/23/2024] [Indexed: 08/29/2024] Open
Abstract
Background: Inferior vena cava (IVC) filters have become an advantageous treatment modality for patients with venous thromboembolism. As the use of these filters continues to grow, it is imperative for providers to appropriately educate patients in a comprehensive yet understandable manner. Likewise, generative artificial intelligence models are a growing tool in patient education, but little is known about the readability of the material these tools generate about IVC filters. Methods: This study aimed to determine the Flesch Reading Ease (FRE), Flesch-Kincaid, and Gunning Fog readability of IVC filter patient educational materials generated by these artificial intelligence models. Results: The ChatGPT cohort had the highest mean Gunning Fog score at 17.76 ± 1.62, and the Copilot cohort had the lowest at 11.58 ± 1.55. The between-group difference in Flesch Reading Ease scores (p = 8.70408 × 10⁻⁸) was statistically significant, albeit with low a priori power (0.392). Conclusions: The results of this study indicate that the answers generated in the Microsoft Copilot cohort offer greater readability than those in the ChatGPT cohort regarding IVC filters. Nevertheless, the mean Flesch-Kincaid readability for both cohorts does not meet the recommended U.S. grade reading levels.
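The three indices compared in this study are defined by fixed formulas over sentence, word, and syllable counts; the sketch below implements them with a crude vowel-group syllable heuristic, so the numbers are approximate and a dedicated readability library would be preferable in practice.

# Minimal sketch of the Flesch Reading Ease, Flesch-Kincaid grade, and Gunning Fog
# indices, using a rough syllable heuristic (illustration only).
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of vowels; adequate only for an illustration.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps, spw = n_words / sentences, syllables / n_words
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_words / n_words),
    }

print(readability("An IVC filter is a small device placed in a large vein. "
                  "It helps stop blood clots from reaching the lungs."))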
Collapse
Affiliation(s)
- Som P Singh
- Department of Internal Medicine, University of Missouri Kansas City School of Medicine, Kansas City, MO 64108, USA
| | - Aleena Jamal
- Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
| | - Farah Qureshi
- Lake Erie College of Osteopathic Medicine, Erie, PA 16509, USA
| | - Rohma Zaidi
- Department of Internal Medicine, University of Missouri Kansas City School of Medicine, Kansas City, MO 64108, USA
| | - Fawad Qureshi
- Department of Nephrology and Hypertension, Mayo Clinic Alix School of Medicine, Rochester, MN 55905, USA
| |
Collapse
|
193
|
Hindy JR, Souaid T, Kovacs CS. Capabilities of GPT-4o and Gemini 1.5 Pro in Gram stain and bacterial shape identification. Future Microbiol 2024; 19:1283-1292. [PMID: 39069960 PMCID: PMC11486216 DOI: 10.1080/17460913.2024.2381967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2024] [Accepted: 07/16/2024] [Indexed: 07/30/2024] Open
Abstract
Aim: To assess the visual accuracy of two large language models (LLMs) in microbial classification. Materials & methods: GPT-4o and Gemini 1.5 Pro were evaluated in distinguishing Gram-positive from Gram-negative bacteria and classifying them as cocci or bacilli using 80 Gram stain images from a labeled database. Results: GPT-4o achieved 100% accuracy in simultaneously identifying Gram stain and shape for Clostridium perfringens, Pseudomonas aeruginosa and Staphylococcus aureus. Gemini 1.5 Pro showed more variability for the same bacteria (45, 100 and 95%, respectively). Both LLMs failed to identify both Gram stain and bacterial shape for Neisseria gonorrhoeae. Cumulative accuracy plots indicated that GPT-4o performed equally well or better on every identification, except for the shape of Neisseria gonorrhoeae. Conclusion: These results suggest that these LLMs in their unprimed state are not ready to be implemented in clinical practice and highlight the need for more research with larger datasets to improve LLMs' effectiveness in clinical microbiology.
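Per-organism identification accuracy of the kind summarized above can be tabulated from labeled model outputs; the sketch below uses hypothetical call-level data to show the joint Gram-stain-and-shape accuracy calculation.

# Minimal sketch (hypothetical data): per-organism accuracy for Gram stain, shape,
# and the joint "both correct" criterion described above.
import pandas as pd

calls = pd.DataFrame({
    "organism": ["S. aureus"] * 4 + ["N. gonorrhoeae"] * 4,
    "gram_ok":  [1, 1, 1, 1, 0, 1, 0, 0],   # 1 = Gram stain correctly identified
    "shape_ok": [1, 1, 0, 1, 0, 0, 1, 0],   # 1 = bacterial shape correctly identified
})
calls["both_ok"] = calls["gram_ok"] & calls["shape_ok"]
print(calls.groupby("organism")[["gram_ok", "shape_ok", "both_ok"]].mean())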
Collapse
Affiliation(s)
- Joya-Rita Hindy
- Department of Internal Medicine, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Tarek Souaid
- Department of Internal Medicine, Cleveland Clinic, Cleveland, OH 44195, USA
| | | |
Collapse
|
194
|
Kaya K, Gietzen C, Hahnfeldt R, Zoubi M, Emrich T, Halfmann MC, Sieren MM, Elser Y, Krumm P, Brendel JM, Nikolaou K, Haag N, Borggrefe J, Krüchten RV, Müller-Peltzer K, Ehrengut C, Denecke T, Hagendorff A, Goertz L, Gertz RJ, Bunck AC, Maintz D, Persigehl T, Lennartz S, Luetkens JA, Jaiswal A, Iuga AI, Pennig L, Kottlors J. Generative Pre-trained Transformer 4 analysis of cardiovascular magnetic resonance reports in suspected myocarditis: A multicenter study. J Cardiovasc Magn Reson 2024; 26:101068. [PMID: 39079602 PMCID: PMC11414660 DOI: 10.1016/j.jocmr.2024.101068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2024] [Revised: 07/04/2024] [Accepted: 07/24/2024] [Indexed: 09/13/2024] Open
Abstract
BACKGROUND Diagnosing myocarditis relies on multimodal data, including cardiovascular magnetic resonance (CMR), clinical symptoms, and blood values. The correct interpretation and integration of CMR findings require radiological expertise and knowledge. We aimed to investigate the performance of Generative Pre-trained Transformer 4 (GPT-4), a large language model, for report-based medical decision-making in the context of cardiac MRI for suspected myocarditis. METHODS This retrospective study includes CMR reports from 396 patients with suspected myocarditis across eight centers. CMR reports and patient data, including blood values, age, and further clinical information, were provided to GPT-4 and to radiologists with 1 (resident 1), 2 (resident 2), and 4 years (resident 3) of experience in CMR and knowledge of the 2018 Lake Louise Criteria. The final impression of the report regarding the radiological assessment of whether myocarditis was present was not provided. The performance of GPT-4 and the human readers was compared with a consensus reading by two board-certified radiologists with 8 and 10 years of experience in CMR. Sensitivity, specificity, and accuracy were calculated. RESULTS GPT-4 yielded an accuracy of 83%, sensitivity of 90%, and specificity of 78%, which was comparable to the physician with 1 year of experience (R1: 86%, 90%, 84%; p = 0.14) and lower than that of more experienced physicians (R2: 89%, 86%, 91%; p = 0.007 and R3: 91%, 85%, 96%; p < 0.001). GPT-4 and the human readers showed higher diagnostic performance when results from T1- and T2-mapping sequences were part of the reports, with statistical significance for residents 1 and 3 (p = 0.004 and p = 0.02, respectively). CONCLUSION GPT-4 yielded good accuracy for diagnosing myocarditis based on CMR reports in a large multicenter dataset and therefore holds potential as a diagnostic decision-support tool, particularly for less experienced physicians. Further studies are required to explore the full potential and elucidate educational aspects of integrating large language models into medical decision-making.
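The performance metrics reported above reduce to a 2 x 2 comparison of each reader's binary call against the consensus reading; a minimal sketch with hypothetical labels follows.

# Minimal sketch: sensitivity, specificity, and accuracy of binary "myocarditis present"
# calls against a consensus reference. Labels here are hypothetical placeholders.
from sklearn.metrics import confusion_matrix

consensus = [1, 1, 0, 0, 1, 0, 1, 0]   # 1 = myocarditis per expert consensus
gpt4_calls = [1, 0, 0, 0, 1, 1, 1, 0]  # 1 = myocarditis per GPT-4 reading

tn, fp, fn, tp = confusion_matrix(consensus, gpt4_calls).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} accuracy={accuracy:.2f}")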
Collapse
Affiliation(s)
- Kenan Kaya
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.
| | - Carsten Gietzen
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Robert Hahnfeldt
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Maher Zoubi
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Bonn, University of Bonn, Bonn, Germany
| | - Tilman Emrich
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes-Gutenberg-University, Mainz, Germany; Division of Cardiovascular Imaging, Department of Radiology and Radiological Science, Medical University of South Carolina, Charleston, South Carolina, USA; German Centre for Cardiovascular Research, Partner Site Rhine-Main, Mainz, Germany
| | - Moritz C Halfmann
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes-Gutenberg-University, Mainz, Germany
| | - Malte Maria Sieren
- Department of Radiology and Nuclear Medicine, UKSH, Campus Lübeck, Lübeck, Germany; Institute of Interventional Radiology, UKSH, Campus Lübeck, Lübeck, Germany
| | - Yannic Elser
- Department of Radiology and Nuclear Medicine, UKSH, Campus Lübeck, Lübeck, Germany
| | - Patrick Krumm
- Department of Radiology, Diagnostic and Interventional Radiology, University of Tübingen, Tübingen, Germany
| | - Jan M Brendel
- Department of Radiology, Diagnostic and Interventional Radiology, University of Tübingen, Tübingen, Germany
| | - Konstantin Nikolaou
- Department of Radiology, Diagnostic and Interventional Radiology, University of Tübingen, Tübingen, Germany
| | - Nina Haag
- Institute for Radiology, Neuroradiology and Nuclear Medicine Johannes Wesling University Hospital/Mühlenkreiskliniken, Bochum/Minden, Germany
| | - Jan Borggrefe
- Institute for Radiology, Neuroradiology and Nuclear Medicine Johannes Wesling University Hospital/Mühlenkreiskliniken, Bochum/Minden, Germany
| | - Ricarda von Krüchten
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Katharina Müller-Peltzer
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Constantin Ehrengut
- Department of Diagnostic and Interventional Radiology, University of Leipzig, Leipzig, Germany
| | - Timm Denecke
- Department of Diagnostic and Interventional Radiology, University of Leipzig, Leipzig, Germany
| | | | - Lukas Goertz
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Roman J Gertz
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Alexander Christian Bunck
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - David Maintz
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Thorsten Persigehl
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Simon Lennartz
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Julian A Luetkens
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Bonn, University of Bonn, Bonn, Germany
| | - Astha Jaiswal
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Andra Iza Iuga
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Lenhard Pennig
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Jonathan Kottlors
- Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| |
Collapse
|
195
|
Haider SA, Pressman SM, Borna S, Gomez-Cabello CA, Sehgal A, Leibovich BC, Forte AJ. Evaluating Large Language Model (LLM) Performance on Established Breast Classification Systems. Diagnostics (Basel) 2024; 14:1491. [PMID: 39061628 PMCID: PMC11275570 DOI: 10.3390/diagnostics14141491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2024] [Revised: 06/25/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024] Open
Abstract
Medical researchers are increasingly utilizing advanced LLMs like ChatGPT-4 and Gemini to enhance diagnostic processes in the medical field. This research focuses on their ability to comprehend and apply complex medical classification systems for breast conditions, which can significantly aid plastic surgeons in making informed decisions for diagnosis and treatment, ultimately leading to improved patient outcomes. Fifty clinical scenarios were created to evaluate the classification accuracy of each LLM across five established breast-related classification systems. Scores from 0 to 2 were assigned to LLM responses to denote incorrect, partially correct, or completely correct classifications. Descriptive statistics were employed to compare the performances of ChatGPT-4 and Gemini. Gemini exhibited superior overall performance, achieving 98% accuracy compared to ChatGPT-4's 71%. While both models performed well in the Baker classification for capsular contracture and UTSW classification for gynecomastia, Gemini consistently outperformed ChatGPT-4 in other systems, such as the Fischer Grade Classification for gender-affirming mastectomy, Kajava Classification for ectopic breast tissue, and Regnault Classification for breast ptosis. With further development, integrating LLMs into plastic surgery practice will likely enhance diagnostic support and decision making.
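Aggregating the 0-2 rubric per model is a matter of simple descriptive statistics; the sketch below shows the pattern with hypothetical scenario scores rather than the study's data.

# Minimal sketch (hypothetical scores): summarize 0/1/2 rubric results
# (incorrect / partially correct / fully correct) per model across scenarios.
import statistics

scores = {
    "Gemini":    [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
    "ChatGPT-4": [2, 1, 2, 0, 2, 1, 2, 2, 1, 2],
}
for model, s in scores.items():
    pct_of_max = 100 * sum(s) / (2 * len(s))
    fully_correct = 100 * s.count(2) / len(s)
    print(f"{model}: mean={statistics.mean(s):.2f}, "
          f"{pct_of_max:.0f}% of max points, {fully_correct:.0f}% fully correct")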
Collapse
Affiliation(s)
- Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
| | | | - Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
| | | | - Ajai Sehgal
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
| | - Bradley C. Leibovich
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- Department of Urology, Mayo Clinic, Rochester, MN 55905, USA
| | - Antonio Jorge Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
| |
Collapse
|
196
|
Lee J, Xu X, Kim D, Deng HH, Kuang T, Lampen N, Fang X, Gateno J, Yan P. Large Language Models Diagnose Facial Deformity. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.11.24310274. [PMID: 39040164 PMCID: PMC11261925 DOI: 10.1101/2024.07.11.24310274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/24/2024]
Abstract
Purpose This study examines the application of Large Language Models (LLMs) in diagnosing jaw deformities, aiming to overcome the limitations of various diagnostic methods by harnessing the advanced capabilities of LLMs for enhanced data interpretation. The goal is to provide tools that simplify complex data analysis and make diagnostic processes more accessible and intuitive for clinical practitioners. Methods An experiment involving patients with jaw deformities was conducted, where cephalometric measurements (SNB Angle, Facial Angle, Mandibular Unit Length) were converted into text for LLM analysis. Multiple LLMs, including LLAMA-2 variants, GPT models, and the Gemini-Pro model, were evaluated against various methods (Threshold-based, Machine Learning Models) using balanced accuracy and F1-score. Results Our research demonstrates that larger LLMs efficiently adapt to diagnostic tasks, showing rapid performance saturation with minimal training examples and reducing ambiguous classification, which highlights their robust in-context learning abilities. The conversion of complex cephalometric measurements into intuitive text formats not only broadens the accessibility of the information but also enhances the interpretability, providing clinicians with clear and actionable insights. Conclusion Integrating LLMs into the diagnosis of jaw deformities marks a significant advancement in making diagnostic processes more accessible and reducing reliance on specialized training. These models serve as valuable auxiliary tools, offering clear, understandable outputs that facilitate easier decision-making for clinicians, particularly those with less experience or in settings with limited access to specialized expertise. Future refinements and adaptations to include more comprehensive and medically specific datasets are expected to enhance the precision and utility of LLMs, potentially transforming the landscape of medical diagnostics.
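The core idea of the study (turning cephalometric measurements into text that an LLM can classify, then scoring predictions with balanced accuracy and F1) can be sketched as follows; the prompt wording, measurement values, and labels are illustrative assumptions, not the authors' pipeline.

# Minimal sketch: express cephalometric measurements as text for an LLM prompt and
# score class predictions with balanced accuracy and macro F1. Values are hypothetical.
from sklearn.metrics import balanced_accuracy_score, f1_score

def to_prompt(snb_deg: float, facial_angle_deg: float, mandibular_unit_len_mm: float) -> str:
    return (f"SNB angle: {snb_deg:.1f} degrees. Facial angle: {facial_angle_deg:.1f} degrees. "
            f"Mandibular unit length: {mandibular_unit_len_mm:.1f} mm. "
            "Classify the mandibular sagittal relationship as normal, retrognathic, or prognathic.")

print(to_prompt(76.4, 84.2, 118.7))

# Hypothetical reference labels and model predictions: 0 = normal, 1 = retrognathic, 2 = prognathic
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
y_pred = [0, 1, 2, 0, 0, 2, 1, 2]
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))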
Collapse
Affiliation(s)
- Jungwook Lee
- Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Xuanang Xu
- Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Daeseung Kim
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, TX, 77030, USA
| | - Hannah H. Deng
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, TX, 77030, USA
| | - Tianshu Kuang
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, TX, 77030, USA
| | - Nathan Lampen
- Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Xi Fang
- Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Jaime Gateno
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, TX, 77030, USA
- Department of Surgery (Oral and Maxillofacial Surgery), Weill Medical College, Cornell University, New York, NY, 10021, USA
| | - Pingkun Yan
- Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| |
Collapse
|
197
|
Mazumdar H, Khondakar KR, Das S, Kaushik A. Aspects of 6th generation sensing technology: from sensing to sense. FRONTIERS IN NANOTECHNOLOGY 2024; 6. [DOI: 10.3389/fnano.2024.1434014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2025] Open
Abstract
Sixth-generation (6G) sensing technology is transforming the ways we perceive and interact with the world in real-world scenarios. It combines advanced materials, sophisticated algorithms, and connectivity to create intelligent, context-aware systems that can interpret and respond to environmental stimuli with unprecedented accuracy and speed. The key advancements include 1) ultra-sensitive sensors capable of detecting physical, chemical, and biological changes at low concentrations, 2) the integration of artificial intelligence (AI) and machine learning (ML) for enhanced data processing, and 3) the deployment of IoT networks with 5th-generation (5G) connectivity for seamless data transmission and real-time analysis. These cutting-edge technologies create immersive environments where devices capture data and anticipate user needs and environmental conditions. 6G sensing technology has potential applications across sectors like point-of-care (PoC), healthcare, urban planning, and environmental monitoring. The transition from sensing to sense-making represents a paradigm shift, fostering a more intuitive, responsive, and interconnected world. The article provides a comprehensive overview of the current state and prospects of 6G sensing technology, highlighting its transformative potential and the challenges in realizing its full capabilities.
Collapse
|
198
|
Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med 2024; 7:183. [PMID: 38977771 PMCID: PMC11231310 DOI: 10.1038/s41746-024-01157-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Accepted: 05/29/2024] [Indexed: 07/10/2024] Open
Abstract
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of practical applications currently researched and ethical issues connected to them is lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy, which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Four general fields of application emerged, showcasing a dynamic exploration phase. Advantages of using LLMs are attributed to their capacity for data analysis, information provisioning, decision-making support, mitigation of information loss, and enhanced information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate should be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs' current experimental use.
Collapse
Affiliation(s)
- Joschka Haltaufderheide
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
| | - Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany.
| |
Collapse
|
199
|
Gottlieb S. Congress Must Update FDA Regulations for Medical AI. JAMA HEALTH FORUM 2024; 5:e242691. [PMID: 38990560 DOI: 10.1001/jamahealthforum.2024.2691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/12/2024] Open
Abstract
This JAMA Forum discusses pending legislation in the US House and Senate and the history of the “firm-based approach” the US Food and Drug Administration (FDA) could use when regulating artificial intelligence (AI) medical devices to augment patient care.
Collapse
|
200
|
Chung SM, Chang MC. Assessment of the information provided by ChatGPT regarding exercise for patients with type 2 diabetes: a pilot study. BMJ Health Care Inform 2024; 31:e101006. [PMID: 38964828 PMCID: PMC11227747 DOI: 10.1136/bmjhci-2023-101006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Accepted: 06/21/2024] [Indexed: 07/06/2024] Open
Abstract
OBJECTIVES We assessed the feasibility of ChatGPT for patients with type 2 diabetes seeking information about exercise. METHODS In this pilot study, two physicians in the Republic of Korea with expertise in diabetes care and rehabilitative treatment discussed and determined the 14 questions about exercise for managing type 2 diabetes most frequently asked by patients in clinical practice. Each question was entered into ChatGPT (V.4.0), and the answers from ChatGPT were assessed. Each answer was rated on a Likert scale for validity (1-4), safety (1-4) and utility (1-4), based on position statements of the American Diabetes Association and American College of Sports Medicine. RESULTS Regarding validity, 4 of 14 ChatGPT responses (28.6%) were scored as 3, indicating accurate but incomplete information. The other 10 responses (71.4%) were scored as 4, indicating complete accuracy with complete information. Safety and utility scored 4 (no danger and completely useful) for all 14 ChatGPT responses. CONCLUSION ChatGPT can be used as supplementary educational material on exercise for patients with diabetes. However, users should be aware that ChatGPT may provide incomplete answers to some questions on exercise for type 2 diabetes.
Collapse
Affiliation(s)
- Seung Min Chung
- Division of Endocrinology and Metabolism, Department of Internal Medicine, College of Medicine, Yeungnam University, Daegu, The Republic of Korea
| | - Min Cheol Chang
- Department of Physical Medicine and Rehabilitation, College of Medicine, Yeungnam University, Daegu, The Republic of Korea
| |
Collapse
|