1
Wang X, Bai Y, Li W, Yang C, Zhang L, Zhu H, Bao R, Jiang Y, Wang F, Wang H. Effect of artificial intelligence driven therapeutic lifestyle changes (AI-TLC) intervention on health behavior and health among obesity pregnant women in China: a randomized controlled trial protocol. Front Public Health 2025; 13:1580060. [PMID: 40421363 PMCID: PMC12104060 DOI: 10.3389/fpubh.2025.1580060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2025] [Accepted: 04/17/2025] [Indexed: 05/28/2025] Open
Abstract
Introduction: Obesity has reached epidemic proportions globally, posing significant challenges to public health and economic stability. In China, the prevalence of obesity is increasing rapidly, particularly among pregnant women, who face unique risks due to the complex interplay between obesity and pregnancy outcomes. This study aims to evaluate the effectiveness of an Artificial Intelligence-driven Therapeutic Lifestyle Change (AI-TLC) intervention in improving health behaviors and outcomes among obese pregnant women in China.
Methods and analysis: This randomized controlled trial will recruit pregnant women aged 18 years or older with a singleton pregnancy between 8 and 12 weeks of gestation and a pre-pregnancy BMI of ≥30.0 kg/m². Participants will be randomly assigned to one of three groups: a manual intervention group, an AI intervention group, and a combined AI and manual intervention group. The intervention will focus on therapeutic lifestyle changes, including behavioral modifications, dietary adjustments, and physical activity promotion, supported by a multidisciplinary team. Primary outcomes will include maternal BMI, weight, and adverse pregnancy outcomes, while secondary outcomes will assess physiological indicators, quality of life, mental health, and lifestyle factors.
Results: The study will evaluate the effects of health interventions on obese pregnant women through primary outcomes (e.g., BMI, weight, adverse pregnancy outcomes) and secondary outcomes (e.g., physiological indicators, quality of life, mental health) using various statistical methods. The results will provide insights into the intervention's effectiveness and cost-effectiveness across different socioeconomic groups.
Discussion: The anticipated findings are expected to demonstrate the efficacy of AI-TLC interventions in managing obesity during pregnancy. This study will contribute valuable evidence to the limited research on AI-based interventions for obese pregnant women, offering potential implications for the development of personalized, efficient, and innovative health strategies. The findings may also inform public health initiatives aimed at improving maternal and child health outcomes in the context of obesity.
Affiliation(s)
- Xiaoyun Wang
  - Pediatric Department, Inner Mongolia Maternal and Child Health Care Hospital, Hohhot, China
- Yang Bai
  - School of Public Administration and Policy, Renmin University of China, Beijing, China
- Wenzhuo Li
  - Medical College, Qingdao University, Qingdao, Shandong, China
- Chenxin Yang
  - School of Health Management, China Medical University, Shenyang, China
- Linjing Zhang
  - Xiang Ya Nursing School, Central South University, Changsha, Hunan, China
- Hongyi Zhu
  - School of Nursing, Tianjin Medical University, Tianjin, China
- Rantong Bao
  - Department of Quality Management, Affiliated Hospital of Inner Mongolia Medical University, Hohhot, China
- Yang Jiang
  - Jitang College, North China University of Science and Technology, Tangshan, Hebei, China
- Fei Wang
  - Shandong Key Laboratory of Reproductive Research and Birth Defect Prevention, Department of Gynecology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, China
  - Department of Gynecology, Inner Mongolia Maternity and Child Health Care Hospital, Hohhot, China
- Huanfang Wang
  - Pediatric Department, Inner Mongolia Maternal and Child Health Care Hospital, Hohhot, China
2
Chen D, He E, Pace K, Chekay M, Raman S. Concordance with SPIRIT-AI guidelines in reporting of randomized controlled trial protocols investigating artificial intelligence in oncology: a systematic review. Oncologist 2025; 30:oyaf112. [PMID: 40421957 PMCID: PMC12107541 DOI: 10.1093/oncolo/oyaf112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2025] [Accepted: 03/24/2025] [Indexed: 05/28/2025] Open
Abstract
BACKGROUND: Artificial intelligence (AI) is a promising tool used in oncology that may be able to facilitate diagnosis, treatment planning, and patient management. Transparency and completeness of protocols of randomized controlled trials (RCT) involving AI interventions is necessary to ensure reproducibility of AI tools across diverse clinical settings. The SPIRIT 2013 and SPIRIT-AI 2020 guidelines were developed as evidence-based recommendations for complete reporting of trial protocols. However, the concordance of AI RCT protocols in oncology to SPIRIT reporting guidelines remains unknown. This systematic review evaluates the concordance of protocols of RCTs evaluating AI interventions in oncology to the SPIRIT 2013 and SPIRIT-AI 2020 reporting guidelines.
METHODS: A systematic search of Ovid Medline and Embase was conducted on October 22, 2024 for primary, peer-reviewed RCT protocols involving AI interventions in oncology. Eligible studies were screened in duplicate and data extraction assessed concordance to SPIRIT 2013 and SPIRIT-AI 2020 guideline items. Item-specific concordance was measured as the proportion of studies that reported the item. Average concordance was measured as the median proportion of items reported for each study.
RESULTS: Twelve RCT protocols met the inclusion criteria. The median concordance to SPIRIT 2013 guidelines was 81.92% (IQR 74.88-88.95) and the median concordance to SPIRIT-AI 2020 guidelines was 78.21% (IQR 67.21-89.20). For SPIRIT 2013 guidelines, high concordance was observed for items related to study objectives and ethics, but gaps were identified in reporting blinding procedures, participant retention, and post-trial care. For SPIRIT-AI 2020 guidelines, there remained gaps based on data quality management, performance error analysis, and accessibility of AI intervention code.
CONCLUSION: While concordance to reporting guidelines in oncology AI RCT protocols was moderately high, critical gaps in protocol reporting persist that may hinder reproducibility and clinical implementation. Future efforts should focus on increasing awareness and reinforcement to enhance reporting quality necessary to foster the responsible integration of AI into oncology practice.
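The two concordance measures defined in the Methods can be sketched in a few lines. This is a minimal illustration only: the reporting matrix, the protocol names, and the function names are hypothetical and are not data or code from the review.

```python
from statistics import median

# Hypothetical reporting matrix: one row per protocol, one column per guideline item.
# True means the protocol reported that item. Values are invented for illustration.
reported = {
    "protocol_A": [True, True, False, True],
    "protocol_B": [True, False, False, True],
    "protocol_C": [True, True, True, True],
}

def item_concordance(matrix):
    """Proportion of studies that reported each item (one value per item)."""
    rows = list(matrix.values())
    n_studies = len(rows)
    n_items = len(rows[0])
    return [sum(row[i] for row in rows) / n_studies for i in range(n_items)]

def study_concordance(matrix):
    """Proportion of items reported by each study (one value per study)."""
    return {name: sum(row) / len(row) for name, row in matrix.items()}

def median_concordance(matrix):
    """Median across studies of the per-study proportion of items reported."""
    return median(study_concordance(matrix).values())
```

Calling `item_concordance(reported)` gives one proportion per guideline item, while `median_concordance(reported)` summarises the per-study proportions in a single figure, which is the form in which the abstract reports its headline results.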
Affiliation(s)
- David Chen
  - Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, Ontario, Canada M5G 2C4
  - Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada M5S 3K3
- Emily He
  - Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada M5S 3K3
- Keiran Pace
  - Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada M5S 3K3
- Matthew Chekay
  - Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada M5S 3K3
- Srinivas Raman
  - Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, Ontario, Canada M5G 2C4
  - Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada M5T 1P5
  - Department of Radiation Oncology, BC Cancer Vancouver, Vancouver, BC, Canada V5Z 1M9
3
Liu S, McCoy AB, Wright A. Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines. J Am Med Inform Assoc 2025; 32:605-615. [PMID: 39812777 PMCID: PMC12005634 DOI: 10.1093/jamia/ocaf008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2024] [Revised: 12/17/2024] [Accepted: 01/03/2025] [Indexed: 01/16/2025] Open
Abstract
OBJECTIVE: The objectives of this study are to synthesize findings from recent research of retrieval-augmented generation (RAG) and large language models (LLMs) in biomedicine and provide clinical development guidelines to improve effectiveness.
MATERIALS AND METHODS: We conducted a systematic literature review and a meta-analysis. The report was created in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 analysis. Searches were performed in 3 databases (PubMed, Embase, PsycINFO) using terms related to "retrieval augmented generation" and "large language model," for articles published in 2023 and 2024. We selected studies that compared baseline LLM performance with RAG performance. We developed a random-effect meta-analysis model, using odds ratio as the effect size.
RESULTS: Among 335 studies, 20 were included in this literature review. The pooled effect size was 1.35, with a 95% confidence interval of 1.19-1.53, indicating a statistically significant effect (P = .001). We reported clinical tasks, baseline LLMs, retrieval sources and strategies, as well as evaluation methods.
DISCUSSION: Building on our literature review, we developed Guidelines for Unified Implementation and Development of Enhanced LLM Applications with RAG in Clinical Settings to inform clinical applications using RAG.
CONCLUSION: Overall, RAG implementation showed a 1.35 odds ratio increase in performance compared to baseline LLMs. Future research should focus on (1) system-level enhancement: the combination of RAG and agent, (2) knowledge-level enhancement: deep integration of knowledge into LLM, and (3) integration-level enhancement: integrating RAG systems within electronic health records.
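For readers unfamiliar with random-effects pooling of odds ratios, the sketch below shows one common approach, the DerSimonian-Laird estimator applied to log odds ratios. The abstract does not say which estimator the authors used, so this choice is an assumption, and the 2x2 counts are hypothetical values invented for illustration.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance from a 2x2 table.

    a, b: correct and incorrect answers with RAG.
    c, d: correct and incorrect answers with the baseline LLM.
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def dersimonian_laird(effects, variances):
    """Pool effects with the DerSimonian-Laird random-effects estimator."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical counts for three studies: (RAG correct, RAG incorrect,
# baseline correct, baseline incorrect). Not data from the review.
tables = [(80, 20, 70, 30), (45, 55, 40, 60), (150, 50, 130, 70)]
pairs = [log_odds_ratio(*t) for t in tables]
effects = [p[0] for p in pairs]
variances = [p[1] for p in pairs]

pooled, se, tau2 = dersimonian_laird(effects, variances)
pooled_or = math.exp(pooled)            # back-transform to the odds ratio scale
ci_low = math.exp(pooled - 1.96 * se)   # approximate 95% confidence interval
ci_high = math.exp(pooled + 1.96 * se)
```

Pooling is done on the log scale because log odds ratios are approximately normally distributed; the pooled estimate and its interval are exponentiated only at the end. A pooled odds ratio above 1 with an interval excluding 1 is the pattern the abstract reports (1.35, 95% CI 1.19-1.53).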
Affiliation(s)
- Siru Liu
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
  - Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States
- Allison B McCoy
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
- Adam Wright
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, United States
4
Scuricini A, Ramoni D, Liberale L, Montecucco F, Carbone F. The role of artificial intelligence in cardiovascular research: Fear less and live bolder. Eur J Clin Invest 2025; 55 Suppl 1:e14364. [PMID: 40191936 PMCID: PMC11973843 DOI: 10.1111/eci.14364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Accepted: 10/30/2024] [Indexed: 04/09/2025]
Abstract
BACKGROUND: Artificial intelligence (AI) has captured the attention of everyone, including cardiovascular (CV) clinicians and scientists. Moving beyond philosophical debates, modern cardiology cannot overlook AI's growing influence but must actively explore its potential applications in clinical practice and research methodology.
METHODS AND RESULTS: AI offers exciting possibilities for advancing CV medicine by uncovering disease heterogeneity, integrating complex multimodal data, and enhancing treatment strategies. In this review, we discuss the innovative applications of AI in cardiac electrophysiology, imaging, angiography, biomarkers, and genomic data, as well as emerging tools like face recognition and speech analysis. Furthermore, we focus on the expanding role of machine learning (ML) in predicting CV risk and outcomes, outlining a roadmap for the implementation of AI in CV care delivery. While the future of AI holds great promise, technical limitations and ethical challenges remain significant barriers to its widespread clinical adoption.
CONCLUSIONS: Addressing these issues through the development of high-quality standards and involving key stakeholders will be essential for AI to transform cardiovascular care safely and effectively.
Affiliation(s)
- Davide Ramoni
  - Department of Internal Medicine, University of Genoa, Genoa, Italy
- Luca Liberale
  - Department of Internal Medicine, University of Genoa, Genoa, Italy
  - IRCCS Ospedale Policlinico San Martino, Genoa – Italian Cardiovascular Network, Genoa, Italy
- Fabrizio Montecucco
  - Department of Internal Medicine, University of Genoa, Genoa, Italy
  - IRCCS Ospedale Policlinico San Martino, Genoa – Italian Cardiovascular Network, Genoa, Italy
- Federico Carbone
  - Department of Internal Medicine, University of Genoa, Genoa, Italy
  - IRCCS Ospedale Policlinico San Martino, Genoa – Italian Cardiovascular Network, Genoa, Italy
5
El-Sayed A, Lovat LB, Ahmad OF. Clinical Implementation of Artificial Intelligence in Gastroenterology: Current Landscape, Regulatory Challenges, and Ethical Issues. Gastroenterology 2025:S0016-5085(25)00538-4. [PMID: 40127785 DOI: 10.1053/j.gastro.2025.01.254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Revised: 01/06/2025] [Accepted: 01/10/2025] [Indexed: 03/26/2025]
Abstract
Artificial intelligence (AI) is set to rapidly transform gastroenterology, particularly in the field of endoscopy, where algorithms have demonstrated efficacy in addressing human operator variability. However, implementing AI in clinical practice presents significant challenges. The regulatory landscape for AI as a medical device continues to evolve with areas of uncertainty. More robust studies generating real-world evidence are required to ultimately demonstrate impacts on patient outcomes. Cost-effectiveness data and reimbursement models will be pivotal for widespread adoption. Novel challenges are posed by emerging technologies, such as generative AI. Ethical and medicolegal concerns exist relating to data governance, patient harm, liability, and bias. This review provides an overview for clinical implementation of AI in gastroenterology and offers potential solutions to current barriers.
Affiliation(s)
- Ahmed El-Sayed
  - Division of Surgery and Interventional Sciences, University College London, London, United Kingdom
- Laurence B Lovat
  - Division of Surgery and Interventional Sciences, University College London, London, United Kingdom
- Omer F Ahmad
  - Division of Surgery and Interventional Sciences, University College London, London, United Kingdom; Department of Gastrointestinal Services, University College London Hospital, London, United Kingdom
6
Šuto Pavičić J, Marušić A, Buljan I. Using ChatGPT to Improve the Presentation of Plain Language Summaries of Cochrane Systematic Reviews About Oncology Interventions: Cross-Sectional Study. JMIR Cancer 2025; 11:e63347. [PMID: 40106236 PMCID: PMC11939027 DOI: 10.2196/63347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 01/22/2025] [Accepted: 01/27/2025] [Indexed: 03/22/2025] Open
Abstract
Background: Plain language summaries (PLSs) of Cochrane systematic reviews are a simple format for presenting medical information to the lay public. This is particularly important in oncology, where patients have a more active role in decision-making. However, current PLS formats often exceed the readability requirements for the general population. There is still a lack of cost-effective and more automated solutions to this problem.
Objective: This study assessed whether a large language model (eg, ChatGPT) can improve the readability and linguistic characteristics of Cochrane PLSs about oncology interventions, without changing evidence synthesis conclusions.
Methods: The dataset included 275 scientific abstracts and corresponding PLSs of Cochrane systematic reviews about oncology interventions. ChatGPT-4 was tasked to make each scientific abstract into a PLS using 3 prompts as follows: (1) rewrite this scientific abstract into a PLS to achieve a Simple Measure of Gobbledygook (SMOG) index of 6, (2) rewrite the PLS from prompt 1 so it is more emotional, and (3) rewrite this scientific abstract so it is easier to read and more appropriate for the lay audience. ChatGPT-generated PLSs were analyzed for word count, level of readability (SMOG index), and linguistic characteristics using Linguistic Inquiry and Word Count (LIWC) software and compared with the original PLSs. Two independent assessors reviewed the conclusiveness categories of ChatGPT-generated PLSs and compared them with original abstracts to evaluate consistency. The conclusion of each abstract about the efficacy and safety of the intervention was categorized as conclusive (positive/negative/equal), inconclusive, or unclear. Group comparisons were conducted using the Friedman nonparametric test.
Results: ChatGPT-generated PLSs using the first prompt (SMOG index 6) were the shortest and easiest to read, with a median SMOG score of 8.2 (95% CI 8-8.4), compared with the original PLSs (median SMOG score 13.1, 95% CI 12.9-13.4). These PLSs had a median word count of 240 (95% CI 232-248) compared with the original PLSs' median word count of 364 (95% CI 339-388). The second prompt (emotional tone) generated PLSs with a median SMOG score of 11.4 (95% CI 11.1-12), again lower than the original PLSs. PLSs produced with the third prompt (write simpler and easier) had a median SMOG score of 8.7 (95% CI 8.4-8.8). ChatGPT-generated PLSs across all prompts demonstrated reduced analytical tone and increased authenticity, clout, and emotional tone compared with the original PLSs. Importantly, the conclusiveness categorization of the original abstracts was unchanged in the ChatGPT-generated PLSs.
Conclusions: ChatGPT can be a valuable tool in simplifying PLSs as medically related formats for lay audiences. More research is needed, including oversight mechanisms to ensure that the information is accurate, reliable, and culturally relevant for different audiences.
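The SMOG index used as the readability measure follows McLaughlin's published formula, grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291, where a polysyllable is a word of three or more syllables. The sketch below is a rough approximation: the syllable counter is a crude vowel-run heuristic chosen for brevity, the function names are hypothetical, and nothing here reproduces the tooling the authors actually used.

```python
import math
import re

def count_syllables(word):
    """Crude syllable estimate: count runs of vowels (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def smog_index(text):
    """Approximate SMOG grade for a text, normalised to a 30-sentence sample."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables are words with three or more (estimated) syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

simple = "The cat sat. The dog ran."
dense = "Chemotherapy administration necessitates comprehensive evaluation."
```

Comparing `smog_index(simple)` with `smog_index(dense)` shows the behaviour the study relies on: the score rises with the density of long words per sentence. The formula was designed for samples of 30 sentences, so scores for very short texts like these are only indicative.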
Affiliation(s)
- Jelena Šuto Pavičić
  - Department of Oncology and Radiotherapy, University Hospital of Split, Spinciceva 1, Split, 21000, Croatia, 385 2155817
- Ana Marušić
  - Department of Research in Biomedicine in Health, Centre for Evidence-based Medicine, University of Split School of Medicine, Split, Croatia
- Ivan Buljan
  - Department of Psychology, Faculty of Humanities and Social Sciences, University of Split, Split, Croatia
7
Foote HP, Hong C, Anwar M, Borentain M, Bugin K, Dreyer N, Fessel J, Goyal N, Hanger M, Hernandez AF, Hornik CP, Jackman JG, Lindsay AC, Matheny ME, Ozer K, Seidel J, Stockbridge N, Embi PJ, Lindsell CJ. Embracing Generative Artificial Intelligence in Clinical Research and Beyond: Opportunities, Challenges, and Solutions. JACC. ADVANCES 2025; 4:101593. [PMID: 39923329 PMCID: PMC11850149 DOI: 10.1016/j.jacadv.2025.101593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 12/30/2024] [Accepted: 01/03/2025] [Indexed: 02/11/2025]
Abstract
To explore threats and opportunities and to chart a path for safely navigating the rapid changes that generative artificial intelligence (AI) will bring to clinical research, the Duke Clinical Research Institute convened a multidisciplinary think tank in January 2024. Leading experts from academia, industry, nonprofits, and government agencies highlighted the potential opportunities of generative AI in automation of documentation, strengthening of participant and community engagement, and improvement of trial accuracy and efficiency. Challenges include technical hurdles, ethical dilemmas, and regulatory uncertainties. Success is expected to require establishing rigorous data management and security protocols, fostering integrity and trust among stakeholders, and sharing information about the safety and effectiveness of AI applications. Meeting insights point towards a future where, through collaboration and transparency, generative AI will help to shorten the translational pipeline and increase the inclusivity and equitability of clinical research.
Affiliation(s)
- Henry P Foote
  - Department of Pediatrics, Duke University, Durham, North Carolina, USA
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA; Duke Clinical Research Institute, Durham, North Carolina, USA
- Mohd Anwar
  - National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, Maryland, USA
- Kevin Bugin
  - United States Food and Drug Administration, Silver Spring, Maryland, USA
- Josh Fessel
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Morgan Hanger
  - Clinical Trials Transformation Initiative Duke Clinical Research Institute, North Carolina, USA
- Kerem Ozer
  - Novo Nordisk, Plainsboro, New Jersey, USA
- Jan Seidel
  - Boehringer Ingelheim, Plainsboro, New Jersey, USA
- Norman Stockbridge
  - United States Food and Drug Administration, Silver Spring, Maryland, USA
- Peter J Embi
  - Vanderbilt University Medical Center, Nashville, Tennessee, USA
8
Tam TYC, Sivarajkumar S, Kapoor S, Stolyar AV, Polanska K, McCarthy KR, Osterhoudt H, Wu X, Visweswaran S, Fu S, Mathur P, Cacciamani GE, Sun C, Peng Y, Wang Y. A framework for human evaluation of large language models in healthcare derived from literature review. NPJ Digit Med 2024; 7:258. [PMID: 39333376 PMCID: PMC11437138 DOI: 10.1038/s41746-024-01258-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 05/04/2024] [Accepted: 09/11/2024] [Indexed: 09/29/2024] Open
Abstract
With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
Affiliation(s)
- Thomas Yu Chow Tam
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Sumit Kapoor
  - Department of Critical Care Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Alisa V Stolyar
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Katelyn Polanska
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Karleigh R McCarthy
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Hunter Osterhoudt
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Xizhi Wu
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Sunyang Fu
  - Department of Clinical and Health Informatics, Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
- Piyush Mathur
  - Department of Anesthesiology, Cleveland Clinic, Cleveland, OH, USA
  - BrainX AI ReSearch, BrainX LLC, Cleveland, OH, USA
- Giovanni E Cacciamani
  - Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Cong Sun
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yanshan Wang
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
  - Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
9
Lenharo M. The testing of AI in medicine is a mess. Here's how it should be done. Nature 2024; 632:722-724. [PMID: 39169244 DOI: 10.1038/d41586-024-02675-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2024]
10
Hogg HDJ, Martindale APL, Liu X, Denniston AK. Clinical Evaluation of Artificial Intelligence-Enabled Interventions. Invest Ophthalmol Vis Sci 2024; 65:10. [PMID: 39106058 PMCID: PMC11309043 DOI: 10.1167/iovs.65.10.10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 07/02/2024] [Indexed: 08/07/2024] Open
Abstract
Artificial intelligence (AI) health technologies are increasingly available for use in real-world care. This emerging opportunity is accompanied by a need for decision makers and practitioners across healthcare systems to evaluate the safety and effectiveness of these interventions against the needs of their own setting. To meet this need, high-quality evidence regarding AI-enabled interventions must be made available, and decision makers in varying roles and settings must be empowered to evaluate that evidence within the context in which they work. This article summarizes good practices across four stages of evidence generation for AI health technologies: study design, study conduct, study reporting, and study appraisal.
Affiliation(s)
- H. D. Jeffry Hogg
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
  - NIHR-Supported Incubator in AI & Digital Healthcare, Birmingham, United Kingdom
- Xiaoxuan Liu
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
  - NIHR-Supported Incubator in AI & Digital Healthcare, Birmingham, United Kingdom
  - National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, United Kingdom
- Alastair K. Denniston
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
  - NIHR-Supported Incubator in AI & Digital Healthcare, Birmingham, United Kingdom
  - National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, United Kingdom