1
Bergling K, Wang LC, Shivakumar O, Nandorine Ban A, Moore LW, Ginsberg N, Kooman J, Duncan N, Kotanko P, Zhang H. From bytes to bites: application of large language models to enhance nutritional recommendations. Clin Kidney J 2025; 18:sfaf082. PMID: 40226366; PMCID: PMC11992566; DOI: 10.1093/ckj/sfaf082.
Abstract
Large language models (LLMs) such as ChatGPT are increasingly positioned to be integrated into various aspects of daily life, with promising applications in healthcare, including personalized nutritional guidance for patients with chronic kidney disease (CKD). However, for LLM-powered nutrition support tools to reach their full potential, active collaboration of healthcare professionals, patients, caregivers and LLM experts is crucial. We conducted a comprehensive review of the literature on the use of LLMs as tools to enhance nutrition recommendations for patients with CKD, curated by our expertise in the field. Additionally, we considered relevant findings from adjacent fields, including diabetes and obesity management. Currently, the application of LLMs for CKD-specific nutrition support remains limited and has room for improvement. Although LLMs can generate recipe ideas, their nutritional analyses often underestimate critical food components such as electrolytes and calories. Anticipated advancements in LLMs and other generative artificial intelligence (AI) technologies are expected to enhance these capabilities, potentially enabling accurate nutritional analysis, the generation of visual aids for cooking and identification of kidney-healthy options in restaurants. While LLM-based nutritional support for patients with CKD is still in its early stages, rapid advancements are expected in the near future. Engagement from the CKD community, including healthcare professionals, patients and caregivers, will be essential to harness AI-driven improvements in nutritional care with a balanced perspective that is both critical and optimistic.
Affiliation(s)
- Karin Bergling
- Artificial Intelligence Translational Innovation Hub, Renal Research Institute, New York, NY, USA
- Lin-Chun Wang
- Fresenius Medical Care, Clinical Research, New York, NY, USA
- Oshini Shivakumar
- West London Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Andrea Nandorine Ban
- Artificial Intelligence Translational Innovation Hub, Renal Research Institute, New York, NY, USA
- Linda W Moore
- Department of Surgery, Houston Methodist Hospital, Houston, TX, USA
- Nancy Ginsberg
- Nutrition Services, Fresenius Medical Care North America, Waltham, MA, USA
- Jeroen Kooman
- Department of Internal Medicine, Maastricht University Medical Center, Maastricht, The Netherlands
- Neill Duncan
- West London Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK
- Peter Kotanko
- Artificial Intelligence Translational Innovation Hub, Renal Research Institute, New York, NY, USA
- Department of Nephrology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hanjie Zhang
- Artificial Intelligence Translational Innovation Hub, Renal Research Institute, New York, NY, USA
2
Chou HH, Chen YH, Lin CT, Chang HT, Wu AC, Tsai JL, Chen HW, Hsu CC, Liu SY, Lee JT. AI-driven patient support: Evaluating the effectiveness of ChatGPT-4 in addressing queries about ovarian cancer compared with healthcare professionals in gynecologic oncology. Support Care Cancer 2025; 33:337. PMID: 40167802; DOI: 10.1007/s00520-025-09389-7.
Abstract
PURPOSE Artificial intelligence (AI) chatbots, such as ChatGPT-4, allow a user to ask questions on an interactive level. This study evaluated the correctness and completeness of responses to questions about ovarian cancer from a GPT-4 chatbot, LilyBot, compared with responses from healthcare professionals in gynecologic cancer care. METHODS Fifteen categories of questions about ovarian cancer were collected from an online patient Chatgroup forum. Ten healthcare professionals in gynecologic oncology generated 150 questions and responses relative to these topics. Responses from LilyBot and the healthcare professionals were scored for correctness and completeness by eight independent healthcare professionals with similar backgrounds blinded to the identity of the responders. Differences between groups were analyzed with Mann-Whitney U and Kruskal-Wallis tests, followed by Tukey's post hoc comparisons. RESULTS Mean scores for overall performance for all 150 questions were significantly higher for LilyBot compared with the healthcare professionals for correctness (5.31 ± 0.98 vs. 5.07 ± 1.00, p = 0.017; range = 1-6) and completeness (2.66 ± 0.55 vs. 2.36 ± 0.55, p < 0.001; range = 1-3). LilyBot had significantly higher scores for immunotherapy compared with the healthcare professionals for correctness (6.00 ± 0.00 vs. 4.70 ± 0.48, p = 0.020) and completeness (3.00 ± 0.00 vs. 2.00 ± 0.00, p < 0.010); and gene therapy for completeness (3.00 ± 0.00 vs. 2.20 ± 0.42, p = 0.023). CONCLUSIONS The significantly better performance by LilyBot compared with healthcare professionals highlights the potential of ChatGPT-4-based dialogue systems to provide patients with clinical information about ovarian cancer.
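For orientation, the core of the group comparison reported above (ordinal correctness ratings contrasted between two responder groups with a Mann-Whitney U test) can be illustrated in a few lines of Python. This is a minimal sketch on synthetic placeholder scores, not the study's data or analysis code.

```python
# Minimal sketch of a Mann-Whitney U comparison of ordinal rating scores
# (synthetic 1-6 ratings; not the study's data or analysis code).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)
chatbot_correctness = rng.integers(4, 7, size=150)     # hypothetical ratings in {4,5,6}
clinician_correctness = rng.integers(3, 7, size=150)   # hypothetical ratings in {3,4,5,6}

stat, p = mannwhitneyu(chatbot_correctness, clinician_correctness, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}")
```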
Affiliation(s)
- Hung-Hsueh Chou
- Department of Obstetrics and Gynecology, Linkou Branch, Chang Gung Memorial Hospital, Tao-Yuan, Taiwan
- School of Medicine, National Tsing Hua University, Hsinchu, Taiwan
- Yi Hua Chen
- School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan, Taiwan
- Chiu-Tzu Lin
- Nursing Department, Linkou Branch, Chang Gung Memorial Hospital, Tao-Yuan, Taiwan
- Hsien-Tsung Chang
- Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan 333, Tao-Yuan, Taiwan
- Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan 333, Tao-Yuan, Taiwan
- Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan 333, Tao-Yuan, Taiwan
- An-Chieh Wu
- Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan 333, Tao-Yuan, Taiwan
- Jia-Ling Tsai
- School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan, Taiwan
- Hsiao-Wei Chen
- School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan, Taiwan
- Ching-Chun Hsu
- Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan 333, Tao-Yuan, Taiwan
- Shu-Ya Liu
- School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan, Taiwan
- Jian Tao Lee
- School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan, Taiwan
- Nursing Department, Linkou Branch, Chang Gung Memorial Hospital, Tao-Yuan, Taiwan
3
Ozlu Karahan T, Kenger EB, Yilmaz Y. Artificial Intelligence-Based Diets: A Role in the Nutritional Treatment of Metabolic Dysfunction-Associated Steatotic Liver Disease? J Hum Nutr Diet 2025; 38:e70033. PMID: 40013348; DOI: 10.1111/jhn.70033.
Abstract
BACKGROUND Metabolic dysfunction-associated steatotic liver disease (MASLD) is a growing global health concern. Effective management of this condition relies heavily on lifestyle modifications and dietary interventions. In this study, we sought to evaluate the dietary plans for MASLD generated by ChatGPT (GPT-4o) according to current guideline recommendations. METHODS ChatGPT was used to create single-day meal plans for 48 simulated patients with MASLD, tailored to individual characteristics such as age, gender, height, weight and transient elastography parameters. The plans were assessed for appropriateness according to disease-specific guidelines. RESULTS The mean energy content of the menus planned by ChatGPT was 1596.9 ± 141.5 kcal with a mean accuracy of 91.3 ± 11.0%, and fibre content was 22.0 ± 0.6 g with a mean accuracy of 88.1 ± 2.5%. However, the plans exhibited elevated levels of protein, fat and saturated fatty acids, whereas the carbohydrate content was lower than recommended. ChatGPT recommended weight loss for obese patients but did not extend this advice to normal-weight and overweight individuals. Notably, recommendations for a Mediterranean diet and physical activity were absent. CONCLUSIONS ChatGPT shows potential in developing dietary plans for MASLD management. However, discrepancies in macronutrient distributions and the omission of key evidence-based recommendations highlight the need for further refinement. To enhance the effectiveness of AI tools in dietary recommendations, alignment with established guidelines must be improved.
Affiliation(s)
- Tugce Ozlu Karahan
- Department of Nutrition and Dietetics, Faculty of Health Sciences, Istanbul Bilgi University, Istanbul, Turkey
- Emre Batuhan Kenger
- Department of Nutrition and Dietetics, Faculty of Health Sciences, Istanbul Bilgi University, Istanbul, Turkey
- Yusuf Yilmaz
- Department of Gastroenterology, School of Medicine, Recep Tayyip Erdoğan University, Rize, Turkey
4
You Q, Li X, Shi L, Rao Z, Hu W. Still a Long Way to Go, the Potential of ChatGPT in Personalized Dietary Prescription, From a Perspective of a Clinical Dietitian. J Ren Nutr 2025:S1051-2276(25)00026-3. PMID: 40074209; DOI: 10.1053/j.jrn.2025.02.008.
Abstract
OBJECTIVE Prominent large language models, such as OpenAI's Chat Generative Pre-trained Transformer (ChatGPT), have shown promising implementation in the field of nutrition. Special care should be taken when using ChatGPT to prescribe protein-restricted diets for kidney-impaired patients. The objective of the current study is to simulate a chronic kidney disease (CKD) patient and evaluate the capabilities of ChatGPT in the context of dietary prescription, with a focus on the protein content of the diet. METHODS We simulated a scenario involving a CKD patient and replicated a clinical counseling session that covered general dietary principles, dietary assessment, energy and protein recommendation, dietary prescription, and diet customization based on dietary culture. To confirm the results derived from our qualitative observations, 10 colleagues were recruited and provided with identical dietary prescription prompts to run the process again. The actual energy and protein levels of the given meal plans were recorded and the differences from the targets were compared. RESULTS ChatGPT provided general principles overall aligning with best practices. The recommendations for energy and protein requirements of CKD patients were tailored and satisfactory. However, it failed to prescribe a reliable diet based on the target energy and protein requirements. In the quantitative analysis, the prescribed energy levels were generally lower than the targets, ranging from -28.9% to -17.0%, whereas protein contents were substantially higher than the targets, ranging from +59.3% to +157%. CONCLUSION ChatGPT is competent in offering generic dietary advice, giving satisfactory nutrient recommendations and adapting cuisines to different cultures, but it failed to prescribe nutritionally accurate dietary plans for CKD patients. At present, patients with strict protein and other particular nutrient restrictions are not recommended to rely on the dietary plans prescribed by ChatGPT, to avoid potential health risks.
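The quantitative finding above (energy below target, protein far above target) reduces to a signed percent deviation of the prescribed value from the target. A minimal sketch with made-up numbers, not the study's prescriptions:

```python
# Signed percent deviation of a prescribed nutrient value from its target
# (illustrative numbers only; not the study's meal plans or targets).
def percent_deviation(prescribed: float, target: float) -> float:
    return (prescribed - target) / target * 100.0

energy_dev = percent_deviation(prescribed=1400.0, target=1800.0)   # roughly -22%
protein_dev = percent_deviation(prescribed=90.0, target=48.0)      # roughly +88%
print(f"Energy: {energy_dev:+.1f}%  Protein: {protein_dev:+.1f}%")
```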
Affiliation(s)
- Qian You
- Department of Clinical Nutrition, West China Hospital, Sichuan University, Chengdu, China
- Xuemei Li
- Department of Clinical Nutrition, West China Hospital, Sichuan University, Chengdu, China
- Lei Shi
- Department of Clinical Nutrition, West China Hospital, Sichuan University, Chengdu, China
- Zhiyong Rao
- Department of Clinical Nutrition, West China Hospital, Sichuan University, Chengdu, China
- Wen Hu
- Department of Clinical Nutrition, West China Hospital, Sichuan University, Chengdu, China
5
Karataş Ö, Demirci S, Pota K, Tuna S. Assessing ChatGPT's Role in Sarcopenia and Nutrition: Insights from a Descriptive Study on AI-Driven Solutions. J Clin Med 2025; 14:1747. PMID: 40095876; PMCID: PMC11900272; DOI: 10.3390/jcm14051747.
Abstract
Background: Sarcopenia, an age-related decline in muscle mass and function, poses significant health risks. While AI tools like ChatGPT-4 (ChatGPT-4o) are increasingly used in healthcare, their accuracy in addressing sarcopenia remains unclear. Methods: ChatGPT-4's responses to 20 frequently asked sarcopenia-related questions were evaluated by 34 experts using a four-criterion scale (relevance, accuracy, clarity, completeness). Responses were rated from 1 (low) to 5 (high), and interrater reliability was assessed via the intraclass correlation coefficient (ICC). Results: ChatGPT-4 received consistently high median scores (5.0), with ≥90% of evaluators rating responses ≥4. Relevance had the highest mean score (4.7 ± 0.5), followed by accuracy (4.6 ± 0.6), clarity (4.6 ± 0.6), and completeness (4.6 ± 0.7). ICC analysis showed poor agreement (0.416), with completeness displaying moderate agreement (0.569). Conclusions: ChatGPT-4 provides highly relevant and structured responses but with variability in accuracy and clarity. While it shows potential for patient education, expert oversight remains essential to ensure clinical validity. Future studies should explore patient-specific data integration and AI comparisons to refine its role in sarcopenia management.
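The interrater-reliability figure reported above is an intraclass correlation coefficient. A minimal sketch of how such an ICC can be computed, assuming long-format ratings and the pingouin package; the data below are random placeholders, not the study's 34-expert evaluation.

```python
# Minimal ICC sketch on synthetic long-format ratings
# (placeholder data; not the study's evaluation or analysis code).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n_items, n_raters = 20, 34
df = pd.DataFrame({
    "item":  np.repeat(np.arange(n_items), n_raters),     # 20 responses
    "rater": np.tile(np.arange(n_raters), n_items),       # 34 raters each
    "score": rng.integers(3, 6, size=n_items * n_raters), # hypothetical 1-5 ratings
})

icc = pg.intraclass_corr(data=df, targets="item", raters="rater", ratings="score")
print(icc.round(3))  # table of ICC estimates (ICC1, ICC2, ICC3 and their averages)
```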
Affiliation(s)
- Özlem Karataş
- Department of Physical Medicine and Rehabilitation, Akdeniz University, Antalya 07070, Turkey
- Seden Demirci
- Department of Neurology, Akdeniz University, Antalya 07070, Turkey
- Kaan Pota
- Department of Orthopaedics and Traumatology, Akdeniz University, Antalya 07070, Turkey
- Serpil Tuna
- Department of Physical Medicine and Rehabilitation, Akdeniz University, Antalya 07070, Turkey
6
Adilmetova G, Nassyrov R, Meyerbekova A, Karabay A, Varol HA, Chan MY. Evaluating ChatGPT's Multilingual Performance in Clinical Nutrition Advice Using Synthetic Medical Text: Insights from Central Asia. J Nutr 2025; 155:729-735. PMID: 39732434; DOI: 10.1016/j.tjnut.2024.12.018.
Abstract
BACKGROUND Although large language models like ChatGPT-4 have demonstrated competency in English, their performance for minority groups speaking underrepresented languages, as well as their ability to adapt to specific sociocultural nuances and regional cuisines, such as those in Central Asia (for example, Kazakhstan), still requires further investigation. OBJECTIVES To evaluate and compare the effectiveness of the ChatGPT-4 system in providing personalized, evidence-based nutritional recommendations in English, Kazakh, and Russian in Central Asia. METHODS This study was conducted from 15 May to 31 August, 2023. On the basis of 50 mock patient profiles, ChatGPT-4 generated dietary advice, and responses were evaluated for personalization, consistency, and practicality using a 5-point Likert scale. To identify significant differences between the 3 languages, the Kruskal-Wallis test was conducted. Additional pairwise comparisons for each language were carried out using the post hoc Dunn's test. RESULTS ChatGPT-4 showed a moderate level of performance in each category for English and Russian, whereas the Kazakh outputs were unsuitable for evaluation. The scores for English, Russian, and Kazakh were as follows: for personalization, 3.32 ± 0.46, 3.18 ± 0.38, and 1.01 ± 0.06; for consistency, 3.48 ± 0.43, 3.38 ± 0.39, and 1.09 ± 0.18; and for practicality, 3.25 ± 0.41, 3.37 ± 0.38, and 1.07 ± 0.15, respectively. The Kruskal-Wallis test indicated statistically significant differences in ChatGPT-4's performance across the 3 languages (P < 0.001). Subsequent post hoc analysis using Dunn's test showed that the performance in both English and Russian was significantly different from that in Kazakh. CONCLUSIONS Our findings show that, despite using identical prompts across 3 distinct languages, ChatGPT-4's capability to produce sensible outputs is limited by the lack of training data in non-English languages. Thus, a customized large language model should be developed to perform better in underrepresented languages and to take into account specific local diets and practices.
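As an illustration of the Kruskal-Wallis-plus-Dunn workflow described above, here is a minimal sketch on synthetic Likert-style scores for three language groups, assuming the scipy and scikit-posthocs packages; this is not the study's data or analysis code.

```python
# Kruskal-Wallis test across three language groups, followed by Dunn's pairwise post hoc test
# (synthetic scores for illustration; not the study's ratings).
import numpy as np
import pandas as pd
from scipy.stats import kruskal
import scikit_posthocs as sp

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "language": ["English"] * 50 + ["Russian"] * 50 + ["Kazakh"] * 50,
    "score": np.concatenate([
        rng.normal(3.3, 0.4, 50),   # hypothetical personalization scores
        rng.normal(3.2, 0.4, 50),
        rng.normal(1.0, 0.1, 50),
    ]),
})

groups = [g["score"].to_numpy() for _, g in df.groupby("language")]
h, p = kruskal(*groups)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.3g}")
print(sp.posthoc_dunn(df, val_col="score", group_col="language", p_adjust="bonferroni"))
```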
Affiliation(s)
- Gulnoza Adilmetova
- Department of Biomedical Sciences, School of Medicine, Nazarbayev University, Astana, Kazakhstan
- Ruslan Nassyrov
- Department of Medicine, School of Medicine, Nazarbayev University, Astana, Kazakhstan
- Aizhan Meyerbekova
- Department of Medicine, School of Medicine, Nazarbayev University, Astana, Kazakhstan
- Aknur Karabay
- Institute of Smart Systems and Artificial Intelligence, Nazarbayev University, Astana, Kazakhstan
- Huseyin Atakan Varol
- Institute of Smart Systems and Artificial Intelligence, Nazarbayev University, Astana, Kazakhstan
- Mei-Yen Chan
- Department of Biomedical Sciences, School of Medicine, Nazarbayev University, Astana, Kazakhstan
7
Azimi I, Qi M, Wang L, Rahmani AM, Li Y. Evaluation of LLMs accuracy and consistency in the registered dietitian exam through prompt engineering and knowledge retrieval. Sci Rep 2025; 15:1506. PMID: 39789057; PMCID: PMC11718202; DOI: 10.1038/s41598-024-85003-w.
Abstract
Large language models (LLMs) are fundamentally transforming human-facing applications in the health and well-being domains: boosting patient engagement, accelerating clinical decision-making, and facilitating medical education. Although state-of-the-art LLMs have shown superior performance in several conversational applications, evaluations within nutrition and diet applications are still insufficient. In this paper, we propose to employ the Registered Dietitian (RD) exam to conduct a standard and comprehensive evaluation of state-of-the-art LLMs, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, assessing both accuracy and consistency in nutrition queries. Our evaluation includes 1050 RD exam questions encompassing several nutrition topics and proficiency levels. In addition, for the first time, we examine the impact of Zero-Shot (ZS), Chain of Thought (CoT), Chain of Thought with Self Consistency (CoT-SC), and Retrieval Augmented Prompting (RAP) on both accuracy and consistency of the responses. Our findings revealed that while these LLMs obtained acceptable overall performance, their results varied considerably with different prompts and question domains. GPT-4o with CoT-SC prompting outperformed the other approaches, whereas Gemini 1.5 Pro with ZS recorded the highest consistency. For GPT-4o and Claude 3.5, CoT improved the accuracy, and CoT-SC improved both accuracy and consistency. RAP was particularly effective for GPT-4o to answer Expert level questions. Consequently, choosing the appropriate LLM and prompting technique, tailored to the proficiency level and specific domain, can mitigate errors and potential risks in diet and nutrition chatbots.
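Of the prompting strategies compared above, Chain of Thought with Self-Consistency is the least self-explanatory: several reasoned answers are sampled and the majority vote is kept. Below is a minimal sketch with a hypothetical ask_llm callable standing in for an LLM API call; it is not the paper's implementation.

```python
# Chain of Thought with Self-Consistency: sample several CoT answers, majority-vote the result
# (ask_llm is a hypothetical stand-in for an LLM API call; not the paper's code).
from collections import Counter
from typing import Callable, List

def cot_self_consistency(ask_llm: Callable[[str], str], question: str, n_samples: int = 5) -> str:
    prompt = (
        "Answer this registered-dietitian exam question. "
        "Reason step by step, then give only the final answer letter.\n\n" + question
    )
    answers: List[str] = [ask_llm(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # most frequent final answer wins

# Usage with a trivial stub model that always answers "B"
print(cot_self_consistency(lambda p: "B", "Which vitamin is fat-soluble? A) C  B) D  C) B12  D) B6"))
```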
Affiliation(s)
- Iman Azimi
- Department of Engineering, iHealth Labs, Sunnyvale, CA, 94085, United States
- Mohan Qi
- Department of Engineering, iHealth Labs, Sunnyvale, CA, 94085, United States
- Li Wang
- Department of Clinical Research, iHealth Labs, Sunnyvale, CA, 94085, United States
- Amir M Rahmani
- School of Nursing and Department of Computer Science, University of California Irvine, Irvine, CA, 92697, United States
- Youlin Li
- Department of Engineering, iHealth Labs, Sunnyvale, CA, 94085, United States
8
Karacan E. Healthy nutrition and weight management for a positive pregnancy experience in the antenatal period: Comparison of responses from artificial intelligence models on nutrition during pregnancy. Int J Med Inform 2025; 193:105663. PMID: 39531902; DOI: 10.1016/j.ijmedinf.2024.105663.
Abstract
BACKGROUND As artificial intelligence (AI)-supported applications become integral to web-based information-seeking, assessing their impact on healthy nutrition and weight management during the antenatal period is crucial. OBJECTIVE This study was conducted to evaluate both the quality and semantic similarity of responses created by AI models to the most frequently asked questions about healthy nutrition and weight management during the antenatal period, based on existing clinical knowledge. METHODS In this study, a cross-sectional assessment design was used to explore data from 3 AI models (GPT-4, MedicalGPT, Med-PaLM). We directed the most frequently asked questions about nutrition during pregnancy, obtained from the American College of Obstetricians and Gynecologists (ACOG), to each model in a new, single session on October 21, 2023, without any prior conversation. Immediately after, instructions were given to the AI models to generate responses to these questions. The responses created by the AI models were evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) scale. Additionally, to assess the semantic similarity between answers to 31 pregnancy nutrition-related frequently asked questions sourced from the ACOG and responses from the AI models, we evaluated cosine similarity using both WORD2VEC and BioLORD-2023. RESULTS Med-PaLM outperformed GPT-4 and MedicalGPT in response quality (mean = 3.93), demonstrating superior clinical accuracy over both GPT-4 (p = 0.016) and MedicalGPT (p = 0.001). GPT-4 had higher quality than MedicalGPT (p = 0.027). The semantic similarity between ACOG and Med-PaLM is higher with WORD2VEC (0.92) compared to BioLORD-2023 (0.81), showing a difference of +0.11. The similarity scores for ACOG-MedicalGPT and ACOG-GPT-4 are similar across both models, with minimal differences of -0.01. Overall, WORD2VEC has a slightly higher average similarity (0.82) than BioLORD-2023 (0.79), with a difference of +0.03. CONCLUSIONS Despite the superior performance of Med-PaLM, there is a need for further evidence-based research and improvement in the integration of AI in healthcare due to varying AI model performances.
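The semantic-similarity scores reported above are cosine similarities between embedding vectors. A minimal sketch with toy vectors standing in for averaged WORD2VEC or BioLORD-2023 embeddings; these are illustrative values, not the study's data.

```python
# Cosine similarity between two embedding vectors
# (toy 4-dimensional vectors; real word2vec/BioLORD embeddings have hundreds of dimensions).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

acog_embedding = np.array([0.12, 0.58, 0.31, 0.05])    # hypothetical reference-answer vector
model_embedding = np.array([0.10, 0.60, 0.28, 0.09])   # hypothetical AI-answer vector
print(f"Cosine similarity: {cosine_similarity(acog_embedding, model_embedding):.2f}")
```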
Affiliation(s)
- Emine Karacan
- Iskenderun Technical University, Dortyol Vocational School of Health Services, Hatay, Turkey
9
Ulug E, Gunesli I, Acıkgoz Pinar A, Yildiz BO. Evaluating reliability, quality, and readability of ChatGPT's nutritional recommendations for women with polycystic ovary syndrome. Nutr Res 2025; 133:46-53. PMID: 39673813; DOI: 10.1016/j.nutres.2024.11.005.
Abstract
Patients with polycystic ovary syndrome (PCOS) often have many questions about nutrition and turn to chatbots such as Chat Generative Pretrained Transformer (ChatGPT) for advice. This study aims to evaluate the reliability, quality, and readability of ChatGPT's responses to nutrition-related questions asked by women with PCOS. Frequently asked nutrition-related questions from women with PCOS were reviewed in both Turkish and English. The reliability and quality of the answers were independently evaluated by 2 authors and a panel of 10 expert dietitians, using modified DISCERN and global quality score. Additionally, the readability of the answers was calculated using frequently used readability formulas. The mean modified DISCERN scores for English and Turkish versions were 27.6±0.87 and 27.2±0.87, respectively, indicating a fair level of reliability in the responses (16-31 points or 40%-79%). According to the global quality score, 100% of the responses in English and 90.9% of the responses in Turkish were rated as high quality. The readability of responses was classified as "difficult to read" with the readership levels assessed at college level and above for both English and Turkish. The correlation and regression analyses indicated no relationship between reliability, quality, and readability in English. However, a significant relationship was observed between quality and readability indexes in Turkish (P < .05). Our results suggest that ChatGPT's responses to nutrition-related questions about PCOS are generally of high quality, but improvements in both reliability and readability are still necessary. Although ChatGPT can offer general information and guidance on nutrition for PCOS, it should not be considered a substitute for personalized medical advice from health care professionals for effective management of the syndrome.
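The readability assessment described above relies on standard formulas such as Flesch Reading Ease and Flesch-Kincaid Grade Level. A minimal sketch, assuming the textstat package and a made-up answer string rather than the chatbot output evaluated in the study:

```python
# Readability scoring of a sample answer with common English readability formulas
# (assumes the textstat package; the answer text is illustrative, not study output).
import textstat

answer = (
    "Women with polycystic ovary syndrome may benefit from a balanced eating pattern "
    "built around whole grains, lean protein and vegetables, with limited added sugars."
)

print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(answer))
```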
Affiliation(s)
- Elif Ulug
- Department of Nutrition and Dietetics, Faculty of Health Sciences, Hacettepe University, 06100, Ankara, Turkey
- Department of Nutrition and Dietetics, Faculty of Health Sciences, Ataturk University, 25240, Erzurum, Turkey
- Irmak Gunesli
- Department of Internal Medicine, Hacettepe University School of Medicine, 06100, Ankara, Turkey
- Aylin Acıkgoz Pinar
- Department of Nutrition and Dietetics, Faculty of Health Sciences, Hacettepe University, 06100, Ankara, Turkey
- Bulent Okan Yildiz
- Department of Internal Medicine, Hacettepe University School of Medicine, 06100, Ankara, Turkey
- Division of Endocrinology and Metabolism, Hacettepe University School of Medicine, 06100, Ankara, Turkey
10
Hieronimus B, Hammann S, Podszun MC. Can the AI tools ChatGPT and Bard generate energy, macro- and micro-nutrient sufficient meal plans for different dietary patterns? Nutr Res 2024; 128:105-114. PMID: 39102765; DOI: 10.1016/j.nutres.2024.07.002.
Abstract
Artificial intelligence chatbots based on large language models have recently emerged as an alternative to traditional online searches and are also entering the nutrition space. In this study, we wanted to investigate whether the artificial intelligence chatbots ChatGPT and Bard (now Gemini) can create meal plans that meet the dietary reference intake (DRI) for different dietary patterns. We further hypothesized that nutritional adequacy could be improved by modifying the prompts used. Meal plans were generated by 3 accounts for different dietary patterns (omnivorous, vegetarian, and vegan) using 2 distinct prompts, resulting in 108 meal plans in total. The nutrient content of the plans was subsequently analyzed and compared to the DRIs. On average, the meal plans contained less energy and carbohydrates than recommended but mostly exceeded the DRI for protein. Vitamin D and fluoride fell below the DRI for all plans, whereas only the vegan plans contained insufficient vitamin B12. ChatGPT suggested using vitamin B12 supplements in 5 of 18 instances, whereas Bard never recommended supplements. There were no significant differences between the prompts or the tools. Although the meal plans generated by ChatGPT and Bard met most DRIs, there were some exceptions, particularly for vegan diets. These tools may be useful for individuals looking for general dietary inspiration, but they should not be relied on to create nutritionally adequate meal plans, especially for individuals with restrictive dietary needs.
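The adequacy check described above amounts to comparing each plan's nutrient totals against reference intake values. A minimal sketch with placeholder reference numbers; these are not official DRIs and not the study's analysis.

```python
# Compare a meal plan's nutrient totals against reference intake values
# (all numbers are illustrative placeholders; consult official DRI tables for real values).
plan_totals = {"energy_kcal": 1750, "protein_g": 95, "vitamin_b12_ug": 1.1, "vitamin_d_ug": 4.0}
reference   = {"energy_kcal": 2200, "protein_g": 55, "vitamin_b12_ug": 2.4, "vitamin_d_ug": 15.0}

for nutrient, target in reference.items():
    planned = plan_totals.get(nutrient, 0.0)
    status = "meets" if planned >= target else "below"
    print(f"{nutrient}: plan {planned} vs reference {target} -> {status}")
```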
Affiliation(s)
- Bettina Hieronimus
- Max Rubner-Institut, Department of Physiology and Biochemistry of Nutrition, Karlsruhe, Germany
- Simon Hammann
- Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Department of Food Chemistry and Analytical Chemistry (170a), Institute of Food Chemistry, University of Hohenheim, Stuttgart, Germany
- Maren C Podszun
- Institute of Nutritional Science, Department of Food Biofunctionality, University of Hohenheim, Stuttgart, Germany
11
Ponzo V, Goitre I, Favaro E, Merlo FD, Mancino MV, Riso S, Bo S. Is ChatGPT an Effective Tool for Providing Dietary Advice? Nutrients 2024; 16:469. PMID: 38398794; PMCID: PMC10892804; DOI: 10.3390/nu16040469.
Abstract
The chatbot Chat Generative Pretrained Transformer (ChatGPT) is becoming increasingly popular among patients for searching health-related information. Prior studies have raised concerns regarding its accuracy in offering nutritional advice. We investigated in November 2023 ChatGPT's potential as a tool for providing nutritional guidance in relation to different non-communicable diseases (NCDs). First, the dietary advice given by ChatGPT (version 3.5) for various NCDs was compared with guidelines; then, the chatbot's capacity to manage a complex case with several coexisting diseases was investigated. A panel of nutrition experts assessed ChatGPT's responses. Overall, ChatGPT offered clear advice, with the appropriateness of responses ranging from 55.5% (sarcopenia) to 73.3% (non-alcoholic fatty liver disease, NAFLD). Only two recommendations (one for obesity, one for NAFLD) contradicted guidelines. A single suggestion for type 2 diabetes mellitus was found to be "unsupported", while many recommendations for various NCDs were deemed "not fully matched" to the guidelines despite not directly contradicting them. However, when the chatbot handled overlapping conditions, limitations emerged, resulting in some contradictory or inappropriate advice. In conclusion, although ChatGPT exhibited reasonable accuracy in providing general dietary advice for NCDs, its efficacy decreased in complex situations necessitating customized strategies; therefore, the chatbot is currently unable to replace a healthcare professional's consultation.
Affiliation(s)
- Valentina Ponzo
- Department of Medical Sciences, University of Torino, 10126 Torino, Italy
- Ilaria Goitre
- Department of Medical Sciences, University of Torino, 10126 Torino, Italy
- Enrica Favaro
- Department of Medical Sciences, University of Torino, 10126 Torino, Italy
- Fabio Dario Merlo
- Dietetic and Clinical Nutrition Unit, Città della Salute e della Scienza Hospital of Torino, 10126 Torino, Italy
- Maria Vittoria Mancino
- Dietetic and Clinical Nutrition Unit, Città della Salute e della Scienza Hospital of Torino, 10126 Torino, Italy
- Sergio Riso
- Dietetic and Clinical Nutrition Unit, Azienda Ospedaliero-Universitaria Maggiore della Carità of Novara, 28100 Novara, Italy
- Simona Bo
- Department of Medical Sciences, University of Torino, 10126 Torino, Italy
- Dietetic and Clinical Nutrition Unit, Città della Salute e della Scienza Hospital of Torino, 10126 Torino, Italy