Reference Citation Analysis

For: Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020;36:1234-1240. [PMID: 31501885 PMCID: PMC7703786 DOI: 10.1093/bioinformatics/btz682] [Citation(s) in RCA: 1410] [Impact Index Per Article: 282.0] [Received: 05/16/2019] [Revised: 07/29/2019] [Accepted: 09/05/2019] [Indexed: 12/15/2022]

Cited by Other Article(s)

Shen Y, Wang J, Wang Z, Shi Z, Chen H, Wang Z, Jiang Y, Wang X, Cheng C, Wang X, Zhu H, Ye J. CATI: A medical context-enhanced framework for diagnosis code assignment in the UK Biobank study. Artif Intell Med 2025;166:103136. [PMID: 40344999 DOI: 10.1016/j.artmed.2025.103136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 03/10/2025] [Accepted: 04/15/2025] [Indexed: 05/11/2025]

Gupta S, Sharma S, Sharma R, Chandra J. Healing with hierarchy: Hierarchical attention empowered graph neural networks for predictive analysis in medical data. Artif Intell Med 2025;165:103134. [PMID: 40286587 DOI: 10.1016/j.artmed.2025.103134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 04/11/2025] [Accepted: 04/12/2025] [Indexed: 04/29/2025]

Fathy W, Emeriaud G, Cheriet F. A comprehensive review of ICU readmission prediction models: From statistical methods to deep learning approaches. Artif Intell Med 2025;165:103126. [PMID: 40300338 DOI: 10.1016/j.artmed.2025.103126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/04/2024] [Accepted: 03/29/2025] [Indexed: 05/01/2025]

Ye X, Shi T, Huang D, Sakurai T. Multi-Omics clustering by integrating clinical features from large language model. Methods 2025;239:64-71. [PMID: 40180255 DOI: 10.1016/j.ymeth.2025.03.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2025] [Revised: 03/16/2025] [Accepted: 03/26/2025] [Indexed: 04/05/2025]

Hu Y, Chen Y, Xu Y. A shape composition method for named entity recognition. Neural Netw 2025;187:107389. [PMID: 40117979 DOI: 10.1016/j.neunet.2025.107389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2024] [Revised: 12/16/2024] [Accepted: 03/10/2025] [Indexed: 03/23/2025]

Khan N, Mufti MR, Arif M, Ali A, Shah Z. KEM-IoMT: Knowledge graph embedding-enhanced accurate medical service recommendation against diabetes. Comput Biol Med 2025;194:110463. [PMID: 40516453 DOI: 10.1016/j.compbiomed.2025.110463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2024] [Revised: 03/16/2025] [Accepted: 05/25/2025] [Indexed: 06/16/2025]

Kohli S, Agarwal P, Ho Wing Chan A, Erekat A, Nadkarni G, Kummer B. Machine learning to predict penumbra core mismatch in acute ischemic stroke using clinical note data. NPJ Digit Med 2025;8:340. [PMID: 40481318 PMCID: PMC12144192 DOI: 10.1038/s41746-025-01703-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Accepted: 05/03/2025] [Indexed: 06/11/2025]

Withers CA, Rufai AM, Venkatesan A, Tirunagari S, Lobentanzer S, Harrison M, Zdrazil B. Natural language processing in drug discovery: bridging the gap between text and therapeutics with artificial intelligence. Expert Opin Drug Discov 2025;20:765-783. [PMID: 40298230 DOI: 10.1080/17460441.2025.2490835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2024] [Revised: 03/07/2025] [Accepted: 04/04/2025] [Indexed: 04/30/2025]

Zhou F, Parrish R, Afzal M, Saha A, Haynes RB, Iorio A, Lokker C. Benchmarking domain-specific pretrained language models to identify the best model for methodological rigor in clinical studies. J Biomed Inform 2025;166:104825. [PMID: 40246186 DOI: 10.1016/j.jbi.2025.104825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Revised: 03/02/2025] [Accepted: 04/03/2025] [Indexed: 04/19/2025]

Abstract

OBJECTIVE

Encoder-only transformer-based language models have shown promise in automating critical appraisal of clinical literature. However, a comprehensive evaluation of the models for classifying the methodological rigor of randomized controlled trials is necessary to identify the more robust ones. This study benchmarks several state-of-the-art transformer-based language models using a diverse set of performance metrics.

METHODS

Seven transformer-based language models were fine-tuned on the title and abstract of 42,575 articles from 2003 to 2023 in McMaster University's Premium LiteratUre Service database under different configurations. The studies reported in the articles addressed questions related to treatment, prevention, or quality improvement for which randomized controlled trials are the gold standard with defined criteria for rigorous methods. Models were evaluated on the validation set using 12 schemes and metrics, including optimization for cross-entropy loss, Brier score, AUROC, average precision, sensitivity, specificity, and accuracy, among others. Threshold tuning was performed to optimize threshold-dependent metrics. Models that achieved the best performance in one or more schemes on the validation set were further tested in hold-out and external datasets.
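
Threshold tuning of the kind described above can be illustrated with a short sketch. It assumes binary labels (1 = methodologically rigorous) and predicted probabilities from a fine-tuned classifier on a validation set; the function names, the toy data, and the use of accuracy and Youden's J as example threshold-dependent metrics are illustrative assumptions, not the study's code. The Brier score, a threshold-independent metric listed among the evaluation schemes, is included for comparison.

```python
from typing import Callable, Sequence, Tuple

Counts = Tuple[int, int, int, int]  # (tp, fp, tn, fn)


def brier_score(y_true: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared error between predicted probability and 0/1 label (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)


def confusion(y_true: Sequence[int], p: Sequence[float], threshold: float) -> Counts:
    """Confusion-matrix counts when predicting 1 for every probability >= threshold."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y_true, p):
        predicted = 1 if pi >= threshold else 0
        if predicted == 1 and yi == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif yi == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)


def youden_j(tp: int, fp: int, tn: int, fn: int) -> float:
    """Sensitivity + specificity - 1: one way to balance the two."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0


def tune_threshold(
    y_true: Sequence[int],
    p: Sequence[float],
    metric: Callable[[int, int, int, int], float],
) -> Tuple[float, float]:
    """Return (threshold, score) that maximizes `metric` on a validation set."""
    # Candidate cut-offs: every distinct predicted probability plus the default 0.5.
    candidates = sorted(set(p) | {0.5})
    best_t, best_score = 0.5, float("-inf")
    for t in candidates:
        score = metric(*confusion(y_true, p, t))
        if score > best_score:  # on ties, keep the lowest threshold seen first
            best_t, best_score = t, score
    return best_t, best_score


# Toy validation labels and predicted probabilities (illustrative only).
y_val = [0, 0, 1, 1]
p_val = [0.10, 0.40, 0.35, 0.80]
print(brier_score(y_val, p_val))
print(tune_threshold(y_val, p_val, accuracy))
print(tune_threshold(y_val, p_val, youden_j))
```

Because each metric is maximized separately, different schemes can select different cut-offs for the same model.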

RESULTS

A total of 210 models were fine-tuned. Six models achieved top performance in one or more evaluation schemes. Three BioLinkBERT models outperformed others on 8 of the 12 schemes. BioBERT, BiomedBERT, and SciBERT were best on 1, 1 and 2 schemes, respectively. While model performance remained robust on the hold-out test set, it declined in external datasets. Class weight adjustments improved performance in most instances.
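
The abstract does not say which class weighting scheme was used. A common choice, assumed here, is inverse-frequency weighting, the formula scikit-learn applies for `class_weight="balanced"`: weight = n_samples / (n_classes × count of the class). The hypothetical helpers below compute those weights and apply them in a weighted binary cross-entropy, so errors on the rarer class cost more.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_bce(y_true: Sequence[int], p: Sequence[float],
                 weights: Dict[int, float], eps: float = 1e-12) -> float:
    """Binary cross-entropy with each example scaled by its class weight.

    The weighted sum is divided by the total weight, i.e. a weighted mean.
    """
    total, weight_sum = 0.0, 0.0
    for yi, pi in zip(y_true, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        w = weights[yi]
        total += -w * (yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
        weight_sum += w
    return total / weight_sum


labels = [0, 0, 0, 1]  # imbalanced toy labels: 3 negatives, 1 positive
w = balanced_class_weights(labels)
print(w)
print(weighted_bce(labels, [0.2, 0.1, 0.3, 0.6], w))
```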

CONCLUSION

BioLinkBERT generally outperformed the other models. Using comprehensive evaluation metrics and threshold tuning optimizes model selection for real-world applications. Future work should assess generalizability to other datasets, explore alternate imbalance strategies, and examine training on full-text articles.

Dias AC, Moreira VP, Comba JLD. RoBIn: A Transformer-based model for risk of bias inference with machine reading comprehension. J Biomed Inform 2025;166:104819. [PMID: 40250743 DOI: 10.1016/j.jbi.2025.104819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 03/12/2025] [Accepted: 03/25/2025] [Indexed: 04/20/2025]

Shen Y, Xu Y, Ma J, Rui W, Zhao C, Heacock L, Huang C. Multi-modal large language models in radiology: principles, applications, and potential. Abdom Radiol (NY) 2025;50:2745-2757. [PMID: 39621074 DOI: 10.1007/s00261-024-04708-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/29/2024] [Revised: 11/13/2024] [Accepted: 11/15/2024] [Indexed: 05/13/2025]

Dorfner FJ, Dada A, Busch F, Makowski MR, Han T, Truhn D, Kleesiek J, Sushil M, Adams LC, Bressem KK. Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks. J Am Med Inform Assoc 2025;32:1015-1024. [PMID: 40190132 DOI: 10.1093/jamia/ocaf045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Accepted: 03/02/2025] [Indexed: 05/21/2025]

Abstract

OBJECTIVES

Large language models (LLMs) have shown potential in biomedical applications, leading to efforts to fine-tune them on domain-specific data. However, the effectiveness of this approach remains unclear. This study aims to critically evaluate the performance of biomedically fine-tuned LLMs against their general-purpose counterparts across a range of clinical tasks.

MATERIALS AND METHODS

We evaluated the performance of biomedically fine-tuned LLMs against their general-purpose counterparts on clinical case challenges from NEJM and JAMA, and on multiple clinical tasks, such as information extraction, document summarization and clinical coding. We used a diverse set of benchmarks specifically chosen to be outside the likely fine-tuning datasets of biomedical models, ensuring a fair assessment of generalization capabilities.

RESULTS

Biomedical LLMs generally underperformed compared to general-purpose models, especially on tasks not focused on probing medical knowledge. While on the case challenges, larger biomedical and general-purpose models showed similar performance (eg, OpenBioLLM-70B: 66.4% vs Llama-3-70B-Instruct: 65% on JAMA), smaller biomedical models showed more pronounced underperformance (OpenBioLLM-8B: 30% vs Llama-3-8B-Instruct: 64.3% on NEJM). Similar trends appeared across CLUE benchmarks, with general-purpose models often achieving higher scores in text generation, question answering, and coding. Notably, biomedical LLMs also showed a higher tendency to hallucinate.

DISCUSSION

Our findings challenge the assumption that biomedical fine-tuning inherently improves LLM performance, as general-purpose models consistently performed better on unseen medical tasks. Retrieval-augmented generation may offer a more effective strategy for clinical adaptation.

CONCLUSION

Fine-tuning LLMs on biomedical data may not yield the anticipated benefits. Alternative approaches, such as retrieval augmentation, should be further explored for effective and reliable clinical integration of LLMs.

Raj S, Namdeo V, Singh P, Srivastava A. Identification and prioritization of disease candidate genes using biomedical named entity recognition and random forest classification. Comput Biol Med 2025;192:110320. [PMID: 40349579 DOI: 10.1016/j.compbiomed.2025.110320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 04/13/2025] [Accepted: 04/30/2025] [Indexed: 05/14/2025]

Abstract

BACKGROUND AND OBJECTIVE

The elucidation of candidate genes is fundamental to understanding complex diseases and vital for early diagnosis, personalized treatment, and drug discovery. Traditional Disease Gene Identification methods are limited by their need for large sample sizes and substantial statistical power, which is particularly challenging for complex diseases. Conversely, Disease Gene Prioritization methods leverage biological knowledge but rely on computational predictions that often lack experimental validation. To address the limitations of existing tools, this study introduces a two-tier machine-learning protocol that distils Disease Gene Association details from disease-specific abstracts, incorporating diverse findings. Using text mining, the model classifies disease-gene associations from the abstracts into Positive, Negative, and Ambiguous classes.

METHODS

Leveraging Random Forest as a robust text classification tool, this study demonstrates its efficacy in navigating the complexities of biomedical texts. In the developed two-tier protocol, the level 1 classifier sorts text into two classes according to the presence or absence of a disease-gene association, whereas the level 2 classifier further classifies the detected associations into three classes: Positive, Negative, and Ambiguous. The classifier underwent rigorous training and cross-validation on gold standard datasets for three diseases: Alzheimer's, Breast Cancer and Type 2 Diabetes. Its performance across these varied disease contexts underscores its versatility and robustness without signs of overfitting.
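
The two-tier protocol is a classifier cascade: level 1 filters out text that states no disease-gene association, and only the remaining text reaches the three-class level 2 classifier. The sketch below shows that control flow only. The keyword rules are hypothetical stand-ins for the trained Random Forest models, and the gene and disease names in the toy sentences are placeholders.

```python
from typing import Callable, List, Sequence

# Level 1: does the text state a disease-gene association at all?
Level1 = Callable[[str], bool]
# Level 2: label an association as "Positive", "Negative" or "Ambiguous".
Level2 = Callable[[str], str]


def two_tier_classify(texts: Sequence[str], level1: Level1, level2: Level2) -> List[str]:
    """Cascade: only texts accepted by level 1 are passed to the level 2 classifier."""
    labels: List[str] = []
    for text in texts:
        if level1(text):
            labels.append(level2(text))
        else:
            labels.append("No association")
    return labels


# Hypothetical keyword rules standing in for the trained Random Forest models.
def toy_level1(text: str) -> bool:
    return "associated" in text.lower()


def toy_level2(text: str) -> str:
    words = text.lower().split()
    if "not" in words:
        return "Negative"
    if "may" in words:
        return "Ambiguous"
    return "Positive"


sentences = [
    "GENE_A is associated with disease X.",
    "GENE_B is not associated with disease X.",
    "GENE_C may be associated with disease X.",
    "Participants were recruited in 2010.",
]
print(two_tier_classify(sentences, toy_level1, toy_level2))
```

Keeping the two levels as separate callables lets each be trained and evaluated on its own label set, which matches the two-class then three-class split described above.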

RESULTS

Achieving average accuracies of 97.29% and 98.14% for level 1 and level 2 classification, respectively, the protocol extracted 2769, 3220 and 740 genes positively associated with Alzheimer's, Breast Cancer and Type 2 Diabetes, respectively. Of these positive genes, a substantial number (1008, 670, and 165, respectively) were not reported in established databases, expanding the genetic exploration of these diseases. These genes offer promising opportunities for targeted interventions, while the ambiguous genes warrant further investigation to unravel deeper disease associations.

CONCLUSIONS

This research significantly contributes to the understanding of genetic diseases by offering a comprehensive roadmap for their intricate exploration. Beyond the study's focus on Alzheimer's, Breast Cancer, and Type 2 Diabetes, the protocol's applicability extends to diverse biomedical landscapes, demonstrating its versatility and impactful potential for comprehensive disease exploration.

Wang Y, Cao P, Fang H, Ye Y. Span-aware pre-trained network with deep information bottleneck for scientific entity relation extraction. Neural Netw 2025;186:107250. [PMID: 39955959 DOI: 10.1016/j.neunet.2025.107250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 12/19/2024] [Accepted: 02/02/2025] [Indexed: 02/18/2025]

Guan H, Novoa-Laurentiev J, Zhou L. CD-Tron: Leveraging large clinical language model for early detection of cognitive decline from electronic health records. J Biomed Inform 2025;166:104830. [PMID: 40320101 DOI: 10.1016/j.jbi.2025.104830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 03/28/2025] [Accepted: 04/13/2025] [Indexed: 05/08/2025]

Abstract

BACKGROUND

Early detection of cognitive decline during the preclinical stage of Alzheimer's disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record contain valuable information that can aid in the early identification of cognitive decline. In this study, we utilize advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline.

METHODS

We collected clinical notes from 2,166 patients spanning the 4 years preceding their initial mild cognitive impairment (MCI) diagnosis from the Enterprise Data Warehouse of Mass General Brigham. We developed CD-Tron, built upon a large clinical language model fine-tuned on 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used an explainable AI technique, SHAP (SHapley Additive exPlanations) values, to interpret the model's predictions and identify the most influential features. An error analysis was also performed to further examine the model's predictions.
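
SHAP values are Shapley values from cooperative game theory: each feature's attribution is its marginal contribution to the model output, averaged over all coalitions of the other features with weight |S|!(n − |S| − 1)!/n!. The `shap` package estimates these values with approximations; the sketch below instead enumerates every coalition exactly, which is only feasible for a few features. The scoring function, which treats a feature as "present" when a word appears in a note, is a hypothetical toy and not CD-Tron's model or value function.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

ValueFn = Callable[[FrozenSet[str]], float]


def shapley_values(value_fn: ValueFn, features: Sequence[str]) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other features.

    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(len(others) + 1):
            # Weight of any coalition of size r: r! (n - r - 1)! / n!
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical toy scorer: the "model output" depends on which words are present.
def toy_score(present: FrozenSet[str]) -> float:
    score = 0.0
    if "memory" in present:
        score += 2.0
    if "forgetful" in present:
        score += 1.0
    if {"memory", "forgetful"} <= present:
        score += 0.5  # interaction term between the two words
    return score


words = ["memory", "forgetful", "appointment"]
phi = shapley_values(toy_score, words)
print(phi)
```

With these toy scores, the three attributions sum to the score of the full note minus the score of an empty note (the efficiency property of Shapley values), the 0.5 interaction bonus is split equally between "memory" and "forgetful", and "appointment" receives zero because it never changes the output.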

RESULTS

CD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC for detecting cognitive decline (CD). Tested on real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, which is crucial for clinical applications that prioritize early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding.

CONCLUSION

CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.

Asim MN, Asif T, Hassan F, Dengel A. Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models. Database (Oxford) 2025;2025:baaf027. [PMID: 40448683 DOI: 10.1093/database/baaf027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/06/2025] [Accepted: 03/26/2025] [Indexed: 06/02/2025]

Abstract

Protein sequence analysis examines the order of amino acids within protein sequences to unlock a wealth of knowledge about biological processes and genetic disorders. It helps forecast disease susceptibility by identifying unique protein signatures, or biomarkers, that are linked to particular disease states. Protein sequence analysis through wet-lab experiments is expensive, time-consuming and error-prone. To facilitate large-scale proteomic sequence analysis, the biological community is seeking to use AI to move from wet-lab experiments to computer-aided applications. However, proteomics and AI are two distinct fields, and developing AI-driven protein sequence analysis applications requires knowledge of both domains. Various review articles have been written to bridge the gap between the two fields, but they focus on a few individual tasks or specific applications rather than providing a comprehensive overview of a wide range of tasks and applications. To meet the need for a comprehensive review that presents a holistic view of a wide array of tasks and applications, this manuscript makes several contributions. It bridges the gap between proteomics and AI by presenting a comprehensive array of AI-driven applications for 63 distinct protein sequence analysis tasks. It equips AI researchers by describing the biological foundations of these 63 tasks. It supports the development of AI-driven protein sequence analysis applications by providing comprehensive details of 68 protein databases. It presents a rich data landscape, encompassing 627 benchmark datasets across the 63 tasks. It highlights the use of 25 unique word embedding methods and 13 language models in AI-driven protein sequence analysis applications. It accelerates the development of AI-driven applications by summarizing current state-of-the-art performance across the 63 protein sequence analysis tasks.

Potu ST, Niranjan Murthy R, Thomas A, Mishra L, Prange N, Durmaz AR. Ontology-conformal recognition of materials entities using language models. Sci Rep 2025;15:18597. [PMID: 40425727 PMCID: PMC12116928 DOI: 10.1038/s41598-025-03619-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Accepted: 05/21/2025] [Indexed: 05/29/2025]

Abstract

Extracting structured and semantically annotated materials information from unstructured scientific literature is a crucial step toward constructing machine-interpretable knowledge graphs and accelerating data-driven materials research. This is especially important in materials science, which is adversely affected by data scarcity. Data scarcity further motivates the use of foundation language models for information extraction, since they can in principle address several subtasks of the information extraction problem in a range of domains without the need to generate costly large-scale annotated datasets for each downstream task. However, foundation language models struggle with tasks like Named Entity Recognition (NER) due to domain-specific terminology, fine-grained entities, and semantic ambiguity. The issue is even more pronounced when entities must map directly to pre-existing domain ontologies. This work aims to assess whether foundation large language models (LLMs) can successfully perform ontology-conformal NER in the materials mechanics and fatigue domain. Specifically, we present a comparative evaluation of in-context learning (ICL) with foundation models such as GPT-4 against fine-tuned task-specific language models, including MatSciBERT and DeBERTa. The study is performed on two materials fatigue datasets, which contain annotations at a comparatively fine-grained level adhering to the class definitions of a formal ontology to ensure semantic alignment and cross-dataset interoperability. The two datasets cover adjacent domains, which makes it possible to assess how well both NER methodologies generalize under typical domain shifts. Task-specific models significantly outperform general foundation models on ontology-constrained NER. Our findings reveal that ICL depends strongly on the quality of few-shot demonstrations to handle domain shift. The study also highlights the significance of domain-specific pre-training by comparing task-specific models that differ primarily in their pre-training corpus.

Tran M, Schmidle P, Guo RR, Wagner SJ, Koch V, Lupperger V, Novotny B, Murphree DH, Hardway HD, D'Amato M, Lefkes J, Geijs DJ, Feuchtinger A, Böhner A, Kaczmarczyk R, Biedermann T, Amir AL, Mooyaart AL, Ciompi F, Litjens G, Wang C, Comfere NI, Eyerich K, Braun SA, Marr C, Peng T. Generating dermatopathology reports from gigapixel whole slide images with HistoGPT. Nat Commun 2025;16:4886. [PMID: 40419470 DOI: 10.1038/s41467-025-60014-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Accepted: 05/12/2025] [Indexed: 05/28/2025]

Affiliation(s)

Manuel Tran: Helmholtz AI, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
Paul Schmidle: Department of Dermatology, Medical Center, University of Freiburg, Freiburg, Germany
Ruifeng Ray Guo: Department of Laboratory Medicine and Pathology, Mayo Clinic, Jacksonville, FL, USA
Sophia J Wagner: Helmholtz AI, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
Valentin Koch: School of Computation, Information and Technology, Technical University of Munich, Munich, Germany; Institute of AI for Health, Helmholtz Munich, Neuherberg, Germany
Valerio Lupperger: MLL Munich Leukemia Laboratory, Munich, Germany
Brenna Novotny: Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
Dennis H Murphree: Digital Health, Artificial Intelligence and Innovations Program, Mayo Clinic, Rochester, MN, USA
Heather D Hardway: Digital Health, Artificial Intelligence and Innovations Program, Mayo Clinic, Rochester, MN, USA
Marina D'Amato: Computational Pathology Group, Radboud University Medical Center, Nijmegen, The Netherlands
Judith Lefkes: Computational Pathology Group, Radboud University Medical Center, Nijmegen, The Netherlands; Oncode Institute, Utrecht, The Netherlands
Daan J Geijs: Computational Pathology Group, Radboud University Medical Center, Nijmegen, The Netherlands; Oncode Institute, Utrecht, The Netherlands
Annette Feuchtinger: Core Facility Pathology and Tissue Analytics, Helmholtz Munich, Neuherberg, Germany
Alexander Böhner: Department of Dermatology and Allergy, Technical University of Munich, Munich, Germany
Robert Kaczmarczyk: Department of Dermatology and Allergy, Technical University of Munich, Munich, Germany
Tilo Biedermann: Department of Dermatology and Allergy, Technical University of Munich, Munich, Germany
Avital L Amir: Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
Antien L Mooyaart: Department of Pathology, Erasmus University Medical Center, Rotterdam, The Netherlands
Francesco Ciompi: Computational Pathology Group, Radboud University Medical Center, Nijmegen, The Netherlands
Geert Litjens: Computational Pathology Group, Radboud University Medical Center, Nijmegen, The Netherlands; Oncode Institute, Utrecht, The Netherlands
Chen Wang: Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
Nneka I Comfere: Digital Health, Artificial Intelligence and Innovations Program, Mayo Clinic, Rochester, MN, USA; Department of Dermatology and Laboratory Medicine & Pathology, Mayo Clinic, Rochester, MN, USA
Kilian Eyerich: Department of Dermatology, Medical Center, University of Freiburg, Freiburg, Germany
Stephan A Braun: Dermatology Department, University Hospital Münster, Münster, Germany; Department of Dermatology, Medical Faculty, Heinrich-Heine University, Düsseldorf, Germany
Carsten Marr: Helmholtz AI, Helmholtz Munich, Neuherberg, Germany; Institute of AI for Health, Helmholtz Munich, Neuherberg, Germany
Tingying Peng: Helmholtz AI, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany

Dastani M, Mardaneh J, Rostamian M. Large language models' capabilities in responding to tuberculosis medical questions: testing ChatGPT, Gemini, and Copilot. Sci Rep 2025;15:18004. [PMID: 40410343 PMCID: PMC12102205 DOI: 10.1038/s41598-025-03074-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2025] [Accepted: 05/19/2025] [Indexed: 05/25/2025]

Hein D, Christie A, Holcomb M, Xie B, Jain AJ, Vento J, Rakheja N, Shakur AH, Christley S, Cowell LG, Brugarolas J, Jamieson AR, Kapur P. Iterative refinement and goal articulation to optimize large language models for clinical information extraction. NPJ Digit Med 2025;8:301. [PMID: 40410408 PMCID: PMC12102345 DOI: 10.1038/s41746-025-01686-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2025] [Accepted: 04/28/2025] [Indexed: 05/25/2025]

Affiliation(s)

David Hein: Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
Alana Christie: Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
Michael Holcomb: Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
Bingqing Xie: Department of Internal Medicine, Division of Hematology & Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
A J Jain: Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
Joseph Vento: Department of Internal Medicine, Division of Hematology & Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA
Neil Rakheja: Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
Ameer Hamza Shakur: Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
Scott Christley: Department of Health Data Science and Biostatistics, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
Lindsay G Cowell: Department of Health Data Science and Biostatistics, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA
James Brugarolas: Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
Andrew R Jamieson: Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
Payal Kapur: Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA

Kell G, Roberts A, Umansky S, Khare Y, Ahmed N, Patel N, Simela C, Coumbe J, Rozario J, Griffiths RR, Marshall IJ. RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions. AMIA Annu Symp Proc 2025;2024:590-599. [PMID: 40417548 PMCID: PMC12099375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Munzir SI, Hier DB, Oommen C, Carrithers MD. A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes. AMIA Annu Symp Proc 2025;2024:838-846. [PMID: 40417529 PMCID: PMC12099424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Shi Y, Xu S, Yang T, Liu Z, Liu T, Li X, Liu N. MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering. AMIA Annu Symp Proc 2025;2024:1011-1020. [PMID: 40417500 PMCID: PMC12099378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Das T, Shafquat A, Beigi M, Aptekar J, Mezey J, Sun J. SeqTrial: Utility Preserving Sequential Clinical Trial Data Generator. AMIA Annu Symp Proc 2025;2024:329-338. [PMID: 40417577 PMCID: PMC12099387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Chekuri A, Johal AS, Allen MR, Ayers JW, Hogarth M, Farcas E. Towards Optimizing LLM Use in Healthcare: Identifying Patient Questions in MyChart Messages. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025;2024:232-241. [PMID: 40417557 PMCID: PMC12099336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Chen Z, Zhang M, Ahmed MM, Guo Y, George TJ, Bian J, Wu Y. Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025;2024:242-251. [PMID: 40417538 PMCID: PMC12099403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Lyu W, Bi Z, Wang F, Chen C. BadCLM: Backdoor Attack in Clinical Language Models for Electronic Health Records. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025;2024:768-777. [PMID: 40417555 PMCID: PMC12099347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Majid I, Mishra V, Ravindranath R, Wang SY. Evaluating the Performance of Large Language Models for Named Entity Recognition in Ophthalmology Clinical Free-Text Notes. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025;2024:778-787. [PMID: 40417582 PMCID: PMC12099357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Wang J, Li H, Liu H. A Comprehensive System for Searching and Evaluating Genomic Variant Evidence Using AI and Knowledge Bases to Support Personalized Medicine. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025;2024:1206-1214. [PMID: 40417484 PMCID: PMC12099401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Das A, Tariq A, Batalini F, Dhara B, Banerjee I. Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2025;2024:339-348. [PMID: 40417494 PMCID: PMC12099371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]

Dobreva J, Simjanoska Misheva M, Mishev K, Trajanov D, Mishkovski I. A Unified Framework for Alzheimer's Disease Knowledge Graphs: Architectures, Principles, and Clinical Translation. Brain Sci 2025;15:523. [PMID: 40426694 PMCID: PMC12110335 DOI: 10.3390/brainsci15050523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2025] [Revised: 05/07/2025] [Accepted: 05/12/2025] [Indexed: 05/29/2025] Open

Abstract

This review synthesizes the application of knowledge graphs (KGs) in Alzheimer's disease (AD) research around two basic questions: what types of input data are available to construct these knowledge graphs, and what purpose each knowledge graph is intended to fulfill. We synthesize results from existing work to illustrate how diverse knowledge graph structures behave under different data availability settings and with distinct application targets in AD research. Through comparative analysis, we identify best methodological practices by data type (literature, structured databases, neuroimaging, and clinical records) and by application of interest (drug repurposing, disease classification, mechanism discovery, and clinical decision support). Building on this analysis, we propose AD-KG 2.0, a new framework that consolidates these best practices into a unifying architecture with well-defined decision pathways for implementation. Our key contributions are as follows: (1) a dynamic adaptation mechanism that automatically adjusts methodological elements according to both data availability and application objectives, (2) a specialized semantic alignment layer that harmonizes terminologies across biological scales, and (3) a multi-constraint optimization approach for knowledge graph construction. The framework accommodates a variety of applications, including drug repurposing, patient stratification for precision medicine, disease progression modeling, and clinical decision support. With its decision-tree structure and layered pipeline architecture, the framework gives researchers precise guidance on using knowledge graphs in AD research by aligning methodological choices with the available data and the application goals. We provide detailed component designs and adaptation processes that deliver optimal performance across varying research and clinical settings.
We conclude by addressing implementation challenges and future directions for translating knowledge graph technologies from research tools into clinical use, with a specific focus on interpretability, workflow integration, and regulatory matters.

Karabuğa B, Karaçin C, Büyükkör M, Bayram D, Aydemir E, Kaya OB, Yılmaz ME, Çamöz ES, Ergün Y. The Role of Artificial Intelligence (ChatGPT-4o) in Supporting Tumor Board Decisions. J Clin Med 2025;14:3535. [PMID: 40429531 PMCID: PMC12112035 DOI: 10.3390/jcm14103535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2025] [Revised: 05/09/2025] [Accepted: 05/16/2025] [Indexed: 05/29/2025] Open

Yamagiwa H, Hashimoto R, Arakane K, Murakami K, Soeda S, Oyama M, Zhu Y, Okada M, Shimodaira H. Predicting drug-gene relations via analogy tasks with word embeddings. Sci Rep 2025;15:17240. [PMID: 40383732 PMCID: PMC12086191 DOI: 10.1038/s41598-025-01418-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2024] [Accepted: 05/06/2025] [Indexed: 05/20/2025] Open

Nun A, Birot O, Guibon G, Lapostolle F, Lerner I. SIMSAMU - A French medical dispatch dialog open dataset. Computer Methods and Programs in Biomedicine 2025;268:108857. [PMID: 40408830 DOI: 10.1016/j.cmpb.2025.108857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 04/27/2025] [Accepted: 05/12/2025] [Indexed: 05/25/2025]

Abstract

BACKGROUND

Dispatch Services (DS) are essential to Emergency Medical Services (EMS). Dispatchers enable patients to access medical assistance in emergencies, anytime and anywhere, within limited time and resources. AI-based decision-support tools hold great promise for dispatchers. Developing these tools requires medical field-specific data. Medical dispatch dialogue is unique: it is a brief phone exchange in an emergency, within a limited time frame, without a physical examination.

OBJECTIVE

Our main objective was to (i) create an open French dataset of medical dispatch dialogues. Our secondary objectives were to (ii) develop a detailed medical dispatch scheme from this dataset using an unsupervised method, and (iii) provide a baseline evaluation of diarization and speech recognition models for this domain in French.

METHODS

From 2022 to 2023, emergency medicine junior doctors simulated real-life medical dispatch calls. These calls were recorded and transcribed to form the SIMSAMU corpus. We developed a dispatch scheme based on (i) recording analysis, (ii) a data-driven utterance typology, and (iii) domain expertise. The utterance typology was derived by hierarchical clustering of representations learned by fine-tuning BERT embeddings on SIMSAMU. The resulting clusters were mapped to the Roter Interaction Analysis System (RIAS) and incorporated into our dispatch scheme. SIMSAMU was used to train and evaluate state-of-the-art neural network models for diarization and speech recognition. Diarization used the PyaNet model, fine-tuned on the ESLO2 dataset. Speech recognition used a CTC model with pre-trained wav2vec 2.0 embeddings, compared against the multilingual Whisper model. The CTC-wav2vec model was further fine-tuned on SIMSAMU and evaluated by leave-one-speaker-out cross-validation.
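Leave-one-speaker-out cross-validation, as named above, holds out all material from one speaker per fold so that the model is never tested on a voice it was trained on. The sketch below illustrates only the generic split logic. It assumes the data are utterance segments, each tagged with a single speaker ID; the segment and speaker names are hypothetical, and the abstract does not describe how SIMSAMU's two-party calls were grouped.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple


def leave_one_speaker_out(
    segments: List[Tuple[str, str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_speaker, train_ids, test_ids) once per speaker.

    `segments` is a list of (segment_id, speaker_id) pairs (hypothetical layout).
    """
    by_speaker = defaultdict(list)
    for seg_id, speaker in segments:
        by_speaker[speaker].append(seg_id)
    for held_out in sorted(by_speaker):
        test_ids = by_speaker[held_out]
        # Every segment from the remaining speakers goes into training.
        train_ids = [
            seg_id
            for speaker, ids in sorted(by_speaker.items())
            if speaker != held_out
            for seg_id in ids
        ]
        yield held_out, train_ids, test_ids


if __name__ == "__main__":
    # Hypothetical segments: (segment_id, speaker_id)
    data = [("seg1", "spkA"), ("seg2", "spkB"), ("seg3", "spkA"), ("seg4", "spkC")]
    for speaker, train, test in leave_one_speaker_out(data):
        print(speaker, "train:", train, "test:", test)
```

Grouping by speaker rather than by segment is the design point: a random segment-level split would let the same voice appear in both training and test folds.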

RESULTS

The dataset consists of 61 audio recordings totaling 3 h 14 min. Four clusters were identified for callers and three for dispatchers. Two main dialogue phases were identified: interrogation and contractualization. The diarization model achieved a 10.4% error rate. Speech recognition word error rates were 35.8% for Whisper, 24.8% for the CTC-wav2vec model fine-tuned on ESLO2, and 16.1% after in-domain fine-tuning.
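Word error rate is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below follows that standard definition and assumes plain whitespace tokenization with no text normalization. The abstract does not describe the study's normalization (casing, punctuation, French elisions), and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (dynamic programming, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Invented example: one deleted word ("ne") and one inserted word ("fort")
    reference = "le patient ne respire plus"
    hypothesis = "le patient respire plus fort"
    print(f"WER = {word_error_rate(reference, hypothesis):.1%}")  # 2 errors / 5 words -> WER = 40.0%
```

Because insertions are counted against the reference length, WER can exceed 100% when a recognizer outputs many spurious words.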

CONCLUSION

We propose a French open medical dispatch dialogue dataset and an expert-validated schema of the medical dispatch dialogue based on unsupervised analysis. Notable gaps in how well speech recognition models generalize underscore the need for targeted, in-domain fine-tuning in this specialized application. SIMSAMU is designed to support this effort by serving as a benchmark for evaluating domain-adapted speech recognition and dialogue modeling strategies.

Liu Z, Zhang G, Shen Y. Psychomedical named entity recognition method based on multi-level feature extraction and multi-granularity embedding fusion. Sci Rep 2025;15:16927. [PMID: 40374721 PMCID: PMC12081933 DOI: 10.1038/s41598-025-90939-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Accepted: 02/17/2025] [Indexed: 05/17/2025] Open

Abstract

Named Entity Recognition (NER) is one of the key natural language processing tasks in psychomedicine. It aims to identify and classify specialized terms in psychomedical texts and to provide strong support for downstream tasks. Psychomedical texts are characterized by long paragraphs, complex sentences, and scattered knowledge. Current character-based psychomedical NER models rely on a single type of embedding and lack structural and phonetic information. Transferring NER models from the general domain to the psychomedical domain does not effectively improve entity recognition accuracy. To address this problem, we propose an NER method based on multi-level feature extraction and multi-granularity embedding fusion (MFME-NER). First, embeddings at three granularities (character, radical, and pinyin) are introduced to enrich the semantic representation of the input text. Second, the BERT model is modified to merge the features of all its encoder layers into its output, giving it multi-layer feature extraction capability (MFE-BERT). The character embeddings are pre-trained with MFE-BERT, and a BiLSTM model extracts features at the character granularity. Features of the radical and pinyin embeddings are extracted separately by CNN models and then fused. Finally, the feature vectors at the three granularities are integrated using a gated feed-forward neural network attention mechanism (GA-FNNAtention). Experimental results show that MFME-NER achieved F1 scores of 94.26% and 89.63% on the self-constructed psychomedical dataset PsyDatase and on the CBLUE dataset, respectively. The proposed method outperforms existing methods on these evaluation metrics, substantiating its rationality and efficacy. This study can contribute to better analysis of psychomedical data.

Harel-Canada F, Salimian A, Moghanian B, Clingan S, Nguyen A, Avra T, Poimboeuf M, Romero R, Funnell A, Petousis P, Shin M, Peng N, Shover CL, Goodman-Meza D. Enhancing Substance Use Detection in Clinical Notes with Large Language Models. Research Square 2025:rs.3.rs-6615981. [PMID: 40470194 PMCID: PMC12136207 DOI: 10.21203/rs.3.rs-6615981/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2025]

Affiliation(s)

Fabrice Harel-Canada Computer Science Department, University of California, Los Angeles, 404 Westwood Plaza Suite 277, Los Angeles, 90095, CA, USA
Anabel Salimian Semel Institute for Neuroscience and Human Behavior at University of California, Los Angeles, 760 Westwood Plaza, Los Angeles, 90024, CA, USA
Brandon Moghanian University of California, Los Angeles, 200 Medical Plaza Suite 365C, Los Angeles, 90024, CA, USA
Sarah Clingan Integrated Substance Abuse Programs at University of California, Los Angeles, 10911 Weyburn Ave, Ste. 200, Los Angeles, 90024, CA, USA
Allan Nguyen University of California, Los Angeles, 200 Medical Plaza Suite 365C, Los Angeles, 90024, CA, USA
Tucker Avra David Geffen School of Medicine at University of California, Los Angeles, 10833 Le Conte Ave, Los Angeles, 90095, CA, USA
Michelle Poimboeuf Division of General Internal Medicine and Health Services Research, University of California, Los Angeles, 1100 Glendon Ave STE 850, Los Angeles, 90024, CA, USA
Ruby Romero Division of General Internal Medicine and Health Services Research, University of California, Los Angeles, 1100 Glendon Ave STE 850, Los Angeles, 90024, CA, USA
Arthur Funnell Clinical and Translational Science Institute, University of California, Los Angeles, 924 Westwood Blvd Suite 420, Los Angeles, 90024, CA, USA
Panayiotis Petousis Clinical and Translational Science Institute, University of California, Los Angeles, 924 Westwood Blvd Suite 420, Los Angeles, 90024, CA, USA
Michael Shin Department of Geography, University of California, Los Angeles, 1255 Bunche Hall, Los Angeles, 90095, CA, USA
Nanyun Peng Computer Science Department, University of California, Los Angeles, 404 Westwood Plaza Suite 277, Los Angeles, 90095, CA, USA
Chelsea L. Shover Division of General Internal Medicine and Health Services Research, University of California, Los Angeles, 1100 Glendon Ave STE 850, Los Angeles, 90024, CA, USA
David Goodman-Meza Kirby Institute, University of New South Wales, Wallace Wurth Building (C27), Cnr High St & Botany St, UNSW, Sydney, 2052, NSW, Australia

Wang X, Figueredo G, Li R, Zhang WE, Chen W, Chen X. A survey of deep-learning-based radiology report generation using multimodal inputs. Med Image Anal 2025;103:103627. [PMID: 40382855 DOI: 10.1016/j.media.2025.103627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 04/09/2025] [Accepted: 04/24/2025] [Indexed: 05/20/2025]

Shi B, Chen L, Pang S, Wang Y, Wang S, Li F, Zhao W, Guo P, Zhang L, Fan C, Zou Y, Wu X. Large Language Models and Artificial Neural Networks for Assessing 1-Year Mortality in Patients With Myocardial Infarction: Analysis From the Medical Information Mart for Intensive Care IV (MIMIC-IV) Database. J Med Internet Res 2025;27:e67253. [PMID: 40354652 PMCID: PMC12107198 DOI: 10.2196/67253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2024] [Revised: 04/01/2025] [Accepted: 04/17/2025] [Indexed: 05/14/2025] Open

Abstract

BACKGROUND

Accurate mortality risk prediction is crucial for effective cardiovascular risk management. Recent advancements in artificial intelligence (AI) have demonstrated potential in this specific medical field. Qwen-2 and Llama-3 are high-performance, open-source large language models (LLMs) available online. An artificial neural network (ANN) algorithm derived from the SWEDEHEART (Swedish Web System for Enhancement and Development of Evidence-Based Care in Heart Disease Evaluated According to Recommended Therapies) registry, termed SWEDEHEART-AI, can predict patient prognosis following acute myocardial infarction (AMI).

OBJECTIVE

This study aims to evaluate the 3 models mentioned above in predicting 1-year all-cause mortality in critically ill patients with AMI.

METHODS

The Medical Information Mart for Intensive Care IV (MIMIC-IV) database is a publicly available data set in critical care medicine. We included 2758 patients who were first admitted for AMI and discharged alive. SWEDEHEART-AI calculated the mortality rate based on each patient's 21 clinical variables. Qwen-2 and Llama-3 analyzed the content of patients' discharge records and directly provided a 1-decimal value between 0 and 1 to represent 1-year death risk probabilities. The patients' actual mortality was verified using follow-up data. The predictive performance of the 3 models was assessed and compared using the Harrell C-statistic (C-index), the area under the receiver operating characteristic curve (AUROC), calibration plots, Kaplan-Meier curves, and decision curve analysis.
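The Harrell C-statistic named above is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed outcomes. A pair is comparable when the patient with the shorter follow-up died, and it is concordant when that patient also received the higher predicted risk. The sketch below implements this textbook definition. It assumes that pairs with tied follow-up times are skipped and that tied risks count as half-concordant. The study's software and exact tie handling are not stated, and the example data are invented.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[bool], risks: Sequence[float]
) -> float:
    """Harrell's C for right-censored data (higher risk = earlier death expected)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: skip pairs with tied follow-up times
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # a was censored first, so the order of deaths is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Invented follow-up times (days), death indicators, and predicted risks
    times = [30, 90, 150, 365]
    events = [True, True, False, True]
    risks = [0.9, 0.6, 0.7, 0.2]
    print(f"C-index = {harrell_c_index(times, events, risks):.2f}")  # 4 of 5 comparable pairs concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which puts the reported range of 0.56 to 0.72 in context.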

RESULTS

SWEDEHEART-AI demonstrated strong discrimination in predicting 1-year all-cause mortality in patients with AMI, with a higher C-index than Qwen-2 and Llama-3 (C-index 0.72, 95% CI 0.69-0.74 vs 0.65, 95% CI 0.62-0.67 vs 0.56, 95% CI 0.53-0.58, respectively; P<.001 for both comparisons). SWEDEHEART-AI also showed a high and consistent AUROC in the time-dependent ROC analysis. The death rates calculated by SWEDEHEART-AI were positively correlated with actual mortality, and the 3 risk classes derived from this model showed clear differentiation in the Kaplan-Meier curve (P<.001). Calibration plots indicated that SWEDEHEART-AI tended to overestimate mortality risk, with an observed-to-expected ratio of 0.478. Compared with the LLMs, SWEDEHEART-AI demonstrated positive and greater net benefits at risk thresholds below 19%.
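The observed-to-expected ratio divides the number of deaths actually observed by the number the model predicted, which is the sum of the individual predicted probabilities. A ratio below 1, such as the reported 0.478, therefore means the model predicted more deaths than occurred. A minimal sketch of that calculation, using invented data:

```python
from typing import Sequence


def observed_to_expected(outcomes: Sequence[int], predicted: Sequence[float]) -> float:
    """Observed event count divided by the sum of predicted event probabilities."""
    if len(outcomes) != len(predicted):
        raise ValueError("outcomes and predicted must have the same length")
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("sum of predicted probabilities must be positive")
    return sum(outcomes) / expected


if __name__ == "__main__":
    # Invented cohort: 1 = died within 1 year, 0 = survived
    outcomes = [1, 0, 0, 1, 0]
    predicted = [0.6, 0.5, 0.4, 0.7, 0.3]  # invented model-predicted 1-year death risks
    ratio = observed_to_expected(outcomes, predicted)
    print(f"O/E = {ratio:.2f}")  # below 1: this invented model overestimates risk
```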

CONCLUSIONS

SWEDEHEART-AI, a trained ANN model, demonstrated the best performance, with strong discrimination and clinical utility in predicting 1-year all-cause mortality in patients with AMI from an intensive care cohort. Among the LLMs, Qwen-2 outperformed Llama-3 and showed moderate predictive value. Qwen-2 and SWEDEHEART-AI exhibited comparable classification effectiveness. The future integration of LLMs into clinical decision support systems holds promise for accurate risk stratification in patients with AMI; however, further research is needed to optimize LLM performance and address calibration issues across diverse patient populations.

Krzyzanowski A, Pickett SD, Pogány P. Exploring BERT for Reaction Yield Prediction: Evaluating the Impact of Tokenization, Molecular Representation, and Pretraining Data Augmentation. J Chem Inf Model 2025;65:4381-4402. [PMID: 40311104 DOI: 10.1021/acs.jcim.5c00359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2025]

Zhang Y, Vlachos DG, Liu D, Fang H. Rapid Adaptation of Chemical Named Entity Recognition Using Few-Shot Learning and LLM Distillation. J Chem Inf Model 2025;65:4334-4345. [PMID: 40310732 DOI: 10.1021/acs.jcim.5c00248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2025]

Li TZ, Still JM, Zuo L, Liu Y, Krishnan AR, Sandler KL, Maldonado F, Lasko TA, Landman BA. Longitudinal Masked Representation Learning for Pulmonary Nodule Diagnosis from Language Embedded EHRs. medRxiv: The Preprint Server for Health Sciences 2025:2025.05.09.25327341. [PMID: 40385386 PMCID: PMC12083608 DOI: 10.1101/2025.05.09.25327341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2025]

Li W, Wang H, Li W, Zhao J, Sun Y. Generation-Based Few-Shot BioNER via Local Knowledge Index and Dual Prompts. Interdiscip Sci 2025:10.1007/s12539-025-00709-3. [PMID: 40347393 DOI: 10.1007/s12539-025-00709-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 04/01/2025] [Accepted: 04/05/2025] [Indexed: 05/12/2025]

Aghaarabi E, Murray D. Transformer-Based Language Models for Group Randomized Trial Classification in Biomedical Literature: Model Development and Validation. JMIR Med Inform 2025;13:e63267. [PMID: 40344669 DOI: 10.2196/63267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Revised: 02/02/2025] [Accepted: 02/06/2025] [Indexed: 05/11/2025] Open

Han P, Wang J, Liu D, Liu L, Song T. Robust temporal knowledge inference via pathway snapshots with liquid neural network. Methods 2025;241:24-32. [PMID: 40349883 DOI: 10.1016/j.ymeth.2025.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2025] [Revised: 04/30/2025] [Accepted: 05/08/2025] [Indexed: 05/14/2025] Open

Tait K, Cronin J, Wiper O, Wallis J, Davies J, Dürichen R. ArcTEX-a novel clinical data enrichment pipeline to support real-world evidence oncology studies. Front Digit Health 2025;7:1561358. [PMID: 40416094 PMCID: PMC12098606 DOI: 10.3389/fdgth.2025.1561358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2025] [Accepted: 04/23/2025] [Indexed: 05/27/2025] Open

Moassefi M, Houshmand S, Faghani S, Chang PD, Sun SH, Khosravi B, Triphati AG, Rasool G, Bhatia NK, Folio L, Andriole KP, Gichoya JW, Erickson BJ. Cross-Institutional Evaluation of Large Language Models for Radiology Diagnosis Extraction: A Prompt-Engineering Perspective. Journal of Imaging Informatics in Medicine 2025:10.1007/s10278-025-01523-5. [PMID: 40341981 DOI: 10.1007/s10278-025-01523-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2025] [Revised: 04/14/2025] [Accepted: 04/23/2025] [Indexed: 05/11/2025]

Tomita K, Nishida T, Kitaguchi Y, Kitazawa K, Miyake M. Image Recognition Performance of GPT-4V(ision) and GPT-4o in Ophthalmology: Use of Images in Clinical Questions. Clin Ophthalmol 2025;19:1557-1564. [PMID: 40357454 PMCID: PMC12068282 DOI: 10.2147/opth.s494480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2024] [Accepted: 04/09/2025] [Indexed: 05/15/2025] Open

Alkhoury N, Shaik M, Wurmus R, Akalin A. Enhancing biomarker based oncology trial matching using large language models. NPJ Digit Med 2025;8:250. [PMID: 40325165 PMCID: PMC12053753 DOI: 10.1038/s41746-025-01673-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Accepted: 04/24/2025] [Indexed: 05/07/2025] Open

Naufal T, Mahendra R, Wicaksono AF. Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning. J Biomed Semantics 2025;16:8. [PMID: 40329333 PMCID: PMC12057135 DOI: 10.1186/s13326-025-00329-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 04/08/2025] [Indexed: 05/08/2025] Open

Abstract

PURPOSE

Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can impact response times to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that the development of these three modules would enable the system to identify critical components of the question, thereby facilitating a deeper understanding of the question, and allowing for the re-formulation of more effective questions with extracted key information.

METHODS

This work contributes to two key aspects related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks within the Indonesian language domain by employing transformer-based models with nine distinct encoder variations. Our feature studies revealed an interdependence among these three tasks. Consequently, we propose several multi-task learning (MTL) models, both in pairwise and three-way configurations, incorporating parallel and hierarchical architectures.

RESULTS

Using chunk-level F1-score, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, the best performance for each task was achieved by a different model, with IndoNLU LARGE obtaining the highest average score. These results suggest that a larger model does not always perform better. We also found no clear indication of whether Indonesian or multilingual language models generally performed better on our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Across our three-way MTL models with varying loss weights, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model for the SR task.
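The agreement figures and model scores above are chunk-level F1 values. Under the usual exact-match convention, a chunk counts as a match only when both its boundaries and its label agree, and F1 is the harmonic mean of precision and recall over the matched chunks. Inter-annotator agreement can be computed the same way by treating one annotator as the reference. The sketch below assumes BIO-tagged token sequences. The abstract does not give the paper's tagging scheme, and the labels DRUG and SYMPTOM are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled chunks."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final chunk
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag.startswith(("B-", "I-")):  # a stray I- tag also opens a chunk
                start, label = i, tag[2:]
    return spans


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match chunk F1: boundaries and label must both agree."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    if not gold_spans and not pred_spans:
        return 1.0
    tp = len(gold_spans & pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_spans)
    recall = tp / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels; the annotators disagree on the length of the second chunk
    annotator_a = ["B-DRUG", "I-DRUG", "O", "B-SYMPTOM", "O"]
    annotator_b = ["B-DRUG", "I-DRUG", "O", "B-SYMPTOM", "I-SYMPTOM"]
    print(f"chunk F1 = {chunk_f1(annotator_a, annotator_b):.2f}")  # 1 of 2 chunks match -> 0.50
```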

CONCLUSION

We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points split into 773 training, 200 validation, and 200 testing instances. We then used transformer-based models to set a baseline for all three tasks. Our MTL experiments suggest that additional information from the other two tasks can help the learning process for the MER and KE tasks, while having only a small effect on the SR task.

Chang YC, Huang MS, Huang YH, Lin YH. The influence of prompt engineering on large language models for protein-protein interaction identification in biomedical literature. Sci Rep 2025;15:15493. [PMID: 40319086 PMCID: PMC12049485 DOI: 10.1038/s41598-025-99290-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2024] [Accepted: 04/18/2025] [Indexed: 05/07/2025] Open