1
Sivarajkumar S, Kelley M, Samolyk-Mazzanti A, Visweswaran S, Wang Y. An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study. JMIR Med Inform 2024; 12:e55318. PMID: 38587879. PMCID: PMC11036183. DOI: 10.2196/55318.
Abstract
BACKGROUND: Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches. OBJECTIVE: The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types (heuristic and ensemble prompts), for zero-shot and few-shot clinical information extraction using pretrained language models. METHODS: This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches. RESULTS: The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs in zero-shot clinical NLP. GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts in clinical sense disambiguation and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on the strengths of multiple prompts. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types. CONCLUSIONS: This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research and facilitate engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first empirical evaluations of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area.
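The ensemble idea this abstract describes, querying the same input with several prompt types and combining the answers, can be sketched as follows. The prompt templates, the `toy_model` stub, and the majority-vote combination rule are illustrative assumptions for this sketch, not the study's actual prompts, models, or ensembling method:

```python
from collections import Counter

# Hypothetical prompt templates for clinical sense disambiguation; the
# study's actual prompt wordings are not reproduced here.
PROMPTS = {
    "prefix": "Determine the sense of the abbreviation in: {text}\nAnswer:",
    "cloze": "In the clinical note '{text}', the abbreviation stands for ___.",
    "cot": "Think step by step about '{text}', then state the abbreviation's sense.",
}

def ensemble_predict(text, ask_model):
    """Query the model once per prompt type and majority-vote the answers.

    `ask_model` is a caller-supplied function (prompt string -> answer
    string); in practice it would wrap an LLM API call.
    """
    answers = [ask_model(tmpl.format(text=text)) for tmpl in PROMPTS.values()]
    return Counter(answers).most_common(1)[0][0]

# Deterministic stand-in "model" for demonstration only: it answers
# differently depending on which template it sees.
def toy_model(prompt):
    return "mitral stenosis" if "___" in prompt else "mental status"
```

Here two of the three prompt styles elicit the same answer, so the vote resolves the disagreement; the study's ensembles combine richer prompt families in the same spirit.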
Affiliation(s)
- Sonish Sivarajkumar
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
- Mark Kelley
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States
- Alyssa Samolyk-Mazzanti
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States
- Shyam Visweswaran
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- Yanshan Wang
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
2
Chu HY, Fong JHC, Thean DGL, Zhou P, Fung FKC, Huang Y, Wong ASL. Accurate top protein variant discovery via low-N pick-and-validate machine learning. Cell Syst 2024; 15:193-203.e6. PMID: 38340729. DOI: 10.1016/j.cels.2024.01.002.
Abstract
A strategy for obtaining the greatest number of best-performing variants with the least experimental effort over the vast combinatorial mutational landscape would have enormous utility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning via experiments on only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy, up to 92.6%, in selecting the true top 1% of variants in combinatorial mutant libraries; two rounds of 24 variants can also be used. We demonstrate our strategy by successfully discovering high-performance protein variants from diverse families, including CRISPR-based genome editors, supporting its generalizable application to protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
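The multi-round pick-and-validate loop this abstract describes can be sketched on a toy combinatorial library. The additive fitness function, the per-position-average surrogate, and the library itself are invented stand-ins for a real assay and for the trained machine learning models the paper actually uses:

```python
import itertools
import random

random.seed(0)

# Toy combinatorial library: variants are tuples over 4 positions x 4 residues.
ALPHABET = "ACDE"
LIBRARY = list(itertools.product(ALPHABET, repeat=4))

def assay(variant):
    """Hidden ground-truth fitness, queried only on 'validated' picks.
    Purely illustrative: additive per-position effects."""
    return sum((ord(c) - ord("A")) * (i + 1) for i, c in enumerate(variant))

def surrogate_score(variant, observed):
    """Toy surrogate model: sum of per-position mean fitness among the
    variants validated so far."""
    score = 0.0
    for i, c in enumerate(variant):
        vals = [f for w, f in observed.items() if w[i] == c]
        score += sum(vals) / len(vals) if vals else 0.0
    return score

def pick_and_validate(rounds=4, n_per_round=12):
    """Each round: rank unseen variants with the surrogate, 'validate' the
    top picks experimentally, and fold the results back into training."""
    observed = {}
    for _ in range(rounds):
        if not observed:  # first round: stand-in for zero-shot ranking
            picks = random.sample(LIBRARY, n_per_round)
        else:
            unseen = [v for v in LIBRARY if v not in observed]
            unseen.sort(key=lambda v: surrogate_score(v, observed), reverse=True)
            picks = unseen[:n_per_round]
        for v in picks:
            observed[v] = assay(v)  # "experimental validation"
    return observed
```

After four rounds only 48 of the 256 variants have been assayed, and the best candidate is read off by ranking `observed`; the paper's loop uses learned sequence models in place of these per-position averages.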
Affiliation(s)
- Hoi Yee Chu
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- John H C Fong
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Dawn G L Thean
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Peng Zhou
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Frederic K C Fung
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Yuanhua Huang
- School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Alan S L Wong
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
3
Guan L, Liu F, Zhang R, Liu J, Tang Y. MCW: A Generalizable Deepfake Detection Method for Few-Shot Learning. Sensors (Basel) 2023; 23:8763. PMID: 37960463. PMCID: PMC10649340. DOI: 10.3390/s23218763.
Abstract
With the development of deepfake technology, deepfake detection has received widespread attention. Although some deepfake forensics techniques have been proposed, they remain difficult to apply in real-world scenarios, owing to differences among deepfake generation technologies and to the compression or editing of videos during propagation. Considering the issue of sample imbalance in few-shot deepfake detection scenarios, we propose a multi-feature channel domain-weighted framework based on meta-learning (MCW). To obtain strong cross-database detection performance, the proposed framework improves a meta-learning network in two ways: it enhances the model's feature extraction ability by combining RGB-domain and frequency-domain information from the image, and it enhances the model's generalization ability by assigning meta-weights to channels of the feature map. The MCW framework addresses the poor detection performance and insufficient resistance to compression of existing algorithms on samples generated by unknown algorithms. Experiments were set in zero-shot and few-shot scenarios, simulating real deepfake detection environments, with nine detection algorithms selected for comparison. The results show that the MCW framework outperforms the other algorithms in both cross-algorithm and cross-dataset detection. The framework generalizes and resists compression with low-quality training images and across different generation-algorithm scenarios, and it has better fine-tuning potential in few-shot learning scenarios.
Affiliation(s)
- Lei Guan
- Department of Electronic Engineering, Tsinghua University, Beijing 100190, China
- Fan Liu
- Department of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Ru Zhang
- Department of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Jianyi Liu
- Department of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Yifan Tang
- Department of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
4
Tian J, Zhang J. A Zero-Shot Low Light Image Enhancement Method Integrating Gating Mechanism. Sensors (Basel) 2023; 23:7306. PMID: 37631842. PMCID: PMC10458961. DOI: 10.3390/s23167306.
Abstract
Photographs taken under harsh ambient lighting can suffer from several forms of image quality degradation due to insufficient exposure, including reduced brightness, loss of information, noise, and color distortion. To address these problems, researchers have proposed many deep learning-based methods for improving image illumination, but most existing methods face the difficulty of obtaining paired training data. In this context, this paper proposes a zero-reference image enhancement network for low-light conditions. First, an improved encoder-decoder structure extracts image features to generate feature maps, from which a parameter matrix of enhancement factors is produced. An enhancement curve is then constructed from the parameter matrix, and the image is iteratively enhanced using the curve and the enhancement parameters. Second, because the algorithm is unsupervised, non-reference image loss functions must be designed for training; four such loss functions are introduced to train the parameter estimation network. Experiments on several datasets containing only low-light images show that the proposed network improves on other methods in the NIQE, PIQE, and BRISQUE non-reference evaluation indices, and ablation experiments on key components demonstrate the method's effectiveness. Performance on PC and mobile devices is also investigated and analyzed, demonstrating the feasibility of the method in practical applications.
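The iterative enhancement-curve step this abstract describes can be illustrated numerically. The sketch below uses the quadratic curve LE(x) = x + alpha * x * (1 - x) known from zero-reference curve-estimation methods such as Zero-DCE; whether this paper uses the identical curve is an assumption, and the scalar `alpha` stands in for the per-pixel parameter matrix that the network would predict:

```python
def enhance_pixel(x, alpha, iterations=8):
    """Apply LE(x) = x + alpha * x * (1 - x) repeatedly to one intensity.

    For x in [0, 1] and alpha in [0, 1], the curve maps [0, 1] into [0, 1],
    so brightness is raised without clipping. `alpha` here is a scalar
    stand-in for the learned per-pixel parameter matrix.
    """
    for _ in range(iterations):
        x = x + alpha * x * (1 - x)
    return x

# Dark pixels are brightened far more than bright ones, which is the
# desired behavior for low-light enhancement.
row = [round(enhance_pixel(v, 0.5), 3) for v in (0.05, 0.2, 0.9)]
```

Because 0 and 1 are fixed points of the curve, pure black and pure white are preserved exactly while mid-dark intensities are pushed upward.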
Affiliation(s)
- Jianwei Zhang
- School of Computer Science, Sichuan University, Chengdu 610065, China
5
Tang L, Sun Z, Idnay B, Nestor JG, Soroush A, Elias PA, Xu Z, Ding Y, Durrett G, Rousseau J, Weng C, Peng Y. Evaluating Large Language Models on Medical Evidence Summarization. medRxiv 2023:2023.04.22.23288967. PMID: 37162998. PMCID: PMC10168498. DOI: 10.1101/2023.04.22.23288967.
Abstract
Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study has demonstrated that automatic metrics often do not strongly correlate with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs could be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm due to misinformation. Moreover, we find that models struggle to identify the salient information and are more error-prone when summarizing over longer textual contexts.
Affiliation(s)
- Liyan Tang
- School of Information, The University of Texas at Austin, Austin, TX
- Zhaoyi Sun
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Betina Idnay
- Department of Biomedical Informatics, Columbia University, New York, NY
- Ali Soroush
- Department of Biomedical Informatics, Columbia University, New York, NY
- Pierre A. Elias
- Department of Biomedical Informatics, Columbia University, New York, NY
- Ziyang Xu
- Department of Medicine, Massachusetts General Hospital, Boston, MA
- Ying Ding
- School of Information, The University of Texas at Austin, Austin, TX
- Greg Durrett
- Department of Computer Science, The University of Texas at Austin, Austin, TX
- Justin Rousseau
- Departments of Population Health and Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
6
Allaway E, McKeown K. Zero-shot stance detection: Paradigms and challenges. Front Artif Intell 2023; 5:1070429. PMID: 36714207. PMCID: PMC9880531. DOI: 10.3389/frai.2022.1070429.
Abstract
A major challenge in stance detection is the large (potentially infinite) and diverse set of stance topics. Collecting data for such a set is unrealistic due to both the expense of annotation and the continuous creation of new real-world topics (e.g., a new politician runs for office). Furthermore, stancetaking occurs in a wide range of languages and genres (e.g., Twitter, news articles). While zero-shot stance detection in English, where evaluation is on topics not seen during training, has received increasing attention, we argue that this attention should be expanded to multilingual and multi-genre settings. We discuss two paradigms for English zero-shot stance detection evaluation, as well as recent work in this area. We then discuss recent work on multilingual and multi-genre stance detection, which has focused primarily on non-zero-shot settings. We argue that this work should be expanded to multilingual and multi-genre zero-shot stance detection and propose best practices to systematize and stimulate future work in this direction. While domain adaptation techniques are well-suited for work in these settings, we argue that increased care should be taken to improve model explainability and to conduct robust evaluations, considering not only empirical generalization ability but also the understanding of complex language and inferences.
7
Paul A, Shen TC, Lee S, Balachandar N, Peng Y, Lu Z, Summers RM. Generalized Zero-Shot Chest X-Ray Diagnosis Through Trait-Guided Multi-View Semantic Embedding With Self-Training. IEEE Trans Med Imaging 2021; 40:2642-2655. PMID: 33523805. PMCID: PMC8591713. DOI: 10.1109/tmi.2021.3054817.
Abstract
Zero-shot learning (ZSL) is one of the most promising avenues of annotation-efficient machine learning, and in the era of deep learning, ZSL techniques have achieved unprecedented success. However, the development of ZSL methods has focused mostly on natural images; ZSL for medical images has remained largely unexplored. We design a novel strategy for generalized zero-shot diagnosis of chest radiographs, leveraging the potential of multi-view semantic embedding, a useful yet less-explored direction for ZSL. Our design also incorporates a self-training phase to tackle the problem of noisy labels while improving performance on classes not seen during training. Through rigorous experiments, we show that our model trained on one dataset produces consistent performance across test datasets from different sources, including those of very different quality. Comparisons with a number of state-of-the-art techniques show the superiority of the proposed method for generalized zero-shot chest X-ray diagnosis.
8
Szczotka AB, Shakir DI, Clarkson MJ, Pereira SP, Vercauteren T. Zero-Shot Super-Resolution With a Physically-Motivated Downsampling Kernel for Endomicroscopy. IEEE Trans Med Imaging 2021; 40:1863-1874. PMID: 33739921. PMCID: PMC7610492. DOI: 10.1109/tmi.2021.3067512.
Abstract
Super-resolution (SR) methods have seen significant advances thanks to the development of convolutional neural networks (CNNs), which have been successfully employed to improve the quality of endomicroscopy imaging. Yet the inherent limitation of SR research in endomicroscopy remains the lack of ground-truth high-resolution (HR) images, commonly used for both supervised training and reference-based image quality assessment (IQA). Therefore, alternative methods, such as unsupervised SR, are being explored. To address the need for non-reference image quality improvement, we designed a novel zero-shot super-resolution (ZSSR) approach that relies only on the endomicroscopy data to be processed, in a self-supervised manner, without the need for ground-truth HR images. We tailored the proposed pipeline to the idiosyncrasies of endomicroscopy by introducing both a physically-motivated Voronoi downscaling kernel, accounting for the endomicroscope's irregular fibre-based sampling pattern, and realistic noise patterns. We also exploited video sequences for self-supervised zero-shot image quality improvement. We ran ablation studies to assess our contributions with regard to the downscaling kernel and noise simulation, and validated our methodology on both synthetic and original data. Synthetic experiments were assessed with reference-based IQA, while results on original images were evaluated in a user study conducted with both expert and non-expert observers. The results demonstrated superior image quality of ZSSR reconstructions in comparison to the baseline method. ZSSR is also competitive with supervised single-image SR, and was the reconstruction technique preferred by experts.
9
Abstract
An important aspect of intelligence is the ability to adapt to a novel task without any direct experience (zero shot), based on its relationship to previous tasks. Humans can exhibit this cognitive flexibility. By contrast, models that achieve superhuman performance in specific tasks often fail to adapt to even slight task alterations. To address this, we propose a general computational framework for adapting to novel tasks based on their relationship to prior tasks. We begin by learning vector representations of tasks. To adapt to new tasks, we propose metamappings, higher-order tasks that transform basic task representations. We demonstrate the effectiveness of this framework across a wide variety of tasks and computational paradigms, ranging from regression to image classification and reinforcement learning. We compare to both human adaptability and language-based approaches to zero-shot learning. Across these domains, metamapping is successful, often achieving 80 to 90% performance, without any data, on a novel task, even when the new task directly contradicts prior experience. We further show that metamapping can not only generalize to new tasks via learned relationships, but can also generalize using novel relationships unseen during training. Finally, using metamapping as a starting point can dramatically accelerate later learning on a new task and reduce learning time and cumulative error substantially. Our results provide insight into a possible computational basis of intelligent adaptability and offer a possible framework for modeling cognitive flexibility and building more flexible artificial intelligence systems.