1
Li Y, Yan Y, Tong Z, Wang Y, Yang Y, Bai M, Pu D, Xie J, Liu C, Li B, Liu M, Shu K. Efficient fine-tuning of small-parameter large language models for biomedical bilingual multi-task applications. Appl Soft Comput 2025; 175:113084. [DOI: 10.1016/j.asoc.2025.113084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2025]

2
Chen X, Wang T, Zhou J, Song Z, Gao X, Zhang X. Evaluating and mitigating bias in AI-based medical text generation. Nature Computational Science 2025:10.1038/s43588-025-00789-7. [PMID: 40269315 DOI: 10.1038/s43588-025-00789-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2024] [Accepted: 03/12/2025] [Indexed: 04/25/2025]
Abstract
Artificial intelligence (AI) systems, particularly those based on deep learning models, have increasingly achieved expert-level performance in medical applications. However, there is growing concern that such AI systems may reflect and amplify human bias, reducing the quality of their performance in historically underserved populations. The fairness issue has attracted considerable research interest in the medical imaging classification field, yet it remains understudied in the text-generation domain. In this study, we investigate the fairness problem in text generation within the medical field and observe substantial performance discrepancies across different races, sexes and age groups, including intersectional groups, various model scales and different evaluation metrics. To mitigate this fairness issue, we propose an algorithm that selectively optimizes those underserved groups to reduce bias. Our evaluations across multiple backbones, datasets and modalities demonstrate that our proposed algorithm enhances fairness in text generation without compromising overall performance.
Affiliation(s)
- Xiuying Chen
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Tairan Wang
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Juexiao Zhou
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Zirui Song
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Xin Gao
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Xiangliang Zhang
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  - University of Notre Dame, Notre Dame, IN, USA

3
Junyent M, Noori H, De Schepper R, Frajdenberg S, Elsaigh RKAH, McDonald PH, Duckett D, Maudsley S. Unravelling Convergent Signaling Mechanisms Underlying the Aging-Disease Nexus Using Computational Language Analysis. Curr Issues Mol Biol 2025; 47:189. [PMID: 40136443 PMCID: PMC11941692 DOI: 10.3390/cimb47030189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Revised: 02/12/2025] [Accepted: 03/08/2025] [Indexed: 03/27/2025] Open
Abstract
Several lines of evidence suggest that multiple pathological conditions and diseases that account for the majority of human mortality are driven by the molecular aging process. At the cellular level, aging can largely be conceptualized as the progressive accumulation of molecular damage, leading to cellular dysfunction. As many diseases, e.g., cancer, coronary heart disease, chronic obstructive pulmonary disease, type II diabetes mellitus, or chronic kidney disease, potentially share a common molecular etiology, the identification of such mechanisms may represent an ideal locus to develop targeted prophylactic agents that can mitigate this disease-driving mechanism. Here, using the input of artificial intelligence systems to generate unbiased disease and aging mechanism profiles, we have aimed to identify key signaling mechanisms that may represent new disease-preventing signaling pathways that are ideal for the creation of disease-preventing chemical interventions. Using a combinatorial informatics approach, we have identified a potential critical mechanism involving the recently identified kinase, dual specificity tyrosine-phosphorylation-regulated kinase 3 (DYRK3), and the epidermal growth factor receptor (EGFR) that may function as a regulator of the pathological transition of health into disease via the control of cellular fate in response to stressful insults.
Affiliation(s)
- Marina Junyent
  - Receptor Biology Lab., University of Antwerp, 2610 Wilrijk, Belgium
  - IMIM, Hospital del Mar Research Institute, 08003 Barcelona, Spain
- Haki Noori
  - Receptor Biology Lab., University of Antwerp, 2610 Wilrijk, Belgium
  - Department of Chemistry, KU Leuven, Oude Markt 13, 3000 Leuven, Belgium
- Robin De Schepper
  - Receptor Biology Lab., University of Antwerp, 2610 Wilrijk, Belgium
- Shanna Frajdenberg
  - Receptor Biology Lab., University of Antwerp, 2610 Wilrijk, Belgium
- Patricia H. McDonald
  - Lexicon Pharmaceuticals Inc., 2445 Technology Forest Blvd Fl 1, The Woodlands, TX 77381, USA
- Derek Duckett
  - Department of Drug Discovery, H. Lee Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, USA
- Stuart Maudsley
  - Receptor Biology Lab., University of Antwerp, 2610 Wilrijk, Belgium
  - Department of Drug Discovery, H. Lee Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, USA

4
Del Moral-González R, Gómez-Adorno H, Ramos-Flores O. Comparative analysis of generative LLMs for labeling entities in clinical notes. Genomics Inform 2025; 23:3. [PMID: 39915888 PMCID: PMC11804004 DOI: 10.1186/s44342-024-00036-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Accepted: 12/31/2024] [Indexed: 02/09/2025] Open
Abstract
This paper evaluates and compares different fine-tuned variations of generative large language models (LLMs) in the zero-shot named entity recognition (NER) task for the clinical domain. As part of the 8th Biomedical Linked Annotation Hackathon, we examined Llama 2 and Mistral models, including base versions and those that have been fine-tuned for code, chat, and instruction-following tasks. We assess both the number of correctly identified entities and the models' ability to retrieve entities in structured formats. We used a publicly available set of clinical cases labeled with mentions of diseases, symptoms, and medical procedures for the evaluation. Results show that instruction fine-tuned models perform better than chat fine-tuned and base models in recognizing entities. The results also show that models perform better when simple output structures are requested.
Affiliation(s)
- Rodrigo Del Moral-González
  - Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, Coyoacán, 04510, Ciudad de México, México
- Helena Gómez-Adorno
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, Coyoacán, 04510, Ciudad de México, México
- Orlando Ramos-Flores
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, Coyoacán, 04510, Ciudad de México, México

5
Chen K, Xu W, Li X. The Potential of Gemini and GPTs for Structured Report Generation based on Free-Text 18F-FDG PET/CT Breast Cancer Reports. Acad Radiol 2025; 32:624-633. [PMID: 39245597 DOI: 10.1016/j.acra.2024.08.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 08/15/2024] [Accepted: 08/25/2024] [Indexed: 09/10/2024]
Abstract
RATIONALE AND OBJECTIVE: To compare the performance of large language model (LLM) based Gemini and Generative Pre-trained Transformers (GPTs) in data mining and generating structured reports based on free-text PET/CT reports for breast cancer after user-defined tasks. MATERIALS AND METHODS: Breast cancer patients (mean age, 50 years ± 11 [SD]; all female) who underwent consecutive 18F-FDG PET/CT for follow-up between July 2005 and October 2023 were retrospectively included in the study. A total of 20 reports from 10 patients were used to train user-defined text prompts for Gemini and GPTs, by which structured PET/CT reports were generated. The natural language processing (NLP) generated structured reports and the structured reports annotated by nuclear medicine physicians were compared in terms of data extraction accuracy and capacity of progress decision-making. Statistical methods, including chi-square test, McNemar test and paired samples t-test, were employed in the study. RESULTS: The structured PET/CT reports for 131 patients were generated by using the two NLP techniques, Gemini and GPTs. In general, GPTs exhibited superiority over Gemini in data mining in terms of primary lesion size (89.6% vs. 53.8%, p < 0.001) and metastatic lesions (96.3% vs. 89.6%, p < 0.001). Moreover, GPTs outperformed Gemini in progress decision-making (p < 0.001) and semantic similarity (F1 score 0.930 vs. 0.907, p < 0.001) for reports. CONCLUSION: GPTs outperformed Gemini in generating structured reports based on free-text PET/CT reports, an approach that could potentially be applied in clinical practice. DATA AVAILABILITY: The data used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Affiliation(s)
- Kun Chen
  - Department of Nuclear Medicine, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China (K.C.)
- Wengui Xu
  - Department of Molecular Imaging and Nuclear Medicine, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Huanhuxi Road, Hexi District, Tianjin 300060, China (W.X., X.L.); Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China (W.X., X.L.)
- Xiaofeng Li
  - Department of Molecular Imaging and Nuclear Medicine, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Huanhuxi Road, Hexi District, Tianjin 300060, China (W.X., X.L.); Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China (W.X., X.L.)

6
Zhang Y, Ren S, Wang J, Lu J, Wu C, He M, Liu X, Wu R, Zhao J, Zhan C, Du D, Zhan Z, Singla RK, Shen B. Aligning Large Language Models with Humans: A Comprehensive Survey of ChatGPT's Aptitude in Pharmacology. Drugs 2025; 85:231-254. [PMID: 39702867 PMCID: PMC11802629 DOI: 10.1007/s40265-024-02124-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/11/2024] [Indexed: 12/21/2024]
Abstract
BACKGROUND: Due to the lack of a comprehensive pharmacology test set, evaluating the potential and value of large language models (LLMs) in pharmacology is complex and challenging. AIMS: This study aims to provide a test set reference for assessing the application potential of both general-purpose and specialized LLMs in pharmacology. METHODS: We constructed a pharmacology test set consisting of three tasks: drug information retrieval, lead compound structure optimization, and research trend summarization and analysis. Subsequently, we compared the performance of general-purpose LLMs GPT-3.5 and GPT-4 on this test set. RESULTS: The results indicate that GPT-3.5 and GPT-4 can better understand instructions for information retrieval, scheme optimization, and trend summarization in pharmacology, showing significant potential in basic pharmacology tasks, especially in areas such as drug pharmacological properties, pharmacokinetics, mode of action, and toxicity prediction. These general LLMs also effectively summarize the current challenges and future trends in this field, proving to be a valuable resource for interdisciplinary pharmacology researchers. However, the limitations of ChatGPT become evident when handling tasks such as drug identification queries, drug interaction information retrieval, and drug structure simulation optimization. It struggles to provide accurate interaction information for individual or specific drugs and cannot optimize specific drugs. This lack of depth in knowledge integration and analysis limits its application in scientific research and clinical exploration. CONCLUSION: Therefore, exploring retrieval-augmented generation (RAG) or integrating proprietary knowledge bases and knowledge graphs into pharmacology-oriented ChatGPT systems would yield favorable results. This integration will further optimize the potential of LLMs in pharmacology.
Affiliation(s)
- Yingbo Zhang
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
  - Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences (CATAS), Haikou, 571101, China
- Shumin Ren
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
  - Department of Computer Science and Information Technology, University of A Coruña, 15071, A Coruña, Spain
- Jiao Wang
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
  - Department of Computer Science and Information Technology, University of A Coruña, 15071, A Coruña, Spain
- Junyu Lu
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Cong Wu
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Mengqiao He
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Xingyun Liu
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
  - Department of Computer Science and Information Technology, University of A Coruña, 15071, A Coruña, Spain
- Rongrong Wu
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Jing Zhao
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Chaoying Zhan
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Dan Du
  - Advanced Mass Spectrometry Center, Research Core Facility, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital/West China Medical School, Sichuan University, Chengdu, 610041, China
- Zhajun Zhan
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou, 310014, China
- Rajeev K Singla
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
  - School of Pharmaceutical Sciences, Lovely Professional University, Phagwara, Punjab-144411, India
- Bairong Shen
  - Department of Pharmacy and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China

7
Bedi S, Liu Y, Orr-Ewing L, Dash D, Koyejo S, Callahan A, Fries JA, Wornow M, Swaminathan A, Lehmann LS, Hong HJ, Kashyap M, Chaurasia AR, Shah NR, Singh K, Tazbaz T, Milstein A, Pfeffer MA, Shah NH. Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review. JAMA 2025; 333:319-328. [PMID: 39405325 PMCID: PMC11480901 DOI: 10.1001/jama.2024.21700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 09/30/2024] [Indexed: 10/19/2024]
Abstract
Importance: Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective: To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty. Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024. Study Selection: Studies evaluating 1 or more LLMs in health care. Data Extraction and Synthesis: Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty. Results: Of 519 studies reviewed, published between January 1, 2022, and February 19, 2024, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge such as answering medical licensing examination questions (44.5%) and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented. Conclusions and Relevance: Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties.
Affiliation(s)
- Suhana Bedi
  - Department of Biomedical Data Science, Stanford School of Medicine, Stanford, California
- Yutong Liu
  - Clinical Excellence Research Center, Stanford University, Stanford, California
- Lucy Orr-Ewing
  - Clinical Excellence Research Center, Stanford University, Stanford, California
- Dev Dash
  - Clinical Excellence Research Center, Stanford University, Stanford, California
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Sanmi Koyejo
  - Department of Computer Science, Stanford University, Stanford, California
- Alison Callahan
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Jason A. Fries
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Michael Wornow
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Akshay Swaminathan
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Hyo Jung Hong
  - Department of Anesthesiology, Stanford University, Stanford, California
- Mehr Kashyap
  - Stanford University School of Medicine, Stanford, California
- Akash R. Chaurasia
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Nirav R. Shah
  - Clinical Excellence Research Center, Stanford University, Stanford, California
- Karandeep Singh
  - Digital Health Innovation, University of California San Diego Health, San Diego
- Troy Tazbaz
  - Digital Health Center of Excellence, US Food and Drug Administration, Washington, DC
- Arnold Milstein
  - Clinical Excellence Research Center, Stanford University, Stanford, California
- Michael A. Pfeffer
  - Department of Medicine, Stanford University School of Medicine, Stanford, California
- Nigam H. Shah
  - Clinical Excellence Research Center, Stanford University, Stanford, California
  - Center for Biomedical Informatics Research, Stanford University, Stanford, California

8
Chen J, Su L, Li Y, Lin M, Peng Y, Sun C. A multimodal approach for few-shot biomedical named entity recognition in low-resource languages. J Biomed Inform 2025; 161:104754. [PMID: 39622400 DOI: 10.1016/j.jbi.2024.104754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2024] [Revised: 11/05/2024] [Accepted: 11/19/2024] [Indexed: 12/16/2024]
Abstract
In this study, we revisit named entity recognition (NER) in the biomedical domain from a multimodal perspective, with a particular focus on applications in low-resource languages. Existing research primarily relies on unimodal methods for NER, which limits the potential for capturing diverse information. To address this limitation, we propose a novel method that integrates a cross-modal generation module to transform unimodal data into multimodal data, thereby enabling the use of enriched multimodal information for NER. Additionally, we design a cross-modal filtering module to mitigate the adverse effects of text-image mismatches in multimodal NER. We validate our proposed method on two biomedical datasets specifically curated for low-resource languages. Experimental results demonstrate that our method significantly enhances the performance of NER, highlighting its effectiveness and potential for broader applications in biomedical research and low-resource language contexts.
Affiliation(s)
- Jian Chen
  - Department of Data Science and Big Data Technology, Hainan University, Haikou 570228, China
- Leilei Su
  - Department of Mathematics, Hainan University, Haikou 570228, China
- Yihong Li
  - Department of Data Science and Big Data Technology, Hainan University, Haikou 570228, China
- Mingquan Lin
  - Department of Surgery, University of Minnesota, Minneapolis 55455, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York 10022, USA
- Cong Sun
  - Department of Population Health Sciences, Weill Cornell Medicine, New York 10022, USA

9
Smith N, Yuan X, Melissinos C, Moghe G. FuncFetch: an LLM-assisted workflow enables mining thousands of enzyme-substrate interactions from published manuscripts. Bioinformatics 2024; 41:btae756. [PMID: 39718779 PMCID: PMC11734755 DOI: 10.1093/bioinformatics/btae756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 11/16/2024] [Accepted: 12/20/2024] [Indexed: 12/25/2024] Open
Abstract
MOTIVATION: Thousands of genomes are publicly available; however, most genes in those genomes have poorly defined functions. This is partly due to a gap between previously published, experimentally characterized protein activities and activities deposited in databases. This activity deposition is bottlenecked by the time-consuming biocuration process. The emergence of large language models presents an opportunity to speed up the text-mining of protein activities for biocuration. RESULTS: We developed FuncFetch, a workflow that integrates NCBI E-Utilities, OpenAI's GPT-4, and Zotero, to screen thousands of manuscripts and extract enzyme activities. Extensive validation revealed high precision and recall of GPT-4 in determining whether the abstract of a given paper indicates the presence of a characterized enzyme activity in that paper. Provided the manuscript, FuncFetch extracted data such as species information, enzyme names, sequence identifiers, substrates, and products, which were subjected to extensive quality analyses. Comparison of this workflow against a manually curated dataset of BAHD acyltransferase activities demonstrated a precision/recall of 0.86/0.64 in extracting substrates. We further deployed FuncFetch on nine large plant enzyme families. Screening 26 543 papers, FuncFetch retrieved 32 605 entries from 5459 selected papers. We also identified multiple extraction errors including incorrect associations, nontarget enzymes, and hallucinations, which highlight the need for further manual curation. The BAHD activities were verified, resulting in a comprehensive functional fingerprint of this family and revealing that ∼70% of the experimentally characterized enzymes are uncurated in the public domain. FuncFetch represents an advance in biocuration and lays the groundwork for predicting the functions of uncharacterized enzymes.
AVAILABILITY AND IMPLEMENTATION: Code and minimally curated activities are available at: https://github.com/moghelab/funcfetch and https://tools.moghelab.org/funczymedb.
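For readers comparing the extraction metrics quoted in this abstract, the following is a minimal sketch of how precision, recall, and their harmonic mean (F1) relate. The function names and the toy counts are illustrative assumptions and are not part of FuncFetch; the F1 value is not reported in the abstract and is derived here only from the stated 0.86/0.64.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts; an empty denominator yields 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy counts (hypothetical, not taken from the paper).
p, r = precision_recall(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3))  # 0.8 0.667

# F1 implied by the precision/recall pair quoted in the abstract.
print(round(f1_score(0.86, 0.64), 3))  # 0.734
```

A precision well above recall, as here, means that the extracted substrates are mostly correct but a sizeable share of curated substrates is missed.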
Affiliation(s)
- Nathaniel Smith
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, United States
- Xinyu Yuan
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, United States
- Chesney Melissinos
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, United States
- Gaurav Moghe
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, United States

10
Amirian S, Kekre A, Loganathan BJ, Chavan V, Kandula P, Littlefield N, Franco JR, Tafti AP, Ebuenyi ID. Advancing psychosocial disability and psychosocial rehabilitation research through large language models and computational text mining. Glob Ment Health (Camb) 2024; 11:e123. [PMID: 39776990 PMCID: PMC11704382 DOI: 10.1017/gmh.2024.114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 08/22/2024] [Accepted: 09/16/2024] [Indexed: 01/11/2025] Open
Abstract
Psychosocial rehabilitation and psychosocial disability research have been a longstanding topic in healthcare, demanding continuous exploration and analysis to enhance patient and clinical outcomes. As the prevalence of psychosocial disability research continues to attract scholarly attention, many scientific articles are being published in the literature. These publications offer profound insights into diagnostics, preventative measures, treatment strategies, and epidemiological factors. Computational text mining as a subfield of artificial intelligence (AI) can make a big difference in accurately analyzing the current extensive collection of scientific articles on time, assisting individual scientists in understanding psychosocial disabilities better, and improving how we care for people with these challenges. Leveraging the vast repository of scientific literature available on PubMed, this study employs advanced text mining strategies, including word embeddings and large language models (LLMs) to extract valuable insights, automatically catalyzing research in mental health. It aims to significantly enhance the scientific community's knowledge by creating an extensive textual dataset and advanced computational text mining strategies to explore current trends in psychosocial rehabilitation and psychosocial disability research.
Affiliation(s)
- Soheyla Amirian
  - School of Computing, University of Georgia, Athens, GA, 30602, USA
- Ashutosh Kekre
  - School of Computing, University of Georgia, Athens, GA, 30602, USA
- Vedraj Chavan
  - School of Computing, University of Georgia, Athens, GA, 30602, USA
- Punith Kandula
  - School of Computing, University of Georgia, Athens, GA, 30602, USA
- Nickolas Littlefield
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Ahmad P. Tafti
  - School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Ikenna D. Ebuenyi
  - School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA

11
Wang J, Cheng Z, Yao Q, Liu L, Xu D, Hu G. Bioinformatics and biomedical informatics with ChatGPT: Year one review. Quantitative Biology 2024; 12:345-359. [PMID: 39364207 PMCID: PMC11446534 DOI: 10.1002/qub2.67] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 06/12/2024] [Indexed: 10/05/2024]
Abstract
The year 2023 marked a significant surge in the exploration of applying large language model chatbots, notably Chat Generative Pre-trained Transformer (ChatGPT), across various disciplines. We surveyed the application of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.
Collapse
Affiliation(s)
- Jinge Wang
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, West Virginia, USA
| | - Zien Cheng
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, West Virginia, USA
| | - Qiuming Yao
- School of Computing, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
| | - Li Liu
- College of Health Solutions, Arizona State University, Phoenix, Arizona, USA
- Biodesign Institute, Arizona State University, Tempe, Arizona, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
| | - Gangqing Hu
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, West Virginia, USA
|
12
|
Liu J, Wong ZSY. Utilizing active learning strategies in machine-assisted annotation for clinical named entity recognition: a comprehensive analysis considering annotation costs and target effectiveness. J Am Med Inform Assoc 2024; 31:2632-2640. [PMID: 39081233 PMCID: PMC11491619 DOI: 10.1093/jamia/ocae197] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/21/2024] [Revised: 07/09/2024] [Accepted: 07/15/2024] [Indexed: 10/22/2024] Open
Abstract
OBJECTIVES Active learning (AL) has rarely integrated diversity-based and uncertainty-based strategies into a dynamic sampling framework for clinical named entity recognition (NER). Machine-assisted annotation is becoming popular for creating gold-standard labels. This study investigated the effectiveness of dynamic AL strategies under simulated machine-assisted annotation scenarios for clinical NER. MATERIALS AND METHODS We proposed 3 new AL strategies: a diversity-based strategy (CLUSTER) based on Sentence-BERT and 2 dynamic strategies (CLC and CNBSE) capable of switching from diversity-based to uncertainty-based strategies. Using BioClinicalBERT as the foundational NER model, we conducted simulation experiments on 3 medication-related clinical NER datasets independently: i2b2 2009, n2c2 2018 (Track 2), and MADE 1.0. We compared the proposed strategies with uncertainty-based (LC and NBSE) and passive-learning (RANDOM) strategies. Performance was primarily measured by the number of edits made by the annotators to achieve a desired target effectiveness evaluated on independent test sets. RESULTS When aiming for 98% overall target effectiveness, on average, CLUSTER required the fewest edits. When aiming for 99% overall target effectiveness, CNBSE required 20.4% fewer edits than NBSE did. CLUSTER and RANDOM could not achieve such a high target under the pool-based simulation experiment. For high-difficulty entities, CNBSE required 22.5% fewer edits than NBSE to achieve 99% target effectiveness, whereas neither CLUSTER nor RANDOM achieved 93% target effectiveness. DISCUSSION AND CONCLUSION When the target effectiveness was set high, the proposed dynamic strategy CNBSE exhibited both strong learning capabilities and low annotation costs in machine-assisted annotation. CLUSTER required the fewest edits when the target effectiveness was set low.
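As an illustration of the uncertainty-based baseline named in this abstract (LC, commonly expanded as least confidence), the following is a minimal sketch rather than the authors' implementation. It assumes each unlabeled sentence in the pool comes with the probability the NER model assigns to its most likely label sequence; the function names and the `pool` dictionary are hypothetical. The abstract does not describe how the dynamic strategies decide when to switch from diversity-based to uncertainty-based sampling, so that step is not sketched here.

```python
def least_confidence(best_sequence_prob):
    """Uncertainty score: 1 minus the probability of the model's top labeling."""
    return 1.0 - best_sequence_prob


def select_batch(pool, batch_size):
    """Pick the sentences the model is least sure about.

    pool: dict mapping a sentence id to the probability of the model's
          most likely label sequence for that sentence (hypothetical input).
    """
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:batch_size]
```

In a pool-based simulation, the selected sentences would be sent for annotation, added to the training set, and the model retrained before the next round of selection.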
Affiliation(s)
- Jiaxing Liu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China
| | - Zoie S Y Wong
- Graduate School of Public Health, St Luke’s International University, OMURA Susumu & Mieko Memorial St Luke’s Center for Clinical Academia, Chuo-ku, Tokyo 104-0045, Japan
- The Kirby Institute, University of New South Wales, Sydney, NSW 2052, Australia
- School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
|
13
|
Kim J, Wang K, Weng C, Liu C. Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. Am J Hum Genet 2024; 111:2190-2202. [PMID: 39255797 PMCID: PMC11480789 DOI: 10.1016/j.ajhg.2024.08.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 08/08/2024] [Accepted: 08/13/2024] [Indexed: 09/12/2024] Open
Abstract
Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two generative pre-trained transformers (GPT) series and three Llama2 series, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments, exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with the model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve the accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.
Affiliation(s)
- Junyoung Kim
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
|
14
|
Zaghir J, Naguib M, Bjelogrlic M, Névéol A, Tannier X, Lovis C. Prompt Engineering Paradigms for Medical Applications: Scoping Review. J Med Internet Res 2024; 26:e60501. [PMID: 39255030 PMCID: PMC11422740 DOI: 10.2196/60501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/09/2024] [Accepted: 07/22/2024] [Indexed: 09/11/2024] Open
Abstract
BACKGROUND Prompt engineering, focusing on crafting effective prompts to large language models (LLMs), has garnered attention for its capabilities at harnessing the potential of LLMs. This is even more crucial in the medical domain due to its specialized terminology and language technicity. Clinical natural language processing applications must navigate complex language and ensure privacy compliance. Prompt engineering offers a novel approach by designing tailored prompts to guide models in exploiting clinically relevant information from complex medical texts. Despite its promise, the efficacy of prompt engineering in the medical domain remains to be fully explored. OBJECTIVE The aim of the study is to review research efforts and technical approaches in prompt engineering for medical applications as well as provide an overview of opportunities and challenges for clinical practice. METHODS Databases indexing the fields of medicine, computer science, and medical informatics were queried in order to identify relevant published papers. Since prompt engineering is an emerging field, preprint databases were also considered. Multiple data were extracted, such as the prompt paradigm, the involved LLMs, the languages of the study, the domain of the topic, the baselines, and several learning, design, and architecture strategies specific to prompt engineering. We include studies that apply prompt engineering-based methods to the medical domain, published between 2022 and 2024, and covering multiple prompt paradigms such as prompt learning (PL), prompt tuning (PT), and prompt design (PD). RESULTS We included 114 recent prompt engineering studies. Among the 3 prompt paradigms, we have observed that PD is the most prevalent (78 papers). In 12 papers, PD, PL, and PT terms were used interchangeably. While ChatGPT is the most commonly used LLM, we have identified 7 studies using this LLM on a sensitive clinical data set. 
Chain-of-thought, present in 17 studies, emerges as the most frequent PD technique. While PL and PT papers typically provide a baseline for evaluating prompt-based approaches, 61% (48/78) of the PD studies do not report any nonprompt-related baseline. Finally, we individually examine each of the key prompt engineering-specific information reported across papers and find that many studies neglect to explicitly mention them, posing a challenge for advancing prompt engineering research. CONCLUSIONS In addition to reporting on trends and the scientific landscape of prompt engineering, we provide reporting guidelines for future studies to help advance research in the medical field. We also disclose tables and figures summarizing medical prompt engineering papers available and hope that future contributions will leverage these existing works to better advance the field.
Affiliation(s)
- Jamil Zaghir
- Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
| | - Marco Naguib
- Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France
| | - Mina Bjelogrlic
- Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
| | - Aurélie Névéol
- Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France
| | - Xavier Tannier
- Sorbonne Université, INSERM, Université Sorbonne Paris-Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en eSanté, LIMICS, Paris, France
| | - Christian Lovis
- Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
|
15
|
Niu Z, Xiao X, Wu W, Cai Q, Jiang Y, Jin W, Wang M, Yang G, Kong L, Jin X, Yang G, Chen H. PharmaBench: Enhancing ADMET benchmarks with large language models. Sci Data 2024; 11:985. [PMID: 39256394 PMCID: PMC11387650 DOI: 10.1038/s41597-024-03793-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 08/19/2024] [Indexed: 09/12/2024] Open
Abstract
Accurately predicting ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties early in drug development is essential for selecting compounds with optimal pharmacokinetics and minimal toxicity. Existing ADMET-related benchmark sets are limited in utility due to their small dataset sizes and the lack of representation of compounds used in drug discovery projects. These shortcomings hinder their application in model building for drug discovery. To address this issue, we propose a multi-agent data mining system based on Large Language Models that effectively identifies experimental conditions within 14,401 bioassays. This approach facilitates merging entries from different sources, culminating in the creation of PharmaBench. Additionally, we have developed a data processing workflow to integrate data from various sources, resulting in 156,618 raw entries. Through this workflow, we constructed PharmaBench, a comprehensive benchmark set for ADMET properties, which comprises eleven ADMET datasets and 52,482 entries. This benchmark set is designed to serve as an open-source dataset for the development of AI models relevant to drug discovery projects.
Affiliation(s)
- Zhangming Niu
- MindRank AI, Hangzhou, Zhejiang, China
- National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
| | - Xianglu Xiao
- MindRank AI, Hangzhou, Zhejiang, China
- Bioengineering Department and Imperial-X, Imperial College London, London, W12 7SL, UK
| | - Wenfan Wu
- MindRank AI, Hangzhou, Zhejiang, China
- Department of Bioinformatics and Systems Biology, Huazhong University of Science and Technology College of Life Sciences and Technology, Wuhan, Hubei, China
- Guangzhou National Laboratory, Guangzhou, 510005, China
| | - Qiwei Cai
- MindRank AI, Hangzhou, Zhejiang, China
| | | | | | | | | | | | - Xurui Jin
- MindRank AI, Hangzhou, Zhejiang, China
| | - Guang Yang
- National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK.
- Bioengineering Department and Imperial-X, Imperial College London, London, W12 7SL, UK.
- Cardiovascular Research Centre, Royal Brompton Hospital, London, SW3 6NP, UK.
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK.
| | - Hongming Chen
- Department of Bioinformatics and Systems Biology, Huazhong University of Science and Technology College of Life Sciences and Technology, Wuhan, Hubei, China.
- Guangzhou National Laboratory, Guangzhou, 510005, China.
- School of pharmaceutical sciences, Guangzhou Medical University, Guangzhou, 511495, China.
|
16
|
Akyon SH, Akyon FC, Camyar AS, Hızlı F, Sari T, Hızlı Ş. Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study. JMIR Med Inform 2024; 12:e59258. [PMID: 39230947 PMCID: PMC11411230 DOI: 10.2196/59258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Revised: 06/16/2024] [Accepted: 07/05/2024] [Indexed: 09/05/2024] Open
Abstract
BACKGROUND Reading medical papers is a challenging and time-consuming task for doctors, especially when the papers are long and complex. A tool that can help doctors efficiently process and understand medical papers is needed. OBJECTIVE This study aims to critically assess and compare the comprehension capabilities of large language models (LLMs) in accurately and efficiently understanding medical research papers using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist, which provides a standardized framework for evaluating key elements of observational study. METHODS The study is a methodological type of research. The study aims to evaluate the understanding capabilities of new generative artificial intelligence tools in medical papers. A novel benchmark pipeline processed 50 medical research papers from PubMed, comparing the answers of 6 LLMs (GPT-3.5-Turbo, GPT-4-0613, GPT-4-1106, PaLM 2, Claude v1, and Gemini Pro) to the benchmark established by expert medical professors. Fifteen questions, derived from the STROBE checklist, assessed LLMs' understanding of different sections of a research paper. RESULTS LLMs exhibited varying performance, with GPT-3.5-Turbo achieving the highest percentage of correct answers (n=3916, 66.9%), followed by GPT-4-1106 (n=3837, 65.6%), PaLM 2 (n=3632, 62.1%), Claude v1 (n=2887, 58.3%), Gemini Pro (n=2878, 49.2%), and GPT-4-0613 (n=2580, 44.1%). Statistical analysis revealed statistically significant differences between LLMs (P<.001), with older models showing inconsistent performance compared to newer versions. LLMs showcased distinct performances for each question across different parts of a scholarly paper, with certain models like PaLM 2 and GPT-3.5 showing remarkable versatility and depth in understanding. CONCLUSIONS This study is the first to evaluate the performance of different LLMs in understanding medical papers using the retrieval augmented generation method. 
The findings highlight the potential of LLMs to enhance medical research by improving efficiency and facilitating evidence-based decision-making. Further research is needed to address limitations such as the influence of question formats, potential biases, and the rapid evolution of LLM models.
Affiliation(s)
| | - Fatih Cagatay Akyon
- SafeVideo AI, San Francisco, CA, United States
- Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Ahmet Sefa Camyar
- Department of Internal Medicine, Ankara Etlik City Hospital, Ankara, Turkey
| | - Fatih Hızlı
- Faculty of Medicine, Ankara Yildirim Beyazit University, Ankara, Turkey
| | - Talha Sari
- SafeVideo AI, San Francisco, CA, United States
- Department of Computer Science, Istanbul Technical University, Istanbul, Turkey
| | - Şamil Hızlı
- Department of Pediatric Gastroenterology, Children Hospital, Ankara Bilkent City Hospital, Ankara Yildirim Beyazit University, Ankara, Turkey
|
17
|
Mehryary F, Nastou K, Ohta T, Jensen LJ, Pyysalo S. STRING-ing together protein complexes: corpus and methods for extracting physical protein interactions from the biomedical literature. BIOINFORMATICS (OXFORD, ENGLAND) 2024; 40:btae552. [PMID: 39276156 PMCID: PMC11441320 DOI: 10.1093/bioinformatics/btae552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 07/01/2024] [Accepted: 09/12/2024] [Indexed: 09/16/2024]
Abstract
MOTIVATION Understanding biological processes relies heavily on curated knowledge of physical interactions between proteins. Yet, a notable gap remains between the information stored in databases of curated knowledge and the plethora of interactions documented in the scientific literature. RESULTS To bridge this gap, we introduce ComplexTome, a manually annotated corpus designed to facilitate the development of text-mining methods for the extraction of complex formation relationships among biomedical entities targeting the downstream semantics of the physical interaction subnetwork of the STRING database. This corpus comprises 1287 documents with ∼3500 relationships. We train a novel relation extraction model on this corpus and find that it can highly reliably identify physical protein interactions (F1-score = 82.8%). We additionally enhance the model's capabilities through unsupervised trigger word detection and apply it to extract relations and trigger words for these relations from all open publications in the domain literature. This information has been fully integrated into the latest version of the STRING database. AVAILABILITY AND IMPLEMENTATION We provide the corpus, code, and all results produced by the large-scale runs of our systems on biomedical literature via Zenodo https://doi.org/10.5281/zenodo.8139716, Github https://github.com/farmeh/ComplexTome_extraction, and the latest version of STRING database https://string-db.org/.
Affiliation(s)
- Farrokh Mehryary
- TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
| | - Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark
| | - Tomoko Ohta
- Textimi, 1-37-13 Kitazawa, Tokyo, Setagaya-ku 155-0031, Japan
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark
| | - Sampo Pyysalo
- TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
|
18
|
Luo L, Ning J, Zhao Y, Wang Z, Ding Z, Chen P, Fu W, Han Q, Xu G, Qiu Y, Pan D, Li J, Li H, Feng W, Tu S, Liu Y, Yang Z, Wang J, Sun Y, Lin H. Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks. J Am Med Inform Assoc 2024; 31:1865-1874. [PMID: 38422367 PMCID: PMC11339499 DOI: 10.1093/jamia/ocae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 01/08/2024] [Accepted: 02/16/2024] [Indexed: 03/02/2024] Open
Abstract
OBJECTIVE Most existing fine-tuned biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of the fine-tuned LLMs on diverse biomedical natural language processing (NLP) tasks in different languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical NLP tasks. MATERIALS AND METHODS We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese datasets) across over 10 task types. Subsequently, these corpora were converted to the instruction data used to fine-tune the general LLM. During the supervised fine-tuning phase, a 2-stage strategy is proposed to optimize the model performance across various tasks. RESULTS Experimental results on 13 test sets, which include named entity recognition, relation extraction, text classification, and question answering tasks, demonstrate that Taiyi achieves superior performance compared to general LLMs. The case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multitasking. CONCLUSION Leveraging rich high-quality biomedical corpora and developing effective fine-tuning strategies can significantly improve the performance of LLMs within the biomedical domain. Taiyi shows the bilingual multitasking capability through supervised fine-tuning. However, those tasks such as information extraction that are not generation tasks in nature remain challenging for LLM-based generative approaches, and they still underperform the conventional discriminative approaches using smaller language models.
Affiliation(s)
- Ling Luo
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Jinzhong Ning
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yingwen Zhao
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Zhijun Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Zeyuan Ding
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Peng Chen
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Weiru Fu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Qinyu Han
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Guangtao Xu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yunzhi Qiu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Dinghao Pan
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Jiru Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Hao Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Wenduo Feng
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Senbo Tu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yuqi Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Zhihao Yang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Jian Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Yuanyuan Sun
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Hongfei Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
|
19
|
Kim S, Yoon J. VAIV bio-discovery service using transformer model and retrieval augmented generation. BMC Bioinformatics 2024; 25:273. [PMID: 39169321 PMCID: PMC11340140 DOI: 10.1186/s12859-024-05903-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024] Open
Abstract
BACKGROUND There has been a considerable advancement in AI technologies like LLM and machine learning to support biomedical knowledge discovery. MAIN BODY We propose a novel biomedical neural search service called 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search on unstructured text such as PubMed. It mainly handles information related to chemical compound/drugs, gene/proteins, diseases, and their interactions (chemical compounds/drugs-proteins/gene including drugs-targets, drug-drug, and drug-disease). To provide comprehensive knowledge, the system offers four search options: basic search, entity and interaction search, and natural language search. We employ T5slim_dec, which adapts the autoregressive generation task of the T5 (text-to-text transfer transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. It also assists in interpreting research findings by summarizing the retrieved search results for a given natural language query with Retrieval Augmented Generation (RAG). The search engine is built with a hybrid method that combines neural search with the probabilistic search, BM25. CONCLUSION As a result, our system can better understand the context, semantics and relationships between terms within the document, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service to access and discover relevant knowledge.
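The hybrid retrieval idea mentioned in this abstract (a neural score combined with BM25) can be sketched as follows. This is an illustrative sketch only, not the VAIV implementation: the abstract does not say how the two scores are fused, so the min-max normalisation and the `alpha` weight are assumptions, and the dense scores are taken as given (for example, cosine similarities from some embedding model). All function names here are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))  # count documents containing each term
    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


def min_max(values):
    """Rescale scores to [0, 1] so lexical and dense scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(lexical, dense, alpha=0.5):
    """Return document indices ordered by a weighted sum of both scores.

    alpha is an assumed mixing weight: 1.0 means BM25 only, 0.0 dense only.
    """
    lex, den = min_max(lexical), min_max(dense)
    combined = [alpha * x + (1 - alpha) * y for x, y in zip(lex, den)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

With `alpha=1.0` the ranking reduces to plain BM25 and with `alpha=0.0` to the dense ranking, which makes it easy to compare each component against the combination.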
Affiliation(s)
- Seonho Kim
- Department of Computer Science, Sogang University, 35, Baekbeom-Ro, Mapo-Gu, Seoul, Korea.
| | - Juntae Yoon
- VAIV Company Inc, 97, Dokseodang-Ro, Yongsan-Gu, Seoul, Korea.
|
20
|
Holland AM, Lorenz WR, Cavanagh JC, Smart NJ, Ayuso SA, Scarola GT, Kercher KW, Jorgensen LN, Janis JE, Fischer JP, Heniford BT. Comparison of Medical Research Abstracts Written by Surgical Trainees and Senior Surgeons or Generated by Large Language Models. JAMA Netw Open 2024; 7:e2425373. [PMID: 39093561 PMCID: PMC11297395 DOI: 10.1001/jamanetworkopen.2024.25373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 06/04/2024] [Indexed: 08/04/2024] Open
Abstract
Importance Artificial intelligence (AI) has permeated academia, especially OpenAI Chat Generative Pretrained Transformer (ChatGPT), a large language model. However, little has been reported on its use in medical research. Objective To assess a chatbot's capability to generate and grade medical research abstracts. Design, Setting, and Participants In this cross-sectional study, ChatGPT versions 3.5 and 4.0 (referred to as chatbot 1 and chatbot 2) were coached to generate 10 abstracts by providing background literature, prompts, analyzed data for each topic, and 10 previously presented, unassociated abstracts to serve as models. The study was conducted between August 2023 and February 2024 (including data analysis). Exposure Abstract versions utilizing the same topic and data were written by a surgical trainee or a senior physician or generated by chatbot 1 and chatbot 2 for comparison. The 10 training abstracts were written by 8 surgical residents or fellows, edited by the same senior surgeon, at a high-volume hospital in the Southeastern US with an emphasis on outcomes-based research. Abstract comparison was then based on 10 abstracts written by 5 surgical trainees within the first 6 months of their research year, edited by the same senior author. Main Outcomes and Measures The primary outcome measurements were the abstract grades using 10- and 20-point scales and ranks (first to fourth). Abstract versions by chatbot 1, chatbot 2, junior residents, and the senior author were compared and judged by blinded surgeon-reviewers as well as both chatbot models. Five academic attending surgeons from Denmark, the UK, and the US, with extensive experience in surgical organizations, research, and abstract evaluation served as reviewers. Results Surgeon-reviewers were unable to differentiate between abstract versions. Each reviewer ranked an AI-generated version first at least once. 
Abstracts demonstrated no difference in their median (IQR) 10-point scores (resident, 7.0 [6.0-8.0]; senior author, 7.0 [6.0-8.0]; chatbot 1, 7.0 [6.0-8.0]; chatbot 2, 7.0 [6.0-8.0]; P = .61), 20-point scores (resident, 14.0 [12.0-17.0]; senior author, 15.0 [13.0-17.0]; chatbot 1, 14.0 [12.0-16.0]; chatbot 2, 14.0 [13.0-16.0]; P = .50), or rank (resident, 3.0 [1.0-4.0]; senior author, 2.0 [1.0-4.0]; chatbot 1, 3.0 [2.0-4.0]; chatbot 2, 2.0 [1.0-3.0]; P = .14). The abstract grades given by chatbot 1 were comparable to the surgeon-reviewers' grades. However, chatbot 2 graded more favorably than the surgeon-reviewers and chatbot 1. Median (IQR) chatbot 2-reviewer grades were higher than surgeon-reviewer grades of all 4 abstract versions (resident, 14.0 [12.0-17.0] vs 16.9 [16.0-17.5]; P = .02; senior author, 15.0 [13.0-17.0] vs 17.0 [16.5-18.0]; P = .03; chatbot 1, 14.0 [12.0-16.0] vs 17.8 [17.5-18.5]; P = .002; chatbot 2, 14.0 [13.0-16.0] vs 16.8 [14.5-18.0]; P = .04). When comparing the grades of the 2 chatbots, chatbot 2 gave higher median (IQR) grades for abstracts than chatbot 1 (resident, 14.0 [13.0-15.0] vs 16.9 [16.0-17.5]; P = .003; senior author, 13.5 [13.0-15.5] vs 17.0 [16.5-18.0]; P = .004; chatbot 1, 14.5 [13.0-15.0] vs 17.8 [17.5-18.5]; P = .003; chatbot 2, 14.0 [13.0-15.0] vs 16.8 [14.5-18.0]; P = .01). Conclusions and Relevance In this cross-sectional study, trained chatbots generated convincing medical abstracts, undifferentiable from resident or senior author drafts. Chatbot 1 graded abstracts similarly to surgeon-reviewers, while chatbot 2 was less stringent. These findings may assist surgeon-scientists in successfully implementing AI in medical research.
Affiliation(s)
- Alexis M. Holland
- Division of Gastrointestinal and Minimally Invasive Surgery, Department of Surgery, Atrium Health Carolinas Medical Center, Charlotte, North Carolina
| | - William R. Lorenz
- Division of Gastrointestinal and Minimally Invasive Surgery, Department of Surgery, Atrium Health Carolinas Medical Center, Charlotte, North Carolina
| | - Jack C. Cavanagh
- Department of Economics, Massachusetts Institute of Technology, Cambridge
| | - Neil J. Smart
- Division of Colorectal Surgery, Department of Surgery, Royal Devon & Exeter Hospital, Exeter, Devon, United Kingdom
| | - Sullivan A. Ayuso
- Division of Gastrointestinal and Minimally Invasive Surgery, Department of Surgery, Atrium Health Carolinas Medical Center, Charlotte, North Carolina
| | - Gregory T. Scarola
- Division of Gastrointestinal and Minimally Invasive Surgery, Department of Surgery, Atrium Health Carolinas Medical Center, Charlotte, North Carolina
| | - Kent W. Kercher
- Division of Gastrointestinal and Minimally Invasive Surgery, Department of Surgery, Atrium Health Carolinas Medical Center, Charlotte, North Carolina
| | - Lars N. Jorgensen
- Department of Clinical Medicine, University of Copenhagen, Bispebjerg & Frederiksberg Hospital, Copenhagen, Denmark
| | - Jeffrey E. Janis
- Division of Plastic and Reconstructive Surgery, The Ohio State University Wexner Medical Center, Columbus
| | - John P. Fischer
- Division of Plastic Surgery, University of Pennsylvania Health System, Philadelphia
| | - B. Todd Heniford
- Division of Gastrointestinal and Minimally Invasive Surgery, Department of Surgery, Atrium Health Carolinas Medical Center, Charlotte, North Carolina
| |
|
21
|
Zhang W, Wang Q, Kong X, Xiong J, Ni S, Cao D, Niu B, Chen M, Li Y, Zhang R, Wang Y, Zhang L, Li X, Xiong Z, Shi Q, Huang Z, Fu Z, Zheng M. Fine-tuning large language models for chemical text mining. Chem Sci 2024; 15:10600-10611. [PMID: 38994403 PMCID: PMC11234886 DOI: 10.1039/d4sc00924j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 06/02/2024] [Indexed: 07/13/2024] Open
Abstract
Extracting knowledge from complex and diverse chemical texts is a pivotal task for both experimental and computational chemists. The task is still considered to be extremely challenging due to the complexity of the chemical language and scientific literature. This study explored the power of fine-tuned large language models (LLMs) on five intricate chemical text mining tasks: compound entity recognition, reaction role labelling, metal-organic framework (MOF) synthesis information extraction, nuclear magnetic resonance spectroscopy (NMR) data extraction, and the conversion of reaction paragraphs to action sequences. The fine-tuned LLMs demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt engineering experiments. For comparison, we guided ChatGPT (GPT-3.5-turbo) and GPT-4 with prompt engineering and fine-tuned GPT-3.5-turbo as well as other open-source LLMs such as Mistral, Llama3, Llama2, T5, and BART. The results showed that the fine-tuned ChatGPT models excelled in all tasks. They achieved exact accuracy levels ranging from 69% to 95% on these tasks with minimal annotated data. They even outperformed those task-adaptive pre-training and fine-tuning models that were based on a significantly larger amount of in-domain data. Notably, fine-tuned Mistral and Llama3 show competitive abilities. Given their versatility, robustness, and low-code capability, leveraging fine-tuned LLMs as flexible and effective toolkits for automated data acquisition could revolutionize chemical knowledge extraction.
Affiliation(s)
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Qinggong Wang
- Nanjing University of Chinese Medicine 138 Xianlin Road Nanjing 210023 China
| | - Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Shengkun Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Duanhua Cao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou Zhejiang 310058 China
| | - Buying Niu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- School of Physical Science and Technology, ShanghaiTech University Shanghai 201210 China
- Lingang Laboratory Shanghai 200031 China
| | - Yameng Li
- ProtonUnfold Technology Co., Ltd Suzhou China
| | - Runze Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Lehan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
| | | | - Qian Shi
- Lingang Laboratory Shanghai 200031 China
| | - Ziming Huang
- Medizinische Klinik und Poliklinik I, Klinikum der Universität München, Ludwig-Maximilians-Universität Munich Germany
| | - Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Nanjing University of Chinese Medicine 138 Xianlin Road Nanjing 210023 China
| |
|
22
|
Farrell MJ, Le Guillarme N, Brierley L, Hunter B, Scheepens D, Willoughby A, Yates A, Mideo N. The changing landscape of text mining: a review of approaches for ecology and evolution. Proc Biol Sci 2024; 291:20240423. [PMID: 39082244 PMCID: PMC11289731 DOI: 10.1098/rspb.2024.0423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/20/2024] [Accepted: 06/20/2024] [Indexed: 08/02/2024] Open
Abstract
In ecology and evolutionary biology, the synthesis and modelling of data from published literature are commonly used to generate insights and test theories across systems. However, the tasks of searching, screening, and extracting data from literature are often arduous. Researchers may manually process hundreds to thousands of articles for systematic reviews, meta-analyses, and compiling synthetic datasets. As relevant articles expand to tens or hundreds of thousands, computer-based approaches can increase the efficiency, transparency and reproducibility of literature-based research. Methods available for text mining are rapidly changing owing to developments in machine learning-based language models. We review the growing landscape of approaches, mapping them onto three broad paradigms (frequency-based approaches, traditional Natural Language Processing and deep learning-based language models). This serves as an entry point to learn foundational and cutting-edge concepts, vocabularies, and methods to foster integration of these tools into ecological and evolutionary research. We cover approaches for modelling ecological texts, generating training data, developing custom models and interacting with large language models and discuss challenges and possible solutions to implementing these methods in ecology and evolution.
Affiliation(s)
- Maxwell J. Farrell
- Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, UK
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
| | - Nicolas Le Guillarme
- Université Grenoble Alpes, CNRS, LECA, Laboratoire d'Ecologie Alpine, Grenoble, France
| | - Liam Brierley
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- Department of Health Data Science, University of Liverpool, Liverpool, UK
| | - Bronwen Hunter
- School of Life Sciences, University of Sussex, Brighton, UK
| | - Daan Scheepens
- Division of Biosciences, University College London, London, UK
| | | | - Andrew Yates
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
| | - Nicole Mideo
- Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| |
|
23
|
Wang J, Cheng Z, Yao Q, Liu L, Xu D, Hu G. Bioinformatics and biomedical informatics with ChatGPT: Year one review. ARXIV 2024:arXiv:2403.15274v2. [PMID: 38562449 PMCID: PMC10984005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.
Affiliation(s)
- Jinge Wang
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
| | - Zien Cheng
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
| | - Qiuming Yao
- School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
| | - Li Liu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Gangqing Hu
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
| |
|
24
|
Livne M, Miftahutdinov Z, Tutubalina E, Kuznetsov M, Polykovskiy D, Brundyn A, Jhunjhunwala A, Costa A, Aliper A, Aspuru-Guzik A, Zhavoronkov A. nach0: multimodal natural and chemical languages foundation model. Chem Sci 2024; 15:8380-8389. [PMID: 38846388 PMCID: PMC11151847 DOI: 10.1039/d4sc00966e] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/08/2024] [Accepted: 04/26/2024] [Indexed: 06/09/2024] Open
Abstract
Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups.
Affiliation(s)
- Micha Livne
- NVIDIA 2788 San Tomas Expressway Santa Clara 95051 CA USA
| | - Zulfat Miftahutdinov
- Insilico Medicine Canada Inc. 3710-1250 René-Lévesque West Montreal Quebec Canada
| | - Elena Tutubalina
- Insilico Medicine Hong Kong Ltd. Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok New Territories Hong Kong
| | - Maksim Kuznetsov
- Insilico Medicine Canada Inc. 3710-1250 René-Lévesque West Montreal Quebec Canada
| | - Daniil Polykovskiy
- Insilico Medicine Canada Inc. 3710-1250 René-Lévesque West Montreal Quebec Canada
| | - Annika Brundyn
- NVIDIA 2788 San Tomas Expressway Santa Clara 95051 CA USA
| | | | - Anthony Costa
- NVIDIA 2788 San Tomas Expressway Santa Clara 95051 CA USA
| | - Alex Aliper
- Insilico Medicine AI Ltd. Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City Abu Dhabi United Arab Emirates
| | - Alán Aspuru-Guzik
- University of Toronto Lash Miller Building 80 St. George Street Toronto Ontario Canada
| | - Alex Zhavoronkov
- Insilico Medicine Hong Kong Ltd. Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok New Territories Hong Kong
| |
|
25
|
Harada Y, Suzuki T, Harada T, Sakamoto T, Ishizuka K, Miyagami T, Kawamura R, Kunitomo K, Nagano H, Shimizu T, Watari T. Performance evaluation of ChatGPT in detecting diagnostic errors and their contributing factors: an analysis of 545 case reports of diagnostic errors. BMJ Open Qual 2024; 13:e002654. [PMID: 38830730 PMCID: PMC11149143 DOI: 10.1136/bmjoq-2023-002654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND Manual chart review using validated assessment tools is a standardised methodology for detecting diagnostic errors. However, this requires considerable human resources and time. ChatGPT, a recently developed artificial intelligence chatbot based on a large language model, can effectively classify text based on suitable prompts. Therefore, ChatGPT can assist manual chart reviews in detecting diagnostic errors. OBJECTIVE This study aimed to clarify whether ChatGPT could correctly detect diagnostic errors and possible factors contributing to them based on case presentations. METHODS We analysed 545 published case reports that included diagnostic errors. We input the texts of case presentations and the final diagnoses, together with some original prompts, into ChatGPT (GPT-4) to generate responses, including the judgement of diagnostic errors and contributing factors of diagnostic errors. Factors contributing to diagnostic errors were coded according to the following three taxonomies: Diagnosis Error Evaluation and Research (DEER), Reliable Diagnosis Challenges (RDC) and Generic Diagnostic Pitfalls (GDP). The responses on the contributing factors from ChatGPT were compared with those from physicians. RESULTS ChatGPT correctly detected diagnostic errors in 519/545 cases (95%) and coded statistically larger numbers of factors contributing to diagnostic errors per case than physicians: DEER (median 5 vs 1, p<0.001), RDC (median 4 vs 2, p<0.001) and GDP (median 4 vs 1, p<0.001). The most important contributing factors of diagnostic errors coded by ChatGPT were 'failure/delay in considering the diagnosis' (315, 57.8%) in DEER, 'atypical presentation' (365, 67.0%) in RDC, and 'atypical presentation' (264, 48.4%) in GDP. CONCLUSION ChatGPT accurately detects diagnostic errors from case presentations. ChatGPT may be more sensitive than manual reviewing in detecting factors contributing to diagnostic errors, especially for 'atypical presentation'.
Affiliation(s)
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
| | | | - Taku Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
- Nerima Hikarigaoka Hospital, Nerima-ku, Tokyo, Japan
| | - Tetsu Sakamoto
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
| | - Kosuke Ishizuka
- Yokohama City University School of Medicine Graduate School of Medicine, Yokohama, Kanagawa, Japan
| | - Taiju Miyagami
- Department of General Medicine, Faculty of Medicine, Juntendo University, Bunkyo-ku, Tokyo, Japan
| | - Ren Kawamura
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
| | | | - Hiroyuki Nagano
- Department of General Internal Medicine, Tenri Hospital, Tenri, Nara, Japan
| | - Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
| | - Takashi Watari
- Integrated Clinical Education Center, Kyoto University Hospital, Kyoto, Kyoto, Japan
| |
|
26
|
Xie Q, Chen Q, Chen A, Peng C, Hu Y, Lin F, Peng X, Huang J, Zhang J, Keloth V, Zhou X, He H, Ohno-Machado L, Wu Y, Xu H, Bian J. Me-LLaMA: Foundation Large Language Models for Medical Applications. RESEARCH SQUARE 2024:rs.3.rs-4240043. [PMID: 38826372 PMCID: PMC11142305 DOI: 10.21203/rs.3.rs-4240043/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/04/2024]
Abstract
Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a novel medical LLM family that includes foundation models - Me-LLaMA 13/70B, along with their chat-enhanced versions - Me-LLaMA 13/70B-chat, developed through continual pre-training and instruction tuning of LLaMA2 using large medical datasets. Our methodology leverages a comprehensive domain-specific data suite, including a large-scale, continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a new medical evaluation benchmark (MIBE) across six critical medical tasks with 12 datasets. Our extensive evaluation using the MIBE shows that Me-LLaMA models achieve overall better performance than existing open-source medical LLMs in zero-shot, few-shot and supervised learning abilities. With task-specific instruction tuning, Me-LLaMA models outperform ChatGPT on 7 out of 8 datasets and GPT-4 on 5 out of 8 datasets. In addition, we investigated the catastrophic forgetting problem, and our results show that Me-LLaMA models outperform other open-source medical LLMs in mitigating this issue. Me-LLaMA is one of the largest open-source medical foundation LLMs that use both biomedical and clinical data. It exhibits superior performance across both general and medical tasks compared to other open-source medical LLMs, rendering it an attractive choice for medical AI applications. We release our models, datasets, and evaluation scripts at: https://github.com/BIDS-Xu-Lab/Me-LLaMA.
Affiliation(s)
- Qianqian Xie
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Qingyu Chen
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Cheng Peng
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Yan Hu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Fongci Lin
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Xueqing Peng
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Jimin Huang
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Jeffrey Zhang
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Vipina Keloth
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Xinyu Zhou
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Huan He
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Lucila Ohno-Machado
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Hua Xu
- Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| |
|
27
|
Hu G, Liu L, Xu D. On the Responsible Use of Chatbots in Bioinformatics. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae002. [PMID: 38862428 PMCID: PMC11104453 DOI: 10.1093/gpbjnl/qzae002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/01/2023] [Revised: 11/08/2023] [Accepted: 11/14/2023] [Indexed: 06/13/2024]
Affiliation(s)
- Gangqing Hu
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
| | - Li Liu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| |
|
28
|
Zhang X, Wang H, Sun C. BiSpec Pairwise AI: guiding the selection of bispecific antibody target combinations with pairwise learning and GPT augmentation. J Cancer Res Clin Oncol 2024; 150:237. [PMID: 38713378 PMCID: PMC11076393 DOI: 10.1007/s00432-024-05740-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Accepted: 04/03/2024] [Indexed: 05/08/2024]
Abstract
PURPOSE Bispecific antibodies (BsAbs), capable of targeting two antigens simultaneously, represent a significant advancement by employing dual mechanisms of action for tumor suppression. However, how to pair targets to develop effective and safe bispecific drugs is a major challenge for pharmaceutical companies. METHODS Using machine learning models, we refined the biological characteristics of BsAbs that are currently approved or in clinical development and analyzed hundreds of membrane proteins as bispecific targets to predict the likelihood of successful drug development for various target combinations. Moreover, to enhance the interpretability of prediction results in bispecific target combination, we combined machine learning models with Large Language Models (LLMs). Through a Retrieval-Augmented Generation (RAG) approach, we supplement each pair of bispecific targets' machine learning prediction with important features and rationales, generating interpretable analytical reports. RESULTS In this study, the XGBoost model with pairwise learning was employed to predict the druggability of BsAbs. By analyzing extensive data on BsAbs and designing features from perspectives such as target activity, safety, cell type specificity, pathway mechanism, and gene embedding representation, our model is able to predict target combinations of BsAbs with high market potential. Specifically, we integrated XGBoost with the GPT model to discuss the efficacy of each bispecific target pair, thereby aiding the decision-making for drug developers. CONCLUSION The novelty of this study lies in the integration of machine learning and GPT techniques to provide a novel framework for the design of BsAbs drugs. This holistic approach not only improves prediction accuracy, but also enhances the interpretability and innovativeness of drug design.
Affiliation(s)
- Xin Zhang
- Beijing Engineering Research Center of Protein and Antibody, Sinocelltech Ltd., Beijing, 100176, China
- School of Medicine, Nankai University, Tianjin, 300071, China
| | - Huiyu Wang
- Beijing Engineering Research Center of Protein and Antibody, Sinocelltech Ltd., Beijing, 100176, China
| | - Chunyun Sun
- Beijing Engineering Research Center of Protein and Antibody, Sinocelltech Ltd., Beijing, 100176, China.
| |
|
29
|
Wang J, Ye Q, Liu L, Guo NL, Hu G. Scientific figures interpreted by ChatGPT: strengths in plot recognition and limits in color perception. NPJ Precis Oncol 2024; 8:84. [PMID: 38580746 PMCID: PMC10997760 DOI: 10.1038/s41698-024-00576-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/26/2023] [Accepted: 02/27/2024] [Indexed: 04/07/2024] Open
Abstract
Emerging studies underscore the promising capabilities of large language model-based chatbots in conducting basic bioinformatics data analyses. The recent feature of accepting image inputs by ChatGPT, also known as GPT-4V(ision), motivated us to explore its efficacy in deciphering bioinformatics scientific figures. Our evaluation with examples in cancer research, including sequencing data analysis, multimodal network-based drug repositioning, and tumor clonal evolution, revealed that ChatGPT can proficiently explain different plot types and apply biological knowledge to enrich interpretations. However, it struggled to provide accurate interpretations when color perception and quantitative analysis of visual elements were involved. Furthermore, while the chatbot can draft figure legends and summarize findings from the figures, stringent proofreading is imperative to ensure the accuracy and reliability of the content.
Affiliation(s)
- Jinge Wang
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV, 26506, USA
| | - Qing Ye
- West Virginia University Cancer Institute, West Virginia University, Morgantown, WV, 26506, USA
| | - Li Liu
- College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
- Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
| | - Nancy Lan Guo
- West Virginia University Cancer Institute, West Virginia University, Morgantown, WV, 26506, USA
- Department of Occupational and Environmental Health Sciences, West Virginia University, Morgantown, WV, 26506, USA
| | - Gangqing Hu
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV, 26506, USA.
- West Virginia University Cancer Institute, West Virginia University, Morgantown, WV, 26506, USA.
| |
|
30
|
Yao X, He Z, Liu Y, Wang Y, Ouyang S, Xia J. Cancer-Alterome: a literature-mined resource for regulatory events caused by genetic alterations in cancer. Sci Data 2024; 11:265. [PMID: 38431735 PMCID: PMC10908799 DOI: 10.1038/s41597-024-03083-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 02/20/2024] [Indexed: 03/05/2024] Open
Abstract
It is vital to investigate the complex mechanisms underlying tumors to better understand cancer and develop effective treatments. Metabolic abnormalities and clinical phenotypes can serve as essential biomarkers for diagnosing this challenging disease. Additionally, genetic alterations provide profound insights into the fundamental aspects of cancer. This study introduces Cancer-Alterome, a literature-mined dataset that focuses on the regulatory events of an organism's biological processes or clinical phenotypes caused by genetic alterations. By proposing and leveraging a text-mining pipeline, we identify 16,681K regulatory event records encompassing 21K genes, 157K genetic alterations and 154K downstream bio-concepts, extracted from 4,354K pan-cancer literature. The resulting dataset empowers a multifaceted investigation of cancer pathology, enabling the meticulous tracking of relevant literature support. Its potential applications extend to evidence-based medicine and precision medicine, yielding valuable insights for further advancements in cancer research.
Affiliation(s)
- Xinzhi Yao
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Zhihan He
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Yawen Liu
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Yuxing Wang
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510006, P.R. China
| | - Sizhuo Ouyang
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Jingbo Xia
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P.R. China.
| |
|
31
|
Tian S, Jin Q, Yeganova L, Lai PT, Zhu Q, Chen X, Yang Y, Chen Q, Kim W, Comeau DC, Islamaj R, Kapoor A, Gao X, Lu Z. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Brief Bioinform 2023; 25:bbad493. [PMID: 38168838 PMCID: PMC10762511 DOI: 10.1093/bib/bbad493] [Citation(s) in RCA: 85] [Impact Index Per Article: 42.5] [Received: 06/28/2023] [Revised: 11/15/2023] [Accepted: 12/06/2023] [Indexed: 01/05/2024] Open
Abstract
ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of the biomedical domain present unique challenges. Following an extensive literature survey, we find that significant advances have been made in text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like ChatGPT, in the fields of biomedicine and health entails various risks and challenges, including fabricated information in their generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this survey can provide a comprehensive and timely overview to biomedical researchers and healthcare practitioners on the opportunities and challenges associated with using ChatGPT and other LLMs for transforming biomedicine and health.
Affiliation(s)
- Shubo Tian
- National Library of Medicine, National Institutes of Health
- Qiao Jin
- National Library of Medicine, National Institutes of Health
- Lana Yeganova
- National Library of Medicine, National Institutes of Health
- Po-Ting Lai
- National Library of Medicine, National Institutes of Health
- Qingqing Zhu
- National Library of Medicine, National Institutes of Health
- Xiuying Chen
- King Abdullah University of Science and Technology
- Yifan Yang
- National Library of Medicine, National Institutes of Health
- Qingyu Chen
- National Library of Medicine, National Institutes of Health
- Won Kim
- National Library of Medicine, National Institutes of Health
- Aadit Kapoor
- National Library of Medicine, National Institutes of Health
- Xin Gao
- King Abdullah University of Science and Technology
- Zhiyong Lu
- National Library of Medicine, National Institutes of Health
|
32
|
Wang J, Ye Q, Liu L, Lan Guo N, Hu G. Bioinformatics Illustrations Decoded by ChatGPT: The Good, The Bad, and The Ugly. bioRxiv 2023:2023.10.15.562423. [PMID: 37904927 PMCID: PMC10614796 DOI: 10.1101/2023.10.15.562423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Emerging studies underscore the promising capabilities of large language model-based chatbots in conducting fundamental bioinformatics data analyses. ChatGPT's recent support for image inputs motivated us to explore its efficacy in deciphering bioinformatics illustrations. Our evaluation with examples in cancer research, including sequencing data analysis, multimodal network-based drug repositioning, and tumor clonal evolution, revealed that ChatGPT can proficiently explain different plot types and apply biological knowledge to enrich interpretations. However, it struggled to provide accurate interpretations when quantitative analysis of visual elements was involved. Furthermore, while the chatbot can draft figure legends and summarize findings from the figures, stringent proofreading is imperative to ensure the accuracy and reliability of the content.
Affiliation(s)
- Jinge Wang
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
- Qing Ye
- West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA
- Li Liu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Nancy Lan Guo
- West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA
- Department of Occupational and Environmental Health Sciences, West Virginia University, Morgantown, WV 26506, USA
- Gangqing Hu
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA
- West Virginia University Cancer Institute, West Virginia University, Morgantown, WV 26506, USA
|
33
|
Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: Update of a living systematic review. F1000Res 2021; 10:401. [PMID: 34408850 PMCID: PMC8361807 DOI: 10.12688/f1000research.51117.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/11/2025] [Indexed: 04/11/2025]
Abstract
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search PubMed, ACL Anthology, arXiv, OpenAlex via EPPI-Reviewer, and the dblp computer science bibliography databases. Full text screening and data extraction are conducted using a mix of open-source and commercial tools. This living review update includes publications up to August 2024 and OpenAlex content up to September 2024. Results: 117 publications are included in this review. Of these, 30 (26%) used full texts while the rest used titles and abstracts. A total of 112 (96%) publications developed classifiers for randomised controlled trials. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. Data are available from 53 (45%), and code from 49 (42%) publications. Nine (8%) implemented publicly available tools. Conclusions: This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting other study types. Between review updates, large language models emerged as a new tool for data extraction. While facilitating access to automated extraction, they showed a trend of decreasing quality of results reporting, especially for quantitative results such as recall, and lower reproducibility of results. Compared with the previous update, trends such as the transition to relation extraction and the sharing of code and datasets stayed similar.
Affiliation(s)
- Lena Schmidt
- NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Rebecca Elmore
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Babatunde K. Olorisade
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Evaluate Ltd, London, SE1 2RE, UK
- Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK
- EdgeStride (Timeless Dynamics Academy), AACSL 1st Floor, North Westgate House, Harlow, Essex, CM20 1YS, UK
- James Thomas
- UCL Social Research Institute, University College London, London, WC1H 0AL, UK
|