1
Chang YC, Huang MS, Huang YH, Lin YH. The influence of prompt engineering on large language models for protein-protein interaction identification in biomedical literature. Sci Rep 2025; 15:15493. [PMID: 40319086 PMCID: PMC12049485 DOI: 10.1038/s41598-025-99290-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2024] [Accepted: 04/18/2025] [Indexed: 05/07/2025] Open
Abstract
Identifying protein-protein interactions (PPIs) is a foundational task in biomedical natural language processing. While specialized models have been developed, the potential of general-domain large language models (LLMs) in PPI extraction, particularly for researchers without computational expertise, remains unexplored. This study evaluates the effectiveness of proprietary LLMs (GPT-3.5, GPT-4, and Google Gemini) in PPI prediction through systematic prompt engineering. We designed six prompting scenarios of increasing complexity, from basic interaction queries to sophisticated entity-tagged formats, and assessed model performance across multiple benchmark datasets (LLL, IEPA, HPRD50, AIMed, BioInfer, and PEDD). Carefully designed prompts effectively guided LLMs in PPI prediction. Gemini 1.5 Pro achieved the highest performance across most datasets, with notable F1-scores in LLL (90.3%), IEPA (68.2%), HPRD50 (67.5%), and PEDD (70.2%). GPT-4 showed competitive performance, particularly in the LLL dataset (87.3%). We identified and addressed a positive prediction bias, demonstrating improved performance after evaluation refinement. While not surpassing specialized models, general-purpose LLMs with appropriate prompting strategies can effectively perform PPI prediction tasks, offering valuable tools for biomedical researchers without extensive computational expertise.
Affiliation(s)
- Yung-Chun Chang
  - Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
  - Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Ming-Siang Huang
  - Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Yi-Hsuan Huang
  - Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Yi-Hsuan Lin
  - Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
2
Sun B, Li Q, Xiao X, Zhang J, Zhou Y, Huang Y, Gao J, Cao X. The loach haplotype-resolved genome and the identification of Mex3a involved in fish air breathing. Cell Genomics 2024; 4:100670. [PMID: 39389021 PMCID: PMC11602589 DOI: 10.1016/j.xgen.2024.100670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 07/30/2024] [Accepted: 09/13/2024] [Indexed: 10/12/2024]
Abstract
Fish air breathing is crucial for the transition of vertebrates from water to land. So far, the genes involved in fish air breathing have not been well identified. Here, we performed gene enrichment analysis of positively selected genes (PSGs) in loach (Misgurnus anguillicaudatus, an air-breathing fish) in comparison to Triplophysa tibetana (a non-air-breathing fish), haplotype-resolved genome assembly of the loach, and gene evolutionary analysis of air-breathing and non-air-breathing fishes and found that the PSG mex3a originated from ancient air-breathing fish species. Deletion of Mex3a impaired loach air-breathing capacity by inhibiting angiogenesis through its interaction with T-box transcription factor 20. Mex3a overexpression significantly promoted angiogenesis. Structural analysis and point mutation revealed the critical role of the 201st amino acid in loach Mex3a for angiogenesis. Our findings innovatively indicate that the ancient mex3a is a fish air-breathing gene, which holds significance for understanding fish air breathing and provides a valuable resource for cultivating hypoxia-tolerant fish varieties.
Affiliation(s)
- Bing Sun
  - College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education/Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Lab of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Qingshan Li
  - College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education/Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Lab of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Xinxin Xiao
  - College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education/Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Lab of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Jianwei Zhang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Ying Zhou
  - College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education/Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Lab of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Yuwei Huang
  - College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education/Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Lab of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Jian Gao
  - College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education/Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Lab of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
- Xiaojuan Cao
  - College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education/Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Lab of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, China
3
Al-Aamri A, Taha K, Al-Hammadi Y, Maalouf M, Homouz D. Constructing Genetic Networks using Biomedical Literature and Rare Event Classification. Sci Rep 2017; 7:15784. [PMID: 29150626 PMCID: PMC5694017 DOI: 10.1038/s41598-017-16081-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/17/2017] [Accepted: 10/24/2017] [Indexed: 12/16/2022] Open
Abstract
Text mining has become an important tool in bioinformatics research with the massive growth in the biomedical literature over the past decade. Mining the biomedical literature has resulted in an incredible number of computational algorithms that assist many bioinformatics researchers. In this paper, we present a text mining system called Gene Interaction Rare Event Miner (GIREM) that constructs gene-gene-interaction networks for human genome using information extracted from biomedical literature. GIREM identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, GIREM first extracts the set of genes found within the abstracts of biomedical literature associated with g. GIREM aims at enhancing biological text mining approaches by identifying the semantic relationship between each co-occurrence of a pair of genes in abstracts using the syntactic structures of sentences and linguistics theories. It uses a supervised learning algorithm, weighted logistic regression to label pairs of genes to related or un-related classes, and to reflect the population proportion using smaller samples. We evaluated GIREM by comparing it experimentally with other well-known approaches and a protein-protein interactions database. Results showed marked improvement.
Affiliation(s)
- Amira Al-Aamri
  - Department of Electrical and Computer Engineering, Khalifa University of Science and Technology, P.O. Box 127788, Abu Dhabi, UAE
- Kamal Taha
  - Department of Electrical and Computer Engineering, Khalifa University of Science and Technology, P.O. Box 127788, Abu Dhabi, UAE
- Yousof Al-Hammadi
  - Department of Electrical and Computer Engineering, Khalifa University of Science and Technology, P.O. Box 127788, Abu Dhabi, UAE
- Maher Maalouf
  - Department of Industrial and Systems Engineering, Khalifa University of Science and Technology, P.O. Box 127788, Abu Dhabi, UAE
- Dirar Homouz
  - Department of Physics, Khalifa University of Science and Technology, P.O. Box 127788, Abu Dhabi, UAE
4
Taha K, Yoo PD. Erratum to: Predicting the functions of a protein from its ability to associate with other molecules. BMC Bioinformatics 2016; 17:105. [PMID: 26921164 PMCID: PMC4768403 DOI: 10.1186/s12859-016-0953-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022] Open