1
Liu D, Zhang Y, Yang M, Yuan J, Qu W. Extracting Mutant-Affected Protein-Protein Interactions via Gaussian-Enhanced Representation and Contrastive Learning. J Comput Biol 2023; 30:972-984. [PMID: 37682321 DOI: 10.1089/cmb.2023.0080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Genetic mutations can affect protein-protein interactions (PPIs), and many such effects are reported in the biomedical literature. Automated extraction of PPIs affected by gene mutations from this literature can aid in evaluating the clinical importance of gene variations, which is crucial for the advancement of precision medicine. In this study, a new model called the Gaussian-enhanced representation model (GRM) is introduced for PPI extraction. The model uses a Gaussian probability distribution to produce a target entity representation on top of the BioBERT pretrained model. The GRM assigns more weight to target protein entities and their adjacent entities, which addresses the problem of lengthy input text and scattered target entities in the PPI extraction task. Additionally, the model introduces a supervised contrastive learning approach to enhance its effectiveness and robustness. Experiments on the BioCreative VI data set demonstrate that the proposed GRM achieves state-of-the-art performance.
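The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea of Gaussian weighting around target entities. The function names, the use of the nearest entity position as the Gaussian center, and the `sigma` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import math


def gaussian_weights(seq_len, entity_positions, sigma=2.0):
    """Return one weight per token (hypothetical helper).

    Each weight is a Gaussian of the token's distance to the nearest
    target entity position, so entity tokens get weight 1.0 and the
    weight decays for tokens farther away.
    """
    weights = []
    for i in range(seq_len):
        distance = min(abs(i - p) for p in entity_positions)
        weights.append(math.exp(-(distance ** 2) / (2 * sigma ** 2)))
    return weights


def weighted_pool(token_vectors, weights):
    """Weighted average of token vectors (e.g. encoder outputs)."""
    total = sum(weights)
    dim = len(token_vectors[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, token_vectors)) / total
        for k in range(dim)
    ]


# Toy usage: 7 tokens, target entities at positions 1 and 5.
weights = gaussian_weights(7, [1, 5])
pooled = weighted_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

In a real model the token vectors would come from a pretrained encoder such as BioBERT; plain lists are used here only to keep the sketch self-contained.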
Affiliation(s)
- Da Liu
  - School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
- Yijia Zhang
  - School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
- Ming Yang
  - School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
- Jianyuan Yuan
  - School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
- Wen Qu
  - School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
2
Zheng T, Xu Z, Li Y, Zhao Y, Wang B, Yang X. A Novel Conditional Knowledge Graph Representation and Construction. ARTIF INTELL 2021. [DOI: 10.1007/978-3-030-93049-3_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
3
Li Z, Li C, Long Y, Wang X. A system for automatically extracting clinical events with temporal information. BMC Med Inform Decis Mak 2020; 20:198. [PMID: 32819377 PMCID: PMC7439713 DOI: 10.1186/s12911-020-01208-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/30/2018] [Accepted: 08/03/2020] [Indexed: 12/03/2022]
Abstract
Background: The popularization of health and medical informatics yields huge amounts of data. Extracting clinical events along a temporal course, a structure that presents information in chronological order, is the foundation for advanced applications and research. Manual extraction would be extremely challenging given the quantity and complexity of the records.
Methods: We present a recurrent neural network-based architecture that automatically extracts clinical event expressions along with each event’s temporal information. The system is built on attention-based and recursive neural networks, introduces a piecewise representation (the input sentences are divided into three pieces to better use the information they contain), and incorporates semantic information through word representations obtained from BioASQ and Wikipedia.
Results: The system is evaluated on the THYME corpus, a set of manually annotated clinical records from Mayo Clinic. To further verify its effectiveness, the system is also evaluated on the TimeBank_Dense corpus. The experiments demonstrate that the system outperforms the current state-of-the-art models. The system also supports domain adaptation, i.e., it can be applied to brain cancer data when its model is trained on colon cancer data.
Conclusion: Our system extracts temporal expressions and event expressions and links them according to the order in which they actually occurred, which can structure the key information in complicated unstructured clinical records. Furthermore, we demonstrate that combining the piecewise representation method with an attention mechanism can capture more complete features. The system is flexible and can be extended to handle other document types.
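The abstract says only that input sentences are divided into three pieces; it does not say where the cuts fall. The sketch below assumes the common piecewise scheme in which the two target mentions (here an event and a time expression) delimit the pieces. The function name and the choice to keep both mentions in the middle piece are assumptions for illustration.

```python
def split_into_pieces(tokens, first_idx, second_idx):
    """Split a token list into (left, middle, right) pieces.

    Assumption: the two target mention indices delimit the pieces, and
    both mention tokens are kept in the middle piece.
    """
    lo, hi = sorted((first_idx, second_idx))
    left = tokens[:lo]
    middle = tokens[lo:hi + 1]
    right = tokens[hi + 1:]
    return left, middle, right


# Toy usage: event "surgery" at index 3, time expression "Monday" at index 5.
tokens = "The patient had surgery on Monday morning".split()
left, middle, right = split_into_pieces(tokens, 3, 5)
```

Each piece could then be encoded separately before the representations are combined, which is one way a model can keep track of where context sits relative to the two mentions.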
Affiliation(s)
- Zhijing Li
  - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Tech. R&D, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Chen Li
  - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Tech. R&D, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Yu Long
  - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Tech. R&D, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
- Xuan Wang
  - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
  - Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Tech. R&D, Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China
4
Liu X, Fan J, Dong S. Document-Level Biomedical Relation Extraction Leveraging Pretrained Self-Attention Structure and Entity Replacement: Algorithm and Pretreatment Method Validation Study. JMIR Med Inform 2020; 8:e17644. [PMID: 32469325 PMCID: PMC7314385 DOI: 10.2196/17644] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/30/2019] [Revised: 03/02/2020] [Accepted: 03/19/2020] [Indexed: 01/26/2023]
Abstract
Background: Most current methods for intrasentence relation extraction in the biomedical literature are inadequate for document-level relation extraction, in which a relationship may cross sentence boundaries. Hence, some approaches have been proposed that extract relations by splitting document-level datasets through heuristic rules and learning methods. However, these approaches may introduce additional noise and do not really solve the problem of intersentence relation extraction. It is challenging to avoid noise and extract cross-sentence relations.
Objective: This study aimed to avoid the errors introduced by dividing the document-level dataset, verify that a self-attention structure can extract biomedical relations in a document with long-distance dependencies and complex semantics, and discuss the relative benefits of different entity pretreatment methods for biomedical relation extraction.
Methods: This paper proposes a new data preprocessing method and applies a pretrained self-attention structure to document-level biomedical relation extraction, together with an entity replacement method, to capture very long-distance dependencies and complex semantics.
Results: Compared with state-of-the-art approaches, our method greatly improved precision and also increased the F1 value. Through experiments on biomedical entity pretreatments, we found that a model using an entity replacement method can improve performance.
Conclusions: When all target entity pairs in the document-level dataset are considered as a whole, a pretrained self-attention structure is suitable for capturing very long-distance dependencies and learning the textual context and complicated semantics. A replacement method for biomedical entities is conducive to biomedical relation extraction, especially document-level relation extraction.
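As a rough illustration of what an entity replacement pretreatment can look like, the sketch below swaps annotated mention spans for placeholder tokens before the text is passed to a model. The function name, the character-offset input format, and the placeholder strings such as `@GENE_A$` are hypothetical; the abstract does not specify the scheme used in the paper.

```python
def replace_entities(text, mentions):
    """Replace annotated entity mentions with placeholder tokens.

    `mentions` is a list of (start, end, placeholder) character spans.
    Spans are applied right to left so that earlier offsets stay valid
    while later parts of the string change length.
    """
    out = text
    for start, end, placeholder in sorted(mentions, key=lambda m: m[0], reverse=True):
        out = out[:start] + placeholder + out[end:]
    return out


# Toy usage with two gene mentions (offsets are for this sentence only).
sentence = "BRCA1 binds BARD1 in vivo."
mentions = [(0, 5, "@GENE_A$"), (12, 17, "@GENE_B$")]
masked = replace_entities(sentence, mentions)
```

Replacing surface forms this way keeps the model from memorizing specific entity names and makes every mention of a target entity look identical across a long document.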
Affiliation(s)
- Xiaofeng Liu
  - Communication and Computer Network Key Laboratory of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Jianye Fan
  - Communication and Computer Network Key Laboratory of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Shoubin Dong
  - Communication and Computer Network Key Laboratory of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
5
Bio-semantic relation extraction with attention-based external knowledge reinforcement. BMC Bioinformatics 2020; 21:213. [PMID: 32448122 PMCID: PMC7245897 DOI: 10.1186/s12859-020-3540-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/20/2019] [Accepted: 05/07/2020] [Indexed: 12/13/2022]
Abstract
Background: Semantic resources such as knowledge bases contain high-quality structured knowledge and therefore require significant effort from domain experts. Using these resources to reinforce information retrieval from unstructured text may further exploit the potential of both the unstructured text and the curated knowledge.
Results: The paper proposes a novel method that uses a deep neural network model incorporating prior knowledge to improve performance in the automated extraction of biological semantic relations from the scientific literature. The model is based on a recurrent neural network that combines the attention mechanism with semantic resources, i.e., UniProt and BioModels. Our method is evaluated on the BioNLP and BioCreative corpora, sets of manually annotated biological text. The experiments demonstrate that the method outperforms the current state-of-the-art models and that structured semantic information can improve the results of bio-text-mining.
Conclusion: The experimental results show that our approach can effectively make use of external prior knowledge and improve performance in the protein-protein interaction extraction task. Although it is validated on biomedical texts, the method should generalize to other types of data.
6
Caufield JH, Ping P. New advances in extracting and learning from protein-protein interactions within unstructured biomedical text data. Emerg Top Life Sci 2019; 3:357-369. [PMID: 33523203 DOI: 10.1042/etls20190003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/05/2019] [Revised: 07/11/2019] [Accepted: 07/16/2019] [Indexed: 12/14/2022]
Abstract
Protein-protein interactions, or PPIs, constitute a basic unit of our understanding of protein function. Though substantial effort has been made to organize PPI knowledge into structured databases, maintenance of these resources requires careful manual curation. Even then, many PPIs remain uncurated within unstructured text data. Extracting PPIs from experimental research supports assembly of PPI networks and highlights relationships crucial to elucidating protein functions. Isolating specific protein-protein relationships from numerous documents is technically demanding by both manual and automated means. Recent advances in the design of these methods have leveraged emerging computational developments and have demonstrated impressive results on test datasets. In this review, we discuss recent developments in PPI extraction from unstructured biomedical text. We explore the historical context of these developments, recent strategies for integrating and comparing PPI data, and their application to advancing the understanding of protein function. Finally, we describe the challenges facing the application of PPI mining to the text concerning protein families, using the multifunctional 14-3-3 protein family as an example.
Affiliation(s)
- J Harry Caufield
  - The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
- Peipei Ping
  - The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Medicine/Cardiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Department of Bioinformatics, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
  - Scalable Analytics Institute (ScAi), University of California at Los Angeles, Los Angeles, CA 90095, U.S.A
7
Zhou H, Liu Z, Ning S, Lang C, Lin Y, Du L. Knowledge-aware attention network for protein-protein interaction extraction. J Biomed Inform 2019; 96:103234. [PMID: 31202937 DOI: 10.1016/j.jbi.2019.103234] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/04/2018] [Revised: 06/06/2019] [Accepted: 06/13/2019] [Indexed: 11/19/2022]
Abstract
Protein-protein interaction (PPI) extraction from published scientific literature provides additional support for precision medicine efforts. However, many current PPI extraction methods need extensive feature engineering and cannot make full use of the prior knowledge in knowledge bases (KBs). KBs contain huge amounts of structured information about entities and relationships and therefore play a pivotal role in PPI extraction. This paper proposes a knowledge-aware attention network (KAN) to fuse prior knowledge about protein-protein pairs with context information for PPI extraction. The proposed model first adopts a diagonal-disabled multi-head attention mechanism to encode the context sequence along with knowledge representations learned from KBs. Then a novel multi-dimensional attention mechanism is used to select the features that best describe the encoded context. Experimental results on the BioCreative VI PPI dataset show that the proposed approach can acquire knowledge-aware dependencies between different words in a sequence and leads to new state-of-the-art performance.
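To make the term "diagonal-disabled" concrete, the sketch below shows a single attention head in which each token is prevented from attending to itself by masking the diagonal of the score matrix. It is a simplified illustration under stated assumptions: one head instead of multiple heads, no learned projections, no knowledge representations, and a hypothetical function name. It is not the KAN model itself.

```python
import math


def diagonal_disabled_attention(queries, keys, values):
    """Scaled dot-product attention with the diagonal masked out.

    Token i never attends to itself, so its output is built only from
    the other tokens. Requires at least two tokens.
    """
    n = len(queries)
    d = len(queries[0])
    outputs = []
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j:
                scores.append(float("-inf"))  # disable the diagonal entry
            else:
                dot = sum(q * k for q, k in zip(queries[i], keys[j]))
                scores.append(dot / math.sqrt(d))
        # Numerically stable softmax; exp(-inf) evaluates to 0.0.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        weights = [e / norm for e in exps]
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Toy usage: with two tokens, each output is exactly the other token's value.
x = [[1.0, 0.0], [0.0, 1.0]]
out = diagonal_disabled_attention(x, x, x)
```

Masking the diagonal forces each position to be described by its context rather than by its own embedding, which is the usual motivation for this kind of mask.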
Affiliation(s)
- Huiwei Zhou
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
- Zhuang Liu
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
- Shixian Ning
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
- Chengkun Lang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
- Yingyu Lin
  - School of Foreign Languages, Dalian University of Technology, Dalian 116024, Liaoning, China.
- Lei Du
  - School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, Liaoning, China.