1
Huang M, Jiang Z, Guo S. Phar-LSTM: a pharmacological representation-based LSTM network for drug-drug interaction extraction. PeerJ 2023; 11:e16606. [PMID: 38107590 PMCID: PMC10725669 DOI: 10.7717/peerj.16606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 11/15/2023] [Indexed: 12/19/2023] Open
Abstract
Pharmacological drug interactions are among the most common causes of medication errors. Over the last few years, many methods have been proposed to extract drug-drug interactions from the literature in order to reduce medication errors. However, the performance of these methods can be further improved. In this paper, we present a Pharmacological representation-based Long Short-Term Memory (LSTM) network named Phar-LSTM. In this method, a novel embedding strategy is proposed to extract pharmacological representations from the biomedical literature, and the information related to the target drug is considered. Then, an LSTM-based multi-task learning scheme is introduced to extract features from the different but related tasks according to their corresponding pharmacological representations. Finally, the extracted features are fed to the SoftMax classifier of the corresponding task. Experimental results on the DDIExtraction 2011 and DDIExtraction 2013 corpora show that the performance of Phar-LSTM is competitive with other state-of-the-art methods. Our Python implementation and the corresponding data of Phar-LSTM are available via DOI 10.5281/zenodo.8249384.
Affiliation(s)
- Mingqing Huang
  - School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Zhenchao Jiang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Shun Guo
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
2
Faessler E, Hahn U, Schäuble S. GePI: large-scale text mining, customized retrieval and flexible filtering of gene/protein interactions. Nucleic Acids Res 2023:7177881. [PMID: 37224532 DOI: 10.1093/nar/gkad445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Revised: 05/01/2023] [Accepted: 05/11/2023] [Indexed: 05/26/2023] Open
Abstract
We present GePI, a novel Web server for large-scale text mining of molecular interactions from the scientific biomedical literature. GePI leverages natural language processing techniques to identify genes and related entities, interactions between those entities and biomolecular events involving them. GePI supports rapid retrieval of interactions based on powerful search options to contextualize queries targeting (lists of) genes of interest. Contextualization is enabled by full-text filters constraining the search for interactions to either sentences or paragraphs, with or without pre-defined gene lists. Our knowledge graph is updated several times a week, ensuring that the most recent information is available at all times. The result page provides an overview of the outcome of a search, with accompanying interaction statistics and visualizations. A table (downloadable in Excel format) gives direct access to the retrieved interaction pairs, together with information about the molecular entities, the factual certainty of the interactions (as expressed verbatim by the authors), and a text snippet from the original document that verbalizes each interaction. In summary, our Web application offers free, easy-to-use, and up-to-date monitoring of gene and protein interaction information, along with flexible query formulation and filtering options. GePI is available at https://gepi.coling.uni-jena.de/.
Affiliation(s)
- Erik Faessler
  - Jena University Language and Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, Fürstengraben 30, 07743 Jena, Germany
- Udo Hahn
  - Jena University Language and Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, Fürstengraben 30, 07743 Jena, Germany
- Sascha Schäuble
  - Jena University Language and Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, Fürstengraben 30, 07743 Jena, Germany
  - Microbiome Dynamics, Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI), 07745 Jena, Germany
3
He X, Tai P, Lu H, Huang X, Ren Y. A biomedical event extraction method based on fine-grained and attention mechanism. BMC Bioinformatics 2022; 23:308. [PMID: 35906547 PMCID: PMC9336007 DOI: 10.1186/s12859-022-04854-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 07/21/2022] [Indexed: 11/24/2022] Open
Abstract
Background: Biomedical event extraction is a fundamental task in biomedical text mining, which provides inspiration for medical research and disease prevention. Biomedical events include simple events and complex events. Existing biomedical event extraction methods usually deal with simple events and complex events uniformly, and the performance of complex event extraction is relatively low. Results: In this paper, we propose a fine-grained Bidirectional Long Short Term Memory method for biomedical event extraction, which designs different argument detection models for simple and complex events respectively. In addition, multi-level attention is designed to improve the performance of complex event extraction, and sentence embeddings are integrated to obtain sentence-level information, which can resolve the ambiguities for some types of events. Our method achieves state-of-the-art performance on the commonly used dataset Multi-Level Event Extraction. Conclusions: The sentence embeddings enrich the global sentence-level information. The fine-grained argument detection model improves the performance of complex biomedical event extraction. Furthermore, the multi-level attention mechanism enhances the interactions among relevant arguments. The experimental results demonstrate the effectiveness of the proposed method for biomedical event extraction.
Affiliation(s)
- Xinyu He
  - School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning, China
  - Information and Communication Engineering Postdoctoral Research Station, Dalian University of Technology, Dalian, Liaoning, China
  - Postdoctoral Workstation of Dalian Yongjia Electronic Technology Co., Ltd, Dalian, Liaoning, China
- Ping Tai
  - FIL Technology Limited, Dalian, Liaoning, China
- Hongbin Lu
  - Information and Communication Engineering Postdoctoral Research Station, Dalian University of Technology, Dalian, Liaoning, China
- Xin Huang
  - Anshan Normal University, Anshan, Liaoning, China
- Yonggong Ren
  - School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning, China
4
Zhao W, Zhang J, Yang J, He T, Ma H, Li Z. A novel joint biomedical event extraction framework via two-level modeling of documents. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.10.047] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/24/2023]
5
Wang H, Tao Q, Du S, Luo X. An Extensible Framework of Leveraging Syntactic Skeleton for Semantic Relation Classification. ACM T ASIAN LOW-RESO 2020. [DOI: 10.1145/3402885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Relation classification is one of the most fundamental upstream tasks in natural language processing and information extraction. State-of-the-art approaches make use of various deep neural networks (DNNs) to extract higher-level features directly. They can readily achieve accurate classification results by taking advantage of both local entity features and global sentential features. Recent work on relation classification has focused on modifying these neural networks, but less attention has been paid to the design of syntactic features. However, from a linguistic perspective, syntactic features are essential for relation classification. In this article, we present a novel linguistically motivated approach that enhances relation classification by imposing additional syntactic constraints. We investigate leveraging syntactic skeletons along with the sentential contexts to identify hidden relation types. The syntactic skeletons are extracted under the guidance of prior syntax knowledge. During extraction, the input sentences are recursively decomposed into syntactically shorter and simpler chunks. Experimental results on the SemEval-2010 Task 8 benchmark show that incorporating syntactic skeletons into current DNN models enhances the task of relation classification. Our systems significantly surpass two strong baseline systems. One of the substantial advantages of our proposal is that this framework is extensible to most current DNN models.
Affiliation(s)
- Hao Wang
  - Shanghai University, Shanghai, China
- Siyuan Du
  - Shanghai University, Shanghai, China
6
Yu X, Rong W, Liu J, Zhou D, Ouyang Y, Xiong Z. LSTM-Based End-to-End Framework for Biomedical Event Extraction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:2029-2039. [PMID: 31095491 DOI: 10.1109/tcbb.2019.2916346] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]
Abstract
Biomedical event extraction plays an important role in the extraction of biological information from large-scale scientific publications. However, most state-of-the-art systems separate this task into several steps, which leads to cascading errors. In addition, it is complicated to generate features from syntactic and dependency analysis separately. Therefore, in this paper, we propose an end-to-end model based on long short-term memory (LSTM) to optimize biomedical event extraction. Experimental results demonstrate that our approach improves the performance of biomedical event extraction. We achieve average F1-scores of 59.68, 58.23, and 57.39 percent on the BioNLP09, BioNLP11, and BioNLP13's Genia event datasets, respectively. The experimental study has shown our proposed model's potential in biomedical event extraction.
7
Fezai R, Abodayeh K, Mansouri M, Nounou H, Nounou M. Fault diagnosis of biological systems using improved machine learning technique. INT J MACH LEARN CYB 2020. [DOI: 10.1007/s13042-020-01184-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/14/2023]
8
Zhu L, Zheng H. Biomedical event extraction with a novel combination strategy based on hybrid deep neural networks. BMC Bioinformatics 2020; 21:47. [PMID: 32028883 PMCID: PMC7006190 DOI: 10.1186/s12859-020-3376-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/13/2019] [Accepted: 01/20/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Biomedical event extraction is a fundamental and in-demand technology that has attracted substantial interest from many researchers. Previous works have relied heavily on manually designed features and external NLP packages, for which the feature engineering is large and complex. Additionally, most existing works use a pipeline process that breaks a task down into simple sub-tasks but ignores the interaction between them. To overcome these limitations, we propose a novel event combination strategy based on hybrid deep neural networks to settle the task in a joint end-to-end manner. Results: We adapted our method to several annotated corpora of biomedical event extraction tasks. Our method achieved state-of-the-art performance with noticeable overall F1 score improvement compared to that of existing methods for all of these corpora. Conclusions: The experimental results demonstrated that our method is effective for biomedical event extraction. The combination strategy can reconstruct complex events from the output of deep neural networks, while the deep neural networks effectively capture the feature representation from the raw text. The biomedical event extraction implementation is available online at http://www.predictor.xin/event_extraction.
Affiliation(s)
- Lvxing Zhu
  - School of Computer Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
- Haoran Zheng
  - School of Computer Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
  - Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
  - Anhui Province Key Lab. of Big Data Analysis and Application, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
9
Hoyt CT, Domingo-Fernández D, Hofmann-Apitius M. BEL Commons: an environment for exploration and analysis of networks encoded in Biological Expression Language. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2018:5255171. [PMID: 30576488 PMCID: PMC6301338 DOI: 10.1093/database/bay126] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/03/2018] [Accepted: 11/05/2018] [Indexed: 12/19/2022]
Abstract
The rapid accumulation of knowledge in the field of systems and networks biology during recent years requires complex, but user-friendly and accessible web applications that support everything from visualization to complex algorithmic analysis. While several web applications exist with various focuses on creation, revision, curation, storage, integration, collaboration, exploration, visualization and analysis, many of these services remain disjoint and have yet to be packaged into a cohesive environment. Here, we present BEL Commons: an integrative knowledge discovery environment for networks encoded in the Biological Expression Language (BEL). Users can upload files in BEL to be parsed, validated, compiled and stored with fine-grained permissions. Afterwards, users can summarize, explore and optionally share their networks with the scientific community. We have implemented a query builder wizard to help users find the relevant portions of increasingly large and complex networks and a visualization interface that allows them to explore their resulting networks. Finally, we have included a dedicated analytical service for performing data-driven analysis of knowledge networks to support hypothesis generation.
Affiliation(s)
- Charles Tapley Hoyt
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Daniel Domingo-Fernández
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
10
Rindflesch TC, Blake CL, Fiszman M, Kilicoglu H, Rosemblat G, Schneider J, Zeiss CJ. Informatics Support for Basic Research in Biomedicine. ILAR J 2017; 58:80-89. [PMID: 28838071 DOI: 10.1093/ilar/ilx004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/15/2016] [Accepted: 01/13/2017] [Indexed: 11/13/2022] Open
Abstract
Informatics methodologies exploit computer-assisted techniques to help biomedical researchers manage large amounts of information. In this paper, we focus on the biomedical research literature (MEDLINE). We first provide an overview of some text mining techniques that offer assistance in research by identifying biomedical entities (e.g., genes, substances, and diseases) and relations between them in text. We then discuss Semantic MEDLINE, an application that integrates PubMed document retrieval, concept and relation identification, and visualization, thus enabling a user to explore concepts and relations from within a set of retrieved citations. Semantic MEDLINE provides a roadmap through content and helps users discern patterns in large numbers of retrieved citations. We illustrate its use with an informatics method we call "discovery browsing," which provides a principled way of navigating through selected aspects of some biomedical research area. The method supports an iterative process that accommodates learning and hypothesis formation in which a user is provided with high-level connections before delving into details. As a use case, we examine current developments in basic research on mechanisms of Alzheimer's disease. Out of the nearly 90 000 citations returned by the PubMed query "Alzheimer's disease," discovery browsing led us to 73 citations on sortilin and that disorder. We provide a synopsis of the basic research reported in 15 of these. There is widespread consensus among researchers working with a range of animal models and human cells that increased sortilin expression and decreased receptor expression are associated with amyloid beta and/or amyloid precursor protein.
Affiliation(s)
- Thomas C Rindflesch
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland
- Catherine L Blake
  - School of Information Sciences, University of Illinois, Urbana-Champaign; Center for Informatics in Science and Scholarship
- Marcelo Fiszman
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland
- Halil Kilicoglu
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland
- Graciela Rosemblat
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland
- Jodi Schneider
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois
- Caroline J Zeiss
  - Yale University School of Medicine, New Haven, Connecticut
11
Active learning for ontological event extraction incorporating named entity recognition and unknown word handling. J Biomed Semantics 2016; 7:22. [PMID: 27127603 PMCID: PMC4849099 DOI: 10.1186/s13326-016-0059-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/21/2015] [Accepted: 03/28/2016] [Indexed: 11/15/2022] Open
Abstract
Background: Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to extending the mining targets is the cost of manually constructing labeled data, which are required for state-of-the-art supervised learning systems. Active learning chooses the most informative documents for supervised learning in order to reduce the amount of manual annotation required. Previous work on active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. It also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. Methods: Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method to the task of named entity recognition.
Results and conclusion: We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than previous methods such as entropy-based and Gibbs error-based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method to named entity recognition tasks also improves the document selection for manual annotation of named entities.
12
Abstract
Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area.
Affiliation(s)
- Julia Hirschberg
  - Department of Computer Science, Columbia University, New York, NY 10027, USA
- Christopher D Manning
  - Department of Linguistics, Stanford University, Stanford, CA 94305-2150, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305-9020, USA
13
Pyysalo S, Ohta T, Rak R, Rowley A, Chun HW, Jung SJ, Choi SP, Tsujii J, Ananiadou S. Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013. BMC Bioinformatics 2015; 16 Suppl 10:S2. [PMID: 26202570 PMCID: PMC4511510 DOI: 10.1186/1471-2105-16-s10-s2] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two event extraction tasks introduced in the BioNLP Shared Task 2013. The CG task focuses on cancer, emphasizing the extraction of physiological and pathological processes at various levels of biological organization, and the PC task targets reactions relevant to the development of biomolecular pathway models, defining its extraction targets on the basis of established pathway representations and ontologies. RESULTS Six groups participated in the CG task and two groups in the PC task, together applying a wide range of extraction approaches including both established state-of-the-art systems and newly introduced extraction methods. The best-performing systems achieved F-scores of 55% on the CG task and 53% on the PC task, demonstrating a level of performance comparable to the best results achieved in similar previously proposed tasks. CONCLUSIONS The results indicate that existing event extraction technology can generalize to meet the novel challenges represented by the CG and PC task settings, suggesting that extraction methods are capable of supporting the construction of knowledge bases on the molecular mechanisms of cancer and the curation of biomolecular pathway models. The CG and PC tasks continue as open challenges for all interested parties, with data, tools and resources available from the shared task homepage.
Affiliation(s)
- Sampo Pyysalo
  - Department of Information technology, University of Turku, Turku, Finland
- Rafal Rak
  - National Centre for Text Mining and School of Computer Science, University of Manchester, Manchester, UK
- Andrew Rowley
  - National Centre for Text Mining and School of Computer Science, University of Manchester, Manchester, UK
- Hong-Woo Chun
  - Software Research Center, Korea Institute of Science and Technology Information (KISTI), Daejeon, South Korea
- Sung-Jae Jung
  - Software Research Center, Korea Institute of Science and Technology Information (KISTI), Daejeon, South Korea
  - Department of Applied Information Science, University of Science and Technology (UST), Daejeon, South Korea
- Sung-Pil Choi
  - Department of Library and Information Science, Kyonggi University, Suwon, South Korea
- Sophia Ananiadou
  - National Centre for Text Mining and School of Computer Science, University of Manchester, Manchester, UK
14
A Maximum Entropy-Based Bio-Molecular Event Extraction Model that Considers Event Generation. JOURNAL OF INFORMATION PROCESSING SYSTEMS 2014. [DOI: 10.3745/jips.04.0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
15
Zhou D, Zhong D, He Y. Biomedical relation extraction: from binary to complex. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2014; 2014:298473. [PMID: 25214883 PMCID: PMC4156999 DOI: 10.1155/2014/298473] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 04/02/2014] [Revised: 06/09/2014] [Accepted: 07/08/2014] [Indexed: 12/01/2022]
Abstract
Biomedical relation extraction aims to uncover high-quality relations from life science literature with high accuracy and efficiency. Early biomedical relation extraction tasks focused on capturing binary relations, such as protein-protein interactions, which are crucial for virtually every process in a living cell. Information about these interactions provides the foundations for new therapeutic approaches. In recent years, interest has shifted toward the extraction of complex relations such as biomolecular events. Complex relations go beyond binary relations: they can involve more than two arguments and may also take another relation as an argument. In this paper, we conduct a thorough survey of research in biomedical relation extraction. We first present a general framework for biomedical relation extraction and then discuss the approaches proposed for binary and complex relation extraction, with a focus on the latter since it is a much more difficult task than binary relation extraction. Finally, we discuss the challenges facing complex relation extraction and outline possible solutions and future directions.
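The distinction drawn in this abstract between binary and complex relations can be illustrated with a small data-structure sketch. It is not taken from the survey: the class names `Entity`, `BinaryRelation` and `Event`, the `depth` helper, and the example sentence fragments are all hypothetical, chosen only to show how an event may take another event as an argument while a binary relation links exactly two entities.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity mention, e.g. a protein (hypothetical schema)."""
    text: str
    kind: str


@dataclass(frozen=True)
class BinaryRelation:
    """A binary relation: exactly two entity arguments."""
    kind: str
    left: Entity
    right: Entity


@dataclass(frozen=True)
class Event:
    """A complex relation: any number of role-labelled arguments,
    each of which is either an entity or another event."""
    kind: str
    trigger: str
    arguments: Tuple[Tuple[str, Union[Entity, "Event"]], ...]


def depth(event: Event) -> int:
    """Nesting depth: 1 for a flat event, +1 per level of nested events."""
    nested = [depth(arg) for _, arg in event.arguments if isinstance(arg, Event)]
    return 1 + max(nested, default=0)


# Illustrative example only; the entities and triggers are made up for this sketch.
traf2 = Entity("TRAF2", "Protein")
cd40 = Entity("CD40", "Protein")

ppi = BinaryRelation("interaction", traf2, cd40)

phosphorylation = Event("Phosphorylation", "phosphorylation", (("Theme", traf2),))
binding = Event("Binding", "binding", (("Theme", traf2), ("Theme2", cd40)))
regulation = Event(
    "Negative_regulation",
    "inhibits",
    (("Cause", phosphorylation), ("Theme", binding)),
)

print(depth(phosphorylation), depth(regulation))
```

In this sketch the regulation event has two events as arguments, which is exactly the case a flat binary representation such as `ppi` cannot express.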
Affiliation(s)
- Deyu Zhou
- School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 210096, China
- Dayou Zhong
- School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 210096, China
- Yulan He
- School of Engineering and Applied Science, Aston University, Birmingham B4 7ET, UK
|
16
|
Xia J, Fang AC, Zhang X. A novel feature selection strategy for enhanced biomedical event extraction using the Turku system. BioMed Research International 2014; 2014:205239. [PMID: 24800214 PMCID: PMC3997098 DOI: 10.1155/2014/205239] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 01/14/2014] [Revised: 02/22/2014] [Accepted: 03/03/2014] [Indexed: 12/25/2022]
Abstract
Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES), the best-performing tool in the GENIA BioNLP 2009/2011 shared tasks, relies heavily on high-dimensional features. This paper describes research that, based on an implementation of an accumulated effect evaluation (AEE) algorithm applying a greedy search strategy, analyses the contribution of every feature class in TEES with a view to identifying important features and modifying the feature set accordingly. With the updated feature set, the resulting system achieves an F-score of 53.27%, up from 51.21%, for Task 1 under strict evaluation criteria, and 57.24% under the approximate span and recursive criterion.
Affiliation(s)
- Jingbo Xia
- College of Science, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
- Alex Chengyu Fang
- Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
- The Halliday Centre for Intelligent Applications of Language Studies, City University of Hong Kong, Kowloon, Hong Kong
- Xing Zhang
- Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
- The Halliday Centre for Intelligent Applications of Language Studies, City University of Hong Kong, Kowloon, Hong Kong
|
17
|
Abstract
Background Time delays are important factors that are often neglected in gene regulatory network (GRN) inference models. Validating time delays against knowledge bases is a challenge, since the vast majority of biological databases do not record temporal information about gene regulation. Biological knowledge and facts on gene regulation are typically extracted from the biomedical literature with specialized methods that depend on the regulation task. In this paper, we mine evidence for time delays related to the transcriptional regulation of yeast from PubMed abstracts.
Results Since the vast majority of abstracts lack quantitative time information, we can only collect qualitative evidence of time delays. Specifically, a speed-up or delay in the transcriptional regulation rate can provide evidence for shorter or longer time delays in a GRN. Thus, we focus on deriving events related to rate changes in transcriptional regulation. A corpus of abstracts related to yeast regulation was manually labeled with such events. To capture these events automatically, we create an ontology of sub-processes that are likely to result in transcription rate changes by combining textual patterns and biological knowledge. We also propose effective feature extraction methods based on the created ontology to identify direct evidence with specific details of these events. Our ontologies outperform existing state-of-the-art gene regulation ontologies in the automatic rule learning method applied to our corpus. The proposed deterministic ontology rule-based method achieves performance comparable to the automatic rule learning method based on decision trees. This demonstrates the effectiveness of our ontology in identifying rate-changing events. We also tested the effectiveness of the proposed feature mining methods in detecting direct evidence of events. Experimental results show that the machine learning method using these features achieves an F1-score of 71.43%.
Conclusions The manually labeled corpus of events relating to rate changes in transcriptional regulation for yeast is available at https://sites.google.com/site/wentingntu/data. The created ontologies summarize both the biological causes of rate changes in transcriptional regulation and the corresponding positive and negative textual patterns from the corpus. They are demonstrated to be effective in identifying rate-changing events, which shows the benefits of combining textual patterns and biological knowledge for extracting complex biological events.
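For readers unfamiliar with the metric reported in this abstract, the F1-score is conventionally the harmonic mean of precision and recall. The sketch below shows that standard definition only; the function name `f1_score` and the counts used are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives, which also avoids
    division by zero when nothing was predicted or nothing was expected.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration; NOT results from the paper.
score = f1_score(true_positives=8, false_positives=2, false_negatives=4)
print(f"F1 = {score:.2%}")  # prints "F1 = 72.73%"
```

With these made-up counts, precision is 0.80 and recall is about 0.67, so F1 falls between the two and closer to the lower value, as a harmonic mean does.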
|
18
|
U-Compare bio-event meta-service: compatible BioNLP event extraction services. BMC Bioinformatics 2011; 12:481. [PMID: 22177292 PMCID: PMC3299809 DOI: 10.1186/1471-2105-12-481] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/31/2011] [Accepted: 12/18/2011] [Indexed: 11/10/2022] Open
|