1
|
Zhang Y, Peng J, Cheng B, Liu Y, Jiang C. MMR: A Multi-view Merge Representation model for Chemical-Disease relation extraction. Comput Biol Chem 2024; 110:108063. [PMID: 38613989 DOI: 10.1016/j.compbiolchem.2024.108063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 03/13/2024] [Accepted: 03/25/2024] [Indexed: 04/15/2024]
Abstract
Chemical-Disease relation (CDR) extraction aims to identify the semantic relations between chemical and disease entities in the unstructured biomedical document, which provides a basis for downstream tasks such as clinical medical diagnosis and drug discovery. Compared with general domain relation extraction, it needs a more effective representation of the whole document due to the specialized nature of texts in the biomedical domain, including the biomedical entity and entity-pair representation. In this paper, we propose a novel Multi-view Merge Representation (MMR) model to thoroughly capture entity and entity-pair representation of the document. First, we utilize prior knowledge and a pre-trained transformer encoder to capture entity semantic representation. Then we employ the U-Net layer and Graph Convolution Network layer to capture global entity-pair representation. Finally, we get a specific merged representation for each entity pair to be classified. We evaluate our model on the CDR dataset published by the BioCreative-V community and achieve a state-of-the-art result.
Affiliation(s)
- Yi Zhang
- Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China.
| | - Jing Peng
- Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China.
| | - Baitai Cheng
- Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China.
| | - Yang Liu
- Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China.
| | - Chi Jiang
- Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China.
|
2
|
Cai L, Li J, Lv H, Liu W, Niu H, Wang Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J Biomed Inform 2023; 143:104418. [PMID: 37290540 DOI: 10.1016/j.jbi.2023.104418] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/16/2022] [Revised: 04/24/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023]
Abstract
The past decade has witnessed an explosion of textual information in the biomedical field. Biomedical texts provide a basis for healthcare delivery, knowledge discovery, and decision-making. Over the same period, deep learning has achieved remarkable performance in biomedical natural language processing; however, its development has been limited by the scarcity of well-annotated datasets and by poor interpretability. To address this, researchers have considered combining domain knowledge (such as biomedical knowledge graphs) with biomedical data, which has become a promising means of introducing more information into biomedical datasets and following evidence-based medicine. This paper comprehensively reviews more than 150 recent studies on incorporating domain knowledge into deep learning models to facilitate typical biomedical text analysis tasks, including information extraction, text classification, and text generation. Finally, we discuss various challenges and future directions.
Affiliation(s)
- Linkun Cai
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
| | - Jia Li
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
| | - Han Lv
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
| | - Wenjuan Liu
- Aerospace Center Hospital, 100049 Beijing, China
| | - Haijun Niu
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
| | - Zhenchang Wang
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China.
|
3
|
Trajanov D, Trajkovski V, Dimitrieva M, Dobreva J, Jovanovik M, Klemen M, Žagar A, Robnik-Šikonja M. Review of Natural Language Processing in Pharmacology. Pharmacol Rev 2023; 75:714-738. [PMID: 36931724 DOI: 10.1124/pharmrev.122.000715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/04/2022] [Revised: 01/18/2023] [Accepted: 03/07/2023] [Indexed: 03/19/2023] Open
Abstract
Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process the human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the past few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adversarial drug interactions in social media. We split our coverage into five categories to survey modern NLP: methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in a tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers. SIGNIFICANCE STATEMENT: The main objective of this work is to survey the recent use of NLP in the field of pharmacology in order to provide a comprehensive overview of the current state in the area after the rapid developments that occurred in the past few years. The resulting survey will be useful to practitioners and interested observers in the domain.
Affiliation(s)
- Dimitar Trajanov
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
| | - Vangel Trajkovski
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
| | - Makedonka Dimitrieva
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
| | - Jovana Dobreva
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
| | - Milos Jovanovik
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
| | - Matej Klemen
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
| | - Aleš Žagar
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
| | - Marko Robnik-Šikonja
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.- Š.)
|
4
|
Sun Y, Wang J, Lin H, Zhang Y, Yang Z. Knowledge Guided Attention and Graph Convolutional Networks for Chemical-Disease Relation Extraction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:489-499. [PMID: 34962873 DOI: 10.1109/tcbb.2021.3135844] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
The automatic extraction of chemical-disease relations (CDR) from text has become critical because extracting valuable CDRs manually takes a lot of time and effort. Studies have shown that prior knowledge from biomedical knowledge bases is important for relation extraction, so methods that combine deep learning models with prior knowledge are worth studying. In this paper, we propose a new model called Knowledge Guided Attention and Graph Convolutional Networks (KGAGN) for CDR extraction. First, to take full advantage of domain knowledge, we train entity embeddings as a feature representation of the input sequence, and relation embeddings to further capture weighted contextual information through the attention mechanism. Then, to take full advantage of syntactic dependency information in cross-sentence CDR extraction, we construct document-level syntactic dependency graphs and encode them using a graph convolution network (GCN). Finally, the chemical-induced disease (CID) relation is extracted using weighted context features and long-range dependency features, both of which contain additional knowledge information. We evaluated our model on the CDR dataset published by the BioCreative-V community and achieved an F1-score of 73.3%, surpassing other state-of-the-art methods. The code, implemented with the PyTorch 1.7.0 deep learning library, can be downloaded from GitHub: https://github.com/sunyi123/cdr.
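As a rough illustration of the graph-convolution step mentioned in this abstract, the sketch below applies one GCN propagation step to a toy dependency graph. It is a minimal sketch, not the KGAGN implementation: the symmetric normalization, the ReLU activation, and the names `normalize_adjacency` and `gcn_layer` are assumptions chosen for illustration, and the abstract does not specify which GCN variant the authors use.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (assumed form)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product for small illustrative matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy dependency graph over three tokens: 0 - 1 - 2 (undirected chain).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely for readability
hidden = gcn_layer(adj, features, weight)
```

After one step, each token's vector mixes in its syntactic neighbors; stacking such layers is what lets information travel across sentence boundaries in a document-level graph.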
|
5
|
Gu J, Chersoni E, Wang X, Huang CR, Qian L, Zhou G. LitCovid ensemble learning for COVID-19 multi-label classification. Database (Oxford) 2022; 2022:6846687. [PMID: 36426767 PMCID: PMC9693804 DOI: 10.1093/database/baac103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Revised: 10/27/2022] [Accepted: 11/04/2022] [Indexed: 11/27/2022]
Abstract
The Coronavirus Disease 2019 (COVID-19) pandemic has shifted the focus of research worldwide, and more than 10 000 new articles per month have concentrated on COVID-19-related topics. Considering this rapidly growing literature, the efficient and precise extraction of the main topics of COVID-19-relevant articles is of great importance. The manual curation of this information for biomedical literature is labor-intensive and time-consuming, and as such the procedure is insufficient and difficult to maintain. In response to these complications, the BioCreative VII community has proposed a challenging task, LitCovid Track, calling for a global effort to automatically extract semantic topics for COVID-19 literature. This article describes our work on the BioCreative VII LitCovid Track. We proposed the LitCovid Ensemble Learning (LCEL) method for the tasks and integrated multiple biomedical pretrained models to address the COVID-19 multi-label classification problem. Specifically, seven different transformer-based pretrained models were ensembled for the initialization and fine-tuning processes independently. To enhance the representation abilities of the deep neural models, diverse additional biomedical knowledge was utilized to facilitate the fruitfulness of the semantic expressions. Simple yet effective data augmentation was also leveraged to address the learning deficiency during the training phase. In addition, given the imbalanced label distribution of the challenging task, a novel asymmetric loss function was applied to the LCEL model, which explicitly adjusted the negative-positive importance by assigning different exponential decay factors and helped the model focus on the positive samples. After the training phase, an ensemble bagging strategy was adopted to merge the outputs from each model for final predictions. The experimental results show the effectiveness of our proposed approach, as LCEL obtains the state-of-the-art performance on the LitCovid dataset. 
Database URL: https://github.com/JHnlp/LCEL.
Affiliation(s)
| | - Emmanuele Chersoni
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong 999077, China
| | - Xing Wang
- Tencent AI Lab, Shenzhen 518071, China
| | - Chu-Ren Huang
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong 999077, China
| | - Longhua Qian
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
| | - Guodong Zhou
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
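The asymmetric loss described in the LCEL abstract above can be illustrated with a small sketch. This assumes a focal-style form in which positive and negative labels receive different focusing exponents; the function name `asymmetric_loss`, the default exponents, and the clamping constant are assumptions for illustration, and the abstract does not confirm the exact formula the authors used.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0, eps=1e-8):
    """Mean asymmetric loss over independent binary labels.

    probs   -- predicted probabilities, one per label
    targets -- 0/1 ground-truth values, one per label
    A larger gamma_neg shrinks the loss of easy negatives (small p),
    so the abundant negative labels dominate training less.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            total += -(p ** gamma_neg) * math.log(max(1.0 - p, eps))
    return total / len(probs)
```

With both exponents set to zero the function reduces to ordinary binary cross-entropy, which makes the effect of the asymmetry easy to compare on the same predictions.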
|
6
|
Li Z, Wang M, Peng D, Liu J, Xie Y, Dai Z, Zou X. Identification of Chemical-Disease Associations Through Integration of Molecular Fingerprint, Gene Ontology and Pathway Information. Interdiscip Sci 2022; 14:683-696. [PMID: 35391615 DOI: 10.1007/s12539-022-00511-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 03/16/2022] [Accepted: 03/17/2022] [Indexed: 06/14/2023]
Abstract
The identification of chemical-disease association types is helpful not only for discovering lead compounds and studying drug repositioning, but also for treating disease and deciphering pathomechanisms. There is an urgent need for computational methods that identify potential chemical-disease association types, since wet-lab methods are usually expensive, laborious and time-consuming. In this study, molecular fingerprints, gene ontology and pathways are used to characterize chemicals and diseases. A novel predictor is proposed to recognize potential chemical-disease associations at the first layer, and to further distinguish whether they are biomarker or therapeutic relations at the second layer. The prediction performance of the method is assessed on the benchmark dataset using ten-fold cross-validation. The prediction accuracies of the first layer and the second layer are 78.47% and 72.07%, respectively. The recognition ability for lead compounds, new drug indications, and potential and true chemical-disease association pairs has also been investigated and confirmed by constructing a variety of datasets and performing a series of experiments. It is anticipated that the method can serve as a powerful high-throughput virtual screening tool for drug research and development.
Affiliation(s)
- Zhanchao Li
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China.
- NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, 510006, People's Republic of China.
- Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, 510006, People's Republic of China.
| | - Mengru Wang
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
| | - Dongdong Peng
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
| | - Jie Liu
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
| | - Yun Xie
- HuiZhou University, Huizhou, 516007, People's Republic of China
| | - Zong Dai
- School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
| | - Xiaoyong Zou
- School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China.
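The ten-fold cross-validation mentioned in the abstract above can be sketched as a plain index split. This is only an illustration of the evaluation protocol, not the authors' code: the helper name `k_fold_indices` is hypothetical, and the sketch does not shuffle or stratify, which a real evaluation would normally do.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so all samples are used once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy gives a single estimate over the whole benchmark.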
|
7
|
Stocker M, Heger T, Schweidtmann A, Ćwiek-Kupczyńska H, Penev L, Dojchinovski M, Willighagen E, Vidal ME, Turki H, Balliet D, Tiddi I, Kuhn T, Mietchen D, Karras O, Vogt L, Hellmann S, Jeschke J, Krajewski P, Auer S. SKG4EOSC - Scholarly Knowledge Graphs for EOSC: Establishing a backbone of knowledge graphs for FAIR Scholarly Information in EOSC. Research Ideas and Outcomes 2022. [DOI: 10.3897/rio.8.e83789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
In the age of advanced information systems powering fast-paced knowledge economies that face global societal challenges, it is no longer adequate to express scholarly information - an essential resource for modern economies - primarily as article narratives in document form. Despite being a well-established tradition in scholarly communication, PDF-based text publishing is hindering scientific progress as it buries scholarly information into non-machine-readable formats. The key objective of SKG4EOSC is to improve science productivity through development and implementation of services for text and data conversion, and production, curation, and re-use of FAIR scholarly information. This will be achieved by (1) establishing the Open Research Knowledge Graph (ORKG, orkg.org), a service operated by the SKG4EOSC coordinator, as a Hub for access to FAIR scholarly information in the EOSC; (2) lifting to EOSC of numerous and heterogeneous domain-specific research infrastructures through the ORKG Hub’s harmonized access facilities; and (3) leverage the Hub to support cross-disciplinary research and policy decisions addressing societal challenges. SKG4EOSC will pilot the devised approaches and technologies in four research domains: biodiversity crisis, precision oncology, circular processes, and human cooperation. With the aim to improve machine-based scholarly information use, SKG4EOSC addresses an important current and future need of researchers. It extends the application of the FAIR data principles to scholarly communication practices, hence a more comprehensive coverage of the entire research lifecycle. Through explicit, machine actionable provenance links between FAIR scholarly information, primary data and contextual entities, it will substantially contribute to reproducibility, validation and trust in science. The resulting advanced machine support will catalyse new discoveries in basic research and solutions in key application areas.
|
8
|
Yang J, Han SC, Poon J. A survey on extraction of causal relations from natural language text. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01665-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
As an essential component of human cognition, cause–effect relations appear frequently in text, and curating cause–effect relations from text helps in building causal networks for predictive tasks. Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches. Each method has its advantages and weaknesses. For example, knowledge-based methods are understandable but require extensive manual domain knowledge and have poor cross-domain applicability. Statistical machine learning methods are more automated because of natural language processing (NLP) toolkits. However, feature engineering is labor-intensive, and toolkits may lead to error propagation. In the past few years, deep learning techniques have attracted substantial attention from NLP researchers because of their powerful representation learning ability and the rapid increase in computational resources. Their limitations include high computational costs and a lack of adequate annotated training data. In this paper, we conduct a comprehensive survey of causality extraction. We first introduce the primary forms of causality found in causality extraction: explicit intra-sentential causality, implicit causality, and inter-sentential causality. Next, we list benchmark datasets and modeling assessment methods for causal relation extraction. Then, we present a structured overview of the three techniques with their representative systems. Lastly, we highlight existing open challenges with their potential directions.
|
9
|
Chen J, Hu B, Peng W, Chen Q, Tang B. Biomedical relation extraction via knowledge-enhanced reading comprehension. BMC Bioinformatics 2022; 23:20. [PMID: 34991458 PMCID: PMC8734165 DOI: 10.1186/s12859-021-04534-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/03/2020] [Accepted: 12/13/2021] [Indexed: 12/01/2022] Open
Abstract
Background: In biomedical research, chemical and disease relation extraction from unstructured biomedical literature is an essential task. Effective context understanding and knowledge integration are two main research problems in this task. Most work on relation extraction focuses on classification of entity mention pairs. Inspired by the effectiveness of machine reading comprehension (RC) for context understanding, solving biomedical relation extraction with the RC framework at both intra-sentential and inter-sentential levels is a new topic worth exploring. In addition to unstructured biomedical text, many structured knowledge bases (KBs) provide valuable guidance for biomedical relation extraction, so utilizing knowledge in the RC framework is also worth investigating. We propose a knowledge-enhanced reading comprehension (KRC) framework to leverage reading comprehension and prior knowledge for biomedical relation extraction. First, we generate questions for each relation, which reformulates the relation extraction task as a question answering task. Second, based on the RC framework, we integrate knowledge representation through an efficient knowledge-enhanced attention interaction mechanism to guide the biomedical relation extraction. Results: The proposed model was evaluated on the BioCreative V CDR dataset and the CHR dataset. Experiments show that our model achieved competitive document-level F1 scores of 71.18% and 93.3%, respectively, compared with other methods. Conclusion: Result analysis reveals that open-domain reading comprehension data and knowledge representation can help improve biomedical relation extraction in our proposed KRC framework. Our work can encourage more research on bridging reading comprehension and biomedical relation extraction and promote biomedical relation extraction.
Affiliation(s)
- Jing Chen
- Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China
| | - Baotian Hu
- Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China.
| | - Weihua Peng
- Baidu International Technology (Shenzhen) Co., Ltd, Shenzhen, China
| | - Qingcai Chen
- Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China. .,Peng Cheng Laboratory, Shenzhen, China.
| | - Buzhou Tang
- Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China.,Peng Cheng Laboratory, Shenzhen, China
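The reformulation step in the abstract above, turning relation extraction into question answering, can be sketched as follows. The question template, the relation key `"CID"`, and the helper names `build_question` and `to_qa_examples` are all hypothetical; the abstract does not show the questions the authors actually generate.

```python
def build_question(relation, chemical, disease):
    """Fill a per-relation question template (the template text is an assumption)."""
    templates = {
        "CID": "Does {chemical} induce {disease}?",
    }
    return templates[relation].format(chemical=chemical, disease=disease)


def to_qa_examples(document, chemicals, diseases, relation="CID"):
    """Create one question-answering example per candidate chemical-disease pair."""
    return [
        {"question": build_question(relation, c, d), "context": document, "pair": (c, d)}
        for c in chemicals
        for d in diseases
    ]
```

A reading-comprehension model would then score each question against the document context, and the answer decides whether the pair holds the relation.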
|
10
|
|
11
|
Lu H, Li L, Li Z, Zhao S. Extracting chemical-induced disease relation by integrating a hierarchical concentrative attention and a hybrid graph-based neural network. J Biomed Inform 2021; 121:103874. [PMID: 34298157 DOI: 10.1016/j.jbi.2021.103874] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/16/2020] [Revised: 07/09/2021] [Accepted: 07/18/2021] [Indexed: 10/20/2022]
Abstract
Extracting the chemical-induced disease relation from the literature is important for biomedical research. On the one hand, it is challenging to capture the interactions among remote words, and long-distance information is not adequately exploited by existing systems for document-level relation extraction. On the other hand, some information in a document is particularly important to the target relations and should attract more attention than less relevant information. However, this issue is not well addressed in existing methods. In this paper, we present a method that integrates a hybrid graph and a hierarchical concentrative attention to overcome these problems. The hybrid graph is constructed by synthesizing the syntactic graph and the Abstract Meaning Representation graph to acquire long-distance information for document-level relation extraction. Meanwhile, the concentrative attention is used to focus on the most important information and alleviate the disturbance caused by less relevant items in the document. The experimental results demonstrate that our model yields competitive performance on the dataset of chemical-induced disease relations.
Affiliation(s)
- Hongbin Lu
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
| | - Lishuang Li
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.
| | - Zuocheng Li
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
| | - Shiyi Zhao
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.
|
12
|
Tutubalina E, Alimova I, Miftahutdinov Z, Sakhovskiy A, Malykh V, Nikolenko S. The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews. Bioinformatics 2021; 37:243-249. [PMID: 32722774 DOI: 10.1093/bioinformatics/btaa675] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/08/2020] [Revised: 07/14/2020] [Accepted: 07/20/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for analysis and comparison of patients' health conditions and adverse drug reactions reported on the Internet with traditional sources such as drug labels, we present a new corpus of Russian language health reviews. RESULTS: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus consists of two parts, a raw one and a labeled one. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labeled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Sentence labels indicate health-related issues or their absence. Sentences that contain such issues are additionally labeled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications and drug reactions. Further, we present a baseline model for named entity recognition (NER) and multilabel sentence classification tasks on this corpus. Our RuDR-BERT model achieved a macro F1 score of 74.85% in the NER task. For the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over the score of a BERT model trained on Russian data. AVAILABILITY AND IMPLEMENTATION: We make the RuDReC corpus and pretrained weights of domain-specific BERT models freely available at https://github.com/cimm-kzn/RuDReC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elena Tutubalina
- Chemoinformatics and Molecular Modeling Laboratory, The Alexander Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russian Federation
| | - Ilseyar Alimova
- Chemoinformatics and Molecular Modeling Laboratory, The Alexander Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russian Federation
| | - Zulfat Miftahutdinov
- Chemoinformatics and Molecular Modeling Laboratory, The Alexander Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russian Federation
| | - Andrey Sakhovskiy
- Chemoinformatics and Molecular Modeling Laboratory, The Alexander Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russian Federation
| | - Valentin Malykh
- Chemoinformatics and Molecular Modeling Laboratory, The Alexander Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russian Federation
| | - Sergey Nikolenko
- Chemoinformatics and Molecular Modeling Laboratory, The Alexander Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russian Federation.,Samsung-PDMI AI Center, Steklov Institute of Mathematics at St. Petersburg, St. Petersburg 191023, Russian Federation
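The macro F1 metric reported in the abstract above averages per-class F1 scores with equal weight, so rare classes count as much as frequent ones. The sketch below computes it over flat label sequences; this is a simplification, since NER is commonly scored at the entity level, and the function name `macro_f1` and the example labels are illustrative assumptions rather than the authors' evaluation script.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For instance, `macro_f1(["DRUG", "DRUG", "ADR", "O"], ["DRUG", "ADR", "ADR", "O"])` averages the three per-label scores 2/3, 2/3 and 1.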
|
13
|
Bai T, Guan H, Wang S, Wang Y, Huang L. Traditional Chinese medicine entity relation extraction based on CNN with segment attention. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05897-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
|
14
|
Zeng D, Zhao C, Quan Z. CID-GCN: An Effective Graph Convolutional Networks for Chemical-Induced Disease Relation Extraction. Front Genet 2021; 12:624307. [PMID: 33643385 PMCID: PMC7902761 DOI: 10.3389/fgene.2021.624307] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/31/2020] [Accepted: 01/18/2021] [Indexed: 11/26/2022] Open
Abstract
Automatic extraction of chemical-induced disease (CID) relations from unstructured text is of essential importance for disease treatment and drug development. In this task, some relational facts can only be inferred from the whole document rather than from a single sentence. Recently, researchers have investigated graph-based approaches to extract relations across sentences. These approaches iteratively combine information from neighbor nodes to model the interactions among entity mentions that occur in different sentences. Despite their success, one severe limitation of graph-based approaches is the over-smoothing problem, which reduces the model's ability to distinguish between nodes. In this paper, we propose CID-GCN, an effective Graph Convolutional Network (GCN) with a gating mechanism, for CID relation extraction. Specifically, we construct a heterogeneous graph that contains mention, sentence, and entity nodes. Then, the graph convolution operation is employed to aggregate interactive information on the constructed graph. In particular, we combine the gating mechanism with the graph convolution operation to address the over-smoothing problem. The experimental results demonstrate that our approach significantly outperforms the baselines.
Affiliation(s)
- Daojian Zeng
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, China
- Chao Zhao
- School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
- Zhe Quan
- College of Information Science and Engineering, Hunan University, Changsha, China
15
Mitra S, Saha S, Hasanuzzaman M. A Multi-View Deep Neural Network Model for Chemical-Disease Relation Extraction From Imbalanced Datasets. IEEE J Biomed Health Inform 2020; 24:3315-3325. [DOI: 10.1109/jbhi.2020.2983365] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
16
Wang J, Chen X, Zhang Y, Zhang Y, Wen J, Lin H, Yang Z, Wang X. Document-Level Biomedical Relation Extraction Using Graph Convolutional Network and Multihead Attention: Algorithm Development and Validation. JMIR Med Inform 2020; 8:e17638. [PMID: 32459636 PMCID: PMC7458061 DOI: 10.2196/17638] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/30/2019] [Revised: 04/14/2020] [Accepted: 04/25/2020] [Indexed: 11/22/2022] [Open Access]
Abstract
Background: Automatically extracting relations between chemicals and diseases plays an important role in biomedical text mining. Chemical-disease relation (CDR) extraction aims at extracting complex semantic relationships between entities in documents, which contain intrasentence and intersentence relations. Most previous methods did not consider dependency syntactic information across sentences, which is very valuable for the relation extraction task, in particular for extracting intersentence relations accurately. Objective: In this paper, we propose a novel end-to-end neural network based on the graph convolutional network (GCN) and multihead attention, which makes use of the dependency syntactic information across sentences to improve the CDR extraction task. Methods: To improve the performance of intersentence relation extraction, we constructed a document-level dependency graph to capture the dependency syntactic information across sentences. GCN is applied to capture the feature representation of the document-level dependency graph. The multihead attention mechanism is employed to learn the relatively important context features from different semantic subspaces. To enhance the input representation, the deep context representation is used in our model instead of traditional word embedding. Results: We evaluate our method on the CDR corpus. The experimental results show that our method achieves an F-measure of 63.5%, which is superior to other state-of-the-art methods. At the intrasentence level, our method achieves a precision, recall, and F-measure of 59.1%, 81.5%, and 68.5%, respectively. At the intersentence level, our method achieves a precision, recall, and F-measure of 47.8%, 52.2%, and 49.9%, respectively. Conclusions: The GCN model can effectively exploit cross-sentence dependency information to improve the performance of intersentence CDR extraction. Both the deep context representation and multihead attention are helpful in the CDR extraction task.
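The precision, recall, and F-measure figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state the weighting, so that is an assumption, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumes equal weighting of precision and recall (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, in percent.
intra = f_measure(59.1, 81.5)  # intrasentence level
inter = f_measure(47.8, 52.2)  # intersentence level

print(round(intra, 1), round(inter, 1))  # prints 68.5 49.9
```

Under that assumption, both recomputed values agree with the reported 68.5% and 49.9%.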
Affiliation(s)
- Jian Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xiaoyu Chen
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Yu Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Yijia Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Jiabin Wen
- Department of VIP, The Second Hospital of Dalian Medical University, Dalian, China
- Hongfei Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Zhihao Yang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xin Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
17
Liu X, Fan J, Dong S. Document-Level Biomedical Relation Extraction Leveraging Pretrained Self-Attention Structure and Entity Replacement: Algorithm and Pretreatment Method Validation Study. JMIR Med Inform 2020; 8:e17644. [PMID: 32469325 PMCID: PMC7314385 DOI: 10.2196/17644] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/30/2019] [Revised: 03/02/2020] [Accepted: 03/19/2020] [Indexed: 01/26/2023] [Open Access]
Abstract
Background: Most current methods applied for intrasentence relation extraction in the biomedical literature are inadequate for document-level relation extraction, in which the relationship may cross sentence boundaries. Hence, some approaches have been proposed to extract relations by splitting the document-level datasets through heuristic rules and learning methods. However, these approaches may introduce additional noise and do not really solve the problem of intersentence relation extraction. It is challenging to avoid noise and extract cross-sentence relations. Objective: This study aimed to avoid the errors introduced by dividing the document-level dataset, verify that a self-attention structure can extract biomedical relations in a document with long-distance dependencies and complex semantics, and discuss the relative benefits of different entity pretreatment methods for biomedical relation extraction. Methods: This paper proposes a new data preprocessing method and attempts to apply a pretrained self-attention structure for document biomedical relation extraction with an entity replacement method to capture very long-distance dependencies and complex semantics. Results: Compared with state-of-the-art approaches, our method greatly improved the precision. The results show that our approach increases the F1 value, compared with state-of-the-art methods. Through experiments of biomedical entity pretreatments, we found that a model using an entity replacement method can improve performance. Conclusions: When considering all target entity pairs as a whole in the document-level dataset, a pretrained self-attention structure is suitable to capture very long-distance dependencies and learn the textual context and complicated semantics. A replacement method for biomedical entities is conducive to biomedical relation extraction, especially to document-level relation extraction.
Affiliation(s)
- Xiaofeng Liu
- Communication and Computer Network Key Laboratory of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Jianye Fan
- Communication and Computer Network Key Laboratory of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Shoubin Dong
- Communication and Computer Network Key Laboratory of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
18
Li Z, Yang Z, Xiang Y, Luo L, Sun Y, Lin H. Exploiting sequence labeling framework to extract document-level relations from biomedical texts. BMC Bioinformatics 2020; 21:125. [PMID: 32216746 PMCID: PMC7099809 DOI: 10.1186/s12859-020-3457-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/30/2019] [Accepted: 03/18/2020] [Indexed: 12/02/2022] [Open Access]
Abstract
Background: Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research. However, most existing methods either focus on extracting intra-sentential relations and ignore inter-sentential ones, or fail to extract inter-sentential relations accurately and regard the instances containing entity relations as independent, which neglects the interactions between relations. We propose a novel sequence labeling-based biomedical relation extraction method named Bio-Seq. In this method, the sequence labeling framework is extended by multiple specified feature extractors to facilitate feature extraction at different levels, especially at the inter-sentential level. In addition, the sequence labeling framework enables Bio-Seq to take advantage of the interactions between relations and thus further improves the precision of document-level relation extraction. Results: Our proposed method obtained an F1-score of 63.5% on the BioCreative V chemical disease relation corpus, and an F1-score of 54.4% on inter-sentential relations, which was 10.5% better than the document-level classification baseline. Our method also achieved an F1-score of 85.1% on the n2c2-ADE sub-dataset. Conclusion: The sequence labeling method can be successfully used to extract document-level relations, especially for boosting performance on inter-sentential relation extraction. Our work can facilitate research on document-level biomedical text mining.
Affiliation(s)
- Zhiheng Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Zhihao Yang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Yang Xiang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, 77030, USA
- Ling Luo
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Yuanyuan Sun
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Hongfei Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
19
Zhou H, Yang Y, Ning S, Liu Z, Lang C, Lin Y, Huang D. Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1879-1889. [PMID: 29994540 DOI: 10.1109/tcbb.2018.2838661] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Automatically extracting the relationships between chemicals and diseases is highly important to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous research has paid little attention to the prior knowledge existing in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism can significantly improve CDR extraction performance while achieving results comparable with state-of-the-art systems.
20
Gu J, Sun F, Qian L, Zhou G. Chemical-induced disease relation extraction via attention-based distant supervision. BMC Bioinformatics 2019; 20:403. [PMID: 31331263 PMCID: PMC6647285 DOI: 10.1186/s12859-019-2884-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/15/2018] [Accepted: 05/08/2019] [Indexed: 11/24/2022] [Open Access]
Abstract
Background: Automatically understanding chemical-disease relations (CDRs) is crucial in various areas of biomedical research and health care. Supervised machine learning provides a feasible solution to automatically extract relations between biomedical entities from scientific literature; its success, however, depends heavily on large-scale biomedical corpora manually annotated with intensive labor and tremendous investment. Results: We present an attention-based distant supervision paradigm for the BioCreative-V CDR extraction task. Training examples at both intra- and inter-sentence levels are generated automatically from the Comparative Toxicogenomics Database (CTD) without any human intervention. An attention-based neural network and a stacked auto-encoder network are applied respectively to induce learning models and extract relations at both levels. After merging the results of both levels, the document-level CDRs can be finally extracted. It achieves a precision/recall/F1-score of 60.3%/73.8%/66.4%, outperforming the state-of-the-art supervised learning systems without using any annotated corpus. Conclusion: Our experiments demonstrate that distant supervision is promising for extracting chemical-disease relations from biomedical literature, and that capturing both local and global attention features simultaneously is effective in attention-based distantly supervised learning.
Affiliation(s)
- Jinghang Gu
- Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China; Big Data Group, Baidu Inc., Beijing, China
- Fuqing Sun
- Department of Gynecology Minimally Invasive Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, China
- Longhua Qian
- Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
- Guodong Zhou
- Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
21
Zhou H, Lang C, Liu Z, Ning S, Lin Y, Du L. Knowledge-guided convolutional networks for chemical-disease relation extraction. BMC Bioinformatics 2019; 20:260. [PMID: 31113357 PMCID: PMC6528333 DOI: 10.1186/s12859-019-2873-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/07/2018] [Accepted: 05/02/2019] [Indexed: 01/10/2023] [Open Access]
Abstract
BACKGROUND: Automatic extraction of chemical-disease relations (CDR) from unstructured text is of essential importance for disease treatment and drug development. Meanwhile, biomedical experts have built many highly-structured knowledge bases (KBs), which contain prior knowledge about chemicals and diseases. Prior knowledge provides strong support for CDR extraction. How to make full use of it is worth studying. RESULTS: This paper proposes a novel model called "Knowledge-guided Convolutional Networks (KCN)" to leverage prior knowledge for CDR extraction. The proposed model first learns knowledge representations including entity embeddings and relation embeddings from KBs. Then, entity embeddings are used to control the propagation of context features towards a chemical-disease pair with gated convolutions. After that, relation embeddings are employed to further capture the weighted context features by a shared attention pooling. Finally, the weighted context features containing additional knowledge information are used for CDR extraction. Experiments on the BioCreative V CDR dataset show that the proposed KCN achieves 71.28% F1-score, which outperforms most of the state-of-the-art systems. CONCLUSIONS: This paper proposes a novel CDR extraction model KCN to make full use of prior knowledge. Experimental results demonstrate that KCN could effectively integrate prior knowledge and contexts for the performance improvement.
Affiliation(s)
- Huiwei Zhou
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Chengkun Lang
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Zhuang Liu
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Shixian Ning
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Yingyu Lin
- School of Foreign Languages, Dalian University of Technology, Arts Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Lei Du
- School of Mathematical Sciences, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
22
Onye SC, Akkeleş A, Dimililer N. relSCAN - A system for extracting chemical-induced disease relation from biomedical literature. J Biomed Inform 2018; 87:79-87. [PMID: 30296491 DOI: 10.1016/j.jbi.2018.09.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/01/2018] [Revised: 09/17/2018] [Accepted: 09/30/2018] [Indexed: 11/20/2022]
Abstract
This paper proposes an effective and robust approach for Chemical-Induced Disease (CID) relation extraction from PubMed articles. The study was performed on the Chemical Disease Relation (CDR) task of BioCreative V track-3 corpus. The proposed system, named relSCAN, is an efficient CID relation extraction system with two phases to classify relation instances from the Co-occurrence and Non-Co-occurrence mention levels. We describe the case of chemical and disease mentions that occur in the same sentence as 'Co-occurrence', or as 'Non-Co-occurrence' otherwise. In the first phase, the relation instances are constructed on both mention levels. In the second phase, we employ a hybrid feature set to classify the relation instances at both of these mention levels using the combination of two Machine Learning (ML) classifiers (Support Vector Machine (SVM) and J48 Decision tree). This system is entirely corpus dependent and does not rely on information from external resources in order to boost its performance. We achieved good results, which are comparable with the other state-of-the-art CID relation extraction systems on the BioCreative V corpus. Furthermore, our system achieves the best performance on the Non-Co-occurrence mention level.
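The Co-occurrence / Non-Co-occurrence distinction described in this abstract can be illustrated with a short sketch. It is not relSCAN's implementation: the mention layout (an entity identifier paired with a sentence index) and the function name `split_pairs` are hypothetical, chosen only to show how candidate chemical-disease mention pairs might be partitioned before classification.

```python
from itertools import product

# Hypothetical mention layout: (entity_id, sentence_index).
Mention = tuple[str, int]
Pair = tuple[str, str]


def split_pairs(
    chem_mentions: list[Mention], dis_mentions: list[Mention]
) -> tuple[list[Pair], list[Pair]]:
    """Partition chemical-disease mention pairs by sentence co-occurrence.

    A pair is 'Co-occurrence' when both mentions lie in the same sentence,
    and 'Non-Co-occurrence' otherwise.
    """
    co, non_co = [], []
    for (chem, chem_sent), (dis, dis_sent) in product(chem_mentions, dis_mentions):
        target = co if chem_sent == dis_sent else non_co
        target.append((chem, dis))
    return co, non_co


chems = [("C1", 0), ("C2", 2)]
diseases = [("D1", 0), ("D2", 1)]
co, non_co = split_pairs(chems, diseases)
print(co)      # prints [('C1', 'D1')]
print(non_co)  # prints [('C1', 'D2'), ('C2', 'D1'), ('C2', 'D2')]
```

Each group would then be passed to its own classifier; the abstract names an SVM and a J48 decision tree, which are not reproduced here.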
Affiliation(s)
- Stanley Chika Onye
- Department of Applied Mathematics and Computer Science, Faculty of Arts & Sciences, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
- Arif Akkeleş
- Department of Mathematics, Faculty of Arts & Sciences, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
- Nazife Dimililer
- Department of Information Technology, School of Computing and Technology, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
23
Zheng W, Lin H, Liu X, Xu B. A document level neural model integrated domain knowledge for chemical-induced disease relations. BMC Bioinformatics 2018; 19:328. [PMID: 30223767 PMCID: PMC6142695 DOI: 10.1186/s12859-018-2316-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/04/2018] [Accepted: 08/14/2018] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: The effective combination of texts and knowledge may improve the performance of natural language processing tasks. For the recognition of chemical-induced disease (CID) relations, which may span sentence boundaries in an article, existing CID systems have explored the use of knowledge bases, but they have not distinguished the effects of different knowledge on the identification of a special CID. Moreover, systems based on neural networks have only constructed sentence or mention level models. RESULTS: In this work, we proposed an effective document level neural model integrating domain knowledge to extract CID relations from biomedical articles. Basic semantic information of an article with respect to a special CID candidate pair was learned from the document level sub-network module. Furthermore, knowledge attention depending on the representation of the article was proposed to distinguish the influences of different knowledge on the special CID pair, and the final representation of knowledge was then formed by aggregating the weighted knowledge. Finally, the integrated representations of texts and knowledge were passed to a softmax classifier to perform the CID recognition. Experimental results on the chemical-disease relation corpus proposed by BioCreative V show that our proposed knowledge-integrated system achieves a good overall performance compared with other state-of-the-art systems. CONCLUSIONS: Experimental analyses demonstrate that the introduced attention mechanism on domain knowledge plays a significant role in distinguishing the influences of different knowledge on the judgment for a special CID relation.
Affiliation(s)
- Wei Zheng
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China; College of Software, Dalian JiaoTong University, Dalian, China
- Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xiaoxia Liu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Bo Xu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
24
Chemical-induced disease relation extraction with dependency information and prior knowledge. J Biomed Inform 2018; 84:171-178. [DOI: 10.1016/j.jbi.2018.07.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 01/29/2018] [Revised: 07/09/2018] [Accepted: 07/11/2018] [Indexed: 11/18/2022]
25
Zheng W, Lin H, Li Z, Liu X, Li Z, Xu B, Zhang Y, Yang Z, Wang J. An effective neural model extracting document level chemical-induced disease relations from biomedical literature. J Biomed Inform 2018; 83:1-9. [DOI: 10.1016/j.jbi.2018.05.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/09/2017] [Revised: 03/14/2018] [Accepted: 05/04/2018] [Indexed: 01/06/2023]
26
Warikoo N, Chang YC, Hsu WL. LPTK: a linguistic pattern-aware dependency tree kernel approach for the BioCreative VI CHEMPROT task. Database (Oxford) 2018; 2018:5139652. [PMID: 30346607 PMCID: PMC6196310 DOI: 10.1093/database/bay108] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/05/2018] [Revised: 08/30/2018] [Accepted: 09/24/2018] [Indexed: 11/14/2022]
Abstract
Identifying the interactions between chemical compounds and genes from biomedical literatures is one of the frequently discussed topics of text mining in the life science field. In this paper, we describe Linguistic Pattern-Aware Dependency Tree Kernel, a linguistic interaction pattern learning method developed for CHEMPROT task-BioCreative VI, to capture chemical-protein interaction (CPI) patterns within biomedical literatures. We also introduce a framework to integrate these linguistic patterns with smooth partial tree kernel to extract the CPIs. This new method of feature representation models aspects of linguistic probability in geometric representation, which not only optimizes the sufficiency of feature dimension for classification, but also defines features as interpretable contexts rather than long vectors of numbers. In order to test the robustness and efficiency of our system in identifying different kinds of biological interactions, we evaluated our framework on three separate data sets, i.e. CHEMPROT corpus, Chemical-Disease Relation corpus and Protein-Protein Interaction corpus. Corresponding experiment results demonstrate that our method is effective and outperforms several compared systems for each data set.
Affiliation(s)
- Neha Warikoo
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Yung-Chun Chang
- Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
27
Segura Bedmar I, Martínez P, Carruana Martín A. Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis. JMIR Med Inform 2017; 5:e48. [PMID: 29196280 PMCID: PMC5732329 DOI: 10.2196/medinform.7059] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/12/2017] [Revised: 09/08/2017] [Accepted: 09/27/2017] [Indexed: 11/25/2022] [Open Access]
Abstract
Background: Biomedical semantic indexing is a very useful support tool for human curators in their efforts to index and catalog the biomedical literature. Objective: The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. Methods: Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Although previous work has already exploited document similarity by using a k-nearest neighbors algorithm, we represent documents as document vectors by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term in the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic that says if 3 MeSH terms are proposed to classify an article and they share the same ancestor, they should be replaced by this ancestor. The representation of the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. Results: Our experiments show promising results with an F1 of 69% on the test dataset. Conclusions: To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch becomes a real solution to index large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time.
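The retrieve-then-rank procedure in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' system: it uses plain bag-of-words vectors instead of a search engine index, scores each MeSH term by summing the cosine similarities of the retrieved documents that carry it (one simple way to combine term frequency and document similarity; the paper's exact scoring function is not given here), and approximates the curator heuristic with a hypothetical direct-parent map and a threshold of at least three sibling terms. All function names and example data are invented for the sketch.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_mesh(query: str, indexed: list[tuple[str, list[str]]], k: int = 2):
    """Rank MeSH terms of the k documents most similar to the query.

    Assumed scoring: a term's score is the sum of the similarities of the
    retrieved documents annotated with it.
    """
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), terms)
              for text, terms in indexed]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    scores: dict[str, float] = defaultdict(float)
    for sim, terms in top:
        for term in terms:
            scores[term] += sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def collapse_siblings(terms: list[str], parent: dict[str, str]) -> list[str]:
    """Replace groups of three or more terms sharing a parent by that parent."""
    groups: dict[str, list[str]] = defaultdict(list)
    for term in terms:
        if term in parent:
            groups[parent[term]].append(term)
    result: list[str] = []
    for term in terms:
        p = parent.get(term)
        if p is not None and len(groups[p]) >= 3:
            if p not in result:
                result.append(p)
        else:
            result.append(term)
    return result


# Invented toy index: (document text, MeSH terms assigned to it).
index = [
    ("aspirin induced gastric ulcer", ["Aspirin", "Stomach Ulcer"]),
    ("aspirin therapy for headache", ["Aspirin", "Headache"]),
    ("graph database search", ["Databases"]),
]
ranked = rank_mesh("aspirin gastric ulcer", index, k=2)
print([term for term, _ in ranked])  # prints ['Aspirin', 'Stomach Ulcer', 'Headache']
```

In the system the abstract describes, the similarity search is delegated to a search engine and the ancestor lookup to a graph database; the dictionaries above only stand in for those components.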
Affiliation(s)
- Isabel Segura Bedmar
- LaBDA Group, Department of Computer Science, Universidad Carlos III de Madrid, Leganés, Spain
- Paloma Martínez
- LaBDA Group, Department of Computer Science, Universidad Carlos III de Madrid, Leganés, Spain
- Adrián Carruana Martín
- LaBDA Group, Department of Computer Science, Universidad Carlos III de Madrid, Leganés, Spain
28
Combination of Deep Recurrent Neural Networks and Conditional Random Fields for Extracting Adverse Drug Reactions from User Reviews. J Healthc Eng 2017; 2017:9451342. [PMID: 29177027 PMCID: PMC5605929 DOI: 10.1155/2017/9451342] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 04/14/2017] [Accepted: 07/27/2017] [Indexed: 01/30/2023]
Abstract
Adverse drug reactions (ADRs) are an essential part of the analysis of drug use, measuring drug use benefits, and making policy decisions. Traditional channels for identifying ADRs are reliable but very slow and only produce a small amount of data. Text reviews, either on specialized web sites or in general-purpose social networks, may lead to a data source of unprecedented size, but identifying ADRs in free-form text is a challenging natural language processing problem. In this work, we propose a novel model for this problem, uniting recurrent neural architectures and conditional random fields. We evaluate our model with a comprehensive experimental study, showing improvements over state-of-the-art methods of ADR extraction.
29
Wang P, Hao T, Yan J, Jin L. Large-scale extraction of drug-disease pairs from the medical literature. J Assoc Inf Sci Technol 2017. [DOI: 10.1002/asi.23876] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/20/2023]
Affiliation(s)
- Pengwei Wang
- School of Electronic and Information Engineering; South China University of Technology; Guangzhou China
- Tianyong Hao
- Cisco School of Informatics; Guangdong University of Foreign Studies; Guangzhou China
- Jun Yan
- Microsoft Research Asia; Beijing China
- Lianwen Jin
- School of Electronic and Information Engineering; South China University of Technology; Guangzhou China
30
Gu J, Sun F, Qian L, Zhou G. Chemical-induced disease relation extraction via convolutional neural network. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2017:3098440. [PMID: 28415073 PMCID: PMC5467558 DOI: 10.1093/database/bax024] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 12/13/2016] [Accepted: 03/01/2017] [Indexed: 01/08/2023]
Abstract
This article describes our work on the BioCreative-V chemical–disease relation (CDR) extraction task, which employed a maximum entropy (ME) model and a convolutional neural network model for relation extraction at inter- and intra-sentence level, respectively. In our work, relation extraction between entity concepts in documents was simplified to relation extraction between entity mentions. We first constructed pairs of chemical and disease mentions as relation instances for training and testing stages, then we trained and applied the ME model and the convolutional neural network model for inter- and intra-sentence level, respectively. Finally, we merged the classification results from mention level to document level to acquire the final relations between chemical and disease concepts. The evaluation on the BioCreative-V CDR corpus shows the effectiveness of our proposed approach. Database URL: http://www.biocreative.org/resources/corpora/biocreative-v-cdr-corpus/
Affiliation(s)
- Jinghang Gu
- School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
- Fuqing Sun
- Department of Gynecology Minimally Invasive Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, 17 Qihelou Street, Beijing, China
- Longhua Qian
- School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
- Guodong Zhou
- School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
|