1
Hu Y, Chen Y, Qin Y, Huang R. Learning entity-oriented representation for biomedical relation extraction. J Biomed Inform 2023; 147:104527. [PMID: 37852347 DOI: 10.1016/j.jbi.2023.104527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 10/11/2023] [Accepted: 10/15/2023] [Indexed: 10/20/2023]
Abstract
Biomedical Relation Extraction (BioRE) aims to automatically extract semantic relations for given entity pairs and is of great significance in biomedical research. Current popular methods often utilize pretrained language models to extract semantic features from individual input instances, which frequently suffer from overlapping semantics. Overlapping semantics refers to the situation in which a sentence contains multiple entity pairs that share the same context, leading to highly similar information between these entity pairs. In this study, we propose a model for learning Entity-oriented Representation (EoR) that aims to improve the performance of the model by enhancing the discriminability between entity pairs. It contains three modules: sentence representation, entity-oriented representation, and output. The first module learns the global semantic information of the input instance; the second module focuses on extracting the semantic information of the sentence from the target entities; and the third module enhances distinguishability among entity pairs and classifies the relation type. We evaluated our approach on four BioRE tasks with eight datasets, and the experiments showed that our EoR achieved state-of-the-art performance for PPI, DDI, CPI, and DPI tasks. Further analysis demonstrated the benefits of entity-oriented semantic information in handling multiple entity pairs in the BioRE task.
Affiliation(s)
- Ying Hu
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
- Yanping Chen
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
- Yongbin Qin
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
- Ruizhang Huang
  - Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China.
2
Zhang Z, Chen ALP. Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning. BMC Bioinformatics 2022; 23:458. [PMID: 36329384 PMCID: PMC9632084 DOI: 10.1186/s12859-022-04994-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/18/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
Background: Biomedical named entity recognition (BioNER) is a basic and important task for biomedical text mining with the purpose of automatically recognizing and classifying biomedical entities. The performance of BioNER systems directly impacts downstream applications. Recently, deep neural networks, especially pre-trained language models, have made great progress for BioNER. However, because of the lack of high-quality and large-scale annotated data and relevant external knowledge, the capability of the BioNER system remains limited. Results: In this paper, we propose a novel fully-shared multi-task learning model based on the pre-trained language model in biomedical domain, namely BioBERT, with a new attention module to integrate the auto-processed syntactic information for the BioNER task. We have conducted numerous experiments on seven benchmark BioNER datasets. The proposed best multi-task model obtains F1 score improvements of 1.03% on BC2GM, 0.91% on NCBI-disease, 0.81% on Linnaeus, 1.26% on JNLPBA, 0.82% on BC5CDR-Chemical, 0.87% on BC5CDR-Disease, and 1.10% on Species-800 compared to the single-task BioBERT model. Conclusion: The results demonstrate our model outperforms previous studies on all datasets. Further analysis and case studies are also provided to prove the importance of the proposed attention module and fully-shared multi-task learning method used in our model.
Affiliation(s)
- Zhiyu Zhang
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Arbee L P Chen
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan; Department of Computer Science and Information Engineering, Asia University, Taichung, Taiwan.
3
Utilizing external corpora through kernel function: application in biomedical named entity recognition. Progress in Artificial Intelligence 2020. [DOI: 10.1007/s13748-020-00208-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
4
Lauw HW, Wong RCW, Ntoulas A, Lim EP, Ng SK, Pan SJ. Attribute-Driven Capsule Network for Entity Relation Prediction. Advances in Knowledge Discovery and Data Mining 2020. [PMCID: PMC7206169 DOI: 10.1007/978-3-030-47426-3_52] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
Multi-attribute entity relation prediction is a novel data mining application concerned with designing an intelligent system that supports inference across attribute information. However, most existing deep learning methods are limited in how well they capture the inner structural information between different attributes. In this paper, we propose an attribute-driven approach for the entity relation prediction task based on capsule networks, which have been shown to perform well on relation mining. We develop a self-attention routing method to encapsulate the semantic representations of multiple attributes into relational semantic capsules, and use a dynamic routing method to generate class capsules for predicting relations. Because the lack of multi-attribute entity relation data is a major obstacle in this task, we construct a new real-world multi-attribute entity relation dataset in this work. Experimental results show significant superiority of our model compared with other baselines.
Affiliation(s)
- Hady W. Lauw
  - School of Information Systems, Singapore Management University, Singapore, Singapore
- Raymond Chi-Wing Wong
  - Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
- Alexandros Ntoulas
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
- Ee-Peng Lim
  - School of Information Systems, Singapore Management University, Singapore, Singapore
- See-Kiong Ng
  - Institute of Data Science, National University of Singapore, Singapore, Singapore
- Sinno Jialin Pan
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
5
He Z, Chen W, Li Z, Zhang W, Shao H, Zhang M. Syntax-aware entity representations for neural relation extraction. Artif Intell 2019. [DOI: 10.1016/j.artint.2019.07.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
6
Tran T, Kavuluru R. Distant supervision for treatment relation extraction by leveraging MeSH subheadings. Artif Intell Med 2019; 98:18-26. [PMID: 31521249 PMCID: PMC6748648 DOI: 10.1016/j.artmed.2019.06.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/16/2018] [Revised: 06/04/2019] [Accepted: 06/05/2019] [Indexed: 11/26/2022]
Abstract
The growing body of knowledge in biomedicine is too vast for human consumption. Hence there is a need for automated systems able to navigate and distill the emerging wealth of information. One fundamental task to that end is relation extraction, whereby linguistic expressions of semantic relationships between biomedical entities are recognized and extracted. In this study, we propose a novel distant supervision approach for relation extraction of binary treatment relationships such that high quality positive/negative training examples are generated from PubMed abstracts by leveraging associated MeSH subheadings. The quality of generated examples is assessed based on the quality of supervised models they induce; that is, the mean performance of trained models (derived via bootstrapped ensembling) on a gold standard test set is used as a proxy for data quality. We show that our approach is preferable to traditional distant supervision for treatment relations and is closer to human crowd annotations in terms of annotation quality. For treatment relations, our generated training data performs at 81.38%, compared to traditional distant supervision at 64.33% and crowd-sourced annotations at 90.57% on the model-wide PR-AUC metric. We also demonstrate that examples generated using our method can be used to augment crowd-sourced datasets. Augmented models improve over non-augmented models by more than two absolute points on the more established F1 metric. We lastly demonstrate that performance can be further improved by implementing a classification loss that is resistant to label noise.
Affiliation(s)
- Tung Tran
  - Department of Computer Science, University of Kentucky, Lexington, KY, United States.
- Ramakanth Kavuluru
  - Department of Computer Science, University of Kentucky, Lexington, KY, United States; Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States.
7
Tran T, Kavuluru R. An end-to-end deep learning architecture for extracting protein-protein interactions affected by genetic mutations. Database: The Journal of Biological Databases and Curation 2018; 2018:1-13. [PMID: 30239680 PMCID: PMC6146129 DOI: 10.1093/database/bay092] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/28/2018] [Accepted: 08/13/2018] [Indexed: 11/25/2022]
Abstract
The BioCreative VI Track IV (mining protein interactions and mutations for precision medicine) challenge was organized in 2017 with the goal of applying biomedical text mining methods to support advancements in precision medicine approaches. As part of the challenge, a new dataset was introduced for the purpose of building a supervised relation extraction model capable of taking a test article and returning a list of interacting protein pairs identified by their Entrez Gene IDs. Specifically, such pairs represent proteins participating in a binary protein–protein interaction relation where the interaction is additionally affected by a genetic mutation—referred to as a PPIm relation. In this study, we explore an end-to-end approach for PPIm relation extraction by deploying a three-component pipeline involving deep learning-based named-entity recognition and relation classification models along with a knowledge-based approach for gene normalization. We propose several recall-focused improvements to our original challenge entry that placed second when matching on Entrez Gene ID (exact matching) and on HomoloGene ID. On exact matching, the improved system achieved new competitive test results of 37.78% micro-F1 with a precision of 38.22% and recall of 37.34% that corresponds to an improvement from the prior best system by approximately three micro-F1 points. When matching on HomoloGene IDs, we report similarly competitive test results at 46.17% micro-F1 with a precision and recall of 46.67 and 45.59%, respectively, corresponding to an improvement of more than eight micro-F1 points over the prior best result. The code for our deep learning system is made publicly available at https://github.com/bionlproc/biocppi_extraction.
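For readers relating the precision, recall, and micro-F1 figures in the abstract above, the sketch below computes F1 as the harmonic mean of precision and recall. The function name `f1_score` is an illustrative choice, not taken from the authors' code, and because the inputs are the already-rounded percentages from the abstract, the result can differ from the reported micro-F1 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Rounded exact-matching precision and recall quoted in the abstract (percent).
exact_match_f1 = f1_score(38.22, 37.34)
print(f"F1 from rounded P and R: {exact_match_f1:.2f}")
```

Micro-averaging pools true positives, false positives, and false negatives over all test articles before computing precision and recall, so the same formula applies to the pooled values.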
Affiliation(s)
- Tung Tran
  - Department of Computer Science, University of Kentucky, Lexington, KY, USA
- Ramakanth Kavuluru
  - Department of Computer Science, University of Kentucky, Lexington, KY, USA; Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
8
SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media. Artif Intell Med 2017; 84:34-49. [PMID: 29111222 DOI: 10.1016/j.artmed.2017.10.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/13/2017] [Revised: 08/28/2017] [Accepted: 10/15/2017] [Indexed: 11/21/2022]
Abstract
With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which are a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction (distinguishing the ADE relationship from other relation types) necessary. However, conducting ADE relation extraction in a social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the high dimensionality of the feature space attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most existing ADE relation extraction methods. SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential to be applied to other social media-based problems.
9
An ensemble method for extracting adverse drug events from social media. Artif Intell Med 2016; 70:62-76. [PMID: 27431037 DOI: 10.1016/j.artmed.2016.05.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 10/07/2015] [Revised: 05/20/2016] [Accepted: 05/27/2016] [Indexed: 11/24/2022]
Abstract
OBJECTIVE: Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. METHODS AND MATERIALS: We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). RESULTS: When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. CONCLUSIONS: Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which can stay away from the feature sparsity issue, are qualified to address the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness.
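Two of the combination methods named in the abstract above, majority voting and weighted averaging, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy labels, and the choice of weights are all hypothetical, and stacked generalization is omitted because it requires training a second-level classifier.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(scores, weights):
    """Combine per-classifier scores using non-negative weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical outputs of three base classifiers for one drug-event pair.
votes = ["ADE", "non-ADE", "ADE"]
ade_scores = [0.9, 0.4, 0.6]   # each classifier's score for the ADE class
weights = [2.0, 1.0, 1.0]      # assumed weights, e.g. from validation AUC

print(majority_vote(votes))
print(weighted_average(ade_scores, weights))
```

Majority voting needs only hard labels, while weighted averaging needs comparable scores from every base classifier, which is one reason the two can rank ensembles differently.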
10
Suh JH. Comparing writing style feature-based classification methods for estimating user reputations in social media. SpringerPlus 2016; 5:261. [PMID: 27006870 PMCID: PMC4775724 DOI: 10.1186/s40064-016-1841-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/12/2015] [Accepted: 02/15/2016] [Indexed: 12/01/2022]
Abstract
In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the quality of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of state-of-the-art methods: four writing style features, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods based on the four base learners. With South Korea’s Web forum, Daum Agora, as the test bed, the experimental results show that the configuration of the full feature set containing content-specific features and RS-SVM, which combines RS and SVM, gives the best classification accuracy if the test bed poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations coming from the literature reviews: first, the feature set that adds content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy.
Affiliation(s)
- Jong Hwan Suh
  - Moon Soul Graduate School of Future Strategy, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141 Republic of Korea
11
A research framework for pharmacovigilance in health social media: Identification and evaluation of patient adverse drug event reports. J Biomed Inform 2015; 58:268-279. [PMID: 26518315 DOI: 10.1016/j.jbi.2015.10.011] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 01/06/2015] [Revised: 10/20/2015] [Accepted: 10/21/2015] [Indexed: 11/23/2022]
Abstract
Social media offer insights into patients' medical problems such as drug side effects and treatment failures. Patient reports of adverse drug events from social media have great potential to improve current practice of pharmacovigilance. However, extracting patient adverse drug event reports from social media continues to be an important challenge for health informatics research. In this study, we develop a research framework with advanced natural language processing techniques for integrated and high-performance extraction of patient-reported adverse drug events. The framework consists of medical entity extraction for recognizing patient discussions of drugs and events, adverse drug event extraction with a statistical learning method based on the shortest dependency path kernel and semantic filtering with information from medical knowledge bases, and report source classification to tease out noise. To evaluate the proposed framework, a series of experiments were conducted on a test bed encompassing about postings from major diabetes and heart disease forums in the United States. The results reveal that each component of the framework significantly contributes to its overall effectiveness. Our framework significantly outperforms prior work.
12
|
Abstract
The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining.
Virtually all important biomedical knowledge is described in the published research literature, but Medline currently contains over 23 million articles and is growing at the rate of several hundred thousand new articles each year. In this environment, we need computational algorithms that can efficiently extract, aggregate, annotate and store information from the raw text. Because authors describe their results using natural language, descriptions of similar phenomena vary considerably with respect to both word choice and sentence structure. Any algorithm capable of mining the biomedical literature on a large scale must be able to overcome these differences and recognize when two different-looking statements are saying the same thing. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of drug-gene relationships automatically from the unstructured text of biomedical research abstracts. By applying EBC to the entirety of Medline, we learn from the structure of the text itself approximately 20 key ways that drugs and genes can interact, discover new facts for two biomedical knowledge bases, and reveal rich and unexpected structure in how scientists describe drug-gene relationships.
Affiliation(s)
- Bethany Percha
  - Biomedical Informatics Training Program, Stanford University, Stanford, California, United States of America
- Russ B. Altman
  - Departments of Medicine, Genetics and Bioengineering, Stanford University, Stanford, California, United States of America
13
Abbasi A, Zahedi FM, Zeng D, Chen Y, Chen H, Nunamaker JF. Enhancing Predictive Analytics for Anti-Phishing by Exploiting Website Genre Information. J Manag Inf Syst 2015. [DOI: 10.1080/07421222.2014.1001260] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 10/23/2022]
14
Development and evaluation of a biomedical search engine using a predicate-based vector space model. J Biomed Inform 2013; 46:929-39. [PMID: 23892296 DOI: 10.1016/j.jbi.2013.07.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 02/01/2013] [Revised: 06/18/2013] [Accepted: 07/19/2013] [Indexed: 11/21/2022]
Abstract
Although biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query, using predicates instead of keywords for both query and document representation. Predicates are triples; they are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with adjusted tf-idf and boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate- versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search.
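The idea of representing both queries and documents as vectors over predicate triples can be sketched as below. This is only an illustration under assumptions: it uses plain smoothed tf-idf with cosine similarity in place of the "adjusted tf-idf and boost function" mentioned in the abstract (whose details are not given here), and the function names and toy triples are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for each predicate triple."""
    n = len(docs)
    df = Counter(p for doc in docs for p in set(doc))
    return {p: math.log((1 + n) / (1 + d)) + 1.0 for p, d in df.items()}


def vectorize(predicates, idf):
    """Sparse tf-idf vector; triples unseen in the corpus are dropped."""
    tf = Counter(predicates)
    return {p: count * idf[p] for p, count in tf.items() if p in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Toy corpus: each document is a list of (subject, relation, object) triples.
docs = [
    [("tamoxifen", "treats", "breast cancer"),
     ("tamoxifen", "inhibits", "estrogen receptor")],
    [("aspirin", "treats", "headache")],
]
idf = idf_table(docs)
doc_vectors = [vectorize(doc, idf) for doc in docs]
query_vector = vectorize([("tamoxifen", "treats", "breast cancer")], idf)

scores = [cosine(query_vector, vec) for vec in doc_vectors]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Because each triple is a single vector dimension, a document matches only when subject, relation, and object all agree, which is stricter than matching the three words independently as keywords.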
15
Liu X, Chen H. AZDrugMiner: An Information Extraction System for Mining Patient-Reported Adverse Drug Events in Online Patient Forums. Smart Health 2013. [DOI: 10.1007/978-3-642-39844-5_16] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 12/31/2022]
16
Nebot V, Berlanga R. Exploiting semantic annotations for open information extraction: an experience in the biomedical domain. Knowl Inf Syst 2012. [DOI: 10.1007/s10115-012-0590-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
17
Rubrichi S, Quaglini S, Spengler A, Russo P, Gallinari P. A system for the extraction and representation of summary of product characteristics content. Artif Intell Med 2012; 57:145-54. [PMID: 23085139 DOI: 10.1016/j.artmed.2012.08.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/09/2011] [Revised: 07/29/2012] [Accepted: 08/26/2012] [Indexed: 10/27/2022]
Abstract
OBJECTIVE: Information about medications is critical in supporting decision-making during the prescription process and thus in improving the safety and quality of care. In this work, we propose a methodology for the automatic recognition of drug-related entities (active ingredient, interaction effects, etc.) in textual drug descriptions, and their further location in a previously developed domain ontology. METHODS AND MATERIAL: The summary of product characteristics (SPC) represents the basis of information for health professionals on how to use medicines. However, this information is locked in free text and, as such, cannot be actively accessed and elaborated by computerized applications. Our approach exploits a combination of machine learning and rule-based methods. It consists of two stages. In the first stage, it learns to classify this information in a structured prediction framework, relying on conditional random fields. The classifier is trained and evaluated using a corpus of about a hundred SPCs, which have been hand-annotated with semantic labels derived from the domain ontology. In the second stage, the extracted entities are added to the corresponding concepts of the domain ontology as new instances, using a set of rules manually constructed from the corpus. RESULTS: Our evaluations show that the extraction module exhibits high overall performance, with an average F1-measure of 88% for contraindications and 90% for interactions. CONCLUSION: SPCs can be exploited to provide structured information for computer-based decision support systems.
Affiliation(s)
- Stefania Rubrichi
  - Laboratory for Biomedical Informatics Mario Stefanelli, Dipartimento di Ingegneria Industriale e dell'Informazione, University of Pavia, via Ferrata 1, 27100 Pavia, Italy.
18
Segura-Bedmar I, Martínez P, de Pablo-Sánchez C. Using a shallow linguistic kernel for drug–drug interaction extraction. J Biomed Inform 2011; 44:789-804. [DOI: 10.1016/j.jbi.2011.04.005] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.4] [Received: 10/08/2010] [Revised: 04/14/2011] [Accepted: 04/19/2011] [Indexed: 11/26/2022]
19
|
|
20
Suakkaphong N, Zhang Z, Chen H. Disease named entity recognition using semisupervised learning and conditional random fields. J Am Soc Inf Sci Technol 2011. [DOI: 10.1002/asi.21488] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
21
Xu KSJ, Wang W, Ren J, Xu JSY, Liu L, Liao S. Classifying Consumer Comparison Opinions to Uncover Product Strengths and Weaknesses. International Journal of Intelligent Information Technologies 2011. [DOI: 10.4018/jiit.2011010101] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022]
Abstract
With the Web 2.0 paradigm, a huge volume of Web content is generated by users at online forums, wikis, blogs, and social networks, among others. These user-contributed contents include numerous user opinions regarding products, services, or political issues. Among these user opinions, certain comparison opinions exist, reflecting customer preferences. Mining comparison opinions is useful as these types of viewpoints can bring more business values than other types of opinion data. Manufacturers can better understand relative product strengths or weaknesses, and accordingly develop better products to meet consumer requirements. Meanwhile, consumers can make purchasing decisions that are more informed by comparing the various features of similar products. In this paper, a novel Support Vector Machine-based method is proposed to automatically identify comparison opinions, extract comparison relations, and display results with the comparison relation maps by mining the volume of consumer opinions posted on the Web. The proposed method is empirically evaluated based on consumer opinions crawled from the Web. The initial experimental results show that the performance of the proposed method is promising and this research opens the door to utilizing these comparison opinions for business intelligence.
Affiliation(s)
- Wei Wang
  - City University of Hong Kong, China
- Long Liu
  - USTC-CityU Joint Advanced Research Centre, China
22
A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. J Biomed Inform 2010; 43:1020-35. [DOI: 10.1016/j.jbi.2010.09.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 05/18/2010] [Revised: 08/22/2010] [Accepted: 09/21/2010] [Indexed: 11/19/2022]