1. Biomedical Text Classification Using Augmented Word Representation Based on Distributional and Relational Contexts. Computational Intelligence and Neuroscience 2023. [DOI: 10.1155/2023/2989791] [Indexed: 02/17/2023]
Abstract
Due to the increasing use of information technologies by biomedical experts, researchers, public health agencies, and healthcare professionals, a rapidly growing volume of scientific literature, clinical notes, and other structured and unstructured text resources is being stored in data sources such as PubMed. These massive text resources can be leveraged to extract valuable knowledge and insights using machine learning techniques. Neural network-based classification models, which have recently gained popularity, take numeric vectors (aka word representations) of the training data as input; the better these input vectors, the more accurate the resulting classification. Word representations are learned as the distribution of words in an embedding space, wherein each word has its own vector and words that are semantically similar in context appear near each other. However, such distributional word representations cannot encapsulate relational semantics between distant words. In the biomedical domain, relation mining is a well-studied problem that aims to extract relational words linking distant entities, which generally represent the subject and object of a sentence. Our goal is to capture this relational semantic information between distant words in a large corpus, learn enhanced word representations, and employ them for natural language processing tasks such as text classification. In this article, we propose an application of biomedical relation triplets to learn word representations by incorporating relational semantic information into the distributional representation of words. In other words, the proposed approach captures both the distributional and the relational contexts of words to learn their numeric vectors from a text corpus. We also propose an application of the learned word representations to text classification.
The proposed approach is evaluated on multiple benchmark datasets, and the efficacy of the learned word representations is tested on word similarity and concept categorization tasks, where it outperforms the state-of-the-art GloVe model. Furthermore, we apply the learned word representations to classify biomedical texts using four neural network-based classification models; the classification accuracy further confirms the effectiveness of the word representations learned by our proposed approach.
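The core idea of this entry, augmenting distributional co-occurrence with relational context from relation triplets, can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the corpus, the triplet, and the weighting scheme are all made up for the example.

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Distributional context: count word pairs inside a sliding window."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[(w, tokens[j])] += 1.0
    return counts

def add_relational_context(counts, triplets, weight=1.0):
    """Relational context: boost counts for (subject, object) pairs that a
    relation-mining step has linked, even when the words are distant."""
    for subj, _rel, obj in triplets:
        counts[(subj, obj)] += weight
        counts[(obj, subj)] += weight
    return counts

sentences = [["aspirin", "reduces", "risk", "of", "stroke"]]
# Hypothetical output of a biomedical relation-mining step:
triplets = [("aspirin", "reduces", "stroke")]

counts = add_relational_context(cooccurrence_counts(sentences), triplets)
# "aspirin" and "stroke" never co-occur within the window, yet the
# triplet gives the pair a nonzero count before the vectors are fit.
```

A GloVe-style factorization of the augmented count matrix would then place relationally linked words nearer each other than pure window statistics would.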
3. Sänger M, Leser U. Large-scale entity representation learning for biomedical relationship extraction. Bioinformatics 2021; 37:236-242. [PMID: 32726411] [DOI: 10.1093/bioinformatics/btaa674] [Received: 03/03/2020] [Revised: 07/14/2020] [Accepted: 07/21/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION The automatic extraction of published relationships between molecular entities has important applications in many biomedical fields, ranging from systems biology to personalized medicine. Existing work has focused on extracting relationships described in single articles or single sentences. However, a single record is rarely sufficient to judge the biological correctness of a relation, as experimental evidence may be weak or valid only in a certain context. Furthermore, statements may be more speculative than confirmative, and different articles often contradict each other. Experts therefore always take the complete literature into account to reach a reliable decision about a relationship; how to do this effectively in an automatic manner is an open research question. RESULTS We propose two novel relation extraction approaches that use recent representation learning techniques to create comprehensive models of biomedical entities or entity pairs, respectively. These representations are learned by considering all publications in PubMed mentioning an entity or a pair. They are used as input to a neural network that classifies relations globally, i.e., the derived predictions are corpus-based, not sentence- or article-based as in prior art. Experiments on the extraction of mutation-disease, drug-disease, and drug-drug relationships show that the learned embeddings indeed capture semantic information about the entities under study and outperform traditional methods by 4-29% in F1 score. AVAILABILITY AND IMPLEMENTATION Source code is available at: https://github.com/mariosaenger/bio-re-with-entity-embeddings. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
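The corpus-level idea can be sketched minimally as follows, assuming mention-context vectors have already been computed for every sentence in which an entity appears (the vectors below are random stand-ins, and the mean pooling and pair features are simplifications, not the paper's actual models).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mention-context vectors: one per sentence in which the
# entity appears anywhere in the corpus (random stand-ins here).
mention_vectors = {
    "BRAF": rng.normal(size=(120, 16)),      # 120 mentions, 16-dim contexts
    "melanoma": rng.normal(size=(300, 16)),  # 300 mentions
}

def entity_embedding(name):
    """Corpus-level representation: pool over ALL mentions of the
    entity, not just those in a single sentence or article."""
    return mention_vectors[name].mean(axis=0)

def pair_features(e1, e2):
    """Feature vector a downstream relation classifier would receive
    for the candidate pair (e1, e2)."""
    v1, v2 = entity_embedding(e1), entity_embedding(e2)
    return np.concatenate([v1, v2, v1 * v2])

x = pair_features("BRAF", "melanoma")  # shape (48,)
```

The key design choice mirrored here is that the classifier's prediction depends on evidence aggregated across the whole corpus, so no single weak or contradictory sentence decides the relation.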
Affiliation(s)
- Mario Sänger
- Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin 10099, Germany
- Ulf Leser
- Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin 10099, Germany
4. Jiang S, Wu W, Tomita N, Ganoe C, Hassanpour S. Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts. J Biomed Inform 2020; 111:103581. [PMID: 33010425] [DOI: 10.1016/j.jbi.2020.103581] [Received: 04/14/2020] [Revised: 09/22/2020] [Accepted: 09/26/2020] [Indexed: 11/25/2022]
Abstract
OBJECTIVE Currently, a major limitation for natural language processing (NLP) analyses in clinical applications is that concepts are not effectively referenced in various forms across different texts. This paper introduces Multi-Ontology Refined Embeddings (MORE), a novel hybrid framework that incorporates domain knowledge from multiple ontologies into a distributional semantic model, learned from a corpus of clinical text. MATERIALS AND METHODS We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE. For the ontology-based part, we use the Medical Subject Headings (MeSH) ontology and three state-of-the-art ontology-based similarity measures. In our approach, we propose a new learning objective, modified from the sigmoid cross-entropy objective function. RESULTS AND DISCUSSION We used two established datasets of semantic similarities among biomedical concept pairs to evaluate the quality of the generated word embeddings. On the first dataset with 29 concept pairs, with similarity scores established by physicians and medical coders, MORE's similarity scores have the highest combined correlation (0.633), which is 5.0% higher than that of the baseline model, and 12.4% higher than that of the best ontology-based similarity measure. On the second dataset with 449 concept pairs, MORE's similarity scores have a correlation of 0.481, based on the average of four medical residents' similarity ratings, and that outperforms the skip-gram model by 8.1%, and the best ontology measure by 6.9%. Furthermore, MORE outperforms three pre-trained transformer-based word embedding models (i.e., BERT, ClinicalBERT, and BioBERT) on both datasets. CONCLUSION MORE incorporates knowledge from several biomedical ontologies into an existing corpus-based distributional semantics model, improving both the accuracy of the learned word embeddings and the extensibility of the model to a broader range of biomedical concepts. 
MORE allows for more accurate clustering of concepts across a wide range of applications, such as analyzing patient health records to identify subjects with similar pathologies, or integrating heterogeneous clinical data to improve interoperability between hospitals.
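The abstract describes a learning objective modified from sigmoid cross-entropy. One plausible reading, shown here only as a sketch, is to replace the binary co-occurrence label with a soft target drawn from an ontology-based similarity measure; MORE's actual objective differs in its details.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_sigmoid_xent(score, target):
    """Sigmoid cross-entropy against a soft target in [0, 1].

    With target in {0, 1} this is the ordinary objective; using an
    ontology-based similarity (e.g. 0.8) as the target is one way
    domain knowledge can be folded into a distributional model.
    """
    p = sigmoid(score)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

# The gradient with respect to the score is p - target, so a soft
# target of 0.5 is minimized at score 0 (p = 0.5), where the loss
# equals the entropy ln 2:
loss = soft_sigmoid_xent(0.0, 0.5)
```

A soft target pulls the model's score toward a high but unsaturated value for ontology-similar pairs, rather than forcing the all-or-nothing fit of a hard label.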
Affiliation(s)
- Steven Jiang
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
- Weiyi Wu
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Naofumi Tomita
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Craig Ganoe
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Saeed Hassanpour
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA; Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.
5. Arguello-Casteleiro M, Stevens R, Des-Diz J, Wroe C, Fernandez-Prieto MJ, Maroto N, Maseda-Fernandez D, Demetriou G, Peters S, Noble PJM, Jones PH, Dukes-McEwan J, Radford AD, Keane J, Nenadic G. Exploring semantic deep learning for building reliable and reusable one health knowledge from PubMed systematic reviews and veterinary clinical notes. J Biomed Semantics 2019; 10:22. [PMID: 31711540] [PMCID: PMC6849172] [DOI: 10.1186/s13326-019-0212-6] [Indexed: 01/14/2023]
Abstract
Background Deep learning opens up opportunities for routinely scanning large bodies of biomedical literature and clinical narratives to represent the meaning of biomedical and clinical terms. However, validating and integrating this knowledge at scale requires cross-checking against ground truths (i.e., evidence-based resources) that are unavailable in an actionable or computable form. In this paper we explore how to turn information about diagnoses, prognoses, therapies, and other clinical concepts into computable knowledge using free-text data about human and animal health. We used a Semantic Deep Learning approach that combines Semantic Web technologies and deep learning to acquire and validate knowledge about 11 well-known medical conditions mined from two sets of unstructured free-text data: 300 K PubMed systematic review articles (the PMSB dataset) and 2.5 M veterinary clinical notes (the VetCN dataset). For each target condition we obtained 20 related clinical concepts using two deep learning methods applied separately to the two datasets, resulting in 880 term pairs (target term, candidate term). Each concept, represented by an n-gram, is mapped to UMLS using MetaMap; we also developed a bespoke method for mapping short forms (e.g., abbreviations and acronyms). Existing ontologies were used to formally represent the associations, and we created ontological modules to illustrate how the extracted knowledge can be queried. The evaluation was performed using the content of BMJ Best Practice. Results MetaMap achieves an F-measure of 88% (precision 85%, recall 91%) when applied directly to the 613 unique candidate terms in the 880 term pairs. When the processing of short forms is included, MetaMap achieves an F-measure of 94% (precision 92%, recall 96%). Validation of the term pairs against BMJ Best Practice yields a precision between 98% and 99%.
Conclusions The Semantic Deep Learning approach can transform neural embeddings built from unstructured free-text data into reliable and reusable One Health knowledge, using ontologies and content from BMJ Best Practice.
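The reported F-measures follow directly from the stated precision and recall via the harmonic mean; a quick check:

```python
def f_measure(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The two MetaMap configurations reported above:
print(round(f_measure(0.85, 0.91), 2))  # 0.88  (direct mapping)
print(round(f_measure(0.92, 0.96), 2))  # 0.94  (with short-form handling)
```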
Affiliation(s)
- Robert Stevens
- School of Computer Science, University of Manchester, Manchester, UK
- Julio Des-Diz
- Hospital do Salnés, Villagarcía de Arousa, Pontevedra, Spain
- Nava Maroto
- Departamento de Lingüística Aplicada a la Ciencia y a la Tecnología, Universidad Politécnica de Madrid, Madrid, Spain
- Diego Maseda-Fernandez
- Midcheshire Hospital Foundation Trust, NHS England, Crewe, UK; School of Medical Sciences, University of Manchester, Manchester, UK
- George Demetriou
- School of Computer Science, University of Manchester, Manchester, UK
- Simon Peters
- School of Social Sciences, University of Manchester, Manchester, UK
- Peter-John M Noble
- Small Animal Veterinary Surveillance Network, University of Liverpool, Liverpool, UK
- Phil H Jones
- Small Animal Veterinary Surveillance Network, University of Liverpool, Liverpool, UK
- Jo Dukes-McEwan
- Small Animal Teaching Hospital, University of Liverpool, Liverpool, UK
- Alan D Radford
- Small Animal Veterinary Surveillance Network, University of Liverpool, Liverpool, UK
- John Keane
- School of Computer Science, University of Manchester, Manchester, UK; Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Goran Nenadic
- School of Computer Science, University of Manchester, Manchester, UK; Manchester Institute of Biotechnology, University of Manchester, Manchester, UK; Health eResearch Centre, University of Manchester, Manchester, UK
6. Grabar N, Grouin C. A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook. Yearb Med Inform 2019; 28:218-222. [PMID: 31419835] [PMCID: PMC6697498] [DOI: 10.1055/s-0039-1677937] [Indexed: 01/09/2023]
Abstract
Objectives: To analyze the content of publications in the medical natural language processing (NLP) domain in 2018.
Methods: Automatic and manual pre-selection of publications to be reviewed, selection of the best NLP papers of the year, and analysis of the important issues.
Results: Two best papers were selected this year: one dedicated to the generation of multi-document summaries and the other to the generation of imaging reports. We also propose an analysis of the main research trends in NLP publications in 2018.
Conclusions: The year 2018 was very rich in the NLP issues and topics addressed. It shows the will of researchers to move towards robust and reproducible results, and researchers also prove creative in tackling original issues and approaches.
Affiliation(s)
- Natalia Grabar
- LIMSI, CNRS, Université Paris-Saclay, Orsay, France; STL, CNRS, Université de Lille, Villeneuve-d'Ascq, France
- Cyril Grouin
- LIMSI, CNRS, Université Paris-Saclay, Orsay, France