1
Leroy G, Andrews JG, KeAlohi-Preece M, Jaswani A, Song H, Galindo MK, Rice SA. Transparent deep learning to identify autism spectrum disorders (ASD) in EHR using clinical notes. J Am Med Inform Assoc 2024;31:1313-1321. PMID: 38626184; PMCID: PMC11105145; DOI: 10.1093/jamia/ocae080.
Abstract
OBJECTIVE Machine learning (ML) is increasingly employed to diagnose medical conditions, with algorithms trained to assign a single label using a black-box approach. We created an ML approach using deep learning that generates outcomes that are transparent and in line with clinical diagnostic rules. We demonstrate our approach for autism spectrum disorders (ASD), a neurodevelopmental condition with increasing prevalence. METHODS We use unstructured data from Centers for Disease Control and Prevention (CDC) surveillance records, labeled by a CDC-trained clinician with ASD A1-3 and B1-4 criterion labels per sentence and with ASD case labels per record using Diagnostic and Statistical Manual of Mental Disorders (DSM-5) rules. One rule-based and three deep ML algorithms and six ensembles were compared and evaluated using a test set of 6773 sentences (N = 35 cases) set aside in advance. Criterion and case labeling were evaluated for each ML algorithm and ensemble. Case labeling outcomes were also compared with seven traditional tests. RESULTS Performance for criterion labeling was highest for the hybrid BiLSTM ML model. The best case labeling was achieved by an ensemble of two BiLSTM ML models using a majority vote. It achieved 100% precision (or PPV), 83% recall (or sensitivity), 100% specificity, 91% accuracy, and 0.91 F-measure. A comparison with existing diagnostic tests shows that our best ensemble was more accurate overall. CONCLUSIONS Transparent ML is achievable even with small datasets. By focusing on intermediate steps, deep ML can provide transparent decisions. By leveraging data redundancies, ML errors at the intermediate level have a low impact on final outcomes.
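The record-level decision logic described in this abstract can be sketched in a few lines. The criterion names (A1-A3, B1-B4) and the DSM-5 case rule (all three A criteria plus at least two of the four B criteria) come from the abstract itself; the two-model agreement vote and the function names are illustrative assumptions, not the authors' code.

```python
# Sketch: aggregate sentence-level criterion labels into a record-level
# ASD case label via a DSM-5-style rule (hedged stand-in for the ensemble).

A_CRITERIA = {"A1", "A2", "A3"}
B_CRITERIA = {"B1", "B2", "B3", "B4"}

def vote(preds_a, preds_b):
    """With two voters, a majority vote keeps criteria both models found."""
    return set(preds_a) & set(preds_b)

def is_asd_case(criteria):
    """DSM-5 case rule: all three A criteria plus at least two B criteria."""
    return A_CRITERIA <= criteria and len(criteria & B_CRITERIA) >= 2

record = vote({"A1", "A2", "A3", "B1", "B2", "B4"},
              {"A1", "A2", "A3", "B1", "B2"})
assert is_asd_case(record)  # all A criteria agreed on, plus B1 and B2
```

Because the final label follows from intermediate criterion labels, a reader can trace exactly which sentences triggered which criteria, which is the transparency the abstract emphasizes.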
Affiliation(s)
- Gondy Leroy, Department of Management Information Systems, The University of Arizona, Tucson, AZ 85621, United States
- Jennifer G Andrews, Department of Pediatrics, The University of Arizona, Tucson, AZ 85621, United States
- Ajay Jaswani, Department of Management Information Systems, The University of Arizona, Tucson, AZ 85621, United States
- Hyunju Song, Department of Computer Science, The University of Arizona, Tucson, AZ 85621, United States
- Maureen Kelly Galindo, Department of Pediatrics, The University of Arizona, Tucson, AZ 85621, United States
- Sydney A Rice, Department of Pediatrics, The University of Arizona, Tucson, AZ 85621, United States
2
Jing X. The Unified Medical Language System at 30 Years and How It Is Used and Published: Systematic Review and Content Analysis. JMIR Med Inform 2021;9:e20675. PMID: 34236337; PMCID: PMC8433943; DOI: 10.2196/20675.
Abstract
BACKGROUND The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2021 marks its 30th anniversary. The UMLS brings together many broadly used vocabularies and standards in the biomedical field to facilitate interoperability among different computer systems and applications. OBJECTIVE Despite its longevity, there is no comprehensive publication analysis of the use of the UMLS. This review and analysis was therefore conducted to provide an overview of the UMLS and a comprehensive understanding of how it has been used in English-language peer-reviewed publications over the last 30 years. METHODS PubMed, ACM Digital Library, and the Nursing & Allied Health Database were used to search for studies. The primary search strategy was as follows: UMLS was used as a Medical Subject Headings term or a keyword, or appeared in the title or abstract. Only English-language publications were considered. The publications were screened first, then coded and categorized iteratively, following grounded theory. The review process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. RESULTS A total of 943 publications were included in the final analysis. Of these, 32 publications were assigned to 2 categories each; hence the category counts total 975 publications before duplicates are removed.
After analysis and categorization of the publications, UMLS was found to be used in the following emerging themes or areas (the number of publications and their respective percentages are given in parentheses): natural language processing (230/975, 23.6%), information retrieval (125/975, 12.8%), terminology study (90/975, 9.2%), ontology and modeling (80/975, 8.2%), medical subdomains (76/975, 7.8%), other language studies (53/975, 5.4%), artificial intelligence tools and applications (46/975, 4.7%), patient care (35/975, 3.6%), data mining and knowledge discovery (25/975, 2.6%), medical education (20/975, 2.1%), degree-related theses (13/975, 1.3%), digital library (5/975, 0.5%), and the UMLS itself (150/975, 15.4%), as well as the UMLS for other purposes (27/975, 2.8%). CONCLUSIONS The UMLS has been used successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, the building of artificial intelligence tools, data mining and knowledge discovery, foundational work in methodology, and middle layers that may lead to advanced products. Natural language processing, the UMLS itself, and information retrieval are the 3 most common themes that emerged among the included publications. The results, although largely related to academia, demonstrate that UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.
Affiliation(s)
- Xia Jing
- Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, United States
3
Leroy G, Gu Y, Pettygrove S, Galindo MK, Arora A, Kurzius-Spencer M. Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application. J Med Internet Res 2018;20:e10497. PMID: 30404767; PMCID: PMC6249505; DOI: 10.2196/10497.
Abstract
BACKGROUND Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive. OBJECTIVE Our objective was to automatically extract from EHRs the descriptions of behaviors noted by clinicians as evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time, as well as enable integration with other relevant data. METHODS We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities appear in the EHRs as very diverse expressions of the diagnostic criteria, written by different people (clinicians, speech pathologists, among others) at different times. Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms. RESULTS We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall).
For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs. CONCLUSIONS Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
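The pattern-and-lexicon design described above can be sketched as follows. The lexicon terms and criterion assignments here are hypothetical examples; the real parser uses 104 patterns and 92 lexicons totaling 1787 terms.

```python
import re

# Hypothetical mini-lexicons mapping DSM criteria to trigger phrases.
LEXICONS = {
    "A1": ["eye contact", "social reciprocity"],
    "A3": ["hand flapping", "lines up toys"],
}
# Compile one alternation pattern per criterion from its lexicon.
PATTERNS = {
    crit: re.compile("|".join(map(re.escape, terms)), re.IGNORECASE)
    for crit, terms in LEXICONS.items()
}

def extract_criteria(sentence):
    """Return the DSM criteria whose lexicon patterns match the sentence."""
    return {crit for crit, pat in PATTERNS.items() if pat.search(sentence)}
```

A rule-based matcher like this trades recall for precision, which matches the reported profile (76% precision, 43% recall): only phrases in the lexicons fire, but when they fire the criterion assignment is rarely wrong.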
Affiliation(s)
- Gondy Leroy, University of Arizona, Tucson, AZ, United States
- Yang Gu, University of Arizona, Tucson, AZ, United States
4
Ning W, Yu M, Kong D. Evaluating semantic similarity between Chinese biomedical terms through multiple ontologies with score normalization: An initial study. J Biomed Inform 2016;64:273-287. PMID: 27810481; DOI: 10.1016/j.jbi.2016.10.017.
Abstract
BACKGROUND Semantic similarity estimation significantly promotes the understanding of natural language resources and supports medical decision making. Previous studies have investigated semantic similarity and relatedness estimation between biomedical terms through resources in English, such as SNOMED-CT or the UMLS. However, very few studies have focused on the Chinese language, and natural language processing and text-mining technology for medical documents in China is urgently needed. Due to the lack of a complete and publicly available biomedical ontology in China, we only have access to several modest-sized ontologies with no overlaps. Although these ontologies together do not cover all of biomedicine, each covers its respective domain acceptably. In this paper, semantic similarity estimation between Chinese biomedical terms using these multiple non-overlapping ontologies was explored as an initial study. METHODS Typical path-based and information content (IC)-based similarity measures were applied to these ontologies. Analysis of the computed similarity scores revealed heterogeneity in the statistical distributions of scores derived from the different ontologies. This heterogeneity hampers the comparability of scores and the overall accuracy of similarity estimation. The problem was addressed through a novel language-independent method combining semantic similarity estimation with score normalization. A reference standard was also created in this study. RESULTS Compared with existing task-independent normalization methods, the newly developed method exhibited superior performance on most IC-based similarity measures. The accuracy of semantic similarity estimation was enhanced through score normalization, because normalization mitigates the heterogeneity of the similarity scores derived from multiple ontologies.
CONCLUSION We demonstrated the potential necessity of score normalization when estimating semantic similarity using ontology-based measures. The results of this study can also be extended to other language systems to implement semantic similarity estimation in biomedicine.
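To make the two ingredients concrete, here is a minimal sketch of a path-based similarity measure over a toy is-a hierarchy plus a min-max normalization step that rescales each ontology's scores to a common range. The hierarchy, the 1/(1+path) formula, and the choice of min-max are illustrative assumptions, not the paper's specific method.

```python
def chain(node, parent):
    """Node followed by its ancestors up to the ontology root."""
    out = [node]
    while node in parent:
        node = parent[node]
        out.append(node)
    return out

def path_similarity(a, b, parent):
    """1 / (1 + edge count of the shortest is-a path through the LCA)."""
    anc_b = chain(b, parent)
    for up_a, node in enumerate(chain(a, parent)):
        if node in anc_b:
            return 1.0 / (1 + up_a + anc_b.index(node))
    return 0.0

def min_max(scores):
    """Rescale one ontology's scores to [0, 1] so sources are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]
```

Normalizing per ontology before comparing or merging scores is the key idea: without it, a score of 0.4 from one ontology and 0.4 from another need not mean the same degree of similarity.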
Affiliation(s)
- Wenxin Ning, Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Ming Yu, Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Dehua Kong, Health Care Services Research Center, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
5
Harispe S, Sánchez D, Ranwez S, Janaqi S, Montmain J. A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J Biomed Inform 2013;48:38-53. PMID: 24269894; DOI: 10.1016/j.jbi.2013.11.006.
Abstract
Ontologies are widely adopted in the biomedical domain to characterize various resources (e.g. diseases, drugs, scientific publications) with non-ambiguous meanings. By exploiting the structured knowledge that ontologies provide, a plethora of ad hoc and domain-specific semantic similarity measures have been defined over the last years. Nevertheless, some critical questions remain: which measure should be defined/chosen for a concrete application? Are some of the, a priori different, measures indeed equivalent? In order to bring some light to these questions, we perform an in-depth analysis of existing ontology-based measures to identify the core elements of semantic similarity assessment. As a result, this paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences and to propose bridges between their theoretical bases. By demonstrating that groups of measures are just particular instantiations of parameterized functions, we unify a large number of state-of-the-art semantic similarity measures through common expressions. The application of the proposed framework and its practical usefulness is underlined by an empirical analysis of hundreds of semantic measures in a biomedical context.
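One well-known instance of the unification idea sketched in this abstract: several classic measures share the parameterized form 2*w(lca) / (w(a) + w(b)), where choosing w = node depth gives the Wu-Palmer measure and w = information content gives Lin's measure. The toy depth table below is an assumption for illustration; the paper's framework is far more general.

```python
# Two a-priori-different measures as instantiations of one parameterized form.

def unified_ratio(w, a, b, lca):
    """Shared template: 2*w(lca) / (w(a) + w(b)) for some node weighting w."""
    return 2 * w(lca) / (w(a) + w(b))

# Hypothetical depth table for a tiny is-a hierarchy rooted at "root".
DEPTH = {"root": 0, "disease": 1, "flu": 2, "cold": 2}

def wu_palmer(a, b, lca):
    """Wu-Palmer similarity: the template with w = depth."""
    return unified_ratio(DEPTH.__getitem__, a, b, lca)
```

Swapping `DEPTH.__getitem__` for an information-content function yields Lin's measure from the same template, which is exactly the kind of equivalence the framework makes explicit.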
Affiliation(s)
- Sébastien Harispe, LGI2P/EMA Research Centre, Site de Nîmes, Parc scientifique G. Besse, 30035 Nîmes cedex 1, France
- David Sánchez, Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Catalans, 26, 43007 Tarragona, Spain
- Sylvie Ranwez, LGI2P/EMA Research Centre, Site de Nîmes, Parc scientifique G. Besse, 30035 Nîmes cedex 1, France
- Stefan Janaqi, LGI2P/EMA Research Centre, Site de Nîmes, Parc scientifique G. Besse, 30035 Nîmes cedex 1, France
- Jacky Montmain, LGI2P/EMA Research Centre, Site de Nîmes, Parc scientifique G. Besse, 30035 Nîmes cedex 1, France
6
Leroy G, Endicott JE, Kauchak D, Mouradi O, Just M. User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention. J Med Internet Res 2013;15:e144. PMID: 23903235; PMCID: PMC3742397; DOI: 10.2196/jmir.2569.
Abstract
Background Adequate health literacy is important for people to maintain good health and manage diseases and injuries. Educational text, either retrieved from the Internet or provided by a doctor’s office, is a popular method to communicate health-related information. Unfortunately, it is difficult to write text that is easy to understand, and existing approaches, mostly the application of readability formulas, have not convincingly been shown to reduce the difficulty of text. Objective To develop an evidence-based writer support tool to improve perceived and actual text difficulty. To this end, we are developing and testing algorithms that automatically identify difficult sections in text and provide appropriate, easier alternatives; algorithms that effectively reduce text difficulty will be included in the support tool. This work describes the user evaluation with an independent writer of an automated simplification algorithm using term familiarity. Methods Term familiarity indicates how easy words are for readers and is estimated using term frequencies in the Google Web Corpus. Unfamiliar words are algorithmically identified and tagged for potential replacement. Easier alternatives consisting of synonyms, hypernyms, definitions, and semantic types are extracted from WordNet, the Unified Medical Language System (UMLS), and Wiktionary and ranked for a writer to choose from to simplify the text. We conducted a controlled user study with a representative writer who used our simplification algorithm to simplify texts. We tested the impact with representative consumers. The key independent variable of our study is lexical simplification, and we measured its effect on both perceived and actual text difficulty. Participants were recruited from Amazon’s Mechanical Turk website. Perceived difficulty was measured with 1 metric, a 5-point Likert scale. 
Actual difficulty was measured with 3 metrics: 5 multiple-choice questions alongside each text to measure understanding, 7 multiple-choice questions without the text for learning, and 2 free recall questions for information retention. Results Ninety-nine participants completed the study. We found strong beneficial effects on both perceived and actual difficulty. After simplification, the text was perceived as simpler (P<.001) with simplified text scoring 2.3 and original text 3.2 on the 5-point Likert scale (score 1: easiest). It also led to better understanding of the text (P<.001) with 11% more correct answers with simplified text (63% correct) compared to the original (52% correct). There was more learning with 18% more correct answers after reading simplified text compared to 9% more correct answers after reading the original text (P=.003). There was no significant effect on free recall. Conclusions Term familiarity is a valuable feature in simplifying text. Although the topic of the text influences the effect size, the results were convincing and consistent.
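The core of the term-familiarity step described above is simple to sketch: words whose corpus frequency falls below a threshold are flagged for replacement. The frequency table and threshold here are hypothetical stand-ins for the Google Web Corpus counts and whatever cutoff the authors tuned.

```python
# Hypothetical frequency table standing in for Google Web Corpus counts.
FREQ = {
    "doctor": 9_000_000,
    "dyspnea": 40_000,
    "shortness": 1_500_000,
    "breath": 3_000_000,
}

def tag_unfamiliar(words, threshold=1_000_000):
    """Flag words whose corpus frequency is below the familiarity threshold;
    unknown words count as frequency 0 and are therefore flagged."""
    return [w for w in words if FREQ.get(w.lower(), 0) < threshold]
```

In the paper's pipeline, flagged words are then paired with easier candidates from WordNet, the UMLS, and Wiktionary, and a human writer picks the replacement; the algorithm only identifies and ranks, it does not rewrite autonomously.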
Affiliation(s)
- Gondy Leroy, Information Systems and Technology, Claremont Graduate University, Claremont, CA 91711, United States
7
Xiong W, Song M, deVersterre L. A Comparative Study of an Unsupervised Word Sense Disambiguation Approach. Bioinformatics 2013. DOI: 10.4018/978-1-4666-3604-0.ch066.
Abstract
Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities. This is a significant problem in the biomedical domain where a single word may be used to describe a gene, protein, or abbreviation. In this paper, we evaluate SENSATIONAL, a novel unsupervised WSD technique, in comparison with two popular learning algorithms: support vector machines (SVM) and K-means. Based on the accuracy measure, our results show that SENSATIONAL outperforms SVM and K-means by 2% and 17%, respectively. In addition, we develop a polysemy-based search engine and an experimental visualization application that utilizes SENSATIONAL’s clustering technique.
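For intuition about clustering-based unsupervised WSD of the kind compared here, the sketch below assigns each occurrence of an ambiguous word to whichever seed context it most resembles under cosine similarity over bag-of-words vectors. This is a generic nearest-seed sketch, not the SENSATIONAL algorithm; all example texts and sense names are invented.

```python
from collections import Counter

def context_vector(text, target):
    """Bag-of-words vector of a context, excluding the ambiguous target."""
    return Counter(w for w in text.lower().split() if w != target)

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0) for w in u)
    norm = lambda c: sum(x * x for x in c.values()) ** 0.5 or 1.0
    return dot / (norm(u) * norm(v))

def assign_senses(contexts, seeds, target):
    """Assign each occurrence of the target word to the most similar seed."""
    vecs = {s: context_vector(t, target) for s, t in seeds.items()}
    return [max(vecs, key=lambda s: cosine(context_vector(c, target), vecs[s]))
            for c in contexts]
```

Fully unsupervised systems replace the hand-picked seeds with automatically discovered cluster centers (e.g., via K-means, one of the baselines in this comparison).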
Affiliation(s)
- Wei Xiong, New Jersey Institute of Technology, USA
- Min Song, New Jersey Institute of Technology, USA
8
Sánchez D, Batet M. Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective. J Biomed Inform 2011;44:749-59. DOI: 10.1016/j.jbi.2011.03.013.
9
Morrey CP, Chen L, Halper M, Perl Y. Resolution of redundant semantic type assignments for organic chemicals in the UMLS. Artif Intell Med 2011;52:141-51. PMID: 21646001; DOI: 10.1016/j.artmed.2011.05.003.
Abstract
OBJECTIVE The Unified Medical Language System (UMLS) integrates terms from different sources into concepts and supplements these with the assignment of one or more high-level semantic types (STs) from its Semantic Network (SN). For a composite organic chemical concept, multiple assignments of organic chemical STs often serve to enumerate the types of the composite's underlying chemical constituents. This practice sometimes leads to the introduction of a forbidden redundant ST assignment, where both an ST and one of its descendants are assigned to the same concept. A methodology for resolving redundant ST assignments for organic chemicals, better capturing the essence of such composite chemicals than the typical omission of the more general ST, is presented. MATERIALS AND METHODS The typical SN resolution of a redundant ST assignment is to retain only the more specific ST assignment and omit the more general one. However, with organic chemicals, that is not always the correct strategy. A methodology for properly dealing with the redundancy based on the relative sizes of the chemical components is presented. It is more accurate to use the ST of the larger chemical component for capturing the category of the concept, even if that means using the more general ST. RESULTS A sample of 254 chemical concepts having redundant ST assignments in older UMLS releases was audited to analyze the accuracy of current ST assignments. For 81 (32%) of them, our chemical analysis-based approach yielded a different recommendation from the UMLS (2009AA). New UMLS usage notes capturing rules of this methodology are proffered. CONCLUSIONS Redundant ST assignments have typically arisen for organic composite chemical concepts. A methodology for dealing with this kind of erroneous configuration, capturing the proper category for a composite chemical, is presented and demonstrated.
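Detecting the "forbidden" configuration the abstract describes, a concept assigned both a semantic type (ST) and one of that ST's descendants, reduces to an ancestor check in the ST hierarchy. The toy hierarchy below is illustrative (the node names only loosely follow the Semantic Network), and resolving which assignment to keep is the paper's chemistry-aware contribution, not shown here.

```python
# Toy is-a hierarchy of semantic types (illustrative names).
IS_A = {"Steroid": "Organic Chemical", "Organic Chemical": "Chemical"}

def ancestors(st):
    """All strict ancestors of a semantic type in the toy hierarchy."""
    out = set()
    while st in IS_A:
        st = IS_A[st]
        out.add(st)
    return out

def redundant_assignments(assigned):
    """STs assigned to a concept alongside one of their own descendants."""
    return {a for a in assigned for b in assigned if a in ancestors(b)}
```

The paper's point is that the flagged ancestor should not be dropped automatically: for a composite chemical, the ST of the larger chemical component (which may be the more general ST) can be the correct one to keep.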
Affiliation(s)
- C Paul Morrey, Information Systems, New York-Presbyterian Hospital, NY 10030, United States
10
Jimeno-Yepes AJ, McInnes BT, Aronson AR. Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation. BMC Bioinformatics 2011;12:223. PMID: 21635749; PMCID: PMC3123611; DOI: 10.1186/1471-2105-12-223.
Abstract
BACKGROUND Evaluation of Word Sense Disambiguation (WSD) methods in the biomedical domain is difficult because the available resources are either too small or too focused on specific types of entities (e.g. diseases or genes). We present a method that can be used to automatically develop a WSD test collection using the Unified Medical Language System (UMLS) Metathesaurus and the manual MeSH indexing of MEDLINE. We demonstrate the use of this method by developing such a data set, called MSH WSD. METHODS In our method, the Metathesaurus is first screened to identify ambiguous terms whose possible senses consist of two or more MeSH headings. We then use each ambiguous term and its corresponding MeSH heading to extract MEDLINE citations where the term and only one of the MeSH headings co-occur. The term found in the MEDLINE citation is automatically assigned the UMLS Concept Unique Identifier (CUI) linked to that MeSH heading, so each instance carries a UMLS CUI. We compare the characteristics of the MSH WSD data set to the previously existing NLM WSD data set. RESULTS The resulting MSH WSD data set consists of 106 ambiguous abbreviations, 88 ambiguous terms and 9 which are a combination of both, for a total of 203 ambiguous entities. For each ambiguous term/abbreviation, the data set contains a maximum of 100 instances per sense obtained from MEDLINE. We evaluated the reliability of the MSH WSD data set using existing knowledge-based methods and compared their performance to the results previously obtained by these algorithms on the pre-existing NLM WSD data set. We show that the knowledge-based methods achieve different results but keep their relative performance, except for the Journal Descriptor Indexing (JDI) method, whose performance is below the other methods. CONCLUSIONS The MSH WSD data set allows the evaluation of WSD algorithms in the biomedical domain.
Compared to previously existing data sets, MSH WSD contains a larger number of biomedical terms/abbreviations and covers the largest set of UMLS Semantic Types. Furthermore, the MSH WSD data set has been generated automatically reusing already existing annotations and, therefore, can be regenerated from subsequent UMLS versions.
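The generation procedure described in the METHODS can be sketched as a filter: keep citations that mention the ambiguous term and are indexed with exactly one of its candidate MeSH headings, and label the instance with that heading. The citation tuples and heading names below are invented; the real pipeline works against MEDLINE and the Metathesaurus.

```python
def build_instances(citations, term, sense_headings, cap=100):
    """Keep citations mentioning the term that are indexed with exactly one
    candidate MeSH heading; label each instance with that heading.
    `citations` is an iterable of (text, set_of_mesh_headings) pairs."""
    data = {h: [] for h in sense_headings}
    for text, mesh in citations:
        if term in text.lower():
            hits = [h for h in sense_headings if h in mesh]
            # Ambiguously indexed citations (0 or 2+ hits) are discarded.
            if len(hits) == 1 and len(data[hits[0]]) < cap:
                data[hits[0]].append(text)
    return data
```

Because the labels come from existing human MeSH indexing rather than new annotation, the data set can be regenerated automatically for each UMLS release, which is the abstract's closing point.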
11
Disambiguation in the biomedical domain: The role of ambiguity type. J Biomed Inform 2010;43:972-81. DOI: 10.1016/j.jbi.2010.08.009.
12
Jimeno-Yepes AJ, Aronson AR. Knowledge-based biomedical word sense disambiguation: comparison of approaches. BMC Bioinformatics 2010;11:569. PMID: 21092226; PMCID: PMC3001745; DOI: 10.1186/1471-2105-11-569.
Abstract
BACKGROUND Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus to be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes the production of training data infeasible to cover all the domain. METHODS We present research on existing WSD approaches based on knowledge bases, which complement the studies performed on statistical learning. We compare four approaches which rely on the UMLS Metathesaurus as the source of knowledge. The first approach compares the overlap of the context of the ambiguous word to the candidate senses based on a representation built out of the definitions, synonyms and related terms. The second approach collects training data for each of the candidate senses to perform WSD based on queries built using monosemous synonyms and related terms. These queries are used to retrieve MEDLINE citations. Then, a machine learning approach is trained on this corpus. The third approach is a graph-based method which exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD. This approach ranks nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to the concepts in the Metathesaurus to perform WSD. The context of the ambiguous word and semantic types of the candidate concepts are mapped to Journal Descriptors. These mappings are compared to decide among the candidate concepts. Results are provided estimating accuracy of the different methods on the WSD test collection available from the NLM. CONCLUSIONS We have found that the last approach achieves better results compared to the other methods. 
The graph-based approach, using the structure of the Metathesaurus network to estimate the relevance of the Metathesaurus concepts, does not perform well compared to the first two methods. In addition, the combination of methods improves the performance over the individual approaches. On the other hand, the performance is still below statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions to improve the existing methods and to improve the Metathesaurus to be more effective in WSD.
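The first approach in this comparison, matching the ambiguous word's context against a profile built from each candidate sense's definitions, synonyms, and related terms, is essentially a Lesk-style overlap count. This sketch uses invented sense names and profiles; the paper builds the profiles from the Metathesaurus.

```python
def disambiguate(context_words, sense_profiles):
    """Pick the candidate sense whose profile (definitions, synonyms,
    related terms) has the largest word overlap with the context."""
    ctx = set(context_words)
    return max(sense_profiles, key=lambda s: len(ctx & sense_profiles[s]))
```

The other three approaches swap this scoring step for a classifier trained on automatically retrieved citations, a graph ranking over Metathesaurus relations, or a semantic-type mapping, while keeping the same select-the-best-candidate structure.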
13
Stevenson M, Guo Y. Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus. J Biomed Inform 2010;43:762-73. DOI: 10.1016/j.jbi.2010.06.001.
14
Adaptively entropy-based weighting classifiers in combination using Dempster–Shafer theory for word sense disambiguation. Comput Speech Lang 2010. DOI: 10.1016/j.csl.2009.06.003.
15
16
Duan W, Song M, Yates A. Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts. BMC Bioinformatics 2009;10 Suppl 3:S4. PMID: 19344480; PMCID: PMC2665052; DOI: 10.1186/1471-2105-10-s3-s4.
Abstract
BACKGROUND We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort. METHODS We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples except for a single ambiguous word. The system uses a novel and efficient graph-based algorithm to cluster words into groups that have the same meaning. Our algorithm follows the principle of finding a maximum margin between clusters, determining a split of the data that maximizes the minimum distance between pairs of data points belonging to two different clusters. RESULTS On a test set of 21 ambiguous keywords from PubMed abstracts, our system has an average accuracy of 78%, outperforming a state-of-the-art unsupervised system by 2% and a baseline technique by 23%. On a standard data set from the National Library of Medicine, our system outperforms the baseline by 6% and comes within 5% of the accuracy of a supervised system. CONCLUSION Our system is a novel, state-of-the-art technique for efficiently finding word sense clusters, and does not require training data or human effort for each new word to be disambiguated.
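The max-margin principle the abstract states, choosing the split that maximizes the minimum distance between points in different clusters, has a particularly simple form in one dimension: cut at the largest gap between consecutive sorted values. This toy sketch illustrates only that principle, not the paper's graph-based algorithm, which operates on high-dimensional context representations.

```python
def max_margin_split(points):
    """Two-cluster split of 1-D points maximizing the minimum inter-cluster
    distance: cut at the largest gap between consecutive sorted values."""
    xs = sorted(points)
    gap, i = max((xs[i + 1] - xs[i], i) for i in range(len(xs) - 1))
    return xs[:i + 1], xs[i + 1:]
```

In higher dimensions the same objective no longer reduces to a single sorted pass, which is why efficient max-margin clustering (the paper's contribution) is nontrivial.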
Affiliation(s)
- Weisi Duan
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA.
17
Alexopoulou D, Andreopoulos B, Dietze H, Doms A, Gandon F, Hakenberg J, Khelif K, Schroeder M, Wächter T. Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy. BMC Bioinformatics 2009; 10:28. [PMID: 19159460] [PMCID: PMC2663782] [DOI: 10.1186/1471-2105-10-28]
Abstract
Background Ontology term labels can be ambiguous and have multiple senses. While this poses no problem for human annotators, it is a challenge for automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation that use ontologies and metadata, respectively. Results The 'Closest Sense' method assumes that the ontology defines multiple senses of the term; it computes the shortest path from co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms, including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata; it does not require an ontology, but it requires training data, which the other methods do not. To evaluate these approaches we created a manually curated corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. Averaged over all conditions, all approaches achieve an 80% success rate. The 'MetaData' approach performed best, reaching 96% when trained on high-quality data; its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but a loose is-related-to hierarchy. The 'Closest Sense' approach achieves an 80% success rate on average. Conclusion Metadata are valuable for disambiguation but require high-quality training data. Closest Sense requires no training but does require a large, consistently modelled ontology; these are two opposing conditions. Term Cooc achieves greater than 90% success given a consistently modelled ontology. Overall, the results show that well-structured ontologies can play a very important role in improving disambiguation. Availability The three benchmark datasets created for the purpose of disambiguation are available in Additional file 1.
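The 'Term Cooc' idea of a log-odds ratio for co-occurring terms can be sketched as follows. The counts and term/sense names here are invented for illustration; the paper's actual statistic additionally folds in co-occurrences inferred from the ontology structure:

```python
import math

def log_odds(term_counts, sense_counts, total, term, sense):
    """Hedged sketch of a log-odds co-occurrence score: how much more
    likely `term` is to co-occur with `sense` than with other senses.
    Uses add-one smoothing to avoid zero counts."""
    with_sense = term_counts.get((term, sense), 0) + 1
    sense_total = sense_counts[sense] + 2
    without_sense = sum(v for (t, s), v in term_counts.items()
                        if t == term and s != sense) + 1
    other_total = (total - sense_counts[sense]) + 2
    p1 = with_sense / sense_total    # P(term | sense)
    p2 = without_sense / other_total  # P(term | other senses)
    return math.log((p1 / (1 - p1)) / (p2 / (1 - p2)))

# "membrane" co-occurs mostly with the 'organelle' sense in this toy corpus:
term_counts = {("membrane", "organelle"): 8, ("membrane", "disease"): 1}
sense_counts = {"organelle": 10, "disease": 10}
score = log_odds(term_counts, sense_counts, 20, "membrane", "organelle")
```

A positive score means the co-occurring term is evidence for that sense; summing such scores over all context terms yields a simple disambiguation decision.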
Affiliation(s)
- Dimitra Alexopoulou
- Biotechnology Center (BIOTEC), Technische Universität Dresden, 01062, Dresden, Germany.
18
Stevenson M, Guo Y, Gaizauskas R, Martinez D. Disambiguation of biomedical text using diverse sources of information. BMC Bioinformatics 2008; 9 Suppl 11:S7. [PMID: 19025693] [PMCID: PMC2586756] [DOI: 10.1186/1471-2105-9-s11-s7]
Abstract
Background Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of various sources of information including linguistic features of the context in which the ambiguous term is used and domain-specific resources, such as UMLS. Materials and methods We compare various sources of information including ones which have been previously used and a novel one: MeSH terms. Evaluation is carried out using a standard test set (the NLM-WSD corpus). Results The best performance is obtained using a combination of linguistic features and MeSH terms. Performance of our system exceeds previously published results for systems evaluated using the same data set. Conclusion Disambiguation of biomedical terms benefits from the use of information from a variety of sources. In particular, MeSH terms have proved to be useful and should be used if available.
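The abstract's key finding is that combining linguistic context features with MeSH terms works best. A minimal sketch of that idea, assuming nothing about the paper's actual learner: concatenate both feature families into one vector and train any standard classifier over it, here a stdlib Naive Bayes with invented example features:

```python
from collections import Counter, defaultdict
import math

class NaiveBayesWSD:
    """Minimal sketch: a Naive Bayes sense classifier whose feature list
    simply concatenates context-word features with MeSH-term features."""
    def fit(self, examples):
        # examples: list of (features, sense) pairs
        self.n = len(examples)
        self.sense_counts = Counter(s for _, s in examples)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, sense in examples:
            self.feat_counts[sense].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        best, best_lp = None, float("-inf")
        for sense, sc in self.sense_counts.items():
            lp = math.log(sc / self.n)  # log prior
            denom = sum(self.feat_counts[sense].values()) + len(self.vocab)
            for f in feats:  # add-one smoothed log likelihoods
                lp += math.log((self.feat_counts[sense][f] + 1) / denom)
            if lp > best_lp:
                best, best_lp = sense, lp
        return best

# Toy training data for the ambiguous word "culture":
train = [
    (["word:blood", "word:grew", "mesh:Bacteriological Techniques"], "growth"),
    (["word:society", "word:history", "mesh:Anthropology"], "society"),
]
clf = NaiveBayesWSD().fit(train)
pred = clf.predict(["word:blood", "mesh:Bacteriological Techniques"])
```

The MeSH feature acts exactly like another context word here, which is why adding it to an existing feature-based system is cheap.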
Affiliation(s)
- Mark Stevenson
- Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S14DP, UK.
19
Word sense disambiguation across two domains: biomedical literature and clinical notes. J Biomed Inform 2008; 41:1088-100. [PMID: 18375190] [DOI: 10.1016/j.jbi.2008.02.003]
Abstract
The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains: biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus, in conjunction with the WSD set from the National Library of Medicine, provided the basis for evaluating our method across multiple domains and for comparing our results to published ones. Notably, only 20% of the most relevant ambiguous terms within a domain overlap between the two domains, and these terms have more senses associated with them in the clinical space than in the biomedical literature space. Experimentation with 28 different feature sets yielded a system achieving an average F-score of 0.82 on the clinical data and 0.86 on the biomedical literature.
20
Kang SH, Poynton MR, Kim KM, Lee H, Kim DH, Lee SH, Bae KS, Linares O, Kern SE, Noh GJ. Population pharmacokinetic and pharmacodynamic models of remifentanil in healthy volunteers using artificial neural network analysis. Br J Clin Pharmacol 2007; 64:3-13. [PMID: 17324247] [PMCID: PMC2000605] [DOI: 10.1111/j.1365-2125.2007.02845.x]
Abstract
AIMS An ordinary sigmoid E(max) model could not predict the overshoot of electroencephalographic approximate entropy (ApEn) during recovery from remifentanil effect in our previous study. The aim of this study was to evaluate the ability of an artificial neural network (ANN) to predict ApEn overshoot, and to evaluate the predictive performance of the pharmacokinetic model and of the ANN pharmacodynamic models with respect to the data used. METHODS Using a reduced number of ApEn instances (n = 1581), to make NONMEM modelling feasible, and the complete ApEn data (n = 24 509), the presence of overshoot was assessed. A total of 1077 measured remifentanil concentrations and ApEn data, and a total of 24 509 predicted concentrations and ApEn data, were used in ANN pharmacodynamic models A and B, respectively. The testing subset of model B (n = 7352) was used to evaluate the ability of the ANN to predict overshoot of ApEn. Mean squared error (MSE) was calculated to evaluate the predictive performance of the ANN models. RESULTS With the complete ApEn data, ApEn overshoot was observed in 66.7% of subjects, but in only 37% with the reduced number of ApEn instances. ANN model B predicted 77.8% of ApEn overshoot. MSE (95% confidence interval) was 57.1 (3.22, 71.03) for the pharmacokinetic model, 0.148 (0.004, 0.007) for model A and 0.0018 (0.0017, 0.0019) for model B. CONCLUSIONS The reduced ApEn instances interfered with the approximation of the true electroencephalographic response. The ANN predicted 77.8% of ApEn overshoot. The predictive performance of model B was significantly better than that of model A.
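The study's model comparison rests on mean squared error between predicted and observed responses. A sketch of that criterion, with invented numbers standing in for the ApEn observations and the two models' predictions:

```python
def mse(pred, obs):
    """Mean squared error: the criterion used to rank the models."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

# Illustrative data (not the study's): model B tracks the observations
# more closely than model A, so its MSE is lower.
observed = [0.8, 0.9, 1.1, 0.7]
model_a  = [0.5, 1.2, 0.9, 1.0]
model_b  = [0.78, 0.92, 1.08, 0.72]
err_a, err_b = mse(model_a, observed), mse(model_b, observed)
```

Reporting MSE with a confidence interval, as the abstract does, then lets one judge whether such a gap is statistically meaningful rather than noise.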
Affiliation(s)
- S H Kang
- School of Health Administration Program, Inje University, Kimhae, Korea
21
Hasman A, Haux R. Modeling in biomedical informatics: an exploratory analysis. Int J Med Inform 2007; 76:96-102. [PMID: 17113824] [DOI: 10.1016/j.ijmedinf.2006.08.004]
Abstract
OBJECTIVE Modeling is a significant part of research, education and practice in biomedical and health informatics. Our objective was to explore which types of process models are used in current biomedical/health informatics research, as reflected in publications of scientific journals in this field. The implications for medical informatics curricula were also investigated. METHODS Retrospective, prolective observational study of recent publications in the two official journals of the International Medical Informatics Association (IMIA), the International Journal of Medical Informatics (IJMI) and Methods of Information in Medicine (MIM). All publications from these journals in the years 2004 and 2005 were indexed according to a given list of model types. Random samples of these publications were analysed in more depth. RESULTS Three hundred and eighty-four publications were analysed, 190 from IJMI and 194 from MIM. For publications in special issues (121 in IJMI) and special topics (132 in MIM) we found differences between theme-centered and conference-centered special issue/special topic (SIT) publications. In particular, we observed high variation in modeling among publications of theme-centered SITs. It became obvious that sound formal knowledge as well as a strong engineering background is often needed for carrying out this type of research. Usually, this knowledge and the related skills are best provided in consecutive B.Sc. and M.Sc. programs in medical informatics (respectively, health informatics or biomedical informatics). If the focus is primarily on health information systems and evaluation, this can be offered in an M.Sc. program in medical informatics. CONCLUSIONS In analysing the 384 publications it became obvious that modeling continues to be a major task in research, education and practice in biomedical and health informatics. Knowledge and skills covering a broad range of model types are needed in biomedical/health informatics.
Affiliation(s)
- A Hasman
- University of Amsterdam, Academic Medical Center, Department of Medical Informatics, 1100 DE Amsterdam, The Netherlands.
22
Erhardt RAA, Schneider R, Blaschke C. Status of text-mining techniques applied to biomedical text. Drug Discov Today 2006; 11:315-25. [PMID: 16580973] [DOI: 10.1016/j.drudis.2006.02.011]
Abstract
Scientific progress is increasingly based on knowledge and information. Knowledge is now recognized as the driver of productivity and economic growth, leading to a new focus on the role of information in the decision-making process. Most scientific knowledge is registered in publications and other unstructured representations that make it difficult to use and to integrate the information with other sources (e.g. biological databases). Making a computer understand human language has proven to be a complex achievement, but there are techniques capable of detecting, distinguishing and extracting a limited number of different classes of facts. In the biomedical field, extracting information has specific problems: complex and ever-changing nomenclature (especially genes and proteins) and the limited representation of domain knowledge.
23
Xu H, Markatou M, Dimova R, Liu H, Friedman C. Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues. BMC Bioinformatics 2006; 7:334. [PMID: 16822321] [PMCID: PMC1550263] [DOI: 10.1186/1471-2105-7-334]
Abstract
Background Word sense disambiguation (WSD) is critical in the biomedical domain for improving the precision of natural language processing (NLP), text mining, and information retrieval systems, because ambiguous words negatively impact accurate access to literature containing biomolecular entities such as genes, proteins, cells, and diseases. Automated techniques have been developed that address the WSD problem for a number of text processing situations, but the problem is still a challenging one. Supervised WSD machine learning (ML) methods have been applied in the biomedical domain and have shown promising results, but the results typically incorporate a number of confounding factors, and it is difficult to truly understand the effectiveness and generalizability of the methods because these factors interact with each other and affect the final results. Thus, there is a need to explicitly address the factors and to systematically quantify their effects on performance. Results Experiments were designed to measure the effect of "sample size" (the size of the datasets), "sense distribution" (the distribution of the different meanings of the ambiguous word) and "degree of difficulty" (a measure of the distances between the meanings of the senses of an ambiguous word) on the performance of WSD classifiers. Support Vector Machine (SVM) classifiers were applied to an automatically generated data set containing four ambiguous biomedical abbreviations: BPD, BSA, PCA, and RSV, chosen because of the varying degrees of difference among their respective senses. Results showed that: 1) increasing the sample size generally reduced the error rate, but this was limited mainly to well-separated senses (i.e. cases where the distances between the senses were large); in difficult cases an impractically large increase in sample size was needed to improve performance slightly; 2) the sense distribution did not affect performance when the senses were separable; 3) when there was a majority sense of over 90%, the WSD classifier was no better than simply choosing the majority sense; 4) error rates were proportional to the similarity of senses; and 5) there was no statistical difference between results obtained using 5-fold and 10-fold cross-validation. Other issues that impact performance are also enumerated. Conclusion Several independent factors affect performance when using ML techniques for WSD, and combining them into one single result obscures understanding of the underlying methods. Although we studied only four abbreviations, we used a well-established statistical method that makes the results likely to generalize to abbreviations with similar characteristics. Our experiments show that in order to understand the performance of these ML methods, papers must report the baseline performance, the distribution and sample size of the senses in the datasets, and the standard deviation or confidence intervals. In addition, papers should characterize the difficulty of the WSD task, the WSD situations addressed and not addressed, and the ML methods and features used. This should lead to an improved understanding of the generalizability and the limitations of the methodology.
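Finding 3 above compares classifiers against the majority-sense baseline: always predicting the most frequent sense. That baseline is trivial to compute, which is exactly why the abstract insists papers report it. A sketch with an invented, highly skewed sense distribution like the >90% case the study describes:

```python
from collections import Counter

def majority_sense_baseline(labels):
    """Return the most frequent sense and the accuracy of always
    predicting it: the baseline WSD classifiers must beat."""
    counts = Counter(labels)
    sense, freq = counts.most_common(1)[0]
    return sense, freq / len(labels)

# Toy gold labels for one ambiguous abbreviation, 95% majority sense:
labels = ["protein"] * 95 + ["gene"] * 5
sense, acc = majority_sense_baseline(labels)
```

With a 95% majority sense, a classifier reporting 94% accuracy is actually performing worse than doing nothing, which is why accuracy numbers are uninterpretable without this baseline.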
Affiliation(s)
- Hua Xu
- Department of Biomedical Informatics, Columbia University, 622 168St, New York City, New York, USA
- Marianthi Markatou
- Department of Biostatistics, Columbia University, 722 168St, New York City, New York, USA
- Rositsa Dimova
- Department of Biostatistics, Columbia University, 722 168St, New York City, New York, USA
- Hongfang Liu
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, 4000 Reservoir Rd, Washington DC, USA
- Carol Friedman
- Department of Biomedical Informatics, Columbia University, 622 168St, New York City, New York, USA