1
Mehra T, Wekhof T, Keller DI. Additional Value From Free-Text Diagnoses in Electronic Health Records: Hybrid Dictionary and Machine Learning Classification Study. JMIR Med Inform 2024; 12:e49007. [PMID: 38231569 PMCID: PMC10831590 DOI: 10.2196/49007] [Received: 05/15/2023] [Revised: 10/30/2023] [Accepted: 11/24/2023] [Indexed: 01/18/2024]
Abstract
BACKGROUND Physicians are hesitant to forgo the opportunity of entering unstructured clinical notes for structured data entry in electronic health records. Does free text increase informational value in comparison with structured data?

OBJECTIVE This study aims to compare information from unstructured text-based chief complaints harvested and processed by a natural language processing (NLP) algorithm with clinician-entered structured diagnoses in terms of their potential utility for automated improvement of patient workflows.

METHODS Electronic health records of 293,298 patient visits at the emergency department of a Swiss university hospital from January 2014 to October 2021 were analyzed. Using emergency department overcrowding as a case in point, we compared supervised NLP-based keyword dictionaries of symptom clusters from unstructured clinical notes and clinician-entered chief complaints from a structured drop-down menu with the following 2 outcomes: hospitalization and high Emergency Severity Index (ESI) score.

RESULTS Of 12 symptom clusters, the NLP cluster was significant in predicting hospitalization in 11 (92%) clusters; 8 (67%) clusters remained significant even after controlling for the cluster of clinician-determined chief complaints in the model. All 12 NLP symptom clusters were significant in predicting a low ESI score, of which 9 (75%) remained significant when controlling for clinician-determined chief complaints. The correlation between NLP clusters and chief complaints was low (r=-0.04 to 0.6), indicating complementarity of information.

CONCLUSIONS The NLP-derived features and clinicians' knowledge were complementary in explaining patient outcome heterogeneity. They can provide an efficient approach to patient flow management, for example, in an emergency medicine setting. We further demonstrated the feasibility of creating extensive and precise keyword dictionaries with NLP by medical experts without requiring programming knowledge. Using the dictionary, we could classify short and unstructured clinical texts into diagnostic categories defined by the clinician.
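The dictionary approach described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the cluster names, the keyword lists, and the helper `classify_complaint` are hypothetical and are not the dictionaries built in the study. The sketch lowercases a short complaint, tokenizes it, and assigns every cluster for which at least one keyword or multi-word phrase occurs as a whole-token sequence.

```python
import re

# Hypothetical keyword dictionary: cluster name -> keywords/phrases.
# Illustrative only; not the dictionary used in the study.
SYMPTOM_DICTIONARY = {
    "chest_pain": ["chest pain", "thoracic pain", "angina"],
    "dyspnea": ["dyspnea", "shortness of breath", "breathless"],
    "abdominal_pain": ["abdominal pain", "stomach ache", "epigastric"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_complaint(text, dictionary=SYMPTOM_DICTIONARY):
    """Return the sorted list of clusters whose keywords occur in the text.

    Keywords are matched as whole token sequences, so a keyword never
    matches inside a longer, unrelated word.
    """
    padded = " " + " ".join(tokenize(text)) + " "
    hits = set()
    for cluster, keywords in dictionary.items():
        for keyword in keywords:
            phrase = " " + " ".join(tokenize(keyword)) + " "
            if phrase in padded:
                hits.add(cluster)
                break  # one matching keyword is enough for this cluster
    return sorted(hits)


print(classify_complaint("Pt c/o chest pain and shortness of breath since 2h"))
# ['chest_pain', 'dyspnea']
```

The function returns a list rather than a single label because one complaint can mention several symptoms; a real dictionary would also have to cover the spelling variants, abbreviations, and language of the local notes, which this sketch ignores.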
Affiliation(s)
- Tarun Mehra
- Department for Medical Oncology and Hematology, University Hospital of Zurich, Zurich, Switzerland
- Tobias Wekhof
- Center of Economic Research, ETH Zurich, Zurich, Switzerland
- Dagmar Iris Keller
- Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Emergency Department, University Hospital of Zurich, Zurich, Switzerland
2
Berge GT, Granmo OC, Tveit TO, Ruthjersen AL, Sharma J. Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records. BMC Med Inform Decis Mak 2023; 23:188. [PMID: 37723446 PMCID: PMC10507898 DOI: 10.1186/s12911-023-02271-8] [Received: 07/01/2022] [Accepted: 08/17/2023] [Indexed: 09/20/2023]
Abstract
BACKGROUND Data mining of electronic health records (EHRs) has huge potential for improving clinical decision support and for helping healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency.

METHODS In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is learned automatically, and vocabulary, concepts, and rules supporting a variety of downstream NLP tasks can then be built with only minimal manual feature engineering and tagging by clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretability of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages: it is potentially language independent, demands few domain resources for maintenance, and can cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification.

RESULTS In empirical studies, the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%) and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method's performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks.

CONCLUSIONS Based on the promising results, we suggest that more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP.
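Assuming the reported F-measure is the balanced F1 score, that is, the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), the three reported figures are mutually consistent. A quick check in Python (the helper `f_measure` is ours, not from the paper):

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the combined method.
f1 = f_measure(precision=0.888, recall=0.926)
print(f"F-measure: {f1:.3f}")  # F-measure: 0.907
```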
Affiliation(s)
- Geir Thore Berge
- Department of Information Systems, University of Agder, Kristiansand, Norway
- Department of Technology and eHealth, Sørlandet Hospital Trust, Kristiansand, Norway
- Tor Oddbjørn Tveit
- Department of Technology and eHealth, Sørlandet Hospital Trust, Kristiansand, Norway
- Department of Anesthesia and Intensive Care, Sørlandet Hospital Trust, Kristiansand, Norway
- Anna Linda Ruthjersen
- Department of Technology and eHealth, Sørlandet Hospital Trust, Kristiansand, Norway
- Jivitesh Sharma
- Department of Technology and eHealth, Sørlandet Hospital Trust, Kristiansand, Norway
- Department of ICT, University of Agder, Grimstad, Norway
3
Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, Berkovsky S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int J Med Inform 2023; 177:105122. [PMID: 37295138 DOI: 10.1016/j.ijmedinf.2023.105122] [Received: 10/09/2022] [Revised: 04/14/2023] [Accepted: 06/03/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) applications have developed over the past years in various fields, including their application to clinical free text for named entity recognition and relation extraction. However, developments in the last few years have been so rapid that there is currently no overview of them. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments.

METHODS We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries).

RESULTS We included 94 studies in the review, 30 of which were published in the last three years. Machine learning methods were used in 68 studies, rule-based methods in 5 studies, and both in 22 studies. Sixty-three studies focused on Named Entity Recognition, 13 on Relation Extraction, and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". Seventy-two studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system, and just three studies reported using the system outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool.

DISCUSSION Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of findings and their translation into practice, and it highlights the need for robust clinical evaluation.
Affiliation(s)
- David Fraile Navarro
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Kiran Ijaz
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Dana Rezazadegan
- Department of Computer Science and Software Engineering, School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Australia
- Hania Rahimi-Ardabili
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Mark Dras
- Department of Computing, Macquarie University, Sydney, Australia
- Enrico Coiera
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Shlomo Berkovsky
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
4
Seif MA, Kruse BC, Keramati CA, Aloia TA, Amaku RA, Bhavsar S, DeCarlo KR, Erfe RJD, Eska JS, Iniesta MD, Prakash LR, Zhang T, Gottumukkala V. Development and implementation of an institutional enhanced recovery program data process. Health Inf Manag 2023; 52:151-156. [PMID: 35695132 DOI: 10.1177/18333583221095139] [Indexed: 11/16/2022]
Abstract
Background: With increasing implementation of enhanced recovery programs (ERPs) in clinical practice, standardised data collection and reporting have become critical in addressing the heterogeneity of metrics used for reporting outcomes. Opportunities exist to leverage electronic health record (EHR) systems to collect, analyse, and disseminate ERP data.

Objectives: (i) To consolidate relevant ERP variables into a singular data universe; (ii) To create an accessible and intuitive query tool for rapid data retrieval.

Method: We reviewed nine established individual team databases to identify common variables and create one standard ERP data dictionary. To address data automation, we used a third-party business intelligence tool to map the identified variables within the EHR system, consolidating them into a single ERP universe. To determine efficacy, we compared the times four experienced research coordinators needed to retrieve ERP data for 10 randomly selected surgery patients using the manual, five-universe, and ERP Universe processes.

Results: The total times to process data variables for all 10 patients for the manual, five-universe, and ERP Universe processes were 510, 111, and 76 min, respectively. Shifting from the five-universe or manual process to the ERP Universe resulted in decreases in time of 32% and 85%, respectively.

Conclusion: The ERP Universe reduces the time spent collecting, analysing, and reporting ERP elements without increasing operational costs or interrupting workflow.

Implications: Manual data abstraction places a significant burden on resources. The creation of a singular instrument dedicated to ERP data abstraction greatly increases the efficiency with which clinicians and supporting staff can query adherence to an ERP protocol.
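The two percentages in the Results follow from the reported totals as relative reductions, (before - after) / before. A small Python check using only the figures in the abstract (the helper `percent_reduction` is illustrative):

```python
# Total processing times (minutes) for all 10 patients, as reported.
times = {"manual": 510, "five-universe": 111, "ERP Universe": 76}


def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100 * (before - after) / before


for baseline in ("five-universe", "manual"):
    reduction = percent_reduction(times[baseline], times["ERP Universe"])
    print(f"{baseline} -> ERP Universe: {reduction:.0f}% less time")
# five-universe -> ERP Universe: 32% less time
# manual -> ERP Universe: 85% less time
```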
Affiliation(s)
- Mohamed A Seif
- Department of Urology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Brittany C Kruse
- Institute for Cancer Care Innovation, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Cameron A Keramati
- Institute for Cancer Care Innovation, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Thomas A Aloia
- Institute for Cancer Care Innovation, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ruth A Amaku
- Institute for Cancer Care Innovation, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Shreyas Bhavsar
- Anesthesiology and PeriOperative Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Kenneth R DeCarlo
- EHR Analytics and Reporting, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rose Joan D Erfe
- Department of Anesthesia, Critical Care, and Pain Management, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jarrod S Eska
- Institute for Cancer Care Innovation, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Maria D Iniesta
- Gynecology Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Laura R Prakash
- Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tao Zhang
- EHR Analytics and Reporting, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Vijaya Gottumukkala
- Anesthesiology and PeriOperative Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
5
Reyes CN, Zheng K, Hanauer DA. Design, Implementation, and Usability of the Electronic Medical Record Search Engine (EMERSE) Tool. AMIA Annu Symp Proc 2023; 2022:932-941. [PMID: 37128440 PMCID: PMC10148345] [Indexed: 05/03/2023]
Abstract
Free text forms of clinical documentation stored in electronic health records contain a trove of data for researchers and clinicians alike. However, often these data are challenging to use and not easily accessible. EMERSE, a clinical documentation search and data abstraction tool developed by the University of Michigan, helps users in the task of searching through free text notes in clinical documentation. This study evaluates the usability and user experience of the EMERSE system, and draws inferences for the design of such systems. The study was conducted in 3 phases. In Phase 1, interviews with site administrators investigated factors that facilitate or hinder the implementation and adoption of EMERSE. Phase 2 employed semi-structured interviews to understand the uses, benefits, and limitations of the system from the perspective of experienced users. In Phase 3, system-naive users performed a set of basic workflow tasks, then completed post-activity questions and surveys to evaluate the intuitiveness and usability of the system. Participants rated the system exceptionally high on usability, user interface satisfaction, and perceived usefulness. Feedback also indicated that improvements could be made in visual contrast, affordances, and scope of notes indexed. These results indicate that tools such as EMERSE should be highly intuitive, attractive, and moderately customizable. This paper discusses some aspects of what may contribute to a system having such characteristics.
Affiliation(s)
- Kai Zheng
- University of California, Irvine, Irvine, CA, USA
6
Han P, Fu S, Kolis J, Hughes R, Hallstrom BR, Carvour M, Maradit-Kremers H, Sohn S, Vydiswaran VGV. Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation. JMIR Med Inform 2022; 10:e38155. [PMID: 36044253 PMCID: PMC9475406 DOI: 10.2196/38155] [Received: 03/21/2022] [Revised: 05/30/2022] [Accepted: 07/12/2022] [Indexed: 11/18/2022]
Abstract
BACKGROUND Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used in the annotation of clinical notes. A rule-based system, MedTagger-total hip arthroplasty (THA), developed based on MedTaggerIE, was previously shown to correctly identify the surgical approach, fixation, and bearing surface from the THA operative notes at Mayo Clinic.

OBJECTIVE This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and provide lessons learned for best practices.

METHODS We conducted iterative test-apply-refinement processes with three involved sites: the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. The activities at the two deployment sites included the extraction of the operative notes, gold standard development (Michigan: registry data; Iowa: manual chart review), the refinement of NLP algorithms on training data, and the evaluation of test data. Error analyses were conducted to understand language variations across sites. To further assess the model specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, since operative notes for neither procedure should contain any approach or fixation keywords.

RESULTS MedTagger-THA algorithms were implemented and refined independently for both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA algorithms demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). These results were comparable with internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracies slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At the Iowa site, the study comprised an overall data set of 100 operative notes (50 training notes and 50 test notes). MedTagger-THA algorithms achieved moderate to high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%).

CONCLUSIONS High performance across centers was achieved for the MedTagger-THA algorithms, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provided important lessons learned during the model deployment and validation processes, and it can serve as a reference for transferring rule-based electronic health record models.
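MedTagger-THA's rules are not reproduced in the abstract, but the general idea of pattern-based extraction, and of measuring specificity on notes that should yield no match, can be sketched with Python regular expressions. The patterns, labels, example notes, and helper names below are hypothetical; the sketch reports every label whose pattern fires and computes specificity as the share of expected-negative notes with no extraction.

```python
import re

# Hypothetical patterns for illustration only; MedTagger-THA's rules are not shown here.
APPROACH_PATTERNS = {
    "anterior": r"\b(?:direct\s+)?anterior\s+approach\b",
    "lateral": r"\b(?:direct\s+|antero)?lateral\s+approach\b",
    "posterior": r"\bposterior\s+approach\b|\bposterolateral\s+approach\b",
}
FIXATION_PATTERNS = {
    "cemented": r"\bcemented\b",
    "cementless": r"\b(?:uncemented|cementless|press[- ]fit)\b",
}


def extract(note, patterns):
    """Return the set of labels whose pattern matches anywhere in the note."""
    return {label for label, pattern in patterns.items()
            if re.search(pattern, note, flags=re.IGNORECASE)}


def specificity_on_negatives(notes, patterns):
    """Share of notes (all expected to be negative) with no extracted label."""
    if not notes:
        raise ValueError("need at least one note")
    true_negatives = sum(1 for note in notes if not extract(note, patterns))
    return true_negatives / len(notes)


note = ("The hip was exposed through a posterolateral approach. A cementless "
        "acetabular cup and a cemented femoral stem were implanted.")
print(extract(note, APPROACH_PATTERNS))  # {'posterior'}
print(extract(note, FIXATION_PATTERNS))

# Notes for procedures that should contain no approach or fixation keywords.
negatives = [
    "Hip arthroscopy with labral repair was performed.",
    "Periacetabular osteotomy was completed without complication.",
    "Arthroscopic portals were placed; an anterior approach was not used.",
]
print(specificity_on_negatives(negatives, APPROACH_PATTERNS))  # 0.6666666666666666
print(specificity_on_negatives(negatives, FIXATION_PATTERNS))  # 1.0
```

The deliberately naive approach pattern fires on the negated phrase "an anterior approach was not used", which is the kind of false positive that a specificity check on notes expected to be negative can reveal.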
Affiliation(s)
- Peijin Han
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Julie Kolis
- Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Richard Hughes
- Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Brian R Hallstrom
- Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Martha Carvour
- Department of Internal Medicine and Epidemiology, University of Iowa, Iowa City, IA, United States
- Hilal Maradit-Kremers
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, United States
- Sunghwan Sohn
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- V G Vinod Vydiswaran
- Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, United States
- School of Information, University of Michigan, Ann Arbor, MI, United States
7
Lederman A, Lederman R, Verspoor K. Tasks as needs: reframing the paradigm of clinical natural language processing research for real-world decision support. J Am Med Inform Assoc 2022; 29:1810-1817. [PMID: 35848784 PMCID: PMC9471702 DOI: 10.1093/jamia/ocac121] [Received: 03/19/2022] [Revised: 06/06/2022] [Accepted: 07/04/2022] [Indexed: 12/13/2022]
Abstract
Electronic medical records are increasingly used to store patient information in hospitals and other clinical settings. There has been a corresponding proliferation of clinical natural language processing (cNLP) systems aimed at using text data in these records to improve clinical decision-making beyond what manual clinician search and clinical judgment alone achieve. However, these systems have delivered marginal practical utility and are rarely deployed into healthcare settings, leading to proposals for technical and structural improvements. In this paper, we argue that this reflects a violation of Friedman's "Fundamental Theorem of Biomedical Informatics," and that a deeper epistemological change must occur in the cNLP field, as a parallel step alongside any technical or structural improvements. We propose that researchers shift away from designing cNLP systems independent of clinical needs, in which cNLP tasks are ends in themselves ("tasks as decisions"), and toward systems that are directly guided by the needs of clinicians in realistic decision-making contexts ("tasks as needs"). A case study example illustrates the potential benefits of developing cNLP systems that are designed to more directly support clinical needs.
Affiliation(s)
- Asher Lederman
- Faculty of Engineering and IT, School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Reeva Lederman
- Faculty of Engineering and IT, School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Karin Verspoor
- STEM College, School of Computing Technologies, RMIT University, Melbourne, Australia
8
Shah-Mohammadi F, Cui W, Bachi K, Hurd Y, Finkelstein J. Using Natural Language Processing of Clinical Notes to Predict Outcomes of Opioid Treatment Program. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:4415-4420. [PMID: 36085896 PMCID: PMC9472807 DOI: 10.1109/embc48229.2022.9871960] [Indexed: 06/15/2023]
Abstract
The potential of natural language processing (NLP) for extracting patient information from clinical notes of opioid treatment programs (OTP) and leveraging it in the development of predictive models has not been fully explored. The goal of this study was to assess the potential of NLP for identifying legal, social, mental, medical, and family environment-based determinants of distress from clinical narratives of patients with opioid addiction, and then to use this information to predict OTP outcomes. Around 63% of patients reported improvements after completing OTP. We compared logistic regression and random forest for predictive modeling. The random forest model performed slightly better than logistic regression, with a 75% F1 score and 74% accuracy. Clinical Relevance: Psychiatric and medical disorders and social, legal, and family-based distress are important determinants of distress in patients enrolled in OTP. This information is often recorded in clinical notes. Extracting it and using it as features in machine learning models will improve the performance of OTP outcome prediction models.
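As an illustration of the two-step design (extract determinants from notes, then use them as model features), the following Python sketch builds one binary indicator per distress domain and fits a logistic regression by plain batch gradient descent. The lexicon, toy notes, labels, and function names are invented for illustration; this is not the study's extraction pipeline, and the random forest comparison is not shown.

```python
import math
import re

# Hypothetical lexicon: distress domain -> single-word trigger terms.
DOMAIN_TERMS = {
    "family": {"divorce", "custody"},
    "legal": {"arrest", "court", "probation"},
    "medical": {"hepatitis", "hiv"},
    "mental": {"anxiety", "depression"},
    "social": {"homeless", "unemployed"},
}
DOMAINS = sorted(DOMAIN_TERMS)  # fixed feature order


def note_features(note):
    """One binary indicator per domain: 1.0 if any trigger word occurs in the note."""
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    return [1.0 if tokens & DOMAIN_TERMS[d] else 0.0 for d in DOMAINS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Predicted probability of the positive class for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by full-batch gradient descent on the mean log loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy notes; label 1 = reported improvement, 0 = no improvement (invented).
notes = [
    "Stable housing, attends group sessions regularly.",
    "Reports anxiety and recent arrest; on probation.",
    "Homeless and unemployed, missed doses.",
    "Employed, supportive family, no new concerns.",
]
labels = [1, 0, 0, 1]
weights, bias = fit_logistic([note_features(n) for n in notes], labels)

new_note = "Unemployed since March, no other issues."
print(f"P(improvement) = {predict_proba(weights, bias, note_features(new_note)):.2f}")
```

In practice one would use a library implementation and held-out evaluation (accuracy, F1) rather than this hand-rolled fit, which is kept dependency-free for readability.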
9
Fang Y, Idnay B, Sun Y, Liu H, Chen Z, Marder K, Xu H, Schnall R, Weng C. Combining human and machine intelligence for clinical trial eligibility querying. J Am Med Inform Assoc 2022; 29:1161-1171. [PMID: 35426943 PMCID: PMC9196697 DOI: 10.1093/jamia/ocac051] [Received: 02/25/2022] [Accepted: 03/29/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVE To combine machine efficiency and human intelligence for converting complex clinical trial eligibility criteria text into cohort queries.

MATERIALS AND METHODS Criteria2Query (C2Q) 2.0 was developed to enable real-time user intervention for criteria selection and simplification, parsing error correction, and concept mapping. The accuracy, precision, recall, and F1 score of enhanced modules for negation scope detection, temporal normalization, and value normalization were evaluated using a previously curated gold standard, the annotated eligibility criteria of 1010 COVID-19 clinical trials. Usability and usefulness were evaluated by 10 research coordinators in a task-oriented usability evaluation using 5 Alzheimer's disease trials. Data were collected by user interaction logging, a demographic questionnaire, the Health Information Technology Usability Evaluation Scale (Health-ITUES), and a feature-specific questionnaire.

RESULTS The accuracies of negation scope detection, temporal normalization, and value normalization were 0.924, 0.916, and 0.966, respectively. C2Q 2.0 achieved a moderate usability score (3.84 out of 5) and a high learnability score (4.54 out of 5). On average, 9.9 modifications were made per clinical study. Experienced researchers made more modifications than novice researchers. The most frequent modification was deletion (5.35 per study). Furthermore, the evaluators favored cohort queries resulting from modifications (score 4.1 out of 5) and the user engagement features (score 4.3 out of 5).

DISCUSSION AND CONCLUSION Features to engage domain experts and to overcome the limitations in automated machine output are shown to be useful and user-friendly. We concluded that human-computer collaboration is key to improving the adoption and user-friendliness of natural language processing.
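Negation scope detection decides which concepts in a criterion fall under a negation cue such as "no" or "without". The minimal Python sketch below uses a fixed token window that stops at a scope terminator, loosely in the spirit of NegEx; it is not C2Q 2.0's module, and the trigger list, terminator list, window size, and function names are assumptions.

```python
import re

# Assumed cue lists and window size, loosely in the spirit of NegEx.
NEGATION_TRIGGERS = {"no", "not", "without", "denies"}
SCOPE_TERMINATORS = {"but", "however", "except", "although"}
MAX_SCOPE = 5  # number of tokens after a trigger that can be negated


def negation_scope(sentence):
    """Tokenize the sentence and return (tokens, indices inside a negation scope)."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    negated = set()
    for i, token in enumerate(tokens):
        if token in NEGATION_TRIGGERS:
            for j in range(i + 1, min(i + 1 + MAX_SCOPE, len(tokens))):
                if tokens[j] in SCOPE_TERMINATORS:
                    break  # the scope ends at a terminator
                negated.add(j)
    return tokens, negated


def is_negated(sentence, concept):
    """True if the first occurrence of `concept` lies entirely inside a negation scope."""
    tokens, negated = negation_scope(sentence)
    target = re.findall(r"[a-z0-9]+", concept.lower())
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] == target:
            return all(k in negated for k in range(start, start + len(target)))
    raise ValueError(f"concept {concept!r} not found in sentence")


criterion = "No history of stroke but prior myocardial infarction allowed"
print(is_negated(criterion, "stroke"))                 # True
print(is_negated(criterion, "myocardial infarction"))  # False
```

Real eligibility criteria need richer cue lists, post-negation triggers, and handling of pseudo-negations (for example "no increase"), all of which this sketch omits.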
Affiliation(s)
- Yilu Fang
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Betina Idnay
- School of Nursing, Columbia University, New York, New York, USA
- Department of Neurology, Columbia University, New York, New York, USA
- Yingcheng Sun
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Hao Liu
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Zhehuan Chen
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Karen Marder
- Department of Neurology, Columbia University, New York, New York, USA
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Rebecca Schnall
- School of Nursing, Columbia University, New York, New York, USA
- Heilbrunn Department of Population and Family Health, Mailman School of Public Health, Columbia University, New York, New York, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
10
Ahne A, Fagherazzi G, Tannier X, Czernichow T, Orchard F. Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study. J Med Internet Res 2022; 24:e27434. [PMID: 35040795 PMCID: PMC8808347 DOI: 10.2196/27434] [Received: 01/25/2021] [Revised: 04/06/2021] [Accepted: 11/10/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, making it more and more challenging for health professionals to properly summarize those data and practice evidence-based clinical decision making. Moreover, the exploration of unstructured health text data is challenging for professionals without computer science knowledge due to limited time, resources, and skills. Current tools to explore text data lack ease of use, require high computational effort, and have difficulty incorporating domain knowledge and focusing on topics of interest.

OBJECTIVE We developed a methodology able to explore and target topics of interest via an interactive user interface for health professionals with limited computer science knowledge. We aim to reach near state-of-the-art performance while reducing memory consumption, increasing scalability, and minimizing user interaction effort to improve the clinical decision-making process. The performance was evaluated on diabetes-related abstracts from PubMed.

METHODS The methodology consists of 4 parts: (1) a novel interpretable hierarchical clustering of documents where each node is defined by headwords (words that best represent the documents in the node), (2) an efficient classification system to target topics, (3) minimized user interaction effort through active learning, and (4) a visual user interface. We evaluated our approach on 50,911 diabetes-related abstracts providing a hierarchical Medical Subject Headings (MeSH) structure, a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn. On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against 3 other strategies: random selection of training instances, uncertainty sampling that chooses instances about which the model is most uncertain, and an expected gradient length strategy based on convolutional neural networks (CNNs).

RESULTS For the hierarchical clustering performance, we achieved an F1 score of 0.73 compared to 0.76 achieved by scikit-learn. Concerning active learning performance, after 200 chosen training samples based on these strategies, the weighted F1 score over all MeSH codes reached a satisfying 0.62 using our approach, 0.61 using the uncertainty strategy, 0.63 using the CNN, and 0.45 using the random strategy. Moreover, our methodology showed consistently low memory use as the number of documents increased.

CONCLUSIONS We proposed an easy-to-use tool that lets health professionals with limited computer science knowledge combine their domain knowledge with topic exploration and target specific topics of interest while improving transparency. Furthermore, our approach is memory efficient and highly parallelizable, making it suitable for big data sets. This approach can be used by health professionals to gain deep insights into biomedical literature to ultimately improve the evidence-based clinical decision-making process.
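One of the baselines above, uncertainty sampling, can be sketched in a few lines of Python. The abstract does not say which uncertainty measure was used, so two common variants are shown as assumptions: least confidence (lowest top-class probability) and maximum entropy. The document IDs and probabilities are made up, and the authors' own strategy is not reproduced here.

```python
import math


def least_confidence(pool, k):
    """Return the k instance ids whose most likely class has the lowest probability."""
    return sorted(pool, key=lambda doc_id: max(pool[doc_id]))[:k]


def max_entropy(pool, k):
    """Return the k instance ids with the highest predictive entropy."""
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)
    return sorted(pool, key=lambda doc_id: entropy(pool[doc_id]), reverse=True)[:k]


# Hypothetical predicted class distributions for four unlabeled abstracts.
pool = {
    "abs_a": [0.90, 0.05, 0.05],
    "abs_b": [0.40, 0.35, 0.25],
    "abs_c": [0.50, 0.30, 0.20],
    "abs_d": [0.34, 0.33, 0.33],
}
print(least_confidence(pool, 2))  # ['abs_d', 'abs_b']
print(max_entropy(pool, 2))       # ['abs_d', 'abs_b']
```

The selected instances would be sent to the user for labeling and the classifier retrained, repeating until the labeling budget (200 samples in the study's comparison) is spent.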
Affiliation(s)
- Adrian Ahne
- Exposome and Heredity team, Center of Epidemiology and Population Health, Hospital Gustave Roussy, Inserm, Paris-Saclay University, Villejuif, France
- Epiconcept Company, Paris, France
- Guy Fagherazzi
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Xavier Tannier
- Laboratoire d'Informatique Medicale et d'Ingenierie des Connaissances pour la e-Sante, Limics, Inserm, University Sorbonne Paris Nord, Sorbonne University, Paris, France
11
Shah-Mohammadi F, Cui W, Finkelstein J. Comparison of ACM and CLAMP for Entity Extraction in Clinical Notes. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:1989-1992. [PMID: 34891677 DOI: 10.1109/embc46164.2021.9630611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The rapid increase in the adoption of electronic health records in health care institutions has motivated the use of entity extraction tools to extract meaningful information from clinical notes written in an unstructured, narrative style. This paper investigates the performance of two such tools in automatic entity extraction. Specifically, this work focuses on the automatic medication extraction performance of Amazon Comprehend Medical (ACM) and the Clinical Language Annotation, Modeling and Processing (CLAMP) toolkit using the 2014 i2b2 NLP challenge dataset and its annotated medical entities. Recall, precision, and F-score are used to evaluate the performance of the tools. Clinical Relevance: The majority of data in electronic health records (EHRs) are in the form of free text, a gold mine of patient information, whereas computerized applications in health care institutions and clinical research leverage structured data. As a result, information hidden in clinical free text needs to be extracted and formatted as structured data. This paper evaluates the performance of ACM and CLAMP in automatic entity extraction. The evaluation results show that CLAMP achieves an F-score of 91%, compared with 87% for ACM.
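Recall, precision, and F-score are the metrics used to compare ACM and CLAMP. A minimal sketch of how they are computed for extracted entities, assuming exact matching of (note, start, end, text) tuples; the abstract does not say whether exact or partial span matching was used, and the tuple layout and toy entities are hypothetical:

```python
from typing import Set, Tuple

# Hypothetical entity representation: (note_id, start_offset, end_offset, text).
Entity = Tuple[str, int, int, str]


def prf(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 (harmonic mean of the two)."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {("n1", 10, 19, "metformin"), ("n1", 40, 47, "insulin"), ("n2", 5, 12, "aspirin")}
pred = {("n1", 10, 19, "metformin"), ("n2", 5, 12, "aspirin"), ("n2", 30, 38, "glucagon")}
p, r, f = prf(pred, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.67 F1=0.67
```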
12
Wu P, Nelson SD, Zhao J, Stone Jr CA, Feng Q, Chen Q, Larson EA, Li B, Cox NJ, Stein CM, Phillips EJ, Roden DM, Denny JC, Wei WQ. DDIWAS: High-throughput electronic health record-based screening of drug-drug interactions. J Am Med Inform Assoc 2021; 28:1421-1430. [PMID: 33712848 PMCID: PMC8279788 DOI: 10.1093/jamia/ocab019] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/25/2020] [Accepted: 02/08/2021] [Indexed: 11/13/2022]
Abstract
OBJECTIVE We developed and evaluated Drug-Drug Interaction Wide Association Study (DDIWAS). This novel method detects potential drug-drug interactions (DDIs) by leveraging data from the electronic health record (EHR) allergy list. MATERIALS AND METHODS To identify potential DDIs, DDIWAS scans for drug pairs that are frequently documented together on the allergy list. Using deidentified medical records, we tested 616 drugs for potential DDIs with simvastatin (a common lipid-lowering drug) and amlodipine (a common blood-pressure lowering drug). We evaluated the performance to rediscover known DDIs using existing knowledge bases and domain expert review. To validate potential novel DDIs, we manually reviewed patient charts and searched the literature. RESULTS DDIWAS replicated 34 known DDIs. The positive predictive value to detect known DDIs was 0.85 and 0.86 for simvastatin and amlodipine, respectively. DDIWAS also discovered potential novel interactions between simvastatin-hydrochlorothiazide, amlodipine-omeprazole, and amlodipine-valacyclovir. A software package to conduct DDIWAS is publicly available. CONCLUSIONS In this proof-of-concept study, we demonstrate the value of incorporating information mined from existing allergy lists to detect DDIs in a real-world clinical setting. Since allergy lists are routinely collected in EHRs, DDIWAS has the potential to detect and validate DDI signals across institutions.
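The abstract does not give DDIWAS's statistical test. As a rough illustration of scanning for drug pairs that are frequently documented together on allergy lists, the sketch below scores co-occurrence with a simple lift ratio and computes the positive predictive value used in the evaluation. The lift measure, the `min_count` cutoff, and the drug names `drug_a`/`drug_b` are assumptions, not the published method:

```python
from collections import Counter
from typing import Dict, List, Set


def cooccurrence_lift(allergy_lists: List[Set[str]], index_drug: str,
                      min_count: int = 2) -> Dict[str, float]:
    """Score drugs that appear on the same allergy list as index_drug.

    lift = P(index and other) / (P(index) * P(other)); a value above 1 means
    the pair is documented together more often than independence predicts.
    Pairs seen on fewer than min_count lists are dropped as too sparse.
    """
    n = len(allergy_lists)
    single = Counter(drug for lst in allergy_lists for drug in lst)
    if n == 0 or single[index_drug] == 0:
        return {}
    pair = Counter(drug for lst in allergy_lists if index_drug in lst
                   for drug in lst if drug != index_drug)
    p_index = single[index_drug] / n
    return {drug: (count / n) / (p_index * single[drug] / n)
            for drug, count in pair.items() if count >= min_count}


def ppv(flagged: Set[str], known: Set[str]) -> float:
    """Positive predictive value: share of flagged drugs that are known interactions."""
    return len(flagged & known) / len(flagged) if flagged else 0.0


lists = [
    {"simvastatin", "drug_a"},
    {"simvastatin", "drug_a"},
    {"simvastatin", "drug_b"},
    {"drug_b"},
    {"drug_b"},
    {"penicillin"},
]
print(cooccurrence_lift(lists, "simvastatin"))  # {'drug_a': 2.0}
print(ppv({"drug_a", "drug_c"}, {"drug_a"}))    # 0.5
```

A real screen would also need a significance test and adjustment for multiple comparisons across the hundreds of drugs tested.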
Affiliation(s)
- Patrick Wu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Scott D Nelson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- HealthIT, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Cosby A Stone Jr
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- QiPing Feng
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Qingxia Chen
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Eric A Larson
- Department of Medicine, University of South Dakota Sanford School of Medicine, Sioux Falls, South Dakota, USA
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Vanderbilt Genetics Institute, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Nancy J Cox
- Vanderbilt Genetics Institute, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- C Michael Stein
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Elizabeth J Phillips
- Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Division of Infectious Diseases, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Dan M Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
13
Park J, You SC, Jeong E, Weng C, Park D, Roh J, Lee DY, Cheong JY, Choi JW, Kang M, Park RW. A Framework (SOCRATex) for Hierarchical Annotation of Unstructured Electronic Health Records and Integration Into a Standardized Medical Database: Development and Usability Study. JMIR Med Inform 2021; 9:e23983. [PMID: 33783361 PMCID: PMC8044740 DOI: 10.2196/23983] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/31/2020] [Revised: 11/14/2020] [Accepted: 01/23/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND Although electronic health records (EHRs) have been widely used in secondary assessments, clinical documents are relatively less utilized owing to the lack of standardized clinical text frameworks across different institutions. OBJECTIVE This study aimed to develop a framework for processing unstructured clinical documents of EHRs and integrating them with standardized structured data. METHODS We developed a framework known as Staged Optimization of Curation, Regularization, and Annotation of clinical text (SOCRATex). SOCRATex has the following four aspects: (1) extracting clinical notes for the target population and preprocessing the data, (2) defining the annotation schema with a hierarchical structure, (3) performing document-level hierarchical annotation using the annotation schema, and (4) indexing annotations for a search engine system. To test the usability of the proposed framework, proof-of-concept studies were performed on EHRs. We defined three distinctive patient groups and extracted their clinical documents (ie, pathology reports, radiology reports, and admission notes). The documents were annotated and integrated into the Observational Medical Outcomes Partnership (OMOP)-common data model (CDM) database. The annotations were used to create Cox proportional hazards models in different clinical analysis settings to measure (1) all-cause mortality, (2) thyroid cancer recurrence, and (3) 30-day hospital readmission. RESULTS Overall, 1055 clinical documents of 953 patients were extracted and annotated using the defined annotation schemas. The generated annotations were indexed into an unstructured textual data repository. Using the annotations of pathology reports, we identified that node metastasis and lymphovascular tumor invasion were associated with all-cause mortality among colon and rectum cancer patients (both P=.02).
The other analyses involving measuring thyroid cancer recurrence using radiology reports and 30-day hospital readmission using admission notes in depressive disorder patients also showed results consistent with previous findings. CONCLUSIONS We propose a framework for hierarchical annotation of textual data and integration into a standardized OMOP-CDM medical database. The proof-of-concept studies demonstrated that our framework can effectively process and integrate diverse clinical documents with standardized structured data for clinical research.
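Steps (2) to (4) of SOCRATex (a hierarchical schema, document-level annotation, and indexing) can be illustrated with a small sketch. The two-level schema, the attribute names, and the `section.attribute=value` term format below are hypothetical; they are not SOCRATex's actual data model, which integrates annotations with an OMOP-CDM database and a search engine:

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical two-level schema: section -> attribute -> allowed values.
Schema = Dict[str, Dict[str, List[str]]]

SCHEMA: Schema = {
    "pathology": {
        "node_metastasis": ["present", "absent"],
        "lymphovascular_invasion": ["present", "absent"],
    },
}


def flatten(annotation: Dict[str, Dict[str, str]], schema: Schema) -> List[str]:
    """Validate a document-level annotation against the schema and flatten it
    into sorted 'section.attribute=value' terms."""
    terms = []
    for section, attrs in annotation.items():
        for attr, value in attrs.items():
            allowed = schema.get(section, {}).get(attr)
            if allowed is None or value not in allowed:
                raise ValueError(f"invalid annotation: {section}.{attr}={value}")
            terms.append(f"{section}.{attr}={value}")
    return sorted(terms)


def build_index(docs: Dict[str, Dict[str, Dict[str, str]]], schema: Schema) -> Dict[str, List[str]]:
    """Inverted index from each flattened term to the IDs of documents carrying it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc_id, annotation in sorted(docs.items()):
        for term in flatten(annotation, schema):
            index[term].append(doc_id)
    return dict(index)


docs = {
    "report1": {"pathology": {"node_metastasis": "present", "lymphovascular_invasion": "absent"}},
    "report2": {"pathology": {"node_metastasis": "present"}},
}
index = build_index(docs, SCHEMA)
print(index["pathology.node_metastasis=present"])  # ['report1', 'report2']
```

Validating against the schema at annotation time keeps the indexed terms consistent across annotators, which is what makes them usable as covariates in later analyses.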
Affiliation(s)
- Jimyung Park
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Seng Chan You
- Department of Preventive Medicine and Public Health, Yonsei University College of Medicine, Seoul, Republic of Korea
- Eugene Jeong
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, United States
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Dongsu Park
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
- Jin Roh
- Department of Pathology, Ajou University Hospital, Suwon, Republic of Korea
- Dong Yun Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
- Jae Youn Cheong
- Department of Gastroenterology, Ajou University School of Medicine, Suwon, Republic of Korea
- Jin Wook Choi
- Department of Radiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Mira Kang
- Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea
- Rae Woong Park
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
14
Xu D, Gopale M, Zhang J, Brown K, Begoli E, Bethard S. Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization. J Am Med Inform Assoc 2020; 27:1510-1519. [PMID: 32719838 PMCID: PMC7566510 DOI: 10.1093/jamia/ocaa080] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/09/2020] [Revised: 03/25/2020] [Accepted: 04/27/2020] [Indexed: 12/02/2022]
Abstract
OBJECTIVE Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks including relation extraction, information retrieval, etc. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization. MATERIALS AND METHODS The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer. RESULTS Our generate-and-rank system ranked third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model's accuracy was increased to 83.56% via improvements to how training data are generated from UMLS and incorporation of our UMLS semantic type regularizer. DISCUSSION Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training. CONCLUSIONS Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network-based ranking model to accurately link phrases in text to UMLS concepts.
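The generate-and-rank pipeline can be sketched in two functions: sieves tried in priority order (training data, then preferred terms, then synonyms) produce candidates, and a scorer orders them. In this sketch, plain dictionary lookups stand in for the Lucene indices and a hypothetical fixed-score function stands in for the BERT ranker; the concept IDs are placeholders, not real UMLS CUIs:

```python
from typing import Callable, Dict, List

# Hypothetical lookup tables standing in for the Lucene indices;
# the concept IDs are placeholders, not real UMLS CUIs.
TRAINING = {"heart attack": ["CUI_MI"]}
PREFERRED = {"myocardial infarction": ["CUI_MI"], "hypertension": ["CUI_HTN"]}
SYNONYMS = {"high blood pressure": ["CUI_HTN"], "mi": ["CUI_MITRAL", "CUI_MI"]}
SIEVES = [TRAINING, PREFERRED, SYNONYMS]  # priority order


def generate(mention: str, sieves: List[Dict[str, List[str]]]) -> List[str]:
    """Return candidates from the first sieve that matches the normalized mention."""
    key = mention.lower().strip()
    for sieve in sieves:
        if key in sieve:
            return list(sieve[key])
    return []


def rank(mention: str, candidates: List[str], score: Callable[[str, str], float]) -> List[str]:
    """Order candidates by score, highest first (a stand-in for the BERT ranker)."""
    return sorted(candidates, key=lambda cui: score(mention, cui), reverse=True)


def toy_score(mention: str, cui: str) -> float:
    """Hypothetical fixed scores; a trained ranker would use the mention and its context."""
    return {"CUI_MI": 0.9, "CUI_MITRAL": 0.2}.get(cui, 0.0)


print(generate("Heart attack", SIEVES))               # ['CUI_MI']
print(rank("MI", generate("MI", SIEVES), toy_score))  # ['CUI_MI', 'CUI_MITRAL']
```

Separating generation from ranking lets a cheap, high-recall step narrow the search space so that the expensive neural model only scores a handful of candidates per mention.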
Affiliation(s)
- Dongfang Xu
- School of Information, University of Arizona, Tucson, Arizona, USA
- Manoj Gopale
- Department of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona, USA
- Jiacheng Zhang
- Department of Computer Science, University of Arizona, Tucson, Arizona, USA
- Kris Brown
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Edmon Begoli
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Steven Bethard
- School of Information, University of Arizona, Tucson, Arizona, USA
15
Hier DB, Brint SU. A Neuro-ontology for the neurological examination. BMC Med Inform Decis Mak 2020; 20:47. [PMID: 32131804 PMCID: PMC7057564 DOI: 10.1186/s12911-020-1066-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/18/2019] [Accepted: 02/25/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND The use of clinical data in electronic health records for machine-learning or data analytics depends on the conversion of free text into machine-readable codes. We have examined the feasibility of capturing the neurological examination as machine-readable codes based on UMLS Metathesaurus concepts. METHODS We created a target ontology for capturing the neurological examination using 1100 concepts from the UMLS Metathesaurus. We created a dataset of 2386 test-phrases based on 419 published neurological cases. We then mapped the test-phrases to the target ontology. RESULTS We were able to map all of the 2386 test-phrases to 601 unique UMLS concepts. A neurological examination ontology with 1100 concepts has sufficient breadth and depth of coverage to encode all of the neurologic concepts derived from the 419 test cases. Using only pre-coordinated concepts, component ontologies of the UMLS, such as HPO, SNOMED CT, and OMIM, do not have adequate depth and breadth of coverage to encode the complexity of the neurological examination. CONCLUSION An ontology based on a subset of UMLS has sufficient breadth and depth of coverage to convert deficits from the neurological examination into machine-readable codes using pre-coordinated concepts. The use of a small subset of UMLS concepts for a neurological examination ontology offers the advantage of improved manageability as well as the opportunity to curate the hierarchy and subsumption relationships.
Affiliation(s)
- Daniel B Hier
- Department of Neurology and Rehabilitation, University of Illinois at Chicago, 912 S. Wood Street (MC 796), Chicago, IL, 60612, USA
- Steven U Brint
- Department of Neurology and Rehabilitation, University of Illinois at Chicago, 912 S. Wood Street (MC 796), Chicago, IL, 60612, USA
16
Adekkanattu P, Jiang G, Luo Y, Kingsbury PR, Xu Z, Rasmussen LV, Pacheco JA, Kiefer RC, Stone DJ, Brandt PS, Yao L, Zhong Y, Deng Y, Wang F, Ancker JS, Campion TR, Pathak J. Evaluating the Portability of an NLP System for Processing Echocardiograms: A Retrospective, Multi-site Observational Study. AMIA Annu Symp Proc 2020; 2019:190-199. [PMID: 32308812 PMCID: PMC7153064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
While natural language processing (NLP) of unstructured clinical narratives holds potential for patient care and clinical research, portability of NLP approaches across multiple sites remains a major challenge. This study investigated the portability of an NLP system, developed initially at the Department of Veterans Affairs (VA) to extract 27 key cardiac concepts from free-text or semi-structured echocardiograms, across three academic medical centers: Weill Cornell Medicine, Mayo Clinic, and Northwestern Medicine. While the NLP system showed high precision and recall measurements for four target concepts (aortic valve regurgitation, left atrium size at end systole, mitral valve regurgitation, tricuspid valve regurgitation) across all sites, we found moderate or poor results for the remaining concepts, and the NLP system performance varied between individual sites.
Affiliation(s)
- Yuan Luo
- Northwestern University, Chicago, IL
- Liang Yao
- Northwestern University, Chicago, IL
- Yu Deng
- Northwestern University, Chicago, IL
- Fei Wang
- Weill Cornell Medicine, New York, NY
17
Pfaff ER, Crosskey M, Morton K, Krishnamurthy A. Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning. JMIR Med Inform 2020; 8:e16042. [PMID: 32012059 PMCID: PMC7007592 DOI: 10.2196/16042] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/28/2019] [Revised: 10/30/2019] [Accepted: 12/16/2019] [Indexed: 01/02/2023]
Abstract
Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
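CLARK lets users choose a cross-validation method and the number of folds, and the example phenotypes are reported as sensitivity and specificity. A minimal stdlib sketch of both ideas, assuming binary labels encoded as 0/1; the function names are hypothetical and this is not CLARK's code. In practice a library routine such as scikit-learn's `KFold` would usually handle the splitting:

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    The n indices are shuffled once, then dealt round-robin into k folds, so every
    index lands in exactly one test fold and fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def sens_spec(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


for train, test in k_fold_indices(10, 5):
    print(len(train), len(test))  # 8 2 on each of the 5 folds
print(sens_spec([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
```

Each fold's classifier would be trained on `train`, evaluated on `test`, and the per-fold metrics averaged.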
Affiliation(s)
- Emily R Pfaff
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Ashok Krishnamurthy
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
18
Felipe GF, Lima FET, Barbosa LP, Moreira TMM, Joventino ES, Freire VS, Mendonça LBDA. Evaluation of user embracement software with pediatric risk classification. Rev Bras Enferm 2020; 73:e20180677. [DOI: 10.1590/0034-7167-2018-0677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2018] [Accepted: 05/17/2019] [Indexed: 11/22/2022]
Abstract
Objective: to evaluate the functional performance and technical quality of user embracement software with pediatric risk classification. Method: a descriptive exploratory study based on the quality requirements set forth in ISO/IEC 25010. The evaluated characteristics were functional adequacy, reliability, usability, performance efficiency, compatibility, safety, maintainability, and portability. Eight informatics specialists and 13 nursing specialists participated in the evaluation. A characteristic was considered adequate when more than 70% of the ratings in each specialist group classified it as very and/or completely appropriate. Results: The results from the informatics and nursing specialists, respectively, were functional adequacy (100.0%, 96.2%), reliability (82.6%, 88.5%), usability (84.9%, 98.7%), performance efficiency (93.4%, 96.2%), compatibility (85.0%, 98.1%), and safety (91.7%, 100.0%); maintainability (95.0%) and portability (87.5%) were evaluated by the informatics specialists only. Conclusion: the software was considered adequate regarding technical quality and functional performance.
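The adequacy criterion is a per-group percentage compared with a 70% threshold. A short sketch of that arithmetic, assuming each specialist gives one categorical rating per characteristic; the rating labels and the example ratings are hypothetical, not the study's data:

```python
from typing import Dict, List

# Assumed rating labels; only these two count toward the adequacy threshold.
AGREE = {"very appropriate", "completely appropriate"}


def agreement(ratings: List[str]) -> float:
    """Percentage of ratings marking a characteristic as very or completely appropriate."""
    if not ratings:
        return 0.0
    return 100.0 * sum(r in AGREE for r in ratings) / len(ratings)


def adequate(ratings_by_group: Dict[str, List[str]], threshold: float = 70.0) -> bool:
    """Adequate only if every specialist group's agreement exceeds the threshold."""
    return all(agreement(r) > threshold for r in ratings_by_group.values())


# Hypothetical ratings for one characteristic (8 informatics, 13 nursing raters).
informatics = ["very appropriate"] * 5 + ["completely appropriate"] * 2 + ["somewhat appropriate"]
nursing = ["completely appropriate"] * 10 + ["not appropriate"] * 3
print(agreement(informatics), round(agreement(nursing), 1))  # 87.5 76.9
print(adequate({"informatics": informatics, "nursing": nursing}))  # True
```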
19
Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, Liu S, Sohn S, Liu H, Fan J. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. NPJ Digit Med 2019; 2:130. [PMID: 31872069 PMCID: PMC6917754 DOI: 10.1038/s41746-019-0208-8] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 09/05/2019] [Accepted: 11/25/2019] [Indexed: 12/23/2022]
Abstract
Data is foundational to high-quality artificial intelligence (AI). Given that a substantial amount of clinically relevant information is embedded in unstructured data, natural language processing (NLP) plays an essential role in extracting valuable information that can benefit decision making, administrative reporting, and research. Here, we share several desiderata pertaining to the development and usage of NLP systems, derived from two decades of experience implementing clinical NLP at the Mayo Clinic, to inform the healthcare AI community. Illustrated by a framework we developed as an example implementation, the desiderata emphasize the importance of a user-friendly platform, efficient collection of domain expert inputs, seamless integration with clinical data, and a highly scalable computing infrastructure.
Affiliation(s)
- Andrew Wen
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Sunyang Fu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Sungrim Moon
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Mohamed El Wazir
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN USA
- Andrew Rosenbaum
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN USA
- Vinod C Kaggal
- Advanced Analytics Service Unit, Department of Information Technology, Mayo Clinic, Rochester, MN USA
- Sijia Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Sunghwan Sohn
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Jungwei Fan
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
20
Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD, Morley K, Osborn D, Hayes J, Stewart R, Downs J, Chapman W, Dutta R. Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances. J Biomed Inform 2018; 88:11-19. [PMID: 30368002 PMCID: PMC6986921 DOI: 10.1016/j.jbi.2018.10.005] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Received: 05/25/2018] [Revised: 10/14/2018] [Accepted: 10/15/2018] [Indexed: 12/27/2022]
Abstract
The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized over the past years, and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality). From a clinical perspective, on the other hand, research studies are typically modelled and evaluated on a patient- or population-level, such as predicting how a patient group might respond to specific treatments or patient monitoring over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between scientific objectives of each field, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches. Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose more emphasis needs to be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.
Affiliation(s)
- Sumithra Velupillai
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden.
- Hanna Suominen
- College of Engineering and Computer Science, The Australian National University, Data61/CSIRO, University of Canberra, Australia; University of Turku, Finland.
- Maria Liakata
- Department of Computer Science, University of Warwick/Alan Turing Institute, UK.
- Angus Roberts
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK.
- Anoop D Shah
- Institute of Health Informatics, University College London, UK; University College London NHS Foundation Trust, London, UK.
- Katherine Morley
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; Melbourne School of Population and Global Health, The University of Melbourne, Australia.
- David Osborn
- Division of Psychiatry, University College London, UK; Camden and Islington NHS Foundation Trust, London, UK.
- Joseph Hayes
- Division of Psychiatry, University College London, UK; Camden and Islington NHS Foundation Trust, London, UK.
- Robert Stewart
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
- Johnny Downs
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
- Wendy Chapman
- Department of Biomedical Informatics, University of Utah, United States.
- Rina Dutta
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
21
Johnson SB, Adekkanattu P, Campion TR, Flory J, Pathak J, Patterson OV, DuVall SL, Major V, Aphinyanaphongs Y. From Sour Grapes to Low-Hanging Fruit: A Case Study Demonstrating a Practical Strategy for Natural Language Processing Portability. AMIA Jt Summits Transl Sci Proc 2018; 2017:104-112. [PMID: 29888051 PMCID: PMC5961788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Natural Language Processing (NLP) holds potential for patient care and clinical research, but a gap exists between promise and reality. While some studies have demonstrated portability of NLP systems across multiple sites, challenges remain. Strategies to mitigate these challenges can either tackle complex NLP problems using advanced methods (hard-to-reach fruit) or focus on simple NLP problems using practical methods (low-hanging fruit). This paper investigates a practical strategy for NLP portability using extraction of left ventricular ejection fraction (LVEF) as a use case. We used a tool developed at the Department of Veterans Affairs (VA) to extract LVEF values from free-text echocardiograms in the MIMIC-III database. The approach showed an accuracy of 98.4%, sensitivity of 99.4%, positive predictive value of 98.7%, and F-score of 99.0%. This experience, in which a simple NLP solution proved highly portable with excellent performance, illustrates that simple NLP applications may be easier to disseminate and adapt, and in the short term may prove more useful, than complex applications.
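The abstract does not describe how the VA tool works internally. To show why LVEF extraction counts as a "low-hanging fruit" problem, here is a hypothetical regular-expression extractor; the keyword list, the 20-character gap limit, and reporting ranges as their midpoint are all assumptions for illustration, not the VA tool's rules:

```python
import re
from typing import List

# Hypothetical pattern: an EF keyword, a short non-numeric gap, then a value or a
# range followed by "%" or "percent", e.g. "LVEF 55%", "EF: 35 to 40 percent".
EF_PATTERN = re.compile(
    r"\b(?:LVEF|LV\s+ejection\s+fraction|ejection\s+fraction|EF)\b"
    r"[^0-9%]{0,20}?"  # short gap such as " is estimated at "
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*(?:%|percent)",
    re.IGNORECASE,
)


def extract_lvef(text: str) -> List[float]:
    """Return LVEF values found in text; a range is reported as its midpoint."""
    values = []
    for match in EF_PATTERN.finditer(text):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        values.append((low + high) / 2)
    return values


print(extract_lvef("The LVEF is estimated at 55%."))         # [55.0]
print(extract_lvef("LVEF 55%. Prior EF: 35 to 40 percent."))  # [55.0, 37.5]
```

A rule of this size is easy to inspect and adapt to a new site's phrasing, which is the portability argument the paper makes; it would still need validation against annotated reports before use.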
Affiliation(s)
- Stephen B Johnson
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Prakash Adekkanattu
- Information Technologies & Services, Weill Cornell Medicine, New York, New York
- Thomas R Campion
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Information Technologies & Services, Weill Cornell Medicine, New York, New York
- James Flory
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Jyotishman Pathak
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Olga V Patterson
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT
- Scott L DuVall
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT
- Vincent Major
- Center for Health Informatics and Bioinformatics, NYU Langone Medical Center, New York, New York
- Yindalon Aphinyanaphongs
- Center for Health Informatics and Bioinformatics, NYU Langone Medical Center, New York, New York
22
Chen J, Druhl E, Polepalli Ramesh B, Houston TK, Brandt CA, Zulman DM, Vimalananda VG, Malkani S, Yu H. A Natural Language Processing System That Links Medical Terms in Electronic Health Record Notes to Lay Definitions: System Development Using Physician Reviews. J Med Internet Res 2018; 20:e26. [PMID: 29358159 PMCID: PMC5799720 DOI: 10.2196/jmir.8669] [Received: 08/08/2017] [Revised: 11/21/2017] [Accepted: 12/06/2017] [Indexed: 11/23/2022]
Abstract
Background Many health care systems now allow patients to access their electronic health record (EHR) notes online through patient portals. Medical jargon in EHR notes can confuse patients, which may interfere with potential benefits of patient access to EHR notes. Objective The aim of this study was to develop and evaluate the usability and content quality of NoteAid, a Web-based natural language processing system that links medical terms in EHR notes to lay definitions, that is, definitions easily understood by lay people. Methods NoteAid incorporates two core components: CoDeMed, a lexical resource of lay definitions for medical terms, and MedLink, a computational unit that links medical terms to lay definitions. We developed innovative computational methods, including an adapted distant supervision algorithm that prioritizes medical terms important for EHR comprehension, to facilitate building CoDeMed. Ten physician domain experts evaluated the user interface and content quality of NoteAid. The evaluation protocol included a cognitive walkthrough session and a postsession questionnaire. Physician feedback sessions were audio-recorded. We used standard content analysis methods to analyze qualitative data from these sessions. Results Physician feedback was mixed. Positive feedback on NoteAid included (1) ease of use, (2) good visual display, (3) satisfactory system speed, and (4) adequate lay definitions. Opportunities for improvement arising from evaluation sessions and feedback included (1) improving the display of definitions for partially matched terms, (2) including more medical terms in CoDeMed, (3) improving the handling of terms whose definitions vary depending on different contexts, and (4) standardizing the scope of definitions for medicines. On the basis of these results, we have improved NoteAid’s user interface and a number of definitions, and added 4502 more definitions to CoDeMed.
Conclusions Physician evaluation yielded useful feedback for content validation and refinement of this innovative tool that has the potential to improve patient EHR comprehension and experience using patient portals. Ongoing work will develop algorithms to handle ambiguous medical terms and will test and evaluate NoteAid with patients.
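MedLink is described here only as a unit that links medical terms to lay definitions. One common way to do this, shown as a minimal sketch, is greedy longest-match lookup against a lexicon. The toy lexicon, the function `link_terms`, and the matching strategy are illustrative assumptions, not NoteAid's published implementation.

```python
import re

# Hypothetical toy lexicon standing in for a resource like CoDeMed;
# the entries are illustrative and not taken from NoteAid.
LAY_DEFINITIONS = {
    "myocardial infarction": "heart attack",
    "infarction": "death of tissue because its blood supply was blocked",
    "hypertension": "high blood pressure",
    "dyspnea": "shortness of breath",
}

# Length (in tokens) of the longest term, bounding the search window.
MAX_TERM_LEN = max(len(term.split()) for term in LAY_DEFINITIONS)


def link_terms(note: str) -> list[tuple[str, str]]:
    """Greedy left-to-right, longest-match linking of note phrases to lay definitions."""
    tokens = re.findall(r"[a-z]+", note.lower())
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, so that
        # "myocardial infarction" wins over "infarction".
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LAY_DEFINITIONS:
                links.append((phrase, LAY_DEFINITIONS[phrase]))
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return links


print(link_terms("History of hypertension; prior myocardial infarction, now dyspnea."))
```

Preferring the longest match keeps a multiword term from being reduced to a partial match. The sketch does not address the context-dependent definitions that the physician reviewers flagged.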
Affiliation(s)
- Jinying Chen
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States
- Emily Druhl
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
- Thomas K Houston
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States; Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
- Cynthia A Brandt
- Veterans Affairs Connecticut Health Care System, West Haven, CT, United States; Center for Medical Informatics, Yale University, New Haven, CT, United States
- Donna M Zulman
- Division of Primary Care and Population Health, Stanford University School of Medicine, Stanford, CA, United States; Veterans Affairs Palo Alto Health Care System, Menlo Park, CA, United States
- Varsha G Vimalananda
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States; School of Medicine, Boston University, Boston, MA, United States
- Samir Malkani
- Diabetes Center of Excellence, University of Massachusetts Medical School, Worcester, MA, United States
- Hong Yu
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States; Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
23
Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, Xu H. CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc 2017; 25:331-336. [PMID: 29186491 PMCID: PMC7378877 DOI: 10.1093/jamia/ocx132] [Received: 05/04/2017] [Revised: 09/28/2017] [Accepted: 10/19/2017] [Indexed: 11/14/2022]
Abstract
Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components but also a user-friendly graphical user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphical user interface in building customized, high-performance NLP pipelines with 2 use cases: extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.
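CLAMP itself is driven through a graphical interface; the sketch below only illustrates, in Python, the general idea of assembling a customized pipeline from small rule-based components for the two use cases named in the abstract. The component names (`lab_values`, `smoking_status`, `build_pipeline`), the regular expressions, and the output labels are hypothetical and are not CLAMP's API or rules.

```python
import re
from typing import Callable

# An annotator takes note text and returns (label, value) pairs.
Annotator = Callable[[str], list[tuple[str, str]]]

# Hypothetical pattern: a lab name, an optional ":" or "of", then a number.
LAB_PATTERN = re.compile(
    r"\b(hba1c|glucose|creatinine)\b\s*(?::|of)?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def lab_values(text: str) -> list[tuple[str, str]]:
    """Extract (LAB:<name>, value) pairs for a few hypothetical lab tests."""
    return [("LAB:" + m.group(1).lower(), m.group(2))
            for m in LAB_PATTERN.finditer(text)]


def smoking_status(text: str) -> list[tuple[str, str]]:
    """Crude keyword rules; negated and past forms are checked first."""
    lowered = text.lower()
    if re.search(r"\b(never smoked|non-?smoker)\b", lowered):
        return [("SMOKING", "never")]
    if re.search(r"\b(former smoker|quit smoking)\b", lowered):
        return [("SMOKING", "former")]
    if re.search(r"\b(smoker|smokes)\b", lowered):
        return [("SMOKING", "current")]
    return []


def build_pipeline(*annotators: Annotator) -> Annotator:
    """Compose annotators; results are concatenated in pipeline order."""
    def run(text: str) -> list[tuple[str, str]]:
        results: list[tuple[str, str]] = []
        for annotate in annotators:
            results.extend(annotate(text))
        return results
    return run


pipeline = build_pipeline(lab_values, smoking_status)
print(pipeline("Former smoker. HbA1c: 7.2, creatinine 1.1"))
```

Swapping, reordering, or adding annotators changes the pipeline without touching the other components, which is the kind of customization the abstract describes; real clinical rules would need far better negation and context handling than these keyword checks.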
Affiliation(s)
- Ergin Soysal
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Jingqi Wang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Min Jiang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Yonghui Wu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Serguei Pakhomov
- Department of Pharmaceutical Care and Health System, University of Minnesota Twin Cities, Minneapolis, MN, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Hua Xu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
24
Névéol A, Zweigenbaum P. Clinical Natural Language Processing in 2015: Leveraging the Variety of Texts of Clinical Interest. Yearb Med Inform 2016; 25:234-239. [PMID: 27830256 PMCID: PMC5171575 DOI: 10.15265/iy-2016-049] [Indexed: 01/29/2023]
Abstract
OBJECTIVE To summarize recent research and present a selection of the best papers published in 2015 in the field of clinical Natural Language Processing (NLP). METHOD A systematic review of the literature was performed by the two section editors of the IMIA Yearbook NLP section by searching bibliographic databases with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Section editors first selected a shortlist of candidate best papers that were then peer-reviewed by independent external reviewers. RESULTS The clinical NLP best paper selection shows that clinical NLP is making use of a variety of texts of clinical interest to contribute to the analysis of clinical information and the building of a body of clinical knowledge. The full review process highlighted five papers analyzing patient-authored texts or seeking to connect and aggregate multiple sources of information. They provide a contribution to the development of methods, resources, applications, and sometimes a combination of these aspects. CONCLUSIONS The field of clinical NLP continues to thrive through the contributions of both NLP researchers and healthcare professionals interested in applying NLP techniques to impact clinical practice. Foundational progress in the field makes it possible to leverage a larger variety of texts of clinical interest for healthcare purposes.
Affiliation(s)
- A Névéol
- Aurélie Névéol, LIMSI CNRS UPR 3251, Université Paris Saclay, Rue John von Neumann, 91400 Orsay, France
- P Zweigenbaum
- Pierre Zweigenbaum, LIMSI CNRS UPR 3251, Université Paris Saclay, Rue John von Neumann, 91400 Orsay, France
25
Wang Q, S Abdul S, Almeida L, Ananiadou S, Balderas-Martínez YI, Batista-Navarro R, Campos D, Chilton L, Chou HJ, Contreras G, Cooper L, Dai HJ, Ferrell B, Fluck J, Gama-Castro S, George N, Gkoutos G, Irin AK, Jensen LJ, Jimenez S, Jue TR, Keseler I, Madan S, Matos S, McQuilton P, Milacic M, Mort M, Natarajan J, Pafilis E, Pereira E, Rao S, Rinaldi F, Rothfels K, Salgado D, Silva RM, Singh O, Stefancsik R, Su CH, Subramani S, Tadepally HD, Tsaprouni L, Vasilevsky N, Wang X, Chatr-Aryamontri A, Laulederkind SJF, Matis-Mitchell S, McEntyre J, Orchard S, Pundir S, Rodriguez-Esteban R, Van Auken K, Lu Z, Schaeffer M, Wu CH, Hirschman L, Arighi CN. Overview of the interactive task in BioCreative V. Database (Oxford) 2016; 2016:baw119. [PMID: 27589961 PMCID: PMC5009325 DOI: 10.1093/database/baw119] [Received: 04/04/2016] [Accepted: 07/28/2016] [Indexed: 11/14/2022]
Abstract
Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution, and discuss major findings for the systems tested. Database URL: http://www.biocreative.org
Affiliation(s)
- Qinghua Wang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Shabbir S Abdul
- International Centre of Health Information Technology, Taipei Medical University, Taipei, Taiwan
- Lara Almeida
- DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
- Sophia Ananiadou
- National Centre for Text Mining, University of Manchester, Manchester, UK
- Lucy Chilton
- Northern Institute for Cancer Research, Newcastle University, Newcastle, UK
- Hui-Jou Chou
- Rutgers University-Camden, Camden, NJ 08102, USA
- Gabriela Contreras
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Hong-Jie Dai
- Department of Computer Science and Information Engineering, National Taitung University, Taitung, Taiwan
- Barbra Ferrell
- College of Agriculture and Natural Resources, University of Delaware, Newark, DE 19711, USA
- Juliane Fluck
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53754 St. Augustin, Germany
- Socorro Gama-Castro
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
- Georgios Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham B15 2TT, UK
- Afroza K Irin
- Life Science Informatics, University of Bonn, Bonn, Germany
- Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Silvia Jimenez
- Blue Brain Project, École Polytechnique Fédérale de Lausanne (EPFL) Biotech Campus, Geneva, Switzerland
- Toni R Jue
- Prince of Wales Clinical School, University of New South Wales NSW, Sydney, New South Wales, Australia
- Sumit Madan
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53754 St. Augustin, Germany
- Sérgio Matos
- DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
- Marija Milacic
- Department of Informatics and Bio-Computing, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Matthew Mort
- HGMD, Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, UK
- Jeyakumar Natarajan
- Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Emiliano Pereira
- Microbial Genomics and Bioinformatics Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Shruti Rao
- Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington, DC 20007, USA
- Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Karen Rothfels
- Department of Informatics and Bio-Computing, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- David Salgado
- GMGF, Aix-Marseille Université, 13385 Marseille, France; Inserm, UMR_S 910, 13385 Marseille, France
- Raquel M Silva
- Department of Medical Sciences, iBiMED & IEETA, University of Aveiro, 3810-193 Aveiro, Portugal
- Onkar Singh
- Taipei Medical University Graduate Institute of Biomedical Informatics, Taipei, Taiwan
- Chu-Hsien Su
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Suresh Subramani
- Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India
- Loukia Tsaprouni
- Institute of Sport and Physical Activity Research (ISPAR), University of Bedfordshire, Bedford, UK
- Nicole Vasilevsky
- Ontology Development Group, Oregon Health & Science University, Portland, OR 97239, USA
- Xiaodong Wang
- WormBase Consortium, Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Sangya Pundir
- European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Kimberly Van Auken
- WormBase Consortium, Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Institutes of Health, Bethesda, MD 20894, USA
- Mary Schaeffer
- MaizeGDB USDA ARS and University of Missouri, Columbia, MO 65211, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Cecilia N Arighi
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
26
Uzuner Ö, Stubbs A. Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks. J Biomed Inform 2015; 58 Suppl:S1-S5. [PMID: 26515500 PMCID: PMC4978169 DOI: 10.1016/j.jbi.2015.10.007] [Received: 09/21/2015] [Revised: 10/08/2015] [Accepted: 10/14/2015] [Indexed: 12/29/2022]
Affiliation(s)
- Özlem Uzuner
- Department of Information Studies, State University of New York at Albany, Albany, NY, USA.
- Amber Stubbs
- School of Library and Information Science, Simmons College, Boston, MA, USA.