1
Durango MC, Torres-Silva EA, Orozco-Duque A. Named Entity Recognition in Electronic Health Records: A Methodological Review. Healthc Inform Res 2023; 29:286-300. [PMID: 37964451 PMCID: PMC10651400 DOI: 10.4258/hir.2023.29.4.286] [Received: 04/21/2023] [Revised: 07/29/2023] [Accepted: 09/03/2023]
Abstract
OBJECTIVES A substantial portion of the data contained in Electronic Health Records (EHR) is unstructured, often appearing as free text. This format restricts its potential utility in clinical decision-making. Named entity recognition (NER) methods address the challenge of extracting pertinent information from unstructured text. The aim of this study was to outline the current NER methods and trace their evolution from 2011 to 2022. METHODS We conducted a methodological literature review of NER methods, with a focus on distinguishing the classification models, the types of tagging systems, and the languages employed in various corpora. RESULTS Several methods have been documented for automatically extracting relevant information from EHRs using natural language processing techniques such as NER and relation extraction (RE). These methods can automatically extract concepts, events, attributes, and other data, as well as the relationships between them. Most NER studies conducted thus far have used corpora in English or Chinese. Additionally, Bidirectional Encoder Representations from Transformers (BERT) combined with the BIO tagging system is the most frequently reported classification architecture. We found only a limited number of papers that implement NER or RE tasks on EHRs within a specific clinical domain. CONCLUSIONS EHRs play a pivotal role in gathering clinical information and could serve as the primary source for automated clinical decision support systems. However, the creation of new corpora from EHRs in specific clinical domains is essential to facilitate the swift development of NER and RE models applied to EHRs for use in clinical practice.
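The BIO tagging scheme named in this abstract marks each token as the Beginning of an entity, Inside an entity, or Outside any entity. A minimal Python sketch of converting between entity spans and BIO tags, assuming token-level spans given as (start, end_exclusive, type); the example sentence and the PROBLEM/TREATMENT labels are illustrative and not taken from the review:

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from a BIO sequence (a stray I- tag opens a new span)."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # continuation of the current entity
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]
    return spans


tokens = ["Patient", "denies", "chest", "pain", "on", "aspirin"]
gold = [(2, 4, "PROBLEM"), (5, 6, "TREATMENT")]
tags = spans_to_bio(len(tokens), gold)
print(list(zip(tokens, tags)))
```

A sequence labeler such as a BERT-based tagger predicts one of these tags per token; decoding them back with `bio_to_spans` yields the extracted entities.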
Affiliation(s)
- María C. Durango
- Grupo de Investigación e Innovación Biomédica, Instituto Tecnológico Metropolitano, Antioquia, Colombia
- Ever A. Torres-Silva
- Grupo de Investigación e Innovación Biomédica, Instituto Tecnológico Metropolitano, Antioquia, Colombia
- Andrés Orozco-Duque
- Grupo de Investigación e Innovación Biomédica, Instituto Tecnológico Metropolitano, Antioquia, Colombia
- Facultad de Ingenierías, Universidad de Medellín, Antioquia, Colombia
2
Zhang Z, Liu D, Zhang M, Qin X. Combining data augmentation and domain information with TENER model for Clinical Event Detection. BMC Med Inform Decis Mak 2021; 21:261. [PMID: 34789246 PMCID: PMC8596895 DOI: 10.1186/s12911-021-01618-3] [Received: 08/15/2021] [Accepted: 08/23/2021]
Abstract
Background In recent years, with the development of artificial intelligence, the use of deep learning for clinical information extraction has become a new trend. Clinical Event Detection (CED), one of its subtasks, has attracted attention from academia and industry. However, directly applying advances in deep learning to the CED task often yields unsatisfactory results, for two main reasons: (1) the large number of obscure technical terms in electronic medical records leads to poor recognition performance, and (2) the scarcity of datasets for the task leads to poor model robustness. Solving these two problems is therefore key to improving model performance. Methods This paper proposes a model that combines data augmentation and domain information with the TENER model for Clinical Event Detection. Results We use two evaluation metrics to compare the overall performance of the proposed model with existing models on the 2012 i2b2 challenge dataset. Experimental results demonstrate that the proposed model achieves the best F1-score of 80.26%, type accuracy of 93% and Span F1-score of 90.33%, and outperforms the state-of-the-art approaches. Conclusions This paper proposes a multi-granularity information fusion encoder-decoder framework, which applies the TENER model to the CED task for the first time. It uses a pre-trained language model (BioBERT) to generate word-level features, addressing the poor recognition performance caused by the large number of obscure technical terms in electronic medical records. In addition, this paper proposes a new data augmentation method for sequence labeling tasks, addressing the poor model robustness caused by the scarcity of task-specific datasets.
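The abstract reports a "Span F1-score" alongside an overall F1-score and type accuracy but does not define it here. One common definition, assumed in this sketch, is exact-match F1 over predicted spans, with the option of ignoring the event type; this is not taken from the paper, and the spans below are made up (PROBLEM, TEST and TREATMENT are i2b2 2012 event types):

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start_token, end_token_exclusive, event_type)


def span_prf(gold: Set[Span], pred: Set[Span], match_type: bool = True) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1.

    With match_type=True a prediction counts only if both boundaries and the
    event type match; with match_type=False the boundaries alone must match.
    """
    key = (lambda s: s) if match_type else (lambda s: s[:2])
    g = {key(s) for s in gold}
    p = {key(s) for s in pred}
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {(0, 2, "PROBLEM"), (5, 6, "TEST"), (8, 10, "TREATMENT")}
pred = {(0, 2, "PROBLEM"), (5, 6, "TREATMENT"), (11, 12, "TEST")}
print(span_prf(gold, pred))                    # boundaries and type must match
print(span_prf(gold, pred, match_type=False))  # boundaries only
```

Here the strict score is 1/3 and the boundary-only score is 2/3, because the second prediction finds the right tokens but assigns the wrong type.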
Affiliation(s)
- Zhichang Zhang
- College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, 730070, Lanzhou, China
- Dan Liu
- College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, 730070, Lanzhou, China
- Minyu Zhang
- College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, 730070, Lanzhou, China
- Xiaohui Qin
- College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, 730070, Lanzhou, China
3
Alfattni G, Peek N, Nenadic G. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries. J Biomed Inform 2021; 123:103915. [PMID: 34600144 DOI: 10.1016/j.jbi.2021.103915] [Received: 05/09/2021] [Revised: 08/05/2021] [Accepted: 09/09/2021]
Abstract
Temporal relation extraction between health-related events is a widely studied task in clinical Natural Language Processing (NLP). The current state-of-the-art methods mostly rely on engineered features (i.e., rule-based modelling) and sequence modelling, which often encode a source sentence into a single fixed-length context. An obvious disadvantage of this fixed-length context design is its inability to model longer sentences, as important temporal information in the clinical text may appear at different positions. To address this issue, we propose an Attention-based Bidirectional Long Short-Term Memory (Att-BiLSTM) model to enable learning the important semantic information in long source text segments and to better determine which parts of the text are most important. We experimented with two embeddings and compared the performances to traditional state-of-the-art methods that require elaborate linguistic pre-processing and hand-engineered features. The experimental results on the i2b2 2012 temporal relation test corpus show that the proposed method achieves a significant improvement with an F-score of 0.811, which is at least 10% better than the state of the art in the field. We show that the model can be remarkably effective at classifying temporal relations when provided with word embeddings trained on corpora in a general domain. Finally, we perform an error analysis to gain insight into the common errors made by the model.
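The abstract motivates attention as a way to weight the informative parts of a long sentence instead of compressing it into one fixed-length vector. A minimal pure-Python sketch of one common attention-pooling formulation over BiLSTM outputs (tanh-scored, softmax-normalized weights, then a weighted sum); this is an assumption about the general technique, not Alfattni et al.'s exact architecture, and the vectors are toy values:

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(hidden: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Attention pooling over per-token BiLSTM outputs.

    score_t = w . tanh(h_t); alpha = softmax(scores); r = sum_t alpha_t * h_t.
    Returns (alpha, r).
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in hidden]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    r = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(len(hidden[0]))]
    return alpha, r


# Toy 2-dimensional "hidden states" for three tokens; in a trained model these
# would come from a BiLSTM and w would be a learned parameter vector.
hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
w = [1.0, 1.0]
alpha, r = attention_pool(hidden, w)
print(alpha, r)
```

The third token, whose hidden state scores highest, receives the largest weight and therefore dominates the pooled representation `r` that would feed the relation classifier.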
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, UK; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
4
Olex AL, McInnes BT. Review of Temporal Reasoning in the Clinical Domain for Timeline Extraction: Where we are and where we need to be. J Biomed Inform 2021; 118:103784. [PMID: 33862232 DOI: 10.1016/j.jbi.2021.103784] [Received: 11/25/2020] [Revised: 03/07/2021] [Accepted: 04/08/2021]
Abstract
Understanding a patient's medical history, such as how long symptoms last or when a procedure was performed, is vital to diagnosing problems and providing good care. Frequently, important information regarding a patient's medical timeline is buried in their Electronic Health Record (EHR) in the form of unstructured clinical notes. This results in care providers spending time reading notes in a patient's record in order to become familiar with their condition prior to developing a diagnosis or treatment plan. Valuable time could be saved if this information were readily accessible for searching and visualization for fast comprehension by the medical team. Clinical Natural Language Processing (NLP) is an area of research that aims to build computational methods to automatically extract medically relevant information from unstructured clinical texts. A key component of Clinical NLP is Temporal Reasoning, as understanding a patient's medical history relies heavily on the ability to identify, assimilate, and reason over temporal information. In this work, we review the current state of Temporal Reasoning in the clinical domain with respect to Clinical Timeline Extraction. While much progress has been made, the current state-of-the-art still has a long way to go before practical application in the clinical setting will be possible. Areas in need of new and innovative solutions to improve performance on clinical data include handling relative and implicit temporal expressions (both in normalization and in identifying temporal relationships), improving co-reference resolution, and building interoperable timeline extraction tools that can integrate multiple types of data.
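Relative expressions such as "3 days ago" only acquire a calendar value once they are resolved against an anchor, which is one of the normalization problems the review highlights. A toy rule-based sketch, assuming the anchor date is already known (here a hypothetical admission date) and covering only day and week offsets; the pattern and function name are illustrative assumptions:

```python
import re
from datetime import date, timedelta
from typing import Optional

# Toy pattern for "<n> day(s)/week(s) ago|later"; real clinical text needs far broader coverage.
_RELATIVE = re.compile(r"\b(\d+)\s+(day|week)s?\s+(ago|later)\b", re.IGNORECASE)
_DAYS_PER_UNIT = {"day": 1, "week": 7}


def normalize_relative(expr: str, anchor: date) -> Optional[date]:
    """Resolve a simple relative expression against an anchor date.

    Returns None when the expression is not covered, e.g. implicit
    expressions such as "last spring" that need richer temporal reasoning.
    """
    m = _RELATIVE.search(expr)
    if m is None:
        return None
    count, unit, direction = int(m.group(1)), m.group(2).lower(), m.group(3).lower()
    offset = timedelta(days=count * _DAYS_PER_UNIT[unit])
    return anchor - offset if direction == "ago" else anchor + offset


admission = date(2012, 3, 10)  # hypothetical anchor, e.g. the admission date
print(normalize_relative("chest pain began 3 days ago", admission))
print(normalize_relative("seen again 2 weeks later", admission))
print(normalize_relative("symptoms since last spring", admission))
```

Choosing the right anchor (admission date, document creation time, or a previously mentioned event) is the hard part in practice; this sketch simply takes it as an input.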
Affiliation(s)
- Amy L Olex
- Virginia Commonwealth University, 401 S. Main St., Richmond, VA 23284, USA
- Bridget T McInnes
- Virginia Commonwealth University, 401 S. Main St., Richmond, VA 23284, USA
5
Alfattni G, Peek N, Nenadic G. Extraction of temporal relations from clinical free text: A systematic review of current approaches. J Biomed Inform 2020; 108:103488. [PMID: 32673788 DOI: 10.1016/j.jbi.2020.103488] [Received: 11/27/2019] [Revised: 06/10/2020] [Accepted: 06/15/2020]
Abstract
BACKGROUND Temporal relations between clinical events play an important role in clinical assessment and decision making. Extracting such relations from free text data is a challenging task because it lies at the intersection of medical natural language processing, temporal representation and temporal reasoning. OBJECTIVES To survey existing methods for extracting temporal relations (TLINKs) between events from clinical free text in English; to establish the state-of-the-art in this field; and to identify outstanding methodological challenges. METHODS A systematic search in PubMed and the DBLP computer science bibliography was conducted for studies published between January 2006 and December 2018. The relevant studies were identified by examining the titles and abstracts. Then, the full text of selected studies was analyzed in depth and information was collected on TLINK tasks, TLINK types, data sources, feature selection, methods used, and reported performance. RESULTS A total of 2834 publications were identified for title and abstract screening. Of these publications, 51 studies were selected. Thirty-two studies used machine learning approaches, 15 studies used hybrid approaches, and only four studies used a rule-based approach. The majority of studies used publicly available corpora: THYME (28 studies) and the i2b2 corpus (17 studies). CONCLUSION The performance of TLINK extraction methods ranges widely depending on relation types and events (e.g. from 32% to 87% F-score for identifying relations between clinical events and document creation time). A small set of TLINKs (before, after, overlap and contains) has been widely studied with relatively good performance, whereas other types of TLINK (e.g., started by, finished by, precedes) are rarely studied and remain challenging.
Machine learning classifiers (such as Support Vector Machine and Conditional Random Fields) and Deep Neural Networks were among the best performing methods for extracting TLINKs, but nearly all the work has been carried out and tested on only two publicly available corpora. The field would benefit from the availability of more publicly available, high-quality, annotated clinical text corpora.
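The four widely studied TLINK types can be related to the ordering of event intervals. A simplified sketch, assuming each event has known start and end dates (which real clinical text rarely provides) and collapsing every other configuration, including "contained by", into OVERLAP; this mapping is illustrative and is not the annotation guideline of THYME or i2b2:

```python
from datetime import date
from typing import NamedTuple


class Interval(NamedTuple):
    start: date
    end: date  # inclusive


def tlink(a: Interval, b: Interval) -> str:
    """Label the relation of event a to event b as BEFORE, AFTER, CONTAINS or OVERLAP."""
    if a.end < b.start:
        return "BEFORE"
    if b.end < a.start:
        return "AFTER"
    if a.start <= b.start and b.end <= a.end and a != b:
        return "CONTAINS"
    return "OVERLAP"  # any other intersection, including identical intervals


admission = Interval(date(2012, 1, 1), date(2012, 1, 10))
surgery = Interval(date(2012, 1, 3), date(2012, 1, 3))
follow_up = Interval(date(2012, 2, 1), date(2012, 2, 1))
print(tlink(admission, surgery), tlink(surgery, follow_up), tlink(follow_up, admission))
```

Rarer types such as "started by" or "finished by" would require distinguishing shared start or end points, which this four-way mapping deliberately ignores.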
Affiliation(s)
- Ghada Alfattni
- Department of Computer Science, University of Manchester, Manchester, UK; Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK; National Institute of Health Research Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, UK; The Alan Turing Institute, UK
6
Zhao S, Li L, Lu H, Zhou A, Qian S. Associative attention networks for temporal relation extraction from electronic health records. J Biomed Inform 2019; 99:103309. [PMID: 31627021 DOI: 10.1016/j.jbi.2019.103309] [Received: 05/06/2019] [Revised: 10/10/2019] [Accepted: 10/11/2019]
Abstract
Temporal relations are crucial in constructing a timeline over the course of clinical care, which can help medical practitioners and researchers track the progression of diseases, treatments and adverse reactions over time. Due to the rapid adoption of Electronic Health Records (EHRs) and the high cost of manual curation, using Natural Language Processing (NLP) to extract temporal relations automatically has become a promising approach. Typically, temporal relation extraction is formulated as a classification problem over entity-pair instances, which relies on information hidden in the context. However, EHRs contain an overwhelming number of entities, with many entity pairs sharing the same context, which makes it difficult to distinguish instances and identify the contextual information relevant to a specific entity pair. These issues pose significant challenges for temporal relation extraction, yet existing methods rarely address them. In this work, we propose associative attention networks to address these issues. Each instance is first split into three segments according to the entity pair to obtain an initial differentiated representation. Then we devise an associative attention mechanism that further distinguishes instances by emphasizing the relevant information and, at the same time, reconstructs the association among segments as the final representation of the whole instance. In addition, position weights are used to enhance performance. We validate the merit of our method on the widely used THYME corpus and achieve an average F1-score of 64.3% over three runs, which outperforms the state-of-the-art by 1.5%.
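The segmentation step can be illustrated with token indices. A sketch under stated assumptions: entities are non-overlapping (start, end_exclusive) token spans; the three segments run up to the end of the first entity, strictly between the entities, and from the second entity onward; and the position weight 1 / (1 + distance to the nearest entity token) is a hypothetical choice. The abstract specifies neither convention, and the attention layers themselves are omitted:

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end_exclusive) token indices


def split_instance(tokens: List[str], e1: Span, e2: Span) -> Tuple[List[str], List[str], List[str]]:
    """Carve one entity-pair instance into left, middle and right segments."""
    (_, first_end), (second_start, _) = sorted([e1, e2])
    return tokens[:first_end], tokens[first_end:second_start], tokens[second_start:]


def position_weights(n: int, e1: Span, e2: Span) -> List[float]:
    """Hypothetical weights that decay with distance to the nearest entity token."""
    entity_positions = list(range(*e1)) + list(range(*e2))
    return [1.0 / (1 + min(abs(i - p) for p in entity_positions)) for i in range(n)]


tokens = ["CT", "scan", "was", "done", "before", "the", "surgery", "yesterday"]
e1, e2 = (0, 2), (6, 7)  # "CT scan" and "surgery"
left, middle, right = split_instance(tokens, e1, e2)
weights = position_weights(len(tokens), e1, e2)
print(left, middle, right)
print(weights)
```

Because each entity pair in a sentence produces a different split, instances that share the same surrounding text still receive distinct segment representations.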
Affiliation(s)
- Shiyi Zhao
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
- Lishuang Li
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
- Hongbin Lu
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
- Anqiao Zhou
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
- Shuang Qian
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
7
Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification. BMC Med Inform Decis Mak 2019; 19:75. [PMID: 30944012 PMCID: PMC6448181 DOI: 10.1186/s12911-019-0784-1]
Abstract
BACKGROUND Numbers and numerical concepts appear frequently in free text clinical notes from electronic health records. Knowledge of the frequent lexical variations of these numerical concepts, and their accurate identification, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes. METHODS We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed. RESULTS We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that not considering these variations could have substantial real-world implications for cohort identification tasks, with one case missing > 80% of potential patients. CONCLUSIONS Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.
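Several of the variant types the paper analyzes (Roman numerals, numbers spelled as English words, invalid dates) can be normalized or flagged with simple rules. A minimal sketch, assuming only digit strings, the spelled-out numbers zero to ten, canonical upper-case Roman numerals, and numeric month/day/year dates; the helper names are hypothetical and this is not the authors' analysis code:

```python
import re
from datetime import date
from typing import Optional

_ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Canonical Roman numerals from 1 to 3999; rejects forms such as "IIII" or "IC".
_ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
_NUMBER_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                 "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def roman_to_int(s: str) -> Optional[int]:
    """Parse an upper-case Roman numeral, e.g. the "IV" in "stage IV"."""
    if not s or _ROMAN_RE.fullmatch(s) is None:
        return None
    total = 0
    for i, ch in enumerate(s):
        value = _ROMAN_VALUES[ch]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if i + 1 < len(s) and value < _ROMAN_VALUES[s[i + 1]]:
            total -= value
        else:
            total += value
    return total


def normalize_number(token: str) -> Optional[int]:
    """Map a digit string, a spelled-out number or a Roman numeral to an int."""
    t = token.strip()
    if t.isdecimal():
        return int(t)
    if t.lower() in _NUMBER_WORDS:
        return _NUMBER_WORDS[t.lower()]
    return roman_to_int(t)


def is_valid_date(month: int, day: int, year: int) -> bool:
    """Reject impossible dates such as 02/30/2015."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(normalize_number("IV"), normalize_number("three"), is_valid_date(2, 30, 2015))
```

Note that a bare "I" is still parsed as 1, which shows why a real system needs context to tell a Roman numeral from the pronoun.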
8
Wang W, Kreimeyer K, Woo EJ, Ball R, Foster M, Pandey A, Scott J, Botsis T. A new algorithmic approach for the extraction of temporal associations from clinical narratives with an application to medical product safety surveillance reports. J Biomed Inform 2016; 62:78-89. [DOI: 10.1016/j.jbi.2016.06.006] [Received: 01/14/2016] [Revised: 06/11/2016] [Accepted: 06/17/2016]
9
Cohen KB, Glass B, Greiner HM, Holland-Bouley K, Standridge S, Arya R, Faist R, Morita D, Mangano F, Connolly B, Glauser T, Pestian J. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning. Biomed Inform Insights 2016; 8:11-8. [PMID: 27257386 PMCID: PMC4876984 DOI: 10.4137/bii.s38308] [Received: 12/23/2015] [Revised: 03/03/2016] [Accepted: 03/03/2016]
Abstract
OBJECTIVE We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data consist of free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient's status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral.
Affiliation(s)
- Kevin Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA
- Benjamin Glass
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Hansel M. Greiner
- Division of Neurology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Katherine Holland-Bouley
- Division of Neurology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Shannon Standridge
- Division of Neurology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Ravindra Arya
- Division of Neurology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Robert Faist
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Diego Morita
- Division of Neurology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Francesco Mangano
- Division of Pediatric Neurosurgery, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Brian Connolly
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Tracy Glauser
- Division of Neurology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- John Pestian
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
10
Madkour M, Benhaddou D, Tao C. Temporal data representation, normalization, extraction, and reasoning: A review from clinical domain. Comput Methods Programs Biomed 2016; 128:52-68. [PMID: 27040831 PMCID: PMC4837648 DOI: 10.1016/j.cmpb.2016.02.007] [Received: 08/18/2015] [Accepted: 02/16/2016]
Abstract
BACKGROUND AND OBJECTIVE We live our lives by the calendar and the clock, but time is also an abstraction, even an illusion. The sense of time can be both domain-specific and complex, and is often left implicit, requiring significant domain knowledge to accurately recognize and harness. In the clinical domain, the momentum gained from recent advances in infrastructure and governance practices has enabled the collection of a tremendous amount of data at each moment in time. Electronic health records (EHRs) have paved the way to making these data available for practitioners and researchers. However, temporal data representation, normalization, extraction and reasoning are essential for mining such massive data and therefore for constructing the clinical timeline. The objective of this work is to provide an overview of the problem of constructing a timeline at the clinical point of care and to summarize the state-of-the-art in processing temporal information of clinical narratives. METHODS This review surveys the methods used in three important areas: modeling and representing time, medical NLP methods for extracting time, and methods of time reasoning and processing. The review emphasizes the existing gap between present methods and semantic web technologies and explores possible ways to combine them. RESULTS The main findings of this review reveal the importance of time processing not only in constructing timelines and clinical decision support systems but also as a vital component of EHR data models and operations. CONCLUSIONS Extracting temporal information in clinical narratives is a challenging task. The inclusion of ontologies and semantic web technologies will lead to better assessment of the annotation task and, together with medical NLP techniques, will help resolve granularity and co-reference resolution problems.
Affiliation(s)
- Mohcine Madkour
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, TX 77030, United States
- Driss Benhaddou
- Department of Engineering Technology, University of Houston, 4800 Calhoun Rd, Houston, TX 77004, United States
- Cui Tao
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, TX 77030, United States
11
Lin C, Karlson EW, Dligach D, Ramirez MP, Miller TA, Mo H, Braggs NS, Cagan A, Gainer V, Denny JC, Savova GK. Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record. J Am Med Inform Assoc 2015; 22:e151-61. [PMID: 25344930 PMCID: PMC5901122 DOI: 10.1136/amiajnl-2014-002642] [Received: 01/09/2014] [Revised: 08/14/2014] [Accepted: 08/22/2014]
Abstract
OBJECTIVES To improve the accuracy of mining structured and unstructured components of the electronic medical record (EMR) by adding temporal features to automatically identify patients with rheumatoid arthritis (RA) with methotrexate-induced liver transaminase abnormalities. MATERIALS AND METHODS Codified information and a string-matching algorithm were applied to an RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was applied as our key method. For features, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to extract standard vocabulary from relevant sections of the unstructured clinical narrative. Temporal features were further extracted to assess the temporal relevance of event mentions with regard to the date of transaminase abnormality. All features were encapsulated in a 3-month-long episode for classification. Results were summarized at the patient level in a training set (N=480 patients) and evaluated against a test set (N=120 patients). RESULTS The system achieved positive predictive value (PPV) 0.756, sensitivity 0.919, F1 score 0.829 on the test set, which was significantly better than the best baseline system (PPV 0.590, sensitivity 0.703, F1 score 0.642). Our innovations, which included framing the phenotype problem as an episode-level classification task and adding temporal information, all proved highly effective. CONCLUSIONS Automated methotrexate-induced liver toxicity phenotype discovery for patients with RA based on structured and unstructured information in the EMR shows accurate results. Our work demonstrates that adding temporal features significantly improved classification results.
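The reported F1 scores follow from PPV (precision) and sensitivity (recall), since F1 is their harmonic mean, F1 = 2 * PPV * sensitivity / (PPV + sensitivity). A quick check in Python, using only the numbers quoted in the abstract:

```python
def f1_from_ppv_sensitivity(ppv: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of precision (PPV) and recall (sensitivity)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


system = f1_from_ppv_sensitivity(0.756, 0.919)
baseline = f1_from_ppv_sensitivity(0.590, 0.703)
print(f"system F1 ~ {system:.3f}, baseline F1 ~ {baseline:.3f}")
```

This gives about 0.830 for the system and 0.642 for the baseline; the small gap from the reported 0.829 is consistent with the published PPV and sensitivity being rounded to three decimals.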
Affiliation(s)
- Chen Lin
- Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
- *CL, EWK and DD are co-first authors
- Elizabeth W Karlson
- Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Dmitriy Dligach
- Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Monica P Ramirez
- Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Timothy A Miller
- Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Huan Mo
- Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
- Natalie S Braggs
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- Andrew Cagan
- Research Computing, Partners HealthCare, Boston, Massachusetts, USA
- Vivian Gainer
- Research Computing, Partners HealthCare, Boston, Massachusetts, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- Guergana K Savova
- Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
12
Chang YC, Dai HJ, Wu JCY, Chen JM, Tsai RTH, Hsu WL. TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries. J Biomed Inform 2013; 46 Suppl:S54-S62. [PMID: 24060600 DOI: 10.1016/j.jbi.2013.09.007] [Received: 03/14/2013] [Revised: 09/07/2013] [Accepted: 09/11/2013]
Abstract
Patient discharge summaries provide detailed medical information about individuals who have been hospitalized. To make a precise and legitimate assessment of the abundant data, a proper time layout of the sequence of relevant events should be compiled and used to derive a patient-specific timeline, which could further assist medical personnel in making clinical decisions. The process of identifying the chronological order of entities is called temporal relation extraction. In this paper, we propose a hybrid method to identify appropriate temporal links between a pair of entities. The method combines two approaches: one is rule-based and the other is based on the maximum entropy model. We develop an integration algorithm to fuse the results of the two approaches. All rules and the integration algorithm are formally stated so that one can easily reproduce the system and results. To optimize the system's configuration, we used the 2012 i2b2 challenge TLINK track dataset and applied threefold cross validation to the training set. Then, we evaluated its performance on the training and test datasets. The experimental results show that the proposed TEMPTING (TEMPoral relaTion extractING) system (ranked seventh) achieved an F-score of 0.563, which was at least 30% better than that of the baseline system, which randomly selects TLINK candidates from all pairs and assigns the TLINK types. The TEMPTING system using the hybrid method also outperformed the stage-based TEMPTING system. Its F-scores were 3.51% and 0.97% better than those of the stage-based system on the training set and test set, respectively.
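The abstract describes fusing a rule-based component with a maximum entropy classifier but does not spell out the integration algorithm. A hypothetical sketch of one simple fusion policy, in which a rule decision takes precedence and otherwise a sufficiently confident classifier label is kept; the entity ids, the confidence threshold, and the precedence order are all assumptions, not the TEMPTING algorithm:

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source entity id, target entity id); ids are hypothetical


def fuse_tlinks(rule_links: Dict[Pair, str],
                ml_links: Dict[Pair, Tuple[str, float]],
                min_confidence: float = 0.5) -> Dict[Pair, str]:
    """Hypothetical fusion policy: rules win when they fire; otherwise keep confident ML labels."""
    fused: Dict[Pair, str] = {}
    for pair in rule_links.keys() | ml_links.keys():
        if pair in rule_links:
            fused[pair] = rule_links[pair]
        else:
            label, prob = ml_links[pair]
            if prob >= min_confidence:
                fused[pair] = label
    return fused


rules = {("E1", "T1"): "OVERLAP"}
classifier = {("E1", "T1"): ("BEFORE", 0.9),   # overridden by the rule
              ("E2", "E3"): ("AFTER", 0.7),    # kept
              ("E3", "E4"): ("OVERLAP", 0.3)}  # dropped: below threshold
links = fuse_tlinks(rules, classifier)
print(links)
```

Giving rules precedence suits high-precision patterns; a real integration step might instead weigh rule reliability against classifier probability per TLINK type.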
Affiliation(s)
- Yung-Chun Chang
- Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan, ROC; Department of Information Management, National Taiwan University, Taipei, Taiwan, ROC
- Hong-Jie Dai
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan, ROC
- Johnny Chi-Yang Wu
- Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan, ROC
- Jian-Ming Chen
- Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan, ROC
- Richard Tzong-Han Tsai
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
- Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan, ROC
13
Jindal P, Roth D. Extraction of events and temporal expressions from clinical narratives. J Biomed Inform 2013; 46 Suppl:S13-S19. [PMID: 24022023 DOI: 10.1016/j.jbi.2013.08.010] [Received: 03/12/2013] [Revised: 08/08/2013] [Accepted: 08/27/2013]
Abstract
This paper addresses the important task of event and timex extraction from clinical narratives in the context of the 2012 i2b2 challenge. State-of-the-art approaches for event extraction use a multi-class classifier to determine event types. However, such approaches consider each event in isolation. In this paper, we present a sentence-level inference strategy that enforces consistency constraints on the attributes of events that appear close to one another. Our approach is general and can be applied to other tasks as well. We also design novel features, such as clinical descriptors (from medical ontologies), which encode much useful information about the concepts. For timex extraction, we adapt a state-of-the-art system, HeidelTime, for use in clinical narratives and also develop several rules that complement HeidelTime. We also give a robust algorithm for date extraction. For the event extraction task, we achieved an overall F1 score of 0.71 for determining the span of events along with their attributes. For the timex extraction task, we achieved an F1 score of 0.79 for determining the span of temporal expressions. We present a detailed error analysis of our system and point out some factors that can help improve its accuracy.
Affiliation(s)
- Prateek Jindal
- Department of Computer Science, UIUC, 201 N. Goodwin Ave, Urbana, IL 61801, United States.
- Dan Roth
- Department of Computer Science, UIUC, 201 N. Goodwin Ave, Urbana, IL 61801, United States.
14
Sun W, Rumshisky A, Uzuner O. Temporal reasoning over clinical text: the state of the art. J Am Med Inform Assoc 2013; 20:814-9. [PMID: 23676245 PMCID: PMC3756277 DOI: 10.1136/amiajnl-2013-001760] [Received: 02/26/2013] [Revised: 04/17/2013] [Accepted: 04/20/2013]
Abstract
OBJECTIVES To provide an overview of the problem of temporal reasoning over clinical text and to summarize the state of the art in clinical natural language processing for this task. TARGET AUDIENCE This overview targets medical informatics researchers who are unfamiliar with the problems and applications of temporal reasoning over clinical text. SCOPE We review the major applications of text-based temporal reasoning, describe the challenges for software systems handling temporal information in clinical text, and give an overview of the state of the art. Finally, we present some perspectives on future research directions that emerged during the recent community-wide challenge on text-based temporal reasoning in the clinical domain.
Affiliation(s)
- Weiyi Sun
- Department of Informatics, University at Albany, SUNY, Albany, New York, USA
- Anna Rumshisky
- Department of Computer Science, University of Massachusetts, Lowell, Massachusetts, USA
- Ozlem Uzuner
- Department of Information Studies, University at Albany, SUNY, Albany, New York, USA
15
Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc 2013; 20:806-13. [PMID: 23564629 DOI: 10.1136/amiajnl-2013-001628]
Abstract
BACKGROUND The Sixth Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing Challenge for Clinical Records focused on temporal relations in clinical narratives. The organizers provided the research community with a corpus of discharge summaries annotated with temporal information, to be used for the development and evaluation of temporal reasoning systems. Eighteen teams from around the world participated in the challenge. During the workshop, participating teams presented comprehensive reviews and analyses of their systems and outlined future research directions suggested by the challenge contributions. METHODS The challenge evaluated systems on three information extraction tasks: (1) clinically significant events, including both clinical concepts such as problems, tests, treatments, and clinical departments, and events relevant to the patient's clinical timeline, such as admissions and transfers between departments; (2) temporal expressions, i.e., date, time, duration, or frequency phrases in the clinical text, whose values had to be normalized to an ISO specification standard; and (3) temporal relations between clinical events and temporal expressions. Participants determined which pairs of events and temporal expressions exhibited a temporal relation and identified the temporal relation between them. RESULTS For event detection, statistical machine learning (ML) methods consistently showed superior performance. While ML and rule-based methods seemed to detect temporal expressions equally well, the best systems overwhelmingly adopted a rule-based approach for value normalization. For temporal relation classification, the systems using hybrid approaches that combined ML and heuristic-based methods produced the best results.
Affiliation(s)
- Weiyi Sun
- Department of Informatics, University at Albany, SUNY, Albany, New York, USA