1
Moore CL, Socrates V, Hesami M, Denkewicz RP, Cavallo JJ, Venkatesh AK, Taylor RA. Using natural language processing to identify emergency department patients with incidental lung nodules requiring follow-up. Acad Emerg Med 2025;32:274-283. [PMID: 39821298] [DOI: 10.1111/acem.15080]
Abstract
OBJECTIVES For emergency department (ED) patients, lung cancer may be detected early through incidental lung nodules (ILNs) discovered on chest CTs. However, there are significant errors in the communication and follow-up of incidental findings on ED imaging, particularly due to unstructured radiology reports. Natural language processing (NLP) can aid in identifying ILNs requiring follow-up, potentially reducing errors from missed follow-up. We sought to develop an open-access, three-step NLP pipeline specifically for this purpose. METHODS This retrospective study used a cohort of 26,545 chest CTs performed in three EDs from 2014 to 2021. Randomly selected chest CT reports were annotated by MD raters using Prodigy software to develop a stepwise NLP "pipeline" that first excluded prior or known malignancy, then determined the presence of a lung nodule, and finally categorized any recommended follow-up. The NLP pipeline was developed using a RoBERTa large language model on the spaCy platform and deployed as open-access software using Docker. After development, the pipeline was applied to 1000 CT reports that were manually reviewed to determine accuracy using the accepted NLP metrics of precision (positive predictive value), recall (sensitivity), and F1 score (which balances precision and recall). RESULTS Precision, recall, and F1 score were 0.85, 0.71, and 0.77, respectively, for malignancy; 0.87, 0.83, and 0.85 for nodule; and 0.82, 0.90, and 0.85 for follow-up. Overall accuracy for follow-up in the absence of malignancy with a nodule present was 93.3%. The overall recommended follow-up rate was 12.4%, with 10.1% of patients having evidence of known or prior malignancy. CONCLUSIONS We developed an accurate, open-access pipeline to identify ILNs with recommended follow-up on ED chest CTs. While the prevalence of recommended follow-up is lower than in some prior studies, it more accurately reflects the prevalence of truly incidental findings without prior or known malignancy. Incorporating this tool could reduce errors by improving the identification, communication, and tracking of ILNs.
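The three evaluation metrics above follow their standard definitions; as a quick sketch (not the authors' code), the F1 score is the harmonic mean of precision and recall, which reproduces the reported malignancy-step F1 of 0.77 from its precision (0.85) and recall (0.71):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard NLP evaluation metrics from raw counts."""
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # sensitivity
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# With the malignancy step's reported precision (0.85) and recall (0.71):
p, r = 0.85, 0.71
f1 = 2 * p * r / (p + r)
print(round(f1, 2))  # → 0.77, matching the reported F1 score
```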
Affiliation(s)
- Christopher L Moore
- Department of Emergency Medicine, Yale University, New Haven, Connecticut, USA
- Vimig Socrates
- Department of Biomedical Informatics and Data Science, Yale University, New Haven, Connecticut, USA
- Mina Hesami
- Department of Emergency Medicine, Yale University, New Haven, Connecticut, USA
- Ryan P Denkewicz
- Department of Emergency Medicine, Yale University, New Haven, Connecticut, USA
- Joe J Cavallo
- Department of Radiology and Biomedical Imaging, Yale University, New Haven, Connecticut, USA
- Arjun K Venkatesh
- Department of Emergency Medicine, Yale University, New Haven, Connecticut, USA
- R Andrew Taylor
- Department of Emergency Medicine, Yale University, New Haven, Connecticut, USA
2
Corvi J, Díaz-Roussel N, Fernández JM, Ronzano F, Centeno E, Accuosto P, Ibrahim C, Asakura S, Bringezu F, Fröhlicher M, Kreuchwig A, Nogami Y, Rih J, Rodriguez-Esteban R, Sajot N, Wichard J, Wu HYM, Drew P, Steger-Hartmann T, Valencia A, Furlong LI, Capella-Gutierrez S. PretoxTM: a text mining system for extracting treatment-related findings from preclinical toxicology reports. J Cheminform 2025;17:15. [PMID: 39901182] [PMCID: PMC11792311] [DOI: 10.1186/s13321-024-00925-x]
Abstract
Over the last few decades the pharmaceutical industry has generated a vast corpus of knowledge on the safety and efficacy of drugs. Much of this information is contained in toxicology reports, which summarise the results of animal studies designed to analyse the effects of the tested compound, including unintended pharmacological and toxic effects, known as treatment-related findings. Despite the potential of this knowledge, the fact that most of this relevant information is only available as unstructured text with variable degrees of digitisation has hampered its systematic access, use and exploitation. Text mining technologies have the ability to automatically extract, analyse and aggregate such information, providing valuable new insights into the drug discovery and development process. In the context of the eTRANSAFE project, we present PretoxTM (Preclinical Toxicology Text Mining), the first system specifically designed to detect, extract, organise and visualise treatment-related findings from toxicology reports. The PretoxTM tool comprises three main components: PretoxTM Corpus, PretoxTM Pipeline and PretoxTM Web App. The PretoxTM Corpus is a gold standard corpus of preclinical treatment-related findings annotated by toxicology experts. This corpus was used to develop, train and validate the PretoxTM Pipeline, which extracts treatment-related findings from preclinical study reports. The extracted information is then presented for expert visualisation and validation in the PretoxTM Web App.
Scientific Contribution: While text mining solutions have been widely used in the clinical domain to identify adverse drug reactions from various sources, no similar systems exist for identifying adverse events in animal models during preclinical testing. PretoxTM fills this gap by efficiently extracting treatment-related findings from preclinical toxicology reports. This provides a valuable resource for toxicology research, enhancing the efficiency of safety evaluations, saving time, and leading to more effective decision-making in the drug development process.
Affiliation(s)
- Javier Corvi
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- MedBioInformatics Solutions, Barcelona, Spain
- University of Barcelona, Barcelona, Spain
- Nicolás Díaz-Roussel
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Francesco Ronzano
- Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Emilio Centeno
- Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Frank Bringezu
- Chemical and Preclinical Safety, Merck Healthcare KGaA, Darmstadt, Germany
- Mirjam Fröhlicher
- Translational Medicine, Preclinical Safety, Novartis Biomedical Research, Basel, Switzerland
- Heng-Yi Michael Wu
- Genentech Research and Early Development (gRED) Computational Sciences, Genentech, Inc., South San Francisco, CA, USA
- Alfonso Valencia
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
3
Nourani E, Makri EM, Mao X, Pyysalo S, Brunak S, Nastou K, Jensen LJ. LSD600: the first corpus of biomedical abstracts annotated with lifestyle-disease relations. Database (Oxford) 2025;2025:baae129. [PMID: 39824652] [PMCID: PMC11756709] [DOI: 10.1093/database/baae129]
Abstract
Lifestyle factors (LSFs) are increasingly recognized as instrumental in both the development and control of diseases. Despite their importance, there is a lack of methods to extract relations between LSFs and diseases from the literature, a step necessary to consolidate the currently available knowledge into a structured form. As simple co-occurrence-based relation extraction (RE) approaches are unable to distinguish between the different types of LSF-disease relations, context-aware models such as transformers are required to extract and classify these relations into specific relation types. However, no comprehensive LSF-disease RE system existed, nor did a corpus suitable for developing one. We present LSD600 (available at https://zenodo.org/records/13952449), the first corpus specifically designed for LSF-disease RE, comprising 600 abstracts with 1900 relations of eight distinct types between 5027 disease and 6930 LSF entities. We evaluated LSD600's quality by training a RoBERTa model on the corpus, achieving an F-score of 68.5% for the multilabel RE task on the held-out test set. We further validated LSD600 by applying the trained model to the Nutrition-Disease and FoodDisease datasets, where it achieved F-scores of 70.7% and 80.7%, respectively. Building on these performance results, LSD600 and the RE system trained on it can be valuable resources to fill the existing gap in this area and pave the way for downstream applications. Database URL: https://zenodo.org/records/13952449.
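For the multilabel RE task above, an F-score is typically computed by pooling label decisions across instances; a minimal micro-averaged sketch with hypothetical relation-type labels (the abstract does not name LSD600's eight types):

```python
def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1 over multilabel predictions: pool true positives,
    false positives, and false negatives across all instances."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical relation-type labels for two candidate LSF-disease pairs:
gold = [{"positive", "treats"}, {"negative"}]
pred = [{"positive"}, {"negative"}]
print(round(micro_f1(gold, pred), 2))  # → 0.8
```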
Affiliation(s)
- Esmaeil Nourani
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen 2200, Denmark
- Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran
- Evangelia-Mantelena Makri
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen 2200, Denmark
- Department of Nutrition and Dietetics, Harokopio University, Athens 17676, Attiki, Greece
- Xiqing Mao
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen 2200, Denmark
- Sampo Pyysalo
- TurkuNLP group, Department of Computing, Faculty of Technology, University of Turku, Turku 20014, Finland
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen 2200, Denmark
- Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen 2200, Denmark
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen 2200, Denmark
4
Veeranki SPK, Abdulnazar A, Kramer D, Kreuzthaler M, Lumenta DB. Multi-label text classification via secondary use of large clinical real-world data sets. Sci Rep 2024;14:26972. [PMID: 39505974] [PMCID: PMC11541716] [DOI: 10.1038/s41598-024-76424-8]
Abstract
Procedural coding presents a taxing challenge for clinicians. However, recent advances in natural language processing offer a promising avenue for developing applications that assist clinicians, thereby alleviating their administrative burdens. This study seeks to create an application capable of predicting procedure codes by analysing clinicians' operative notes, aiming to streamline their workflow and enhance efficiency. In a secondary-use scenario, we adapted an existing and a natively trained German medical BERT model on already coded surgery notes, modelling the coding procedure as a multi-label classification task. For comparison with the transformer-based architecture, we also evaluated the non-contextual model fastText, a convolutional neural network, a support vector machine and logistic regression. About 350,000 notes were used for model adaptation. Considering the top five suggested procedure codes from medBERT.de, surgeryBERT.at, fastText, the convolutional neural network, the support vector machine and the logistic regression, the mean average precision achieved was 0.880, 0.867, 0.870, 0.851, 0.870 and 0.805, respectively. Support vector machines performed better for surgery reports with a sequence length greater than 512, achieving a mean average precision of 0.872 in comparison to 0.840 for fastText, 0.837 for medBERT.de and 0.820 for surgeryBERT.at. A prototypical front-end application for coding support was additionally implemented. The problem of predicting procedure codes from a given operative report can be successfully modelled as a multi-label classification task, with promising performance. Support vector machines, as a classical machine learning method, outperformed the non-contextual fastText approach. FastText, with less demanding hardware requirements, reached a performance similar to the BERT-based models and proved more suitable for explaining predictions efficiently.
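Mean average precision over the top five suggested codes, as reported above, can be sketched as follows; the procedure codes in the example are illustrative placeholders, not codes from the study:

```python
def average_precision_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Average precision over the top-k ranked suggestions: precision is
    evaluated at each rank holding a relevant code, then averaged over
    the number of relevant codes (capped at k). The mean of this value
    over all reports gives mean average precision."""
    hits, score = 0, 0.0
    for rank, code in enumerate(ranked[:k], start=1):
        if code in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

# Illustrative codes: two of the three true codes appear in the
# top-5 suggestions, at ranks 1 and 3.
ranked = ["5-870", "5-984", "5-401", "5-399", "5-900"]
relevant = {"5-870", "5-401", "8-100"}
print(round(average_precision_at_k(ranked, relevant), 3))  # → 0.556 (i.e. 5/9)
```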
Affiliation(s)
- Sai Pavan Kumar Veeranki
- Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Billrothgasse 18a, 8010, Graz, Austria
- Institute of Neural Engineering, Graz University of Technology, Stremayrgasse 16/IV, 8010, Graz, Austria
- Center for Health and Bioresources, AIT Austrian Institute of Technology GmbH, Reininghausstrasse 13, 8020, Graz, Austria
- Akhila Abdulnazar
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerplatz 2, 8036, Graz, Austria
- Diether Kramer
- Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Billrothgasse 18a, 8010, Graz, Austria
- Predicting Health GmbH, Ruckerlberggasse 13, 8010, Graz, Austria
- Markus Kreuzthaler
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerplatz 2, 8036, Graz, Austria
- David Benjamin Lumenta
- Research Unit for Digital Surgery, Division of Plastic, Aesthetic and Reconstructive Surgery, Department of Surgery, Medical University of Graz, Auenbruggerplatz 29/4, 8036, Graz, Austria
5
Thalhath N. Lightweight technology stacks for assistive linked annotations. Genomics Inform 2024;22:17. [PMID: 39390526] [PMCID: PMC11468380] [DOI: 10.1186/s44342-024-00021-4]
Abstract
This report presents the findings of a project from the 8th Biomedical Linked Annotation Hackathon (BLAH) exploring lightweight technology stacks to enhance assistive linked annotations. Using modern JavaScript frameworks and edge functions, the project implemented in-browser Named Entity Recognition (NER), serverless embedding and vector search within web interfaces, and efficient serverless full-text search. Through this experimental approach, a proof of concept demonstrated the feasibility and performance of these technologies. The results show that lightweight stacks can significantly improve the efficiency and cost-effectiveness of annotation tools and provide a local-first, privacy-oriented, and secure alternative to traditional server-based solutions in various use cases. This work emphasizes the potential of developing annotation interfaces that are more responsive, scalable, and user-friendly, which would benefit bioinformatics researchers, practitioners, and software developers.
Affiliation(s)
- Nishad Thalhath
- Laboratory for Large-Scale Biomedical Data Technology, RIKEN Center for Integrative Medical Sciences, Tsurumi, Yokohama, 230-0045, Kanagawa, Japan
6
Wiest IC, Wolf F, Leßmann ME, van Treeck M, Ferber D, Zhu J, Boehme H, Bressem KK, Ulrich H, Ebert MP, Kather JN. LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models. medRxiv [Preprint] 2024:2024.09.02.24312917. [PMID: 39281753] [PMCID: PMC11398444] [DOI: 10.1101/2024.09.02.24312917]
Abstract
In clinical science and practice, text data, such as clinical letters or procedure reports, are stored in an unstructured way. In this form, the data cannot be used directly for quantitative investigations, and any manual review or structured information retrieval is time-consuming and costly. The capabilities of Large Language Models (LLMs) mark a paradigm shift in natural language processing and offer new possibilities for structured Information Extraction (IE) from medical free text. This protocol describes a workflow for LLM-based information extraction (LLM-AIx), enabling extraction of predefined entities from unstructured text using privacy-preserving LLMs. By converting unstructured clinical text into structured data, LLM-AIx addresses a critical barrier in clinical research and practice, where the efficient extraction of information is essential for improving clinical decision-making, enhancing patient outcomes, and facilitating large-scale data analysis. The protocol consists of four main processing steps: 1) problem definition and data preparation, 2) data preprocessing, 3) LLM-based IE, and 4) output evaluation. LLM-AIx can be integrated on local hospital hardware without the need to transfer any patient data to external servers. As example tasks, we applied LLM-AIx to the anonymization of fictitious clinical letters from patients with pulmonary embolism, and additionally extracted symptoms and the laterality of the pulmonary embolism from these fictitious letters. We demonstrate troubleshooting for potential problems within the pipeline with an IE task on a real-world dataset, 100 pathology reports from The Cancer Genome Atlas (TCGA), for TNM stage extraction. LLM-AIx can be executed without any programming knowledge via an easy-to-use interface, in no more than a few minutes or hours depending on the LLM selected.
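A structured-IE pipeline like the one described typically validates the model's output against a predefined schema before it enters downstream analysis; a minimal sketch with hypothetical field names for the pulmonary-embolism example (not the LLM-AIx protocol's actual schema):

```python
import json

# Hypothetical extraction schema; the field names and allowed values
# are illustrative, not taken from the protocol.
SCHEMA = {"symptoms": list, "laterality": str}
ALLOWED_LATERALITY = {"left", "right", "bilateral", "unknown"}

def parse_extraction(raw: str) -> dict:
    """Parse and validate a structured-IE response; reject anything
    that violates the schema rather than passing it downstream."""
    data = json.loads(raw)
    for field, expected_type in SCHEMA.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or wrong type")
    if data["laterality"] not in ALLOWED_LATERALITY:
        raise ValueError(f"unexpected laterality: {data['laterality']!r}")
    return data

raw_response = '{"symptoms": ["dyspnea", "chest pain"], "laterality": "bilateral"}'
print(parse_extraction(raw_response)["laterality"])  # → bilateral
```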
Affiliation(s)
- Isabella Catharina Wiest
- Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Fabian Wolf
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Marie-Elisabeth Leßmann
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Marko van Treeck
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Dyke Ferber
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
- Jiefu Zhu
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Heiko Boehme
- National Center for Tumor Diseases (NCT/UCC), Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Medical Faculty and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
- Keno K. Bressem
- Department of Cardiovascular Radiology and Nuclear Medicine, Technical University of Munich, School of Medicine and Health, German Heart Center, TUM University Hospital, Lazarethstr. 36, 80636, Munich, Germany
- Hannes Ulrich
- Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel and Lübeck, Schleswig-Holstein, Germany
- Matthias P. Ebert
- Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- DKFZ Hector Cancer Institute at the University Medical Center, Mannheim, Germany
- Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany
- Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
7
Iscoe M, Socrates V, Gilson A, Chi L, Li H, Huang T, Kearns T, Perkins R, Khandjian L, Taylor RA. Identifying signs and symptoms of urinary tract infection from emergency department clinical notes using large language models. Acad Emerg Med 2024;31:599-610. [PMID: 38567658] [DOI: 10.1111/acem.14883]
Abstract
BACKGROUND Natural language processing (NLP) tools, including recently developed large language models (LLMs), have myriad potential applications in medical care and research, including the efficient labeling and classification of unstructured text such as electronic health record (EHR) notes. This opens the door to large-scale projects that rely on variables that are not typically recorded in a structured form, such as patient signs and symptoms. OBJECTIVES This study is designed to acquaint the emergency medicine research community with the foundational elements of NLP, highlighting essential terminology, annotation methodologies, and the intricacies involved in training and evaluating NLP models. Symptom characterization is critical to urinary tract infection (UTI) diagnosis, but identification of symptoms from the EHR has historically been challenging, limiting large-scale research, public health surveillance, and EHR-based clinical decision support. We therefore developed and compared two NLP models to identify UTI symptoms from unstructured emergency department (ED) notes. METHODS The study population consisted of patients aged ≥18 years who presented to an ED in a northeastern U.S. health system between June 2013 and August 2021 and had a urinalysis performed. We annotated a random subset of 1250 ED clinician notes from these visits for a list of 17 UTI symptoms. We then developed two task-specific models to perform named entity recognition: a convolutional neural network-based model (spaCy) and a transformer-based model designed to process longer documents (Clinical Longformer). Models were trained on 1000 notes and tested on a holdout set of 250 notes. We compared model performance (precision, recall, F1 measure) at identifying the presence or absence of UTI symptoms at the note level. RESULTS A total of 8135 entities were identified in 1250 notes; 83.6% of notes included at least one entity. Overall F1 measure for note-level symptom identification, weighted by entity frequency, was 0.84 for the spaCy model and 0.88 for the Longformer model. F1 measure for identifying the presence or absence of any UTI symptom in a clinical note was 0.96 (232/250 correctly classified) for the spaCy model and 0.98 (240/250 correctly classified) for the Longformer model. CONCLUSIONS The study demonstrated the utility of LLMs, and transformer-based models in particular, for extracting UTI symptoms from unstructured ED clinical notes; models were highly accurate for detecting the presence or absence of any UTI symptom at the note level, with variable performance for individual symptoms.
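The frequency-weighted overall F1 reported above can be sketched as a weighted average of per-symptom scores; the symptom labels, scores, and counts below are hypothetical, not the study's results:

```python
def frequency_weighted_f1(per_label_f1: dict[str, float],
                          entity_counts: dict[str, int]) -> float:
    """Overall F1 as the average of per-symptom F1 scores, weighted by
    how often each symptom entity occurs in the annotated corpus."""
    total = sum(entity_counts.values())
    return sum(f1 * entity_counts[label]
               for label, f1 in per_label_f1.items()) / total

# Hypothetical per-symptom F1 scores and corpus entity frequencies:
per_label = {"dysuria": 0.90, "frequency": 0.80, "flank pain": 0.70}
counts = {"dysuria": 50, "frequency": 30, "flank pain": 20}
print(round(frequency_weighted_f1(per_label, counts), 2))  # → 0.83
```

Common symptoms thus dominate the overall score, which is why a model can post a strong weighted F1 while still varying on rare individual symptoms.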
Affiliation(s)
- Mark Iscoe
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Aidan Gilson
- Yale School of Medicine, New Haven, Connecticut, USA
- Ling Chi
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Huan Li
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Thomas Huang
- Yale School of Medicine, New Haven, Connecticut, USA
- Thomas Kearns
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Rachelle Perkins
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Laura Khandjian
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- R Andrew Taylor
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
8
Vega C, Ostaszewski M, Grouès V, Schneider R, Satagopam V. BioKC: a collaborative platform for curation and annotation of molecular interactions. Database (Oxford) 2024;2024:baae013. [PMID: 38537198] [PMCID: PMC10972550] [DOI: 10.1093/database/baae013]
Abstract
Curation of biomedical knowledge into systems biology diagrammatic or computational models is essential for studying complex biological processes. However, systems-level curation is a laborious manual process, especially when facing the ever-increasing growth of domain literature. New findings demonstrating elaborate relationships between multiple molecules, pathways and cells have to be represented in a format suitable for systems biology applications. Importantly, curation should capture the complexity of molecular interactions in such a format together with annotations of the involved elements, and support stable identifiers and versioning. This challenge calls for novel collaborative tools and platforms that improve the quality and the output of the curation process. In particular, community-based curation, an important source of curated knowledge, requires support for role management, reviewing features and versioning. Here, we present Biological Knowledge Curation (BioKC), a web-based collaborative platform for the curation and annotation of biomedical knowledge following the standard data model from the Systems Biology Markup Language (SBML). BioKC offers a graphical user interface for the curation of complex molecular interactions and their annotation with stable identifiers and supporting sentences. With its support for collaborative curation and review, it allows users to construct building blocks for systems biology diagrams and computational models. These building blocks can be published under stable identifiers, versioned, and used as annotations, supporting knowledge building for modelling activities.
Affiliation(s)
- Carlos Vega
- Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, 7 Avenue des Hauts Fourneaux, Esch-sur-Alzette 4362, Luxembourg
- Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, 7 Avenue des Hauts Fourneaux, Esch-sur-Alzette 4362, Luxembourg
- Valentin Grouès
- Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, 7 Avenue des Hauts Fourneaux, Esch-sur-Alzette 4362, Luxembourg
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, 7 Avenue des Hauts Fourneaux, Esch-sur-Alzette 4362, Luxembourg
- Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, 7 Avenue des Hauts Fourneaux, Esch-sur-Alzette 4362, Luxembourg
9
Irrera O, Marchesin S, Silvello G. MetaTron: advancing biomedical annotation empowering relation annotation and collaboration. BMC Bioinformatics 2024;25:112. [PMID: 38486137] [PMCID: PMC10941452] [DOI: 10.1186/s12859-024-05730-9]
Abstract
BACKGROUND The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large annotated corpora, often manually or semi-manually, which are vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming, especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster the creation of annotated corpora, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool for annotating biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations and also integrates automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, a functionality often overlooked by off-the-shelf annotation tools. RESULTS We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools, including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron's performance in terms of the time and number of clicks needed to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performance. CONCLUSIONS MetaTron stands out as one of the few annotation tools targeting the biomedical domain that support the annotation of relations, and it is fully customizable with documents in several formats, PDF included, as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we released MetaTron both as an online instance and as a Docker image for local deployment.
Affiliation(s)
- Ornella Irrera, Department of Information Engineering, University of Padova, Padua, Italy
- Stefano Marchesin, Department of Information Engineering, University of Padova, Padua, Italy
- Gianmaria Silvello, Department of Information Engineering, University of Padova, Padua, Italy
10
Guével E, Priou S, Flicoteaux R, Lamé G, Bey R, Tannier X, Cohen A, Chatellier G, Daniel C, Tournigand C, Kempf E. Development of a natural language processing model for deriving breast cancer quality indicators: a cross-sectional, multicenter study. Rev Epidemiol Sante Publique 2023; 71:102189. [PMID: 37972522 DOI: 10.1016/j.respe.2023.102189]
Abstract
OBJECTIVES Medico-administrative data are promising for automating the calculation of Healthcare Quality and Safety Indicators (HQSIs). Nevertheless, not all relevant indicators can be calculated with these data alone. The objectives of our feasibility study were to analyze (1) the availability of data sources, (2) the availability of each indicator's elementary variables, and (3) the use of natural language processing to automatically retrieve such information. METHOD We performed a multicenter, cross-sectional, observational feasibility study on the clinical data warehouse of Assistance Publique - Hôpitaux de Paris (AP-HP). We studied the management of breast cancer patients treated at AP-HP between January 2019 and June 2021, and the quality indicators published by the European Society of Breast Cancer Specialists, using claims data from the Programme de Médicalisation du Système d'Information (PMSI) and pathology reports. For each indicator, we calculated the number (%) of patients for whom all necessary data sources were available, and the number (%) of patients for whom all elementary variables were available in those sources and for whom the related HQSI was therefore computable. To extract useful data from the free-text reports, we developed and validated dedicated rule-based algorithms, whose performance was assessed with recall, precision, and F1-score. RESULTS Out of 5,785 female patients diagnosed with breast cancer (60.9 years, IQR [50.0-71.9]), 5,147 (89.0%) had procedures related to breast cancer recorded in the PMSI, and 3,732 (72.5%) had at least one surgery. Out of the 34 key indicators, 9 could be calculated with the PMSI alone, and 6 more became computable using data from pathology reports. Ten elementary variables were needed to calculate the 6 indicators combining the PMSI and pathology reports. The necessary sources were available for 58.8% to 94.6% of patients, depending on the indicator. The extraction algorithms had an average accuracy of 76.5% (range 32.7%-93.3%), an average precision of 77.7% (10.0%-97.4%), and an average sensitivity of 71.6% (2.8%-100.0%). Once these algorithms were applied, the variables needed to calculate the indicators were extracted for 2% to 88% of patients, depending on the indicator. DISCUSSION The availability of medical reports in the electronic health records, the availability of the elementary variables within the reports, and the performance of the extraction algorithms limit the population for which the indicators can be calculated. CONCLUSIONS The automated calculation of quality indicators from electronic health records is a prospect that still faces many practical obstacles.
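Rule-based extraction of an elementary variable from free-text reports can be illustrated with a small sketch; the pattern, the variable (estrogen-receptor status), and the function name below are hypothetical illustrations, not the study's actual algorithms:

```python
import re

# Hypothetical rule: detect a stated estrogen-receptor (ER) status in a
# free-text pathology report and normalize it to "positive"/"negative".
ER_PATTERN = re.compile(
    r"\b(?:estrogen receptor|ER)\b\s*(?:status)?\s*[:=]?\s*(positive|negative)",
    re.IGNORECASE,
)

def extract_er_status(report: str):
    """Return the normalized ER status if one is stated, else None."""
    match = ER_PATTERN.search(report)
    return match.group(1).lower() if match else None
```

Each variable extracted this way would then be compared against a manually built gold standard to obtain the recall, precision, and F1-score figures reported above.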
Affiliation(s)
- Etienne Guével, Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Sonia Priou, Université Paris-Saclay, CentraleSupélec, Laboratoire Génie Industriel, 91192 Gif-sur-Yvette, France
- Rémi Flicoteaux, Assistance Publique - Hôpitaux de Paris, Department of Medical Information, 75012 Paris, France
- Guillaume Lamé, Université Paris-Saclay, CentraleSupélec, Laboratoire Génie Industriel, 91192 Gif-sur-Yvette, France
- Romain Bey, Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Xavier Tannier, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), 75006 Paris, France
- Ariel Cohen, Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Gilles Chatellier, Université Paris Cité, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), 75015 Paris, France
- Christel Daniel, Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Christophe Tournigand, Université Paris Est Créteil, Assistance Publique - Hôpitaux de Paris, Department of Medical Oncology, Henri Mondor and Albert Chenevier University Hospital, 94000 Créteil, France
- Emmanuelle Kempf, Université Sorbonne Paris Nord, LIMICS, 75006 Paris, France; Université Paris Est Créteil, Assistance Publique - Hôpitaux de Paris, Department of Medical Oncology, Henri Mondor and Albert Chenevier University Hospital, 94000 Créteil, France
11
Macri CZ, Teoh SC, Bacchi S, Tan I, Casson R, Sun MT, Selva D, Chan W. A case study in applying artificial intelligence-based named entity recognition to develop an automated ophthalmic disease registry. Graefes Arch Clin Exp Ophthalmol 2023; 261:3335-3344. [PMID: 37535181 PMCID: PMC10587337 DOI: 10.1007/s00417-023-06190-2]
Abstract
PURPOSE Advances in artificial intelligence (AI)-based named entity recognition (NER) have improved the ability to extract diagnostic entities from unstructured, narrative, free-text data in electronic health records. However, there is a lack of ready-to-use tools and workflows to encourage uptake among clinicians, who often lack experience and training in AI. We sought to demonstrate a case study of developing an automated registry of ophthalmic diseases, accompanied by a ready-to-use low-code tool for clinicians. METHODS We extracted deidentified electronic clinical records from a single centre's adult outpatient ophthalmology clinic from November 2019 to May 2022. We used a low-code annotation software tool (Prodigy) to annotate diagnoses and train a bespoke spaCy NER model to extract diagnoses and create an ophthalmic disease registry. RESULTS A total of 123,194 diagnostic entities were extracted from 33,455 clinical records. After decapitalisation and removal of non-alphanumeric characters, there were 5,070 distinct extracted diagnostic entities. The NER model achieved a precision of 0.8157, a recall of 0.8099, and an F-score of 0.8128. CONCLUSION We presented a case study using low-code AI-based NLP tools to produce an automated ophthalmic disease registry. The workflow created an NER model with a moderate overall ability to extract diagnoses from free-text electronic clinical records. We have produced a ready-to-use tool for clinicians to implement this low-code workflow in their institutions and to encourage the uptake of AI methods for case finding in electronic health records.
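The three figures reported above are linked: the F-score is the harmonic mean of precision and recall. A minimal sketch of the standard computation from entity counts (the function is illustrative, not the study's code):

```python
def ner_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative entity counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Plugging the reported precision (0.8157) and recall (0.8099) into the harmonic mean reproduces the reported F-score: 2 * 0.8157 * 0.8099 / (0.8157 + 0.8099) ≈ 0.8128.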
Affiliation(s)
- Carmelo Z Macri, Discipline of Ophthalmology and Visual Sciences, The University of Adelaide; Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Sheng Chieh Teoh, Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Stephen Bacchi, Discipline of Ophthalmology and Visual Sciences, The University of Adelaide; Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Ian Tan, Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Robert Casson, Discipline of Ophthalmology and Visual Sciences, The University of Adelaide; Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Michelle T Sun, Discipline of Ophthalmology and Visual Sciences, The University of Adelaide; Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Dinesh Selva, Discipline of Ophthalmology and Visual Sciences, The University of Adelaide; Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
- WengOnn Chan, Discipline of Ophthalmology and Visual Sciences, The University of Adelaide; Department of Ophthalmology, The Royal Adelaide Hospital, Adelaide, South Australia, Australia
12
Liu L, Perez-Concha O, Nguyen A, Bennett V, Blake V, Gallego B, Jorm L. Web-Based Application Based on Human-in-the-Loop Deep Learning for Deidentifying Free-Text Data in Electronic Medical Records: Development and Usability Study. Interact J Med Res 2023; 12:e46322. [PMID: 37624624 PMCID: PMC10492176 DOI: 10.2196/46322]
Abstract
BACKGROUND The narrative free-text data in electronic medical records (EMRs) contain valuable clinical information for analysis and research to inform better patient care. However, the release of free text for secondary use is hindered by concerns surrounding personally identifiable information (PII), as protecting individuals' privacy is paramount. It is therefore necessary to deidentify free text to remove PII. Manual deidentification is a time-consuming and labor-intensive process, and numerous automated deidentification approaches and systems have been attempted over the past decade to overcome this challenge. OBJECTIVE We sought to develop an accurate, web-based system for deidentifying free text (DEFT), which can be readily and easily adopted in real-world settings for deidentification of free text in EMRs. The system has several key features, including a simple and task-focused web user interface, customized PII types, a state-of-the-art deep learning model for tagging PII in free text, preannotation via an interactive learning loop, rapid manual annotation with autosave, support for project management and team collaboration, user access control, and central data storage. METHODS DEFT comprises frontend and backend modules and communicates with central data storage through a filesystem path access. The frontend web user interface provides end users with a user-friendly workspace for managing and annotating free text. The backend module processes the requests from the frontend and performs the relevant persistence operations. DEFT manages the deidentification workflow as a project, which can contain one or more data sets. Customized PII types and user access control can also be configured. The deep learning model is based on a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) with RoBERTa as the word embedding layer. The interactive learning loop is further integrated into DEFT to speed up the deidentification process and increase its performance over time. RESULTS DEFT has many advantages over existing deidentification systems in terms of its support for project management, user access control, data management, and an interactive learning process. On the 2014 i2b2 data set, DEFT obtained the highest performance compared with 5 benchmark models, with micro-averaged strict entity-level recall and F1-scores of 0.9563 and 0.9627, respectively. In a real-world use case of deidentifying clinical notes extracted from a referral hospital in Sydney, New South Wales, Australia, DEFT achieved a high micro-averaged strict entity-level F1-score of 0.9507 on a corpus of 600 annotated clinical notes. Moreover, the manual annotation process with preannotation demonstrated a 43% increase in work efficiency compared to the process without preannotation. CONCLUSIONS DEFT is designed for health domain researchers and data custodians to easily deidentify free text in EMRs. DEFT supports an interactive learning loop, and end users with minimal technical knowledge can perform the deidentification work with only a shallow learning curve.
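"Strict" entity-level scoring, as reported above, credits a prediction only when both the span boundaries and the entity type match the gold annotation exactly, and micro-averaging pools the counts over all documents before computing the scores. A minimal sketch (the span representation is illustrative, not DEFT's implementation):

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, PII type)

def micro_strict_f1(gold: List[List[Span]], pred: List[List[Span]]) -> float:
    """Micro-averaged strict entity-level F1: a predicted span counts as a
    true positive only if start, end, and type all match a gold span."""
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, pred):
        gold_set, pred_set = set(gold_doc), set(pred_doc)
        tp += len(gold_set & pred_set)   # exact matches
        fp += len(pred_set - gold_set)   # spurious predictions
        fn += len(gold_set - pred_set)   # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Partial overlaps or type mismatches score zero under this scheme, which is why strict entity-level figures are a conservative measure of deidentification quality.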
Affiliation(s)
- Leibo Liu, Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Oscar Perez-Concha, Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Anthony Nguyen, Australian e-Health Research Centre (AEHRC), Commonwealth Scientific and Industrial Research Organisation (CSIRO), Brisbane, Australia
- Vicki Bennett, Metadata, Information Management and Classifications Unit (MIMCU), Australian Institute of Health and Welfare, Canberra, Australia
- Victoria Blake, Eastern Heart Clinic, Prince of Wales Hospital, Randwick, Australia
- Blanca Gallego, Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Louisa Jorm, Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
13
Oommen C, Howlett-Prieto Q, Carrithers MD, Hier DB. Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records. Front Digit Health 2023; 5:1075771. [PMID: 37383943 PMCID: PMC10294690 DOI: 10.3389/fdgth.2023.1075771]
Abstract
The extraction of patient signs and symptoms recorded as free text in electronic health records is critical for precision medicine. Once extracted, signs and symptoms can be made computable by mapping to signs and symptoms in an ontology. Extracting signs and symptoms from free text is tedious and time-consuming. Prior studies have suggested that inter-rater agreement for clinical concept extraction is low. We have examined inter-rater agreement for annotating neurologic concepts in clinical notes from electronic health records. After training on the annotation process, the annotation tool, and the supporting neuro-ontology, three raters annotated 15 clinical notes in three rounds. Inter-rater agreement between the three annotators was high for text span and category label. A machine annotator based on a convolutional neural network had a high level of agreement with the human annotators but one that was lower than human inter-rater agreement. We conclude that high levels of agreement between human annotators are possible with appropriate training and annotation tools. Furthermore, more training examples combined with improvements in neural networks and natural language processing should make machine annotators capable of high throughput automated clinical concept extraction with high levels of agreement with human annotators.
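Chance-corrected agreement statistics are the usual way to quantify the inter-rater agreement examined above; a minimal sketch of Cohen's kappa for one pair of raters (with three raters, as in the study, pairwise kappa or Fleiss' kappa would typically be used; the label values are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from each rater's
    marginal label frequencies."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label]
              for label in set(counts_a) | set(counts_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
```

Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is preferred over raw percent agreement for annotation studies.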
Affiliation(s)
- Chelsea Oommen, Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Quentin Howlett-Prieto, Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Michael D. Carrithers, Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Daniel B. Hier, Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States; Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States
14
Kreuzthaler M, Brochhausen M, Zayas C, Blobel B, Schulz S. Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems. Front Med (Lausanne) 2023; 10:1073313. [PMID: 37007792 PMCID: PMC10050682 DOI: 10.3389/fmed.2023.1073313]
Abstract
This paper provides an overview of the current linguistic and ontological challenges that must be met to fully support the transformation of health ecosystems toward precision medicine (5PM) standards. It highlights both standardization and interoperability aspects of formal, controlled representations of clinical and research data, and the requirements for smart support to produce and encode content in a way that humans and machines can understand and process. Starting from current text-centered communication practices in healthcare and biomedical research, it addresses the state of the art in information extraction using natural language processing (NLP). An important aspect of the language-centered perspective on managing health data is the integration of heterogeneous data sources employing different natural languages and different terminologies. This is where biomedical ontologies, in the sense of formal, interchangeable representations of types of domain entities, come into play. The paper discusses the state of the art of biomedical ontologies, addresses their importance for standardization and interoperability, and sheds light on current misconceptions and shortcomings. Finally, the paper points out next steps and possible synergies between the field of NLP and the area of Applied Ontology and the Semantic Web to foster data interoperability for 5PM.
Affiliation(s)
- Markus Kreuzthaler, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria
- Mathias Brochhausen, Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Cilia Zayas, Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Bernd Blobel, Medical Faculty, University of Regensburg, Regensburg, Germany; eHealth Competence Center Bavaria, Deggendorf Institute of Technology, Deggendorf, Germany; First Medical Faculty, Charles University Prague, Prague, Czechia
- Stefan Schulz (correspondence), Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria; Averbis GmbH, Freiburg, Germany
15
Azizi S, Hier DB, Wunsch DC II. Enhanced neurologic concept recognition using a named entity recognition model based on transformers. Front Digit Health 2022; 4:1065581. [PMID: 36569804 PMCID: PMC9772022 DOI: 10.3389/fdgth.2022.1065581]
Abstract
Although deep learning has been applied to the recognition of diseases and drugs in electronic health records and the biomedical literature, relatively little study has been devoted to its utility for the recognition of signs and symptoms. The recognition of signs and symptoms is critical to the success of deep phenotyping and precision medicine. We have developed a named entity recognition model that uses deep learning to identify text spans containing neurological signs and symptoms and then maps these text spans to the clinical concepts of a neuro-ontology. We compared a model based on convolutional neural networks to one based on bidirectional encoder representations from transformers. Models were evaluated for accuracy of text span identification on three text corpora: physician notes from an electronic health record, case histories from neurology textbooks, and clinical synopses from an online database of genetic diseases. Both models performed best on the professionally written clinical synopses and worst on the physician-written clinical notes. Both models performed better when signs and symptoms were represented as shorter text spans. Consistent with prior studies of the recognition of diseases and drugs, the transformer-based model outperformed the convolutional model for recognizing signs and symptoms. Recall for signs and symptoms ranged from 59.5% to 82.0%, and precision ranged from 61.7% to 80.4%. With further advances in NLP, fully automated recognition of signs and symptoms in electronic health records and the medical literature should be feasible.
Affiliation(s)
- Sima Azizi, Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States
- Daniel B. Hier, Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States; Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Donald C. Wunsch II, Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States; National Science Foundation, ECCS Division, Arlington, VA, United States
16
Macri C, Teoh I, Bacchi S, Sun M, Selva D, Casson R, Chan W. Automated Identification of Clinical Procedures in Free-Text Electronic Clinical Records with a Low-Code Named Entity Recognition Workflow. Methods Inf Med 2022; 61:84-89. [PMID: 36096143 DOI: 10.1055/s-0042-1749358]
Abstract
INTRODUCTION Clinical procedures are often performed in outpatient clinics without prior scheduling at the administrative level, and documentation of the procedure often occurs solely in free-text electronic clinical notes. Natural language processing (NLP), particularly named entity recognition (NER), may provide a solution for extracting procedure data from free-text electronic notes. METHODS Free-text notes from outpatient ophthalmology visits were collected from the electronic clinical records at a single institution over 3 months. The Prodigy low-code annotation tool was used to create an annotated dataset and train a custom NER model for clinical procedures. Clinical procedures were then extracted from the entire set of clinical notes. RESULTS A total of 5,098 clinic notes were extracted for the study period; 1,923 clinic notes were used to build the NER model, which included a total of 231 manual annotations. The NER model achieved an F-score of 0.767, a precision of 0.810, and a recall of 0.729. The most common procedures performed included intravitreal injections of therapeutic substances, removal of corneal foreign bodies, and epithelial debridement of corneal ulcers. CONCLUSIONS The use of a low-code annotation software tool allows the rapid creation of a custom annotated dataset for training an NER model to identify clinical procedures stored in free-text electronic clinical notes. This enables clinicians to rapidly gather previously unidentified procedural data for quality improvement and auditing purposes. Low-code annotation tools may reduce the time and coding barriers to clinician participation in NLP research.
Affiliation(s)
- Carmelo Macri, Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide; Department of Ophthalmology, Royal Adelaide Hospital; Discipline of Ophthalmology and Visual Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Ian Teoh, Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, South Australia, Australia
- Stephen Bacchi, Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide; Department of Ophthalmology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Michelle Sun, Department of Ophthalmology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Dinesh Selva, Department of Ophthalmology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Robert Casson, Department of Ophthalmology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
- WengOnn Chan, Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide; Department of Ophthalmology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
17
Deng L, Zhang X, Yang T, Liu M, Chen L, Jiang T. PIAT: an evolutionarily intelligent system for deep phenotyping of Chinese electronic health records. IEEE J Biomed Health Inform 2022; 26:4142-4152. [PMID: 35609107 DOI: 10.1109/jbhi.2022.3177421]
Abstract
Electronic health record (EHR) resources are valuable but remain underexplored because most clinical information, especially phenotype information, is buried in the free text of EHRs. An intelligent annotation tool plays an important role in unlocking the full potential of EHRs by transforming free-text phenotype information into a computer-readable form. Deep phenotyping has shown its advantage in representing phenotype information in EHRs with high fidelity; however, most existing annotation tools are not suitable for the deep phenotyping task. Here, we developed an intelligent annotation tool named PIAT with a major focus on the deep phenotyping of Chinese EHRs. PIAT improves annotation efficiency for EHR-based deep phenotyping with a simple but effective interactive interface, automatic preannotation support, and a learning mechanism. Specifically, experts can proofread automatic annotation results from the annotation algorithm in the web-based interactive interface, and EHRs reviewed by experts can be used to evolve the underlying annotation algorithm. In this way, the annotation process for deep phenotyping EHRs becomes easier. In conclusion, we have created a powerful intelligent system for the deep phenotyping of Chinese EHRs. We hope that our work will inspire further studies on constructing intelligent systems for the deep phenotyping of English and non-English EHRs.
18
A review on method entities in the academic literature: extraction, evaluation, and application. Scientometrics 2022. [DOI: 10.1007/s11192-022-04332-7]
19
He H, Fu S, Wang L, Liu S, Wen A, Liu H. MedTator: a serverless annotation tool for corpus development. Bioinformatics 2022; 38:1776-1778. [PMID: 34983060 DOI: 10.1093/bioinformatics/btab880]
Abstract
SUMMARY Building a high-quality annotation corpus requires considerable time and expertise, particularly for biomedical and clinical research applications. Most existing annotation tools provide many advanced features to cover a variety of needs, but their installation, integration, and difficulty of use present a significant burden for actual annotation tasks. Here, we present MedTator, a serverless annotation tool aiming to provide an intuitive and interactive user interface that focuses on the core steps of corpus annotation, such as document annotation, corpus summarization, annotation export, and annotation adjudication. AVAILABILITY AND IMPLEMENTATION MedTator and its tutorial are freely available from https://ohnlp.github.io/MedTator. MedTator source code is available under the Apache 2.0 license: https://github.com/OHNLP/MedTator. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Huan He, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55901, USA
- Sunyang Fu, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55901, USA
- Liwei Wang, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55901, USA
- Sijia Liu, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55901, USA
- Andrew Wen, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55901, USA
- Hongfang Liu, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55901, USA
20
Syed S, Angel AJ, Syeda HB, Jennings CF, VanScoy J, Syed M, Greer M, Bhattacharyya S, Al-Shukri S, Zozus M, Prior F, Tharian B. TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation. Biomedical Engineering Systems and Technologies, International Joint Conference, BIOSTEC ... Revised Selected Papers. BIOSTEC (Conference) 2022; 2022:162-169. [PMID: 35300321 PMCID: PMC8926426 DOI: 10.5220/0010876100003123]
Abstract
Colonoscopy plays a critical role in the screening of colorectal carcinomas (CC). Unfortunately, the data related to this procedure are stored in disparate documents: colonoscopy, pathology, and radiology reports. The lack of integrated, standardized documentation impedes accurate reporting of quality metrics and clinical and translational research. Natural language processing (NLP) has been used as an alternative to manual data abstraction. The performance of machine learning (ML)-based NLP solutions is heavily dependent on the accuracy of annotated corpora. The availability of large-volume annotated corpora is limited by data privacy laws and the cost and effort required. In addition, the manual annotation process is error-prone, making the lack of quality annotated corpora the largest bottleneck in deploying ML solutions. The objective of this study is to identify clinical entities critical to colonoscopy quality and to build a high-quality annotated corpus using domain-specific taxonomies and standardized annotation guidelines. The annotated corpus can be used to train ML models for a variety of downstream tasks.
Affiliation(s)
- Shorabuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, USA
- Hafsa Bareen Syeda
- Department of Neurology, University of Arkansas for Medical Sciences, USA
- Joseph VanScoy
- College of Medicine, University of Arkansas for Medical Sciences, USA
- Mahanazuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, USA
- Melody Greer
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, USA
- Shaymaa Al-Shukri
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, USA
- Meredith Zozus
- Department of Population Health Sciences, University of Texas Health Science Centre at San Antonio, USA
- Fred Prior
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, USA
- Benjamin Tharian
- Division of Gastroenterology and Hepatology, University of Arkansas for Medical Sciences, USA
21
Giachelle F, Irrera O, Silvello G. MedTAG: a portable and customizable annotation tool for biomedical documents. BMC Med Inform Decis Mak 2021; 21:352. [PMID: 34922517] [PMCID: PMC8684237] [DOI: 10.1186/s12911-021-01706-4]
Abstract
BACKGROUND Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require large amounts of training and test data, especially in the biomedical domain. Despite the abundance of unstructured biomedical data, the scarcity of richly annotated biomedical datasets hinders the further development of NER+L algorithms and any effective secondary use. In addition, manual annotation of biomedical documents by physicians and experts is a costly and time-consuming task. To support, organize, and speed up the annotation process, we introduce MedTAG, a collaborative biomedical annotation tool that is open-source, platform-independent, and free to use and distribute. RESULTS We present the main features of MedTAG and how it has been employed in the histopathology domain by physicians and experts to manually annotate more than seven thousand clinical reports. We compare MedTAG with a set of well-established biomedical annotation tools, including BioQRator, ezTag, MyMiner, and tagtog, weighing their pros and cons against those of MedTAG. We highlight that MedTAG is one of the very few open-source tools provided with an open license and a straightforward installation procedure supporting cross-platform use. CONCLUSIONS MedTAG has been designed according to five requirements (i.e., available, distributable, installable, workable, and schematic) defined in a recent extensive review of manual annotation tools. Moreover, MedTAG satisfies 20 of the 22 criteria specified in the same study.
Affiliation(s)
- Fabio Giachelle
- Department of Information Engineering, University of Padua, Padua, Italy
- Ornella Irrera
- Department of Information Engineering, University of Padua, Padua, Italy
- Gianmaria Silvello
- Department of Information Engineering, University of Padua, Padua, Italy
22
Islamaj R, Kwon D, Kim S, Lu Z. TeamTat: a collaborative text annotation tool. Nucleic Acids Res 2020; 48:W5-W11. [PMID: 32383756] [DOI: 10.1093/nar/gkaa333]
Abstract
Manually annotated data are key to developing text-mining and information-extraction algorithms. However, human annotation requires considerable time, effort, and expertise. Given the rapid growth of biomedical literature, it is paramount to build tools that speed up annotation while maintaining expert quality. While existing text annotation tools may provide user-friendly interfaces to domain experts, limited support is available for figure display, project management, and multi-user team annotation. In response, we developed TeamTat (https://www.teamtat.org), a web-based annotation tool (a local setup is also available), equipped to manage team annotation projects engagingly and efficiently. TeamTat is a novel tool for managing multi-user, multi-label document annotation, reflecting the entire production life cycle. Project managers can specify the annotation schema for entities and relations, select annotators, and distribute documents anonymously to prevent bias. Document input formats are plain text, PDF, or BioC (uploaded locally or retrieved automatically from PubMed/PMC), and the output format is BioC with inline annotations. TeamTat displays figures from the full text for the annotator's convenience. Multiple users can work on the same document independently in their own workspaces, and the team manager can track task completion. TeamTat provides corpus quality assessment via inter-annotator agreement statistics, and a user-friendly interface for annotation review and inter-annotator disagreement resolution to improve corpus quality.
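Corpus quality assessment of this kind rests on inter-annotator agreement, typically quantified with a chance-corrected statistic such as Cohen's kappa. A minimal sketch of that calculation for two annotators labeling the same items (illustrative only, not TeamTat's actual implementation):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators'
    label sequences over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two annotators labeling six candidate mentions:
a = ["Gene", "Gene", "Disease", "Disease", "Gene", "Disease"]
b = ["Gene", "Disease", "Disease", "Disease", "Gene", "Disease"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

A kappa near 1 indicates near-perfect agreement; values computed per label or per document help the team manager spot where disagreement resolution is needed.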
Affiliation(s)
- Rezarta Islamaj
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Dongseop Kwon
- School of Software Convergence, Myongji University, Seoul 03674, South Korea
- Sun Kim
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
23
Piad-Morffis A, Gutiérrez Y, Almeida-Cruz Y, Muñoz R. A computational ecosystem to support eHealth Knowledge Discovery technologies in Spanish. J Biomed Inform 2020; 109:103517. [PMID: 32712157] [PMCID: PMC7377985] [DOI: 10.1016/j.jbi.2020.103517]
Abstract
The massive amount of biomedical information published online requires the development of automatic knowledge discovery technologies to make effective use of this available content. To foster and support this, the research community creates linguistic resources, such as annotated corpora, and designs shared evaluation campaigns and academic competitive challenges. This work describes an ecosystem that facilitates research and development in knowledge discovery in the biomedical domain, specifically in the Spanish language. To this end, several resources are developed and shared with the research community, including a novel semantic annotation model, an annotated corpus of 1045 sentences, and computational resources to build and evaluate automatic knowledge discovery techniques. Furthermore, a research task is defined with objective evaluation criteria, and an online evaluation environment is set up and maintained, enabling researchers interested in this task to obtain immediate feedback and compare their results with the state of the art. As a case study, we analyze the results of a competitive challenge based on these resources and provide guidelines for future research. The constructed ecosystem provides an effective learning and evaluation environment to encourage research in knowledge discovery in Spanish biomedical documents.
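Objective evaluation criteria in shared tasks of this kind are typically precision, recall, and F1 of the system's annotations against a gold standard. A minimal sketch under the assumption of exact-match scoring (real shared-task scorers often also credit partial matches and relations):

```python
def evaluate(gold, predicted):
    """Precision, recall, and F1 over exact-match annotations.
    Each annotation is a (start, end, label) tuple."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # entities predicted exactly right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold vs. system annotations for one sentence:
gold = {(0, 8, "Concept"), (12, 20, "Action"), (25, 33, "Concept")}
pred = {(0, 8, "Concept"), (12, 20, "Concept"), (25, 33, "Concept")}
p, r, f = evaluate(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.67 0.67 0.67
```

Reporting all three numbers matters: a system can trade precision against recall, and F1 summarizes the balance for a leaderboard.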
Affiliation(s)
- Yoan Gutiérrez
- University Institute for Computing Research (IUII), University of Alicante, Alicante 03690, Spain; Department of Language and Computing Systems, University of Alicante, Alicante 03690, Spain
- Rafael Muñoz
- University Institute for Computing Research (IUII), University of Alicante, Alicante 03690, Spain; Department of Language and Computing Systems, University of Alicante, Alicante 03690, Spain
24
Weissenbacher D, O'Connor K, Hiraki AT, Kim JD, Gonzalez-Hernandez G. An empirical evaluation of electronic annotation tools for Twitter data. Genomics Inform 2020; 18:e24. [PMID: 32634878] [PMCID: PMC7362942] [DOI: 10.5808/gi.2020.18.2.e24]
Abstract
Despite a growing number of natural language processing shared tasks dedicated to the use of Twitter data, there is currently no annotation tool designed specifically for the purpose. During the 6th edition of the Biomedical Linked Annotation Hackathon (BLAH), after a short review of 19 generic annotation tools, we adapted GATE and TextAE for annotating Twitter timelines. Although none of the tools reviewed allows the annotation of all information inherent in Twitter timelines, a few may be suitable provided annotators are willing to compromise on some functionality.
Affiliation(s)
- Davy Weissenbacher
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Karen O'Connor
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Aiko T Hiraki
- Database Center for Life Science, Research Organization of Information and Systems, Kashiwa, Chiba 277-0871, Japan
- Jin-Dong Kim
- Database Center for Life Science, Research Organization of Information and Systems, Kashiwa, Chiba 277-0871, Japan
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
25
Lever J, Altman R, Kim JD. Extending TextAE for annotation of non-contiguous entities. Genomics Inform 2020; 18:e15. [PMID: 32634869] [PMCID: PMC7362949] [DOI: 10.5808/gi.2020.18.2.e15]
Abstract
Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will either miss important information or identify spurious text that frustrates users. Most tools do not capture non-contiguous entities, which are separate spans of text that together refer to one entity, e.g., the entity “type 1 diabetes” in the phrase “type 1 and type 2 diabetes.” This type is commonly found in biomedical texts, especially in lists, where multiple biomedical entities are named in shortened form to avoid repeating words. Most text annotation systems that enable users to view and edit entity annotations do not support non-contiguous entities. Therefore, experts cannot even visualize non-contiguous entities, let alone annotate them to build valuable datasets for machine learning methods. To combat this problem, and as part of the BLAH6 hackathon, we extended the TextAE platform to allow visualization and annotation of non-contiguous entities. This enables users to add new subspans to existing entities by selecting additional text. We integrate this new functionality with TextAE’s existing editing features to allow easy changes to entity annotations and editing of relation annotations involving non-contiguous entities, with import from and export to the PubAnnotation format. Finally, we roughly quantify the problem across the entire accessible biomedical literature to highlight that a substantial number of non-contiguous entities appear in lists and would be missed by most text mining systems.
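The key data-model change is that a non-contiguous entity becomes a list of character spans rather than a single span. A minimal sketch of reassembling such an entity's surface form from the abstract's own example (the offsets and data shape are illustrative, not TextAE's or PubAnnotation's exact schema):

```python
def entity_text(document, spans):
    """Reassemble the surface form of a possibly non-contiguous
    entity from its character spans, in order of appearance."""
    return " ".join(document[start:end] for start, end in sorted(spans))

doc = "type 1 and type 2 diabetes"
# "type 2 diabetes" is a single contiguous span;
# "type 1 diabetes" needs two subspans that skip "and type 2".
contiguous = [(11, 26)]
non_contiguous = [(0, 6), (18, 26)]
print(entity_text(doc, contiguous))      # → type 2 diabetes
print(entity_text(doc, non_contiguous))  # → type 1 diabetes
```

A single-span annotation model simply cannot represent the second entity, which is why tools restricted to one (start, end) pair per entity miss these mentions entirely.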
Affiliation(s)
- Jake Lever
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Russ Altman
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Jin-Dong Kim
- Database Center for Life Science, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
26
Yamada R, Okada D, Wang J, Basak T, Koyama S. Interpretation of omics data analyses. J Hum Genet 2020; 66:93-102. [PMID: 32385339] [PMCID: PMC7728595] [DOI: 10.1038/s10038-020-0763-5]
Abstract
Omics studies attempt to extract meaningful messages from large-scale, high-dimensional data sets by treating the data sets as a whole. The concept of treating data sets as a whole is important in every step of the data-handling procedure: the pre-processing of data records, the statistical analyses and machine learning, the translation of the outputs into human natural perceptions, and the acceptance of the messages with their uncertainty. For pre-processing, methods to control data quality and batch effects are discussed. For the main analyses, the approaches are divided into two types and their basic concepts are discussed. The first type is the evaluation of many items individually, followed by interpretation of individual items in the context of multiple testing and combination. The second type is the extraction of a few important aspects from the whole data records. The outputs of the main analyses are translated into natural language with techniques such as annotation and ontology; another technique for making the outputs perceptible is visualization. At the end of this review, one of the most important issues in the interpretation of omics data analyses is discussed. Omics studies carry a large amount of information in their data sets, and every approach reveals only a very restricted aspect of the whole data sets, so the understandable messages from these studies have unavoidable uncertainty.
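The multiple-testing step in the first analysis type is commonly handled with a false-discovery-rate correction such as Benjamini-Hochberg. A minimal, illustrative sketch (not taken from this review; production analyses would use a statistics library):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected by the
    Benjamini-Hochberg false-discovery-rate procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k/m) * alpha,
    # then reject the k hypotheses with the smallest p-values.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

# Hypothetical per-gene p-values from an omics screen:
pvals = [0.001, 0.008, 0.039, 0.041, 0.2, 0.9]
print(benjamini_hochberg(pvals))  # → [0, 1]
```

Unlike a per-item 0.05 cutoff, which would also accept the 0.039 and 0.041 items, the FDR correction keeps the expected proportion of false discoveries controlled when many items are tested at once.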
Affiliation(s)
- Ryo Yamada
- Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Daigo Okada
- Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Juan Wang
- Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Tapati Basak
- Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Satoshi Koyama
- Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan