Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zheng NS, Feng Q, Kerchberger VE, Zhao J, Edwards TL, Cox NJ, Stein CM, Roden DM, Denny JC, Wei WQ. PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records. J Am Med Inform Assoc 2021;27:1675-1687. [PMID: 32974638 PMCID: PMC7751140 DOI: 10.1093/jamia/ocaa104] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 01/22/2020] [Revised: 05/06/2020] [Accepted: 05/13/2020] [Indexed: 01/16/2023] Open

Cited by Other Article(s)

Løkhammer S, Koller D, Wendt FR, Choi KW, He J, Friligkou E, Overstreet C, Gelernter J, Hellard SL, Polimanti R. Distinguishing vulnerability and resilience to posttraumatic stress disorder evaluating traumatic experiences, genetic risk and electronic health records. Psychiatry Res 2024;337:115950. [PMID: 38744179 PMCID: PMC11156529 DOI: 10.1016/j.psychres.2024.115950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/29/2024] [Accepted: 05/04/2024] [Indexed: 05/16/2024]

Affiliation(s)

Solveig Løkhammer Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA Department of Clinical Science, University of Bergen, Bergen, Norway Dr. Einar Martens Research Group for Biological Psychiatry, Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
Dora Koller Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA Department of Genetics, Microbiology, and Statistics, Faculty of Biology, University of Barcelona, Catalonia, Spain
Frank R. Wendt Department of Anthropology, University of Toronto, Mississauga, Canada Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
Karmel W. Choi Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Jun He Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
Eleni Friligkou Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
Cassie Overstreet Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA
Joel Gelernter Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA Department of Genetics, Yale School of Medicine, New Haven, Connecticut, USA Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut, USA Wu Tsai Institute, Yale University, New Haven, Connecticut, USA
Stéphanie Le Hellard Department of Clinical Science, University of Bergen, Bergen, Norway Dr. Einar Martens Research Group for Biological Psychiatry, Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway Bergen Center of Brain Plasticity, Haukeland University Hospital, Bergen, Norway
Renato Polimanti Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA Veterans Affairs Connecticut Healthcare Center, West Haven, Connecticut, USA Wu Tsai Institute, Yale University, New Haven, Connecticut, USA Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut, USA

Steinfeldt J, Wild B, Buergel T, Pietzner M, Upmeier Zu Belzen J, Vauvelle A, Hegselmann S, Denaxas S, Hemingway H, Langenberg C, Landmesser U, Deanfield J, Eils R. Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats. Nat Commun 2024;15:4257. [PMID: 38763986 PMCID: PMC11102902 DOI: 10.1038/s41467-024-48568-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 05/21/2024] Open

Affiliation(s)

Jakob Steinfeldt Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Klinik/Centrum, Charitéplatz 1, 10117, Berlin, Germany Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany Friede Springer Cardiovascular Prevention Center@Charite, Charite - University Medicine Berlin, Berlin, Germany Institute of Cardiovascular Sciences, University College London, London, UK
Benjamin Wild Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
Thore Buergel Institute of Cardiovascular Sciences, University College London, London, UK Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
Maik Pietzner Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK Precision Health University Research Institute, Queen Mary University of London and Barts NHS Trust, London, UK
Julius Upmeier Zu Belzen Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany
Andre Vauvelle Institute of Health Informatics, University College London, London, UK
Stefan Hegselmann Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Massachusetts, USA Pattern Recognition and Image Analysis Lab, University of Münster, Münster, Germany
Spiros Denaxas Institute of Health Informatics, University College London, London, UK British Heart Foundation Data Science Centre, London, UK Health Data Research UK, London, UK National Institute for Health Research, Biomedical Research Centre at University College London Hospitals National Institute for Health Research, Biomedical Research Centre, London, UK
Harry Hemingway Institute of Health Informatics, University College London, London, UK Health Data Research UK, London, UK National Institute for Health Research, Biomedical Research Centre at University College London Hospitals National Institute for Health Research, Biomedical Research Centre, London, UK
Claudia Langenberg Computational Medicine, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK Precision Health University Research Institute, Queen Mary University of London and Barts NHS Trust, London, UK
Ulf Landmesser Department of Cardiology, Angiology and Intensive Care Medicine, Deutsches Herzzentrum der Charité (DHZC), Berlin, Germany Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Klinik/Centrum, Charitéplatz 1, 10117, Berlin, Germany Friede Springer Cardiovascular Prevention Center@Charite, Charite - University Medicine Berlin, Berlin, Germany Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Berlin, Germany
John Deanfield Institute of Cardiovascular Sciences, University College London, London, UK
Roland Eils Center for Digital Health, Berlin Institute of Health (BIH), Charite - University Medicine Berlin, Berlin, Germany. Health Data Science Unit, Heidelberg University Hospital and BioQuant, Heidelberg, Germany.

Elfman J, Goins L, Heller T, Singh S, Wang YH, Li H. Discovery of a polymorphic gene fusion via bottom-up chimeric RNA prediction. Nucleic Acids Res 2024;52:4409-4421. [PMID: 38587197 PMCID: PMC11077074 DOI: 10.1093/nar/gkae258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Accepted: 03/27/2024] [Indexed: 04/09/2024] Open

Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large language models facilitate the generation of electronic health record phenotyping algorithms. J Am Med Inform Assoc 2024:ocae072. [PMID: 38613820 DOI: 10.1093/jamia/ocae072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/21/2024] [Accepted: 03/22/2024] [Indexed: 04/15/2024] Open

Abstract

OBJECTIVES

Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts.

MATERIALS AND METHODS

We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (i.e., type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.

RESULTS

GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values).

CONCLUSION

GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
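The trade-off reported in the results above (overly restrictive logic lowers recall, overly broad logic lowers positive predictive value) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patient IDs, the three criteria, and the helper name `recall_and_ppv` are invented and are not taken from the study or from eMERGE.

```python
# Toy illustration only: patient IDs and criteria are invented, not study data.

def recall_and_ppv(predicted, true_cases):
    """Return (recall, positive predictive value) for a predicted case set."""
    true_positives = len(predicted & true_cases)
    recall = true_positives / len(true_cases) if true_cases else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return recall, ppv

# Hypothetical chart-reviewed cases and the patients matching each criterion.
true_cases = {1, 2, 3, 4}
has_diagnosis_code = {1, 2, 3, 5}
has_medication = {1, 2, 6, 7}
has_abnormal_lab = {1, 4, 8}

# Restrictive logic: every criterion must be met (AND).
restrictive = has_diagnosis_code & has_medication & has_abnormal_lab
# Broad logic: any single criterion is enough (OR).
broad = has_diagnosis_code | has_medication | has_abnormal_lab

restrictive_metrics = recall_and_ppv(restrictive, true_cases)
broad_metrics = recall_and_ppv(broad, true_cases)
print("restrictive (recall, PPV):", restrictive_metrics)
print("broad (recall, PPV):", broad_metrics)
```

In this toy data the AND rule keeps only one patient, so it misses most true cases, while the OR rule captures every true case along with an equal number of non-cases.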

Affiliation(s)

Chao Yan Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Henry H Ong Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Monika E Grabowska Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Matthew S Krantz Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Wu-Chen Su Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Alyson L Dickson Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Josh F Peterson Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
QiPing Feng Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Dan M Roden Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
C Michael Stein Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
V Eric Kerchberger Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Bradley A Malin Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States Department of Computer Science, Vanderbilt University, Nashville, TN 37203, United States

Wei WQ, Rowley R, Wood A, MacArthur J, Embi PJ, Denaxas S. Improving reporting standards for phenotyping algorithm in biomedical research: 5 fundamental dimensions. J Am Med Inform Assoc 2024;31:1036-1041. [PMID: 38269642 PMCID: PMC10990558 DOI: 10.1093/jamia/ocae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/12/2023] [Accepted: 01/08/2024] [Indexed: 01/26/2024] Open

Yan C, Ong HH, Grabowska ME, Krantz MS, Su WC, Dickson AL, Peterson JF, Feng Q, Roden DM, Stein CM, Kerchberger VE, Malin BA, Wei WQ. Large Language Models Facilitate the Generation of Electronic Health Record Phenotyping Algorithms. medRxiv 2024:2023.12.19.23300230. [PMID: 38196578 PMCID: PMC10775330 DOI: 10.1101/2023.12.19.23300230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]

Abstract

Objectives

Materials and Methods

We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (i.e., type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.

Results

Conclusion

Smith JC, Williamson BD, Cronkite DJ, Park D, Whitaker JM, McLemore MF, Osmanski JT, Winter R, Ramaprasan A, Kelley A, Shea M, Wittayanukorn S, Stojanovic D, Zhao Y, Toh S, Johnson KB, Aronoff DM, Carrell DS. Data-driven automated classification algorithms for acute health conditions: applying PheNorm to COVID-19 disease. J Am Med Inform Assoc 2024;31:574-582. [PMID: 38109888 PMCID: PMC10873852 DOI: 10.1093/jamia/ocad241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 10/19/2023] [Accepted: 11/27/2023] [Indexed: 12/20/2023] Open

Abstract

OBJECTIVES

Automated phenotyping algorithms can reduce development time and operator dependence compared to manually developed algorithms. One such approach, PheNorm, has performed well for identifying chronic health conditions, but its performance for acute conditions is largely unknown. Herein, we implement and evaluate PheNorm applied to symptomatic COVID-19 disease to investigate its potential feasibility for rapid phenotyping of acute health conditions.

MATERIALS AND METHODS

PheNorm is a general-purpose automated approach to creating computable phenotype algorithms based on natural language processing, machine learning, and (low cost) silver-standard training labels. We applied PheNorm to cohorts of potential COVID-19 patients from 2 institutions and used gold-standard manual chart review data to investigate the impact on performance of alternative feature engineering options and implementing externally trained models without local retraining.

RESULTS

Models at the 2 institutions achieved AUC, sensitivity, and positive predictive value of 0.853, 0.879, and 0.851 at one and 0.804, 0.976, and 0.885 at the other, at quantiles of model-predicted risk that maximize F1. We report performance metrics for all combinations of silver labels, feature engineering options, and models trained internally versus externally.

DISCUSSION

Phenotyping algorithms developed using PheNorm performed well at both institutions. Performance varied with different silver-standard labels and feature engineering options. Models developed locally at one site also worked well when implemented externally at the other site.

CONCLUSION

PheNorm models successfully identified an acute health condition, symptomatic COVID-19. The simplicity of the PheNorm approach allows it to be applied at multiple study sites with substantially reduced overhead compared to traditional approaches.
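The results above report sensitivity and positive predictive value at the cut point of model-predicted risk that maximizes F1. A minimal sketch of that selection step follows. It is not the PheNorm implementation: the scores and labels are invented, the function names are hypothetical, and for simplicity it scans the distinct score values as candidate thresholds rather than quantiles.

```python
# Toy illustration only: scores and gold-standard labels are invented.

def confusion_counts(scores, labels, threshold):
    """Count true positives, false positives, and false negatives at a threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, fn

def best_f1_threshold(scores, labels):
    """Scan each distinct score as a cut point and keep the one with the highest F1."""
    best = None
    for threshold in sorted(set(scores)):
        tp, fp, fn = confusion_counts(scores, labels, threshold)
        if tp == 0:
            continue  # F1 is undefined (or zero) without true positives
        sensitivity = tp / (tp + fn)
        ppv = tp / (tp + fp)
        f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
        if best is None or f1 > best["f1"]:
            best = {"threshold": threshold, "sensitivity": sensitivity,
                    "ppv": ppv, "f1": f1}
    return best

scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]  # predicted risk
labels = [1, 1, 0, 1, 0, 1, 0, 0]                          # chart-review labels
result = best_f1_threshold(scores, labels)
print(result)
```

Choosing the cut point on the same data used for evaluation is a simplification of this sketch; the abstract does not say how the authors handled that.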

Affiliation(s)

Joshua C Smith Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Brian D Williamson Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
David J Cronkite Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
Daniel Park Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Jill M Whitaker Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Michael F McLemore Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Joshua T Osmanski Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Robert Winter Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Arvind Ramaprasan Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
Ann Kelley Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
Mary Shea Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
Saranrat Wittayanukorn Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
Danijela Stojanovic Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
Yueqin Zhao Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
Sengwee Toh Harvard Pilgrim Health Care Institute, Boston, MA 02215, United States
Kevin B Johnson Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
David M Aronoff Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, United States
David S Carrell Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States

Wan NC, Yaqoob AA, Ong HH, Zhao J, Wei WQ. Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping. J Am Med Inform Assoc 2023;30:456-465. [PMID: 36451277 PMCID: PMC9933070 DOI: 10.1093/jamia/ocac234] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/24/2022] [Revised: 10/28/2022] [Accepted: 11/23/2022] [Indexed: 12/02/2022] Open

Abstract

OBJECTIVE

A previous study, PheMAP, combined independent, online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, online resources offer distinct quality descriptions of diseases which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and investigate an optimized strategy for HTP.

MATERIALS AND METHODS

We compared how each resource produced top-ranked concept unique identifiers (CUIs) by term frequency-inverse document frequency with Jaccard matrices comparing single resources and the original PheMAP. We correlated top-ranked concepts from each resource to features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using resources separately, we calculated multiple phenotype risk scores for individuals from Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy which classified patient case/control status based upon PheMAP resource agreement.

RESULTS

Jaccard similarity matrices indicate that the similarity of CUIs comprising single resource-based PheMAPs varies. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed others but only encompass 81.6% of overall disease phenotypes. We propose the PheMAP-Ensemble which provides higher average accuracy and precision than the combined average accuracy and precision of single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared to the original iteration.

CONCLUSIONS

Resources comprising the PheMAP produce different phenotyping performance when implemented individually. The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.
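The Jaccard comparison of top-ranked concepts described in the methods above can be sketched as follows. This is a minimal illustration, not the authors' code: the resource names and concept identifiers are placeholders invented for the example, and the real analysis ranked CUIs by term frequency-inverse document frequency before comparing them.

```python
# Toy illustration only: resource names and concept IDs are placeholders.
from itertools import product

def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical top-ranked concept sets for one phenotype, one set per resource.
top_concepts = {
    "resource_a": {"CUI_A", "CUI_B", "CUI_C"},
    "resource_b": {"CUI_B", "CUI_C", "CUI_D"},
    "resource_c": {"CUI_A", "CUI_E"},
}

# Pairwise similarity matrix keyed by (row resource, column resource).
similarity = {
    (r1, r2): jaccard(top_concepts[r1], top_concepts[r2])
    for r1, r2 in product(top_concepts, repeat=2)
}

for (r1, r2), value in similarity.items():
    print(f"{r1} vs {r2}: {value:.2f}")
```

A value of 1.0 means two resources rank exactly the same concepts and 0.0 means they share none, which is how varying similarity between single-resource knowledge bases would show up in such a matrix.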

Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023;30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022] Open

Abstract

OBJECTIVE

Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.

MATERIALS AND METHODS

We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.

RESULTS

Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.

DISCUSSION

Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.

CONCLUSION

Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.

Meegan JE, Kerchberger VE, Fortune NL, McNeil JB, Bastarache JA, Austin ED, Ware LB, Hemnes AR, Brittain EL. Transpulmonary generation of cell-free hemoglobin contributes to vascular dysfunction in pulmonary arterial hypertension via dysregulated clearance mechanisms. Pulm Circ 2023;13:e12185. [PMID: 36743426 PMCID: PMC9841468 DOI: 10.1002/pul2.12185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 12/12/2022] [Accepted: 01/03/2023] [Indexed: 01/07/2023] Open

Abstract

Circulating cell-free hemoglobin (CFH) is elevated in pulmonary arterial hypertension (PAH) and associated with poor outcomes, but the mechanisms are unknown. We hypothesized that CFH is generated from the pulmonary circulation and inadequately cleared in PAH. Transpulmonary CFH (difference between wedge and pulmonary artery positions) and lung hemoglobin α were analyzed in patients with PAH and healthy controls. Haptoglobin genotype and plasma hemoglobin processing proteins were analyzed in patients with PAH, unaffected bone morphogenetic protein receptor type II mutation carriers (UMCs), and control subjects. Transpulmonary CFH was increased in patients with PAH (p = 0.04) and correlated with pulmonary vascular resistance (PVR) (r_s = 0.75, p = 0.02) and mean pulmonary arterial pressure (mPAP) (r_s = 0.78, p = 0.02). Pulmonary vascular hemoglobin α protein was increased in patients with PAH (p = 0.006), especially in occluded vessels (p = 0.04). Haptoglobin genotype did not differ between groups. Plasma haptoglobin was higher in UMCs compared with both control subjects (p = 0.03) and patients with HPAH (p < 0.0001); patients with IPAH had higher circulating haptoglobin levels than patients with HPAH (p = 0.006). Notably, the circulating CFH to haptoglobin ratio was elevated in patients with HPAH compared to control subjects (p = 0.02) and UMCs (p = 0.006). Moreover, in patients with PAH, CFH:haptoglobin correlated with PVR (r_s = 0.37, p = 0.0004) and mPAP (r_s = 0.25, p = 0.02). Broad alterations in other plasma hemoglobin processing proteins (hemopexin, heme oxygenase-1, and sCD163) were observed. In conclusion, pulmonary vascular CFH is associated with increased PVR and mPAP in PAH and dysregulated CFH clearance may contribute to PAH pathology. Further study is needed to determine whether targeting CFH is a viable therapeutic for pulmonary vascular dysfunction in PAH.

Affiliation(s)

Jamie E. Meegan Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Vern Eric Kerchberger Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Niki L. Fortune Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Joel Brennan McNeil Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Julie A. Bastarache Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, USA Department of Cell and Developmental Biology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Eric D. Austin Department of Pediatrics, Division of Allergy, Immunology, and Pulmonary Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Lorraine B. Ware Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Anna R. Hemnes Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA Vanderbilt Pulmonary Circulation Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Evan L. Brittain Vanderbilt Pulmonary Circulation Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA Department of Medicine, Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA

Barr PB, Bigdeli TB, Meyers JL. Characterizing and Coding Psychiatric Diagnoses Using Electronic Health Record Data-Reply. JAMA Psychiatry 2022;79:2796414. [PMID: 36103173 DOI: 10.1001/jamapsychiatry.2022.2739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]

Brandt PS, Pacheco JA, Adekkanattu P, Sholle ET, Abedian S, Stone DJ, Knaack DM, Xu J, Xu Z, Peng Y, Benda NC, Wang F, Luo Y, Jiang G, Pathak J, Rasmussen LV. Design and validation of a FHIR-based EHR-driven phenotyping toolbox. J Am Med Inform Assoc 2022;29:1449-1460. [PMID: 35799370 PMCID: PMC9382394 DOI: 10.1093/jamia/ocac063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2021] [Revised: 04/04/2022] [Accepted: 06/17/2022] [Indexed: 12/14/2022] Open

Krantz MS, Kerchberger VE, Wei WQ. Novel Analysis Methods to Mine Immune-Mediated Phenotypes and Find Genetic Variation Within the Electronic Health Record (Roadmap for Phenotype to Genotype: Immunogenomics). J Allergy Clin Immunol Pract 2022;10:1757-1762. [PMID: 35487368 PMCID: PMC9624141 DOI: 10.1016/j.jaip.2022.04.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2022] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 06/14/2023]

Genetics in chronic kidney disease: conclusions from a Kidney Disease: Improving Global Outcomes (KDIGO) Controversies Conference. Kidney Int 2022;101:1126-1141. [PMID: 35460632 PMCID: PMC9922534 DOI: 10.1016/j.kint.2022.03.019] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 01/13/2022] [Revised: 03/16/2022] [Accepted: 03/29/2022] [Indexed: 01/19/2023]

Integration of Omics and Phenotypic Data for Precision Medicine. Methods Mol Biol 2022;2486:19-35. [PMID: 35437716 DOI: 10.1007/978-1-0716-2265-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Denaxas S, Liu G, Feng Q, Fatemifar G, Bastarache L, Kerchberger EV, Hingorani AD, Lumbers T, Peterson JF, Wei WQ, Hemingway H. Mapping the Read2/CTV3 controlled clinical terminologies to Phecodes in UK Biobank primary care electronic health records: implementation and evaluation. AMIA Annu Symp Proc 2022;2021:362-371. [PMID: 35308936 PMCID: PMC8861677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]

Maturation and application of phenome-wide association studies. Trends Genet 2022;38:353-363. [PMID: 34991903 DOI: 10.1016/j.tig.2021.12.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/11/2021] [Revised: 11/12/2021] [Accepted: 12/02/2021] [Indexed: 12/12/2022]

Association of step counts over time with the risk of chronic disease in the All of Us Research Program. Nat Med 2022;28:2301-2308. [PMID: 36216933 PMCID: PMC9671804 DOI: 10.1038/s41591-022-02012-w] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 03/28/2022] [Accepted: 08/15/2022] [Indexed: 01/14/2023]

Liu X, Chubak J, Hubbard RA, Chen Y. SAT: a Surrogate-Assisted Two-wave case boosting sampling method, with application to EHR-based association studies. J Am Med Inform Assoc 2021;29:918-927. [PMID: 34962283 PMCID: PMC9714591 DOI: 10.1093/jamia/ocab267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 10/16/2021] [Accepted: 11/23/2021] [Indexed: 12/30/2022] Open

Lockshin MD, Crow MK, Barbhaiya M. When a Diagnosis Has No Name: Uncertainty and Opportunity. ACR Open Rheumatol 2021;4:197-201. [PMID: 34806330 PMCID: PMC8916551 DOI: 10.1002/acr2.11368] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022] Open

Abstract

Diagnostic uncertainty, commonly encountered in rheumatology and other fields of medicine, is an opportunity: Stakeholders who understand uncertainty's causes and quantitate its effects can reduce uncertainty and can use uncertainty to improve medical practice, science, and administration. To articulate, bring attention to, and offer recommendations for diagnostic uncertainty, the Barbara Volcker Center at the Hospital for Special Surgery sponsored, in April 2021, a virtual international workshop, “When a Diagnosis Has No Name.” This paper summarizes the opinions of 72 stakeholders from the fields of medical research, industry, federal regulatory agencies, insurers, hospital management, medical philosophy, public media, health care law, clinical rheumatology, other specialty areas of medicine, and patients. Speakers addressed the effects of diagnostic uncertainty in their fields. The workshop addressed the following six questions: What is a diagnosis? What are the purposes of diagnoses? How do doctors assign diagnoses? What is uncertainty? What are its causes? How does understanding uncertainty offer opportunities to improve all fields of medicine? The workshop's conveners systematically reviewed video recordings of formal presentations, video recordings of open discussion periods, manuscripts, and slide files submitted by the speakers to develop consensus take‐home messages, which were as follows: Diagnostic uncertainty causes harm when patients lack access to laboratory tests and treatments, do not participate in research studies, are not counted in administrative and public health documents, and suffer humiliation in their interactions with others. Uncertainty offers opportunities, such as quantifying uncertainty, using statistical technologies and automated intelligence to stratify patient groups by level of uncertainty, using a common vocabulary, and considering the effects of time.

Zheng NS, Kerchberger VE, Borza VA, Eken HN, Smith JC, Wei WQ. An updated, computable MEDication-Indication resource for biomedical research. Sci Rep 2021;11:18953. [PMID: 34556781 PMCID: PMC8460636 DOI: 10.1038/s41598-021-98579-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/26/2020] [Accepted: 09/02/2021] [Indexed: 11/09/2022] Open

Chapman M, Mumtaz S, Rasmussen LV, Karwath A, Gkoutos GV, Gao C, Thayer D, Pacheco JA, Parkinson H, Richesson RL, Jefferson E, Denaxas S, Curcin V. Desiderata for the development of next-generation electronic health record phenotype libraries. Gigascience 2021;10:giab059. [PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/24/2021] [Revised: 07/15/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022] Open

De Freitas JK, Johnson KW, Golden E, Nadkarni GN, Dudley JT, Bottinger EP, Glicksberg BS, Miotto R. Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records. Patterns (N Y) 2021;2:100337. [PMID: 34553174 PMCID: PMC8441576 DOI: 10.1016/j.patter.2021.100337] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Revised: 06/30/2021] [Accepted: 08/05/2021] [Indexed: 11/23/2022]

Affiliation(s)

Jessica K. De Freitas: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Kipp W. Johnson: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Eddye Golden: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Girish N. Nadkarni: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Joel T. Dudley: Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Erwin P. Bottinger: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Digital Health Center at Hasso Plattner Institute, University of Potsdam, Professor-Dr.-Helmert-Str 2–3, 14482 Potsdam, Germany
Benjamin S. Glicksberg: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Riccardo Miotto: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA

DeLozier S, Bland HT, McPheeters M, Wells Q, Farber-Eger E, Bejan CA, Fabbri D, Rosenbloom T, Roden D, Johnson KB, Wei WQ, Peterson J, Bastarache L. Phenotyping coronavirus disease 2019 during a global health pandemic: Lessons learned from the characterization of an early cohort. J Biomed Inform 2021;117:103777. [PMID: 33838341 PMCID: PMC8026248 DOI: 10.1016/j.jbi.2021.103777] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/16/2020] [Revised: 02/09/2021] [Accepted: 04/03/2021] [Indexed: 01/08/2023]

Abstract

From the start of the coronavirus disease 2019 (COVID-19) pandemic, researchers have looked to electronic health record (EHR) data as a way to study possible risk factors and outcomes. To ensure the validity and accuracy of research using these data, investigators need to be confident that the phenotypes they construct are reliable and accurate, reflecting the healthcare settings from which they are ascertained. We developed a COVID-19 registry at a single academic medical center and used data from March 1 to June 5, 2020 to assess differences in population-level characteristics in pandemic and non-pandemic years respectively. Median EHR length, previously shown to impact phenotype performance in type 2 diabetes, was significantly shorter in the SARS-CoV-2 positive group relative to a 2019 influenza tested group (median 3.1 years vs 8.7; Wilcoxon rank sum P = 1.3e-52). Using three phenotyping methods of increasing complexity (billing codes alone and domain-specific algorithms provided by an EHR vendor and clinical experts), common medical comorbidities were abstracted from COVID-19 EHRs, defined by the presence of a positive laboratory test (positive predictive value 100%, recall 93%). After combining performance data across phenotyping methods, we observed significantly lower false negative rates for those records billed for a comprehensive care visit (p = 4e-11) and those with complete demographics data recorded (p = 7e-5). In an early COVID-19 cohort, we found that phenotyping performance of nine common comorbidities was influenced by median EHR length, consistent with previous studies, as well as by data density, which can be measured using portable metrics including CPT codes. Here we present those challenges and potential solutions to creating deeply phenotyped, acute COVID-19 cohorts.

Affiliation(s)

Sarah DeLozier: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Harris T Bland: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Melissa McPheeters: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Quinn Wells: Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Pierce Avenue, 383 Preston Research Building, Nashville, TN 37232, USA
Eric Farber-Eger: Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Pierce Avenue, 383 Preston Research Building, Nashville, TN 37232, USA
Cosmin A Bejan: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Daniel Fabbri: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Trent Rosenbloom: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Dan Roden: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA; Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Pierce Avenue, 383 Preston Research Building, Nashville, TN 37232, USA
Kevin B Johnson: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Wei-Qi Wei: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Josh Peterson: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA
Lisa Bastarache: Department of Biomedical Informatics, Vanderbilt University Medical Center, West End Ave, Suite 1475, Nashville, TN 37203, USA

Si Y, Bernstam EV, Roberts K. Generalized and transferable patient language representation for phenotyping with limited data. J Biomed Inform 2021;116:103726. [PMID: 33711541 DOI: 10.1016/j.jbi.2021.103726] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/20/2020] [Revised: 12/14/2020] [Accepted: 02/23/2021] [Indexed: 12/19/2022]

Newman-Griffis D, Fosler-Lussier E. Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health. Front Digit Health 2021;3:620828. [PMID: 33791684 PMCID: PMC8009547 DOI: 10.3389/fdgth.2021.620828] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/23/2020] [Accepted: 02/16/2021] [Indexed: 11/13/2022] Open

Abstract

Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts, such as functional outcomes and social determinants of health, lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of medical information in under-studied domains, and demonstrate its applicability through a case study on physical mobility function. Mobility function is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is represented as one domain of human activity in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in the medical informatics literature, and neither the ICF nor commonly-used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility status to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro-averaged F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. This research has implications for continued development of language technologies to analyze functional status information, and the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.
