1
Soysal E, Roberts K. PheNormGPT: a framework for extraction and normalization of key medical findings. Database (Oxford) 2024; 2024:baae103. [PMID: 39444329 PMCID: PMC11498178 DOI: 10.1093/database/baae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 07/31/2024] [Accepted: 08/27/2024] [Indexed: 10/25/2024]
Abstract
This manuscript presents PheNormGPT, a framework for extraction and normalization of key findings in clinical text. PheNormGPT relies on an innovative approach, leveraging large language models to extract key findings and phenotypic data in unstructured clinical text and map them to Human Phenotype Ontology concepts. It utilizes OpenAI's GPT-3.5 Turbo and GPT-4 models with fine-tuning and few-shot learning strategies, including a novel few-shot learning strategy for custom-tailored few-shot example selection per request. PheNormGPT was evaluated in the BioCreative VIII Track 3: Genetic Phenotype Extraction from Dysmorphology Physical Examination Entries shared task. PheNormGPT achieved an F1 score of 0.82 for standard matching and 0.72 for exact matching, securing first place for this shared task.
Affiliation(s)
- Ekin Soysal
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St #600, Houston, TX 77030, United States
- Kirk Roberts
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St #600, Houston, TX 77030, United States
2
Xu D, Xu Z. Machine learning applications in preventive healthcare: A systematic literature review on predictive analytics of disease comorbidity from multiple perspectives. Artif Intell Med 2024; 156:102950. [PMID: 39163727 DOI: 10.1016/j.artmed.2024.102950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 06/17/2024] [Accepted: 08/13/2024] [Indexed: 08/22/2024]
Abstract
Artificial intelligence is constantly revolutionizing biomedical research and healthcare management. Disease comorbidity is a major threat to the quality of life for susceptible groups, especially middle-aged and elderly patients. The presence of multiple chronic diseases makes precision diagnosis challenging to realize and imposes a heavy burden on the healthcare system and economy. Given an enormous amount of accumulated health data, machine learning techniques show their capability in handling this puzzle. The present study conducts a review to uncover current research efforts in applying these methods to understanding comorbidity mechanisms and making clinical predictions considering these complex patterns. A descriptive metadata analysis of 791 unique publications aims to capture the overall research progression between January 2012 and June 2023. To delve into comorbidity-focused research, 61 of these scientific papers are systematically assessed. Four predictive analytics tasks are identified: disease comorbidity data extraction, clustering, network, and risk prediction. It is observed that some machine learning-driven applications address inherent data deficiencies in healthcare datasets and provide a model interpretation that identifies significant risk factors of comorbidity development. Based on insights, both technical and practical, gained from relevant literature, this study intends to guide future interests in comorbidity research and draw conclusions about chronic disease prevention and diagnosis with managerial implications.
Affiliation(s)
- Duo Xu
  - School of Economics and Management, Southeast University, Nanjing 211189, China.
- Zeshui Xu
  - School of Economics and Management, Southeast University, Nanjing 211189, China; Business School, Sichuan University, Chengdu 610064, China.
3
Alsentzer E, Rasmussen MJ, Fontoura R, Cull AL, Beaulieu-Jones B, Gray KJ, Bates DW, Kovacheva VP. Zero-shot interpretable phenotyping of postpartum hemorrhage using large language models. NPJ Digit Med 2023; 6:212. [PMID: 38036723 PMCID: PMC10689487 DOI: 10.1038/s41746-023-00957-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/02/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Many areas of medicine would benefit from deeper, more accurate phenotyping, but there are limited approaches for phenotyping using clinical notes without substantial annotated data. Large language models (LLMs) have demonstrated immense potential to adapt to novel tasks with no additional training by specifying task-specific instructions. Here we report the performance of a publicly available LLM, Flan-T5, in phenotyping patients with postpartum hemorrhage (PPH) using discharge notes from electronic health records (n = 271,081). The language model achieves strong performance in extracting 24 granular concepts associated with PPH. Identifying these granular concepts accurately allows the development of interpretable, complex phenotypes and subtypes. The Flan-T5 model achieves high fidelity in phenotyping PPH (positive predictive value of 0.95), identifying 47% more patients with this complication compared to the current standard of using claims codes. This LLM pipeline can be used reliably for subtyping PPH and outperforms a claims-based approach on the three most common PPH subtypes associated with uterine atony, abnormal placentation, and obstetric trauma. The advantage of this approach to subtyping is its interpretability, as each concept contributing to the subtype determination can be evaluated. Moreover, as definitions may change over time due to new guidelines, using granular concepts to create complex phenotypes enables prompt and efficient updating of the algorithm. Using this language modelling approach enables rapid phenotyping without the need for any manually annotated training data across multiple clinical use cases.
Affiliation(s)
- Emily Alsentzer
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, USA
- Matthew J Rasmussen
  - Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Romy Fontoura
  - Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Alexis L Cull
  - Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brett Beaulieu-Jones
  - Section of Biomedical Data Science, Department of Medicine, University of Chicago, Chicago, IL, USA
- Kathryn J Gray
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Division of Maternal-Fetal Medicine, Brigham and Women's Hospital, Boston, MA, USA
- David W Bates
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Health Care Policy and Management, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Vesela P Kovacheva
  - Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Boston, MA, USA.
4
Mao C, Xu J, Rasmussen L, Li Y, Adekkanattu P, Pacheco J, Bonakdarpour B, Vassar R, Shen L, Jiang G, Wang F, Pathak J, Luo Y. AD-BERT: Using pre-trained language model to predict the progression from mild cognitive impairment to Alzheimer's disease. J Biomed Inform 2023; 144:104442. [PMID: 37429512 PMCID: PMC11131134 DOI: 10.1016/j.jbi.2023.104442] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/24/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/12/2023]
Abstract
OBJECTIVE We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS We identified 3657 patients diagnosed with MCI together with their progress notes from Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. The progress notes no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by deidentification, cleaning and splitting into sections, and then pre-trained a BERT model for AD (named AD-BERT) based on the publicly available Bio+Clinical BERT on the preprocessed notes. All sections of a patient were embedded into a vector representation by AD-BERT and then combined by global MaxPooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with Area Under receiver operating characteristic Curve (AUC) of 0.849 and F1 score of 0.440 on NMEDW dataset, and AUC of 0.883 and F1 score of 0.680 on WCM dataset. CONCLUSION The use of EHRs for AD-related research is promising, and AD-BERT shows superior predictive performance in modeling MCI-to-AD progression prediction. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection and intervention for AD.
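The pooling step described in the abstract above (per-section embeddings combined by global max pooling, then passed through a fully connected network to give a probability) can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional embeddings, the weights, the bias, and the function names are all made up for illustration, and a single output layer with a sigmoid stands in for the full network.

```python
import math

def global_max_pool(section_vectors):
    """Element-wise maximum across all section embeddings of one patient."""
    return [max(values) for values in zip(*section_vectors)]

def dense(vector, weights, bias):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def progression_probability(section_vectors, weights, bias):
    """Pool the section embeddings, apply one dense layer, squash with a sigmoid."""
    pooled = global_max_pool(section_vectors)
    (logit,) = dense(pooled, weights, bias)  # exactly one output unit expected
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical toy inputs: two note sections, each embedded in 3 dimensions.
sections = [[0.2, -1.0, 0.5], [0.9, 0.1, -0.3]]
weights = [[0.5, -0.25, 1.0]]  # illustrative values, not trained parameters
bias = [0.1]

pooled = global_max_pool(sections)
prob = progression_probability(sections, weights, bias)
print(pooled, round(prob, 3))
```

Max pooling keeps, for every embedding dimension, the strongest signal found in any section, so the patient-level vector has a fixed size however many sections the notes contain.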
Affiliation(s)
- Chengsheng Mao
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Jie Xu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States; Weill Cornell Medicine, New York, NY, United States
- Luke Rasmussen
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Yikuan Li
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Jennifer Pacheco
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Borna Bonakdarpour
  - Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Robert Vassar
  - Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, United States
- Fei Wang
  - Weill Cornell Medicine, New York, NY, United States
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States.
5
Alsentzer E, Rasmussen MJ, Fontoura R, Cull AL, Beaulieu-Jones B, Gray KJ, Bates DW, Kovacheva VP. Zero-shot Interpretable Phenotyping of Postpartum Hemorrhage Using Large Language Models. medRxiv: The Preprint Server for Health Sciences 2023:2023.05.31.23290753. [PMID: 37398230 PMCID: PMC10312824 DOI: 10.1101/2023.05.31.23290753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Many areas of medicine would benefit from deeper, more accurate phenotyping, but there are limited approaches for phenotyping using clinical notes without substantial annotated data. Large language models (LLMs) have demonstrated immense potential to adapt to novel tasks with no additional training by specifying task-specific instructions. We investigated the performance of a publicly available LLM, Flan-T5, in phenotyping patients with postpartum hemorrhage (PPH) using discharge notes from electronic health records (n = 271,081). The language model achieved strong performance in extracting 24 granular concepts associated with PPH. Identifying these granular concepts accurately allowed the development of interpretable, complex phenotypes and subtypes. The Flan-T5 model achieved high fidelity in phenotyping PPH (positive predictive value of 0.95), identifying 47% more patients with this complication compared to the current standard of using claims codes. This LLM pipeline can be used reliably for subtyping PPH and outperformed a claims-based approach on the three most common PPH subtypes associated with uterine atony, abnormal placentation, and obstetric trauma. The advantage of this approach to subtyping is its interpretability, as each concept contributing to the subtype determination can be evaluated. Moreover, as definitions may change over time due to new guidelines, using granular concepts to create complex phenotypes enables prompt and efficient updating of the algorithm. Using this language modelling approach enables rapid phenotyping without the need for any manually annotated training data across multiple clinical use cases.
6
Pacheco JA, Rasmussen LV, Wiley K, Person TN, Cronkite DJ, Sohn S, Murphy S, Gundelach JH, Gainer V, Castro VM, Liu C, Mentch F, Lingren T, Sundaresan AS, Eickelberg G, Willis V, Furmanchuk A, Patel R, Carrell DS, Deng Y, Walton N, Satterfield BA, Kullo IJ, Dikilitas O, Smith JC, Peterson JF, Shang N, Kiryluk K, Ni Y, Li Y, Nadkarni GN, Rosenthal EA, Walunas TL, Williams MS, Karlson EW, Linder JE, Luo Y, Weng C, Wei W. Evaluation of the portability of computable phenotypes with natural language processing in the eMERGE network. Sci Rep 2023; 13:1971. [PMID: 36737471 PMCID: PMC9898520 DOI: 10.1038/s41598-023-27481-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/15/2022] [Accepted: 01/03/2023] [Indexed: 02/05/2023] Open
Abstract
The electronic Medical Records and Genomics (eMERGE) Network assessed the feasibility of deploying portable phenotype rule-based algorithms with natural language processing (NLP) components added to improve performance of existing algorithms using electronic health records (EHRs). Based on scientific merit and predicted difficulty, eMERGE selected six existing phenotypes to enhance with NLP. We assessed performance, portability, and ease of use. We summarized lessons learned by: (1) challenges; (2) best practices to address challenges based on existing evidence and/or eMERGE experience; and (3) opportunities for future research. Adding NLP resulted in improved, or the same, precision and/or recall for all but one algorithm. Portability, phenotyping workflow/process, and technology were major themes. With NLP, development and validation took longer. Besides portability of NLP technology and algorithm replicability, factors to ensure success include privacy protection, technical infrastructure setup, intellectual property agreement, and efficient communication. Workflow improvements can improve communication and reduce implementation time. NLP performance varied mainly due to clinical document heterogeneity; therefore, we suggest using semi-structured notes, comprehensive documentation, and customization options. NLP portability is possible with improved phenotype algorithm performance, but careful planning and architecture of the algorithms is essential to support local customizations.
Affiliation(s)
- Ken Wiley
  - National Human Genome Research Institute, Bethesda, USA
- David J Cronkite
  - Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Cong Liu
  - Columbia University, New York, USA
- Frank Mentch
  - Children's Hospital of Philadelphia, Philadelphia, USA
- Todd Lingren
  - Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- David S Carrell
  - Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Yu Deng
  - Northwestern University, Evanston, USA
- Yizhao Ni
  - Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Yikuan Li
  - Northwestern University, Evanston, USA
- Yuan Luo
  - Northwestern University, Evanston, USA
- WeiQi Wei
  - Vanderbilt University Medical Center, Nashville, USA
7
Brandt PS, Pacheco JA, Adekkanattu P, Sholle ET, Abedian S, Stone DJ, Knaack DM, Xu J, Xu Z, Peng Y, Benda NC, Wang F, Luo Y, Jiang G, Pathak J, Rasmussen LV. Design and validation of a FHIR-based EHR-driven phenotyping toolbox. J Am Med Inform Assoc 2022; 29:1449-1460. [PMID: 35799370 PMCID: PMC9382394 DOI: 10.1093/jamia/ocac063] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/27/2021] [Revised: 04/04/2022] [Accepted: 06/17/2022] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVES To develop and validate a standards-based phenotyping tool to author electronic health record (EHR)-based phenotype definitions and demonstrate execution of the definitions against heterogeneous clinical research data platforms. MATERIALS AND METHODS We developed an open-source, standards-compliant phenotyping tool known as the PhEMA Workbench that enables a phenotype representation using the Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL) standards. We then demonstrated how this tool can be used to conduct EHR-based phenotyping, including phenotype authoring, execution, and validation. We validated the performance of the tool by executing a thrombotic event phenotype definition at 3 sites, Mayo Clinic (MC), Northwestern Medicine (NM), and Weill Cornell Medicine (WCM), and used manual review to determine precision and recall. RESULTS An initial version of the PhEMA Workbench has been released, which supports phenotype authoring, execution, and publishing to a shared phenotype definition repository. The resulting thrombotic event phenotype definition consisted of 11 CQL statements, and 24 value sets containing a total of 834 codes. Technical validation showed satisfactory performance (both NM and MC had 100% precision and recall and WCM had a precision of 95% and a recall of 84%). CONCLUSIONS We demonstrate that the PhEMA Workbench can facilitate EHR-driven phenotype definition, execution, and phenotype sharing in heterogeneous clinical research data environments. A phenotype definition that integrates with existing standards-compliant systems, and the use of a formal representation facilitates automation and can decrease potential for human error.
Affiliation(s)
- Pascal S Brandt
  - Corresponding Author: Pascal S. Brandt, Department of Biomedical Informatics & Medical Education, University of Washington, Box 358047, Seattle, WA 98195, USA
- Jennifer A Pacheco
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Prakash Adekkanattu
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Evan T Sholle
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Sajjad Abedian
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Daniel J Stone
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- David M Knaack
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Jie Xu
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Zhenxing Xu
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Yifan Peng
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Natalie C Benda
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Fei Wang
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Guoqian Jiang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Jyotishman Pathak
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Luke V Rasmussen
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
8
Seedahmed MI, Mogilnicka I, Zeng S, Luo G, Whooley MA, McCulloch CE, Koth L, Arjomandi M. Performance of a Computational Phenotyping Algorithm for Sarcoidosis Using Diagnostic Codes in Electronic Medical Records: Case Validation Study From 2 Veterans Affairs Medical Centers. JMIR Form Res 2022; 6:e31615. [PMID: 35081036 PMCID: PMC8928044 DOI: 10.2196/31615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2021] [Revised: 01/24/2022] [Accepted: 01/24/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Electronic medical records (EMRs) offer the promise of computationally identifying sarcoidosis cases. However, the accuracy of identifying these cases in the EMR is unknown. OBJECTIVE The aim of this study is to determine the statistical performance of using the International Classification of Diseases (ICD) diagnostic codes to identify patients with sarcoidosis in the EMR. METHODS We used the ICD diagnostic codes to identify sarcoidosis cases by searching the EMRs of the San Francisco and Palo Alto Veterans Affairs medical centers and randomly selecting 200 patients. To improve the diagnostic accuracy of the computational algorithm in cases where histopathological data are unavailable, we developed an index of suspicion to identify cases with a high index of suspicion for sarcoidosis (confirmed and probable) based on clinical and radiographic features alone using the American Thoracic Society practice guideline. Through medical record review, we determined the positive predictive value (PPV) of diagnosing sarcoidosis by two computational methods: using ICD codes alone and using ICD codes plus the high index of suspicion. RESULTS Among the 200 patients, 158 (79%) had a high index of suspicion for sarcoidosis. Of these 158 patients, 142 (89.9%) had documentation of nonnecrotizing granuloma, confirming biopsy-proven sarcoidosis. The PPV of using ICD codes alone was 79% (95% CI 78.6%-80.5%) for identifying sarcoidosis cases and 71% (95% CI 64.7%-77.3%) for identifying histopathologically confirmed sarcoidosis in the EMRs. The inclusion of the generated high index of suspicion to identify confirmed sarcoidosis cases increased the PPV significantly to 100% (95% CI 96.5%-100%). Histopathology documentation alone was 90% sensitive compared with high index of suspicion. CONCLUSIONS ICD codes are reasonable classifiers for identifying sarcoidosis cases within EMRs with a PPV of 79%. Using a computational algorithm to capture index of suspicion data elements could significantly improve the case-identification accuracy.
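As a worked illustration of the PPV figures in the abstract above, the sketch below computes a PPV with a normal-approximation (Wald) 95% confidence interval. The abstract does not say which interval method the authors used, so the Wald formula is an assumption, as is reading the 71% figure as 142 histopathologically confirmed cases out of the 200 sampled patients. The function name is hypothetical.

```python
import math

def wald_interval(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the ends.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Assumed reading of the abstract: 142 confirmed cases among 200 sampled patients.
ppv, low, high = wald_interval(142, 200)
print(f"PPV = {ppv:.1%} (95% CI {low:.1%}-{high:.1%})")
```

Under these assumptions the interval rounds to 64.7%-77.3%, the range quoted for histopathologically confirmed sarcoidosis. For proportions close to 0 or 1, an exact or Wilson interval is usually a better choice than the Wald approximation.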
Affiliation(s)
- Mohamed I Seedahmed
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
  - San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
- Izabella Mogilnicka
  - San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
  - Department of Experimental Physiology and Pathophysiology, Laboratory of the Centre for Preclinical Research, Medical University of Warsaw, Warsaw, Poland
- Siyang Zeng
  - San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
  - Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
- Mary A Whooley
  - San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
  - Department of Medicine, University of California San Francisco, San Francisco, CA, United States
  - Measurement Science Quality Enhancement Research Initiative, San Francisco Veterans Affairs Healthcare System, San Francisco, CA, United States
- Charles E McCulloch
  - Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, United States
- Laura Koth
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Mehrdad Arjomandi
  - Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
  - San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
9
Park C, You SC, Jeon H, Jeong CW, Choi JW, Park RW. Development and Validation of the Radiology Common Data Model (R-CDM) for the International Standardization of Medical Imaging Data. Yonsei Med J 2022; 63:S74-S83. [PMID: 35040608 PMCID: PMC8790584 DOI: 10.3349/ymj.2022.63.s74] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/08/2021] [Revised: 10/28/2021] [Accepted: 10/31/2021] [Indexed: 12/02/2022] Open
Abstract
PURPOSE Digital Imaging and Communications in Medicine (DICOM), a standard file format for medical imaging data, contains metadata describing each file. However, metadata are often incomplete, and there is no standardized format for recording metadata, leading to inefficiency during the metadata-based data retrieval process. Here, we propose a novel standardization method for DICOM metadata termed the Radiology Common Data Model (R-CDM). MATERIALS AND METHODS R-CDM was designed to be compatible with Health Level Seven International (HL7)/Fast Healthcare Interoperability Resources (FHIR) and linked with the Observational Medical Outcomes Partnership (OMOP)-CDM to achieve a seamless link between clinical data and medical imaging data. The terminology system was standardized using the RadLex playbook, a comprehensive lexicon of radiology. As a proof of concept, the R-CDM conversion process was conducted with 41.7 TB of data from the Ajou University Hospital. The R-CDM database visualizer was developed to visualize the main characteristics of the R-CDM database. RESULTS Information from 2801360 cases and 87203226 DICOM files was organized into two tables constituting the R-CDM. Information on imaging device and image resolution was recorded with more than 99.9% accuracy. Furthermore, OMOP-CDM and R-CDM were linked to efficiently extract specific types of images from specific patient cohorts. CONCLUSION R-CDM standardizes the structure and terminology for recording medical imaging data to eliminate incomplete and unstandardized information. Successful standardization was achieved by the extract, transform, and load process and image classifier. We hope that the R-CDM will contribute to deep learning research in the medical imaging field by enabling the securement of large-scale medical imaging data from multinational institutions.
Affiliation(s)
- ChulHyoung Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
- Seng Chan You
  - Department of Preventive Medicine, Yonsei University College of Medicine, Seoul, Korea
- Hokyun Jeon
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
- Chang Won Jeong
  - Medical Convergence Research Center, Wonkwang University, Iksan, Korea
- Jin Wook Choi
  - Department of Radiology, Ajou University Medical Center, Suwon, Korea
- Rae Woong Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Korea.
10
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
11
Ostropolets A, Zachariah P, Ryan P, Chen R, Hripcsak G. Data Consult Service: Can we use observational data to address immediate clinical needs? J Am Med Inform Assoc 2021; 28:2139-2146. [PMID: 34333606 PMCID: PMC8449613 DOI: 10.1093/jamia/ocab122] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/15/2021] [Revised: 03/30/2021] [Accepted: 06/02/2021] [Indexed: 01/08/2023] Open
Abstract
OBJECTIVE A number of clinical decision support tools aim to use observational data to address immediate clinical needs, but few of them address challenges and biases inherent in such data. The goal of this article is to describe the experience of running a data consult service that generates clinical evidence in real time and characterize the challenges related to its use of observational data. MATERIALS AND METHODS In 2019, we launched the Data Consult Service pilot with clinicians affiliated with Columbia University Irving Medical Center. We created and implemented a pipeline (question gathering, data exploration, iterative patient phenotyping, study execution, and assessing validity of results) for generating new evidence in real time. We collected user feedback and assessed issues related to producing reliable evidence. RESULTS We collected 29 questions from 22 clinicians through clinical rounds, emails, and in-person communication. We used validated practices to ensure reliability of evidence and answered 24 of them. Questions differed depending on the collection method, with clinical rounds supporting proactive team involvement and gathering more patient characterization questions and questions related to a current patient. The main challenges we encountered included missing and incomplete data, underreported conditions, and nonspecific coding and accurate identification of drug regimens. CONCLUSIONS While the Data Consult Service has the potential to generate evidence and facilitate decision making, only a portion of questions can be answered in real time. Recognizing challenges in patient phenotyping and designing studies along with using validated practices for observational research are mandatory to produce reliable evidence.
Affiliation(s)
- Anna Ostropolets
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, USA
- Philip Zachariah
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, USA
  - NewYork-Presbyterian Hospital, New York, New York, USA
- Patrick Ryan
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, USA
- Ruijun Chen
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, USA
  - Department of Translational Data Science and Informatics, Geisinger, Danville, Pennsylvania, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, USA
  - NewYork-Presbyterian Hospital, New York, New York, USA
12
Linder JE, Bastarache L, Hughey JJ, Peterson JF. The Role of Electronic Health Records in Advancing Genomic Medicine. Annu Rev Genomics Hum Genet 2021; 22:219-238. [PMID: 34038146 PMCID: PMC9297710 DOI: 10.1146/annurev-genom-121120-125204] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/18/2022]
Abstract
Recent advances in genomic technology and widespread adoption of electronic health records (EHRs) have accelerated the development of genomic medicine, bringing promising research findings from genome science into clinical practice. Genomic and phenomic data, accrued across large populations through biobanks linked to EHRs, have enabled the study of genetic variation at a phenome-wide scale. Through new quantitative techniques, pleiotropy can be explored with phenome-wide association studies, the occurrence of common complex diseases can be predicted using the cumulative influence of many genetic variants (polygenic risk scores), and undiagnosed Mendelian syndromes can be identified using EHR-based phenotypic signatures (phenotype risk scores). In this review, we trace the role of EHRs from the development of genome-wide analytic techniques to translational efforts to test these new interventions to the clinic. Throughout, we describe the challenges that remain when combining EHRs with genetics to improve clinical care.
Affiliation(s)
- Jodell E Linder
  - Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Jacob J Hughey
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
- Josh F Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
13
Hripcsak G, Schuemie MJ, Madigan D, Ryan PB, Suchard MA. Drawing Reproducible Conclusions from Observational Clinical Data with OHDSI. Yearb Med Inform 2021; 30:283-289. [PMID: 33882595 PMCID: PMC8416226 DOI: 10.1055/s-0041-1726481] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 02/07/2023] Open
Abstract
OBJECTIVE The current observational research literature shows extensive publication bias and contradiction. The Observational Health Data Sciences and Informatics (OHDSI) initiative seeks to improve research reproducibility through open science. METHODS OHDSI has created an international federated data source of electronic health records and administrative claims that covers nearly 10% of the world's population. Using a common data model with a practical schema and extensive vocabulary mappings, data from around the world follow the identical format. OHDSI's research methods emphasize reproducibility, with a large-scale approach to addressing confounding using propensity score adjustment with extensive diagnostics; negative and positive control hypotheses to test for residual systematic error; a variety of data sources to assess consistency and generalizability; a completely open approach including protocol, software, models, parameters, and raw results so that studies can be externally verified; and the study of many hypotheses in parallel so that the operating characteristics of the methods can be assessed. RESULTS OHDSI has already produced findings in areas like hypertension treatment that are being incorporated into practice, and it has produced rigorous studies of COVID-19 that have aided government agencies in their treatment decisions, that have characterized the disease extensively, that have estimated the comparative effects of treatments, and that predict the likelihood of advancing to serious complications. CONCLUSIONS OHDSI practices open science and incorporates a series of methods to address reproducibility. It has produced important results in several areas, including hypertension therapy and COVID-19 research.
Affiliation(s)
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - Observational Health Data Sciences and Informatics, New York, New York, USA
- Martijn J. Schuemie
  - Observational Health Data Sciences and Informatics, New York, New York, USA
  - Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA
- David Madigan
  - Observational Health Data Sciences and Informatics, New York, New York, USA
  - Northeastern University, Boston, Massachusetts, USA
- Patrick B. Ryan
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - Observational Health Data Sciences and Informatics, New York, New York, USA
  - Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA
- Marc A. Suchard
  - Observational Health Data Sciences and Informatics, New York, New York, USA
  - Fielding School of Public Health, Department of Biostatistics, University of California, Los Angeles, Los Angeles, USA
  - David Geffen School of Medicine, Department of Biomathematics, University of California, Los Angeles, Los Angeles, USA
14
Liu S, Luo Y, Stone D, Zong N, Wen A, Yu Y, Rasmussen LV, Wang F, Pathak J, Liu H, Jiang G. Integration of NLP2FHIR Representation with Deep Learning Models for EHR Phenotyping: A Pilot Study on Obesity Datasets. AMIA Jt Summits Transl Sci Proc 2021; 2021:410-419. [PMID: 34457156 PMCID: PMC8378603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
HL7 Fast Healthcare Interoperability Resources (FHIR) is one of the current data standards for enabling electronic healthcare information exchange. Previous studies have shown that FHIR is capable of modeling both structured and unstructured data from electronic health records (EHRs). However, the capability of FHIR in enabling clinical data analytics has not been well investigated. The objective of the study is to demonstrate how FHIR-based representation of unstructured EHR data can be ported to deep learning models for text classification in clinical phenotyping. We leverage and extend the NLP2FHIR clinical data normalization pipeline and conduct a case study with two obesity datasets. We tested several deep learning-based text classifiers such as convolutional neural networks, gated recurrent unit, and text graph convolutional networks on both raw text and NLP2FHIR inputs. We found that the combination of NLP2FHIR input and text graph convolutional networks has the highest F1 score. Therefore, FHIR-based deep learning methods have the potential to be leveraged in supporting EHR phenotyping, making the phenotyping algorithms more portable across EHR systems and institutions.
Affiliation(s)
- Yuan Luo
  - Northwestern University, Chicago, IL
- Yue Yu
  - Mayo Clinic, Rochester, MN
- Fei Wang
  - Weill Cornell Medicine, New York, NY
15
Wen A, Rasmussen LV, Stone D, Liu S, Kiefer R, Adekkanattu P, Brandt PS, Pacheco JA, Luo Y, Wang F, Pathak J, Liu H, Jiang G. CQL4NLP: Development and Integration of FHIR NLP Extensions in Clinical Quality Language for EHR-driven Phenotyping. AMIA Jt Summits Transl Sci Proc 2021; 2021:624-633. [PMID: 34457178 PMCID: PMC8378647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Lack of standardized representation of natural language processing (NLP) components in phenotyping algorithms hinders portability of the phenotyping algorithms and their execution in a high-throughput and reproducible manner. The objective of the study is to develop and evaluate a standard-driven approach - CQL4NLP - that integrates a collection of NLP extensions represented in the HL7 Fast Healthcare Interoperability Resources (FHIR) standard into the clinical quality language (CQL). A minimal NLP data model with 11 NLP-specific data elements was created, including six FHIR NLP extensions. All 11 data elements were identified from their usage in real-world phenotyping algorithms. An NLP ruleset generation mechanism was integrated into the NLP2FHIR pipeline and the NLP rulesets enabled comparable performance for a case study with the identification of obesity comorbidities. The NLP ruleset generation mechanism created a reproducible process for defining the NLP components of a phenotyping algorithm and its execution.
Affiliation(s)
- Yuan Luo
  - Northwestern University, Chicago, IL
- Fei Wang
  - Weill Cornell Medicine, New York, NY
16
Park J, You SC, Jeong E, Weng C, Park D, Roh J, Lee DY, Cheong JY, Choi JW, Kang M, Park RW. A Framework (SOCRATex) for Hierarchical Annotation of Unstructured Electronic Health Records and Integration Into a Standardized Medical Database: Development and Usability Study. JMIR Med Inform 2021; 9:e23983. [PMID: 33783361 PMCID: PMC8044740 DOI: 10.2196/23983] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/31/2020] [Revised: 11/14/2020] [Accepted: 01/23/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Although electronic health records (EHRs) have been widely used in secondary assessments, clinical documents are relatively less utilized owing to the lack of standardized clinical text frameworks across different institutions. OBJECTIVE This study aimed to develop a framework for processing unstructured clinical documents of EHRs and integration with standardized structured data. METHODS We developed a framework known as Staged Optimization of Curation, Regularization, and Annotation of clinical text (SOCRATex). SOCRATex has the following four aspects: (1) extracting clinical notes for the target population and preprocessing the data, (2) defining the annotation schema with a hierarchical structure, (3) performing document-level hierarchical annotation using the annotation schema, and (4) indexing annotations for a search engine system. To test the usability of the proposed framework, proof-of-concept studies were performed on EHRs. We defined three distinctive patient groups and extracted their clinical documents (ie, pathology reports, radiology reports, and admission notes). The documents were annotated and integrated into the Observational Medical Outcomes Partnership (OMOP)-common data model (CDM) database. The annotations were used for creating Cox proportional hazard models with different settings of clinical analyses to measure (1) all-cause mortality, (2) thyroid cancer recurrence, and (3) 30-day hospital readmission. RESULTS Overall, 1055 clinical documents of 953 patients were extracted and annotated using the defined annotation schemas. The generated annotations were indexed into an unstructured textual data repository. Using the annotations of pathology reports, we identified that node metastasis and lymphovascular tumor invasion were associated with all-cause mortality among colon and rectum cancer patients (both P=.02). 
The other analyses involving measuring thyroid cancer recurrence using radiology reports and 30-day hospital readmission using admission notes in depressive disorder patients also showed results consistent with previous findings. CONCLUSIONS We propose a framework for hierarchical annotation of textual data and integration into a standardized OMOP-CDM medical database. The proof-of-concept studies demonstrated that our framework can effectively process and integrate diverse clinical documents with standardized structured data for clinical research.
Affiliation(s)
- Jimyung Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Seng Chan You
  - Department of Preventive Medicine and Public Health, Yonsei University College of Medicine, Seoul, Republic of Korea
- Eugene Jeong
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, United States
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Dongsu Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
- Jin Roh
  - Department of Pathology, Ajou University Hospital, Suwon, Republic of Korea
- Dong Yun Lee
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
- Jae Youn Cheong
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon, Republic of Korea
- Jin Wook Choi
  - Department of Radiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Mira Kang
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea
- Rae Woong Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
17
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_83-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
18
Ryu B, Yoon E, Kim S, Lee S, Baek H, Yi S, Na HY, Kim JW, Baek RM, Hwang H, Yoo S. Transformation of Pathology Reports Into the Common Data Model With Oncology Module: Use Case for Colon Cancer. J Med Internet Res 2020; 22:e18526. [PMID: 33295294 PMCID: PMC7758167 DOI: 10.2196/18526] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/04/2020] [Revised: 05/20/2020] [Accepted: 11/11/2020] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Common data models (CDMs) help standardize electronic health record data and facilitate outcome analysis for observational and longitudinal research. An analysis of pathology reports is required to establish fundamental information infrastructure for data-driven colon cancer research. The Observational Medical Outcomes Partnership (OMOP) CDM is used in distributed research networks for clinical data; however, it requires conversion of free text-based pathology reports into the CDM's format. There are few use cases of representing cancer data in CDM. OBJECTIVE In this study, we aimed to construct a CDM database of colon cancer-related pathology with natural language processing (NLP) for a research platform that can utilize both clinical and omics data. The essential text entities from the pathology reports are extracted, standardized, and converted to the OMOP CDM format in order to utilize the pathology data in cancer research. METHODS We extracted clinical text entities, mapped them to the standard concepts in the Observational Health Data Sciences and Informatics vocabularies, and built databases and defined relations for the CDM tables. Major clinical entities were extracted through NLP on pathology reports of surgical specimens, immunohistochemical studies, and molecular studies of colon cancer patients at a tertiary general hospital in South Korea. Items were extracted from each report using regular expressions in Python. Unstructured data, such as text that does not have a pattern, were handled with expert advice by adding regular expression rules. Our own dictionary was used for normalization and standardization to deal with biomarker and gene names and other ungrammatical expressions. The extracted clinical and genetic information was mapped to the Logical Observation Identifiers Names and Codes databases and the Systematized Nomenclature of Medicine (SNOMED) standard terminologies recommended by the OMOP CDM. 
The database-table relationships were newly defined through SNOMED standard terminology concepts. The standardized data were inserted into the CDM tables. For evaluation, 100 reports were randomly selected and independently annotated by a medical informatics expert and a nurse. RESULTS We examined and standardized 1848 immunohistochemical study reports, 3890 molecular study reports, and 12,352 pathology reports of surgical specimens (from 2017 to 2018). The constructed and updated database contained the following extracted colorectal entities: (1) NOTE_NLP, (2) MEASUREMENT, (3) CONDITION_OCCURRENCE, (4) SPECIMEN, and (5) FACT_RELATIONSHIP of specimen with condition and measurement. CONCLUSIONS This study aimed to prepare CDM data for a research platform to take advantage of all omics clinical and patient data at Seoul National University Bundang Hospital for colon cancer pathology. A more sophisticated preparation of the pathology data is needed for further research on cancer genomics, and various types of text narratives are the next target for additional research on the use of data in the CDM.
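The extraction step this abstract describes (regular expressions in Python, plus a local dictionary for normalizing biomarker and gene names) can be pictured with a minimal sketch. This is an illustration only, not the authors' code: the sample report text, the pattern, the `BIOMARKER_SYNONYMS` dictionary, and the output field names are all hypothetical.

```python
import re

# Hypothetical normalization dictionary: surface forms -> canonical gene name.
BIOMARKER_SYNONYMS = {
    "k-ras": "KRAS",
    "kras": "KRAS",
    "n-ras": "NRAS",
    "nras": "NRAS",
    "braf": "BRAF",
}

# Hypothetical line pattern: "<gene> mutation: <detected | not detected>".
RESULT_PATTERN = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9-]*)\s+mutation\s*:\s*"
    r"(?P<result>detected|not detected)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_entities(report_text):
    """Return one dict per matched line, with a normalized biomarker name."""
    entities = []
    for match in RESULT_PATTERN.finditer(report_text):
        raw_name = match.group("name")
        # Fall back to the upper-cased raw name when the dictionary has no entry.
        name = BIOMARKER_SYNONYMS.get(raw_name.lower(), raw_name.upper())
        entities.append({"biomarker": name, "result": match.group("result").lower()})
    return entities


# Invented report fragment with inconsistent spelling and spacing.
sample_report = """K-ras mutation: Not detected
BRAF mutation : detected
"""
rows = extract_entities(sample_report)
```

In a pipeline like the one described, rows of this kind would then be mapped to standard concepts before being written to CDM tables; that mapping is outside the scope of this sketch.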
Affiliation(s)
- Borim Ryu
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Eunsil Yoon
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Seok Kim
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sejoon Lee
  - Department of Pathology and Translational Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Hyunyoung Baek
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Soyoung Yi
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Hee Young Na
  - Department of Pathology and Translational Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Ji-Won Kim
  - Division of Hematology and Medical Oncology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Rong-Min Baek
  - Department of Plastic Surgery, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Hee Hwang
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sooyoung Yoo
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
19
Abstract
PURPOSE OF REVIEW Healthcare has already been impacted by the fourth industrial revolution, exemplified by tip-of-the-spear technologies such as artificial intelligence and quantum computing. Yet there is much to be accomplished, as systems remain suboptimal and full interoperability of digital records is not realized. Given the footprint of technology in healthcare, the field of clinical immunology will certainly see improvements related to these tools. RECENT FINDINGS Biomedical informatics spans the gamut of technology in biomedicine. Within this distinct field, advances are being made that allow for engineering of systems to automate disease detection, create computable phenotypes, and improve record portability. Within clinical immunology, technologies are emerging along these lines and are expected to continue. SUMMARY This review highlights advancements in digital health, including learning health systems, electronic phenotyping, artificial intelligence, and use of registries. Technological advancements for improving diagnosis and care of patients with primary immunodeficiency diseases are also highlighted.
20
Bozkurt S, Paul R, Coquet J, Sun R, Banerjee I, Brooks JD, Hernandez-Boussard T. Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case. Learn Health Syst 2020; 4:e10237. [PMID: 33083539 PMCID: PMC7556418 DOI: 10.1002/lrh2.10237] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/15/2020] [Revised: 06/15/2020] [Accepted: 06/23/2020] [Indexed: 01/12/2023] Open
Abstract
Introduction A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient‐centered outcomes (PCOs) into core infrastructure. PCOs are common following cancer treatment, such as urinary incontinence (UI) following prostatectomy. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective, and are captured as unstructured text in the electronic health record (EHR). Therefore, PCOs pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI to classify their disease into severity subtypes, which can increase opportunities to provide precision‐based therapy and promote a value‐based delivery system. Methods Patients undergoing prostate cancer treatment from 2008 to 2018 were identified at an academic medical center. Using a hybrid NLP pipeline that combines rule‐based and deep learning methodologies, we classified positive UI cases as mild, moderate, and severe by mining clinical notes. Results The rule‐based model accurately classified UI into disease severity categories (accuracy: 0.86), which outperformed the deep learning model (accuracy: 0.73). In the deep learning model, the recall rates for the mild and moderate groups were higher than the precision rate (0.78 and 0.79, respectively). A hybrid model that combined both methods did not improve the accuracy of the rule‐based model but did outperform the deep learning model (accuracy: 0.75). Conclusion Phenotyping patients based on indication and severity of PCOs is essential to advance a patient‐centered LHS. EHRs contain valuable information on PCOs, and by using NLP methods it is feasible to accurately and efficiently phenotype PCO severity. Phenotyping must extend beyond the identification of disease to provide classification of disease severity that can be used to guide treatment and inform shared decision‐making. Our methods demonstrate a path to a patient‐centered LHS that could advance precision medicine.
Affiliation(s)
- Selen Bozkurt
  - Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Rohan Paul
  - Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA
- Jean Coquet
  - Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Ran Sun
  - Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Imon Banerjee
  - Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA
  - Department of Radiology, Stanford University, Stanford, California, USA
- James D Brooks
  - Department of Urology, Stanford University, Stanford, California, USA
- Tina Hernandez-Boussard
  - Department of Medicine, Biomedical Informatics Research, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA
  - Department of Surgery, Stanford University, Stanford, California, USA
21
Xu J, Wang F, Xu Z, Adekkanattu P, Brandt P, Jiang G, Kiefer RC, Luo Y, Mao C, Pacheco JA, Rasmussen LV, Zhang Y, Isaacson R, Pathak J. Data-driven discovery of probable Alzheimer's disease and related dementia subphenotypes using electronic health records. Learn Health Syst 2020; 4:e10246. [PMID: 33083543 PMCID: PMC7556420 DOI: 10.1002/lrh2.10246] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/18/2020] [Revised: 07/19/2020] [Accepted: 08/06/2020] [Indexed: 12/04/2022] Open
Abstract
Introduction We sought to assess longitudinal electronic health records (EHRs) using machine learning (ML) methods to computationally derive probable Alzheimer's Disease (AD) and related dementia subphenotypes. Methods A retrospective analysis of EHR data from a cohort of 7587 patients seen at a large, multi‐specialty urban academic medical center in New York was conducted. Subphenotypes were derived using hierarchical clustering from 792 probable AD patients (cases) who had received at least one diagnosis of AD using their clinical data. The other 6795 patients, labeled as controls, were matched on age and gender with the cases and randomly selected in the ratio of 9:1. Prediction models with multiple ML algorithms were trained on this cohort using 5‐fold cross‐validation. XGBoost was used to rank the variable importance. Results Four subphenotypes were computationally derived. Subphenotype A (n = 273; 28.2%) had more patients with cardiovascular diseases; subphenotype B (n = 221; 27.9%) had more patients with mental health illnesses, such as depression and anxiety; patients in subphenotype C (n = 183; 23.1%) were overall older (mean (SD) age, 79.5 (5.4) years) and had the most comorbidities including diabetes, cardiovascular diseases, and mental health disorders; and subphenotype D (n = 115; 14.5%) included patients who took anti‐dementia drugs and had sensory problems, such as deafness and hearing impairment. The 0‐year prediction model for AD risk achieved an area under the receiver operating curve (AUC) of 0.764 (SD: 0.02); the 6‐month model, 0.751 (SD: 0.02); the 1‐year model, 0.752 (SD: 0.02); the 2‐year model, 0.749 (SD: 0.03); and the 3‐year model, 0.735 (SD: 0.03), respectively. Based on variable importance, the top‐ranked comorbidities included depression, stroke/transient ischemic attack, hypertension, anxiety, mobility impairments, and atrial fibrillation. 
The top‐ranked medications included anti‐dementia drugs, antipsychotics, antiepileptics, and antidepressants. Conclusions Four subphenotypes were computationally derived that correlated with cardiovascular diseases and mental health illnesses. ML algorithms based on patient demographics, diagnosis, and treatment demonstrated promising results in predicting the risk of developing AD at different time points across an individual's lifespan.
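The cohort arithmetic in this abstract can be checked with a short sketch that uses only the counts printed above (7587 patients, 792 cases, and the four subphenotype sizes). The variable names are illustrative. Under the assumption that each percentage is a share of the 792 cases, the recomputed shares for B, C, and D match the printed values, while subphenotype A comes out at 34.5% rather than the printed 28.2%; the sketch only surfaces that discrepancy and does not settle which figure the authors intended.

```python
# Counts as printed in the abstract.
total_patients = 7587
total_cases = 792
cluster_sizes = {"A": 273, "B": 221, "C": 183, "D": 115}

# The four subphenotypes should partition the case group exactly.
assert sum(cluster_sizes.values()) == total_cases

# Controls are the remaining patients in the cohort.
controls = total_patients - total_cases

# Share of each subphenotype among cases, as a percentage with one decimal.
shares = {name: round(100 * n / total_cases, 1) for name, n in cluster_sizes.items()}

for name, pct in shares.items():
    print(f"Subphenotype {name}: n = {cluster_sizes[name]}, {pct}% of cases")
```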
Affiliation(s)
- Jie Xu
  - Department of Population Health Sciences Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Fei Wang
  - Department of Population Health Sciences Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Zhenxing Xu
  - Department of Population Health Sciences Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Prakash Adekkanattu
  - Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Pascal Brandt
  - Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Guoqian Jiang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Richard C Kiefer
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Yuan Luo
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Chengsheng Mao
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Jennifer A Pacheco
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Luke V Rasmussen
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Yiye Zhang
  - Department of Population Health Sciences Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Richard Isaacson
  - Department of Population Health Sciences Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Jyotishman Pathak
  - Department of Population Health Sciences Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
22
Brandt PS, Kiefer RC, Pacheco JA, Adekkanattu P, Sholle ET, Ahmad FS, Xu J, Xu Z, Ancker JS, Wang F, Luo Y, Jiang G, Pathak J, Rasmussen LV. Toward cross-platform electronic health record-driven phenotyping using Clinical Quality Language. Learn Health Syst 2020; 4:e10233. [PMID: 33083538 PMCID: PMC7556419 DOI: 10.1002/lrh2.10233] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/18/2020] [Revised: 05/31/2020] [Accepted: 06/02/2020] [Indexed: 12/14/2022] Open
Abstract
INTRODUCTION: Electronic health record (EHR)-driven phenotyping is a critical first step in generating biomedical knowledge from EHR data. Despite recent progress, current phenotyping approaches are manual, time-consuming, error-prone, and platform-specific. This results in duplication of effort and highly variable results across systems and institutions, and is not scalable or portable. In this work, we investigate how the nascent Clinical Quality Language (CQL) can address these issues and enable high-throughput, cross-platform phenotyping.
METHODS: We selected a clinically validated heart failure (HF) phenotype definition and translated it into CQL, then developed a CQL execution engine to integrate with the Observational Health Data Sciences and Informatics (OHDSI) platform. We executed the phenotype definition at two large academic medical centers, Northwestern Medicine and Weill Cornell Medicine, and conducted results verification (n = 100) to determine precision and recall. We additionally executed the same phenotype definition against two different data platforms, OHDSI and Fast Healthcare Interoperability Resources (FHIR), using the same underlying dataset and compared the results.
RESULTS: CQL is expressive enough to represent the HF phenotype definition, including Boolean and aggregate operators, and temporal relationships between data elements. The language design also enabled the implementation of a custom execution engine with relative ease, and results verification at both sites revealed that precision and recall were both 100%. Cross-platform execution resulted in identical patient cohorts generated by both data platforms.
CONCLUSIONS: CQL supports the representation of arbitrarily complex phenotype definitions, and our execution engine implementation demonstrated cross-platform execution against two widely used clinical data platforms. The language thus has the potential to help address current limitations with portability in EHR-driven phenotyping and scale in learning health systems.
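As a reading aid for the verification step described in this abstract, the following is a minimal sketch of how precision and recall could be computed when a phenotype cohort is compared against chart-review labels. The function name and the patient identifiers are hypothetical and are not taken from the study or its execution engine.

```python
def precision_recall(predicted: set[str], confirmed: set[str]) -> tuple[float, float]:
    """Precision and recall of a phenotype cohort against chart-review labels."""
    true_pos = len(predicted & confirmed)
    # Precision: share of flagged patients that reviewers confirmed.
    precision = true_pos / len(predicted) if predicted else 0.0
    # Recall: share of confirmed patients that the algorithm flagged.
    recall = true_pos / len(confirmed) if confirmed else 0.0
    return precision, recall


# Hypothetical patient identifiers, not data from the study.
flagged_by_algorithm = {"p01", "p02", "p03", "p04"}
confirmed_by_review = {"p01", "p02", "p03", "p05"}

precision, recall = precision_recall(flagged_by_algorithm, confirmed_by_review)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Both values reach 100%, as reported in the abstract, only when the flagged and confirmed sets are identical.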
Affiliation(s)
- Pascal S. Brandt
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
| | - Richard C. Kiefer
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
| | | | - Prakash Adekkanattu
- Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
| | - Evan T. Sholle
- Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
| | - Faraz S. Ahmad
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
| | - Jie Xu
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Zhenxing Xu
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Jessica S. Ancker
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Yuan Luo
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
| | - Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
| | - Jyotishman Pathak
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Luke V. Rasmussen
- Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
| |
|
23
|
Ahmad FS, Ricket IM, Hammill BG, Eskenazi L, Robertson HR, Curtis LH, Dobi CD, Girotra S, Haynes K, Kizer JR, Kripalani S, Roe MT, Roumie CL, Waitman R, Jones WS, Weiner MG. Computable Phenotype Implementation for a National, Multicenter Pragmatic Clinical Trial: Lessons Learned From ADAPTABLE. Circ Cardiovasc Qual Outcomes 2020; 13:e006292. [PMID: 32466729 DOI: 10.1161/circoutcomes.119.006292] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
BACKGROUND: Many large-scale cardiovascular clinical trials are plagued with escalating costs and low enrollment. Implementing a computable phenotype, which is a set of executable algorithms, to identify a group of clinical characteristics derivable from electronic health records or administrative claims records, is essential to successful recruitment in large-scale pragmatic clinical trials. This methods paper provides an overview of the development and implementation of a computable phenotype in ADAPTABLE (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness), a pragmatic, randomized, open-label clinical trial testing the optimal dose of aspirin for secondary prevention of atherosclerotic cardiovascular disease events.
METHODS AND RESULTS: A multidisciplinary team developed and tested the computable phenotype to identify adults ≥18 years of age with a history of atherosclerotic cardiovascular disease without safety concerns around using aspirin and meeting trial eligibility criteria. Using the computable phenotype, investigators identified over 650 000 potentially eligible patients from the 40 participating sites from Patient-Centered Outcomes Research Network, a network of Clinical Data Research Networks, Patient-Powered Research Networks, and Health Plan Research Networks. Leveraging diverse recruitment methods, sites enrolled 15 076 participants from April 2016 to June 2019. During the process of developing and implementing the ADAPTABLE computable phenotype, several key lessons were learned. The accuracy and utility of a computable phenotype are dependent on the quality of the source data, which can be variable even with a common data model. Local validation and modification were required based on site factors, such as recruitment strategies, data quality, and local coding patterns. Sustained collaboration among a diverse team of researchers is needed during computable phenotype development and implementation.
CONCLUSIONS: The ADAPTABLE computable phenotype served as an efficient method to recruit patients in a multisite pragmatic clinical trial. This process of development and implementation will be informative for future large-scale, pragmatic clinical trials. Registration: URL: https://www.clinicaltrials.gov; Unique identifier: NCT02697916.
Affiliation(s)
- Faraz S Ahmad
- Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL (F.S.A.)
| | - Iben M Ricket
- Louisiana Public Health Institute, New Orleans (I.M.R.)
| | - Bradley G Hammill
- Duke University School of Medicine, Durham, NC (B.G.H., M.T.R., W.S.J.); Duke Clinical Research Institute, Durham, NC (B.G.H., L.E., H.R., L.H.C., M.T.R., W.S.J.)
| | - Lisa Eskenazi
- Duke Clinical Research Institute, Durham, NC (B.G.H., L.E., H.R., L.H.C., M.T.R., W.S.J.)
| | - Holly R Robertson
- Duke Clinical Research Institute, Durham, NC (B.G.H., L.E., H.R., L.H.C., M.T.R., W.S.J.)
| | - Lesley H Curtis
- Duke Clinical Research Institute, Durham, NC (B.G.H., L.E., H.R., L.H.C., M.T.R., W.S.J.)
| | - Cecilia D Dobi
- Department of Clinical Sciences, Lewis Katz School of Medicine at Temple University, Philadelphia, PA (C.D.D.)
| | - Saket Girotra
- University of Iowa Carver College of Medicine, Iowa City (S.G.); Iowa City Veteran Affairs Medical Center (S.G.)
| | - Kevin Haynes
- Scientific Affairs, HealthCore, Inc., Wilmington, DE (K.H.)
| | - Jorge R Kizer
- Cardiology Section, San Francisco Veterans Affairs Health Care System, CA (J.R.K.); Department of Medicine and Department of Epidemiology and Biostatistics, University of California San Francisco (J.R.K.)
| | - Sunil Kripalani
- Department of Medicine, Vanderbilt University Medical Center, Veterans Health Administration-Tennessee Valley Healthcare System Geriatric Research Education Clinical Center, Health Services Research and Development Center, Nashville, TN (S.K., C.L.R.)
| | - Mathew T Roe
- Duke University School of Medicine, Durham, NC (B.G.H., M.T.R., W.S.J.); Duke Clinical Research Institute, Durham, NC (B.G.H., L.E., H.R., L.H.C., M.T.R., W.S.J.)
| | - Christianne L Roumie
- Department of Medicine, Vanderbilt University Medical Center, Veterans Health Administration-Tennessee Valley Healthcare System Geriatric Research Education Clinical Center, Health Services Research and Development Center, Nashville, TN (S.K., C.L.R.)
| | - Russ Waitman
- Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS (R.W.)
| | - W Schuyler Jones
- Duke University School of Medicine, Durham, NC (B.G.H., M.T.R., W.S.J.); Duke Clinical Research Institute, Durham, NC (B.G.H., L.E., H.R., L.H.C., M.T.R., W.S.J.)
| | - Mark G Weiner
- Department of Population Health Sciences, Weill Cornell Medicine, New York Presbyterian-Weill Cornell Campus, New York (M.G.W.)
| |
|
24
|
Adekkanattu P, Jiang G, Luo Y, Kingsbury PR, Xu Z, Rasmussen LV, Pacheco JA, Kiefer RC, Stone DJ, Brandt PS, Yao L, Zhong Y, Deng Y, Wang F, Ancker JS, Campion TR, Pathak J. Evaluating the Portability of an NLP System for Processing Echocardiograms: A Retrospective, Multi-site Observational Study. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020; 2019:190-199. [PMID: 32308812 PMCID: PMC7153064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
While natural language processing (NLP) of unstructured clinical narratives holds the potential for patient care and clinical research, portability of NLP approaches across multiple sites remains a major challenge. This study investigated the portability of an NLP system developed initially at the Department of Veterans Affairs (VA) to extract 27 key cardiac concepts from free-text or semi-structured echocardiograms from three academic medical centers: Weill Cornell Medicine, Mayo Clinic and Northwestern Medicine. While the NLP system showed high precision and recall measurements for four target concepts (aortic valve regurgitation, left atrium size at end systole, mitral valve regurgitation, tricuspid valve regurgitation) across all sites, we found moderate or poor results for the remaining concepts and the NLP system performance varied between individual sites.
Affiliation(s)
| | | | - Yuan Luo
- Northwestern University, Chicago, IL
| | | | | | | | | | | | | | | | - Liang Yao
- Northwestern University, Chicago, IL
| | | | - Yu Deng
- Northwestern University, Chicago, IL
| | - Fei Wang
- Weill Cornell Medicine, New York, NY
| | | | | | | |
|
25
|
Hong N, Wen A, Stone DJ, Tsuji S, Kingsbury PR, Rasmussen LV, Pacheco JA, Adekkanattu P, Wang F, Luo Y, Pathak J, Liu H, Jiang G. Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries. J Biomed Inform 2019; 99:103310. [PMID: 31622801 PMCID: PMC6990976 DOI: 10.1016/j.jbi.2019.103310] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 04/01/2019] [Revised: 09/15/2019] [Accepted: 10/11/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used in modeling and integrating both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identification of patients with obesity and its multiple comorbidities from semi-structured discharge summaries leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR).
METHODS: We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. Two core parts of the framework are: (a) the conversion of discharge summaries into corresponding FHIR resources (Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory) using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers to predict disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used the macro- and micro-averaged precision (P), recall (R), and F1 score (F1) measures to evaluate the classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database.
RESULTS: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as the instances of the FHIR Composition resource consisting of 5677 records with 16 unique section types. After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances were generated. Among the four machine learning classifiers, the random forest algorithm performed the best with F1-micro(0.9466)/F1-macro(0.7887) and F1-micro(0.9536)/F1-macro(0.6524) for intuitive classification (reflecting medical professionals' judgments) and textual classification (reflecting the judgments based on explicitly reported information of diseases), respectively. The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models.
CONCLUSIONS: The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing interpretability of the machine learning-based phenotyping algorithms.
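The gap between the micro- and macro-averaged F1 scores reported in this abstract is easier to interpret with a small worked example. The sketch below is not the authors' evaluation code: the function names, class labels, and predictions are hypothetical, and it only illustrates how the two averages are commonly computed for a single-label, multi-class task.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Micro-averaged (pooled counts) and macro-averaged (mean per class) F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical labels: one rare class ("questionable") is always missed.
y_true = ["present"] * 8 + ["absent"] + ["questionable"]
y_pred = ["present"] * 8 + ["absent"] + ["present"]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"F1-micro={micro:.4f} F1-macro={macro:.4f}")
```

In this toy case the micro average stays high because it pools all decisions, while the macro average drops because the missed rare class counts as much as the frequent one. A similar effect is one plausible reading of the reported difference, although the abstract itself does not state the cause.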
Affiliation(s)
- Na Hong
- Mayo Clinic, Rochester, MN, USA
| | | | | | | | | | - Luke V Rasmussen
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | | | | | - Fei Wang
- Weill Cornell Medicine, New York City, NY, USA
| | - Yuan Luo
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | | | | | | |
|
26
|
Vydiswaran VGV, Zhang Y, Wang Y, Xu H. Special issue of BMC medical informatics and decision making on health natural language processing. BMC Med Inform Decis Mak 2019; 19:76. [PMID: 30943961 PMCID: PMC6448180 DOI: 10.1186/s12911-019-0777-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Affiliation(s)
| | - Yaoyun Zhang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
| | - Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
| | - Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
| |
|