1
Ding S, Zhang S, Hu X, Zou N. Identify and mitigate bias in electronic phenotyping: A comprehensive study from computational perspective. J Biomed Inform 2024:104671. [PMID: 38876452 DOI: 10.1016/j.jbi.2024.104671] [Received: 10/01/2023] [Revised: 05/26/2024] [Accepted: 06/05/2024] [Indexed: 06/16/2024]
Abstract
Electronic phenotyping is a fundamental task that identifies specific groups of patients and plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for related biomedical research and clinical tasks such as disease diagnosis, drug development, and clinical trials. With the development of electronic health records, the performance of electronic phenotyping has been significantly boosted by advanced machine learning techniques. In the healthcare domain, precision and fairness are both essential aspects that should be taken into consideration. However, most related efforts go into designing phenotyping models with higher accuracy, and little attention is paid to the fairness of phenotyping. Neglecting bias in phenotyping leads to subgroups of patients being underrepresented, which in turn affects downstream healthcare activities such as patient recruitment in clinical trials. In this work, we aim to bridge this gap through a comprehensive experimental study that identifies the bias in electronic phenotyping models and evaluates the performance of widely used debiasing methods on these models. We choose pneumonia and sepsis as our target diseases. We benchmark 9 kinds of electronic phenotyping methods, ranging from rule-based to data-driven. We also evaluate 5 bias mitigation strategies covering pre-processing, in-processing, and post-processing. From these extensive experiments, we summarize several findings on the bias identified in phenotyping and on key points of bias mitigation strategies in phenotyping.
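One common way to quantify subgroup bias of the kind described above is to compare true positive rates across patient groups (often called the equal opportunity gap). The sketch below is illustrative only: the abstract does not state which fairness metrics the study used, and the function names, group labels, and toy data are assumptions.

```python
from collections import defaultdict

def tpr_by_group(y_true, y_pred, groups):
    """True positive rate of a phenotyping model within each patient subgroup."""
    positives = defaultdict(int)  # true cases per group
    hits = defaultdict(int)       # true cases the model flagged, per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two subgroups."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())

# Toy data (hypothetical): group "b" has one true case the model misses.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = ["a", "a", "b", "b", "a", "b"]

print(tpr_by_group(y_true, y_pred, groups))
print(equal_opportunity_gap(y_true, y_pred, groups))
```

A gap near zero means true cases are detected at similar rates in every subgroup; a large gap means some subgroup's patients are missed more often.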
Affiliation(s)
- Sirui Ding
  - Department of Computer Science & Engineering, Texas A&M University, College Station, TX, United States
- Shenghan Zhang
  - Department of Biomedical Informatics, Harvard University, Boston, MA, United States
- Xia Hu
  - Department of Computer Science, Rice University, Houston, TX, United States
- Na Zou
  - Department of Industrial Engineering, University of Houston, Houston, TX, United States
2
Yan S, Melnick K, He X, Lyu T, Moor RSF, Still MEH, Mitchell DA, Shenkman EA, Wang H, Guo Y, Bian J, Ghiaseddin AP. Developing a computable phenotype for glioblastoma. Neuro Oncol 2024; 26:1163-1170. [PMID: 38141226 PMCID: PMC11145437 DOI: 10.1093/neuonc/noad249] [Received: 08/26/2023] [Indexed: 12/25/2023] [Open access]
Abstract
BACKGROUND Glioblastoma is the most common malignant brain tumor, and thus it is important to be able to identify patients with this diagnosis for population studies. However, this can be challenging as diagnostic codes are nonspecific. The aim of this study was to create a computable phenotype (CP) for glioblastoma multiforme (GBM) from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR). METHODS We used the University of Florida (UF) Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. We performed multiple iterations to refine the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords through manual chart review of patient data. We then evaluated the performance of the candidate CPs constructed from the relevant codes and keywords. RESULTS We conducted six rounds of manual chart review to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the best F1-score. Overall, the CP rule "if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword" demonstrated the highest F1-score using both structured and unstructured data. Thus, it was selected as the best-performing CP rule. CONCLUSIONS We developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance, which minimizes possible biases from misclassification errors.
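The selected rule and its F1-based evaluation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record fields, the code set, the keyword list, and the toy patients are all hypothetical placeholders, since the abstract does not list the actual codes or keywords.

```python
from dataclasses import dataclass

# Hypothetical placeholders; the study's real code and keyword lists are not in the abstract.
RELEVANT_DX_CODES = {"DX_A", "DX_B"}
RELEVANT_KEYWORDS = ("glioblastoma", "gbm")

@dataclass
class Patient:
    dx_codes: set             # structured diagnosis codes on the record
    note_text: str            # unstructured clinical note text
    chart_review_label: bool  # reference standard from manual chart review

def cp_rule(p: Patient) -> bool:
    """At least 1 relevant diagnosis code AND at least 1 relevant keyword."""
    has_code = len(p.dx_codes & RELEVANT_DX_CODES) >= 1
    text = p.note_text.lower()
    has_keyword = any(k in text for k in RELEVANT_KEYWORDS)
    return has_code and has_keyword

def f1_score(patients) -> float:
    """F1 = 2TP / (2TP + FP + FN), scored against the chart-review labels."""
    tp = sum(cp_rule(p) and p.chart_review_label for p in patients)
    fp = sum(cp_rule(p) and not p.chart_review_label for p in patients)
    fn = sum((not cp_rule(p)) and p.chart_review_label for p in patients)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

toy = [
    Patient({"DX_A"}, "Pathology consistent with glioblastoma.", True),  # true positive
    Patient({"DX_A"}, "No tumor keywords here.", True),                  # false negative
    Patient({"DX_B"}, "GBM ruled out on biopsy.", False),                # false positive
    Patient(set(), "Routine follow-up.", False),                         # true negative
]
print(f1_score(toy))
```

The third toy patient shows why plain keyword matching is crude: a negated mention ("ruled out") still triggers the rule, which is the kind of error iterative chart review is meant to surface.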
Affiliation(s)
- Sandra Yan
  - Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Kaitlyn Melnick
  - Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Xing He
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Tianchen Lyu
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Rachel S F Moor
  - Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Megan E H Still
  - Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Duane A Mitchell
  - Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Elizabeth A Shenkman
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Han Wang
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Ashley P Ghiaseddin
  - Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA
3
Cheung KS. Big data approach in the field of gastric and colorectal cancer research. J Gastroenterol Hepatol 2024; 39:1027-1032. [PMID: 38413187 DOI: 10.1111/jgh.16527] [Received: 01/28/2024] [Accepted: 02/07/2024] [Indexed: 02/29/2024]
Abstract
Big data is characterized by three attributes: volume, variety, and velocity. In the healthcare setting, big data refers to vast datasets that are electronically stored and managed in an automated manner and have the potential to enhance human health and the healthcare system. In this review, gastric cancer (GC) and postcolonoscopy colorectal cancer (PCCRC) are used to illustrate the application of the big data approach in gastrointestinal cancer research. Helicobacter pylori (HP) eradication reduces GC risk by only 46% because of preexisting precancerous lesions. Apart from endoscopy surveillance, identifying medications that modify GC risk is another strategy. Population-based cohort studies showed that long-term use of proton pump inhibitors (PPIs) was associated with higher GC risk after HP eradication, while aspirin and statins were associated with lower risk. While diabetes mellitus conferred 73% higher GC risk, metformin use was associated with 51% lower risk, an effect that was independent of glycemic control. Nonetheless, nonaspirin nonsteroidal anti-inflammatory drugs (NA-NSAIDs) are not associated with lower GC risk. CRC can still occur after an initial colonoscopy in which no cancer was detected (i.e., PCCRC). Between 2005 and 2013, the rate of interval-type PCCRC-3y (defined as CRC diagnosed between 6 and 36 months after an index colonoscopy that was negative for CRC) was 7.9% in Hong Kong, with >80% being distal cancers and with higher cancer-specific mortality compared with detected CRC. Certain clinical and endoscopy-related factors were associated with PCCRC-3y risk. Medications shown to have chemopreventive effects on PCCRC include statins, NA-NSAIDs, and angiotensin-converting enzyme inhibitors/angiotensin receptor blockers.
Affiliation(s)
- Ka Shing Cheung
  - Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong, China
  - Department of Medicine, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
4
De Clercq L, Himmelreich JCL, Harskamp RE. Quality of heart failure registration in primary care: observations from 1 million electronic health records in the Amsterdam Metropolitan Area. Diagnosis (Berl) 2024; 0:dx-2024-0009. [PMID: 38741552 DOI: 10.1515/dx-2024-0009] [Received: 01/10/2024] [Accepted: 04/22/2024] [Indexed: 05/16/2024]
Abstract
OBJECTIVES Proper coding of heart failure (HF) in electronic health records (EHRs) is an important prerequisite for adequate care and research towards this vulnerable patient population. We set out to evaluate the accuracy of registration of HF diagnoses in primary care EHRs. METHODS In a routine primary care database covering the Amsterdam Metropolitan Area, we identified all episodes of care with International Classification of Primary Care (ICPC) codes K77 (decompensatio cordis) or K84.03 (cardiomyopathy) up to 31/12/2021. We also performed two text-based searches to identify HF episodes without an appropriate ICPC-code. An expert panel evaluated all ICPC and text matches for congruence between the assigned codes and notes. RESULTS From a database of 968,433 records we identified 19,106 patients (2.0 %) with a total of 24,011 ICPC-coded HF episodes. Removal of 1,324 episodes found to concern other or uncertain diagnoses and inclusion of 4,582 validated HF episodes identified through text search led to exclusion of 909 (overregistration: 4.8 %) and inclusion of 2,266 additional patients (underregistration: 11.1 %). The inclusion of miscoded HF episodes advanced the first known date of HF diagnosis in 3.9 % of records, with a median shift of 3.45 years. Episode-level underregistration decreased significantly over time, from 23.8 % in 2006 to 10.0 % in 2021. CONCLUSIONS While there is improvement over time, there are still substantial levels of over- and underregistration of HF, emphasizing the need for cautious interpretation of ICPC-coded data. The findings contribute to the understanding of HF registration issues in primary care and provide insights for improving registration practices.
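The reported over- and underregistration percentages can be reproduced from the patient counts in the abstract. The choice of denominators below is an assumption (the abstract does not define them); they were picked because they yield the reported 4.8 % and 11.1 %.

```python
# Counts taken from the abstract.
total_records = 968_433   # records in the database
coded_patients = 19_106   # patients with at least one ICPC-coded HF episode
excluded = 909            # coded patients removed after expert review
added = 2_266             # patients found only through text search

# Assumed denominator: all ICPC-coded patients.
overregistration = excluded / coded_patients

# Assumed denominator: the validated HF population after corrections.
validated_patients = coded_patients - excluded + added
underregistration = added / validated_patients

print(f"coded prevalence:  {coded_patients / total_records:.1%}")
print(f"overregistration:  {overregistration:.1%}")
print(f"underregistration: {underregistration:.1%}")
```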
Affiliation(s)
- Lukas De Clercq
  - Department of General Practice, Amsterdam UMC location, University of Amsterdam, Amsterdam, The Netherlands
  - Personalized Medicine and Digital Health, Amsterdam Public Health, Amsterdam, The Netherlands
- Jelle C L Himmelreich
  - Department of General Practice, Amsterdam UMC location, University of Amsterdam, Amsterdam, The Netherlands
  - Personalized Medicine, Amsterdam Public Health, Amsterdam, The Netherlands
  - Heart Failure & Arrhythmias and Atherosclerosis & Ischemic Syndromes, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
- Ralf E Harskamp
  - Department of General Practice, Amsterdam UMC location, University of Amsterdam, Amsterdam, The Netherlands
  - Personalized Medicine, Amsterdam Public Health, Amsterdam, The Netherlands
  - Heart Failure & Arrhythmias, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
5
Li Y, Yang AY, Marelli A, Li Y. MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records. J Biomed Inform 2024; 153:104638. [PMID: 38631461 DOI: 10.1016/j.jbi.2024.104638] [Received: 12/20/2023] [Revised: 03/07/2024] [Accepted: 04/03/2024] [Indexed: 04/19/2024]
Abstract
Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge.
Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.
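For orientation, the standard Cox partial likelihood that a proportional hazards term of this kind is built on is shown below. The notation is an assumption made for illustration, since the abstract gives no formulas: $\theta_i$ is the covariate vector of patient $i$ (in a topic-model setting, plausibly the patient's topic proportions), $\beta$ the coefficient vector, $\delta_i$ the event indicator, $t_i$ the observed time, and $R(t_i)$ the set of patients still at risk at $t_i$.

```latex
L(\beta) \;=\; \prod_{i \,:\, \delta_i = 1}
  \frac{\exp\!\left(\beta^{\top}\theta_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top}\theta_j\right)}
```

Under this form, a positive coefficient on a topic means that patients with more of that topic have a higher hazard, which is what makes topic-level coefficients readable as mortality associations.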
Affiliation(s)
- Yixuan Li
  - Department of Mathematics and Statistics, McGill University, Montreal, Canada; Mila - Quebec AI institute, Montreal, Canada
- Archer Y Yang
  - Department of Mathematics and Statistics, McGill University, Montreal, Canada; Mila - Quebec AI institute, Montreal, Canada; School of Computer Science, McGill University, Montreal, Canada
- Ariane Marelli
  - McGill Adult Unit for Congenital Heart Disease (MAUDE Unit), McGill University of Health Centre, Montreal, Canada
- Yue Li
  - Mila - Quebec AI institute, Montreal, Canada; School of Computer Science, McGill University, Montreal, Canada
6
Martins C, Neves B, Teixeira AS, Froes M, Sarmento P, Machado J, Magalhães CA, Silva NA, Silva MJ, Leite F. Identifying subgroups in heart failure patients with multimorbidity by clustering and network analysis. BMC Med Inform Decis Mak 2024; 24:95. [PMID: 38622703 PMCID: PMC11020914 DOI: 10.1186/s12911-024-02497-0] [Received: 03/31/2023] [Accepted: 04/03/2024] [Indexed: 04/17/2024] [Open access]
Abstract
This study presents a workflow for identifying and characterizing patients with Heart Failure (HF) and multimorbidity utilizing data from Electronic Health Records. Multimorbidity, the co-occurrence of two or more chronic conditions, poses a significant challenge to healthcare systems. Nonetheless, understanding of patients with multimorbidity, including the most common disease interactions, risk factors, and treatment responses, remains limited, particularly for complex and heterogeneous conditions like HF. We conducted a clustering analysis of 3745 HF patients using demographics, comorbidities, laboratory values, and drug prescriptions. Our analysis revealed four distinct clusters with significant differences in multimorbidity profiles and differing prognostic implications for unplanned hospital admissions. These findings underscore the considerable disease heterogeneity within HF patients and emphasize the potential for improved characterization of patient subgroups for clinical risk stratification through the use of EHR data.
Affiliation(s)
- Catarina Martins
  - Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
  - INESC-ID, Lisboa, Portugal
- Bernardo Neves
  - Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
  - Hospital da Luz Lisboa, Internal Medicine, Luz Saúde, Lisboa, Portugal
  - Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Andreia Sofia Teixeira
  - Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
  - LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Miguel Froes
  - Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Pedro Sarmento
  - Hospital da Luz Lisboa, Internal Medicine, Luz Saúde, Lisboa, Portugal
- Jaime Machado
  - Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Nuno A Silva
  - Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Mário J Silva
  - Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
  - INESC-ID, Lisboa, Portugal
- Francisca Leite
  - Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
7
Vanderbleek JJ, Owensby JK, McAnnally A, England BR, Chen L, Curtis JR, Yun H. Classifying Multimorbidity Using Drug Concepts via the Rx-Risk Comorbidity Index: Methods and Comparative Cross-Sectional Study. Arthritis Care Res (Hoboken) 2024; 76:559-569. [PMID: 37986017 DOI: 10.1002/acr.25273] [Received: 12/12/2022] [Revised: 06/26/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023]
Abstract
OBJECTIVE The study objective was to update a method to identify comorbid conditions using only medication information in circumstances in which diagnosis codes may be undercaptured, such as in single-specialty electronic health records (EHRs), and to compare the distribution of comorbidities across Rx-Risk versus other traditional comorbidity indices. METHODS Using First Databank, RxNorm, and its web-based clients, RxNav and RxClass, we mapped Drug Concept Unique Identifiers (RxCUIs), National Drug Codes (NDCs), and Anatomical Therapeutic Chemical (ATC) codes to Rx-Risk, a medication-focused comorbidity index. In established rheumatoid arthritis (RA) and osteoarthritis (OA) cohorts within the Rheumatology Informatics System for Effectiveness registry, we then compared Rx-Risk with other comorbidity indices, including the Charlson Comorbidity Index, Rheumatic Disease Comorbidity Index (RDCI), and Elixhauser. RESULTS We identified 965 unique ingredient RxCUIs representing the 46 Rx-Risk comorbidity categories. After excluding dosage form and ingredient related RxCUIs, 80,911 unique associated RxCUIs were mapped to the index. Additionally, 187,024 unique NDCs and 354 ATC codes were obtained and mapped to the index categories. When compared to traditional comorbidity indices in the RA cohort, the median score for Rx-Risk (median 6.00 [25th percentile 2, 75th percentile 9]) was much greater than for Charlson (median 0 [25th percentile 0, 75th percentile 0]), RDCI (median 0 [25th percentile 0, 75th percentile 0]), and Elixhauser (median 1 [25th percentile 1, 75th percentile 1]). Analyses of the OA cohort yielded similar results. For patients with a Charlson score of 0 (85% of total), both the RDCI and Elixhauser were close to 1, but the Rx-Risk score ranged from 0 to 16 or more. CONCLUSION The misclassification and under-ascertainment of comorbidities in single-specialty EHRs can largely be overcome by using a medication-focused comorbidity index.
Affiliation(s)
- Jared J Vanderbleek
  - University of Alabama at Birmingham and University of Alabama at Birmingham Hospital
- Bryant R England
  - University of Nebraska Medical Center and VA Nebraska-Western Iowa Health Care System, Omaha
8
Mizuno S, Wagata M, Nagaie S, Ishikuro M, Obara T, Tamiya G, Kuriyama S, Tanaka H, Yaegashi N, Yamamoto M, Sugawara J, Ogishima S. Development of phenotyping algorithms for hypertensive disorders of pregnancy (HDP) and their application in more than 22,000 pregnant women. Sci Rep 2024; 14:6292. [PMID: 38491024 PMCID: PMC10943000 DOI: 10.1038/s41598-024-55914-9] [Received: 06/15/2023] [Accepted: 02/28/2024] [Indexed: 03/18/2024] [Open access]
Abstract
Recently, many phenotyping algorithms for high-throughput cohort identification have been developed. Prospective genome cohort studies are critical resources for precision medicine, but there are many hurdles to precise cohort identification. Consequently, it is important to develop phenotyping algorithms for cohort data collection. Hypertensive disorders of pregnancy (HDP) are a leading cause of maternal morbidity and mortality. In this study, we developed, applied, and validated rule-based phenotyping algorithms for HDP. Two phenotyping algorithms, algorithms 1 and 2, were developed according to American and Japanese guidelines and applied to 22,452 pregnant women in the Birth and Three-Generation Cohort Study of the Tohoku Medical Megabank project. For precise cohort identification, we analyzed both structured data (e.g., laboratory and physiological tests) and unstructured clinical notes. The identified subtypes of HDP were validated against reference standards. Algorithms 1 and 2 identified 7.93% and 8.08% of the subjects as having HDP, respectively, along with their HDP subtypes. Our algorithms performed well, with high positive predictive values (0.96 and 0.90 for algorithms 1 and 2, respectively). Overcoming the hurdle of precise cohort identification, we developed and implemented phenotyping algorithms and precisely identified HDP patients and their subtypes from large-scale cohort data.
Affiliation(s)
- Satoshi Mizuno
  - Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Maiko Wagata
  - Department of Feto-Maternal Medical Science, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Satoshi Nagaie
  - Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Mami Ishikuro
  - Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Taku Obara
  - Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Gen Tamiya
  - Department of Statistical Genetics and Genomics, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Shinichi Kuriyama
  - Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Nobuo Yaegashi
  - Department of Gynecology and Obstetrics, Tohoku University Graduate School of Medicine, Tohoku University, Miyagi, Japan
- Masayuki Yamamoto
  - Department of Biochemistry and Molecular Biology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Junichi Sugawara
  - Department of Gynecology and Obstetrics, Tohoku University Graduate School of Medicine, Tohoku University, Miyagi, Japan
  - Suzuki Memorial Hospital, 3-5-5, Satonomori, Iwanumashi, Miyagi, Japan
- Soichi Ogishima
  - Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Miyagi, Japan
9
Yusuf A, Boyne DJ, O'Sullivan DE, Brenner DR, Cheung WY, Mirza I, Jarada TN. Text analysis framework for identifying mutations among non-small cell lung cancer patients from laboratory data. BMC Med Res Methodol 2024; 24:63. [PMID: 38468224 PMCID: PMC10926579 DOI: 10.1186/s12874-024-02192-8] [Received: 10/31/2022] [Accepted: 02/25/2024] [Indexed: 03/13/2024] [Open access]
Abstract
BACKGROUND Laboratory data can provide great value to support research aimed at reducing the incidence, prolonging survival and enhancing outcomes of cancer. Data is characterized by the information it carries and the format it holds. Data captured in Alberta's biomarker laboratory repository is free text, cluttered and rogue. Such a data format limits its utility and prohibits broader adoption and research development. Text analysis for information extraction from unstructured data can change this and lead to more complete analyses. Previous work on extracting relevant information from free-text, unstructured data employed Natural Language Processing (NLP), Machine Learning (ML), rule-based Information Extraction (IE) methods, or a hybrid combination of them. METHODS In our study, text analysis was performed on Alberta Precision Laboratories data which consisted of 95,854 entries from the Southern Alberta Dataset (SAD) and 6944 entries from the Northern Alberta Dataset (NAD). The data covers all of Alberta and is completely population-based. Our proposed framework is built around rule-based IE methods. It incorporates syntax and lexical analyses to achieve deterministic extraction of data from biomarker laboratory data (i.e., Epidermal Growth Factor Receptor (EGFR) test results). Lexical analysis comprises data cleaning and pre-processing, conversion of Rich Text Format text into readable plain text, and normalization and tokenization of text. The framework then passes the text to the syntax analysis stage, which includes the rule-based method of extracting relevant data. Rule-based patterns of the test result are identified, and a Context Free Grammar then generates the rules of information extraction. Finally, the results are linked with the Alberta Cancer Registry to support real-world cancer research studies.
RESULTS Of the original 5512 entries in the SAD dataset and 5017 entries in the NAD dataset which were filtered for EGFR, the framework yielded 5129 and 3388 extracted EGFR test results from the SAD and NAD datasets, respectively. An accuracy of 97.5% was achieved on a random sample of 362 tests. CONCLUSIONS We presented a text analysis framework to extract specific information from unstructured clinical data. Our proposed framework has shown that it can successfully extract relevant information from EGFR test results.
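The two-stage idea (lexical normalization and tokenization, then rule-based extraction) can be sketched as below. This is a simplified stand-in, not the authors' framework: it uses a single regular expression where the paper describes a Context Free Grammar, it skips Rich Text Format conversion, and the result phrases and sample report strings are hypothetical.

```python
import re
from typing import Optional

# Hypothetical rule: an EGFR mention followed, within the same sentence, by a result phrase.
# "not detected" is listed before "detected" so the longer phrase is tried first.
RESULT_RULE = re.compile(
    r"\b(?:egfr)\b[^.]*?\b(?P<result>not detected|detected|positive|negative)\b"
)

# Map the surface phrases onto two canonical outcomes (an illustrative choice).
CANONICAL = {
    "detected": "positive",
    "positive": "positive",
    "not detected": "negative",
    "negative": "negative",
}

def normalize(text: str) -> str:
    """Lexical stage: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()

def tokenize(text: str) -> list:
    """Lexical stage: split normalized text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text)

def extract_egfr_result(raw: str) -> Optional[str]:
    """Syntax stage: apply the rule; return 'positive', 'negative', or None."""
    clean = normalize(raw)
    if "egfr" not in tokenize(clean):
        return None  # entry does not concern EGFR at all
    match = RESULT_RULE.search(clean)
    return CANONICAL[match.group("result")] if match else None

print(extract_egfr_result("EGFR  mutation:\n NOT DETECTED."))
print(extract_egfr_result("EGFR exon 19 deletion detected"))
print(extract_egfr_result("ALK rearrangement negative"))
```

Because the rule is deterministic, an entry either matches or returns `None`, which is consistent with the abstract reporting fewer extracted results than filtered entries.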
Affiliation(s)
- Amman Yusuf
  - Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
- Devon J Boyne
  - Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
- Dylan E O'Sullivan
  - Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
- Darren R Brenner
  - Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
- Winson Y Cheung
  - Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
- Imran Mirza
  - Alberta Precision Laboratories, Calgary, AB, T2L 2K8, Canada
- Tamer N Jarada
  - Department of Oncology, University of Calgary, Calgary, AB, T2N 4N2, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, AB, T2N 4Z6, Canada
10
Ammar S, Borghoff K, El Mikati IK, Mustafa RA, Noureddine L. Using ICD9/10 codes for identifying ADPKD patients, a validation study. J Nephrol 2024; 37:523-525. [PMID: 37907678 DOI: 10.1007/s40620-023-01780-z] [Received: 06/22/2023] [Accepted: 09/03/2023] [Indexed: 11/02/2023]
Affiliation(s)
- Shahed Ammar
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
  - University of Iowa Carver College of Medicine, Campus Box C44-K, 200 Hawkins Dr., Iowa City, IA, 52242, USA
- Kathleen Borghoff
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Nebraska Medical Center, Omaha, NE, USA
- Ibrahim K El Mikati
  - Outcomes and Implementation Research Unit, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA
- Reem A Mustafa
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Lama Noureddine
  - Division of Nephrology and Hypertension, Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
11
Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023] [Open access]
Abstract
OBJECTIVE High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity). MATERIALS AND METHODS ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB). RESULTS ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average. DISCUSSION ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. CONCLUSION When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
Affiliation(s)
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese
  - Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
|
12
|
Boeker M, Zöller D, Blasini R, Macho P, Helfer S, Behrens M, Prokosch HU, Gulden C. Effectiveness of IT-supported patient recruitment: study protocol for an interrupted time series study at ten German university hospitals. Trials 2024; 25:125. [PMID: 38365848 PMCID: PMC10870691 DOI: 10.1186/s13063-024-07918-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2021] [Accepted: 01/09/2024] [Indexed: 02/18/2024]
Abstract
BACKGROUND As part of the German Medical Informatics Initiative, the MIRACUM project establishes data integration centers across ten German university hospitals. The embedded MIRACUM Use Case "Alerting in Care - IT Support for Patient Recruitment", aims to support the recruitment into clinical trials by automatically querying the repositories for patients satisfying eligibility criteria and presenting them as screening candidates. The objective of this study is to investigate whether the developed recruitment tool has a positive effect on study recruitment within a multi-center environment by increasing the number of participants. Its secondary objective is the measurement of organizational burden and user satisfaction of the provided IT solution. METHODS The study uses an Interrupted Time Series Design with a duration of 15 months. All trials start in the control phase of randomized length with regular recruitment and change to the intervention phase with additional IT support. The intervention consists of the application of a recruitment-support system which uses patient data collected in general care for screening according to specific criteria. The inclusion and exclusion criteria of all selected trials are translated into a machine-readable format using the OHDSI ATLAS tool. All patient data from the data integration centers is regularly checked against these criteria. The primary outcome is the number of participants recruited per trial and week standardized by the targeted number of participants per week and the expected recruitment duration of the specific trial. Secondary outcomes are usability, usefulness, and efficacy of the recruitment support. Sample size calculation based on simple parallel group assumption can demonstrate an effect size of d=0.57 on a significance level of 5% and a power of 80% with a total number of 100 trials (10 per site). Data describing the included trials and the recruitment process is collected at each site. 
The primary analysis will be conducted using linear mixed models with the actual recruitment number per week and trial standardized by the expected recruitment number per week and trial as the dependent variable. DISCUSSION The application of an IT-supported recruitment solution developed in the MIRACUM consortium leads to an increased number of recruited participants in studies at German university hospitals. It supports employees engaged in the recruitment of trial participants and is easy to integrate in their daily work.
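The sample size statement in this abstract (d = 0.57, 5% significance, 80% power, 100 trials) can be roughly checked with a normal approximation. The sketch below assumes the "simple parallel group assumption" means two equal groups of 50 trials compared with a two-sided test; that split, and the function name `two_sample_power`, are illustrative assumptions and do not come from the protocol.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value for the two-sided test
    noncentrality = d * sqrt(n_per_group / 2)  # standardized effect scaled by group size
    # Probability of rejecting in the direction of the effect;
    # the opposite tail is negligible and is ignored here.
    return 1 - z.cdf(z_crit - noncentrality)


# Assumption: 100 trials split evenly into two groups of 50.
power = two_sample_power(d=0.57, n_per_group=50)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the approximation lands close to the 80% power quoted in the abstract; an exact t-test calculation would give a slightly lower value.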
Affiliation(s)
- Martin Boeker
  - Institute of Medical Biometry and Statistics, Medical Faculty and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
  - Chair of Medical Informatics, Institute of Artificial Intelligence and Informatics in Medicine, Klinikum rechts der Isar, School of Medicine and Health, Technical University of Munich, Munich, Germany
- Daniela Zöller
  - Institute of Medical Biometry and Statistics, Medical Faculty and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
- Romina Blasini
  - Institute of Medical Informatics, Justus-Liebig-University Gießen, Gießen, Germany
- Philipp Macho
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz University Medical Center, Mainz, Germany
- Sven Helfer
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Max Behrens
  - Institute of Medical Biometry and Statistics, Medical Faculty and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
- Hans-Ulrich Prokosch
  - Chair of Medical Informatics, Department of Medical Informatics, Biometrics and Epidemiology, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
- Christian Gulden
  - Chair of Medical Informatics, Department of Medical Informatics, Biometrics and Epidemiology, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
|
13
|
He X, Wei R, Huang Y, Chen Z, Lyu T, Bost S, Tong J, Li L, Zhou Y, Guo J, Tang H, Wang F, DeKosky S, Xu H, Chen Y, Zhang R, Xu J, Guo Y, Wu Y, Bian J. Develop and Validate a Computable Phenotype for the Identification of Alzheimer's Disease Patients Using Electronic Health Record Data. medRxiv 2024:2024.02.06.24302389. [PMID: 38370766 PMCID: PMC10871460 DOI: 10.1101/2024.02.06.24302389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
INTRODUCTION Alzheimer's Disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnostic codes. This study aims to develop a more accurate, computable phenotype (CP) for identifying AD patients by using both structured and unstructured EHR data. METHODS We used EHRs from the University of Florida Health (UF Health) system and created rule-based CPs iteratively through manual chart reviews. The CPs were then validated using data from the University of Texas Health Science Center at Houston (UT Health) and the University of Minnesota (UMN). RESULTS Our best-performing CP is "patient has at least 2 AD diagnoses and AD-related keywords", with an F1-score of 0.817 at UF Health, and 0.961 and 0.623 at UT Health and UMN, respectively. DISCUSSION We developed and validated rule-based CPs for AD identification with good performance, crucial for studies that aim to use real-world data like EHRs.
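A rule of the kind quoted in this abstract, and the F1-score used to evaluate it, can be sketched in a few lines. This is only an illustration: the field names (`ad_diagnosis_count`, `notes`), the single keyword, and the four toy patients are hypothetical and are not taken from the study.

```python
def meets_rule(patient, keywords=("alzheimer",), min_diagnoses=2):
    """Rule sketch: at least `min_diagnoses` AD diagnoses AND an AD-related keyword in the notes."""
    has_codes = patient["ad_diagnosis_count"] >= min_diagnoses
    note_text = patient["notes"].lower()
    has_keyword = any(keyword in note_text for keyword in keywords)
    return has_codes and has_keyword


def f1_score(predicted, actual):
    """F1 = harmonic mean of precision and recall for boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy records; "chart_review" stands in for the manual reference label.
patients = [
    {"ad_diagnosis_count": 3, "notes": "History of Alzheimer disease", "chart_review": True},
    {"ad_diagnosis_count": 2, "notes": "Memory complaints, no dementia workup", "chart_review": False},
    {"ad_diagnosis_count": 1, "notes": "Probable Alzheimer's dementia", "chart_review": True},
    {"ad_diagnosis_count": 2, "notes": "Alzheimer's disease, on donepezil", "chart_review": True},
]

predicted = [meets_rule(p) for p in patients]
actual = [p["chart_review"] for p in patients]
f1 = f1_score(predicted, actual)
print(f"F1 on toy data: {f1:.3f}")
```

The third toy patient shows the trade-off such a rule makes: requiring two diagnoses misses a true case that has only one coded diagnosis.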
|
14
|
Dong G, Bate A, Haguinet F, Westman G, Dürlich L, Hviid A, Sessa M. Optimizing Signal Management in a Vaccine Adverse Event Reporting System: A Proof-of-Concept with COVID-19 Vaccines Using Signs, Symptoms, and Natural Language Processing. Drug Saf 2024; 47:173-182. [PMID: 38062261 PMCID: PMC10821983 DOI: 10.1007/s40264-023-01381-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/14/2023] [Indexed: 01/28/2024]
Abstract
INTRODUCTION The Vaccine Adverse Event Reporting System (VAERS) has already been challenged by an extreme increase in the number of individual case safety reports (ICSRs) after the market introduction of coronavirus disease 2019 (COVID-19) vaccines. Evidence from scientific literature suggests that when there is an extreme increase in the number of ICSRs recorded in spontaneous reporting databases (such as the VAERS), an accompanying increase in the number of disproportionality signals (sometimes referred to as 'statistical alerts') generated is expected. OBJECTIVES The objective of this study was to develop a natural language processing (NLP)-based approach to optimize signal management by excluding disproportionality signals related to listed adverse events following immunization (AEFIs). COVID-19 vaccines were used as a proof-of-concept. METHODS The VAERS was used as a data source, and the Finding Associated Concepts with Text Analysis (FACTA+) was used to extract signs and symptoms of listed AEFIs from MEDLINE for COVID-19 vaccines. Disproportionality analyses were conducted according to guidelines and recommendations provided by the US Centers for Disease Control and Prevention. By using signs and symptoms of listed AEFIs, we computed the proportion of disproportionality signals dismissed for COVID-19 vaccines using this approach. Nine NLP techniques, including Generative Pre-Trained Transformer 3.5 (GPT-3.5), were used to automatically retrieve Medical Dictionary for Regulatory Activities Preferred Terms (MedDRA PTs) from signs and symptoms extracted from FACTA+. RESULTS Overall, 17% of disproportionality signals for COVID-19 vaccines were dismissed as they reported signs and symptoms of listed AEFIs. Eight of nine NLP techniques used to automatically retrieve MedDRA PTs from signs and symptoms extracted from FACTA+ showed suboptimal performance. GPT-3.5 achieved an accuracy of 78% in correctly assigning MedDRA PTs. 
CONCLUSION Our approach reduced the need for manual exclusion of disproportionality signals related to listed AEFIs and may lead to better optimization of time and resources in signal management.
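To make the idea of a disproportionality signal concrete, the sketch below computes a proportional reporting ratio (PRR), one commonly used disproportionality measure, and then drops signals whose event term is already a listed AEFI. The abstract does not state which measure or thresholds the authors used, so the choice of PRR, the report counts, and the term lists are all illustrative assumptions.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of spontaneous reports.

    a: vaccine of interest, event of interest
    b: vaccine of interest, all other events
    c: all other vaccines, event of interest
    d: all other vaccines, all other events
    """
    return (a / (a + b)) / (c / (c + d))


# Hypothetical report counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=980, c=50, d=9950)
print(f"PRR: {prr:.1f}")

# Hypothetical term lists: signals whose event is a listed AEFI are dismissed.
listed_aefi_terms = {"pyrexia", "headache"}
signal_terms = ["pyrexia", "myocarditis", "headache", "tinnitus"]
remaining_signals = [term for term in signal_terms if term not in listed_aefi_terms]
print(remaining_signals)
```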
Affiliation(s)
- Guojun Dong
  - Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100, Copenhagen, Denmark
- Andrew Bate
  - Global Safety, GSK, Brentford, UK
  - Department of Non‑Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Gabriel Westman
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Luise Dürlich
  - Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
  - Department of Computer Science, RISE Research Institutes of Sweden, Kista, Sweden
- Anders Hviid
  - Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100, Copenhagen, Denmark
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
- Maurizio Sessa
  - Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100, Copenhagen, Denmark
|
15
|
Mollalo A, Hamidi B, Lenert L, Alekseyenko AV. Application of Spatial Analysis for Electronic Health Records: Characterizing Patient Phenotypes and Emerging Trends. Research Square 2024:rs.3.rs-3443865. [PMID: 37886509 PMCID: PMC10602163 DOI: 10.21203/rs.3.rs-3443865/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Background Electronic health records (EHR) commonly contain patient addresses that provide valuable data for geocoding and spatial analysis, enabling more comprehensive descriptions of individual patients for clinical purposes. Despite the widespread use of EHR in clinical decision support and interventions, no systematic review has examined the extent to which spatial analysis is used to characterize patient phenotypes. Objective This study reviews advanced spatial analyses that employed individual-level health data from EHR within the US to characterize patient phenotypes. Methods We systematically evaluated English-language peer-reviewed articles from PubMed/MEDLINE, Scopus, Web of Science, and Google Scholar databases from inception to August 20, 2023, without imposing constraints on time, study design, or specific health domains. Results Only 49 articles met the eligibility criteria. These articles utilized diverse spatial methods, with a predominant focus on clustering techniques, while spatiotemporal analysis (frequentist and Bayesian) and modeling were relatively underexplored. A noteworthy surge (n = 42, 85.7%) in publications was observed post-2017. The publications investigated a variety of adult and pediatric clinical areas, including infectious disease, endocrinology, and cardiology, using phenotypes defined over a range of data domains, such as demographics, diagnoses, and visits. The primary health outcomes investigated were asthma, hypertension, and diabetes. Notably, patient phenotypes involving genomics, imaging, and notes were rarely utilized. Conclusions This review underscores the growing interest in spatial analysis of EHR-derived data and highlights knowledge gaps in clinical health, phenotype domains, and spatial methodologies. Additionally, this review proposes guidelines for harnessing the potential of spatial analysis to enhance the context of individual patients for future clinical decision support.
|
16
|
Schopow N, Osterhoff G, Baur D. Applications of the Natural Language Processing Tool ChatGPT in Clinical Practice: Comparative Study and Augmented Systematic Review. JMIR Med Inform 2023; 11:e48933. [PMID: 38015610 DOI: 10.2196/48933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 06/20/2023] [Accepted: 08/25/2023] [Indexed: 11/29/2023]
Abstract
BACKGROUND This research integrates a comparative analysis of the performance of human researchers and OpenAI's ChatGPT in systematic review tasks and describes an assessment of the application of natural language processing (NLP) models in clinical practice through a review of 5 studies. OBJECTIVE This study aimed to evaluate the reliability between ChatGPT and human researchers in extracting key information from clinical articles, and to investigate the practical use of NLP in clinical settings as evidenced by selected studies. METHODS The study design comprised a systematic review of clinical articles executed independently by human researchers and ChatGPT. The level of agreement between and within raters for parameter extraction was assessed using the Fleiss and Cohen κ statistics. RESULTS The comparative analysis revealed a high degree of concordance between ChatGPT and human researchers for most parameters, with less agreement for study design, clinical task, and clinical implementation. The review identified 5 significant studies that demonstrated the diverse applications of NLP in clinical settings. These studies' findings highlight the potential of NLP to improve clinical efficiency and patient outcomes in various contexts, from enhancing allergy detection and classification to improving quality metrics in psychotherapy treatments for veterans with posttraumatic stress disorder. CONCLUSIONS Our findings underscore the potential of NLP models, including ChatGPT, in performing systematic reviews and other clinical tasks. Despite certain limitations, NLP models present a promising avenue for enhancing health care efficiency and accuracy. Future studies must focus on broadening the range of clinical applications and exploring the ethical considerations of implementing NLP applications in health care settings.
Affiliation(s)
- Nikolas Schopow
  - Department for Orthopedics, Trauma Surgery and Plastic Surgery, University Hospital Leipzig, Leipzig, Germany
- Georg Osterhoff
  - Department for Orthopedics, Trauma Surgery and Plastic Surgery, University Hospital Leipzig, Leipzig, Germany
- David Baur
  - Department for Orthopedics, Trauma Surgery and Plastic Surgery, University Hospital Leipzig, Leipzig, Germany
|
17
|
Meier R, Grischott T, Rachamin Y, Jäger L, Senn O, Rosemann T, Burgstaller JM, Markun S. Importance of different electronic medical record components for chronic disease identification in a Swiss primary care database: a cross-sectional study. Swiss Med Wkly 2023; 153:40107. [PMID: 37854021 DOI: 10.57187/smw.2023.40107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2023]
Abstract
BACKGROUND Primary care databases collect electronic medical records with routine data from primary care patients. The identification of chronic diseases in primary care databases often integrates information from various electronic medical record components (EMR-Cs) used by primary care providers. This study aimed to estimate the prevalence of selected chronic conditions using a large Swiss primary care database and to examine the importance of different EMR-Cs for case identification. METHODS Cross-sectional study with 120,608 patients of 128 general practitioners in the Swiss FIRE ("Family Medicine Research using Electronic Medical Records") primary care database in 2019. Sufficient criteria on three individual EMR-Cs, namely medication, clinical or laboratory parameters and reasons for encounters, were combined by logical disjunction into definitions of 49 chronic conditions; then prevalence estimates and measures of importance of the individual EMR-Cs for case identification were calculated. RESULTS A total of 185,535 cases (i.e. patients with a specific chronic condition) were identified. Prevalence estimates were 27.5% (95% CI: 27.3-27.8%) for hypertension, 13.5% (13.3-13.7%) for dyslipidaemia and 6.6% (6.4-6.7%) for diabetes mellitus. Of all cases, 87.1% (87.0-87.3%) were identified via medication, 22.1% (21.9-22.3%) via clinical or laboratory parameters and 19.3% (19.1-19.5%) via reasons for encounters. The majority (65.4%) of cases were identifiable solely through medication. Of the two other EMR-Cs, clinical or laboratory parameters was most important for identifying cases of chronic kidney disease, anorexia/bulimia nervosa and obesity whereas reasons for encounters was crucial for identifying many low-prevalence diseases as well as cancer, heart disease and osteoarthritis. CONCLUSIONS The EMR-C medication was most important for chronic disease identification overall, but identification varied strongly by disease. 
The analysis of the importance of different EMR-Cs for estimating prevalence revealed strengths and weaknesses of the disease definitions used within the FIRE primary care database. Although prioritising specificity over sensitivity in the EMR-C criteria may have led to underestimation of most prevalences, their sex- and age-specific patterns were consistent with published figures for Swiss general practice.
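The case definition described in this abstract, where sufficient criteria on three EMR-Cs are combined by logical disjunction, can be illustrated with a small sketch. The four toy patients and the field names below are hypothetical; the code only shows how an OR-combined definition yields a prevalence estimate, per-component shares, and the share of cases identifiable solely through medication.

```python
COMPONENTS = ("medication", "clinical_lab", "encounter_reason")

# Hypothetical toy data: True means the sufficient criterion on that EMR-C is met.
patients = [
    {"medication": True, "clinical_lab": False, "encounter_reason": False},
    {"medication": True, "clinical_lab": True, "encounter_reason": False},
    {"medication": False, "clinical_lab": False, "encounter_reason": True},
    {"medication": False, "clinical_lab": False, "encounter_reason": False},
]


def is_case(patient):
    """Logical disjunction: meeting the criterion on any one component is sufficient."""
    return any(patient[component] for component in COMPONENTS)


cases = [p for p in patients if is_case(p)]
prevalence = len(cases) / len(patients)

# Share of cases that each component would identify (shares can overlap).
share_via = {c: sum(p[c] for p in cases) / len(cases) for c in COMPONENTS}

# Share of cases identified by medication and by no other component.
solely_medication = sum(
    p["medication"] and not (p["clinical_lab"] or p["encounter_reason"]) for p in cases
) / len(cases)

print(prevalence, share_via, solely_medication)
```

Because the components overlap, the per-component shares need not sum to 100%, which matches the pattern of percentages reported in the abstract.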
Affiliation(s)
- Rahel Meier
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Thomas Grischott
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Yael Rachamin
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Levy Jäger
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Oliver Senn
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Thomas Rosemann
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Jakob M Burgstaller
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Stefan Markun
  - Institute of Primary Care, University Hospital Zurich, University of Zurich, Zurich, Switzerland
|
18
|
Dhingra LS, Shen M, Mangla A, Khera R. Cardiovascular Care Innovation through Data-Driven Discoveries in the Electronic Health Record. Am J Cardiol 2023; 203:136-148. [PMID: 37499593 PMCID: PMC10865722 DOI: 10.1016/j.amjcard.2023.06.104] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/28/2023] [Revised: 05/24/2023] [Accepted: 06/29/2023] [Indexed: 07/29/2023]
Abstract
The electronic health record (EHR) represents a rich source of patient information, increasingly being leveraged for cardiovascular research. Although its primary use remains the seamless delivery of health care, the various longitudinally aggregated structured and unstructured data elements for each patient within the EHR can define the computational phenotypes of disease and care signatures and their association with outcomes. Although structured data elements, such as demographic characteristics, laboratory measurements, problem lists, and medications, are easily extracted, unstructured data are underused. The latter include free text in clinical narratives, documentation of procedures, and reports of imaging and pathology. Rapid scaling up of data storage and rapid innovation in natural language processing and computer vision can power insights from unstructured data streams. However, despite an array of opportunities for research using the EHR, specific expertise is necessary to adequately address confidentiality, accuracy, completeness, and heterogeneity challenges in EHR-based research. These often require methodological innovation and best practices to design and conduct successful research studies. Our review discusses these challenges and their proposed solutions. In addition, we highlight the ongoing innovations in federated learning in the EHR through a greater focus on common data models and discuss ongoing work that defines such an approach to large-scale, multicenter, federated studies. Such parallel improvements in technology and research methods enable innovative care and optimization of patient outcomes.
Affiliation(s)
- Miles Shen
  - Section of Cardiovascular Medicine, Department of Internal Medicine; Department of Internal Medicine
- Anjali Mangla
  - Section of Cardiovascular Medicine, Department of Internal Medicine; Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut
- Rohan Khera
  - Section of Cardiovascular Medicine, Department of Internal Medicine; Center for Outcomes Research and Evaluation (CORE), Yale New Haven Hospital, New Haven, Connecticut; Section of Health Informatics, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut; Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut
|
19
|
Sathe NA, Xian S, Mabrey FL, Crosslin DR, Mooney SD, Morrell ED, Lybarger K, Yetisgen M, Jarvik GP, Bhatraju PK, Wurfel MM. Evaluating construct validity of computable acute respiratory distress syndrome definitions in adults hospitalized with COVID-19: an electronic health records based approach. BMC Pulm Med 2023; 23:292. [PMID: 37559024 PMCID: PMC10413524 DOI: 10.1186/s12890-023-02560-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 07/11/2023] [Indexed: 08/11/2023]
Abstract
BACKGROUND Evolving ARDS epidemiology and management during COVID-19 have prompted calls to reexamine the construct validity of Berlin criteria, which have been rarely evaluated in real-world data. We developed a Berlin ARDS definition (EHR-Berlin) computable in electronic health records (EHR) to (1) assess its construct validity, and (2) assess how expanding its criteria affected validity. METHODS We performed a retrospective cohort study at two tertiary care hospitals with one EHR, among adults hospitalized with COVID-19 February 2020-March 2021. We assessed five candidate definitions for ARDS: the EHR-Berlin definition modeled on Berlin criteria, and four alternatives informed by recent proposals to expand criteria and include patients on high-flow oxygen (EHR-Alternative 1), relax imaging criteria (EHR-Alternatives 2-3), and extend timing windows (EHR-Alternative 4). We evaluated two aspects of construct validity for the EHR-Berlin definition: (1) criterion validity: agreement with manual ARDS classification by experts, available in 175 patients; (2) predictive validity: relationships with hospital mortality, assessed by Pearson r and by area under the receiver operating curve (AUROC). We assessed predictive validity and timing of identification of EHR-Berlin definition compared to alternative definitions. RESULTS Among 765 patients, mean (SD) age was 57 (18) years and 471 (62%) were male. The EHR-Berlin definition classified 171 (22%) patients as ARDS, which had high agreement with manual classification (kappa 0.85), and was associated with mortality (Pearson r = 0.39; AUROC 0.72, 95% CI 0.68, 0.77). In comparison, EHR-Alternative 1 classified 219 (29%) patients as ARDS, maintained similar relationships to mortality (r = 0.40; AUROC 0.74, 95% CI 0.70, 0.79, Delong test P = 0.14), and identified patients earlier in their hospitalization (median 13 vs. 15 h from admission, Wilcoxon signed-rank test P < 0.001). 
EHR-Alternative 3, which removed imaging criteria, had similar correlation (r = 0.41) but better discrimination for mortality (AUROC 0.76, 95% CI 0.72, 0.80; P = 0.036), and identified patients median 2 h (P < 0.001) from admission. CONCLUSIONS The EHR-Berlin definition can enable ARDS identification with high criterion validity, supporting large-scale study and surveillance. There are opportunities to expand the Berlin criteria that preserve predictive validity and facilitate earlier identification.
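Criterion validity in this abstract is summarized with Cohen's kappa, which corrects the observed agreement between two classifications for the agreement expected by chance. The sketch below computes kappa from a 2x2 agreement table; the counts are made up for illustration (they sum to 175 only to mirror the size of the manually reviewed subset) and are not the study's data.

```python
def cohen_kappa(both_pos, only_a_pos, only_b_pos, both_neg):
    """Cohen's kappa for two binary raters, from the four cells of a 2x2 table."""
    n = both_pos + only_a_pos + only_b_pos + both_neg
    observed = (both_pos + both_neg) / n   # raw agreement
    a_pos = (both_pos + only_a_pos) / n    # rater A positive rate
    b_pos = (both_pos + only_b_pos) / n    # rater B positive rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical counts: rater A = computable definition, rater B = manual expert review.
kappa = cohen_kappa(both_pos=40, only_a_pos=5, only_b_pos=5, both_neg=125)
print(f"kappa: {kappa:.2f}")
```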
Affiliation(s)
- Neha A Sathe
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
- Su Xian
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- F Linzee Mabrey
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
- David R Crosslin
  - Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Eric D Morrell
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
- Kevin Lybarger
  - Department of Information Sciences and Technology, George Mason University, Fairfax, VA, USA
- Meliha Yetisgen
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Gail P Jarvik
  - Department of Genome Sciences and Division of Medical Genetics, Department of Medicine, University of Washington Medical Center, Seattle, WA, USA
- Pavan K Bhatraju
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
- Mark M Wurfel
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, 325 9th Avenue HMC #359640, Seattle, WA, 98104-2499, USA
|
20
|
Smith G, Miller A, Marra DE, Wu Y, Bian J, Maraganore DM, Anton S. Evaluation of a Computable Phenotype for Successful Cognitive Aging. Mayo Clin Proc Innov Qual Outcomes 2023; 7:212-221. [PMID: 37304063 PMCID: PMC10250575 DOI: 10.1016/j.mayocpiqo.2023.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Objective To establish, apply, and evaluate a computable phenotype for the recruitment of individuals with successful cognitive aging. Participants and Methods Interviews with 10 aging experts identified electronic health record (EHR)-available variables representing successful aging among individuals aged 85 years and older. On the basis of the identified variables, we developed a rule-based computable phenotype algorithm composed of 17 eligibility criteria. Starting September 1, 2019, we applied the computable phenotype algorithm to all living persons aged 85 years and older at the University of Florida Health, which identified 24,024 individuals. This sample comprised 13,841 (58%) women, 13,906 (58%) Whites, and 16,557 (69%) non-Hispanics. A priori permission to be contacted for research had been obtained for 11,898 individuals, of whom 470 responded to study announcements and 333 consented to evaluation. Then, we contacted those who consented to evaluate whether their cognitive and functional status clinically met our successful cognitive aging criteria of a modified Telephone Interview for Cognitive Status score of more than 27 and Geriatric Depression Scale of less than 6. The study was completed on December 31, 2022. Results The computable phenotype identified 45% of living persons aged 85 years and older in the University of Florida Health EHR database as successfully aged. Approximately 4% of these responded to study announcements and 333 consented, of whom 218 (65%) met successful cognitive aging criteria through direct evaluation. Conclusion The study evaluated a computable phenotype algorithm for the recruitment of individuals for a successful aging study using large-scale EHRs. Our study provides proof of concept of using big data and informatics as aids for the recruitment of individuals for prospective cohort studies.
Affiliation(s)
- Glenn Smith
  - Department of Clinical and Health Psychology, University of Florida, Gainesville
- Amber Miller
  - Department of Neurology, College of Medicine, University of Florida, Gainesville
- David E. Marra
  - Department of Psychology, VA Boston Healthcare System, Boston, MA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville
- Stephen Anton
  - Department of Clinical and Health Psychology, University of Florida, Gainesville
  - Department of Physiology and Aging, University of Florida, Gainesville
|
21
|
Noaeen M, Amini S, Bhasker S, Ghezelsefli Z, Ahmed A, Jafarinezhad O, Abad ZSH. Unlocking the Power of EHRs: Harnessing Unstructured Data for Machine Learning-based Outcome Predictions. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083058 DOI: 10.1109/embc40787.2023.10340232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
The integration of Electronic Health Records (EHRs) with Machine Learning (ML) models has become imperative in examining patient outcomes due to the vast amounts of clinical data they provide. However, critical information regarding social and behavioral factors that affect health, such as social isolation, stress, and mental health complexities, is often recorded in unstructured clinical notes, hindering its accessibility. This has resulted in an over-reliance on clinical data in current EHR-based research, potentially leading to disparities in health outcomes. This study aims to evaluate the impact of incorporating patient-specific context from unstructured EHR data on the accuracy and stability of ML algorithms for predicting mortality, using the MIMIC III database. Results from the study confirmed the significance of incorporating patient-specific information into prediction models, leading to a notable improvement in the discriminatory power and robustness of the ML algorithms. Furthermore, the findings underline the importance of considering non-clinical factors related to a patient's daily life, in addition to clinical factors, when making predictions about patient outcomes. The advent of advanced generative models, such as GPT-4, presents new opportunities for effectively extracting social and behavioral factors from unstructured clinical notes, further enhancing the accuracy and stability of ML algorithms in predicting patient outcomes. The results of our study have significant ramifications for improving ML in clinical decision support and patient outcome predictions, specifically highlighting the potential role of generative models like GPT-4 in advancing ML-based outcome predictions.
22
Gendrin A, Souliotis L, Loudon-Griffiths J, Aggarwal R, Amoako D, Desouza G, Dimitrievska S, Metcalfe P, Louvet E, Sahni H. Identifying Patient Populations in Texts Describing Drug Approvals Through Deep Learning-Based Information Extraction: Development of a Natural Language Processing Algorithm. JMIR Form Res 2023; 7:e44876. [PMID: 37347514 DOI: 10.2196/44876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 03/30/2023] [Accepted: 04/17/2023] [Indexed: 06/23/2023] Open
Abstract
BACKGROUND New drug treatments are regularly approved, and it is challenging to remain up-to-date in this rapidly changing environment. Fast and accurate visualization is important to allow a global understanding of the drug market. Automation of this information extraction provides a helpful starting point for the subject matter expert, helps to mitigate human errors, and saves time. OBJECTIVE We aimed to semiautomate disease population extraction from the free text of oncology drug approval descriptions from the BioMedTracker database for 6 selected drug targets. More specifically, we intended to extract (1) line of therapy, (2) stage of cancer of the patient population described in the approval, and (3) the clinical trials that provide evidence for the approval. We aimed to use these results in downstream applications, aiding the searchability of relevant content against related drug project sources. METHODS We fine-tuned a state-of-the-art deep learning model, Bidirectional Encoder Representations from Transformers, for each of the 3 desired outputs. We independently applied rule-based text mining approaches. We compared the performances of deep learning and rule-based approaches and selected the best method, which was then applied to new entries. The results were manually curated by a subject matter expert and then used to train new models. RESULTS The training data set is currently small (433 entries) and will enlarge over time when new approval descriptions become available or if a choice is made to take another drug target into account. The deep learning models achieved 61% and 56% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively, which were treated as classification tasks. Trial identification is treated as a named entity recognition task, and the 5-fold cross-validated F1-score is currently 87%. 
Although the scores of the classification tasks could seem low, the models comprise 5 classes each, and such scores are a marked improvement when compared to random classification. Moreover, we expect improved performance as the input data set grows, since deep learning models need to be trained on a large enough amount of data to be able to learn the task they are taught. The rule-based approach achieved 60% and 74% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively. No attempt was made to define a rule-based approach for trial identification. CONCLUSIONS We developed a natural language processing algorithm that is currently assisting subject matter experts in disease population extraction, which supports health authority approvals. This algorithm achieves semiautomation, enabling subject matter experts to leverage the results for deeper analysis and to accelerate information retrieval in a crowded clinical environment such as oncology.
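The comparison against random classification can be made concrete with a short sketch. It assumes uniform guessing over balanced classes, which the abstract does not state, so the 20% baseline is an illustrative assumption. The accuracy values are the 5-fold cross-validated figures reported above; the names `chance_accuracy`, `REPORTED_ACCURACY`, and `best_method` are hypothetical.

```python
def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over balanced classes."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    return 1.0 / n_classes


# 5-fold cross-validated accuracies as reported in the abstract.
REPORTED_ACCURACY = {
    "line_of_therapy": {"deep_learning": 0.61, "rule_based": 0.60},
    "stage_of_cancer": {"deep_learning": 0.56, "rule_based": 0.74},
}


def best_method(scores: dict) -> str:
    """Return the name of the method with the highest accuracy for one task."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    baseline = chance_accuracy(5)  # each classification task has 5 classes
    for task, scores in REPORTED_ACCURACY.items():
        winner = best_method(scores)
        margin = scores[winner] - baseline
        print(f"{task}: {winner} ({scores[winner]:.0%}, {margin:+.0%} vs chance)")
```

Under that assumption, every reported accuracy sits well above the 20% baseline, and the better-scoring method differs by task: deep learning for line of therapy, rule-based for stage of cancer.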
23
Oommen C, Howlett-Prieto Q, Carrithers MD, Hier DB. Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records. Front Digit Health 2023; 5:1075771. [PMID: 37383943 PMCID: PMC10294690 DOI: 10.3389/fdgth.2023.1075771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 05/26/2023] [Indexed: 06/30/2023] Open
Abstract
The extraction of patient signs and symptoms recorded as free text in electronic health records is critical for precision medicine. Once extracted, signs and symptoms can be made computable by mapping to signs and symptoms in an ontology. Extracting signs and symptoms from free text is tedious and time-consuming. Prior studies have suggested that inter-rater agreement for clinical concept extraction is low. We have examined inter-rater agreement for annotating neurologic concepts in clinical notes from electronic health records. After training on the annotation process, the annotation tool, and the supporting neuro-ontology, three raters annotated 15 clinical notes in three rounds. Inter-rater agreement between the three annotators was high for text span and category label. A machine annotator based on a convolutional neural network had a high level of agreement with the human annotators but one that was lower than human inter-rater agreement. We conclude that high levels of agreement between human annotators are possible with appropriate training and annotation tools. Furthermore, more training examples combined with improvements in neural networks and natural language processing should make machine annotators capable of high throughput automated clinical concept extraction with high levels of agreement with human annotators.
Affiliation(s)
- Chelsea Oommen
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Quentin Howlett-Prieto
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Michael D. Carrithers
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Daniel B. Hier
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States
24
Alsaleh MM, Allery F, Choi JW, Hama T, McQuillin A, Wu H, Thygesen JH. Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques: A systematic review. Int J Med Inform 2023; 175:105088. [PMID: 37156169 DOI: 10.1016/j.ijmedinf.2023.105088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/23/2023] [Accepted: 05/01/2023] [Indexed: 05/10/2023]
Abstract
OBJECTIVE Disease comorbidity is a major challenge in healthcare affecting the patient's quality of life and costs. AI-based prediction of comorbidities can overcome this issue by improving precision medicine and providing holistic care. The objective of this systematic literature review was to identify and summarise existing machine learning (ML) methods for comorbidity prediction and evaluate the interpretability and explainability of the models. MATERIALS AND METHODS The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework was used to identify articles in three databases: Ovid Medline, Web of Science and PubMed. The literature search covered a broad range of terms for the prediction of disease comorbidity and ML, including traditional predictive modelling. RESULTS Of 829 unique articles, 58 full-text papers were assessed for eligibility. A final set of 22 articles with 61 ML models was included in this review. Of the identified ML models, 33 models achieved relatively high accuracy (80-95%) and AUC (0.80-0.89). Overall, 72% of studies had high or unclear concerns regarding the risk of bias. DISCUSSION This systematic review is the first to examine the use of ML and explainable artificial intelligence (XAI) methods for comorbidity prediction. The chosen studies focused on a limited scope of comorbidities ranging from 1 to 34 (mean = 6), and no novel comorbidities were found due to limited phenotypic and genetic data. The lack of standard evaluation for XAI hinders fair comparisons. CONCLUSION A broad range of ML methods has been used to predict the comorbidities of various disorders. With further development of explainable ML capacity in the field of comorbidity prediction, there is a significant possibility of identifying unmet health needs by highlighting comorbidities in patient groups that were not previously recognised to be at risk for particular comorbidities.
Affiliation(s)
- Mohanad M Alsaleh
  - Institute of Health Informatics, University College London, London, UK; Department of Health Informatics, College of Public Health and Health Informatics, Qassim University, Al Bukayriyah, Saudi Arabia
- Freya Allery
  - Institute of Health Informatics, University College London, London, UK
- Jung Won Choi
  - Institute of Health Informatics, University College London, London, UK
- Tuankasfee Hama
  - Institute of Health Informatics, University College London, London, UK
- Honghan Wu
  - Institute of Health Informatics, University College London, London, UK
- Johan H Thygesen
  - Institute of Health Informatics, University College London, London, UK
25
Daniali M, Galer PD, Lewis-Smith D, Parthasarathy S, Kim E, Salvucci DD, Miller JM, Haag S, Helbig I. Enriching representation learning using 53 million patient notes through human phenotype ontology embedding. Artif Intell Med 2023; 139:102523. [PMID: 37100502 PMCID: PMC10782859 DOI: 10.1016/j.artmed.2023.102523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 02/17/2023] [Accepted: 02/23/2023] [Indexed: 03/04/2023]
Abstract
The Human Phenotype Ontology (HPO) is a dictionary of >15,000 clinical phenotypic terms with defined semantic relationships, developed to standardize phenotypic analysis. Over the last decade, the HPO has been used to accelerate the implementation of precision medicine into clinical practice. In addition, recent research in representation learning, specifically in graph embedding, has led to notable progress in automated prediction via learned features. Here, we present a novel approach to phenotype representation by incorporating phenotypic frequencies based on 53 million full-text health care notes from >1.5 million individuals. We demonstrate the efficacy of our proposed phenotype embedding technique by comparing our work to existing phenotypic similarity-measuring methods. Using phenotype frequencies in our embedding technique, we are able to identify phenotypic similarities that surpass current computational models. Furthermore, our embedding technique exhibits a high degree of agreement with domain experts' judgment. By transforming complex and multidimensional phenotypes from the HPO format into vectors, our proposed method enables efficient representation of these phenotypes for downstream tasks that require deep phenotyping. This is demonstrated in a patient similarity analysis and can further be applied to disease trajectory and risk prediction.
Affiliation(s)
- Maryam Daniali
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Peter D Galer
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- David Lewis-Smith
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK; Department of Clinical Neurosciences, Royal Victoria Infirmary, Newcastle-upon-Tyne, UK
- Shridhar Parthasarathy
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Edward Kim
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA
- Dario D Salvucci
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA
- Jeffrey M Miller
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Scott Haag
  - Department of Computer Science, Drexel University, Philadelphia, PA, USA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ingo Helbig
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy Neuro Genetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
26
Callahan TJ, Stefanksi AL, Ostendorf DM, Wyrwa JM, Davies SJD, Hripcsak G, Hunter LE, Kahn MG. Characterizing Patient Representations for Computational Phenotyping. AMIA Annu Symp Proc 2023; 2022:319-328. [PMID: 37128436 PMCID: PMC10148332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Patient representation learning methods create rich representations of complex data and have potential to further advance the development of computational phenotypes (CP). Currently, these methods are either applied to small predefined concept sets or all available patient data, limiting the potential for novel discovery and reducing the explainability of the resulting representations. We report on an extensive, data-driven characterization of the utility of patient representation learning methods for the purpose of CP development or automatization. We conducted ablation studies to examine the impact of patient representations, built using data from different combinations of data types and sampling windows on rare disease classification. We demonstrated that the data type and sampling window directly impact classification and clustering performance, and these results differ by rare disease group. Our results, although preliminary, exemplify the importance of and need for data-driven characterization in patient representation-based CP development pipelines.
Affiliation(s)
- Tiffany J Callahan
  - Columbia University, New York, NY, 10032, USA
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Children's Hospital Colorado, Aurora, CO, 80045, USA
- Lawrence E Hunter
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Michael G Kahn
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
27
Samaras A, Bekiaridou A, Papazoglou AS, Moysidis DV, Tsoumakas G, Bamidis P, Tsigkas G, Lazaros G, Kassimis G, Fragakis N, Vassilikos V, Zarifis I, Tziakas DN, Tsioufis K, Davlouros P, Giannakoulas G. Artificial intelligence-based mining of electronic health record data to accelerate the digital transformation of the national cardiovascular ecosystem: design protocol of the CardioMining study. BMJ Open 2023; 13:e068698. [PMID: 37012018 PMCID: PMC10083759 DOI: 10.1136/bmjopen-2022-068698] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 04/04/2023] Open
Abstract
INTRODUCTION Mining of electronic health record (EHR) data is increasingly being implemented all over the world but mainly focuses on structured data. The capabilities of artificial intelligence (AI) could reverse the underusage of unstructured EHR data and enhance the quality of medical research and clinical care. This study aims to develop an AI-based model to transform unstructured EHR data into an organised, interpretable dataset and form a national dataset of cardiac patients. METHODS AND ANALYSIS CardioMining is a retrospective, multicentre study based on large, longitudinal data obtained from unstructured EHRs of the largest tertiary hospitals in Greece. Demographics, hospital administrative data, medical history, medications, laboratory examinations, imaging reports, therapeutic interventions, in-hospital management and postdischarge instructions will be collected, coupled with structured prognostic data from the National Institute of Health. The target number of included patients is 100 000. Natural language processing techniques will facilitate data mining from the unstructured EHRs. The accuracy of the automated model will be compared with the manual data extraction by study investigators. Machine learning tools will provide data analytics. CardioMining aims to cultivate the digital transformation of the national cardiovascular system and fill the gap in medical recording and big data analysis using validated AI techniques. ETHICS AND DISSEMINATION This study will be conducted in keeping with the International Conference on Harmonisation Good Clinical Practice guidelines, the Declaration of Helsinki, the Data Protection Code of the European Data Protection Authority and the European General Data Protection Regulation. The Research Ethics Committee of the Aristotle University of Thessaloniki and Scientific and Ethics Council of the AHEPA University Hospital have approved this study. 
Study findings will be disseminated through peer-reviewed medical journals and international conferences. International collaborations with other cardiovascular registries will be attempted. TRIAL REGISTRATION NUMBER NCT05176769.
Affiliation(s)
- Athanasios Samaras
  - 1st Department of Cardiology, University General Hospital of Thessaloniki AHEPA, Thessaloniki, Greece
- Alexandra Bekiaridou
  - 1st Department of Cardiology, University General Hospital of Thessaloniki AHEPA, Thessaloniki, Greece
  - Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, New York, New York, USA
- Andreas S Papazoglou
  - 1st Department of Cardiology, University General Hospital of Thessaloniki AHEPA, Thessaloniki, Greece
- Dimitrios V Moysidis
  - 1st Department of Cardiology, University General Hospital of Thessaloniki AHEPA, Thessaloniki, Greece
- Grigorios Tsoumakas
  - School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Panagiotis Bamidis
  - Medical Physics and Digital Innovation Laboratory, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Grigorios Tsigkas
  - Department of Cardiology, University Hospital of Patras, Rio Patras, Greece
- George Lazaros
  - 1st Cardiology Department, "Hippokration" General Hospital, University of Athens Medical School, Athens, Greece
- George Kassimis
  - 1st Department of Cardiology, University General Hospital of Thessaloniki AHEPA, Thessaloniki, Greece
  - 2nd Cardiology Department, Hippokrateion General Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Nikolaos Fragakis
  - 2nd Cardiology Department, Hippokrateion General Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Vassilios Vassilikos
  - 3rd Cardiology Department, Hippokrateion General Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Ioannis Zarifis
  - Department of Cardiology, "George Papanikolaou" General Hospital, Thessaloniki, Greece
- Dimitrios N Tziakas
  - Department of Cardiology, Democritus University of Thrace, University Hospital of Alexandroupolis, Alexandroupolis, Greece
- Konstantinos Tsioufis
  - 1st Cardiology Department, "Hippokration" General Hospital, University of Athens Medical School, Athens, Greece
- Periklis Davlouros
  - Department of Cardiology, University Hospital of Patras, Rio Patras, Greece
- George Giannakoulas
  - 1st Department of Cardiology, University General Hospital of Thessaloniki AHEPA, Thessaloniki, Greece
28
Sharperson C, Hajibonabi F, Hanna TN, Gerard RL, Gilyard S, Johnson JO. Are disparities in emergency department imaging exacerbated during high-volume periods? Clin Imaging 2023; 96:9-14. [PMID: 36731373 DOI: 10.1016/j.clinimag.2023.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 01/05/2023] [Accepted: 01/09/2023] [Indexed: 01/17/2023]
Abstract
PURPOSE Evaluate if disparities in the emergency department (ED) imaging timeline exist, and if disparities are altered during high-volume periods which may stress resource availability. METHODS This retrospective study was conducted at a four-hospital healthcare system. All patients with at least one ED visit containing imaging from 1/1/2016 to 9/30/2020 were included. Peak hours were defined as ED encounters occurring between 5 pm and midnight, while all other ED encounters were non-peak hours. Patient-flow data points included ED length of stay (LOS), image acquisition time, and diagnostic image assessment time. RESULTS 321,786 total ED visits consisted of 102,560 during peak hours and 219,226 during non-peak hours. Black patients experienced longer image acquisition and image assessment times across both time periods (TR = 1.030; p < 0.001 and TR = 1.112; p < 0.001, respectively); Black patients also had increased length of stay compared to White patients, which was amplified during peak hours. Likewise, patients with primary payer insurance experienced significantly longer image acquisition and image assessment times in both periods (TR > 1.00; p < 0.05 for all). Females had longer image acquisition and image assessment times, and the difference was more pronounced in image acquisition time during both peak and non-peak hours (TR = 1.146 and TR = 1.139, respectively, with p < 0.001 for both). CONCLUSION When measuring radiology time periods, patient flow throughout the ED was not uniform. There was unequal acceleration and deceleration of patient flow based on race, gender, age, and insurance status. Segmentation of patient flow time periods may allow identification of causes of inequity such that disparities can be addressed with targeted actions.
Affiliation(s)
- Camara Sharperson
  - Emory University School of Medicine, Atlanta, GA, United States of America
- Farid Hajibonabi
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA, United States of America
- Tarek N Hanna
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA, United States of America
- Roger L Gerard
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA, United States of America
- Shenise Gilyard
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA, United States of America
- Jamlik-Omari Johnson
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA, United States of America
29
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. 
There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
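As a quick consistency check on the proportions reported in this abstract, the following sketch recomputes each percentage from its stated count and the 139 included records. The dictionary keys and the helper name `share_percent` are illustrative labels, not terms from the review.

```python
TOTAL_INCLUDED = 139  # records that satisfied the inclusion criteria

# Counts (N) as stated in the abstract; the keys are descriptive labels only.
COUNTS = {
    "ehr_primary_source": 121,
    "icd_codes_heavily_used": 77,
    "common_data_model_compliance": 36,
}


def share_percent(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    for label, n in COUNTS.items():
        print(f"{label}: {share_percent(n, TOTAL_INCLUDED)}% (N = {n})")
```

The recomputed values (87.1, 55.4, and 25.9) match the percentages given in the abstract.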
Affiliation(s)
- Ting He
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anas Belouali
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica Patricoski
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Harold Lehmann
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert Ball
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
- Valsamo Anagnostou
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Taxiarchis Botsis
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
30
Arnold CG, Sonn B, Meyers FJ, Vest A, Puls R, Zirkler E, Edelmann M, Brooks IM, Monte AA. Accessing and utilizing clinical and genomic data from an electronic health record data warehouse. Transl Med Commun 2023; 8:7. [PMID: 38223535 PMCID: PMC10786622 DOI: 10.1186/s41231-023-00140-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 02/20/2023] [Indexed: 01/16/2024]
Abstract
Electronic health records (EHRs) and linked biobanks have tremendous potential to advance biomedical research and ultimately improve the health of future generations. Repurposing EHR data for research is not without challenges, however. In this paper, we describe the processes and considerations necessary to successfully access and utilize a data warehouse for research. Although imperfect, data warehouses are a powerful tool for harnessing a large amount of data to phenotype disease. They will have increasing relevance and applications in clinical research with growing sophistication in processes for EHR data abstraction, biobank integration, and cross-institutional linkage.
Affiliation(s)
- Cosby G. Arnold
- Department of Emergency Medicine, School of Medicine, University of California, Davis, 4150 V Street #2100, Sacramento, CA 95817, USA
| | - Brandon Sonn
- Department of Emergency Medicine, University of Colorado Denver-Anschutz Medical Center, University of Colorado School of Medicine, Mail Stop B-215, 12401 East 17th Avenue, Aurora, CO 80045, USA
| | - Frederick J. Meyers
- Department of Internal Medicine, University of California, Davis, School of Medicine, 4150 V Street #3100, Sacramento, CA 95817, USA
| | - Alexis Vest
- Department of Emergency Medicine, University of Colorado Denver-Anschutz Medical Center, University of Colorado School of Medicine, Mail Stop B-215, 12401 East 17th Avenue, Aurora, CO 80045, USA
| | - Richie Puls
- Department of Emergency Medicine, University of Colorado Denver-Anschutz Medical Center, University of Colorado School of Medicine, Mail Stop B-215, 12401 East 17th Avenue, Aurora, CO 80045, USA
| | - Estelle Zirkler
- Department of Biomedical Informatics, University of Colorado School of Medicine, Anschutz Health Sciences Building, 1890 N. Revere Court, Mailstop F600, Aurora, CO 80045, USA
| | - Michelle Edelmann
- Department of Biomedical Informatics, University of Colorado School of Medicine, Anschutz Health Sciences Building, 1890 N. Revere Court, Mailstop F600, Aurora, CO 80045, USA
| | - Ian M. Brooks
- Department of Biomedical Informatics, University of Colorado School of Medicine, Anschutz Health Sciences Building, 1890 N. Revere Court, Mailstop F600, Aurora, CO 80045, USA
| | - Andrew A. Monte
- Department of Emergency Medicine, School of Medicine, University of California, Davis, 4150 V Street #2100, Sacramento, CA 95817, USA
- Rocky Mountain Poison & Drug Center, Denver Health and Hospital Authority, 1391 Speer Blvd Unit 600, Denver, CO 80204, USA
|
31
|
Wang L, Foer D, Zhang Y, Karlson EW, Bates DW, Zhou L. Post-Acute COVID-19 Respiratory Symptoms in Patients With Asthma: An Electronic Health Records-Based Study. The Journal of Allergy and Clinical Immunology. In Practice 2023; 11:825-835.e3. [PMID: 36566779 PMCID: PMC9773736 DOI: 10.1016/j.jaip.2022.12.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/23/2022] [Revised: 11/27/2022] [Accepted: 12/01/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Post-viral respiratory symptoms are common among patients with asthma. Respiratory symptoms after acute COVID-19 are widely reported in the general population, but large-scale studies identifying symptom risk for patients with asthma are lacking. OBJECTIVE To identify and compare risk for post-acute COVID-19 respiratory symptoms in patients with and without asthma. METHODS This retrospective, observational cohort study included COVID-19-positive patients between March 4, 2020, and January 20, 2021, with up to 180 days of health care follow-up in a health care system in the Northeastern United States. Respiratory symptoms recorded in clinical notes from days 28 to 180 after COVID-19 diagnosis were extracted using natural language processing. Cohorts were stratified by hospitalization status during the acute COVID-19 period. Univariable and multivariable analyses were used to compare symptoms among patients with and without asthma adjusting for demographic and clinical confounders. RESULTS Among 31,084 eligible patients with COVID-19, 2863 (9.2%) had hospitalization during the acute COVID-19 period; 4049 (13.0%) had a history of asthma, accounting for 13.8% of hospitalized and 12.9% of nonhospitalized patients. In the post-acute COVID-19 period, patients with asthma had significantly higher risk of shortness of breath, cough, bronchospasm, and wheezing than patients without an asthma history. Incident respiratory symptoms of bronchospasm and wheezing were also higher in patients with asthma. Patients with asthma who had not been hospitalized during acute COVID-19 had additionally higher risk of cough, abnormal breathing, sputum changes, and a wider range of incident respiratory symptoms. CONCLUSION Patients with asthma may have an under-recognized burden of respiratory symptoms after COVID-19 warranting increased awareness and monitoring in this population.
Affiliation(s)
- Liqin Wang
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Mass.
| | - Dinah Foer
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Mass; Division of Allergy and Clinical Immunology, Department of Medicine, Brigham and Women's Hospital, Boston, Mass
| | - Yuqing Zhang
- Division of Rheumatology, Allergy, and Immunology, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Mass
| | - Elizabeth W Karlson
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Mass
| | - David W Bates
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Mass
| | - Li Zhou
- Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Mass
|
32
|
Li Y, Hu H, Zheng Y, Donahoo WT, Guo Y, Xu J, Chen WH, Liu N, Shenkman EA, Bian J, Guo J. Impact of Contextual-Level Social Determinants of Health on Newer Antidiabetic Drug Adoption in Patients with Type 2 Diabetes. International Journal of Environmental Research and Public Health 2023; 20:ijerph20054036. [PMID: 36901047 PMCID: PMC10001625 DOI: 10.3390/ijerph20054036] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/24/2022] [Revised: 02/17/2023] [Accepted: 02/22/2023] [Indexed: 05/14/2023]
Abstract
BACKGROUND We aimed to investigate the association between contextual-level social determinants of health (SDoH) and the use of novel antidiabetic drugs (ADD), including sodium-glucose cotransporter-2 inhibitors (SGLT2i) and glucagon-like peptide-1 receptor agonists (GLP1a) for patients with type 2 diabetes (T2D), and whether the association varies across racial and ethnic groups. METHODS Using electronic health records from the OneFlorida+ network, we assembled a cohort of T2D patients who initiated a second-line ADD in 2015-2020. A set of 81 contextual-level SDoH documenting social and built environment were spatiotemporally linked to individuals based on their residential histories. We assessed the association between the contextual-level SDoH and initiation of SGLT2i/GLP1a and determined their effects across racial groups, adjusting for clinical factors. RESULTS Of 28,874 individuals, 61% were women, and the mean age was 58 (±15) years. Two contextual-level SDoH factors identified as significantly associated with SGLT2i/GLP1a use were neighborhood deprivation index (odds ratio [OR] 0.87, 95% confidence interval [CI] 0.81-0.94) and the percent of vacant addresses in the neighborhood (OR 0.91, 95% CI 0.85-0.98). Patients living in such neighborhoods are less likely to be prescribed newer ADD. There was no interaction between race-ethnicity and SDoH on the use of newer ADD. However, in the overall cohort, non-Hispanic Black individuals were less likely to use newer ADD than non-Hispanic White individuals (OR 0.82, 95% CI 0.76-0.88). CONCLUSION Using a data-driven approach, we identified the key contextual-level SDoH factors associated with not following evidence-based treatment of T2D. Further investigations are needed to examine the mechanisms underlying these associations.
Affiliation(s)
- Yujia Li
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
| | - Hui Hu
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - Yi Zheng
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - William Troy Donahoo
- Division of Endocrinology, Diabetes and Metabolism, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Yi Guo
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Jie Xu
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Wei-Han Chen
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
| | - Ning Liu
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
| | - Elisabeth A. Shenkman
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Jingchuan Guo
- Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
- Correspondence: ; Tel.: +1-352-273-6533
|
33
|
Brandt PS, Kho A, Luo Y, Pacheco JA, Walunas TL, Hakonarson H, Hripcsak G, Liu C, Shang N, Weng C, Walton N, Carrell DS, Crane PK, Larson EB, Chute CG, Kullo IJ, Carroll R, Denny J, Ramirez A, Wei WQ, Pathak J, Wiley LK, Richesson R, Starren JB, Rasmussen LV. Characterizing variability of electronic health record-driven phenotype definitions. J Am Med Inform Assoc 2023; 30:427-437. [PMID: 36474423 PMCID: PMC9933077 DOI: 10.1093/jamia/ocac235] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/21/2022] [Revised: 10/19/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022]
Abstract
OBJECTIVE The aim of this study was to analyze a publicly available sample of rule-based phenotype definitions to characterize and evaluate the variability of logical constructs used. MATERIALS AND METHODS A sample of 33 preexisting phenotype definitions used in research that are represented using Fast Healthcare Interoperability Resources and Clinical Quality Language (CQL) was analyzed using automated analysis of the computable representation of the CQL libraries. RESULTS Most of the phenotype definitions include narrative descriptions and flowcharts, while few provide pseudocode or executable artifacts. Most use 4 or fewer medical terminologies. The number of codes used ranges from 5 to 6865, and value sets from 1 to 19. We found that the most common expressions used were literal, data, and logical expressions. Aggregate and arithmetic expressions are the least common. Expression depth ranges from 4 to 27. DISCUSSION Despite the range of conditions, we found that all of the phenotype definitions consisted of logical criteria, representing both clinical and operational logic, and tabular data, consisting of codes from standard terminologies and keywords for natural language processing. The total number and variety of expressions are low, which may be to simplify implementation, or authors may limit complexity due to data availability constraints. CONCLUSIONS The phenotype definitions analyzed show significant variation in specific logical, arithmetic, and other operators but are all composed of the same high-level components, namely tabular data and logical expressions. A standard representation for phenotype definitions should support these formats and be modular to support localization and shared logic.
Affiliation(s)
- Pascal S Brandt
- Department of Biomedical and Medical Education, University of Washington, Seattle, Washington, USA
| | - Abel Kho
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Jennifer A Pacheco
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Theresa L Walunas
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Ning Shang
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Nephi Walton
- Intermountain Precision Genomics, Intermountain Healthcare, St George, Utah, USA
| | - David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
| | - Paul K Crane
- Department of Medicine, University of Washington, Seattle, Washington, USA
| | - Eric B Larson
- Department of Medicine, University of Washington, Seattle, Washington, USA
- Department of Health Services, University of Washington, Seattle, Washington, USA
| | - Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Robert Carroll
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Josh Denny
- All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
| | - Andrea Ramirez
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Jyoti Pathak
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Laura K Wiley
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Rachel Richesson
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Justin B Starren
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Luke V Rasmussen
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
|
34
|
Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer's disease and related dementias. Int J Med Inform 2023; 170:104973. [PMID: 36577203 DOI: 10.1016/j.ijmedinf.2022.104973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 12/11/2022] [Accepted: 12/17/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Cognitive tests and biomarkers are the key information to assess the severity and track the progression of Alzheimer's disease (AD) and AD-related dementias (AD/ADRD), yet both are often documented only in the clinical narratives of patients' electronic health records (EHRs). In this work, we aim to (1) assess the documentation of cognitive tests and biomarkers in EHRs that can be used as real-world endpoints, and (2) identify, extract, and harmonize the different commonly used cognitive tests from clinical narratives using natural language processing (NLP) methods into categorical AD/ADRD severity. METHODS We developed a rule-based NLP pipeline to extract the cognitive tests and biomarkers from clinical narratives in AD/ADRD patients' EHRs. We aggregated the extracted results to the patient level and harmonized the cognitive test scores into severity categories using cutoffs determined based on both relevant literature and domain knowledge of AD/ADRD clinicians. RESULTS We identified an AD/ADRD cohort of 48,912 patients from the University of Florida (UF) Health system and identified 7 measurements (6 cognitive tests and 1 biomarker) that are frequently documented in our data. Our NLP pipeline achieved an overall F1-score of 0.9059 across the 7 measurements. Among the 6 cognitive tests, we were able to harmonize 4 cognitive test scores into severity categories, and the population characteristics of patients with different severity were described. We also identified several factors related to the availability of their documentation in EHRs. CONCLUSION This study demonstrates that our NLP pipelines can extract cognitive tests and biomarkers of AD/ADRD accurately for downstream studies. Although the documentation of cognitive tests and biomarkers in EHRs appears to be low, real-world data (RWD) are still an important resource for AD/ADRD research. Nevertheless, a standardized approach to documenting cognitive tests and biomarkers in EHRs is also warranted.
|
35
|
Wu CS, Chen CH, Su CH, Chien YL, Dai HJ, Chen HH. Augmenting DSM-5 diagnostic criteria with self-attention-based BiLSTM models for psychiatric diagnosis. Artif Intell Med 2023; 136:102488. [PMID: 36710066 DOI: 10.1016/j.artmed.2023.102488] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/06/2022] [Revised: 11/20/2022] [Accepted: 01/09/2023] [Indexed: 01/12/2023]
Abstract
BACKGROUND Most previous studies make psychiatric diagnoses based on diagnostic terms. In this study we sought to augment Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) diagnostic criteria with deep neural network models to make psychiatric diagnoses based on psychiatric notes. METHODS We augmented DSM-5 diagnostic criteria with self-attention-based bidirectional long short-term memory (BiLSTM) models to identify schizophrenia, bipolar, and unipolar depressive disorders. Given that the diagnostic criteria for psychiatric diagnosis include a certain symptom profile and functional impairment, we first extracted psychiatric symptoms and functional features with two approaches: a lexicon-based approach and a dependency parsing approach. Then, we incorporated free-text discharge notes and extracted features for psychiatric diagnoses with the proposed models. RESULTS The micro-averaged F1 scores of the two automatic annotation approaches were greater than 0.8. BiLSTM models with self-attention outperformed the rule-based models with DSM-5 criteria in the prediction of schizophrenia and bipolar disorder, while the latter outperformed the former in predicting unipolar depressive disorder. Approaches for augmenting DSM-5 criteria with a self-attention-based BiLSTM outperformed both pure rule-based and pure deep neural network models. In terms of classification of psychiatric diagnoses, we observed that the performance for schizophrenia and bipolar disorder was acceptable. CONCLUSION These DSM-5-augmented deep neural network models showed good performance in identifying psychiatric diagnoses from psychiatric notes. We conclude that it is possible to establish a model that consults clinical notes to make psychiatric diagnoses comparably to physicians. Further research will be extended to outpatient notes and other psychiatric disorders.
Affiliation(s)
- Chi-Shin Wu
- National Center for Geriatrics and Welfare Research, National Health Research Institutes, Zhunan, Taiwan; Department of Psychiatry, National Taiwan University Hospital, Yunlin branch, Douliu, Taiwan
| | - Chien-Hung Chen
- Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
| | - Chu-Hsien Su
- National Center for Geriatrics and Welfare Research, National Health Research Institutes, Zhunan, Taiwan
| | - Yi-Ling Chien
- Department of Psychiatry, National Taiwan University Hospital, Taipei, Taiwan
| | - Hong-Jie Dai
- Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan; School of Post-Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
| | - Hsin-Hsi Chen
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.
|
36
|
Abstract
Heterogeneity in sepsis and acute respiratory distress syndrome (ARDS) is increasingly being recognized as one of the principal barriers to finding efficacious targeted therapies. The advent of multiple high-throughput biological data ("omics"), coupled with the widespread access to increased computational power, has led to the emergence of phenotyping in critical care. Phenotyping aims to use a multitude of data to identify homogenous subgroups within an otherwise heterogenous population. Increasingly, phenotyping schemas are being applied to sepsis and ARDS to increase understanding of these clinical conditions and identify potential therapies. Here we present a selective review of the biological phenotyping schemas applied to sepsis and ARDS. Further, we outline some of the challenges involved in translating these conceptual findings to bedside clinical decision-making tools.
Affiliation(s)
- Pratik Sinha
- Division of Clinical & Translational Research and Division of Critical Care, Department of Anesthesia, Washington University, St. Louis, Missouri, USA;
| | - Nuala J Meyer
- Division of Pulmonary, Allergy, and Critical Care Medicine; Center for Translational Lung Biology; and Lung Biology Institute, University of Pennsylvania Perelman School of Medicine; Philadelphia, Pennsylvania, USA
| | - Carolyn S Calfee
- Division of Pulmonary, Critical Care, Allergy & Sleep Medicine, Department of Medicine, University of California San Francisco, San Francisco, California, USA
|
37
|
Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023; 30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022]
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
| | | | - Ellen Stephenson
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Karen Tu
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Jessica Gronsbell
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
|
38
|
Obeid JS, Khalifa A, Xavier B, Bou-Daher H, Rockey DC. An AI Approach for Identifying Patients With Cirrhosis. J Clin Gastroenterol 2023; 57:82-88. [PMID: 34238846 PMCID: PMC8741865 DOI: 10.1097/mcg.0000000000001586] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/28/2021] [Accepted: 06/05/2021] [Indexed: 02/05/2023]
Abstract
GOAL The goal of this study was to evaluate an artificial intelligence approach, namely deep learning, on clinical text in electronic health records (EHRs) to identify patients with cirrhosis. BACKGROUND AND AIMS Accurate identification of cirrhosis in EHR is important for epidemiological, health services, and outcomes research. Currently, such efforts depend on International Classification of Diseases (ICD) codes, with limited success. MATERIALS AND METHODS We trained several machine learning models using discharge summaries from patients with known cirrhosis from a patient registry and random controls without cirrhosis or its complications based on ICD codes. Models were validated on patients for whom discharge summaries were manually reviewed and used as the gold standard test set. We tested Naive Bayes and Random Forest as baseline models and a deep learning model using word embedding and a convolutional neural network (CNN). RESULTS The training set included 446 cirrhosis patients and 689 controls, while the gold standard test set included 139 cirrhosis patients and 152 controls. Among the machine learning models, the CNN achieved the highest area under the receiver operating characteristic curve (0.993), with a precision of 0.965 and recall of 0.978, compared with areas under the curve of 0.879 and 0.981 for Naive Bayes and Random Forest, respectively (precision 0.787 and 0.958; recall 0.878 and 0.827). The precision by ICD codes for cirrhosis was 0.883 and recall was 0.978. CONCLUSIONS A CNN model trained on discharge summaries identified cirrhosis patients with high precision and recall. This approach for phenotyping cirrhosis in the EHR may provide a more accurate assessment of disease burden in a variety of studies.
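As a rough consistency check on the CNN figures in this abstract, precision and recall can be recomputed from confusion-matrix counts. The counts used below (136 true positives, 5 false positives, 3 false negatives) are not reported in the abstract. They are inferred here as integer counts that agree with 139 cirrhosis patients in the test set, a precision of 0.965, and a recall of 0.978, so they should be read as an assumption rather than as the authors' data. The function name is illustrative.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return precision, recall


# Assumed counts, inferred from the abstract rather than reported in it.
TP, FP, FN = 136, 5, 3  # TP + FN = 139 cirrhosis patients in the test set

precision, recall = precision_recall(TP, FP, FN)
print(f"precision={precision:.3f} recall={recall:.3f}")
# prints: precision=0.965 recall=0.978
```

Under these assumed counts, the CNN would have missed 3 of the 139 manually confirmed cirrhosis patients and flagged 5 of the 152 controls.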
Affiliation(s)
- Jihad S. Obeid
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Ali Khalifa
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Brandon Xavier
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Halim Bou-Daher
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Don C. Rockey
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
- Medical University of South Carolina Digestive Disease Research Center, Medical University of South Carolina, Charleston, South Carolina, USA
|
39
|
van den Bulk S, Spoelman WA, van Dijkman PRM, Numans ME, Bonten TN. Non-acute chest pain in primary care; referral rates, communication and guideline adherence: a cohort study using routinely collected health data. BMC Primary Care 2022; 23:336. [PMID: 36550420 PMCID: PMC9784001 DOI: 10.1186/s12875-022-01939-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND The prevalence of coronary artery disease is increasing due to the aging population and increasing prevalence of cardiovascular risk factors. Non-acute chest pain often is the first symptom of stable coronary artery disease. To optimise care for patients with non-acute chest pain and make efficient use of available resources, we need to know more about the current incidence, referral rate and management of these patients. METHODS We used routinely collected health data from the STIZON data warehouse in the Netherlands between 2010 and 2016. Patients > 18 years, with no history of cardiovascular disease, seen by the general practitioner (GP) for non-acute chest pain with a suspected cardiac origin were included. Outcomes were (i) incidence of new non-acute chest pain in primary care, (ii) referral rates to the cardiologist, (iii) correspondence from the cardiologist to the GP, (iv) registration by GPs of received correspondence, and (v) pharmacological guideline adherence after newly diagnosed stable angina pectoris. RESULTS In total 9029 patients were included during the study period, resulting in an incidence of new non-acute chest pain of 1.01/1000 patient-years. 2166 (24%) patients were referred to the cardiologist. In 857/2114 (41%) referred patients, correspondence from the cardiologist was not available in the GP's electronic medical record. In 753/1257 (60%) patients with available correspondence, the GP did not code the conclusion in the electronic medical record. Despite guideline recommendations, 37/255 (15%) patients with angina pectoris were prescribed neither antiplatelet therapy nor anticoagulation, 69/255 (27%) no statin and 67/255 (26%) no beta-blocker. CONCLUSION After referral, both communication from cardiologists and registration of the final diagnosis by GPs were suboptimal. Both cardiologists and GPs should make adequate communication and registration a priority, as it improves health outcomes. Secondary pharmacological prevention in patients with angina pectoris was below guideline standards. Proactive attention is therefore needed to optimise secondary prevention in this high-risk group in primary care.
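The proportions quoted in these results can be recomputed directly from the counts given in the abstract. The short sketch below does so; the dictionary labels and the helper name `percent` are illustrative, while every number is taken from the abstract as written.

```python
# (numerator, denominator, percentage as stated in the abstract)
reported = {
    "referred to cardiologist": (2166, 9029, 24),
    "no correspondence in GP record": (857, 2114, 41),
    "conclusion not coded by GP": (753, 1257, 60),
    "no antiplatelet or anticoagulant": (37, 255, 15),
    "no statin": (69, 255, 27),
    "no beta-blocker": (67, 255, 26),
}


def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


for label, (num, den, stated) in reported.items():
    print(f"{label}: {num}/{den} = {percent(num, den)}% (stated {stated}%)")
```

Each recomputed value matches the stated percentage, and the 1257 patients with available correspondence equal 2114 minus 857. The abstract does not say why the correspondence denominator (2114) is smaller than the number of referred patients (2166).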
Affiliation(s)
- Simone van den Bulk
- Department of Public Health and Primary Care, Leiden University Medical Center, Postzone V0-P, Postbus 9600, 2300 RC Leiden, The Netherlands
| | - Wouter A. Spoelman
- Department of Public Health and Primary Care, Leiden University Medical Center, Postzone V0-P, Postbus 9600, 2300 RC Leiden, The Netherlands
| | - Paul R. M. van Dijkman
- Department of Cardiology, Leiden University Medical Center, Postzone V0-P, Postbus 9600, 2300 RC Leiden, The Netherlands
| | - Mattijs E. Numans
- Department of Public Health and Primary Care, Leiden University Medical Center, Postzone V0-P, Postbus 9600, 2300 RC Leiden, The Netherlands
| | - Tobias N. Bonten
- Department of Public Health and Primary Care, Leiden University Medical Center, Postzone V0-P, Postbus 9600, 2300 RC Leiden, The Netherlands
| | | |
|
40
|
Alzubi R, Alzoubi H, Katsigiannis S, West D, Ramzan N. Automated Detection of Substance-Use Status and Related Information from Clinical Text. SENSORS (BASEL, SWITZERLAND) 2022; 22:9609. [PMID: 36559979 PMCID: PMC9783118 DOI: 10.3390/s22249609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/05/2022] [Revised: 11/21/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
This study aims to develop and evaluate an automated system for extracting information related to patient substance use (smoking, alcohol, and drugs) from unstructured clinical text (medical discharge records). The authors propose a four-stage system for the extraction of the substance-use status and related attributes (type, frequency, amount, quit-time, and period). The first stage uses a keyword search technique to detect sentences related to substance use and to exclude unrelated records. In the second stage, an extension of the NegEx negation detection algorithm is developed and employed for detecting the negated records. The third stage involves identifying the temporal status of the substance use by applying windowing and chunking methodologies. Finally, in the fourth stage, regular expressions, syntactic patterns, and keyword search techniques are used in order to extract the substance-use attributes. The proposed system achieves an F1-score of up to 0.99 for identifying substance-use-related records, 0.98 for detecting the negation status, and 0.94 for identifying temporal status. Moreover, F1-scores of up to 0.98, 0.98, 1.00, 0.92, and 0.98 are achieved for the extraction of the amount, frequency, type, quit-time, and period attributes, respectively. Natural Language Processing (NLP) and rule-based techniques are employed efficiently for extracting substance-use status and attributes, with the proposed system being able to detect substance-use status and attributes over both sentence-level and document-level data. Results show that the proposed system outperforms the compared state-of-the-art substance-use identification system on an unseen dataset, demonstrating its generalisability.
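The four stages described above (keyword filtering, negation detection, temporal status, attribute extraction) can be illustrated with a deliberately small Python sketch. This is not the authors' system: the keyword lists, trigger phrases and regular expressions below are hypothetical, and the negation step is a plain trigger lookup rather than the extended NegEx algorithm with scoped triggers.

```python
import re

# Hypothetical keyword lists and triggers, chosen only for illustration.
SUBSTANCE_KEYWORDS = {
    "smoking": ("smok", "tobacco", "cigarette"),
    "alcohol": ("alcohol", "etoh", "drinks"),
    "drugs": ("cocaine", "heroin", "marijuana"),
}
NEGATION_TRIGGERS = ("denies", "no history of", "never", "negative for")
PAST_TRIGGERS = ("quit", "former", "stopped")
AMOUNT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(packs?|cigarettes?|drinks?|beers?)")
FREQUENCY_PATTERN = re.compile(r"\b(?:per|a|each)\s+(day|week|month|year)\b")


def analyse(sentence):
    """Return substance-use information for one sentence, or None if unrelated."""
    text = sentence.lower()

    # Stage 1: keyword search to keep only substance-related sentences.
    substance = next(
        (name for name, words in SUBSTANCE_KEYWORDS.items()
         if any(word in text for word in words)),
        None,
    )
    if substance is None:
        return None

    # Stage 2: simplified negation detection (trigger lookup, no scope window).
    negated = any(trigger in text for trigger in NEGATION_TRIGGERS)

    # Stage 3: temporal status of the substance use.
    if negated:
        status = "none"
    elif any(trigger in text for trigger in PAST_TRIGGERS):
        status = "past"
    else:
        status = "current"

    # Stage 4: regular expressions for the amount and frequency attributes.
    amount = AMOUNT_PATTERN.search(text)
    frequency = FREQUENCY_PATTERN.search(text)
    return {
        "substance": substance,
        "negated": negated,
        "status": status,
        "amount": amount.group(0) if amount else None,
        "frequency": frequency.group(1) if frequency else None,
    }


print(analyse("Patient smokes 2 packs per day."))
print(analyse("Denies alcohol use."))
```

A production system would need scoped negation, section awareness and document-level aggregation, which this sketch omits.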
Affiliation(s)
- Raid Alzubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
| | - Hadeel Alzoubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
| | - Stamos Katsigiannis
- Department of Computer Science, Durham University, Upper Mountjoy Campus, Stockton Road, Durham DH1 3LE, UK
| | - Daune West
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High St., Paisley PA1 2BE, UK
| | - Naeem Ramzan
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High St., Paisley PA1 2BE, UK
| |
|
41
|
Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. [PMID: 35924480 PMCID: PMC9669229 DOI: 10.1002/gepi.22497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 07/06/2022] [Accepted: 07/19/2022] [Indexed: 01/07/2023]
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Ryan J. Urbanowicz
- Department of Computational Biomedicine, Cedars‐Sinai Medical Center, Los Angeles, California, USA
| | - Adam C. Naj
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Jason H. Moore
- Department of Computational Biomedicine, Cedars‐Sinai Medical Center, Los Angeles, California, USA
| |
|
42
|
Tanwar A, Zhang J, Ive J, Gupta V, Guo Y. Phenotyping in clinical text with unsupervised numerical reasoning for patient stratification. Exp Biol Med (Maywood) 2022; 247:2038-2052. [PMID: 36217914 PMCID: PMC9791305 DOI: 10.1177/15353702221118092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
Phenotypic information of patients, as expressed in clinical text, is important in many clinical applications such as identifying patients at risk of hard-to-diagnose conditions. Extracting and inferring some phenotypes from clinical text requires numerical reasoning, for example, a temperature of 102°F suggests the phenotype Fever. However, while current state-of-the-art phenotyping models using natural language processing (NLP) are in general very efficient in extracting phenotypes, they struggle to extract phenotypes that require numerical reasoning. In this article, we propose a novel unsupervised method that leverages external clinical knowledge and contextualized word embeddings by ClinicalBERT for numerical reasoning in different phenotypic contexts. Experiments show that the proposed method achieves significant improvement against unsupervised baseline methods with absolute increase in generalized Recall and F1 scores of up to 79% and 71%, respectively. Also, the proposed method outperforms supervised baseline methods with absolute increase in generalized Recall and F1 scores of up to 70% and 44%, respectively. In addition, we validate the methodology on clinical use cases where the detected phenotypes significantly contribute to patient stratification systems for a set of diseases, namely, HIV and myocardial infarction (heart attack). Moreover, we find that these phenotypes from clinical text can be used to impute the missing values in structured data, which enrich and improve data quality.
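To make the numerical-reasoning example concrete (a temperature of 102°F suggesting the phenotype Fever), here is a minimal Python sketch. It is a plain threshold rule, not the unsupervised ClinicalBERT-based method proposed in the article. The 100.4°F cut-off, the regular expression and the function names are assumptions made for illustration.

```python
import re

# Assumed cut-off for illustration: 100.4 °F (38 °C) is a commonly used fever threshold.
FEVER_THRESHOLD_F = 100.4

# Matches values such as "102°F", "98.6 F" or "39.2 C" (hypothetical pattern).
TEMPERATURE_PATTERN = re.compile(r"(\d{2,3}(?:\.\d+)?)\s*°?\s*([FC])\b", re.IGNORECASE)


def to_fahrenheit(value: float, unit: str) -> float:
    """Convert a temperature reading to degrees Fahrenheit."""
    return value if unit.upper() == "F" else value * 9 / 5 + 32


def suggests_fever(text: str) -> bool:
    """Return True if any temperature mentioned in the text reaches the threshold."""
    for match in TEMPERATURE_PATTERN.finditer(text):
        reading = to_fahrenheit(float(match.group(1)), match.group(2))
        if reading >= FEVER_THRESHOLD_F:
            return True
    return False


print(suggests_fever("Patient presented with a temperature of 102°F."))
print(suggests_fever("Temp 98.6 F on admission."))
```

A fixed rule like this covers one phenotype with one unit convention, which is the kind of limitation that motivates learned, context-aware approaches.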
|
43
|
Zou Y, Pesaranghader A, Song Z, Verma A, Buckeridge DL, Li Y. Modeling electronic health record data using an end-to-end knowledge-graph-informed topic model. Sci Rep 2022; 12:17868. [PMID: 36284225 PMCID: PMC9596500 DOI: 10.1038/s41598-022-22956-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2022] [Accepted: 10/21/2022] [Indexed: 01/20/2023] Open
Abstract
The rapid growth of electronic health record (EHR) datasets opens up promising opportunities to understand human diseases in a systematic way. However, effective extraction of clinical knowledge from EHR data has been hindered by the sparse and noisy information. We present Graph ATtention-Embedded Topic Model (GAT-ETM), an end-to-end taxonomy-knowledge-graph-based multimodal embedded topic model. GAT-ETM distills latent disease topics from EHR data by learning the embedding from a constructed medical knowledge graph. We applied GAT-ETM to a large-scale EHR dataset consisting of over 1 million patients. We evaluated its performance based on topic quality, drug imputation, and disease diagnosis prediction. GAT-ETM demonstrated superior performance over the alternative methods on all tasks. Moreover, GAT-ETM learned clinically meaningful graph-informed embedding of the EHR codes and discovered interpretable and accurate patient representations for patient stratification and drug recommendations. GAT-ETM code is available at https://github.com/li-lab-mcgill/GAT-ETM .
Affiliation(s)
- Yuesong Zou
- School of Computer Science, McGill University, Montreal, Canada (grid.14709.3b; 0000 0004 1936 8649)
| | - Ahmad Pesaranghader
- School of Computer Science, McGill University, Montreal, Canada (grid.14709.3b; 0000 0004 1936 8649)
| | - Ziyang Song
- School of Computer Science, McGill University, Montreal, Canada (grid.14709.3b; 0000 0004 1936 8649)
| | - Aman Verma
- School of Population and Global Health, McGill University, Montreal, Canada (grid.14709.3b; 0000 0004 1936 8649)
| | - David L. Buckeridge
- School of Population and Global Health, McGill University, Montreal, Canada (grid.14709.3b; 0000 0004 1936 8649)
| | - Yue Li
- School of Computer Science, McGill University, Montreal, Canada (grid.14709.3b; 0000 0004 1936 8649)
| |
|
44
|
Conte M, Flynn A, Boisvert P, Landis-Lewis Z, Richesson R, Friedman C. Computable phenotypes for cohort identification: core content for a new class of FAIR Digital Objects. RESEARCH IDEAS AND OUTCOMES 2022. [DOI: 10.3897/rio.8.e95856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Introduction
We present current work to develop and define a class of digital objects that facilitates patient cohort identification for clinical studies, such that these objects are Findable, Accessible, Interoperable, and Reusable (FAIR) (Wilkinson et al. 2016). Developing this class of FAIR Digital Objects (FDOs) builds on the work of several years to develop the Knowledge Grid (https://kgrid.org/), which facilitates the development, description and implementation of biomedical knowledge packaged in machine-readable and machine-executable formats (Flynn et al. 2018). Additionally, this work aligns with the goals of the Mobilizing Computable Biomedical Knowledge (MCBK) community (https://mobilizecbk.med.umich.edu/) (Mobilizing Computable Biomedical Knowledge 2018). In this abstract, we describe our work to develop a FDO carrying a computable phenotype.
Defining computable phenotypes
In biomedical informatics, 'phenotyping' describes a data-driven approach to identifying a group of individuals sharing observable characteristics of interest, generally related to a disease or condition, and a 'computable phenotype' (CP) is a machine-processable expression of a phenotypic pattern of these characteristics (Hripcsak and Albers 2018).
For the purposes of this work, we are interested in CPs derived from data contained in electronic health record (EHR) systems. This includes both structured data, e.g. codes for diseases, diagnoses, procedures, or laboratory tests, and unstructured data, e.g. free text including patient histories, clinical observations, discharge summaries, and reports. Thus, we define computable phenotype FDOs (CP-FDOs) as a class of FDO that packages an executable EHR-derived CP together with documentation needed to implement and use it effectively for creating cohorts of individuals with similar observable characteristics from EHR data sets.
Importance of portable and FAIR CPs
There is tremendous excitement for using real-world EHR data to discover important findings about human health and well-being. However, for discovery to happen, researchers need mechanisms like CPs to identify study cohorts for analysis. Beginning in the early 2010s, a growing literature explores various methods for the secondary use of EHR data for patient phenotyping to arrive at consistent study cohorts (Shivade et al. 2014, Banda et al. 2018). The heterogeneous nature of EHR data has inspired a wide variety of phenotyping methods, from those which rely solely on documented codes linked to terms in existing vocabularies to those which combine such codes with other concepts extracted from free text using natural language processing.
Our current focus is on packaging CPs inside FDOs for classifying patients as having or not having a phenotype of interest. This can be done within an individual health system, or at scale across a clinical data research network. Using CPs for cohort identification can reduce the time and expense of traditional data set building and clinical trial recruitment, and expand the potential scope of a study population (Boland et al. 2013).
Creating and validating CPs requires time, resources, and both clinical and technical expertise. One estimate is that it can take 6-10 months to develop and validate a CP (Shang et al. 2019). And, as there is no standard data model within EHRs in the United States, many CPs are designed for performance at a single site, rather than for portability, which is understood as the ability to implement a phenotype at a different site with similar performance (Shang et al. 2019). While portability is increasingly recognized as an important element of phenotyping, and there have been recent efforts to develop more portable CPs, many of these processes still require significant technical expertise at the implementation site to adapt the phenotype for use on local data.
There may also be significant advantages to making CPs FAIR. These include transparency in cohort selection, and better generalizability of results. FAIR CPs may also increase the potential for robust comparisons of data from related studies, leading to better evidence synthesis to improve delivery of care and ultimately human health.
Defining a new class of FDOs to hold and convey CPs
We believe that packaging validated CPs inside digital objects may alleviate many of the pressures mentioned above, and contribute to making both the processes and products of clinical research more FAIR. To this end, our current work focuses on packaging a validated CP inside a machine-processable FDO. The phenotype of interest identifies pediatric and adult patients with a rare disease (Oliverio et al. 2021), and has several features which make it ideal for transformation to an executable FDO. First, the phenotype utilizes standards to define the clinical characteristics of interest, and is based on a common data model; these features increase the potential for both interoperability and reuse. Additionally, because the phenotype has been validated across three sites, its portability has already been demonstrated. Finally, the full computable phenotype has been shared as a series of SQL queries, including scripts for patient identification, deriving statistics, and validation, which have been annotated with instructions for implementation at other sites.
The goals of this work are:
To develop CPs as executable DOs, leveraging previous work to develop executable Knowledge Objects (KO) (Flynn et al. 2018)
To advance our understanding of how to define computable phenotypes as a class of FDO, including what is needed to meet the requirements of binding, abstraction, and encapsulation (Wittenburg et al. 2019)
Conclusion
Computable phenotypes, packaged as FDOs, may increase the potential both for the portability of a phenotype and the reusability of data resulting from its implementation. Providing CPs as executable FDOs may also reduce barriers to portability and local implementation. In this presentation, we describe our work to develop a FDO computable phenotype from an existing validated phenotype. Lessons learned from this process will increase our understanding of both the technical requirements, and how to address necessary components of abstraction, binding, and encapsulation so that these can function as FAIR Digital Objects.
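The idea of packaging an executable phenotype together with the documentation needed to use it can be sketched in a few lines of Python. This is only an illustration of the concept. It does not reflect the Knowledge Grid object format or the SQL-based phenotype described above, and the class name, fields, identifier and codes are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputablePhenotypeObject:
    """Hypothetical package: an executable phenotype plus its descriptive metadata."""

    identifier: str              # persistent identifier (placeholder value below)
    title: str
    documentation: str           # implementation notes for other sites
    inclusion_codes: frozenset   # structured codes that define the phenotype

    def classify(self, patient_codes) -> bool:
        """Return True if the patient carries at least one inclusion code."""
        return not self.inclusion_codes.isdisjoint(patient_codes)


def build_cohort(phenotype, patients):
    """Return the sorted IDs of patients classified as having the phenotype."""
    return sorted(pid for pid, codes in patients.items() if phenotype.classify(codes))


example = ComputablePhenotypeObject(
    identifier="example:cp-0001",
    title="Illustrative rare-disease phenotype",
    documentation="Apply to diagnosis codes mapped to the local common data model.",
    inclusion_codes=frozenset({"CODE-A", "CODE-B"}),
)

patients = {
    "p1": {"CODE-A", "CODE-X"},
    "p2": {"CODE-Y"},
    "p3": {"CODE-B"},
}
print(build_cohort(example, patients))
```

Keeping the logic and its metadata in one immutable object is one possible way to approach the binding and encapsulation requirements mentioned in the abstract; the actual design choices are the subject of the authors' ongoing work.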
|
45
|
Culié D, Schiappa R, Contu S, Scheller B, Villarme A, Dassonville O, Poissonnet G, Bozec A, Chamorey E. Validation and Improvement of a Convolutional Neural Network to Predict the Involved Pathology in a Head and Neck Surgery Cohort. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:12200. [PMID: 36231500 PMCID: PMC9564535 DOI: 10.3390/ijerph191912200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Revised: 09/19/2022] [Accepted: 09/22/2022] [Indexed: 06/16/2023]
Abstract
The selection of patients to constitute a cohort is a major issue in clinical research (prospective studies and real-life retrospective studies). Our objective was to validate, under real-life conditions, the use of a deep learning process based on a neural network for classifying patients according to the pathology involved in a head and neck surgery department. 24,434 Electronic Health Records (EHR) from the first visit between 2000 and 2020 were extracted. More than 6000 EHR were manually classified into ten groups of interest according to the clinically relevant reason for consultation. A convolutional neural network (TensorFlow, previously reported by Hsu et al.) was then used to predict the group of patients based on their pathology, using two levels of classification based on clinically relevant criteria. Macro-average accuracy, recall, precision, specificity and F1-score were 0.95, 0.83, 0.85, 0.97 and 0.84 on the first level of classification and 0.93, 0.76, 0.83, 0.96 and 0.79 on the second level, versus accuracy, recall and precision of 0.580, 580 and 0.582 for Hsu et al. We validated this model to predict the pathology involved and to constitute clinically relevant cohorts in a tertiary hospital. This model did not require a preprocessing stage, was used in French and showed equivalent or better performance than other previously published techniques.
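Macro-averaged metrics such as those reported here are obtained by computing each metric per class (one class versus the rest) and then taking the unweighted mean over classes. The sketch below shows that calculation in plain Python for a small confusion matrix. The function name and the example matrix are illustrative, and treating "macro-average accuracy" as the mean of per-class one-versus-rest accuracies is an assumption about how the authors computed it.

```python
def macro_metrics(confusion):
    """Macro-average metrics from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    sums = {"accuracy": 0.0, "recall": 0.0, "precision": 0.0,
            "specificity": 0.0, "f1": 0.0}

    for k in range(n_classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[r][k] for r in range(n_classes)) - tp
        tn = total - tp - fn - fp

        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)

        sums["accuracy"] += (tp + tn) / total   # one-vs-rest accuracy (assumption)
        sums["recall"] += recall
        sums["precision"] += precision
        sums["specificity"] += specificity
        sums["f1"] += f1

    # Unweighted mean over classes: every class counts equally.
    return {name: value / n_classes for name, value in sums.items()}


example = [[8, 2],
           [1, 9]]
print(macro_metrics(example))
```

Because every class contributes equally, macro-averages are sensitive to performance on small groups, which matters when the ten groups of interest differ in size.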
Affiliation(s)
- Dorian Culié
- Head and Neck Surgery Department, Antoine Lacassagne Center, 06100 Nice, France
- Epidemiology, Biostatistics and Health Data Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Renaud Schiappa
- Epidemiology, Biostatistics and Health Data Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Sara Contu
- Epidemiology, Biostatistics and Health Data Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Boris Scheller
- Head and Neck Surgery Department, Antoine Lacassagne Center, 06100 Nice, France
- Epidemiology, Biostatistics and Health Data Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Agathe Villarme
- Head and Neck Surgery Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Olivier Dassonville
- Head and Neck Surgery Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Gilles Poissonnet
- Head and Neck Surgery Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Alexandre Bozec
- Head and Neck Surgery Department, Antoine Lacassagne Center, 06100 Nice, France
- Epidemiology, Biostatistics and Health Data Department, Antoine Lacassagne Center, 06100 Nice, France
| | - Emmanuel Chamorey
- Epidemiology, Biostatistics and Health Data Department, Antoine Lacassagne Center, 06100 Nice, France
| |
|
46
|
Development and validation of algorithms to identify patients with chronic kidney disease and related chronic diseases across the Northern Territory, Australia. BMC Nephrol 2022; 23:320. [PMID: 36151531 PMCID: PMC9502610 DOI: 10.1186/s12882-022-02947-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Accepted: 09/13/2022] [Indexed: 11/15/2022] Open
Abstract
Background Electronic health records can be used for population-wide identification and monitoring of disease. The Territory Kidney Care project developed algorithms to identify individuals with chronic kidney disease (CKD) and several commonly comorbid chronic diseases. This study aims to describe the development and validation of our algorithms for CKD, diabetes, hypertension, and cardiovascular disease. A secondary aim of the study was to describe data completeness of the Territory Kidney Care database. Methods The Territory Kidney Care database consolidates electronic health records from multiple health services including public hospitals (n = 6) and primary care health services (> 60) across the Northern Territory, Australia. Using the database (n = 48,569) we selected a stratified random sample of patients (n = 288), which included individuals with mild to end-stage CKD. Diagnostic accuracy of the algorithms was tested against blinded manual chart reviews. Data completeness of the database was also described. Results For CKD defined as CKD stage 1 or higher (eGFR of any level with albuminuria or persistent eGFR < 60 ml/min/1.73 m², including renal replacement therapy) overall algorithm sensitivity was 93% (95%CI 89 to 96%) and specificity was 73% (95%CI 64 to 82%). For CKD defined as CKD stage 3a or higher (eGFR < 60 ml/min/1.73 m²) algorithm sensitivity and specificity were 93% and 97% respectively. Among the CKD 1 to 5 staging algorithms, the CKD stage 5 algorithm was most accurate with > 99% sensitivity and specificity. For related comorbidities – algorithm sensitivity and specificity results were 75% and 97% for diabetes; 85% and 88% for hypertension; and 79% and 96% for cardiovascular disease. Conclusions We developed and validated algorithms to identify CKD and related chronic diseases within electronic health records. Validation results showed that CKD algorithms have a high degree of diagnostic accuracy compared to traditional administrative codes. 
Our highly accurate algorithms present new opportunities in early kidney disease detection, monitoring, and epidemiological research. Supplementary Information The online version contains supplementary material available at 10.1186/s12882-022-02947-9.
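The two ingredients of this validation, an eGFR-based case definition and a comparison against chart review, can be sketched in Python as follows. This is not the Territory Kidney Care algorithm: the stage boundaries are the standard eGFR categories (in ml/min/1.73 m²), while persistence of a low eGFR over time and renal replacement therapy are not modelled. Function names and the sample data are hypothetical.

```python
def egfr_stage(egfr: float) -> str:
    """Map an eGFR value (ml/min/1.73 m²) to a CKD stage label by eGFR category."""
    if egfr >= 90:
        return "1"
    if egfr >= 60:
        return "2"
    if egfr >= 45:
        return "3a"
    if egfr >= 30:
        return "3b"
    if egfr >= 15:
        return "4"
    return "5"


def has_ckd(egfr: float, albuminuria: bool) -> bool:
    """Simplified case definition: eGFR < 60, or any eGFR level with albuminuria."""
    return egfr < 60 or albuminuria


def sensitivity_specificity(predicted, reference):
    """Compare algorithm output with chart review (both lists of booleans).

    Assumes the reference contains at least one positive and one negative case.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical sample: algorithm flags versus blinded chart review.
algorithm = [True, True, False, False, True]
chart_review = [True, False, False, False, True]
print(egfr_stage(52), sensitivity_specificity(algorithm, chart_review))
```

In the stage 1 or higher definition, patients with preserved eGFR are classified only through albuminuria, which may be one reason its specificity is lower than for the stage 3a or higher definition.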
|
47
|
Chushig-Muzo D, Soguero-Ruiz C, Miguel Bohoyo PD, Mora-Jiménez I. Learning and visualizing chronic latent representations using electronic health records. BioData Min 2022; 15:18. [PMID: 36064616 PMCID: PMC9446539 DOI: 10.1186/s13040-022-00303-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2021] [Accepted: 07/27/2022] [Indexed: 12/03/2022] Open
Abstract
Background Nowadays, patients with chronic diseases such as diabetes and hypertension have reached alarming numbers worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), hampering knowledge extraction with conventional approaches. Methods We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. We explore in this work how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the patient’s health status evolution, which is of paramount importance in the clinical setting. Results To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain. Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin-dependent and non-insulin-dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients. 
Conclusion Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient’s health status progression on the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their more relevant clinical codes. Supplementary Information The online version contains supplementary material available at (10.1186/s13040-022-00303-z).
Affiliation(s)
- David Chushig-Muzo
- Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
| | - Cristina Soguero-Ruiz
- Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
| | | | - Inmaculada Mora-Jiménez
- Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain.
| |
|
48
|
Zhong C, Liao K, Chen W, Liu Q, Peng B, Huang X, Peng J, Wei Z. Hierarchical reinforcement learning for automatic disease diagnosis. Bioinformatics 2022; 38:3995-4001. [PMID: 35775965 DOI: 10.1093/bioinformatics/btac408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Revised: 05/16/2022] [Accepted: 06/29/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION A disease diagnosis-oriented dialog system models the interactive consultation procedure as a Markov decision process, and reinforcement learning algorithms are used to solve the problem. Existing approaches usually employ a flat policy structure that treats all symptoms and diseases equally when choosing actions. This strategy works well in simple scenarios where the action space is small; however, its efficiency is challenged in real environments. Inspired by the offline consultation process, we propose to integrate a hierarchical policy structure of two levels into the dialog system for policy learning. The high-level policy consists of a master model that is responsible for triggering a low-level model; the low-level policy consists of several symptom checkers and a disease classifier. The proposed policy structure is capable of dealing with diagnosis problems that involve a large number of diseases and symptoms. RESULTS Experimental results on three real-world datasets and a synthetic dataset demonstrate that our hierarchical framework achieves higher accuracy and symptom recall in disease diagnosis compared with existing systems. We construct a benchmark including datasets and implementation of existing algorithms to encourage follow-up research. AVAILABILITY AND IMPLEMENTATION The code and data are available from https://github.com/FudanDISC/DISCOpen-MedBox-DialoDiagnosis. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cheng Zhong
- School of Data Science, Fudan University, 200433 Shanghai, China
| | - Kangenbei Liao
- School of Data Science, Fudan University, 200433 Shanghai, China
| | - Wei Chen
- School of Data Science, Fudan University, 200433 Shanghai, China
| | | | | | - Xuanjing Huang
- School of Computer Science, Fudan University, 200433 Shanghai, China
| | - Jiajie Peng
- Research Institute of Intelligent Complex Systems, Fudan University, 200433 Shanghai, China
| | - Zhongyu Wei
- School of Data Science, Fudan University, 200433 Shanghai, China; Research Institute of Intelligent Complex Systems, Fudan University, 200433 Shanghai, China
| |
|
49
|
Abdulkareem M, Kenawy AA, Rauseo E, Lee AM, Sojoudi A, Amir-Khalili A, Lekadir K, Young AA, Barnes MR, Barckow P, Khanji MY, Aung N, Petersen SE. Predicting post-contrast information from contrast agent free cardiac MRI using machine learning: Challenges and methods. Front Cardiovasc Med 2022; 9:894503. [PMID: 36051279 PMCID: PMC9426684 DOI: 10.3389/fcvm.2022.894503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Accepted: 06/27/2022] [Indexed: 11/29/2022] Open
Abstract
Objectives: Currently, administering contrast agents is necessary for accurately visualizing and quantifying presence, location, and extent of myocardial infarction (MI) with cardiac magnetic resonance (CMR). In this study, our objective is to investigate and analyze pre- and post-contrast CMR images with the goal of predicting post-contrast information using pre-contrast information only. We propose methods and identify challenges.
Methods: The study population consists of 272 retrospectively selected CMR studies with diagnoses of MI (n = 108) and healthy controls (n = 164). We describe a pipeline for pre-processing this dataset for analysis. After data feature engineering, 722 cine short-axis (SAX) images and segmentation mask pairs were used for experimentation. This constitutes 506, 108, and 108 pairs for the training, validation, and testing sets, respectively. We use deep learning (DL) segmentation (UNet) and classification (ResNet50) models to discover the extent and location of the scar and classify between the ischemic cases and healthy cases (i.e., cases with no regional myocardial scar) from the pre-contrast cine SAX image frames, respectively. We then capture complex data patterns that represent subtle signal and functional changes in the cine SAX images due to MI using optical flow, rate of change of myocardial area, and radiomics data. We apply this dataset to explore two supervised learning methods, namely, the support vector machines (SVM) and the decision tree (DT) methods, to develop predictive models for classifying pre-contrast cine SAX images as being a case of MI or healthy.
Results: Overall, for the UNet segmentation model, the performance based on the mean Dice score for the test set (n = 108) is 0.75 (±0.20) for the endocardium, 0.51 (±0.21) for the epicardium and 0.20 (±0.17) for the scar. For the classification task, the accuracy, F1 and precision scores of 0.68, 0.69, and 0.64, respectively, were achieved with the SVM model, and of 0.62, 0.63, and 0.72, respectively, with the DT model.
Conclusion: We have presented some promising approaches involving DL, SVM, and DT methods in an attempt to accurately predict contrast information from non-contrast images. While our initial results are modest for this challenging task, this area of research still poses several open problems.
Affiliation(s)
- Musa Abdulkareem
  - Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
  - National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Asmaa A. Kenawy
  - Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
  - National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Elisa Rauseo
  - Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
  - National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Aaron M. Lee
  - Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
  - National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Karim Lekadir
  - Artificial Intelligence in Medicine Lab (BCN-AIM), Faculty of Mathematics and Computer Science, University of Barcelona, Barcelona, Spain
- Alistair A. Young
  - Department of Biomedical Engineering, King’s College London, London, United Kingdom
- Michael R. Barnes
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Mohammed Y. Khanji
  - Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
  - National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - Newham University Hospital, Barts Health National Health Service (NHS) Trust, London, United Kingdom
- Nay Aung
  - Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
  - National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Steffen E. Petersen
  - Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
  - National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - Health Data Research UK, London, United Kingdom
  - The Alan Turing Institute, London, United Kingdom
50
Momtazmanesh S, Nowroozi A, Rezaei N. Artificial Intelligence in Rheumatoid Arthritis: Current Status and Future Perspectives: A State-of-the-Art Review. Rheumatol Ther 2022; 9:1249-1304. [PMID: 35849321 PMCID: PMC9510088 DOI: 10.1007/s40744-022-00475-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/01/2022] [Accepted: 06/24/2022] [Indexed: 11/23/2022] Open
Abstract
Investigation of the potential applications of artificial intelligence (AI), including machine learning (ML) and deep learning (DL) techniques, is an exponentially growing field in medicine and healthcare. These methods can be critical in providing high-quality care to patients with chronic rheumatological diseases lacking an optimal treatment, like rheumatoid arthritis (RA), which is the second most prevalent autoimmune disease. Herein, following reviewing the basic concepts of AI, we summarize the advances in its applications in RA clinical practice and research. We provide directions for future investigations in this field after reviewing the current knowledge gaps and technical and ethical challenges in applying AI. Automated models have been largely used to improve RA diagnosis since the early 2000s, and they have used a wide variety of techniques, e.g., support vector machine, random forest, and artificial neural networks. AI algorithms can facilitate screening and identification of susceptible groups, diagnosis using omics, imaging, clinical, and sensor data, patient detection within electronic health record (EHR), i.e., phenotyping, treatment response assessment, monitoring disease course, determining prognosis, novel drug discovery, and enhancing basic science research. They can also aid in risk assessment for incidence of comorbidities, e.g., cardiovascular diseases, in patients with RA. However, the proposed models may vary significantly in their performance and reliability. Despite the promising results achieved by AI models in enhancing early diagnosis and management of patients with RA, they are not fully ready to be incorporated into clinical practice. Future investigations are required to ensure development of reliable and generalizable algorithms while they carefully look for any potential source of bias or misconduct. 
We showed that a growing body of evidence supports the potential role of AI in revolutionizing screening, diagnosis, and management of patients with RA. However, multiple obstacles hinder clinical applications of AI models. Incorporating the machine and/or deep learning algorithms into real-world settings would be a key step in the progress of AI in medicine.
Affiliation(s)
- Sara Momtazmanesh
  - School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
  - Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - Research Center for Immunodeficiencies, Pediatrics Center of Excellence, Children's Medical Center, Tehran University of Medical Sciences, Dr. Gharib St, Keshavarz Blvd, Tehran, Iran
- Ali Nowroozi
  - School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
  - Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Tehran, Iran
- Nima Rezaei
  - Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - Research Center for Immunodeficiencies, Pediatrics Center of Excellence, Children's Medical Center, Tehran University of Medical Sciences, Dr. Gharib St, Keshavarz Blvd, Tehran, Iran
  - Department of Immunology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran