Reference Citation Analysis

For: Lin C, Lou YS, Tsai DJ, Lee CC, Hsu CJ, Wu DC, Wang MC, Fang WH. Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study. JMIR Med Inform 2019;7:e14499. [PMID: 31339103 PMCID: PMC6683650 DOI: 10.2196/14499] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/26/2019] [Revised: 06/13/2019] [Accepted: 06/17/2019] [Indexed: 12/26/2022] Open

Cited by Other Article(s)

Saleem S, Asim MN, Van Elst L, Junker M, Dengel A. MLR-predictor: a versatile and efficient computational framework for multi-label requirements classification. Front Artif Intell 2024;7:1481581. [PMID: 39664103 PMCID: PMC11632133 DOI: 10.3389/frai.2024.1481581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2024] [Accepted: 11/05/2024] [Indexed: 12/13/2024] Open

Abstract

Introduction

Requirements classification is essential to developing successful software because it captures all relevant aspects of users' needs. It also aids in identifying project failure risks and helps achieve project milestones more comprehensively. Several machine learning predictors have been developed for binary or multi-class requirements classification. However, only a few predictors are designed for multi-label classification, and their low predictive performance limits their practical usefulness.

Method

MLR-Predictor uses an innovative OkapiBM25 model to transform requirements text into statistical vectors by computing informative word patterns. Moreover, the predictor transforms multi-label requirements classification data into a multi-class classification problem and uses a logistic regression classifier to categorize requirements. The performance of the proposed predictor is evaluated and compared with 123 machine learning and 9 deep learning-based predictive pipelines across three public benchmark requirements classification datasets using eight different evaluation measures.
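The text-to-vector step mentioned above can be illustrated with a minimal sketch of Okapi BM25 term weighting. This is not the authors' implementation: the function name `bm25_vectors`, the whitespace tokenizer, the smoothed IDF variant, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and the label transformation and logistic regression steps are not shown.

```python
import math
from collections import Counter


def bm25_vectors(docs, k1=1.5, b=0.75):
    """Turn each document into a vector of BM25 term weights.

    k1 and b are conventional defaults, not values taken from the article.
    """
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)

    # Smoothed IDF (always positive); rarer terms receive larger weights.
    idf = {
        t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        for t in vocab
    }

    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Length normalization: long documents are penalized relative to average.
        norm = k1 * (1.0 - b + b * len(toks) / avg_len)
        vectors.append([
            idf[t] * tf[t] * (k1 + 1.0) / (tf[t] + norm) if tf[t] else 0.0
            for t in vocab
        ])
    return vocab, vectors
```

The resulting vectors could then be passed to any standard classifier; terms that occur in few requirements receive higher weights than terms shared by most of them.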

Results

The large-scale experimental results demonstrate that the proposed MLR-Predictor outperforms the 123 adopted machine learning and 9 deep learning predictive pipelines, as well as the state-of-the-art requirements classification predictor. Specifically, in comparison to the state-of-the-art predictor, it achieves a 13% improvement in macro F1-measure on the PROMISE dataset, a 1% improvement on the EHR-binary dataset, and a 2.5% improvement on the EHR-multiclass dataset.

Discussion

As a case study, the generalizability of the proposed predictor is evaluated on software customer review classification data. In this context, the proposed predictor outperformed the state-of-the-art BERT language model by 1.4% in F1 score. These findings underscore the robustness and effectiveness of the proposed MLR-Predictor in various contexts, establishing its utility as a promising solution for the requirements classification task.

Zhou D, Gan Z, Shi X, Patwari A, Rush E, Bonzel CL, Panickan VA, Hong C, Ho YL, Cai T, Costa L, Li X, Castro VM, Murphy SN, Brat G, Weber G, Avillach P, Gaziano JM, Cho K, Liao KP, Lu J, Cai T. Multiview Incomplete Knowledge Graph Integration with application to cross-institutional EHR data harmonization. J Biomed Inform 2022;133:104147. [PMID: 35872266 DOI: 10.1016/j.jbi.2022.104147] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/31/2022] [Revised: 07/05/2022] [Accepted: 07/15/2022] [Indexed: 11/26/2022]

Abstract

OBJECTIVE

The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across different institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems.

METHODS

The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Due to the heterogeneity in the coding across healthcare systems, each EHR source provides partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems.

RESULTS

With EHR co-occurrence data from Veterans Affairs (VA) healthcare and Mass General Brigham (MGB), the MIKGI algorithm produces high-quality embeddings for a variety of downstream tasks, including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI-trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs. For cross-institutional medical code mapping, the top 1 and top 5 accuracy were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB, and 59.1% and 75.8% when mapping VA local laboratory codes to the LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top 1 and top 5 accuracy of 77.7% and 87.9%. MIKGI also attained the best performance in selecting VA local lab codes for desired laboratory tests and COVID-19 related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance, with accuracy the highest or near the highest across all tasks.
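The evaluation described above (ranking candidate codes by cosine similarity and reporting top 1 and top 5 accuracy) can be sketched as follows. This is an illustrative sketch only, not the MIKGI algorithm: the helper names `cosine` and `top_k_accuracy`, and the assumption that embeddings are plain lists of floats keyed by code, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_accuracy(source, target, gold, k):
    """Fraction of source codes whose gold target is among the k nearest targets.

    source, target: dicts mapping a code to its embedding vector.
    gold: dict mapping each source code to its correct target code.
    """
    hits = 0
    for code, vec in source.items():
        # Rank every target code by similarity to the source embedding.
        ranked = sorted(target, key=lambda t: cosine(vec, target[t]), reverse=True)
        if gold[code] in ranked[:k]:
            hits += 1
    return hits / len(source)
```

With harmonized embeddings for local and standard codes, `top_k_accuracy(local, standard, gold, 1)` and `top_k_accuracy(local, standard, gold, 5)` would correspond to the top 1 and top 5 figures; the variable names here are placeholders.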

CONCLUSIONS

The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.

Lin C, Lee YT, Wu FJ, Lin SA, Hsu CJ, Lee CC, Tsai DJ, Fang WH. The Application of Projection Word Embeddings on Medical Records Scoring System. Healthcare (Basel) 2021;9:healthcare9101298. [PMID: 34682978 PMCID: PMC8544381 DOI: 10.3390/healthcare9101298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 09/24/2021] [Accepted: 09/28/2021] [Indexed: 11/16/2022] Open

Abstract

Medical records scoring is important in a health care system. Artificial intelligence (AI) with projection word embeddings has been validated for its performance in disease coding tasks; these embeddings maintain the vocabulary diversity of open internet databases and the medical terminology understanding of electronic health records (EHRs). We considered that an AI-enhanced system might also be applied to automatically score medical records. This study aimed to develop a series of deep learning models (DLMs) and validate their performance in the medical records scoring task. We also analyzed the practical value of the best model. We used the admission medical records from the Tri-Service General Hospital from January 2016 to May 2020, which were scored by visiting staff of different levels from different departments. The medical records were scored on a scale from 0 to 10. All samples were divided into a training set (n = 74,959) and a testing set (n = 152,730) based on time, which were used to train and validate the DLMs, respectively. The mean absolute error (MAE) was used to evaluate the performance of each DLM. In the original AI medical record scoring, the score predicted by the BERT architecture was closer to the actual reviewer score than that of the projection word embedding and LSTM architecture. The original MAE was 0.84 ± 0.27 using the BERT model and 1.00 ± 0.32 using the LSTM model. A linear mixed model can be used to improve model performance, and the adjusted predicted score was closer to the reviewer score than the original predicted score. However, the projection word embedding with the LSTM model (0.66 ± 0.39) provided better performance than BERT (0.70 ± 0.33) after linear mixed model enhancement (p < 0.001). In addition to comparing different architectures for scoring medical records, this study further uses a linear mixed model to successfully adjust the AI medical record score to make it closer to the actual physician's score.
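For reference, the MAE used above is the average absolute difference between reviewer scores and model predictions. The following is a minimal sketch; the function name and the sample scores in the usage note are illustrative and do not come from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired scores."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For example, hypothetical reviewer scores `[8, 6, 9]` and predictions `[7, 6, 10.5]` differ by 1, 0, and 1.5 points, so the MAE is 2.5 / 3; a lower value means the predicted scores sit closer to the reviewers' scores.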

Affiliation(s)

Chin Lin: School of Medicine, National Defense Medical Center, Taipei 114, Taiwan; School of Public Health, National Defense Medical Center, Taipei 114, Taiwan; Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan; Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Yung-Tsai Lee: Division of Cardiovascular Surgery, Cheng Hsin Rehabilitation and Medical Center, Taipei 112, Taiwan
Feng-Jen Wu: Department of Informatics, Taoyuan Armed Forces General Hospital, Taoyuan 325, Taiwan
Shing-An Lin: Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Chia-Jung Hsu: Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Chia-Cheng Lee: Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan; Division of Colorectal Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
Dung-Jang Tsai: School of Public Health, National Defense Medical Center, Taipei 114, Taiwan; Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan; Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan. Correspondence: Tel.: +886-2-8792-3100 (ext. #18305); Fax: +886-2-8792-3147
Wen-Hui Fang: Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan; Department of Family and Community Medicine, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan. Correspondence: Tel.: +886-2-8792-3100 (ext. #12322); Fax: +886-2-8792-3147

Lin CS, Lee YT, Fang WH, Lou YS, Kuo FC, Lee CC, Lin C. Deep Learning Algorithm for Management of Diabetes Mellitus via Electrocardiogram-Based Glycated Hemoglobin (ECG-HbA1c): A Retrospective Cohort Study. J Pers Med 2021;11:725. [PMID: 34442369 PMCID: PMC8398464 DOI: 10.3390/jpm11080725] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/30/2021] [Revised: 07/21/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022] Open

Abstract

BACKGROUND

Glycated hemoglobin (HbA1c) provides information on diabetes mellitus (DM) management. Electrocardiography (ECG) is a noninvasive test of cardiac activity that has been determined to be related to DM and its complications. This study developed a deep learning model (DLM) to estimate HbA1c via ECG.

METHODS

A total of 104,823 ECGs with corresponding HbA1c or fasting glucose values were used to train a DLM for calculating ECG-HbA1c. Next, 1539 cases from outpatient departments and health examination centers provided 2190 ECGs for initial validation, and another 3293 cases with their first ECGs were employed to analyze its contributions to DM management. The primary analysis aimed to distinguish patients with and without mild to severe DM, and the secondary analysis explored the predictive value of ECG-HbA1c for future complications, which included all-cause mortality, new-onset chronic kidney disease (CKD), and new-onset heart failure (HF).

RESULTS

We used a gender/age-matching strategy to train a DLM that achieved a best AUC of 0.8255, with a sensitivity of 71.9% and specificity of 77.7%, in a follow-up cohort, with a correlation of 0.496 and a mean absolute error of 1.230. The stratified analysis shows that DM in patients with fewer comorbidities was significantly more likely to be detected by ECG-HbA1c. Patients with higher ECG-HbA1c under the same Lab-HbA1c exhibited worse physical conditions. Of interest, ECG-HbA1c may contribute to mortality (gender/age adjusted hazard ratio (HR): 1.53, 95% confidence interval (CI): 1.08-2.17), new-onset CKD (HR: 1.56, 95% CI: 1.30-1.87), and new-onset HF (HR: 1.51, 95% CI: 1.13-2.01) independently of Lab-HbA1c. An additional impact of ECG-HbA1c on the risk of all-cause mortality (C-index: 0.831 to 0.835, p < 0.05), new-onset CKD (C-index: 0.735 to 0.745, p < 0.01), and new-onset HF (C-index: 0.793 to 0.796, p < 0.05) was observed in full adjustment models.

CONCLUSION

ECG-HbA1c could be considered a novel biomarker for screening DM and predicting the progression of DM and its complications.

Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks. NPJ Digit Med 2021;4:37. [PMID: 33637859 PMCID: PMC7910461 DOI: 10.1038/s41746-021-00404-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/16/2020] [Accepted: 01/26/2021] [Indexed: 12/02/2022] Open

Abstract

Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Disease (ICD) is a standardized and widely used method, but the manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and poor reliability of terminal parts of these codes restricted clinical usability. We aimed to create a high performing pipeline for automated classification of reliable ICD-10 codes in the free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As in clinical practice discharge letters may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding variables age/sex did not affect results. For model interpretability, word coefficients were provided and qualitative assessment of classification was manually performed. 
Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.

Weber C, Röschke L, Modersohn L, Lohr C, Kolditz T, Hahn U, Ammon D, Betz B, Kiehntopf M. Optimized Identification of Advanced Chronic Kidney Disease and Absence of Kidney Disease by Combining Different Electronic Health Data Resources and by Applying Machine Learning Strategies. J Clin Med 2020;9:jcm9092955. [PMID: 32932685 PMCID: PMC7563476 DOI: 10.3390/jcm9092955] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/25/2020] [Revised: 08/26/2020] [Accepted: 08/28/2020] [Indexed: 12/31/2022] Open

Robinson PN, Haendel MA. Ontologies, Knowledge Representation, and Machine Learning for Translational Research: Recent Contributions. Yearb Med Inform 2020;29:159-162. [PMID: 32823310 PMCID: PMC7442528 DOI: 10.1055/s-0040-1701991] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022] Open