1
Saleem S, Asim MN, Van Elst L, Junker M, Dengel A. MLR-predictor: a versatile and efficient computational framework for multi-label requirements classification. Front Artif Intell 2024; 7:1481581. [PMID: 39664103 PMCID: PMC11632133 DOI: 10.3389/frai.2024.1481581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2024] [Accepted: 11/05/2024] [Indexed: 12/13/2024] Open
Abstract
Introduction: Requirements classification is essential to developing successful software because it helps incorporate all relevant aspects of users' needs. It also aids in identifying project failure risks and helps achieve project milestones in a more comprehensive way. Several machine learning predictors have been developed for binary or multi-class requirements classification. However, only a few predictors are designed for multi-label classification, and they are not practically useful because of their low predictive performance. Method: MLR-Predictor uses the OkapiBM25 model to transform requirements text into statistical vectors by computing informative word patterns. The predictor also transforms multi-label requirements classification data into a multi-class classification problem and uses a logistic regression classifier to categorize requirements. The performance of the proposed predictor is evaluated and compared with 123 machine learning and 9 deep learning-based predictive pipelines across three public benchmark requirements classification datasets using eight different evaluation measures. Results: The large-scale experimental results demonstrate that the proposed MLR-Predictor outperforms the 123 adopted machine learning and 9 deep learning predictive pipelines, as well as the state-of-the-art requirements classification predictor. Specifically, in comparison to the state-of-the-art predictor, it achieves a 13% improvement in macro F1-measure on the PROMISE dataset, a 1% improvement on the EHR-binary dataset, and a 2.5% improvement on the EHR-multiclass dataset. Discussion: As a case study, the generalizability of the proposed predictor is evaluated on software customer review classification data. In this context, the proposed predictor outperformed the state-of-the-art BERT language model by 1.4% in F1-score.
These findings underscore the robustness and effectiveness of the proposed MLR-Predictor in various contexts, establishing its utility as a promising solution for the requirements classification task.
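The two transformations named in the Method part of this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the BM25 variant (the common `k1`/`b` form with a smoothed IDF), the use of a label-powerset mapping to turn label sets into single classes, and the toy requirements are all assumptions made for the example.

```python
import math
from collections import Counter

def bm25_vectors(docs, k1=1.5, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    Assumed variant: smoothed IDF, standard k1/b length normalization.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = {}
        for term, f in tf.items():
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = f + k1 * (1 - b + b * len(d) / avg_len)
            vec[term] = idf * f * (k1 + 1) / norm
        vectors.append(vec)
    return vectors

def label_powerset(label_sets):
    """Map each distinct label combination to one multi-class id."""
    class_ids = {}
    y = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        y.append(class_ids[key])
    return y, class_ids

# Hypothetical toy requirements and labels, for illustration only.
requirements = [
    "the system shall encrypt user data".split(),
    "the system shall respond within two seconds".split(),
    "the system shall encrypt data and respond quickly".split(),
]
labels = [{"security"}, {"performance"}, {"security", "performance"}]

X = bm25_vectors(requirements)
y, classes = label_powerset(labels)
```

A multi-class classifier such as logistic regression would then be trained on `X` and `y`; rare terms like "user" receive higher weights than terms that occur in every requirement, such as "the".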
Affiliation(s)
- Summra Saleem
  - Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
- Ludger Van Elst
  - German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
- Markus Junker
  - German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
- Andreas Dengel
  - Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern, Germany
  - German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
2
Zhou D, Gan Z, Shi X, Patwari A, Rush E, Bonzel CL, Panickan VA, Hong C, Ho YL, Cai T, Costa L, Li X, Castro VM, Murphy SN, Brat G, Weber G, Avillach P, Gaziano JM, Cho K, Liao KP, Lu J, Cai T. Multiview Incomplete Knowledge Graph Integration with application to cross-institutional EHR data harmonization. J Biomed Inform 2022; 133:104147. [PMID: 35872266 DOI: 10.1016/j.jbi.2022.104147] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/31/2022] [Revised: 07/05/2022] [Accepted: 07/15/2022] [Indexed: 11/26/2022]
Abstract
OBJECTIVE: The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across different institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems. METHODS: The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Due to the heterogeneity in the coding across healthcare systems, each EHR source provides partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems. RESULTS: With EHR co-occurrence data from Veterans Affairs (VA) healthcare and Mass General Brigham (MGB), the MIKGI algorithm produces high-quality embeddings for a variety of downstream tasks, including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI-trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs.
For cross-institutional medical code mapping, the top-1 and top-5 accuracies were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB, and 59.1% and 75.8% when mapping VA local laboratory codes to the LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top-1 and top-5 accuracies of 77.7% and 87.9%. MIKGI also attained the best performance in selecting VA local lab codes for desired laboratory tests and COVID-19-related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance, with accuracy the highest or near the highest across all tasks. CONCLUSIONS: The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.
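The evaluation described here, ranking candidate target codes by cosine similarity and reporting top-k accuracy, can be illustrated with a short sketch. The embeddings, code names (`S1`, `T1`, and so on), and gold mapping below are hypothetical toy values, not data from the study, and the sketch does not reproduce how MIKGI trains its embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, candidates, k):
    """Return the k candidate codes most similar to the query embedding."""
    ranked = sorted(
        candidates,
        key=lambda code: cosine(query_vec, candidates[code]),
        reverse=True,
    )
    return ranked[:k]

def top_k_accuracy(queries, gold, candidates, k):
    """Fraction of source codes whose gold target is among the top k."""
    hits = sum(gold[q] in top_k(vec, candidates, k) for q, vec in queries.items())
    return hits / len(queries)

# Hypothetical embeddings: source-site codes S*, target-site codes T*.
target = {"T1": [1.0, 0.0], "T2": [0.0, 1.0], "T3": [0.7, 0.7]}
source = {"S1": [0.9, 0.1], "S2": [0.6, 0.8]}
gold = {"S1": "T1", "S2": "T2"}

acc_at_1 = top_k_accuracy(source, gold, target, k=1)
acc_at_2 = top_k_accuracy(source, gold, target, k=2)
```

In this toy case `S2` is closest to `T3`, so its gold target `T2` is found only at k = 2, which is why top-5 accuracy in such evaluations is always at least as high as top-1 accuracy.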
Affiliation(s)
- Xu Shi
  - University of Michigan, MI, USA
- Everett Rush
  - Department of Energy, Oak Ridge National Lab, Oak Ridge, TN, USA
- Clara-Lea Bonzel
  - Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
- Vidul A Panickan
  - Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
- Chuan Hong
  - VA Boston Healthcare System, Boston, MA, USA; Duke University, Durham, NC, USA
- Yuk-Lam Ho
  - VA Boston Healthcare System, Boston, MA, USA
- Tianrun Cai
  - VA Boston Healthcare System, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Gabriel Brat
  - Harvard Medical School, Boston, MA, USA; Beth Israel Deaconess Medical Center, Boston, MA, USA
- J Michael Gaziano
  - Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Kelly Cho
  - Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Katherine P Liao
  - VA Boston Healthcare System, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Junwei Lu
  - VA Boston Healthcare System, Boston, MA, USA; Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Tianxi Cai
  - Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA; Harvard T.H. Chan School of Public Health, Boston, MA, USA
3
Lin C, Lee YT, Wu FJ, Lin SA, Hsu CJ, Lee CC, Tsai DJ, Fang WH. The Application of Projection Word Embeddings on Medical Records Scoring System. Healthcare (Basel) 2021; 9:healthcare9101298. [PMID: 34682978 PMCID: PMC8544381 DOI: 10.3390/healthcare9101298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 09/24/2021] [Accepted: 09/28/2021] [Indexed: 11/16/2022] Open
Abstract
Medical records scoring is important in a health care system. Artificial intelligence (AI) with projection word embeddings has been validated on disease coding tasks; these embeddings maintain the vocabulary diversity of open internet databases and the medical terminology understanding of electronic health records (EHRs). We considered that an AI-enhanced system might also be applied to automatically score medical records. This study aimed to develop a series of deep learning models (DLMs) and validate their performance in the medical records scoring task. We also analyzed the practical value of the best model. We used admission medical records from the Tri-Service General Hospital from January 2016 to May 2020, which were scored by visiting staff of different levels from different departments. The medical records were scored on a scale from 0 to 10. All samples were divided by time into a training set (n = 74,959) and a testing set (n = 152,730), which were used to train and validate the DLMs, respectively. The mean absolute error (MAE) was used to evaluate the performance of each DLM. In the original AI medical record scoring, the score predicted by the BERT architecture was closer to the actual reviewer score than that of the projection word embedding and LSTM architecture. The original MAE was 0.84 ± 0.27 using the BERT model and 1.00 ± 0.32 using the LSTM model. A linear mixed model can be used to improve model performance, and the adjusted predicted score was closer to the reviewer score than the original predicted score. However, the projection word embedding with the LSTM model (0.66 ± 0.39) performed better than BERT (0.70 ± 0.33) after linear mixed model enhancement (p < 0.001). In addition to comparing different architectures for scoring medical records, this study further uses a linear mixed model to successfully adjust the AI medical record score to bring it closer to the actual physician's score.
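For readers unfamiliar with the metric used in this abstract, the mean absolute error is the average of |reviewer score - predicted score| over all records. The following is a minimal worked sketch; the scores are hypothetical values on the 0 to 10 scale and are not taken from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired scores."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical reviewer scores and model predictions (0 to 10 scale).
reviewer_scores = [8, 6, 9, 7]
model_scores = [7.5, 6.5, 8.0, 7.0]

mae = mean_absolute_error(reviewer_scores, model_scores)
# Absolute errors are 0.5, 0.5, 1.0, 0.0, so the MAE is 2.0 / 4 = 0.5.
```

A lower MAE means the predicted scores sit closer to the reviewers' scores on average, which is the sense in which the abstract compares the BERT and LSTM models.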
Affiliation(s)
- Chin Lin
  - School of Medicine, National Defense Medical Center, Taipei 114, Taiwan
  - School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
  - Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan
  - Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Yung-Tsai Lee
  - Division of Cardiovascular Surgery, Cheng Hsin Rehabilitation and Medical Center, Taipei 112, Taiwan
- Feng-Jen Wu
  - Department of Informatics, Taoyuan Armed Forces General Hospital, Taoyuan 325, Taiwan
- Shing-An Lin
  - Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chia-Jung Hsu
  - Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chia-Cheng Lee
  - Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
  - Division of Colorectal Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Dung-Jang Tsai
  - School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
  - Graduate Institute of Life Sciences, National Defense Medical Center, Taipei 114, Taiwan
  - Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
  - Correspondence: (D.-J.T.); (W.-H.F.); Tel.: +886-2-8792-3100 (ext. #18305) (D.-J.T.); +886-2-8792-3100 (ext. #12322) (W.-H.F.); Fax: +886-2-8792-3147 (D.-J.T. & W.-H.F.)
- Wen-Hui Fang
  - Artificial Intelligence of Things Center, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
  - Department of Family and Community Medicine, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
  - Correspondence: (D.-J.T.); (W.-H.F.); Tel.: +886-2-8792-3100 (ext. #18305) (D.-J.T.); +886-2-8792-3100 (ext. #12322) (W.-H.F.); Fax: +886-2-8792-3147 (D.-J.T. & W.-H.F.)
4
Lin CS, Lee YT, Fang WH, Lou YS, Kuo FC, Lee CC, Lin C. Deep Learning Algorithm for Management of Diabetes Mellitus via Electrocardiogram-Based Glycated Hemoglobin (ECG-HbA1c): A Retrospective Cohort Study. J Pers Med 2021; 11:725. [PMID: 34442369 PMCID: PMC8398464 DOI: 10.3390/jpm11080725] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/30/2021] [Revised: 07/21/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND: Glycated hemoglobin (HbA1c) provides information on diabetes mellitus (DM) management. Electrocardiography (ECG) is a noninvasive test of cardiac activity that has been determined to be related to DM and its complications. This study developed a deep learning model (DLM) to estimate HbA1c via ECG. METHODS: A total of 104,823 ECGs with corresponding HbA1c or fasting glucose were used to train a DLM for calculating ECG-HbA1c. Next, 1539 cases from outpatient departments and health examination centers provided 2190 ECGs for initial validation, and another 3293 cases with their first ECGs were employed to analyze its contributions to DM management. The primary analysis was used to distinguish patients with and without mild to severe DM, and the secondary analysis was to explore the predictive value of ECG-HbA1c for future complications, which included all-cause mortality, new-onset chronic kidney disease (CKD), and new-onset heart failure (HF). RESULTS: We used a gender/age-matching strategy to train a DLM, which achieved a best AUC of 0.8255 with a sensitivity of 71.9% and specificity of 77.7% in a follow-up cohort, with a correlation of 0.496 and a mean absolute error of 1.230. The stratified analysis shows that DM presented in patients with fewer comorbidities was significantly more likely to be detected by ECG-HbA1c. Patients with higher ECG-HbA1c under the same Lab-HbA1c exhibited worse physical conditions. Of interest, ECG-HbA1c may contribute to mortality (gender/age-adjusted hazard ratio (HR): 1.53, 95% confidence interval (CI): 1.08-2.17), new-onset CKD (HR: 1.56, 95% CI: 1.30-1.87), and new-onset HF (HR: 1.51, 95% CI: 1.13-2.01) independently of Lab-HbA1c. An additional impact of ECG-HbA1c on the risk of all-cause mortality (C-index: 0.831 to 0.835, p < 0.05), new-onset CKD (C-index: 0.735 to 0.745, p < 0.01), and new-onset HF (C-index: 0.793 to 0.796, p < 0.05) was observed in full adjustment models.
CONCLUSION: ECG-HbA1c could be considered a novel biomarker for screening DM and predicting the progression of DM and its complications.
Affiliation(s)
- Chin-Sheng Lin
  - Division of Cardiology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, No 325, Section 2, Cheng-Kung Rd., Neihu, Taipei 114, Taiwan
- Yung-Tsai Lee
  - Division of Cardiovascular Surgery, Cheng Hsin Rehabilitation and Medical Center, No 45, Cheng Hsin St., Beitou, Taipei 112, Taiwan
- Wen-Hui Fang
  - Department of Family and Community Medicine, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, No 325, Section 2, Cheng-Kung Rd., Neihu, Taipei 114, Taiwan
- Yu-Sheng Lou
  - Graduate Institute of Life Sciences, National Defense Medical Center, No.161, Section 6, Min-Chun E. Rd., Neihu, Taipei 114, Taiwan
  - School of Public Health, National Defense Medical Center, No.161, Section 6, Min-Chun E. Rd., Neihu, Taipei 114, Taiwan
- Feng-Chih Kuo
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, No 325, Section 2, Cheng-Kung Rd., Neihu, Taipei 114, Taiwan
- Chia-Cheng Lee
  - Department of Medical Informatics, Tri-Service General Hospital, National Defense Medical Center, No 325, Section 2, Cheng-Kung Rd., Neihu, Taipei 114, Taiwan
  - Division of Colorectal Surgery, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, No 325, Section 2, Cheng-Kung Rd., Neihu, Taipei 114, Taiwan
- Chin Lin
  - Graduate Institute of Life Sciences, National Defense Medical Center, No.161, Section 6, Min-Chun E. Rd., Neihu, Taipei 114, Taiwan
  - School of Public Health, National Defense Medical Center, No.161, Section 6, Min-Chun E. Rd., Neihu, Taipei 114, Taiwan
  - Medical Technology Education Center, School of Medicine, National Defense Medical Center, No.161, Section 6, Min-Chun E. Rd., Neihu, Taipei 114, Taiwan
5
Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks. NPJ Digit Med 2021; 4:37. [PMID: 33637859 PMCID: PMC7910461 DOI: 10.1038/s41746-021-00404-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/16/2020] [Accepted: 01/26/2021] [Indexed: 12/02/2022] Open
Abstract
Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Disease (ICD) is a standardized and widely used method, but the manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and poor reliability of terminal parts of these codes restricted clinical usability. We aimed to create a high performing pipeline for automated classification of reliable ICD-10 codes in the free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As in clinical practice discharge letters may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding variables age/sex did not affect results. For model interpretability, word coefficients were provided and qualitative assessment of classification was manually performed. 
Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.
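Because a discharge letter may carry more than one code, performance in this setting is usually reported per code. The sketch below shows one way to compute a per-label F1 score from multilabel predictions; the letters and predictions are hypothetical toy data, and only the three ICD-10 codes are borrowed from the abstract as label names.

```python
def per_label_f1(gold, pred):
    """Compute F1 for each label from parallel lists of label sets."""
    labels = set().union(*gold, *pred)
    scores = {}
    for label in sorted(labels):
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical gold and predicted code sets for four letters.
gold = [{"I48"}, {"I21", "I48"}, {"I42.0"}, {"I21"}]
pred = [{"I48"}, {"I21"}, {"I42.0", "I48"}, {"I21"}]

f1 = per_label_f1(gold, pred)
```

Here `I48` has one true positive, one false positive, and one false negative, giving an F1 of 0.5, while the other two codes are predicted perfectly.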
6
Weber C, Röschke L, Modersohn L, Lohr C, Kolditz T, Hahn U, Ammon D, Betz B, Kiehntopf M. Optimized Identification of Advanced Chronic Kidney Disease and Absence of Kidney Disease by Combining Different Electronic Health Data Resources and by Applying Machine Learning Strategies. J Clin Med 2020; 9:jcm9092955. [PMID: 32932685 PMCID: PMC7563476 DOI: 10.3390/jcm9092955] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/25/2020] [Revised: 08/26/2020] [Accepted: 08/28/2020] [Indexed: 12/31/2022] Open
Abstract
Automated identification of advanced chronic kidney disease (CKD ≥ III) and of no known kidney disease (NKD) can support both clinicians and researchers. We hypothesized that identification of CKD and NKD can be improved by combining information from different electronic health record (EHR) resources, comprising laboratory values, discharge summaries and ICD-10 billing codes, compared to using each component alone. We included EHRs from 785 elderly multimorbid patients, hospitalized between 2010 and 2015, that were divided into a training and a test (n = 156) dataset. We used both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUCPR) with a 95% confidence interval for evaluation of different classification models. In the test dataset, the combination of EHR components as a simple classifier identified CKD ≥ III (AUROC 0.96 [0.93-0.98]) and NKD (AUROC 0.94 [0.91-0.97]) better than laboratory values (AUROC CKD 0.85 [0.79-0.90], NKD 0.91 [0.87-0.94]), discharge summaries (AUROC CKD 0.87 [0.82-0.92], NKD 0.84 [0.79-0.89]) or ICD-10 billing codes (AUROC CKD 0.85 [0.80-0.91], NKD 0.77 [0.72-0.83]) alone. Logistic regression and machine learning models improved recognition of CKD ≥ III compared to the simple classifier if only laboratory values were used (AUROC 0.96 [0.92-0.99] vs. 0.86 [0.81-0.91], p < 0.05) and improved recognition of NKD if information from previous hospital stays was used (AUROC 0.99 [0.98-1.00] vs. 0.95 [0.92-0.97], p < 0.05). Depending on the availability of data, correct automated identification of CKD ≥ III and NKD from EHRs can be improved by generating classification models based on the combination of different EHR components.
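The AUROC values reported above can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical, and the confidence intervals reported in the abstract are not reproduced here.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = condition present) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = auroc(labels, scores)
# 8 of the 9 positive/negative pairs are ordered correctly, so AUROC = 8/9.
```

The quadratic pair loop is chosen for clarity; rank-based formulas give the same value more efficiently on large datasets.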
Affiliation(s)
- Christoph Weber
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital, 07747 Jena, Germany
- Lena Röschke
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital, 07747 Jena, Germany
- Luise Modersohn
  - Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, 07743 Jena, Germany
- Christina Lohr
  - Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, 07743 Jena, Germany
- Tobias Kolditz
  - Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, 07743 Jena, Germany
- Udo Hahn
  - Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, 07743 Jena, Germany
- Danny Ammon
  - Data Integration Center, Jena University Hospital, 07743 Jena, Germany
- Boris Betz
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital, 07747 Jena, Germany
  - Correspondence: (B.B.); (M.K.); Tel.: +49-3641-9-325074 (B.B.); +49-3641-9-325001 (M.K.)
- Michael Kiehntopf
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital, 07747 Jena, Germany
  - Correspondence: (B.B.); (M.K.); Tel.: +49-3641-9-325074 (B.B.); +49-3641-9-325001 (M.K.)
7
Robinson PN, Haendel MA. Ontologies, Knowledge Representation, and Machine Learning for Translational Research: Recent Contributions. Yearb Med Inform 2020; 29:159-162. [PMID: 32823310 PMCID: PMC7442528 DOI: 10.1055/s-0040-1701991] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022] Open
Abstract
Objectives: To select, present, and summarize the most relevant papers published in 2018 and 2019 in the field of Ontologies and Knowledge Representation, with a particular focus on the intersection between Ontologies and Machine Learning.
Methods: A comprehensive review of the medical informatics literature was performed to select the most interesting papers published in 2018 and 2019 that document the utility of ontologies for computational analysis, including machine learning.
Results: Fifteen articles were selected for inclusion in this survey paper. The chosen articles belong to three major themes: (i) the identification of phenotypic abnormalities in electronic health record (EHR) data using the Human Phenotype Ontology; (ii) word and node embedding algorithms to supplement natural language processing (NLP) of EHRs and other medical texts; and (iii) hybrid ontology and NLP-based approaches to extracting structured and unstructured components of EHRs.
Conclusion: Unprecedented amounts of clinically relevant data are now available for clinical and research use. Machine learning is increasingly being applied to these data sources for predictive analytics, precision medicine, and differential diagnosis. Ontologies have become an essential component of software pipelines designed to extract, code, and analyze clinical information by machine learning algorithms. The intersection of machine learning and semantics is proving to be an innovative space in clinical research.
Affiliation(s)
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
- Melissa A Haendel
  - Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, OR, USA
  - Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR, USA