Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kocbek S, Cavedon L, Martinez D, Bain C, Manus CM, Haffari G, Zukerman I, Verspoor K. Text mining electronic hospital records to automatically classify admissions against disease: Measuring the impact of linking data sources. J Biomed Inform 2016;64:158-167. [PMID: 27742349 DOI: 10.1016/j.jbi.2016.10.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 06/06/2016] [Revised: 08/20/2016] [Accepted: 10/10/2016] [Indexed: 10/20/2022]

Cited by Other Article(s)

Stella F, Calimeri F, Dragoni M. Special issue on learning from multiple data sources for decision making in health care. J Biomed Inform 2024;153:104645. [PMID: 38636701 DOI: 10.1016/j.jbi.2024.104645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 04/11/2024] [Accepted: 04/16/2024] [Indexed: 04/20/2024]

Huang H, Zheng O, Wang D, Yin J, Wang Z, Ding S, Yin H, Xu C, Yang R, Zheng Q, Shi B. ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. Int J Oral Sci 2023;15:29. [PMID: 37507396 PMCID: PMC10382494 DOI: 10.1038/s41368-023-00239-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/26/2023] [Revised: 07/06/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023]

Affiliation(s)

Hanyao Huang: State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
Ou Zheng: Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA.
Dongdong Wang: Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA.
Jiayi Yin: State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
Zijin Wang: Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA.
Shengxuan Ding: College of Transportation Engineering, University of Central Florida, Orlando, USA.
Heng Yin: State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
Chuan Xu: School of Transportation and Logistics, Southwest Jiaotong University, Chengdu, China; C2SMART Center, Tandon School of Engineering, New York University, Brooklyn, USA.
Renjie Yang: State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Eastern Clinic, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
Qian Zheng: State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
Bing Shi: State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China.

Xue Y, Liu H. Exploration of the Dynamic Evolution of Online Public Opinion towards Waste Classification in Shanghai. Int J Environ Res Public Health 2023;20:1471. [PMID: 36674228 PMCID: PMC9859488 DOI: 10.3390/ijerph20021471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/26/2021] [Accepted: 11/30/2021] [Indexed: 06/17/2023]

Saha A, Burns L, Kulkarni AM. A scoping review of natural language processing of radiology reports in breast cancer. Front Oncol 2023;13:1160167. [PMID: 37124523 PMCID: PMC10130381 DOI: 10.3389/fonc.2023.1160167] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/06/2023] [Accepted: 03/28/2023] [Indexed: 05/02/2023]

Wang Y, Wang Y, Peng Z, Zhang F, Zhou L, Yang F. Medical text classification based on the discriminative pre-training model and prompt-tuning. Digit Health 2023;9:20552076231193213. [PMID: 37559830 PMCID: PMC10408339 DOI: 10.1177/20552076231193213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 07/18/2023] [Indexed: 08/11/2023]

Dipnall JF, Lu J, Gabbe BJ, Cosic F, Edwards E, Page R, Du L. Comparison of state-of-the-art machine and deep learning algorithms to classify proximal humeral fractures using radiology text. Eur J Radiol 2022;153:110366. [DOI: 10.1016/j.ejrad.2022.110366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 04/08/2022] [Accepted: 05/16/2022] [Indexed: 12/01/2022]

Chintalapudi N, Angeloni U, Battineni G, di Canio M, Marotta C, Rezza G, Sagaro GG, Silenzi A, Amenta F. LASSO Regression Modeling on Prediction of Medical Terms among Seafarers’ Health Documents Using Tidy Text Mining. Bioengineering (Basel) 2022;9:bioengineering9030124. [PMID: 35324813 PMCID: PMC8945331 DOI: 10.3390/bioengineering9030124] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/07/2022] [Revised: 03/02/2022] [Accepted: 03/16/2022] [Indexed: 12/31/2022]

Yang X, Mu D, Peng H, Li H, Wang Y, Wang P, Wang Y, Han S. Research and Application of Artificial Intelligence (AI) based on Electronic Health Records from Patients with Cancer: a Systematic Review (Preprint). JMIR Med Inform 2021;10:e33799. [PMID: 35442195 PMCID: PMC9069295 DOI: 10.2196/33799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2021] [Revised: 01/24/2022] [Accepted: 03/14/2022] [Indexed: 01/12/2023]

Abstract

Background

With the accumulation of electronic health records and the development of artificial intelligence, patients with cancer urgently need new evidence of more personalized clinical and demographic characteristics and more sophisticated treatment and prevention strategies. However, no research has systematically analyzed the application and significance of artificial intelligence based on electronic health records in cancer care.

Objective

The aim of this study was to conduct a review to introduce the current state and limitations of artificial intelligence based on electronic health records of patients with cancer and to summarize the performance of artificial intelligence in mining electronic health records and its impact on cancer care.

Methods

Three databases were systematically searched to retrieve potentially relevant papers published from January 2009 to October 2020. Four principal reviewers assessed the quality of the papers and reviewed them for eligibility based on the inclusion criteria in the extracted data. The summary measures used in this analysis were the number and frequency of occurrence of the themes.

Results

Of the 1034 papers considered, 148 papers met the inclusion criteria. Cancer care, especially cancers of female organs and digestive organs, could benefit from artificial intelligence based on electronic health records through cancer emergencies and prognostic estimates, cancer diagnosis and prediction, tumor stage detection, cancer case detection, and treatment pattern recognition. The models can always achieve an area under the curve of 0.7. Ensemble methods and deep learning are on the rise. In addition, electronic medical records in the existing studies are mainly in English and from private institutional databases.

Conclusions

Artificial intelligence based on electronic health records performed well and could be useful for cancer care. Improving the performance of artificial intelligence can help patients receive more scientific-based and accurate treatments. There is a need for the development of new methods and electronic health record data sharing and for increased passion and support from cancer specialists.

Percha B. Modern Clinical Text Mining: A Guide and Review. Annu Rev Biomed Data Sci 2021;4:165-187. [PMID: 34465177 DOI: 10.1146/annurev-biodatasci-030421-030931] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]

Leveraging electronic health record data to inform hospital resource management: A systematic data mining approach. Health Care Manag Sci 2021;24:716-741. [PMID: 34031792 DOI: 10.1007/s10729-021-09554-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/26/2020] [Accepted: 02/02/2021] [Indexed: 10/21/2022]

Abstract

Early identification of resource needs is instrumental in promoting efficient hospital resource management. Hospital information systems, and electronic health records (EHR) in particular, collect valuable demographic and clinical patient data from the moment patients are admitted, which can help predict expected resource needs in early stages of patient episodes. To this end, this article proposes a data mining methodology to systematically obtain predictions for relevant managerial variables by leveraging structured EHR data. Specifically, these managerial variables are: i) Diagnosis categories, ii) procedure codes, iii) diagnosis-related groups (DRGs), iv) outlier episodes and v) length of stay (LOS). The proposed methodology approaches the problem in four stages: Feature set construction, feature selection, prediction model development, and model performance evaluation. We tested this approach with an EHR dataset of 5,089 inpatient episodes and compared different classification and regression models (for categorical and continuous variables, respectively), performed temporal analysis of model performance, analyzed the impact of training set homogeneity on performance and assessed the contribution of different EHR data elements for model predictive power. Overall, our results indicate that inpatient EHR data can effectively be leveraged to inform resource management on multiple perspectives. Logistic regression (combined with minimal redundancy maximum relevance feature selection) and bagged decision trees yielded best results for predicting categorical and numerical managerial variables, respectively. Furthermore, our temporal analysis indicated that, while DRG classes are more difficult to predict, several diagnosis categories, procedure codes and LOS amongst shorter-stay patients can be predicted with higher confidence in early stages of patient stay. 
Lastly, value of information analysis indicated that diagnoses, medication and structured assessment forms were the most valuable EHR data elements in predicting managerial variables of interest through a data mining approach.

Na HJ, Lee KC, Kim ST. Integrating Text-Mining and Balanced Scorecard Techniques to Investigate the Association between CEO Message of Homepage Words and Financial Status: Emphasis on Hospitals. Healthcare (Basel) 2021;9:healthcare9040408. [PMID: 33916303 PMCID: PMC8067190 DOI: 10.3390/healthcare9040408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Revised: 03/19/2021] [Accepted: 03/22/2021] [Indexed: 11/16/2022]

Abstract

(1) Background: The Chief Executive Officer’s (CEO’s) message on a hospital’s homepage on the Internet contains various components, such as the hospital’s future vision, promises to customers, availability of upgraded services and public activities. This statement usually includes non-financial information as well as financial information about the corporate entity owning/operating the hospital. In addition, it provides useful information about not only the company’s goals and vision, but also firm performance targets and strategies for the future. This study aims to investigate associations between the CEO’s message and the financial status of the institution. We used the balanced scorecard framework to analyze what content on the hospital’s homepage is related to the hospital’s various financial ratios. (2) Methods: We adopted a text-mining method to extract significantly repeated keywords from the CEO’s message on the hospital’s website. Then, we classified these keywords using a balanced scorecard approach. To examine the relationship between keywords in the CEO’s message and the hospital’s financial ratios, a t-test was conducted for the difference in the term frequency divided by inverse document frequency (TF-IDF) mean of the home page contents and its relationship with the views of the balanced scorecard framework. (3) Results: According to our empirical results on 65 samples collected from local hospitals, there are some significant relationships between the qualitative content of the hospital’s homepage and the quantitative financial ratios that indicate profitability, activity, leverage, liquidity, and accumulating reserves for proper business purposes. (4) Conclusions: The introduction section of a homepage is the part most accessible to customers, containing the aims and ideals of the hospital and reflecting the institution’s values and visions. 
In addition, in the coverage of financial status, the organization can either emphasize financial strength or focus on other areas to divert attention from any weakness shown in the financial information. This study reminds us of the importance of the hospital website’s disclosure, and what can be inferred from the financial status of the hospital. It also highlights the need for reconciliation and harmony between the quantitative data, financial statements, and qualitative data in the CEO’s message. (5) Implications: To the best of our knowledge, this paper is the first research attempting to investigate the relationship between text on the hospital’s homepage and the hospital’s financial ratios using text-mining techniques and the balanced scorecard framework. Hospitals play a crucial role in a country’s welfare and healthcare industry. Nevertheless, in many countries, hospital organizations tend to remain a source of critical fiscal deficits due to ineffective and sloppy management. We expect that the result of this paper can provide hospital managers with useful information to address that situation.
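
The abstract above reports a t-test on the difference in mean TF-IDF between groups but does not say which variant was used. As a minimal sketch only, assuming Welch's unequal-variance form, the statistic could be computed as below. The function name `welch_t` and the sample values are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant).

    Each sample needs at least two values, and the two samples must not
    both be constant, otherwise the standard error is zero.
    """
    # Standard error of the difference in means, using sample variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical mean TF-IDF values of one keyword in two groups of hospitals.
group_high = [2.0, 4.0, 6.0]
group_low = [1.0, 2.0, 3.0]
t_stat = welch_t(group_high, group_low)
```

A p-value would additionally require the Welch-Satterthwaite degrees of freedom and a t distribution, which this sketch leaves out.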

Ontology-based enriched concept graphs for medical document classification. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.03.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/23/2022]

Bagheri A, Sammani A, van der Heijden PGM, Asselbergs FW, Oberski DL. ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients’ disease history. J Intell Inf Syst 2020. [DOI: 10.1007/s10844-020-00605-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]

Abstract

Given the rapid rate at which text data are being digitally gathered in the medical domain, there is growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients’ disease history from clinical sentences. However, in EHRs, sentences are less topic-focused and shorter than that in general domain, which leads to the sparsity of co-occurrence patterns and the lack of semantic features. To tackle this challenge, current approaches for clinical sentence classification are dependent on external information to improve classification performance. However, this is implausible owing to a lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smoothen the semantic representations of short sentences. The ETM enriches text representation by incorporating probability distributions generated by an unsupervised algorithm into it. It considers the length of the original texts to enhance representation by using an internal knowledge acquisition procedure. When it comes to clinical predictive modeling, interpretability improves the acceptance of the model. Thus, for clinical sentence classification, the ETM approach employs an initial TFiDF (term frequency inverse document frequency) representation, where we use the support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set consisting of clinical cardiovascular notes from the Netherlands to test the sentence classification performance of the proposed method in comparison with prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.

Drozdov I, Forbes D, Szubert B, Hall M, Carlin C, Lowe DJ. Supervised and unsupervised language modelling in Chest X-Ray radiological reports. PLoS One 2020;15:e0229963. [PMID: 32155219 PMCID: PMC7064166 DOI: 10.1371/journal.pone.0229963] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/23/2019] [Accepted: 02/17/2020] [Indexed: 12/14/2022]

Zhou W, Shao F, Li J. Bioinformatic analysis of the molecular mechanism underlying bronchial pulmonary dysplasia using a text mining approach. Medicine (Baltimore) 2019;98:e18493. [PMID: 31876736 PMCID: PMC6946243 DOI: 10.1097/md.0000000000018493] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]

Chen CJ, Warikoo N, Chang YC, Chen JH, Hsu WL. Medical knowledge infused convolutional neural networks for cohort selection in clinical trials. J Am Med Inform Assoc 2019;26:1227-1236. [PMID: 31390470 DOI: 10.1093/jamia/ocz128] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/20/2019] [Revised: 06/18/2019] [Accepted: 07/04/2019] [Indexed: 12/24/2022]

Abstract

OBJECTIVE

In this era of digitized health records, there has been a marked interest in using de-identified patient records for conducting various health related surveys. To assist in this research effort, we developed a novel clinical data representation model entitled medical knowledge-infused convolutional neural network (MKCNN), which is used for learning the clinical trial criteria eligibility status of patients to participate in cohort studies.

MATERIALS AND METHODS

In this study, we propose a clinical text representation infused with medical knowledge (MK). First, we isolate the noise from the relevant data using a medically relevant description extractor; then we utilize log-likelihood ratio based weights from selected sentences to highlight "met" and "not-met" knowledge-infused representations in bichannel setting for each instance. The combined medical knowledge-infused representation (MK) from these modules helps identify significant clinical criteria semantics, which in turn renders effective learning when used with a convolutional neural network architecture.

RESULTS

MKCNN outperforms other Medical Knowledge (MK) relevant learning architectures by approximately 3%; notably SVM and XGBoost implementations developed in this study. MKCNN scored 86.1% on the F1 metric, a gain of 6% above the average performance assessed from the submissions for the n2c2 task. Although pattern/rule-based methods show a higher average performance for the n2c2 clinical data set, MKCNN significantly improves performance of machine learning implementations for clinical datasets.

CONCLUSION

MKCNN scored 86.1% on the F1 score metric. In contrast to many of the rule-based systems introduced during the n2c2 challenge workshop, our system presents a model that heavily draws on machine-based learning. In addition, the MK representations add more value to clinical comprehension and interpretation of natural texts.
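
The F1 score quoted in the abstract above is the harmonic mean of precision and recall. A minimal sketch of the calculation follows. The function name `f1_from_counts` and the confusion counts are hypothetical illustrations and are not figures from the paper.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp and fn are true-positive, false-positive and false-negative counts.
    """
    if tp == 0:
        # No true positives: precision and recall are zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a "met" vs "not-met" eligibility criterion.
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
```

With these made-up counts, precision and recall are both 0.8, so F1 is also 0.8 up to floating-point rounding.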

Rios A, Durbin EB, Hands I, Arnold SM, Shah D, Schwartz SM, Goulart BHL, Kavuluru R. Cross-registry neural domain adaptation to extract mutational test results from pathology reports. J Biomed Inform 2019;97:103267. [PMID: 31401235 PMCID: PMC6736690 DOI: 10.1016/j.jbi.2019.103267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/23/2019] [Revised: 07/30/2019] [Accepted: 08/05/2019] [Indexed: 10/26/2022]

Abstract

OBJECTIVE

We study the performance of machine learning (ML) methods, including neural networks (NNs), to extract mutational test results from pathology reports collected by cancer registries. Given the lack of hand-labeled datasets for mutational test result extraction, we focus on the particular use-case of extracting Epidermal Growth Factor Receptor mutation results in non-small cell lung cancers. We explore the generalization of NNs across different registries where our goals are twofold: (1) to assess how well models trained on a registry's data port to test data from a different registry and (2) to assess whether and to what extent such models can be improved using state-of-the-art neural domain adaptation techniques under different assumptions about what is available (labeled vs unlabeled data) at the target registry site.

MATERIALS AND METHODS

We collected data from two registries: the Kentucky Cancer Registry (KCR) and the Fred Hutchinson Cancer Research Center (FH) Cancer Surveillance System. We combine NNs with adversarial domain adaptation to improve cross-registry performance. We compare to other classifiers in the standard supervised classification, unsupervised domain adaptation, and supervised domain adaptation scenarios.

RESULTS

The performance of ML methods varied between registries. To extract positive results, the basic convolutional neural network (CNN) had an F1 of 71.5% on the KCR dataset and 95.7% on the FH dataset. For the KCR dataset, the CNN F1 results were low when trained on FH data (Positive F1: 23%). Using our proposed adversarial CNN, without any labeled data, we match the F1 of the models trained directly on each target registry's data. The adversarial CNN F1 improved when trained on FH and applied to KCR dataset (Positive F1: 70.8%). We found similar performance improvements when we trained on KCR and tested on FH reports (Positive F1: 45% to 96%).

CONCLUSION

Adversarial domain adaptation improves the performance of NNs applied to pathology reports. In the unsupervised domain adaptation setting, we match the performance of models that are trained directly on target registry's data by using source registry's labeled data and unlabeled examples from the target registry.

Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports. J Digit Imaging 2019;31:178-184. [PMID: 29079959 DOI: 10.1007/s10278-017-0027-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 10/18/2022]

Brown A, Kachura J. Natural Language Processing of Radiology Reports in Patients With Hepatocellular Carcinoma to Predict Radiology Resource Utilization. J Am Coll Radiol 2019;16:840-844. [DOI: 10.1016/j.jacr.2018.12.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/01/2018] [Revised: 12/02/2018] [Accepted: 12/03/2018] [Indexed: 12/13/2022]

Kocbek S, Kocbek P, Stozer A, Zupanic T, Groza T, Stiglic G. Building interpretable models for polypharmacy prediction in older chronic patients based on drug prescription records. PeerJ 2018;6:e5765. [PMID: 30345175 PMCID: PMC6187991 DOI: 10.7717/peerj.5765] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/11/2018] [Accepted: 09/17/2018] [Indexed: 01/02/2023]

Abstract

Background

Multimorbidity presents an increasingly common problem in older population, and is tightly related to polypharmacy, i.e., concurrent use of multiple medications by one individual. Detecting polypharmacy from drug prescription records is not only related to multimorbidity, but can also point at incorrect use of medicines. In this work, we build models for predicting polypharmacy from drug prescription records for newly diagnosed chronic patients. We evaluate the models’ performance with a strong focus on interpretability of the results.

Methods

A centrally collected nationwide dataset of prescription records was used to perform electronic phenotyping of patients for the following two chronic conditions: type 2 diabetes mellitus (T2D) and cardiovascular disease (CVD). In addition, a hospital discharge dataset was linked to the prescription records. A regularized regression model was built for 11 different experimental scenarios on two datasets, and complexity of the model was controlled with a maximum number of dimensions (MND) parameter. Performance and interpretability of the model were evaluated with AUC, AUPRC, calibration plots, and interpretation by a medical doctor.

Results

For the CVD model, AUC and AUPRC values of 0.900 (95% [0.898–0.901]) and 0.640 (0.635–0.645) were reached, respectively, while for the T2D model the values were 0.808 (0.803–0.812) and 0.732 (0.725–0.739). Reducing complexity of the model by 65% and 48% for CVD and T2D, resulted in 3% and 4% lower AUC, and 4% and 5% lower AUPRC values, respectively. Calibration plots for our models showed that we can achieve moderate calibration with reducing the models’ complexity without significant loss of predictive performance.

Discussion

In this study, we found that it is possible to use drug prescription data to build a model for polypharmacy prediction in older population. In addition, the study showed that it is possible to find a balance between good performance and interpretability of the model, and achieve acceptable calibration at the same time.
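
The AUC values reported in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows. The function name `roc_auc` and the toy labels and scores are hypothetical, not data from the study, and the quadratic loop is meant for illustration rather than for large datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels are 0/1 class labels; scores are model outputs where a higher
    value means "more likely positive". Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

AUPRC, the other metric in the abstract, summarises the precision-recall curve instead and is not covered by this sketch.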

Pérez J, Pérez A, Casillas A, Gojenola K. Cardiology record multi-label classification using latent Dirichlet allocation. Comput Methods Programs Biomed 2018;164:111-119. [PMID: 30195419 DOI: 10.1016/j.cmpb.2018.07.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/14/2017] [Revised: 06/18/2018] [Accepted: 07/16/2018] [Indexed: 06/08/2023]

Abstract

BACKGROUND AND OBJECTIVES

Electronic health records (EHRs) convey vast and valuable knowledge about dynamically changing clinical practices. Indeed, clinical documentation entails the inspection of massive number of records across hospitals and hospital sections. The goal of this study is to provide an efficient framework that will help clinicians explore EHRs and attain alternative views related to both patient-segments and diseases, like clustering and statistical information about the development of heart diseases (replacement of pacemakers, valve implantation etc.) in co-occurrence with other diseases. The task is challenging, dealing with lengthy health records and a high number of classes in a multi-label setting.

METHODS

LDA is a statistical procedure optimized to explain a document by multinomial distributions on their latent topics and the topics by distributions on related words. These distributions allow to represent collections of texts into a continuous space enabling distance-based associations between documents and also revealing the underlying topics. The topic models were assessed by means of four divergence metrics. In addition, we applied LDA to the task of multi-label document classification of EHRs according to the International Classification of Diseases 10th Clinical Modification (ICD-10). The set of EHRs had assigned 7 codes on average over 970 different codes corresponding to cardiology.

RESULTS

First, the discriminative ability of topic models was assessed using dissimilarity metrics. Nevertheless, there was an open question regarding the interpretability of automatically discovered topics. To address this issue, we explored the connection between the latent topics and ICD-10. EHRs were represented by means of LDA and, next, supervised classifiers were inferred from those representations. Given the low-dimensional representation provided by LDA, the search was computationally efficient compared to symbolic approaches such as TF-IDF. The classifiers achieved an average AUC of 77.79. As a side contribution, with this work we released the software implemented in Python and R to both train and evaluate the models.

CONCLUSIONS

Topic modeling offers a means of representing EHRs in a small dimensional continuous space. This representation conveys relevant information as hidden topics in a comprehensive manner. Moreover, in practice, this compact representation allowed to extract the ICD-10 codes associated to EHRs.
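
As a reading aid for the LDA description in the abstract above, the two families of distributions it mentions (documents over latent topics, topics over words) combine as follows. The symbols K, \theta_d, \phi_k, \alpha and \beta are standard LDA notation assumed here; they do not appear in the abstract.

```latex
% Priors on the per-document topic mixture and the per-topic word distribution
\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad \phi_k \sim \mathrm{Dirichlet}(\beta)

% Probability of word w in document d, marginalising over the K latent topics
p(w \mid d) \;=\; \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
            \;=\; \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k}
```

The vector \theta_d is the low-dimensional continuous representation of an EHR on which, according to the abstract, the supervised ICD-10 classifiers were trained.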

Weng WH, Wagholikar KB, McCray AT, Szolovits P, Chueh HC. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach. BMC Med Inform Decis Mak 2017;17:155. [PMID: 29191207 PMCID: PMC5709846 DOI: 10.1186/s12911-017-0556-8] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 06/15/2017] [Accepted: 11/19/2017] [Indexed: 01/18/2023]

Abstract

BACKGROUND

The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note.

METHODS

We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets.

RESULTS

The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied.

CONCLUSION

Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance yet shallow learning algorithms with the word and concept representation achieves comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions.
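
The tf-idf weighting mentioned in the abstract above scales each term's frequency in a note by how rare the term is across all notes. A minimal sketch follows, assuming the plain textbook form tf × log(N / df). The paper does not state which variant it used, and libraries often apply smoothing, so treat the formula, the function name `tf_idf` and the toy notes as assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for tokenised documents.

    docs is a list of token lists. Returns one {term: weight} dict per
    document, with weight = (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, heavily simplified clinical notes.
notes = [["chest", "pain", "ecg"], ["chest", "xray"], ["headache", "mri"]]
note_weights = tf_idf(notes)
```

In this toy input, "ecg" occurs in one note and "chest" in two, so "ecg" receives the larger weight within the first note; a term that occurs in every note would receive a weight of zero.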
