1
Stella F, Calimeri F, Dragoni M. Special issue on learning from multiple data sources for decision making in health care. J Biomed Inform 2024; 153:104645. [PMID: 38636701 DOI: 10.1016/j.jbi.2024.104645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 04/11/2024] [Accepted: 04/16/2024] [Indexed: 04/20/2024]
Affiliation(s)
- Fabio Stella
  - University of Milano-Bicocca, 336 Viale Sarca, 20126 Milano, Italy.

2
Huang H, Zheng O, Wang D, Yin J, Wang Z, Ding S, Yin H, Xu C, Yang R, Zheng Q, Shi B. ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. Int J Oral Sci 2023; 15:29. [PMID: 37507396 PMCID: PMC10382494 DOI: 10.1038/s41368-023-00239-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/26/2023] [Revised: 07/06/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
The ChatGPT, a lite and conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI, is one of the milestone Large Language Models (LLMs) with billions of parameters. LLMs have stirred up much interest among researchers and practitioners in their impressive skills in natural language processing tasks, which profoundly impact various fields. This paper mainly discusses the future applications of LLMs in dentistry. We introduce two primary LLM deployment methods in dentistry, including automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. Especially, equipped with a cross-modal encoder, a single LLM can manage multi-source data and conduct advanced natural language reasoning to perform complex clinical operations. We also present cases to demonstrate the potential of a fully automatic Multi-Modal LLM AI system for dentistry clinical application. While LLMs offer significant potential benefits, the challenges, such as data privacy, data quality, and model bias, need further study. Overall, LLMs have the potential to revolutionize dental diagnosis and treatment, which indicates a promising avenue for clinical application and research in dentistry.
Affiliation(s)
- Hanyao Huang
  - State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
- Ou Zheng
  - Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA.
- Dongdong Wang
  - Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA
- Jiayi Yin
  - State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Zijin Wang
  - Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA
- Shengxuan Ding
  - College of Transportation Engineering, University of Central Florida, Orlando, USA
- Heng Yin
  - State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Chuan Xu
  - School of Transportation and Logistics, Southwest Jiaotong University, Chengdu, China
  - C2SMART Center, Tandon School of Engineering, New York University, Brooklyn, USA
- Renjie Yang
  - State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Eastern Clinic, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Qian Zheng
  - State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Bing Shi
  - State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China

3
Xue Y, Liu H. Exploration of the Dynamic Evolution of Online Public Opinion towards Waste Classification in Shanghai. International Journal of Environmental Research and Public Health 2023; 20:1471. [PMID: 36674228 PMCID: PMC9859488 DOI: 10.3390/ijerph20021471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/26/2021] [Accepted: 11/30/2021] [Indexed: 06/17/2023]
Abstract
Shanghai is one of the fastest-growing metropolises and the first city in China to implement mandatory waste classification. Waste classification policy of Shanghai has attracted widespread attention since its implementation in July 2019. However, previous papers have not focused on online public attitudes surrounding the implementation of a waste classification policy in Shanghai. In order to fill this gap, this paper explored the dynamic evolution of online public attitudes towards waste classification in Shanghai by using sentiment analysis technology and topic modeling technology. It was found that the proportion of negative posts each month was about 20%; therefore, online public sentiment towards waste classification in Shanghai was generally positive. Compared with the first three months of policy implementation, the public sentiment towards Shanghai's waste classification became more positive, with the exception of two special periods. Negative posts in July 2019 mainly discussed waste's environmental hazards and policy provisions. New topics in negative posts in later months focused on some specific problems, including the process of throwing away wet waste, the allocated throwing times, the number of waste cans, takeaway meal disposal, and gathering activities. Improving the factors causing the negative sentiments in the posts will help the government better implement the policy. The paper will help the government to receive higher public support for the waste classification policy in Shanghai. The present findings also have great reference significance for other cities.
Affiliation(s)
- Yingxia Xue
  - Management Science and Engineering, School of Economics and Management, Tongji University, 4800 Caoan Rd., Shanghai 201804, China
- Honglei Liu
  - Department of Construction Management, Changshu Institute of Technology, Changshu 215500, China

4
Saha A, Burns L, Kulkarni AM. A scoping review of natural language processing of radiology reports in breast cancer. Front Oncol 2023; 13:1160167. [PMID: 37124523 PMCID: PMC10130381 DOI: 10.3389/fonc.2023.1160167] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/06/2023] [Accepted: 03/28/2023] [Indexed: 05/02/2023] Open
Abstract
Various natural language processing (NLP) algorithms have been applied in the literature to analyze radiology reports pertaining to the diagnosis and subsequent care of cancer patients. Applications of this technology include cohort selection for clinical trials, population of large-scale data registries, and quality improvement in radiology workflows including mammography screening. This scoping review is the first to examine such applications in the specific context of breast cancer. Out of 210 identified articles initially, 44 met our inclusion criteria for this review. Extracted data elements included both clinical and technical details of studies that developed or evaluated NLP algorithms applied to free-text radiology reports of breast cancer. Our review illustrates an emphasis on applications in diagnostic and screening processes over treatment or therapeutic applications and describes growth in deep learning and transfer learning approaches in recent years, although rule-based approaches continue to be useful. Furthermore, we observe increased efforts in code and software sharing but not with data sharing.
Affiliation(s)
- Ashirbani Saha
  - Department of Oncology, McMaster University, Hamilton, ON, Canada
  - Hamilton Health Sciences and McMaster University, Escarpment Cancer Research Institute, Hamilton, ON, Canada
  - *Correspondence: Ashirbani Saha,
- Levi Burns
  - Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada

5
Wang Y, Wang Y, Peng Z, Zhang F, Zhou L, Yang F. Medical text classification based on the discriminative pre-training model and prompt-tuning. Digit Health 2023; 9:20552076231193213. [PMID: 37559830 PMCID: PMC10408339 DOI: 10.1177/20552076231193213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 07/18/2023] [Indexed: 08/11/2023] Open
Abstract
Medical text classification, as a fundamental medical natural language processing task, aims to identify the categories to which a short medical text belongs. Current research has focused on performing the medical text classification task using a pre-training language model through fine-tuning. However, this paradigm introduces additional parameters when training extra classifiers. Recent studies have shown that the "prompt-tuning" paradigm induces better performance in many natural language processing tasks because it bridges the gap between pre-training goals and downstream tasks. The main idea of prompt-tuning is to transform binary or multi-classification tasks into mask prediction tasks by fully exploiting the features learned by pre-training language models. This study explores, for the first time, how to classify medical texts using a discriminative pre-training language model called ERNIE-Health through prompt-tuning. Specifically, we attempt to perform prompt-tuning based on the multi-token selection task, which is a pre-training task of ERNIE-Health. The raw text is wrapped into a new sequence with a template in which the category label is replaced by a [UNK] token. The model is then trained to calculate the probability distribution of the candidate categories. Our method is tested on the KUAKE-Question Intention Classification and CHiP-Clinical Trial Criterion datasets and obtains the accuracy values of 0.866 and 0.861. In addition, the loss values of our model decrease faster throughout the training period compared to the fine-tuning. The experimental results provide valuable insights to the community and suggest that prompt-tuning can be a promising approach to improve the performance of pre-training models in domain-specific tasks.
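The template-wrapping step described in this abstract can be illustrated with a minimal sketch. It does not call ERNIE-Health or any real model: the template string, the candidate labels, and the scores are all hypothetical stand-ins chosen for illustration, since the abstract does not give the actual template.

```python
import math
from typing import Dict, Tuple

# Hypothetical template: the paper's real template is not given in the abstract.
TEMPLATE = "{text} This question is about [UNK]."


def wrap_with_template(text: str, template: str = TEMPLATE) -> str:
    """Wrap raw text into a prompt whose label slot is a [UNK] token."""
    return template.format(text=text)


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-category scores into a probability distribution."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict(scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate category with the highest probability."""
    probs = softmax(scores)
    return max(probs, key=probs.get), probs


prompt = wrap_with_template("What are the side effects of aspirin?")
# Made-up scores standing in for what a model might assign at the [UNK] slot.
label, probs = predict({"diagnosis": 0.2, "treatment": 1.1, "medication": 2.3})
```

In a real setup the scores would come from the pre-trained model's output at the masked position, restricted to the candidate category tokens.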
Affiliation(s)
- Yu Wang
  - School of Biomedical Engineering, Anhui Medical University, Hefei, China
- Yuan Wang
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Zhenwan Peng
  - School of Biomedical Engineering, Anhui Medical University, Hefei, China
- Feifan Zhang
  - School of Biomedical Engineering, Anhui Medical University, Hefei, China
- Luyao Zhou
  - School of Biomedical Engineering, Anhui Medical University, Hefei, China
- Fei Yang
  - School of Biomedical Engineering, Anhui Medical University, Hefei, China

6
Dipnall JF, Lu J, Gabbe BJ, Cosic F, Edwards E, Page R, Du L. Comparison of state-of-the-art machine and deep learning algorithms to classify proximal humeral fractures using radiology text. Eur J Radiol 2022; 153:110366. [DOI: 10.1016/j.ejrad.2022.110366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 04/08/2022] [Accepted: 05/16/2022] [Indexed: 12/01/2022]

7
Chintalapudi N, Angeloni U, Battineni G, di Canio M, Marotta C, Rezza G, Sagaro GG, Silenzi A, Amenta F. LASSO Regression Modeling on Prediction of Medical Terms among Seafarers’ Health Documents Using Tidy Text Mining. Bioengineering (Basel) 2022; 9:bioengineering9030124. [PMID: 35324813 PMCID: PMC8945331 DOI: 10.3390/bioengineering9030124] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/07/2022] [Revised: 03/02/2022] [Accepted: 03/16/2022] [Indexed: 12/31/2022] Open
Abstract
Generally, seafarers face a higher risk of illnesses and accidents than land workers. In most cases, there are no medical professionals on board seagoing vessels, which makes disease diagnosis even more difficult. When this occurs, onshore doctors may be able to provide medical advice through telemedicine by receiving better symptomatic and clinical details in the health abstracts of seafarers. The adoption of text mining techniques can assist in extracting diagnostic information from clinical texts. We applied lexicon sentimental analysis to explore the automatic labeling of positive and negative healthcare terms to seafarers’ text healthcare documents. This was due to the lack of experimental evaluations using computational techniques. In order to classify diseases and their associated symptoms, the LASSO regression algorithm is applied to analyze these text documents. A visualization of symptomatic data frequency for each disease can be achieved by analyzing TF-IDF values. The proposed approach allows for the classification of text documents with 93.8% accuracy by using a machine learning model called LASSO regression. It is possible to classify text documents effectively with tidy text mining libraries. In addition to delivering health assistance, this method can be used to classify diseases and establish health observatories. Knowledge developed in the present work will be applied to establish an Epidemiological Observatory of Seafarers’ Pathologies and Injuries. This Observatory will be a collaborative initiative of the Italian Ministry of Health, University of Camerino, and International Radio Medical Centre (C.I.R.M.), the Italian TMAS.
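The TF-IDF values mentioned in this abstract can be sketched as follows. This uses one common definition (term frequency times the natural log of total documents over documents containing the term); whether the study used exactly this weighting is an assumption, and the three toy documents are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented toy documents, not data from the study.
docs = [
    "fever cough headache".split(),
    "fever rash".split(),
    "back pain fever".split(),
]
weights = tf_idf(docs)
```

A term that occurs in every document, such as "fever" here, gets weight zero, which is why TF-IDF highlights the terms that distinguish one document (or disease) from the rest.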
Affiliation(s)
- Nalini Chintalapudi
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
  - Correspondence: ; Tel.: +39-35-33776704
- Ulrico Angeloni
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Gopi Battineni
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
- Marzio di Canio
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
  - Research Department, International Radio Medical Centre (C.I.R.M.), 00144 Rome, Italy
- Claudia Marotta
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Giovanni Rezza
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Getu Gamo Sagaro
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
- Andrea Silenzi
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Francesco Amenta
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
  - Research Department, International Radio Medical Centre (C.I.R.M.), 00144 Rome, Italy

8
Yang X, Mu D, Peng H, Li H, Wang Y, Wang P, Wang Y, Han S. Research and Application of Artificial Intelligence (AI) based on Electronic Health Records from Patients with Cancer: a Systematic Review (Preprint). JMIR Med Inform 2021; 10:e33799. [PMID: 35442195 PMCID: PMC9069295 DOI: 10.2196/33799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2021] [Revised: 01/24/2022] [Accepted: 03/14/2022] [Indexed: 01/12/2023] Open
Abstract
Background: With the accumulation of electronic health records and the development of artificial intelligence, patients with cancer urgently need new evidence of more personalized clinical and demographic characteristics and more sophisticated treatment and prevention strategies. However, no research has systematically analyzed the application and significance of artificial intelligence based on electronic health records in cancer care.
Objective: The aim of this study was to conduct a review to introduce the current state and limitations of artificial intelligence based on electronic health records of patients with cancer and to summarize the performance of artificial intelligence in mining electronic health records and its impact on cancer care.
Methods: Three databases were systematically searched to retrieve potentially relevant papers published from January 2009 to October 2020. Four principal reviewers assessed the quality of the papers and reviewed them for eligibility based on the inclusion criteria in the extracted data. The summary measures used in this analysis were the number and frequency of occurrence of the themes.
Results: Of the 1034 papers considered, 148 papers met the inclusion criteria. Cancer care, especially cancers of female organs and digestive organs, could benefit from artificial intelligence based on electronic health records through cancer emergencies and prognostic estimates, cancer diagnosis and prediction, tumor stage detection, cancer case detection, and treatment pattern recognition. The models can always achieve an area under the curve of 0.7. Ensemble methods and deep learning are on the rise. In addition, electronic medical records in the existing studies are mainly in English and from private institutional databases.
Conclusions: Artificial intelligence based on electronic health records performed well and could be useful for cancer care. Improving the performance of artificial intelligence can help patients receive more scientific-based and accurate treatments. There is a need for the development of new methods and electronic health record data sharing and for increased passion and support from cancer specialists.
Affiliation(s)
- Xinyu Yang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Dongmei Mu
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Hao Peng
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Hua Li
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Ying Wang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Ping Wang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Yue Wang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Siqi Han
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China

9
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Affiliation(s)
- Bethany Percha
  - Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA;

10
Leveraging electronic health record data to inform hospital resource management: A systematic data mining approach. Health Care Manag Sci 2021; 24:716-741. [PMID: 34031792 DOI: 10.1007/s10729-021-09554-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/26/2020] [Accepted: 02/02/2021] [Indexed: 10/21/2022]
Abstract
Early identification of resource needs is instrumental in promoting efficient hospital resource management. Hospital information systems, and electronic health records (EHR) in particular, collect valuable demographic and clinical patient data from the moment patients are admitted, which can help predict expected resource needs in early stages of patient episodes. To this end, this article proposes a data mining methodology to systematically obtain predictions for relevant managerial variables by leveraging structured EHR data. Specifically, these managerial variables are: i) Diagnosis categories, ii) procedure codes, iii) diagnosis-related groups (DRGs), iv) outlier episodes and v) length of stay (LOS). The proposed methodology approaches the problem in four stages: Feature set construction, feature selection, prediction model development, and model performance evaluation. We tested this approach with an EHR dataset of 5,089 inpatient episodes and compared different classification and regression models (for categorical and continuous variables, respectively), performed temporal analysis of model performance, analyzed the impact of training set homogeneity on performance and assessed the contribution of different EHR data elements for model predictive power. Overall, our results indicate that inpatient EHR data can effectively be leveraged to inform resource management on multiple perspectives. Logistic regression (combined with minimal redundancy maximum relevance feature selection) and bagged decision trees yielded best results for predicting categorical and numerical managerial variables, respectively. Furthermore, our temporal analysis indicated that, while DRG classes are more difficult to predict, several diagnosis categories, procedure codes and LOS amongst shorter-stay patients can be predicted with higher confidence in early stages of patient stay. 
Lastly, value of information analysis indicated that diagnoses, medication and structured assessment forms were the most valuable EHR data elements in predicting managerial variables of interest through a data mining approach.

11
Na HJ, Lee KC, Kim ST. Integrating Text-Mining and Balanced Scorecard Techniques to Investigate the Association between CEO Message of Homepage Words and Financial Status: Emphasis on Hospitals. Healthcare (Basel) 2021; 9:healthcare9040408. [PMID: 33916303 PMCID: PMC8067190 DOI: 10.3390/healthcare9040408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Revised: 03/19/2021] [Accepted: 03/22/2021] [Indexed: 11/16/2022] Open
Abstract
(1) Background: The Chief Executive Officer’s (CEO’s) message on a hospital’s homepage on the Internet contains various components, such as the hospital’s future vision, promises to customers, availability of upgraded services and public activities. This statement usually includes non-financial information as well as financial information about the corporate entity owning/operating the hospital. In addition, it provides useful information about not only the company’s goals and vision, but also firm performance targets and strategies for the future. This study aims to investigate associations between the CEO’s message and the financial status of the institution. We used the balanced scorecard framework to analyze what content on the hospital’s homepage is related to the hospital’s various financial ratios. (2) Methods: We adopted a text-mining method to extract significantly repeated keywords from the CEO’s message on the hospital’s website. Then, we classified these keywords using a balanced scorecard approach. To examine the relationship between keywords in the CEO’s message and the hospital’s financial ratios, a t-test was conducted for the difference in the term frequency divided by inverse document frequency (TF-IDF) mean of the home page contents and its relationship with the views of the balanced scorecard framework. (3) Results: According to our empirical results on 65 samples collected from local hospitals, there are some significant relationships between the qualitative content of the hospital’s homepage and the quantitative financial ratios that indicate profitability, activity, leverage, liquidity, and accumulating reserves for proper business purposes. (4) Conclusions: The introduction section of a homepage is the part most accessible to customers, containing the aims and ideals of the hospital and reflecting the institution’s values and visions. 
In addition, in the coverage of financial status, the organization can either emphasize financial strength or focus on other areas to divert attention from any weakness shown in the financial information. This study reminds us of the importance of the hospital website’s disclosure, and what can be inferred from the financial status of the hospital. It also highlights the need for reconciliation and harmony between the quantitative data, financial statements, and qualitative data in the CEO’s message. (5) Implications: To the best of our knowledge, this paper is the first research attempting to investigate the relationship between text on the hospital’s homepage and the hospital’s financial ratios using text-mining techniques and the balanced scorecard framework. Hospitals play a crucial role in a country’s welfare and healthcare industry. Nevertheless, in many countries, hospital organizations tend to remain a source of critical fiscal deficits due to ineffective and sloppy management. We expect that the result of this paper can provide hospital managers with useful information to address that situation.
Affiliation(s)
- Hyung Jong Na
  - School of Global Business Administration, Semyung University, Jecheon 27136, Korea;
- Kun Chang Lee
  - SKK Business School, Sungkyunkwan University, Seoul 03063, Korea
  - Correspondence:
- Seong Tae Kim
  - School of Management, Kyung Hee University, Seoul 02447, Korea;

12

13
Bagheri A, Sammani A, van der Heijden PGM, Asselbergs FW, Oberski DL. ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients’ disease history. J Intell Inf Syst 2020. [DOI: 10.1007/s10844-020-00605-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]
Abstract
Given the rapid rate at which text data are being digitally gathered in the medical domain, there is growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients’ disease history from clinical sentences. However, in EHRs, sentences are less topic-focused and shorter than that in general domain, which leads to the sparsity of co-occurrence patterns and the lack of semantic features. To tackle this challenge, current approaches for clinical sentence classification are dependent on external information to improve classification performance. However, this is implausible owing to a lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smoothen the semantic representations of short sentences. The ETM enriches text representation by incorporating probability distributions generated by an unsupervised algorithm into it. It considers the length of the original texts to enhance representation by using an internal knowledge acquisition procedure. When it comes to clinical predictive modeling, interpretability improves the acceptance of the model. Thus, for clinical sentence classification, the ETM approach employs an initial TFiDF (term frequency inverse document frequency) representation, where we use the support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set consisting of clinical cardiovascular notes from the Netherlands to test the sentence classification performance of the proposed method in comparison with prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.

14
Drozdov I, Forbes D, Szubert B, Hall M, Carlin C, Lowe DJ. Supervised and unsupervised language modelling in Chest X-Ray radiological reports. PLoS One 2020; 15:e0229963. [PMID: 32155219 PMCID: PMC7064166 DOI: 10.1371/journal.pone.0229963] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/23/2019] [Accepted: 02/17/2020] [Indexed: 12/14/2022] Open
Abstract
Chest radiography (CXR) is the most commonly used imaging modality and deep neural network (DNN) algorithms have shown promise in effective triage of normal and abnormal radiograms. Typically, DNNs require large quantities of expertly labelled training exemplars, which in clinical contexts is a major bottleneck to effective modelling, as both considerable clinical skill and time is required to produce high-quality ground truths. In this work we evaluate thirteen supervised classifiers using two large free-text corpora and demonstrate that bi-directional long short-term memory (BiLSTM) networks with attention mechanism effectively identify Normal, Abnormal, and Unclear CXR reports in internal (n = 965 manually-labelled reports, f1-score = 0.94) and external (n = 465 manually-labelled reports, f1-score = 0.90) testing sets using a relatively small number of expert-labelled training observations (n = 3,856 annotated reports). Furthermore, we introduce a general unsupervised approach that accurately distinguishes Normal and Abnormal CXR reports in a large unlabelled corpus. We anticipate that the results presented in this work can be used to automatically extract standardized clinical information from free-text CXR radiological reports, facilitating the training of clinical decision support systems for CXR triage.
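The f1-scores reported in this abstract can be made concrete with a small sketch of per-class F1 for a three-class report labelling task. The abstract does not say how the scores were averaged across classes, so the macro average below is an assumption, and the labels in the toy example are invented rather than taken from the study.

```python
from typing import Dict, List


def f1_per_class(y_true: List[str], y_pred: List[str],
                 labels: List[str]) -> Dict[str, float]:
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each label."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


labels = ["Normal", "Abnormal", "Unclear"]
# Invented toy labels, not results from the study.
y_true = ["Normal", "Normal", "Abnormal", "Abnormal", "Unclear", "Unclear"]
y_pred = ["Normal", "Abnormal", "Abnormal", "Abnormal", "Unclear", "Normal"]

scores = f1_per_class(y_true, y_pred, labels)
# Macro average: unweighted mean over classes (assumed averaging scheme).
macro_f1 = sum(scores.values()) / len(scores)
```

Reporting per-class scores alongside any average is useful here because a minority class such as "Unclear" can be masked by strong performance on the larger classes.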
Affiliation(s)
- Daniel Forbes
- Emergency Department, Queen Elizabeth University Hospital, Glasgow, Scotland
- Mark Hall
- Radiology Department, Queen Elizabeth University Hospital, Glasgow, Scotland
- Chris Carlin
- Department of Respiratory Medicine, Queen Elizabeth University Hospital, Glasgow, Scotland
- David J. Lowe
- Emergency Department, Queen Elizabeth University Hospital, Glasgow, Scotland
|
15
|
Zhou W, Shao F, Li J. Bioinformatic analysis of the molecular mechanism underlying bronchial pulmonary dysplasia using a text mining approach. Medicine (Baltimore) 2019; 98:e18493. [PMID: 31876736 PMCID: PMC6946243 DOI: 10.1097/md.0000000000018493] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022] Open
Abstract
Bronchopulmonary dysplasia (BPD) is a common disease of premature infants with very low birth weight. The mechanism is inconclusive. The aim of this study is to systematically explore BPD-related genes and characterize their functions. Natural language processing analysis was used to identify BPD-related genes. Gene data were extracted from the PubMed database. Gene ontology, pathway, and network analyses were carried out, and the results were integrated with the corresponding database. In this study, 216 genes were identified as BPD-related genes with P < .05, and 30 pathways were identified as significant. A network of BPD-related genes was also constructed with 17 hub genes identified. In particular, the phosphatidyl inositol-3-enzyme-serine/threonine kinase signaling pathway involved the largest number of genes. Insulin was found to be a promising candidate gene related to BPD, suggesting that it may serve as an effective therapeutic target. Our data may help to better understand the molecular mechanisms underlying BPD. However, the mechanisms of BPD are elusive, and further studies are needed.
Affiliation(s)
- Weitao Zhou
- Department of Pediatrics, The First Affiliated Hospital of the University of Science and Technology of China
- Fei Shao
- Department of Oncology, Second Affiliated Hospital of Anhui Medical University, Hefei
- Jing Li
- Department of Pediatric Intensive Care Unit, Children's Hospital of Chongqing Medical University; Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders; China International Science and Technology Cooperation base of Child Development and Critical Disorders; Children's Hospital of Chongqing Medical University
- Chongqing Key Laboratory of Pediatrics, Chongqing, China
|
16
|
Chen CJ, Warikoo N, Chang YC, Chen JH, Hsu WL. Medical knowledge infused convolutional neural networks for cohort selection in clinical trials. J Am Med Inform Assoc 2019; 26:1227-1236. [PMID: 31390470 DOI: 10.1093/jamia/ocz128] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/20/2019] [Revised: 06/18/2019] [Accepted: 07/04/2019] [Indexed: 12/24/2022] Open
Abstract
OBJECTIVE In this era of digitized health records, there has been a marked interest in using de-identified patient records for conducting various health related surveys. To assist in this research effort, we developed a novel clinical data representation model entitled medical knowledge-infused convolutional neural network (MKCNN), which is used for learning the clinical trial criteria eligibility status of patients to participate in cohort studies. MATERIALS AND METHODS In this study, we propose a clinical text representation infused with medical knowledge (MK). First, we isolate the noise from the relevant data using a medically relevant description extractor; then we utilize log-likelihood ratio based weights from selected sentences to highlight "met" and "not-met" knowledge-infused representations in a bichannel setting for each instance. The combined medical knowledge-infused representation (MK) from these modules helps identify significant clinical criteria semantics, which in turn renders effective learning when used with a convolutional neural network architecture. RESULTS MKCNN outperforms other Medical Knowledge (MK) relevant learning architectures by approximately 3%; notably SVM and XGBoost implementations developed in this study. MKCNN scored 86.1% on the F1 metric, a gain of 6% above the average performance assessed from the submissions for the n2c2 task. Although pattern/rule-based methods show a higher average performance for the n2c2 clinical data set, MKCNN significantly improves performance of machine learning implementations for clinical datasets. CONCLUSION MKCNN scored 86.1% on the F1 score metric. In contrast to many of the rule-based systems introduced during the n2c2 challenge workshop, our system presents a model that heavily draws on machine-based learning. In addition, the MK representations add more value to clinical comprehension and interpretation of natural texts.
Affiliation(s)
- Chi-Jen Chen
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
- Neha Warikoo
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Yung-Chun Chang
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan; Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan; Pervasive AI Research Labs, Ministry of Science and Technology, Taipei, Taiwan
- Jin-Hua Chen
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
- Wen-Lian Hsu
- Pervasive AI Research Labs, Ministry of Science and Technology, Taipei, Taiwan; Institute of Information Science, Academia Sinica, Taipei, Taiwan
|
17
|
Rios A, Durbin EB, Hands I, Arnold SM, Shah D, Schwartz SM, Goulart BHL, Kavuluru R. Cross-registry neural domain adaptation to extract mutational test results from pathology reports. J Biomed Inform 2019; 97:103267. [PMID: 31401235 PMCID: PMC6736690 DOI: 10.1016/j.jbi.2019.103267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/23/2019] [Revised: 07/30/2019] [Accepted: 08/05/2019] [Indexed: 10/26/2022]
Abstract
OBJECTIVE We study the performance of machine learning (ML) methods, including neural networks (NNs), to extract mutational test results from pathology reports collected by cancer registries. Given the lack of hand-labeled datasets for mutational test result extraction, we focus on the particular use-case of extracting Epidermal Growth Factor Receptor mutation results in non-small cell lung cancers. We explore the generalization of NNs across different registries where our goals are twofold: (1) to assess how well models trained on a registry's data port to test data from a different registry and (2) to assess whether and to what extent such models can be improved using state-of-the-art neural domain adaptation techniques under different assumptions about what is available (labeled vs unlabeled data) at the target registry site. MATERIALS AND METHODS We collected data from two registries: the Kentucky Cancer Registry (KCR) and the Fred Hutchinson Cancer Research Center (FH) Cancer Surveillance System. We combine NNs with adversarial domain adaptation to improve cross-registry performance. We compare to other classifiers in the standard supervised classification, unsupervised domain adaptation, and supervised domain adaptation scenarios. RESULTS The performance of ML methods varied between registries. To extract positive results, the basic convolutional neural network (CNN) had an F1 of 71.5% on the KCR dataset and 95.7% on the FH dataset. For the KCR dataset, the CNN F1 results were low when trained on FH data (Positive F1: 23%). Using our proposed adversarial CNN, without any labeled data, we match the F1 of the models trained directly on each target registry's data. The adversarial CNN F1 improved when trained on FH and applied to KCR dataset (Positive F1: 70.8%). We found similar performance improvements when we trained on KCR and tested on FH reports (Positive F1: 45% to 96%). 
CONCLUSION Adversarial domain adaptation improves the performance of NNs applied to pathology reports. In the unsupervised domain adaptation setting, we match the performance of models that are trained directly on target registry's data by using source registry's labeled data and unlabeled examples from the target registry.
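The positive-class F1 figures quoted in the abstract above combine precision and recall for a single label. A minimal sketch of that calculation follows; the function name `positive_f1` and the toy labels are assumptions for illustration and do not reproduce the study's data or evaluation code.

```python
def positive_f1(y_true, y_pred, positive="pos"):
    """F1 score for one class, treating every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions for six reports.
gold = ["pos", "pos", "pos", "neg", "neg", "neg"]
pred = ["pos", "pos", "neg", "pos", "neg", "neg"]
score = positive_f1(gold, pred)  # tp=2, fp=1, fn=1
```

Because the score ignores true negatives, it can differ sharply between registries with different proportions of positive reports, which is one reason cross-registry comparisons are reported per class.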
Affiliation(s)
- Anthony Rios
- Department of Information Systems and Cyber Security, University of Texas at San Antonio, USA
- Eric B Durbin
- Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, USA; Kentucky Cancer Registry, Lexington, KY, USA
- Isaac Hands
- Kentucky Cancer Registry, Lexington, KY, USA
- Susanne M Arnold
- Markey Cancer Center, University of Kentucky, Lexington, KY, USA
- Darshil Shah
- Ironwood Cancer and Research Centers, Avondale, AZ, USA
- Ramakanth Kavuluru
- Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, USA; Computer Science Department, University of Kentucky, USA.
|
18
|
Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports. J Digit Imaging 2019; 31:178-184. [PMID: 29079959 DOI: 10.1007/s10278-017-0027-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 10/18/2022] Open
Abstract
A significant volume of medical data remains unstructured. Natural language processing (NLP) and machine learning (ML) techniques have shown to successfully extract insights from radiology reports. However, the codependent effects of NLP and ML in this context have not been well-studied. Between April 1, 2015 and November 1, 2016, 9418 cross-sectional abdomen/pelvis CT and MR examinations containing our internal structured reporting element for cancer were separated into four categories: Progression, Stable Disease, Improvement, or No Cancer. We combined each of three NLP techniques with five ML algorithms to predict the assigned label using the unstructured report text and compared the performance of each combination. The three NLP algorithms included term frequency-inverse document frequency (TF-IDF), term frequency weighting (TF), and 16-bit feature hashing. The ML algorithms included logistic regression (LR), random decision forest (RDF), one-vs-all support vector machine (SVM), one-vs-all Bayes point machine (BPM), and fully connected neural network (NN). The best-performing NLP model consisted of tokenized unigrams and bigrams with TF-IDF. Increasing N-gram length yielded little to no added benefit for most ML algorithms. With all parameters optimized, SVM had the best performance on the test dataset, with 90.6 average accuracy and F score of 0.813. The interplay between ML and NLP algorithms and their effect on interpretation accuracy is complex. The best accuracy is achieved when both algorithms are optimized concurrently.
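The 16-bit feature hashing and unigram/bigram tokenization named in the abstract above can be sketched as follows. This is an illustrative sketch only: the abstract does not state which hash function or tokenizer was used, so the MD5-based bucketing, the whitespace tokenizer, and the names `ngrams` and `hashed_features` are assumptions.

```python
import hashlib
from collections import Counter

N_BUCKETS = 2 ** 16  # 16-bit hash space


def ngrams(tokens, n_max=2):
    """Yield all n-grams from unigrams up to n_max-grams as strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def hashed_features(text):
    """Map a report to sparse {bucket_index: count} features."""
    tokens = text.lower().split()
    features = Counter()
    for gram in ngrams(tokens):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        # Use the first two bytes (16 bits) of the digest as the bucket.
        bucket = int.from_bytes(digest[:2], "big")
        features[bucket] += 1
    return features


# Hypothetical report fragment, invented for this example.
features = hashed_features("no evidence of disease progression")
```

Hashing fixes the feature dimension at 65,536 regardless of vocabulary size, at the cost of occasional collisions in which distinct n-grams share a bucket.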
|
19
|
Brown A, Kachura J. Natural Language Processing of Radiology Reports in Patients With Hepatocellular Carcinoma to Predict Radiology Resource Utilization. J Am Coll Radiol 2019; 16:840-844. [DOI: 10.1016/j.jacr.2018.12.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/01/2018] [Revised: 12/02/2018] [Accepted: 12/03/2018] [Indexed: 12/13/2022]
|
20
|
Kocbek S, Kocbek P, Stozer A, Zupanic T, Groza T, Stiglic G. Building interpretable models for polypharmacy prediction in older chronic patients based on drug prescription records. PeerJ 2018; 6:e5765. [PMID: 30345175 PMCID: PMC6187991 DOI: 10.7717/peerj.5765] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/11/2018] [Accepted: 09/17/2018] [Indexed: 01/02/2023] Open
Abstract
Background Multimorbidity presents an increasingly common problem in older population, and is tightly related to polypharmacy, i.e., concurrent use of multiple medications by one individual. Detecting polypharmacy from drug prescription records is not only related to multimorbidity, but can also point at incorrect use of medicines. In this work, we build models for predicting polypharmacy from drug prescription records for newly diagnosed chronic patients. We evaluate the models’ performance with a strong focus on interpretability of the results. Methods A centrally collected nationwide dataset of prescription records was used to perform electronic phenotyping of patients for the following two chronic conditions: type 2 diabetes mellitus (T2D) and cardiovascular disease (CVD). In addition, a hospital discharge dataset was linked to the prescription records. A regularized regression model was built for 11 different experimental scenarios on two datasets, and complexity of the model was controlled with a maximum number of dimensions (MND) parameter. Performance and interpretability of the model were evaluated with AUC, AUPRC, calibration plots, and interpretation by a medical doctor. Results For the CVD model, AUC and AUPRC values of 0.900 (95% [0.898–0.901]) and 0.640 (0.635–0.645) were reached, respectively, while for the T2D model the values were 0.808 (0.803–0.812) and 0.732 (0.725–0.739). Reducing complexity of the model by 65% and 48% for CVD and T2D, resulted in 3% and 4% lower AUC, and 4% and 5% lower AUPRC values, respectively. Calibration plots for our models showed that we can achieve moderate calibration with reducing the models’ complexity without significant loss of predictive performance. Discussion In this study, we found that it is possible to use drug prescription data to build a model for polypharmacy prediction in older population. 
In addition, the study showed that it is possible to find a balance between good performance and interpretability of the model, and achieve acceptable calibration at the same time.
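The AUC values reported in the abstract above measure how often a model ranks a positive case above a negative one. A minimal rank-based sketch is given below; the function name `roc_auc` and the toy scores are assumptions for illustration, and the study's own evaluation pipeline is not described at this level of detail.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as half).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = polypharmacy) and predicted risk scores.
labels = [1, 1, 0, 0]
risk = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, risk)  # 3 of 4 positive/negative pairs ranked correctly
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; sorting-based implementations are preferable for nationwide datasets.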
Affiliation(s)
- Simon Kocbek
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia; Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology, Sydney, New South Wales, Australia; Department of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Primoz Kocbek
- Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
- Andraz Stozer
- Institute of Physiology, Faculty of Medicine, University of Maribor, Maribor, Slovenia
- Tina Zupanic
- Healthcare Data Center, The National Institute of Public Health of the Republic of Slovenia, Ljubljana, Slovenia
- Tudor Groza
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia; St Vincent's Clinical School, Faculty of Medicine, UNSW Sydney, Sydney, NSW, Australia
- Gregor Stiglic
- Faculty of Health Sciences, University of Maribor, Maribor, Slovenia; Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
|
21
|
Pérez J, Pérez A, Casillas A, Gojenola K. Cardiology record multi-label classification using latent Dirichlet allocation. Comput Methods Programs Biomed 2018; 164:111-119. [PMID: 30195419 DOI: 10.1016/j.cmpb.2018.07.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/14/2017] [Revised: 06/18/2018] [Accepted: 07/16/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVES Electronic health records (EHRs) convey vast and valuable knowledge about dynamically changing clinical practices. Indeed, clinical documentation entails the inspection of massive number of records across hospitals and hospital sections. The goal of this study is to provide an efficient framework that will help clinicians explore EHRs and attain alternative views related to both patient-segments and diseases, like clustering and statistical information about the development of heart diseases (replacement of pacemakers, valve implantation etc.) in co-occurrence with other diseases. The task is challenging, dealing with lengthy health records and a high number of classes in a multi-label setting. METHODS LDA is a statistical procedure optimized to explain a document by multinomial distributions on their latent topics and the topics by distributions on related words. These distributions allow to represent collections of texts into a continuous space enabling distance-based associations between documents and also revealing the underlying topics. The topic models were assessed by means of four divergence metrics. In addition, we applied LDA to the task of multi-label document classification of EHRs according to the International Classification of Diseases 10th Clinical Modification (ICD-10). The set of EHRs had assigned 7 codes on average over 970 different codes corresponding to cardiology. RESULTS First, the discriminative ability of topic models was assessed using dissimilarity metrics. Nevertheless, there was an open question regarding the interpretability of automatically discovered topics. To address this issue, we explored the connection between the latent topics and ICD-10. EHRs were represented by means of LDA and, next, supervised classifiers were inferred from those representations. Given the low-dimensional representation provided by LDA, the search was computationally efficient compared to symbolic approaches such as TF-IDF. 
The classifiers achieved an average AUC of 77.79. As a side contribution, with this work we released the software implemented in Python and R to both train and evaluate the models. CONCLUSIONS Topic modeling offers a means of representing EHRs in a small dimensional continuous space. This representation conveys relevant information as hidden topics in a comprehensive manner. Moreover, in practice, this compact representation allowed to extract the ICD-10 codes associated to EHRs.
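The abstract above describes comparing documents through their topic distributions using divergence metrics, without naming the four metrics used. As one plausible illustration, the sketch below computes the Jensen-Shannon divergence between two topic distributions; the choice of this metric, the function names, and the example distributions are all assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic distributions for two health records over three topics.
record_a = [0.7, 0.2, 0.1]
record_b = [0.1, 0.3, 0.6]
distance = js_divergence(record_a, record_b)
```

Because each record is reduced to a short probability vector, pairwise comparisons of this kind stay cheap even for long documents, which matches the efficiency argument made in the abstract.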
Affiliation(s)
- Jorge Pérez
- IXA Research Group, University of the Basque Country (UPV-EHU), Manuel Lardizabal 1, 20080, Donostia. http://ixa.eus
- Alicia Pérez
- IXA Research Group, University of the Basque Country (UPV-EHU), Manuel Lardizabal 1, 20080, Donostia.
- Arantza Casillas
- IXA Research Group, University of the Basque Country (UPV-EHU), Manuel Lardizabal 1, 20080, Donostia. http://ixa.eus
- Koldo Gojenola
- IXA Research Group, University of the Basque Country (UPV-EHU), Manuel Lardizabal 1, 20080, Donostia. http://ixa.eus
|
22
|
Weng WH, Wagholikar KB, McCray AT, Szolovits P, Chueh HC. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach. BMC Med Inform Decis Mak 2017; 17:155. [PMID: 29191207 PMCID: PMC5709846 DOI: 10.1186/s12911-017-0556-8] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 06/15/2017] [Accepted: 11/19/2017] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. METHODS We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. RESULTS The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. 
CONCLUSION Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance yet shallow learning algorithms with the word and concept representation achieves comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions.
Affiliation(s)
- Wei-Hung Weng
- Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, 4th Floor, Boston, MA 02115 USA
- Laboratory of Computer Science, Massachusetts General Hospital, 50 Staniford Street, Suite 750, Boston, MA 02114 USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139 USA
- Kavishwar B. Wagholikar
- Laboratory of Computer Science, Massachusetts General Hospital, 50 Staniford Street, Suite 750, Boston, MA 02114 USA
- Department of Medicine, Massachusetts General Hospital, 55 Fruit St, Boston, MA 02114 USA
- Alexa T. McCray
- Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, 4th Floor, Boston, MA 02115 USA
- Peter Szolovits
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139 USA
- Henry C. Chueh
- Laboratory of Computer Science, Massachusetts General Hospital, 50 Staniford Street, Suite 750, Boston, MA 02114 USA
- Department of Medicine, Massachusetts General Hospital, 55 Fruit St, Boston, MA 02114 USA
|