1
Maciejewski C, Ozierański K, Barwiołek A, Basza M, Bożym A, Ciurla M, Janusz Krajsman M, Maciejewska M, Lodziński P, Opolski G, Grabowski M, Cacko A, Balsam P. AssistMED project: Transforming cardiology cohort characterisation from electronic health records through natural language processing - Algorithm design, preliminary results, and field prospects. Int J Med Inform 2024; 185:105380. [PMID: 38447318 DOI: 10.1016/j.ijmedinf.2024.105380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/08/2024]
Abstract
INTRODUCTION Electronic health records (EHR) are of great value for clinical research. However, EHRs consist primarily of unstructured text, which must be analysed by a human and coded into a database before data analysis, a time-consuming and costly process that limits research efficiency. Natural language processing (NLP) can facilitate data retrieval from unstructured text. During the AssistMED project, we developed a practical NLP tool that automatically provides comprehensive clinical characteristics of patients from EHR and is tailored to clinical researchers' needs. MATERIAL AND METHODS AssistMED retrieves patient characteristics regarding clinical conditions, medications with dosage, and echocardiographic parameters with a clinically oriented data structure and provides researcher-friendly database output. We validate the algorithm's performance against manual data retrieval and provide critical quantitative and qualitative analysis. RESULTS AssistMED analysed the presence of 56 clinical conditions, medications from 16 drug groups with dosage, and 15 numeric echocardiographic parameters in a sample of 400 patients hospitalized in the cardiology unit. No statistically significant differences between algorithm and human retrieval were noted. Qualitative analysis revealed that disagreements with manual annotation were primarily attributable to random algorithm errors, erroneous human annotation, and the lack of advanced context awareness in our tool. CONCLUSIONS Current NLP approaches make it feasible to acquire accurate and detailed patient characteristics, tailored to clinical researchers' needs, from EHR. We present an in-depth description of the algorithm development and validation process, discuss obstacles, and pinpoint potential solutions, including opportunities arising from recent advancements in the field of NLP, such as large language models.
Affiliation(s)
- Cezary Maciejewski
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland; Doctoral School, Medical University of Warsaw, 02-091 Warszawa, Poland; Department of Medical Informatics and Telemedicine, Medical University of Warsaw, 02-091 Warszawa, Poland
- Krzysztof Ozierański
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Adam Barwiołek
  - Codifive sp. z o.o., Lindleya 16, 02-013 Warszawa, Poland
- Mikołaj Basza
  - Medical University of Silesia in Katowice, 40-055 Katowice, Poland
- Aleksandra Bożym
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Michalina Ciurla
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Maciej Janusz Krajsman
  - Department of Medical Informatics and Telemedicine, Medical University of Warsaw, 02-091 Warszawa, Poland
- Piotr Lodziński
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Grzegorz Opolski
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Marcin Grabowski
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Andrzej Cacko
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland; Department of Medical Informatics and Telemedicine, Medical University of Warsaw, 02-091 Warszawa, Poland
- Paweł Balsam
  - 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
2
Richter-Pechanski P, Geis NA, Kiriakou C, Schwab DM, Dieterich C. Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models. Digit Health 2021; 7:20552076211057662. [PMID: 34868618 PMCID: PMC8637713 DOI: 10.1177/20552076211057662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 10/15/2021] [Indexed: 11/17/2022]
Abstract
Objective A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. Methods We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. Results Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). Conclusion Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages.
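As a rough illustration of the metric named in this abstract, a token-wise micro-average F1-score pools true positives, false positives, and false negatives over all concept labels before computing precision and recall. The sketch below is not the authors' evaluation code; the label names (`DRUG`, `DIAG`), the `O` "outside" tag, and the function `micro_f1` are hypothetical, and the study's exact scoring protocol is not given in the abstract.

```python
def micro_f1(gold, pred, outside="O"):
    """Token-wise micro-averaged precision, recall and F1.

    Counts are pooled over all concept labels; tokens tagged with the
    `outside` label (an assumed convention) are not counted as concepts.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1              # predicted concept matches the gold concept
        else:
            if p != outside:
                fp += 1          # predicted a concept that is wrong or spurious
            if g != outside:
                fn += 1          # gold concept was missed or mislabelled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical six-token example with two made-up concept labels.
gold = ["O", "DRUG", "DRUG", "O", "DIAG", "O"]
pred = ["O", "DRUG", "O", "DIAG", "DIAG", "O"]
precision, recall, f1 = micro_f1(gold, pred)
```

In this toy example two of three predicted concept tokens are correct and two of three gold concept tokens are found, so precision, recall, and F1 all equal 2/3.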
Affiliation(s)
- Phillip Richter-Pechanski
  - Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; German Center for Cardiovascular Research (DZHK) - Partner Site Heidelberg/Mannheim, Mannheim, Germany; Informatics for Life, Heidelberg, Germany
- Nicolas A Geis
  - Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; Informatics for Life, Heidelberg, Germany
- Christina Kiriakou
  - Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Dominic M Schwab
  - Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Christoph Dieterich
  - Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; German Center for Cardiovascular Research (DZHK) - Partner Site Heidelberg/Mannheim, Mannheim, Germany; Informatics for Life, Heidelberg, Germany
3
Seetharam K, Shrestha S, Sengupta PP. Cardiovascular Imaging and Intervention Through the Lens of Artificial Intelligence. Interv Cardiol 2021; 16:e31. [PMID: 34754333 PMCID: PMC8559149 DOI: 10.15420/icr.2020.04] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/07/2020] [Accepted: 06/18/2021] [Indexed: 12/13/2022]
Abstract
Artificial Intelligence (AI) is the simulation of human intelligence in machines so they can perform various actions and execute decision-making. Machine learning (ML), a branch of AI, can analyse information from data and discover novel patterns. AI and ML are rapidly gaining prominence in healthcare as data become increasingly complex. These algorithms can enhance the role of cardiovascular imaging by automating many tasks or calculations, find new patterns or phenotypes in data and provide alternative diagnoses. In interventional cardiology, AI can assist in intraprocedural guidance, intravascular imaging and provide additional information to the operator. AI is slowly expanding its boundaries into interventional cardiology and can fundamentally alter the field. In this review, the authors discuss how AI can enhance the role of cardiovascular imaging and imaging in interventional cardiology.
Affiliation(s)
- Karthik Seetharam
  - West Virginia University Medicine Heart and Vascular Institute Morgantown, WV, US
- Sirish Shrestha
  - West Virginia University Medicine Heart and Vascular Institute Morgantown, WV, US
- Partho P Sengupta
  - West Virginia University Medicine Heart and Vascular Institute Morgantown, WV, US
4
AlSaieedi A, Salhi A, Tifratene F, Raies AB, Hungler A, Uludag M, Van Neste C, Bajic VB, Gojobori T, Essack M. DES-Tcell is a knowledgebase for exploring immunology-related literature. Sci Rep 2021; 11:14344. [PMID: 34253812 PMCID: PMC8275784 DOI: 10.1038/s41598-021-93809-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2021] [Accepted: 06/24/2021] [Indexed: 12/02/2022]
Abstract
T-cells are a subtype of white blood cells circulating throughout the body, searching for infected and abnormal cells. They have multifaceted functions that include scanning for and directly killing cells infected with intracellular pathogens, eradicating abnormal cells, orchestrating immune response by activating and helping other immune cells, memorizing encountered pathogens, and providing long-lasting protection upon recurrent infections. However, T-cells are also involved in immune responses that result in organ transplant rejection, autoimmune diseases, and some allergic diseases. To support T-cell research, we developed the DES-Tcell knowledgebase (KB). This KB incorporates text- and data-mined information that can expedite retrieval and exploration of T-cell relevant information from the large volume of published T-cell-related research. This KB enables exploration of data through concepts from 15 topic-specific dictionaries, including immunology-related genes, mutations, pathogens, and pathways. We developed three case studies using DES-Tcell, one of which validates effective retrieval of known associations by DES-Tcell. The second and third case studies focus on concepts that are common to Graves' disease (GD) and Hashimoto's thyroiditis (HT). Several reports have shown that up to 20% of GD patients treated with antithyroid medication develop HT, thus suggesting a possible conversion or shift from GD to HT disease. DES-Tcell found that miR-4442 links to both GD and HT, and that miR-4442 possibly targets the autoimmune disease risk factor CD6, which provides potential new knowledge derived through the use of DES-Tcell. To our knowledge, DES-Tcell is the first KB dedicated to exploring T-cell-relevant information via literature-mining, data-mining, and topic-specific dictionaries.
Affiliation(s)
- Ahdab AlSaieedi
  - Department of Medical Laboratory Technology (MLT), Faculty of Applied Medical Sciences (FAMS), King Abdulaziz University (KAU), Jeddah, 21589-80324, Saudi Arabia
- Adil Salhi
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Faroug Tifratene
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Arwa Bin Raies
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Arnaud Hungler
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Mahmut Uludag
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Christophe Van Neste
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Vladimir B Bajic
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Takashi Gojobori
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Magbubah Essack
  - Computer, Electrical, and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
5
Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records. Cardiovascular Digital Health Journal 2021; 2:156-163. [PMID: 35265904 PMCID: PMC8890044 DOI: 10.1016/j.cvdhj.2021.03.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023]
6
Yang Z, Xu W, Chen R. A deep learning-based multi-turn conversation modeling for diagnostic Q&A document recommendation. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2020.102485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
7
Tran L, Chi L, Bonti A, Abdelrazek M, Chen YPP. Mortality Prediction of Patients With Cardiovascular Disease Using Medical Claims Data Under Artificial Intelligence Architectures: Validation Study. JMIR Med Inform 2021; 9:e25000. [PMID: 33792549 PMCID: PMC8050753 DOI: 10.2196/25000] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/14/2020] [Revised: 11/17/2020] [Accepted: 12/05/2020] [Indexed: 11/23/2022]
Abstract
Background Cardiovascular disease (CVD) is the greatest health problem in Australia, which kills more people than any other disease and incurs enormous costs for the health care system. In this study, we present a benchmark comparison of various artificial intelligence (AI) architectures for predicting the mortality rate of patients with CVD using structured medical claims data. Compared with other research in the clinical literature, our models are more efficient because we use a smaller number of features, and this study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. Objective This study aims to support health clinicians in accurately predicting mortality among patients with CVD using only claims data before a clinic visit. Methods The data set was obtained from the Medicare Benefits Scheme and Pharmaceutical Benefits Scheme service information in the period between 2004 and 2014, released by the Department of Health Australia in 2016. It included 346,201 records, corresponding to 346,201 patients. A total of five AI algorithms, including four classical machine learning algorithms (logistic regression [LR], random forest [RF], extra trees [ET], and gradient boosting trees [GBT]) and a deep learning algorithm, which is a densely connected neural network (DNN), were developed and compared in this study. In addition, because of the minority of deceased patients in the data set, a separate experiment using the Synthetic Minority Oversampling Technique (SMOTE) was conducted to enrich the data. Results Regarding model performance, in terms of discrimination, GBT and RF were the models with the highest area under the receiver operating characteristic curve (97.8% and 97.7%, respectively), followed by ET (96.8%) and LR (96.4%), whereas DNN was the least discriminative (95.3%). 
In terms of reliability, LR predictions were the least calibrated compared with the other four algorithms. In this study, despite increasing the training time, SMOTE was proven to further improve the model performance of LR, whereas other algorithms, especially GBT and DNN, worked well with class imbalanced data. Conclusions Compared with other research in the clinical literature involving AI models using claims data to predict patient health outcomes, our models are more efficient because we use a smaller number of features but still achieve high performance. This study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit.
Affiliation(s)
- Linh Tran
  - School of Info Technology, Deakin University, Burwood, Australia
- Lianhua Chi
  - Department of Computer Science and Information Technology, La Trobe University, Bundoora, Australia
- Alessio Bonti
  - School of Info Technology, Deakin University, Burwood, Australia
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Bundoora, Australia
8
Caufield JH, Sigdel D, Fu J, Choi H, Guevara-Gonzalez V, Wang D, Ping P. Cardiovascular Informatics: building a bridge to data harmony. Cardiovasc Res 2021; 118:732-745. [PMID: 33751044 DOI: 10.1093/cvr/cvab067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/01/2020] [Accepted: 03/03/2021] [Indexed: 12/11/2022]
Abstract
The search for new strategies for better understanding cardiovascular disease is a constant one, spanning multitudinous types of observations and studies. A comprehensive characterization of each disease state and its biomolecular underpinnings relies upon insights gleaned from extensive information collection of various types of data. Researchers and clinicians in cardiovascular biomedicine repeatedly face questions regarding which types of data may best answer their questions, how to integrate information from multiple datasets of various types, and how to adapt emerging advances in machine learning and/or artificial intelligence to their needs in data processing. Frequently lauded as a field with great practical and translational potential, the interface between biomedical informatics and cardiovascular medicine is challenged with staggeringly massive datasets. Successful application of computational approaches to decode these complex and gigantic amounts of information becomes an essential step toward realizing the desired benefits. In this review, we examine recent efforts to adapt informatics strategies to cardiovascular biomedical research: automated information extraction and unification of multifaceted -omics data. We discuss how and why this interdisciplinary space of Cardiovascular Informatics is particularly relevant to and supportive of current experimental and clinical research. We describe in detail how open data sources and methods can drive discovery while demanding few initial resources, an advantage afforded by widespread availability of cloud computing-driven platforms. Subsequently, we provide examples of how interoperable computational systems facilitate exploration of data from multiple sources, including both consistently-formatted structured data and unstructured data. Taken together, these approaches for achieving data harmony enable molecular phenotyping of cardiovascular (CV) diseases and unification of cardiovascular knowledge.
Affiliation(s)
- J Harry Caufield
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA; Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA
- Dibakar Sigdel
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA; Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA
- John Fu
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA
- Howard Choi
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA
- Vladimir Guevara-Gonzalez
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA
- Ding Wang
  - Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA
- Peipei Ping
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA; Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA; Department of Medicine (Cardiology) at UCLA School of Medicine, Los Angeles, CA, 90095, USA; Bioinformatics and Medical Informatics, Los Angeles, CA, 90095, USA; Scalable Analytics Institute (ScAi) at UCLA School of Engineering, Los Angeles, CA, 90095, USA
9
Pellathy T, Saul M, Clermont G, Dubrawski AW, Pinsky MR, Hravnak M. Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research. J Clin Monit Comput 2021; 36:397-405. [PMID: 33558981 DOI: 10.1007/s10877-021-00664-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/19/2020] [Accepted: 01/20/2021] [Indexed: 12/23/2022]
Abstract
Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on ground truth determination based on administrative data can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify due to its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared accuracy of HA-VTE diagnoses made by administrative coding to manual review of gold standard diagnostic test results. We performed retrospective analysis of EHR data on 3680 adult stepdown unit patients identifying HA-VTE. International Classification of Diseases, Ninth Revision (ICD-9-CM) codes for VTE were identified. 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 were identified with acute onset type codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of ICD-9-CM coded cases (n = 87) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by confirmatory diagnostic test. Additionally, 45% of diagnostic test confirmed HA-VTE cases lacked corresponding ICD codes. ICD-9-CM coding missed diagnostic test-confirmed HA-VTE cases and inaccurately assigned cases without confirmed VTE, suggesting dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods to develop more sensitive and specific VTE phenotype solutions portable across EHR vendor data are needed to support case-finding in big-data analytics.
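The two headline percentages in this abstract can be reproduced from the counts it reports. The short calculation below is only a consistency check, and it rests on one assumption not stated explicitly in the abstract: that the 87 code-confirmed cases are a subset of the 158 test-confirmed HA-VTE cases. Variable names are illustrative.

```python
# Counts taken from the abstract above.
coded_acute_cases = 219        # ICD-9-CM coded cases with acute onset type codes
coded_and_confirmed = 87       # coded cases confirmed by a positive test report
test_confirmed_cases = 158     # new-onset HA-VTE cases found by test report review

# Share of coded cases that were confirmed (positive predictive value of coding).
ppv = coded_and_confirmed / coded_acute_cases

# Share of test-confirmed cases with no corresponding code, ASSUMING the 87
# confirmed coded cases all belong to the 158 test-confirmed cases.
missed_by_coding = (test_confirmed_cases - coded_and_confirmed) / test_confirmed_cases

print(f"Confirmed coded cases: {ppv:.1%}")
print(f"Test-confirmed cases without a code: {missed_by_coding:.1%}")
```

Under that assumption the results round to the 40% and 45% quoted in the abstract.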
Affiliation(s)
- Tiffany Pellathy
  - University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA
- Melissa Saul
  - University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Gilles Clermont
  - University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Artur W Dubrawski
  - School of Computer Science, Auton Lab, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Michael R Pinsky
  - University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Marilyn Hravnak
  - University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA
10
Sardar P, Abbott JD, Kundu A, Aronow HD, Granada JF, Giri J. Impact of Artificial Intelligence on Interventional Cardiology: From Decision-Making Aid to Advanced Interventional Procedure Assistance. JACC Cardiovasc Interv 2020; 12:1293-1303. [PMID: 31320024 DOI: 10.1016/j.jcin.2019.04.048] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 08/08/2018] [Revised: 02/26/2019] [Accepted: 04/02/2019] [Indexed: 11/26/2022]
Abstract
Access to big data analyzed by supercomputers using advanced mathematical algorithms (i.e., deep machine learning) has allowed for enhancement of cognitive output (i.e., visual imaging interpretation) to previously unseen levels and promises to fundamentally change the practice of medicine. This field, known as "artificial intelligence" (AI), is making significant progress in areas such as automated clinical decision making, medical imaging analysis, and interventional procedures, and has the potential to dramatically influence the practice of interventional cardiology. The unique nature of interventional cardiology makes it an ideal target for the development of AI-based technologies designed to improve real-time clinical decision making, streamline workflow in the catheterization laboratory, and standardize catheter-based procedures through advanced robotics. This review provides an introduction to AI by highlighting its scope, potential applications, and limitations in interventional cardiology.
Affiliation(s)
- Partha Sardar
  - Cardiovascular Institute, Warren Alpert Medical School at Brown University, Providence, Rhode Island
- J Dawn Abbott
  - Cardiovascular Institute, Warren Alpert Medical School at Brown University, Providence, Rhode Island
- Amartya Kundu
  - Division of Cardiovascular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts
- Herbert D Aronow
  - Cardiovascular Institute, Warren Alpert Medical School at Brown University, Providence, Rhode Island
- Juan F Granada
  - Cardiovascular Research Foundation, Columbia University Medical Center, New York, New York
- Jay Giri
  - Penn Cardiovascular Outcomes, Quality and Evaluative Research Center, Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania; Cardiovascular Medicine Division, University of Pennsylvania, Philadelphia, Pennsylvania
11
Labrosse J, Lam T, Sebbag C, Benque M, Abdennebi I, Merckelbagh H, Osdoit M, Priour M, Guerin J, Balezeau T, Grandal B, Coussy F, Bobrie A, Ferrer L, Laas E, Feron JG, Reyal F, Hamy AS. Text Mining in Electronic Medical Records Enables Quick and Efficient Identification of Pregnancy Cases Occurring After Breast Cancer. JCO Clin Cancer Inform 2020; 3:1-12. [PMID: 31626565 DOI: 10.1200/cci.19.00031] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/18/2023]
Abstract
PURPOSE To apply text mining (TM) technology on electronic medical records (EMRs) of patients with breast cancer (BC) to retrieve the occurrence of a pregnancy after BC diagnosis and compare its performance to manual curation. MATERIALS AND METHODS The training cohort (Cohort A) comprised 344 patients with BC aged ≤ 40 years treated at Institut Curie between 2005 and 2007. Manual curation consisted of manually reviewing each EMR to retrieve pregnancies. TM consisted of first applying a keyword filter ("accouch*" or "enceinte," French terms for "deliver*" and "pregnant," respectively) to select a subset of EMRs, and, second, manually checking the selected EMRs to confirm the pregnancy. Then, we applied our TM algorithm on an independent cohort of patients with BC treated between 2008 and 2012 (Cohort B). RESULTS In Cohort A, 36 pregnancies were identified among 344 patients (10.5%; 2,829 person-years of EMR). Thirty were identified by manual review versus 35 by TM. TM resulted in a lower percentage of manual checking (26.7% v 100%, respectively) and substantial time gains (time to identify a pregnancy: 13 minutes for TM v 244 minutes for manual curation, respectively). Presence of either of the two TM filters showed excellent sensitivity (97%) and negative predictive value (100%). In Cohort B, 67 pregnancies were identified among 1,226 patients (5.5%; 7,349 person-years of EMR). Similarly, for Cohort B, TM spared 904 (73.7%) EMRs from manual review and quickly generated a cohort of 67 pregnancies after BC. Incidence rate of pregnancy after BC was 0.01 pregnancy per person-year of EMR in both cohorts. CONCLUSION TM is highly efficient at quickly identifying rare events and is a promising tool to improve the rapidity, efficiency, and costs of medical research.
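The first step described in this abstract, a keyword filter that narrows the set of records needing manual review, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the record IDs, the sample French sentences, and the function name `flag_for_manual_review` are invented, and only the two keywords ("accouch*" and "enceinte") come from the abstract.

```python
import re

# "accouch*" is treated as a prefix match and "enceinte" as a whole word;
# matching is case-insensitive so sentence-initial capitals are caught.
KEYWORD_PATTERN = re.compile(r"\b(accouch\w*|enceinte)\b", re.IGNORECASE)


def flag_for_manual_review(records):
    """Return the IDs of records whose free text contains either keyword."""
    return [rid for rid, text in records.items() if KEYWORD_PATTERN.search(text)]


# Invented example records (hypothetical IDs and text).
records = {
    "A001": "Patiente enceinte de 12 semaines.",
    "A002": "Suivi post-chimiothérapie, pas d'événement notable.",
    "A003": "Accouchement par voie basse en 2010.",
}
flagged = flag_for_manual_review(records)
print(flagged)  # ['A001', 'A003']
```

Only the flagged records would then go to the second, manual confirmation step, which is where the reported reduction in review workload comes from.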
Affiliation(s)
- Thanh Lam
  - Geneva University Hospitals, Geneva, Switzerland
- Loïc Ferrer
  - Institut Curie, U900, Hôpital René Huguenin, Saint-Cloud, France
- Fabien Reyal
  - Paris 5 Research University, INSERM U932, Institut Curie, Paris, France
- Anne-Sophie Hamy
  - Paris 5 Research University, INSERM U932, Institut Curie, Paris, France
12
Wang L, Schnall J, Small A, Hubbard RA, Moore JH, Damrauer SM, Chen J. Case contamination in electronic health records based case-control studies. Biometrics 2020; 77:67-77. [PMID: 32246839 DOI: 10.1111/biom.13264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/27/2018] [Accepted: 03/03/2020] [Indexed: 12/01/2022]
Abstract
Clinically relevant information from electronic health records (EHRs) permits derivation of a rich collection of phenotypes. Unlike traditionally designed studies where scientific hypotheses are specified a priori before data collection, the true phenotype status of any given individual in EHR-based studies is not directly available. Structured and unstructured data elements need to be queried through preconstructed rules to identify case and control groups. A sufficient number of controls can usually be identified with high accuracy by making the selection criteria stringent. But more relaxed criteria are often necessary for more thorough identification of cases to ensure achievable statistical power. The resulting pool of candidate cases consists of genuine cases contaminated with noncase patients who do not satisfy the control definition. The presence of patients who are neither true cases nor controls among the identified cases is a unique challenge in EHR-based case-control studies. Ignoring case contamination would lead to biased estimation of odds ratio association parameters. We propose an estimating equation approach to bias correction, study its large sample property, and evaluate its performance through extensive simulation studies and an application to a pilot study of aortic stenosis in the Penn medicine EHR. Our method holds the promise of facilitating more efficient EHR studies by accommodating enlarged albeit contaminated case pools.
Affiliation(s)
- Lu Wang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jill Schnall
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Aeron Small
- Traditional Internal Medicine, School of Medicine, Yale University, New Haven, Connecticut
- Rebecca A Hubbard
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason H Moore
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Scott M Damrauer
- Department of Surgery, School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
|
13
|
Chou LW, Chang KM, Puspitasari I. Drug Abuse Research Trend Investigation with Text Mining. Comput Math Methods Med 2020; 2020:1030815. [PMID: 32076454 PMCID: PMC7016473 DOI: 10.1155/2020/1030815] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/03/2019] [Accepted: 01/07/2020] [Indexed: 11/18/2022]
Abstract
Drug abuse causes great physical and psychological harm and has therefore attracted scholarly attention. A researcher just entering this field often needs experience and time to find an appropriate method for studying drug abuse. It is crucial for researchers to rapidly understand the existing research on a particular topic and be able to propose an effective new research method. Text mining analysis has been widely applied in recent years, and this study integrated the text mining method into a review of drug abuse research. Through searches for keywords related to drug abuse, all related publications were identified and downloaded from PubMed. After removing duplicate and incomplete literature, the retained data were imported for analysis through text mining. A total of 19,843 papers were analyzed, and the text mining technique was used to search for keywords and questionnaire types. The results showed the associations between these questionnaires, with the top five being the Addiction Severity Index (16.44%), the Quality of Life survey (5.01%), the Beck Depression Inventory (3.24%), the Addiction Research Center Inventory (2.81%), and the Profile of Mood States (1.10%). Specifically, the Addiction Severity Index was most commonly used in combination with Quality of Life scales. In conclusion, association analysis is useful for extracting core knowledge, and researchers can use it to learn and visualize the latest research trends.
Affiliation(s)
- Li-Wei Chou
- Department of Physical Medicine and Rehabilitation, China Medical University Hospital, Taichung, Taiwan
- Department of Physical Therapy, Graduate Institute of Rehabilitation Science, China Medical University, Taichung, Taiwan
- Department of Rehabilitation, Asia University Hospital, Taichung, Taiwan
- Kang-Ming Chang
- Department of Photonics and Communication Engineering, Asia University, Taichung 41354, Taiwan
- Department of Medical Research, China Medical University Hospital, China Medical University, Taichung 40402, Taiwan
- Ira Puspitasari
- Information System Study Program, Faculty of Science and Technology, Universitas Airlangga, Surabaya, Indonesia
|
14
|
Przybyła P, Brockmeier AJ, Ananiadou S. Quantifying risk factors in medical reports with a context-aware linear model. J Am Med Inform Assoc 2019; 26:537-546. [PMID: 30840055 PMCID: PMC6515525 DOI: 10.1093/jamia/ocz004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/20/2018] [Revised: 12/14/2018] [Accepted: 01/03/2019] [Indexed: 12/03/2022]
Abstract
OBJECTIVE We seek to quantify the mortality risk associated with mentions of medical concepts in textual electronic health records (EHRs). Recognizing mentions of named entities of relevant types (eg, conditions, symptoms, laboratory tests or behaviors) in text is a well-researched task. However, determining the level of risk associated with them is partly dependent on the textual context in which they appear, which may describe severity, temporal aspects, quantity, etc. METHODS To take into account that a given word appearing in the context of different risk factors (medical concepts) can make different contributions toward risk level, we propose a multitask approach, called context-aware linear modeling, which can be applied using appropriately regularized linear regression. To improve the performance for risk factors unseen in training data (eg, rare diseases), we take into account their distributional similarity to other concepts. RESULTS The evaluation is based on a corpus of 531 reports from EHRs with 99 376 risk factors rated manually by experts. While context-aware linear modeling significantly outperforms single-task models, taking into account concept similarity further improves performance, reaching the level of human annotators' agreements. CONCLUSION Our results show that automatic quantification of risk factors in EHRs can achieve performance comparable to human assessment, and taking into account the multitask structure of the problem and the ability to handle rare concepts is crucial for its accuracy.
Affiliation(s)
- Piotr Przybyła
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, United Kingdom
- Austin J Brockmeier
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, United Kingdom
- Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, United Kingdom
|
15
|
Chang JR, Chen MY, Chen LS, Chien WT. Recognizing important factors of influencing trust in O2O models: an example of OpenTable. Soft Comput 2019. [DOI: 10.1007/s00500-019-04019-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 10/26/2022]
|
16
|
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 262] [Impact Index Per Article: 43.7] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023]
Abstract
BACKGROUND Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. OBJECTIVE The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. METHODS Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles. RESULTS Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus on NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. CONCLUSIONS Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
- Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Joel T Dudley
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Alberto Lavelli
- NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Venet Osmani
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
|
17
|
Pearson DR, Werth VP. Geospatial Correlation of Amyopathic Dermatomyositis With Fixed Sources of Airborne Pollution: A Retrospective Cohort Study. Front Med (Lausanne) 2019; 6:85. [PMID: 31069228 PMCID: PMC6491706 DOI: 10.3389/fmed.2019.00085] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/06/2019] [Accepted: 04/04/2019] [Indexed: 11/28/2022]
Abstract
Objective: Dermatomyositis (DM) may result from exogenous triggers, including airborne pollutants, in genetically susceptible individuals. The United States Environmental Protection Agency's 2011 National Air Toxics Assessment (NATA) models health risks associated with airborne emissions, available by ZIP code tabulation area (ZCTA). Important contributors include point (fixed), on-road, and secondary sources. The objective of this study was to investigate the geospatial distributions of DM and subtypes, classic DM (CDM) and clinically amyopathic DM (CADM), and their associations with airborne pollutants. Methods: This retrospective cohort study identified 642 adult DM patients from 336 unique ZCTAs. GeoDa v.1.10 was used to calculate global and local Moran's indices and generate local indicator of spatial autocorrelation (LISA) maps. All Moran's indices and LISA maps were permuted 999 times. Results: Univariate global Moran's indices for DM, CDM, and CADM prevalence were not significant, but LISA maps demonstrated differential local spatial clustering and outliers. CADM prevalence correlated with point sources (bivariate global Moran's index 0.071, pseudo-p = 0.018), in contrast to CDM (−0.0053, pseudo-p = 0.46). Bivariate global Moran's indices for DM, CDM, and CADM prevalence did not correlate with other airborne toxics, but bivariate LISA maps revealed local spatial clustering and outliers. Conclusion: Prevalence of CADM, but not CDM, is geospatially correlated with fixed sources of airborne emissions. This effect is small but significant and may support the hypothesis that triggering exposures influence disease phenotype. Important limitations are that NATA data and ZCTA population estimates were collected in 2011 and that the ZCTA of residence may not have been where patients had their greatest airborne pollutant exposure.
Affiliation(s)
- David R Pearson
- Department of Dermatology, University of Minnesota Medical School, Minneapolis, MN, United States
- Victoria P Werth
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, United States; Department of Dermatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
|