1. Du H, Xu J, Du Z, Chen L, Ma S, Wei D, Wang X. MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records. Interdiscip Sci 2024. [PMID: 38578388] [DOI: 10.1007/s12539-024-00624-z] [Received: 11/16/2023; Accepted: 02/25/2024]
Abstract
To address the poor entity recognition performance caused by the scarcity of annotated Chinese clinical electronic medical records, this paper proposes MF-MNER, a multi-medical entity recognition method that fuses BART, Bi-LSTM, and CRF. First, after the electronic medical records are cleaned, encoded, and segmented, the resulting semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured with a bidirectional long short-term memory (Bi-LSTM) network. Finally, a conditional random field (CRF) decodes and outputs the multi-task entity recognition results. In experiments on the CCKS2019 dataset, the micro-, macro-, and weighted-average Precision reached 0.880, 0.887, and 0.883, and the corresponding F1-scores reached 0.875, 0.876, and 0.876. Under the same dataset conditions, the method outperforms the existing literature on all three evaluation schemes (micro, macro, and weighted average); under weighted averaging, Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher, respectively, than those of the existing BERT-BiLSTM-CRF model. On an actual clinical dataset, MF-MNER achieved Precision/Recall/F1-scores of 0.638/0.825/0.719 (micro average), 0.685/0.800/0.733 (macro average), and 0.647/0.825/0.722 (weighted average).
These results show that MF-MNER integrates the advantages of the BART, Bi-LSTM, and CRF layers, markedly improving downstream named entity recognition with only a small amount of annotation and achieving especially strong recall, which gives it practical significance. Source code and datasets to reproduce the results are available at https://github.com/xfwang1969/MF-MNER.
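The micro-, macro-, and weighted-average metrics quoted in this abstract combine per-entity-type scores in three different ways. The following sketch illustrates the three averaging schemes; the entity types and counts are made up for illustration and are not from the paper.

```python
# Micro, macro, and weighted averaging of per-entity-type F1.
def prf(tp, fp, fn):
    """Precision, recall, F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def averaged_f1(counts):
    """counts maps entity type -> (tp, fp, fn); returns (micro, macro, weighted) F1."""
    # Micro: pool all counts across types, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = prf(tp, fp, fn)[2]
    # Macro: unweighted mean of per-type F1 scores.
    per_type = [prf(*c)[2] for c in counts.values()]
    macro = sum(per_type) / len(per_type)
    # Weighted: per-type F1 weighted by support (gold mentions = tp + fn).
    support = [c[0] + c[2] for c in counts.values()]
    weighted = sum(f * s for f, s in zip(per_type, support)) / sum(support)
    return micro, macro, weighted

micro, macro, weighted = averaged_f1({"disease": (8, 2, 2), "drug": (3, 1, 2)})
print(round(micro, 3), round(macro, 3), round(weighted, 3))  # 0.759 0.733 0.756
```

The three values diverge whenever class supports are unbalanced, which is why the abstract reports all three.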
Affiliation(s)
- Haoze Du
- Department of Computer Science, North Carolina State University, Raleigh, NC, 27695, USA
- Jiahao Xu
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Zhiyong Du
- School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
- Lihui Chen
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Shaohui Ma
- School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
- Dongqing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, 200240, China
- Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiaotong University, Shanghai, 200240, China
- Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Nanyang, 473000, China
- Xianfang Wang
- School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
2. Curth A, Peck RW, McKinney E, Weatherall J, van der Schaar M. Using Machine Learning to Individualize Treatment Effect Estimation: Challenges and Opportunities. Clin Pharmacol Ther 2024; 115:710-719. [PMID: 38124482] [DOI: 10.1002/cpt.3159] [Received: 08/30/2023; Accepted: 12/07/2023]
Abstract
The use of data from randomized clinical trials to justify treatment decisions for real-world patients is the current state of the art. It relies on the assumption that average treatment effects from the trial can be extrapolated to patients with personal and/or disease characteristics different from those treated in the trial. Yet, because of heterogeneity of treatment effects between patients and between the trial population and real-world patients, this assumption may not be correct for many patients. Using machine learning to estimate the expected conditional average treatment effect (CATE) in individual patients from observational data offers the potential for more accurate estimation of the expected treatment effects in each patient based on their observed characteristics. In this review, we discuss some of the challenges and opportunities for machine learning to estimate CATE, including ensuring identification assumptions are met, managing covariate shift, and learning without access to the true label of interest. We also discuss the potential applications as well as future work and collaborations needed to further improve identification and utilization of CATE estimates to increase patient benefit.
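One common family of CATE estimators the machine-learning literature uses is the meta-learner. The sketch below is a minimal T-learner: fit a separate outcome model per treatment arm and take the difference of their predictions. It assumes a binary treatment and discrete covariate strata, with stratum means standing in for the flexible ML regressors the review discusses; the data are invented for illustration.

```python
# Minimal T-learner sketch for CATE estimation (illustrative only).
from collections import defaultdict

def fit_arm_means(data):
    """Outcome model per arm: mean outcome within each covariate stratum.
    data: iterable of (x, t, y) with binary treatment t in {0, 1}."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, t, y in data:
        cell = sums[(x, t)]
        cell[0] += y
        cell[1] += 1
    return {key: s / n for key, (s, n) in sums.items()}

def cate(models, x):
    """Estimated conditional average treatment effect at covariate x:
    predicted treated outcome minus predicted control outcome."""
    return models[(x, 1)] - models[(x, 0)]

# Toy data: treatment helps stratum "A" but not stratum "B".
data = [("A", 1, 3.0), ("A", 1, 5.0), ("A", 0, 1.0), ("A", 0, 1.0),
        ("B", 1, 2.0), ("B", 1, 2.0), ("B", 0, 1.5), ("B", 0, 2.5)]
models = fit_arm_means(data)
print(cate(models, "A"))  # 3.0
print(cate(models, "B"))  # 0.0
```

With observational data, this difference only identifies the CATE under the assumptions the review highlights (e.g., no unmeasured confounding within strata); the estimator itself cannot check them.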
Affiliation(s)
- Alicia Curth
- Department of Applied Mathematics & Theoretical Physics, University of Cambridge, Cambridge, UK
- Richard W Peck
- Department of Pharmacology & Therapeutics, University of Liverpool, Liverpool, UK
- Roche Pharma Research & Early Development (pRED), Roche Innovation Center, Basel, Switzerland
- Eoin McKinney
- Cambridge Institute for Immunotherapy & Infectious Disease, Jeffrey Cheah Biomedical Center, Cambridge Biomedical Campus, Addenbrooke's Hospital, Cambridge, UK
- Cambridge Centre for AI in Medicine, Cambridge, UK
- James Weatherall
- AstraZeneca R&D Data Science and Artificial Intelligence, Cambridge, UK
- Mihaela van der Schaar
- Department of Applied Mathematics & Theoretical Physics, University of Cambridge, Cambridge, UK
- Cambridge Centre for AI in Medicine, Cambridge, UK
- The Alan Turing Institute, London, UK
3. Kim MK, Rouphael C, McMichael J, Welch N, Dasarathy S. Challenges in and Opportunities for Electronic Health Record-Based Data Analysis and Interpretation. Gut Liver 2024; 18:201-208. [PMID: 37905424] [PMCID: PMC10938158] [DOI: 10.5009/gnl230272] [Received: 07/14/2023; Accepted: 08/15/2023] [Open Access]
Abstract
Electronic health records (EHRs) have been increasingly adopted in clinical practices across the United States, providing a primary source of data for clinical research, particularly observational cohort studies. EHRs are a high-yield, low-maintenance source of longitudinal real-world data for large patient populations and provide a wealth of information and clinical contexts that are useful for clinical research and translation into practice. Despite these strengths, it is important to recognize the multiple limitations and challenges related to the use of EHR data in clinical research. Missing data are a major source of error and biases and can affect the representativeness of the cohort of interest, as well as the accuracy of the outcomes and exposures. Here, we aim to provide a critical understanding of the types of data available in EHRs and describe the impact of data heterogeneity, quality, and generalizability, which should be evaluated prior to and during the analysis of EHR data. We also identify challenges pertaining to data quality, including errors and biases, and examine potential sources of such biases and errors. Finally, we discuss approaches to mitigate and remediate these limitations. A proactive approach to addressing these issues can help ensure the integrity and quality of EHR data and the appropriateness of their use in clinical studies.
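Since the review recommends evaluating data quality before analysis, a first practical step is a per-field missingness audit of the cohort extract. The sketch below is one minimal way to do that; the field names and records are hypothetical, not from the paper.

```python
# Sketch: audit per-field missingness in EHR-style records before analysis.
def missingness_report(records, fields):
    """records: list of dicts; a field counts as missing when absent or None.
    Returns {field: fraction of records missing that field}."""
    n = len(records)
    report = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) is None)
        report[field] = missing / n
    return report

records = [
    {"age": 63, "a1c": 7.1, "smoking": None},
    {"age": 57, "a1c": None, "smoking": "former"},
    {"age": 70, "a1c": 6.4, "smoking": None},
    {"age": 49, "a1c": None, "smoking": "never"},
]
print(missingness_report(records, ["age", "a1c", "smoking"]))
# {'age': 0.0, 'a1c': 0.5, 'smoking': 0.5}
```

High missingness in a field flags exactly the representativeness and bias concerns the review raises; whether the data are missing at random still requires clinical judgment, not just counting.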
Affiliation(s)
- Michelle Kang Kim
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Carol Rouphael
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- John McMichael
- Department of Surgery, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Nicole Welch
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Srinivasan Dasarathy
- Department of Gastroenterology, Hepatology, and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
4. Leviton A, Loddenkemper T. Design, implementation, and inferential issues associated with clinical trials that rely on data in electronic medical records: a narrative review. BMC Med Res Methodol 2023; 23:271. [PMID: 37974111] [PMCID: PMC10652539] [DOI: 10.1186/s12874-023-02102-4] [Received: 10/17/2022; Accepted: 11/08/2023] [Open Access]
Abstract
Real-world evidence is now accepted by the authorities charged with assessing the benefits and harms of new therapies. Clinical trials that draw on real-world evidence, such as that contained in electronic health records (EHRs), are much less expensive than randomized clinical trials that do not. Consequently, we can expect an increase in the number of reports of these trials, which we identify here as "EHR-sourced trials." In this selected literature review, we discuss the various designs and the ethical issues they raise. EHR-sourced trials have the potential to improve common data elements and other aspects of the EHR and related systems. Caution is advised, however, in drawing causal inferences about the relationships among EHR variables. Nevertheless, we anticipate that EHR-sourced trials will play a central role in answering research and regulatory questions.
Affiliation(s)
- Alan Leviton
- Department of Neurology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Tobias Loddenkemper
- Department of Neurology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
5. Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study. Int J Environ Res Public Health 2023; 20:3378. [PMID: 36834073] [PMCID: PMC9967747] [DOI: 10.3390/ijerph20043378] [Received: 01/25/2023; Accepted: 02/13/2023]
Abstract
The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within its ten-item differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis within five-item differential-diagnosis lists was still higher for physicians than for ChatGPT-3 (98.3% vs. 83.3%, p = 0.03), as was the rate of correct top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of differential diagnoses consistent with those of physicians within the ten-item lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints, suggesting that AI chatbots such as ChatGPT-3 can generate well-differentiated diagnosis lists for common chief complaints. However, the ordering of these lists can be improved in the future.
Affiliation(s)
- Takanobu Hirosawa
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan
6. Tamang S, Humbert-Droz M, Gianfrancesco M, Izadi Z, Schmajuk G, Yazdany J. Practical Considerations for Developing Clinical Natural Language Processing Systems for Population Health Management and Measurement. JMIR Med Inform 2023; 11:e37805. [PMID: 36595345] [PMCID: PMC9846439] [DOI: 10.2196/37805] [Received: 03/08/2022; Accepted: 11/09/2022] [Open Access]
Abstract
Experts have noted a concerning gap between clinical natural language processing (NLP) research and real-world applications, such as clinical decision support. To help address this gap, in this viewpoint, we enumerate a set of practical considerations for developing an NLP system to support real-world clinical needs and improve health outcomes. They include determining (1) the readiness of the data and compute resources for NLP, (2) the organizational incentives to use and maintain the NLP systems, and (3) the feasibility of implementation and continued monitoring. These considerations are intended to benefit the design of future clinical NLP projects and can be applied across a variety of settings, including large health systems or smaller clinical practices that have adopted electronic medical records in the United States and globally.
Affiliation(s)
- Suzanne Tamang
- Division of Immunology and Rheumatology, Stanford University School of Medicine, Stanford, CA, United States
- Department of Veterans Affairs, Office of Mental Health and Suicide Prevention, Program Evaluation Resource Center, Palo Alto, CA, United States
- Marie Humbert-Droz
- Division of Immunology and Rheumatology, Stanford University School of Medicine, Stanford, CA, United States
- Milena Gianfrancesco
- Division of Rheumatology, University of California, San Francisco, San Francisco, CA, United States
- Zara Izadi
- Division of Rheumatology, University of California, San Francisco, San Francisco, CA, United States
- Gabriela Schmajuk
- Division of Rheumatology, University of California, San Francisco, San Francisco, CA, United States
- Jinoos Yazdany
- Division of Rheumatology, University of California, San Francisco, San Francisco, CA, United States
7. Intelligent oncology: The convergence of artificial intelligence and oncology. Journal of the National Cancer Center 2022. [DOI: 10.1016/j.jncc.2022.11.004] [Indexed: 12/12/2022] [Open Access]