Reference Citation Analysis

For: Si Y, Wang J, Xu H, Roberts K. Enhancing clinical concept extraction with contextual embeddings. J Am Med Inform Assoc 2021;26:1297-1304. [PMID: 31265066 DOI: 10.1093/jamia/ocz096] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Received: 03/18/2019] [Revised: 05/10/2019] [Accepted: 05/24/2019] [Indexed: 11/14/2022] Open

Cited by Other Article(s)

Singh A, Krishnamoorthy S, Ortega JE. NeighBERT: Medical Entity Linking Using Relation-Induced Dense Retrieval. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024;8:353-369. [PMID: 38681752 PMCID: PMC11052986 DOI: 10.1007/s41666-023-00136-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 05/08/2023] [Accepted: 07/03/2023] [Indexed: 05/01/2024]

Park J, Fang Y, Ta C, Zhang G, Idnay B, Chen F, Feng D, Shyu R, Gordon ER, Spotnitz M, Weng C. Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation. J Biomed Inform 2024;154:104649. [PMID: 38697494 PMCID: PMC11129920 DOI: 10.1016/j.jbi.2024.104649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 04/03/2024] [Accepted: 04/29/2024] [Indexed: 05/05/2024]

Bakken S. What can you do with a large language model? J Am Med Inform Assoc 2024;31:1217-1218. [PMID: 38768444 PMCID: PMC11105124 DOI: 10.1093/jamia/ocae106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Indexed: 05/22/2024] Open

Lyu D, Wang X, Chen Y, Wang F. Language model and its interpretability in biomedicine: A scoping review. iScience 2024;27:109334. [PMID: 38495823 PMCID: PMC10940999 DOI: 10.1016/j.isci.2024.109334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024] Open

Peng C, Yang X, Chen A, Yu Z, Smith KE, Costa AB, Flores MG, Bian J, Wu Y. Generative large language models are all-purpose text analytics engines: text-to-text learning is all your need. J Am Med Inform Assoc 2024:ocae078. [PMID: 38630580 DOI: 10.1093/jamia/ocae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 02/26/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open

Li G, Togo R, Ogawa T, Haseyama M. Importance-aware adaptive dataset distillation. Neural Netw 2024;172:106154. [PMID: 38309137 DOI: 10.1016/j.neunet.2024.106154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 01/04/2024] [Accepted: 01/28/2024] [Indexed: 02/05/2024]

Zhou H, Austin R, Lu SC, Silverman GM, Zhou Y, Kilicoglu H, Xu H, Zhang R. Complementary and Integrative Health Information in the literature: its lexicon and named entity recognition. J Am Med Inform Assoc 2024;31:426-434. [PMID: 37952122 PMCID: PMC10797266 DOI: 10.1093/jamia/ocad216] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/15/2023] [Revised: 10/20/2023] [Accepted: 11/08/2023] [Indexed: 11/14/2023] Open

Yuan K, Haddad Y, Law R, Shakya I, Haileyesus T, Navon L, Zhang L, Liu Y, Bergen G. Emergency Department Visits for Alcohol-Associated Falls Among Older Adults in the United States, 2011 to 2020. Ann Emerg Med 2023;82:666-677. [PMID: 37204348 PMCID: PMC10950308 DOI: 10.1016/j.annemergmed.2023.04.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 03/30/2023] [Accepted: 04/11/2023] [Indexed: 05/20/2023]

Abstract

STUDY OBJECTIVE

The aim of this study was to examine the epidemiology of alcohol-associated fall injuries among older adults aged ≥65 years in the United States.

METHODS

We included emergency department (ED) visits for unintentional fall injuries by adults from the National Electronic Injury Surveillance System-All Injury Program during 2011 to 2020. We estimated the annual national rate of ED visits for alcohol-associated falls and the proportion of these falls among older adults' fall-related ED visits using demographic and clinical characteristics. Joinpoint regression was performed to examine trends in alcohol-associated ED fall visits between 2011 and 2019 among older adult age subgroups and to compare these trends with those of younger adults.

RESULTS

There were 9,657 (weighted national estimate: 618,099) ED visits for alcohol-associated falls, representing 2.2% of ED fall visits during 2011 to 2020 among older adults. The proportion of fall-related ED visits that were alcohol-associated was higher among men than among women (adjusted prevalence ratio [aPR]=3.6, 95% confidence interval [CI] 2.9 to 4.5). The head and face were the most commonly injured body parts, and internal injury was the most common diagnosis for alcohol-associated falls. From 2011 to 2019, the annual rate of ED visits for alcohol-associated falls increased (annual percent change 7.5, 95% CI 6.1 to 8.9) among older adults. Adults aged 55 to 64 years had a similar increase; a sustained increase was not detected in younger age groups.

CONCLUSION

Our findings highlight the rising rates of ED visits for alcohol-associated falls among older adults during the study period. Health care providers in the ED can screen older adults for fall risk and assess for modifiable risk factors such as alcohol use to help identify those who could benefit from interventions to reduce their risk.

Macri CZ, Teoh SC, Bacchi S, Tan I, Casson R, Sun MT, Selva D, Chan W. A case study in applying artificial intelligence-based named entity recognition to develop an automated ophthalmic disease registry. Graefes Arch Clin Exp Ophthalmol 2023;261:3335-3344. [PMID: 37535181 PMCID: PMC10587337 DOI: 10.1007/s00417-023-06190-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 06/23/2023] [Accepted: 07/23/2023] [Indexed: 08/04/2023] Open

Zhou X, Zhang S, Agarwal M, Akroyd J, Mosbach S, Kraft M. Marie and BERT-A Knowledge Graph Embedding Based Question Answering System for Chemistry. ACS OMEGA 2023;8:33039-33057. [PMID: 37720754 PMCID: PMC10500657 DOI: 10.1021/acsomega.3c05114] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/16/2023] [Accepted: 08/03/2023] [Indexed: 09/19/2023]

Guizzardi S, Colangelo MT, Mirandola P, Galli C. Modeling new trends in bone regeneration, using the BERTopic approach. Regen Med 2023;18:719-734. [PMID: 37577987 DOI: 10.2217/rme-2023-0096] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/15/2023] Open

Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, Berkovsky S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int J Med Inform 2023;177:105122. [PMID: 37295138 DOI: 10.1016/j.ijmedinf.2023.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 04/14/2023] [Accepted: 06/03/2023] [Indexed: 06/12/2023]

Abstract

BACKGROUND

Natural Language Processing (NLP) applications have developed over the past years in various fields, including their application to clinical free text for named entity recognition and relation extraction. However, developments in the last few years have been so rapid that there is currently no overview of them. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments.

METHODS

We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries).

RESULTS

We included 94 studies in the review, 30 of which were published in the last three years. Machine learning methods were used in 68 studies, rule-based methods in 5 studies, and both in 22 studies. Sixty-three studies focused on Named Entity Recognition, 13 on Relation Extraction, and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". Seventy-two studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system, and just three studies reported use of the system outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 shared an available software tool.

DISCUSSION

Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of findings and their translation into practice, and it highlights the need for robust clinical evaluation.

Mishra RK, Roy S, Palla SK, Patel N, Patel M, Jos S. Hybrid approach combining deep learning and a rule based expert system for concept extraction from prescriptions. Annu Int Conf IEEE Eng Med Biol Soc 2023;2023:1-4. [PMID: 38082624 DOI: 10.1109/embc40787.2023.10339977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]

Datta S, Roberts K. Weakly supervised spatial relation extraction from radiology reports. JAMIA Open 2023;6:ooad027. [PMID: 37096148 PMCID: PMC10122604 DOI: 10.1093/jamiaopen/ooad027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 03/16/2023] [Accepted: 04/04/2023] [Indexed: 04/26/2023] Open

Dolatabadi E, Chen B, Buchan SA, Austin AM, Azimaee M, McGeer A, Mubareka S, Kwong JC. Natural Language Processing for Clinical Laboratory Data Repository Systems: Implementation and Evaluation for Respiratory Viruses. JMIR AI 2023;2:e44835. [PMID: 38875570 PMCID: PMC11057455 DOI: 10.2196/44835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 03/31/2023] [Accepted: 04/18/2023] [Indexed: 06/16/2024]

Abstract

BACKGROUND

With the growing volume and complexity of laboratory repositories, it has become tedious to parse unstructured data into structured and tabulated formats for secondary uses such as decision support, quality assurance, and outcome analysis. However, advances in natural language processing (NLP) approaches have enabled efficient and automated extraction of clinically meaningful medical concepts from unstructured reports.

OBJECTIVE

In this study, we aimed to determine the feasibility of using the NLP model for information extraction as an alternative approach to a time-consuming and operationally resource-intensive handcrafted rule-based tool. Therefore, we sought to develop and evaluate a deep learning-based NLP model to derive knowledge and extract information from text-based laboratory reports sourced from a provincial laboratory repository system.

METHODS

The NLP model, a hierarchical multilabel classifier, was trained on a corpus of laboratory reports covering testing for 14 different respiratory viruses and viral subtypes. The corpus includes 87,500 unique laboratory reports annotated by 8 subject matter experts (SMEs). The classification task involved assigning the laboratory reports to labels at 2 levels: 24 fine-grained labels in level 1 and 6 coarse-grained labels in level 2. A "label" also refers to the status of a specific virus or strain being tested or detected (eg, influenza A is detected). The model's performance stability and variation were analyzed across all labels in the classification task. Additionally, the model's generalizability was evaluated internally and externally on various test sets.

RESULTS

Overall, the NLP model performed well on internal, out-of-time (pre-COVID-19), and external (different laboratories) test sets with microaveraged F1-scores >94% across all classes. Higher precision and recall scores with less variability were observed for the internal and pre-COVID-19 test sets. As expected, the model's performance varied across categories and virus types due to the imbalanced nature of the corpus and sample sizes per class. There were intrinsically fewer classes of viruses being detected than those tested; therefore, the model's performance (lowest F1-score of 57%) was noticeably lower in the detected cases.

CONCLUSIONS

We demonstrated that deep learning-based NLP models are promising solutions for information extraction from text-based laboratory reports. These approaches enable scalable, timely, and practical access to high-quality and encoded laboratory data if integrated into laboratory information system repositories.
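
The results above report microaveraged F1-scores alongside a much lower F1-score for some "detected" labels. A minimal Python sketch, assuming each report carries a set of labels, shows how pooling counts across labels (microaveraging) can hide a weak rare label. The label names `tested` and `detected` and the toy data are hypothetical and are not taken from the study.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold, pred):
    """Microaveraged F1: pool TP/FP/FN over all labels, then compute one F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels predicted and present
        fp += len(p - g)   # labels predicted but absent
        fn += len(g - p)   # labels present but missed
    return f1_from_counts(tp, fp, fn)


def label_f1(gold, pred, label):
    """F1 for a single label, computed from that label's own counts."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    return f1_from_counts(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical toy corpus: "detected" is rare compared with "tested".
    gold = [{"tested"}] * 8 + [{"tested", "detected"}] * 2
    pred = [{"tested"}] * 9 + [{"tested", "detected"}]
    print("micro F1:", round(micro_f1(gold, pred), 3))
    print("F1 (detected):", round(label_f1(gold, pred, "detected"), 3))
```

In this toy setup one missed "detected" label barely moves the pooled score but cuts the per-label score sharply, which is the kind of gap an imbalanced corpus can produce.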

Keloth VK, Banda JM, Gurley M, Heider PM, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, Patterson OV, Peng Y, Raja K, Reeves RM, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei WQ, Williams AE, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. J Biomed Inform 2023;142:104343. [PMID: 36935011 PMCID: PMC10428170 DOI: 10.1016/j.jbi.2023.104343] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/25/2022] [Revised: 01/21/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]

Affiliation(s)

Vipina K Keloth Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
Juan M Banda Department of Computer Science, Georgia State University, Atlanta, GA, USA
Michael Gurley Lurie Cancer Center, Northwestern University, Chicago, Illinois, USA
Paul M Heider Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
Georgina Kennedy Ingham Institute for Applied Medical Research, Sydney, Australia
Hongfang Liu Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
Feifan Liu Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA, USA
Timothy Miller Computational Health Informatics Program, Boston Children's Hospital, and Department of Pediatrics, Harvard Medical School, Boston, MA, USA
Karthik Natarajan Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
Olga V Patterson VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Verily Life Sciences, Mountain View, CA, USA
Yifan Peng Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
Kalpana Raja Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
Ruth M Reeves TN Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Masoud Rouhizadeh Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD, USA
Jianlin Shi VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
Xiaoyan Wang Sema4 Mount Sinai Genomics Incorporation, Stamford, CT, USA
Yanshan Wang Department of Health Information Management, Department of Biomedical Informatics, and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Andrew E Williams School of Medicine, Tufts University, Boston, MA, USA
Rui Zhang Institute for Health Informatics, and Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
Rimma Belenkaya Memorial Sloan Kettering Cancer Center, New York, NY, USA
Christian Reich Real World Solutions, IQVIA, Durham, NC, USA
Clair Blacketer Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
Patrick Ryan Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA
George Hripcsak Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
Noémie Elhadad Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
Hua Xu Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.

Mithun S, Jha AK, Sherkhane UB, Jaiswar V, Purandare NC, Dekker A, Puts S, Bermejo I, Rangarajan V, Zegers CML, Wee L. Clinical Concept-Based Radiology Reports Classification Pipeline for Lung Carcinoma. J Digit Imaging 2023;36:812-826. [PMID: 36788196 PMCID: PMC10287609 DOI: 10.1007/s10278-023-00787-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 01/23/2023] [Accepted: 01/24/2023] [Indexed: 02/16/2023] Open

Abstract

Rising incidence and mortality of cancer have led to an incremental amount of research in the field. To learn from preexisting data, it has become important to capture maximum information related to disease type, stage, treatment, and outcomes. Medical imaging reports are rich in this kind of information but are only present as free text. The extraction of information from such unstructured text reports is labor-intensive. The use of Natural Language Processing (NLP) tools to extract information from radiology reports can make it less time-consuming as well as more effective. In this study, we have developed and compared different models for the classification of lung carcinoma reports using clinical concepts. This study was approved by the institutional ethics committee as a retrospective study with a waiver of informed consent. A clinical concept-based classification pipeline for lung carcinoma radiology reports was developed using rule-based as well as machine learning models, and the models were compared. The machine learning models used were XGBoost and two deep learning model architectures with bidirectional long short-term memory (Bi-LSTM) networks. A corpus consisting of 1700 radiology reports, including computed tomography (CT) and positron emission tomography/computed tomography (PET/CT) reports, was used for development and testing. Five hundred one radiology reports from MIMIC-III Clinical Database version 1.4 were used for external validation. The pipeline achieved an overall F1 score of 0.94 on the internal set and 0.74 on external validation, with the rule-based algorithm using expert input giving the best performance. Among the machine learning models, the Bi-LSTM_dropout model performed better than the ML model using XGBoost and the Bi-LSTM_simple model on the internal set, whereas on external validation, the Bi-LSTM_simple model performed relatively better than the other two. This pipeline can be used for clinical concept-based classification of radiology reports related to lung carcinoma from a huge corpus and also for automated annotation of these reports.

Affiliation(s)

Sneha Mithun Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands; Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Ashish Kumar Jha Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands; Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Umesh B Sherkhane Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands; Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India
Vinay Jaiswar Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India
Nilendu C Purandare Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Andre Dekker Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
Sander Puts Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
Inigo Bermejo Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
V Rangarajan Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India; Homi Bhabha National Institute (HBNI), Deemed University, Mumbai, India
Catharina M L Zegers Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands
Leonard Wee Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands

Yang L, Huang X, Wang J, Yang X, Ding L, Li Z, Li J. Identifying stroke-related quantified evidence from electronic health records in real-world studies. Artif Intell Med 2023;140:102552. [PMID: 37210153 DOI: 10.1016/j.artmed.2023.102552] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Revised: 02/28/2023] [Accepted: 04/11/2023] [Indexed: 05/22/2023]

Abstract

BACKGROUND

Stroke is one of the leading causes of death and disability worldwide. The National Institutes of Health Stroke Scale (NIHSS) scores in electronic health records (EHRs), which quantitatively describe patients' neurological deficits in evidence-based treatment, are crucial in stroke-related clinical investigations. However, the free-text format and lack of standardization inhibit their effective use. Automatically extracting the scale scores from the clinical free text, so that their potential value in real-world studies is realized, has become an important goal.

OBJECTIVE

This study aims to develop an automated method to extract scale scores from the free text of EHRs.

METHODS

We propose a two-step pipeline method to identify NIHSS items and numerical scores and validate its feasibility using a freely accessible critical care database: MIMIC-III (Medical Information Mart for Intensive Care III). First, we utilize MIMIC-III to create an annotated corpus. Then, we investigate possible machine learning methods for two subtasks, NIHSS item and score recognition and item-score relation extraction. In the evaluation, we conduct both task-specific and end-to-end evaluations and compare our method with the rule-based method using precision, recall and F1 scores as evaluation metrics.

RESULTS

We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The results show that the best F1-score of our method was 0.9006, which was attained by combining BERT-BiLSTM-CRF and Random Forest, and it outperformed the rule-based method (F1-score = 0.8098). In the end-to-end task, our method could successfully recognize the item "1b level of consciousness questions", the score "1" and their relation "('1b level of consciousness questions', '1', 'has value')" from the sentence "1b level of consciousness questions: said name = 1", while the rule-based method could not.

CONCLUSIONS

The two-step pipeline method we propose is an effective approach to identify NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.

Affiliation(s)

Lin Yang Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China
Xiaoshuo Huang Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; School of Health Care Technology, Dalian Neusoft University of Information, Dalian 116023, China
Jiayang Wang Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
Xin Yang China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
Lingling Ding China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
Zixiao Li China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; National Center for Healthcare Quality Management in Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
Jiao Li Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China.

Rani S, Jain A. Optimizing healthcare system by amalgamation of text processing and deep learning: a systematic review. MULTIMEDIA TOOLS AND APPLICATIONS 2023:1-25. [PMID: 37362695 PMCID: PMC10183315 DOI: 10.1007/s11042-023-15539-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 05/18/2022] [Accepted: 04/19/2023] [Indexed: 06/28/2023]

Lokker C, Bagheri E, Abdelkader W, Parrish R, Afzal M, Navarro T, Cotoi C, Germini F, Linkins L, Haynes RB, Chu L, Iorio A. Deep Learning to Refine the Identification of High-Quality Clinical Research Articles from the Biomedical Literature: Performance Evaluation. J Biomed Inform 2023;142:104384. [PMID: 37164244 DOI: 10.1016/j.jbi.2023.104384] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/15/2022] [Revised: 04/24/2023] [Accepted: 05/03/2023] [Indexed: 05/12/2023]

Abstract

BACKGROUND

Identifying practice-ready evidence-based journal articles in medicine is a challenge due to the sheer volume of biomedical research publications. Newer approaches to support evidence discovery apply deep learning techniques to improve the efficiency and accuracy of classifying sound evidence.

OBJECTIVE

To determine how well deep learning models using variants of Bidirectional Encoder Representations from Transformers (BERT) identify high-quality evidence with high clinical relevance from the biomedical literature for consideration in clinical practice.

METHODS

We fine-tuned variations of BERT models (BERT_BASE, BioBERT, BlueBERT, and PubMedBERT) and compared their performance in classifying articles based on methodological quality criteria. The dataset used for fine-tuning models included titles and abstracts of >160,000 PubMed records from 2012-2020 that were of interest to human health and that had been manually labeled based on meeting established critical appraisal criteria for methodological rigor. The data was randomly divided into 80:10:10 sets for training, validating, and testing. In addition to using the full unbalanced set, the training data was randomly undersampled into four balanced datasets to assess performance and select the best performing model. For each of the four sets, one model that maintained sensitivity (recall) at ≥99% was selected, and the four selected models were ensembled. The best performing model was evaluated in a prospective, blinded test and applied to an established reference standard, the Clinical Hedges dataset.

RESULTS

In training, three of the four selected best performing models were trained using BioBERT_BASE. The ensembled model did not boost performance compared with the best individual model. Hence a solo BioBERT-based model (named DL-PLUS) was selected for further testing as it was computationally more efficient. The model had high recall (>99%) and 60% to 77% specificity in a prospective evaluation conducted with blinded research associates and saved >60% of the work required to identify high quality articles.
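
Recall and specificity, the two measures reported above, can be computed from confusion-matrix counts as in the sketch below. The counts are hypothetical and chosen only to fall within the reported ranges; they are not taken from the study.

```python
def recall_specificity(tp, fn, tn, fp):
    """Return (recall, specificity) from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of high-quality articles retained
    specificity = tn / (tn + fp)   # share of other articles filtered out
    return recall, specificity

# Hypothetical counts, not data from the study.
recall, specificity = recall_specificity(tp=198, fn=2, tn=560, fp=240)
```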

CONCLUSIONS

Deep learning using pretrained language models and a large dataset of classified articles produced models with improved specificity while maintaining >99% recall. The resulting DL-PLUS model identifies high-quality, clinically relevant articles from PubMed at the time of publication. The model improves the efficiency of a literature surveillance program, which allows for faster dissemination of appraised research.

Affiliation(s)

Cynthia Lokker Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada.
Elham Bagheri Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Wael Abdelkader Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Rick Parrish Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Muhammad Afzal Department of Computing, Birmingham City University, Birmingham, UK
Tamara Navarro Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Chris Cotoi Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Federico Germini Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
Lori Linkins Department of Medicine, McMaster University, Hamilton, Ontario, Canada
R Brian Haynes Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
Lingyang Chu Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
Alfonso Iorio Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada

Houssein EH, Mohamed RE, Ali AA. Heart disease risk factors detection from electronic health records using advanced NLP and deep learning techniques. Sci Rep 2023;13:7173. [PMID: 37138014 PMCID: PMC10156668 DOI: 10.1038/s41598-023-34294-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2022] [Accepted: 04/27/2023] [Indexed: 05/05/2023] Open

Singh T, Roberts K, Cohen T, Cobb N, Franklin A, Myneni S. Discerning conversational context in online health communities for personalized digital behavior change solutions using Pragmatics to Reveal Intent in Social Media (PRISM) framework. J Biomed Inform 2023;140:104324. [PMID: 36842490 PMCID: PMC10206862 DOI: 10.1016/j.jbi.2023.104324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2022] [Revised: 02/18/2023] [Accepted: 02/21/2023] [Indexed: 02/28/2023]

Abstract

BACKGROUND

Online health communities (OHCs) have emerged as prominent platforms for behavior modification, and the digitization of online peer interactions has afforded researchers unique opportunities to model multilevel mechanisms that drive behavior change. Existing studies, however, have been limited by a lack of methods that allow the capture of conversational context and socio-behavioral dynamics at scale, as manifested in these digital platforms.

OBJECTIVE

We develop, evaluate, and apply a novel methodological framework, Pragmatics to Reveal Intent in Social Media (PRISM), to facilitate granular characterization of peer interactions by combining multidimensional facets of human communication.

METHODS

We developed and applied PRISM to analyze peer interactions (N = 2.23 million) in QuitNet, an OHC for tobacco cessation. First, we generated a labeled set of peer interactions (n = 2,005) through manual annotation along three dimensions: communication themes (CTs), behavior change techniques (BCTs), and speech acts (SAs). Second, we used deep learning models to apply our qualitative codes at scale. Third, we applied our validated model to perform a retrospective analysis. Finally, using social network analysis (SNA), we portrayed large-scale patterns and relationships among the aforementioned communication dimensions embedded in peer interactions in QuitNet.

RESULTS

Qualitative analysis showed that the themes of social support and behavioral progress were common. The most used BCTs were feedback and monitoring and comparison of behavior, and users most commonly expressed their intentions using the expressive and emotion SAs. With additional in-domain pre-training, bidirectional encoder representations from Transformers (BERT) outperformed other deep learning models on the classification tasks. Content-specific SNA revealed that users' engagement or abstinence status is associated with the prevalence of various categories of BCTs and SAs, which was also evident from the visualization of network structures.

CONCLUSIONS

Our study describes the interplay of multilevel characteristics of online communication and their association with individual health behaviors.

Matero M, Giorgi S, Curtis B, Ungar LH, Schwartz HA. Opioid death projections with AI-based forecasts using social media language. NPJ Digit Med 2023;6:35. [PMID: 36882633 PMCID: PMC9992514 DOI: 10.1038/s41746-023-00776-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 02/13/2023] [Indexed: 03/09/2023] Open

Tabaie A, Orenstein EW, Kandaswamy S, Kamaleswaran R. Integrating structured and unstructured data for timely prediction of bloodstream infection among children. Pediatr Res 2023;93:969-975. [PMID: 35854085 DOI: 10.1038/s41390-022-02116-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/09/2022] [Revised: 04/08/2022] [Accepted: 05/08/2022] [Indexed: 11/09/2022]

Satti FA, Hussain M, Ali SI, Saleem M, Ali H, Chung TC, Lee S. A semantic sequence similarity based approach for extracting medical entities from clinical conversations. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Kariampuzha WZ, Alyea G, Qu S, Sanjak J, Mathé E, Sid E, Chatelaine H, Yadaw A, Xu Y, Zhu Q. Precision information extraction for rare disease epidemiology at scale. J Transl Med 2023;21:157. [PMID: 36855134 PMCID: PMC9972634 DOI: 10.1186/s12967-023-04011-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Accepted: 02/18/2023] [Indexed: 03/02/2023] Open

Abstract

BACKGROUND

The United Nations recently made a call to address the challenges of an estimated 300 million persons worldwide living with a rare disease through the collection, analysis, and dissemination of disaggregated data. Epidemiologic Information (EI) regarding prevalence and incidence data of rare diseases is sparse, and current paradigms of identifying, extracting, and curating EI rely upon time-intensive, error-prone manual processes. With these limitations, a clear understanding of the variation in epidemiology and outcomes for rare disease patients is hampered. This challenges the public health of rare disease patients through a lack of information necessary to prioritize research, policy decisions, therapeutic development, and health system allocations.

METHODS

In this study, we developed a newly curated epidemiology corpus for Named Entity Recognition (NER), a deep learning framework, and a novel rare disease epidemiologic information pipeline named EpiPipeline4RD consisting of a web interface and RESTful API. For the corpus creation, we programmatically gathered a representative sample of rare disease epidemiologic abstracts, utilized weakly-supervised machine learning techniques to label the dataset, and manually validated the labeled dataset. For the deep learning framework development, we fine-tuned the BioBERT model on our dataset, adapting it for NER. We measured the performance of our BioBERT model for epidemiology entity recognition quantitatively with precision, recall, and F1 and qualitatively through a comparison with Orphanet. We demonstrated the ability of our pipeline to gather, identify, and extract epidemiology information from rare disease abstracts through three case studies.

RESULTS

We developed a deep learning model to extract EI with overall F1 scores of 0.817 and 0.878, evaluated at the entity level and token level respectively, and which achieved comparable qualitative results to Orphanet's collection paradigm. Additionally, case studies of the rare diseases Classic homocystinuria, GRACILE syndrome, and Phenylketonuria demonstrated the adequate recall of abstracts with epidemiology information, high precision of epidemiology information extraction through our deep learning model, and the increased efficiency of EpiPipeline4RD compared to a manual curation paradigm.
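
The difference between entity-level and token-level F1 can be shown with a small sketch over BIO-tagged sequences. The tag names (`EPI`, `LOC`), the exact-match rule for tokens, and the example sequences are assumptions for illustration; the abstract does not state how the authors computed either score.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans

def f1(tp, n_pred, n_gold):
    """Harmonic mean of precision and recall, 0.0 when undefined."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def entity_f1(gold, pred):
    """An entity counts as correct only if its boundaries and type both match."""
    g, p = extract_spans(gold), extract_spans(pred)
    return f1(len(g & p), len(p), len(g))

def token_f1(gold, pred):
    """Each non-O token is scored on its own tag (assumed exact tag match)."""
    tp = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    return f1(tp, n_pred, n_gold)

# Hypothetical example: the second entity is predicted one token too short.
gold = ["B-EPI", "I-EPI", "O", "B-LOC", "I-LOC", "I-LOC"]
pred = ["B-EPI", "I-EPI", "O", "B-LOC", "I-LOC", "O"]
entity_score = entity_f1(gold, pred)
token_score = token_f1(gold, pred)
```

In this example the truncated entity counts as a complete miss at the entity level but as a partial hit at the token level, which is one reason token-level scores tend to be higher.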

CONCLUSIONS

EpiPipeline4RD demonstrated high performance of EI extraction from rare disease literature to augment manual curation processes. This automated information curation paradigm will not only effectively empower development of the NIH Genetic and Rare Diseases Information Center (GARD), but also support the public health of the rare disease community.

Affiliation(s)

William Z Kariampuzha Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, USA
Gioconda Alyea Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, USA
Sue Qu Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, USA
Jaleal Sanjak Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
Ewy Mathé Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
Eric Sid Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, USA
Haley Chatelaine Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
Arjun Yadaw Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
Yanji Xu Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, USA
Qian Zhu Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA.

Fine-tuning BERT for automatic ADME semantic labeling in FDA drug labeling to enhance product-specific guidance assessment. J Biomed Inform 2023;138:104285. [PMID: 36632860 DOI: 10.1016/j.jbi.2023.104285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2022] [Revised: 10/25/2022] [Accepted: 01/07/2023] [Indexed: 01/11/2023]

Moezzi SAR, Ghaedi A, Rahmanian M, Mousavi SZ, Sami A. Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique. J Digit Imaging 2023;36:80-90. [PMID: 36002778 PMCID: PMC9984654 DOI: 10.1007/s10278-022-00692-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2021] [Revised: 06/20/2022] [Accepted: 07/27/2022] [Indexed: 11/29/2022] Open

Levine DM, Tuwani R, Kompa B, Varma A, Finlayson SG, Mehrotra A, Beam A. The Diagnostic and Triage Accuracy of the GPT-3 Artificial Intelligence Model. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.01.30.23285067. [PMID: 36778449 PMCID: PMC9915829 DOI: 10.1101/2023.01.30.23285067] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]

Abstract

Importance

Artificial intelligence (AI) applications in health care have been effective in many areas of medicine, but they are often trained for a single task using labeled data, making deployment and generalizability challenging. Whether a general-purpose AI language model can perform diagnosis and triage is unknown.

Objective

Compare the general-purpose Generative Pre-trained Transformer 3 (GPT-3) AI model's diagnostic and triage performance to attending physicians and lay adults who use the Internet.

Design

We compared the accuracy of GPT-3's diagnostic and triage ability for 48 validated case vignettes of both common (e.g., viral illness) and severe (e.g., heart attack) conditions to that of lay people and practicing physicians. Finally, we examined how well calibrated GPT-3's confidence was for diagnosis and triage.

Setting and Participants

The GPT-3 model, a nationally representative sample of lay people, and practicing physicians.

Exposure

Validated case vignettes (<60 words; <6th grade reading level).

Main Outcomes and Measures

Correct diagnosis, correct triage.

Results

Among all cases, GPT-3 replied with the correct diagnosis in its top 3 for 88% (95% CI, 75% to 94%) of cases, compared to 54% (95% CI, 53% to 55%) for lay individuals (p<0.001) and 96% (95% CI, 94% to 97%) for physicians (p=0.0354). GPT-3 triaged (71% correct; 95% CI, 57% to 82%) similarly to lay individuals (74%; 95% CI, 73% to 75%; p=0.73); both were significantly worse than physicians (91%; 95% CI, 89% to 93%; p<0.001). As measured by the Brier score, GPT-3 confidence in its top prediction was reasonably well-calibrated for diagnosis (Brier score = 0.18) and triage (Brier score = 0.22).
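
The Brier score used above is the mean squared difference between a predicted probability and the binary outcome, so lower values indicate better calibration. A minimal sketch follows; the confidence values and outcomes are made up for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical confidences in the top prediction, and whether it was correct.
confidences = [0.9, 0.8, 0.6, 0.3]
correct = [1, 1, 0, 0]
score = brier_score(confidences, correct)
```

A model that always answers with 0.5 confidence scores 0.25, which gives a rough reference point for the values of 0.18 and 0.22 reported above.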

Conclusions and Relevance

A general-purpose AI language model without any content-specific training could perform diagnosis at levels close to, but below, physicians and better than lay individuals. The model performed less well on triage, where its performance was closer to that of lay individuals.

Jeon SH, Cho S. Edge Weight Updating Neural Network for Named Entity Normalization. Neural Process Lett 2022;55:1-22. [PMID: 36573130 PMCID: PMC9770557 DOI: 10.1007/s11063-022-11102-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/07/2022] [Indexed: 12/24/2022]

Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng 2022;6:1330-1345. [PMID: 35788685 DOI: 10.1038/s41551-022-00898-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Accepted: 05/03/2022] [Indexed: 01/14/2023]

Wang W, Li X, Ren H, Gao D, Fang A. Chinese Clinical Named Entity Recognition from Electronic Medical Records based on Multi-semantic Features by using RoBERTa-wwm and CNN: Model Development and Validation (Preprint). JMIR Med Inform 2022;11:e44597. [PMID: 37163343 DOI: 10.2196/44597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Revised: 02/18/2023] [Accepted: 03/31/2023] [Indexed: 04/03/2023] Open

Abstract

BACKGROUND

Clinical electronic medical records (EMRs) contain important information on patients' anatomy, symptoms, examinations, diagnoses, and medications. Large-scale mining of rich medical information from EMRs will provide notable reference value for medical research. With the complexity of Chinese grammar and blurred boundaries of Chinese words, Chinese clinical named entity recognition (CNER) remains a notable challenge. Follow-up tasks such as medical entity structuring, medical entity standardization, medical entity relationship extraction, and medical knowledge graph construction largely depend on medical named entity recognition effects. A promising CNER result would provide reliable support for building domain knowledge graphs, knowledge bases, and knowledge retrieval systems. Furthermore, it would provide research ideas for scientists and medical decision-making references for doctors and even guide patients on disease and health management. Therefore, obtaining excellent CNER results is essential.

OBJECTIVE

We aimed to propose a Chinese CNER method that learns semantics-enriched representations from multisemantic features to comprehensively enhance machine understanding of the deep semantic information in EMRs, making medical information more readable and understandable.

METHODS

First, we used Robustly Optimized Bidirectional Encoder Representation from Transformers Pretraining Approach Whole Word Masking (RoBERTa-wwm) with dynamic fusion and Chinese character features, including 5-stroke code, Zheng code, phonological code, and stroke code, extracted by 1-dimensional convolutional neural networks (CNNs) to obtain fine-grained semantic features of Chinese characters. Subsequently, we converted Chinese characters into square images to obtain Chinese character image features from another modality by using a 2-dimensional CNN. Finally, we input multisemantic features into Bidirectional Long Short-Term Memory with Conditional Random Fields to achieve Chinese CNER. The effectiveness of our model was compared with that of the baseline and existing research models, and the features involved in the model were ablated and analyzed to verify the model's effectiveness.

RESULTS

We collected 1379 Yidu-S4K EMRs containing 23,655 entities in 6 categories and 2007 self-annotated EMRs containing 118,643 entities in 7 categories. The experiments showed that our model outperformed the comparison experiments, with F₁-scores of 89.28% and 84.61% on the Yidu-S4K and self-annotated data sets, respectively. The results of the ablation analysis demonstrated that each feature and method we used could improve the entity recognition ability.

CONCLUSIONS

Our proposed CNER method would mine the richer deep semantic information in EMRs by multisemantic embedding using RoBERTa-wwm and CNNs, enhancing the semantic recognition of characters at different granularity levels and improving the generalization capability of the method by achieving information complementarity among different semantic features, thus making the machine semantically understand EMRs and improving the CNER task accuracy.

Fu S, Vassilaki M, Ibrahim OA, Petersen RC, Pagali S, St Sauver J, Moon S, Wang L, Fan JW, Liu H, Sohn S. Quality assessment of functional status documentation in EHRs across different healthcare institutions. Front Digit Health 2022;4:958539. [PMID: 36238199 PMCID: PMC9552292 DOI: 10.3389/fdgth.2022.958539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 09/05/2022] [Indexed: 11/29/2022] Open

Moqurrab SA, Tariq N, Anjum A, Asheralieva A, Malik SUR, Malik H, Pervaiz H, Gill SS. A Deep Learning-Based Privacy-Preserving Model for Smart Healthcare in Internet of Medical Things Using Fog Computing. WIRELESS PERSONAL COMMUNICATIONS 2022;126:2379-2401. [PMID: 36059591 PMCID: PMC9426374 DOI: 10.1007/s11277-021-09323-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 11/04/2021] [Indexed: 06/15/2023]

Liu J, Capurro D, Nguyen A, Verspoor K. "Note Bloat" impacts deep learning-based NLP models for clinical prediction tasks. J Biomed Inform 2022;133:104149. [PMID: 35878821 DOI: 10.1016/j.jbi.2022.104149] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2022] [Revised: 05/28/2022] [Accepted: 07/19/2022] [Indexed: 10/17/2022]

Kaplar A, Stošović M, Kaplar A, Brković V, Naumović R, Kovačević A. Evaluation of clinical named entity recognition methods for Serbian electronic health records. Int J Med Inform 2022;164:104805. [PMID: 35653828 DOI: 10.1016/j.ijmedinf.2022.104805] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2022] [Revised: 05/06/2022] [Accepted: 05/22/2022] [Indexed: 11/25/2022]

Abstract

BACKGROUND AND OBJECTIVES

The importance of clinical natural language processing (NLP) has increased with the adoption of electronic health records (EHRs). One of the critical tasks in clinical NLP is named entity recognition (NER). Clinical NER in the Serbian language is a severely under-researched area. The few approaches that have been proposed so far are based on rules or machine-learning models with hand-crafted features, while current state-of-the-art models have not been explored. The objective of this paper is to assess the performance of state-of-the-art NER methods on clinical narratives in the Serbian language.

MATERIALS AND METHODS

We designed an experimental setup for a comprehensive evaluation of state-of-the-art NER models. The gold standard corpus we used for the evaluation is comprised of discharge summaries from the Clinic for Nephrology at the University Clinical Center of Serbia. The following models were evaluated: conditional random fields (CRF), multilingual transformers (BERT Multilingual and XLM RoBERTa), and long short-term memory (LSTM) recurrent neural networks, and their ensembles. In addition, we investigated the necessity of the pretraining task of transformer-based models and the use of pretrained word embeddings with the LSTM model.

RESULTS

Our results show that individually CRF had the best precision, the pretrained BERT Multilingual model had the best recall values, and the LSTM model had the best F1 score. The best performance was achieved by combining the existing models in a majority voting ensemble with an F1 score of 0.892. The presented results are similar to the inter-annotator agreement on our gold standard corpus and are comparable to existing state-of-the-art results for clinical NER reported in the literature.
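
A majority voting ensemble of sequence taggers can be sketched as below. Voting per token, breaking ties in favor of the earliest-listed model, and the tag names are all assumptions; the abstract does not specify how votes were combined.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model tag sequences (equal length) by per-token majority."""
    voted = []
    for tags in zip(*predictions):
        counts = Counter(tags)
        top = max(counts.values())
        # Tie-break: take the most common tag proposed by the earliest model.
        winner = next(t for t in tags if counts[t] == top)
        voted.append(winner)
    return voted

# Hypothetical outputs of three models for one four-token sentence.
crf = ["B-DRUG", "O", "O", "B-SYMPTOM"]
bert = ["B-DRUG", "I-DRUG", "O", "O"]
lstm = ["B-DRUG", "I-DRUG", "O", "B-SYMPTOM"]
ensemble = majority_vote([crf, bert, lstm])
```

Per-token voting can in principle produce invalid BIO sequences (for example an `I-` tag with no preceding `B-`), so a real pipeline would likely need a repair step afterwards.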

CONCLUSION

Existing state-of-the-art models can provide viable results for clinical named entity recognition when applied to languages with the complexity of the Serbian language without major modifications.

Research on Aspect-Level Sentiment Analysis Based on Text Comments. Symmetry (Basel) 2022. [DOI: 10.3390/sym14051072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

NEAR: Named Entity and Attribute Recognition of clinical concepts. J Biomed Inform 2022;130:104092. [DOI: 10.1016/j.jbi.2022.104092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2021] [Revised: 02/21/2022] [Accepted: 05/01/2022] [Indexed: 11/23/2022]

Chanda AK, Bai T, Yang Z, Vucetic S. Improving medical term embeddings using UMLS Metathesaurus. BMC Med Inform Decis Mak 2022;22:114. [PMID: 35488252 PMCID: PMC9052653 DOI: 10.1186/s12911-022-01850-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2020] [Accepted: 03/29/2022] [Indexed: 11/25/2022] Open

Abstract

Background

Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is a great interest in applying machine learning tools on medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of terms in the notes, is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of available EHRs in practical applications is often very small.

Methods

In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It is an extension of a skip-gram algorithm that incorporates textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus.

Results

To evaluate the proposed approach, we used a publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps the semantically similar medical terms together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that learned vector embeddings are helpful in downstream medical informatics applications.

Conclusion

This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes.

Naseem U, Dunn AG, Khushi M, Kim J. Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT. BMC Bioinformatics 2022;23:144. [PMID: 35448946 PMCID: PMC9022356 DOI: 10.1186/s12859-022-04688-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Accepted: 03/31/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The abundance of biomedical text data coupled with advances in natural language processing (NLP) is resulting in novel biomedical NLP (BioNLP) applications. These NLP applications, or tasks, are reliant on the availability of domain-specific language models (LMs) that are trained on a massive amount of data. Most of the existing domain-specific LMs adopted the bidirectional encoder representations from transformers (BERT) architecture, which has limitations, and their generalizability is unproven as there is an absence of baseline results among common BioNLP tasks.

RESULTS

We present 8 variants of BioALBERT, a domain-specific adaptation of a lite bidirectional encoder representations from transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine-tuned for 6 different tasks across 20 benchmark datasets. Experiments show that a large variant of BioALBERT trained on PubMed outperforms the state-of-the-art on named-entity recognition (+ 11.09% BLURB score improvement), relation extraction (+ 0.80% BLURB score), sentence similarity (+ 1.05% BLURB score), document classification (+ 0.62% F1-score), and question answering (+ 2.83% BLURB score). It represents a new state-of-the-art in 5 out of 6 benchmark BioNLP tasks.

CONCLUSIONS

The large variant of BioALBERT trained on PubMed achieved a higher BLURB score than previous state-of-the-art models on 5 of the 6 benchmark BioNLP tasks. Depending on the task, 5 different variants of BioALBERT outperformed previous state-of-the-art models on 17 of the 20 benchmark datasets, showing that our model is robust and generalizable in the common BioNLP tasks. We have made BioALBERT freely available which will help the BioNLP community avoid computational cost of training and establish a new set of baselines for future efforts across a broad range of BioNLP tasks.

Scarcity-aware spam detection technique for big data ecosystem. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.03.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Botelle R, Bhavsar V, Kadra-Scalzo G, Mascio A, Williams MV, Roberts A, Velupillai S, Stewart R. Can natural language processing models extract and classify instances of interpersonal violence in mental healthcare electronic records: an applied evaluative study. BMJ Open 2022;12:e052911. [PMID: 35172999 PMCID: PMC8852656 DOI: 10.1136/bmjopen-2021-052911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open

Abstract

OBJECTIVE

This paper evaluates the application of a natural language processing (NLP) model for extracting clinical text referring to interpersonal violence using electronic health records (EHRs) from a large mental healthcare provider.

DESIGN

A multidisciplinary team iteratively developed guidelines for annotating clinical text referring to violence. Keywords were used to generate a dataset which was annotated (ie, classified as affirmed, negated or irrelevant) for: presence of violence, patient status (ie, as perpetrator, witness and/or victim of violence) and violence type (domestic, physical and/or sexual). An NLP approach using a pretrained transformer model, BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) was fine-tuned on the annotated dataset and evaluated using 10-fold cross-validation.
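
The 10-fold cross-validation mentioned above can be sketched with index splits alone. The shuffling, the seed, and the use of unstratified folds are assumptions; the sample size of 3771 matches the number of text fragments reported in the abstract.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Each fragment is used for testing exactly once across the 10 folds.
splits = list(k_fold_indices(3771, k=10))
```

In each round the model would be fine-tuned on the training indices and scored on the held-out fold, with the reported metrics aggregated over the 10 rounds.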

SETTING

We used the Clinical Records Interactive Search (CRIS) database, comprising over 500 000 de-identified EHRs of patients within the South London and Maudsley NHS Foundation Trust, a specialist mental healthcare provider serving an urban catchment area.

PARTICIPANTS

Searches of CRIS were carried out based on 17 predefined keywords. Randomly selected text fragments were taken from the results for each keyword, amounting to 3771 text fragments from the records of 2832 patients.

OUTCOME MEASURES

We estimated precision, recall and F1 score for each NLP model. We examined sociodemographic and clinical variables in patients giving rise to the text data, and frequencies for each annotated violence characteristic.

RESULTS

Binary classification models were developed for six labels (violence presence, perpetrator, victim, domestic, physical and sexual). Among annotations affirmed for the presence of any violence, 78% (1724) referred to physical violence, 61% (1350) referred to patients as perpetrator and 33% (731) to domestic violence. NLP models' precision ranged from 89% (perpetrator) to 98% (sexual); recall ranged from 89% (victim, perpetrator) to 97% (sexual).

CONCLUSIONS

State-of-the-art NLP models can extract and classify clinical text on violence from EHRs at acceptable levels of scale, efficiency and accuracy.

Syed S, Angel AJ, Syeda HB, Jennings CF, VanScoy J, Syed M, Greer M, Bhattacharyya S, Zozus M, Tharian B, Prior F. The h-ANN Model: Comprehensive Colonoscopy Concept Compilation Using Combined Contextual Embeddings. BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, INTERNATIONAL JOINT CONFERENCE, BIOSTEC ... REVISED SELECTED PAPERS. BIOSTEC (CONFERENCE) 2022;5:189-200. [PMID: 35373222 PMCID: PMC8970464 DOI: 10.5220/0010903300003123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]

Deep contextual multi-task feature fusion for enhanced concept, negation and speculation detection from clinical notes. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.101109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open

Flamholz ZN, Crane-Droesch A, Ungar LH, Weissman GE. Word embeddings trained on published case reports are lightweight, effective for clinical tasks, and free of protected health information. J Biomed Inform 2022;125:103971. [PMID: 34920127 PMCID: PMC8766939 DOI: 10.1016/j.jbi.2021.103971] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2021] [Revised: 11/22/2021] [Accepted: 12/02/2021] [Indexed: 01/03/2023]

Abstract

OBJECTIVE

Quantify tradeoffs in performance, reproducibility, and resource demands across several strategies for developing clinically relevant word embeddings.

MATERIALS AND METHODS

We trained separate embeddings on all full-text manuscripts in the PubMed Central (PMC) Open Access subset, case reports therein, the English Wikipedia corpus, the Medical Information Mart for Intensive Care (MIMIC) III dataset, and all notes in the University of Pennsylvania Health System (UPHS) electronic health record. We tested embeddings in six clinically relevant tasks including mortality prediction and de-identification, and assessed performance using the scaled Brier score (SBS) and the proportion of notes successfully de-identified, respectively.

RESULTS

Embeddings from UPHS notes best predicted mortality (SBS 0.30, 95% CI 0.15 to 0.45) while Wikipedia embeddings performed worst (SBS 0.12, 95% CI -0.05 to 0.28). Wikipedia embeddings most consistently (78% of notes) and the full PMC corpus embeddings least consistently (48%) de-identified notes. Across all six tasks, the full PMC corpus demonstrated the most consistent performance, and the Wikipedia corpus the least. Corpus size ranged from 49 million tokens (PMC case reports) to 10 billion (UPHS).

DISCUSSION

Embeddings trained on published case reports performed at least as well as embeddings trained on other corpora in most tasks, and clinical corpora consistently outperformed non-clinical corpora. No single corpus produced a strictly dominant set of embeddings across all tasks and so the optimal training corpus depends on intended use.

CONCLUSION

Embeddings trained on published case reports performed comparably on most clinical tasks to embeddings trained on larger corpora. Open access corpora allow training of clinically relevant, effective, and reproducible embeddings.

Kalyan KS, Rajasekharan A, Sangeetha S. AMMU: A survey of transformer-based biomedical pretrained language models. J Biomed Inform 2021;126:103982. [PMID: 34974190 DOI: 10.1016/j.jbi.2021.103982] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Revised: 12/12/2021] [Accepted: 12/20/2021] [Indexed: 01/04/2023]

Richter-Pechanski P, Geis NA, Kiriakou C, Schwab DM, Dieterich C. Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models. Digit Health 2021;7:20552076211057662. [PMID: 34868618 PMCID: PMC8637713 DOI: 10.1177/20552076211057662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Accepted: 10/15/2021] [Indexed: 11/17/2022] Open

Naderi N, Knafou J, Copara J, Ruch P, Teodoro D. Ensemble of Deep Masked Language Models for Effective Named Entity Recognition in Health and Life Science Corpora. Front Res Metr Anal 2021;6:689803. [PMID: 34870074 PMCID: PMC8640190 DOI: 10.3389/frma.2021.689803] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022] Open

A contextual multi-task neural approach to medication and adverse events identification from clinical text. J Biomed Inform 2021;125:103960. [PMID: 34875387 DOI: 10.1016/j.jbi.2021.103960] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2021] [Revised: 11/04/2021] [Accepted: 11/22/2021] [Indexed: 12/27/2022]

González-Fernández C, Fernández-Isabel A, Martín de Diego I, Fernández RR, Viseu Pinheiro J. Experts perception-based system to detect misinformation in health websites. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]