1
Bayani A, Ayotte A, Nikiema JN. Transformer-Based Tool for Automated Fact-Checking of Online Health Information: Development Study. JMIR Infodemiology 2025; 5:e56831. [PMID: 39812653; PMCID: PMC11890130; DOI: 10.2196/56831] [Received: 01/27/2024; Revised: 05/08/2024; Accepted: 12/24/2024; Indexed: 01/16/2025]
Abstract
BACKGROUND Many people seek health-related information online, and the potential dangers of misinformation have made the reliability of that information particularly important. Discerning true, reliable information from false information has become increasingly challenging. OBJECTIVE This pilot study introduces a novel approach to automating the fact-checking process, leveraging PubMed resources as a source of truth and using natural language processing transformer models to enhance the process. METHODS A total of 538 health-related web pages, covering 7 different disease subjects, were manually selected by Factually Health Company. The process included the following steps: (1) transformer models (bidirectional encoder representations from transformers [BERT], BioBERT, and SciBERT) and traditional models (random forests and support vector machines) were used to classify the contents of web pages into 3 thematic categories (semiology, epidemiology, and management); (2) for each category in a web page, a PubMed query was automatically produced using a combination of the WellcomeBertMesh and KeyBERT models; (3) the 20 most related articles were automatically extracted from PubMed; and (4) the similarity-checking techniques of cosine similarity and Jaccard distance were applied to compare the content of the extracted literature and the web pages. RESULTS The BERT model performed well at categorizing web page contents, with an F1-score and recall of 93% and 94% for semiology and epidemiology, respectively, and 96% for both recall and F1-score for management. For each of the 3 categories in a web page, 1 PubMed query was generated, and for each query, the 20 most related open access articles within the category of systematic reviews and meta-analyses were extracted. Less than 10% of the extracted literature was irrelevant; those articles were deleted.
For each web page, an average of 23% of the sentences were found to be very similar to the literature. During the evaluation, cosine similarity outperformed the Jaccard distance measure when comparing sentences from web pages and academic papers vectorized by BERT. However, false positives were a significant issue among the retrieved sentences: some had similarity scores exceeding 80% yet could not be considered similar. CONCLUSIONS In this pilot study, we proposed an approach to automate the fact-checking of health-related online information. Incorporating content from PubMed or other scientific article databases as trustworthy resources can automate the discovery of similarly credible information in the health domain.
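The similarity-checking step of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: toy 4-dimensional vectors stand in for real BERT sentence embeddings, and the sentences are invented examples.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors (in the paper's
    # pipeline these would be BERT sentence embeddings, not toy vectors).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard_distance(s1, s2):
    # Jaccard distance over token sets: 1 - |intersection| / |union|.
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    union = t1 | t2
    return 1.0 - len(t1 & t2) / len(union) if union else 0.0

web_sentence = "vaccination reduces the risk of severe illness"
paper_sentence = "vaccination lowers the risk of severe disease"

# Toy 4-dimensional vectors standing in for BERT embeddings of the two sentences.
u, v = [0.9, 0.1, 0.3, 0.2], [0.8, 0.2, 0.4, 0.1]

print(round(cosine_similarity(u, v), 3))
print(round(jaccard_distance(web_sentence, paper_sentence), 3))
```

Cosine similarity operates on dense embeddings and so can score paraphrases highly even with little word overlap, while Jaccard distance only sees shared surface tokens; this difference is consistent with the paper's finding that cosine similarity performed better on BERT-vectorized sentences.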
Affiliation(s)
- Azadeh Bayani
  - Laboratoire Transformation Numérique en Santé, LabTNS, Montreal, QC, Canada
  - Centre de recherche en santé publique, Université de Montréal et CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Montreal, QC, Canada
- Alexandre Ayotte
  - Laboratoire Transformation Numérique en Santé, LabTNS, Montreal, QC, Canada
  - Centre de recherche en santé publique, Université de Montréal et CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Montreal, QC, Canada
- Jean Noel Nikiema
  - Laboratoire Transformation Numérique en Santé, LabTNS, Montreal, QC, Canada
  - Centre de recherche en santé publique, Université de Montréal et CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Montreal, QC, Canada
  - Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Montreal, QC, Canada
2
Wen A, Wang L, He H, Fu S, Liu S, Hanauer DA, Harris DR, Kavuluru R, Zhang R, Natarajan K, Pavinkurve NP, Hajagos J, Rajupet S, Lingam V, Saltz M, Elowsky C, Moffitt RA, Koraishy FM, Palchuk MB, Donovan J, Lingrey L, Stone-DerHagopian G, Miller RT, Williams AE, Leese PJ, Kovach PI, Pfaff ER, Zemmel M, Pates RD, Guthe N, Haendel MA, Chute CG, Liu H. A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation. JMIR Med Inform 2024; 12:e49997. [PMID: 39250782; PMCID: PMC11420592; DOI: 10.2196/49997] [Received: 06/15/2023; Revised: 12/11/2023; Accepted: 03/01/2024; Indexed: 09/11/2024] Open Access
Abstract
BACKGROUND A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC). OBJECTIVE This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC. METHODS We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm. RESULTS An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites. CONCLUSIONS The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. 
In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement.
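The dictionary-based weak-labeling step described above can be sketched roughly as follows. This is a hypothetical miniature: the surface strings and concept IDs are illustrative stand-ins for the study's 12,234-string, UMLS-derived lexicon, and the matching here is a simple substring search rather than a full clinical NLP pipeline.

```python
# Hypothetical mini-lexicon mapping normalized surface strings to concept IDs
# (the real system used 12,234 unique strings for 2366 unique concepts;
# the IDs below are illustrative, not verified UMLS CUIs).
lexicon = {
    "shortness of breath": "C0013404",
    "fatigue": "C0015672",
    "brain fog": "C0677594",
}

def weak_label(note: str):
    """Return (surface_string, concept_id, start_offset) matches in a note.

    These weak labels would then be reviewed by a human expert before
    fine-tuning, as in the hybrid approach described above.
    """
    text = note.lower()
    hits = []
    for surface, concept_id in lexicon.items():
        start = text.find(surface)
        if start != -1:
            hits.append((surface, concept_id, start))
    return sorted(hits, key=lambda h: h[2])  # order hits by position in text

note = "Patient reports persistent fatigue and brain fog since infection."
print(weak_label(note))
```

The point of the sketch is the division of labor: the lexicon produces cheap, noisy annotations over a large corpus, and expert effort is reserved for correcting those annotations rather than labeling from scratch.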
Affiliation(s)
- Andrew Wen
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
  - McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
- Liwei Wang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
  - McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
- Huan He
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
  - McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
- Sijia Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- David A Hanauer
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
- Daniel R Harris
  - Institute for Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Kentucky, Lexington, KY, United States
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States
- Rui Zhang
  - Division of Health Data Science, University of Minnesota Medical School, Minneapolis, MN, United States
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States
- Nishanth P Pavinkurve
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States
- Janos Hajagos
  - Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Sritha Rajupet
  - Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Veena Lingam
  - Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Mary Saltz
  - Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Corey Elowsky
  - Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Richard A Moffitt
  - Department of Biomedical Informatics, Stony Brook Medicine, Stony Brook, NY, United States
- Farrukh M Koraishy
  - Division of Nephrology, Stony Brook Medicine, Stony Brook, NY, United States
- Robert T Miller
  - Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, United States
- Andrew E Williams
  - Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, United States
  - Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States
- Peter J Leese
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Paul I Kovach
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Emily R Pfaff
  - North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Mikhail Zemmel
  - University of Virginia, Charlottesville, VA, United States
- Robert D Pates
  - University of Virginia, Charlottesville, VA, United States
- Nick Guthe
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Melissa A Haendel
  - University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, United States
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
  - McWilliams School of Biomedical Informatics, University of Texas Health Sciences Center at Houston, Houston, TX, United States
3
Fu S, Wang L, He H, Wen A, Zong N, Kumari A, Liu F, Zhou S, Zhang R, Li C, Wang Y, St Sauver J, Liu H, Sohn S. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction. J Am Med Inform Assoc 2024; 31:1493-1502. [PMID: 38742455; PMCID: PMC11187420; DOI: 10.1093/jamia/ocae101] [Received: 01/15/2024; Revised: 03/26/2024; Accepted: 04/19/2024; Indexed: 05/16/2024] Open Access
Abstract
BACKGROUND Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process. OBJECTIVES This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator. RESULTS The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies.
CONCLUSION The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis.
Affiliation(s)
- Sunyang Fu
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
  - Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Liwei Wang
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
  - Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Huan He
  - Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
- Andrew Wen
  - Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Nansu Zong
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
- Anamika Kumari
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
- Feifan Liu
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
- Sicheng Zhou
  - Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
- Rui Zhang
  - Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
- Chenyu Li
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Yanshan Wang
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Jennifer St Sauver
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
- Hongfang Liu
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
  - Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Sunghwan Sohn
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
4
Bayani A, Ayotte A, Nikiema JN. Automated Credibility Assessment of Web-Based Health Information Considering Health on the Net Foundation Code of Conduct (HONcode): Model Development and Validation Study. JMIR Form Res 2023; 7:e52995. [PMID: 38133919; PMCID: PMC10770789; DOI: 10.2196/52995] [Received: 09/21/2023; Revised: 12/12/2023; Accepted: 12/13/2023; Indexed: 12/23/2023] Open Access
Abstract
BACKGROUND An increasing number of users are turning to web-based sources for health care guidance. Trustworthy sources of information should therefore be automatically identifiable using objective criteria. OBJECTIVE The purpose of this study was to automate the assessment of the Health on the Net Foundation Code of Conduct (HONcode) criteria, enhancing our ability to pinpoint trustworthy health information sources. METHODS A data set of 538 web pages displaying health content was collected from 43 health-related websites. HONcode criteria were considered at the web page and website levels. For the website-level criteria (confidentiality, transparency, financial disclosure, and advertising policy), a bag of keywords was identified to assess the criteria using a rule-based model. For the web page-level criteria (authority, complementarity, justifiability, and attribution), several machine learning (ML) approaches were used. In total, 200 web pages were manually annotated until a balanced representation in terms of frequency was achieved. Three ML models (random forest, support vector machines [SVM], and Bidirectional Encoder Representations from Transformers [BERT]) were trained on the initial annotated data. A second step of training was implemented for the complementarity criterion, using the BERT model for multiclass classification of the complementarity sentences obtained by annotation and data augmentation (positive, negative, and noncommittal sentences). Finally, the remaining web pages were classified using the selected model, and 100 sentences were randomly selected for manual review. RESULTS For the web page-level criteria, the random forest model performed well for the attribution criterion but was subpar for the others. BERT and SVM performed consistently across all criteria.
BERT achieved areas under the curve (AUC) of 0.96, 0.98, and 1.00 for neutral sentences, justifiability, and attribution, respectively. SVM had the best overall performance for the classification of complementarity, with an AUC of 0.98. Finally, SVM and BERT had an equal AUC of 0.98 for the authority criterion. For the website-level criteria, the rule-based model retrieved web pages with an accuracy of 0.97 for confidentiality, 0.82 for transparency, and 0.51 for both financial disclosure and advertising policy. The final evaluation of the sentences yielded a precision of 0.88, and reviewer agreement was computed at 0.82. CONCLUSIONS Our results show the potential of automating HONcode criteria assessment using ML approaches. This approach could be used with different types of pretrained models to accelerate text annotation and classification and to improve performance in low-resource cases. Further work is needed to determine how to assign different weights to the criteria and to identify additional characteristics that should be considered when consolidating these criteria into a comprehensive reliability score.
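The rule-based, bag-of-keywords check for the website-level criteria might look roughly like this. The keyword bags below are invented for illustration; the study's actual keyword lists and matching rules are not reproduced here.

```python
# Illustrative keyword bags for two of the four website-level HONcode
# criteria; the real study also covered financial disclosure and
# advertising policy, each with its own bag of keywords.
CRITERIA_KEYWORDS = {
    "confidentiality": {"privacy policy", "personal data", "confidentiality"},
    "transparency": {"contact us", "about us", "webmaster"},
}

def assess_website(page_text: str) -> dict:
    """Flag each criterion as met if any of its keywords appears in the text."""
    text = page_text.lower()
    return {
        criterion: any(keyword in text for keyword in keywords)
        for criterion, keywords in CRITERIA_KEYWORDS.items()
    }

page = "Read our Privacy Policy. Contact us at info@example.org."
print(assess_website(page))
```

A rule-based check like this is cheap and interpretable, which is why it suits site-wide boilerplate criteria, while the more context-dependent web page-level criteria required the ML classifiers described above.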
Affiliation(s)
- Azadeh Bayani
  - Centre de recherche en santé publique, Université de Montréal et Centre intégré universitaire de santé et de services sociaux du Centre-Sud-de-l'Île-de-Montréal, Montréal, QC, Canada
  - Laboratoire Transformation Numérique en Santé, Montreal, QC, Canada
- Alexandre Ayotte
  - Centre de recherche en santé publique, Université de Montréal et Centre intégré universitaire de santé et de services sociaux du Centre-Sud-de-l'Île-de-Montréal, Montréal, QC, Canada
  - Laboratoire Transformation Numérique en Santé, Montreal, QC, Canada
- Jean Noel Nikiema
  - Centre de recherche en santé publique, Université de Montréal et Centre intégré universitaire de santé et de services sociaux du Centre-Sud-de-l'Île-de-Montréal, Montréal, QC, Canada
  - Laboratoire Transformation Numérique en Santé, Montreal, QC, Canada
  - Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Montréal, QC, Canada
5
Serna García G, Al Khalaf R, Invernici F, Ceri S, Bernasconi A. CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning. Gigascience 2022; 12:giad036. [PMID: 37222749; PMCID: PMC10205000; DOI: 10.1093/gigascience/giad036] [Received: 12/05/2022; Revised: 04/11/2023; Accepted: 04/27/2023; Indexed: 05/25/2023] Open Access
Abstract
BACKGROUND Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap, by mining literature abstracts to extract-for each variant/mutation-its related effects (in epidemiological, immunological, clinical, or viral kinetics terms) with labeled higher/lower levels in relation to the nonmutated virus. RESULTS The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples. CONCLUSIONS The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
Affiliation(s)
- Giuseppe Serna García
  - Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Ruba Al Khalaf
  - Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Francesco Invernici
  - Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Stefano Ceri
  - Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Anna Bernasconi
  - Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy