1
García-Barragán Á, Sakor A, Vidal ME, Menasalvas E, Gonzalez JCS, Provencio M, Robles V. NSSC: a neuro-symbolic AI system for enhancing accuracy of named entity recognition and linking from oncologic clinical notes. Med Biol Eng Comput 2025; 63:749-772. [PMID: 39485651 PMCID: PMC11891111 DOI: 10.1007/s11517-024-03227-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 10/12/2024] [Indexed: 11/03/2024]
Abstract
Accurate recognition and linking of oncologic entities in clinical notes is essential for extracting insights across cancer research, patient care, clinical decision-making, and treatment optimization. We present the Neuro-Symbolic System for Cancer (NSSC), a hybrid AI framework that integrates neurosymbolic methods with named entity recognition (NER) and entity linking (EL) to transform unstructured clinical notes into structured terms using medical vocabularies, with the Unified Medical Language System (UMLS) as a case study. NSSC was evaluated on a dataset of clinical notes from breast cancer patients, demonstrating significant improvements in the accuracy of both entity recognition and linking compared to state-of-the-art models. Specifically, NSSC achieved a 33% improvement over BioFalcon and a 58% improvement over scispaCy. By combining large language models (LLMs) with symbolic reasoning, NSSC improves the recognition and interoperability of oncologic entities, enabling seamless integration with existing biomedical knowledge. This approach marks a significant advancement in extracting meaningful information from clinical narratives, offering promising applications in cancer research and personalized patient care.
Affiliation(s)
- Álvaro García-Barragán
- Center of Biomedical Technology, Universidad Politécnica de Madrid, Campus Montegancedo, Pozuelo de Alarcón, 28223, Madrid, Spain.
- Ahmad Sakor
- Data Science Institute, Leibniz University of Hannover, Welfengarten 1, Hannover, 30060, Lower Saxony, Germany.
- Scientific Data Management Group, TIB-Leibniz Information Centre for Science and Technology, Welfengarten 1B, Hannover, 30167, Lower Saxony, Germany.
- Maria-Esther Vidal
- Data Science Institute, Leibniz University of Hannover, Welfengarten 1, Hannover, 30060, Lower Saxony, Germany.
- Scientific Data Management Group, TIB-Leibniz Information Centre for Science and Technology, Welfengarten 1B, Hannover, 30167, Lower Saxony, Germany.
- Ernestina Menasalvas
- Center of Biomedical Technology, Universidad Politécnica de Madrid, Campus Montegancedo, Pozuelo de Alarcón, 28223, Madrid, Spain.
- Víctor Robles
- Center of Biomedical Technology, Universidad Politécnica de Madrid, Campus Montegancedo, Pozuelo de Alarcón, 28223, Madrid, Spain.
2
Campillos-Llanos L, Valverde-Mateos A, Capllonch-Carrión A. Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics 2025; 26:7. [PMID: 39780059 PMCID: PMC11708069 DOI: 10.1186/s12859-024-05949-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 09/30/2024] [Indexed: 01/11/2025] Open
Abstract
BACKGROUND Natural language processing (NLP) enables the extraction of information embedded within unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks like cohort identification and the analysis of clinical records. To accomplish those tasks, we introduce a deep learning-based and lexicon-based named entity recognition (NER) tool for texts in Spanish. It performs medical NER and normalization, medication information extraction and detection of temporal entities, negation and speculation, and temporality or experiencer attributes (Age, Contraindicated, Negated, Speculated, Hypothetical, Future, Family_member, Patient and Other). We built the tool with a dedicated lexicon and rules adapted from NegEx and HeidelTime. Using these resources, we annotated a corpus of 1200 texts, with high inter-annotator agreement (average F1 = 0.841 ± 0.045 for entities, and average F1 = 0.881 ± 0.032 for attributes). We used this corpus to train Transformer-based models (RoBERTa-based models, mBERT and mDeBERTa). We integrated them with the dictionary-based system in a hybrid tool, and distribute the models via the Hugging Face hub. For an internal validation, we used a held-out test set and conducted an error analysis. For an external validation, eight medical professionals evaluated the system by revising the annotation of 200 new texts not used in development. RESULTS In the internal validation, the models yielded F1 values up to 0.915. In the external validation with 100 clinical trials, the tool achieved an average F1 score of 0.858 (± 0.032); and in 100 anonymized clinical cases, it achieved an average F1 score of 0.910 (± 0.019). CONCLUSIONS The tool is available at https://claramed.csic.es/medspaner. We also release the code (https://github.com/lcampillos/medspaner) and the annotated corpus to train the models.
Affiliation(s)
- Ana Valverde-Mateos
- Medical Terminology Unit, Spanish Royal Academy of Medicine, C/Arrieta 12, 28013, Madrid, Spain
- Adrián Capllonch-Carrión
- Centro de Salud Retiro, Hospital Universitario Gregorio Marañon, C/Lope de Rueda, 43, 28009, Madrid, Spain
3
Benson R, Elia M, Hyams B, Chang JH, Hong JC. A Narrative Review on the Application of Large Language Models to Support Cancer Care and Research. Yearb Med Inform 2024; 33:90-98. [PMID: 40199294 PMCID: PMC12020524 DOI: 10.1055/s-0044-1800726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2025] Open
Abstract
OBJECTIVES The emergence of large language models has resulted in a significant shift in informatics research and carries promise in clinical cancer care. Here we provide a narrative review of the recent use of large language models (LLMs) to support cancer care, prevention, and research. METHODS We performed a search of the Scopus database for studies on the application of bidirectional encoder representations from transformers (BERT) and generative-pretrained transformer (GPT) LLMs in cancer care published between the start of 2021 and the end of 2023. We present salient and impactful papers related to each of these themes. RESULTS Studies identified focused on aspects of clinical decision support (CDS), cancer education, and support for research activities. The use of LLMs for CDS primarily focused on aspects of treatment and screening planning, treatment response, and the management of adverse events. Studies using LLMs for cancer education typically focused on question-answering, assessing cancer myths and misconceptions, and text summarization and simplification. Finally, studies using LLMs to support research activities focused on scientific writing and idea generation, cohort identification and extraction, clinical data processing, and NLP-centric tasks. CONCLUSIONS The application of LLMs in cancer care has shown promise across a variety of diverse use cases. Future research should utilize quantitative metrics, qualitative insights, and user insights in the development and evaluation of LLM-based cancer care tools. The development of open-source LLMs for use in cancer care research and activities should also be a priority.
Affiliation(s)
- Ryzen Benson
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- Marianna Elia
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- Benjamin Hyams
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- School of Medicine, University of California, San Francisco, San Francisco, California
- Ji Hyun Chang
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- Department of Radiation Oncology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Julian C. Hong
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
- UCSF UC Berkeley Joint Program in Computational Precision Health (CPH), San Francisco, CA
4
Solar M, Castañeda V, Ñanculef R, Dombrovskaia L, Araya M. A Data Ingestion Procedure towards a Medical Images Repository. Sensors (Basel) 2024; 24:4985. [PMID: 39124032 PMCID: PMC11314906 DOI: 10.3390/s24154985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/02/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024]
Abstract
This article presents an ingestion procedure towards an interoperable repository called ALPACS (Anonymized Local Picture Archiving and Communication System). ALPACS provides services to clinical and hospital users, who can access the repository data through an Artificial Intelligence (AI) application called PROXIMITY. This article shows the automated procedure for data ingestion from the medical imaging provider to the ALPACS repository. The data ingestion procedure was successfully applied by the data provider (Hospital Clínico de la Universidad de Chile, HCUCH) using a pseudo-anonymization algorithm at the source, thereby ensuring that the privacy of patients' sensitive data is respected. Data transfer was carried out using international communication standards for health systems, which allows for replication of the procedure by other institutions that provide medical images. OBJECTIVES This article aims to create a repository of 33,000 medical CT images and 33,000 diagnostic reports with international standards (HL7 HAPI FHIR, DICOM, SNOMED). This goal requires devising a data ingestion procedure that can be replicated by other provider institutions, guaranteeing data privacy by implementing a pseudo-anonymization algorithm at the source, and generating labels from annotations via NLP. METHODOLOGY Our approach involves hybrid on-premise/cloud deployment of PACS and FHIR services, including transfer services for anonymized data to populate the repository through a structured ingestion procedure. We used NLP over the diagnostic reports to generate annotations, which were then used to train ML algorithms for content-based similar exam recovery. OUTCOMES We successfully implemented ALPACS and PROXIMITY 2.0, ingesting almost 19,000 thorax CT exams to date along with their corresponding reports.
Affiliation(s)
- Mauricio Solar
- Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus Vitacura-Santiago, Vitacura 7660251, Chile
- Victor Castañeda
- DETEM, Faculty of Medicine, Universidad de Chile, Independencia-Santiago, Santiago 8380453, Chile
- Ricardo Ñanculef
- Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus San Joaquin-Santiago, Santiago 8940897, Chile
- Lioubov Dombrovskaia
- Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus San Joaquin-Santiago, Santiago 8940897, Chile
- Mauricio Araya
- Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus Casa Central-Valparaíso, Valparaíso 2390123, Chile
5
Ahumada R, Dunstan J, Rojas M, Peñafiel S, Paredes I, Báez P. Automatic Detection of Distant Metastasis Mentions in Radiology Reports in Spanish. JCO Clin Cancer Inform 2024; 8:e2300130. [PMID: 38194615 PMCID: PMC10793975 DOI: 10.1200/cci.23.00130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/12/2023] [Accepted: 11/08/2023] [Indexed: 01/11/2024] Open
Abstract
PURPOSE A critical task in oncology is extracting information related to cancer metastasis from electronic health records. Metastasis-related information is crucial for planning treatment, evaluating patient prognoses, and cancer research. However, the unstructured way in which findings of distant metastasis are often written in radiology reports makes it difficult to extract information automatically. The main aim of this study was to extract distant metastasis findings from free-text imaging and nuclear medicine reports to classify the patient status according to the presence or absence of distant metastasis. MATERIALS AND METHODS We created a distant metastasis annotated corpus using positron emission tomography-computed tomography and computed tomography reports of patients with prostate, colorectal, and breast cancers. Entities were labeled M1 or M0 according to affirmative or negative metastasis descriptions. We used a named entity recognition model on the basis of a bidirectional long short-term memory model and conditional random fields to identify entities. Mentions were subsequently used to classify whole reports into M1 or M0. RESULTS The model detected distant metastasis mentions with a weighted average F1 score performance of 0.84. Whole reports were classified with an F1 score of 0.92 for M0 documents and 0.90 for M1 documents. CONCLUSION These results show the usefulness of the model in detecting distant metastasis findings in three different types of cancer and the consequent classification of reports. The relevance of this study is to generate structured distant metastasis information from free-text imaging reports in Spanish. In addition, the manually annotated corpus, annotation guidelines, and code are freely released to the research community.
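The last step described in this abstract, turning mention-level M1/M0 labels into one label per report, can be illustrated with a small sketch. The abstract does not state the aggregation rule, so the rule below (a report is M1 if any mention is M1) is an assumption, and `classify_report` is a hypothetical name rather than a function from the released code.

```python
from typing import List

VALID_LABELS = ("M0", "M1")


def classify_report(mention_labels: List[str]) -> str:
    """Aggregate mention-level labels into a single report-level label.

    Assumed rule (not taken from the paper): the report is M1 when at
    least one mention is labeled M1, and M0 otherwise, including the
    case where no mention was detected at all.
    """
    for label in mention_labels:
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected mention label: {label!r}")
    return "M1" if "M1" in mention_labels else "M0"
```

Under this assumed rule a single affirmative mention outweighs any number of negative ones, which favors sensitivity for M1 at the report level.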
Affiliation(s)
- Ricardo Ahumada
- Center of Medical Informatics and Telemedicine, Faculty of Medicine, University of Chile, Santiago, Chile
- Jocelyn Dunstan
- Department of Computer Science & the Institute for Mathematical Computing, Pontificia Universidad Católica de Chile, Santiago, Chile
- Matías Rojas
- Center for Mathematical Modeling—CNRS IRL 2807, Faculty of Physical and Mathematical Sciences, University of Chile, Santiago, Chile
- Sergio Peñafiel
- Unidad de Informática Médica y Data Science, Departamento de Investigación del Cáncer, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
- Inti Paredes
- Unidad de Informática Médica y Data Science, Departamento de Investigación del Cáncer, Instituto Oncológico Fundación Arturo López Pérez, Santiago, Chile
- Pablo Báez
- Center of Medical Informatics and Telemedicine, Faculty of Medicine, University of Chile, Santiago, Chile
6
Perez N, Cuadros M, Rigau G. Negation and speculation processing: A study on cue-scope labelling and assertion classification in Spanish clinical text. Artif Intell Med 2023; 145:102682. [PMID: 37925211 DOI: 10.1016/j.artmed.2023.102682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 08/25/2023] [Accepted: 10/06/2023] [Indexed: 11/06/2023]
Abstract
Natural Language Processing (NLP) based on new deep learning technology is contributing to the emergence of powerful solutions that help healthcare providers and researchers discover valuable patterns within insurmountable volumes of health records and scientific literature. Fundamental to the success of such solutions is the processing of negation and speculation. The article addresses this problem with state-of-the-art deep learning approaches from two perspectives: cue and scope labelling, and assertion classification. In light of the real struggle to access clinical annotated data, the study (a) proposes a methodology to automatically convert cue-scope annotations to assertion annotations; and (b) includes a range of scenarios with varying amounts of training data and adversarial test examples. The results expose the clear advantage of Transformer-based models in this regard, which outperform a series of baselines and the related work on the public NUBes corpus of clinical Spanish text.
Affiliation(s)
- Naiara Perez
- SNLT group at Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Mikeletegi Pasealekua 57, Donostia/San Sebastián, 20009, Spain; HiTZ Basque Center for Language Technologies, University of the Basque Country (UPV-EHU), Manuel Lardizabal Ibilbidea 1, Donostia/San Sebastián, 20018, Spain.
- Montse Cuadros
- SNLT group at Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Mikeletegi Pasealekua 57, Donostia/San Sebastián, 20009, Spain
- German Rigau
- HiTZ Basque Center for Language Technologies, University of the Basque Country (UPV-EHU), Manuel Lardizabal Ibilbidea 1, Donostia/San Sebastián, 20018, Spain
7
Argüello-González G, Aquino-Esperanza J, Salvador D, Bretón-Romero R, Del Río-Bermudez C, Tello J, Menke S. Negation recognition in clinical natural language processing using a combination of the NegEx algorithm and a convolutional neural network. BMC Med Inform Decis Mak 2023; 23:216. [PMID: 37833661 PMCID: PMC10576331 DOI: 10.1186/s12911-023-02301-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 09/18/2023] [Indexed: 10/15/2023] Open
Abstract
BACKGROUND Important clinical information of patients is present in unstructured free-text fields of Electronic Health Records (EHRs). While this information can be extracted using clinical Natural Language Processing (cNLP), the recognition of negation modifiers represents an important challenge. A wide range of cNLP applications have been developed to detect the negation of medical entities in clinical free-text; however, effective solutions for languages other than English are scarce. This study aimed at developing a solution for negation recognition in Spanish EHRs based on a combination of a customized rule-based NegEx layer and a convolutional neural network (CNN). METHODS Based on our previous experience in real world evidence (RWE) studies using information embedded in EHRs, negation recognition was simplified into a binary problem ('affirmative' vs. 'non-affirmative' class). For the NegEx layer, negation rules were obtained from a publicly available Spanish corpus and enriched with custom ones, whereas the CNN binary classifier was trained on EHRs annotated for clinical named entities (cNEs) and negation markers by medical doctors. RESULTS The proposed negation recognition pipeline obtained precision, recall, and F1-score of 0.93, 0.94, and 0.94 for the 'affirmative' class, and 0.86, 0.84, and 0.85 for the 'non-affirmative' class, respectively. To validate the generalization capabilities of our methodology, we applied the negation recognition pipeline on EHRs (6,710 cNEs) from a different data source distribution than the training corpus and obtained consistent performance metrics for the 'affirmative' and 'non-affirmative' class (0.95, 0.97, and 0.96; and 0.90, 0.83, and 0.86 for precision, recall, and F1-score, respectively). Lastly, we evaluated the pipeline against two publicly available Spanish negation corpora, the IULA and NUBes, obtaining state-of-the-art metrics (1.00, 0.99, and 0.99; and 1.00, 0.93, and 0.96 for precision, recall, and F1-score, respectively). CONCLUSION Negation recognition is a source of low precision in the retrieval of cNEs from EHRs' free-text. Combining a customized rule-based NegEx layer with a CNN binary classifier outperformed many other current approaches. RWE studies benefit greatly from the correct recognition of negation, as it reduces false positive detections of cNEs which otherwise would undoubtedly reduce the credibility of cNLP systems.
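To make the rule-based half of this pipeline concrete, here is a minimal NegEx-style sketch: an entity is marked 'non-affirmative' when a negation trigger appears within a few tokens before it. The trigger list, the window size, and the function names are illustrative assumptions; they are not the rules used in the paper, and the CNN classifier is not reproduced here.

```python
import re
from typing import List

# Illustrative Spanish negation triggers (assumed, not the paper's rule set).
NEGATION_TRIGGERS = ("no", "sin", "niega", "descarta")
# Number of tokens before the entity that are searched for a trigger (assumed).
WINDOW = 5


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def assertion_class(sentence: str, entity: str) -> str:
    """Return 'non-affirmative' if a trigger precedes the entity, else 'affirmative'."""
    tokens = tokenize(sentence)
    entity_tokens = tokenize(entity)
    n = len(entity_tokens)
    if n == 0:
        return "affirmative"
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == entity_tokens:
            # Look only at the tokens immediately before this occurrence.
            preceding = tokens[max(0, start - WINDOW):start]
            if any(tok in NEGATION_TRIGGERS for tok in preceding):
                return "non-affirmative"
    return "affirmative"
```

For example, `assertion_class("Paciente sin fiebre ni tos", "fiebre")` falls in the 'non-affirmative' class because "sin" precedes the entity within the window. A fixed window with no scope-terminating terms is a deliberate simplification of NegEx.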
Affiliation(s)
- Guillermo Argüello-González
- MedSavana SL, Madrid, 28004, Spain
- Statistics and Operations Research, University of Oviedo, Oviedo, 33003, Spain
- José Aquino-Esperanza
- MedSavana SL, Madrid, 28004, Spain
- Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, 08007, Spain
8
Shaitarova A, Zaghir J, Lavelli A, Krauthammer M, Rinaldi F. Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey. Yearb Med Inform 2023; 32:230-243. [PMID: 38147865 PMCID: PMC10751112 DOI: 10.1055/s-0043-1768726] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/28/2023] Open
Abstract
OBJECTIVES This survey aims to provide an overview of the current state of biomedical and clinical Natural Language Processing (NLP) research and practice in Languages other than English (LoE). We pay special attention to data resources, language models, and popular NLP downstream tasks. METHODS We explore the literature on clinical and biomedical NLP from the years 2020-2022, focusing on the challenges of multilinguality and LoE. We query online databases and manually select relevant publications. We also use recent NLP review papers to identify the possible information lacunae. RESULTS Our work confirms the recent trend towards the use of transformer-based language models for a variety of NLP tasks in medical domains. In addition, there has been an increase in the availability of annotated datasets for clinical NLP in LoE, particularly in European languages such as Spanish, German and French. Common NLP tasks addressed in medical NLP research in LoE include information extraction, named entity recognition, normalization, linking, and negation detection. However, there is still a need for the development of annotated datasets and models specifically tailored to the unique characteristics and challenges of medical text in some of these languages, especially low-resource ones. Lastly, this survey highlights the progress of medical NLP in LoE, and helps identify opportunities for future research and development in this field.
Affiliation(s)
- Jamil Zaghir
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alberto Lavelli
- Natural Language Processing Research Unit, Center for Digital Health and Wellbeing, Fondazione Bruno Kessler, Trento, Italy
- Michael Krauthammer
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Biomedical Informatics, University Hospital Zurich, Zurich, Switzerland
- Fabio Rinaldi
- Natural Language Processing Research Unit, Center for Digital Health and Wellbeing, Fondazione Bruno Kessler, Trento, Italy
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Dalle Molle Institute for Artificial Intelligence Research, Lugano, Switzerland
- Swiss Institute of Bioinformatics
9
Grouin C, Grabar N, Section Editors for the IMIA Yearbook Section on Natural Language Processing. Year 2022 in Medical Natural Language Processing: Availability of Language Models as a Step in the Democratization of NLP in the Biomedical Area. Yearb Med Inform 2023; 32:244-252. [PMID: 38147866 PMCID: PMC10751107 DOI: 10.1055/s-0043-1768752] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/28/2023] Open
Abstract
OBJECTIVES To analyse the content of publications within the medical Natural Language Processing (NLP) domain in 2022. METHODS Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS Three best papers have been selected. We also propose an analysis of the content of the NLP publications in 2022, stressing some of the topics. CONCLUSION The main trend in 2022 is certainly related to the availability of large language models, especially those based on Transformers, and to their use by non-NLP researchers. This leads to the democratization of the NLP methods. We also observe the renewal of interest in languages other than English, the continuation of research on information extraction and prediction, the massive use of data from social media, and the consideration of needs and interests of patients.
Affiliation(s)
- Cyril Grouin
- Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
- Natalia Grabar
- UMR8163 STL, CNRS, Université de Lille, Domaine du Pont-de-bois, 59653 Villeneuve-d'Ascq cedex, France
10
Seong D, Choi YH, Shin SY, Yi BK. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Med Inform Decis Mak 2023; 23:28. [PMID: 36750932 PMCID: PMC9903463 DOI: 10.1186/s12911-023-02121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
Affiliation(s)
- Donghyeong Seong
- Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, 06355 Republic of Korea (GRID grid.264381.a, ISNI 0000 0001 2181 989X)
- Yoon Ho Choi
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea (GRID grid.264381.a, ISNI 0000 0001 2181 989X)
- Soo-Yong Shin
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea (GRID grid.264381.a, ISNI 0000 0001 2181 989X)
- Research Institute for Future Medicine, Samsung Medical Center, Seoul, 06351 Republic of Korea (GRID grid.414964.a, ISNI 0000 0001 0640 5613)
- Byoung-Kee Yi
- Department of Artificial Intelligence Convergence, Kangwon National University, 1 Kangwondaehak-Gil, Chuncheon-si, Gangwon-do, 24341, Republic of Korea.
11
Albahli S, Nazir T. AI-CenterNet CXR: An artificial intelligence (AI) enabled system for localization and classification of chest X-ray disease. Front Med (Lausanne) 2022; 9:955765. [PMID: 36111113 PMCID: PMC9469020 DOI: 10.3389/fmed.2022.955765] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/29/2022] [Accepted: 07/21/2022] [Indexed: 12/03/2022] Open
Abstract
Machine learning techniques have lately attracted a lot of attention for their potential to execute expert-level clinical tasks, notably in the area of medical image analysis. Chest radiography is one of the most often utilized diagnostic imaging modalities in medical practice, and it necessitates timely coverage regarding the presence of probable abnormalities and disease diagnoses in the images. Computer-aided solutions for the identification of chest illness using chest radiography are being developed in medical imaging research. However, accurate localization and categorization of specific disorders in chest X-ray images is still a challenging problem due to the complex nature of radiographs, presence of different distortions, high inter-class similarities, and intra-class variations in abnormalities. In this work, we have presented an Artificial Intelligence (AI)-enabled fully automated approach using an end-to-end deep learning technique to improve the accuracy of thoracic illness diagnosis. We proposed AI-CenterNet CXR, a customized CenterNet model with an improved feature extraction network for the recognition of multi-label chest diseases. The enhanced backbone computes deep key points that improve the abnormality localization accuracy and, thus, overall disease classification performance. Moreover, the proposed architecture is lightweight and computationally efficient in comparison to the original CenterNet model. We have performed extensive experimentation to validate the effectiveness of the proposed technique using the National Institutes of Health (NIH) Chest X-ray dataset. Our method achieved an overall Area Under the Curve (AUC) of 0.888 and an average IOU of 0.801 to detect and classify the eight types of chest abnormalities. Both the qualitative and quantitative findings reveal that the suggested approach outperforms the existing methods, indicating the efficacy of our approach.
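The localization metric reported in this abstract, intersection over union (IOU), can be made concrete with a short sketch for two axis-aligned bounding boxes. The `(x1, y1, x2, y2)` corner format and the function name are assumptions chosen for illustration; the abstract does not specify how the authors computed or averaged the metric.

```python
from typing import Tuple

# Box given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 (assumed format).
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Two zero-area boxes have an empty union; report 0.0 instead of dividing by zero.
    return intersection / union if union > 0 else 0.0
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IOU is 1 / (4 + 4 - 1) = 1/7.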
Affiliation(s)
- Saleh Albahli
  - Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Tahira Nazir
  - Faculty of Computing, Riphah International University, Islamabad, Pakistan
|
12
|
An Artificial Intelligence-Based Tool for Data Analysis and Prognosis in Cancer Patients: Results from the Clarify Study. Cancers (Basel) 2022; 14:4041. [PMID: 36011034 PMCID: PMC9406336 DOI: 10.3390/cancers14164041] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/14/2022] [Revised: 08/18/2022] [Accepted: 08/19/2022] [Indexed: 11/16/2022]
Abstract
Simple Summary: Cancer is associated with significant morbidity and mortality worldwide. Although significant advances have been made in the last few decades in early detection and treatment, providing personalized care remains a challenge. Artificial intelligence (AI) has emerged as a means of improving cancer care with the use of computer science. Identifying risk factors for poor prognosis and profiling patients with AI techniques and tools is feasible and has potential application in clinical settings, including surveillance management. The goal of this study is to present an AI-based tool for analyzing cancer patient data and improving patient management by identifying clinical factors associated with relapse and survival, developing a prognostic model that identifies features associated with poor prognosis, and stratifying patients by risk.
Background: Artificial intelligence (AI) has contributed substantially in recent years to the resolution of different biomedical problems, including cancer. However, AI tools with significant and widespread impact in oncology remain scarce. The goal of this study is to present an AI-based tool for cancer patient data analysis that assists clinicians in identifying the clinical factors associated with poor prognosis, relapse, and survival, and to develop a prognostic model that stratifies patients by risk. Materials and Methods: We used clinical data from 5275 patients diagnosed with non-small cell lung cancer, breast cancer, and non-Hodgkin lymphoma at Hospital Universitario Puerta de Hierro-Majadahonda. Accessible clinical parameters measured with a wearable device and quality-of-life questionnaire data were also collected. Results: Using an AI tool, data from 5275 cancer patients were analyzed, integrating clinical data, questionnaire data, and data collected from wearable devices. Descriptive analyses were performed to explore the patients’ characteristics, survival probabilities were calculated, and a prognostic model identified patients with low- and high-risk profiles. Conclusion: Overall, the reconstruction of the population’s risk profile for the cancer-specific predictive model was achieved using artificial intelligence and proved useful in clinical practice. It has potential application in clinical settings to improve risk stratification, early detection, and surveillance management of cancer patients.
|
13
|
Segura-Bedmar I, Camino-Perdones D, Guerrero-Aspizua S. Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts. BMC Bioinformatics 2022; 23:263. [PMID: 35794528 PMCID: PMC9258216 DOI: 10.1186/s12859-022-04810-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/18/2022] [Accepted: 06/21/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND AND OBJECTIVE Although rare diseases are characterized by low prevalence, approximately 400 million people are affected by a rare disease. The early and accurate diagnosis of these conditions is a major challenge for general practitioners, who do not have enough knowledge to identify them. In addition, rare diseases usually show a wide variety of manifestations, which can make diagnosis even more difficult. A delayed diagnosis can negatively affect the patient's life. Therefore, there is an urgent need to increase scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) and Deep Learning can help to extract relevant information about rare diseases to facilitate their diagnosis and treatment. METHODS The paper explores several deep learning techniques, such as Bidirectional Long Short-Term Memory (BiLSTM) networks and deep contextualized word representations based on Bidirectional Encoder Representations from Transformers (BERT), to recognize rare diseases and their clinical manifestations (signs and symptoms). RESULTS BioBERT, a domain-specific language representation based on BERT and trained on biomedical corpora, obtains the best results, with an F1 of 85.2% for rare diseases. Since many signs are described by complex noun phrases that involve overlapped, nested, and discontinuous entities, the model obtains lower results for signs, with an F1 of 57.2%. CONCLUSIONS While our results are promising, there is still much room for improvement, especially with respect to the identification of clinical manifestations (signs and symptoms).
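The F1 figures above summarize how many entity mentions a model finds and how many of its predictions are correct. The sketch below shows one common way to compute such a score, assuming exact-match evaluation over `(start, end, label)` spans; the abstract does not state the matching criterion the authors used, and the function name `entity_prf` and the example labels are hypothetical.

```python
def entity_prf(gold, predicted):
    """Precision, recall, and F1 over exact-match entity spans.

    `gold` and `predicted` are iterables of (start, end, label) tuples.
    A prediction counts as correct only if all three fields match a gold
    span exactly (an assumption; other matching schemes exist).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative spans only: one sign is predicted with the wrong boundary.
gold_spans = {(0, 2, "RAREDISEASE"), (5, 7, "SIGN"), (9, 10, "SYMPTOM")}
predicted_spans = {(0, 2, "RAREDISEASE"), (5, 6, "SIGN")}
precision, recall, f1 = entity_prf(gold_spans, predicted_spans)
```

Under exact matching, a boundary error on a long noun phrase counts as both a missed entity and a spurious one, which is one reason complex sign mentions can score well below simpler disease names.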
Affiliation(s)
- Isabel Segura-Bedmar
  - Human Language and Accessibility Technologies, Computer Science Department, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés, 28911 Madrid, Spain
- David Camino-Perdones
  - Human Language and Accessibility Technologies, Computer Science Department, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés, 28911 Madrid, Spain
- Sara Guerrero-Aspizua
  - Tissue Engineering and Regenerative Medicine group, Department of Bioengineering, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, Leganés, 28911 Madrid, Spain
  - Hospital Fundación Jiménez Díaz e Instituto de Investigación, FJD, Av. de los Reyes Católicos, 2, 28040 Madrid, Spain
  - Epithelial Biomedicine Division, CIEMAT, Avda. Complutense 40, 28029 Madrid, Spain
  - Centre for Biomedical Network Research on Rare Diseases (CIBERER), C/Monforte de Lemos 3-5, 28029 Madrid, Spain
|
14
|
Liu Y, Li J, Liu C, Wei J. Evaluation of cultivated land quality using attention mechanism-back propagation neural network. PeerJ Comput Sci 2022; 8:e948. [PMID: 35494807 PMCID: PMC9044315 DOI: 10.7717/peerj-cs.948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Accepted: 03/24/2022] [Indexed: 06/14/2023]
Abstract
Cultivated land quality is related to the quality and safety of agricultural products and to ecological safety. Reasonably evaluating land quality, which helps identify its benefits, is therefore crucial. However, most studies have used traditional methods to estimate cultivated land quality, and there is little research on using deep learning for this purpose. Using Ya'an cultivated land as the research object, this study constructs an evaluation system for cultivated land quality based on seven aspects, including soil organic matter and soil texture. An attention mechanism (AM) is introduced into a back propagation (BP) neural network model, yielding an AM-BP neural network suited to Ya'an cultivated land. The sample is divided into training and test sets at a ratio of 7:3. The experiments output evaluation results for cultivated land quality, which can be visualized in a pie chart. The experimental results indicate that the AM-BP neural network performs better than the BP neural network: the mean squared error is reduced by approximately 0.0019 and the coefficient of determination is increased by approximately 0.005. In addition, this study obtains better results with an ensemble model. The quality of cultivated land in Yucheng District is generally good, i.e., mostly third and fourth grades, and it conforms to a normal distribution. Lastly, the method can be used to evaluate cultivated land quality, providing a reference for future cultivated land quality evaluation.
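The comparison above rests on a 7:3 train/test split and two regression metrics, mean squared error and the coefficient of determination. The following is a minimal sketch of those three pieces in plain Python; the helper names, the shuffling step, and the fixed seed are assumptions for illustration, since the abstract does not say how the split was drawn or which library was used.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets.

    The default fraction mirrors the 7:3 ratio; the seed is arbitrary.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all targets are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values only; they are not taken from the study.
train, test = split_train_test(range(10))
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Lower mean squared error and a higher coefficient of determination both indicate a closer fit, which is the direction of the reported AM-BP improvement over the plain BP network.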
Affiliation(s)
- Yulin Liu
  - College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Jiaolong Li
  - College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Chuang Liu
  - College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
- Jiangshu Wei
  - College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
|