1
Bilal M, Hamza A, Malik N. NLP for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review. J Pain Symptom Manage 2025; 69:e374-e394. [PMID: 39894080 DOI: 10.1016/j.jpainsymman.2025.01.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 12/31/2024] [Accepted: 01/20/2025] [Indexed: 02/04/2025]
Abstract
This review examines the application of natural language processing (NLP) techniques in cancer research using electronic health records (EHRs) and clinical notes. It addresses gaps in existing literature by providing a broader perspective than previous studies focused on specific cancer types or applications. A comprehensive literature search in the Scopus database identified 94 relevant studies published between 2019 and 2024. The analysis revealed a growing trend in NLP applications for cancer research, with information extraction (47 studies) and text classification (40 studies) emerging as predominant NLP tasks, followed by named entity recognition (7 studies). Among cancer types, breast, lung, and colorectal cancers were found to be the most studied. A significant shift from rule-based and traditional machine learning approaches to advanced deep learning techniques and transformer-based models was observed. It was found that dataset sizes used in existing studies varied widely, ranging from small, manually annotated datasets to large-scale EHRs. The review highlighted key challenges, including the limited generalizability of proposed solutions and the need for improved integration into clinical workflows. While NLP techniques show significant potential in analyzing EHRs and clinical notes for cancer research, future work should focus on improving model generalizability, enhancing robustness in handling complex clinical language, and expanding applications to understudied cancer types. The integration of NLP tools into palliative medicine and addressing ethical considerations remain crucial for utilizing the full potential of NLP in enhancing cancer diagnosis, treatment, and patient outcomes. This review provides valuable insights into the current state and future directions of NLP applications in cancer research.
Affiliation(s)
- Muhammad Bilal
- Department of Pharmaceutical Outcomes and Policy (M.B.), University of Florida, Gainesville, Florida, USA; Department of Software Engineering (M.B.), National University of Computer and Emerging Sciences, Islamabad, Pakistan.
- Ameer Hamza
- Department of Computer Science (A.H.), Faculty of Computing and IT, University of Sargodha, Sargodha, Punjab, Pakistan
- Nadia Malik
- Department of Software Engineering (N.M.), Faculty of Computing and IT, University of Sargodha, Sargodha, Punjab, Pakistan
2
Nair SS, Devi VM, Bhasi S. Enhanced lung cancer detection: Integrating improved random walker segmentation with artificial neural network and random forest classifier. Heliyon 2024; 10:e29032. [PMID: 38617949 PMCID: PMC11015404 DOI: 10.1016/j.heliyon.2024.e29032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 03/22/2024] [Accepted: 03/28/2024] [Indexed: 04/16/2024]
Abstract
Background: Medical image segmentation is a vital yet difficult task because of the multimodality of the acquired images. It is difficult to locate the affected area before it spreads. Methods: This research uses several machine learning tools, including an artificial neural network and a random forest classifier, to increase the reliability of pulmonary nodule classification. Anisotropic diffusion filtering is first used to remove noise from the image. A modified random walk method is then used to obtain the region of interest inside the lung parenchyma. Next, texture-based feature extraction is used to extract features describing the consistency of the image segments for pulmonary nodules. The final stage is to identify and classify the pulmonary nodules using a classifier algorithm. Results: The study employs cross-validation to demonstrate the validity of the diagnostic framework. The proposed method is tested on CT scan data provided by the Lung Image Database Consortium. The random forest classifier achieved an accuracy of 99.6 percent for detecting lung cancer, compared with 94.8 percent for the artificial neural network. Conclusions: Current research is now primarily concerned with identifying lung nodules and classifying them as benign or malignant. The diagnostic potential of machine learning and image processing approaches for the categorization of lung cancer is enormous.
Affiliation(s)
- Sneha S. Nair
- Department of Physics, Noorul Islam Centre for Higher Education, Kumarakovil, Kanyakumari District, Tamil Nadu, India
- V.N. Meena Devi
- Department of Physics, Noorul Islam Centre for Higher Education, Kumarakovil, Kanyakumari District, Tamil Nadu, India
- Saju Bhasi
- Department of Radiation Physics, Regional Cancer Centre, Thiruvananthapuram, Kerala, India
3
Yang Y, Lu Y, Yan W. A comprehensive review on knowledge graphs for complex diseases. Brief Bioinform 2023; 24:6931722. [PMID: 36528805 DOI: 10.1093/bib/bbac543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/02/2022] [Accepted: 11/10/2022] [Indexed: 12/23/2022]
Abstract
In recent years, knowledge graphs (KGs) have gained a great deal of popularity as a tool for storing relationships between entities and for performing higher level reasoning. KGs in biomedicine and clinical practice aim to provide an elegant solution for diagnosing and treating complex diseases more efficiently and flexibly. Here, we provide a systematic review to characterize the state-of-the-art of KGs in the area of complex disease research. We cover the following topics: (1) knowledge sources, (2) entity extraction methods, (3) relation extraction methods and (4) the application of KGs in complex diseases. As a result, we offer a complete picture of the domain. Finally, we discuss the challenges in the field by identifying gaps and opportunities for further research and propose potential research directions of KGs for complex disease diagnosis and treatment.
Affiliation(s)
- Yang Yang
- School of Computer Science & Technology, Soochow University, Suzhou 215000, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Yuwei Lu
- School of Computer Science & Technology, Soochow University, Suzhou 215000, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Wenying Yan
- Department of Bioinformatics, School of Biology and Basic Medical Sciences, Medical College of Soochow University, and Center for Systems Biology, Soochow University, Suzhou 215123, China
4
Mendoza-Urbano DM, Garcia JF, Moreno JS, Bravo-Ocaña JC, Riascos AJ, Zambrano Harvey A, Prada SI. Automated extraction of information from free text of Spanish oncology pathology reports. Colomb Med (Cali) 2023; 54:e2035300. [PMID: 37614525 PMCID: PMC10443791 DOI: 10.25100/cm.v54i1.5300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 08/02/2022] [Accepted: 09/20/2022] [Indexed: 08/25/2023]
Abstract
Background: Pathology reports are stored as unstructured, ungrammatical, fragmented, and abbreviated free text with linguistic variability among pathologists. For this reason, tumor information extraction requires significant human effort. Recording data in an efficient and high-quality format is essential for implementing and establishing a hospital-based cancer registry. Objective: This study aimed to describe the implementation of a natural language processing algorithm for oncology pathology reports. Methods: An algorithm was developed to process oncology pathology reports in Spanish and extract 20 medical descriptors. The approach is based on the successive matching of regular expressions. Results: Validation was performed with 140 pathology reports. Topography was identified by both the human annotators and the algorithm in all reports. Morphology was identified by the human annotators in 138 reports and by the algorithm in 137. The average fuzzy matching score was 68.3 for topography and 89.5 for morphology. Conclusions: A preliminary validation of the algorithm against human extraction was performed on a small set of reports, with satisfactory results. This shows that a regular-expression approach can accurately and precisely extract multiple specimen attributes from free-text Spanish pathology reports. Additionally, we developed a website to facilitate collaborative validation at a larger scale, which may be helpful for future research on the subject.
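As a rough illustration of the approach this abstract describes, the sketch below tries a sequence of regular expressions per descriptor and scores the extracted value against a human annotation. The patterns, field names, sample report text, and the use of Python's `difflib.SequenceMatcher` as the fuzzy score are all assumptions made for illustration; the abstract does not specify the study's actual expressions or scoring method.

```python
import re
from difflib import SequenceMatcher

# Hypothetical patterns, tried in order; the first one that matches wins.
# These are illustrative only and are not the expressions used in the study.
PATTERNS = {
    "topography": [
        re.compile(r"localizaci[oó]n\s*:\s*(?P<value>[^.\n]+)", re.IGNORECASE),
        re.compile(r"biopsia\s+de\s+(?P<value>[^.,\n]+)", re.IGNORECASE),
    ],
    "morphology": [
        re.compile(r"diagn[oó]stico\s*:\s*(?P<value>[^.\n]+)", re.IGNORECASE),
        re.compile(r"(?P<value>(?:adeno)?carcinoma[^.,\n]*)", re.IGNORECASE),
    ],
}


def extract(report: str) -> dict:
    """Return the first regex hit per descriptor, or None if nothing matches."""
    result = {}
    for field, patterns in PATTERNS.items():
        result[field] = None
        for pattern in patterns:
            match = pattern.search(report)
            if match:
                result[field] = match.group("value").strip()
                break
    return result


def fuzzy_score(extracted, reference: str) -> float:
    """Similarity on a 0-100 scale between extracted and human-annotated text."""
    if extracted is None:
        return 0.0
    ratio = SequenceMatcher(None, extracted.lower(), reference.lower()).ratio()
    return round(100 * ratio, 1)


# Made-up example report (not real patient data).
report = "Biopsia de mama izquierda. Diagnóstico: carcinoma ductal infiltrante."
fields = extract(report)
```

Ordering the patterns from most to least specific lets an explicit label such as "Diagnóstico:" take precedence over a looser keyword search, which is one plausible reading of "successive" matching.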
Affiliation(s)
- Juan Sebastian Moreno
- Quantil SAS, Bogotá, Colombia
- Centro de Analítica para Políticas Públicas, Bogotá, Colombia
- Alvaro José Riascos
- Quantil SAS, Bogotá, Colombia
- Centro de Analítica para Políticas Públicas, Bogotá, Colombia
- Universidad de los Andes, Facultad de Economía, Bogotá, Colombia
- Sergio I Prada
- Fundación Valle del Lili, Centro de Investigaciones Clínicas, Cali, Colombia
- Universidad Icesi, Centro PROESA, Cali, Colombia
5
Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications. Applied Sciences-Basel 2022. [DOI: 10.3390/app12105209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Negation and speculation are universal linguistic phenomena that affect the performance of Natural Language Processing (NLP) applications, such as those for opinion mining and information retrieval, especially in biomedical data. In this article, we review the corpora annotated with negation and speculation in various natural languages and domains. Furthermore, we discuss the ongoing research into recent rule-based, supervised, and transfer learning techniques for the detection of negating and speculative content. Many English corpora for various domains are now annotated with negation and speculation; moreover, the availability of annotated corpora in other languages has started to increase. However, this growth is insufficient to address these important phenomena in languages with limited resources. The use of cross-lingual models and translation of the well-known languages are acceptable alternatives. We also highlight the lack of consistent annotation guidelines and the shortcomings of the existing techniques, and suggest alternatives that may speed up progress in this research direction. Adding more syntactic features may alleviate the limitations of the existing techniques, such as cue ambiguity and detecting the discontinuous scopes. In some NLP applications, inclusion of a system that is negation- and speculation-aware improves performance, yet this aspect is still not addressed or considered an essential step.
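To make the rule-based family of techniques mentioned in this abstract more concrete, here is a minimal sketch of cue detection with a fixed-window scope. The cue lists, the `window` parameter, and the function name `tag_cues` are hypothetical choices for illustration and do not come from any system covered by the survey.

```python
import re

# Small, hypothetical cue lexicons; real systems use much larger curated lists.
NEGATION_CUES = {"no", "not", "without", "denies"}
SPECULATION_CUES = {"possible", "possibly", "may", "suggests", "likely"}


def tag_cues(sentence: str, window: int = 4) -> list:
    """Label each token as 'cue', 'negated', 'speculated', or 'affirmed'.

    Assumption: a cue's scope is simply the next `window` tokens, which is a
    crude stand-in for learned or syntax-based scope resolution.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    labels = ["affirmed"] * len(tokens)
    for i, token in enumerate(tokens):
        if token in NEGATION_CUES:
            label = "negated"
        elif token in SPECULATION_CUES:
            label = "speculated"
        else:
            continue
        labels[i] = "cue"
        # Mark following tokens, without overwriting an earlier cue's scope.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if labels[j] == "affirmed":
                labels[j] = label
    return list(zip(tokens, labels))


tagged = tag_cues("No evidence of metastasis, possible pneumonia.")
```

A fixed window like this cannot represent discontinuous scopes and treats every listed word as a cue regardless of context, which are the kinds of limitations (scope detection and cue ambiguity) that the abstract attributes to existing techniques.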
6
Solarte Pabón O, Montenegro O, Torrente M, Rodríguez González A, Provencio M, Menasalvas E. Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach. PeerJ Comput Sci 2022; 8:e913. [PMID: 35494817 PMCID: PMC9044225 DOI: 10.7717/peerj-cs.913] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/26/2021] [Accepted: 02/10/2022] [Indexed: 06/14/2023]
Abstract
Detecting negation and uncertainty is crucial for medical text mining applications; otherwise, extracted information can be incorrectly identified as real or factual events. Although several approaches have been proposed to detect negation and uncertainty in clinical texts, most efforts have focused on the English language. Most proposals developed for Spanish have focused mainly on negation detection and do not deal with uncertainty. In this paper, we propose a deep learning-based approach for both negation and uncertainty detection in clinical texts written in Spanish. The proposed approach explores two deep learning methods to achieve this goal: (i) Bidirectional Long Short-Term Memory with a Conditional Random Field layer (BiLSTM-CRF) and (ii) Bidirectional Encoder Representations from Transformers (BERT). The approach was evaluated using NUBES and IULA, two public corpora for the Spanish language. The results obtained showed an F-score of 92% and 80% in the scope recognition task for negation and uncertainty, respectively. We also present the results of a validation process conducted using a real-life annotated dataset from clinical notes belonging to cancer patients. The proposed approach shows the feasibility of deep learning-based methods to detect negation and uncertainty in Spanish clinical texts. Experiments also highlighted that this approach improves performance in the scope recognition task compared to other proposals in the biomedical domain.
Affiliation(s)
- Oswaldo Solarte Pabón
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain
- Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Cali, Colombia
- Orlando Montenegro
- Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Cali, Colombia
- Ernestina Menasalvas
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain
7
Deep contextual multi-task feature fusion for enhanced concept, negation and speculation detection from clinical notes. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.101109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]