1
Bilal M, Hamza A, Malik N. NLP for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review. J Pain Symptom Manage 2025; 69:e374-e394. [PMID: 39894080 DOI: 10.1016/j.jpainsymman.2025.01.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 12/31/2024] [Accepted: 01/20/2025] [Indexed: 02/04/2025]
Abstract
This review examines the application of natural language processing (NLP) techniques in cancer research using electronic health records (EHRs) and clinical notes. It addresses gaps in existing literature by providing a broader perspective than previous studies focused on specific cancer types or applications. A comprehensive literature search in the Scopus database identified 94 relevant studies published between 2019 and 2024. The analysis revealed a growing trend in NLP applications for cancer research, with information extraction (47 studies) and text classification (40 studies) emerging as predominant NLP tasks, followed by named entity recognition (7 studies). Among cancer types, breast, lung, and colorectal cancers were found to be the most studied. A significant shift from rule-based and traditional machine learning approaches to advanced deep learning techniques and transformer-based models was observed. It was found that dataset sizes used in existing studies varied widely, ranging from small, manually annotated datasets to large-scale EHRs. The review highlighted key challenges, including the limited generalizability of proposed solutions and the need for improved integration into clinical workflows. While NLP techniques show significant potential in analyzing EHRs and clinical notes for cancer research, future work should focus on improving model generalizability, enhancing robustness in handling complex clinical language, and expanding applications to understudied cancer types. The integration of NLP tools into palliative medicine and addressing ethical considerations remain crucial for utilizing the full potential of NLP in enhancing cancer diagnosis, treatment, and patient outcomes. This review provides valuable insights into the current state and future directions of NLP applications in cancer research.
Affiliation(s)
- Muhammad Bilal
  - Department of Pharmaceutical Outcomes and Policy (M.B.), University of Florida, Gainesville, Florida, USA; Department of Software Engineering (M.B.), National University of Computer and Emerging Sciences, Islamabad, Pakistan
- Ameer Hamza
  - Department of Computer Science (A.H.), Faculty of Computing and IT, University of Sargodha, Sargodha, Punjab, Pakistan
- Nadia Malik
  - Department of Software Engineering (N.M.), Faculty of Computing and IT, University of Sargodha, Sargodha, Punjab, Pakistan
2
Wang L, Wen A, Fu S, Ruan X, Huang M, Li R, Lu Q, Lyu H, Williams AE, Liu H. A scoping review of OMOP CDM adoption for cancer research using real world data. NPJ Digit Med 2025; 8:189. [PMID: 40189628 PMCID: PMC11973147 DOI: 10.1038/s41746-025-01581-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Accepted: 03/23/2025] [Indexed: 04/09/2025] Open
Abstract
The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) supports large-scale research by enabling distributed network analyses. However, the breadth of its adoption in cancer research is not well understood. We conducted a scoping review to describe the adoption of the OMOP CDM in cancer research. A total of 49 unique articles were included in the review, with 30 on the data analysis theme, and 20 on the infrastructure theme. This review highlighted that while the OMOP CDM ecosystem has enabled successful data support for cancer research, particularly for collaborative studies, ongoing model development and iterative improvement remain needed to fulfill additional research data needs. Expanding disease sites, specifically for rare cancers, integrating more diverse types of data sources, improving data quality, adopting advanced analytics methodology, and increasing multisite evaluations serve as important opportunities to facilitate secondary usage of observational data in future cancer research.
Affiliation(s)
- Liwei Wang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Andrew Wen
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaoyang Ruan
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ming Huang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Rui Li
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Qiuhao Lu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Heather Lyu
  - Department of Surgical Oncology, Division of Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Andrew E Williams
  - Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA
  - Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Hongfang Liu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
3
Forjaz G, Kohler B, Coleman MP, Steliarova-Foucher E, Negoita S, Guidry Auvil JM, Michels FS, Goderre J, Wiggins C, Durbin EB, Geleijnse G, Henrion MC, Altmayer C, Dubois T, Penberthy L. Making the Case for an International Childhood Cancer Data Partnership. J Natl Cancer Inst 2025:djaf003. [PMID: 39799506 DOI: 10.1093/jnci/djaf003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 12/04/2024] [Accepted: 01/03/2025] [Indexed: 01/15/2025] Open
Abstract
Childhood cancers are a heterogeneous group of rare diseases, accounting for less than 2% of all cancers diagnosed worldwide. Most countries, therefore, do not have enough cases to provide robust information on epidemiology, treatment, and late effects, especially for rarer types of cancer. Thus, only through a concerted effort to share data internationally will we be able to answer research questions that could not otherwise be answered. With this goal in mind, the U.S. National Cancer Institute and the French National Cancer Institute co-sponsored the Paris Conference for an International Childhood Cancer Data Partnership in November 2023. This meeting convened more than 200 participants from 17 countries to address complex challenges in pediatric cancer research and data sharing. This Commentary delves into some key topics discussed during the Paris Conference and describes pilots that will help move this international effort forward. Main topics presented include: 1) the wide variation in interpreting the European Union's General Data Protection Regulation among Member States; 2) obstacles with transferring personal health data outside of the European Union; 3) standardization and harmonization, including common data models; and 4) novel approaches to data sharing such as federated querying and federated learning. We finally provide a brief description of three ongoing pilot projects. The International Childhood Cancer Data Partnership is the first step in developing a process to better support pediatric cancer research internationally through combining data from multiple countries.
Affiliation(s)
- Gonçalo Forjaz
  - Public Health Practice, Westat, Inc., Rockville, MD, USA
- Betsy Kohler
  - North American Association of Central Cancer Registries, Springfield, IL, USA
- Michel P Coleman
  - Cancer Survival Group, London School of Hygiene & Tropical Medicine, London, UK
- Serban Negoita
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
- Jaime M Guidry Auvil
  - Center for Biomedical Informatics & Information Technology, National Cancer Institute, Rockville, MD, USA
- Johanna Goderre
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
- Charles Wiggins
  - New Mexico Tumor Registry, University of New Mexico Comprehensive Cancer Center, Albuquerque, NM, USA
- Eric B Durbin
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
- Gijs Geleijnse
  - Netherlands Comprehensive Cancer Organisation, Utrecht, The Netherlands
- Lynne Penberthy
  - Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Rockville, MD, USA
4
Song WH, Park M. RCC-Supporter: supporting renal cell carcinoma treatment decision-making using machine learning. BMC Med Inform Decis Mak 2024; 24:259. [PMID: 39285449 PMCID: PMC11403845 DOI: 10.1186/s12911-024-02660-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 08/30/2024] [Indexed: 09/22/2024] Open
Abstract
BACKGROUND: Asia accounts for 36.6% of global renal cell carcinoma cases, and the incidence of renal cell carcinoma in Korea is rising steadily. Treatment options for renal cell carcinoma are diverse and depend on clinical stage and histologic characteristics. This study therefore aims to develop a machine learning-based clinical decision-support system that recommends treatment tailored to the health condition of each patient.
RESULTS: We reviewed the real-world medical data of 1,867 participants diagnosed with renal cell carcinoma between November 2008 and June 2021 at the Pusan National University Yangsan Hospital in South Korea. Data were manually divided into three groups: patients followed up without surgery or chemotherapy (Surveillance), patients who underwent surgery (Surgery), and patients who received chemotherapy before or after surgery (Chemotherapy). Feature selection was conducted to identify, from 2,058 features, the significant clinical factors influencing renal cell carcinoma treatment decisions. The evaluated feature sets comprised subsets of 20, 50, 75, 100, and 150 features, the complete set, and an additional set of 50 expert-selected features. We applied three representative machine learning algorithms: Decision Tree, Random Forest, and Gradient Boosting Machine (GBM). Of the three, GBM achieved an accuracy of 95% (95% CI, 92-98%) with the 100- and 150-feature sets, outperforming the same algorithm trained on the features selected by clinical experts (93%; 95% CI, 89-97%).
CONCLUSIONS: We developed a preliminary personalized treatment decision-support system (TDSS) called "RCC-Supporter" by applying machine learning (ML) algorithms to determine personalized treatment for the various clinical situations of RCC patients. Our results demonstrate the feasibility of using machine learning-based clinical decision support systems for treatment decisions in real clinical settings.
Affiliation(s)
- Won Hoon Song
  - Department of Urology, Pusan National University School of Medicine, Yangsan, Republic of Korea
  - Department of Urology, Pusan National University Yangsan Hospital, Yangsan, Republic of Korea
- Meeyoung Park
  - Department of Computer Engineering, Kyungnam University, 7, Gyeongnamdaehak-ro, Masanhappo-gu, Changwon-si, 51767, Gyeongsangnam-do, Republic of Korea
5
Cho H, Yoo S, Kim B, Jang S, Sunwoo L, Kim S, Lee D, Kim S, Nam S, Chung JH. Extracting lung cancer staging descriptors from pathology reports: A generative language model approach. J Biomed Inform 2024; 157:104720. [PMID: 39233209 DOI: 10.1016/j.jbi.2024.104720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 08/04/2024] [Accepted: 08/31/2024] [Indexed: 09/06/2024]
Abstract
BACKGROUND: In oncology, electronic health records contain key textual information for the diagnosis, staging, and treatment planning of patients with cancer. However, processing text data requires considerable time and effort, which limits the utilization of these data. Recent advances in natural language processing (NLP) technology, including large language models, can be applied to cancer research. In particular, extracting the information required for the pathological stage from surgical pathology reports can be used to update cancer staging according to the latest cancer staging guidelines.
OBJECTIVES: This study has two main objectives. The first is to evaluate the performance of fine-tuned generative language models (GLMs) in extracting information from text-based surgical pathology reports and determining pathological stages from the extracted information for patients with lung cancer. The second is to determine the feasibility of using relatively small GLMs for information extraction in a resource-constrained computing environment.
METHODS: Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard with validation by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs, which were then used to extract the information required for staging from pathology reports.
RESULTS: We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information. The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best performance overall, with an exact match ratio of 92.24% in the information extraction problem and an accuracy of 0.9876 (predicting T and N classification concurrently) in classification.
CONCLUSION: This study demonstrated that training GLMs with deductive datasets can improve information extraction performance, and that GLMs with a relatively small number of parameters, approximately seven billion, can achieve high performance in this problem. The proposed GLM-based information extraction method is expected to be useful in clinical decision-making support, lung cancer staging, and research.
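As a rough illustration of the exact match ratio reported in this abstract, the sketch below scores extracted descriptors against a gold standard. It assumes that a report counts as a match only when every extracted descriptor equals its gold value; the abstract does not define the metric, so this is one plausible reading and not the authors' evaluation code. The function name `exact_match_ratio` and the descriptor names are hypothetical.

```python
from typing import Dict, List


def exact_match_ratio(predictions: List[Dict[str, str]],
                      gold: List[Dict[str, str]]) -> float:
    """Fraction of reports whose extracted descriptors all equal the gold standard."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    # A report matches only if the whole descriptor dictionary is identical.
    matches = sum(1 for pred, ref in zip(predictions, gold) if pred == ref)
    return matches / len(gold)


# Hypothetical descriptor names and values, for illustration only.
gold = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.1", "pleural_invasion": "present"},
]
predictions = [
    {"tumor_size_cm": "2.5", "pleural_invasion": "absent"},
    {"tumor_size_cm": "4.1", "pleural_invasion": "absent"},  # one wrong descriptor
]
ratio = exact_match_ratio(predictions, gold)
print(ratio)
```

Scoring per report is stricter than scoring per descriptor: a single wrong field fails the whole report, which is why the second report above does not count.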
Affiliation(s)
- Hyeongmin Cho
  - ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea
- Sooyoung Yoo
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Borham Kim
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sowon Jang
  - Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Leonard Sunwoo
  - Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sanghwan Kim
  - ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea
- Donghyoung Lee
  - ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea
- Seok Kim
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sejin Nam
  - ezCaretech Research & Development Center, Jung-gu, Seoul, Republic of Korea
- Jin-Haeng Chung
  - Department of Pathology, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Pathology and Translational Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
6
Wang L, Wen A, Fu S, Ruan X, Huang M, Li R, Lu Q, Williams AE, Liu H. Adoption of the OMOP CDM for Cancer Research using Real-world Data: Current Status and Opportunities. medRxiv 2024:2024.08.23.24311950. [PMID: 39228725 PMCID: PMC11370549 DOI: 10.1101/2024.08.23.24311950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Background: The Observational Medical Outcomes Partnership (OMOP) common data model (CDM), developed and maintained by the Observational Health Data Sciences and Informatics (OHDSI) community, supports large-scale cancer research by enabling distributed network analysis. As the number of studies using the OMOP CDM for cancer research increases, there is a growing need for an overview of the scope of cancer research that relies on the OMOP CDM ecosystem.
Objectives: In this study, we present a comprehensive review of the adoption of the OMOP CDM for cancer research and offer some insights on opportunities in leveraging the OMOP CDM ecosystem for advancing cancer research.
Materials and Methods: Published literature databases were searched to retrieve OMOP CDM and cancer-related English language articles published between January 2010 and December 2023. A charting form was developed for two main themes, i.e., clinically focused data analysis studies and infrastructure development studies in the cancer domain.
Results: In total, 50 unique articles were included, with 30 for the data analysis theme and 23 for the infrastructure theme, with 3 articles belonging to both themes. The topics covered by the existing body of research were depicted.
Conclusion: By depicting the status quo of research efforts to improve or leverage the potential of the OMOP CDM ecosystem for advancing cancer research, we identify challenges and opportunities surrounding data analysis and infrastructure, including data quality, adoption of advanced analytics methodology, inclusion of in-depth phenotypic data through NLP, and multisite evaluation.
Affiliation(s)
- Liwei Wang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Andrew Wen
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Xiaoyang Ruan
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Ming Huang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Rui Li
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Qiuhao Lu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Andrew E Williams
  - Clinical and Translational Science Institute, Tufts Medical Center, Boston, US
  - Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, US
- Hongfang Liu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
7
Gholipour M, Khajouei R, Amiri P, Hajesmaeel Gohari S, Ahmadian L. Extracting cancer concepts from clinical notes using natural language processing: a systematic review. BMC Bioinformatics 2023; 24:405. [PMID: 37898795 PMCID: PMC10613366 DOI: 10.1186/s12859-023-05480-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/13/2022] [Accepted: 09/13/2023] [Indexed: 10/30/2023] Open
Abstract
BACKGROUND: Extracting information from free text using natural language processing (NLP) can save time and reduce the effort of manually extracting large quantities of data from highly complex clinical notes of cancer patients. This study aimed to systematically review studies that used NLP methods to identify cancer concepts from clinical notes automatically.
METHODS: PubMed, Scopus, Web of Science, and Embase were searched for English language papers using a combination of terms concerning "Cancer", "NLP", "Coding", and "Registries" until June 29, 2021. Two reviewers independently assessed the eligibility of papers for inclusion in the review.
RESULTS: Most of the reported software programs used for concept extraction were developed by the researchers themselves (n = 7). Rule-based algorithms were the most frequently used algorithms for developing these programs. In most articles, the criteria of accuracy (n = 14) and sensitivity (n = 12) were used to evaluate the algorithms. In addition, Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) and Unified Medical Language System (UMLS) were the most commonly used terminologies to identify concepts. Most studies focused on breast cancer (n = 4, 19%) and lung cancer (n = 4, 19%).
CONCLUSION: The use of NLP for extracting the concepts and symptoms of cancer has increased in recent years. Rule-based algorithms are popular among developers. Because of these algorithms' high accuracy and sensitivity in identifying and extracting cancer concepts, we suggest that future studies use them to extract the concepts of other diseases as well.
Affiliation(s)
- Maryam Gholipour
  - Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran
- Reza Khajouei
  - Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran
- Parastoo Amiri
  - Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran
- Sadrieh Hajesmaeel Gohari
  - Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Leila Ahmadian
  - Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran
8
Berloco F, Ciavarella S, Colucci S, Grieco LA, Guarini A, Zaccaria GM. ARGO 2.0: a Hybrid NLP/ML Framework for Diagnosis Standardization. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083100 DOI: 10.1109/embc40787.2023.10340022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
A relevant problem in medicine is the standardization of the diagnosis associated with a clinical case. Although diagnosis formulation is an intrinsically subjective and uncertain process, its standardization may benefit from digital solutions that automate the routines underlying such a decision. In this work, we propose ARGO 2.0: a framework for the development of decision support systems for diagnosis formulation. The framework can read free-text reports and store their clinically relevant information as personalized electronic Case Report Forms. A hybrid strategy, exploiting the synergy of Natural Language Processing and Machine Learning techniques, is used to automatically suggest a diagnosis in a standardized fashion. ARGO 2.0 has been designed to be template-independent and easily tailored to specific medical fields. Here we demonstrate its feasibility in hemo-lymphopathology by detailing its implementation, which is the object of an ongoing validation campaign in a standing medical institute. ARGO 2.0 achieved an average accuracy of 95.07%, an average precision of 94.85%, an average recall of 96.31%, and an F-score of 95.32% on the test set, outperforming both of its embedded components, the one based on Natural Language Processing and the one based on Machine Learning.
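To make the reported averages concrete, the following sketch computes accuracy together with precision, recall, and F-score averaged over classes. Macro-averaging (an unweighted mean of per-class scores) is an assumption here; the abstract does not say how its averages were taken. The function name `macro_metrics` and the diagnosis labels "A" and "B" are hypothetical.

```python
from typing import Dict, List


def macro_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Overall accuracy plus macro-averaged precision, recall, and F-score."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }


# Hypothetical labels, for illustration only.
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

Under macro-averaging, the averaged F-score is the mean of the per-class F-scores, so it need not equal the harmonic mean of the averaged precision and recall.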
9
Barr B, Harasemiw O, Gibson IW, Tremblay-Savard O, Tangri N. The Development of a Comprehensive Clinicopathologic Registry for Glomerular Diseases Using Natural Language Processing. Can J Kidney Health Dis 2023; 10:20543581231178963. [PMID: 37342151 PMCID: PMC10278432 DOI: 10.1177/20543581231178963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 04/22/2023] [Indexed: 06/22/2023] Open
Abstract
Background: Glomerulonephritis (GN) represents a common cause of chronic kidney disease, and treatment to slow or prevent progression of GN is associated with significant morbidity. Large patient registries have improved the understanding of risk stratification, treatment selection, and definitions of treatment response in GN, but can be resource-intensive, with incomplete patient capture.
Objective: To describe the creation of a comprehensive clinicopathologic registry for all patients undergoing kidney biopsy in Manitoba, using natural language processing software for data extraction from pathology reports, as well as to describe cohort characteristics and outcomes.
Design: Retrospective population-based cohort study.
Setting: Tertiary care center in the province of Manitoba.
Patients: All patients undergoing a kidney biopsy in the province of Manitoba from 2002 to 2019.
Measurements: Descriptive statistics are presented for the most common glomerular diseases, along with outcomes of kidney failure and mortality for the individual diseases.
Methods: Data from native kidney biopsy reports from January 2002 to December 2019 were extracted into a structured database using a natural language processing algorithm employing regular expressions. The pathology database was then linked with population-level clinical, laboratory, and medication data, creating a comprehensive clinicopathologic registry. Kaplan-Meier curves and Cox models were constructed to assess the relationship between type of GN and outcomes of kidney failure and mortality.
Results: Of 2421 available biopsies, 2103 individuals were linked to administrative data, of which 1292 had a common glomerular disease. The incidence of yearly biopsies increased almost 3-fold over the study period. Among common glomerular diseases, immunoglobulin A (IgA) nephropathy was the most common (28.6%), whereas infection-related GN had the highest proportions of kidney failure (70.3%) and all-cause mortality (42.3%). Predictors of kidney failure included urine albumin-to-creatinine ratio at the time of biopsy (adjusted hazard ratio [HR] = 1.43, 95% confidence interval [CI] = 1.24-1.65), whereas predictors of mortality included age at the time of biopsy (adjusted HR = 1.05, 95% CI = 1.04-1.06) and infection-related GN (adjusted HR = 1.85, 95% CI = 1.14-2.99, compared with the reference category of IgA nephropathy).
Limitations: Retrospective, single-center study with a relatively small number of biopsies.
Conclusions: Creation of a comprehensive glomerular diseases registry is feasible and can be facilitated through the use of novel data extraction methods. This registry will facilitate further epidemiological research in GN.
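The regular-expression approach named in the Methods can be pictured with a minimal sketch that turns a free-text biopsy report into one structured row. The patterns, field names, and sample report below are all hypothetical and written for illustration; they are not the rules used in the study, and real reports would need site-specific patterns and validation.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns, one per structured field.
PATTERNS = {
    "glomeruli_total": re.compile(r"(\d+)\s+glomeruli", re.IGNORECASE),
    "globally_sclerosed": re.compile(
        r"(\d+)\s+(?:are\s+)?globally\s+sclero(?:sed|tic)", re.IGNORECASE
    ),
    "diagnosis": re.compile(
        r"diagnosis:\s*(.+?)(?:\.|$)", re.IGNORECASE | re.MULTILINE
    ),
}


def extract_fields(report: str) -> Dict[str, Optional[str]]:
    """Return one structured row; a field is None when its pattern finds nothing."""
    row: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        row[name] = match.group(1).strip() if match else None
    return row


# Hypothetical report text, for illustration only.
report = (
    "Light microscopy: 18 glomeruli, of which 4 are globally sclerosed.\n"
    "Diagnosis: IgA nephropathy."
)
row = extract_fields(report)
print(row)
```

Returning None for unmatched fields keeps missing data explicit, which matters when the extracted rows are later linked to other data sources.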
Affiliation(s)
- Bryce Barr
  - Department of Internal Medicine, University of Manitoba, Winnipeg, Canada
  - Chronic Disease Innovation Centre, Seven Oaks General Hospital, Winnipeg, MB, Canada
- Oksana Harasemiw
  - Department of Internal Medicine, University of Manitoba, Winnipeg, Canada
  - Chronic Disease Innovation Centre, Seven Oaks General Hospital, Winnipeg, MB, Canada
- Ian W Gibson
  - Department of Pathology, University of Manitoba, Winnipeg, Canada
  - Shared Health Services Manitoba, Winnipeg, Canada
- Navdeep Tangri
  - Department of Internal Medicine, University of Manitoba, Winnipeg, Canada
  - Chronic Disease Innovation Centre, Seven Oaks General Hospital, Winnipeg, MB, Canada
10
Keloth VK, Banda JM, Gurley M, Heider PM, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, Patterson OV, Peng Y, Raja K, Reeves RM, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei WQ, Williams AE, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. J Biomed Inform 2023; 142:104343. [PMID: 36935011 PMCID: PMC10428170 DOI: 10.1016/j.jbi.2023.104343] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/25/2022] [Revised: 01/21/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]
Abstract
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.
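The ETL step described in this abstract, loading NLP output from clinical notes into OMOP CDM tables, might look roughly like the sketch below. It uses an in-memory SQLite database and a simplified subset of the columns of the CDM's NOTE_NLP table; the abstract itself does not name the target table, so treating NOTE_NLP as the destination is an assumption. The helper names, the NLP system string, and the sample mentions are hypothetical, and concept ID 0 is used only as the unmapped placeholder.

```python
import sqlite3
from typing import Dict, List


def create_note_nlp(conn: sqlite3.Connection) -> None:
    """Create a simplified NOTE_NLP table (a subset of the CDM columns)."""
    conn.execute(
        """
        CREATE TABLE note_nlp (
            note_nlp_id INTEGER PRIMARY KEY,
            note_id INTEGER NOT NULL,
            snippet TEXT,
            lexical_variant TEXT NOT NULL,
            note_nlp_concept_id INTEGER,
            nlp_system TEXT,
            nlp_date TEXT NOT NULL,
            term_exists TEXT
        )
        """
    )


def load_mentions(conn: sqlite3.Connection, note_id: int,
                  mentions: List[Dict[str, object]],
                  nlp_system: str, nlp_date: str) -> int:
    """Insert one row per extracted mention and return the number of rows loaded."""
    rows = [
        (
            note_id,
            m["snippet"],
            m["lexical_variant"],
            m.get("concept_id", 0),  # 0 = unmapped placeholder
            nlp_system,
            nlp_date,
            "N" if m.get("negated", False) else "Y",
        )
        for m in mentions
    ]
    conn.executemany(
        "INSERT INTO note_nlp (note_id, snippet, lexical_variant, "
        "note_nlp_concept_id, nlp_system, nlp_date, term_exists) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
create_note_nlp(conn)
# Hypothetical NLP output for a single note.
mentions = [
    {"snippet": "history of lung adenocarcinoma",
     "lexical_variant": "lung adenocarcinoma"},
    {"snippet": "no evidence of metastasis",
     "lexical_variant": "metastasis", "negated": True},
]
inserted = load_mentions(conn, note_id=1, mentions=mentions,
                         nlp_system="example-nlp 0.1", nlp_date="2024-01-01")
affirmed = conn.execute(
    "SELECT lexical_variant FROM note_nlp WHERE term_exists = 'Y'"
).fetchall()
print(inserted, affirmed)
```

Storing negation in `term_exists` lets a downstream query separate affirmed findings from negated ones without re-reading the note text.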
Affiliation(s)
- Vipina K Keloth
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Michael Gurley
  - Lurie Cancer Center, Northwestern University, Chicago, Illinois, USA
- Paul M Heider
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Georgina Kennedy
  - Ingham Institute for Applied Medical Research, Sydney, Australia
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Feifan Liu
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Timothy Miller
  - Computational Health Informatics Program, Boston Children's Hospital, and Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Olga V Patterson
  - VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Verily Life Sciences, Mountain View, CA, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Kalpana Raja
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Ruth M Reeves
  - TN Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Masoud Rouhizadeh
  - Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD, USA
- Jianlin Shi
  - VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
- Xiaoyan Wang
  - Sema4 Mount Sinai Genomics Incorporation, Stamford, CT, USA
- Yanshan Wang
  - Department of Health Information Management, Department of Biomedical Informatics, and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | | | - Rui Zhang
- Institute for Health Informatics, and Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
| | | | | | - Clair Blacketer
- Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Patrick Ryan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA
| | - George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
| | - Noémie Elhadad
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
| | - Hua Xu
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.
11
|
Kempf E, Vaterkowski M, Leprovost D, Griffon N, Ouagne D, Breant S, Serre P, Mouchet A, Rance B, Chatellier G, Bellamine A, Frank M, Guerin J, Tannier X, Livartowski A, Hilka M, Daniel C. How to Improve Cancer Patients ENrollment in Clinical Trials From rEal-Life Databases Using the Observational Medical Outcomes Partnership Oncology Extension: Results of the PENELOPE Initiative in Urologic Cancers. JCO Clin Cancer Inform 2023; 7:e2200179. [PMID: 37167578 DOI: 10.1200/cci.22.00179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023] Open
Abstract
PURPOSE To compare the computability of Observational Medical Outcomes Partnership (OMOP)-based queries related to prescreening of patients using two versions of the OMOP common data model (CDM; v5.3 and v5.4) and to assess the performance of the Greater Paris University Hospital (APHP) prescreening tool. MATERIALS AND METHODS We identified the information items relevant for prescreening patients with cancer. We randomly selected 15 academic and industry-sponsored urology phase I-IV clinical trials (CTs) launched at APHP between 2016 and 2021. The computability of the related prescreening criteria (PC) was defined by their translation rate into OMOP-compliant queries and by their execution rate on the APHP clinical data warehouse (CDW), which contains data on 205,977 patients with cancer. The overall performance of the prescreening tool was assessed by the rate of true- and false-positive cases in three randomly selected CTs. RESULTS We defined a list of 15 minimal information items relevant for patient prescreening. We identified 83 PC among the 534 eligibility criteria from the 15 CTs. We translated 33 and 62 PC into queries on the basis of OMOP CDM v5.3 and v5.4, respectively (translation rates of 40% and 75%, respectively). Of the 33 PC translated with v5.3 of the OMOP CDM, 19 could be executed on the APHP CDW (execution rate of 58%). Of the 83 PC, the computability rate on the APHP CDW reached 23%. On the basis of three CTs, we identified 17, 32, and 63 patients as potentially eligible for inclusion in those CTs, resulting in positive predictive values of 53%, 41%, and 21%, respectively. CONCLUSION We showed that PC could be formalized according to the OMOP CDM and that the oncology extension increased their translation rate through better representation of cancer natural history.
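The rates reported in this abstract can be reproduced from the stated counts. The short sketch below is only a reader's check: the variable names are illustrative, and reading the 23% computability rate as 19 executed criteria out of all 83 is an assumption that happens to match the reported figure.

```python
def rate(numerator, denominator):
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


prescreening_criteria = 83   # PC identified among the 534 eligibility criteria
translated_v53 = 33          # PC translated with OMOP CDM v5.3
translated_v54 = 62          # PC translated with OMOP CDM v5.4
executed_v53 = 19            # v5.3 queries that ran on the APHP CDW

translation_rate_v53 = rate(translated_v53, prescreening_criteria)  # 40
translation_rate_v54 = rate(translated_v54, prescreening_criteria)  # 75
execution_rate_v53 = rate(executed_v53, translated_v53)             # 58
computability_rate = rate(executed_v53, prescreening_criteria)      # 23

print(translation_rate_v53, translation_rate_v54,
      execution_rate_v53, computability_rate)
```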
Affiliation(s)
- Emmanuelle Kempf
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Department of Medical Oncology, Assistance Publique Hôpitaux de Paris, Henri Mondor Teaching Hospital, Créteil, France
| | - Morgan Vaterkowski
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- EPITA School of Engineering and Computer Science, Paris, France
| | - Damien Leprovost
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Nicolas Griffon
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - David Ouagne
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Stéphane Breant
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Patricia Serre
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Alexandre Mouchet
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Bastien Rance
- Department of Medical Informatics, Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, France
| | - Gilles Chatellier
- Department of Medical Informatics, Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, France
| | - Ali Bellamine
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Marie Frank
- Department of Medical Information, Paris Saclay Teaching Hospital, Assistance Publique Hôpitaux de Paris, Paris, France
| | | | - Xavier Tannier
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
| | | | - Martin Hilka
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
| | - Christel Daniel
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
12
|
Seong D, Choi YH, Shin SY, Yi BK. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Med Inform Decis Mak 2023; 23:28. [PMID: 36750932 PMCID: PMC9903463 DOI: 10.1186/s12911-023-02121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
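Sequence-labeling models such as the bidirectional LSTM with conditional random fields mentioned above typically emit one tag per token, which then has to be grouped into labeled spans. The sketch below shows that grouping step under the assumption of a BIO tagging scheme. The abstract does not state which scheme the authors used, and the label names here are hypothetical, loosely following the finding categories it lists.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans.

    An I- tag that does not continue the current span is treated as outside.
    """
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; flush the previous one if there is one.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


tokens = ["A", "5", "mm", "sessile", "polyp", "in", "the", "sigmoid", "colon"]
tags = ["O", "B-SIZE", "I-SIZE", "B-SHAPE", "B-LESION",
        "O", "O", "B-LOCATION", "I-LOCATION"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Entity-level F1 scores such as those reported for lesions, locations, and sizes are usually computed by comparing spans like these against the manually annotated ones.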
Affiliation(s)
- Donghyeong Seong
- Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, 06355 Republic of Korea
| | - Yoon Ho Choi
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea
| | - Soo-Yong Shin
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea; Research Institute for Future Medicine, Samsung Medical Center, Seoul, 06351 Republic of Korea
| | - Byoung-Kee Yi
- Department of Artificial Intelligence Convergence, Kangwon National University, 1 Kangwondaehak-Gil, Chuncheon-si, Gangwon-do, 24341, Republic of Korea.
13
|
Vuokko R, Vakkuri A, Palojoki S. Systematized Nomenclature of Medicine-Clinical Terminology (SNOMED CT) Clinical Use Cases in the Context of Electronic Health Record Systems: Systematic Literature Review. JMIR Med Inform 2023; 11:e43750. [PMID: 36745498 PMCID: PMC9941898 DOI: 10.2196/43750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 12/05/2022] [Accepted: 12/22/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND The Systematized Nomenclature of Medicine-Clinical Terminology (SNOMED CT) is a clinical terminology system that provides a standardized and scientifically validated way of representing clinical information captured by clinicians. It can be integrated into electronic health records (EHRs) to increase the possibilities for effective data use and ensure a better quality of documentation that supports continuity of care, thus enabling better quality in the care process. Even though SNOMED CT consists of extensively studied clinical terminology, previous research has repeatedly documented a lack of scientific evidence for SNOMED CT in the form of reported clinical use cases in electronic health record systems. OBJECTIVE The aim of this study was to explore evidence in previous literature reviews of clinical use cases of SNOMED CT integrated into EHR systems or other clinical applications during the last 5 years of continued development. The study sought to identify the main clinical use purposes, use phases, and key clinical benefits documented in SNOMED CT use cases. METHODS The Cochrane review protocol was applied for the study design. The application of the protocol was modified step-by-step to fit the research problem by first defining the search strategy, identifying the articles for the review by isolating the exclusion and inclusion criteria for assessing the search results, and lastly, evaluating and summarizing the review results. RESULTS In total, 17 research articles illustrating SNOMED CT clinical use cases were reviewed. The use purpose of SNOMED CT was documented in all the articles, with the terminology as a standard in EHR being the most common (8/17). The clinical use phase was documented in all the articles. The most common category of use phases was SNOMED CT in development (6/17). Core benefits achieved by applying SNOMED CT in a clinical context were identified by the researchers. These were related to terminology use outcomes, that is, to data quality in general or to enabling a consistent way of indexing, storing, retrieving, and aggregating clinical data (8/17). Additional benefits were linked to the productivity of coding or to advances in the quality and continuity of care. CONCLUSIONS While the SNOMED CT use categories were well supported by previous research, this review demonstrates that further systematic research on clinical use cases is needed to promote the scalability of the review results. To get the most out of use case reports, more emphasis is suggested on describing the contextual factors, such as the electronic health care system and the use of previous frameworks to enable comparability of results. A lesson to be drawn from our study is that SNOMED CT is essential for structuring clinical data; however, research is needed to gather more evidence of how SNOMED CT benefits clinical care and patient safety.
Affiliation(s)
- Riikka Vuokko
- Unit for Digitalization and Management, Ministry of Social Affairs and Health, Helsinki, Finland
| | - Anne Vakkuri
- Perioperative, Intensive Care and Pain Medicine, Helsinki University Hospital, Vantaa, Finland
| | - Sari Palojoki
- Unit for Digital Transformation, European Centre for Disease Prevention and Control, Stockholm, Sweden
14
|
López-Úbeda P, Martín-Noguerol T, Aneiros-Fernández J, Luna A. Natural Language Processing in Pathology: Current Trends and Future Insights. Am J Pathol 2022; 192:1486-1495. [PMID: 35985480 DOI: 10.1016/j.ajpath.2022.07.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/07/2022] [Revised: 07/21/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Natural language processing (NLP) plays a key role in advancing health care because it enables the extraction of structured information from electronic health reports. In the last decade, several advances in the field of pathology have been derived from the application of NLP to pathology reports. Herein, a comprehensive review of the most used NLP methods for extracting, coding, and organizing information from pathology reports is presented, including how the development of tools is used to improve workflow. In addition, this article discusses, from a practical point of view, the steps necessary to extract data and encode natural language information for its analytical processing, ranging from preprocessing of text to its inclusion in complex algorithms. Finally, the potential of NLP-based automatic solutions for improving workflow in pathology and their further applications in the near future is highlighted.
Affiliation(s)
| | | | | | - Antonio Luna
- MRI Unit, Radiology Department, HT Medica, Jaén, Spain
15
|
Yoo S, Yoon E, Boo D, Kim B, Kim S, Paeng JC, Yoo IR, Choi IY, Kim K, Ryoo HG, Lee SJ, Song E, Joo YH, Kim J, Lee HY. Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model. Appl Clin Inform 2022; 13:521-531. [PMID: 35705182 PMCID: PMC9200482 DOI: 10.1055/s-0042-1748144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND Cancer staging information is an essential component of cancer research. However, the information is primarily stored as either a full or semistructured free-text clinical document, which limits data use. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establishing multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date. OBJECTIVE We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports. METHODS Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data. RESULTS The extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% of papillary carcinoma, 0.9 to 3.7% of follicular carcinoma, 0.04 to 0.54% of adenocarcinoma, 0.17 to 0.81% of medullary carcinoma, and 0 to 0.3% of anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%. CONCLUSION As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.
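To make "rule-based NLP" for stage-related modifiers concrete, the sketch below pulls a TNM expression out of free text with a single regular expression. It is an illustration only: the pattern, the function name `extract_tnm`, and the sample sentences are hypothetical, and the authors' actual rules are not described in the abstract.

```python
import re

# Hypothetical pattern for expressions such as "pT1a N0 M0" or "T3N1b".
TNM_PATTERN = re.compile(
    r"\bp?T(?P<t>[0-4][ab]?|x)\s*N(?P<n>[0-3][ab]?|x)(?:\s*M(?P<m>[01x]))?",
    re.IGNORECASE,
)


def extract_tnm(report_text):
    """Return the first T, N, and M values found, or None if no match."""
    match = TNM_PATTERN.search(report_text)
    if match is None:
        return None
    # Lowercase the captured values; M is optional and may be missing.
    return {key: (value.lower() if value else None)
            for key, value in match.groupdict().items()}


print(extract_tnm("Papillary carcinoma, pT1a N0 M0."))
print(extract_tnm("Follicular carcinoma, T3N1b, margins clear."))
print(extract_tnm("No staging given."))
```

Extracted values like these would still need to be mapped to standard concepts before they could be loaded into the CDM's oncology extension tables.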
Affiliation(s)
- Sooyoung Yoo
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Eunsil Yoon
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Dachung Boo
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Borham Kim
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Seok Kim
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Jin Chul Paeng
- Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
| | - Ie Ryung Yoo
- Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
| | - In Young Choi
- Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea; Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
| | - Kwangsoo Kim
- Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea
| | - Hyun Gee Ryoo
- Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea; Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
| | - Sun Jung Lee
- Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea; Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
| | - Eunhye Song
- Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea
| | - Young-Hwan Joo
- Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea
| | - Junmo Kim
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea
| | - Ho-Young Lee
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea; Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
16
|
Arvisais-Anhalt S, Lehmann CU, Bishop JA, Balani J, Boutte L, Morales M, Park JY, Araj E. Searching Full-Text Anatomic Pathology Reports Using Business Intelligence Software. J Pathol Inform 2022; 13:100014. [PMID: 35251753 PMCID: PMC8892022 DOI: 10.1016/j.jpi.2022.100014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 12/07/2021] [Indexed: 01/24/2023] Open
Abstract
Although the laboratory information system has largely solved the problem of storing anatomic pathology reports and disseminating their contents across the healthcare system, the retrospective query of anatomic pathology reports remains an area for improvement across laboratory information system vendors. Our institution desired the ability to query our repository of anatomic pathology reports for clinical, operational, research, and educational purposes. To address this need, we developed a full-text anatomic pathology search tool using the business intelligence software, Tableau. Our search tool allows users to query the 333,685 anatomic pathology reports from our institutional clinical relational database using the business intelligence tool's built-in regular expression functionality. Users securely access the search tool using any web browser, thereby avoiding the cost of installing or maintaining software on users' computers. This tool is laboratory information system vendor agnostic and as many institutions already subscribe to business intelligence software, we believe this solution could be easily reproduced at other institutions and in other clinical departments.
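The core idea, querying report text with regular expressions, can be sketched outside any business intelligence product. The example below uses Python's standard `re` module rather than Tableau's built-in functions, and the report identifiers, texts, and the helper `search_reports` are all hypothetical.

```python
import re


def search_reports(reports, pattern):
    """Return the IDs of reports whose text matches the pattern (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [report_id for report_id, text in reports.items()
            if regex.search(text)]


# Toy stand-in for a repository of anatomic pathology reports.
reports = {
    "AP-001": "Skin, left arm, biopsy: basal cell carcinoma.",
    "AP-002": "Colon, sigmoid, polypectomy: tubular adenoma.",
    "AP-003": "Breast, right, core biopsy: invasive ductal carcinoma.",
}

hits = search_reports(reports, r"\b(ductal|basal cell)\s+carcinoma\b")
print(hits)
```

A production tool would run the equivalent match inside the database or BI layer so that the full set of reports never has to be loaded into memory.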
Affiliation(s)
- Simone Arvisais-Anhalt
- Department of Hospital Medicine and Department of Laboratory Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Christoph U. Lehmann
- Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Justin A. Bishop
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jyoti Balani
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Laurie Boutte
- Health System Quality & Operational Excellence, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Marjorie Morales
- Health System Quality & Operational Excellence, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jason Y. Park
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Ellen Araj
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA. Corresponding author at: Department of Pathology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390-9072, USA.
17
|
Zaccaria GM, Colella V, Colucci S, Clemente F, Pavone F, Vegliante MC, Esposito F, Opinto G, Scattone A, Loseto G, Minoia C, Rossini B, Quinto AM, Angiulli V, Grieco LA, Fama A, Ferrero S, Moia R, Di Rocco A, Quaglia FM, Tabanelli V, Guarini A, Ciavarella S. Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology. Sci Rep 2021; 11:23823. [PMID: 34893665 PMCID: PMC8664934 DOI: 10.1038/s41598-021-03204-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/01/2021] [Accepted: 11/23/2021] [Indexed: 12/04/2022] Open
Abstract
The unstructured nature of Real-World (RW) data from onco-hematological patients and the scarce accessibility to integrated systems restrain the use of RW information for research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into standardized electronic health records. We exploited NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology) to recognize information from pathology reports and populate electronic case report forms (eCRFs) pre-implemented by REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. 326 (98.2%) reports were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) identification report number (all metrics > 90%), (2) biopsy date (all metrics > 90% in both series), (3) specimen type (86.6% and 91.4% of A, 98.5% and 100.0% of P, 92.5% and 95.5% of F, and 87.2% and 91.4% of R for internal and external series, respectively), (4) diagnosis (100% of P with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
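For readers less familiar with the four metrics quoted above, the sketch below shows how accuracy, precision, recall, and F1-score follow from confusion-matrix counts, and checks the reported conversion rate of 326 reports out of the 239 internal plus 93 external reports. The counts passed to `classification_metrics` are hypothetical and are not taken from the ARGO evaluation.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Conversion rate from the abstract: 326 of 239 + 93 reports.
conversion_rate = round(100 * 326 / (239 + 93), 1)  # 98.2

# Hypothetical counts: no false positives, so precision is 100% even though
# some items are missed and recall stays at 90%.
metrics = classification_metrics(tp=90, fp=0, fn=10, tn=0)
print(conversion_rate, metrics)
```

The hypothetical counts illustrate one way a result like "100% precision with accuracy, recall, and F1 around 90%" can arise: the system rarely asserts a wrong value but occasionally fails to extract one.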
Affiliation(s)
- Gian Maria Zaccaria
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy.
| | - Vito Colella
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
| | - Simona Colucci
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
| | - Felice Clemente
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Fabio Pavone
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Maria Carmela Vegliante
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Flavia Esposito
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy; Department of Mathematics, University of Bari Aldo Moro, Bari, Italy
| | - Giuseppina Opinto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Anna Scattone
- Pathology Department, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
| | - Giacomo Loseto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Carla Minoia
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Bernardo Rossini
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Angela Maria Quinto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Vito Angiulli
- Clinical Engineering Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
| | - Luigi Alfredo Grieco
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
| | - Angelo Fama
- Hematology, Azienda USL - IRCCS Di Reggio Emilia, Reggio Emilia, Italy
| | - Simone Ferrero
- Division of Hematology 1, AOU "Città Della Salute e Della Scienza di Torino", Torino, Italy; Department of Molecular Biotechnologies and Health Sciences, University of Torino, Torino, Italy
| | - Riccardo Moia
- Division of Hematology, Azienda Ospedaliero-Universitaria Maggiore Della Carità Di Novara, Novara, Italy
| | - Alice Di Rocco
- Unit of Hematology, Azienda Ospedaliero-Universitaria Policlinico Umberto I, Roma, Italy
| | | | - Valentina Tabanelli
- Division of Diagnostic Haematopathology, European Institute of Oncology, IRCCS, Milano, Italy
| | - Attilio Guarini
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
| | - Sabino Ciavarella
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
18
|
Lamer A, Abou-Arab O, Bourgeois A, Parrot A, Popoff B, Beuscart JB, Tavernier B, Moussa MD. Transforming Anesthesia Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study. J Med Internet Res 2021; 23:e29259. [PMID: 34714250 PMCID: PMC8590192 DOI: 10.2196/29259] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/26/2021] [Revised: 06/14/2021] [Accepted: 07/05/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Electronic health records (EHRs, such as those created by an anesthesia management system) generate a large amount of data that can notably be reused for clinical audits and scientific research. The sharing of these data and tools is generally affected by the lack of system interoperability. To overcome these issues, Observational Health Data Sciences and Informatics (OHDSI) developed the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize EHR data and promote large-scale observational and longitudinal research. Anesthesia data have not previously been mapped into the OMOP CDM. OBJECTIVE The primary objective was to transform anesthesia data into the OMOP CDM. The secondary objective was to provide vocabularies, queries, and dashboards that might promote the exploitation and sharing of anesthesia data through the CDM. METHODS Using our local anesthesia data warehouse, a group of 5 experts from 5 different medical centers identified local concepts related to anesthesia. The concepts were then matched with standard concepts in the OHDSI vocabularies. We performed structural mapping between the design of our local anesthesia data warehouse and the OMOP CDM tables and fields. To validate the implementation of anesthesia data into the OMOP CDM, we developed a set of queries and dashboards. RESULTS We identified 522 concepts related to anesthesia care. They were classified as demographics, units, measurements, operating room steps, drugs, periods of interest, and features. After semantic mapping, 353 (67.7%) of these anesthesia concepts were mapped to OHDSI concepts. Further, 169 (32.3%) concepts related to periods and features were added to the OHDSI vocabularies. Then, 8 OMOP CDM tables were implemented with anesthesia data and 2 new tables (EPISODE and FEATURE) were added to store secondarily computed data. We integrated data from 572,609 operations and provided the code for a set of 8 queries and 4 dashboards related to anesthesia care. CONCLUSIONS Generic data concerning demographics, drugs, units, measurements, and operating room steps were already available in OHDSI vocabularies. However, most of the intraoperative concepts (the duration of specific steps, an episode of hypotension, etc.) were not present in OHDSI vocabularies. The OMOP mapping provided here enables anesthesia data reuse.
Affiliation(s)
- Antoine Lamer
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- InterHop, Paris, France
- Univ. Lille, Faculté Ingénierie et Management de la Santé, Lille, France
| | - Osama Abou-Arab
- Department of Anaesthesiology and Critical Care Medicine, Amiens Picardie University Hospital, Amiens, France
| | - Alexandre Bourgeois
- Department of Anesthesiology and Critical Care Medicine, Regional University Hospital of Nancy, Nancy, France
| | | | - Benjamin Popoff
- Department of Anaesthesiology and Critical Care, Rouen University Hospital, Rouen, France
| | - Jean-Baptiste Beuscart
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
| | - Benoît Tavernier
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- Department of Anesthesiology and Critical Care, CHU Lille, Lille, France