1
Wang L, Wen A, Fu S, Ruan X, Huang M, Li R, Lu Q, Lyu H, Williams AE, Liu H. A scoping review of OMOP CDM adoption for cancer research using real world data. NPJ Digit Med 2025; 8:189. [PMID: 40189628 PMCID: PMC11973147 DOI: 10.1038/s41746-025-01581-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Accepted: 03/23/2025] [Indexed: 04/09/2025] Open
Abstract
The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) supports large-scale research by enabling distributed network analyses. However, the breadth of its adoption in cancer research is not well understood. We conducted a scoping review to describe the adoption of the OMOP CDM in cancer research. A total of 49 unique articles were included in the review, with 30 on the data analysis theme, and 20 on the infrastructure theme. This review highlighted that while the OMOP CDM ecosystem has enabled successful data support for cancer research, particularly for collaborative studies, ongoing model development and iterative improvement remain needed to fulfill additional research data needs. Expanding disease sites, specifically for rare cancers, integrating more diverse types of data sources, improving data quality, adopting advanced analytics methodology, and increasing multisite evaluations serve as important opportunities to facilitate secondary usage of observational data in future cancer research.
Affiliation(s)
- Liwei Wang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Andrew Wen
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaoyang Ruan
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ming Huang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Rui Li
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Qiuhao Lu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Heather Lyu
  - Department of Surgical Oncology, Division of Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Andrew E Williams
  - Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA
  - Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Hongfang Liu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
2
Lee S, Hong N, Kim GS, Li J, Lin X, Seager S, Shin S, Kim KJ, Bae JH, You SC, Rhee Y, Kim SG. Digital Phenotyping of Rare Endocrine Diseases Across International Data Networks and the Effect of Granularity of Original Vocabulary. Yonsei Med J 2025; 66:187-194. [PMID: 39999994 PMCID: PMC11865875 DOI: 10.3349/ymj.2023.0628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 07/03/2024] [Accepted: 08/12/2024] [Indexed: 02/27/2025] Open
Abstract
PURPOSE Rare diseases occur in <50 per 100000 people and require lifelong management. However, essential epidemiological data on such diseases are lacking, and a consecutive monitoring system across time and regions remains to be established. Standardized digital phenotypes are required to leverage an international data network for research on rare endocrine diseases. We developed digital phenotypes for rare endocrine diseases using the Observational Medical Outcomes Partnership common data model. MATERIALS AND METHODS Digital phenotypes of three rare endocrine diseases (medullary thyroid cancer, hypoparathyroidism, pheochromocytoma/paraganglioma) were validated across three databases that use different vocabularies: Severance Hospital's electronic health record from South Korea; IQVIA's United Kingdom (UK) database for general practitioners; and IQVIA's United States (US) hospital database for general hospitals. We estimated the performance of different digital phenotyping methods based on International Classification of Diseases (ICD)-10 in the UK and the US or Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) in Korea. RESULTS The positive predictive value of digital phenotyping was higher using SNOMED CT-based phenotyping than ICD-10-based phenotyping for all three diseases in Korea (e.g., pheochromocytoma/paraganglioma: ICD-10, 58%-62%; SNOMED CT, 89%). Estimated incidence rates by digital phenotyping were as follows: medullary thyroid cancer, 0.34-2.07 (Korea), 0.13-0.30 (US); hypoparathyroidism, 0.40-1.20 (Korea), 0.59-1.01 (US), 0.00-1.78 (UK); and pheochromocytoma/paraganglioma, 0.95-1.67 (Korea), 0.35-0.77 (US), 0.00-0.49 (UK). CONCLUSION Our findings demonstrate the feasibility of developing digital phenotyping of rare endocrine diseases and highlight the importance of implementing SNOMED CT in routine clinical practice to provide granularity for research.
Affiliation(s)
- Seunghyun Lee
  - Department of Internal Medicine, Endocrine Research Institute, Yonsei University College of Medicine, Seoul, Korea
  - Department of Internal Medicine, Yonsei University Wonju College of Medicine, Wonju, Korea
- Namki Hong
  - Department of Internal Medicine, Endocrine Research Institute, Yonsei University College of Medicine, Seoul, Korea
  - Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Korea
- Gyu Seop Kim
  - Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Korea
- Jing Li
  - Real-World Solutions, IQVIA, Durham, USA
- Xiaoyu Lin
  - Real-World Solutions, IQVIA, Durham, USA
- Sungjae Shin
  - Department of Internal Medicine, Endocrine Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Kyoung Jin Kim
  - Department of Internal Medicine, Korea University College of Medicine, Seoul, Korea
- Jae Hyun Bae
  - Department of Internal Medicine, Korea University Anam Hospital, Seoul, Korea
  - Department of Internal Medicine, Seoul National University Hospital, Seoul, Korea
- Seng Chan You
  - Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Korea
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
- Yumie Rhee
  - Department of Internal Medicine, Endocrine Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Sin Gon Kim
  - Department of Internal Medicine, Korea University College of Medicine, Seoul, Korea
3
Jeon K, Park WY, Kahn CE, Nagy P, You SC, Yoon SH. Advancing Medical Imaging Research Through Standardization: The Path to Rapid Development, Rigorous Validation, and Robust Reproducibility. Invest Radiol 2025; 60:1-10. [PMID: 38985896 DOI: 10.1097/rli.0000000000001106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Artificial intelligence (AI) has made significant advances in radiology. Nonetheless, challenges in AI development, validation, and reproducibility persist, primarily due to the lack of high-quality, large-scale, standardized data across the world. Addressing these challenges requires comprehensive standardization of medical imaging data and seamless integration with structured medical data. Developed by the Observational Health Data Sciences and Informatics community, the OMOP Common Data Model enables large-scale international collaborations with structured medical data. It ensures syntactic and semantic interoperability, while supporting the privacy-protected distribution of research across borders. The recently proposed Medical Imaging Common Data Model is designed to encompass all DICOM-formatted medical imaging data and integrate imaging-derived features with clinical data, ensuring their provenance. The harmonization of medical imaging data and its seamless integration with structured clinical data at a global scale will pave the way for advanced AI research in radiology. This standardization will enable federated learning, ensuring privacy-preserving collaboration across institutions and promoting equitable AI through the inclusion of diverse patient populations. Moreover, it will facilitate the development of foundation models trained on large-scale, multimodal datasets, serving as powerful starting points for specialized AI applications. Objective and transparent algorithm validation on a standardized data infrastructure will enhance reproducibility and interoperability of AI systems, driving innovation and reliability in clinical applications.
Affiliation(s)
- Kyulee Jeon
  - From the Department of Biomedical Systems Informatics, Yonsei University, Seoul, South Korea (K.J., S.C.Y.); Institution for Innovation in Digital Healthcare, Yonsei University, Seoul, South Korea (K.J., S.C.Y.); Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD (W.Y.P., P.N.); Department of Radiology, University of Pennsylvania, Philadelphia, PA (C.E.K.); and Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, South Korea (S.H.Y.)
4
Lastrucci A, Wandael Y, Barra A, Ricci R, Pirrera A, Lepri G, Gulino RA, Miele V, Giansanti D. Revolutionizing Radiology with Natural Language Processing and Chatbot Technologies: A Narrative Umbrella Review on Current Trends and Future Directions. J Clin Med 2024; 13:7337. [PMID: 39685793 DOI: 10.3390/jcm13237337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 11/18/2024] [Accepted: 11/26/2024] [Indexed: 12/18/2024] Open
Abstract
The application of chatbots and NLP in radiology is an emerging field, currently characterized by a growing body of research. An umbrella review has been proposed utilizing a standardized checklist and quality control procedure for including scientific papers. This review explores the early developments and potential future impact of these technologies in radiology. The current literature, comprising 15 systematic reviews, highlights potentialities, opportunities, areas needing improvements, and recommendations. This umbrella review offers a comprehensive overview of the current landscape of natural language processing (NLP) and natural language models (NLMs), including chatbots, in healthcare. These technologies show potential for improving clinical decision-making, patient engagement, and communication across various medical fields. However, significant challenges remain, particularly the lack of standardized protocols, which raises concerns about the reliability and consistency of these tools in different clinical contexts. Without uniform guidelines, variability in outcomes may hinder the broader adoption of NLP/NLM technologies by healthcare providers. Moreover, the limited research on how these technologies intersect with medical devices (MDs) is a notable gap in the literature. Future research must address these challenges to fully realize the potential of NLP/NLM applications in healthcare. Key future research directions include the development of standardized protocols to ensure the consistent and safe deployment of NLP/NLM tools, particularly in high-stake areas like radiology. Investigating the integration of these technologies with MD workflows will be crucial to enhance clinical decision-making and patient care. Ethical concerns, such as data privacy, informed consent, and algorithmic bias, must also be explored to ensure responsible use in clinical settings. 
Longitudinal studies are needed to evaluate the long-term impact of these technologies on patient outcomes, while interdisciplinary collaboration between healthcare professionals, data scientists, and ethicists is essential for driving innovation in an ethically sound manner. Addressing these areas will advance the application of NLP/NLM technologies and improve patient care in this emerging field.
Affiliation(s)
- Andrea Lastrucci
  - Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Yannick Wandael
  - Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Angelo Barra
  - Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Renzo Ricci
  - Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Graziano Lepri
  - Azienda Unità Sanitaria Locale Umbria 1, Via Guerriero Guerra 21, 06127 Perugia, Italy
- Rosario Alfio Gulino
  - Facoltà di Ingegneria, Università di Tor Vergata, Via del Politecnico, 1, 00133 Rome, Italy
- Vittorio Miele
  - Department of Experimental Clinical and Biomedical Sciences, University of Florence, 50134 Florence, Italy
  - Department of Radiology, Careggi University Hospital, 50134 Florence, Italy
5
Wang L, Wen A, Fu S, Ruan X, Huang M, Li R, Lu Q, Williams AE, Liu H. Adoption of the OMOP CDM for Cancer Research using Real-world Data: Current Status and Opportunities. medRxiv 2024:2024.08.23.24311950. [PMID: 39228725 PMCID: PMC11370549 DOI: 10.1101/2024.08.23.24311950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Background The Observational Medical Outcomes Partnership (OMOP) common data model (CDM), developed and maintained by the Observational Health Data Sciences and Informatics (OHDSI) community, supports large-scale cancer research by enabling distributed network analysis. As the number of studies using the OMOP CDM for cancer research increases, there is a growing need for an overview of the scope of cancer research that relies on the OMOP CDM ecosystem. Objectives In this study, we present a comprehensive review of the adoption of the OMOP CDM for cancer research and offer some insights on opportunities in leveraging the OMOP CDM ecosystem for advancing cancer research. Materials and Methods Published literature databases were searched to retrieve OMOP CDM and cancer-related English language articles published between January 2010 and December 2023. A charting form was developed for two main themes, i.e., clinically focused data analysis studies and infrastructure development studies in the cancer domain. Results In total, 50 unique articles were included, with 30 for the data analysis theme and 23 for the infrastructure theme, with 3 articles belonging to both themes. The topics covered by the existing body of research were depicted. Conclusion By depicting the status quo of research efforts to improve or leverage the potential of the OMOP CDM ecosystem for advancing cancer research, we identify challenges and opportunities surrounding data analysis and infrastructure, including data quality, advanced analytics methodology adoption, in-depth phenotypic data inclusion through NLP, and multisite evaluation.
Affiliation(s)
- Liwei Wang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Andrew Wen
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Xiaoyang Ruan
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Ming Huang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Rui Li
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Qiuhao Lu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
- Andrew E Williams
  - Clinical and Translational Science Institute, Tufts Medical Center, Boston, US
  - Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, US
- Hongfang Liu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
6
Loor-Torres R, Duran M, Toro-Tobon D, Mateo Chavez M, Ponce O, Soto Jacome C, Segura Torres D, Algarin Perneth S, Montori V, Golembiewski E, Borras Osorio M, Fan JW, Singh Ospina N, Wu Y, Brito JP. A Systematic Review of Natural Language Processing Methods and Applications in Thyroidology. Mayo Clin Proc Digit Health 2024; 2:270-279. [PMID: 38938930 PMCID: PMC11210322 DOI: 10.1016/j.mcpdig.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
This study aimed to review the application of natural language processing (NLP) in thyroid-related conditions and to summarize current challenges and potential future directions. We performed a systematic search of databases for studies describing NLP applications in thyroid conditions published in English between January 1, 2012 and November 4, 2022. In addition, we used a snowballing technique to identify studies missed in the initial search or published after our search timeline until April 1, 2023. For included studies, we extracted the NLP method (eg, rule-based, machine learning, deep learning, or hybrid), NLP application (eg, identification, classification, and automation), thyroid condition (eg, thyroid cancer, thyroid nodule, and functional or autoimmune disease), data source (eg, electronic health records, health forums, medical literature databases, or genomic databases), performance metrics, and stages of development. We identified 24 eligible NLP studies focusing on thyroid-related conditions. Deep learning-based methods were the most common (38%), followed by rule-based (21%), and traditional machine learning (21%) methods. Thyroid nodules (54%) and thyroid cancer (29%) were the primary conditions under investigation. Electronic health records were the dominant data source (17/24, 71%), with imaging reports being the most frequently used (15/17, 88%). There is increasing interest in NLP applications for thyroid-related studies, mostly addressing thyroid nodules and using deep learning-based methodologies with limited external validation. However, none of the reviewed NLP applications have reached clinical practice. Several limitations, including inconsistent clinical documentation and model portability, need to be addressed to promote the evaluation and implementation of NLP applications to support patient care in thyroidology.
Affiliation(s)
- Mayra Duran
  - Knowledge and Evaluation Research Unit, Mayo Clinic, Rochester, MN
- David Toro-Tobon
  - Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Mayo Clinic, Rochester, MN
- Oscar Ponce
  - University of Edinburgh, Edinburgh, Scotland, United Kingdom
- Danny Segura Torres
  - Knowledge and Evaluation Research Unit, Mayo Clinic, Rochester, MN
  - University of Edinburgh, Edinburgh, Scotland, United Kingdom
  - Respiratory, Cardiovascular, and Renal Pathobiology and Bioengineering, Universitat de Barcelona, Spain
- Victor Montori
  - Knowledge and Evaluation Research Unit, Mayo Clinic, Rochester, MN
- Jungwei W. Fan
  - Montefiore Health Center, Albert Einstein College of Medicine, New York, NY
- Naykky Singh Ospina
  - Department of Medicine, and Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
  - Division of Endocrinology, Department of Medicine, University of Florida, Gainesville, FL
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL
- Juan P. Brito
  - Knowledge and Evaluation Research Unit, Mayo Clinic, Rochester, MN
  - Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Mayo Clinic, Rochester, MN
7
Lee DY, Kim N, Park C, Gan S, Son SJ, Park RW, Park B. Explainable multimodal prediction of treatment-resistance in patients with depression leveraging brain morphometry and natural language processing. Psychiatry Res 2024; 334:115817. [PMID: 38430816 DOI: 10.1016/j.psychres.2024.115817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 02/19/2024] [Accepted: 02/23/2024] [Indexed: 03/05/2024]
Abstract
Although 20% of patients with depression receiving treatment do not achieve remission, predicting treatment-resistant depression (TRD) remains challenging. In this study, we aimed to develop an explainable multimodal prediction model for TRD using structured electronic medical record data, brain morphometry, and natural language processing. In total, 247 patients with a new depressive episode were included. TRD-predictive models were developed based on combinations of the following parameters: selected tabular dataset features, independent components-map weightings from brain T1-weighted magnetic resonance imaging (MRI), and topic probabilities from clinical notes. All models applied the extreme gradient boosting (XGBoost) algorithm via five-fold cross-validation. The model using all data sources showed the highest area under the receiver operating characteristic curve, 0.794, followed by models that used combined brain MRI and structured data, brain MRI and clinical notes, clinical notes and structured data, brain MRI only, structured data only, and clinical notes only (0.770, 0.762, 0.728, 0.703, 0.684, and 0.569, respectively). Classifications of TRD were driven by several predictors, such as previous exposure to antidepressants and antihypertensive medications, sensorimotor network, default mode network, and somatic symptoms. Our findings suggest that a combination of clinical data with neuroimaging and natural language processing variables improves the prediction of TRD.
Affiliation(s)
- Dong Yun Lee
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea; Department of Medical Sciences, Graduate School of Ajou University, Suwon, South Korea
- Narae Kim
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea; Department of Biomedical Sciences, Graduate School of Ajou University, Suwon, South Korea
- ChulHyoung Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea; Department of Medical Sciences, Graduate School of Ajou University, Suwon, South Korea
- Sujin Gan
  - Department of Biomedical Sciences, Graduate School of Ajou University, Suwon, South Korea
- Sang Joon Son
  - Department of Psychiatry, Ajou University School of Medicine, Suwon, South Korea
- Rae Woong Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea; Department of Biomedical Sciences, Graduate School of Ajou University, Suwon, South Korea
- Bumhee Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea; Office of Biostatistics, Medical Research Collaborating Center, Ajou Research Institute for Innovative Medicine, Ajou University Medical Center, Suwon, South Korea
8
Bhatt S, Johnson PC, Markovitz NH, Gray T, Nipp RD, Ufere N, Rice J, Reynolds MJ, Lavoie MW, Clay MA, Lindvall C, El-Jawahri A. The Use of Natural Language Processing to Assess Social Support in Patients With Advanced Cancer. Oncologist 2022; 28:165-171. [PMID: 36427022 PMCID: PMC9907037 DOI: 10.1093/oncolo/oyac238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 10/12/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Data examining associations among social support, survival, and healthcare utilization are lacking in patients with advanced cancer. METHODS We conducted a cross-sectional secondary analysis using data from a prospective longitudinal cohort study of 966 hospitalized patients with advanced cancer at Massachusetts General Hospital from 2014 through 2017. We used NLP to identify extent of patients' social support (limited versus adequate as defined by NLP-aided review of the Electronic Health Record (EHR)). Two independent coders achieved a Kappa of 0.90 (95% CI: 0.84-1.00) using NLP. Using multivariable regression models, we examined associations of social support with: 1) OS; 2) death or readmission within 90 days of hospital discharge; 3) time to readmission within 90 days; and 4) hospital length of stay (LOS). RESULTS Patients' median age was 65 (range: 21-92) years, and a plurality had gastrointestinal (GI) cancer (34.3%) followed by lung cancer (19.5%). 6.2% (60/966) of patients had limited social support. In multivariable analyses, limited social support was not significantly associated with OS (HR = 1.13, P = 0.390), death or readmission (OR = 1.18, P = 0.578), time to readmission (HR = 0.92, P = 0.698), or LOS (β = -0.22, P = 0.726). We identified a potential interaction suggesting cancer type (GI cancer versus other) may be an effect modifier of the relationship between social support and OS (interaction term P = 0.053). In separate unadjusted analyses, limited social support was associated with lower OS (HR = 2.10, P = 0.008) in patients with GI cancer but not other cancer types (HR = 1.00, P = 0.991). CONCLUSION We used NLP to assess the extent of social support in patients with advanced cancer. We did not identify significant associations of social support with OS or healthcare utilization but found cancer type may be an effect modifier of the relationship between social support and OS. 
These findings underscore the potential utility of NLP for evaluating social support in patients with advanced cancer.
Affiliation(s)
- P Connor Johnson
  - Corresponding author: P. Connor Johnson, MD, Massachusetts General Hospital Cancer Center, 55 Fruit St., Yawkey 9A, Boston, MA 02114, USA. Tel: +1 617 724 4000; Fax: +1 617 724 1135; E-mail:
- Netana H Markovitz
  - Department of Medicine, Division of Hematology & Oncology, Massachusetts General Hospital, Boston, MA, USA
- Tamryn Gray
  - Harvard Medical School, Boston, MA, USA; Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, USA
- Ryan D Nipp
  - Department of Medicine, Division of Hematology & Oncology, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Nneka Ufere
  - Harvard Medical School, Boston, MA, USA; Division of Gastroenterology, Department of Medicine, Massachusetts General Hospital, Brigham and Women’s Hospital, Boston, MA, USA
- Julia Rice
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Matthew J Reynolds
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Mitchell W Lavoie
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Madison A Clay
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
9
The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria. Sci Data 2022; 9:490. [PMID: 35953524 PMCID: PMC9372145 DOI: 10.1038/s41597-022-01521-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/01/2022] [Accepted: 06/28/2022] [Indexed: 11/08/2022] Open
Abstract
Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. To identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of automatically converting such criteria into database queries. However, they must first be trained and evaluated using corpora that capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work. Measurement(s): Clinical Trial Eligibility Criteria. Technology Type(s): natural language processing. Sample Characteristic - Organism: Homo sapiens.
10
A Privacy-Preserving and Standard-Based Architecture for Secondary Use of Clinical Data. Information 2022. [DOI: 10.3390/info13020087] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023] Open
Abstract
The heterogeneity of the formats and standards of clinical data, which include structured, semi-structured, and unstructured data, together with the sensitive information they contain, requires specific approaches and methodologies for extracting the valuable information buried in such data. Although many challenges in processing and reusing this information have not been fully addressed, the most recent techniques based on machine learning and big data analytics can support the information extraction process for the secondary use of clinical data. In particular, these techniques can facilitate the transformation of heterogeneous data into a common standard format. Moreover, they can also be exploited to define anonymization or pseudonymization approaches that respect the privacy requirements stated in the General Data Protection Regulation, the Health Insurance Portability and Accountability Act, and other national and regional laws. In fact, compliance with these laws requires that only de-identified clinical and personal data be processed for secondary analyses, in particular when data are shared or exchanged across different institutions. This work proposes a modular architecture capable of collecting clinical data from heterogeneous sources and transforming them into useful data for secondary uses, such as research, governance, and medical education purposes. The proposed architecture can exploit appropriate modules and algorithms, carry out the transformations (pseudonymization and standardization) required to use data for secondary purposes, and provide efficient tools to facilitate the retrieval and analysis processes. Preliminary experimental tests show good accuracy in terms of quantitative evaluations.
|
11
|
Lee DY, Kim C, Lee S, Son SJ, Cho SM, Cho YH, Lim J, Park RW. Psychosis Relapse Prediction Leveraging Electronic Health Records Data and Natural Language Processing Enrichment Methods. Front Psychiatry 2022; 13:844442. [PMID: 35479497 PMCID: PMC9037331 DOI: 10.3389/fpsyt.2022.844442] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/28/2021] [Accepted: 03/09/2022] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Identifying patients at a high risk of psychosis relapse is crucial for early interventions. A relevant psychiatric clinical context is often recorded in clinical notes; however, the utilization of unstructured data remains limited. This study aimed to develop psychosis-relapse prediction models using various types of clinical notes and structured data. METHODS Clinical data were extracted from the electronic health records of the Ajou University Medical Center in South Korea. The study population included patients with psychotic disorders, and outcome was psychosis relapse within 1 year. Using only structured data, we developed an initial prediction model, then three natural language processing (NLP)-enriched models using three types of clinical notes (psychological tests, admission notes, and initial nursing assessment) and one complete model. Latent Dirichlet Allocation was used to cluster the clinical context into similar topics. All models applied the least absolute shrinkage and selection operator logistic regression algorithm. We also performed an external validation using another hospital database. RESULTS A total of 330 patients were included, and 62 (18.8%) experienced psychosis relapse. Six predictors were used in the initial model and 10 additional topics from Latent Dirichlet Allocation processing were added in the enriched models. The model derived from all notes showed the highest value of the area under the receiver operating characteristic (AUROC = 0.946) in the internal validation, followed by models based on the psychological test notes, admission notes, initial nursing assessments, and structured data only (0.902, 0.855, 0.798, and 0.784, respectively). The external validation was performed using only the initial nursing assessment note, and the AUROC was 0.616. CONCLUSIONS We developed prediction models for psychosis relapse using the NLP-enrichment method. Models using clinical notes were more effective than models using only structured data, suggesting the importance of unstructured data in psychosis prediction.
Affiliation(s)
- Dong Yun Lee
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
- Chungsoo Kim
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
- Seongwon Lee
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
- Sang Joon Son
  - Department of Psychiatry, Ajou University School of Medicine, Suwon, South Korea
- Sun-Mi Cho
  - Department of Psychiatry, Ajou University School of Medicine, Suwon, South Korea
- Yong Hyuk Cho
  - Department of Psychiatry, Ajou University School of Medicine, Suwon, South Korea
- Jaegyun Lim
  - Department of Laboratory Medicine, Myongji Hospital, Hanyang University College of Medicine, Goyang, South Korea
- Rae Woong Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
|
12
|
Park C, You SC, Jeon H, Jeong CW, Choi JW, Park RW. Development and Validation of the Radiology Common Data Model (R-CDM) for the International Standardization of Medical Imaging Data. Yonsei Med J 2022; 63:S74-S83. [PMID: 35040608 PMCID: PMC8790584 DOI: 10.3349/ymj.2022.63.s74] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/08/2021] [Revised: 10/28/2021] [Accepted: 10/31/2021] [Indexed: 12/02/2022] Open
Abstract
PURPOSE Digital Imaging and Communications in Medicine (DICOM), a standard file format for medical imaging data, contains metadata describing each file. However, metadata are often incomplete, and there is no standardized format for recording metadata, leading to inefficiency during the metadata-based data retrieval process. Here, we propose a novel standardization method for DICOM metadata termed the Radiology Common Data Model (R-CDM). MATERIALS AND METHODS R-CDM was designed to be compatible with Health Level Seven International (HL7)/Fast Healthcare Interoperability Resources (FHIR) and linked with the Observational Medical Outcomes Partnership (OMOP)-CDM to achieve a seamless link between clinical data and medical imaging data. The terminology system was standardized using the RadLex playbook, a comprehensive lexicon of radiology. As a proof of concept, the R-CDM conversion process was conducted with 41.7 TB of data from the Ajou University Hospital. The R-CDM database visualizer was developed to visualize the main characteristics of the R-CDM database. RESULTS Information from 2,801,360 cases and 87,203,226 DICOM files was organized into two tables constituting the R-CDM. Information on imaging device and image resolution was recorded with more than 99.9% accuracy. Furthermore, OMOP-CDM and R-CDM were linked to efficiently extract specific types of images from specific patient cohorts. CONCLUSION R-CDM standardizes the structure and terminology for recording medical imaging data to eliminate incomplete and unstandardized information. Successful standardization was achieved by the extract, transform, and load process and image classifier. We hope that the R-CDM will contribute to deep learning research in the medical imaging field by enabling the securement of large-scale medical imaging data from multinational institutions.
Affiliation(s)
- ChulHyoung Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
- Seng Chan You
  - Department of Preventive Medicine, Yonsei University College of Medicine, Seoul, Korea
- Hokyun Jeon
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
- Chang Won Jeong
  - Medical Convergence Research Center, Wonkwang University, Iksan, Korea
- Jin Wook Choi
  - Department of Radiology, Ajou University Medical Center, Suwon, Korea
- Rae Woong Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Korea
|
13
|
Almeida JR, Silva JF, Matos S, Oliveira JL. A two-stage workflow to extract and harmonize drug mentions from clinical notes into observational databases. J Biomed Inform 2021; 120:103849. [PMID: 34214696 DOI: 10.1016/j.jbi.2021.103849] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/02/2020] [Revised: 06/04/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND The content of the clinical notes that are continuously collected throughout patients' health history has the potential to provide relevant information about treatments and diseases, and to increase the value of the structured data available in Electronic Health Records (EHR) databases. EHR databases are currently being used in observational studies, which lead to important findings in medical and biomedical sciences. However, the information present in clinical notes is not being used in those studies, since the computational analysis of this unstructured data is much more complex than that of structured data. METHODS We propose a two-stage workflow to close an existing gap in Extraction, Transformation and Loading (ETL) procedures for observational databases. The first stage of the workflow extracts the prescriptions present in patients' clinical notes, while the second stage harmonises the extracted information into its standard definition and stores the result in a common database schema used in observational studies. RESULTS We validated this methodology using two distinct data sets, in which the goal was to extract and store drug-related information in a new Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. We analysed the performance of the annotator used as well as its limitations. Finally, we described some practical examples of how users can explore these datasets once migrated to OMOP CDM databases. CONCLUSION With this methodology, we were able to show a strategy for using the information extracted from the clinical notes in business intelligence tools, or for other applications such as data exploration through SQL queries. In addition, the extracted information complements the data in OMOP CDM databases with information that was not directly available in the EHR database.
Affiliation(s)
- João Rafael Almeida
  - DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain
- Sérgio Matos
  - DETI/IEETA, University of Aveiro, Aveiro, Portugal
|