1
Pedrera-Jiménez M, García-Barrio N, Rubio-Mayo P, Tato-Gómez A, Cruz-Bermúdez JL, Bernal-Sobrino JL, Muñoz-Carrero A, Serrano-Balazote P. TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse. Methods Inf Med 2022; 61:e89-e102. [PMID: 36220109 PMCID: PMC9788916 DOI: 10.1055/s-0042-1757763] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
BACKGROUND During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often black boxes, in which clinical researchers do not know how the data were recorded, extracted, and transformed. To solve this, extract, transform, and load (ETL) processes must be based on transparent, homogeneous, and formal methodologies that make them understandable, reproducible, and auditable. OBJECTIVES This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. METHODS The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations through the paradigm of Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file in XML. RESULTS First, four international projects were analyzed to identify 17 operations necessary to obtain datasets from the EHR according to the specifications of these projects. Each data operation was then formalized using the ISO 13606 reference model, specifying the valid data types of its arguments, inputs, and outputs, and their cardinality. Next, an agnostic catalog of data operations was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
CONCLUSIONS This study provides a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, because of the abstraction carried out in this study, any previous EHR reuse methodology can incorporate these results.
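As a rough illustration of stage (4), the following minimal Python sketch reads an XML configuration file and instantiates a chain of data operations from a catalog. It is not the TransformEHRs schema or catalog, which the abstract does not specify: the element and attribute names (`etl`, `operation`, `name`, `input`, `output`, `min`, `max`), the two operations (`filter_range`, `select_latest`), and the record layout are all hypothetical, chosen only to show how a declarative configuration can wire named inputs and outputs between reusable operations.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List

# Hypothetical record layout: {"patient": str, "time": int, "value": float}.
Record = Dict[str, Any]
Operation = Callable[[List[Record], Dict[str, str]], List[Record]]


def filter_range(records: List[Record], params: Dict[str, str]) -> List[Record]:
    """Keep records whose value lies within [min, max], inclusive."""
    low, high = float(params["min"]), float(params["max"])
    return [r for r in records if low <= r["value"] <= high]


def select_latest(records: List[Record], params: Dict[str, str]) -> List[Record]:
    """For each patient, keep only the record with the latest timestamp."""
    latest: Dict[str, Record] = {}
    for r in records:
        current = latest.get(r["patient"])
        if current is None or r["time"] > current["time"]:
            latest[r["patient"]] = r
    return [latest[p] for p in sorted(latest)]


# Hypothetical catalog mapping operation names to implementations.
OPERATIONS: Dict[str, Operation] = {
    "filter_range": filter_range,
    "select_latest": select_latest,
}

# Attributes that wire the pipeline; every other attribute is a parameter.
WIRING = {"name", "input", "output"}


def run_etl(config_xml: str, datasets: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """Apply each <operation> in document order, storing its output by name."""
    results = dict(datasets)  # shallow copy: the caller's dict is not modified
    for op in ET.fromstring(config_xml).findall("operation"):
        func = OPERATIONS[op.get("name")]
        params = {k: v for k, v in op.attrib.items() if k not in WIRING}
        results[op.get("output")] = func(results[op.get("input")], params)
    return results


# Hypothetical configuration; element and attribute names are illustrative.
CONFIG = """<etl>
  <operation name="filter_range" input="creatinine" output="creatinine_valid" min="0.1" max="20"/>
  <operation name="select_latest" input="creatinine_valid" output="creatinine_last"/>
</etl>"""

DATA = {
    "creatinine": [
        {"patient": "p1", "time": 1, "value": 1.1},
        {"patient": "p1", "time": 2, "value": 95.0},  # implausible, filtered out
        {"patient": "p1", "time": 3, "value": 1.4},
        {"patient": "p2", "time": 1, "value": 0.9},
    ]
}
```

With this sample data, `run_etl(CONFIG, DATA)["creatinine_last"]` returns the time-3 record for `p1` and the time-1 record for `p2`. Keeping the operations in a catalog and the wiring in XML means a new dataset specification changes only the configuration file, which is the kind of transparency and reuse the abstract describes.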
Affiliation(s)
- Miguel Pedrera-Jiménez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain; ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain. Address for correspondence: Miguel Pedrera-Jiménez, Eng, MSc, Health Informatics Department, Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain
- Noelia García-Barrio
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Paula Rubio-Mayo
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Alberto Tato-Gómez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Juan Luis Cruz-Bermúdez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- José Luis Bernal-Sobrino
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Pablo Serrano-Balazote
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
2
Pedrera-Jiménez M, García-Barrio N, Cruz-Rojo J, Terriza-Torres AI, López-Jiménez EA, Calvo-Boyero F, Jiménez-Cerezo MJ, Blanco-Martínez AJ, Roig-Domínguez G, Cruz-Bermúdez JL, Bernal-Sobrino JL, Serrano-Balazote P, Muñoz-Carrero A. Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models. J Biomed Inform 2021; 115:103697. [PMID: 33548541 PMCID: PMC7857038 DOI: 10.1016/j.jbi.2021.103697] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/11/2020] [Revised: 12/18/2020] [Accepted: 02/01/2021] [Indexed: 10/27/2022]
Abstract
BACKGROUND COVID-19 ranks as the single largest health incident worldwide in decades. In such a scenario, electronic health records (EHRs) should provide a timely response to healthcare needs and to data uses that go beyond direct medical care, known as secondary uses, which include biomedical research. However, each data analysis initiative usually defines its own information model in line with its requirements. These specifications share clinical concepts but differ in format and recording criteria, which creates redundant data entry across multiple electronic data capture systems (EDCs), with the consequent investment of effort and time by the organization. OBJECTIVE This study sought to design and implement a flexible methodology based on detailed clinical models (DCM) that would enable EHRs generated in a tertiary hospital to be reused effectively, without loss of meaning and within a short time. MATERIAL AND METHODS The proposed methodology comprises four stages: (1) specification of an initial set of relevant variables for COVID-19; (2) modeling and formalization of clinical concepts using the ISO 13606 standard and the SNOMED CT and LOINC terminologies; (3) definition of transformation rules to generate secondary use models from standardized EHRs, and their implementation in the R language; and (4) implementation and validation of the methodology through the generation of the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC-WHO) COVID-19 case report form. The methodology was implemented in a 1300-bed tertiary hospital for a cohort of 4489 patients hospitalized from 25 February 2020 to 10 September 2020. RESULTS An initial, expandable set of relevant concepts for COVID-19 was identified, modeled, and formalized using the ISO 13606 standard and the SNOMED CT and LOINC terminologies.
Similarly, an algorithm was designed and implemented in R and then applied to process EHRs in accordance with the standardized concepts, transforming them into secondary use models. Lastly, these resources were applied to obtain a data extract conforming to the ISARIC-WHO COVID-19 case report form, without requiring manual data collection. The methodology populated the observation domain of this model with coverage of over 85% of patients for most concepts. CONCLUSION This study provides a solution to the difficulty of rapidly and efficiently obtaining EHR-derived data for secondary use in COVID-19, one that can adapt to changes in data specifications and is applicable to other organizations and other health conditions. This initial validation indicates that the DCM-based methodology allows the effective reuse of EHRs generated in a tertiary hospital during the COVID-19 pandemic, with no additional effort or time for the organization and with a greater data scope than that yielded by conventional manual data collection in ad hoc EDCs.
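The abstract does not define how the per-concept coverage figure was computed. One plausible reading is the share of cohort patients with at least one recorded value for a concept; the minimal Python sketch below computes that under this assumption. The function name `concept_coverage`, the concept names, and the toy cohort are hypothetical and are not data from the study.

```python
from typing import Dict, Iterable, List, Set, Tuple


def concept_coverage(cohort: Set[str], observations: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of cohort patients with at least one observation of each concept.

    Assumed definition, not taken from the paper. Repeated observations for
    the same patient count once; patients outside the cohort are ignored;
    concepts with no observations at all do not appear in the result.
    """
    patients_by_concept: Dict[str, Set[str]] = {}
    for patient, concept in observations:
        if patient in cohort:
            patients_by_concept.setdefault(concept, set()).add(patient)
    return {c: 100.0 * len(p) / len(cohort) for c, p in patients_by_concept.items()}


# Toy data (hypothetical): (patient identifier, concept name) pairs.
cohort = {"p1", "p2", "p3", "p4"}
observations = [
    ("p1", "fever"), ("p2", "fever"), ("p2", "fever"),  # p2 counted once
    ("p3", "fever"), ("p1", "oxygen_saturation"),
    ("p9", "fever"),  # not in the cohort, ignored
]
coverage = concept_coverage(cohort, observations)
# coverage == {"fever": 75.0, "oxygen_saturation": 25.0}

# Concepts exceeding an 85% threshold (none in this toy example).
above_threshold: List[str] = sorted(c for c, v in coverage.items() if v > 85.0)
```

Counting distinct patients rather than observations keeps frequently measured concepts from appearing to have higher coverage than rarely measured ones.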
Affiliation(s)
- Miguel Pedrera-Jiménez
- Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain; ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain.
- Jaime Cruz-Rojo
- Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain.
- Adolfo Muñoz-Carrero
- Digital Health Research Dept., Instituto de Salud Carlos III, Av. de Monforte de Lemos, 5, 28029 Madrid, Spain.
3
Provencio M, Torrente M, Calvo V, Gutiérrez L, Pérez-Callejo D, Pérez-Barrios C, Barquín M, Royuela A, Rodriguez-Alfonso B, Sotelo M, Cruz-Bermúdez JL, Mendez M, Cruz-Bermúdez A, Romero A. Dynamic circulating tumor DNA quantificaton for the individualization of non-small-cell lung cancer patients treatment. Oncotarget 2017; 8:60291-60298. [PMID: 28947971 PMCID: PMC5601139 DOI: 10.18632/oncotarget.20016] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/06/2017] [Accepted: 07/25/2017] [Indexed: 12/04/2022]
Abstract
Background Liquid biopsy has evolved from a promising line of research into a validated approach for biomarker testing. However, its utility for individualizing therapy has scarcely been reported. In this study, we show how monitoring levels of EGFR mutation in plasma can be useful for individualizing treatment. Results Longitudinal plasma EGFR mutation levels always correlated with tumor response as ascertained by RECIST criteria. Moreover, decreasing EGFR mutation levels were detected in all patients benefiting from locoregional radiotherapy, whereas the opposite occurred when a patient progressed soon after radiotherapy. Similarly, increasing EGFR mutation levels anticipated disease progression after TKI dose reduction, treatment discontinuation, or reduced bioavailability due to drug interactions. In addition, EGFR mutation levels were useful for monitoring the outcome of new therapies and were a decisive factor when the patient's clinical situation did not correlate with the response ascertained by the radiologist. Finally, our results indicate that cancer-associated body fluids (pleural, pericardial, or cerebrospinal fluid) are a suitable source for biomarker testing, extending EGFR mutation detection to biofluids other than blood. Materials and Methods A total of 180 serial plasma samples from 18 non-small-cell lung cancer patients carrying an activating EGFR mutation were analyzed by digital PCR. Conclusions Monitoring plasma EGFR mutation levels helps resolve doubts that frequently arise in daily clinical practice and constitutes a major step towards personalized medicine.
Affiliation(s)
- Mariano Provencio
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- María Torrente
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Virginia Calvo
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Lourdes Gutiérrez
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- David Pérez-Callejo
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Clara Pérez-Barrios
- Molecular Oncology Laboratory, Biomedical Sciences Research Institute, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Miguel Barquín
- Molecular Oncology Laboratory, Biomedical Sciences Research Institute, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Ana Royuela
- Biostatistics Department, Biomedical Sciences Research Institute, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Begoña Rodriguez-Alfonso
- Nuclear Medicine Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Miguel Sotelo
- Medical Oncology Department, Hospital Infanta Cristina, Parla, Spain
- Juan Luis Cruz-Bermúdez
- Information Technologies Department, Hospital Universidad Politécnica de Madrid, Madrid, Spain
- Miriam Mendez
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Alberto Cruz-Bermúdez
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
- Atocha Romero
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain; Molecular Oncology Laboratory, Biomedical Sciences Research Institute, Hospital Universitario Puerta de Hierro-Majadahonda, Majadahonda, Spain
4
Provencio M, Pérez-Callejo D, Torrente M, Martin P, Calvo V, Gutiérrez L, Franco F, Coronado MJ, Cruz-Bermúdez JL, Ruiz-Valdepeñas AM, Cruz-Bermúdez A, Sánchez-Beato M, Romero A, García-Grande A. Concordance between circulating tumor cells and clinical status during follow-up in anaplastic lymphoma kinase (ALK) non-small-cell lung cancer patients. Oncotarget 2017; 8:59408-59416. [PMID: 28938646 PMCID: PMC5601742 DOI: 10.18632/oncotarget.19722] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/21/2017] [Accepted: 06/29/2017] [Indexed: 12/29/2022]
Abstract
Background Anaplastic lymphoma kinase (ALK) rearrangements are found in approximately 5% of non-small-cell lung cancers (NSCLCs). However, liquid biopsy as a diagnostic tool is less developed in these cases. This study investigates the use of circulating tumor cells (CTCs) during treatment, together with extended follow-up, to correlate CTC counts with clinical evolution. Patients and Methods Of a cohort of 212 patients with lung adenocarcinoma, 13 (6%) presented ALK rearrangements confirmed by tumor biopsy. A total of 60 serial blood samples were collected from these patients, who were prospectively enrolled in the study. Results All patients had a positive CTC count at baseline (mean = 3). The median follow-up was 9 months (range 1-17 months). Three patients underwent surgery, and their CTC counts decreased after the procedure but remained detectable. After radiotherapy, 3 cases showed an average decrease of 5 CTCs. A total of 6 patients were treated with ALK inhibitors; a partial response was observed in 3 of them, who also presented decreased CTC counts. The other 3 patients presented primary resistance, and their CTC counts were higher than those obtained prior to progression. Conclusion We believe that CTC counting may be a reliable technique for dynamic monitoring of NSCLC with ALK rearrangement and for detecting disease persistence or recurrence. CTC counts may also be useful for monitoring the efficacy of ALK inhibitors, facilitating detection of treatment resistance.
Affiliation(s)
- Mariano Provencio
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- David Pérez-Callejo
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- María Torrente
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Paloma Martin
- Pathology Department, Molecular Section, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Virginia Calvo
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Lourdes Gutiérrez
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Fernando Franco
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Maria José Coronado
- Confocal Microscopy Core Facility, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Juan Luis Cruz-Bermúdez
- Information Technologies Department, Hospital Universitario Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain; Universidad Politécnica de Madrid, Madrid, Spain
- Asunción Martín Ruiz-Valdepeñas
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Alberto Cruz-Bermúdez
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Margarita Sánchez-Beato
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Atocha Romero
- Medical Oncology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain
- Aránzazu García-Grande
- Flow Cytometry Core Facility, Hospital Universitario Puerta de Hierro-Majadahonda, Biomedical Sciences Research Institute Puerta de Hierro-Majadahonda (IDIPHIM), Madrid, Spain