1
Bennett N, Plečko D, Ukor IF, Meinshausen N, Bühlmann P. ricu: R's interface to intensive care data. Gigascience 2022; 12:giad041. [PMID: 37318234 PMCID: PMC10268223 DOI: 10.1093/gigascience/giad041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 03/07/2023] [Accepted: 05/18/2023] [Indexed: 06/16/2023]
Abstract
OBJECTIVE To develop a unified framework for analyzing data from 5 large publicly available intensive care unit (ICU) datasets. FINDINGS Using 3 American (Medical Information Mart for Intensive Care III, Medical Information Mart for Intensive Care IV, electronic ICU) and 2 European (Amsterdam University Medical Center Database, High Time Resolution ICU Dataset) databases, we constructed a mapping from each database to a set of clinically relevant concepts, which are grounded in the Observational Medical Outcomes Partnership Vocabulary wherever possible. Furthermore, we synchronized the units of measurement and data type representations. On top of this, we built functionality that allows the user to download, set up, and load data from all 5 databases through a unified Application Programming Interface. The resulting ricu R package represents the computational infrastructure for handling publicly available ICU datasets, and its latest release allows the user to load 119 existing clinical concepts from the 5 data sources. CONCLUSION The ricu R package (available on GitHub and CRAN) is the first tool that enables users to analyze publicly available ICU datasets simultaneously (datasets are available upon request from their respective owners). Such an interface saves researchers time when analyzing ICU data and helps reproducibility. We hope that ricu can become a community-wide effort, so that data harmonization is not repeated by each research group separately. One current limitation is that concepts were added on a case-by-case basis, so the resulting dictionary of concepts is not comprehensive; further work is needed to make it so.
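The harmonization step described above (mapping each source's items onto a shared clinical concept and converting units) can be illustrated with a minimal Python sketch. It does not reproduce ricu, which is an R package with its own concept dictionary and loading functions; the source names, item identifiers, dictionary layout, and the approximate glucose conversion factor (1 mmol/L ≈ 18 mg/dL) are all illustrative assumptions.

```python
from dataclasses import dataclass


# HYPOTHETICAL source items: the source names and item identifiers below are
# invented for illustration and do not correspond to real MIMIC, eICU,
# AmsterdamUMCdb or HiRID identifiers.
@dataclass(frozen=True)
class SourceItem:
    source: str    # which database the record comes from
    item_id: str   # source-specific identifier of the measurement
    unit: str      # unit the source records the value in (descriptive only)
    factor: float  # multiply by this to reach the concept's target unit


# One harmonized concept ("glucose", target unit mg/dL) mapped to per-source items.
# 1 mmol/L of glucose is roughly 18 mg/dL, so the factor is approximate.
GLUCOSE = {
    ("source_a", "glu_1"): SourceItem("source_a", "glu_1", "mg/dL", 1.0),
    ("source_b", "glu_x"): SourceItem("source_b", "glu_x", "mmol/L", 18.0),
}


def load_concept(records, mapping, concept="glucose"):
    """Return the records belonging to one concept, converted to its target unit."""
    out = []
    for source, item_id, value in records:
        item = mapping.get((source, item_id))
        if item is None:
            continue  # this item is not part of the concept, so skip it
        out.append({"source": source, "concept": concept, "value": value * item.factor})
    return out


raw = [("source_a", "glu_1", 110.0), ("source_b", "glu_x", 6.0), ("source_b", "hr_7", 80.0)]
harmonized = load_concept(raw, GLUCOSE)
print(harmonized)
```

Unmapped items (here the hypothetical `hr_7`) are simply skipped, so a concept only ever returns values in its single target unit, regardless of which database they came from.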
Affiliation(s)
- Nicolas Bennett
- Seminar for Statistics, ETH Zürich, 8092 Zürich (Rämistrasse 101), Switzerland
- Drago Plečko
- Seminar for Statistics, ETH Zürich, 8092 Zürich (Rämistrasse 101), Switzerland
- Ida-Fong Ukor
- Department of Anaesthesiology and Perioperative Medicine, Monash Health, Clayton VIC 3168, Australia
- Nicolai Meinshausen
- Seminar for Statistics, ETH Zürich, 8092 Zürich (Rämistrasse 101), Switzerland
- Peter Bühlmann
- Seminar for Statistics, ETH Zürich, 8092 Zürich (Rämistrasse 101), Switzerland
2
Leckenby E, Dawoud D, Bouvy J, Jónsson P. The Sandbox Approach and its Potential for Use in Health Technology Assessment: A Literature Review. Appl Health Econ Health Policy 2021; 19:857-869. [PMID: 34254275 PMCID: PMC8545721 DOI: 10.1007/s40258-021-00665-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 05/26/2021] [Indexed: 05/09/2023]
Abstract
BACKGROUND The concept of the regulatory sandbox (a safe space for testing new regulatory processes) was first used within the financial technologies (FinTech) sector, but has since expanded into other sectors, including healthcare. OBJECTIVES This review aims to describe the extent to which sandboxes are used in healthcare and to assess the potential for the sandbox approach to be used to test and develop emerging health technology assessment (HTA) methods, policies and processes for innovative technologies. METHODS A systematic literature review was undertaken to identify published papers and reports that described and/or assessed the use of sandboxes in the healthcare sector. Searches were conducted in Medline, Embase, Econlit, Social Policy and Practice, and Health Management Information Consortium databases from inception to March 2020. A free-text Google search was also conducted to identify relevant grey literature. Only papers and reports published in English that discussed or evaluated the use of sandboxes in healthcare settings were included. Included studies were qualitatively summarised using a thematic analysis approach. RESULTS Overall, 46 papers and reports were included. The topics covered were classified into 4 major themes: history of the regulatory sandbox, the sandbox as a testing environment, the sandbox as a regulatory approach, and examples of using sandboxes in healthcare. Findings show that the use of regulatory sandboxes in healthcare is relatively new and that they are primarily used in high-income countries to support the adoption of new technologies, particularly those related to digital health. Based on these findings, recommendations are made to guide the use of sandboxes in HTA policy and methods development. CONCLUSIONS Sandboxes are increasingly used within healthcare regulation. Despite its potential, the sandbox approach has not been used in HTA policy and methodological development to date. HTA agencies should consider this approach to facilitate developing policies, methods and processes for innovative and disruptive health technologies. However, its transferability to settings in low- and middle-income countries should be assessed.
Affiliation(s)
- Emily Leckenby
- Centre for Health Technology Evaluation, National Institute for Health and Care Excellence (NICE), Manchester, UK
- Dalia Dawoud
- Science Policy and Research Programme, Science, Evidence and Analytics Directorate, National Institute for Health and Care Excellence (NICE), 2nd Floor, 2 Redman Place, London, E20 1JQ, UK
- Jacoline Bouvy
- NICE Scientific Advice, Centre for Health Technology Evaluation, National Institute for Health and Care Excellence (NICE), London, UK
- Páll Jónsson
- Data and Analytics, Science, Evidence and Analytics Directorate, National Institute for Health and Care Excellence (NICE), Manchester, UK
3
Datta S, Sachs JP, FreitasDa Cruz H, Martensen T, Bode P, Morassi Sasso A, Glicksberg BS, Böttinger E. FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling. JAMIA Open 2021; 4:ooab048. [PMID: 34350388 PMCID: PMC8327378 DOI: 10.1093/jamiaopen/ooab048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2021] [Revised: 05/12/2021] [Accepted: 06/20/2021] [Indexed: 11/13/2022]
Abstract
Objectives The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts focus on the data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains for electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. Materials and Methods FIBER was developed on top of a large-scale star schema EHR database that contains data from 8 million patients and over 120 million encounters. To illustrate FIBER’s capabilities, we present its application to building a heart surgery patient cohort, followed by prediction of acute kidney injury (AKI) with various machine learning models. Results Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients who developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case. Conclusion FIBER is an open-source Python library for extracting information from star schema clinical data warehouses that reduces time-to-modeling, helping to streamline the clinical modeling process.
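To make the star-schema idea concrete, the following Python sketch (standard-library `sqlite3` only) turns rows of a fact table into one modeling-ready row per cohort patient. The table borrows a few column names from the i2b2 `observation_fact` table but is heavily simplified; the concept codes, values, and the choice of maximum creatinine as a feature are hypothetical, and this is not FIBER's API.

```python
import sqlite3

# Simplified, i2b2-style fact table: one row per observation. Only a few
# columns are kept; the concept codes and values are HYPOTHETICAL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE observation_fact (
        patient_num INTEGER, concept_cd TEXT, start_date TEXT, nval_num REAL)"""
)
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?, ?)",
    [
        (1, "PROC:HEART_SURGERY", "2020-01-01", None),
        (1, "LAB:CREATININE", "2020-01-02", 1.1),
        (1, "LAB:CREATININE", "2020-01-03", 2.4),
        (2, "PROC:HEART_SURGERY", "2020-02-01", None),
        (2, "LAB:CREATININE", "2020-02-02", 0.9),
        (3, "LAB:CREATININE", "2020-03-01", 1.0),  # no surgery code: not in cohort
    ],
)

# Cohort: patients carrying the (hypothetical) surgery code.
# Feature: each cohort patient's maximum creatinine value.
rows = conn.execute(
    """
    SELECT c.patient_num, MAX(f.nval_num) AS max_creatinine
    FROM (SELECT DISTINCT patient_num FROM observation_fact
          WHERE concept_cd = 'PROC:HEART_SURGERY') AS c
    LEFT JOIN observation_fact AS f
      ON f.patient_num = c.patient_num AND f.concept_cd = 'LAB:CREATININE'
    GROUP BY c.patient_num
    ORDER BY c.patient_num
    """
).fetchall()
print(rows)
```

Patient 3 has a creatinine value but no surgery code, so the subquery leaves them out of the cohort; the LEFT JOIN keeps cohort members even when they lack the feature (their value would then be NULL).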
Affiliation(s)
- Suparno Datta
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jan Philipp Sachs
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Harry FreitasDa Cruz
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Tom Martensen
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Philipp Bode
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Ariane Morassi Sasso
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Benjamin S Glicksberg
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Erwin Böttinger
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
4
Daniel C, Kalra D. Clinical Research Informatics. Yearb Med Inform 2020; 29:203-207. [PMID: 32823317 PMCID: PMC7442510 DOI: 10.1055/s-0040-1702007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select the best papers published in 2019.
Method: A bibliographic search using a combination of MeSH descriptors and free-text terms on CRI was performed in PubMed, followed by a double-blind review to select a list of candidate best papers, which were then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting between the two section editors and the editorial team was organized to select the final three best papers.
Results: Of the 517 papers published in 2019 that were returned by the search and fell within the scope of the various areas of CRI, the full review process selected three best papers. The first best paper describes the use of a homomorphic encryption technique to enable federated analysis of real-world data while complying more easily with data protection requirements. The authors of the second best paper demonstrate the evidence value of federated data networks by reporting a large real-world data study of first-line treatment for hypertension. The third best paper reports the migration of the US Food and Drug Administration (FDA) adverse event reporting system database to the OMOP common data model. This work opens the way to combined analysis of spontaneous reporting system and electronic health record (EHR) data for pharmacovigilance.
Conclusions: The most significant research efforts in the CRI field currently focus on real-world evidence generation, especially the reuse of EHR data. With the progress achieved this year in phenotyping, data integration, semantic interoperability, and data quality assessment, real-world data are becoming more accessible and reusable. High-quality data sets are key assets not only for large-scale observational studies and for changing the way clinical trials are conducted, but also for developing and evaluating artificial intelligence algorithms that guide clinical decisions toward more personalized care. Lastly, security and confidentiality, ethical and regulatory issues, and data governance more generally remain active research areas this year.
Affiliation(s)
- Christel Daniel
- Information Technology Department, AP-HP, Paris, France
- Sorbonne University, University Paris 13, Sorbonne Paris Cité, INSERM UMR_S 1142, LIMICS, Paris, France
5
Glicksberg BS, Burns S, Currie R, Griffin A, Wang ZJ, Haussler D, Goldstein T, Collisson E. Blockchain-Authenticated Sharing of Genomic and Clinical Outcomes Data of Patients With Cancer: A Prospective Cohort Study. J Med Internet Res 2020; 22:e16810. [PMID: 32196460 PMCID: PMC7125440 DOI: 10.2196/16810] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/27/2019] [Revised: 12/09/2019] [Accepted: 12/15/2019] [Indexed: 12/21/2022]
Abstract
Background Efficiently sharing health data produced during standard care could dramatically accelerate progress in cancer treatments, but various barriers make this difficult. Withholding these data to ensure patient privacy comes at the cost of little to no learning from the real-world data produced during cancer care. Furthermore, recent research has demonstrated that patients with cancer are willing to share their treatment experiences to fuel research, despite potential risks to privacy. Objective The objective of this study was to design, pilot, and release a decentralized, scalable, efficient, economical, and secure strategy for the dissemination of deidentified clinical and genomic data, with a focus on late-stage cancer. Methods We created and piloted a blockchain-authenticated system, called the Cancer Gene Trust (CGT), to enable secure sharing of deidentified patient data derived from standard-of-care imaging, genomic testing, and electronic health records (EHRs). We prospectively consented a pilot cohort (N=18), collected their data, and uploaded it to the CGT. EHR data were extracted from both a hospital cancer registry and a common data model (CDM) format to identify optimal data extraction and dissemination practices. Specifically, we scored and compared the completeness of the two EHR data extraction formats against the gold-standard source documentation for patients with available data (n=17). Results Although the total completeness scores were greater for the registry reports than for the CDM, this difference was not statistically significant. We did find that some specific data fields, such as histology site, were better captured in the registry reports; this finding can be used to improve the continually adapting CDM. In terms of the overall pilot study, we found that the CGT enables rapid integration of real-world data from patients with cancer in a more clinically useful time frame. We also developed an open-source web application that allows users to search, browse, explore, and download CGT data. Conclusions Our pilot demonstrates the willingness of patients with cancer to participate in data sharing, and shows how blockchain-enabled structures can maintain relationships between individual data elements while preserving patient privacy, empowering findings by third-party researchers and clinicians. We demonstrate the feasibility of the CGT as a framework for sharing health data trapped in silos to further cancer research. Further studies are required to optimize data representation, streaming, and integrity.
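The abstract does not spell out how completeness was scored. As a rough illustration only, one plausible scoring rule is sketched below in Python: the fraction of gold-standard fields that an extraction reproduces exactly. The field names and toy records are invented (they are not study data), and the rule itself is an assumption rather than the study's method.

```python
def completeness(extracted: dict, gold: dict) -> float:
    """Fraction of gold-standard fields that the extraction reproduces exactly.

    ASSUMPTION: a field counts as captured only if the extraction holds the
    same value as the gold standard; gold-standard fields that are empty
    are ignored when computing the denominator.
    """
    fields = [k for k, v in gold.items() if v not in (None, "")]
    if not fields:
        return 0.0
    captured = sum(1 for k in fields if extracted.get(k) == gold[k])
    return captured / len(fields)


# Invented toy records for a single hypothetical patient.
gold = {"diagnosis": "adenocarcinoma", "histology_site": "pancreas", "stage": "IV", "sex": "F"}
registry = {"diagnosis": "adenocarcinoma", "histology_site": "pancreas", "stage": "IV"}
cdm = {"diagnosis": "adenocarcinoma", "histology_site": None, "sex": "F"}

registry_score = completeness(registry, gold)
cdm_score = completeness(cdm, gold)
print(registry_score, cdm_score)
```

Per-patient scores like these could then be summed or compared across patients; the statistical test used for the comparison in the study is not reproduced here.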
Affiliation(s)
- Benjamin Scott Glicksberg
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, United States
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Shohei Burns
- Division of Hematology and Oncology, Department of Medicine, University of California, San Francisco, CA, United States
- Rob Currie
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, United States
- Ann Griffin
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, United States
- Zhen Jane Wang
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, United States
- David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, United States
- Howard Hughes Medical Institute, Santa Cruz, CA, United States
- Theodore Goldstein
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, United States
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, United States
- Eric Collisson
- Division of Hematology and Oncology, Department of Medicine, University of California, San Francisco, CA, United States
6
Glicksberg BS, Oskotsky B, Thangaraj PM, Giangreco N, Badgeley MA, Johnson KW, Datta D, Rudrapatna VA, Rappoport N, Shervey MM, Miotto R, Goldstein TC, Rutenberg E, Frazier R, Lee N, Israni S, Larsen R, Percha B, Li L, Dudley JT, Tatonetti NP, Butte AJ. PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model. Bioinformatics 2019; 35:4515-4518. [PMID: 31214700 PMCID: PMC6821222 DOI: 10.1093/bioinformatics/btz409] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/16/2018] [Revised: 03/20/2019] [Accepted: 06/13/2019] [Indexed: 01/05/2023]
Abstract
MOTIVATION Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. RESULTS We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install the application and users to expand and modify it for their own purposes. AVAILABILITY AND IMPLEMENTATION PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
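Cohort construction on an OMOP CDM database can be illustrated with a query like the one below. This is a minimal Python/`sqlite3` sketch, not PatientExploreR code (which is written in R); it keeps only a few columns of the OMOP `person` and `condition_occurrence` tables, and the concept ID 111 and all rows are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal subsets of two OMOP CDM tables; the real tables have many more columns.
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT);
    INSERT INTO person VALUES (1, 1950), (2, 1980), (3, 1965);
    INSERT INTO condition_occurrence VALUES
        (10, 1, 111, '2018-05-01'),
        (11, 2, 222, '2018-06-01'),
        (12, 3, 111, '2019-01-15'),
        (13, 3, 111, '2019-03-02');
    """
)

# HYPOTHETICAL concept ID standing in for a real standard condition concept.
TARGET_CONCEPT = 111

# Cohort: everyone with the target condition, plus the date of first diagnosis.
cohort = conn.execute(
    """
    SELECT p.person_id, p.year_of_birth, MIN(c.condition_start_date) AS first_dx
    FROM person AS p
    JOIN condition_occurrence AS c ON c.person_id = p.person_id
    WHERE c.condition_concept_id = ?
    GROUP BY p.person_id, p.year_of_birth
    ORDER BY p.person_id
    """,
    (TARGET_CONCEPT,),
).fetchall()

# Export the cohort as CSV for analysis in other software.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["person_id", "year_of_birth", "first_dx"])
writer.writerows(cohort)
print(buf.getvalue())
```

Because `condition_start_date` is stored as an ISO date string, `MIN` returns the earliest diagnosis per person; person 3 appears once even though the condition was recorded twice.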
Affiliation(s)
- Benjamin S Glicksberg
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Boris Oskotsky
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Phyllis M Thangaraj
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
- Nicholas Giangreco
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
- Marcus A Badgeley
- Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Kipp W Johnson
- Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Debajyoti Datta
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Vivek A Rudrapatna
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Division of Gastroenterology, Department of Medicine, University of California, San Francisco, CA, USA
- Nadav Rappoport
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Mark M Shervey
- Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Riccardo Miotto
- Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Theodore C Goldstein
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Eugenia Rutenberg
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Remi Frazier
- Enterprise Information and Analytics, University of California, San Francisco, San Francisco, CA, USA
- Nelson Lee
- Enterprise Information and Analytics, University of California, San Francisco, San Francisco, CA, USA
- Sharat Israni
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Rick Larsen
- Enterprise Information and Analytics, University of California, San Francisco, San Francisco, CA, USA
- Bethany Percha
- Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Li Li
- Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Joel T Dudley
- Departments of Genomics and Data Science, Icahn Institute for Genomic Sciences and Multiscale Biology, Icahn School of Medicine at Mount Sinai, Institute of Next Generation Healthcare, New York, NY, USA
- Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
- Atul J Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Center for Data-Driven Insights and Innovation, University of California Health, Oakland, CA, USA
7
Cho J, You SC, Lee S, Park D, Park B, Hripcsak G, Park RW. Application for Epidemiological Geographic Information System: An Open-Source Spatial Analysis Tool based on the Common Data Model (Preprint). JMIR Public Health Surveill 2019. [DOI: 10.2196/15804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]