1
Wack M, Coulet A, Burgun A, Rance B. Enhancing clinical data warehousing with provenance data to support longitudinal analyses and large file management: The gitOmmix approach for genomic and image data. J Biomed Inform 2025; 163:104788. [PMID: 39952627 DOI: 10.1016/j.jbi.2025.104788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 01/27/2025] [Accepted: 02/04/2025] [Indexed: 02/17/2025]
Abstract
BACKGROUND If hospital Clinical Data Warehouses are to address today's focus on personalized medicine, they need to be able to track patients longitudinally and manage the large data sets generated by whole genome sequencing, RNA analyses, and complex imaging studies. Current Clinical Data Warehouses address neither issue. This paper reports on methods to enrich current systems by providing provenance data, allowing patient histories to be followed longitudinally, and by managing the linking and versioning of large data sets from whatever source. The methods are open source and applicable to any clinical data warehouse system, whatever data schema it uses. METHOD We introduce gitOmmix, an approach that overcomes these limitations, and illustrate its usefulness in the management of medical omics data. gitOmmix relies on (i) a file versioning system: git, (ii) an extension that handles large files: git-annex, (iii) a provenance knowledge graph: PROV-O, and (iv) an alignment between the git versioning information and the provenance knowledge graph. RESULTS Capabilities inherited from git and git-annex enable retracing the history of a clinical interpretation back to the patient sample, through supporting data and analyses. In addition, the provenance knowledge graph, aligned with the git versioning information, enables querying and browsing provenance relationships between these elements. CONCLUSION gitOmmix adds a provenance layer to CDWs, while scaling to large files and being agnostic of the CDW system. For these reasons, we think that it is a viable and generalizable solution for omics clinical studies.
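As a rough illustration of the idea in this abstract (retracing a clinical interpretation back to the patient sample through a provenance graph aligned with versioning information), the following minimal sketch may help. It is not gitOmmix's code or API. The node identifiers, the commit hashes, and the names `commit_of` and `trace_back` are all invented for illustration; only the property names `prov:used` and `prov:wasGeneratedBy` are taken from the PROV-O vocabulary.

```python
# Hypothetical sketch: none of these identifiers come from gitOmmix itself.
# Provenance is stored as (subject, predicate, object) triples that use
# PROV-O property names.
triples = [
    ("data:reads_v1", "prov:wasGeneratedBy", "activity:sequencing"),
    ("activity:sequencing", "prov:used", "sample:S1"),
    ("result:variants_v1", "prov:wasGeneratedBy", "activity:variant_calling"),
    ("activity:variant_calling", "prov:used", "data:reads_v1"),
    ("diagnosis:D1", "prov:wasGeneratedBy", "activity:interpretation"),
    ("activity:interpretation", "prov:used", "result:variants_v1"),
]

# Assumed alignment between provenance entities and version-control commits.
# The hashes are made-up placeholders.
commit_of = {
    "data:reads_v1": "a1b2c3d",
    "result:variants_v1": "e4f5a6b",
    "diagnosis:D1": "c7d8e9f",
}


def trace_back(node, triples):
    """Follow wasGeneratedBy / used edges from a node back to its origins."""
    edges = {}
    for subject, predicate, obj in triples:
        if predicate in ("prov:wasGeneratedBy", "prov:used"):
            edges.setdefault(subject, []).append(obj)

    chain, stack, seen = [], [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        chain.append(current)
        stack.extend(edges.get(current, []))
    return chain


lineage = trace_back("diagnosis:D1", triples)
for step in lineage:
    # Print the commit aligned with each entity, when one is recorded.
    print(step, commit_of.get(step, "(no commit recorded)"))
```

In this toy graph the walk starts at the interpretation and ends at the sample, passing through each intermediate data set and analysis step.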
Affiliation(s)
- Maxime Wack
- Centre de Recherche des Cordeliers, UMRS 1138, Inserm, Université Paris Cité, Sorbonne Université, Paris, France; Inria Paris, Paris, France; Department of Biomedical Informatics, Hôpital Européen Georges Pompidou, AP-HP, Paris, France; Centre Hospitalier National d'Ophtalmologie des Quinze-Vingts, IHU FOReSIGHT, 75012 Paris, France.
- Adrien Coulet
- Centre de Recherche des Cordeliers, UMRS 1138, Inserm, Université Paris Cité, Sorbonne Université, Paris, France; Inria Paris, Paris, France.
- Anita Burgun
- Centre de Recherche des Cordeliers, UMRS 1138, Inserm, Université Paris Cité, Sorbonne Université, Paris, France; Inria Paris, Paris, France; Department of Biomedical Informatics, Hôpital Européen Georges Pompidou, AP-HP, Paris, France.
- Bastien Rance
- Centre de Recherche des Cordeliers, UMRS 1138, Inserm, Université Paris Cité, Sorbonne Université, Paris, France; Inria Paris, Paris, France; Department of Biomedical Informatics, Hôpital Européen Georges Pompidou, AP-HP, Paris, France.
2
A Review of the Role and Challenges of Big Data in Healthcare Informatics and Analytics. Computational Intelligence and Neuroscience 2022; 2022:5317760. [PMID: 36210978 PMCID: PMC9536942 DOI: 10.1155/2022/5317760] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/12/2022] [Revised: 07/02/2022] [Accepted: 07/05/2022] [Indexed: 11/17/2022]
Abstract
Healthcare has evolved with the development of technology to improve the quality of life and save lives. Today, big data is considered one of the most essential and promising future technology areas and has been attracting the medical community's attention. As a result of big data, we can improve patient outcomes, personalize care, improve relationships between the patient and the provider, and decrease hospital costs. The effect of big data is very large since medical societies are known for their size, diversity of complexity, and a high degree of dynamism. Big data has been discussed from different viewpoints in recent years, protecting its involvement in many aspects, specifically those related to the healthcare system. Assembling health information, sharing data, and integrating health are essential in spreading health care. In addition, the security and privacy of data are critical since the data must be accessed from multiple locations within the distributed system. This review aims to understand the role of big data in healthcare issues aggregating data and the challenges associated with big data in healthcare. The papers that have been selected for review are from last year's research.
3
Zhang H, Lyu T, Yin P, Bost S, He X, Guo Y, Prosperi M, Hogan WR, Bian J. A scoping review of semantic integration of health data and information. Int J Med Inform 2022; 165:104834. [PMID: 35863206 DOI: 10.1016/j.ijmedinf.2022.104834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 07/06/2022] [Accepted: 07/13/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVE We summarized a decade of new research focusing on semantic data integration (SDI) since 2009, and we aim to: (1) summarize the state-of-the-art approaches to integrating health data and information; and (2) identify the main gaps and challenges of integrating health data and information from multiple levels and domains. MATERIALS AND METHODS We used PubMed, as our focus is applications of SDI in biomedical domains, and followed the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) to search for and report relevant studies published between January 1, 2009 and December 31, 2021. We used Covidence, a systematic review management system, to carry out this scoping review. RESULTS The initial search from PubMed resulted in 5,326 articles using the two sets of keywords. We then removed 44 duplicates and 5,282 articles were retained for abstract screening. After abstract screening, we included 246 articles for full-text screening, among which 87 articles were deemed eligible for full-text extraction. We summarized the 87 articles from four aspects: (1) methods for the global schema; (2) data integration strategies (i.e., federated system vs. data warehousing); (3) the sources of the data; and (4) downstream applications. CONCLUSION SDI approaches can effectively resolve the semantic heterogeneities across different data sources. We identified two key gaps and challenges in existing SDI studies: (1) many of the existing SDI studies used data from only single-level data sources (e.g., integrating individual-level patient records from different hospital systems), and (2) documentation of the data integration processes is sparse, threatening the reproducibility of SDI studies.
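The screening counts reported in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (5,326 identified, 44 duplicates, 5,282 screened, 246 assessed in full text, 87 included). The two exclusion counts are derived here by subtraction and are not reported in the abstract; the variable names are illustrative.

```python
# Figures as stated in the abstract.
identified = 5326        # articles returned by the initial PubMed search
duplicates = 44          # duplicates removed
screened_reported = 5282 # articles retained for abstract screening
full_text = 246          # articles included for full-text screening
included = 87            # articles eligible for full-text extraction

# Consistency check: identified minus duplicates should match the reported count.
screened = identified - duplicates
assert screened == screened_reported

# Derived values (not reported in the abstract): exclusions at each stage.
excluded_at_abstract = screened - full_text
excluded_at_full_text = full_text - included

print(f"Excluded at abstract screening: {excluded_at_abstract}")
print(f"Excluded at full-text screening: {excluded_at_full_text}")
```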
Affiliation(s)
- Hansi Zhang
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Tianchen Lyu
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Pengfei Yin
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Sarah Bost
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Xing He
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Yi Guo
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Mattia Prosperi
- Department of Epidemiology, College of Medicine, University of Florida, Gainesville, FL, United States
- Willian R Hogan
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Jiang Bian
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States.
4
Murphy SN, Visweswaran S, Becich MJ, Campion TR, Knosp BM, Melton-Meaux GB, Lenert LA. Research data warehouse best practices: catalyzing national data sharing through informatics innovation. J Am Med Inform Assoc 2022; 29:581-584. [PMID: 35289371 PMCID: PMC8922176 DOI: 10.1093/jamia/ocac024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 02/14/2022] [Indexed: 11/12/2022] Open
Affiliation(s)
- Shawn N Murphy
- Research Information Science and Computing, Mass General Brigham, Somerville, Massachusetts, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Clinical and Translational Science Institute, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Michael J Becich
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Clinical and Translational Science Institute, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Thomas R Campion
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Clinical and Translational Science Center, Weill Cornell Medicine, New York, New York, USA
- Boyd M Knosp
- Roy J. and Lucille A. Carver College of Medicine and the Institute for Clinical & Translational Science, University of Iowa, Iowa City, Iowa, USA
- Genevieve B Melton-Meaux
- Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
- Institute for Health Informatics (IHI), University of Minnesota, Minneapolis, Minnesota, USA
- Leslie A Lenert
- Biomedical Informatics Center (BMIC), Medical University of South Carolina, Charleston, South Carolina, USA
- Health Sciences South Carolina, Columbia, South Carolina, USA
5
Smith J, Shi Y, Benedikt M, Nikolic M. Scalable analysis of multi-modal biomedical data. Gigascience 2021; 10:giab058. [PMID: 34508579 PMCID: PMC8434767 DOI: 10.1093/gigascience/giab058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/18/2020] [Revised: 05/31/2021] [Accepted: 08/18/2021] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Targeted diagnosis and treatment options are dependent on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough look at the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes. SOLUTION To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficulties of designing distributed analyses with complex biomedical data types. PERFORMANCE We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system is capable of outperforming the common alternative, based on "flattening" complex data structures, and runs efficiently when alternative approaches are unable to perform at all.
Affiliation(s)
- Jaclyn Smith
- University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
- Yao Shi
- University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
- Michael Benedikt
- University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
- Milos Nikolic
- University of Edinburgh, School of Informatics, Informatics Forum, 10 Crichton St, Newington, Edinburgh EH8 9AB, Scotland
6
Eschrich SA, Teer JK, Reisman P, Siegel E, Challa C, Lewis P, Fellows K, Malpica E, Carvajal R, Gonzalez G, Cukras S, Betin-Montes M, Aden-Buie G, Avedon M, Manning D, Tan AC, Fridley BL, Gerke T, Van Looveren M, Blake A, Greenman J, Rollison D. Enabling Precision Medicine in Cancer Care Through a Molecular Data Warehouse: The Moffitt Experience. JCO Clin Cancer Inform 2021; 5:561-569. [PMID: 33989014 DOI: 10.1200/cci.20.00175] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022] Open
Abstract
PURPOSE The use of genomics within cancer research and clinical oncology practice has become commonplace. Efforts such as The Cancer Genome Atlas have characterized the cancer genome and suggested a wealth of targets for implementing precision medicine strategies for patients with cancer. The data produced from research studies and clinical care have many potential secondary uses beyond their originally intended purpose. Effective storage, query, retrieval, and visualization of these data are essential to create an infrastructure to enable new discoveries in cancer research. METHODS Moffitt Cancer Center implemented a molecular data warehouse to complement the extensive enterprise clinical data warehouse (Health and Research Informatics). Seven different sequencing experiment types were included in the warehouse, with data from institutional research studies and clinical sequencing. RESULTS The implementation of the molecular warehouse involved the close collaboration of many teams with different expertise and a use case-focused approach. Cornerstones of project success included project planning, open communication, institutional buy-in, piloting the implementation, implementing custom solutions to address specific problems, data quality improvement, and data governance, unique aspects of which are featured here. We describe our experience in selecting, configuring, and loading molecular data into the molecular data warehouse. Specifically, we developed solutions for heterogeneous genomic sequencing cohorts (many different platforms) and integration with our existing clinical data warehouse. CONCLUSION The implementation was ultimately successful despite challenges encountered, many of which can be generalized to other research cancer centers.
Affiliation(s)
- Steven A Eschrich
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL
- Jamie K Teer
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL
- Erin Siegel
- Total Cancer Care, Moffitt Cancer Center, Tampa, FL
- Patricia Lewis
- Data Quality and Business Intelligence, Moffitt Cancer Center, Tampa, FL
- Katherine Fellows
- Data Quality and Business Intelligence, Moffitt Cancer Center, Tampa, FL
- Rodrigo Carvajal
- Biostatistics and Bioinformatics Shared Resource, Moffitt Cancer Center, Tampa, FL
- Guillermo Gonzalez
- Biostatistics and Bioinformatics Shared Resource, Moffitt Cancer Center, Tampa, FL
- Scott Cukras
- Biostatistics and Bioinformatics Shared Resource, Moffitt Cancer Center, Tampa, FL
- Miguel Betin-Montes
- Biostatistics and Bioinformatics Shared Resource, Moffitt Cancer Center, Tampa, FL
- Melissa Avedon
- Basic, Population, and Quantitative Science Shared Resource Administration, Moffitt Cancer Center, Tampa, FL
- Daniel Manning
- Information Technology, Moffitt Cancer Center, Tampa, FL
- Aik Choon Tan
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL
- Brooke L Fridley
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL
- Travis Gerke
- Health Informatics, Moffitt Cancer Center, Tampa, FL
- Dana Rollison
- Department of Epidemiology, Moffitt Cancer Center, Tampa, FL
7
Gruendner J, Wolf N, Tögel L, Haller F, Prokosch HU, Christoph J. Integrating Genomics and Clinical Data for Statistical Analysis by Using GEnome MINIng (GEMINI) and Fast Healthcare Interoperability Resources (FHIR): System Design and Implementation. J Med Internet Res 2020; 22:e19879. [PMID: 33026356 PMCID: PMC7578821 DOI: 10.2196/19879] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/07/2020] [Revised: 07/26/2020] [Accepted: 08/17/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The introduction of next-generation sequencing (NGS) into molecular cancer diagnostics has led to an increase in the data available for the identification and evaluation of driver mutations and for defining personalized cancer treatment regimens. The meaningful combination of omics data, ie, pathogenic gene variants and alterations with other patient data, to understand the full picture of malignancy has been challenging. OBJECTIVE This study describes the implementation of a system capable of processing, analyzing, and subsequently combining NGS data with other clinical patient data for analysis within and across institutions. METHODS On the basis of the already existing NGS analysis workflows for the identification of malignant gene variants at the Institute of Pathology of the University Hospital Erlangen, we defined basic requirements on an NGS processing and analysis pipeline and implemented a pipeline based on the GEMINI (GEnome MINIng) open source genetic variation database. For the purpose of validation, this pipeline was applied to data from the 1000 Genomes Project and subsequently to NGS data derived from 206 patients of a local hospital. We further integrated the pipeline into existing structures of data integration centers at the University Hospital Erlangen and combined NGS data with local nongenomic patient-derived data available in Fast Healthcare Interoperability Resources format. RESULTS Using data from the 1000 Genomes Project and from the patient cohort as input, the implemented system produced the same results as already established methodologies. Further, it satisfied all our identified requirements and was successfully integrated into the existing infrastructure. Finally, we showed in an exemplary analysis how the data could be quickly loaded into and analyzed in KETOS, a web-based analysis platform for statistical analysis and clinical decision support. 
CONCLUSIONS This study demonstrates that the GEMINI open source database can be augmented to create an NGS analysis pipeline. The pipeline generates high-quality results consistent with the already established workflows for gene variant annotation and pathological evaluation. We further demonstrate how NGS-derived genomic and other clinical data can be combined for further statistical analysis, thereby providing for data integration using standardized vocabularies and methods. Finally, we demonstrate the feasibility of the pipeline integration into hospital workflows by providing an exemplary integration into the data integration center infrastructure, which is currently being established across Germany.
Affiliation(s)
- Julian Gruendner
- Department of Medical Informatics, Friedrich-Alexander University, Erlangen-Nürnberg, Erlangen-Tennenlohe, Germany
- Nicolas Wolf
- Department of Medical Informatics, Friedrich-Alexander University, Erlangen-Nürnberg, Erlangen-Tennenlohe, Germany
- Lars Tögel
- Diagnostic Molecular Pathology, Institute of Pathology, Friedrich-Alexander University, Erlangen-Nürnberg, Erlangen, Germany
- Florian Haller
- Diagnostic Molecular Pathology, Institute of Pathology, Friedrich-Alexander University, Erlangen-Nürnberg, Erlangen, Germany
- Hans-Ulrich Prokosch
- Department of Medical Informatics, Friedrich-Alexander University, Erlangen-Nürnberg, Erlangen-Tennenlohe, Germany
- Jan Christoph
- Department of Medical Informatics, Friedrich-Alexander University, Erlangen-Nürnberg, Erlangen-Tennenlohe, Germany
8
Smith JM, Lathara M, Wright H, Hill B, Ganapati N, Srinivasa G, Denny CT. Advancing clinical cohort selection with genomics analysis on a distributed platform. PLoS One 2020; 15:e0231826. [PMID: 32324802 PMCID: PMC7179830 DOI: 10.1371/journal.pone.0231826] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/22/2019] [Accepted: 04/01/2020] [Indexed: 01/02/2023] Open
Abstract
The affordability of next-generation genomic sequencing and the improvement of medical data management have contributed largely to the evolution of biological analysis from both a clinical and research perspective. Precision medicine is a response to these advancements that places individuals into better-defined subsets based on shared clinical and genetic features. The identification of personalized diagnosis and treatment options is dependent on the ability to draw insights from large-scale, multi-modal analysis of biomedical datasets. Driven by a real use case, we premise that platforms that support precision medicine analysis should maintain data in their optimal data stores, should support distributed storage and query mechanisms, and should scale as more samples are added to the system. We extended a genomics-based columnar data store, GenomicsDB, for ease of use within a distributed analytics platform for clinical and genomic data integration, known as the ODA framework. The framework supports interaction from an i2b2 plugin as well as a notebook environment. We show that the ODA framework exhibits worst-case linear scaling for array size (storage), import time (data construction), and query time for an increasing number of samples. We go on to show worst-case linear time for both import of clinical data and aggregate query execution time within a distributed environment. This work highlights the integration of a distributed genomic database with a distributed compute environment to support scalable and efficient precision medicine queries from a HIPAA-compliant, cohort system in a real-world setting. The ODA framework is currently deployed in production to support precision medicine exploration and analysis from clinicians and researchers at UCLA David Geffen School of Medicine.
Affiliation(s)
- Jaclyn M. Smith
- Department of Computer Science, University of Oxford, Oxford, United Kingdom
- Omics Data Automation Inc., Beaverton, Oregon, United States of America
- Melvin Lathara
- Omics Data Automation Inc., Beaverton, Oregon, United States of America
- Hollis Wright
- Omics Data Automation Inc., Beaverton, Oregon, United States of America
- Brian Hill
- Omics Data Automation Inc., Beaverton, Oregon, United States of America
- Department of Computer Science, University of California, Los Angeles, California, United States of America
- Nalini Ganapati
- Omics Data Automation Inc., Beaverton, Oregon, United States of America
- Christopher T. Denny
- Division of Hematology/Oncology, Department of Pediatrics, Gwynne Hazen Cherry Memorial Laboratories, University of California, Los Angeles, California, United States of America
- Molecular Biology Institute, University of California, Los Angeles, California, United States of America
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, California, United States of America
- California NanoSystems Institute, University of California, Los Angeles, California, United States of America
9
Emam I, Elyasigomari V, Matthews A, Pavlidis S, Rocca-Serra P, Guitton F, Verbeeck D, Grainger L, Borgogni E, Del Giudice G, Saqi M, Houston P, Guo Y. PlatformTM, a standards-based data custodianship platform for translational medicine research. Sci Data 2019; 6:149. [PMID: 31409798 PMCID: PMC6692384 DOI: 10.1038/s41597-019-0156-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/23/2018] [Accepted: 07/25/2019] [Indexed: 12/20/2022] Open
Abstract
Biomedical informatics has traditionally adopted a linear view of the informatics process (collect, store and analyse) in translational medicine (TM) studies, focusing primarily on the challenges in data integration and analysis. However, a data management challenge presents itself with the new lifecycle view of data emphasized by the recent calls for data re-use, long term data preservation, and data sharing. There is currently a lack of dedicated infrastructure focused on the 'manageability' of the data lifecycle in TM research between data collection and analysis. Current community efforts towards establishing a culture for open science prompt the creation of a data custodianship environment for management of TM data assets to support data reuse and reproducibility of research results. Here we present the development of a lifecycle-based methodology to create a metadata management framework based on community driven standards for standardisation, consolidation and integration of TM research data. Based on this framework, we also present the development of a new platform (PlatformTM) focused on managing the lifecycle for translational research data assets.
Affiliation(s)
- Ibrahim Emam
- Data Science Institute, Imperial College London, London, UK.
- Alex Matthews
- Clinical Research Centre, University of Surrey, Guildford, UK
- Mansoor Saqi
- Data Science Institute, Imperial College London, London, UK
- Paul Houston
- CDISC, Clinical Data Interchange Standards Consortium and CDISC EU Foundation, London, UK
- Yike Guo
- Data Science Institute, Imperial College London, London, UK
10
Ghogawala Z, Dunbar MR, Essa I. Lumbar spondylolisthesis: modern registries and the development of artificial intelligence. J Neurosurg Spine 2019; 30:729-735. [PMID: 31153155 DOI: 10.3171/2019.2.spine18751] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/12/2019] [Accepted: 02/20/2019] [Indexed: 11/06/2022]
Abstract
OBJECTIVE There are a wide variety of comparative treatment options in neurosurgery that do not lend themselves to traditional randomized controlled trials. The object of this article was to examine how clinical registries might be used to generate new evidence to support a particular treatment option when comparable options exist. Lumbar spondylolisthesis is used as an example. METHODS The authors reviewed the literature examining the comparative effectiveness of decompression alone versus decompression with fusion for lumbar stenosis with degenerative spondylolisthesis. Modern data acquisition for the creation of registries was also reviewed with an eye toward how artificial intelligence for the treatment of lumbar spondylolisthesis might be explored. RESULTS Current randomized controlled trials differ on the importance of adding fusion when performing decompression for lumbar spondylolisthesis. Standardized approaches to extracting data from the electronic medical record as well as the ability to capture radiographic imaging and incorporate patient-reported outcomes (PROs) will ultimately lead to the development of modern, structured, data-filled registries that will lay the foundation for machine learning. CONCLUSIONS There is a growing realization that patient experience, satisfaction, and outcomes are essential to improving the overall quality of spine care. There is a need to use practical, validated PRO tools in the quest to optimize outcomes within spine care. Registries will be designed to contain robust clinical data in which predictive analytics can be generated to develop and guide data-driven personalized spine care.
Affiliation(s)
- Zoher Ghogawala
- Alan L. and Jacqueline B. Stuart Spine Research Center, Department of Neurosurgery, Lahey Hospital & Medical Center, Burlington, Massachusetts
- Department of Neurosurgery, Tufts University School of Medicine, Boston, Massachusetts
- Melissa R Dunbar
- Alan L. and Jacqueline B. Stuart Spine Research Center, Department of Neurosurgery, Lahey Hospital & Medical Center, Burlington, Massachusetts
- Irfan Essa
- College of Computing, Georgia Institute of Technology, Atlanta, Georgia
11
Ofili EO, Schanberg LE, Hutchinson B, Sogade F, Fergus I, Duncan P, Hargrove J, Artis A, Onyekwere O, Batchelor W, Williams M, Oduwole A, Onwuanyi A, Ojutalayo F, Cross JA, Seto TB, Okafor H, Pemu P, Immergluck L, Foreman M, Mensah EA, Quarshie A, Mubasher M, Baker A, Ngare A, Dent A, Malouhi M, Tchounwou P, Lee J, Hayes T, Abdelrahim M, Sarpong D, Fernandez-Repollet E, Sodeke SO, Hernandez A, Thomas K, Dennos A, Smith D, Gbadebo D, Ajuluchikwu J, Kong BW, McCollough C, Weiler SR, Natter MD, Mandl KD, Murphy S. The Association of Black Cardiologists (ABC) Cardiovascular Implementation Study (CVIS): A Research Registry Integrating Social Determinants to Support Care for Underserved Patients. International Journal of Environmental Research and Public Health 2019; 16:E1631. [PMID: 31083298 PMCID: PMC6539418 DOI: 10.3390/ijerph16091631] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/16/2018] [Revised: 02/14/2019] [Accepted: 02/28/2019] [Indexed: 01/12/2023]
Abstract
African Americans, other minorities and underserved populations are consistently underrepresented in clinical trials. Such underrepresentation results in a gap in the evidence base, and health disparities. The ABC Cardiovascular Implementation Study (CVIS) is a comprehensive prospective cohort registry that integrates social determinants of health. ABC CVIS uses real world clinical practice data to address critical gaps in care by facilitating robust participation of African Americans and other minorities in clinical trials. ABC CVIS will include diverse patients from collaborating ABC member private practices, as well as patients from academic health centers and Federally Qualified Health Centers (FQHCs). This paper describes the rationale and design of the ABC CVIS Registry. The registry will: (1) prospectively collect socio-demographic, clinical and biospecimen data from enrolled adults, adolescents and children with prioritized cardiovascular diseases; (2) evaluate the safety and clinical outcomes of new therapeutic agents, including post marketing surveillance and pharmacovigilance; (3) support National Institutes of Health (NIH) and industry sponsored research; (4) support Quality Measures standards from the Center for Medicare and Medicaid Services (CMS) and Commercial Health Plans. The registry will utilize novel data and technology tools to facilitate mobile health technology application programming interface (API) to health system or practice electronic health records (EHR). Long term, CVIS will become the most comprehensive patient registry for underserved diverse patients with cardiovascular disease (CVD) and comorbid conditions, providing real world data to address health disparities. At least 10,000 patients will be enrolled from 50 sites across the United States.
Affiliation(s)
- Elizabeth O Ofili
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Laura E Schanberg
  - Department of Pediatrics, Duke Clinical Research Institute, Duke University School of Medicine, 2400 Pratt St., Durham, NC 27705, USA.
- Barbara Hutchinson
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Felix Sogade
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Icilma Fergus
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Phillip Duncan
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Joe Hargrove
  - Department of Pediatrics, Duke Clinical Research Institute, Duke University School of Medicine, 2400 Pratt St., Durham, NC 27705, USA.
- Andre Artis
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Osita Onyekwere
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Wayne Batchelor
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Marcus Williams
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Adefisayo Oduwole
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Anekwe Onwuanyi
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Folake Ojutalayo
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Jo Ann Cross
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Todd B Seto
  - Department of Academic Affairs and Research, The Queen's Medical Center, 1301 Punchbowl Street, Honolulu, HI 96813, USA.
- Henry Okafor
  - Department of Medicine, Meharry Medical College, 1818 Albion St, Nashville, TN 37208, USA.
- Priscilla Pemu
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Lilly Immergluck
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Marilyn Foreman
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Ernest Alema Mensah
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Alexander Quarshie
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Mohamed Mubasher
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Almelida Baker
  - Department of Clinical Research Center, Morehouse School of Medicine, 720 Westview Drive, SW, Atlanta, GA 30310, USA.
- Alnida Ngare
  - RCMI Data Coordinating Center, Jackson State University, 1400 John R. Lynch Street, Jackson, MS 39217, USA.
- Andrew Dent
  - RCMI Data Coordinating Center, Jackson State University, 1400 John R. Lynch Street, Jackson, MS 39217, USA.
- Mohamad Malouhi
  - RCMI Data Coordinating Center, Jackson State University, 1400 John R. Lynch Street, Jackson, MS 39217, USA.
- Paul Tchounwou
  - RCMI Data Coordinating Center, Jackson State University, 1400 John R. Lynch Street, Jackson, MS 39217, USA.
- Jae Lee
  - RCMI Data Coordinating Center, Jackson State University, 1400 John R. Lynch Street, Jackson, MS 39217, USA.
- Traci Hayes
  - RCMI Data Coordinating Center, Jackson State University, 1400 John R. Lynch Street, Jackson, MS 39217, USA.
- Muna Abdelrahim
  - RCMI Data Coordinating Center, Jackson State University, 1400 John R. Lynch Street, Jackson, MS 39217, USA.
- Daniel Sarpong
  - Department of Biostatistics, College of Pharmacy, Xavier University of Louisiana, 1 Drexel Drive, New Orleans, LA 70125, USA.
- Emma Fernandez-Repollet
  - Department of Pharmacology and Toxicology, University of Puerto Rico Medical Sciences Campus, P.O. Box 365067, San Juan, PR 00936, Puerto Rico.
- Stephen O Sodeke
  - Department of Bioethics, Tuskegee University, 1200 W. Montgomery Rd., Tuskegee, AL 36088, USA.
- Adrian Hernandez
  - Department of Pediatrics, Duke Clinical Research Institute, Duke University School of Medicine, 2400 Pratt St., Durham, NC 27705, USA.
- Kevin Thomas
  - Department of Pediatrics, Duke Clinical Research Institute, Duke University School of Medicine, 2400 Pratt St., Durham, NC 27705, USA.
- Anne Dennos
  - Department of Pediatrics, Duke Clinical Research Institute, Duke University School of Medicine, 2400 Pratt St., Durham, NC 27705, USA.
- David Smith
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- David Gbadebo
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Janet Ajuluchikwu
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
  - Department of Medicine, College of Medicine of the University of Lagos, Private Mail Bag 12003, Idi Araba, Lagos, Nigeria.
- B Waine Kong
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Cassandra McCollough
  - Association of Black Cardiologists, 2400 N Street, Suite 200, Washington, DC 20037, USA.
- Sarah R Weiler
  - Department of Pediatrics and Computational Health Informatics, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
- Marc D Natter
  - Department of Pediatrics and Computational Health Informatics, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
- Kenneth D Mandl
  - Department of Pediatrics and Computational Health Informatics, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
- Shawn Murphy
  - Department of Pediatrics and Computational Health Informatics, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
12
Wagholikar KB, Fischer CM, Goodson AP, Herrick CD, Maclean TE, Smith KV, Fera L, Gaziano TA, Dunning JR, Bosque-Hamilton J, Matta L, Toscano E, Richter B, Ainsworth L, Oates MF, Aronson S, MacRae CA, Scirica BM, Desai AS, Murphy SN. Phenotyping to Facilitate Accrual for a Cardiovascular Intervention. J Clin Med Res 2019; 11:458-463. [PMID: 31143314 PMCID: PMC6522233 DOI: 10.14740/jocmr3830] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/02/2019] [Accepted: 04/30/2019] [Indexed: 01/29/2023]
Abstract
Background: The conventional approach for clinical studies is to identify a cohort of potentially eligible patients and then screen them for enrollment. To reduce the cost and manual effort of screening, several studies have leveraged electronic health records (EHR) to refine cohorts so that they better match the eligibility criteria, a process referred to as phenotyping. We extend this approach to identify a cohort dynamically by alternating repeated phenotyping with manual screening. Methods: Our approach consists of multiple screening cycles. At the start of each cycle, the phenotyping algorithm identifies eligible patients from the EHR and produces an ordered list in which the patients most likely to be eligible come first. This list is then manually screened, and the results are analyzed to improve the phenotyping for the next cycle. We describe the preliminary results and challenges in implementing this approach for an intervention study on heart failure. Results: A total of 1,022 patients were screened, of whom 223 (23%) were found eligible for enrollment into the intervention study. The iterative approach improved the phenotyping in each screening cycle. Without an iterative approach, the positive screening rate (PSR) was expected to dip below the 20% measured in the first cycle; however, the cyclical approach increased the PSR to 23%. Conclusions: Our study demonstrates that dynamic phenotyping can facilitate recruitment for prospective clinical studies. Future directions include improved informatics infrastructure and governance policies to enable real-time updates to research repositories, tooling for EHR annotation, and methodologies to reduce human annotation.
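The alternation of phenotyping and manual screening described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Patient` feature dictionary, the `scorer`, `screen`, and `refine` callbacks, and the fixed `batch_size` are all assumptions introduced here, since the abstract does not specify how patients are ranked or how the ranking is improved between cycles.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]           # hypothetical per-patient feature dictionary
Scorer = Callable[[Patient], float]  # higher score = more likely to be eligible


def positive_screening_rate(eligible: int, screened: int) -> float:
    """Fraction of manually screened patients who were found eligible."""
    return eligible / screened if screened else 0.0


def run_cycles(
    pool: List[Patient],
    scorer: Scorer,
    screen: Callable[[Patient], bool],
    refine: Callable[[Scorer, List[Tuple[Patient, bool]]], Scorer],
    batch_size: int,
    n_cycles: int,
) -> List[float]:
    """Alternate ranking (phenotyping) with manual screening; return the PSR of each cycle."""
    remaining = list(pool)
    rates: List[float] = []
    for _ in range(n_cycles):
        if not remaining:
            break
        # Phenotyping step: order patients so the most likely eligible come first.
        remaining.sort(key=scorer, reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        # Manual screening step (a stand-in for chart review by study staff).
        results = [(patient, screen(patient)) for patient in batch]
        eligible = sum(ok for _, ok in results)
        rates.append(positive_screening_rate(eligible, len(results)))
        # Feedback step: use the screening results to improve the next ranking.
        scorer = refine(scorer, results)
    return rates
```

In this sketch the feedback step is deliberately left as a callback, because how the screening results feed back into the phenotyping algorithm is the part the abstract leaves open.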
Affiliation(s)
- Kavishwar B Wagholikar
  - Harvard Medical School, Boston, MA, USA
  - Massachusetts General Hospital, Boston, MA, USA
- Lina Matta
  - Brigham and Women's Hospital, Boston, MA, USA
- Calum A MacRae
  - Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Boston, MA, USA
- Benjamin M Scirica
  - Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Boston, MA, USA
- Akshay S Desai
  - Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Boston, MA, USA
- Shawn N Murphy
  - Harvard Medical School, Boston, MA, USA
  - Massachusetts General Hospital, Boston, MA, USA
13
Blaisure JC, Ceusters WM. Enhancing the Representational Power of i2b2 through Referent Tracking. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2018; 2018:262-271. [PMID: 30815064 PMCID: PMC6371319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The Informatics for Integrating Biology and the Bedside (i2b2) software platform has proven successful in leveraging clinical enterprise data to identify cohorts of patients who satisfy certain demographic, phenotypic, and genetic criteria in support of further studies. A question unanswered thus far is whether i2b2 search criteria could include characteristics of the assertions themselves, e.g. diagnoses, rather than of what the assertions (observations) are about, e.g. diseases. This would make it possible, for instance, to find cohorts of patients for whom different providers have disagreed about which condition the patient is suffering from. Previous research has shown that this requires more explicit detail about, and unique identification of, two sorts of entities: those that directly or indirectly contribute to the coming into existence of such observations, and those that are either explicitly mentioned or merely implied in the assertions. Our research demonstrates that i2b2's modifier system can be used to represent the relationships between observations and their explicit or implied referents on the one hand, and between the relevant referents themselves on the other, in combination with the storage of explicit unique instance identifiers for these observations and referents in i2b2's fact table. While this approach adheres to i2b2's base functionality and implementation specifications, it makes explicit the ambiguities and confusions that would otherwise remain undetected.
Affiliation(s)
- Jonathan C Blaisure
  - Institute for Healthcare Informatics, University at Buffalo, Buffalo, New York, USA
  - Department of Biomedical Informatics, University at Buffalo, Buffalo, New York, USA
- Werner M Ceusters
  - Institute for Healthcare Informatics, University at Buffalo, Buffalo, New York, USA
  - Department of Biomedical Informatics, University at Buffalo, Buffalo, New York, USA
14
Zapletal E, Bibault JE, Giraud P, Burgun A. Integrating Multimodal Radiation Therapy Data into i2b2. Appl Clin Inform 2018; 9:377-390. [PMID: 29847842 PMCID: PMC5976493 DOI: 10.1055/s-0038-1651497] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/25/2022]
Abstract
Background
Clinical data warehouses are now widely used to foster clinical and translational research and the Informatics for Integrating Biology and the Bedside (i2b2) platform has become a de facto standard for storing clinical data in many projects. However, to design predictive models and assist in personalized treatment planning in cancer or radiation oncology, all available patient data need to be integrated into i2b2, including radiation therapy data that are currently not addressed in many existing i2b2 sites.
Objective
To use radiation therapy data in projects related to rectal cancer patients, we assessed the feasibility of integrating radiation oncology data into the i2b2 platform.
Methods
The Georges Pompidou European Hospital, a hospital from the Assistance Publique – Hôpitaux de Paris group, has developed an i2b2-based clinical data warehouse of various structured and unstructured clinical data for research since 2008. To store and reuse various radiation therapy data in this repository, namely dose details, activity scheduling, and dose-volume histogram (DVH) curves, we first extracted raw data using reverse engineering techniques and a vendor's application programming interface. Then, we implemented a hybrid storage approach by combining the standard i2b2 “Entity-Attribute-Value” storage mechanism with a “JavaScript Object Notation (JSON) document-based” storage mechanism without modifying the i2b2 core tables. Validation was performed using (1) the Business Objects framework to replicate the vendor's application screens showing dose details and activity scheduling data and (2) the R software to display the DVH curves.
Results
We developed a pipeline to integrate the radiation therapy data into the Georges Pompidou European Hospital i2b2 instance and evaluated it on a cohort of 262 patients. We were able to use the radiation therapy data in a preliminary use case by fetching the DVH curve data from the clinical data warehouse and displaying them in an R chart.
Conclusion
By adding radiation therapy data to the clinical data warehouse, we were able to analyze radiation therapy response in cancer patients. We leveraged the i2b2 platform to store radiation therapy data, including detailed information such as the DVH, and to create new ontology-based modules that provide research investigators with a wider spectrum of clinical data.
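The hybrid storage idea in the Methods above, scalar facts as Entity-Attribute-Value rows and structured curves as JSON documents in the same table, can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the authors' schema: the table is a heavily simplified stand-in loosely modeled on an i2b2-style fact table, the choice of a text column for the JSON document is a guess, and the concept codes and dose values are made up.

```python
import json
import sqlite3
from typing import Any, Optional


def make_store() -> sqlite3.Connection:
    """Create an in-memory, simplified EAV-style fact table (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE observation_fact (
               patient_num      INTEGER,
               concept_cd       TEXT,
               nval_num         REAL,  -- scalar value (classic EAV row)
               observation_blob TEXT   -- JSON document for structured data
           )"""
    )
    return conn


def store_scalar(conn: sqlite3.Connection, patient_num: int, concept_cd: str, value: float) -> None:
    """Store one numeric fact as an Entity-Attribute-Value row."""
    conn.execute(
        "INSERT INTO observation_fact VALUES (?, ?, ?, NULL)",
        (patient_num, concept_cd, value),
    )


def store_document(conn: sqlite3.Connection, patient_num: int, concept_cd: str, document: Any) -> None:
    """Store a structured object (e.g. a DVH curve) as a JSON document in the same table."""
    conn.execute(
        "INSERT INTO observation_fact VALUES (?, ?, NULL, ?)",
        (patient_num, concept_cd, json.dumps(document)),
    )


def load_document(conn: sqlite3.Connection, patient_num: int, concept_cd: str) -> Optional[Any]:
    """Fetch and decode the JSON document for a patient and concept, if present."""
    row = conn.execute(
        "SELECT observation_blob FROM observation_fact "
        "WHERE patient_num = ? AND concept_cd = ? AND observation_blob IS NOT NULL",
        (patient_num, concept_cd),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Usage with made-up codes and values.
conn = make_store()
store_scalar(conn, 1, "RT:TOTAL_DOSE_GY", 50.4)
store_document(conn, 1, "RT:DVH", {"dose_gy": [0, 10, 20], "volume_pct": [100, 80, 35]})
dvh = load_document(conn, 1, "RT:DVH")
```

Keeping the JSON document in an existing column of the fact table is one way to add curve data without altering core tables, which is the constraint the abstract emphasizes.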
Affiliation(s)
- Eric Zapletal
  - Department of Medical Informatics, Biostatistics, and Public Health, Georges Pompidou European Hospital, Assistance Publique-Hôpitaux de Paris, Paris Descartes Faculty of Medicine, Paris, France
- Jean-Emmanuel Bibault
  - Department of Radiation Oncology, Georges Pompidou European Hospital, Assistance Publique-Hôpitaux de Paris, Paris Descartes Faculty of Medicine, Paris, France
  - INSERM UMR 1138 Eq22, Cordeliers Research Centre, Paris Descartes University, Paris, France
- Philippe Giraud
  - Department of Radiation Oncology, Georges Pompidou European Hospital, Assistance Publique-Hôpitaux de Paris, Paris Descartes Faculty of Medicine, Paris, France
- Anita Burgun
  - Department of Medical Informatics, Biostatistics, and Public Health, Georges Pompidou European Hospital, Assistance Publique-Hôpitaux de Paris, Paris Descartes Faculty of Medicine, Paris, France
  - INSERM UMR 1138 Eq22, Cordeliers Research Centre, Paris Descartes University, Paris, France
15
Kraus JM, Lausser L, Kuhn P, Jobst F, Bock M, Halanke C, Hummel M, Heuschmann P, Kestler HA. Big data and precision medicine: challenges and strategies with healthcare data. International Journal of Data Science and Analytics 2018. [DOI: 10.1007/s41060-018-0095-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/22/2023]