1
Post AR, Ho N, Rasmussen E, Post I, Cho A, Hofer J, Maness AT, Parnell T, Nix DA. Hypermedia-based software architecture enables Test-Driven Development. JAMIA Open 2023; 6:ooad089. [PMID: 37860604 PMCID: PMC10582517 DOI: 10.1093/jamiaopen/ooad089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 08/12/2023] [Accepted: 10/04/2023] [Indexed: 10/21/2023]
Abstract
Objectives Using agile software development practices, develop and evaluate an architecture and implementation for reliable and user-friendly self-service management of bioinformatic data stored in the cloud. Materials and methods Comprehensive Oncology Research Environment (CORE) Browser is a new open-source web application for cancer researchers to manage sequencing data organized in a flexible format in Amazon Simple Storage Service (S3) buckets. It has a microservices- and hypermedia-based architecture, which we integrated with Test-Driven Development (TDD), the iterative writing of computable specifications for how software should work prior to development. Relying on repeating patterns found in hypermedia-based architectures, we hypothesized that hypermedia would permit developing test "templates" that can be parameterized and executed for each microservice, maximizing code coverage while minimizing effort. Results After one-and-a-half years of development, the CORE Browser backend had 121 test templates and 875 custom tests that were parameterized and executed 3031 times, providing 78% code coverage. Discussion Architecting to permit test reuse through a hypermedia approach was a key success factor for our testing efforts. CORE Browser's application of hypermedia and TDD illustrates one way to integrate software engineering methods into data-intensive networked applications. Separating bioinformatic data management from analysis distinguishes this platform from others in bioinformatics and may provide stable data management while permitting analysis methods to advance more rapidly. Conclusion Software engineering practices are underutilized in informatics. Similar informatics projects will more likely succeed through application of good architecture and automated testing. Our approach is broadly applicable to data management tools involving cloud data storage.
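The idea of a parameterized test "template" described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical in-memory stand-in for a microservice (`InMemoryService`) and hypothetical resource names; none of these names come from CORE Browser, and the article's actual test framework is not shown here.

```python
import unittest


class InMemoryService:
    """Hypothetical stand-in for a microservice exposing one collection resource."""

    def __init__(self, resource_name):
        self.resource_name = resource_name
        self._items = {}
        self._next_id = 1

    def post(self, body):
        # Store the item and return a status code plus the new item's path.
        item_id = str(self._next_id)
        self._next_id += 1
        self._items[item_id] = dict(body, id=item_id)
        return 201, f"/{self.resource_name}/{item_id}"

    def get(self, path):
        # Look the item up by the last path segment.
        item_id = path.rsplit("/", 1)[-1]
        if item_id in self._items:
            return 200, self._items[item_id]
        return 404, None


def make_crud_test_case(resource_name, sample_body):
    """Test template: build a TestCase class specialised for one resource."""

    class CrudTemplate(unittest.TestCase):
        def setUp(self):
            self.service = InMemoryService(resource_name)

        def test_post_then_get(self):
            status, location = self.service.post(sample_body)
            self.assertEqual(status, 201)
            status, body = self.service.get(location)
            self.assertEqual(status, 200)
            for key, value in sample_body.items():
                self.assertEqual(body[key], value)

        def test_get_missing_returns_404(self):
            status, _ = self.service.get(f"/{resource_name}/999")
            self.assertEqual(status, 404)

    CrudTemplate.__name__ = f"Test{resource_name.capitalize()}Crud"
    return CrudTemplate


def run_templates(parameters):
    """Instantiate the template once per (resource, sample body) pair and run it."""
    suite = unittest.TestSuite()
    loader = unittest.TestLoader()
    for resource_name, sample_body in parameters:
        case = make_crud_test_case(resource_name, sample_body)
        suite.addTests(loader.loadTestsFromTestCase(case))
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.testsRun, result.wasSuccessful()
```

Calling `run_templates([("folders", {"name": "a"}), ("buckets", {"name": "b"})])` runs the same two template tests against each resource, which is the reuse pattern the abstract attributes to the repeating structure of hypermedia-based services.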
Affiliation(s)
- Andrew R Post
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, United States
- Nancy Ho
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Erik Rasmussen
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Ivan Post
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Aika Cho
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- John Hofer
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Arthur T Maness
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- Timothy Parnell
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
- David A Nix
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, United States
2
Morris BB, Smith JP, Zhang Q, Jiang Z, Hampton OA, Churchman ML, Arnold SM, Owen DH, Gray JE, Dillon PM, Soliman HH, Stover DG, Colman H, Chakravarti A, Shain KH, Silva AS, Villano JL, Vogelbaum MA, Borges VF, Akerley WL, Gentzler RD, Hall RD, Matsen CB, Ulrich CM, Post AR, Nix DA, Singer EA, Larner JM, Stukenberg PT, Jones DR, Mayo MW. Replicative Instability Drives Cancer Progression. Biomolecules 2022; 12:1570. [PMID: 36358918 PMCID: PMC9688014 DOI: 10.3390/biom12111570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 10/16/2022] [Accepted: 10/23/2022] [Indexed: 01/07/2023]
Abstract
In the past decade, defective DNA repair has been increasingly linked with cancer progression. Human tumors with markers of defective DNA repair and increased replication stress exhibit genomic instability and poor survival rates across tumor types. Seminal studies have demonstrated that genomic instability develops following inactivation of BRCA1, BRCA2, or BRCA-related genes. However, it is recognized that many tumors exhibit genomic instability but lack BRCA inactivation. We sought to identify a pan-cancer mechanism that underpins genomic instability and cancer progression in BRCA-wildtype tumors. Methods: Using multi-omics data from two independent consortia, we analyzed data from dozens of tumor types to identify patient cohorts characterized by poor outcomes, genomic instability, and wildtype BRCA genes. We developed several novel metrics to identify the genetic underpinnings of genomic instability in tumors with wildtype BRCA. Associated clinical data was mined to analyze patient responses to standard of care therapies and potential differences in metastatic dissemination. Results: Systematic analysis of the DNA repair landscape revealed that defective single-strand break repair, translesion synthesis, and non-homologous end-joining effectors drive genomic instability in tumors with wildtype BRCA and BRCA-related genes. Importantly, we find that loss of these effectors promotes replication stress, therapy resistance, and increased primary carcinoma to brain metastasis. Conclusions: Our results have defined a new pan-cancer class of tumors characterized by replicative instability (RIN). RIN is defined by the accumulation of intra-chromosomal, gene-level gain and loss events at replication stress sensitive (RSS) genome sites. We find that RIN accelerates cancer progression by driving copy number alterations and transcriptional program rewiring that promote tumor evolution. Clinically, we find that RIN drives therapy resistance and distant metastases across multiple tumor types.
Affiliation(s)
- Benjamin B. Morris
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Pathology, University of Virginia, Charlottesville, VA 22908, USA
- Jason P. Smith
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Susanne M. Arnold
  - Division of Medical Oncology, Department of Internal Medicine, Markey Cancer Center, Lexington, KY 40536, USA
- Dwight H. Owen
  - Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
- Jhanelle E. Gray
  - Department of Thoracic Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Patrick M. Dillon
  - Division of Hematology/Oncology, Department of Internal Medicine, University of Virginia Comprehensive Cancer Center, Charlottesville, VA 22908, USA
- Hatem H. Soliman
  - Department of Breast Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Daniel G. Stover
  - Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
- Howard Colman
  - Huntsman Cancer Institute and Department of Neurosurgery, University of Utah, Salt Lake City, UT 84112, USA
- Arnab Chakravarti
  - Department of Radiation Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
- Kenneth H. Shain
  - Department of Malignant Hematology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Ariosto S. Silva
  - Department of Cancer Physiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- John L. Villano
  - Division of Medical Oncology, Department of Internal Medicine, Markey Cancer Center, Lexington, KY 40536, USA
- Virginia F. Borges
  - Division of Medical Oncology, University of Colorado Comprehensive Cancer Center, Aurora, CO 80045, USA
- Wallace L. Akerley
  - Department of Medical Oncology, Department of Internal Medicine, Huntsman Cancer Institute, Salt Lake City, UT 84112, USA
- Ryan D. Gentzler
  - Division of Hematology/Oncology, Department of Internal Medicine, University of Virginia Comprehensive Cancer Center, Charlottesville, VA 22908, USA
- Richard D. Hall
  - Division of Hematology/Oncology, Department of Internal Medicine, University of Virginia Comprehensive Cancer Center, Charlottesville, VA 22908, USA
- Cindy B. Matsen
  - Department of Surgery, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA
- C. M. Ulrich
  - Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Andrew R. Post
  - Department of Biomedical Informatics and Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA
- David A. Nix
  - Department of Oncological Sciences, Huntsman Cancer Institute, Salt Lake City, UT 84112, USA
- Eric A. Singer
  - Section of Urologic Oncology, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- James M. Larner
  - Department of Radiation Oncology, University of Virginia Comprehensive Cancer Center, Charlottesville, VA 22908, USA
- Peter Todd Stukenberg
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- David R. Jones
  - Department of Thoracic Surgery, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
- Marty W. Mayo
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
3
Post AR, Burningham Z, Halwani AS. Electronic Health Record Data in Cancer Learning Health Systems: Challenges and Opportunities. JCO Clin Cancer Inform 2022; 6:e2100158. [PMID: 35353547 DOI: 10.1200/cci.21.00158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022]
Affiliation(s)
- Andrew R Post
  - Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- Zachary Burningham
  - Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT
- Ahmad S Halwani
  - Division of Hematology and Hematologic Malignancies, Department of Internal Medicine, University of Utah, Salt Lake City, UT
4
Rollison DE, Levin GM, Warner JL, Pinder R, Havener LA, Behera M, Post AR, Gopalakrishnan R, Durbin EB. Current and Emerging Informatics Initiatives Impactful to Cancer Registries. J Registry Manag 2022; 49:153-160. [PMID: 37260815 PMCID: PMC10229192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Cancer surveillance at the population level is a highly labor-intensive process, with certified tumor registrars (CTRs) manually reviewing medical charts of cancer patients and entering information into local databases that are centrally merged and curated at state and national levels. Registries face considerable challenges in terms of constrained budgets, staffing shortages, and keeping pace with the evolving national and international data standards that are essential to cancer registration. Advanced informatics methods are needed to increase automation, reduce manual efforts, and to help address some of these challenges. The Cancer Informatics Advisory Group (CIAG) to the North American Association of Central Cancer Registries (NAACCR) board was established in 2019 to advise of external informatics activities and initiatives for long-term strategic planning. Reviewed here by the CIAG are current informatics initiatives that were either born out of the cancer registry field or have implications for expansion to cancer surveillance programs in the future. Several areas of notable activity are presented, including an overview of informatics initiatives and descriptions of 12 specific informatics projects with implications for cancer registries. Recommendations are also provided to the registry community for the continued tracking and impact of the projects and initiatives.
Affiliation(s)
- Jeremy L. Warner
  - Lifespan Cancer Institute
  - Vanderbilt University Medical Center, Nashville, Tennessee
- Rich Pinder
  - Los Angeles Cancer Surveillance Program, Los Angeles, California
- Lori A. Havener
  - North American Association of Central Cancer Registries, Springfield, Illinois
- Madhusmita Behera
  - Winship Cancer Institute, Atlanta, Georgia
  - Woodruff Health Sciences Center, Emory University, Atlanta, Georgia
- Andrew R. Post
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
- Eric B. Durbin
  - Kentucky Cancer Registry, Lexington, Kentucky
  - Markey Cancer Center, University of Kentucky, Lexington, Kentucky
5
Post AR, Luther J, Loveless JM, Ward M, Hewitt S. Enhancing research informatics core user satisfaction through agile practices. JAMIA Open 2021; 4:ooab103. [PMID: 34927001 PMCID: PMC8672926 DOI: 10.1093/jamiaopen/ooab103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Revised: 10/06/2021] [Accepted: 11/18/2021] [Indexed: 11/23/2022]
Abstract
OBJECTIVE The Huntsman Cancer Institute Research Informatics Shared Resource (RISR), a software and database development core facility, sought to address a lack of published operational best practices for research informatics cores. It aimed to use those insights to enhance effectiveness after an increase in team size from 20 to 31 full-time equivalents coincided with a reduction in user satisfaction. MATERIALS AND METHODS RISR migrated from a water-scrum-fall model of software development to agile software development practices, which emphasize iteration and collaboration. RISR's agile implementation emphasizes the product owner role, which is responsible for user engagement and may be particularly valuable in software development that requires close engagement with users like in science. RESULTS All RISR's software development teams implemented agile practices in early 2020. All project teams are led by a product owner who serves as the voice of the user on the development team. Annual user survey scores for service quality and turnaround time recorded 9 months after implementation increased by 17% and 11%, respectively. DISCUSSION RISR is illustrative of the increasing size of research informatics cores and the need to identify best practices for maintaining high effectiveness. Agile practices may address concerns about the fit of software engineering practices in science. The study had one time point after implementing agile practices and one site, limiting its generalizability. CONCLUSIONS Agile software development may substantially increase a research informatics core facility's effectiveness and should be studied further as a potential best practice for how such cores are operated.
Affiliation(s)
- Andrew R Post
  - Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
- Jared Luther
  - Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- J Maxwell Loveless
  - Research Administration, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Melanie Ward
  - Research Administration, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Shirleen Hewitt
  - Research Informatics Shared Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
6
Post AR, Ai M, Kalsanka Pai A, Overcash M, Stephens DS. Architecting the Data Loading Process for an i2b2 Research Data Warehouse: Full Reload versus Incremental Updating. AMIA Annu Symp Proc 2018; 2017:1411-1420. [PMID: 29854210 PMCID: PMC5977612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Research data warehouses integrate research and patient data from one or more sources into a single data model that is designed for research. Typically, institutions update their warehouse by fully reloading it periodically. The alternative is to update the warehouse incrementally with new, changed and/or deleted data. Full reloads avoid having to correct and add to a live system, but they can render the data outdated for clinical trial accrual. They place a substantial burden on source systems, involve intermittent work that is challenging to resource, and may involve tight coordination across IT and informatics units. We have implemented daily incremental updating for our i2b2 data warehouse. Incremental updating requires substantial up-front development, and it can expose provisional data to investigators. However, it may support more use cases, it may be a better fit for academic healthcare IT organizational structures, and ongoing support needs appear to be similar or lower.
Affiliation(s)
- Miao Ai
  - Emory University, Atlanta, GA
7
Post AR, Pai AK, Willard R, May BJ, West AC, Agravat S, Granite SJ, Winslow RL, Stephens DS. Metadata-driven Clinical Data Loading into i2b2 for Clinical and Translational Science Institutes. AMIA Jt Summits Transl Sci Proc 2016; 2016:184-93. [PMID: 27570667 PMCID: PMC5001768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Clinical and Translational Science Award (CTSA) recipients have a need to create research data marts from their clinical data warehouses, through research data networks and the use of i2b2 and SHRINE technologies. These data marts may have different data requirements and representations, thus necessitating separate extract, transform and load (ETL) processes for populating each mart. Maintaining duplicative procedural logic for each ETL process is onerous. We have created an entirely metadata-driven ETL process that can be customized for different data marts through separate configurations, each stored in an extension of i2b2's ontology database schema. We extended our previously reported and open source Eureka! Clinical Analytics software with this capability. The same software has created i2b2 data marts for several projects, the largest being the nascent Accrual for Clinical Trials (ACT) network, for which it has loaded over 147 million facts about 1.2 million patients.
Affiliation(s)
- Andrew R. Post
  - Atlanta Clinical and Translational Science Institute, Emory University, Atlanta, GA
- Akshatha K. Pai
  - Atlanta Clinical and Translational Science Institute, Emory University, Atlanta, GA
- Richard Willard
  - Atlanta Clinical and Translational Science Institute, Emory University, Atlanta, GA
- Bradley J. May
  - Atlanta Clinical and Translational Science Institute, Emory University, Atlanta, GA
- Andrew C. West
  - Atlanta Clinical and Translational Science Institute, Emory University, Atlanta, GA
- Sanjay Agravat
  - Atlanta Clinical and Translational Science Institute, Emory University, Atlanta, GA
- Stephen J. Granite
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Raimond L. Winslow
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- David S. Stephens
  - Atlanta Clinical and Translational Science Institute, Emory University, Atlanta, GA
8
Post AR, Kurc T, Willard R, Rathod H, Mansour M, Pai AK, Torian WM, Agravat S, Sturm S, Saltz JH. Temporal abstraction-based clinical phenotyping with Eureka! AMIA Annu Symp Proc 2013; 2013:1160-9. [PMID: 24551400 PMCID: PMC3900137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Temporal abstraction, a method for specifying and detecting temporal patterns in clinical databases, is very expressive and performs well, but it is difficult for clinical investigators and data analysts to understand. Such patterns are critical in phenotyping patients using their medical records in research and quality improvement. We have previously developed the Analytic Information Warehouse (AIW), which computes such phenotypes using temporal abstraction but requires software engineers to use. We have extended the AIW's web user interface, Eureka! Clinical Analytics, to support specifying phenotypes using an alternative model that we developed with clinical stakeholders. The software converts phenotypes from this model to that of temporal abstraction prior to data processing. The model can represent all phenotypes in a quality improvement project and a growing set of phenotypes in a multi-site research study. Phenotyping that is accessible to investigators and IT personnel may enable its broader adoption.
Affiliation(s)
- Andrew R Post
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
- Tahsin Kurc
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
- Richie Willard
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
- Himanshu Rathod
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
- Michel Mansour
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
- Sanjay Agravat
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
- Suzanne Sturm
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
- Joel H Saltz
  - Dept. of Biomedical Informatics, Emory University, Atlanta, GA
9
Post AR, Kurc T, Cholleti S, Gao J, Lin X, Bornstein W, Cantrell D, Levine D, Hohmann S, Saltz JH. The Analytic Information Warehouse (AIW): a platform for analytics using electronic health record data. J Biomed Inform 2013; 46:410-24. [PMID: 23402960 PMCID: PMC3660520 DOI: 10.1016/j.jbi.2013.01.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 09/21/2012] [Revised: 12/20/2012] [Accepted: 01/28/2013] [Indexed: 12/28/2022]
Abstract
OBJECTIVE To create an analytics platform for specifying and detecting clinical phenotypes and other derived variables in electronic health record (EHR) data for quality improvement investigations. MATERIALS AND METHODS We have developed an architecture for an Analytic Information Warehouse (AIW). It supports transforming data represented in different physical schemas into a common data model, specifying derived variables in terms of the common model to enable their reuse, computing derived variables while enforcing invariants and ensuring correctness and consistency of data transformations, long-term curation of derived data, and export of derived data into standard analysis tools. It includes software that implements these features and a computing environment that enables secure high-performance access to and processing of large datasets extracted from EHRs. RESULTS We have implemented and deployed the architecture in production locally. The software is available as open source. We have used it as part of hospital operations in a project to reduce rates of hospital readmission within 30 days. The project examined the association of over 100 derived variables representing disease and co-morbidity phenotypes with readmissions in 5 years of data from our institution's clinical data warehouse and the UHC Clinical Database (CDB). The CDB contains administrative data from over 200 hospitals that are in academic medical centers or affiliated with such centers. DISCUSSION AND CONCLUSION A widely available platform for managing and detecting phenotypes in EHR data could accelerate the use of such data in quality improvement and comparative effectiveness studies.
Affiliation(s)
- Andrew R Post
  - Department of Biomedical Informatics, Emory University, 36 Eagle Row, Atlanta, GA 30322, USA.
10
Post AR, Kurc T, Rathod H, Agravat S, Mansour M, Torian W, Saltz JH. Semantic ETL into i2b2 with Eureka! AMIA Jt Summits Transl Sci Proc 2013; 2013:203-7. [PMID: 24303265 PMCID: PMC3845783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Clinical phenotyping is an emerging research information systems capability. Research uses of electronic health record (EHR) data may require the ability to identify clinical co-morbidities and complications. Such phenotypes may not be represented directly as discrete data elements, but rather as frequency, sequential and temporal patterns in billing and clinical data. These patterns' complexity suggests the need for a robust yet flexible extract, transform and load (ETL) process that can compute them. This capability should be accessible to investigators with limited ability to engage an IT department in data management. We have developed such a system, Eureka! Clinical Analytics. It extracts data from an Excel spreadsheet, computes a broad set of phenotypes of common interest, and loads both raw and computed data into an i2b2 project. A web-based user interface allows executing and monitoring ETL processes. Eureka! is deployed at our institution and is available for deployment in the cloud.
Affiliation(s)
- Andrew R Post
  - Center for Comprehensive Informatics, Emory University, Atlanta, GA
11
Gardner J, Xiong L, Xiao Y, Gao J, Post AR, Jiang X, Ohno-Machado L. SHARE: system design and case studies for statistical health information release. J Am Med Inform Assoc 2012; 20:109-16. [PMID: 23059729 DOI: 10.1136/amiajnl-2012-001032] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/04/2022]
Abstract
OBJECTIVES We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. MATERIALS AND METHODS SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. RESULTS Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback-Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. CONCLUSIONS SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
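The Kullback-Leibler divergence used in this abstract to compare a released histogram with the original distribution can be sketched as follows. This is a minimal illustration, assuming natural logarithms, non-negative bin counts, and a small floor (`epsilon`) for empty released bins; the abstract does not state which logarithm base or smoothing SHARE used, and the function name is hypothetical.

```python
import math


def kl_divergence(original_counts, released_counts, epsilon=1e-12):
    """KL(P || Q) where P is the original histogram and Q the released one.

    Both inputs are sequences of non-negative bin counts of equal length.
    """
    if len(original_counts) != len(released_counts):
        raise ValueError("histograms must have the same number of bins")
    p_total = sum(original_counts)
    q_total = sum(released_counts)
    divergence = 0.0
    for p_count, q_count in zip(original_counts, released_counts):
        p = p_count / p_total
        # Floor the released probability so empty bins do not divide by zero.
        q = max(q_count / q_total, epsilon)
        if p > 0:
            divergence += p * math.log(p / q)
    return divergence
```

Identical histograms give a divergence of zero, and the value grows as the released histogram drifts from the original, which is why small values are reported as evidence of utility.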
Affiliation(s)
- James Gardner
  - Digital Reasoning Systems Inc, Franklin, Tennessee, USA
12
Post AR, Harrison JH. Concept similarity in publications precedes cross-disciplinary collaboration. AMIA Annu Symp Proc 2008; 2008:606-610. [PMID: 18999254 PMCID: PMC2655988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2008] [Revised: 07/16/2008] [Indexed: 05/27/2023]
Abstract
Innovative science frequently occurs as a result of cross-disciplinary collaboration, the importance of which is reflected by recent NIH funding initiatives that promote communication and collaboration. If shared research interests between collaborators are important for the formation of collaborations, methods for identifying these shared interests across scientific domains could potentially reveal new and useful collaboration opportunities. MEDLINE represents a comprehensive database of collaborations and research interests, as reflected by article co-authors and concept content. We analyzed six years of citations using information retrieval based methods to compute articles' conceptual similarity, and found that articles by basic and clinical scientists who later collaborated had significantly higher average similarity than articles by similar scientists who did not collaborate. Refinement of these methods and characterization of found conceptual overlaps could allow automated discovery of collaboration opportunities that are currently missed.
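One common information retrieval measure of conceptual similarity is the cosine between concept-frequency vectors. The sketch below illustrates that measure only; the abstract does not say which similarity measure or concept weighting the authors used, so cosine similarity over raw concept counts is an assumption, and the function name and sample concepts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(concepts_a, concepts_b):
    """Cosine similarity between two articles given as lists of concept labels."""
    vector_a = Counter(concepts_a)
    vector_b = Counter(concepts_b)
    # Only concepts present in both articles contribute to the dot product.
    dot = sum(vector_a[c] * vector_b[c] for c in vector_a.keys() & vector_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vector_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vector_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity(["apoptosis", "kinase"], ["apoptosis", "sepsis"])` scores two articles that share one of their two concepts; averaging such scores over pairs of articles by two scientists would give a per-pair similarity of the kind the abstract compares.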
Affiliation(s)
- Andrew R Post
  - Clinical Informatics Division, Dept. of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
13
Post AR, Sovarel AN, Harrison JH. Abstraction-based temporal data retrieval for a Clinical Data Repository. AMIA Annu Symp Proc 2007; 2007:603-7. [PMID: 18693907 PMCID: PMC2655874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2007] [Revised: 07/03/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023]
Abstract
Disease and patient care processes often create characteristic states, trends, and temporal patterns in clinical events and observations, called temporal abstractions. Identifying patient populations who share similar abstractions may be useful for clinical research, outcomes studies, and quality assurance. In these settings, abstractions may be specific to a query, and thus allowing the specification of abstractions directly in the query would be desirable. We propose a query language for specifying and retrieving clinical data sets that allows specifying abstractions directly, and automatically selects data for retrieval based on the presence of abstractions inferred from the data. We describe the language and a prototype implementation, demonstrate its features with two queries constructed in response to clinical researcher-initiated data requests submitted to our institution's Clinical Data Repository, and report preliminary results from an evaluation of the implementation's performance.
|
14
|
Post AR, Harrison JH. PROTEMPA: a method for specifying and identifying temporal sequences in retrospective data for patient selection. J Am Med Inform Assoc 2007; 14:674-83. [PMID: 17600103 PMCID: PMC1975802 DOI: 10.1197/jamia.m2275] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVE To specify and identify disease and patient care processes represented by temporal patterns in clinical events and observations, and retrieve patient populations containing those patterns from clinical data repositories, in support of clinical research, outcomes studies, and quality assurance. DESIGN A data processing method called PROTEMPA (Process-oriented Temporal Analysis) was developed for defining and detecting clinically relevant temporal and mathematical patterns in retrospective data. PROTEMPA provides for portability across data sources, "pluggable" data processing environments, and the creation of libraries of pattern definitions and data processing algorithms. MEASUREMENTS A proof-of-concept implementation of PROTEMPA in Java was evaluated against standard SQL queries for its ability to identify patients from a large clinical data repository who show the features of HELLP syndrome, and categorize those patients by disease severity and progression based on time sequence characteristics in their clinical laboratory test results. Results were verified by manual case review. RESULTS The proof-of-concept implementation was more accurate than SQL in identifying patients with HELLP and correctly assigned severity and disease progression categories, which was not possible using SQL only. CONCLUSIONS PROTEMPA supports the identification and categorization of patients with complex disease based on the characteristics of and relationships between time sequences in multiple data types. Identifying patient populations who share these types of patterns may be useful when patient features of interest do not have standard codes, are poorly expressed in coding schemes, may be inaccurately or incompletely coded, or are not represented explicitly as data values.
Affiliation(s)
- Andrew R Post
- Division of Clinical Informatics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908-0717, USA.
|
15
|
Abstract
The production of meat and poultry products has become increasingly complex. Technological growth has contributed to the need for sophistication in determining the origin and risk of food-borne microbial infections as well as environmental contaminants. The increasing use of agricultural chemicals in animal production and to some extent in processed foods has led to the presence of chemical residues in meat and poultry. These changes have caused the Food Safety and Inspection Service (FSIS), a public health agency within the United States Department of Agriculture (USDA), to institute new food safety initiatives and procedures for inspection of meat and poultry products. The goal is to reduce risks to the public health from conditions observed during antemortem and postmortem inspection or detected during processing. FSIS is committed to scientific innovation and has implemented several rapid in-plant tests that have given the Agency inexpensive, less disruptive methods to determine product adulteration or contamination.
Affiliation(s)
- M A Norcross
- Food Safety and Inspection Service, USDA, Washington, DC 20250
|