1
Callahan A, Ashley E, Datta S, Desai P, Ferris TA, Fries JA, Halaas M, Langlotz CP, Mackey S, Posada JD, Pfeffer MA, Shah NH. The Stanford Medicine data science ecosystem for clinical and translational research. JAMIA Open 2023; 6:ooad054. [PMID: 37545984 PMCID: PMC10397535 DOI: 10.1093/jamiaopen/ooad054] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/29/2022] [Revised: 03/14/2023] [Accepted: 07/19/2023] [Indexed: 08/08/2023]
Abstract
Objective: To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research.
Materials and Methods: The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training.
Results: The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data and HL7 messages. SDSR tools include tools for electronic phenotyping, cohort building, and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies.
Discussion: Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users.
Conclusion: Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.
Affiliation(s)
- Alison Callahan
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Euan Ashley
  - Department of Medicine, School of Medicine, Stanford University, Stanford, California, USA
  - Department of Genetics, School of Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, California, USA
- Somalee Datta
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Priyamvada Desai
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Todd A Ferris
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Jason A Fries
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
- Michael Halaas
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Curtis P Langlotz
  - Department of Radiology, School of Medicine, Stanford University, Stanford, California, USA
- Sean Mackey
  - Department of Anesthesia, School of Medicine, Stanford University, Stanford, California, USA
- José D Posada
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Michael A Pfeffer
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
- Nigam H Shah
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
  - Technology and Digital Solutions, Stanford Medicine, Stanford University, Stanford, California, USA
  - Clinical Excellence Research Center, School of Medicine, Stanford University, Stanford, California, USA
2
Wood WA, Anderson KC, Kumar SK, Semmel EA, Hewitt K, Plovnick RM, Pappas G. A pandemic preparedness network for individuals living with compromised immune systems. Blood Adv 2023; 7:3925-3927. [PMID: 37023227 PMCID: PMC10405186 DOI: 10.1182/bloodadvances.2023010035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/22/2023] [Accepted: 04/05/2023] [Indexed: 04/08/2023]
Affiliation(s)
- William A. Wood
  - Division of Hematology, Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Emily A. Semmel
  - American Society of Hematology Research Collaborative, Washington, DC
- Kathleen Hewitt
  - American Society of Hematology Research Collaborative, Washington, DC
3
Keloth VK, Banda JM, Gurley M, Heider PM, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves RM, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei WQ, Williams AE, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. J Biomed Inform 2023; 142:104343. [PMID: 36935011 PMCID: PMC10428170 DOI: 10.1016/j.jbi.2023.104343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 01/21/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]
Abstract
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.
Affiliation(s)
- Vipina K Keloth
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Michael Gurley
  - Lurie Cancer Center, Northwestern University, Chicago, Illinois, USA
- Paul M Heider
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Georgina Kennedy
  - Ingham Institute for Applied Medical Research, Sydney, Australia
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Feifan Liu
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Timothy Miller
  - Computational Health Informatics Program, Boston Children's Hospital, and Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Olga V Patterson
  - VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Verily Life Sciences, Mountain View, CA, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Kalpana Raja
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Ruth M Reeves
  - TN Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Masoud Rouhizadeh
  - Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD, USA
- Jianlin Shi
  - VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
- Xiaoyan Wang
  - Sema4 Mount Sinai Genomics Incorporation, Stamford, CT, USA
- Yanshan Wang
  - Department of Health Information Management, Department of Biomedical Informatics, and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Rui Zhang
  - Institute for Health Informatics, and Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
- Clair Blacketer
  - Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Patrick Ryan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
4
Zhang GQ, Li X, Huang Y, Cui L. Temporal Cohort Logic. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2023; 2022:1237-1246. [PMID: 37128360 PMCID: PMC10148298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
We introduce a new logic, called Temporal Cohort Logic (TCL), for cohort specification and discovery in clinical and population health research. TCL is created to fill a conceptual gap in formalizing temporal reasoning in biomedicine, in a similar role that temporal logics play for computer science and its applications. We provide formal syntax and semantics for TCL and illustrate the various logical constructs using examples related to human health. Relationships and distinctions with existing temporal logical frameworks are discussed. Applications in electronic health record (EHR) and in neurophysiological data resource are provided. Our approach differs from existing temporal logics, in that we explicitly capture Allen's interval algebra as modal operators in a language of temporal logic (rather than addressing it in the semantic structure). This has two major implications. First, it provides a formal logical framework for reasoning about time in biomedicine, allowing general (i.e., higher-levels of abstraction) investigation into the properties of this approach (such as proof systems, completeness, expressiveness, and decidability) independent of a specific query language or a database system. Second, it puts our approach in the context of logical developments in computer science, allowing potential translation of existing results into the setting of TCL and its variants or subsystems so as to illuminate opportunities and computational challenges involved in temporal reasoning for biomedicine.
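As a rough illustration of the interval relations this abstract refers to, the sketch below classifies a pair of closed time intervals into one of the 13 basic relations of Allen's interval algebra. It is not the TCL syntax or semantics from the paper, which treats these relations as modal operators; the function name `allen_relation`, the tuple representation, and the day-offset examples are assumptions made only for this example.

```python
from typing import Tuple

# An interval is a (start, end) pair with start < end, e.g. day offsets
# from a hypothetical index date. This representation is an assumption.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the basic Allen relation that holds from interval a to interval b."""
    (a0, a1), (b0, b1) = a, b
    if not (a0 < a1 and b0 < b1):
        raise ValueError("each interval must satisfy start < end")

    # Disjoint or touching intervals.
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"

    # Intervals that share one or both endpoints.
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"

    # Proper containment or partial overlap (all endpoints distinct).
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    return "overlaps" if a0 < b0 else "overlapped-by"


# Hypothetical example: a hospital stay on days 10-15 relative to a
# medication exposure on days 0-30.
print(allen_relation((10, 15), (0, 30)))  # during
print(allen_relation((0, 5), (5, 9)))     # meets
```

A cohort query such as "patients hospitalized during an exposure window" could then be read as a filter on the returned relation, although a logic like TCL would express this declaratively rather than through a helper function.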
Affiliation(s)
- Guo-Qiang Zhang
  - McGovern Medical School
  - School of Biomedical Informatics
  - Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Xiaojin Li
  - McGovern Medical School
  - Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Yan Huang
  - McGovern Medical School
  - Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Licong Cui
  - School of Biomedical Informatics
  - Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
5
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
Affiliation(s)
- Ting He
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anas Belouali
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica Patricoski
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Harold Lehmann
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert Ball
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
- Valsamo Anagnostou
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Taxiarchis Botsis
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
6
Lovis C, Mageau A, Mékinian A, Tannier X, Carrat F. Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study. JMIR Med Inform 2022; 10:e42379. [PMID: 36534446 PMCID: PMC9808583 DOI: 10.2196/42379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2022] [Revised: 10/17/2022] [Accepted: 10/22/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND: Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in a language other than English.
OBJECTIVE: We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases.
METHODS: Our multistep algorithm includes a named-entity recognition step, a multilabel classification using medical subject headings ontology, and the computation of patient similarity. A selection of cohorts of similar patients on a priori annotated phenotypes was performed. Six phenotypes were selected for their clinical significance: P1, osteoporosis; P2, nephritis in systemic lupus erythematosus; P3, interstitial lung disease in systemic sclerosis; P4, lung infection; P5, obstetric antiphospholipid syndrome; and P6, Takayasu arteritis. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. We evaluated the precision of the 3 patients closest to the index patient for each phenotype with precision-at-3 and recall and average precision.
RESULTS: For P1-P4, the precision-at-3 ranged from 0.85 (95% CI 0.75-0.95) to 0.99 (95% CI 0.98-1), the recall ranged from 0.53 (95% CI 0.50-0.55) to 0.83 (95% CI 0.81-0.84), and the average precision ranged from 0.58 (95% CI 0.54-0.62) to 0.88 (95% CI 0.85-0.90). P5-P6 phenotypes could not be analyzed due to the limited number of phenotypes.
CONCLUSIONS: Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm for extracting cohorts of similar patients.
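The precision-at-3 metric mentioned in this abstract can be sketched as follows. This is a minimal illustration of the generic metric, assuming that the neighbours of an index patient have already been ranked by some similarity score; the function name `precision_at_k`, the boolean input format, and the toy labels are assumptions and do not come from the paper, whose similarity computation is not described here.

```python
from typing import Sequence


def precision_at_k(ranked_hits: Sequence[bool], k: int = 3) -> float:
    """Fraction of the k top-ranked neighbours that share the target phenotype.

    ranked_hits[i] is True when the i-th most similar patient carries the
    same annotated phenotype as the index patient (hypothetical input format).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Neighbours beyond position k are ignored; missing positions count as misses.
    return sum(bool(hit) for hit in ranked_hits[:k]) / k


# Toy example: the two closest patients share the phenotype, the third does not.
score = precision_at_k([True, True, False, True], k=3)
print(round(score, 3))  # 0.667
```

Averaging this value over all index patients with a given phenotype would give a per-phenotype precision-at-3 of the kind reported in the results, under the assumptions stated above.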
Affiliation(s)
- Arthur Mageau
  - Institut National de la Santé et de la Recherche Médicale, Unité Mixte de Recherche 1137 Infection Antimicrobials Modelling Evolution, Team Decision Sciences in Infectious Diseases, Université Paris Cité, Paris, France
- Arsène Mékinian
  - Service de Médecine Interne, Inflammation-Immunopathology-Biotherapy Department, Hôpital Saint-Antoine, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Paris, France
- Xavier Tannier
  - Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, Institut National de la Santé et de la Recherche Médicale, Université Sorbonne, Paris, France
- Fabrice Carrat
  - Institute Pierre Louis Epidemiology and Public Health, Institut National de la Santé et de la Recherche Médicale, Sorbonne Université, Paris, France
  - Public Health Department, Hopital Saint-Antoine, Assistance Publique-Hôpitaux de Paris, Paris, France
7
Chaunzwa TL, del Rey MQ, Bitterman DS. Clinical Informatics Approaches to Understand and Address Cancer Disparities. Yearb Med Inform 2022; 31:121-130. [PMID: 36463869 PMCID: PMC9719762 DOI: 10.1055/s-0042-1742511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/07/2022]
Abstract
OBJECTIVES: Disparities in cancer incidence and outcomes across race, ethnicity, gender, socioeconomic status, and geography are well-documented, but their etiologies are often poorly understood and multifactorial. Clinical informatics can provide tools to better understand and address these disparities by enabling high-throughput analysis of multiple types of data. Here, we review recent efforts in clinical informatics to study and measure disparities in cancer.
METHODS: We carried out a narrative review of clinical informatics studies related to cancer disparities and bias published from 2018-2021, with a focus on domains such as real-world data (RWD) analysis, natural language processing (NLP), radiomics, genomics, proteomics, metabolomics, and metagenomics.
RESULTS: Clinical informatics studies that investigated cancer disparities across race, ethnicity, gender, and age were identified. Most cancer disparities work within clinical informatics used RWD analysis, NLP, radiomics, and genomics. Emerging applications of clinical informatics to understand cancer disparities, including proteomics, metabolomics, and metagenomics, were less well represented in the literature but are promising future research avenues. Algorithmic bias was identified as an important consideration when developing and implementing cancer clinical informatics techniques, and efforts to address this bias were reviewed.
CONCLUSIONS: In recent years, clinical informatics has been used to probe a range of data sources to understand cancer disparities across different populations. As informatics tools become integrated into clinical decision-making, attention will need to be paid to ensure that algorithmic bias does not amplify existing disparities. In our increasingly interconnected medical systems, clinical informatics is poised to untap the full potential of multi-platform health data to address cancer disparities.
Affiliation(s)
- Tafadzwa L. Chaunzwa
  - Department of Radiation Oncology, Dana-Farber Brigham Cancer Center, Harvard Medical School, Boston, MA, USA
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Maria Quiles del Rey
  - Department of Radiation Oncology, Dana-Farber Brigham Cancer Center, Harvard Medical School, Boston, MA, USA
- Danielle S. Bitterman
  - Department of Radiation Oncology, Dana-Farber Brigham Cancer Center, Harvard Medical School, Boston, MA, USA
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Correspondence to: Dr. Danielle S. Bitterman, Department of Radiation Oncology, Dana-Farber Cancer Institute/Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115, USA; +1 857 215 1489; +1 617 975 0985
8
Brandt PS, Pacheco JA, Adekkanattu P, Sholle ET, Abedian S, Stone DJ, Knaack DM, Xu J, Xu Z, Peng Y, Benda NC, Wang F, Luo Y, Jiang G, Pathak J, Rasmussen LV. Design and validation of a FHIR-based EHR-driven phenotyping toolbox. J Am Med Inform Assoc 2022; 29:1449-1460. [PMID: 35799370 PMCID: PMC9382394 DOI: 10.1093/jamia/ocac063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2021] [Revised: 04/04/2022] [Accepted: 06/17/2022] [Indexed: 12/14/2022]
Abstract
OBJECTIVES: To develop and validate a standards-based phenotyping tool to author electronic health record (EHR)-based phenotype definitions and demonstrate execution of the definitions against heterogeneous clinical research data platforms.
MATERIALS AND METHODS: We developed an open-source, standards-compliant phenotyping tool known as the PhEMA Workbench that enables a phenotype representation using the Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL) standards. We then demonstrated how this tool can be used to conduct EHR-based phenotyping, including phenotype authoring, execution, and validation. We validated the performance of the tool by executing a thrombotic event phenotype definition at 3 sites, Mayo Clinic (MC), Northwestern Medicine (NM), and Weill Cornell Medicine (WCM), and used manual review to determine precision and recall.
RESULTS: An initial version of the PhEMA Workbench has been released, which supports phenotype authoring, execution, and publishing to a shared phenotype definition repository. The resulting thrombotic event phenotype definition consisted of 11 CQL statements, and 24 value sets containing a total of 834 codes. Technical validation showed satisfactory performance (both NM and MC had 100% precision and recall and WCM had a precision of 95% and a recall of 84%).
CONCLUSIONS: We demonstrate that the PhEMA Workbench can facilitate EHR-driven phenotype definition, execution, and phenotype sharing in heterogeneous clinical research data environments. A phenotype definition that integrates with existing standards-compliant systems, and the use of a formal representation facilitates automation and can decrease potential for human error.
Affiliation(s)
- Pascal S Brandt
  - Corresponding Author: Pascal S. Brandt, Department of Biomedical Informatics & Medical Education, University of Washington, Box 358047, Seattle, WA 98195, USA
- Jennifer A Pacheco
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Prakash Adekkanattu
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Evan T Sholle
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Sajjad Abedian
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Daniel J Stone
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- David M Knaack
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Jie Xu
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Zhenxing Xu
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Yifan Peng
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Natalie C Benda
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Fei Wang
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Guoqian Jiang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Jyotishman Pathak
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Luke V Rasmussen
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA