1
Pfaff ER, Girvin AT, Crosskey M, Gangireddy S, Master H, Wei WQ, Kerchberger VE, Weiner M, Harris PA, Basford M, Lunt C, Chute CG, Moffitt RA, Haendel M. De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository. J Am Med Inform Assoc 2023; 30:1305-1312. [PMID: 37218289 PMCID: PMC10280348 DOI: 10.1093/jamia/ocad077] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/18/2023] [Revised: 03/28/2023] [Accepted: 04/24/2023] [Indexed: 05/24/2023]
Abstract
Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms so that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients with a high probability of having Long COVID. Supported by RECOVER, N3C and NIH's All of Us study partnered to reproduce the output of N3C's trained model in the All of Us data enclave, demonstrating model extensibility across multiple environments. This case study in ML-based phenotype reuse illustrates how open-source software best practices and cross-site collaboration can de-black-box phenotyping algorithms, prevent unnecessary rework, and promote open science in informatics.
Affiliation(s)
- Emily R Pfaff
- Department of Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina, USA
- Srushti Gangireddy
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Hiral Master
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- V Eric Kerchberger
- Department of Medicine, Division of Allergy, Pulmonary & Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Mark Weiner
- Department of Medicine, Weill Cornell Medicine, New York, USA
- Paul A Harris
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Melissa Basford
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Chris Lunt
- National Institutes of Health, Bethesda, Maryland, USA
- Christopher G Chute
- Johns Hopkins Schools of Medicine, Public Health, and Nursing, Baltimore, Maryland, USA
- Richard A Moffitt
- Departments of Hematology and Medical Oncology and Biomedical Informatics, Emory University, Atlanta, Georgia, USA
- Melissa Haendel
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Denver, Colorado, USA

2
Aschebrook-Kilfoy B, Zakin P, Craver A, Shah S, Kibriya MG, Stepniak E, Ramirez A, Clark C, Cohn E, Ohno-Machado L, Cicek M, Boerwinkle E, Schully SD, Mockrin S, Gebo K, Mayo K, Ratsimbazafy F, Sanders A, Shah RC, Argos M, Ho J, Kim K, Daviglus M, Greenland P, Ahsan H. An Overview of Cancer in the First 315,000 All of Us Participants. PLoS One 2022; 17:e0272522. [PMID: 36048778 PMCID: PMC9436122 DOI: 10.1371/journal.pone.0272522] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/03/2022] [Accepted: 07/21/2022] [Indexed: 11/19/2022]
Abstract
Introduction The NIH All of Us Research Program will have the scale and scope to enable research on a wide range of diseases, including cancer. The program’s focus on diversity and inclusion promises a better understanding of the unequal burden of cancer. This study considers preliminary cancer ascertainment in the All of Us cohort from two data sources: self-reported surveys and electronic health records (EHR). Materials and methods This work was performed on data collected from the All of Us Research Program’s 315,297 participants enrolled to date, using the Researcher Workbench, where approved researchers can access and analyze All of Us data on cancer and other diseases. Cancer case ascertainment was performed using EHR and self-reported survey data across key factors. The distribution of cancer types and the concordance of data sources by cancer site and demographics are analyzed. Results and discussion Data collected from 315,297 participants yielded 13,298 cancer cases detected in the survey (in 89,261 participants), 23,520 cancer cases detected in the EHR (in 203,813 participants), and 7,123 cancer cases detected across both sources (in 62,497 participants). Differences in survey completion by race/ethnicity affected the makeup of the cohorts when compared with cancer in the EHR and national NCI SEER data. Conclusions This study provides key insight into cancer detection in the All of Us Research Program and points to the existing strengths and limitations of All of Us as a platform for cancer research now and in the future.
Affiliation(s)
- Briseis Aschebrook-Kilfoy
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America
- Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
- Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America
- Paul Zakin
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America
- Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
- Andrew Craver
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America
- Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
- Sameep Shah
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America
- Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
- Muhammad G. Kibriya
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America
- Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
- Elizabeth Stepniak
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America
- Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
- Andrea Ramirez
- Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Cheryl Clark
- Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Elizabeth Cohn
- Hunter College City University of New York, New York, New York, United States of America
- Lucila Ohno-Machado
- University of California San Diego Health, La Jolla, California, United States of America
- Mine Cicek
- Mayo Clinic, Rochester, Minnesota, United States of America
- Eric Boerwinkle
- The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Sheri D. Schully
- National Institutes of Health, Bethesda, Maryland, United States of America
- Stephen Mockrin
- National Institutes of Health, Leidos, Inc, Frederick, Maryland, United States of America
- Kelly Gebo
- Johns Hopkins University School of Medicine, Bethesda, Maryland, United States of America
- Kelsey Mayo
- National Institutes of Health, Bethesda, Maryland, United States of America
- Alan Sanders
- Northshore University Health System, Evanston, Illinois, United States of America
- Raj C. Shah
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
- Maria Argos
- Division of Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Joyce Ho
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Karen Kim
- Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America
- Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Martha Daviglus
- Institute for Minority Health Research, College of Medicine, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Philip Greenland
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Habibul Ahsan
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America
- Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
- Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America

3
Cronin RM, Halvorson AE, Springer C, Feng X, Sulieman L, Loperena-Cortes R, Mayo K, Carroll RJ, Chen Q, Ahmedani BK, Karnes J, Korf B, O’Donnell CJ, Qian J, Ramirez AH. Comparison of family health history in surveys vs electronic health record data mapped to the observational medical outcomes partnership data model in the All of Us Research Program. J Am Med Inform Assoc 2021; 28:695-703. [PMID: 33404595 PMCID: PMC7973437 DOI: 10.1093/jamia/ocaa315] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/10/2020] [Revised: 10/15/2020] [Accepted: 11/14/2020] [Indexed: 01/28/2023]
Abstract
OBJECTIVE Family health history is important to clinical care and precision medicine. Prior studies show gaps in data collected from patient surveys and electronic health records (EHRs). The All of Us Research Program collects family history from participants via surveys and EHRs. This Demonstration Project aims to evaluate the availability of family health history information within the publicly available data from All of Us and to characterize the data from both sources. MATERIALS AND METHODS Surveys were completed by participants on an electronic portal. EHR data were mapped to the Observational Medical Outcomes Partnership data model. We used descriptive statistics to perform exploratory analysis of the data, including evaluating a list of medically actionable genetic disorders. We performed a subanalysis on participants who had both survey and EHR data. RESULTS There were 54 872 participants with family history data. Of those, 26% had EHR data only, 63% had survey data only, and 10.5% had data from both sources. There were 35 217 participants with a reported family history of a medically actionable genetic disorder (9% from EHR only, 89% from surveys, and 2% from both). In the subanalysis, we found inconsistencies between the surveys and EHRs, with more details coming from surveys. When both sources mentioned a similar disease, the source of truth was unclear. CONCLUSIONS Compiling data from both surveys and EHRs can provide a more comprehensive source of family health history, but informatics challenges and opportunities exist. A more complete understanding of a person's family health history may provide opportunities for precision medicine.
Affiliation(s)
- Robert M Cronin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, The Ohio State University, Columbus, Ohio, USA
- Alese E Halvorson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Cassie Springer
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Xiaoke Feng
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Lina Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Roxana Loperena-Cortes
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Kelsey Mayo
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Qingxia Chen
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Brian K Ahmedani
- Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, Michigan, USA
- Jason Karnes
- Department of Pharmacy Practice and Science, University of Arizona College of Pharmacy, Tucson, Arizona, USA
- Bruce Korf
- Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Christopher J O’Donnell
- Department of Medicine, Veterans Administration Boston Healthcare System, Boston, Massachusetts, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Jun Qian
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Andrea H Ramirez
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA