Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, Payne PRO, Pfaff ER, Robinson PN, Saltz JH, Spratt H, Suver C, Wilbanks J, Wilcox AB, Williams AE, Wu C, Blacketer C, Bradford RL, Cimino JJ, Clark M, Colmenares EW, Francis PA, Gabriel D, Graves A, Hemadri R, Hong SS, Hripcsak G, Jiao D, Klann JG, Kostka K, Lee AM, Lehmann HP, Lingrey L, Miller RT, Morris M, Murphy SN, Natarajan K, Palchuk MB, Sheikh U, Solbrig H, Visweswaran S, Walden A, Walters KM, Weber GM, Zhang XT, Zhu RL, Amor B, Girvin AT, Manna A, Qureshi N, Kurilla MG, Michael SG, Portilla LM, Rutter JL, Austin CP, Gersing KR. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2021; 28:427-443. [PMID: 32805036 PMCID: PMC7454687 DOI: 10.1093/jamia/ocaa196] [Citation(s) in RCA: 293] [Impact Index Per Article: 97.7] [Received: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 01/12/2023]
Abstract
Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.
Materials and Methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.
Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.
Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
Affiliation(s)
- Melissa A Haendel
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA; Translational and Integrative Sciences Center, Department of Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
- Tellen D Bennett
  - Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, Colorado, USA
- David A Eichmann
  - School of Library and Information Science, The University of Iowa, Iowa City, Iowa, USA
- Philip R O Payne
  - Institute for Informatics, Washington University in St. Louis, Saint Louis, Missouri, USA
- Emily R Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Joel H Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Heidi Spratt
  - University of Texas Medical Branch, Galveston, Texas, USA
- Andrew E Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA
- Chunlei Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
- Clair Blacketer
  - Janssen Research and Development, LLC, Raritan, New Jersey, USA
- Robert L Bradford
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- James J Cimino
  - University of Alabama-Birmingham, Birmingham, Alabama, USA
- Marshall Clark
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Evan W Colmenares
  - Department of Pharmaceutical Outcomes and Policy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Davera Gabriel
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Alexis Graves
  - University of Iowa Institute for Clinical and Translational Science, The University of Iowa, Iowa City, Iowa, USA
- Raju Hemadri
  - National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Stephanie S Hong
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Dazhi Jiao
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Adam M Lee
  - University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Harold P Lehmann
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Robert T Miller
  - Tufts Clinical and Translational Science Institute, Tufts University, Boston, Massachusetts, USA
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Usman Sheikh
  - National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Harold Solbrig
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Anita Walden
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA; Sage Bionetworks, Seattle, Washington, USA
- Kellie M Walters
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Griffin M Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Richard L Zhu
  - Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Amin Manna
  - Palantir Technologies, Palo Alto, California, USA
- Michael G Kurilla
  - Division of Clinical Innovation, National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Sam G Michael
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Lili M Portilla
  - Office of Strategic Alliances, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Joni L Rutter
  - Office of the Director, National Center for Advancing Translational Science, Bethesda, Maryland, USA
- Christopher P Austin
  - National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
- Ken R Gersing
  - National Center for Advancing Translational Science, Bethesda, Maryland, USA