1
Tachinardi U, Grannis SJ, Michael SG, Misquitta L, Dahlin J, Sheikh U, Kho A, Phua J, Rogovin SS, Amor B, Choudhury M, Sparks P, Mannaa A, Ljazouli S, Saltz J, Prior F, Baghal A, Gersing K, Embi PJ. Privacy-preserving record linkage across disparate institutions and datasets to enable a learning health system: The national COVID cohort collaborative (N3C) experience. Learn Health Syst 2024; 8:e10404. [PMID: 38249841 PMCID: PMC10797567 DOI: 10.1002/lrh2.10404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Revised: 12/06/2023] [Accepted: 12/06/2023] [Indexed: 01/23/2024]
Abstract
Introduction: Research driven by real-world clinical data is increasingly vital to enabling learning health systems, but integrating such data from across disparate health systems is challenging. As part of the NCATS National COVID Cohort Collaborative (N3C), the N3C Data Enclave was established as a centralized repository of deidentified and harmonized COVID-19 patient data from institutions across the US. However, making this data most useful for research requires linking it with information such as mortality data, images, and viral variants. The objective of this project was to establish privacy-preserving record linkage (PPRL) methods to ensure that patient-level EHR data remains secure and private when governance-approved linkages with other datasets occur.
Methods: Separate agreements and approval processes govern N3C data contribution and data access. The Linkage Honest Broker (LHB), an independent neutral party (the Regenstrief Institute), ensures data linkages are robust and secure by adding an extra layer of separation between protected health information and clinical data. The LHB's PPRL methods (including algorithms, processes, and governance) match patient records using "deidentified tokens," which are hashed combinations of identifier fields that define a match across data repositories without using patients' clear-text identifiers.
Results: These methods enable three linkage functions: Deduplication, Linking Multiple Datasets, and Cohort Discovery. To date, two external repositories have been cross-linked. As of March 1, 2023, 43 sites have signed the LHB Agreement; 35 sites have sent tokens generated for 9 528 998 patients. In this initial cohort, the LHB identified 135 037 matches and 68 596 duplicates.
Conclusion: This large-scale linkage study using deidentified datasets of varying characteristics established secure methods for protecting the privacy of N3C patient data when linked for research purposes. This technology has potential for use with registries for other diseases and conditions.
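The idea of matching on hashed combinations of identifier fields can be illustrated with a minimal sketch. The abstract does not describe the actual N3C tokenization scheme, so everything here is an assumption for illustration: the use of HMAC-SHA256 with a shared secret key, the normalization rules, the choice of fields (first name, last name, date of birth, sex), and all function names.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumerics so that
    trivially different spellings produce the same hash input."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(fields: list[str], secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) over a combination of normalized fields.
    The keyed construction is an assumption, not the documented N3C method."""
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def find_matches(tokens_a: dict[str, str], tokens_b: dict[str, str]) -> list[tuple[str, str]]:
    """Pair local record IDs from two repositories that share a token."""
    by_token = {token: rec_id for rec_id, token in tokens_a.items()}
    return [(by_token[t], rec_id) for rec_id, t in tokens_b.items() if t in by_token]


# Hypothetical records: each site tokenizes locally, so only tokens are compared.
KEY = b"shared-secret-for-illustration-only"
site_a = {"A-001": make_token(["José", "Núñez", "1980-02-14", "M"], KEY)}
site_b = {
    "B-917": make_token(["jose", "nunez", "1980-02-14", "m"], KEY),
    "B-918": make_token(["Ann", "Lee", "1975-07-01", "F"], KEY),
}
matches = find_matches(site_a, site_b)
```

In this sketch the party comparing tokens sees only hex digests and local record IDs, never the clear-text identifiers. Applying the same comparison to tokens within a single repository would flag candidate duplicates.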
Affiliation(s)
- Umberto Tachinardi
  - Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Shaun J. Grannis
  - Center for Biomedical Informatics, Regenstrief Institute; Department of Family Medicine, IU School of Medicine; Regenstrief Institute, Inc. and Indiana University School of Medicine, Indianapolis, Indiana, USA
- Sam G. Michael
  - National Center for Advancing Translational Science, NIH, Bethesda, Maryland, USA
- Leonie Misquitta
  - National Center for Advancing Translational Science, NIH, Bethesda, Maryland, USA
- Jayme Dahlin
  - National Center for Advancing Translational Science, NIH, Bethesda, Maryland, USA
- Usman Sheikh
  - National Center for Advancing Translational Science, NIH, Bethesda, Maryland, USA
- Abel Kho
  - Department of Medicine, Northwestern University, Feinberg School of Medicine, Chicago, Illinois, USA
  - Public Sector, Datavant, Inc, San Francisco, California, USA
- Jasmin Phua
  - Public Sector, Datavant, Inc, San Francisco, California, USA
- Benjamin Amor
  - Federal Health, Palantir Technologies, Denver, Colorado, USA
- Philip Sparks
  - Federal Health, Palantir Technologies, Denver, Colorado, USA
- Amin Mannaa
  - Federal Health, Palantir Technologies, Denver, Colorado, USA
- Saad Ljazouli
  - Federal Health, Palantir Technologies, Denver, Colorado, USA
- Joel Saltz
  - School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Fred Prior
  - COM Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Ahmen Baghal
  - COM Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Kenneth Gersing
  - National Center for Advancing Translational Science, NIH, Bethesda, Maryland, USA
- Peter J. Embi
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
2
Bergquist T, Wax M, Bennett TD, Moffitt RA, Gao J, Chen G, Telenti A, Maher MC, Bartha I, Walker L, Orwoll BE, Mishra M, Alamgir J, Cragin BL, Ferguson CH, Wong HH, Deslattes Mays A, Misquitta L, DeMarco KA, Sciarretta KL, Patel SA. A framework for future national pediatric pandemic respiratory disease severity triage: The HHS pediatric COVID-19 data challenge. J Clin Transl Sci 2023; 7:e175. [PMID: 37745933 PMCID: PMC10514686 DOI: 10.1017/cts.2023.549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 04/28/2023] [Accepted: 05/05/2023] [Indexed: 09/26/2023]
Abstract
Introduction: With persistent incidence, incomplete vaccination rates, confounding respiratory illnesses, and few therapeutic interventions available, COVID-19 continues to be a burden on the pediatric population. During a surge, it is difficult for hospitals to direct limited healthcare resources effectively. While the overwhelming majority of pediatric infections are mild, there have been life-threatening exceptions that illuminated the need to proactively identify pediatric patients at risk of severe COVID-19 and other respiratory infectious diseases. However, a nationwide capability for developing validated computational tools to identify pediatric patients at risk using real-world data does not exist.
Methods: HHS ASPR BARDA sought, through the power of competition in a challenge, to create computational models to address two clinically important questions using the National COVID Cohort Collaborative: (1) Of pediatric patients who test positive for COVID-19 in an outpatient setting, who are at risk for hospitalization? (2) Of pediatric patients who test positive for COVID-19 and are hospitalized, who are at risk for needing mechanical ventilation or cardiovascular interventions?
Results: This challenge was the first multi-agency, coordinated computational challenge carried out by the federal government as a response to a public health emergency. Fifty-five computational models were evaluated across both tasks, and two winners and three honorable mentions were selected.
Conclusion: This challenge serves as a framework for how the government, research communities, and large data repositories can be brought together to source solutions when resources are strapped during a pandemic.
Affiliation(s)
- Marie Wax
  - United States Department of Health and Human Services, Biomedical Advanced Research and Development Authority, Administration for Strategic Preparedness and Response, Washington, DC, USA
- Jifan Gao
  - University of Wisconsin-Madison, Madison, WI, USA
- Guanhua Chen
  - University of Wisconsin-Madison, Madison, WI, USA
- Lorne Walker
  - Oregon Health & Science University, Portland, OR, USA
- Christopher H. Ferguson
  - United States Department of Health and Human Services, Biomedical Advanced Research and Development Authority, Administration for Strategic Preparedness and Response, Washington, DC, USA
- Hui-Hsing Wong
  - United States Department of Health and Human Services, Biomedical Advanced Research and Development Authority, Administration for Strategic Preparedness and Response, Washington, DC, USA
- Anne Deslattes Mays
  - United States Department of Health and Human Services, National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, USA
- Leonie Misquitta
  - United States Department of Health and Human Services, National Institutes of Health, National Center for Advancing Translational Sciences, Bethesda, MD, USA
- Kerry A. DeMarco
  - United States Department of Health and Human Services, Biomedical Advanced Research and Development Authority, Administration for Strategic Preparedness and Response, Washington, DC, USA
- Kimberly L. Sciarretta
  - United States Department of Health and Human Services, Biomedical Advanced Research and Development Authority, Administration for Strategic Preparedness and Response, Washington, DC, USA
- Sandeep A. Patel
  - United States Department of Health and Human Services, Biomedical Advanced Research and Development Authority, Administration for Strategic Preparedness and Response, Washington, DC, USA
3
Iwaki H, Leonard HL, Makarious MB, Bookman M, Landin B, Vismer D, Casey B, Gibbs JR, Hernandez DG, Blauwendraat C, Vitale D, Song Y, Kumar D, Dalgard CL, Sadeghi M, Dong X, Misquitta L, Scholz SW, Scherzer CR, Nalls MA, Biswas S, Singleton AB. Accelerating Medicines Partnership: Parkinson's Disease. Genetic Resource. Mov Disord 2021; 36:1795-1804. [PMID: 33960523 PMCID: PMC8453903 DOI: 10.1002/mds.28549] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/04/2020] [Revised: 01/20/2021] [Accepted: 02/11/2021] [Indexed: 02/03/2023]
Abstract
BACKGROUND: Whole-genome sequencing data are available from several large studies across a variety of diseases and traits. However, massive storage and computation resources are required to use these data, and to achieve sufficient power for discoveries, harmonization of multiple cohorts is critical.
OBJECTIVES: The Accelerating Medicines Partnership Parkinson's Disease program has developed a research platform for Parkinson's disease (PD) that integrates the storage and analysis of whole-genome sequencing data, RNA expression data, and clinical data, harmonized across multiple cohort studies.
METHODS: The version 1 release contains whole-genome sequencing data derived from 3941 participants from 4 cohorts. Samples underwent joint genotyping by the TOPMed Freeze 9 Variant Calling Pipeline. We performed descriptive analyses of these whole-genome sequencing data using the Accelerating Medicines Partnership Parkinson's Disease platform.
RESULTS: The clinical diagnosis of participants in version 1 release includes 2005 idiopathic PD patients, 963 healthy controls, 64 prodromal subjects, 62 clinically diagnosed PD subjects without evidence of dopamine deficit, and 705 participants of genetically enriched cohorts carrying PD risk-associated GBA variants or LRRK2 variants, of whom 304 were affected. We did not observe significant enrichment of pathogenic variants in the idiopathic PD group, but the polygenic risk score was higher in PD both in nongenetically enriched cohorts and genetically enriched cohorts. The population analysis showed a correlation between genetically enriched cohorts and Ashkenazi Jewish ancestry.
CONCLUSIONS: We describe the genetic component of the Accelerating Medicines Partnership Parkinson's Disease platform, a solution to democratize data access and analysis for the PD research community.
© 2021 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society. This article is a U.S. Government work and is in the public domain in the USA.
Affiliation(s)
- Hirotaka Iwaki
  - Data Tecnica International, Glen Echo, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Hampton L. Leonard
  - Data Tecnica International, Glen Echo, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Mary B. Makarious
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Bradford Casey
  - The Michael J. Fox Foundation for Parkinson's Research, New York, New York, USA
- J. Raphael Gibbs
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Dena G. Hernandez
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Daniel Vitale
  - Data Tecnica International, Glen Echo, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Yeajin Song
  - Data Tecnica International, Glen Echo, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Clifton L. Dalgard
  - Department of Anatomy, Physiology & Genetics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, USA
  - The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, Maryland, USA
- Mahdiar Sadeghi
  - Sanofi, Framingham, Massachusetts, USA
  - Northeastern University, Boston, Massachusetts, USA
- Xianjun Dong
  - Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Sonja W. Scholz
  - National Institute of Neurological Disorders and Stroke, Bethesda, Maryland, USA
  - Department of Neurology, Johns Hopkins University, Baltimore, Maryland, USA
- Mike A. Nalls
  - Data Tecnica International, Glen Echo, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
- Andrew B. Singleton
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA
4
LaPlaca MC, Huie JR, Alam HB, Bachstetter AD, Bayir H, Bellgowan PF, Cummings D, Dixon CE, Ferguson AR, Ferland-Beckham C, Floyd CL, Friess SH, Galanopoulou AS, Hall ED, Harris NG, Hawkins BE, Hicks RR, Hulbert LE, Johnson VE, Kabitzke PA, Lafrenaye AD, Lemmon VP, Lifshitz CW, Lifshitz J, Loane DJ, Misquitta L, Nikolian VC, Noble-Haeusslein LJ, Smith DH, Taylor-Burds C, Umoh N, Vovk O, Williams AM, Young M, Zai LJ. Pre-Clinical Common Data Elements for Traumatic Brain Injury Research: Progress and Use Cases. J Neurotrauma 2021; 38:1399-1410. [PMID: 33297844 PMCID: PMC8082734 DOI: 10.1089/neu.2020.7328] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 02/07/2023]
Abstract
Traumatic brain injury (TBI) is an extremely complex condition due to heterogeneity in injury mechanism, underlying conditions, and secondary injury. Pre-clinical and clinical researchers face challenges with reproducibility that negatively impact translation and therapeutic development for improved TBI patient outcomes. To address this challenge, TBI Pre-clinical Working Groups expanded upon previous efforts and developed common data elements (CDEs) to describe the most frequently used experimental parameters. The working groups created 913 CDEs to describe study metadata, animal characteristics, animal history, injury models, and behavioral tests. Use cases applied a set of commonly used CDEs to address and evaluate the degree of missing data resulting from combining legacy data from different laboratories for two different outcome measures (Morris water maze [MWM]; RotorRod/Rotarod). Data were cleaned and harmonized to Form Structures containing the relevant CDEs and subjected to missing value analysis. For the MWM dataset (358 animals from five studies, 44 CDEs), 50% of the CDEs contained at least one missing value, while for the Rotarod dataset (97 animals from three studies, 48 CDEs), over 60% of CDEs contained at least one missing value. Overall, 35% of values were missing across the MWM dataset, and 33% of values were missing for the Rotarod dataset, demonstrating both the feasibility and the challenge of combining legacy datasets using CDEs. The CDEs and the associated forms created here are available to the broader pre-clinical research community to promote consistent and comprehensive data acquisition, as well as to facilitate data sharing and formation of data repositories. In addition to addressing the challenge of standardization in TBI pre-clinical studies, this effort is intended to bring attention to the discrepancies in assessment and outcome metrics among pre-clinical laboratories and ultimately accelerate translation to clinical research.
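The two missingness figures reported above (the share of CDEs with at least one missing value, and the share of all values that are missing) can be computed along the following lines. This is a minimal sketch, not the authors' analysis code; the CDE names, the records, and the rule that `None`, an absent key, or an empty string counts as missing are all assumptions made for illustration.

```python
from typing import Optional

Record = dict[str, Optional[str]]


def is_missing(record: Record, cde: str) -> bool:
    """Treat an absent key, None, or a blank string as a missing value."""
    value = record.get(cde)
    return value is None or str(value).strip() == ""


def missingness(records: list[Record], cdes: list[str]) -> tuple[float, float]:
    """Return (share of CDEs with >= 1 missing value, share of all cells missing)."""
    cdes_with_gap = sum(any(is_missing(r, c) for r in records) for c in cdes)
    missing_cells = sum(is_missing(r, c) for r in records for c in cdes)
    return cdes_with_gap / len(cdes), missing_cells / (len(records) * len(cdes))


# Hypothetical harmonized records pooled from different laboratories.
cdes = ["AnimalSpecies", "InjuryModel", "AgeWeeks", "EscapeLatencySec"]
records = [
    {"AnimalSpecies": "rat", "InjuryModel": "CCI", "AgeWeeks": "12", "EscapeLatencySec": "31.5"},
    {"AnimalSpecies": "rat", "InjuryModel": "FPI", "AgeWeeks": None, "EscapeLatencySec": "44.0"},
    {"AnimalSpecies": "mouse", "InjuryModel": "CCI", "EscapeLatencySec": ""},
    {"AnimalSpecies": "mouse", "InjuryModel": "CCI", "AgeWeeks": "10", "EscapeLatencySec": "28.2"},
]
cde_share, cell_share = missingness(records, cdes)
```

With these four hypothetical animals, two of the four CDEs have a gap (0.5) and three of sixteen cells are missing (0.1875), which shows how the per-CDE figure can be much higher than the overall cell-level figure.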
Affiliation(s)
- Michelle C. LaPlaca
  - Department of Biomedical Engineering, Georgia Institute of Technology/Emory University, Atlanta, Georgia, USA
  - San Francisco Veterans Affairs Health Care System, San Francisco, California, USA
- J. Russell Huie
  - Brain and Spinal Injury Center, Department of Neurological Surgery, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, California, USA
- Hasan B. Alam
  - Department of Surgery, University of Michigan, Ann Arbor, Michigan, USA
- Adam D. Bachstetter
  - Department of Neuroscience, University of Kentucky, Lexington, Kentucky, USA
- Hülya Bayir
  - Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- C. Edward Dixon
  - Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Adam R. Ferguson
  - Brain and Spinal Injury Center, Department of Neurological Surgery, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, California, USA
- Candace L. Floyd
  - Department of Physical Medicine and Rehabilitation, University of Utah, Salt Lake City, Utah, USA
- Stuart H. Friess
  - Division of Critical Care Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Edward D. Hall
  - Department of Neuroscience, University of Kentucky, Lexington, Kentucky, USA
- Neil G. Harris
  - Department of Neurosurgery, University of California, Los Angeles, Los Angeles, California, USA
- Bridget E. Hawkins
  - Department of Anesthesiology, University of Texas Medical Branch, Galveston, Texas, USA
- Lindsey E. Hulbert
  - Department of Animal Sciences and Industry, Kansas State University, Manhattan, Kansas, USA
- Victoria E. Johnson
  - Department of Neurosurgery, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Audrey D. Lafrenaye
  - Department of Anatomy and Neurobiology, Virginia Commonwealth University, Richmond, Virginia, USA
- Vance P. Lemmon
  - Department of Neurological Surgery, University of Miami, Miami, Florida, USA
- Carrie W. Lifshitz
  - Department of Child Health, University of Arizona College of Medicine Phoenix, Phoenix, Arizona, USA
- Jonathan Lifshitz
  - Department of Child Health, University of Arizona College of Medicine Phoenix, Phoenix, Arizona, USA
- David J. Loane
  - School of Biochemistry and Immunology, Trinity College Dublin, Dublin, Ireland
- Douglas H. Smith
  - Department of Neurosurgery, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Nsini Umoh
  - Department of Defense, U.S. Army Medical Research and Materiel Command, Fort Detrick, Frederick, Maryland, USA
- Olga Vovk
  - National Institutes of Health, Bethesda, Maryland, USA
- Aaron M. Williams
  - Department of Surgery, University of Michigan, Ann Arbor, Michigan, USA
- Margaret Young
  - Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
5
Abstract
The Biomedical Research Informatics Computing System (BRICS) was developed to support multiple disease-focused research programs. Seven service modules are integrated to provide a collaborative and extensible web-based environment. The modules (Data Dictionary, Account Management, Query Tool, Protocol and Form Research Management System, Meta Study, Data Repository, and Globally Unique Identifier) facilitate the management of research protocols and the submission, processing, curation, access, and storage of clinical, imaging, and derived genomics data within the associated data repositories. Multiple instances of BRICS are deployed to support various biomedical research communities focused on accelerating discoveries for rare diseases, Traumatic Brain Injury, Parkinson's Disease, inherited eye diseases, and symptom science research. No Personally Identifiable Information is stored within the data repositories. Digital Object Identifiers are associated with the research studies. Reusability of biomedical data is enhanced by Common Data Elements (CDEs), which enable systematic collection, analysis, and sharing of data. The use of CDEs with a service-oriented informatics architecture enabled the development of disease-specific repositories that support hypothesis-based biomedical research.
Affiliation(s)
- Vivek Navale
  - Office of Intramural Research, Center for Information Technology, National Institutes of Health, USA, Bethesda, Maryland, 20892, USA
- Michele Ji
  - Office of Intramural Research, Center for Information Technology, National Institutes of Health, USA, Bethesda, Maryland, 20892, USA
- Olga Vovk
  - General Dynamics Information Technology, Inc., Fairfax, Virginia, 22030, USA
- Alison Garcia
  - Sapient Government Services, Arlington, Virginia, 22201, USA
- Yang Fann
  - Intramural IT and Bioinformatics Program, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, 20892, USA
- Matthew McAuliffe
  - Office of Intramural Research, Center for Information Technology, National Institutes of Health, USA, Bethesda, Maryland, 20892, USA
6
Temple G, Gerhard DS, Rasooly R, Feingold EA, Good PJ, Robinson C, Mandich A, Derge JG, Lewis J, Shoaf D, Collins FS, Jang W, Wagner L, Shenmen CM, Misquitta L, Schaefer CF, Buetow KH, Bonner TI, Yankie L, Ward M, Phan L, Astashyn A, Brown G, Farrell C, Hart J, Landrum M, Maidak BL, Murphy M, Murphy T, Rajput B, Riddick L, Webb D, Weber J, Wu W, Pruitt KD, Maglott D, Siepel A, Brejova B, Diekhans M, Harte R, Baertsch R, Kent J, Haussler D, Brent M, Langton L, Comstock CLG, Stevens M, Wei C, van Baren MJ, Salehi-Ashtiani K, Murray RR, Ghamsari L, Mello E, Lin C, Pennacchio C, Schreiber K, Shapiro N, Marsh A, Pardes E, Moore T, Lebeau A, Muratet M, Simmons B, Kloske D, Sieja S, Hudson J, Sethupathy P, Brownstein M, Bhat N, Lazar J, Jacob H, Gruber CE, Smith MR, McPherson J, Garcia AM, Gunaratne PH, Wu J, Muzny D, Gibbs RA, Young AC, Bouffard GG, Blakesley RW, Mullikin J, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Hirst M, Zeng T, Tse K, Moksa M, Deng M, Ma K, Mah D, Pang J, Taylor G, Chuah E, Deng A, Fichter K, Go A, Lee S, Wang J, Griffith M, Morin R, Moore RA, Mayo M, Munro S, Wagner S, Jones SJM, Holt RA, Marra MA, Lu S, Yang S, Hartigan J, Graf M, Wagner R, Letovksy S, Pulido JC, Robison K, Esposito D, Hartley J, Wall VE, Hopkins RF, Ohara O, Wiemann S. The completion of the Mammalian Gene Collection (MGC). Genome Res 2009; 19:2324-33. [PMID: 19767417 DOI: 10.1101/gr.095976.109] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 12/20/2022]
Abstract
Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
7
Abstract
INTRODUCTION: RNA interference (RNAi) is a powerful method for determining the role of specific genes during Drosophila embryogenesis. This protocol describes a method for collecting Drosophila embryos for RNAi experiments. The embryos are collected in a simple, homemade apparatus, arrayed on prepared glass slides, and readied for injection. It is important to keep the embryos moist and oxygenated. Work expeditiously, because embryos should be injected within 30-60 min after collection.
8
Abstract
INTRODUCTION: RNA interference (RNAi) is a powerful method for determining the role of specific genes during Drosophila embryogenesis. This protocol describes a technique by which Drosophila embryos can be injected with dsRNA in order to disrupt targeted gene function. The approach is straightforward, utilizing improved methods for injecting the dsRNA directly through the chorion of the embryo. This strategy minimizes problems normally associated with desiccation of the dechorionated embryo and facilitates post-injection analysis of gene expression.
9
Abstract
INTRODUCTION: RNA interference (RNAi) is a powerful method for determining the role of specific genes during Drosophila embryogenesis. This protocol describes a method for RNAi in vivo using tissue-specific Gal-4 transgenes to induce dsRNA synthesis from an upstream activator sequence (UAS) vector. This vector contains the desired exonic inverted sequences representing the target gene (preferably more than 400 bp) separated by a unique spacer, the first intron of the actin 5C gene. The inverted repeats are stable during cloning in E. coli with this intronic spacer and the intron is spliced out to produce an almost perfect dsRNA target for Dicer cleavage and the production of siRNAs.
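The layout of the insert described above (an exonic fragment, a spacer, and the same fragment in inverted orientation) can be sketched in a few lines. This is only an illustration of the sequence arrangement, assuming the inverted arm is the reverse complement of the first arm; the sequences are short made-up placeholders, not the actin 5C intron or a real target exon, and the function names are hypothetical.

```python
# Map each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_inverted_repeat(exon_fragment: str, spacer: str) -> str:
    """Arrange sense arm + spacer + antisense arm, as in a hairpin insert."""
    return exon_fragment + spacer + reverse_complement(exon_fragment)


def long_enough(exon_fragment: str, minimum_bp: int = 400) -> bool:
    """Check the 'preferably more than 400 bp' guideline from the abstract."""
    return len(exon_fragment) > minimum_bp


# Placeholder sequences for illustration only.
arm = "ATGGCC"
spacer = "GTAAGT"
construct = build_inverted_repeat(arm, spacer)
```

Once the spacer is removed, the two arms are complementary along their full length, which is what allows the transcript to fold back on itself as dsRNA.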
Affiliation(s)
- Leonie Misquitta
  - Laboratory of Biochemistry and Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
10
Misquitta L, Wei Q, Paterson BM. Preparation of Double-Stranded RNA for Drosophila RNA Interference (RNAi). Cold Spring Harb Protoc 2008; 2008:pdb.prot4916. [PMID: 21356761 DOI: 10.1101/pdb.prot4916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
INTRODUCTION: RNA interference (RNAi) is a powerful method for determining the role of specific genes during Drosophila embryogenesis. It has been used in our laboratory to phenocopy a series of known mutations in Drosophila, including twist, engrailed, daughterless, Dmef2, and, to a lesser extent, white in the adult eye. This protocol describes the preparation of dsRNA by in vitro transcription of complementary strands of a cloned DNA fragment that codes for all or a portion of the gene of interest, followed by annealing of the transcribed RNA.
11
Gerhard DS, Wagner L, Feingold EA, Shenmen CM, Grouse LH, Schuler G, Klein SL, Old S, Rasooly R, Good P, Guyer M, Peck AM, Derge JG, Lipman D, Collins FS, Jang W, Sherry S, Feolo M, Misquitta L, Lee E, Rotmistrovsky K, Greenhut SF, Schaefer CF, Buetow K, Bonner TI, Haussler D, Kent J, Kiekhaus M, Furey T, Brent M, Prange C, Schreiber K, Shapiro N, Bhat NK, Hopkins RF, Hsie F, Driscoll T, Soares MB, Casavant TL, Scheetz TE, Brownstein MJ, Usdin TB, Toshiyuki S, Carninci P, Piao Y, Dudekula DB, Ko MSH, Kawakami K, Suzuki Y, Sugano S, Gruber CE, Smith MR, Simmons B, Moore T, Waterman R, Johnson SL, Ruan Y, Wei CL, Mathavan S, Gunaratne PH, Wu J, Garcia AM, Hulyk SW, Fuh E, Yuan Y, Sneed A, Kowis C, Hodgson A, Muzny DM, McPherson J, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madari A, Young AC, Wetherby KD, Granite SJ, Kwong PN, Brinkley CP, Pearson RL, Bouffard GG, Blakesly RW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YSN, Griffith M, Griffith OL, Krzywinski MI, Liao N, Morin R, Morrin R, Palmquist D, Petrescu AS, Skalska U, Smailus DE, Stott JM, Schnerch A, Schein JE, Jones SJM, Holt RA, Baross A, Marra MA, Clifton S, Makowski KA, Bosak S, Malek J. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res 2004; 14:2121-7. [PMID: 15489334 PMCID: PMC528928 DOI: 10.1101/gr.2596504] [Citation(s) in RCA: 403] [Impact Index Per Article: 20.2] [Indexed: 12/14/2022]
Abstract
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.
12
Misquitta L, Paterson BM. Targeted disruption of gene function in Drosophila by RNA interference (RNA-i): a role for nautilus in embryonic somatic muscle formation. Proc Natl Acad Sci U S A 1999; 96:1451-6. [PMID: 9990044 PMCID: PMC15483 DOI: 10.1073/pnas.96.4.1451] [Citation(s) in RCA: 268] [Impact Index Per Article: 10.7] [Indexed: 11/18/2022]
Abstract
The expression of the MyoD gene homolog, nautilus (nau), in the Drosophila embryo defines a subset of mesodermal cells known as the muscle "pioneer" or "founder" cells. These cells are thought to establish the future muscle pattern in each hemisegment. Founders appear to recruit fusion-competent mesodermal cells to establish a particular muscle fiber type. In support of this concept every somatic muscle in the embryo is associated with one or more nautilus-positive cells. However, because of the lack of known (isolated) nautilus mutations, no direct test of the founder cell hypothesis has been possible. We now have utilized toxin ablation and genetic interference by double-stranded RNA (RNA interference or RNA-i) to determine both the role of the nautilus-expressing cells and the nautilus gene, respectively, in embryonic muscle formation. In the absence of nautilus-expressing cells muscle formation is severely disrupted or absent. A similar phenotype is observed with the elimination of the nautilus gene product by genetic interference upon injection of nautilus double-stranded RNA. These results define a crucial role for nautilus in embryonic muscle formation. The application of RNA interference to a variety of known Drosophila mutations as controls gave phenotypes essentially indistinguishable from the original mutation. RNA-i provides a powerful approach for the targeted disruption of a given genetic function in Drosophila.
Affiliation(s)
- L Misquitta
  - Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA