1
Cabili MN, Lawson J, Saltzman A, Rushton G, O’Rourke P, Wilbanks J, Rodriguez LL, Nyronen T, Courtot M, Donnelly S, Philippakis AA. Empirical validation of an automated approach to data use oversight. Cell Genom 2021; 1:100031. [PMID: 36778584 PMCID: PMC9903839 DOI: 10.1016/j.xgen.2021.100031] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/28/2021] [Revised: 06/30/2021] [Accepted: 08/07/2021] [Indexed: 10/19/2022]
Abstract
The current paradigm for data use oversight of biomedical datasets is onerous: it extends the time and resources needed to obtain access for secondary analyses, thus hindering scientific discovery. For a researcher to use a controlled-access dataset, a data access committee (DAC) must review her research plans to determine whether they are consistent with the data use limitations (DULs) specified by the informed consent form. The newly created GA4GH Data Use Ontology (DUO) holds the potential to streamline this process by making data use oversight computable. Here, we describe an open-source software platform, the Data Use Oversight System (DUOS), that connects with DUO terminology to enable automated data use oversight. We analyze dbGaP data acquired since 2006 and find an exponential increase in data access requests, which will not be sustainable with current manual oversight review. We perform an empirical evaluation of DUOS and DUO on selected datasets from the Broad Institute's data repository. We were able to structure 118 of 123 (96%) of the evaluated DULs and 52 of 52 (100%) of the research proposals using DUO terminology, and we find that DUOS' automated data access adjudication agreed with the DAC's manual review in all cases. This first empirical evaluation of the feasibility of automated data use oversight demonstrates accuracy comparable to human-based data access oversight in real-world data governance.
Affiliation(s)
- Moran N. Cabili
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan Lawson
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Andrea Saltzman
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Greg Rushton
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Tommi Nyronen
- ELIXIR Finland, CSC - IT Center for Science, Espoo, Finland
- Mélanie Courtot
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Stacey Donnelly
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA (corresponding author)
- Anthony A. Philippakis
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA (corresponding author)
2
Lawson J, Cabili MN, Kerry G, Boughtwood T, Thorogood A, Alper P, Bowers SR, Boyles RR, Brookes AJ, Brush M, Burdett T, Clissold H, Donnelly S, Dyke SO, Freeberg MA, Haendel MA, Hata C, Holub P, Jeanson F, Jene A, Kawashima M, Kawashima S, Konopko M, Kyomugisha I, Li H, Linden M, Rodriguez LL, Morita M, Mulder N, Muller J, Nagaie S, Nasir J, Ogishima S, Ota Wang V, Paglione LD, Pandya RN, Parkinson H, Philippakis AA, Prasser F, Rambla J, Reinold K, Rushton GA, Saltzman A, Saunders G, Sofia HJ, Spalding JD, Swertz MA, Tulchinsky I, van Enckevort EJ, Varma S, Voisin C, Yamamoto N, Yamasaki C, Zass L, Guidry Auvil JM, Nyrönen TH, Courtot M. The Data Use Ontology to streamline responsible access to human biomedical datasets. Cell Genom 2021; 1:100028. [PMID: 34820659 PMCID: PMC8591903 DOI: 10.1016/j.xgen.2021.100028] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 02/28/2021] [Revised: 07/02/2021] [Accepted: 08/09/2021] [Indexed: 11/25/2022]
Abstract
Human biomedical datasets that are critical for research and clinical studies to benefit human health often also contain sensitive or potentially identifying information about individual participants. Care must therefore be taken when these datasets are processed and made available, so as to comply with ethical and regulatory frameworks and with the data conditions of informed consent. To enable and streamline data access for these biomedical datasets, the Global Alliance for Genomics and Health (GA4GH) Data Use and Researcher Identities (DURI) work stream developed and approved the Data Use Ontology (DUO) standard. DUO is a hierarchical vocabulary of human- and machine-readable data use terms that consistently and unambiguously represents a dataset's allowable data uses. DUO has been implemented by major international stakeholders such as the Broad and Sanger Institutes and is currently used to annotate over 200,000 datasets worldwide. Using DUO in data management and access facilitates researchers' discovery of, and access to, relevant datasets. DUO annotations increase the FAIRness of datasets and support data linkage through common data use profiles when data are integrated for secondary analyses. DUO is implemented in the Web Ontology Language (OWL) and, to increase community awareness and engagement, is hosted in an open, centralized GitHub repository. DUO, together with the GA4GH Passport standard, offers a new, efficient, and streamlined data authorization and access framework that has enabled increased sharing of biomedical datasets worldwide.
Affiliation(s)
- Jonathan Lawson
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Moran N. Cabili
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Giselle Kerry
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Tiffany Boughtwood
- Australian Genomics, Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Adrian Thorogood
- Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC, Canada
- ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Pinar Alper
- ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Matthew Brush
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Tony Burdett
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Hayley Clissold
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Stacey Donnelly
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Stephanie O.M. Dyke
- McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, Department of Neurology & Neurosurgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Mallory A. Freeberg
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Chihiro Hata
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Japan
- Petr Holub
- BBMRI-ERIC, AT and Masaryk University, Brno, Czech Republic
- Aina Jene
- Centre de Regulació Genòmica (CRG), Barcelona, Spain
- Minae Kawashima
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Shuichi Kawashima
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Japan
- Irene Kyomugisha
- Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Haoyuan Li
- Canada’s Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- Mikael Linden
- ELIXIR-Finland, CSC - IT Center for Science Ltd, Espoo, Finland
- Nicola Mulder
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jean Muller
- Laboratoire de Génétique Médicale, Institut de Génétique Médicale d’Alsace, INSERM U1112, Université de Strasbourg, Strasbourg, France
- Laboratoire de Diagnostic Génétique, Institut de Génétique Médicale d’Alsace, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Satoshi Nagaie
- Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
- Jamal Nasir
- Department of Life Sciences, University of Northampton, Northampton, UK
- Soichi Ogishima
- Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
- Vivian Ota Wang
- Office of Data Sharing, National Cancer Institute, NIH, Rockville, MD, USA
- Helen Parkinson
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Anthony A. Philippakis
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Fabian Prasser
- Berlin Institute of Health at Charité—Universitätsmedizin Berlin, Berlin, Germany
- Jordi Rambla
- Centre de Regulació Genòmica (CRG), Barcelona, Spain
- Kathy Reinold
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Gregory A. Rushton
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Andrea Saltzman
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Heidi J. Sofia
- National Human Genome Research Institute, NIH, Bethesda, MD, USA
- John D. Spalding
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Morris A. Swertz
- Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Esther J. van Enckevort
- Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Susheel Varma
- Health Data Research UK, Gibbs Building, 215 Euston Road, London NW1 2BE, UK
- Lyndon Zass
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Mélanie Courtot
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK (corresponding author)
3
Rehm HL, Page AJ, Smith L, Adams JB, Alterovitz G, Babb LJ, Barkley MP, Baudis M, Beauvais MJ, Beck T, Beckmann JS, Beltran S, Bernick D, Bernier A, Bonfield JK, Boughtwood TF, Bourque G, Bowers SR, Brookes AJ, Brudno M, Brush MH, Bujold D, Burdett T, Buske OJ, Cabili MN, Cameron DL, Carroll RJ, Casas-Silva E, Chakravarty D, Chaudhari BP, Chen SH, Cherry JM, Chung J, Cline M, Clissold HL, Cook-Deegan RM, Courtot M, Cunningham F, Cupak M, Davies RM, Denisko D, Doerr MJ, Dolman LI, Dove ES, Dursi LJ, Dyke SO, Eddy JA, Eilbeck K, Ellrott KP, Fairley S, Fakhro KA, Firth HV, Fitzsimons MS, Fiume M, Flicek P, Fore IM, Freeberg MA, Freimuth RR, Fromont LA, Fuerth J, Gaff CL, Gan W, Ghanaim EM, Glazer D, Green RC, Griffith M, Griffith OL, Grossman RL, Groza T, Guidry Auvil JM, Guigó R, Gupta D, Haendel MA, Hamosh A, Hansen DP, Hart RK, Hartley DM, Haussler D, Hendricks-Sturrup RM, Ho CW, Hobb AE, Hoffman MM, Hofmann OM, Holub P, Hsu JS, Hubaux JP, Hunt SE, Husami A, Jacobsen JO, Jamuar SS, Janes EL, Jeanson F, Jené A, Johns AL, Joly Y, Jones SJ, Kanitz A, Kato K, Keane TM, Kekesi-Lafrance K, Kelleher J, Kerry G, Khor SS, Knoppers BM, Konopko MA, Kosaki K, Kuba M, Lawson J, Leinonen R, Li S, Lin MF, Linden M, Liu X, Liyanage IU, Lopez J, Lucassen AM, Lukowski M, Mann AL, Marshall J, Mattioni M, Metke-Jimenez A, Middleton A, Milne RJ, Molnár-Gábor F, Mulder N, Munoz-Torres MC, Nag R, Nakagawa H, Nasir J, Navarro A, Nelson TH, Niewielska A, Nisselle A, Niu J, Nyrönen TH, O’Connor BD, Oesterle S, Ogishima S, Ota Wang V, Paglione LA, Palumbo E, Parkinson HE, Philippakis AA, Pizarro AD, Prlic A, Rambla J, Rendon A, Rider RA, Robinson PN, Rodarmer KW, Rodriguez LL, Rubin AF, Rueda M, Rushton GA, Ryan RS, Saunders GI, Schuilenburg H, Schwede T, Scollen S, Senf A, Sheffield NC, Skantharajah N, Smith AV, Sofia HJ, Spalding D, Spurdle AB, Stark Z, Stein LD, Suematsu M, Tan P, Tedds JA, Thomson AA, Thorogood A, Tickle TL, Tokunaga K, Törnroos J, Torrents D, Upchurch S, Valencia A, Guimera RV, Vamathevan J, Varma S, Vears DF, Viner C, Voisin C, Wagner AH, Wallace SE, Walsh BP, Williams MS, Winkler EC, Wold BJ, Wood GM, Woolley JP, Yamasaki C, Yates AD, Yung CK, Zass LJ, Zaytseva K, Zhang J, Goodhand P, North K, Birney E. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genom 2021; 1:100029. [PMID: 35072136 PMCID: PMC8774288 DOI: 10.1016/j.xgen.2021.100029] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Indexed: 01/08/2023]
Abstract
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
Affiliation(s)
- Heidi L. Rehm
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Angela J.H. Page
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Lindsay Smith
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Jeremy B. Adams
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Gil Alterovitz
- Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Michael Baudis
- University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michael J.S. Beauvais
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- McGill University, Montreal, QC, Canada
- Tim Beck
- University of Leicester, Leicester, UK
- Sergi Beltran
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Universitat de Barcelona, Barcelona, Spain
- David Bernick
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tiffany F. Boughtwood
- Australian Genomics, Parkville, VIC, Australia
- Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Guillaume Bourque
- McGill University, Montreal, QC, Canada
- Canadian Center for Computational Genomics, Montreal, QC, Canada
- Michael Brudno
- Canadian Center for Computational Genomics, Montreal, QC, Canada
- University of Toronto, Toronto, ON, Canada
- University Health Network, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- David Bujold
- McGill University, Montreal, QC, Canada
- Canadian Center for Computational Genomics, Montreal, QC, Canada
- Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Daniel L. Cameron
- Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- University of Melbourne, Melbourne, VIC, Australia
- Bimal P. Chaudhari
- Nationwide Children’s Hospital, Columbus, OH, USA
- The Ohio State University, Columbus, OH, USA
- Shu Hui Chen
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Justina Chung
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Melissa Cline
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Mélanie Courtot
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- L. Jonathan Dursi
- University Health Network, Toronto, ON, Canada
- Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- Susan Fairley
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Khalid A. Fakhro
- Sidra Medicine, Doha, Qatar
- Weill Cornell Medicine - Qatar, Doha, Qatar
- Helen V. Firth
- Wellcome Sanger Institute, Hinxton, UK
- Addenbrooke’s Hospital, Cambridge, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ian M. Fore
- National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Mallory A. Freeberg
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Lauren A. Fromont
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Clara L. Gaff
- Australian Genomics, Parkville, VIC, Australia
- Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- University of Melbourne, Melbourne, VIC, Australia
- Weiniu Gan
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Elena M. Ghanaim
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- David Glazer
- Verily Life Sciences, South San Francisco, CA, USA
- Robert C. Green
- Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Malachi Griffith
- Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Obi L. Griffith
- Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Roderic Guigó
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Dipayan Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ada Hamosh
- Johns Hopkins University, Baltimore, MD, USA
- David P. Hansen
- Australian Genomics, Parkville, VIC, Australia
- The Australian e-Health Research Centre, CSIRO, Herston, QLD, Australia
- Reece K. Hart
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Invitae, San Francisco, CA, USA
- MyOme, Inc, San Bruno, CA, USA
- David Haussler
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- Michael M. Hoffman
- University of Toronto, Toronto, ON, Canada
- University Health Network, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Oliver M. Hofmann
- University of Toronto, Toronto, ON, Canada
- University of Melbourne, Melbourne, VIC, Australia
- Petr Holub
- BBMRI-ERIC, Graz, Austria
- Masaryk University, Brno, Czech Republic
- Sarah E. Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ammar Husami
- Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Saumya S. Jamuar
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Republic of Singapore
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Republic of Singapore
- Elizabeth L. Janes
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- University of Waterloo, Waterloo, ON, Canada
- Aina Jené
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Amber L. Johns
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Yann Joly
- McGill University, Montreal, QC, Canada
- Steven J.M. Jones
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Alexander Kanitz
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University of Basel, Basel, Switzerland
- Thomas M. Keane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- University of Nottingham, Nottingham, UK
- Kristina Kekesi-Lafrance
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- McGill University, Montreal, QC, Canada
- Giselle Kerry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Seik-Soon Khor
- National Center for Global Health and Medicine Hospital, Tokyo, Japan
- University of Tokyo, Tokyo, Japan
- Rasko Leinonen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Stephanie Li
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Mikael Linden
- CSC–IT Center for Science, Espoo, Finland
- ELIXIR Finland, Espoo, Finland
- Isuru Udara Liyanage
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alice L. Mann
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Wellcome Sanger Institute, Hinxton, UK
- Anna Middleton
- Wellcome Connecting Science, Hinxton, UK
- University of Cambridge, Cambridge, UK
- Richard J. Milne
- Wellcome Connecting Science, Hinxton, UK
- University of Cambridge, Cambridge, UK
- Nicola Mulder
- H3ABioNet, Computational Biology Division, IDM, Faculty of Health Sciences, Cape Town, South Africa
- Rishi Nag
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Hidewaki Nakagawa
- Japan Agency for Medical Research & Development (AMED), Tokyo, Japan
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Arcadi Navarro
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
- Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Ania Niewielska
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Amy Nisselle
- Murdoch Children’s Research Institute, Parkville, VIC, Australia
- University of Melbourne, Melbourne, VIC, Australia
- Human Genetics Society of Australasia Education, Ethics & Social Issues Committee, Alexandria, NSW, Australia
- Jeffrey Niu
- University Health Network, Toronto, ON, Canada
- Tommi H. Nyrönen
- CSC–IT Center for Science, Espoo, Finland
- ELIXIR Finland, Espoo, Finland
- Sabine Oesterle
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Vivian Ota Wang
- National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Emilio Palumbo
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Helen E. Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Jordi Rambla
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Renee A. Rider
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Peter N. Robinson
- The Jackson Laboratory, Farmington, CT, USA
- University of Connecticut, Farmington, CT, USA
- Kurt W. Rodarmer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Alan F. Rubin
- Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- University of Melbourne, Melbourne, VIC, Australia
- Manuel Rueda
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Helen Schuilenburg
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Torsten Schwede
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University of Basel, Basel, Switzerland
- Neerjah Skantharajah
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Heidi J. Sofia
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Dylan Spalding
- CSC–IT Center for Science, Espoo, Finland
- ELIXIR Finland, Espoo, Finland
- Zornitza Stark
- Australian Genomics, Parkville, VIC, Australia
- Murdoch Children’s Research Institute, Parkville, VIC, Australia
- University of Melbourne, Melbourne, VIC, Australia
- Lincoln D. Stein
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- University of Toronto, Toronto, ON, Canada
- Patrick Tan
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Republic of Singapore
- Precision Health Research Singapore, Singapore, Republic of Singapore
- Genome Institute of Singapore, Singapore, Republic of Singapore
- Alastair A. Thomson
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Adrian Thorogood
- McGill University, Montreal, QC, Canada
- University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Katsushi Tokunaga
- University of Tokyo, Tokyo, Japan
- National Center for Global Health and Medicine, Tokyo, Japan
- Juha Törnroos
- CSC–IT Center for Science, Espoo, Finland
- ELIXIR Finland, Espoo, Finland
- David Torrents
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
- Barcelona Supercomputing Center, Barcelona, Spain
- Sean Upchurch
- California Institute of Technology, Pasadena, CA, USA
- Alfonso Valencia
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
- Barcelona Supercomputing Center, Barcelona, Spain
- Jessica Vamathevan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Susheel Varma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Health Data Research UK, London, UK
- Danya F. Vears
- Murdoch Children’s Research Institute, Parkville, VIC, Australia
- University of Melbourne, Melbourne, VIC, Australia
- Human Genetics Society of Australasia Education, Ethics & Social Issues Committee, Alexandria, NSW, Australia
- Melbourne Law School, University of Melbourne, Parkville, VIC, Australia
- Coby Viner
- University of Toronto, Toronto, ON, Canada
- University Health Network, Toronto, ON, Canada
- Alex H. Wagner
- Nationwide Children’s Hospital, Columbus, OH, USA
- The Ohio State University, Columbus, OH, USA
- Eva C. Winkler
- Section of Translational Medical Ethics, University Hospital Heidelberg, Heidelberg, Germany
- Andrew D. Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Christina K. Yung
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Indoc Research, Toronto, ON, Canada
- Lyndon J. Zass
- H3ABioNet, Computational Biology Division, IDM, Faculty of Health Sciences, Cape Town, South Africa
- Ksenia Zaytseva
- McGill University, Montreal, QC, Canada
- Canadian Centre for Computational Genomics, Montreal, QC, Canada
- Junjun Zhang
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Peter Goodhand
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Kathryn North
- Murdoch Children’s Research Institute, Parkville, VIC, Australia
- University of Toronto, Toronto, ON, Canada
- University of Melbourne, Melbourne, VIC, Australia
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- European Molecular Biology Laboratory, Heidelberg, Germany
4
Lu MW, Walia G, Schulze K, Doral MY, Maund SL, Gaffey S, Cabili MN, Bourla AB, Green RJ, Santos EC, Herbst RS, Chiang AC, Schwartzberg LS. A multi-stakeholder platform to prospectively link longitudinal real-world clinico-genomic, imaging, and outcomes data for patients with metastatic lung cancer. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.15_suppl.tps2087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
TPS2087 Background: Making personalized diagnostics and treatments a reality for every cancer patient requires comprehensively capturing the patient journey. Real-world data have shown promise for clinical research and for advancing precision medicine. However, limitations exist, such as data quality management and the bias and confounding associated with retrospective analyses. We present a multi-stakeholder platform to prospectively collect and link real-world clinico-genomic, imaging, and outcomes data to longitudinal blood genomic profiling for lung cancer. Methods: This study is enrolling approximately 1000 patients with metastatic non-small cell lung cancer or extensive-stage small cell lung cancer who will initiate standard-of-care systemic anti-neoplastic treatment, regardless of line of therapy, at 20 community oncology and academic practices within the Flatiron Health network. Relevant clinical data points from both structured and unstructured fields will be collected from the electronic health records via technology-enabled abstraction, eliminating the need for case report forms. Digital pathology and clinical images from standard-of-care visits will be collected. Blood samples for circulating tumor DNA (ctDNA) profiling using FoundationOne Liquid will be collected at three timepoints: enrollment, first tumor assessment, and end of treatment. Tumor tissue samples may be submitted at baseline for genomic profiling using FoundationOne CDx. Overall survival follow-up will continue until death, withdrawal of consent, loss to follow-up, or end of study. The objectives are to evaluate 1) the feasibility of building a scalable, prospective platform and 2) the associations between ctDNA and real-world clinical outcomes, including overall survival. Enrollment is ongoing. Clinical trial information: NCT04180176.
Affiliation(s)
- Eric C. Santos
- Cancer and Hem Ctrs of Western Michigan, Grand Rapids, MI
5
Dyke SOM, Linden M, Lappalainen I, De Argila JR, Carey K, Lloyd D, Spalding JD, Cabili MN, Kerry G, Foreman J, Cutts T, Shabani M, Rodriguez LL, Haeussler M, Walsh B, Jiang X, Wang S, Perrett D, Boughtwood T, Matern A, Brookes AJ, Cupak M, Fiume M, Pandya R, Tulchinsky I, Scollen S, Törnroos J, Das S, Evans AC, Malin BA, Beck S, Brenner SE, Nyrönen T, Blomberg N, Firth HV, Hurles M, Philippakis AA, Rätsch G, Brudno M, Boycott KM, Rehm HL, Baudis M, Sherry ST, Kato K, Knoppers BM, Baker D, Flicek P. Registered access: authorizing data access. Eur J Hum Genet 2018; 26:1721-1731. [PMID: 30069064 PMCID: PMC6244209 DOI: 10.1038/s41431-018-0219-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/26/2018] [Revised: 05/08/2018] [Accepted: 06/20/2018] [Indexed: 12/14/2022]
Abstract
The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model, "registered access," to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.
Affiliation(s)
- Stephanie O M Dyke
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC, Canada.
- Montreal Neurological Institute, Faculty of Medicine, McGill University, Montreal, QC, Canada.
- Mikael Linden
- CSC - IT Center for Science, Espoo, Finland
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ilkka Lappalainen
- CSC - IT Center for Science, Espoo, Finland
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Jordi Rambla De Argila
- Centre for Genomic Regulation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- David Lloyd
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- The Global Alliance for Genomics and Health, MaRS Centre, West Tower, 661 University Avenue, Suite 510, Toronto, M5G 0A3, ON, Canada
- J Dylan Spalding
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Giselle Kerry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Julia Foreman
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Tim Cutts
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Mahsa Shabani
- Center for Biomedical Ethics and Law, Department of Public Health and Primary Care, University of Leuven, Leuven, Belgium
- Xiaoqian Jiang
- Department of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
- Shuang Wang
- Department of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
- Daniel Perrett
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Tiffany Boughtwood
- Australian Genomics Health Alliance, 50 Flemington Road, Parkville, VIC, 3052, Australia
- Anthony J Brookes
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Serena Scollen
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Samir Das
- McGill Centre for Integrative Neurosciences, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Alan C Evans
- McGill Centre for Integrative Neurosciences, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Stephan Beck
- UCL Cancer Institute, University College London, London, UK
- Steven E Brenner
- Department of Plant & Microbial Biology, University of California, Berkeley, CA, USA
- Tommi Nyrönen
- CSC - IT Center for Science, Espoo, Finland
- ELIXIR Compute Platform, ELIXIR, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Helen V Firth
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Matthew Hurles
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Gunnar Rätsch
- Department of Computer Science, Biomedical Informatics, ETH Zurich, Zurich, Switzerland
- Michael Brudno
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, Canada
- Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, Canada
- Heidi L Rehm
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pathology, Brigham & Women's Hospital & Harvard Medical School, Boston, MA, USA
- Michael Baudis
- University of Zurich & Swiss Institute of Bioinformatics, Zurich, Switzerland
- Stephen T Sherry
- National Centre for Biotechnology Information, US National Library of Medicine, Bethesda, MD, USA
- Kazuto Kato
- Department of Biomedical Ethics and Public Policy, Graduate School of Medicine, Osaka University, Osaka, Japan
- Bartha M Knoppers
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Dixie Baker
- Martin, Blanck & Associates, Alexandria, VA, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
6
Woolley JP, Kirby E, Leslie J, Jeanson F, Cabili MN, Rushton G, Hazard JG, Ladas V, Veal CD, Gibson SJ, Tassé AM, Dyke SOM, Gaff C, Thorogood A, Knoppers BM, Wilbanks J, Brookes AJ. Responsible sharing of biomedical data and biospecimens via the "Automatable Discovery and Access Matrix" (ADA-M). NPJ Genom Med 2018; 3:17. [PMID: 30062047 PMCID: PMC6056554 DOI: 10.1038/s41525-018-0057-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 12/16/2017] [Revised: 05/31/2018] [Accepted: 06/08/2018] [Indexed: 11/15/2022]
Abstract
Given the data-rich nature of modern biomedical research, there is a pressing need for a systematic, structured, computer-readable way to capture, communicate, and manage sharing rules that apply to biomedical resources. This is essential for responsible recording, versioning, communication, querying, and actioning of resource sharing plans. However, lack of a common “information model” for rules and conditions that govern the sharing of materials, methods, software, data, and knowledge creates a fundamental barrier. Without this, it can be virtually impossible for Research Ethics Committees (RECs), Institutional Review Boards (IRBs), Data Access Committees (DACs), biobanks, and end users to confidently track, manage, and interpret applicable legal and ethical requirements. This raises costs and burdens of data stewardship and decreases efficient and responsible access to data, biospecimens, and other resources. To address this, the GA4GH and IRDiRC organizations sponsored the creation of the Automatable Discovery and Access Matrix (ADA-M, read simply as “Adam”). ADA-M is a comprehensive information model that provides the basis for producing structured metadata “Profiles” of regulatory conditions, thereby enabling efficient application of those conditions across regulatory spheres. Widespread use of ADA-M will aid researchers in globally searching and prescreening potential data and/or biospecimen resources for compatibility with their research plans in a responsible and efficient manner, increasing likelihood of timely DAC approvals while also significantly reducing time and effort DACs, RECs, and IRBs spend evaluating resource requests and research proposals. Extensive online documentation, software support, video guides, and an Application Programming Interface (API) for ADA-M have been made available.
Affiliation(s)
- J Patrick Woolley
- Harris Manchester College, University of Oxford, Mansfield Road, Oxford, OX1 3TD UK
- Emily Kirby
- Public Population Project in Genomics and Society (P3G), McGill University and Genome Quebec Innovation Centre, 740 Dr Penfield Avenue, Suite 5104, Montreal, QC H3A 0G1 Canada
- Josh Leslie
- Stewardly, Centre for Social Innovation, Suite 400, 215 Spadina Ave., Toronto, ON M5T 2C7 Canada
- Moran N Cabili
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Gregory Rushton
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Vagelis Ladas
- Department of Genetics and Genome Biology, University of Leicester, Adrian Building, University Road, Leicester, LE1 7RH UK
- Colin D Veal
- Department of Genetics and Genome Biology, University of Leicester, Adrian Building, University Road, Leicester, LE1 7RH UK
- Spencer J Gibson
- Department of Genetics and Genome Biology, University of Leicester, Adrian Building, University Road, Leicester, LE1 7RH UK
- Anne-Marie Tassé
- Public Population Project in Genomics and Society (P3G), McGill University and Genome Quebec Innovation Centre, 740 Dr Penfield Avenue, Suite 5104, Montreal, QC H3A 0G1 Canada
- Stephanie O M Dyke
- Centre of Genomics and Policy, McGill University, 740 Dr. Penfield Avenue, suite 5200, Montreal, QC H3A 0G1 Canada
- Clara Gaff
- Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, VIC 3052 Australia
- The University of Melbourne, Melbourne, VIC 3010 Australia
- Adrian Thorogood
- Centre of Genomics and Policy, McGill University, 740 Dr. Penfield Avenue, suite 5200, Montreal, QC H3A 0G1 Canada
- Bartha Maria Knoppers
- Centre of Genomics and Policy, McGill University, 740 Dr. Penfield Avenue, suite 5200, Montreal, QC H3A 0G1 Canada
- John Wilbanks
- Sage Bionetworks, 1100 Fairview Ave. N., Mailstop M1-C108, Seattle, WA 98109 USA
- Anthony J Brookes
- Department of Genetics and Genome Biology, University of Leicester, Adrian Building, University Road, Leicester, LE1 7RH UK
7
Shukla CJ, McCorkindale AL, Gerhardinger C, Korthauer KD, Cabili MN, Shechner DM, Irizarry RA, Maass PG, Rinn JL. High-throughput identification of RNA nuclear enrichment sequences. EMBO J 2018; 37:embj.201798452. [PMID: 29335281 PMCID: PMC5852646 DOI: 10.15252/embj.201798452] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 10/19/2017] [Revised: 12/18/2017] [Accepted: 12/20/2017] [Indexed: 11/21/2022]
Abstract
In the post‐genomic era, thousands of putative noncoding regulatory regions have been identified, such as enhancers, promoters, long noncoding RNAs (lncRNAs), and a cadre of small peptides. These ever‐growing catalogs require high‐throughput assays to test their functionality at scale. Massively parallel reporter assays have greatly enhanced the understanding of noncoding DNA elements en masse. Here, we present a massively parallel RNA assay (MPRNA) that can assay 10,000 or more RNA segments for RNA‐based functionality. We applied MPRNA to identify RNA‐based nuclear localization domains harbored in lncRNAs. We examined a pool of 11,969 oligos densely tiling 38 human lncRNAs that were fused to a cytosolic transcript. After cell fractionation and barcode sequencing, we identified 109 unique RNA regions that significantly enriched this cytosolic transcript in the nucleus including a cytosine‐rich motif. These nuclear enrichment sequences are highly conserved and over‐represented in global nuclear fractionation sequencing. Importantly, many of these regions were independently validated by single‐molecule RNA fluorescence in situ hybridization. Overall, we demonstrate the utility of MPRNA for future investigation of RNA‐based functionalities.
Affiliation(s)
- Chinmay J Shukla
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Alexandra L McCorkindale
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany
- Chiara Gerhardinger
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Keegan D Korthauer
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- David M Shechner
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Philipp G Maass
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- John L Rinn
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
8
Abstract
Single-molecule RNA fluorescence in situ hybridization (FISH) is a technique that holds great potential for the study of long noncoding RNA. It enables quantification and spatial resolution of single RNA molecules within cells via hybridization of multiple labeled nucleic acid probes to a target RNA. It has recently become apparent that single-molecule RNA FISH probes targeting noncoding RNA are more prone to off-target binding, and hence to spurious results, than probes targeting mRNA. Here we present a protocol for applying single-molecule RNA FISH to the study of noncoding RNA, as well as an experimental procedure for validating legitimate signals.
Affiliation(s)
- Margaret Dunagin
- Department of Bioengineering, University of Pennsylvania, 210 S. 33rd St., Philadelphia, PA, 19104-6321, USA
9
Cabili MN, Dunagin MC, McClanahan PD, Biaesch A, Padovan-Merhar O, Regev A, Rinn JL, Raj A. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol 2015; 16:20. [PMID: 25630241 PMCID: PMC4369099 DOI: 10.1186/s13059-015-0586-4] [Citation(s) in RCA: 457] [Impact Index Per Article: 50.8] [Received: 10/10/2014] [Accepted: 01/13/2015] [Indexed: 02/06/2023]
Abstract
Background: Long non-coding RNAs (lncRNAs) have been implicated in diverse biological processes. In contrast to extensive genomic annotation of lncRNA transcripts, far fewer have been characterized for subcellular localization and cell-to-cell variability. Addressing this requires systematic, direct visualization of lncRNAs in single cells at single-molecule resolution. Results: We use single-molecule RNA-FISH to systematically quantify and categorize the subcellular localization patterns of a representative set of 61 lncRNAs in three different cell types. Our survey yields high-resolution quantification and stringent validation of the number and spatial positions of these lncRNAs, with an mRNA set for comparison. Using this highly quantitative image-based dataset, we observe a variety of subcellular localization patterns, ranging from bright sub-nuclear foci to almost exclusively cytoplasmic localization. We also find that the low abundance of lncRNAs observed in cell population measurements cannot be explained by high expression in a small subset of ‘jackpot’ cells. Additionally, nuclear lncRNA foci dissolve during mitosis and become widely dispersed, suggesting these lncRNAs are not mitotic bookmarking factors. Moreover, we see that divergently transcribed lncRNAs do not always correlate with their cognate mRNA, nor do they have a characteristic localization pattern. Conclusions: Our systematic, high-resolution survey of lncRNA localization reveals aspects of lncRNAs that are similar to mRNAs, such as cell-to-cell variability, but also several distinct properties. These characteristics may correspond to particular functional roles. Our study also provides a quantitative description of lncRNAs at the single-cell level and a universally applicable framework for future study and validation of lncRNAs.
Affiliation(s)
- Moran N Cabili
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA.
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA.
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138, USA.
- Margaret C Dunagin
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Patrick D McClanahan
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Andrew Biaesch
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Olivia Padovan-Merhar
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Aviv Regev
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA.
- Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02140, USA.
- John L Rinn
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA.
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138, USA.
- Arjun Raj
- School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
10
Abstract
Recently, researchers have uncovered the presence of many long noncoding RNAs (lncRNAs) in embryonic stem cells and believe they are important regulators of the differentiation process. However, there are only a few examples explicitly linking lncRNA activity to transcriptional regulation. Here, we used transcript counting and spatial localization to characterize a lncRNA (dubbed linc-HOXA1) located ∼50 kb from the Hoxa gene cluster in mouse embryonic stem cells. Single-cell transcript counting revealed that linc-HOXA1 and Hoxa1 RNA are highly variable at the single-cell level and that whenever linc-HOXA1 RNA abundance was high, Hoxa1 mRNA abundance was low and vice versa. Knockdown analysis revealed that depletion of linc-HOXA1 RNA at its site of transcription increased transcription of the Hoxa1 gene cis to the chromosome and that exposure of cells to retinoic acid can disrupt this interaction. We further showed that linc-HOXA1 RNA represses Hoxa1 by recruiting the protein PURB as a transcriptional cofactor. Our results highlight the power of transcript visualization to characterize lncRNA function and also suggest that PURB can facilitate lncRNA-mediated transcriptional regulation.
Affiliation(s)
- Hédia Maamar
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
11
Slavoff SA, Mitchell AJ, Schwaid AG, Cabili MN, Ma J, Levin JZ, Karger AD, Budnik BA, Rinn JL, Saghatelian A. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat Chem Biol 2012; 9:59-64. [PMID: 23160002 PMCID: PMC3625679 DOI: 10.1038/nchembio.1120] [Citation(s) in RCA: 421] [Impact Index Per Article: 35.1] [Received: 03/22/2012] [Accepted: 10/16/2012] [Indexed: 11/24/2022]
Abstract
The amount of the transcriptome that is translated into polypeptides is of fundamental importance. We developed a peptidomic strategy to detect short ORF (sORF)-encoded polypeptides (SEPs) in human cells. We identified 90 SEPs, 86 of which are novel, the largest number of human SEPs ever reported. SEP abundances range from 10-1000 molecules per cell, identical to known proteins. SEPs arise from sORFs in non-coding RNAs as well as multi-cistronic mRNAs, and many SEPs initiate with non-AUG start codons, indicating that non-canonical translation may be more widespread in mammals than previously thought. In addition, coding sORFs are present in a small fraction (8/1866) of long intergenic non-coding RNAs (lincRNAs). Together, these results provide the strongest evidence to date that the human proteome is more complex than previously appreciated.
Affiliation(s)
- Sarah A Slavoff
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
12
Abstract
A report on the Keystone symposium 'Non-coding RNAs' held at Snowbird, Utah, USA, 31 March to 5 April 2012.
Affiliation(s)
- Ezgi Hacisuleyman
- Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
13
Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 2011; 25:1915-27. [PMID: 21890647 DOI: 10.1101/gad.17446611] [Citation(s) in RCA: 2632] [Impact Index Per Article: 202.5] [Indexed: 02/07/2023]
Abstract
Large intergenic noncoding RNAs (lincRNAs) are emerging as key regulators of diverse cellular processes. Determining the function of individual lincRNAs remains a challenge. Recent advances in RNA sequencing (RNA-seq) and computational methods allow for an unprecedented analysis of such transcripts. Here, we present an integrative approach to define a reference catalog of >8000 human lincRNAs. Our catalog unifies previously existing annotation sources with transcripts we assembled from RNA-seq data collected from ∼4 billion RNA-seq reads across 24 tissues and cell types. We characterize each lincRNA by a panorama of >30 properties, including sequence, structural, transcriptional, and orthology features. We found that lincRNA expression is strikingly tissue-specific compared with coding genes, and that lincRNAs are typically coexpressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes. We distinguish an additional subset of transcripts that have high evolutionary conservation but may include short ORFs and may serve as either lincRNAs or small peptides. Our integrated, comprehensive, yet conservative reference catalog of human lincRNAs reveals the global properties of lincRNAs and will facilitate experimental studies and further functional classification of these genes.
Affiliation(s)
- Moran N Cabili
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
14
Loewer S, Cabili MN, Guttman M, Loh YH, Thomas K, Park IH, Garber M, Curran M, Onder T, Agarwal S, Manos PD, Datta S, Lander ES, Schlaeger TM, Daley GQ, Rinn JL. Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat Genet 2010; 42:1113-7. [PMID: 21057500 DOI: 10.1038/ng.710] [Citation(s) in RCA: 764] [Impact Index Per Article: 54.6] [Received: 05/25/2010] [Accepted: 09/18/2010] [Indexed: 12/11/2022]
Abstract
The conversion of lineage-committed cells to induced pluripotent stem cells (iPSCs) by reprogramming is accompanied by a global remodeling of the epigenome, resulting in altered patterns of gene expression. Here we characterize the transcriptional reorganization of large intergenic non-coding RNAs (lincRNAs) that occurs upon derivation of human iPSCs and identify numerous lincRNAs whose expression is linked to pluripotency. Among these, we defined ten lincRNAs whose expression was elevated in iPSCs compared with embryonic stem cells, suggesting that their activation may promote the emergence of iPSCs. Supporting this, our results indicate that these lincRNAs are direct targets of key pluripotency transcription factors. Using loss-of-function and gain-of-function approaches, we found that one such lincRNA (lincRNA-RoR) modulates reprogramming, thus providing a first demonstration for critical functions of lincRNAs in the derivation of pluripotent stem cells.
Affiliation(s)
- Sabine Loewer
- Stem Cell Transplantation Program, Division of Pediatric Hematology and Oncology, Manton Center for Orphan Disease Research, Children's Hospital Boston and Dana Farber Cancer Institute, Boston, Massachusetts, USA
15
Abstract
Early diagnosis of inborn errors of metabolism is commonly performed through biofluid metabolomics, which detects specific metabolic biomarkers whose concentrations are altered by genomic mutations. The identification of new biomarkers is of major importance to biomedical research and is usually performed through data mining of metabolomic data. Following the recent publication of the genome-scale network model of human metabolism, we present a novel computational approach for systematically predicting metabolic biomarkers in stoichiometric metabolic models. Applying the method to predict biomarkers for disruptions of red blood cell metabolism demonstrates a marked correlation with altered metabolic concentrations inferred through kinetic model simulations. Applying the method to the genome-scale human model reveals a set of 233 metabolites whose concentration is predicted to be either elevated or reduced as a result of 176 possible dysfunctional enzymes. The method's predictions are shown to significantly correlate with known disease biomarkers and to predict many novel potential biomarkers. Using this method to prioritize metabolite measurement experiments to identify new biomarkers can provide on the order of a 10-fold increase in biomarker detection performance.
Affiliation(s)
- Tomer Shlomi
- Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel.
16
Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 2009; 458:223-7. [PMID: 19182780 DOI: 10.1038/nature07672] [Citation(s) in RCA: 3195] [Impact Index Per Article: 213.0] [Received: 08/05/2008] [Accepted: 11/25/2008] [Indexed: 12/19/2022]
Abstract
There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified approximately 1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NF-κB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
Affiliation(s)
- Mitchell Guttman
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA
17
Abstract
Direct in vivo investigation of mammalian metabolism is complicated by the distinct metabolic functions of different tissues. We present a computational method that successfully describes the tissue specificity of human metabolism on a large scale. By integrating tissue-specific gene- and protein-expression data with an existing comprehensive reconstruction of the global human metabolic network, we predict tissue-specific metabolic activity in ten human tissues. This reveals a central role for post-transcriptional regulation in shaping tissue-specific metabolic activity profiles. The predicted tissue specificity of genes responsible for metabolic diseases and tissue-specific differences in metabolite exchange with biofluids extend markedly beyond tissue-specific differences manifest in enzyme-expression data, and are validated by large-scale mining of tissue-specificity data. Our results establish a computational basis for the genome-wide study of normal and abnormal human metabolism in a tissue-specific manner.
Affiliation(s)
- Tomer Shlomi
- School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel.