1
Yao D, Tycko J, Oh JW, Bounds LR, Gosai SJ, Lataniotis L, Mackay-Smith A, Doughty BR, Gabdank I, Schmidt H, Guerrero-Altamirano T, Siklenka K, Guo K, White AD, Youngworth I, Andreeva K, Ren X, Barrera A, Luo Y, Yardımcı GG, Tewhey R, Kundaje A, Greenleaf WJ, Sabeti PC, Leslie C, Pritykin Y, Moore JE, Beer MA, Gersbach CA, Reddy TE, Shen Y, Engreitz JM, Bassik MC, Reilly SK. Multicenter integrated analysis of noncoding CRISPRi screens. Nat Methods 2024; 21:723-734. [PMID: 38504114 PMCID: PMC11009116 DOI: 10.1038/s41592-024-02216-7] [Received: 12/02/2022] [Accepted: 02/18/2024] [Indexed: 03/21/2024]
Abstract
The ENCODE Consortium's efforts to annotate noncoding cis-regulatory elements (CREs) have advanced our understanding of gene regulatory landscapes. Pooled, noncoding CRISPR screens offer a systematic approach to investigate cis-regulatory mechanisms. The ENCODE4 Functional Characterization Centers conducted 108 screens in human cell lines, comprising >540,000 perturbations across 24.85 megabases of the genome. Using 332 functionally confirmed CRE-gene links in K562 cells, we established guidelines for screening endogenous noncoding elements with CRISPR interference (CRISPRi), including accurate detection of CREs that exhibit variable, often low, transcriptional effects. Benchmarking five screen analysis tools, we find that CASA produces the most conservative CRE calls and is robust to artifacts of low-specificity single guide RNAs. We uncover a subtle DNA strand bias for CRISPRi in transcribed regions with implications for screen design and analysis. Together, we provide an accessible data resource, predesigned single guide RNAs for targeting 3,275,697 ENCODE SCREEN candidate CREs with CRISPRi and screening guidelines to accelerate functional characterization of the noncoding genome.
Affiliation(s)
- David Yao
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Josh Tycko
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
| | - Jin Woo Oh
- Departments of Biomedical Engineering and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Lexi R Bounds
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
| | - Sager J Gosai
- Broad Institute of Harvard & MIT, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Center for System Biology, Harvard University, Cambridge, MA, USA
- Harvard Graduate Program in Biological and Biomedical Science, Boston, MA, USA
| | - Lazaros Lataniotis
- Department of Neurology, Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Ava Mackay-Smith
- University Program in Genetics and Genomics, Duke University School of Medicine, Durham, NC, USA
| | | | - Idan Gabdank
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Henri Schmidt
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Tania Guerrero-Altamirano
- University Program in Genetics and Genomics, Duke University School of Medicine, Durham, NC, USA
- Department of Biology, Duke University, Durham, NC, USA
| | - Keith Siklenka
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA
| | - Katherine Guo
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Alexander D White
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
| | | | - Kalina Andreeva
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Xingjie Ren
- Department of Neurology, Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Alejandro Barrera
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA
| | - Yunhai Luo
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | | | - Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - William J Greenleaf
- Department of Genetics, Stanford University, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Pardis C Sabeti
- Broad Institute of Harvard & MIT, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Center for System Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Immunology and Infectious Disease, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Christina Leslie
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Yuri Pritykin
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Michael A Beer
- Departments of Biomedical Engineering and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Charles A Gersbach
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
| | - Timothy E Reddy
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA
| | - Yin Shen
- Department of Neurology, Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Jesse M Engreitz
- Department of Genetics, Stanford University, Stanford, CA, USA
- BASE Initiative, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford, CA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Steven K Reilly
- Department of Genetics, Yale University, New Haven, CT, USA
2
Hitz BC, Lee JW, Jolanki O, Kagda MS, Graham K, Sud P, Gabdank I, Strattan JS, Sloan CA, Dreszer T, Rowe LD, Podduturi NR, Malladi VS, Chan ET, Davidson JM, Ho M, Miyasato S, Simison M, Tanaka F, Luo Y, Whaling I, Hong EL, Lee BT, Sandstrom R, Rynes E, Nelson J, Nishida A, Ingersoll A, Buckley M, Frerker M, Kim DS, Boley N, Trout D, Dobin A, Rahmanian S, Wyman D, Balderrama-Gutierrez G, Reese F, Durand NC, Dudchenko O, Weisz D, Rao SSP, Blackburn A, Gkountaroulis D, Sadr M, Olshansky M, Eliaz Y, Nguyen D, Bochkov I, Shamim MS, Mahajan R, Aiden E, Gingeras T, Heath S, Hirst M, Kent WJ, Kundaje A, Mortazavi A, Wold B, Cherry JM. The ENCODE Uniform Analysis Pipelines. Res Sq 2023:rs.3.rs-3111932. [PMID: 37503119 PMCID: PMC10371165 DOI: 10.21203/rs.3.rs-3111932/v1] [Indexed: 07/29/2023]
Abstract
The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.
Affiliation(s)
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jin-Wook Lee
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Keenan Graham
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Timothy Dreszer
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stuart Miyasato
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matt Simison
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Yunhai Luo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ian Whaling
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brian T Lee
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Richard Sandstrom
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Eric Rynes
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Jemma Nelson
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Andrew Nishida
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Alyssa Ingersoll
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Michael Buckley
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Mark Frerker
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Daniel S Kim
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Nathan Boley
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | | | - Fairlie Reese
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Neva C Durand
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Houston, TX 77030, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David Weisz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Suhas S P Rao
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
| | - Alyssa Blackburn
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Dimos Gkountaroulis
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Mahdi Sadr
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moshe Olshansky
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yossi Eliaz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dat Nguyen
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ivan Bochkov
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muhammad Saad Shamim
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ragini Mahajan
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of BioSciences, Rice University, Houston, TX 77005, USA
| | - Erez Aiden
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Tom Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Simon Heath
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Martin Hirst
- Michael Smith Laboratories, University of British Columbia, British Columbia, Canada
| | - W James Kent
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anshul Kundaje
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Ali Mortazavi
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
3
Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D, Razavi-Mohseni M, Jiang Y, Borsari B, Morabito S, Liang HY, McGill CJ, Rahmanian S, Sakr J, Jiang S, Zeng W, Carvalho K, Weimer AK, Dionne LA, McShane A, Bedi K, Elhajjajy SI, Upchurch S, Jou J, Youngworth I, Gabdank I, Sud P, Jolanki O, Strattan JS, Kagda MS, Snyder MP, Hitz BC, Moore JE, Weng Z, Bennett D, Reinholdt L, Ljungman M, Beer MA, Gerstein MB, Pachter L, Guigó R, Wold BJ, Mortazavi A. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. bioRxiv 2023:2023.05.15.540865. [PMID: 37292896 PMCID: PMC10245583 DOI: 10.1101/2023.05.15.540865] [Indexed: 06/10/2023]
Abstract
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
Affiliation(s)
- Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Muhammed Hasan Çelik
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Elisabeth Rebboah
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Narges Rezaie
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Milad Razavi-Mohseni
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
| | - Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Samuel Morabito
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Heidi Yahan Liang
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Cassandra J McGill
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Jasmine Sakr
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
| | - Shan Jiang
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Weihua Zeng
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Klebea Carvalho
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Annika K Weimer
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Louise A Dionne
- The Jackson Laboratory, Bar Harbor, USA
| | - Ariel McShane
- Cellular and Molecular Biology Program, University of Michigan, Ann Arbor, USA
- Department of Radiation Oncology, University of Michigan, Ann Arbor, USA
| | - Karan Bedi
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
| | - Shaimae I Elhajjajy
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - Sean Upchurch
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Jennifer Jou
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Ingrid Youngworth
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Ben C Hitz
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - David Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, USA
| | - Laura Reinholdt
- The Jackson Laboratory, Bar Harbor, USA
| | - Mats Ljungman
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
- Departments of Radiation Oncology and Environmental Health Sciences, University of Michigan, Ann Arbor, USA
| | - Michael A Beer
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, USA
- Department of Statistics and Data Science, Yale University, New Haven, USA
- Department of Computer Science, Yale University, New Haven, USA
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
| | - Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
4
Hitz BC, Jin-Wook L, Jolanki O, Kagda MS, Graham K, Sud P, Gabdank I, Strattan JS, Sloan CA, Dreszer T, Rowe LD, Podduturi NR, Malladi VS, Chan ET, Davidson JM, Ho M, Miyasato S, Simison M, Tanaka F, Luo Y, Whaling I, Hong EL, Lee BT, Sandstrom R, Rynes E, Nelson J, Nishida A, Ingersoll A, Buckley M, Frerker M, Kim DS, Boley N, Trout D, Dobin A, Rahmanian S, Wyman D, Balderrama-Gutierrez G, Reese F, Durand NC, Dudchenko O, Weisz D, Rao SSP, Blackburn A, Gkountaroulis D, Sadr M, Olshansky M, Eliaz Y, Nguyen D, Bochkov I, Shamim MS, Mahajan R, Aiden E, Gingeras T, Heath S, Hirst M, Kent WJ, Kundaje A, Mortazavi A, Wold B, Cherry JM. The ENCODE Uniform Analysis Pipelines. bioRxiv 2023:2023.04.04.535623. [PMID: 37066421 PMCID: PMC10104020 DOI: 10.1101/2023.04.04.535623] [Indexed: 04/18/2023]
Abstract
The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.
Affiliation(s)
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Lee Jin-Wook
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Keenan Graham
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Timothy Dreszer
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stuart Miyasato
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matt Simison
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Yunhai Luo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ian Whaling
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brian T Lee
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Richard Sandstrom
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Eric Rynes
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Jemma Nelson
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Andrew Nishida
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Alyssa Ingersoll
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Michael Buckley
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Mark Frerker
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Daniel S Kim
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Nathan Boley
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | | | - Fairlie Reese
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Neva C Durand
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Houston, TX 77030, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David Weisz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Suhas S P Rao
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
| | - Alyssa Blackburn
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Dimos Gkountaroulis
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Mahdi Sadr
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moshe Olshansky
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yossi Eliaz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dat Nguyen
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ivan Bochkov
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muhammad Saad Shamim
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ragini Mahajan
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of BioSciences, Rice University, Houston, TX 77005, USA
| | - Erez Aiden
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Tom Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Simon Heath
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Martin Hirst
- Michael Smith Laboratories, University of British Columbia, British Columbia, Canada
| | - W James Kent
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anshul Kundaje
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Ali Mortazavi
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
5
Rozowsky J, Gao J, Borsari B, Yang YT, Galeev T, Gürsoy G, Epstein CB, Xiong K, Xu J, Li T, Liu J, Yu K, Berthel A, Chen Z, Navarro F, Sun MS, Wright J, Chang J, Cameron CJF, Shoresh N, Gaskell E, Drenkow J, Adrian J, Aganezov S, Aguet F, Balderrama-Gutierrez G, Banskota S, Corona GB, Chee S, Chhetri SB, Cortez Martins GC, Danyko C, Davis CA, Farid D, Farrell NP, Gabdank I, Gofin Y, Gorkin DU, Gu M, Hecht V, Hitz BC, Issner R, Jiang Y, Kirsche M, Kong X, Lam BR, Li S, Li B, Li X, Lin KZ, Luo R, Mackiewicz M, Meng R, Moore JE, Mudge J, Nelson N, Nusbaum C, Popov I, Pratt HE, Qiu Y, Ramakrishnan S, Raymond J, Salichos L, Scavelli A, Schreiber JM, Sedlazeck FJ, See LH, Sherman RM, Shi X, Shi M, Sloan CA, Strattan JS, Tan Z, Tanaka FY, Vlasova A, Wang J, Werner J, Williams B, Xu M, Yan C, Yu L, Zaleski C, Zhang J, Ardlie K, Cherry JM, Mendenhall EM, Noble WS, Weng Z, Levine ME, Dobin A, Wold B, Mortazavi A, Ren B, Gillis J, Myers RM, Snyder MP, Choudhary J, Milosavljevic A, Schatz MC, Bernstein BE, Guigó R, Gingeras TR, Gerstein M. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 2023; 186:1493-1511.e40. [PMID: 37001506 PMCID: PMC10074325 DOI: 10.1016/j.cell.2023.02.018] [Received: 04/03/2022] [Revised: 10/16/2022] [Accepted: 02/10/2023] [Indexed: 04/03/2023]
Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
Affiliation(s)
- Joel Rozowsky
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
| | - Yucheng T Yang
- Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Gamze Gürsoy
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Kun Xiong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jinrui Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Tianxiao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jason Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Keyang Yu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Ana Berthel
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Zhanlin Chen
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
| | - Fabio Navarro
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Maxwell S Sun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Justin Chang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Christopher J F Cameron
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Noam Shoresh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Jorg Drenkow
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jessika Adrian
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Sergey Aganezov
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | | | | | | | | | - Sora Chee
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Gabriel Conte Cortez Martins
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Cassidy Danyko
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Carrie A Davis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Daniel Farid
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Idan Gabdank
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Yoel Gofin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - David U Gorkin
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Mengting Gu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Vivian Hecht
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin C Hitz
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Robbyn Issner
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Melanie Kirsche
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Xiangmeng Kong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Bonita R Lam
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Bian Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Xiqi Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Khine Zin Lin
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, CHN
| | - Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Ran Meng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Jonathan Mudge
- European Bioinformatics Institute, Cambridge, Cambridgeshire, GB
| | | | - Chad Nusbaum
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ioann Popov
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Henry E Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Yunjiang Qiu
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Srividya Ramakrishnan
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Joe Raymond
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Leonidas Salichos
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Biological and Chemical Sciences, New York Institute of Technology, Old Westbury, NY, USA
| | - Alexandra Scavelli
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jacob M Schreiber
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Fritz J Sedlazeck
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Lei Hoon See
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Rachel M Sherman
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Xu Shi
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Minyi Shi
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Cricket Alicia Sloan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - J Seth Strattan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Zhen Tan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Forrest Y Tanaka
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Anna Vlasova
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Comparative Genomics Group, Life Science Programme, Barcelona Supercomputing Centre, Barcelona, Spain; Institute of Research in Biomedicine, Barcelona, Spain
| | - Jun Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jonathan Werner
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Min Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Chengfei Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Lu Yu
- Institute of Cancer Research, London, UK
| | - Christopher Zaleski
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA, USA
| | | | - J Michael Cherry
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | | | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Morgan E Levine
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
| | - Alexander Dobin
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Jesse Gillis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Physiology, University of Toronto, Toronto, ON, Canada
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Michael P Snyder
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | | | | | - Michael C Schatz
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Bradley E Bernstein
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
| | - Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
| | - Thomas R Gingeras
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Mark Gerstein
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Statistics and Data Science, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA.
6
Jou J, Gabdank I, Luo Y, Lin K, Sud P, Myers Z, Hilton JA, Kagda MS, Lam B, O'Neill E, Adenekan P, Graham K, Baymuradov UK, Miyasato SR, Strattan JS, Jolanki O, Lee JW, Litton C, Tanaka FY, Hitz BC, Cherry JM. The ENCODE Portal as an Epigenomics Resource. Curr Protoc Bioinformatics 2020; 68:e89. [PMID: 31751002 PMCID: PMC7307447 DOI: 10.1002/cpbi.89] [Indexed: 12/15/2022]
Abstract
The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, the NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to these data and the relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human- and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via a REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal. Support Protocol 1: Batch downloading. Support Protocol 2: Using the cart to download files. Support Protocol 3: Visualize data. Alternate Protocol: Query building and programmatic access.
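The programmatic access that this abstract describes can be illustrated with a short sketch. The example below is not taken from the article; it assumes the portal's `/search/` endpoint accepts `type`, `limit`, and `format=json` query parameters, and the `assay_title` filter in the usage note is likewise an assumption chosen for illustration. The helper names `build_search_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Portal base URL as given in the listing; the /search/ path and the
# parameter names below are assumptions made for this illustration.
PORTAL = "https://www.encodeproject.org"


def build_search_url(object_type, filters=None, limit=25):
    """Build a search URL that asks the portal for a JSON response."""
    params = [("type", object_type)]
    # Sort the filters so the same inputs always give the same URL.
    for field, value in sorted((filters or {}).items()):
        params.append((field, value))
    params.append(("limit", str(limit)))
    params.append(("format", "json"))
    return f"{PORTAL}/search/?{urlencode(params)}"


def fetch_json(url):
    """Send a GET request and decode the JSON body (needs network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

For instance, `build_search_url("Experiment", {"assay_title": "ATAC-seq"}, limit=5)` builds a query URL that can then be passed to `fetch_json`. Keeping URL construction separate from the network call makes the query easy to inspect or log before any request is sent.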
Affiliation(s)
- Jennifer Jou
- Department of Genetics, Stanford University, Stanford, California
| | - Idan Gabdank
- Department of Genetics, Stanford University, Stanford, California
| | - Yunhai Luo
- Department of Genetics, Stanford University, Stanford, California
| | - Khine Lin
- Department of Genetics, Stanford University, Stanford, California
| | - Paul Sud
- Department of Genetics, Stanford University, Stanford, California
| | - Zachary Myers
- Department of Genetics, Stanford University, Stanford, California
| | - Jason A Hilton
- Department of Genetics, Stanford University, Stanford, California
| | | | - Bonita Lam
- Department of Genetics, Stanford University, Stanford, California
| | - Emma O'Neill
- Department of Genetics, Stanford University, Stanford, California
| | - Philip Adenekan
- Department of Genetics, Stanford University, Stanford, California
| | - Keenan Graham
- Department of Genetics, Stanford University, Stanford, California
| | | | | | - J Seth Strattan
- Department of Genetics, Stanford University, Stanford, California
| | - Otto Jolanki
- Department of Genetics, Stanford University, Stanford, California
| | - Jin-Wook Lee
- Department of Genetics, Stanford University, Stanford, California
| | - Casey Litton
- Department of Genetics, Stanford University, Stanford, California
| | - Forrest Y Tanaka
- Department of Genetics, Stanford University, Stanford, California
| | - Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford, California
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, California
7
Luo Y, Hitz BC, Gabdank I, Hilton JA, Kagda MS, Lam B, Myers Z, Sud P, Jou J, Lin K, Baymuradov UK, Graham K, Litton C, Miyasato SR, Strattan JS, Jolanki O, Lee JW, Tanaka FY, Adenekan P, O'Neill E, Cherry JM. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res 2020; 48:D882-D889. [PMID: 31713622 PMCID: PMC7061942 DOI: 10.1093/nar/gkz1062] [Received: 09/13/2019] [Revised: 10/18/2019] [Accepted: 10/25/2019] [Indexed: 02/06/2023]
Abstract
The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
Affiliation(s)
- Yunhai Luo
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Jason A Hilton
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Bonita Lam
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Zachary Myers
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Paul Sud
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Jennifer Jou
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Khine Lin
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | | | - Keenan Graham
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Casey Litton
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Jin-Wook Lee
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Forrest Y Tanaka
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Philip Adenekan
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Emma O'Neill
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
8
Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, Hilton JA, Jain K, Baymuradov UK, Narayanan AK, Onate KC, Graham K, Miyasato SR, Dreszer TR, Strattan JS, Jolanki O, Tanaka FY, Cherry JM. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res 2018; 46:D794-D801. [PMID: 29126249 PMCID: PMC5753278 DOI: 10.1093/nar/gkx1081] [Received: 09/15/2017] [Accepted: 10/19/2017] [Indexed: 12/30/2022]
Abstract
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to (meta)data from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.
Affiliation(s)
- Carrie A Davis
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Cricket A Sloan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Esther T Chan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Jean M Davidson
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Idan Gabdank
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Jason A Hilton
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Kriti Jain
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Aditi K Narayanan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Kathrina C Onate
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Keenan Graham
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Timothy R Dreszer
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- J Seth Strattan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Otto Jolanki
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Forrest Y Tanaka
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
9
Yoshimura J, Ichikawa K, Shoura MJ, Artiles KL, Gabdank I, Wahba L, Smith CL, Edgley ML, Rougvie AE, Fire AZ, Morishita S, Schwarz EM. Recompleting the Caenorhabditis elegans genome. Genome Res 2019; 29:1009-1022. [PMID: 31123080 PMCID: PMC6581061 DOI: 10.1101/gr.244830.118] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 10/04/2018] [Accepted: 03/11/2019] [Indexed: 01/14/2023]
Abstract
Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but with an additional 1.8 Mb including tandem repeat expansions and genome duplications. For 116 structural discrepancies between N2 and VC2010, 97 structures matching VC2010 (84%) were also found in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology.
Affiliation(s)
- Jun Yoshimura
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8583, Japan
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8583, Japan
- Massa J Shoura
- Department of Pathology, Stanford University, Stanford, California 94305, USA
- Karen L Artiles
- Department of Pathology, Stanford University, Stanford, California 94305, USA
- Idan Gabdank
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Lamia Wahba
- Department of Pathology, Stanford University, Stanford, California 94305, USA
- Cheryl L Smith
- Department of Pathology, Stanford University, Stanford, California 94305, USA; Department of Genetics, Stanford University, Stanford, California 94305, USA
- Mark L Edgley
- Department of Zoology and Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z3, British Columbia, Canada
- Ann E Rougvie
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota 55454, USA
- Andrew Z Fire
- Department of Pathology, Stanford University, Stanford, California 94305, USA; Department of Genetics, Stanford University, Stanford, California 94305, USA
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8583, Japan
- Erich M Schwarz
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
10
Gabdank I, Chan ET, Davidson JM, Hilton JA, Davis CA, Baymuradov UK, Narayanan A, Onate KC, Graham K, Miyasato SR, Dreszer TR, Strattan JS, Jolanki O, Tanaka FY, Hitz BC, Sloan CA, Cherry JM. Prevention of data duplication for high throughput sequencing repositories. Database (Oxford) 2018; 2018:4913687. [PMID: 29688363 PMCID: PMC5829560 DOI: 10.1093/database/bay008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/20/2017] [Accepted: 01/10/2018] [Indexed: 01/01/2023]
Abstract
Database URL https://www.encodeproject.org/.
Affiliation(s)
- Idan Gabdank
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Esther T Chan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Jean M Davidson
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Jason A Hilton
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Carrie A Davis
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Aditi Narayanan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Kathrina C Onate
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Keenan Graham
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Timothy R Dreszer
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- J Seth Strattan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Otto Jolanki
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Forrest Y Tanaka
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Cricket A Sloan
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
11
Gabdank I, Ramakrishnan S, Villeneuve AM, Fire AZ. A streamlined tethered chromosome conformation capture protocol. BMC Genomics 2016; 17:274. [PMID: 27036078 PMCID: PMC4818521 DOI: 10.1186/s12864-016-2596-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/14/2015] [Accepted: 03/16/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Identification of locus-locus contacts at the chromatin level provides a valuable foundation for understanding of nuclear architecture and function and a valuable tool for inferring long-range linkage relationships. As one approach to this, chromatin conformation capture-based techniques allow creation of genome spatial organization maps. While such approaches have been available for some time, methodological advances will be of considerable use in minimizing both time and input material required for successful application. RESULTS: Here we report a modified tethered conformation capture protocol that utilizes a series of rapid and efficient molecular manipulations. We applied the method to Caenorhabditis elegans, obtaining chromatin interaction maps that provide a sequence-anchored delineation of salient aspects of Caenorhabditis elegans chromosome structure, demonstrating a high level of consistency in overall chromosome organization between biological samples collected under different conditions. In addition to the application of the method to defining nuclear architecture, we found the resulting chromatin interaction maps to be of sufficient resolution and sensitivity to enable detection of large-scale structural variants such as inversions or translocations. CONCLUSION: Our streamlined protocol provides an accelerated, robust, and broadly applicable means of generating chromatin spatial organization maps and detecting genome rearrangements without a need for cellular or chromatin fractionation.
Affiliation(s)
- Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94304, USA
- Sreejith Ramakrishnan
- Departments of Developmental Biology and Genetics, Stanford University School of Medicine, Stanford, California, 94304, USA
- Anne M Villeneuve
- Departments of Developmental Biology and Genetics, Stanford University School of Medicine, Stanford, California, 94304, USA
- Andrew Z Fire
- Departments of Pathology and Genetics, Stanford University School of Medicine, Stanford, California, 94304, USA
12
Hong EL, Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee BT, Rowe LD, Dreszer TR, Roe GR, Podduturi NR, Tanaka F, Hilton JA, Cherry JM. Principles of metadata organization at the ENCODE data coordination center. Database (Oxford) 2016; 2016:baw001. [PMID: 26980513 PMCID: PMC4792520 DOI: 10.1093/database/baw001] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 11/06/2015] [Accepted: 01/04/2016] [Indexed: 12/20/2022]
Abstract
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org
Affiliation(s)
- Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Aditi K Narayanan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Brian T Lee
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA, USA
- Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Timothy R Dreszer
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Greg R Roe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Jason A Hilton
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
13
Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee BT, Rowe LD, Dreszer TR, Roe G, Podduturi NR, Tanaka F, Hong EL, Cherry JM. ENCODE data at the ENCODE portal. Nucleic Acids Res 2016; 44:D726-32. [PMID: 26527727 PMCID: PMC4702836 DOI: 10.1093/nar/gkv1160] [Citation(s) in RCA: 326] [Impact Index Per Article: 40.8] [Received: 08/14/2015] [Revised: 10/07/2015] [Accepted: 10/19/2015] [Indexed: 01/20/2023]
Abstract
The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.
Affiliation(s)
- Cricket A Sloan
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Esther T Chan
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Jean M Davidson
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Venkat S Malladi
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- J Seth Strattan
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Benjamin C Hitz
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Idan Gabdank
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Aditi K Narayanan
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Marcus Ho
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Brian T Lee
- University of California at Santa Cruz, Center for Biomolecular Science and Engineering, Santa Cruz, CA, 95064, USA
- Laurence D Rowe
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Timothy R Dreszer
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Greg Roe
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Nikhil R Podduturi
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Forrest Tanaka
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- Eurie L Hong
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
- J Michael Cherry
- Stanford University School of Medicine, Department of Genetics, Stanford, CA, 94305, USA
14
Churkin A, Gabdank I, Barash D. On topological indices for small RNA graphs. Comput Biol Chem 2012; 41:35-40. [PMID: 23147564 DOI: 10.1016/j.compbiolchem.2012.10.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/31/2012] [Revised: 10/11/2012] [Accepted: 10/12/2012] [Indexed: 11/29/2022]
Abstract
The secondary structure of RNAs can be represented by graphs at various resolutions. While it was shown that RNA secondary structures can be represented by coarse grain tree-graphs and meaningful topological indices can be used to distinguish between various structures, small RNAs are needed to be represented by full graphs. No meaningful topological index has yet been suggested for the analysis of such type of RNA graphs. Recalling that the second eigenvalue of the Laplacian matrix can be used to track topological changes in the case of coarse grain tree-graphs, it is plausible to assume that a topological index such as the Wiener index that represents all Laplacian eigenvalues may provide a similar guide for full graphs. However, by its original definition, the Wiener index was defined for acyclic graphs. Nevertheless, similarly to cyclic chemical graphs, small RNA graphs can be analyzed using elementary cuts, which enables the calculation of topological indices for small RNAs in an intuitive way. We show how to calculate a structural descriptor that is suitable for cyclic graphs, the Szeged index, for small RNA graphs by elementary cuts. We discuss potential uses of such a procedure that considers all eigenvalues of the associated Laplacian matrices to quantify the topology of small RNA graphs.
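To make the two indices named in this abstract concrete, here is a minimal sketch that computes them directly from shortest-path distances on a small unweighted, connected graph. It is not the elementary-cut procedure the authors describe; it uses the textbook definitions instead (Wiener index: sum of distances over all vertex pairs; Szeged index: sum over edges uv of the number of vertices closer to u times the number closer to v). The function names, the integer vertex labels, and the two toy graphs are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def all_pairs_distances(adj):
    """Distance table for a connected graph (assumption: the graph is connected)."""
    return {v: bfs_distances(adj, v) for v in adj}


def wiener_index(adj):
    """Sum of shortest-path distances over all unordered vertex pairs."""
    dist = all_pairs_distances(adj)
    return sum(dist[u][v] for u, v in combinations(adj, 2))


def szeged_index(adj):
    """Sum over edges uv of n_u * n_v, where n_u counts vertices closer to u than to v."""
    dist = all_pairs_distances(adj)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once (labels must be comparable)
                n_u = sum(1 for w in adj if dist[u][w] < dist[v][w])
                n_v = sum(1 for w in adj if dist[v][w] < dist[u][w])
                total += n_u * n_v
    return total


# Toy graphs (hypothetical, not RNA structures from the paper):
path4 = build_graph([(0, 1), (1, 2), (2, 3)])           # acyclic
cycle4 = build_graph([(0, 1), (1, 2), (2, 3), (3, 0)])  # one cycle

print(wiener_index(path4), szeged_index(path4))
print(wiener_index(cycle4), szeged_index(cycle4))
```

On the acyclic path the two indices coincide, while on the 4-cycle they differ, which illustrates why a separate descriptor is of interest once the graph contains cycles.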
Affiliation(s)
- Alexander Churkin
- Department of Computer Science, Ben-Gurion University, 84105 Beer-Sheva, Israel
15
Abstract
RNA mutational analysis at the secondary-structure level can be useful to a wide-range of biological applications. It can be used to predict an optimal site for performing a nucleotide mutation at the single molecular level, as well as to analyze basic phenomena at the systems level. For the former, as more sequence modification experiments are performed that include site-directed mutagenesis to find and explore functional motifs in RNAs, a pre-processing step that helps guide in planning the experiment becomes vital. For the latter, mutations are generally accepted as a central mechanism by which evolution occurs, and mutational analysis relating to structure should gain a better understanding of system functionality and evolution. In the past several years, the program RNAmute that is structure based and relies on RNA secondary-structure prediction has been developed for assisting in RNA mutational analysis. It has been extended from single-point mutations to treat multiple-point mutations efficiently by initially calculating all suboptimal solutions, after which only the mutations that stabilize the suboptimal solutions and destabilize the optimal one are considered as candidates for being deleterious. The RNAmute web server for mutational analysis is available at http://www.cs.bgu.ac.il/~xrnamute/XRNAmute.
Affiliation(s)
- Alexander Churkin
- Department of Computer Science, Ben-Gurion University, Beer-Sheva 84105, Israel
16
Abstract
Nucleosome DNA bendability pattern extracted from large nucleosome DNA database of C. elegans is used for construction of full length (116 dinucleotide positions) nucleosome DNA bendability matrix. The matrix can be used for sequence-directed mapping of the nucleosomes on the sequences. Several alternative positions for a given nucleosome are typically predicted, separated by multiples of nucleosome DNA period. The corresponding computer program is successfully tested on best known experimental examples of accurately positioned nucleosomes. The uncertainty of the computational mapping is +/-1 base. The procedure is placed on publicly accessible server and can be applied to any DNA sequence of interest.
Affiliation(s)
- I Gabdank
- Department of Computer Science, Ben Gurion University of the Negev, P.O.B 653 Be'er Sheva 84105, Israel.
17
David M, Gabdank I, Ben-David M, Zilka A, Orr I, Barash D, Shapira M. Preferential translation of Hsp83 in Leishmania requires a thermosensitive polypyrimidine-rich element in the 3' UTR and involves scanning of the 5' UTR. RNA 2010; 16:364-374. [PMID: 20040590 PMCID: PMC2811665 DOI: 10.1261/rna.1874710] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 06/08/2009] [Accepted: 11/09/2009] [Indexed: 05/28/2023]
Abstract
Heat shock proteins (HSPs) provide a useful system for studying developmental patterns in the digenetic Leishmania parasites, since their expression is induced in the mammalian life form. Translation regulation plays a key role in control of protein coding genes in trypanosomatids, and is directed exclusively by elements in the 3' untranslated region (UTR). Using sequential deletions of the Leishmania Hsp83 3' UTR (888 nucleotides [nt]), we mapped a region of 150 nt that was required, but not sufficient for preferential translation of a reporter gene at mammalian-like temperatures, suggesting that changes in RNA structure could be involved. An advanced bioinformatics package for prediction of RNA folding (UNAfold) marked the regulatory region on a highly probable structural arm that includes a polypyrimidine tract (PPT). Mutagenesis of this PPT abrogated completely preferential translation of the fused reporter gene. Furthermore, temperature elevation caused the regulatory region to melt more extensively than the same region that lacked the PPT. We propose that at elevated temperatures the regulatory element in the 3' UTR is more accessible to mediators that promote its interaction with the basal translation components at the 5' end during mRNA circularization. Translation initiation of Hsp83 at all temperatures appears to proceed via scanning of the 5' UTR, since a hairpin structure abolishes expression of a fused reporter gene.
Affiliation(s)
- Maya David
- Department of Life Sciences, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
18
19
Abstract
Energy minimization methods for RNA secondary structure prediction have been used extensively for studying a variety of biological systems. Here, we demonstrate their applicability in riboswitch studies, exemplified in both the expression platform and aptamer domains. In the expression platform domain, energy minimization methods can be used to predict in silico a unique point mutation positioned in the non-conserved region of the TPP riboswitch that will transform it from a termination to an anti-termination state, thus backing the prediction experimentally. Furthermore, a successive prediction can be made for a compensatory mutation that is positioned over half the sequence length of the riboswitch from the original mutation and that completely overturns the anti-termination effect of the original mutation. This approach can be used to computationally predict rational modifications in riboswitches for both research and practical applications. In the aptamer domain, energy minimization methods can be used when attempting to detect a novel purine riboswitch in eukaryotes based on the consensus sequence and structure of the bacterial guanine binding aptamer. In the process, some interesting candidates are identified, and although they are attractive enough to be tested experimentally, they are not detectable by sequence based methods alone. These brief examples represent the important lessons to be learned as to the strengths and limitations of energy minimization methods. In light of our growing knowledge in the energy minimization field, future challenges can be advanced for the rational design of known riboswitches and the detection of novel riboswitches. Unlike analyses of specific cases, it is stressed that all the results described here are predictive in scope with direct applicability and an attempt to validate the predictions experimentally.
Affiliation(s)
- Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel.
20
21
Cohen A, Bocobza S, Veksler I, Gabdank I, Barash D, Aharoni A, Shapira M, Kedem K. Computational identification of three-way junctions in folded RNAs: a case study in Arabidopsis. In Silico Biol 2008; 8:105-120. [PMID: 18928199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Three-way junctions in folded RNAs have been investigated both experimentally and computationally. The interest in their analysis stems from the fact that they have significantly been found to possess a functional role. In recent work, three-way junctions have been categorized into families depending on the relative lengths of the segments linking the three helices. Here, based on ideas originating from computational geometry, an algorithm is proposed for detecting three-way junctions in data sets of genes that are related to a metabolic pathway of interest. In its current implementation, the algorithm relies on a moving window that performs energy minimization folding predictions, and is demonstrated on a set of genes that are involved in purine metabolism in plants. The pattern matching algorithm can be extended to other organisms and other metabolic cycles of interest in which three-way junctions have been or will be discovered to play an important role. In the test case presented herewith, the computational prediction of a three-way junction in Arabidopsis that was speculated to have an interesting functional role is verified experimentally.
Affiliation(s)
- Adaya Cohen
- Department of Computer Science, Ben-Gurion University, Beer-Sheva 84105, Israel
22
Abstract
The discovery of natural RNA sensors that respond to a change in the environment by a conformational switch can be utilized for various biotechnological and nanobiotechnological advances. One class of RNA sensors is the riboswitch: an RNA genetic control element that is capable of sensing small molecules, responding to a deviation in ligand concentration with a structural change. Riboswitches are modularly built from smaller components. Computational methods can potentially be utilized in assembling these building block components and offering improvements in the biochemical design process. We describe a computational procedure to design RNA switches from building blocks with favorable properties. To achieve maximal throughput for genetic control purposes, future designer RNA switches can be assembled based on a computerized preprocessing buildup of the constituent domains, namely the aptamer and the expression platform in the case of a synthetic riboswitch. Conformational switching is enabled by the RNA versatility to possess two highly stable states that are energetically close to each other but topologically distinct, separated by an energy barrier between them. Initially, computer simulations can produce a list of short sequences that switch between two conformers when triggered by point mutations or temperature. The short sequences should possess an additional desirable property; when these selected small RNA switch segments are attached to various aptamers, the ligand binding mechanism should replace the aforementioned event triggers, which will no longer be effective for crossing the energy barrier. In the assembled RNA sequence, energy minimization folding predictions should then show no difference between the folded structure of the entire sequence relative to the folded structure of each of its constituents. Moreover, energy minimization methods applied on the entire sequence could aid at this preprocessing stage by exhibiting high mutational robustness to capture the stability of the formed hairpin in the expression platform. The above computer-assisted assembly procedure together with application specific considerations may further be tailored for therapeutic gene regulation. Index Terms: design of RNA switches, energy minimization methods, RNA folding predictions.
Affiliation(s)
- Assaf Avihoo
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
23
Abstract
Evolution of the triplet code is reconstructed on the basis of consensus temporal order of appearance of amino acids. Several important predictions are confirmed by computational sequence analyses. The earliest amino acids, alanine and glycine, have been encoded by GCC and GGC codons, as today. They were succeeded, respectively, by A- and G-series of amino acids, encoded by pyrimidine-central and purine-central codons. The length of the earliest proteins is estimated to be 6-7 residues. The earliest mRNAs were short G+C-rich molecules. These short sequences could have formed hairpins. This is confirmed by analysis of modern prokaryotic mRNA sequences. Predominant size of detected ancient hairpins also corresponds to 6-7 amino acids, as above. Vestiges of last common ancestor can be found in extant proteins in form of entirely conserved short sequences of size six to nine residues present in all or almost all sequenced prokaryotic proteomes (omnipresent motifs). The functions of the topmost conserved octamers are not involved in the basic elementary syntheses. This suggests an initial abiotic supply of amino acids, bases and sugars.
Affiliation(s)
- Edward N Trifonov
- Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel.
24
Abstract
From recent developments of the early evolution theory it follows that the earliest mRNAs were short (approximately 20 nt) (G+C)-rich polynucleotides. These short sequences could form hairpins, which would be of high evolutionary advantage because of stability and uniqueness of their conformations. Due to mutations accumulated during billions of years of evolution, the speculated earliest hairpins would largely lose the initial complementarities. Some of the original complementary base-to-base contacts, however, may have survived. Computational analysis of modern prokaryotic mRNA sequences reveals excess population of the expected short range complementarities. The derived earliest mRNA hairpin size fully corresponds to the predicted size of ancient coding duplexes. The repertoire of the surviving hairpins traced in modern mRNA confirms duplex structure of the earliest mRNA, suggested by the early molecular evolution theory.
Affiliation(s)
- Idan Gabdank
- Department of Computer Science, Ben Gurion University of the Negev, P.O.B 653, Be'er Sheva 84105, Israel.