1
|
Maeng JH, Jang HJ, Du AY, Tzeng SC, Wang T. Using long-read CAGE sequencing to profile cryptic-promoter-derived transcripts and their contribution to the immunopeptidome. Genome Res 2023; 33:gr.277061.122. [PMID: 38065624 PMCID: PMC10760525 DOI: 10.1101/gr.277061.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Accepted: 11/13/2023] [Indexed: 01/04/2024]
Abstract
Recent studies have shown that the noncoding genome can produce unannotated proteins as antigens that induce an immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts (TE transcripts) have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (long-read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we show that long-read technology significantly improves mapping of promoters with low mappability scores and that LRCAGE guarantees accurate construction of uncharacterized 5' transcript structure. Augmenting a reference proteome database with newly characterized transcripts enabled us to detect noncanonical antigens from HLA-pulldown LC-MS/MS data. Lastly, we show that epigenetic treatment increased the number of noncanonical antigens, particularly those encoded by TE transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.
Affiliation(s)
- Ju Heon Maeng
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - H Josh Jang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Alan Y Du
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Shin-Cheng Tzeng
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
| | - Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
|
2
|
Hitz BC, Lee JW, Jolanki O, Kagda MS, Graham K, Sud P, Gabdank I, Strattan JS, Sloan CA, Dreszer T, Rowe LD, Podduturi NR, Malladi VS, Chan ET, Davidson JM, Ho M, Miyasato S, Simison M, Tanaka F, Luo Y, Whaling I, Hong EL, Lee BT, Sandstrom R, Rynes E, Nelson J, Nishida A, Ingersoll A, Buckley M, Frerker M, Kim DS, Boley N, Trout D, Dobin A, Rahmanian S, Wyman D, Balderrama-Gutierrez G, Reese F, Durand NC, Dudchenko O, Weisz D, Rao SSP, Blackburn A, Gkountaroulis D, Sadr M, Olshansky M, Eliaz Y, Nguyen D, Bochkov I, Shamim MS, Mahajan R, Aiden E, Gingeras T, Heath S, Hirst M, Kent WJ, Kundaje A, Mortazavi A, Wold B, Cherry JM. The ENCODE Uniform Analysis Pipelines. RESEARCH SQUARE 2023:rs.3.rs-3111932. [PMID: 37503119 PMCID: PMC10371165 DOI: 10.21203/rs.3.rs-3111932/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/29/2023]
Abstract
The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.
Affiliation(s)
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jin-Wook Lee
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Keenan Graham
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Timothy Dreszer
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stuart Miyasato
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matt Simison
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Yunhai Luo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ian Whaling
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brian T Lee
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Richard Sandstrom
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Eric Rynes
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Jemma Nelson
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Andrew Nishida
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Alyssa Ingersoll
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Michael Buckley
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Mark Frerker
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Daniel S Kim
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Nathan Boley
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | | | - Fairlie Reese
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Neva C Durand
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Houston, TX 77030, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David Weisz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Suhas S P Rao
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
| | - Alyssa Blackburn
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Dimos Gkountaroulis
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Mahdi Sadr
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moshe Olshansky
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yossi Eliaz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dat Nguyen
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ivan Bochkov
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muhammad Saad Shamim
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ragini Mahajan
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of BioSciences, Rice University, Houston, TX 77005, USA
| | - Erez Aiden
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Tom Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Simon Heath
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Martin Hirst
- Michael Smith Laboratories, University of British Columbia, British Columbia, Canada
| | - W James Kent
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anshul Kundaje
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Ali Mortazavi
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
|
3
|
Hitz BC, Lee JW, Jolanki O, Kagda MS, Graham K, Sud P, Gabdank I, Strattan JS, Sloan CA, Dreszer T, Rowe LD, Podduturi NR, Malladi VS, Chan ET, Davidson JM, Ho M, Miyasato S, Simison M, Tanaka F, Luo Y, Whaling I, Hong EL, Lee BT, Sandstrom R, Rynes E, Nelson J, Nishida A, Ingersoll A, Buckley M, Frerker M, Kim DS, Boley N, Trout D, Dobin A, Rahmanian S, Wyman D, Balderrama-Gutierrez G, Reese F, Durand NC, Dudchenko O, Weisz D, Rao SSP, Blackburn A, Gkountaroulis D, Sadr M, Olshansky M, Eliaz Y, Nguyen D, Bochkov I, Shamim MS, Mahajan R, Aiden E, Gingeras T, Heath S, Hirst M, Kent WJ, Kundaje A, Mortazavi A, Wold B, Cherry JM. The ENCODE Uniform Analysis Pipelines. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.04.535623. [PMID: 37066421 PMCID: PMC10104020 DOI: 10.1101/2023.04.04.535623] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 04/18/2023]
Abstract
The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.
Affiliation(s)
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jin-Wook Lee
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Keenan Graham
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Timothy Dreszer
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stuart Miyasato
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matt Simison
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Yunhai Luo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ian Whaling
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brian T Lee
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Richard Sandstrom
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Eric Rynes
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Jemma Nelson
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Andrew Nishida
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Alyssa Ingersoll
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Michael Buckley
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Mark Frerker
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Daniel S Kim
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Nathan Boley
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | | | - Fairlie Reese
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Neva C Durand
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Houston, TX 77030, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David Weisz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Suhas S P Rao
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
| | - Alyssa Blackburn
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Dimos Gkountaroulis
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Mahdi Sadr
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moshe Olshansky
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yossi Eliaz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dat Nguyen
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ivan Bochkov
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muhammad Saad Shamim
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ragini Mahajan
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of BioSciences, Rice University, Houston, TX 77005, USA
| | - Erez Aiden
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Tom Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Simon Heath
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Martin Hirst
- Michael Smith Laboratories, University of British Columbia, British Columbia, Canada
| | - W James Kent
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anshul Kundaje
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Ali Mortazavi
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
|
4
|
The Diagnostic and Therapeutic Role of Circular RNA HIPK3 in Human Diseases. Diagnostics (Basel) 2022; 12:diagnostics12102469. [PMID: 36292157 PMCID: PMC9601126 DOI: 10.3390/diagnostics12102469] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/14/2022] [Revised: 10/05/2022] [Accepted: 10/09/2022] [Indexed: 11/17/2022] Open
Abstract
Circular RNAs (circRNAs) are a class of noncoding RNAs with a closed-loop, single-stranded structure. Although most circRNAs do not directly encode proteins, emerging evidence suggests that circRNAs play a pivotal and complex role in multiple biological processes by regulating gene expression. As one of the most widely studied circRNAs, circular homeodomain-interacting protein kinase 3 (circHIPK3) has attracted considerable research interest in recent years. Accumulating studies have demonstrated its significant impact on the occurrence and development of multiple human diseases, including cancers, cardiovascular diseases, diabetes mellitus, inflammatory diseases, and others. The present review aims to provide a detailed description of the functions of circHIPK3 and a comprehensive overview of its diagnostic and therapeutic value in these diseases.
|
5
|
Ferrer-Bonsoms JA, Morales X, Afshar PT, Wong WH, Rubio A. On the identifiability of the isoform deconvolution problem: application to select the proper fragment length in an RNA-seq library. Bioinformatics 2022; 38:1491-1496. [PMID: 34978563 PMCID: PMC8896638 DOI: 10.1093/bioinformatics/btab873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Revised: 11/12/2021] [Accepted: 12/30/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known whether gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome. RESULTS: A gene is considered to be identifiable if both the structure and the concentration of its transcripts can be determined unambiguously. Here, we present a method to establish the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method states whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method using different read and fragment length combinations, the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability. AVAILABILITY AND IMPLEMENTATION: Code is available in GitHub (https://github.com/JFerrer-B/transcriptome-identifiability). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Juan A Ferrer-Bonsoms
- Department of Biomedical Engineering and Sciences, TECNUN, University of Navarra, Pamplona, Spain
| | - Xabier Morales
- Department of Biomedical Engineering and Sciences, TECNUN, University of Navarra, Pamplona, Spain
| | - Pegah T Afshar
- Department of Statistics, Stanford University, Stanford, CA 94305-4020, USA
| | - Wing H Wong
- Department of Statistics, Stanford University, Stanford, CA 94305-4020, USA
| | | |
|
6
|
Wang S, Ying Y, Ma X, Wang W, Wang X, Xie L. Diverse Roles and Therapeutic Potentials of Circular RNAs in Urological Cancers. Front Mol Biosci 2021; 8:761698. [PMID: 34869591 PMCID: PMC8640215 DOI: 10.3389/fmolb.2021.761698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2021] [Accepted: 10/20/2021] [Indexed: 12/14/2022] Open
Abstract
Circular RNAs (circRNAs) are a novel class of noncoding RNAs, mainly formed as loop structures at exons by noncanonical splicing, and are much more stable than linear transcripts. Recent reports have suggested that the dysregulation of circRNAs is associated with the occurrence and development of diseases, especially various human malignancies. Emerging evidence has demonstrated that a large number of circRNAs play a vital role in a series of biological processes such as tumor cell proliferation, migration, drug resistance, and immune escape. Additionally, circRNAs have been reported to be potential prognostic and diagnostic biomarkers in cancers. In this work, we systematically summarize the biogenesis and characteristics of circRNAs, paying special attention to potential mechanisms and clinical applications of circRNAs in urological cancers, which may help develop potential therapy targets for urological cancers in the future.
Affiliation(s)
- Song Wang
- Department of Urology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Yufan Ying
- Department of Urology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Xueyou Ma
- Department of Urology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Weiyu Wang
- Department of Urology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Xiao Wang
- Department of Urology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Liping Xie
- Department of Urology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| |
Collapse
|
7
|
Guerrini MM, Oguchi A, Suzuki A, Murakawa Y. Cap analysis of gene expression (CAGE) and noncoding regulatory elements. Semin Immunopathol 2021; 44:127-136. [PMID: 34468849 DOI: 10.1007/s00281-021-00886-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Accepted: 08/13/2021] [Indexed: 01/06/2023]
Abstract
Cap analysis of gene expression (CAGE) was developed to detect the 5' end of RNA. Trapping of the RNA 5'-cap structure enables the enrichment and selective sequencing of complete transcripts. Upscaled high-throughput versions of CAGE have enabled the genome-wide identification of transcription start sites, including transcriptionally active promoters and enhancers. CAGE sequencing can be exploited to draw comprehensive maps of active genomic regulatory elements in a cell type- and activation-specific manner. The cells of the immune system are among the best candidates to be analyzed in humans, since they are easily accessible. In this review, we discuss how CAGE data are instrumental for integrative analyses with quantitative trait loci and omics data, and their usefulness in the mechanistic interpretation of the effects of genetic variations over the entire human genome. Integrating CAGE data with the currently available omics information will contribute to better understanding of the genome-wide association study variants that lie outside of annotated genes, deepening our knowledge on human diseases, and enabling the targeted design of more specific therapeutic interventions.
Affiliation(s)
- Matteo Maurizio Guerrini
  - Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Akiko Oguchi
  - RIKEN-IFOM Joint Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Akari Suzuki
  - Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Yasuhiro Murakawa
  - RIKEN-IFOM Joint Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - IFOM-the FIRC Institute of Molecular Oncology, Milan, Italy
|
8
|
Ali A, Thorgaard GH, Salem M. PacBio Iso-Seq Improves the Rainbow Trout Genome Annotation and Identifies Alternative Splicing Associated With Economically Important Phenotypes. Front Genet 2021; 12:683408. [PMID: 34335690 PMCID: PMC8321248 DOI: 10.3389/fgene.2021.683408] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/20/2021] [Accepted: 06/14/2021] [Indexed: 01/04/2023] Open
Abstract
Rainbow trout is an important model organism that has received concerted international efforts to study the transcriptome. For this purpose, short-read sequencing has been primarily used over the past decade. However, these sequences are too short to resolve the transcriptome complexity. This study reports the first full-length transcriptome assembly of the rainbow trout using single-molecule long-read isoform sequencing (Iso-Seq). Extensive computational approaches were used to refine and validate the reconstructed transcriptome. The study identified 10,640 high-confidence transcripts not previously annotated, in addition to 1,479 isoforms not mapped to the current Swanson reference genome. Most of the identified lncRNAs were non-coding variants of coding transcripts. The majority of genes had multiple transcript isoforms (average ∼3 isoforms/locus). Intron retention (IR) and exon skipping (ES) accounted for 56% of alternative splicing (AS) events. Iso-Seq improved the reference genome annotation, which allowed identification of characteristic AS associated with fish growth, muscle accretion, disease resistance, stress response, and fish migration. For instance, an ES event in the GVIN1 gene was present in fish susceptible to bacterial cold-water disease (BCWD). In addition, under five stress conditions, there was a commonly regulated exon in the prolyl 4-hydroxylase subunit alpha-2 (P4HA2) gene. The reconstructed gene models and their posttranscriptional processing in rainbow trout provide invaluable resources that could be used for future genetics and genomics studies. Additionally, the study identified characteristic transcription events associated with economically important phenotypes, which could be applied in selective breeding.
Affiliation(s)
- Ali Ali
  - Department of Animal and Avian Sciences, University of Maryland, College Park, College Park, MD, United States
- Gary H. Thorgaard
  - School of Biological Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, United States
- Mohamed Salem
  - Department of Animal and Avian Sciences, University of Maryland, College Park, College Park, MD, United States
|
9
|
Zheng X, Chen Y, Zhou Y, Shi K, Hu X, Li D, Ye H, Zhou Y, Wang K. Full-length annotation with multistrategy RNA-seq uncovers transcriptional regulation of lncRNAs in cotton. Plant Physiology 2021; 185:179-195. [PMID: 33631798 PMCID: PMC8133545 DOI: 10.1093/plphys/kiaa003] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/21/2020] [Accepted: 10/16/2020] [Indexed: 05/11/2023]
Abstract
Long noncoding RNAs (lncRNAs) are crucial factors during plant development and environmental responses. To build an accurate atlas of lncRNAs in the diploid cotton Gossypium arboreum, we combined Isoform-sequencing, strand-specific RNA-seq (ssRNA-seq), and cap analysis gene expression (CAGE-seq) with PolyA-seq and compiled a pipeline named plant full-length lncRNA to integrate multi-strategy RNA-seq data. In total, 9,240 lncRNAs from 21 tissue samples were identified. 4,405 and 4,805 lncRNA transcripts were supported by CAGE-seq and PolyA-seq, respectively, among which 6.7% and 7.2% had multiple transcription start sites (TSSs) and transcription termination sites (TTSs). We revealed that alternative usage of TSS and TTS of lncRNAs occurs pervasively during plant growth. Besides, we uncovered that many lncRNAs act in cis to regulate adjacent protein-coding genes (PCGs). It was especially interesting to observe 64 cases wherein the lncRNAs were involved in the TSS alternative usage of PCGs. We identified lncRNAs that are coexpressed with ovule- and fiber development-associated PCGs, or linked to GWAS single-nucleotide polymorphisms. We mapped the genome-wide binding sites of two lncRNAs with chromatin isolation by RNA purification sequencing. We also validated the transcriptional regulatory role of lnc-Ga13g0352 via virus-induced gene suppression assay, indicating that this lncRNA might act as a dual-functional regulator that either activates or inhibits the transcription of target genes.
Affiliation(s)
- Xiaomin Zheng
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
- Yanjun Chen
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
- Yifan Zhou
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
- Keke Shi
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
- Xiao Hu
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
- Danyang Li
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
- Hanzhe Ye
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
- Yu Zhou
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
  - Institute for Advanced Studies, Wuhan University, Wuhan 430000, China
- Kun Wang
  - College of Life Sciences, Wuhan University, Wuhan 430000, China
  - Author for communication:
|
10
|
Markus BM, Waldman BS, Lorenzi HA, Lourido S. High-Resolution Mapping of Transcription Initiation in the Asexual Stages of Toxoplasma gondii. Front Cell Infect Microbiol 2021; 10:617998. [PMID: 33553008 PMCID: PMC7854901 DOI: 10.3389/fcimb.2020.617998] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 12/03/2020] [Indexed: 12/13/2022] Open
Abstract
Toxoplasma gondii is a common parasite of humans and animals, causing life-threatening disease in the immunocompromised, fetal abnormalities when contracted during gestation, and recurrent ocular lesions in some patients. Central to the prevalence and pathogenicity of this protozoan is its ability to adapt to a broad range of environments, and to differentiate between acute and chronic stages. These processes are underpinned by a major rewiring of gene expression, yet the mechanisms that regulate transcription in this parasite are only partially characterized. Deciphering these mechanisms requires a precise and comprehensive map of transcription start sites (TSSs); however, Toxoplasma TSSs have remained incompletely defined. To address this challenge, we used 5'-end RNA sequencing to genomically assess transcription initiation in both acute and chronic stages of Toxoplasma. Here, we report an in-depth analysis of transcription initiation at promoters, and provide empirically-defined TSSs for 7603 (91%) protein-coding genes, of which only 1840 concur with existing gene models. Comparing data from acute and chronic stages, we identified instances of stage-specific alternative TSSs that putatively generate mRNA isoforms with distinct 5' termini. Analysis of the nucleotide content and nucleosome occupancy around TSSs allowed us to examine the determinants of TSS choice, and outline features of Toxoplasma promoter architecture. We also found pervasive divergent transcription at Toxoplasma promoters, clustered within the nucleosomes of highly-symmetrical phased arrays, underscoring chromatin contributions to transcription initiation. Corroborating previous observations, we asserted that Toxoplasma 5' leaders are among the longest of any eukaryote studied thus far, displaying a median length of approximately 800 nucleotides. Further highlighting the utility of a precise TSS map, we pinpointed motifs associated with transcription initiation, including the binding sites of the master regulator of chronic-stage differentiation, BFD1, and a novel motif with a similar positional arrangement present at 44% of Toxoplasma promoters. This work provides a critical resource for functional genomics in Toxoplasma, and lays down a foundation to study the interactions between genomic sequences and the regulatory factors that control transcription in this parasite.
Affiliation(s)
- Benedikt M. Markus
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
- Benjamin S. Waldman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
- Sebastian Lourido
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
|
11
|
Razo-Mendivil FG, Martínez O, Hayano-Kanashiro C. Compacta: a fast contig clustering tool for de novo assembled transcriptomes. BMC Genomics 2020; 21:148. [PMID: 32046653 PMCID: PMC7014741 DOI: 10.1186/s12864-020-6528-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/02/2019] [Accepted: 01/22/2020] [Indexed: 12/25/2022] Open
Abstract
Background: RNA-Seq is the preferred method to explore transcriptomes and to estimate differential gene expression. When an organism has a well-characterized and annotated genome, reads obtained from RNA-Seq experiments can be directly mapped to that genome to estimate the number of transcripts present and relative expression levels of these transcripts. However, for unknown genomes, de novo assembly of RNA-Seq reads must be performed to generate a set of contigs that represents the transcriptome. These contig sets contain multiple transcripts, including immature mRNAs, spliced transcripts and allele variants, as well as products of close paralogs or gene families that can be difficult to distinguish. Thus, tools are needed to select a set of less redundant contigs to represent the transcriptome for downstream analyses. Here we describe the development of Compacta to produce contig sets from de novo assemblies. Results: Compacta is a fast and flexible computational tool that allows selection of a representative set of contigs from de novo assemblies. Using a graph-based algorithm, Compacta groups contigs into clusters based on the proportion of shared reads. The user can determine the minimum coverage of the contigs to be clustered, as well as a threshold for the proportion of shared reads in the clustered contigs, thus providing a dynamic range of transcriptome compression that can be adapted according to experimental aims. We compared the performance of Compacta against state-of-the-art clustering algorithms on assemblies from Arabidopsis, mouse and mango, and found that Compacta yielded more rapid results and had competitive precision and recall ratios. We describe and demonstrate a pipeline to tailor Compacta parameters to specific experimental aims. Conclusions: Compacta is a fast and flexible algorithm for the determination of optimum contig sets that represent the transcriptome for downstream analyses.
Affiliation(s)
- Fernando G Razo-Mendivil
  - Departamento de Investigaciones Científicas y Tecnológicas de la Universidad de Sonora, Universidad de Sonora, Hermosillo, Mexico
- Octavio Martínez
  - Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato, Gto, Mexico
- Corina Hayano-Kanashiro
  - Departamento de Investigaciones Científicas y Tecnológicas de la Universidad de Sonora, Universidad de Sonora, Hermosillo, Mexico
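The clustering idea summarized in the Compacta abstract above (grouping contigs by the proportion of reads they share, with a user-set threshold) can be illustrated with a minimal sketch. This is not Compacta's implementation: the function name `cluster_contigs`, the `min_shared_fraction` parameter, the choice of the smaller read count as the denominator, and the union-find merging are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cluster_contigs(read_to_contigs, min_shared_fraction=0.3):
    """Group contigs whose shared-read proportion meets a threshold.

    read_to_contigs maps a read ID to the contigs it aligns to
    (hypothetical input format, chosen for this sketch).
    """
    reads_per_contig = defaultdict(int)
    shared = defaultdict(int)
    for contigs in read_to_contigs.values():
        unique = sorted(set(contigs))
        for contig in unique:
            reads_per_contig[contig] += 1
        # Every pair of contigs hit by the same read shares that read.
        for a, b in combinations(unique, 2):
            shared[(a, b)] += 1

    # Union-find: each contig starts as its own cluster.
    parent = {contig: contig for contig in reads_per_contig}

    def find(contig):
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]  # path halving
            contig = parent[contig]
        return contig

    for (a, b), n_shared in shared.items():
        # Assumption: normalize by the contig with fewer reads.
        fraction = n_shared / min(reads_per_contig[a], reads_per_contig[b])
        if fraction >= min_shared_fraction:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for contig in parent:
        clusters[find(contig)].add(contig)
    return sorted(sorted(members) for members in clusters.values())
```

Raising `min_shared_fraction` keeps more contigs separate, while lowering it merges more of them, which mirrors the adjustable level of transcriptome compression the abstract describes.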
|
12
|
Haronikova L, Olivares-Illana V, Wang L, Karakostis K, Chen S, Fåhraeus R. The p53 mRNA: an integral part of the cellular stress response. Nucleic Acids Res 2019; 47:3257-3271. [PMID: 30828720 PMCID: PMC6468297 DOI: 10.1093/nar/gkz124] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 11/03/2018] [Revised: 02/12/2019] [Accepted: 02/21/2019] [Indexed: 12/16/2022] Open
Abstract
A large number of signalling pathways converge on p53 to induce different cellular stress responses that aim to promote cell cycle arrest and repair or, if the damage is too severe, to induce irreversible senescence or apoptosis. The differentiation of p53 activity towards specific cellular outcomes is tightly regulated via a hierarchical order of post-translational modifications and regulated protein-protein interactions. The mechanisms governing these processes provide a model for how cells optimize the genetic information for maximal diversity. The p53 mRNA also plays a role in this process and this review aims to illustrate how protein and RNA interactions throughout the p53 mRNA in response to different signalling pathways control RNA stability, translation efficiency or alternative initiation of translation. We also describe how a p53 mRNA platform shows riboswitch-like features and controls the rate of p53 synthesis, protein stability and modifications of the nascent p53 protein. A single cancer-derived synonymous mutation disrupts the folding of this platform and prevents p53 activation following DNA damage. The role of the p53 mRNA as a target for signalling pathways illustrates how mRNA sequences have co-evolved with the function of the encoded protein and sheds new light on the information hidden within mRNAs.
Affiliation(s)
- Lucia Haronikova
  - RECAMO, Masaryk Memorial Cancer Institute, Zluty kopec 7, 656 53 Brno, Czech Republic
- Vanesa Olivares-Illana
  - Laboratorio de Interacciones Biomoleculares y cáncer. Instituto de Física Universidad Autónoma de San Luis Potosí, Manuel Nava 6, Zona universitaria, 78290 SLP, México
- Lixiao Wang
  - Department of Medical Biosciences, Umeå University, 90185 Umeå, Sweden
- Sa Chen
  - Department of Medical Biosciences, Umeå University, 90185 Umeå, Sweden
- Robin Fåhraeus
  - RECAMO, Masaryk Memorial Cancer Institute, Zluty kopec 7, 656 53 Brno, Czech Republic
  - Department of Medical Biosciences, Umeå University, 90185 Umeå, Sweden
  - Inserm U1162, 27 rue Juliette Dodu, 75010 Paris, France
  - ICCVS, University of Gdańsk, Science, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
|
13
|
Multi-strategic RNA-seq analysis reveals a high-resolution transcriptional landscape in cotton. Nat Commun 2019; 10:4714. [PMID: 31624240 PMCID: PMC6797763 DOI: 10.1038/s41467-019-12575-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 02/05/2019] [Accepted: 09/18/2019] [Indexed: 11/09/2022] Open
Abstract
Cotton is an important natural fiber crop; however, a comprehensive and high-resolution gene map is lacking. Here we integrate four complementary high-throughput techniques, including PacBio long-read Iso-seq, strand-specific RNA-seq, CAGE-seq, and PolyA-seq, to systematically explore the transcription landscape across 16 tissues or different organ types in Gossypium arboreum. We devise a computational pipeline, named IGIA, to reconstruct accurate gene structures from the integrated data. Our results reveal a dynamic and diverse transcriptional map in cotton: tissue-specific gene expression, alternative usage of TSSs and polyadenylation sites, hotspots of alternative splicing, and transcriptional read-through. These regulated events affect many genes in various aspects such as gain or loss of functional RNA motifs and protein domains, fine-tuning of DNA binding activity, and co-regulation of genes in the same complex or pathway. The methods and findings provide valuable resources for further functional genomic studies, such as understanding natural SNP variations, for the plant community.
|
14
|
Abstract
Genetic, transcriptional, and post-transcriptional variations shape the transcriptome of individual cells, rendering establishing an exhaustive set of reference RNAs a complicated matter. Current reference transcriptomes, which are based on carefully curated transcripts, are lagging behind the extensive RNA variation revealed by massively parallel sequencing. Much may be missed by ignoring this unreferenced RNA diversity. There is plentiful evidence for non-reference transcripts with important phenotypic effects. Although reference transcriptomes are inestimable for gene expression analysis, they may turn limiting in important medical applications. We discuss computational strategies for retrieving hidden transcript diversity.
Affiliation(s)
- Antonin Morillon
  - ncRNA, Epigenetic and Genome Fluidity, CNRS UMR 3244, Sorbonne Université, PSL University, Institut Curie, Centre de Recherche, 26 rue d'Ulm, 75248, Paris, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Sud, Université Paris Saclay, Gif sur Yvette, France
|
15
|
Lu Z, Lin Z. Pervasive and dynamic transcription initiation in Saccharomyces cerevisiae. Genome Res 2019; 29:1198-1210. [PMID: 31076411 PMCID: PMC6633255 DOI: 10.1101/gr.245456.118] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 10/22/2018] [Accepted: 05/07/2019] [Indexed: 12/15/2022]
Abstract
Transcription initiation is finely regulated to ensure proper expression and function of genes. The regulated transcription initiation in response to various environmental stimuli in a classic model organism Saccharomyces cerevisiae has not been systematically investigated. In this study, we generated quantitative maps of transcription start sites (TSSs) at a single-nucleotide resolution for S. cerevisiae grown in nine different conditions using no-amplification nontagging Cap analysis of gene expression (nAnT-iCAGE) sequencing. We mapped ∼1 million well-supported TSSs, suggesting highly pervasive transcription initiation in the compact genome of the budding yeast. The comprehensive TSS maps allowed us to identify core promoters for ∼96% verified protein-coding genes. We corrected misannotation of translation start codon for 122 genes and suggested an alternative start codon for 57 genes. We found that 56% of yeast genes are controlled by multiple core promoters, and alternative core promoter usage by a gene is widespread in response to changing environments. Most core promoter shifts are coupled with altered gene expression, indicating that alternative core promoter usage might play an important role in controlling gene transcriptional activities. Based on their activities in responding to environmental cues, we divided core promoters into constitutive class (55%) and inducible class (45%). The two classes of core promoters display distinctive patterns in transcriptional abundance, chromatin structure, promoter shape, and sequence context. In summary, our study improved the annotation of the yeast genome and demonstrated a much more pervasive and dynamic nature of transcription initiation in yeast than previously recognized.
Affiliation(s)
- Zhaolian Lu
  - Department of Biology, Saint Louis University, St. Louis, Missouri 63104, USA
- Zhenguo Lin
  - Department of Biology, Saint Louis University, St. Louis, Missouri 63104, USA
|
16
|
|
17
|
Sanfilippo P, Wen J, Lai EC. Landscape and evolution of tissue-specific alternative polyadenylation across Drosophila species. Genome Biol 2017; 18:229. [PMID: 29191225 PMCID: PMC5707805 DOI: 10.1186/s13059-017-1358-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 06/02/2017] [Accepted: 11/08/2017] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Drosophila melanogaster has one of the best-described transcriptomes of any multicellular organism. Nevertheless, the paucity of 3'-sequencing data in this species precludes comprehensive assessment of alternative polyadenylation (APA), which is subject to broad tissue-specific control. RESULTS: Here, we generate deep 3'-sequencing data from 23 developmental stages, tissues, and cell lines of D. melanogaster, yielding a comprehensive atlas of ~62,000 polyadenylated ends. These data broadly extend the annotated transcriptome, identify ~40,000 novel 3' termini, and reveal that two-thirds of Drosophila genes are subject to APA. Furthermore, we dramatically expand the numbers of genes known to be subject to tissue-specific APA, such as 3' untranslated region (UTR) lengthening in head and 3' UTR shortening in testis, and characterize new tissue and developmental 3' UTR patterns. Our thorough 3' UTR annotations permit reassessment of post-transcriptional regulatory networks, via conserved miRNA and RNA binding protein sites. To evaluate the evolutionary conservation and divergence of APA patterns, we generate developmental and tissue-specific 3'-seq libraries from Drosophila yakuba and Drosophila virilis. We document broadly analogous tissue-specific APA trends in these species, but also observe significant alterations in 3' end usage across orthologs. We exploit the population of functionally evolving poly(A) sites to gain clear evidence that evolutionary divergence in core polyadenylation signal (PAS) and downstream sequence element (DSE) motifs drives broad alterations in 3' UTR isoform expression across the Drosophila phylogeny. CONCLUSIONS: These data provide a critical resource for the Drosophila community and offer many insights into the complex control of alternative tissue-specific 3' UTR formation and its consequences for post-transcriptional regulatory networks.
Affiliation(s)
- Piero Sanfilippo
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
  - Louis V. Gerstner, Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, 10065, USA
- Jiayu Wen
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
  - Present address: Biochemistry and Biomedical Sciences, Research School of Biology, ANU College of Science, The Australian National University, Canberra, ACT 2601, Australia
- Eric C Lai
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York, 10065, USA
  - Louis V. Gerstner, Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, New York, 10065, USA
|
18
|
Afik S, Bartok O, Artyomov MN, Shishkin AA, Kadri S, Hanan M, Zhu X, Garber M, Kadener S. Defining the 5΄ and 3΄ landscape of the Drosophila transcriptome with Exo-seq and RNaseH-seq. Nucleic Acids Res 2017; 45:e95. [PMID: 28335028 PMCID: PMC5499799 DOI: 10.1093/nar/gkx133] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/25/2016] [Accepted: 02/15/2017] [Indexed: 01/19/2023] Open
Abstract
Cells regulate biological responses in part through changes in transcription start sites (TSS) or cleavage and polyadenylation sites (PAS). To fully understand gene regulatory networks, it is therefore critical to accurately annotate cell type-specific TSS and PAS. Here we present a simple and straightforward approach for genome-wide annotation of 5΄- and 3΄-RNA ends. Our approach reliably discerns bona fide PAS from false PAS that arise due to internal poly(A) tracts, a common problem with current PAS annotation methods. We applied our methodology to study the impact of temperature on the Drosophila melanogaster head transcriptome. We found hundreds of previously unidentified TSS and PAS, which revealed two interesting phenomena. First, genes with multiple PASs tend to harbor a motif near the most proximal PAS, which likely represents a new cleavage and polyadenylation signal. Second, motif analysis of promoters of genes affected by temperature suggested that boundary element association factor of 32 kDa (BEAF-32) and DREF mediate a transcriptional program at warm temperatures, a result we validated in a fly line where beaf-32 is downregulated. These results demonstrate the utility of a high-throughput platform for complete experimental and computational analysis of mRNA ends to improve gene annotation.
Affiliation(s)
- Shaked Afik
  - Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Osnat Bartok
  - Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Maxim N Artyomov
  - Department of Pathology and Immunology, Washington University School of Medicine, St Louis, MO 63110, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Alexander A Shishkin
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Sabah Kadri
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Mor Hanan
  - Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Xiaopeng Zhu
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Manuel Garber
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Sebastian Kadener
  - Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
|
19
|
A Short History and Description of Drosophila melanogaster Classical Genetics: Chromosome Aberrations, Forward Genetic Screens, and the Nature of Mutations. Genetics 2017; 206:665-689. [PMID: 28592503 DOI: 10.1534/genetics.117.199950] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 01/09/2017] [Accepted: 04/06/2017] [Indexed: 12/21/2022] Open
Abstract
The purpose of this chapter in FlyBook is to acquaint the reader with the Drosophila genome and the ways in which it can be altered by mutation. Much of what follows will be familiar to the experienced Fly Pusher but hopefully will be useful to those who are just entering the field and are thus unfamiliar with the genome, the history of how it has been and can be altered, and the consequences of those alterations. I will begin with the structure, content, and organization of the genome, followed by the kinds of structural alterations (karyotypic aberrations), how they affect the behavior of chromosomes in meiotic cell division, and how that behavior can be used. Finally, screens for mutations as they have been performed will be discussed. There are several excellent sources of detailed information on Drosophila husbandry and screening that are recommended for those interested in further expanding their familiarity with Drosophila as a research tool and model organism. These are a book by Ralph Greenspan and a review article by John Roote and Andreas Prokop, which should be required reading for any new student entering a fly lab for the first time.
|
20
|
Avila Cobos F, Anckaert J, Volders PJ, Everaert C, Rombaut D, Vandesompele J, De Preter K, Mestdagh P. Zipper plot: visualizing transcriptional activity of genomic regions. BMC Bioinformatics 2017; 18:231. [PMID: 28464823 PMCID: PMC5414305 DOI: 10.1186/s12859-017-1651-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/04/2017] [Accepted: 04/25/2017] [Indexed: 11/10/2022] Open
Abstract
Background: Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs. Results: To tackle this problem we have developed the Zipper plot, a novel visualization and analysis method that enables users to simultaneously interrogate thousands of human putative transcription start sites (TSSs) in relation to various features that are indicative of transcriptional activity. These include publicly available CAGE-sequencing, ChIP-sequencing and DNase-sequencing datasets. Our method only requires three tab-separated fields (chromosome, genomic coordinate of the TSS and strand) as input and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot. Conclusion: Using the Zipper plot, we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs. Using publicly available RNA-seq data, we found more than one hundred cases where junction reads connected protein-coding gene exons with a downstream mono-exonic lncRNA, revealing the need for a careful evaluation of lncRNA 5′-boundaries. Our method is implemented using the statistical programming language R and is freely available as a webtool.
Affiliation(s)
- Francisco Avila Cobos
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
- Jasper Anckaert
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
- Pieter-Jan Volders
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
- Celine Everaert
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
- Dries Rombaut
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
- Jo Vandesompele
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
- Katleen De Preter
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
- Pieter Mestdagh
- Center for Medical Genetics, Ghent University, De Pintelaan 185, Ghent, Belgium; Cancer Research Institute Ghent, De Pintelaan 185, Ghent, Belgium; Bioinformatics Institute Ghent from Nucleotides to Networks, De Pintelaan 185, Ghent, Belgium
21
Incoronato M, Aiello M, Infante T, Cavaliere C, Grimaldi AM, Mirabelli P, Monti S, Salvatore M. Radiogenomic Analysis of Oncological Data: A Technical Survey. Int J Mol Sci 2017; 18:ijms18040805. [PMID: 28417933 PMCID: PMC5412389 DOI: 10.3390/ijms18040805] [Citation(s) in RCA: 82] [Impact Index Per Article: 11.7] [Received: 12/30/2016] [Revised: 04/06/2017] [Accepted: 04/08/2017] [Indexed: 12/18/2022]
Abstract
In the last few years, biomedical research has been boosted by the technological development of analytical instrumentation generating a large volume of data. Such information has increased in complexity from basic (e.g., blood samples) to extensive sets encompassing many aspects of a subject phenotype, and now rapidly extending into genetic and, more recently, radiomic information. Radiogenomics integrates both aspects, investigating the relationship between imaging features and gene expression. From a methodological point of view, radiogenomics takes advantage of non-conventional data analysis techniques that reveal meaningful information for decision-support in cancer diagnosis and treatment. This survey aims to review the state-of-the-art techniques employed in radiomics and genomics, with a special focus on analysis methods based on molecular and multimodal probes. The impact of single and combined techniques will be discussed in light of their suitability in correlation and predictive studies of specific oncologic diseases.
Affiliation(s)
- Marco Aiello
- IRCCS SDN, Via E. Gianturco, 113, 80143 Naples, Italy
- Serena Monti
- IRCCS SDN, Via E. Gianturco, 113, 80143 Naples, Italy
22
You BH, Yoon SH, Nam JW. High-confidence coding and noncoding transcriptome maps. Genome Res 2017; 27:1050-1062. [PMID: 28396519 PMCID: PMC5453319 DOI: 10.1101/gr.214288.116] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 08/06/2016] [Accepted: 04/06/2017] [Indexed: 12/30/2022]
Abstract
The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes.
Affiliation(s)
- Bo-Hyun You
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 133791, Republic of Korea
- Sang-Ho Yoon
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 133791, Republic of Korea
- Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 133791, Republic of Korea; Research Institute for Convergence of Basic Sciences, Hanyang University, Seoul 133791, Republic of Korea; Research Institute for Natural Sciences, Hanyang University, Seoul 133791, Republic of Korea
23
Abstract
A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.
Affiliation(s)
- Jonathan M Mudge
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
- Jennifer Harrow
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK; Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK
24
Raborn RT, Spitze K, Brendel VP, Lynch M. Promoter Architecture and Sex-Specific Gene Expression in Daphnia pulex. Genetics 2016; 204:593-612. [PMID: 27585846 PMCID: PMC5068849 DOI: 10.1534/genetics.116.193334] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/02/2016] [Accepted: 07/29/2016] [Indexed: 11/18/2022]
Abstract
Large-scale transcription start site (TSS) profiling produces a high-resolution, quantitative picture of transcription initiation and core promoter locations within a genome. However, application of TSS profiling to date has largely been restricted to a small set of prominent model systems. We sought to characterize the cis-regulatory landscape of the water flea Daphnia pulex, an emerging model arthropod that reproduces both asexually (via parthenogenesis) and sexually (via meiosis). We performed Cap Analysis of Gene Expression (CAGE) with RNA isolated from D. pulex within three developmental states: sexual females, asexual females, and males. Identified TSSs were utilized to generate a "Daphnia Promoter Atlas," i.e., a catalog of active promoters across the surveyed states. Analysis of the distribution of promoters revealed evidence for widespread alternative promoter usage in D. pulex, in addition to a prominent fraction of compactly-arranged promoters in divergent orientations. We carried out de novo motif discovery using CAGE-defined TSSs and identified eight candidate core promoter motifs; this collection includes canonical promoter elements (e.g., TATA and Initiator) in addition to others lacking obvious orthologs. A comparison of promoter activities found evidence for considerable state-specific differential gene expression between states. Our work represents the first global definition of transcription initiation and promoter architecture in crustaceans. The Daphnia Promoter Atlas presented here provides a valuable resource for comparative study of cis-regulatory regions in metazoans, as well as for investigations into the circuitries that underpin meiosis and parthenogenesis.
Affiliation(s)
- R Taylor Raborn
- Department of Biology, Indiana University, Bloomington, Indiana 47405; School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405
- Ken Spitze
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Volker P Brendel
- Department of Biology, Indiana University, Bloomington, Indiana 47405; School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, Indiana 47405
25
Széll M, Danis J, Bata-Csörgő Z, Kemény L. PRINS, a primate-specific long non-coding RNA, plays a role in the keratinocyte stress response and psoriasis pathogenesis. Pflugers Arch 2016; 468:935-43. [PMID: 26935426 PMCID: PMC4893059 DOI: 10.1007/s00424-016-1803-z] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 11/30/2015] [Accepted: 02/23/2016] [Indexed: 12/30/2022]
Abstract
In the last few years, with the emergence of high-throughput technologies, thousands of long non-coding RNAs (lncRNAs) have been identified in the human genome. However, assigning functional annotation and determining cellular contexts for these RNAs is still in its infancy. As information gained about lncRNA structure, interacting partners, and roles in human diseases may be helpful in the characterization of novel lncRNAs, we review our knowledge on a selected group of lncRNAs that were identified serendipitously years ago by large-scale gene expression methods used to study human diseases. In particular, we focus on the Psoriasis-susceptibility-Related RNA Gene Induced by Stress (PRINS) lncRNA, first identified by our research group as the transcript most highly expressed in psoriatic non-lesional epidermis. Results gathered for PRINS in the last 10 years indicate that it is conserved in primates and plays a role in keratinocyte stress response. Elevated levels of PRINS expression in psoriatic non-lesional keratinocytes alter the stress response of non-lesional epidermis and contribute to disease pathogenesis. Finally, we propose a categorization for the PRINS lncRNA based on a recently elaborated system for lncRNA classification.
Affiliation(s)
- Márta Széll
- Department of Medical Genetics, Faculty of Medicine, University of Szeged, Szeged, Somogyi B. u. 4, 6720, Hungary; MTA-SZTE Dermatological Research Group, Szeged, Korányi fasor 6, 6720, Hungary
- Judit Danis
- Department of Dermatology and Allergology, Faculty of Medicine, University of Szeged, Szeged, Korányi fasor 6, 6720, Hungary
- Zsuzsanna Bata-Csörgő
- MTA-SZTE Dermatological Research Group, Szeged, Korányi fasor 6, 6720, Hungary; Department of Dermatology and Allergology, Faculty of Medicine, University of Szeged, Szeged, Korányi fasor 6, 6720, Hungary
- Lajos Kemény
- MTA-SZTE Dermatological Research Group, Szeged, Korányi fasor 6, 6720, Hungary; Department of Dermatology and Allergology, Faculty of Medicine, University of Szeged, Szeged, Korányi fasor 6, 6720, Hungary
26
Pseudo-Reference-Based Assembly of Vertebrate Transcriptomes. Genes (Basel) 2016; 7:genes7030010. [PMID: 26927182 PMCID: PMC4808791 DOI: 10.3390/genes7030010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/24/2015] [Revised: 02/05/2016] [Accepted: 02/17/2016] [Indexed: 11/17/2022]
Abstract
High-throughput RNA sequencing (RNA-seq) provides a comprehensive picture of the transcriptome, including the identity, structure, quantity, and variability of expressed transcripts in cells, through the assembly of sequenced short RNA-seq reads. Although the reference-based approach guarantees the high quality of the resulting transcriptome, this approach is only applicable when the relevant reference genome is present. Here, we developed a pseudo-reference-based assembly (PRA) that reconstructs a transcriptome based on a linear regression function of the optimized mapping parameters and genetic distances of the closest species. Using the linear model, we reconstructed transcriptomes of four different avian species, the white leghorn, turkey, duck, and zebra finch, with the Gallus gallus genome as a pseudo-reference, and of three primates, the chimpanzee, gorilla, and macaque, with the human genome as a pseudo-reference. The resulting transcriptomes show that the PRAs outperformed the de novo approach for species within about a 10% mutation rate among orthologous transcriptomes, enough to cover distantly related species as far as chicken and duck. Taken together, we suggest that the PRA method can be used as a tool for reconstructing transcriptome maps of vertebrates whose genomes have not yet been sequenced.
27
Canzar S, Andreotti S, Weese D, Reinert K, Klau GW. CIDANE: comprehensive isoform discovery and abundance estimation. Genome Biol 2016; 17:16. [PMID: 26831908 PMCID: PMC4734886 DOI: 10.1186/s13059-015-0865-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 10/28/2015] [Accepted: 12/29/2015] [Indexed: 12/19/2022]
Abstract
We present CIDANE, a novel framework for genome-based transcript reconstruction and quantification from RNA-seq reads. CIDANE assembles transcripts efficiently with significantly higher sensitivity and precision than existing tools. Its algorithmic core not only reconstructs transcripts ab initio, but also allows the use of the growing annotation of known splice sites, transcription start and end sites, or full-length transcripts, which are available for most model organisms. CIDANE supports the integrated analysis of RNA-seq and additional gene-boundary data and recovers splice junctions that are invisible to other methods. CIDANE is available at http://ccb.jhu.edu/software/cidane/.
Affiliation(s)
- Stefan Canzar
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL 60637, USA
- Sandro Andreotti
- Department of Mathematics and Computer Science, Institute of Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany
- David Weese
- Department of Mathematics and Computer Science, Institute of Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, Institute of Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany
- Gunnar W Klau
- Life Sciences, Centrum Wiskunde & Informatica (CWI), Science Park 123, Amsterdam, 1098 XG, The Netherlands
28
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13. [PMID: 26813401 PMCID: PMC4728800 DOI: 10.1186/s13059-016-0881-8] [Citation(s) in RCA: 1340] [Impact Index Per Article: 167.5] [Indexed: 12/27/2022]
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Affiliation(s)
- Ana Conesa
- Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA; Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
- Pedro Madrigal
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK
- Sonia Tarazona
- Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain; Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain
- David Gomez-Cabrero
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden; Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden; Science for Life Laboratory, 17121, Solna, Sweden
- Alejandra Cervera
- Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland
- Andrew McPherson
- School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
- Michał Wojciech Szcześniak
- Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland
- Daniel J Gaffney
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laura L Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Xuegong Zhang
- Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China; School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
29
Linking Genes to Cardiovascular Diseases: Gene Action and Gene-Environment Interactions. J Cardiovasc Transl Res 2015; 8:506-27. [PMID: 26545598 DOI: 10.1007/s12265-015-9658-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 08/21/2015] [Accepted: 10/08/2015] [Indexed: 01/22/2023]
Abstract
A unique myocardial characteristic is its ability to grow/remodel in order to adapt; this is determined partly by genes and partly by the environment and the milieu intérieur. In the "post-genomic" era, a need is emerging to elucidate the physiologic functions of myocardial genes, as well as potential adaptive and maladaptive modulations induced by environmental/epigenetic factors. Genome sequencing and analysis advances have become exponential lately, with escalation of our knowledge concerning sometimes controversial genetic underpinnings of cardiovascular diseases. Current technologies can identify candidate genes variously involved in diverse normal/abnormal morphomechanical phenotypes, and offer insights into multiple genetic factors implicated in complex cardiovascular syndromes. The expression profiles of thousands of genes are regularly ascertained under diverse conditions. Global analyses of gene expression levels are useful for cataloging genes and correlated phenotypes, and for elucidating the role of genes in maladies. Comparative expression of gene networks coupled to complex disorders can contribute insights as to how "modifier genes" influence the expressed phenotypes. Increasingly, a more comprehensive and detailed systematic understanding of genetic abnormalities underlying, for example, various genetic cardiomyopathies is emerging. Implementing genomic findings in cardiology practice may well lead directly to better diagnosis and therapeutics. A strong appreciation is currently evolving for the value of studying gene anomalies, and doing so in a non-disjointed, cohesive manner. However, it is challenging for many, practitioners and investigators alike, to comprehend, interpret, and utilize the clinically increasingly accessible and affordable cardiovascular genomics studies. This survey addresses the need for fundamental understanding in this vital area.
30
Stoiber MH, Olson S, May GE, Duff MO, Manent J, Obar R, Guruharsha KG, Bickel PJ, Artavanis-Tsakonas S, Brown JB, Graveley BR, Celniker SE. Extensive cross-regulation of post-transcriptional regulatory networks in Drosophila. Genome Res 2015; 25:1692-702. [PMID: 26294687 PMCID: PMC4617965 DOI: 10.1101/gr.182675.114] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/07/2014] [Accepted: 06/10/2015] [Indexed: 01/01/2023]
Abstract
In eukaryotic cells, RNAs exist as ribonucleoprotein particles (RNPs). Despite the importance of these complexes in many biological processes, including splicing, polyadenylation, stability, transportation, localization, and translation, their compositions are largely unknown. We affinity-purified 20 distinct RNA-binding proteins (RBPs) from cultured Drosophila melanogaster cells under native conditions and identified both the RNA and protein compositions of these RNP complexes. We identified “high occupancy target” (HOT) RNAs that interact with the majority of the RBPs we surveyed. HOT RNAs encode components of the nonsense-mediated decay and splicing machinery, as well as RNA-binding and translation initiation proteins. The RNP complexes contain proteins and mRNAs involved in RNA binding and post-transcriptional regulation. Genes with the capacity to produce hundreds of mRNA isoforms, ultracomplex genes, interact extensively with heterogeneous nuclear ribonucleoproteins (hnRNPs). Our data are consistent with a model in which subsets of RNPs include mRNA and protein products from the same gene, indicating the widespread existence of auto-regulatory RNPs. From the simultaneous acquisition and integrative analysis of protein and RNA constituents of RNPs, we identify extensive cross-regulatory and hierarchical interactions in post-transcriptional control.
Affiliation(s)
- Marcus H Stoiber
- Department of Biostatistics, University of California Berkeley, Berkeley, California 94720, USA; Department of Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Sara Olson
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
- Gemma E May
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
- Michael O Duff
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
- Jan Manent
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
- Robert Obar
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
- K G Guruharsha
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, USA; Biogen Incorporated, Cambridge, Massachusetts 02142, USA
- Peter J Bickel
- Department of Biostatistics, University of California Berkeley, Berkeley, California 94720, USA
- Spyros Artavanis-Tsakonas
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, USA; Biogen Incorporated, Cambridge, Massachusetts 02142, USA
- James B Brown
- Department of Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; Department of Statistics, University of California Berkeley, Berkeley, California 94720, USA
- Brenton R Graveley
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
- Susan E Celniker
- Department of Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
31
Abstract
The modENCODE (Model Organism Encyclopedia of DNA Elements) Consortium aimed to map functional elements (including transcripts, chromatin marks, regulatory factor binding sites, and origins of DNA replication) in the model organisms Drosophila melanogaster and Caenorhabditis elegans. During its five-year span, the consortium conducted more than 2,000 genome-wide assays in developmentally staged animals, dissected tissues, and homogeneous cell lines. Analysis of these data sets provided foundational insights into genome, epigenome, and transcriptome structure and the evolutionary turnover of regulatory pathways. These studies facilitated a comparative analysis with similar data types produced by the ENCODE Consortium for human cells. Genome organization differs drastically in these distant species, and yet quantitative relationships among chromatin state, transcription, and cotranscriptional RNA processing are deeply conserved. Of the many biological discoveries of the modENCODE Consortium, we highlight insights that emerged from integrative studies. We focus on operational and scientific lessons that may aid future projects of similar scale or aims in other, emerging model systems.
Affiliation(s)
- James B Brown
- Department of Statistics, University of California, Berkeley, California 94720
32
Regulation of gene expression through production of unstable mRNA isoforms. Biochem Soc Trans 2015; 42:1196-205. [PMID: 25110025 DOI: 10.1042/bst20140102] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
Alternative splicing is widely credited with expanding the information encoded within the transcriptome. In recent years, several tightly regulated alternative splicing events have been reported that do not generate protein products but instead produce unstable mRNA isoforms. These transcripts are targets for NMD (nonsense-mediated decay) or are retained in the nucleus and degraded. In the present review I discuss the regulation of these events, and how many have been implicated in control of gene expression that is instrumental to a number of developmental paradigms. I further discuss their relevance to disease settings and conclude by highlighting technologies that will aid identification of more candidate events in future.
33
Cui H, Dhroso A, Johnson N, Korkin D. The variation game: Cracking complex genetic disorders with NGS and omics data. Methods 2015; 79-80:18-31. [PMID: 25944472 DOI: 10.1016/j.ymeth.2015.04.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 12/13/2014] [Revised: 03/27/2015] [Accepted: 04/17/2015] [Indexed: 12/14/2022]
Abstract
Tremendous advances in Next Generation Sequencing (NGS) and high-throughput omics methods have brought us one step closer towards mechanistic understanding of the complex disease at the molecular level. In this review, we discuss four basic regulatory mechanisms implicated in complex genetic diseases, such as cancer, neurological disorders, heart disease, diabetes, and many others. The mechanisms, including genetic variations, copy-number variations, posttranscriptional variations, and epigenetic variations, can be detected using a variety of NGS methods. We propose that malfunctions detected in these mechanisms are not necessarily independent, since these malfunctions are often found associated with the same disease and targeting the same gene, group of genes, or functional pathway. As an example, we discuss possible rewiring effects of the cancer-associated genetic, structural, and posttranscriptional variations on the protein-protein interaction (PPI) network centered around P53 protein. The review highlights multi-layered complexity of common genetic disorders and suggests that integration of NGS and omics data is a critical step in developing new computational methods capable of deciphering this complexity.
Collapse
Affiliation(s)
- Hongzhu Cui
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| | - Andi Dhroso
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| | - Nathan Johnson
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| | - Dmitry Korkin
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States; Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
| |
|
34
|
Zhang B, Rotelli M, Dixon M, Calvi BR. The function of Drosophila p53 isoforms in apoptosis. Cell Death Differ 2015; 22:2058-67. [PMID: 25882045 DOI: 10.1038/cdd.2015.40] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/30/2014] [Revised: 03/02/2015] [Accepted: 03/03/2015] [Indexed: 12/20/2022] Open
Abstract
The p53 protein is a major mediator of the cellular response to genotoxic stress and is a crucial suppressor of tumor formation. In a variety of organisms, p53 and its paralogs, p63 and p73, each encode multiple protein isoforms through alternative splicing, promoters, and translation start sites. The functions of these isoforms in development and disease are still being defined. Here, we evaluate the apoptotic potential of multiple isoforms of the single p53 gene in the genetic model Drosophila melanogaster. Most previous studies have focused on the p53A isoform, but it has been recently shown that a larger p53B isoform can induce apoptosis when overexpressed. It has remained unclear, however, whether one or both isoforms are required for the apoptotic response to genotoxic stress. We show that p53B is a much more potent inducer of apoptosis than p53A when overexpressed. Overexpression of two newly identified short isoforms perturbed development and inhibited the apoptotic response to ionizing radiation. Analysis of physiological protein expression indicated that p53A is the most abundant isoform, and that both p53A and p53B can form a complex and co-localize to sub-nuclear compartments. In contrast to the overexpression results, new isoform-specific loss-of-function mutants indicated that it is the shorter p53A isoform, not full-length p53B, that is the primary mediator of pro-apoptotic gene transcription and apoptosis after ionizing radiation. Together, our data show that it is the shorter p53A isoform that mediates the apoptotic response to DNA damage, and further suggest that p53B and shorter isoforms have specialized functions.
Affiliation(s)
- B Zhang
- Department of Biology, Indiana University, Bloomington, IN, USA
| | - M Rotelli
- Department of Biology, Indiana University, Bloomington, IN, USA
| | - M Dixon
- Department of Biology, Indiana University, Bloomington, IN, USA
| | - B R Calvi
- Department of Biology, Indiana University, Bloomington, IN, USA
| |
|
35
|
Roy CK, Olson S, Graveley BR, Zamore PD, Moore MJ. Assessing long-distance RNA sequence connectivity via RNA-templated DNA-DNA ligation. eLife 2015; 4. [PMID: 25866926 PMCID: PMC4442144 DOI: 10.7554/elife.03700] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 06/15/2014] [Accepted: 04/12/2015] [Indexed: 02/04/2023] Open
Abstract
Many RNAs, including pre-mRNAs and long non-coding RNAs, can be thousands of nucleotides long and undergo complex post-transcriptional processing. Multiple sites of alternative splicing within a single gene exponentially increase the number of possible spliced isoforms, with most human genes currently estimated to express at least ten. To understand the mechanisms underlying these complex isoform expression patterns, methods are needed that faithfully maintain long-range exon connectivity information in individual RNA molecules. In this study, we describe SeqZip, a methodology that uses RNA-templated DNA–DNA ligation to retain and compress connectivity between distant sequences within single RNA molecules. Using this assay, we test proposed coordination between distant sites of alternative exon utilization in mouse Fn1, and we characterize the extraordinary exon diversity of Drosophila melanogaster Dscam1. DOI:http://dx.doi.org/10.7554/eLife.03700.001
A flow chart can show how an outcome can be achieved from a particular start point by breaking down an activity into a list of possible steps. Often, a flow chart contains several alternative steps, not all of which are taken every time the flow chart is used. The same can be said of genes, which are biological instructions that often contain many options within their DNA sequences. Proteins, which perform many roles in cells, are built following the instructions contained in genes. First, the DNA sequence of the gene is copied. This produces a molecule of ribonucleic acid (RNA), which is able to move around the cell to find the machinery that can use the genetic information to make a protein. Genes and their RNA copies contain instructions with more steps (called exons) than are necessary to make a working protein, so extra exons are removed (‘spliced’) from the RNA copies. Different combinations of exons can be removed, so splicing can make different versions of the RNA called isoforms. These allow a single gene to build many different proteins. In fruit flies, for example, the different exons of the gene Dscam1 can be spliced into one of 38,016 unique RNA isoforms. Current technology only allows researchers to deduce the sequence of RNA molecules by combining sequences recorded from short fragments of the molecule. However, before splicing, RNA molecules tend to be much longer than this, so this restricts our understanding of the RNA isoforms found in cells. Here, Roy et al. devised and tested a new method called SeqZip to solve this problem. SeqZip uses short fragments of DNA called ligamers that can only stick to the sections of RNA that will remain after the molecule has been spliced. After splicing, the ligamers can be stuck together to make a DNA replica of the spliced RNA. The end product is at least 49 times shorter than the original RNA, so it is easier to sequence. In addition, the combinations of the ligamers in the DNA replica show which exons of a specific gene are kept and which ones are spliced out. To test the method, Roy et al. studied a mouse gene that has six RNA isoforms. SeqZip reduced the length of the RNA by five times and made it possible to measure how frequently the different isoforms naturally arise. Roy et al. also used SeqZip to work out which isoforms of the Dscam1 gene are used at different stages in the life of fruit fly larvae. SeqZip can provide insights into how complex organisms like flies, mice, and humans have evolved with relatively few (a little over 20,000) genes in their genomes. DOI:http://dx.doi.org/10.7554/eLife.03700.002
Affiliation(s)
- Christian K Roy
- RNA Therapeutics Institute, Howard Hughes Medical Institute, University of Massachusetts Medical School, Worcester, United States
| | - Sara Olson
- Institute for Systems Genomics, Department of Genetics and Developmental Biology, University of Connecticut Health Center, Farmington, United States
| | - Brenton R Graveley
- Institute for Systems Genomics, Department of Genetics and Developmental Biology, University of Connecticut Health Center, Farmington, United States
| | - Phillip D Zamore
- RNA Therapeutics Institute, Howard Hughes Medical Institute, University of Massachusetts Medical School, Worcester, United States
| | - Melissa J Moore
- RNA Therapeutics Institute, Howard Hughes Medical Institute, University of Massachusetts Medical School, Worcester, United States
| |
|
36
|
St Laurent G, Wahlestedt C, Kapranov P. The Landscape of long noncoding RNA classification. Trends Genet 2015; 31:239-51. [PMID: 25869999 DOI: 10.1016/j.tig.2015.03.007] [Citation(s) in RCA: 800] [Impact Index Per Article: 88.9] [Received: 01/12/2015] [Revised: 03/09/2015] [Accepted: 03/12/2015] [Indexed: 12/12/2022]
Abstract
Advances in the depth and quality of transcriptome sequencing have revealed many new classes of long noncoding RNAs (lncRNAs). lncRNA classification has mushroomed to accommodate these new findings, even though the real dimensions and complexity of the noncoding transcriptome remain unknown. Although evidence of functionality of specific lncRNAs continues to accumulate, conflicting, confusing, and overlapping terminology has fostered ambiguity and lack of clarity in the field in general. The lack of a fundamental, conceptually unambiguous classification framework results in a number of challenges in the annotation and interpretation of noncoding transcriptome data. It also might undermine integration of the new genomic methods and datasets in an effort to unravel the function of lncRNA. Here, we review existing lncRNA classifications, nomenclature, and terminology. Then, we describe the conceptual guidelines that have emerged for their classification and functional annotation based on expanding and more comprehensive use of large systems biology-based datasets.
Affiliation(s)
- Georges St Laurent
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801 USA; Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, 185 Meeting Street, Providence, RI 02912, USA
| | - Claes Wahlestedt
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1501 NW 10th Ave, Miami, FL 33136 USA.
| | - Philipp Kapranov
- Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, China 361021; St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801 USA.
| |
|
37
|
Mallory AC, Shkumatava A. LncRNAs in vertebrates: advances and challenges. Biochimie 2015; 117:3-14. [PMID: 25812751 DOI: 10.1016/j.biochi.2015.03.014] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 02/17/2015] [Accepted: 03/17/2015] [Indexed: 01/06/2023]
Abstract
Beyond the handful of classic and well-characterized long noncoding RNAs (lncRNAs), more recently, hundreds of thousands of lncRNAs have been identified in multiple species including bacteria, plants and vertebrates, and the number of newly annotated lncRNAs continues to increase as more transcriptomes are analyzed. In vertebrates, the expression of many lncRNAs is highly regulated, displaying discrete temporal and spatial expression patterns, suggesting roles in a wide range of developmental processes and setting them apart from classic housekeeping ncRNAs. In addition, the deregulation of a subset of these lncRNAs has been linked to the development of several diseases, including cancers, as well as developmental anomalies. However, the majority of vertebrate lncRNA functions remain enigmatic. As such, a major task at hand is to decipher the biological roles of lncRNAs and uncover the regulatory networks upon which they impinge. This review focuses on our emerging understanding of lncRNAs in vertebrate animals, highlighting some recent advances in their functional analyses across several species and emphasizing the current challenges researchers face to characterize lncRNAs and identify their in vivo functions.
Affiliation(s)
- Allison C Mallory
- Institut Curie, 26 Rue d'Ulm, 75248 Paris Cedex 05, France; CNRS UMR3215, 75248 Paris Cedex 05, France; INSERM U934, 75248 Paris Cedex 05, France.
| | - Alena Shkumatava
- Institut Curie, 26 Rue d'Ulm, 75248 Paris Cedex 05, France; CNRS UMR3215, 75248 Paris Cedex 05, France; INSERM U934, 75248 Paris Cedex 05, France.
| |
|
38
|
Evans TG, Padilla-Gamiño JL, Kelly MW, Pespeni MH, Chan F, Menge BA, Gaylord B, Hill TM, Russell AD, Palumbi SR, Sanford E, Hofmann GE. Ocean acidification research in the 'post-genomic' era: Roadmaps from the purple sea urchin Strongylocentrotus purpuratus. Comp Biochem Physiol A Mol Integr Physiol 2015; 185:33-42. [PMID: 25773301 DOI: 10.1016/j.cbpa.2015.03.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/01/2014] [Revised: 03/07/2015] [Accepted: 03/08/2015] [Indexed: 01/26/2023]
Abstract
Advances in nucleic acid sequencing technology are removing obstacles that historically prevented use of genomics within ocean change biology. As one of the first marine calcifiers to have its genome sequenced, purple sea urchins (Strongylocentrotus purpuratus) have been the subject of early research exploring genomic responses to ocean acidification, work that points to future experiments and illustrates the value of expanding genomic resources to other marine organisms in this new 'post-genomic' era. This review presents case studies of S. purpuratus demonstrating the ability of genomic experiments to address major knowledge gaps within ocean acidification. Ocean acidification research has focused largely on species vulnerability, and studies exploring mechanistic bases of tolerance toward low pH seawater are comparatively few. Transcriptomic responses to high pCO₂ seawater in a population of urchins already encountering low pH conditions have cast light on traits required for success in future oceans. Secondly, there is relatively little information on whether marine organisms possess the capacity to adapt to oceans progressively decreasing in pH. Genomics offers powerful methods to investigate evolutionary responses to ocean acidification and recent work in S. purpuratus has identified genes under selection in acidified seawater. Finally, relatively few ocean acidification experiments investigate how shifts in seawater pH combine with other environmental factors to influence organism performance. In S. purpuratus, transcriptomics has provided insight into physiological responses of urchins exposed simultaneously to warmer and more acidic seawater. Collectively, these data support that similar breakthroughs will occur as genomic resources are developed for other marine species.
Affiliation(s)
- Tyler G Evans
- Department of Biological Sciences, California State University East Bay, Hayward, CA 94542, USA.
| | | | - Morgan W Kelly
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Melissa H Pespeni
- Department of Biology, University of Vermont, Burlington, VT 05405, USA
| | - Francis Chan
- Department of Integrative Biology, Oregon State University, Corvallis, OR 97331-2914, USA
| | - Bruce A Menge
- Department of Integrative Biology, Oregon State University, Corvallis, OR 97331-2914, USA
| | - Brian Gaylord
- Department of Evolution and Ecology and Bodega Marine Laboratory, University of California Davis, Bodega Bay, CA 94923, USA
| | - Tessa M Hill
- Department of Geology and Bodega Marine Laboratory, University of California Davis, Bodega Bay, CA 94923, USA
| | - Ann D Russell
- Department of Geology, University of California Davis, Davis, CA 95616, USA
| | - Stephen R Palumbi
- Department of Biology, Stanford University, Hopkins Marine Station, Pacific Grove, CA 93950, USA
| | - Eric Sanford
- Department of Evolution and Ecology and Bodega Marine Laboratory, University of California Davis, Bodega Bay, CA 94923, USA
| | - Gretchen E Hofmann
- Department of Ecology, Evolution and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106-9620, USA
| |
|
39
|
Brown JB, Boley N, Eisman R, May GE, Stoiber MH, Duff MO, Booth BW, Wen J, Park S, Suzuki AM, Wan KH, Yu C, Zhang D, Carlson JW, Cherbas L, Eads BD, Miller D, Mockaitis K, Roberts J, Davis CA, Frise E, Hammonds AS, Olson S, Shenker S, Sturgill D, Samsonova AA, Weiszmann R, Robinson G, Hernandez J, Andrews J, Bickel PJ, Carninci P, Cherbas P, Gingeras TR, Hoskins RA, Kaufman TC, Lai EC, Oliver B, Perrimon N, Graveley BR, Celniker SE. Diversity and dynamics of the Drosophila transcriptome. Nature 2014; 512:393-9. [PMID: 24670639 PMCID: PMC4152413 DOI: 10.1038/nature12962] [Citation(s) in RCA: 470] [Impact Index Per Article: 47.0] [Received: 04/20/2013] [Accepted: 12/18/2013] [Indexed: 01/10/2023]
Abstract
Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.
|
40
|
Boley N, Wan KH, Bickel PJ, Celniker SE. Navigating and mining modENCODE data. Methods 2014; 68:38-47. [PMID: 24636835 DOI: 10.1016/j.ymeth.2014.03.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 01/09/2014] [Revised: 03/04/2014] [Accepted: 03/06/2014] [Indexed: 01/12/2023] Open
Abstract
modENCODE was a 5-year NHGRI-funded project (2007-2012) to map the function of every base in the genomes of worms and flies, characterizing positions of modified histones and other chromatin marks, origins of DNA replication, RNA transcripts and the transcription factor binding sites that control gene expression. Here we describe the Drosophila modENCODE datasets and how best to access and use them for genome-wide and individual gene studies.
Affiliation(s)
- Nathan Boley
- Department of Biostatistics, University of California Berkeley, Berkeley, CA, United States
| | - Kenneth H Wan
- Department of Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
| | - Peter J Bickel
- Department of Statistics, University of California Berkeley, Berkeley, CA, United States
| | - Susan E Celniker
- Department of Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, CA, United States.
| |
|
41
|
Affiliation(s)
- Chen Gao
- Departments of Anesthesiology, Physiology and Medicine, Molecular Biology Institute, David Geffen School of Medicine at University of California at Los Angeles
| | - Yibin Wang
- Departments of Anesthesiology, Physiology and Medicine, Molecular Biology Institute, David Geffen School of Medicine at University of California at Los Angeles
| |
|