151
Kryuchkova-Mostacci N, Robinson-Rechavi M. Tissue-Specific Evolution of Protein Coding Genes in Human and Mouse. PLoS One 2015; 10:e0131673. [PMID: 26121354 PMCID: PMC4488272 DOI: 10.1371/journal.pone.0131673] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/23/2015] [Accepted: 06/04/2015] [Indexed: 12/23/2022]
Abstract
Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast, gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking into account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Across all tissues, evolutionary rate correlates only weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which is explained by weak purifying selection rather than by positive selection.
Affiliation(s)
- Nadezda Kryuchkova-Mostacci
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
152
Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, Young TR, Goldmann JM, Pervouchine DD, Sullivan TJ, Johnson R, Segrè AV, Djebali S, Niarchou A, Wright FA, Lappalainen T, Calvo M, Getz G, Dermitzakis ET, Ardlie KG, Guigó R. Human genomics. The human transcriptome across tissues and individuals. Science 2015; 348:660-5. [PMID: 25954002 DOI: 10.1126/science.aaa0355] [Citation(s) in RCA: 851] [Impact Index Per Article: 94.6] [Indexed: 12/17/2022]
Abstract
Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by the Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of genes (which is most clearly seen in blood), though few are exclusive to a particular tissue and vary more across tissues than individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role, except in the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may in contrast play a comparatively greater role in defining individual phenotypes.
Affiliation(s)
- Marta Melé
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Pedro G Ferreira
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
- Ferran Reverter
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Facultat de Biologia, Universitat de Barcelona (UB), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Jean Monlong
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. McGill University, Montreal, Canada
- Michael Sammeth
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. National Institute for Scientific Computing (LNCC), Petropolis, Rio de Janeiro, Brazil
- Jakob M Goldmann
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Radboud University, Nijmegen, Netherlands
- Dmitri D Pervouchine
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskie Gory 1-73, 119992 Moscow, Russia
- Rory Johnson
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Sarah Djebali
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Anastasia Niarchou
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
-
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA. Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland. Facultat de Biologia, Universitat de Barcelona (UB), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Broad Institute of MIT and Harvard, Cambridge, MA, USA. McGill University, Montreal, Canada. National Institute for Scientific Computing (LNCC), Petropolis, Rio de Janeiro, Brazil. Radboud University, Nijmegen, Netherlands. Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskie Gory 1-73, 119992 Moscow, Russia. North Carolina State University, Raleigh, NC, USA. New York Genome Center, New York, NY, USA. Department of Systems Biology, Columbia University, New York, NY, USA. Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA. Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Barcelona, Catalonia, Spain. Joint CRG-Barcelona Super Computing Center (BSC)-Institut de Recerca Biomedica (IRB) Program in Computational Biology, Barcelona, Catalonia, Spain
- Tuuli Lappalainen
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland. New York Genome Center, New York, NY, USA. Department of Systems Biology, Columbia University, New York, NY, USA
- Miquel Calvo
- Facultat de Biologia, Universitat de Barcelona (UB), Barcelona, Catalonia, Spain
- Gad Getz
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
- Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
- Roderic Guigó
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Barcelona, Catalonia, Spain. Joint CRG-Barcelona Super Computing Center (BSC)-Institut de Recerca Biomedica (IRB) Program in Computational Biology, Barcelona, Catalonia, Spain
153
Ge F, Parker J, Chul Choi S, Layer M, Ross K, Jilly B, Chen J. Preferential Amplification of Pathogenic Sequences. Sci Rep 2015; 5:11047. [PMID: 26067233 PMCID: PMC4464073 DOI: 10.1038/srep11047] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/10/2014] [Accepted: 05/14/2015] [Indexed: 02/07/2023]
Abstract
The application of next generation sequencing (NGS) technology in the diagnosis of human pathogens is hindered by the fact that pathogenic sequences, especially viral ones, are often scarce in human clinical specimens. This known disproportion necessitates deep sequencing and extensive bioinformatics analysis. Here we report a method, which we call “Preferential Amplification of Pathogenic Sequences (PATHseq)”, that can be used to greatly enrich pathogenic sequences. Using a computer program, we developed 8-, 9-, and 10-mer oligonucleotides, called “non-human primers”, that do not match the most abundant human transcripts but instead selectively match transcripts of human pathogens. Instead of using random primers in the construction of cDNA libraries, the PATHseq method uses these short non-human primers, which in turn preferentially amplify non-human, presumably pathogenic, sequences. Using this method, we were able to enrich pathogenic sequences up to 200-fold in the final sequencing library. This method does not require prior knowledge of the pathogen or assumptions about the infection; therefore, it provides a fast and sequence-independent approach for the detection and identification of human viruses and other pathogens. The PATHseq method, coupled with NGS technology, can be broadly used in the identification of known human pathogens and the discovery of new pathogens.
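The primer-selection idea in this abstract (short k-mers that match pathogen transcripts but not host transcripts) can be illustrated with a minimal Python sketch. It is not the authors' program: it assumes exact string matching with no mismatch tolerance, treats whatever host sequences you pass in as a stand-in for "the most abundant human transcripts", and uses hypothetical helper names (`kmers`, `reverse_complement`, `candidate_primers`). Because a primer anneals to a template that contains its reverse complement, the sketch filters binding sites first and then reverse-complements the survivors.

```python
from collections.abc import Iterable


def kmers(seq: str, k: int) -> set[str]:
    """Return every length-k substring of seq, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A/C/G/T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_primers(host: Iterable[str], pathogen: Iterable[str], k: int) -> set[str]:
    """Hypothetical helper: length-k primers whose binding site occurs in at
    least one pathogen sequence but in none of the host sequences.

    Assumption: exact matching only; real primer design would also consider
    mismatches, melting temperature, and transcript abundance.
    """
    host_sites: set[str] = set()
    for s in host:
        host_sites |= kmers(s, k)
    pathogen_sites: set[str] = set()
    for s in pathogen:
        pathogen_sites |= kmers(s, k)
    # Keep pathogen-only binding sites, then convert each site to the primer
    # sequence that would anneal to it.
    return {reverse_complement(site) for site in pathogen_sites - host_sites}


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the paper.
    host = ["ACGTACGTAC"]
    pathogen = ["ACGTACGTTT"]
    for k in (8, 9, 10):
        print(k, sorted(candidate_primers(host, pathogen, k)))
```

With the toy inputs above and k = 8, the only pathogen-specific binding sites are `CGTACGTT` and `GTACGTTT`, so the sketch returns their reverse complements `AACGTACG` and `AAACGTAC`. Looping over 8, 9, and 10 mirrors the primer lengths named in the abstract.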
Affiliation(s)
- Fang Ge
- Department of Biology and Wildlife, Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, Alaska, USA
- Jayme Parker
- Department of Biology and Wildlife, Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, Alaska, USA. Alaska State Public Health Virology Laboratory, Fairbanks, Alaska, USA
- Sang Chul Choi
- Department of Biology and Wildlife, Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, Alaska, USA
- Mark Layer
- Department of Biology and Wildlife, Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, Alaska, USA
- Katherine Ross
- Alaska State Public Health Laboratories, Anchorage, Alaska, USA
- Bernard Jilly
- Alaska State Public Health Laboratories, Anchorage, Alaska, USA
- Jack Chen
- Department of Biology and Wildlife, Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, Alaska, USA. Alaska State Public Health Virology Laboratory, Fairbanks, Alaska, USA
154
Tay AP, Pang CNI, Twine NA, Hart-Smith G, Harkness L, Kassem M, Wilkins MR. Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data. J Proteome Res 2015; 14:3541-54. [PMID: 25961807 DOI: 10.1021/pr5011394] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Human proteome analysis now requires an understanding of protein isoforms. We recently published the PG Nexus pipeline, which facilitates high confidence validation of exons and splice junctions by integrating genomics and proteomics data. Here we comprehensively explore how RNA-seq transcriptomics data, and proteomic analysis of the same sample, can identify protein isoforms. RNA-seq data from human mesenchymal stem cells (hMSC) were analyzed with our new TranscriptCoder tool to generate a database of protein isoform sequences. MS/MS data from matching hMSC samples were then matched against the TranscriptCoder-derived database, along with Ensembl and the neXtProt database. Querying the TranscriptCoder-derived or Ensembl database could unambiguously identify ∼450 protein isoforms, with isoform-specific proteotypic peptides, including candidate hMSC-specific isoforms for the genes DPYSL2 and FXR1. Where isoform-specific peptides did not exist, groups of nonisoform-specific proteotypic peptides could specifically identify many isoforms. In both of the above cases, isoforms will be detectable with targeted MS/MS assays. Unfortunately, our analysis also revealed that some isoforms will be difficult to identify unambiguously, as they do not have sufficiently distinguishing peptides. We covisualize mRNA isoforms and peptides in a genome browser to illustrate the above situations. Mass spectrometry data are available via ProteomeXchange (PXD001449).
Affiliation(s)
- Aidan P Tay
- Systems Biology Initiative, The University of New South Wales, Sydney, New South Wales 2052, Australia. School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Chi Nam Ignatius Pang
- Systems Biology Initiative, The University of New South Wales, Sydney, New South Wales 2052, Australia. School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Natalie A Twine
- Systems Biology Initiative, The University of New South Wales, Sydney, New South Wales 2052, Australia. School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Gene Hart-Smith
- Systems Biology Initiative, The University of New South Wales, Sydney, New South Wales 2052, Australia. School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Linda Harkness
- Endocrine Research Laboratory (KMEB), Department of Endocrinology and Metabolism, Odense University Hospital & University of Southern Denmark, Odense 5230, Denmark
- Moustapha Kassem
- Endocrine Research Laboratory (KMEB), Department of Endocrinology and Metabolism, Odense University Hospital & University of Southern Denmark, Odense 5230, Denmark
- Marc R Wilkins
- Systems Biology Initiative, The University of New South Wales, Sydney, New South Wales 2052, Australia. School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
155
Sudarkina OY, Filippenkov IB, Brodsky IB, Limborska SA, Dergunova LV. Comparative analysis of sphingomyelin synthase 1 gene expression at the transcriptional and translational levels in human tissues. Mol Cell Biochem 2015; 406:91-9. [PMID: 25912551 DOI: 10.1007/s11010-015-2427-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 10/27/2014] [Accepted: 04/22/2015] [Indexed: 12/18/2022]
Abstract
Sphingomyelin synthase 1 (SMS1) catalyses the biosynthesis of sphingomyelin in eukaryotic cells. We have previously determined the structure of the SGMS1 gene encoding this enzyme and a number of its alternative transcripts. Here, we describe a study of the expression of the full-length SMS1 protein and the sum of the alternative transcripts encoding this protein in human tissues. The SMS1 protein and mRNA levels in tissues differed significantly and were not correlated, implying active post-transcriptional regulation of SMS1 protein expression. The putative truncated isoforms of the SMS1 protein, which are encoded by a number of alternative transcripts, were not detected by immunoblotting and thus may be absent or present in only small amounts.
Affiliation(s)
- Olga Yu Sudarkina
- Human Molecular Genetics Department, Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, 123182, Russia
156
Bitton DA, Atkinson SR, Rallis C, Smith GC, Ellis DA, Chen YYC, Malecki M, Codlin S, Lemay JF, Cotobal C, Bachand F, Marguerat S, Mata J, Bähler J. Widespread exon skipping triggers degradation by nuclear RNA surveillance in fission yeast. Genome Res 2015; 25:884-96. [PMID: 25883323 PMCID: PMC4448684 DOI: 10.1101/gr.185371.114] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 10/05/2014] [Accepted: 03/31/2015] [Indexed: 12/31/2022]
Abstract
Exon skipping is considered a principal mechanism by which eukaryotic cells expand their transcriptome and proteome repertoires, creating different splice variants with distinct cellular functions. Here we analyze RNA-seq data from 116 transcriptomes in fission yeast (Schizosaccharomyces pombe), covering multiple physiological conditions as well as transcriptional and RNA processing mutants. We applied brute-force algorithms to detect all possible exon-skipping events, which were widespread but rare compared to normal splicing events. Exon-skipping events increased in cells deficient for the nuclear exosome or the 5′-3′ exonuclease Dhp1, and also at late stages of meiotic differentiation when nuclear-exosome transcripts decreased. The pervasive exon-skipping transcripts were stochastic, did not increase in specific physiological conditions, and were mostly present at less than one copy per cell, even in the absence of nuclear RNA surveillance and during late meiosis. These exon-skipping transcripts are therefore unlikely to be functional and may reflect splicing errors that are actively removed by nuclear RNA surveillance. The average splicing rate by exon skipping was ∼0.24% in wild type and ∼1.75% in nuclear exonuclease mutants. We also detected approximately 250 circular RNAs derived from single or multiple exons. These circular RNAs were rare and stochastic, although a few became stabilized during quiescence and in splicing mutants. Using an exhaustive search algorithm, we also uncovered thousands of previously unknown splice sites, indicating pervasive splicing; yet most of these splicing variants were cryptic and increased in nuclear degradation mutants. This study highlights widespread but low frequency alternative or aberrant splicing events that are targeted by nuclear RNA surveillance.
Affiliation(s)
- Danny A Bitton
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Sophie R Atkinson
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Charalampos Rallis
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Graeme C Smith
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- David A Ellis
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Yuan Y C Chen
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Michal Malecki
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Sandra Codlin
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Jean-François Lemay
- Université de Sherbrooke, Department of Biochemistry, Sherbrooke, Quebec J1H 5N4, Canada
- Cristina Cotobal
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, United Kingdom
- François Bachand
- Université de Sherbrooke, Department of Biochemistry, Sherbrooke, Quebec J1H 5N4, Canada
- Samuel Marguerat
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
- Juan Mata
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, United Kingdom
- Jürg Bähler
- University College London, Research Department of Genetics, Evolution and Environment and UCL Cancer Institute, London WC1E 6BT, United Kingdom
157
Stubbington MJ, Mahata B, Svensson V, Deonarine A, Nissen JK, Betz AG, Teichmann SA. An atlas of mouse CD4(+) T cell transcriptomes. Biol Direct 2015; 10:14. [PMID: 25886751 PMCID: PMC4384382 DOI: 10.1186/s13062-015-0045-x] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 12/10/2014] [Accepted: 02/23/2015] [Indexed: 12/24/2022]
Abstract
BACKGROUND: CD4(+) T cells are key regulators of the adaptive immune system and can be divided into T helper (Th) cells and regulatory T (Treg) cells. During an immune response, Th cells mature from a naive state into one of several effector subtypes that exhibit distinct functions. The transcriptional mechanisms that underlie the specific functional identity of CD4(+) T cells are not fully understood. RESULTS: To assist investigations into the transcriptional identity and regulatory processes of these cells, we performed mRNA-sequencing on three murine T helper subtypes (Th1, Th2 and Th17) as well as on splenic Treg cells and induced Treg (iTreg) cells. Our integrated analysis of this dataset revealed the gene expression changes associated with these related but distinct cellular identities. Each cell subtype differentially expresses a wealth of 'subtype upregulated' genes, some of which are well known whilst others promise new insights into signalling processes and transcriptional regulation. We show that hundreds of genes are regulated purely by alternative splicing, extending our knowledge of the role of post-transcriptional regulation in cell differentiation. CONCLUSIONS: This CD4(+) transcriptome atlas provides a valuable resource for the study of CD4(+) T cell populations. To facilitate its use by others, we have made the data available in an easily accessible online resource at www.th-express.org.
Affiliation(s)
- Michael J T Stubbington
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Bidesh Mahata
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
- Valentine Svensson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Jesper K Nissen
- MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK.
- Sarah A Teichmann
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
158
Ezkurdia I, Rodriguez JM, Carrillo-de Santa Pau E, Vázquez J, Valencia A, Tress ML. Most highly expressed protein-coding genes have a single dominant isoform. J Proteome Res 2015; 14:1880-7. [PMID: 25732134 DOI: 10.1021/pr501286b] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Indexed: 12/28/2022]
Abstract
Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells, or produce dominant isoforms in a manner that is either tissue-specific or regardless of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight large-scale human proteomics experiments and databases and find that there is a single dominant protein isoform, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments, in partial agreement with the conclusions from the most recent large-scale RNAseq study. Remarkably, the dominant isoforms from the experimental proteomics analyses coincided overwhelmingly with the reference isoforms selected by two completely orthogonal sources, the consensus coding sequence variants, which are agreed upon by separate manual genome curation teams, and the principal isoforms from the APPRIS database, predicted automatically from the conservation of protein sequence, structure, and function.
Affiliation(s)
- Iakes Ezkurdia
- Unidad de Proteómica and Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain. National Bioinformatics Institute and Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Jose Manuel Rodriguez
- Unidad de Proteómica and Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain. National Bioinformatics Institute and Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Enrique Carrillo-de Santa Pau
- Unidad de Proteómica and Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain. National Bioinformatics Institute and Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Jesús Vázquez
- Unidad de Proteómica and Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain. National Bioinformatics Institute and Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Alfonso Valencia
- Unidad de Proteómica and Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain. National Bioinformatics Institute and Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Michael L Tress
- Unidad de Proteómica and Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain. National Bioinformatics Institute and Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
159
de Klerk E, 't Hoen PAC. Alternative mRNA transcription, processing, and translation: insights from RNA sequencing. Trends Genet 2015; 31:128-39. [PMID: 25648499 DOI: 10.1016/j.tig.2015.01.001] [Citation(s) in RCA: 216] [Impact Index Per Article: 24.0] [Received: 11/01/2014] [Revised: 12/22/2014] [Accepted: 01/05/2015] [Indexed: 12/13/2022]
Abstract
The human transcriptome comprises >80,000 protein-coding transcripts and the estimated number of proteins synthesized from these transcripts is in the range of 250,000 to 1 million. These transcripts and proteins are encoded by fewer than 20,000 genes, suggesting extensive regulation at the transcriptional, post-transcriptional, and translational levels. Here we review how RNA sequencing (RNA-seq) technologies have increased our understanding of the mechanisms that give rise to alternative transcripts and their alternative translation. We highlight four different regulatory processes: alternative transcription initiation, alternative splicing, alternative polyadenylation, and alternative translation initiation. We discuss their transcriptome-wide distribution, their impact on protein expression, their biological relevance, and the possible molecular mechanisms leading to their alternative regulation. We conclude with a discussion of the coordination and the interdependence of these four regulatory layers.
Affiliation(s)
- Eleonora de Klerk
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter A C 't Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
160
Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson Å, Kampf C, Sjöstedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CAK, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen J, Pontén F. Tissue-based map of the human proteome. Science 2015; 347:1260419. [PMID: 25613900 DOI: 10.1126/science.1260419] [Citation(s) in RCA: 8997] [Impact Index Per Article: 999.7] [Indexed: 12/11/2022]
Affiliation(s)
- Mathias Uhlén
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden. Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden. Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2970 Hørsholm, Denmark.
- Linn Fagerberg
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Björn M Hallström
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden. Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden
- Cecilia Lindskog
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
- Per Oksvold
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Adil Mardinoglu
- Department of Chemical and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Åsa Sivertsson
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Caroline Kampf
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
- Evelina Sjöstedt
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden. Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
- Anna Asplund
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
- IngMarie Olsson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
- Karolina Edlund
- Leibniz Research Centre for Working Environment and Human Factors (IfADo) at Dortmund TU, D-44139 Dortmund, Germany
- Emma Lundberg
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Jacob Odeberg
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Dijana Djureinovic
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
- Jenny Ottosson Takanen
- Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden
- Sophia Hober
- Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden
- Tove Alm
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Per-Henrik Edqvist
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
- Holger Berling
- Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden
- Hanna Tegel
- Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden
- Jan Mulder
- Science for Life Laboratory, Department of Neuroscience, Karolinska Institute, SE-171 77 Stockholm, Sweden
- Johan Rockberg
- Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden
- Peter Nilsson
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Jochen M Schwenk
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Marica Hamsten
- Department of Proteomics, KTH-Royal Institute of Technology, SE-106 91 Stockholm, Sweden
- Kalle von Feilitzen
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Mattias Forsberg
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Lukas Persson
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Fredric Johansson
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Martin Zwahlen
- Science for Life Laboratory, KTH-Royal Institute of Technology, SE-171 21 Stockholm, Sweden
- Gunnar von Heijne
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Jens Nielsen
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2970 Hørsholm, Denmark. Department of Chemical and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Fredrik Pontén
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden
161
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.12688/f1000research.7563.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2023]
Abstract
High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
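The offset idea summarized above can be illustrated with a minimal Python sketch. It assumes NumPy, a transcript-by-sample array layout and a hypothetical `tx2gene` list giving the gene of each transcript; the names `gene_level` and `length_offset` are illustrative only and are not the API of tximport (an R package). Gene counts are sums over isoforms, the per-sample average transcript length is the abundance-weighted mean of the isoforms' effective lengths, and its log, centred per gene, serves here as a simplified length offset.

```python
import numpy as np


def gene_level(counts, abundance, eff_length, tx2gene):
    """Collapse transcript-level estimates to genes.

    counts, abundance, eff_length: arrays of shape (n_transcripts, n_samples).
    tx2gene: gene ID for each transcript row (hypothetical input format).
    Returns sorted gene IDs, summed gene counts and average transcript lengths.
    """
    genes = sorted(set(tx2gene))
    row = {g: i for i, g in enumerate(genes)}
    shape = (len(genes), counts.shape[1])
    gene_counts = np.zeros(shape)
    gene_abund = np.zeros(shape)
    len_x_abund = np.zeros(shape)
    len_sum = np.zeros(shape)
    n_tx = np.zeros((len(genes), 1))
    for t, g in enumerate(tx2gene):
        i = row[g]
        gene_counts[i] += counts[t]
        gene_abund[i] += abundance[t]
        len_x_abund[i] += abundance[t] * eff_length[t]
        len_sum[i] += eff_length[t]
        n_tx[i] += 1
    # Abundance-weighted mean length; where a gene has zero total abundance in
    # a sample, fall back to the unweighted mean of its transcript lengths.
    unweighted = len_sum / n_tx
    with np.errstate(divide="ignore", invalid="ignore"):
        avg_len = np.where(gene_abund > 0, len_x_abund / gene_abund, unweighted)
    return genes, gene_counts, avg_len


def length_offset(avg_len):
    """Log average length, centred per gene across samples."""
    log_len = np.log(avg_len)
    return log_len - log_len.mean(axis=1, keepdims=True)


# Toy data: g1 switches from a 1000-nt isoform to a 3000-nt isoform at the
# same abundance, so its read count triples; g2 is unchanged.
tx2gene = ["g1", "g1", "g2"]
counts = np.array([[100.0, 0.0], [0.0, 300.0], [50.0, 50.0]])
abundance = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
eff_length = np.array([[1000.0, 1000.0], [3000.0, 3000.0], [2000.0, 2000.0]])

genes, gene_counts, avg_len = gene_level(counts, abundance, eff_length, tx2gene)
offset = length_offset(avg_len)
adjusted = gene_counts / np.exp(offset)  # g1 becomes equal across samples
```

In the toy data, g1's count rises from 100 to 300 only because usage shifts to a longer isoform; dividing by the exponentiated offset makes its two adjusted values equal. Library-size normalization is deliberately left out of this sketch.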
Affiliation(s)
- Charlotte Soneson
- Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
- Michael I Love
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02210, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
- Mark D Robinson
- Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
162
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 04/20/2023]
163
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114726] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 04/20/2023]
164
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114724] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 04/20/2023]
165
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114722] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 04/20/2023]
166
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114730] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 04/20/2023]
167
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114725] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 04/20/2023]
168
Bao J, Tang C, Yuan S, Porse BT, Yan W. UPF2, a nonsense-mediated mRNA decay factor, is required for prepubertal Sertoli cell development and male fertility by ensuring fidelity of the transcriptome. Development 2014; 142:352-62. [PMID: 25503407 DOI: 10.1242/dev.115642] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 02/02/2023]
Abstract
Nonsense-mediated mRNA decay (NMD) represents a highly conserved RNA surveillance mechanism through which mRNA transcripts bearing premature termination codons (PTCs) are selectively degraded to maintain transcriptomic fidelity in the cell. Numerous in vitro studies have demonstrated the importance of the NMD pathway; however, evidence supporting its physiological necessity has only just started to emerge. Here, we report that ablation of Upf2, which encodes a core NMD factor, in murine embryonic Sertoli cells (SCs) leads to severe testicular atrophy and male sterility owing to rapid depletion of both SCs and germ cells during prepubertal testicular development. RNA-Seq and bioinformatic analyses revealed impaired transcriptomic homeostasis in SC-specific Upf2 knockout testes, characterized by an accumulation of PTC-containing transcripts and the transcriptome-wide dysregulation of genes encoding splicing factors and key proteins essential for SC fate control. Our data demonstrate an essential role of UPF2-mediated NMD in prepubertal SC development and male fertility.
Affiliation(s)
- Jianqiang Bao
- Department of Physiology and Cell Biology, University of Nevada School of Medicine, 1664 North Virginia Street, MS575, Reno, NV 89557, USA
- Chong Tang
- Department of Physiology and Cell Biology, University of Nevada School of Medicine, 1664 North Virginia Street, MS575, Reno, NV 89557, USA
- Shuiqiao Yuan
- Department of Physiology and Cell Biology, University of Nevada School of Medicine, 1664 North Virginia Street, MS575, Reno, NV 89557, USA
- Bo T Porse
- The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, University of Copenhagen, Jagtvej 124, Copenhagen, DK-2200, Denmark; Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Jagtvej 124, Copenhagen, DK-2200, Denmark; Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen N, DK-2200, Denmark
- Wei Yan
- Department of Physiology and Cell Biology, University of Nevada School of Medicine, 1664 North Virginia Street, MS575, Reno, NV 89557, USA
169
Yao QY, Xia EH, Liu FH, Gao LZ. Genome-wide identification and comparative expression analysis reveal a rapid expansion and functional divergence of duplicated genes in the WRKY gene family of cabbage, Brassica oleracea var. capitata. Gene 2014; 557:35-42. [PMID: 25481634 DOI: 10.1016/j.gene.2014.12.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 10/13/2014] [Revised: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 12/18/2022]
Abstract
WRKY transcription factors (TFs), one of the ten largest TF families in higher plants, play important roles in regulating plant development and resistance. To date, little is known about the WRKY TF family in Brassica oleracea. Recently, the completed genome sequence of cabbage (B. oleracea var. capitata) allows us to systematically analyze WRKY genes in this species. A total of 148 WRKY genes were characterized and classified into seven subgroups that belong to three major groups. Phylogenetic and synteny analyses revealed that the repertoire of cabbage WRKY genes was derived from a common ancestor shared with Arabidopsis thaliana. The B. oleracea WRKY genes were found to be preferentially retained after the whole-genome triplication (WGT) event in its recent ancestor, suggesting that the WGT event had largely contributed to a rapid expansion of the WRKY gene family in B. oleracea. The analysis of RNA-Seq data from various tissues (i.e., roots, stems, leaves, buds, flowers and siliques) revealed that most of the identified WRKY genes were positively expressed in cabbage, and a large portion of them exhibited patterns of differential and tissue-specific expression, demonstrating that these gene members might play essential roles in plant developmental processes. Comparative analysis of the expression level among duplicated genes showed that gene expression divergence was evidently presented among cabbage WRKY paralogs, indicating functional divergence of these duplicated WRKY genes.
Affiliation(s)
- Qiu-Yang Yao
- Laboratory of Plant Breeding and Utilization, Yunnan University, Kunming 650091, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; University of the Chinese Academy of Sciences, Beijing 100039, China
- En-Hua Xia
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; University of the Chinese Academy of Sciences, Beijing 100039, China
- Fei-Hu Liu
- Laboratory of Plant Breeding and Utilization, Yunnan University, Kunming 650091, China.
- Li-Zhi Gao
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China.
170
Moulos P, Hatzis P. Systematic integration of RNA-Seq statistical algorithms for accurate detection of differential gene expression patterns. Nucleic Acids Res 2014; 43:e25. [PMID: 25452340 PMCID: PMC4344485 DOI: 10.1093/nar/gku1273] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Indexed: 11/29/2022]
Abstract
RNA-Seq is gradually becoming the standard tool for transcriptomic expression studies in biological research. Although considerable progress has been recorded in the development of statistical algorithms for the detection of differentially expressed genes using RNA-Seq data, the list of detected genes can differ significantly between algorithms. We present a new method (PANDORA) that combines multiple algorithms toward a summarized result, more efficiently reflecting true experimental outcomes. This is achieved through the systematic combination of several analysis algorithms, by weighting their outcomes according to their performance with realistically simulated data sets generated from real data. Results supported by the analysis of both simulated and real data from different organisms as well as correlation with PolII occupancy demonstrate that PANDORA improves the detection of differential expression. It accomplishes this by optimizing the tradeoff between standard performance measurements, such as precision and sensitivity.
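The performance-weighted combination described above can be sketched in Python under loudly simplified assumptions: each algorithm's weight is taken to be its F1 score (a single number balancing precision and sensitivity) at a fixed p-value threshold on a simulated data set with known differential expression, and the combined result for each gene is the weighted mean of the per-algorithm p-values. PANDORA's actual performance measures and combination rule may differ, and the function names `performance_weights` and `combine_pvalues` are hypothetical.

```python
import numpy as np


def performance_weights(sim_pvals, truth, alpha=0.05):
    """Weight each algorithm by its F1 score on simulated data (an assumption).

    sim_pvals: (n_genes, n_algorithms) p-values from the simulated data set.
    truth: boolean array (n_genes,), True where a gene was simulated as DE.
    """
    called = np.asarray(sim_pvals) < alpha
    is_de = np.asarray(truth, dtype=bool)[:, None]
    tp = (called & is_de).sum(axis=0)
    fp = (called & ~is_de).sum(axis=0)
    fn = (~called & is_de).sum(axis=0)
    # F1 combines precision and sensitivity; the maximum() guards against 0/0.
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    if f1.sum() == 0:
        return np.full(f1.shape, 1.0 / f1.size)  # no signal: equal weights
    return f1 / f1.sum()


def combine_pvalues(pvals, weights):
    """Weighted mean of the per-algorithm p-values for each gene."""
    return np.asarray(pvals, dtype=float) @ np.asarray(weights, dtype=float)


# Simulated data: algorithm A recovers both true DE genes, algorithm B only one
# and also makes a false call, so A receives the larger weight.
sim_pvals = np.array([[0.01, 0.01],
                      [0.02, 0.50],
                      [0.50, 0.01],
                      [0.60, 0.60]])
truth = np.array([True, True, False, False])
weights = performance_weights(sim_pvals, truth)

real_pvals = np.array([[0.01, 0.40], [0.70, 0.90]])
combined = combine_pvalues(real_pvals, weights)
```

Here A scores F1 = 1.0 and B scores 0.5, giving weights of 2/3 and 1/3. A weighted mean keeps the combined value on the p-value scale; other combination rules (for example Fisher's method) would behave differently.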
Affiliation(s)
- Panagiotis Moulos
- Biomedical Sciences Research Center 'Alexander Fleming', 34 Fleming str, 16672, Vari, Greece
- Pantelis Hatzis
- Biomedical Sciences Research Center 'Alexander Fleming', 34 Fleming str, 16672, Vari, Greece
171
Li HD, Menon R, Omenn GS, Guan Y. Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence. Proteomics 2014; 14:2709-18. [PMID: 25265570 DOI: 10.1002/pmic.201400170] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 04/28/2014] [Revised: 08/11/2014] [Accepted: 09/23/2014] [Indexed: 01/08/2023]
Abstract
Canonical isoforms in different databases have been defined as the most prevalent, most conserved, most expressed, longest, or the one with the clearest description of domains or posttranslational modifications. In this article, we revisit these definitions of canonical isoforms based on functional genomics and proteomics evidence, focusing on mouse data. We report a novel functional relationship network-based approach for identifying the highest connected isoforms (HCIs). We show that 46% of these HCIs are not the longest transcripts. In addition, this approach revealed many genes that have more than one highly connected isoforms. Averaged across 175 RNA-seq datasets covering diverse tissues and conditions, 65% of the HCIs show higher expression levels than nonhighest connected isoforms at the transcript level. At the protein level, these HCIs highly overlap with the expressed splice variants, based on proteomic data from eight different normal tissues. These results suggest that a more confident definition of canonical isoforms can be made through integration of multiple lines of evidence, including HCIs defined by biological processes and pathways, expression prevalence at the transcript level, and relative or absolute abundance at the protein level. This integrative proteogenomics approach can successfully identify principal isoforms that are responsible for the canonical functions of genes.
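The comparison between highest connected isoforms and longest transcripts can be sketched in Python. This assumes the functional relationship network is given as a symmetric weighted adjacency matrix over isoforms and that "connectivity" means weighted degree; both are simplifying assumptions, not a description of how the authors built or scored their network, and the function names are hypothetical.

```python
import numpy as np


def highest_connected_isoforms(adj, isoform_gene):
    """Pick, for each gene, the isoform with the largest weighted degree.

    adj: symmetric (n_isoforms, n_isoforms) matrix of functional-relationship
    weights; isoform_gene: gene ID for each isoform row.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1) - np.diag(adj)  # ignore any self-edges
    best = {}
    for i, g in enumerate(isoform_gene):
        if g not in best or degree[i] > degree[best[g]]:
            best[g] = i
    return best


def fraction_not_longest(best, isoform_gene, lengths):
    """Share of genes whose highest connected isoform is not the longest one."""
    longest = {}
    for i, g in enumerate(isoform_gene):
        if g not in longest or lengths[i] > lengths[longest[g]]:
            longest[g] = i
    return sum(best[g] != longest[g] for g in best) / len(best)


# Toy network: two genes with two isoforms each.
isoform_gene = ["g1", "g1", "g2", "g2"]
lengths = [500, 900, 700, 300]
adj = [[0.0, 0.9, 0.8, 0.1],
       [0.9, 0.0, 0.1, 0.1],
       [0.8, 0.1, 0.0, 0.2],
       [0.1, 0.1, 0.2, 0.0]]
best = highest_connected_isoforms(adj, isoform_gene)
share = fraction_not_longest(best, isoform_gene, lengths)
```

In this toy network the best-connected isoform of g1 is its shorter one, while for g2 it is also the longest, so half of the genes have an HCI that is not the longest transcript.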
Affiliation(s)
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
172
Scelo G, Riazalhosseini Y, Greger L, Letourneau L, Gonzàlez-Porta M, Wozniak MB, Bourgey M, Harnden P, Egevad L, Jackson SM, Karimzadeh M, Arseneault M, Lepage P, How-Kit A, Daunay A, Renault V, Blanché H, Tubacher E, Sehmoun J, Viksna J, Celms E, Opmanis M, Zarins A, Vasudev NS, Seywright M, Abedi-Ardekani B, Carreira C, Selby PJ, Cartledge JJ, Byrnes G, Zavadil J, Su J, Holcatova I, Brisuda A, Zaridze D, Moukeria A, Foretova L, Navratilova M, Mates D, Jinga V, Artemov A, Nedoluzhko A, Mazur A, Rastorguev S, Boulygina E, Heath S, Gut M, Bihoreau MT, Lechner D, Foglio M, Gut IG, Skryabin K, Prokhortchouk E, Cambon-Thomsen A, Rung J, Bourque G, Brennan P, Tost J, Banks RE, Brazma A, Lathrop GM. Variation in genomic landscape of clear cell renal cell carcinoma across Europe. Nat Commun 2014; 5:5135. [PMID: 25351205 DOI: 10.1038/ncomms6135] [Citation(s) in RCA: 132] [Impact Index Per Article: 13.2] [Received: 05/30/2014] [Accepted: 09/03/2014] [Indexed: 12/31/2022]
Abstract
The incidence of renal cell carcinoma (RCC) is increasing worldwide, and its prevalence is particularly high in some parts of Central Europe. Here we undertake whole-genome and transcriptome sequencing of clear cell RCC (ccRCC), the most common form of the disease, in patients from four different European countries with contrasting disease incidence to explore the underlying genomic architecture of RCC. Our findings support previous reports on frequent aberrations in the epigenetic machinery and PI3K/mTOR signalling, and uncover novel pathways and genes affected by recurrent mutations and abnormal transcriptome patterns including focal adhesion, components of extracellular matrix (ECM) and genes encoding FAT cadherins. Furthermore, a large majority of patients from Romania have an unexpected high frequency of A:T>T:A transversions, consistent with exposure to aristolochic acid (AA). These results show that the processes underlying ccRCC tumorigenesis may vary in different populations and suggest that AA may be an important ccRCC carcinogen in Romania, a finding with major public health implications.
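The "A:T>T:A" notation above names a change of base pair: because the strand on which a mutation arose is usually unknown, A>T read on one strand is the same event as T>A read on the other. A common convention in mutation-spectrum work reports each substitution from the pyrimidine (C or T) base of the pair, giving six classes, with A:T>T:A falling in the T>A class. The Python sketch below, with hypothetical function names, shows that collapsing step and a per-class tally; it is an illustration of the convention, not the pipeline used in the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a single-base substitution onto the pyrimidine strand.

    A>T on one strand is T>A on the other, so both map to "T>A",
    which corresponds to the A:T>T:A base-pair change.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: report the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(mutations):
    """Fraction of substitutions falling in each class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# A>T and T>A are the same base-pair change; G>A collapses to C>T.
example = [("A", "T"), ("T", "A"), ("C", "T"), ("G", "A")]
fractions = spectrum(example)
```

Here both A>T and T>A land in the T>A class, so an excess of that class in a tumour's spectrum is what the abstract reports as a high frequency of A:T>T:A transversions.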
Affiliation(s)
- Ghislaine Scelo
- International Agency for Research on Cancer (IARC), 150 cours Albert Thomas, 69008 Lyon, France
- Yasser Riazalhosseini
- [1] Department of Human Genetics, McGill University, 1205 Dr Penfield Avenue, Montreal, Quebec, Canada H3A 1B1 [2] McGill University and Genome Quebec Innovation Centre, 740 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 0G1
- Liliana Greger
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Louis Letourneau
- McGill University and Genome Quebec Innovation Centre, 740 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 0G1
- Mar Gonzàlez-Porta
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Magdalena B Wozniak
- International Agency for Research on Cancer (IARC), 150 cours Albert Thomas, 69008 Lyon, France
- Mathieu Bourgey
- McGill University and Genome Quebec Innovation Centre, 740 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 0G1
- Patricia Harnden
- Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds LS9 7TF, UK
- Lars Egevad
- Department of Pathology, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- Sharon M Jackson
- Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds LS9 7TF, UK
- Mehran Karimzadeh
- [1] Department of Human Genetics, McGill University, 1205 Dr Penfield Avenue, Montreal, Quebec, Canada H3A 1B1 [2] McGill University and Genome Quebec Innovation Centre, 740 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 0G1
- Madeleine Arseneault
- [1] Department of Human Genetics, McGill University, 1205 Dr Penfield Avenue, Montreal, Quebec, Canada H3A 1B1 [2] McGill University and Genome Quebec Innovation Centre, 740 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 0G1
- Pierre Lepage
- McGill University and Genome Quebec Innovation Centre, 740 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 0G1
- Alexandre How-Kit
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 27 rue Juliette Dodu, 75010 Paris, France
- Antoine Daunay
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 27 rue Juliette Dodu, 75010 Paris, France
- Victor Renault
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 27 rue Juliette Dodu, 75010 Paris, France
- Hélène Blanché
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 27 rue Juliette Dodu, 75010 Paris, France
- Emmanuel Tubacher
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 27 rue Juliette Dodu, 75010 Paris, France
- Jeremy Sehmoun
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 27 rue Juliette Dodu, 75010 Paris, France
- Juris Viksna
- Institute of Mathematics and Computer Science, University of Latvia, 29 Rainis Boulevard, Riga LV-1459, Latvia
- Edgars Celms
- Institute of Mathematics and Computer Science, University of Latvia, 29 Rainis Boulevard, Riga LV-1459, Latvia
- Martins Opmanis
- Institute of Mathematics and Computer Science, University of Latvia, 29 Rainis Boulevard, Riga LV-1459, Latvia
- Andris Zarins
- Institute of Mathematics and Computer Science, University of Latvia, 29 Rainis Boulevard, Riga LV-1459, Latvia
- Naveen S Vasudev
- Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds LS9 7TF, UK
- Morag Seywright
- Department of Pathology, The Beatson Institute for Cancer Research, Switchback Road, Bearsden, Glasgow G61 1BD, UK
- Behnoush Abedi-Ardekani
- International Agency for Research on Cancer (IARC), 150 cours Albert Thomas, 69008 Lyon, France
- Christine Carreira
- International Agency for Research on Cancer (IARC), 150 cours Albert Thomas, 69008 Lyon, France
- Peter J Selby
- Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds LS9 7TF, UK
- Jon J Cartledge
- Leeds Teaching Hospitals NHS Trust, Pyrah Department of Urology, Lincoln Wing, St James's University Hospital, Leeds LS9 7TF, UK
- Graham Byrnes
- International Agency for Research on Cancer (IARC), 150 cours Albert Thomas, 69008 Lyon, France
- Jiri Zavadil
- International Agency for Research on Cancer (IARC), 150 cours Albert Thomas, 69008 Lyon, France
- Jing Su
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Ivana Holcatova
- First Faculty of Medicine, Institute of Hygiene and Epidemiology, Charles University in Prague, Studničkova 7, Praha 2, 128 00 Prague, Czech Republic
- Antonin Brisuda
- University Hospital Motol, V Úvalu 84, 150 06 Prague, Czech Republic
- David Zaridze
- Russian N.N. Blokhin Cancer Research Centre, Kashirskoye shosse 24, Moscow 115478, Russian Federation
- Anush Moukeria
- Russian N.N. Blokhin Cancer Research Centre, Kashirskoye shosse 24, Moscow 115478, Russian Federation
- Lenka Foretova
- Department of Cancer Epidemiology and Genetics, Masaryk Memorial Cancer Institute and MF MU, Zluty Kopec 7, 656 53 Brno, Czech Republic
- Marie Navratilova
- Department of Cancer Epidemiology and Genetics, Masaryk Memorial Cancer Institute and MF MU, Zluty Kopec 7, 656 53 Brno, Czech Republic
- Dana Mates
- National Institute of Public Health, Dr Leonte Anastasievici 1-3, sector 5, Bucuresti 050463, Romania
- Viorel Jinga
- Carol Davila University of Medicine and Pharmacy, Th. Burghele Hospital, 20 Panduri Street, 050659 Bucharest, Romania
- Artem Artemov
- Centre 'Bioengineering', The Russian Academy of Sciences, Moscow 117312, Russian Federation
- Artem Nedoluzhko
- National Research Centre 'Kurchatov Institute', 1 Akademika Kurchatova pl., Moscow 123182, Russia
- Alexander Mazur
- Centre 'Bioengineering', The Russian Academy of Sciences, Moscow 117312, Russian Federation
- Sergey Rastorguev
- National Research Centre 'Kurchatov Institute', 1 Akademika Kurchatova pl., Moscow 123182, Russia
- Eugenia Boulygina
- National Research Centre 'Kurchatov Institute', 1 Akademika Kurchatova pl., Moscow 123182, Russia
- Simon Heath
- Centro Nacional de Análisis Genómico, Baldiri Reixac, 4, Barcelona Science Park - Tower I, 08028 Barcelona, Spain
- Marta Gut
- Centro Nacional de Análisis Genómico, Baldiri Reixac, 4, Barcelona Science Park - Tower I, 08028 Barcelona, Spain
- Marie-Therese Bihoreau
- Centre National de Génotypage, CEA - Institut de Génomique, 2 rue Gaston Crémieux, 91000 Evry, France
- Doris Lechner
- Centre National de Génotypage, CEA - Institut de Génomique, 2 rue Gaston Crémieux, 91000 Evry, France
- Mario Foglio
- Centre National de Génotypage, CEA - Institut de Génomique, 2 rue Gaston Crémieux, 91000 Evry, France
- Ivo G Gut
- Centro Nacional de Análisis Genómico, Baldiri Reixac, 4, Barcelona Science Park - Tower I, 08028 Barcelona, Spain
- Konstantin Skryabin
- [1] Centre 'Bioengineering', The Russian Academy of Sciences, Moscow 117312, Russian Federation [2] National Research Centre 'Kurchatov Institute', 1 Akademika Kurchatova pl., Moscow 123182, Russia
- Egor Prokhortchouk
- [1] Centre 'Bioengineering', The Russian Academy of Sciences, Moscow 117312, Russian Federation [2] National Research Centre 'Kurchatov Institute', 1 Akademika Kurchatova pl., Moscow 123182, Russia
- Anne Cambon-Thomsen
- Faculty of Medicine, Institut National de la Santé et de la Recherche Médicale (INSERM) and University Toulouse III-Paul Sabatier, UMR 1027, 37 allées Jules Guesde, 31000 Toulouse, France
- Johan Rung
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Guillaume Bourque
- [1] Department of Human Genetics, McGill University, 1205 Dr Penfield Avenue, Montreal, Quebec, Canada H3A 1B1 [2] McGill University and Genome Quebec Innovation Centre, 740 Doctor Penfield Avenue, Montreal, Quebec, Canada H3A 0G1
- Paul Brennan
- International Agency for Research on Cancer (IARC), 150 cours Albert Thomas, 69008 Lyon, France
- Jörg Tost
- Centre National de Génotypage, CEA - Institut de Génomique, 2 rue Gaston Crémieux, 91000 Evry, France
- Rosamonde E Banks
- Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds LS9 7TF, UK
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- G Mark Lathrop
- [1] Department of Human Genetics, McGill University, 1205 Dr Penfield Avenue, Montreal, Quebec, Canada H3A 1B1 [2] Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 27 rue Juliette Dodu, 75010 Paris, France [3] Centre National de Génotypage, CEA - Institut de Génomique, 2 rue Gaston Crémieux, 91000 Evry, France

173
Decoding neuroproteomics: integrating the genome, translatome and functional anatomy. Nat Neurosci 2014; 17:1491-9. [PMID: 25349915 DOI: 10.1038/nn.3829] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 05/09/2014] [Accepted: 09/04/2014] [Indexed: 02/07/2023]
Abstract
The immense intercellular and intracellular heterogeneity of the CNS presents major challenges for high-throughput omic analyses. Transcriptional, translational and post-translational regulatory events are localized to specific neuronal cell types or subcellular compartments, resulting in discrete patterns of protein expression and activity. A spatial and quantitative knowledge of the neuroproteome is therefore critical to understanding both normal and pathological aspects of the functional genomics and anatomy of the CNS. Improvements in mass spectrometry allow the profiling of proteins at a sufficient depth to complement results from high-throughput genomic and transcriptomic assays. However, there are challenges in integrating proteomic data with other data modalities and even greater challenges in obtaining comprehensive neuroproteomic data with cell-type specificity. Here we discuss how proteomics should be exploited to enhance high-throughput functional genomic analysis by tighter integration of data analyses. We also discuss experimental strategies to achieve finer cellular and subcellular resolution in transcriptomic and proteomic studies of neural tissues.

174
Muñoz J, Heck AJR. Vom humanen Genom zum humanen Proteom [From the Human Genome to the Human Proteome]. Angew Chem Int Ed Engl 2014. [DOI: 10.1002/ange.201406545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]

175
Sun X, Yang Q, Deng Z, Ye X. Digital inventory of Arabidopsis transcripts revealed by 61 RNA sequencing samples. Plant Physiol 2014; 166:869-78. [PMID: 25118256 PMCID: PMC4213114 DOI: 10.1104/pp.114.241604] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/18/2014] [Accepted: 08/10/2014] [Indexed: 05/30/2023]
Abstract
Alternative splicing is an essential biological process that generates proteome diversity and phenotypic complexity. Recent improvements in RNA sequencing accuracy and computational algorithms have provided unprecedented opportunities to examine the expression levels of Arabidopsis (Arabidopsis thaliana) transcripts. In this article, we analyzed 61 RNA sequencing samples from 10 independent studies of Arabidopsis and calculated the transcript expression levels in different tissues, treatments, developmental stages, and varieties. These data provide a comprehensive profile of Arabidopsis transcripts with single-base resolution. We quantified the expression levels of 40,745 transcripts annotated in The Arabidopsis Information Resource 10, comprising 73% common transcripts, 15% rare transcripts, and 12% nondetectable transcripts. We also investigated diverse common transcripts in detail, including ubiquitous transcripts, dominant/subordinate transcripts, and switch transcripts, in terms of their expression and transcript ratio. Interestingly, alternative splicing was a highly enriched function among the genes related to dominant/subordinate transcripts and switch transcripts. In addition, motif analysis revealed that TC motifs were enriched in dominant transcripts but not in subordinate transcripts. These motifs were found to have a strong relationship with transcription factor activity. Our results shed light on the complexity of alternative splicing and the diversity of the contributing factors.
Affiliation(s)
- Xiaoyong Sun
- Agricultural Big-Data Research Center, College of Information Science and Engineering, Shandong Agricultural University, Taian, Shandong 271018, China
- Qiuying Yang
- Department of Physiology, University of Texas Southwestern Medical Center, Dallas, Texas 75235
- Zhiping Deng
- State Key Laboratory Breeding Base for Zhejiang Sustainable Pest and Disease Control, Institute of Virology and Biotechnology, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
- Xinfu Ye
- Fruit Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian 350013, China

176
Muñoz J, Heck AJR. From the Human Genome to the Human Proteome. Angew Chem Int Ed Engl 2014; 53:10864-6. [DOI: 10.1002/anie.201406545] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 06/24/2014] [Indexed: 11/11/2022]

177
Analysis of stop-gain and frameshift variants in human innate immunity genes. PLoS Comput Biol 2014; 10:e1003757. [PMID: 25058640 PMCID: PMC4110073 DOI: 10.1371/journal.pcbi.1003757] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 02/22/2014] [Accepted: 06/16/2014] [Indexed: 12/03/2022]
Abstract
Loss-of-function variants in innate immunity genes are associated with Mendelian disorders in the form of primary immunodeficiencies. Recent resequencing projects report that stop-gains and frameshifts are collectively prevalent in humans and could be responsible for some of the inter-individual variability in innate immune response. Current computational approaches evaluating loss-of-function in genes carrying these variants rely on gene-level characteristics such as evolutionary conservation and functional redundancy across the genome. However, innate immunity genes represent a particular case because they are more likely to be under positive selection and duplicated. To create a ranking of severity that would be applicable to innate immunity genes, we evaluated 17,764 stop-gain and 13,915 frameshift variants from the NHLBI Exome Sequencing Project and the 1000 Genomes Project. Sequence-based features such as loss of functional domains, isoform-specific truncation and nonsense-mediated decay were found to correlate with variant allele frequency and were validated with gene expression data. We integrated these features in a Bayesian classification scheme and benchmarked its use in predicting pathogenic variants against Online Mendelian Inheritance in Man (OMIM) disease stop-gains and frameshifts. The classification scheme was applied in the assessment of 335 stop-gains and 236 frameshifts affecting 227 interferon-stimulated genes. The sequence-based score ranks variants in innate immunity genes according to their potential to cause disease, and complements existing gene-based pathogenicity scores. Specifically, the sequence-based score improves measurement of functional gene impairment, discriminates across different variants in a given gene and appears particularly useful for analysis of less conserved genes.
There are well-characterized severe immunodeficiencies associated with loss-of-function variants in innate immunity genes. Genome sequencing projects identify rare stop-gain and frameshift variants in innate immunity genes whose phenotype is uncharacterized. Current methods to estimate the severity of rare stop-gains and frameshifts are based on evolutionary conservation of the gene, the likelihood of redundancy in its function, or mutational burden. These parameters are not always applicable to innate immunity genes. We evaluated sequence-level characteristics of more than 30,000 stop-gains and frameshifts and prioritized variants according to their predicted functional consequences. Our scoring approach complements existing tools in the prediction of innate immunity OMIM disease variants and associates with functional readouts such as gene expression. In this framework, we show that many individuals do carry highly pathogenic variants in genes participating in antiviral defense. The clinical assessment of these variants is of significant interest.

178
Vitulo N, Forcato C, Carpinelli EC, Telatin A, Campagna D, D'Angelo M, Zimbello R, Corso M, Vannozzi A, Bonghi C, Lucchin M, Valle G. A deep survey of alternative splicing in grape reveals changes in the splicing machinery related to tissue, stress condition and genotype. BMC Plant Biol 2014; 14:99. [PMID: 24739459 PMCID: PMC4108029 DOI: 10.1186/1471-2229-14-99] [Citation(s) in RCA: 155] [Impact Index Per Article: 15.5] [Received: 12/12/2013] [Accepted: 04/07/2014] [Indexed: 05/18/2023]
Abstract
BACKGROUND Alternative splicing (AS) significantly enhances transcriptome complexity. It is differentially regulated in a wide variety of cell types and plays a role in several cellular processes. Here we describe a detailed survey of alternative splicing in grape based on 124 SOLiD RNAseq analyses from different tissues, stress conditions and genotypes. RESULTS We used the RNAseq data to update the existing grape gene prediction with 2,258 new coding genes and 3,336 putative long non-coding RNAs. Several gene structures have been improved and alternative splicing was described for about 30% of the genes. A link between AS and miRNAs was shown in 139 genes, in which AS affects the miRNA target site. A quantitative analysis of the isoforms indicated that most of the spliced genes have one major isoform and tend to co-express a low number of isoforms simultaneously, typically two, with intron retention being the most frequent alternative splicing event. CONCLUSIONS As in Arabidopsis, grape displays marked AS tissue-specificity, while stress conditions produce splicing changes to a minor extent. Surprisingly, some distinctive splicing features were also observed between genotypes. This was further supported by the observation that the panel of Serine/Arginine-rich splicing factors shows a few, but very marked, differences between genotypes. The finding that part of the splicing machinery can change in closely related organisms can lead to some interesting hypotheses for evolutionary adaptation, which could be particularly relevant in the response to sudden and strong selective pressures.
Affiliation(s)
- Nicola Vitulo
- CRIBI Biotechnology Centre, University of Padua, Padua, Italy
- Claudio Forcato
- CRIBI Biotechnology Centre, University of Padua, Padua, Italy
- Andrea Telatin
- CRIBI Biotechnology Centre, University of Padua, Padua, Italy
- Massimiliano Corso
- Department of Agronomy, Food, Natural resources, Animals and Environment, DAFNAE, University of Padua, Padua, Italy
- Alessandro Vannozzi
- Department of Agronomy, Food, Natural resources, Animals and Environment, DAFNAE, University of Padua, Padua, Italy
- Claudio Bonghi
- Department of Agronomy, Food, Natural resources, Animals and Environment, DAFNAE, University of Padua, Padua, Italy
- Margherita Lucchin
- Department of Agronomy, Food, Natural resources, Animals and Environment, DAFNAE, University of Padua, Padua, Italy
- CIRVE, Centre for Research in Viticulture and Enology, University of Padua, Padua, Italy
- Giorgio Valle
- CRIBI Biotechnology Centre, University of Padua, Padua, Italy
- Department of Biology, University of Padua, Padua, Italy

179
Pitchiaya S, Heinicke LA, Custer TC, Walter NG. Single molecule fluorescence approaches shed light on intracellular RNAs. Chem Rev 2014; 114:3224-65. [PMID: 24417544 PMCID: PMC3968247 DOI: 10.1021/cr400496q] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Indexed: 12/29/2022]
Affiliation(s)
- Sethuramasundaram Pitchiaya
- Single Molecule Analysis in Real-Time (SMART) Center, University of Michigan, Ann Arbor, MI 48109-1055, USA
- Single Molecule Analysis Group, Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, USA
- Laurie A. Heinicke
- Single Molecule Analysis Group, Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, USA
- Thomas C. Custer
- Program in Chemical Biology, University of Michigan, Ann Arbor, MI 48109-1055, USA
- Nils G. Walter
- Single Molecule Analysis in Real-Time (SMART) Center, University of Michigan, Ann Arbor, MI 48109-1055, USA
- Single Molecule Analysis Group, Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, USA

180
Precompetitive activity to address the biological data needs of drug discovery. Nat Rev Drug Discov 2014; 13:83-4. [PMID: 24481293 DOI: 10.1038/nrd4230] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/20/2023]

181
Elliott DJ. Illuminating the Transcriptome through the Genome. Genes (Basel) 2014; 5:235-53. [PMID: 24705295 PMCID: PMC3978521 DOI: 10.3390/genes5010235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/17/2014] [Revised: 03/03/2014] [Accepted: 03/05/2014] [Indexed: 02/01/2023]
Abstract
Sequencing the human genome was a huge milestone in genetic research that revealed almost the entire DNA sequence required to create a human being. However, in order to function, the DNA genome needs to be expressed as an RNA transcriptome. This article reviews how knowledge of genome sequence information has led to fundamental discoveries in how the transcriptome is processed, with a focus on new system-wide insights into how pre-mRNAs that are encoded by split genes in the genome are rearranged by splicing into functional mRNAs. These advances have been made possible by the development of new post-genome technologies to probe splicing patterns. Transcriptome-wide approaches have characterised a "splicing code" that is embedded within the genome, has a significant role in deciphering it, and is itself deciphered by RNA binding proteins. These analyses have also found that most human genes encode multiple mRNA isoforms, and in some cases multiple proteins, leading in turn to a re-assessment of what exactly a gene is. Analysis of the transcriptome has given insights into how the genome is packaged and transcribed, and is helping to explain important aspects of genome evolution.
Affiliation(s)
- David J Elliott
- Institute of Genetic Medicine, Newcastle University, Newcastle, NE1 3BZ, UK.

182
Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 2014; 15:R29. [PMID: 24485249 PMCID: PMC4053721 DOI: 10.1186/gb-2014-15-2-r29] [Citation(s) in RCA: 3558] [Impact Index Per Article: 355.8] [Received: 09/25/2013] [Accepted: 02/03/2014] [Indexed: 12/17/2022]
Abstract
New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
Affiliation(s)
- Charity W Law
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- Yunshun Chen
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- Wei Shi
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria 3010, Australia
- Gordon K Smyth
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Department of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia

183
Bi YM, Meyer A, Downs GS, Shi X, El-Kereamy A, Lukens L, Rothstein SJ. High throughput RNA sequencing of a hybrid maize and its parents shows different mechanisms responsive to nitrogen limitation. BMC Genomics 2014; 15:77. [PMID: 24472600 PMCID: PMC3912931 DOI: 10.1186/1471-2164-15-77] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 08/16/2013] [Accepted: 01/25/2014] [Indexed: 12/19/2022]
Abstract
BACKGROUND Development of crop varieties with high nitrogen use efficiency (NUE) is crucial for minimizing N loss, reducing environmental pollution and decreasing input cost. Maize is one of the most important crops cultivated worldwide and its productivity is closely linked to the amount of fertilizer used. To better understand how different maize genotypes respond to N limitation, the transcriptomes of shoot and root tissues of a maize hybrid line and its two parental inbred lines, grown under sufficient and limiting N conditions, were surveyed by mRNA-Seq. RESULTS Different sets of genes were found to be N-responsive in the three genotypes. Many biological processes important for N metabolism, such as the cellular nitrogen compound metabolic process and the cellular amino acid metabolic process, were enriched in the N-responsive gene list from the hybrid shoots but not from the parental lines' shoots. Coupled to this, sugar, carbohydrate, monosaccharide, glucose, and sorbitol transport pathways were all up-regulated in the hybrid, but not in the parents, under N limitation. Expression patterns also differed between shoots and roots, such as the up-regulation of the cytokinin degradation pathway in the shoots of the hybrid and down-regulation of that pathway in the roots. The change of gene expression under N limitation in the hybrid resembled that of the parent with the higher NUE. The transcript abundances of alleles derived from each parent were estimated in the hybrid using polymorphic sites in mapped reads. While there were allele abundance differences, there was no correlation between these and the expression differences seen between the hybrid and the two parents. CONCLUSIONS Gene expression in two parental inbreds and the corresponding hybrid line in response to N limitation was surveyed using mRNA-Seq. The data showed that the three genotypes respond very differently to N-limiting conditions, and the hybrid clearly has a unique expression pattern compared to its parents. Our results expand our current understanding of N responses and will help move us forward towards effective strategies to improve NUE and enhance crop production.
Affiliation(s)
- Steven J Rothstein
- Department of Molecular and Cellular Biology, University of Guelph, N1G 2W1 Guelph, ON, Canada.

184
Burns PD, Li Y, Ma J, Borodovsky M. UnSplicer: mapping spliced RNA-Seq reads in compact genomes and filtering noisy splicing. Nucleic Acids Res 2013; 42:e25. [PMID: 24259430 PMCID: PMC3936741 DOI: 10.1093/nar/gkt1141] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Accurate mapping of spliced RNA-Seq reads to genomic DNA is known to be a challenging problem. Despite significant efforts invested in developing efficient algorithms, with the human genome as a primary focus, the best solution is still not known. A recently introduced tool, TrueSight, has demonstrated better performance compared with earlier developed algorithms such as TopHat and MapSplice. To improve detection of splice junctions, TrueSight uses information on statistical patterns of nucleotide ordering in intronic and exonic DNA. This line of research led to yet another new algorithm, UnSplicer, designed for eukaryotic species with compact genomes where functional alternative splicing is likely to be dominated by splicing noise. Genome-specific parameters of the new algorithm are generated by GeneMark-ES, an ab initio gene prediction algorithm based on unsupervised training. UnSplicer shares several components with TrueSight; the difference lies in the training strategy and the classification algorithm. We tested UnSplicer on RNA-Seq data sets of Arabidopsis thaliana, Caenorhabditis elegans, Cryptococcus neoformans and Drosophila melanogaster. We show that splice junctions inferred by UnSplicer are in better agreement with knowledge accumulated on these well-studied genomes than predictions made by earlier developed tools.
Affiliation(s)
- Paul D Burns
- Joint Georgia Tech and Emory Wallace H. Coulter Department of Biomedical Engineering, Atlanta, GA 30332, USA
- Department of Bioengineering, University of Illinois at Urbana-Champaign, IL 61801, USA
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, IL 61801, USA
- School of Computational Science & Engineering, Georgia Tech, Atlanta, GA 30332, USA
- Department of Bioinformatics, Moscow Institute of Physics and Technology, Moscow, 141700, Russia
185
Eksi R, Li HD, Menon R, Wen Y, Omenn GS, Kretzler M, Guan Y. Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data. PLoS Comput Biol 2013; 9:e1003314. [PMID: 24244129 PMCID: PMC3820534 DOI: 10.1371/journal.pcbi.1003314] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 02/18/2013] [Accepted: 09/19/2013] [Indexed: 12/13/2022]
Abstract
Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data, because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and builds classification models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially those with multiple isoforms, and is robust to gene expression levels and to the removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extensible to any base machine learner and to other species with alternatively spliced isoforms, and it shifts the current gene-centered function prediction to isoform-level prediction.
In mammalian genomes, a single gene can be alternatively spliced into multiple isoforms, which greatly increases the functional diversity of the genome. In humans, more than 95% of multi-exon genes undergo alternative splicing. It is hard to computationally differentiate the functions of splice isoforms of the same gene, because they are almost always annotated with the same functions and share similar sequences. In this paper, we developed a generic framework to identify the ‘responsible’ isoform(s) for each function that the gene carries out, and thereby predict functions at the isoform level instead of at the gene level. Within this framework, we implemented and evaluated several related algorithms for isoform function prediction. We tested these algorithms through both computational evaluation and experimental validation of the predicted ‘responsible’ isoform(s) and of the predicted disparate functions of the isoforms of Cdkn2a and Anxa6. Our algorithm represents the first effort to predict and differentiate isoform functions through large-scale genomic data integration.
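The abstract frames the task as learning isoform-level models when functional labels exist only for genes. One common way to formalize this setting is multiple-instance learning: each gene is a "bag" of isoforms, and a gene annotated with a function is assumed to contain at least one ‘responsible’ isoform. The sketch below is a hypothetical, simplified illustration of that framing using a nearest-centroid scorer on made-up two-dimensional expression features; it is not the authors' algorithm, and the function names, features and iteration scheme are all assumptions.

```python
import math
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def score(x, pos_c, neg_c):
    """Positive when x is closer to the positive centroid than to the negative one."""
    return math.dist(x, neg_c) - math.dist(x, pos_c)


def find_responsible_isoforms(pos_genes, neg_genes, max_iter=10):
    """For each annotated gene (a list of isoform feature vectors), return the
    index of the isoform picked as 'responsible'.

    Hypothetical multiple-instance scheme: every isoform of a non-annotated
    gene is a negative; each annotated gene contributes exactly one positive.
    """
    neg_c = centroid([iso for gene in neg_genes for iso in gene])
    # Start by treating every isoform of an annotated gene as positive.
    pos_c = centroid([iso for gene in pos_genes for iso in gene])
    chosen = None
    for _ in range(max_iter):
        new_chosen = [
            max(range(len(gene)), key=lambda i, g=gene: score(g[i], pos_c, neg_c))
            for gene in pos_genes
        ]
        if new_chosen == chosen:  # assignments are stable: stop iterating
            break
        chosen = new_chosen
        # Refit the positive centroid on the chosen isoforms only.
        pos_c = centroid([gene[i] for gene, i in zip(pos_genes, chosen)])
    return chosen


# Made-up features for two annotated genes and two unannotated genes.
pos_genes = [[[0.1, 0.0], [5.0, 5.1]], [[4.9, 5.0], [0.0, 0.2]]]
neg_genes = [[[0.0, 0.1]], [[0.2, 0.0], [0.1, 0.1]]]
print(find_responsible_isoforms(pos_genes, neg_genes))  # [1, 0]
```

Because the abstract notes that the framework extends to any base machine learner, the nearest-centroid scorer here is only a placeholder for whatever classifier one prefers.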
Affiliation(s)
- Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Yuchen Wen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- * E-mail: (GSO); (MK); (YG)
- Matthias Kretzler
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, United States of America
186
Abstract
The last decade has seen tremendous effort committed to the annotation of the human genome sequence, perhaps most notably in the form of the ENCODE project. One of the major findings of ENCODE, and of other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes and in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question of what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we discuss the current state of human transcriptome annotation, drawing on our experience in generating the GENCODE gene annotation set. We highlight the remaining gaps in our knowledge of transcript functionality and consider the computational and experimental strategies that could help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken using an integrated strategy that combines all of the experimental resources at our disposal.
Affiliation(s)
- Jonathan M Mudge
- Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom
187
Liu Q, Ullery J, Zhu J, Liebler DC, Marnett LJ, Zhang B. RNA-seq data analysis at the gene and CDS levels provides a comprehensive view of transcriptome responses induced by 4-hydroxynonenal. Mol Biosyst 2013; 9:3036-46. [PMID: 24056865 DOI: 10.1039/c3mb70114j] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
Abstract
Reactive electrophiles produced during oxidative stress, such as 4-hydroxynonenal (HNE), are increasingly recognized as contributing factors in a variety of degenerative and inflammatory diseases. Here we used RNA-seq to characterize transcriptome responses induced in RKO cells by HNE at subcytotoxic and cytotoxic doses. RNA-seq analysis recovered most of the differentially expressed genes reported by microarray studies and also identified novel gene responses. Interestingly, differential expression detection at the coding DNA sequence (CDS) level further improved the consistency between the two technologies, suggesting the utility and importance of CDS-level analysis. Combining gene- and CDS-level analyses of the RNA-seq data yielded an informative and comprehensive picture of response networks that evolve gradually with increasing HNE dose: from cell protection against oxidative injury at low dose, through initiation of apoptosis and DNA damage at intermediate dose, to significant deregulation of cellular functions at high dose. These dose-dependent pathway changes, which cannot be observed by gene-level analysis alone, reveal the cytotoxic effect of HNE and are supported by IC50 experiments. Additionally, differential expression at the CDS level provides new insights into mechanisms of isoform regulation. Taken together, our data demonstrate the power of RNA-seq, through high-resolution data coupled with differential analysis at both gene and CDS levels, to identify subtle transcriptome changes and to characterize the effects induced by HNE.
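The abstract argues that testing differential expression at the CDS level can reveal changes that gene-level analysis hides. A toy illustration of why, assuming counts are already normalized for library size and using a pseudocount of 1 (both assumptions; this is not the paper's statistical pipeline), is to compare the log2 fold change of summed gene counts with the log2 fold changes of the individual CDSs:

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2((treated + pc) / (control + pc)) for normalized counts."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def gene_and_cds_changes(cds_counts):
    """cds_counts maps a CDS id to (control count, treated count) for one gene.

    Returns the gene-level log2 fold change (CDS counts summed) and a dict of
    per-CDS log2 fold changes.
    """
    gene_control = sum(c for c, _ in cds_counts.values())
    gene_treated = sum(t for _, t in cds_counts.values())
    gene_lfc = log2_fold_change(gene_control, gene_treated)
    cds_lfc = {cds: log2_fold_change(c, t) for cds, (c, t) in cds_counts.items()}
    return gene_lfc, cds_lfc


# Hypothetical gene whose two CDSs shift in opposite directions after treatment.
gene_lfc, cds_lfc = gene_and_cds_changes({"CDS_a": (100, 10), "CDS_b": (10, 100)})
print(round(gene_lfc, 2))                            # 0.0
print({k: round(v, 2) for k, v in cds_lfc.items()})  # {'CDS_a': -3.2, 'CDS_b': 3.2}
```

At the gene level the opposing shifts cancel, while each CDS shows a roughly tenfold change. A real analysis would test such differences with a count-based statistical model and biological replicates rather than raw fold changes.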
Affiliation(s)
- Qi Liu
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.