51
Morata J, Béjar S, Talavera D, Riera C, Lois S, de Xaxars GM, de la Cruz X. The relationship between gene isoform multiplicity, number of exons and protein divergence. PLoS One 2013; 8:e72742. [PMID: 24023641 PMCID: PMC3758341 DOI: 10.1371/journal.pone.0072742] [Received: 02/27/2013] [Accepted: 07/14/2013]
Abstract
At present we know that phenotypic differences between organisms arise from a variety of sources, such as protein sequence divergence, regulatory sequence divergence and alternative splicing. However, we do not yet have a complete view of how these sources are related. Here we address this problem by studying the relationship between protein divergence and the ability of genes to express multiple isoforms. We used three genome-wide datasets of human-mouse orthologs to study the relationship between isoform multiplicity co-occurrence between orthologs (the fact that both orthologs have more than one isoform) and protein divergence. In all cases, our results showed a monotonic dependence between these two properties. We could explain this relationship in terms of a more fundamental one, between the exon number of the largest isoform and protein divergence. We found that this last relationship was present, although with variations, in other species (chimpanzee, cow, rat, chicken, zebrafish and fruit fly). In summary, we have identified a relationship between protein divergence and isoform multiplicity co-occurrence and explained its origin in terms of a simple gene-level property. Finally, we discuss the biological implications of these findings for our understanding of inter-species phenotypic differences.
Affiliation(s)
- Jordi Morata
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- Santi Béjar
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- David Talavera
- Faculty of Life Sciences, Manchester University, Manchester, United Kingdom
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Gemma Mas de Xaxars
- Laboratori de Botànica, Facultat de Farmàcia, Universitat de Barcelona, Barcelona, Spain
- Xavier de la Cruz
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
52
Expansion of the mutually exclusive spliced exome in Drosophila. Nat Commun 2013; 4:2460. [DOI: 10.1038/ncomms3460] [Received: 03/24/2013] [Accepted: 08/19/2013]
53
Koumandou VL, Scorilas A. Evolution of the plasma and tissue kallikreins, and their alternative splicing isoforms. PLoS One 2013; 8:e68074. [PMID: 23874499 PMCID: PMC3707919 DOI: 10.1371/journal.pone.0068074] [Received: 12/11/2012] [Accepted: 05/25/2013]
Abstract
Kallikreins are secreted serine proteases with important roles in human physiology. Human plasma kallikrein, encoded by the KLKB1 gene on locus 4q34-35, functions in the blood coagulation pathway, and in regulating blood pressure. The human tissue kallikrein and kallikrein-related peptidases (KLKs) have diverse expression patterns and physiological roles, including cancer-related processes such as cell growth regulation, angiogenesis, invasion, and metastasis. Prostate-specific antigen (PSA), the product of the KLK3 gene, is the most widely used biomarker in clinical practice today. A total of 15 KLKs are encoded by the largest contiguous cluster of protease genes in the human genome (19q13.3-13.4), which makes them ideal for evolutionary analysis of gene duplication events. Previous studies on the evolution of KLKs have traced mammalian homologs as well as a probable early origin of the family in aves, amphibia and reptilia. The aim of this study was to address the evolutionary and functional relationships between tissue KLKs and plasma kallikrein, and to examine the evolution of alternative splicing isoforms. Sequences of plasma and tissue kallikreins and their alternative transcripts were collected from the NCBI and Ensembl databases, and comprehensive phylogenetic analysis was performed by Bayesian as well as maximum likelihood methods. Plasma and tissue kallikreins exhibit high sequence similarity in the trypsin domain (>50%). Phylogenetic analysis indicates an early divergence of KLKB1, which groups closely with plasminogen, chymotrypsin, and complement factor D (CFD), in a monophyletic group distinct from trypsin and the tissue KLKs. Reconstruction of the earliest events leading to the diversification of the tissue KLKs is not well resolved, indicating rapid expansion in mammals. 
Alternative transcripts of each KLK gene show species-specific divergence, while examination of sequence conservation indicates that many annotated human KLK isoforms are missing the catalytic triad that is crucial for protease activity.
Affiliation(s)
- Andreas Scorilas
- Department of Biochemistry and Molecular Biology, University of Athens, Athens, Greece
54
Gonzàlez-Porta M, Frankish A, Rung J, Harrow J, Brazma A. Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol 2013; 14:R70. [PMID: 23815980 PMCID: PMC4053754 DOI: 10.1186/gb-2013-14-7-r70] [Received: 01/17/2013] [Accepted: 07/01/2013]
Abstract
BACKGROUND RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene. RESULTS Here we show that, in a given condition, most protein coding genes have one major transcript expressed at significantly higher level than others, that in human tissues the major transcripts contribute almost 85 percent to the total mRNA from protein coding loci, and that often the same major transcript is expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted to be translated utilizing proteomic data. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein coding genes for which the major transcript does not code a protein. CONCLUSIONS Overall, our findings suggest that the transcriptome from protein coding loci is dominated by one transcript per gene and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of the detected changes in variation studies.
55
Sheynkman GM, Shortreed MR, Frey BL, Smith LM. Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq. Mol Cell Proteomics 2013; 12:2341-53. [PMID: 23629695 DOI: 10.1074/mcp.o113.028142]
Abstract
Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequence, and create a customized splice junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g. minimum read depth) and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice junction peptides not present in the UniProt-TrEMBL proteomic database, comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites.
To our knowledge this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
Affiliation(s)
- Gloria M Sheynkman
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
56
Light S, Elofsson A. The impact of splicing on protein domain architecture. Curr Opin Struct Biol 2013; 23:451-8. [PMID: 23562110 DOI: 10.1016/j.sbi.2013.02.013] [Received: 01/14/2013] [Revised: 02/22/2013] [Accepted: 02/28/2013]
Abstract
Many proteins are composed of protein domains, functional units of common descent. Multidomain forms are common in all eukaryotes, making up more than half of the proteome, and the evolution of novel domain architectures has been accelerated in metazoans. It is also becoming increasingly clear that alternative splicing is prevalent among vertebrates. Given that protein domains are defined as structurally, functionally and evolutionarily distinct units, one may speculate that some alternative splicing events may lead to clean excisions of protein domains, thus generating a number of different domain architectures from one gene template. However, recent findings indicate that smaller alternative splicing events, in particular in disordered regions, might be more prominent than domain architectural changes. The problem of identifying protein isoforms is, however, still not resolved. Clearly, many splice forms identified through detection of mRNA sequences appear to produce 'nonfunctional' proteins, such as proteins with missing internal secondary structure elements. Here, we review the state-of-the-art methods for identification of functional isoforms and present a summary of what is known, thus far, about alternative splicing with regard to protein domain architectures.
Affiliation(s)
- Sara Light
- Science for Life Laboratory, Stockholm University, Box 1031 SE-171 21 Solna, Sweden
57
Bianco AM, Marcuzzi A, Zanin V, Girardelli M, Vuch J, Crovella S. Database tools in genetic diseases research. Genomics 2013; 101:75-85. [DOI: 10.1016/j.ygeno.2012.11.001] [Received: 03/19/2012] [Revised: 10/26/2012] [Accepted: 11/01/2012]
58
Rodriguez JM, Maietta P, Ezkurdia I, Pietrelli A, Wesselink JJ, Lopez G, Valencia A, Tress ML. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res 2012; 41:D110-7. [PMID: 23161672 PMCID: PMC3531113 DOI: 10.1093/nar/gks1058]
Abstract
Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform.
59
Bicknell AA, Cenik C, Chua HN, Roth FP, Moore MJ. Introns in UTRs: why we should stop ignoring them. Bioessays 2012; 34:1025-34. [PMID: 23108796 DOI: 10.1002/bies.201200073]
Abstract
Although introns in 5'- and 3'-untranslated regions (UTRs) are found in many protein coding genes, rarely are they considered distinctive entities with specific functions. Indeed, mammalian transcripts with 3'-UTR introns are often assumed nonfunctional because they are subject to elimination by nonsense-mediated decay (NMD). Nonetheless, recent findings indicate that 5'- and 3'-UTR intron status is of significant functional consequence for the regulation of mammalian genes. Therefore these features should be ignored no longer.
Affiliation(s)
- Alicia A Bicknell
- Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA
60
Abstract
Because they are generally noncoding and thus considered nonfunctional and unimportant, pseudogenes have long been neglected. Recent advances have established that the DNA of a pseudogene, the RNA transcribed from a pseudogene, or the protein translated from a pseudogene can have multiple, diverse functions and that these functions can affect not only their parental genes but also unrelated genes. Therefore, pseudogenes have emerged as a previously unappreciated class of sophisticated modulators of gene expression, with a multifaceted involvement in the pathogenesis of human cancer.
Affiliation(s)
- Laura Poliseno
- Oncogenomics Unit, Core Research Laboratory, Istituto Toscano Tumori (CRL-ITT), c/o IFC-CNR Via Moruzzi 1, 56124 Pisa, Italy.
61
Jacobs E, Mills JD, Janitz M. The role of RNA structure in posttranscriptional regulation of gene expression. J Genet Genomics 2012; 39:535-43. [PMID: 23089363 DOI: 10.1016/j.jgg.2012.08.002] [Received: 06/27/2012] [Revised: 08/16/2012] [Accepted: 08/17/2012]
Abstract
As more information is gathered on the mechanisms of transcription and translation, it is becoming apparent that these processes are highly regulated. The formation of mRNA secondary and tertiary structures is one such regulatory process that, until recently, had not been analysed in depth. Formation of these mRNA structures has the potential to enhance or inhibit alternative splicing of transcripts, and to regulate the rate and amount of translation. As this regulatory mechanism potentially acts at both the transcriptional and translational level, while also potentially utilising the vast array of non-coding RNAs, it warrants further investigation. Currently, a variety of high-throughput sequencing techniques, including parallel analysis of RNA structure (PARS), fragmentation sequencing (FragSeq) and selective 2'-hydroxyl acylation analysed by primer extension (SHAPE), lead the way in the genome-wide identification and analysis of mRNA structure formation. These new sequencing techniques highlight the diversity and complexity of the transcriptome, and demonstrate another regulatory mechanism that could become a target for new therapeutic approaches.
Affiliation(s)
- Elina Jacobs
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney NSW 2052, Australia