1
Murali M, Saquing J, Lu S, Gao Z, Watts EF, Jordan B, Wakefield ZP, Fiszbein A, Cooper DR, Castaldi PJ, Korkin D, Sheynkman GM. Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity. Genome Res 2025; 35:1012-1024. [PMID: 40086882 PMCID: PMC12047184 DOI: 10.1101/gr.279317.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 01/06/2025] [Indexed: 03/16/2025]
Abstract
Long-read RNA-seq has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 35,082 pairs of GENCODE-annotated protein isoforms, finding that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5' UTR alternative splicing (AS). Biosurfer's detailed tracking of nucleotide-to-residue relationships helps reveal an uncommonly tracked source of single amino acid residue changes arising from codons split at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed "ragged codons." Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We systematically characterize an unusual pattern of reading frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a "snapback" frameshift. We analyze the long-read RNA-seq-predicted proteome of a human cell line and find trends similar to those in our GENCODE analysis, with the exception of a higher proportion of transcripts predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq data sets should accelerate insights into the functional role of protein isoforms, providing a mechanistic explanation of the origins of the proteomic diversity driven by AS. Biosurfer is available as a Python package.
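As a rough illustration of the frame bookkeeping this abstract describes, the Python sketch below tracks the cumulative reading-frame offset across a series of length differences between two isoforms, flags whether a splice junction splits a codon (the situation behind "ragged codons"), and reports pairs of consecutive frameshifts in which the second restores the original frame within a window (a "snapback"-like pattern). This is not Biosurfer's API or algorithm: the function names, the representation of differences as (CDS position, signed nucleotide length change) events, and the `max_gap` window are hypothetical choices made only for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; not part of the Biosurfer package.


def junction_splits_codon(cds_offset: int) -> bool:
    """Return True if a junction placed after `cds_offset` coding nucleotides
    falls inside a codon (phase 1 or 2) rather than between two codons."""
    return cds_offset % 3 != 0


def frame_track(deltas: List[int]) -> List[int]:
    """Cumulative reading-frame offset (0, 1 or 2) after each event, where each
    delta is a signed nucleotide length change relative to the reference isoform."""
    frames = []
    total = 0
    for delta in deltas:
        total += delta
        frames.append(total % 3)  # Python's % keeps the result in 0..2 for negatives
    return frames


def find_snapbacks(events: List[Tuple[int, int]], max_gap: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) CDS positions where a frameshift out of frame 0 is
    followed by the next frame-changing event restoring frame 0 within `max_gap` nt."""
    snapbacks = []
    frame = 0
    shift_start: Optional[int] = None
    for pos, delta in sorted(events):
        new_frame = (frame + delta) % 3
        if frame == 0 and new_frame != 0:
            shift_start = pos  # first frameshift opens an out-of-frame stretch
        elif frame != 0 and new_frame == 0:
            if shift_start is not None and pos - shift_start <= max_gap:
                snapbacks.append((shift_start, pos))
            shift_start = None  # original frame restored
        elif frame != 0 and new_frame != frame:
            shift_start = None  # moved to a third frame; not a simple two-step snapback
        frame = new_frame
    return snapbacks


if __name__ == "__main__":
    print(frame_track([27]))                     # [0]: a 27-nt change keeps the frame
    print(frame_track([1, -1]))                  # [1, 0]
    print(junction_splits_codon(10))             # True
    print(find_snapbacks([(30, 1), (45, -1)]))   # [(30, 45)]
    print(find_snapbacks([(30, 1), (500, -1)]))  # []: restored, but outside max_gap
```

Events that preserve the frame (length changes divisible by 3) leave the offset unchanged, so they neither open nor close an out-of-frame stretch in this sketch.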
Affiliation(s)
- Mayank Murali
- Broad Institute of MIT and Harvard University, Cambridge, Massachusetts 02142, USA
- Jamie Saquing
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, USA
- Senbao Lu
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Ziyang Gao
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Emily F Watts
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, USA
- Ben Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, USA
- Zachary Peters Wakefield
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Department of Biology, Boston University, Boston, Massachusetts 02215, USA
- Ana Fiszbein
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Department of Biology, Boston University, Boston, Massachusetts 02215, USA
- David R Cooper
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, USA
- Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- Dmitry Korkin
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Gloria M Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia 22903, USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia 22903, USA
- UVA Cancer Center, University of Virginia, Charlottesville, Virginia 22903, USA
2
Murali M, Saquing J, Lu S, Gao Z, Jordan B, Wakefield ZP, Fiszbein A, Cooper DR, Castaldi PJ, Korkin D, Sheynkman G. Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity. bioRxiv 2024:2024.03.15.585320. [PMID: 38559226 PMCID: PMC10980011 DOI: 10.1101/2024.03.15.585320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Long-read RNA sequencing has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 32,799 pairs of GENCODE-annotated protein isoforms, finding that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5' UTR alternative splicing. Biosurfer's detailed tracking of nucleotide-to-residue relationships helped reveal an uncommonly tracked source of single amino acid residue changes arising from codons split at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed "ragged codons". Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We found an unusual pattern of reading frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a "snapback" frameshift. We analyzed the long-read RNA-seq-predicted proteome of a human cell line and found trends similar to those in our GENCODE analysis, with the exception of a higher proportion of isoforms predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq datasets should accelerate insights into the functional role of protein isoforms, providing a mechanistic explanation of the origins of the proteomic diversity driven by alternative splicing. Biosurfer is available as a Python package at https://github.com/sheynkman-lab/biosurfer.
Affiliation(s)
- Mayank Murali
- Broad Institute of MIT and Harvard University, Cambridge, MA, USA
- Jamie Saquing
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Senbao Lu
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Ziyang Gao
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Ben Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Zachary Peters Wakefield
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biology, Boston University, Boston, MA, USA
- Ana Fiszbein
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biology, Boston University, Boston, MA, USA
- David R. Cooper
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Peter J. Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Dmitry Korkin
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
3
Hammond DA, Olman V, Xu Y. Functional understanding of the diverse exon-intron structures of human GPCR genes. J Bioinform Comput Biol 2013; 12:1350019. [PMID: 24467758 DOI: 10.1142/s0219720013500194] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
The GPCR genes have a variety of exon-intron structures even though their proteins are all structurally homologous. We have examined all human GPCR genes with at least two functional protein isoforms, totaling 199, aiming to understand what may have contributed to the large diversity of exon-intron structures among the GPCR genes. The 199 genes have a total of 808 known protein splicing isoforms with experimentally verified functions. Our analysis reveals that 1,301 (80.6%) of the 1,613 adjacent exon-exon pairs in the 199 genes have either exactly one exon skipped or the intron in between retained in at least one of the 808 protein splicing isoforms. This observation has a statistical significance p-value of 2.051762 × 10^-9, assuming that the observed splicing isoforms are independent of the exon-intron structures. Our interpretation of this observation is that the exon boundaries of the GPCR genes are not randomly determined; instead, they may be selected to facilitate specific alternative splicing for functional purposes.
Affiliation(s)
- Dorothy A Hammond
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
4
Hamby SE, Thomas NST, Cooper DN, Chuzhanova N. A meta-analysis of single base-pair substitutions in translational termination codons ('nonstop' mutations) that cause human inherited disease. Hum Genomics 2011; 5:241-64. [PMID: 21712188 PMCID: PMC3525242 DOI: 10.1186/1479-7364-5-4-241] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 11/21/2022]
Abstract
'Nonstop' mutations are single base-pair substitutions that occur within translational termination (stop) codons and which can lead to the continued and inappropriate translation of the mRNA into the 3'-untranslated region. We have performed a meta-analysis of the 119 nonstop mutations (in 87 different genes) known to cause human inherited disease, examining the sequence context of the mutated stop codons and the average distance to the next alternative in-frame stop codon downstream, in comparison with their counterparts from control (non-mutated) gene sequences. A paucity of alternative in-frame stop codons was noted in the immediate vicinity (0-49 nucleotides downstream) of the mutated stop codons as compared with their control counterparts (p = 7.81 × 10^-4). This implies that at least some nonstop mutations with alternative stop codons in close proximity will not have come to clinical attention, possibly because they will have given rise to stable mRNAs (not subject to nonstop mRNA decay) that are translatable into proteins of near-normal length and biological function. A significant excess of downstream in-frame stop codons was, however, noted in the range 150-199 nucleotides from the mutated stop codon (p = 8.55 × 10^-4). We speculate that recruitment of an alternative stop codon at greater distance from the mutated stop codon may trigger nonstop mRNA decay, thereby decreasing the amount of protein product and yielding a readily discernible clinical phenotype. Confirmation or otherwise of this postulate must await the emergence of a clearer understanding of the mechanism of nonstop mRNA decay in mammalian cells.
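The quantity at the center of this meta-analysis, the distance from a mutated stop codon to the next in-frame stop codon downstream, can be sketched in a few lines of Python. This is an illustrative assumption rather than the authors' procedure: it takes a plain DNA string, a 0-based start coordinate for the mutated codon, and the standard stop codons TAA, TAG and TGA, and it measures distance from the first nucleotide after the mutated codon to the first nucleotide of the next in-frame stop. The function names are hypothetical, and the 50-nt bins simply mirror the 0-49 and 150-199 nt windows quoted in the abstract.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def next_inframe_stop_distance(seq: str, stop_start: int) -> Optional[int]:
    """Nucleotides between the end of the (mutated) stop codon at `stop_start`
    (0-based) and the start of the next in-frame stop codon, or None if the
    sequence runs out first."""
    seq = seq.upper()
    first = stop_start + 3  # first nucleotide after the mutated codon
    for i in range(first, len(seq) - 2, 3):  # step by whole codons to stay in frame
        if seq[i:i + 3] in STOP_CODONS:
            return i - first
    return None


def distance_bin(distance: int, width: int = 50) -> str:
    """Label a distance with its window, e.g. '0-49' or '150-199'."""
    low = (distance // width) * width
    return f"{low}-{low + width - 1}"


if __name__ == "__main__":
    # ATG | CAA (mutated TAA) | GCT | TGA | AA
    d = next_inframe_stop_distance("ATGCAAGCTTGAAA", stop_start=3)
    print(d, distance_bin(d))  # 3 0-49
    # The TGA at positions 7-9 is out of frame, so no in-frame stop is found
    print(next_inframe_stop_distance("ATGCAAGTGAAA", stop_start=3))  # None
```

Stepping the scan in codon-sized strides is what restricts the search to the original reading frame; out-of-frame stop triplets, like the one in the second example, are deliberately ignored.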
Affiliation(s)
- Stephen E Hamby
- School of Science and Technology, Nottingham Trent University, UK
5
Gianazza E, Eberini I, Sensi C, Barile M, Vergani L, Vanoni MA. Energy matters: mitochondrial proteomics for biomedicine. Proteomics 2011; 11:657-74. [PMID: 21241019 DOI: 10.1002/pmic.201000412] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/05/2010] [Revised: 09/22/2010] [Accepted: 11/03/2010] [Indexed: 12/16/2022]
Abstract
This review compiles results of medical relevance from mitochondrial proteomics, grouped either according to the type of disease (genetic or degenerative) or to the mechanism involved (oxidative stress or apoptosis). The findings are discussed in light of our current understanding of uniformity/variability in cell responses to different stimuli. Specificities in the conceptual and technical approaches to human mitochondrial proteomics are also outlined.
Affiliation(s)
- Elisabetta Gianazza
- Dipartimento di Scienze Farmacologiche, Università degli Studi di Milano, Milano, Italy.
6
Bahn S, Noll R, Barnes A, Schwarz E, Guest PC. Challenges of introducing new biomarker products for neuropsychiatric disorders into the market. Int Rev Neurobiol 2011; 101:299-327. [PMID: 22050857 DOI: 10.1016/b978-0-12-387718-5.00012-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/06/2023]
Abstract
There are many challenges associated with the discovery and development of serum-based biomarkers for psychiatric disorders such as schizophrenia. Here, we review these challenges from the point of view of psychiatrists, general practitioners, the regulatory agencies, and biomarker scientists. There is a general opinion in psychiatric medicine that improvements over the current subjective tests are essential. Despite this, there is a reluctance to accept that peripheral molecules can do the job any better. In addition, psychiatrists find it difficult to accept that peripheral molecules, such as those found in blood, can reflect what is happening in the brain. However, the regulatory health authorities now consider biomarkers important for the future of drug development and have called for efforts to modernize methods, tools, and techniques for the purpose of developing more efficient and safer drugs. We also describe here the development of the first-ever molecular blood test for schizophrenia, and its reception in the marketplace, as a case in point.
Affiliation(s)
- Sabine Bahn
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
7
D'Antonio M, Masseroli M. Extraction, integration and analysis of alternative splicing and protein structure distributed information. BMC Bioinformatics 2009; 10 Suppl 12:S15. [PMID: 19828075 PMCID: PMC2762064 DOI: 10.1186/1471-2105-10-s12-s15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Background: Alternative splicing has been demonstrated to affect most human genes; different isoforms from the same gene encode proteins that differ by a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data scattered across the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results: A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store users' data and searches. Conclusion: PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available, well-known bioinformatics tools in order to generate structural information on isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions.
8
Jackson NE, Wang HW, Bryant KJ, McNeil HP, Husain A, Liu K, Tedla N, Thomas PS, King GC, Hettiaratchi A, Cairns J, Hunt JE. Alternate mRNA splicing in multiple human tryptase genes is predicted to regulate tetramer formation. J Biol Chem 2008; 283:34178-87. [PMID: 18854315 PMCID: PMC2662235 DOI: 10.1074/jbc.m807553200] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/30/2008] [Indexed: 11/06/2022]
Abstract
Tryptases are serine proteases that are thought to be uniquely and proteolytically active as tetramers. Crystallographic studies reveal that the active tetramer is a flat ring structure composed of four monomers, with their active sites arranged around a narrow central pore. This model explains why many of the preferred substrates of tryptase are short peptides; however, it does not explain how tryptase cleaves large protein substrates such as fibronectin, although a number of studies have reported in vitro mechanisms for generating active monomers that could digest larger substrates. Here we suggest that alternate mRNA splicing of human tryptase genes generates active tryptase monomers (or dimers). We have identified a conserved pattern of alternate splicing in four tryptase alleles (alphaII, betaI, betaIII, and deltaI), representing three distinct tryptase gene loci. When compared with their full-length counterparts, the splice variants use an alternate acceptor site within exon 4. This results in the deletion of 27 nucleotides within the central coding sequence and 9 amino acids from the translated protein product. Although modeling suggests that the deletion can be easily accommodated by the enzymes structurally, it is predicted to alter the specificity by enlarging the S1' or S2' binding pocket and results in the complete loss of the "47 loop," reported to be critical for the formation of tetramers. Although active monomers can be generated in vitro using a range of artificial conditions, we suggest that alternate splicing is the in vivo mechanism used to generate active tryptase that can cleave large protein substrates.
Affiliation(s)
- Nicole E Jackson
- Centre for Infection and Inflammation Research, School of Medical Sciences, Sydney, New South Wales 2052, Australia
9
Using ribosomal protein genes as reference: a tale of caution. PLoS One 2008; 3:e1854. [PMID: 18365009 PMCID: PMC2267211 DOI: 10.1371/journal.pone.0001854] [Citation(s) in RCA: 157] [Impact Index Per Article: 9.2] [Received: 12/06/2007] [Accepted: 02/20/2008] [Indexed: 12/28/2022]
Abstract
Background: Housekeeping genes are needed in every tissue as their expression is required for survival, integrity or duplication of every cell. Housekeeping genes commonly have been used as reference genes to normalize gene expression data, the underlying assumption being that they are expressed in every cell type at approximately the same level. Often, the terms “reference genes” and “housekeeping genes” are used interchangeably. In this paper, we would like to distinguish between these terms. Consensus is growing that housekeeping genes that have traditionally been used to normalize gene expression data are not good reference genes. Recently, ribosomal protein genes have been suggested as reference genes based on a meta-analysis of publicly available microarray data. Methodology/Principal Findings: We have applied several statistical tools to a dataset of 70 microarrays representing 22 different tissues, to assess and visualize expression stability of ribosomal protein genes. We confirmed the housekeeping status of these genes, but further estimated expression stability across tissues in order to assess their potential as reference genes. One- and two-way ANOVA revealed that all ribosomal protein genes have significant expression variation across tissues and exhibit tissue-dependent expression behavior as a group. Via multidimensional unfolding analysis, we visualized this tissue-dependency. In addition, we explored mechanisms that may cause tissue-dependent effects of individual ribosomal protein genes. Conclusions/Significance: Here we provide statistical and biological evidence that ribosomal protein genes exhibit important tissue-dependent variation in mRNA expression. Though these genes are the most stably expressed of all genes investigated in a meta-analysis, they cannot be considered true reference genes.
10
Ding L, Mychaleckyj JC, Hegde AN. Full length cloning and expression analysis of splice variants of regulator of G-protein signaling RGS4 in human and murine brain. Gene 2007; 401:46-60. [PMID: 17707117 DOI: 10.1016/j.gene.2007.07.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 12/11/2006] [Revised: 05/25/2007] [Accepted: 07/02/2007] [Indexed: 10/23/2022]
Abstract
RGS4 (regulator of G protein signaling 4) protein is a GTPase-activating protein specific for Gi/o and Gq alpha subunits. It is highly expressed in brain, but the mechanisms by which RGS4 expression is regulated remain unknown. RGS4 is associated with schizophrenia either through heritable genetic polymorphisms or as a co-regulated mediator of the pathology, and may play a role in other brain diseases. As a necessary step towards understanding the transcriptional regulation of RGS4, we isolated full-length splice variants of the human RGS4 and mouse Rgs4 genes using bioinformatic predictions, followed by RACE, RT-PCR, and sequencing. In human brain, we found five different isoforms, RGS4-1, RGS4-2, RGS4-3, RGS4-4 and RGS4-5, of which RGS4-2, RGS4-3, RGS4-4 and RGS4-5 are novel. RGS4-1 and RGS4-2 encode a 205-amino acid protein, while RGS4-3 encodes a 302 aa protein with an N-terminal extension. RGS4-4 and RGS4-5 encode truncated proteins of 93 aa and 187 aa, respectively. Our results indicate that RGS4-1, RGS4-2, RGS4-3 and RGS4-4 are translated into proteins. In contrast, the mouse brain has 3 different splice variants, Rgs4-1, Rgs4-2 and Rgs4-3, which encode the same 205 aa protein but vary in their 3'UTRs. Among the mouse isoforms, Rgs4-1 and Rgs4-3 are novel. Human RGS4 has four different transcription start sites and three different stop sites. We found differential expression of the human isoforms in dorsolateral prefrontal and visual cortex. All five RGS4 splice variants are expressed at high levels in human cortical areas, although RGS4 isoforms 1, 2, and 3 are not expressed in the cerebellum. RGS4-2 is tissue-specific, whereas RGS4-4 and RGS4-5 appear to be ubiquitously expressed. Our results suggest the intriguing possibility that RGS4 gene expression in the human brain is spatially and temporally regulated through differential transcription of isoforms from alternative promoters. This may have implications for the physiological role of RGS4 and for pathologies of the brain.
Affiliation(s)
- Lan Ding
- Department of Neurobiology and Anatomy, Wake Forest University Health Sciences, Medical Center Boulevard, Winston-Salem, NC 27157, USA
11
Pino P, Foth BJ, Kwok LY, Sheiner L, Schepers R, Soldati T, Soldati-Favre D. Dual targeting of antioxidant and metabolic enzymes to the mitochondrion and the apicoplast of Toxoplasma gondii. PLoS Pathog 2007; 3:e115. [PMID: 17784785 PMCID: PMC1959373 DOI: 10.1371/journal.ppat.0030115] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Received: 04/02/2007] [Accepted: 06/27/2007] [Indexed: 01/05/2023]
Abstract
Toxoplasma gondii is an aerobic protozoan parasite that possesses mitochondrial antioxidant enzymes to safely dispose of oxygen radicals generated by cellular respiration and metabolism. As with most Apicomplexans, it also harbors a chloroplast-like organelle, the apicoplast, which hosts various biosynthetic pathways and requires antioxidant protection. Most apicoplast-resident proteins are encoded in the nuclear genome and are targeted to the organelle via a bipartite N-terminal targeting sequence. We show here that two antioxidant enzymes (a superoxide dismutase, TgSOD2, and a thioredoxin-dependent peroxidase, TgTPX1/2) and an aconitase are dually targeted to both the apicoplast and the mitochondrion of T. gondii. In the case of TgSOD2, our results indicate that a single gene product is bimodally targeted due to an inconspicuous variation within the putative signal peptide of the organellar protein, which significantly alters its subcellular localization. Dual organellar targeting of proteins might occur frequently in Apicomplexans to serve important biological functions such as antioxidant protection and carbon metabolism. Toxoplasma gondii is a human and animal pathogen representative of the large group of Apicomplexa. Most members of this phylum contain, in addition to a tubular mitochondrion, a second endosymbiotic organelle indispensable for parasite survival, called the apicoplast. This non-photosynthetic plastid is the site of several anabolic pathways, including the biosynthesis of fatty acids, isoprenoids, iron-sulphur clusters, and heme. Virtually all enzymes active inside the apicoplast are encoded by the nuclear genome and targeted to the organelle via the endoplasmic reticulum by means of a bipartite amino-terminal recognition sequence. The metabolic activities of the apicoplast impose a high demand for antioxidant protection. We show here that T. gondii possesses a superoxide dismutase and a peroxidase that are shared between the two organelles by an unusual mechanism of bimodal targeting, whereby the nature of the signal peptide influences delivery of the protein to both organelles. Dual targeting also extends to other classical metabolic enzymes such as aconitase, uncovering unexpected metabolic pathways occurring in these organelles. Consequently, bioinformatic predictions of plastidic or mitochondrial targeting based on the characteristics of N-terminal presequences are insufficient in the absence of experimental confirmation.
Affiliation(s)
- Paco Pino
- Department of Microbiology and Molecular Medicine, Centre Medical Universitaire, University of Geneva, Geneva, Switzerland
- Bernardo Javier Foth
- Department of Microbiology and Molecular Medicine, Centre Medical Universitaire, University of Geneva, Geneva, Switzerland
- Lai-Yu Kwok
- Department of Microbiology and Molecular Medicine, Centre Medical Universitaire, University of Geneva, Geneva, Switzerland
- Lilach Sheiner
- Department of Microbiology and Molecular Medicine, Centre Medical Universitaire, University of Geneva, Geneva, Switzerland
- Rebecca Schepers
- Department of Biological Sciences, Imperial College London, London, United Kingdom
- Thierry Soldati
- Department of Biochemistry, Sciences II, University of Geneva, Geneva, Switzerland
- Dominique Soldati-Favre
- Department of Microbiology and Molecular Medicine, Centre Medical Universitaire, University of Geneva, Geneva, Switzerland
- * To whom correspondence should be addressed. E-mail:
12
Falk R, Ramström M, Ståhl S, Hober S. Approaches for systematic proteome exploration. Biomol Eng 2007; 24:155-68. [PMID: 17376740 DOI: 10.1016/j.bioeng.2007.01.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 12/04/2006] [Revised: 01/24/2007] [Accepted: 01/25/2007] [Indexed: 10/23/2022]
Abstract
With the completion of the Human Genome Project in recent years, gene function, protein abundance and expression patterns in tissues and cell types have emerged as central areas for the scientific community. A mapped human proteome will extend the value of the genome sequence, and large-scale efforts aiming at elucidating protein localization, abundance and function are invaluable for biomarker and drug discovery. This research area, termed proteomics, is more demanding than any genome sequencing effort, and performing it on a wide scale is a highly diverse task. Therefore, the proteomics field employs a range of methods to examine different aspects of proteomics, including protein localization, protein-protein interactions, posttranslational modifications and alteration of protein composition (e.g. differential expression) in tissues and body fluids. Here, some of the most commonly used methods, including chromatographic separations together with mass spectrometry and a number of affinity proteomics concepts, are discussed and exemplified.
Affiliation(s)
- Ronny Falk
- Royal Institute of Technology, Albanova University Center, School of Biotechnology, SE-106 91 Stockholm, Sweden
13
Tress ML, Martelli PL, Frankish A, Reeves GA, Wesselink JJ, Yeats C, Ólason PÍ, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski RA, López G, Sadowski MI, Watson JD, Fariselli P, Rossi I, Nagy A, Kai W, Størling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramírez F, Schlicker A, Denoeud F, Jones P, Kerrien S, Orchard S, Antonarakis SE, Reymond A, Birney E, Brunak S, Casadio R, Guigo R, Harrow J, Hermjakob H, Jones DT, Lengauer T, Orengo CA, Patthy L, Thornton JM, Tramontano A, Valencia A. The implications of alternative splicing in the ENCODE protein complement. Proc Natl Acad Sci U S A 2007; 104:5495-500. [PMID: 17372197 PMCID: PMC1838448 DOI: 10.1073/pnas.0700800104] [Citation(s) in RCA: 161] [Impact Index Per Article: 8.9] [Received: 10/20/2006] [Indexed: 12/22/2022]
Abstract
Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.
Affiliation(s)
- Michael L. Tress
- Structural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain
- Adam Frankish
- HAVANA Group, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
- Gabrielle A. Reeves
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Jan Jaap Wesselink
- Structural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain
- Corin Yeats
- Department of Biochemistry and Molecular Biology and
- Páll Ísólfur Ólason
- Center for Biological Sequence Analysis, BioCentrum-DTU, DK-2800 Lyngby, Denmark
- Mario Albrecht
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Hedi Hegyi
- Biological Research Center, Hungarian Academy of Sciences, 1113 Budapest, Hungary
- Alejandro Giorgetti
- Department of Biochemical Sciences, University of Rome “La Sapienza,” 2-00185 Rome, Italy
- Domenico Raimondo
- Department of Biochemical Sciences, University of Rome “La Sapienza,” 2-00185 Rome, Italy
- Julien Lagarde
- Research Unit on Biomedical Informatics, Institut Municipal d'Investigació Mèdica, E-8003 Barcelona, Spain
- Roman A. Laskowski
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Gonzalo López
- Structural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain
- Michael I. Sadowski
- Bioinformatics Unit, University College London, London WC1E 6BT, United Kingdom
- James D. Watson
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Piero Fariselli
- Department of Biology, University of Bologna, 33-40126 Bologna, Italy
- Ivan Rossi
- Department of Biology, University of Bologna, 33-40126 Bologna, Italy
- Alinda Nagy
- Biological Research Center, Hungarian Academy of Sciences, 1113 Budapest, Hungary
- Wang Kai
- Center for Biological Sequence Analysis, BioCentrum-DTU, DK-2800 Lyngby, Denmark
- Zenia Størling
- Center for Biological Sequence Analysis, BioCentrum-DTU, DK-2800 Lyngby, Denmark
- Massimiliano Orsini
- Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09010 Pula, Italy
- Yassen Assenov
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Fidel Ramírez
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- France Denoeud
- Research Unit on Biomedical Informatics, Institut Municipal d'Investigació Mèdica, E-8003 Barcelona, Spain
- Phil Jones
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Samuel Kerrien
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Sandra Orchard
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Stylianos E. Antonarakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Alexandre Reymond
- Center for Integrative Genomics, Genopode building, University of Lausanne, 1015 Lausanne, Switzerland
- Ewan Birney
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Søren Brunak
- Center for Biological Sequence Analysis, BioCentrum-DTU, DK-2800 Lyngby, Denmark
- Rita Casadio
- Department of Biology, University of Bologna, 33-40126 Bologna, Italy
- Roderic Guigo
- Research Unit on Biomedical Informatics, Institut Municipal d'Investigació Mèdica, E-8003 Barcelona, Spain
- Centre de Regulació Genòmica, Universitat Pompeu Fabra, E-08003 Barcelona, Spain
- Jennifer Harrow
- HAVANA Group, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
- Henning Hermjakob
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- David T. Jones
- Bioinformatics Unit, University College London, London WC1E 6BT, United Kingdom
- Thomas Lengauer
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- László Patthy
- Biological Research Center, Hungarian Academy of Sciences, 1113 Budapest, Hungary
- Janet M. Thornton
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Alfonso Valencia
- Structural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain
14
Mueller M, Martens L, Apweiler R. Annotating the human proteome: beyond establishing a parts list. Biochim Biophys Acta 2007; 1774:175-91. [PMID: 17223395 DOI: 10.1016/j.bbapap.2006.11.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 07/03/2006] [Revised: 11/16/2006] [Accepted: 11/21/2006] [Indexed: 12/31/2022]
Abstract
The completion of the human genome has shifted the attention from deciphering the sequence to the identification and characterisation of the functional components, including genes. Improved gene prediction algorithms, together with the existing transcript and protein information, have enabled the identification of most exons in a genome. Availability of the 'parts list' has fostered the development of experimental approaches to systematically interrogate gene function at the genome, transcriptome and proteome level. Studying gene function at the protein level is vital to understanding how cells perform their functions, as variations in protein isoforms and protein quantity that may underlie a change in phenotype often cannot be deduced from sequence- or transcript-level genomics experiments alone. Recent advancements in proteomics have afforded technologies capable of measuring protein expression, post-translational modifications of these proteins, their subcellular localisation and assembly into complexes and pathways. Although an enormous amount of data already exists on the function of many human proteins, much of it is scattered over multiple resources. Public domain databases are therefore required to manage and collate this information and present it to the user community in both a human- and machine-readable manner. Of special importance here is the integration of heterogeneous data to facilitate the creation of resources that go beyond a mere parts list.
Affiliation(s)
- Michael Mueller
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
15
Takeda JI, Suzuki Y, Nakao M, Kuroda T, Sugano S, Gojobori T, Imanishi T. H-DBAS: alternative splicing database of completely sequenced and manually annotated full-length cDNAs based on H-Invitational. Nucleic Acids Res 2007; 35:D104-9. [PMID: 17130147 PMCID: PMC1716722 DOI: 10.1093/nar/gkl854] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 08/15/2006] [Revised: 10/09/2006] [Accepted: 10/10/2006] [Indexed: 11/13/2022]
Abstract
The Human-transcriptome DataBase for Alternative Splicing (H-DBAS) is a specialized database of alternatively spliced human transcripts. In this database, each alternative splicing (AS) variant corresponds to a completely sequenced and carefully annotated human full-length cDNA, one of those collected for the H-Invitational human-transcriptome annotation meeting. In total, H-DBAS contains 38,664 representative alternative splicing variants (RASVs) in 11,744 loci. The data are retrievable by various manually annotated features of AS, such as AS patterns, the resulting alterations in the encoded amino acids and affected protein motifs, GO terms, predicted subcellular localization signals and transmembrane domains. The database also records recently identified, very complex patterns of AS in which two distinct genes seemed to be bridged, nested or degenerated (multiple CDS); in all three cases, completely unrelated proteins are encoded by a single locus. Using AS Viewer, each AS event can be analyzed in the context of full-length cDNAs, helping the user understand empirically the relation between an AS event and the consequent alterations in the encoded amino acid sequence, together with the various affected protein motifs. H-DBAS is accessible at http://jbirc.jbic.or.jp/h-dbas/.
Affiliation(s)
- Jun-ichi Takeda
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Yutaka Suzuki
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Mitsuteru Nakao
- Computational Biology Research Center, National Institute of Advanced Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
- Tsuyoshi Kuroda
- Maze Corporation, TS Building 1013-20-2 Hatagaya, Shibuya-ku, Tokyo 151-0072, Japan
- Sumio Sugano
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Takashi Gojobori
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Center for Information Biology and DDBJ, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Tadashi Imanishi
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
16
Xing Y, Lee C. Relating alternative splicing to proteome complexity and genome evolution. Adv Exp Med Biol 2007; 623:36-49. [PMID: 18380339 DOI: 10.1007/978-0-387-77374-2_3] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]
Abstract
Prior to genomics, studies of alternative splicing primarily focused on the function and mechanism of alternative splicing in individual genes and exons. This has changed dramatically since the late 1990s. High-throughput genomics technologies, such as EST sequencing and microarrays designed to detect changes in splicing, led to genome-wide discoveries and quantification of alternative splicing in a wide range of species from human to Arabidopsis. Consensus estimates of AS frequency in the human genome grew from less than 5% in mid-1990s to as high as 60-74% now. The rapid growth in sequence and microarray data for alternative splicing has made it possible to look into the global impact of alternative splicing on protein function and evolution of genomes. In this chapter, we review recent research on alternative splicing's impact on proteomic complexity and its role in genome evolution.
Affiliation(s)
- Yi Xing
- Department of Internal Medicine, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, USA
17
Abstract
It is widely recognized that much of the information for determining the final subcellular localization of proteins is found in their amino acid sequences. Thus the prediction of protein localization sites is of both theoretical and practical interest. In most cases, the prediction has been attempted in two ways: one is based on the knowledge of experimentally characterized targeting signals, while the other utilizes the statistical differences of general sequence characteristics, such as amino acid composition, between localization sites. Both approaches have limitations, and it is recommended to check the results of various prediction methods based on different principles as well as training data. Recently, increased proteomic analyses of localization sites have provided new data to assess the current status of predictive methods. In this chapter we discuss these issues and close with an example illustrating the use of the WoLF PSORT web server for localization prediction.
Affiliation(s)
- Kenta Nakai
- Laboratory of Functional Analysis in silico, Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
18
Takeda JI, Suzuki Y, Nakao M, Barrero RA, Koyanagi KO, Jin L, Motono C, Hata H, Isogai T, Nagai K, Otsuki T, Kuryshev V, Shionyu M, Yura K, Go M, Thierry-Mieg J, Thierry-Mieg D, Wiemann S, Nomura N, Sugano S, Gojobori T, Imanishi T. Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56,419 completely sequenced and manually annotated full-length cDNAs. Nucleic Acids Res 2006; 34:3917-28. [PMID: 16914452 PMCID: PMC1557807 DOI: 10.1093/nar/gkl507] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 04/22/2006] [Revised: 07/03/2006] [Accepted: 07/03/2006] [Indexed: 11/12/2022]
Abstract
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of full-length cDNAs. Applying both manual and computational analyses to 56,419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18,297 different alternative splicing variants. A total of 37,670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing in which two distinct genes seemed to be bridged, nested or to have overlapping protein coding sequences (CDSs) in different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing relying on full-length cDNAs should lay firm groundwork for exploring in detail the diversification of protein function mediated by the fast-expanding universe of alternative splicing variants.
Affiliation(s)
- Jun-ichi Takeda
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Yutaka Suzuki
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Mitsuteru Nakao
- Computational Biology Research Center, National Institute of Advanced Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
- Roberto A. Barrero
- Center for Information Biology and DDBJ, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Kanako O. Koyanagi
- Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
- Lihua Jin
- Center for Information Biology and DDBJ, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Chie Motono
- Computational Biology Research Center, National Institute of Advanced Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Hiroko Hata
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Takao Isogai
- Reverse Proteomics Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
- Helix Research Institute, Inc., 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
- Keiichi Nagai
- Helix Research Institute, Inc., 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
- Central Research Laboratory, Hitachi Ltd, 1-280 Higashi-koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
- Tetsuji Otsuki
- Helix Research Institute, Inc., 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
- Vladimir Kuryshev
- Division of Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Masafumi Shionyu
- Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, 1266 Tamura-cho, Nagahama, Shiga 526-0829, Japan
- Kei Yura
- Quantum Bioinformatics Team, Center for Computational Science and Engineering, Japan Atomic Energy Agency, 8-1 Umemidai, Kizu, Souraku, Kyoto 619-0215, Japan
- Core Research for Evolutional Science and Technology, Japan Science and Technology Agency, Japan
- Mitiko Go
- Division of Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan
- Jean Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Centre National de la Recherche Scientifique, Laboratoire de Physique Mathematique, Montpellier, France
- Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Centre National de la Recherche Scientifique, Laboratoire de Physique Mathematique, Montpellier, France
- Stefan Wiemann
- Division of Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Nobuo Nomura
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Sumio Sugano
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Takashi Gojobori
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Center for Information Biology and DDBJ, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Tadashi Imanishi
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
19
Liang P, Nair JR, Song L, McGuire JJ, Dolnick BJ. Comparative genomic analysis reveals a novel mitochondrial isoform of human rTS protein and unusual phylogenetic distribution of the rTS gene. BMC Genomics 2005; 6:125. [PMID: 16162288 PMCID: PMC1261261 DOI: 10.1186/1471-2164-6-125] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 04/20/2005] [Accepted: 09/14/2005] [Indexed: 01/23/2023]
Abstract
BACKGROUND The rTS gene (ENOSF1), first identified in Homo sapiens as a gene complementary to the thymidylate synthase (TYMS) mRNA, is known to encode two protein isoforms, rTSalpha and rTSbeta. The rTSbeta isoform appears to be an enzyme responsible for the synthesis of signaling molecules involved in the down-regulation of thymidylate synthase, but the exact cellular functions of rTS genes are largely unknown. RESULTS Through comparative genomic sequence analysis, we predicted the existence of a novel protein isoform, rTSgamma, which has a 27-residue longer N-terminus by virtue of using an alternative start codon located upstream of the rTSbeta start codon. We observed that a similar extended N-terminus could be predicted in all rTS genes for which genomic sequences are available, and that the extended regions are conserved from bacteria to human. We therefore reasoned that the protein with the extended N-terminus might represent an ancestral form of the rTS protein. Sequence analysis strongly predicts a mitochondrial signal sequence in the extended N-terminus of human rTSgamma, which is absent in rTSbeta. We confirmed the existence of rTS in human mitochondria experimentally by demonstrating the presence of both rTSgamma and rTSbeta proteins in mitochondria isolated by subcellular fractionation. In addition, our comprehensive analysis of rTS orthologous sequences reveals an unusual phylogenetic distribution of this gene, which suggests the occurrence of one or more horizontal gene transfer events. CONCLUSION The presence of two rTS isoforms in mitochondria suggests that the rTS signaling pathway may be active within mitochondria. Our report also presents an example of identifying novel protein isoforms and improving gene annotation through comparative genomic analysis.
Affiliation(s)
- Ping Liang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, USA
- Jayakumar R Nair
- Department of Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, USA
- Lei Song
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, USA
- John J McGuire
- Department of Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, USA
- Bruce J Dolnick
- Department of Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, USA