1
|
Korchak J, Jeffery ED, Bandyopadhyay S, Jordan BT, Lehe MD, Watts EF, Fenix A, Wilhelm M, Sheynkman GM. IS-PRM-Based Peptide Targeting Informed by Long-Read Sequencing for Alternative Proteome Detection. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2024; 35:2614-2630. [PMID: 39012054 PMCID: PMC11544703 DOI: 10.1021/jasms.4c00119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 05/24/2024] [Accepted: 06/25/2024] [Indexed: 07/17/2024]
Abstract
Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of predefined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (lrRNA-seq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method, Tomahto. Using the model human stem cell line WTC11, lrRNA-seq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into TMT-labeled WTC11 tryptic digest, predicted to contain corresponding endogenous "target" peptides. Compared to DDA mode, Tomahto increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms. This lrRNA-seq-informed Tomahto targeted approach is a new modality for generating protein-level evidence of alternative isoforms, a critical first step in designing functional studies and eventually clinical assays.
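The idea of a tryptic isoform-specific peptide in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`tryptic_peptides`, `isoform_specific_peptides`), the toy sequences, and the length limits are all assumptions made for illustration. The digestion uses the simplified rule of cleaving after K or R unless the next residue is P, with no missed cleavages.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_len=7, max_len=30):
    """In silico trypsin digest: cleave after K or R unless followed by P.

    Simplified rule with no missed cleavages; the length limits are
    illustrative assumptions, not values taken from the paper.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def isoform_specific_peptides(isoforms):
    """Map each isoform name to the peptides found in that isoform only."""
    owners = defaultdict(set)
    for name, sequence in isoforms.items():
        for peptide in tryptic_peptides(sequence):
            owners[peptide].add(name)

    specific = {name: set() for name in isoforms}
    for peptide, names in owners.items():
        if len(names) == 1:  # peptide maps to exactly one isoform
            specific[next(iter(names))].add(peptide)
    return specific


# Hypothetical toy isoforms that differ only in their middle segment.
toy_isoforms = {
    "A": "MAGTSLLKDEFGHIKLMNPQSTVWYR",
    "B": "MAGTSLLKVLSEEDKLMNPQSTVWYR",
}
specific = isoform_specific_peptides(toy_isoforms)
for name in sorted(specific):
    print(name, sorted(specific[name]))
```

In this toy case the shared flanking peptides are discarded and only the middle peptide of each isoform is kept, which is the property that lets a peptide stand as evidence for one isoform rather than for the gene as a whole.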
Affiliation(s)
- Jennifer A. Korchak
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, United States
| | - Erin D. Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, United States
| | - Saikat Bandyopadhyay
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, United States
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia 22903, United States
| | - Ben T. Jordan
- Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21701, United States
| | - Micah D. Lehe
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, United States
| | - Emily F. Watts
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, United States
| | - Aidan Fenix
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington 98195, United States
| | - Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich (TUM), D-85354 Freising, Germany
| | - Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia 22903, United States
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia 22903, United States
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, Virginia 22903, United States
|
2
|
Tilliole P, Fix S, Godin JD. hnRNPs: roles in neurodevelopment and implication for brain disorders. Front Mol Neurosci 2024; 17:1411639. [PMID: 39086926 PMCID: PMC11288931 DOI: 10.3389/fnmol.2024.1411639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 06/17/2024] [Indexed: 08/02/2024] [Open Access]
Abstract
Heterogeneous nuclear ribonucleoproteins (hnRNPs) constitute a family of multifunctional RNA-binding proteins able to process nuclear pre-mRNAs into mature mRNAs and regulate gene expression in multiple ways. They comprise at least 20 different members in mammals, named from A (HNRNP A1) to U (HNRNP U). Many of these proteins are components of the spliceosome complex and can modulate alternative splicing in a tissue-specific manner. Notably, while genes encoding hnRNPs exhibit ubiquitous expression, increasing evidence associates these proteins with various neurodevelopmental and neurodegenerative disorders, such as intellectual disability, epilepsy, microcephaly, amyotrophic lateral sclerosis, or dementias, highlighting their crucial role in the central nervous system. This review explores the evolution of the hnRNP family, highlighting the emergence of numerous new members within this family, and sheds light on their implications for brain development.
Affiliation(s)
- Pierre Tilliole
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, IGBMC, Illkirch, France
- Centre National de la Recherche Scientifique, CNRS, UMR7104, Illkirch, France
- Institut National de la Santé et de la Recherche Médicale, INSERM, U1258, Illkirch, France
- Université de Strasbourg, Strasbourg, France
| | - Simon Fix
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, IGBMC, Illkirch, France
- Centre National de la Recherche Scientifique, CNRS, UMR7104, Illkirch, France
- Institut National de la Santé et de la Recherche Médicale, INSERM, U1258, Illkirch, France
- Université de Strasbourg, Strasbourg, France
| | - Juliette D. Godin
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, IGBMC, Illkirch, France
- Centre National de la Recherche Scientifique, CNRS, UMR7104, Illkirch, France
- Institut National de la Santé et de la Recherche Médicale, INSERM, U1258, Illkirch, France
- Université de Strasbourg, Strasbourg, France
|
3
|
Korchak JA, Jeffery ED, Bandyopadhyay S, Jordan BT, Lehe M, Watts EF, Fenix A, Wilhelm M, Sheynkman GM. IS-PRM-based peptide targeting informed by long-read sequencing for alternative proteome detection. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.01.587549. [PMID: 38617311 PMCID: PMC11014528 DOI: 10.1101/2024.04.01.587549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of pre-defined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (LR RNAseq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method, Tomahto. Using the model human stem cell line WTC11, LR RNAseq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into TMT-labeled WTC11 tryptic digest, predicted to contain corresponding endogenous "target" peptides. Compared to DDA mode, Tomahto increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms. This LR RNAseq-informed Tomahto targeted approach, called LRP-IS-PRM, is a new modality for generating protein-level evidence of alternative isoforms, a critical first step in designing functional studies and eventually clinical assays.
Affiliation(s)
- Jennifer A. Korchak
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
| | - Erin D. Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
| | - Saikat Bandyopadhyay
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Ben T. Jordan
- Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD USA
| | - Micah Lehe
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
| | - Emily F. Watts
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
| | - Aidan Fenix
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich (TUM), D-85354 Freising, Germany
| | - Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA, USA
|
4
|
Rogalska ME, Vivori C, Valcárcel J. Regulation of pre-mRNA splicing: roles in physiology and disease, and therapeutic prospects. Nat Rev Genet 2023; 24:251-269. [PMID: 36526860 DOI: 10.1038/s41576-022-00556-8] [Citation(s) in RCA: 106] [Impact Index Per Article: 53.0] [Accepted: 11/10/2022] [Indexed: 12/23/2022]
Abstract
The removal of introns from mRNA precursors and its regulation by alternative splicing are key for eukaryotic gene expression and cellular function, as evidenced by the numerous pathologies induced or modified by splicing alterations. Major recent advances have been made in understanding the structures and functions of the splicing machinery, in the description and classification of physiological and pathological isoforms and in the development of the first therapies for genetic diseases based on modulation of splicing. Here, we review this progress and discuss important remaining challenges, including predicting splice sites from genomic sequences, understanding the variety of molecular mechanisms and logic of splicing regulation, and harnessing this knowledge for probing gene function and disease aetiology and for the design of novel therapeutic approaches.
Affiliation(s)
- Malgorzata Ewa Rogalska
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Claudia Vivori
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- The Francis Crick Institute, London, UK
| | - Juan Valcárcel
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
5
|
Manuel JM, Guilloy N, Khatir I, Roucou X, Laurent B. Re-evaluating the impact of alternative RNA splicing on proteomic diversity. Front Genet 2023; 14:1089053. [PMID: 36845399 PMCID: PMC9947481 DOI: 10.3389/fgene.2023.1089053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] [Open Access]
Abstract
Alternative splicing (AS) constitutes a mechanism by which protein-coding genes and long non-coding RNA (lncRNA) genes produce more than a single mature transcript. From plants to humans, AS is a powerful process that increases transcriptome complexity. Importantly, splice variants produced from AS can potentially encode distinct protein isoforms which can lose or gain specific domains and, hence, differ in their functional properties. Advances in proteomics have shown that the proteome is indeed diverse due to the presence of numerous protein isoforms. Over the past decades, with the help of advanced high-throughput technologies, numerous alternatively spliced transcripts have been identified. However, the low detection rate of protein isoforms in proteomic studies has raised debate over whether AS contributes to proteomic diversity and over how many AS events are really functional. We propose here to assess and discuss the impact of AS on proteomic complexity in light of technological progress, updated genome annotation, and current scientific knowledge.
Affiliation(s)
- Jeru Manoj Manuel
- Research Center on Aging, Centre Intégré Universitaire de Santé et Services Sociaux de l’Estrie-Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, Canada
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Noé Guilloy
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Inès Khatir
- Research Center on Aging, Centre Intégré Universitaire de Santé et Services Sociaux de l’Estrie-Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, Canada
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Xavier Roucou
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC, Canada
- Quebec Network for Research on Protein Function Structure and Engineering, PROTEO, Québec, QC, Canada
| | - Benoit Laurent
- Research Center on Aging, Centre Intégré Universitaire de Santé et Services Sociaux de l’Estrie-Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, Canada
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
- *Correspondence: Benoit Laurent
|
6
|
Reixachs‐Solé M, Eyras E. Uncovering the impacts of alternative splicing on the proteome with current omics techniques. WILEY INTERDISCIPLINARY REVIEWS. RNA 2022; 13:e1707. [PMID: 34979593 PMCID: PMC9542554 DOI: 10.1002/wrna.1707] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/26/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/15/2022]
Abstract
The high-throughput sequencing of cellular RNAs has underscored a broad effect of isoform diversification through alternative splicing on the transcriptome. Moreover, the differential production of transcript isoforms from gene loci has been recognized as a critical mechanism in cell differentiation, organismal development, and disease. Yet, the extent of the impact of alternative splicing on protein production and cellular function remains a matter of debate. Multiple experimental and computational approaches have been developed in recent years to address this question. These studies have unveiled how molecular changes at different steps in the RNA processing pathway can lead to differences in protein production and have functional effects. New and emerging experimental technologies open exciting new opportunities to develop new methods to fully establish the connection between messenger RNA expression and protein production and to further investigate how RNA variation impacts the proteome and cell function. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing; Translation > Regulation; RNA Evolution and Genomics > Computational Analyses of RNA.
Affiliation(s)
- Marina Reixachs‐Solé
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
| | - Eduardo Eyras
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
|
7
|
Wang C, Chen L, Chen Y, Jia W, Cai X, Liu Y, Ji F, Xiong P, Liang A, Liu R, Guan Y, Cheng Z, Weng Y, Wang W, Duan Y, Kuang D, Xu S, Cai H, Xia Q, Yang D, Wang MW, Yang X, Zhang J, Cheng C, Liu L, Liu Z, Liang R, Wang G, Li Z, Xia H, Xia T. Abnormal global alternative RNA splicing in COVID-19 patients. PLoS Genet 2022; 18:e1010137. [PMID: 35421082 PMCID: PMC9089920 DOI: 10.1371/journal.pgen.1010137] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/07/2021] [Revised: 05/10/2022] [Accepted: 03/08/2022] [Indexed: 12/25/2022] [Open Access]
Abstract
Viral infections can alter host transcriptomes by manipulating host splicing machinery. Despite intensive transcriptomic studies on SARS-CoV-2, a systematic analysis of alternative splicing (AS) in severe COVID-19 patients remains largely elusive. Here we integrated proteomic and transcriptomic sequencing data to study AS changes in COVID-19 patients. We discovered that RNA splicing is among the major down-regulated proteomic signatures in COVID-19 patients. The transcriptome analysis showed that SARS-CoV-2 infection induces widespread dysregulation of transcript usage and expression, affecting blood coagulation, neutrophil activation, and cytokine production. Notably, CD74 and LRRFIP1 had increased skipping of an exon in COVID-19 patients that disrupts a functional domain, which correlated with reduced antiviral immunity. Furthermore, the dysregulation of transcripts was strongly correlated with clinical severity of COVID-19, and splice-variants may contribute to unexpected therapeutic activity. In summary, our data highlight that a better understanding of the AS landscape may aid in COVID-19 diagnosis and therapy.
Affiliation(s)
- Changli Wang
- Department of Pathology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Lijun Chen
- Department of Pathology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yaobin Chen
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Wenwen Jia
- Institute for Regenerative Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Xunhui Cai
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Yufeng Liu
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Fenghu Ji
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Peng Xiong
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Anyi Liang
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Ren Liu
- Department of Research and Development, Hugobiotech Co. Ltd., Beijing, China
| | - Yuanlin Guan
- Department of Research and Development, Hugobiotech Co. Ltd., Beijing, China
| | - Zhongyi Cheng
- Jingjie PTM BioLab (Hangzhou) Co. Ltd., Hangzhou, China
| | - Yejing Weng
- Jingjie PTM BioLab (Hangzhou) Co. Ltd., Hangzhou, China
| | - Weixin Wang
- Jingjie PTM BioLab (Hangzhou) Co. Ltd., Hangzhou, China
| | - Yaqi Duan
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Dong Kuang
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Sanpeng Xu
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Hanghang Cai
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Qin Xia
- Department of Pathology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Dehua Yang
- The National Center for Drug Screening, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Ming-Wei Wang
- The National Center for Drug Screening, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Xiangping Yang
- Department of Pathology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Jianjun Zhang
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
| | - Chao Cheng
- Department of Medicine, Baylor College of Medicine, Houston, Texas, United States of America
| | - Liang Liu
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Zhongmin Liu
- Institute for Regenerative Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Ren Liang
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Guopin Wang
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | | | - Han Xia
- Department of Research and Development, Hugobiotech Co. Ltd., Beijing, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Tian Xia
- Department of Pathology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
|
8
|
Pseudogenes: Four Decades of Discovery. Methods Mol Biol 2021. [PMID: 34165705 DOI: 10.1007/978-1-0716-1503-4_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/10/2023]
Abstract
A pseudogene is defined as a genomic DNA sequence that looks like a mutated or truncated version of a known functional gene. Nearly four decades since their first discovery it has been estimated that between ~12,000 and ~20,000 pseudogenes exist in the human genome. Early efforts to characterize functions for pseudogenes were unsuccessful, thus they were considered functionless relics of evolutionary selection, junk DNA or genetic fossils. Remarkably, an increasing number of pseudogenes have been reported to be expressed as RNA transcripts above and beyond levels considered accidental or spurious transcription. There is emerging evidence that some expressed pseudogene transcripts have biological functions and should be defined as a subclass of functional long noncoding RNAs (lncRNA). In this introductory chapter, I briefly summarize the history and the current knowledge of pseudogenes, and highlight the emerging functions of some pseudogenes in human biology and disease. This second iteration of Pseudogenes in Methods in Molecular Biology highlights new methodological approaches to investigate this intriguing family of lncRNAs and the extent of their biological function.
|
9
|
Martinez Gomez L, Pozo F, Walsh TA, Abascal F, Tress ML. The clinical importance of tandem exon duplication-derived substitutions. Nucleic Acids Res 2021; 49:8232-8246. [PMID: 34302486 PMCID: PMC8373072 DOI: 10.1093/nar/gkab623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/07/2021] [Accepted: 07/21/2021] [Indexed: 01/04/2023] [Open Access]
Abstract
Most coding genes in the human genome are annotated with multiple alternative transcripts. However, clear evidence for the functional relevance of the protein isoforms produced by these alternative transcripts is often hard to find. Alternative isoforms generated from tandem exon duplication-derived substitutions are an exception. These splice events are rare, but have important functional consequences. Here, we have catalogued the 236 tandem exon duplication-derived substitutions annotated in the GENCODE human reference set. We find that more than 90% of the events have a last common ancestor in teleost fish, so are at least 425 million years old, and twenty-one can be traced back to the Bilateria clade. Alternative isoforms generated from tandem exon duplication-derived substitutions also have significantly more clinical impact than other alternative isoforms. Tandem exon duplication-derived substitutions have >25 times as many pathogenic and likely pathogenic mutations as other alternative events. Tandem exon duplication-derived substitutions appear to have vital functional roles in the cell and may have played a prominent part in metazoan evolution.
Affiliation(s)
- Laura Martinez Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Eukaryotic Annotation Team, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
| | - Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
|
10
|
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genom Bioinform 2021; 3:lqab044. [PMID: 34046593 PMCID: PMC8140736 DOI: 10.1093/nargab/lqab044] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/27/2020] [Revised: 04/22/2021] [Accepted: 05/17/2021] [Indexed: 12/20/2022] [Open Access]
Abstract
Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
| | - Tomas Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Hinxton CB10 1SA, UK
| | - Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
11
|
Fingleton E, Li Y, Roche KW. Advances in Proteomics Allow Insights Into Neuronal Proteomes. Front Mol Neurosci 2021; 14:647451. [PMID: 33935646 PMCID: PMC8084103 DOI: 10.3389/fnmol.2021.647451] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/29/2020] [Accepted: 03/25/2021] [Indexed: 11/29/2022] [Open Access]
Abstract
Protein–protein interaction networks and signaling complexes are essential for normal brain function and are often dysregulated in neurological disorders. Nevertheless, unraveling neuron- and synapse-specific protein interaction networks has remained a technical challenge. New techniques, however, have allowed for high-resolution and high-throughput analyses, enabling quantification and characterization of various neuronal protein populations. Over the last decade, mass spectrometry (MS) has surfaced as the primary method for analyzing multiple protein samples in tandem, allowing for the precise quantification of proteomic data. Moreover, the development of sophisticated protein-labeling techniques has given MS a high temporal and spatial resolution, facilitating the analysis of various neuronal substructures, cell types, and subcellular compartments. Recent studies have leveraged these novel techniques to reveal the proteomic underpinnings of well-characterized neuronal processes, such as axon guidance, long-term potentiation, and homeostatic plasticity. Translational MS studies have facilitated a better understanding of complex neurological disorders, such as Alzheimer’s disease (AD), schizophrenia (SCZ), and autism spectrum disorder (ASD). Proteomic investigation of these diseases has not only given researchers new insight into disease mechanisms but has also been used to validate disease models and identify new targets for research.
Collapse
Affiliation(s)
- Erin Fingleton
  - National Institute of Neurological Disorders and Stroke (NINDS), Bethesda, MD, United States
- Yan Li
  - National Institute of Neurological Disorders and Stroke (NINDS), Bethesda, MD, United States
- Katherine W Roche
  - National Institute of Neurological Disorders and Stroke (NINDS), Bethesda, MD, United States
|
12
|
Zhang F, Deng CK, Wang M, Deng B, Barber R, Huang G. Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq. BMC Bioinformatics 2020; 21:541. [PMID: 33272210 PMCID: PMC7713335 DOI: 10.1186/s12859-020-03824-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/14/2020] [Accepted: 10/19/2020] [Indexed: 01/12/2023] Open
Abstract
Background: Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced, a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in proteomic technologies, such as liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS), have shown tremendous power for the parallel characterization of large numbers of proteins in biological samples. Although previous qualitative comparisons of proteomics and microarray data have generally found poor correspondence, significantly higher degrees of correlation have been observed at the exon level. Combining protein and RNA data by searching LC–MS/MS data against a customized protein database derived from RNA-Seq may produce a subset of alternatively spliced protein isoform candidates with higher confidence.
Results: We developed a bioinformatics workflow to discover alternative splicing biomarkers from LC–MS/MS data using RNA-Seq. First, we retrieved high-confidence, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences into in silico Isoform Junction Peptides and created a customized alternative splicing database for MS searching. Lastly, we ran the Open Mass Spectrometry Search Algorithm against the customized alternative splicing database with the breast cancer plasma proteome. Twenty-six alternative splicing biomarker peptides with one single intron event and one exon skipping event were identified. Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System and Hemostasis pathways, which are consistent with the 256 alternative splicing biomarkers from the RNA-Seq.
Conclusions: This paper presents a bioinformatics workflow for using RNA-seq data to discover novel alternative splicing biomarkers from the breast cancer proteome. As a complement to the synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms, mass spectrometry and next-generation sequencing, and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at an early stage of cancer.
Affiliation(s)
- Fan Zhang
  - Vermont Biomedical Research Network and Department of Biology, University of Vermont, Burlington, VT, 05405, USA
  - Institute for Translational Research and Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA
- Chris K Deng
  - School of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
- Mu Wang
  - Department of Biochemistry and Molecular Biology, IU School of Medicine, Indianapolis, IN, 46202, USA
  - Indiana Center for Systems Biology and Personalized Medicine, Indianapolis, IN, 46202, USA
- Bin Deng
  - Vermont Biomedical Research Network and Department of Biology, University of Vermont, Burlington, VT, 05405, USA
  - Institute for Translational Research and Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA
- Robert Barber
  - Department of Pharmacology and Neuroscience, University of North Texas Health Science Center, Fort Worth, TX, USA
- Gang Huang
  - Shanghai Key Laboratory for Molecular Imaging, Shanghai University of Medicine and Health Sciences, Shanghai, 201318, People's Republic of China
|
13
|
Christie AE, Rivera CD, Call CM, Dickinson PS, Stemmler EA, Hull JJ. Multiple transcriptome mining coupled with tissue specific molecular cloning and mass spectrometry provide insights into agatoxin-like peptide conservation in decapod crustaceans. Gen Comp Endocrinol 2020; 299:113609. [PMID: 32916171 PMCID: PMC7747469 DOI: 10.1016/j.ygcen.2020.113609] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/24/2020] [Revised: 08/25/2020] [Accepted: 08/29/2020] [Indexed: 12/16/2022]
Abstract
Over the past decade, in silico genome and transcriptome mining has led to the identification of many new crustacean peptide families, including the agatoxin-like peptides (ALPs), a group named for their structural similarity to agatoxin, a spider venom component. Here, analysis of publicly accessible transcriptomes was used to expand our understanding of crustacean ALPs. Specifically, transcriptome mining was used to investigate the phylogenetic/structural conservation, tissue localization, and putative functions of ALPs in decapod species. Transcripts encoding putative ALP precursors were identified from one or more members of the Penaeoidea (penaeid shrimp), Sergestoidea (sergestid shrimps), Caridea (caridean shrimp), Astacidea (clawed lobsters and freshwater crayfish), Achelata (spiny/slipper lobsters), and Brachyura (true crabs), suggesting a broad, and perhaps ubiquitous, conservation of ALPs in decapods. Comparison of the predicted mature structures of decapod ALPs revealed high levels of amino acid conservation, including eight identically conserved cysteine residues that presumably allow for the formation of four identically positioned disulfide bridges. All decapod ALPs are predicted to have amidated carboxyl-terminals. Two isoforms of ALP appear to be present in most decapod species, one 44 amino acids long and the other 42 amino acids in length, both likely generated by alternative splicing of a single gene. In carideans, a gene or terminal exon duplication appears to have occurred, with alternative splicing producing four ALPs, two 44 and two 42 amino acid isoforms. The identification of ALP precursor-encoding transcripts in nervous system-specific transcriptomes (e.g., Homarus americanus brain, eyestalk ganglia, and cardiac ganglion assemblies, finding confirmed using RT-PCR) suggests that members of this peptide family may serve as locally-released and/or hormonally-delivered neuromodulators in decapods. 
Their detection in testis- and hepatopancreas-specific transcriptomes suggests that members of the ALP family may also play roles in male reproduction and innate immunity/detoxification.
Affiliation(s)
- Andrew E Christie
  - Békésy Laboratory of Neurobiology, Pacific Biosciences Research Center, School of Ocean and Earth Science and Technology, University of Hawaii at Manoa, 1993 East-West Road, Honolulu, HI 96822, USA
- Cindy D Rivera
  - Department of Chemistry, Bowdoin College, 6600 College Station, Brunswick, ME 04011, USA
- Catherine M Call
  - Department of Chemistry, Bowdoin College, 6600 College Station, Brunswick, ME 04011, USA
- Patsy S Dickinson
  - Department of Biology, Bowdoin College, 6500 College Station, Brunswick, ME 04011, USA
- Elizabeth A Stemmler
  - Department of Chemistry, Bowdoin College, 6600 College Station, Brunswick, ME 04011, USA
- J Joe Hull
  - Pest Management and Biocontrol Research Unit, US Arid Land Agricultural Research Center, USDA Agricultural Research Services, Maricopa, AZ 85138, USA
|
14
|
Rodriguez JM, Pozo F, di Domenico T, Vazquez J, Tress ML. An analysis of tissue-specific alternative splicing at the protein level. PLoS Comput Biol 2020; 16:e1008287. [PMID: 33017396 PMCID: PMC7561204 DOI: 10.1371/journal.pcbi.1008287] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 02/10/2020] [Revised: 10/15/2020] [Accepted: 08/25/2020] [Indexed: 01/09/2023] Open
Abstract
The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than a third of these splice events were tissue-specific and most were ancient: over 95% of splice events that were tissue-specific in both proteomics and RNAseq analyses evolved prior to the ancestors of lobe-finned fish, at least 400 million years ago. By way of contrast, three in four alternative exons in the human gene set arose in the primate lineage, so our results cannot be extrapolated to the whole genome. Tissue-specific alternative protein forms in the proteomics analysis were particularly abundant in nervous and muscle tissues and their genes had roles related to the cytoskeleton and either the structure of muscle fibres or cell-cell connections. Our results suggest that this conserved tissue-specific alternative splicing may have played a role in the development of the vertebrate brain and heart. We manually curated a set of 255 splice events detected in a large-scale tissue-based proteomics experiment and found that more than a third had evidence of significant tissue-specific differences. Events that were significantly tissue-specific at the protein level were highly conserved; almost 75% evolved over 400 million years ago. The tissues in which we found most evidence for tissue-specific splicing were nervous tissues and cardiac tissues. 
Genes with tissue-specific events in these two tissues had functions related to important cellular structures in brain and heart tissues. These splice events may have been essential for the development of vertebrate heart and muscle. However, our data set may not be representative of alternative exons as a whole. We found that most tissue specific splicing was strongly conserved, but just 5% of annotated alternative exons in the human gene set are ancient. More than three quarters of alternative exons are primate-derived. Although the analysis does not provide a definitive answer to the question of the functional role of alternative splicing, our results do indicate that alternative splice variants may have played a significant part in the evolution of brain and heart tissues in vertebrates.
Affiliation(s)
- Jose Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Calle Melchor Fernandez, Madrid, Spain
- Fernando Pozo
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
- Tomas di Domenico
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
- Jesus Vazquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Calle Melchor Fernandez, Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Madrid, Spain
- Michael L. Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
  - * E-mail:
|
15
|
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022] Open
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Mark D'Souza
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
- Sheng Wang
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
- Sandhya Balasubramanian
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
- Prashanth Athri
  - Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
- Bingqing Xie
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- Stefan Canzar
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
  - Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gady Agam
  - Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- T Conrad Gilliam
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Natalia Maltsev
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
|
16
|
Chaudhary S, Khokhar W, Jabre I, Reddy ASN, Byrne LJ, Wilson CM, Syed NH. Alternative Splicing and Protein Diversity: Plants Versus Animals. FRONTIERS IN PLANT SCIENCE 2019; 10:708. [PMID: 31244866 PMCID: PMC6581706 DOI: 10.3389/fpls.2019.00708] [Citation(s) in RCA: 116] [Impact Index Per Article: 19.3] [Received: 03/08/2019] [Accepted: 05/13/2019] [Indexed: 05/11/2023]
Abstract
Plants, unlike animals, exhibit a very high degree of plasticity in their growth and development and employ diverse strategies to cope with the variations during diurnal cycles and stressful conditions. Plants and animals, despite their remarkable morphological and physiological differences, share many basic cellular processes and regulatory mechanisms. Alternative splicing (AS) is one such gene regulatory mechanism that modulates gene expression in multiple ways. It is now well established that AS is prevalent in all multicellular eukaryotes including plants and humans. Emerging evidence indicates that in plants, as in animals, transcription and splicing are coupled. Here, we reviewed recent evidence in support of co-transcriptional splicing in plants and highlighted similarities and differences between plants and humans. An unsettled question in the field of AS is the extent to which splice isoforms contribute to protein diversity. To take a critical look at this question, we presented a comprehensive summary of the current status of research in this area in both plants and humans, discussed limitations with the currently used approaches and suggested improvements to current methods and alternative approaches. We end with a discussion on the potential role of epigenetic modifications and chromatin state in splicing memory in plants primed with stresses.
Affiliation(s)
- Saurabh Chaudhary
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, United Kingdom
- Waqas Khokhar
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, United Kingdom
- Ibtissam Jabre
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, United Kingdom
- Anireddy S. N. Reddy
  - Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO, United States
- Lee J. Byrne
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, United Kingdom
- Cornelia M. Wilson
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, United Kingdom
- Naeem H. Syed
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, United Kingdom
  - *Correspondence: Naeem H. Syed
|
17
|
Romero MR, Pérez-Figueroa A, Carrera M, Swanson WJ, Skibinski DOF, Diz AP. RNA-seq coupled to proteomic analysis reveals high sperm proteome variation between two closely related marine mussel species. J Proteomics 2018; 192:169-187. [PMID: 30189323 DOI: 10.1016/j.jprot.2018.08.020] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/14/2018] [Revised: 08/10/2018] [Accepted: 08/31/2018] [Indexed: 12/12/2022]
Abstract
Speciation mechanisms in marine organisms have attracted great interest because of the apparent lack of substantial barriers to genetic exchange in marine ecosystems. Marine mussels of the Mytilus edulis species complex provide a good model to study mechanisms underlying species formation. They hybridise extensively at many localities and both pre- and postzygotic isolating mechanisms may be operating. Mussels have external fertilisation and sperm cells should show specific adaptations for survival and successful fertilisation. Sperm thus represent key targets in investigations of the molecular mechanisms underlying reproductive isolation. We undertook a deep transcriptome sequencing (RNA-seq) of mature male gonads and a 2DE/MS-based proteome analysis of sperm from Mytilus edulis and M. galloprovincialis raised in a common environment. We provide evidence of extensive expression differences between the two mussel species, and general agreement between the transcriptomic and proteomic results in the direction of expression differences between species. Differential expression is marked for mitochondrial genes and for those involved in spermatogenesis, sperm motility, sperm-egg interactions, the acrosome reaction, sperm capacitation, ATP reserves and ROS production. Proteins and their corresponding genes might thus be good targets in further genomic analysis of reproductive barriers between these closely related species. SIGNIFICANCE: Model systems for the study of fertilization include marine invertebrates with external fertilisation, such as abalones, sea urchins and mussels, because of the ease with which large quantities of gametes released into seawater can be collected after induced spawning. Unlike abalones and sea urchins, hybridisation has been reported between mussels of different Mytilus spp., which thus makes them very appealing for the study of reproductive isolation at both pre- and postzygotic levels. 
There is a lack of empirical proteomic studies on sperm samples comparing different Mytilus species, which could help to advance this study. A comparative analysis of sperm proteomes across different taxa may provide important insights into the fundamental molecular processes and mechanisms involved in reproductive isolation. It might also contribute to a better understanding of sperm function and of the adaptive evolution of sperm proteins in different taxa. There is now growing evidence from genomics studies that multiple protein complexes and many individual proteins might have important functions in sperm biology and the fertilisation process. From an applied perspective, the identification of sperm-specific proteins could also contribute to the improved understanding of fertility problems and as targets for fertility control.
Affiliation(s)
- Mónica R Romero
  - Department of Biochemistry, Genetics and Immunology, Faculty of Biology, University of Vigo, Vigo, Spain
  - Marine Research Centre, University of Vigo (CIM-UVIGO), Isla de Toralla, Vigo, Spain
- Andrés Pérez-Figueroa
  - Department of Biochemistry, Genetics and Immunology, Faculty of Biology, University of Vigo, Vigo, Spain
- Willie J Swanson
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, USA
- David O F Skibinski
  - Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK
- Angel P Diz
  - Department of Biochemistry, Genetics and Immunology, Faculty of Biology, University of Vigo, Vigo, Spain
  - Marine Research Centre, University of Vigo (CIM-UVIGO), Isla de Toralla, Vigo, Spain
|
18
|
Ramalho RF, Li S, Radivojac P, Hahn MW. Proteomic Evidence for In-Frame and Out-of-Frame Alternatively Spliced Isoforms in Human and Mouse. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1284-1289. [PMID: 26394435 DOI: 10.1109/tcbb.2015.2480068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
In order to find evidence for translation of alternatively spliced transcripts, especially those that result in a change in reading frame, we collected exon-skipping cases previously found by RNA-Seq and applied a computational approach to screen millions of mass spectra. These spectra came from seven human and six mouse tissues, five of which are the same between the two organisms: liver, kidney, lung, heart, and brain. Overall, we detected 4 percent of all exon-skipping events found in RNA-seq data, regardless of their effect on reading frame. The fraction of alternative isoforms detected did not differ between out-of-frame and in-frame events. Moreover, the fraction of identified alternative exon-exon junctions and constitutive junctions were similar. Together, our results suggest that both in-frame and out-of-frame translation may be actively used to regulate protein activity or localization.
|
19
|
Kiseleva OI, Lisitsa AV, Poverennaya EV. Proteoforms: Methods of Analysis and Clinical Prospects. Mol Biol 2018. [DOI: 10.1134/s0026893318030068] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
|
20
|
Hatje K, Rahman RU, Vidal RO, Simm D, Hammesfahr B, Bansal V, Rajput A, Mickael ME, Sun T, Bonn S, Kollmar M. The landscape of human mutually exclusive splicing. Mol Syst Biol 2017; 13:959. [PMID: 29242366 PMCID: PMC5740500 DOI: 10.15252/msb.20177728] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Indexed: 11/10/2022] Open
Abstract
Mutually exclusive splicing of exons is a mechanism of functional gene and protein diversification with pivotal roles in organismal development and diseases such as Timothy syndrome, cardiomyopathy and cancer in humans. In order to obtain a first genome-wide estimate of the extent and biological role of mutually exclusive splicing in humans, we predicted and subsequently validated mutually exclusive exons (MXEs) using 515 publicly available RNA‐Seq datasets. Here, we provide evidence for the expression of over 855 MXEs, 42% of which represent novel exons, increasing the annotated human mutually exclusive exome more than fivefold. The data provide strong evidence for the existence of large and multi‐cluster MXEs in higher vertebrates and offer new insights into MXE evolution. More than 82% of the MXE clusters are conserved in mammals, and five clusters have homologous clusters in Drosophila. Finally, MXEs are significantly enriched in pathogenic mutations and their spatio‐temporal expression might predict human disease pathology.
Affiliation(s)
- Klas Hatje
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
- Raza-Ur Rahman
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
  - Center for Molecular Neurobiology, Institute of Medical Systems Biology, University Clinic Hamburg-Eppendorf, Hamburg, Germany
- Ramon O Vidal
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
- Dominic Simm
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University, Göttingen, Germany
- Björn Hammesfahr
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
- Vikas Bansal
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
  - Center for Molecular Neurobiology, Institute of Medical Systems Biology, University Clinic Hamburg-Eppendorf, Hamburg, Germany
- Ashish Rajput
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
  - Center for Molecular Neurobiology, Institute of Medical Systems Biology, University Clinic Hamburg-Eppendorf, Hamburg, Germany
- Michel Edwar Mickael
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
  - Center for Molecular Neurobiology, Institute of Medical Systems Biology, University Clinic Hamburg-Eppendorf, Hamburg, Germany
- Ting Sun
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
  - Center for Molecular Neurobiology, Institute of Medical Systems Biology, University Clinic Hamburg-Eppendorf, Hamburg, Germany
- Stefan Bonn
  - Group of Computational Systems Biology, German Center for Neurodegenerative Diseases, Göttingen, Germany
  - Center for Molecular Neurobiology, Institute of Medical Systems Biology, University Clinic Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Neurodegenerative Diseases, Tübingen, Germany
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
|
21
|
Yefremova Y, Danquah BD, Opuni KF, El-Kased R, Koy C, Glocker MO. Mass spectrometric characterization of protein structures and protein complexes in condensed and gas phase. EUROPEAN JOURNAL OF MASS SPECTROMETRY (CHICHESTER, ENGLAND) 2017; 23:445-459. [PMID: 29183193 DOI: 10.1177/1469066717722256] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/07/2023]
Abstract
Proteins are essential for almost all physiological processes of life. They serve a myriad of functions which are as varied as their unique amino acid sequences and their corresponding three-dimensional structures. To fulfill their tasks, most proteins depend on stable physical associations, in the form of protein complexes that evolved between themselves and other proteins. In solution (condensed phase), proteins and/or protein complexes are in constant energy exchange with the surrounding solvent. Albeit methods to describe in-solution thermodynamic properties of proteins and of protein complexes are well established and broadly applied, they do not provide a broad enough access to life-science experimentalists to study all their proteins' properties at leisure. This leaves great desire to add novel methods to the analytical biochemist's toolbox. The development of electrospray ionization created the opportunity to characterize protein higher order structures and protein complexes rather elegantly by simultaneously lessening the need of sophisticated sample preparation steps. Electrospray mass spectrometry enabled us to translate proteins and protein complexes very efficiently into the gas phase under mild conditions, retaining both, intact protein complexes, and gross protein structures upon phase transition. Moreover, in the environment of the mass spectrometer (gas phase, in vacuo), analyte molecules are free of interactions with surrounding solvent molecules and, therefore, the energy of inter- and intramolecular forces can be studied independently from interference of the solvating environment. Provided that gas phase methods can give information which is relevant for understanding in-solution processes, gas phase protein structure studies and/or investigations on the characterization of protein complexes has rapidly gained more and more attention from the bioanalytical scientific community. 
Recent reports have shown that electrospray mass spectrometry provides direct access to six prime protein complex properties: stabilities, compositions, binding surfaces (epitopes), disassembly processes, stoichiometries, and thermodynamic parameters.
Affiliation(s)
- Yelena Yefremova
  - 1 Proteome Center Rostock, University of Rostock, Rostock, Germany
- Bright D Danquah
  - 1 Proteome Center Rostock, University of Rostock, Rostock, Germany
- Reham El-Kased
  - 3 Microbiology and Immunology, Faculty of Pharmacy, The British University in Egypt, Cairo, Egypt
- Cornelia Koy
  - 1 Proteome Center Rostock, University of Rostock, Rostock, Germany
|
22
|
Saudemont B, Popa A, Parmley JL, Rocher V, Blugeon C, Necsulea A, Meyer E, Duret L. The fitness cost of mis-splicing is the main determinant of alternative splicing patterns. Genome Biol 2017; 18:208. [PMID: 29084568 PMCID: PMC5663052 DOI: 10.1186/s13059-017-1344-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 04/03/2017] [Accepted: 10/09/2017] [Indexed: 12/19/2022]
Abstract
BACKGROUND Most eukaryotic genes are subject to alternative splicing (AS), which may contribute to the production of protein variants or to the regulation of gene expression via nonsense-mediated messenger RNA (mRNA) decay (NMD). However, a fraction of splice variants might correspond to spurious transcripts and the question of the relative proportion of splicing errors to functional splice variants remains highly debated. RESULTS We propose a test to quantify the fraction of AS events corresponding to errors. This test is based on the fact that the fitness cost of splicing errors increases with the number of introns in a gene and with expression level. We analyzed the transcriptome of the intron-rich eukaryote Paramecium tetraurelia. We show that in both normal and in NMD-deficient cells, AS rates strongly decrease with increasing expression level and with increasing number of introns. This relationship is observed for AS events that are detectable by NMD as well as for those that are not, which invalidates the hypothesis of a link with the regulation of gene expression. Our results show that in genes with a median expression level, 92-98% of observed splice variants correspond to errors. We observed the same patterns in human transcriptomes and we further show that AS rates correlate with the fitness cost of splicing errors. CONCLUSIONS These observations indicate that genes under weaker selective pressure accumulate more maladaptive substitutions and are more prone to splicing errors. Thus, to a large extent, patterns of gene expression variants simply reflect the balance between selection, mutation, and drift.
Affiliation(s)
- Baptiste Saudemont
  - Institut de Biologie de l’Ecole Normale Supérieure (IBENS), CNRS, Inserm, Ecole Normale Supérieure, PSL Research University, F-75005 Paris, France
  - (Epi)genomics of Animal Development Unit, Department of Developmental and Stem Cell Biology, Institut Pasteur, 75015 Paris, France
- Alexandra Popa
  - Université de Lyon, Université Claude Bernard, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69100 Villeurbanne, France
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14 AKH BT25.3, 1090 Vienna, Austria
- Joanna L. Parmley
  - Université de Lyon, Université Claude Bernard, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69100 Villeurbanne, France
  - General Bioinformatics, Reading Enterprise Centre, The University of Reading, Whiteknights Road, Reading, RG6 6BU UK
- Vincent Rocher
  - Université de Lyon, Université Claude Bernard, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69100 Villeurbanne, France
- Corinne Blugeon
  - Institut de Biologie de l’Ecole Normale Supérieure (IBENS), CNRS, Inserm, Ecole Normale Supérieure, PSL Research University, F-75005 Paris, France
- Anamaria Necsulea
  - Université de Lyon, Université Claude Bernard, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69100 Villeurbanne, France
- Eric Meyer
  - Institut de Biologie de l’Ecole Normale Supérieure (IBENS), CNRS, Inserm, Ecole Normale Supérieure, PSL Research University, F-75005 Paris, France
- Laurent Duret
  - Université de Lyon, Université Claude Bernard, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69100 Villeurbanne, France

23
Ranwez V, Serra A, Pot D, Chantret N. Domestication reduces alternative splicing expression variations in sorghum. PLoS One 2017; 12:e0183454. [PMID: 28886042 PMCID: PMC5590825 DOI: 10.1371/journal.pone.0183454] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/19/2017] [Accepted: 08/06/2017] [Indexed: 01/09/2023]
Abstract
Domestication is known to strongly reduce genomic diversity through population bottlenecks. The resulting loss of polymorphism has been thoroughly documented in numerous cultivated species. Here we investigate the impact of domestication on the diversity of alternative transcript expression using RNA-seq data obtained from cultivated and wild sorghum accessions (ten accessions for each pool). To that end, we focus on genes expressing two isoforms in sorghum and estimate the ratio between the expression levels of those isoforms in each accession. Notably, for a given gene, one isoform can be either overexpressed or underexpressed in some wild accessions, whereas in the cultivated accessions, the balance between the two isoforms of the same gene appears to be much more homogeneous. Indeed, we observe in sorghum significantly more variation in isoform expression balance among wild accessions than among domesticated accessions. It is possible that the loss of nucleotide diversity due to domestication affects regulatory elements controlling transcription or degradation of these isoforms. The impact on the isoform expression balance is discussed. As far as we know, this is the first time that the impact of domestication on transcript isoform balance has been studied at the genomic scale. This could pave the way towards the identification of key domestication genes whose isoform expression is finely tuned in domesticated accessions but highly variable in their wild relatives.
Affiliation(s)
- Audrey Serra
  - Montpellier SupAgro, UMR AGAP, Montpellier, France
- David Pot
  - CIRAD, UMR AGAP, Montpellier, France

24
Liu Y, Gonzàlez-Porta M, Santos S, Brazma A, Marioni JC, Aebersold R, Venkitaraman AR, Wickramasinghe VO. Impact of Alternative Splicing on the Human Proteome. Cell Rep 2017; 20:1229-1241. [PMID: 28768205 PMCID: PMC5554779 DOI: 10.1016/j.celrep.2017.07.025] [Citation(s) in RCA: 124] [Impact Index Per Article: 15.5] [Received: 04/21/2017] [Revised: 06/02/2017] [Accepted: 07/12/2017] [Indexed: 02/02/2023]
Abstract
Alternative splicing is a critical determinant of genome complexity and, by implication, is assumed to engender proteomic diversity. This notion has not been experimentally tested in a targeted, quantitative manner. Here, we have developed an integrative approach to ask whether perturbations in mRNA splicing patterns alter the composition of the proteome. We integrate RNA sequencing (RNA-seq) (to comprehensively report intron retention, differential transcript usage, and gene expression) with a data-independent acquisition (DIA) method, SWATH-MS (sequential window acquisition of all theoretical spectra-mass spectrometry), to capture an unbiased, quantitative snapshot of the impact of constitutive and alternative splicing events on the proteome. Whereas intron retention is accompanied by decreased protein abundance, alterations in differential transcript usage and gene expression alter protein abundance proportionate to transcript levels. Our findings illustrate how RNA splicing links isoform expression in the human transcriptome with proteomic diversity and provides a foundation for studying perturbations associated with human diseases.
Affiliation(s)
- Yansheng Liu
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Mar Gonzàlez-Porta
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Sergio Santos
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alvis Brazma
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- John C Marioni
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ashok R Venkitaraman
  - The Medical Research Council Cancer Unit, University of Cambridge, Cambridge CB2 0XZ, UK
- Vihandha O Wickramasinghe
  - The Medical Research Council Cancer Unit, University of Cambridge, Cambridge CB2 0XZ, UK; RNA Biology and Cancer Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC 3000, Australia

25
Fernández-Moya SM, Ehses J, Kiebler MA. The alternative life of RNA-sequencing meets single molecule approaches. FEBS Lett 2017; 591:1455-1470. [PMID: 28369835 DOI: 10.1002/1873-3468.12639] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/22/2017] [Revised: 03/15/2017] [Accepted: 03/24/2017] [Indexed: 12/31/2022]
Abstract
The central dogma of RNA processing has started to totter. Single genes produce a variety of mRNA isoforms by mRNA modification, alternative polyadenylation (APA), and splicing. Different isoforms, even those that code for the identical protein, may differ in function or spatiotemporal expression. One option of how this can be achieved is by the selective recruitment of trans-acting factors to the 3'-untranslated region of a given isoform. Recent innovations in high-throughput RNA-sequencing methods allow deep insight into global RNA regulation, whereas novel imaging-based technologies enable researchers to explore single RNA molecules during different stages of development, in different tissues and different compartments of the cell. Resolving the dynamic function of ribonucleoprotein particles in splicing, APA, or RNA modification will enable us to understand their contribution to pathological conditions.
Affiliation(s)
- Janina Ehses
  - BioMedical Center, Ludwig Maximilians University, Planegg-Martinsried, Germany
- Michael A Kiebler
  - BioMedical Center, Ludwig Maximilians University, Planegg-Martinsried, Germany

26
Will T, Helms V. Rewiring of the inferred protein interactome during blood development studied with the tool PPICompare. BMC Syst Biol 2017; 11:44. [PMID: 28376810 PMCID: PMC5379774 DOI: 10.1186/s12918-017-0400-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/05/2016] [Accepted: 01/26/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND Differential analysis of cellular conditions is a key approach towards understanding the consequences and driving causes behind biological processes such as developmental transitions or diseases. Progress in whole-genome expression profiling has made it possible to conveniently capture the state of a cell's transcriptome and to detect the characteristic features that distinguish cells in specific conditions. In contrast, mapping the physical protein interactome for many samples is experimentally infeasible at the moment. For the understanding of the whole system, however, it is equally important to know how the interactions of proteins are rewired between cellular states. To overcome this deficiency, we recently showed how condition-specific protein interaction networks that even consider alternative splicing can be inferred from transcript expression data. Here, we present the differential network analysis tool PPICompare, which was specifically designed for isoform-sensitive protein interaction networks. RESULTS Besides detecting significant rewiring events between the interactomes of grouped samples, PPICompare infers which alterations to the transcriptome caused each rewiring event and the minimal set of alterations necessary to explain all between-group changes. When we applied the tool to the development of blood cells, we verified that it reported a reasonable number of rewiring events and found that differential gene expression was the major determinant of cellular adjustments to the interactome. Alternative splicing events were consistently necessary in each developmental step to explain all significant alterations and were especially important for rewiring in the context of transcriptional control. CONCLUSIONS Applying PPICompare enabled us to investigate the dynamics of the human protein interactome during developmental transitions.
A platform-independent implementation of the tool PPICompare is available at https://sourceforge.net/projects/ppicompare/ .
Affiliation(s)
- Thorsten Will
  - Center for Bioinformatics, Saarland University, Campus E2.1, Saarbrücken, 66123 Germany
  - Graduate School of Computer Science, Saarland University, Campus E1.3, Saarbrücken, 66123 Germany
- Volkhard Helms
  - Center for Bioinformatics, Saarland University, Campus E2.1, Saarbrücken, 66123 Germany

27
A Golden Age for Working with Public Proteomics Data. Trends Biochem Sci 2017; 42:333-341. [PMID: 28118949 PMCID: PMC5414595 DOI: 10.1016/j.tibs.2017.01.001] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 10/02/2016] [Revised: 12/13/2016] [Accepted: 01/02/2017] [Indexed: 11/23/2022]
Abstract
Data sharing in mass spectrometry (MS)-based proteomics is becoming a common scientific practice, as is now common in the case of other, more mature ‘omics’ disciplines like genomics and transcriptomics. We want to highlight that this situation, unprecedented in the field, opens a plethora of opportunities for data scientists. First, we explain in some detail some of the work already achieved, such as systematic reanalysis efforts. We also explain existing applications of public proteomics data, such as proteogenomics and the creation of spectral libraries and spectral archives. Finally, we discuss the main existing challenges and mention the first attempts to combine public proteomics data with other types of omics data sets. The field of proteomics has matured and diversified substantially over the past 10 years. Proteomics data are increasingly shared through centralized, public repositories. Standardization efforts have ensured that a large proportion of these public data can be read and processed by any interested researcher. Because any proteomics data set is only partially understood, there is great opportunity for (orthogonal) reuse of public data. While public proteomics data has so far remained outside ethics and privacy discussions, recent work indicates that there is an inherent risk.

28
Tress ML, Abascal F, Valencia A. Alternative Splicing May Not Be the Key to Proteome Complexity. Trends Biochem Sci 2016; 42:98-110. [PMID: 27712956 DOI: 10.1016/j.tibs.2016.08.008] [Citation(s) in RCA: 231] [Impact Index Per Article: 25.7] [Received: 03/16/2016] [Revised: 05/19/2016] [Accepted: 08/15/2016] [Indexed: 12/21/2022]
Abstract
Alternative splicing is commonly believed to be a major source of cellular protein diversity. However, although many thousands of alternatively spliced transcripts are routinely detected in RNA-seq studies, reliable large-scale mass spectrometry-based proteomics analyses identify only a small fraction of annotated alternative isoforms. The clearest finding from proteomics experiments is that most human genes have a single main protein isoform, while those alternative isoforms that are identified tend to be the most biologically plausible: those with the most cross-species conservation and those that do not compromise functional domains. Indeed, most alternative exons do not seem to be under selective pressure, suggesting that a large majority of predicted alternative transcripts may not even be translated into proteins.
Affiliation(s)
- Michael L Tress
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain
- Federico Abascal
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain; Human Genetics Department, Sandhu Group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Alfonso Valencia
  - Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain; National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain

29
Vaudel M, Verheggen K, Csordas A, Raeder H, Berven FS, Martens L, Vizcaíno JA, Barsnes H. Exploring the potential of public proteomics data. Proteomics 2016; 16:214-25. [PMID: 26449181 PMCID: PMC4738454 DOI: 10.1002/pmic.201500295] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 07/13/2015] [Revised: 08/25/2015] [Accepted: 09/28/2015] [Indexed: 12/22/2022]
Abstract
In a global effort for scientific transparency, it has become feasible and good practice to share experimental data supporting novel findings. Consequently, the amount of publicly available MS-based proteomics data has grown substantially in recent years. With some notable exceptions, this extensive material has however largely been left untouched. The time has now come for the proteomics community to utilize this potential gold mine for new discoveries, and uncover its untapped potential. In this review, we provide a brief history of the sharing of proteomics data, showing ways in which publicly available proteomics data are already being (re-)used, and outline potential future opportunities based on four different usage types: use, reuse, reprocess, and repurpose. We thus aim to assist the proteomics community in stepping up to the challenge, and to make the most of the rapidly increasing amount of public proteomics data.
Affiliation(s)
- Marc Vaudel
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Kenneth Verheggen
  - Medical Biotechnology Center, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Attila Csordas
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Helge Raeder
  - Department of Clinical Science, KG Jebsen Center for Diabetes Research, University of Bergen, Bergen, Norway
- Frode S Berven
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  - Department of Clinical Medicine, KG Jebsen Centre for Multiple Sclerosis Research, University of Bergen, Bergen, Norway
- Lennart Martens
  - Medical Biotechnology Center, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Juan A Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Harald Barsnes
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  - Department of Clinical Science, KG Jebsen Center for Diabetes Research, University of Bergen, Bergen, Norway

30
Will T, Helms V. PPIXpress: construction of condition-specific protein interaction networks based on transcript expression. Bioinformatics 2015; 32:571-8. [PMID: 26508756 DOI: 10.1093/bioinformatics/btv620] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/05/2015] [Accepted: 10/20/2015] [Indexed: 12/13/2022]
Abstract
Protein-protein interaction networks are an important component of modern systems biology. Yet, comparatively few efforts have been made to tailor their topology to the actual cellular condition being studied. Here, we present a network construction method that exploits expression data at the transcript-level and thus reveals alterations in protein connectivity not only caused by differential gene expression but also by alternative splicing. We achieved this by establishing a direct correspondence between individual protein interactions and underlying domain interactions in a complete but condition-unspecific protein interaction network. This knowledge was then used to infer the condition-specific presence of interactions from the dominant protein isoforms. When we compared contextualized interaction networks of matched normal and tumor samples in breast cancer, our transcript-based construction identified more significant alterations that affected proteins associated with cancerogenesis than a method that only uses gene expression data. The approach is provided as the user-friendly tool PPIXpress. AVAILABILITY AND IMPLEMENTATION PPIXpress is available at https://sourceforge.net/projects/ppixpress/.
Affiliation(s)
- Thorsten Will
  - Center for Bioinformatics and Graduate School of Computer Science, Saarland University, Saarbrücken, Germany

31
Zhang M, Sun F, Chen F, Zhou B, Duan Y, Su H, Lin X. Subcellular proteomic approach for identifying the signaling effectors of protein kinase C-β₂ under high glucose conditions in human umbilical vein endothelial cells. Mol Med Rep 2015; 12:7247-62. [PMID: 26459836 PMCID: PMC4626174 DOI: 10.3892/mmr.2015.4403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2014] [Accepted: 08/05/2015] [Indexed: 11/06/2022]
Abstract
The high glucose‑induced activation of protein kinase C‑β2 (PKC‑β2) has an essential role in the pathophysiology of diabetes‑associated vascular disease. In the present study, human umbilical vein endothelial cells (HUVECs) were cultured in high and normal glucose conditions prior to being infected with a recombinant adenovirus to induce the overexpression of PKC‑β2. The activity of PKC‑β2 was also decreased using a selective PKC‑β2 inhibitor. A series of two‑dimensional electrophoresis images detected ~800 spots in the nuclei, and ~600 spots in the cytosol. Following intra‑ and inter‑group cross‑matching, 38 significantly altered spots were identified as high glucose‑induced and PKC‑β2‑associated nuclear proteins. In addition to the observation that the regulation of key proteins involved in the nuclear factor (NF)‑κB signaling cascade occurred in the cytosol, various transcription factors, including peroxisome proliferator‑activated receptor δ (PPAR‑δ), were also altered in the nuclei. A human protein‑protein interaction network of potential connections of PKC‑β2‑associated proteins was constructed in the proteomics investigation using Biological General Repository for Interaction Datasets. The results indicated that PKC‑β2 may be involved in high glucose‑induced glucose and lipid crosstalk by regulating PPAR‑δ. In addition, NF‑κB inhibitor‑interacting Ras‑like protein 1 may be important in the PKC‑β2‑NF‑κB inhibitor‑NF‑κB signaling pathway in HUVECs under high‑glucose conditions.
Affiliation(s)
- Min Zhang
  - Department of Endocrinology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China
- Fang Sun
  - Department of Hypertension and Endocrinology, Daping Hospital, Third Military Medical University, Chongqing 400042, P.R. China
- Fangfang Chen
  - Department of Endocrinology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China
- Bo Zhou
  - Department of Endocrinology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China
- Yaqian Duan
  - Department of Endocrinology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China
- Hong Su
  - Department of Endocrinology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China
- Xuebo Lin
  - Department of Endocrinology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China

32
Lees JG, Ranea JA, Orengo CA. Identifying and characterising key alternative splicing events in Drosophila development. BMC Genomics 2015; 16:608. [PMID: 26275604 PMCID: PMC4537583 DOI: 10.1186/s12864-015-1674-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/23/2015] [Accepted: 05/29/2015] [Indexed: 12/26/2022]
Abstract
BACKGROUND In complex Metazoans a given gene frequently codes for multiple protein isoforms, through processes such as alternative splicing. Large scale functional annotation of these isoforms is a key challenge for functional genomics. This annotation gap is increasing with the large numbers of multi transcript genes being identified by technologies such as RNASeq. Furthermore attempts to characterise the functions of splicing in an organism are complicated by the difficulty in distinguishing functional isoforms from those produced by splicing errors or transcription noise. Tools to help prioritise candidate isoforms for testing are largely absent. RESULTS In this study we implement a Time-course Switch (TS) score for ranking isoforms by their likelihood of producing additional functions based on their developmental expression profiles, as reported by modENCODE. The TS score allows us to better investigate functional roles of different isoforms expressed in multi transcript genes. From this analysis, we find that isoforms with high TS scores have sequence feature changes consistent with more deterministic splicing and functional changes and tend to gain domains or whole exons which could carry additional functions. Furthermore these functions appear to be particularly important for essential regulatory roles, establishing functional isoform switching as key for regulatory processes. Based on the TS score we develop a Transcript Annotations Pipeline for Alternative Splicing (TAPAS) that identifies functional neighbourhoods of potentially interesting isoforms. CONCLUSIONS We have identified a subset of protein isoforms which appear to have high functional significance, particularly in regulation. This has been made possible through the development of novel methods that make use of transcript expression profiles. 
The methods and analyses we present here represent important first steps in the development of tools to address the near complete lack of isoform specific function annotation. In turn the tools allow us to better characterise the regulatory functions of alternative splicing in more detail.
Affiliation(s)
- Jonathan G Lees
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK
- Juan A Ranea
  - Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, 29071, Spain
- Christine A Orengo
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK

33
Hao Y, Colak R, Teyra J, Corbi-Verge C, Ignatchenko A, Hahne H, Wilhelm M, Kuster B, Braun P, Kaida D, Kislinger T, Kim PM. Semi-supervised Learning Predicts Approximately One Third of the Alternative Splicing Isoforms as Functional Proteins. Cell Rep 2015; 12:183-9. [PMID: 26146086 DOI: 10.1016/j.celrep.2015.06.031] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/27/2014] [Revised: 02/18/2015] [Accepted: 06/09/2015] [Indexed: 12/30/2022]
Abstract
Alternative splicing acts on transcripts from almost all human multi-exon genes. Notwithstanding its ubiquity, fundamental ramifications of splicing on protein expression remain unresolved. The number and identity of spliced transcripts that form stably folded proteins remain the sources of considerable debate, due largely to low coverage of experimental methods and the resulting absence of negative data. We circumvent this issue by developing a semi-supervised learning algorithm, positive unlabeled learning for splicing elucidation (PULSE; http://www.kimlab.org/software/pulse), which uses 48 features spanning various categories. We validated its accuracy on sets of bona fide protein isoforms and directly on mass spectrometry (MS) spectra for an overall AU-ROC of 0.85. We predict that around 32% of "exon skipping" alternative splicing events produce stable proteins, suggesting that the process engenders a significant number of previously uncharacterized proteins. We also provide insights into the distribution of positive isoforms in various functional classes and into the structural effects of alternative splicing.
Collapse
Affiliation(s)
- Yanqi Hao
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada
| | - Recep Colak
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada
| | - Joan Teyra
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada
| | - Carles Corbi-Verge
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada
| | - Alexander Ignatchenko
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada
| | - Hannes Hahne
- Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany
| | - Mathias Wilhelm
- Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany
| | - Bernhard Kuster
- Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany; German Cancer Consortium (DKTK), Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Center for Integrated Protein Science Munich, Munich, Germany; Bavarian Biomolecular Mass Spectrometry Center, Technische Universität München, Freising, Germany
| | - Pascal Braun
- Lehrstuhl fuer Systembiologie der Pflanzen, TU Muenchen, Munich, Germany
| | - Daisuke Kaida
- Frontier Research Core for Life Sciences, University of Toyama, Toyama 930-8555, Japan
| | - Thomas Kislinger
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Princess Margaret Cancer Center, University Health Network, Toronto, ON M5T 2M9, Canada
| | - Philip M Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1AS, Canada.
|
34
|
Abascal F, Ezkurdia I, Rodriguez-Rivas J, Rodriguez JM, del Pozo A, Vázquez J, Valencia A, Tress ML. Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level. PLoS Comput Biol 2015; 11:e1004325. [PMID: 26061177 PMCID: PMC4465641 DOI: 10.1371/journal.pcbi.1004325] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 12/16/2014] [Accepted: 05/08/2015] [Indexed: 11/19/2022]
Abstract
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. This data suggests that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved—all the homologous exons we identified evolved over 460 million years ago—and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles. 
Alternative splicing is thought to be one means for generating the protein diversity necessary for the whole range of cellular functions. While the presence of alternatively spliced transcripts in the cell has been amply demonstrated, the same cannot be said for alternatively spliced proteins. The quest for alternative protein isoforms has focused primarily on the analysis of peptides from large-scale mass spectroscopy experiments, but evidence for alternative isoforms has been patchy and contradictory. A careful analysis of the peptide evidence is needed to fully understand the scale of alternative splicing detectable at the protein level. Here we analysed peptides from eight large-scale data sets, identifying just 282 splice events among 12,716 genes. This suggests that most genes have a single dominant isoform. Many of the alternative isoforms that we identified were only subtly different from the main splice variant, and one in five was generated by substitution of homologous exons by swapping one related exon for another. Remarkably, the alternative isoforms generated from homologous exons were highly conserved, first appearing 460 million years ago, and several appear to have tissue-specific roles in the brain and heart. Our results suggest that these particular isoforms are likely to have important cellular roles.
Affiliation(s)
- Federico Abascal
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Iakes Ezkurdia
- Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
| | - Juan Rodriguez-Rivas
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Jose Manuel Rodriguez
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Angela del Pozo
- Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
| | - Jesús Vázquez
- Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
| | - Alfonso Valencia
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Michael L. Tress
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
35
|
Rodriguez JM, Carro A, Valencia A, Tress ML. APPRIS WebServer and WebServices. Nucleic Acids Res 2015; 43:W455-9. [PMID: 25990727 PMCID: PMC4489225 DOI: 10.1093/nar/gkv512] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/27/2015] [Accepted: 05/05/2015] [Indexed: 01/08/2023]
Abstract
This paper introduces the APPRIS WebServer (http://appris.bioinfo.cnio.es) and WebServices (http://apprisws.bioinfo.cnio.es). Both the web servers and the web services are based around the APPRIS Database, a database that presently houses annotations of splice isoforms for five different vertebrate genomes. The APPRIS WebServer and WebServices provide access to the computational methods implemented in the APPRIS Database, while the APPRIS WebServices also allows retrieval of the annotations. The APPRIS WebServer and WebServices annotate splice isoforms with protein structural and functional features, and with data from cross-species alignments. In addition they can use the annotations of structure, function and conservation to select a single reference isoform for each protein-coding gene (the principal protein isoform). APPRIS principal isoforms have been shown to agree overwhelmingly with the main protein isoform detected in proteomics experiments. The APPRIS WebServer allows for the annotation of splice isoforms for individual genes, and provides a range of visual representations and tools to allow researchers to identify the likely effect of splicing events. The APPRIS WebServices permit users to generate annotations automatically in high throughput mode and to interrogate the annotations in the APPRIS Database. The APPRIS WebServices have been implemented using REST architecture to be flexible, modular and automatic.
Affiliation(s)
- Jose Manuel Rodriguez
- Spanish National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Angel Carro
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Alfonso Valencia
- Spanish National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain; Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
|
36
|
Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, Young TR, Goldmann JM, Pervouchine DD, Sullivan TJ, Johnson R, Segrè AV, Djebali S, Niarchou A, Wright FA, Lappalainen T, Calvo M, Getz G, Dermitzakis ET, Ardlie KG, Guigó R. Human genomics. The human transcriptome across tissues and individuals. Science 2015; 348:660-5. [PMID: 25954002 PMCID: PMC4547472 DOI: 10.1126/science.aaa0355] [Citation(s) in RCA: 911] [Impact Index Per Article: 91.1] [Indexed: 12/17/2022]
Abstract
Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by the Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of genes (most clearly seen in blood), though few genes are exclusive to a particular tissue, and the signatures vary more across tissues than across individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role, except in the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may in contrast play a comparatively greater role in defining individual phenotypes.
Affiliation(s)
- Marta Melé
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Harvard Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
| | - Pedro G Ferreira
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Ferran Reverter
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Facultat de Biologia, Universitat de Barcelona (UB), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | | | - Jean Monlong
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. McGill University, Montreal, Canada
| | - Michael Sammeth
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. National Institute for Scientific Computing (LNCC), Petropolis, Rio de Janeiro, Brazil
| | | | - Jakob M Goldmann
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Radboud University, Nijmegen, Netherlands
| | - Dmitri D Pervouchine
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskie Gory 1-73, 119992 Moscow, Russia
| | | | - Rory Johnson
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | | | - Sarah Djebali
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Anastasia Niarchou
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
| | | | - Tuuli Lappalainen
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland. New York Genome Center, New York, NY, USA. Department of Systems Biology, Columbia University, New York, NY, USA
| | - Miquel Calvo
- Facultat de Biologia, Universitat de Barcelona (UB), Barcelona, Catalonia, Spain
| | - Gad Getz
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
| | | | - Roderic Guigó
- Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Barcelona, Catalonia, Spain. Joint CRG-Barcelona Super Computing Center (BSC)-Institut de Recerca Biomedica (IRB) Program in Computational Biology, Barcelona, Catalonia, Spain.
|
37
|
Abascal F, Tress ML, Valencia A. The evolutionary fate of alternatively spliced homologous exons after gene duplication. Genome Biol Evol 2015; 7:1392-403. [PMID: 25931610 PMCID: PMC4494069 DOI: 10.1093/gbe/evv076] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/25/2022]
Abstract
Alternative splicing and gene duplication are the two main processes responsible for expanding protein functional diversity. Although gene duplication can generate new genes and alternative splicing can introduce variation through alternative gene products, the interplay between the two processes is complex and poorly understood. Here, we have carried out a study of the evolution of alternatively spliced exons after gene duplication to better understand the interaction between the two processes. We created a manually curated set of 97 human genes with mutually exclusively spliced homologous exons and analyzed the evolution of these exons across five distantly related vertebrates (lamprey, spotted gar, zebrafish, fugu, and coelacanth). Most of these exons had an ancient origin (more than 400 Ma). We found examples supporting two extreme evolutionary models for the behaviour of homologous exons after gene duplication. We observed 11 events in which gene duplication was accompanied by splice isoform separation, that is, each paralog specifically conserved just one distinct ancestral homologous exon. At the other extreme, we identified genes in which the homologous exons were always conserved within paralogs, suggesting that the alternative splicing event cannot easily be separated from the function in these genes. That many homologous exons fall in between these two extremes highlights the diversity of biological systems and suggests that the subtle balance between alternative splicing and gene duplication is adjusted to the specific cellular context of each gene.
Affiliation(s)
- Federico Abascal
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Alfonso Valencia
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
38
|
Ezkurdia I, Rodriguez JM, Carrillo-de Santa Pau E, Vázquez J, Valencia A, Tress ML. Most highly expressed protein-coding genes have a single dominant isoform. J Proteome Res 2015; 14:1880-7. [PMID: 25732134 DOI: 10.1021/pr501286b] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Indexed: 12/28/2022]
Abstract
Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells, or produce dominant isoforms in a manner that is either tissue-specific or regardless of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight large-scale human proteomics experiments and databases and find that there is a single dominant protein isoform, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments, in partial agreement with the conclusions from the most recent large-scale RNAseq study. Remarkably, the dominant isoforms from the experimental proteomics analyses coincided overwhelmingly with the reference isoforms selected by two completely orthogonal sources, the consensus coding sequence variants, which are agreed upon by separate manual genome curation teams, and the principal isoforms from the APPRIS database, predicted automatically from the conservation of protein sequence, structure, and function.
Affiliation(s)
- Iakes Ezkurdia
- †Unidad de Proteómica and ‡Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain; §National Bioinformatics Institute and ∥Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
| | - Jose Manuel Rodriguez
- †Unidad de Proteómica and ‡Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain; §National Bioinformatics Institute and ∥Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
| | - Enrique Carrillo-de Santa Pau
- †Unidad de Proteómica and ‡Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain; §National Bioinformatics Institute and ∥Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
| | - Jesús Vázquez
- †Unidad de Proteómica and ‡Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain; §National Bioinformatics Institute and ∥Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
| | - Alfonso Valencia
- †Unidad de Proteómica and ‡Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain; §National Bioinformatics Institute and ∥Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
| | - Michael L Tress
- †Unidad de Proteómica and ‡Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, 28029 Madrid, Spain; §National Bioinformatics Institute and ∥Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre, 28029 Madrid, Spain
|
39
|
Abascal F, Tress ML, Valencia A. Alternative splicing and co-option of transposable elements: the case of TMPO/LAP2α and ZNF451 in mammals. Bioinformatics 2015; 31:2257-61. [PMID: 25735770 PMCID: PMC4495291 DOI: 10.1093/bioinformatics/btv132] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 12/19/2014] [Accepted: 02/25/2015] [Indexed: 01/05/2023]
Abstract
Transposable elements constitute a large fraction of vertebrate genomes and, during evolution, may be co-opted for new functions. Exonization of transposable elements inserted within or close to host genes is one possible way to generate new genes, and alternative splicing of the new exons may represent an intermediate step in this process. The genes TMPO and ZNF451 are present in all vertebrate lineages. Although they are not evolutionarily related, mammalian TMPO and ZNF451 do have something in common: they both code for splice isoforms that contain LAP2alpha domains. We found that these LAP2alpha domains have sequence similarity to repetitive sequences in non-mammalian genomes, which are in turn related to the first ORF from a DIRS1-like retrotransposon. This retrotransposon domestication happened separately and resulted in proteins that combine retrotransposon and host protein domains. The alternative splicing of the retrotransposed sequence allowed the production of both the new and the untouched original isoforms, which may have contributed to the success of the colonization process. The LAP2alpha-specific isoform of TMPO (LAP2α) has been co-opted for important roles in the cell, whereas the ZNF451 LAP2alpha isoform is evolving under strong purifying selection but remains uncharacterized.
Affiliation(s)
- Federico Abascal
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid 28029, Spain
| | - Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid 28029, Spain
| | - Alfonso Valencia
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid 28029, Spain
|
40
|
Yefremova Y, Al-Majdoub M, Opuni KF, Koy C, Cui W, Yan Y, Gross M, Glocker MO. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2015; 26:482-492. [PMID: 25560987 PMCID: PMC6130978 DOI: 10.1007/s13361-014-1053-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 10/05/2014] [Revised: 11/20/2014] [Accepted: 11/20/2014] [Indexed: 06/04/2023]
Abstract
Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' with great scientific and economic importance. Substantial deviations from the published amino acid sequence (Uniprot Q54181) were found: 46 additional amino acids at the N-terminus, including a so-called "His-tag", as well as partial N-terminal α-N-gluconoylation and α-N-phosphogluconoylation. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Because of the higher mass caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to identify the expression vector pET-28b from Novagen, with the Xho I restriction enzyme cleavage site, as the most likely vector used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.
Affiliation(s)
- Yelena Yefremova
- Proteome Center Rostock, University Medicine Rostock, Rostock, Germany
| | | | | | - Cornelia Koy
- Proteome Center Rostock, University Medicine Rostock, Rostock, Germany
| | - Weidong Cui
- Washington University in St. Louis, St. Louis, Missouri, USA
| | - Yuetian Yan
- Washington University in St. Louis, St. Louis, Missouri, USA
| | - Michael Gross
- Washington University in St. Louis, St. Louis, Missouri, USA
|
41
|
Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods 2014; 11:1114-25. [PMID: 25357241 DOI: 10.1038/nmeth.3144] [Citation(s) in RCA: 543] [Impact Index Per Article: 54.3] [Received: 05/21/2014] [Accepted: 09/22/2014] [Indexed: 12/19/2022]
Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry-based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.
Affiliation(s)
- Alexey I Nesvizhskii
- [1] Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA. [2] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
|
42
|
Abstract
Historically pseudogenes were believed to represent nonfunctional genomic fossils; however, there is emerging evidence that many of them could be biologically active. This possibility has ignited interest in pseudogene loci and made the need for their high-quality annotation more pressing as an accurate knowledge of all pseudogenes in the human reference genome sequence facilitates confident functional analysis. GENCODE have undertaken the first genome-wide pseudogene assignment for protein-coding genes combining both large-scale manual annotation and computational pseudogene prediction pipelines. Multiple computational predictions provide an unbiased set of hints for manual annotators to investigate, both during first-pass annotation and as part of QC to identify any potential missing pseudogene loci. Where a pseudogene is identified, the extent of its homology to the parent locus is fully investigated by a manual annotator; a pseudogene model is built and assigned to one of eight pseudogene biotypes depending on the mechanism of creation and on the presence of locus-specific transcriptional or proteomic data. The high-quality, information-rich set of pseudogenes created has been integrated with ENCODE functional genomics data, specifically expression level, transcription factor and RNA polymerase II binding, and chromatin marks. In this way we have been able to identify some pseudogenes that possess conventional characteristics of functionality as well as others with interesting patterns of partial activity, which might suggest that putatively inactive loci could be gaining a novel function, for example as long noncoding RNAs. The activity data associated with every pseudogene is stored in the psiDR resource.
|
43
|
Abstract
ENCODE projects exist for many eukaryotes, including humans, but as of yet no defined project exists for plants. A plant ENCODE would be invaluable to the research community and could be more readily produced than its metazoan equivalents by capitalizing on the preexisting infrastructure provided from similar projects. Collecting and normalizing plant epigenomic data for a range of species will facilitate hypothesis generation, cross-species comparisons, annotation of genomes, and an understanding of epigenomic functions throughout plant evolution. Here, we discuss the need for such a project, outline the challenges it faces, and suggest ways forward to build a plant ENCODE.
Affiliation(s)
- Amanda K Lane
- Department of Genetics, University of Georgia, Athens, Georgia 30602;
|
44
|
Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, Vazquez J, Valencia A, Tress ML. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet 2014; 23:5866-78. [PMID: 24939910 PMCID: PMC4204768 DOI: 10.1093/hmg/ddu309] [Citation(s) in RCA: 333] [Impact Index Per Article: 30.3] [Indexed: 12/27/2022]
Abstract
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
Affiliation(s)
| | - David Juan
- Structural Biology and Bioinformatics Programme and
| | - Jose Manuel Rodriguez
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
| | - Adam Frankish
- Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Mark Diekhans
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA
| | - Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Jesus Vazquez
- Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Melchor Fernández Almagro, 3, 28029, Madrid, Spain
| | - Alfonso Valencia
- Structural Biology and Bioinformatics Programme and, National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain,
|
45
|
Vieira HGS, Grynberg P, Bitar M, Pires SDF, Hilário HO, Macedo AM, Machado CR, de Andrade HM, Franco GR. Proteomic analysis of Trypanosoma cruzi response to ionizing radiation stress. PLoS One 2014; 9:e97526. [PMID: 24842666 PMCID: PMC4026238 DOI: 10.1371/journal.pone.0097526] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 11/15/2013] [Accepted: 04/22/2014] [Indexed: 11/18/2022]
Abstract
Trypanosoma cruzi, the causative agent of Chagas disease, is extremely resistant to ionizing radiation, enduring up to 1.5 kGy of gamma rays. Ionizing radiation can damage the DNA molecule both directly, resulting in double-strand breaks, and indirectly, as a consequence of reactive oxygen species production. After a dose of 500 Gy of gamma rays, the parasite genome is fragmented, but the chromosomal bands are restored within 48 hours. Under such conditions, cell growth arrests for up to 120 hours and the parasites resume normal growth after this period. To better understand the parasite response to ionizing radiation, we analyzed the proteome of irradiated (4, 24, and 96 hours after irradiation) and non-irradiated T. cruzi using two-dimensional differential gel electrophoresis followed by mass spectrometry for protein identification. A total of 543 spots were found to be differentially expressed, from which 215 were identified. These identified protein spots represent different isoforms of only 53 proteins. We observed a tendency for overexpression of proteins with molecular weights below predicted, indicating that these may be processed, yielding shorter polypeptides. The presence of shorter protein isoforms after irradiation suggests the occurrence of post-translational modifications and/or processing in response to gamma radiation stress. Our results also indicate that active translation is essential for the recovery of parasites from ionizing radiation damage. This study therefore reveals the peculiar response of T. cruzi to ionizing radiation, raising questions about how this organism can change its protein expression to survive such a harmful stress.
Affiliation(s)
- Priscila Grynberg
- Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Embrapa Recursos Genéticos e Biotecnologia, Brasília, Distrito Federal, Brazil
- Mainá Bitar
- Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Simone da Fonseca Pires
- Departamento de Parasitologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Heron Oliveira Hilário
- Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Andrea Mara Macedo
- Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Carlos Renato Machado
- Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Hélida Monteiro de Andrade
- Departamento de Parasitologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Glória Regina Franco
- Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
|
46
|
Abstract
The study of pseudogenes, originally dismissed as genomic relics of evolutionary selection, has seen a resurgence in scientific literature, in addition to being a peculiar topic of discussion in theological debates. For a long time, pseudogenes have been touted as a beacon of natural selection and a definitive proof of evolution due to the slow mutation rate that differentiated them from their parental genes and ultimately caused their genetic demise as functional genes. It now seems that "creationists" have co-opted some recent reports attributing unheralded biological functions to pseudogenes and other noncoding RNAs as evidence to undermine the existence of evolution and to support intelligent design. This issue of Methods in Molecular Biology focused on pseudogenes will certainly not end, nor enter this debate; however, scientists who are also genomics and pseudogene enthusiasts will certainly appreciate that many scientists are thinking about these particular genetic elements in new and interesting ways. With this new interest in a biological significance and "non-junk" role for pseudogenes and other noncoding RNAs, new methods and approaches are being developed to unlock the mystery of these ancient artifacts we know as pseudogenes. In this brief introductory chapter we highlight the renewed interest in pseudogenes and review a rationale for intensification of pseudogene-related research.
|
47
|
Ucciferri N, Rocchiccioli S. Proteomics techniques for the detection of translated pseudogenes. Methods Mol Biol 2014; 1167:187-95. [PMID: 24823778 DOI: 10.1007/978-1-4939-0835-6_12] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
Abstract
Increasing evidence indicates that pseudogenes can reach the translational process. Translated pseudogene products have in fact been found in various organisms, confuting the original definition of pseudogenes as genes without any coding potential. Proteomics is the main technology allowing the study of proteins and, when integrated with genomics, is defined as proteogenomics. In proteogenomics, the peptide-genome alignment drives the identification and annotation of gene products and allows for a better understanding of their function. In this chapter, we give a brief overview of the proteomic techniques applied to pseudogenes. In particular, we discuss peptide spectrum acquisition, mass data analysis, and genome database matching.
Affiliation(s)
- Nadia Ucciferri
- CNR, Institute of Clinical Physiology, Via Moruzzi 1, 56124, Pisa, Italy
|
48
|
Piwowar M, Banach M, Konieczny L, Roterman I. Structural role of exon-coded fragment of polypeptide chains in selected enzymes. J Theor Biol 2013; 337:15-23. [PMID: 23896319 DOI: 10.1016/j.jtbi.2013.07.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/20/2013] [Revised: 06/12/2013] [Accepted: 07/17/2013] [Indexed: 11/27/2022]
Abstract
This paper discusses the structural role of fragments encoded by individual exons in proteins. Selected enzymes (hydrolases, transferases, ligases) reveal the presence of at least one exon fragment whose contribution to the protein's hydrophobic core is in line with theoretical expectations. This phenomenon is confirmed by quantitative analysis of the hydrophobicity density distribution in protein molecules. Results are compared with a 3D Gaussian function, treated as an "idealized" distribution of hydrophobicity density, with the highest values observed near the center of the molecule and near-zero values on its surface. At least one accordant exon fragment has been identified in each of the proteins subjected to analysis. On the basis of these results the authors propose that accordant exons are responsible for tertiary structural stabilization of proteins by ensuring the generation of a stable hydrophobic core.
Affiliation(s)
- Monika Piwowar
- Department of Bioinformatics and Telemedicine, Medical College-Jagiellonian University, Lazarza 16, 31-530 Krakow, Poland
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Medical College-Jagiellonian University, Lazarza 16, 31-530 Krakow, Poland
- Leszek Konieczny
- Chair of Medical Biochemistry, Medical College-Jagiellonian University, Kopernika 7, 31-034 Krakow, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Medical College-Jagiellonian University, Lazarza 16, 31-530 Krakow, Poland
|
49
|
Pang CNI, Tay AP, Aya C, Twine NA, Harkness L, Hart-Smith G, Chia SZ, Chen Z, Deshpande NP, Kaakoush NO, Mitchell HM, Kassem M, Wilkins MR. Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing. J Proteome Res 2013; 13:84-98. [PMID: 24152167 DOI: 10.1021/pr400820p] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 12/19/2022]
Abstract
Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrated Genome Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from https://github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.
Affiliation(s)
- Chi Nam Ignatius Pang
- Systems Biology Initiative, The University of New South Wales, Sydney, New South Wales 2052, Australia
|
50
|
Abstract
The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.
Affiliation(s)
- Jonathan M Mudge
- Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom
|