1
Song Y, Zhang C, Omenn GS, O’Meara MJ, Welch JD. Predicting the Structural Impact of Human Alternative Splicing. bioRxiv 2023:2023.12.21.572928. [PMID: 38187531 PMCID: PMC10769328 DOI: 10.1101/2023.12.21.572928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Protein structure prediction with neural networks is a powerful new method for linking protein sequence, structure, and function, but structures have generally been predicted for only a single isoform of each gene, neglecting splice variants. To investigate the structural implications of alternative splicing, we used AlphaFold2 to predict the structures of more than 11,000 human isoforms. We employed multiple metrics to identify splicing-induced structural alterations, including template matching score, secondary structure composition, surface charge distribution, radius of gyration, accessibility of post-translational modification sites, and structure-based function prediction. We identified examples of how alternative splicing induced clear changes in each of these properties. Structural similarity between isoforms largely correlated with degree of sequence identity, but we identified a subset of isoforms with low structural similarity despite high sequence similarity. Exon skipping and alternative last exons tended to increase the surface charge and radius of gyration. Splicing also buried or exposed numerous post-translational modification sites, most notably among the isoforms of BAX. Functional prediction nominated numerous functional differences among isoforms of the same gene, with loss of function compared to the reference predominating. Finally, we used single-cell RNA-seq data from the Tabula Sapiens to determine the cell types in which each structure is expressed. Our work represents an important resource for studying the structure and function of splice isoforms across the cell types of the human body.
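One of the metrics named in the abstract above, radius of gyration, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the equal-mass (unweighted) simplification, and the toy coordinates are all assumptions made for illustration.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points.

    Assumption: every atom has equal mass, so this is simply the
    root-mean-square distance of the points from their centroid.
    """
    if not coords:
        raise ValueError("coords must contain at least one point")
    n = len(coords)
    # Centroid of the point set.
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    # Mean squared distance from the centroid, then square root.
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)


# Hypothetical coordinates (in angstroms) for a compact and an extended chain.
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(radius_of_gyration(compact), radius_of_gyration(extended))
```

A larger value indicates a more extended structure, which is the sense in which a splicing event could be said to increase the radius of gyration of an isoform.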
Affiliation(s)
- Yuxuan Song
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Matthew J. O’Meara
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Joshua D. Welch
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA

2
Reixachs‐Solé M, Eyras E. Uncovering the impacts of alternative splicing on the proteome with current omics techniques. Wiley Interdisciplinary Reviews. RNA 2022; 13:e1707. [PMID: 34979593 PMCID: PMC9542554 DOI: 10.1002/wrna.1707] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/26/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/15/2022]
Abstract
The high-throughput sequencing of cellular RNAs has underscored a broad effect of isoform diversification through alternative splicing on the transcriptome. Moreover, the differential production of transcript isoforms from gene loci has been recognized as a critical mechanism in cell differentiation, organismal development, and disease. Yet, the extent of the impact of alternative splicing on protein production and cellular function remains a matter of debate. Multiple experimental and computational approaches have been developed in recent years to address this question. These studies have unveiled how molecular changes at different steps in the RNA processing pathway can lead to differences in protein production and have functional effects. New and emerging experimental technologies open exciting new opportunities to develop new methods to fully establish the connection between messenger RNA expression and protein production and to further investigate how RNA variation impacts the proteome and cell function. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing; Translation > Regulation; RNA Evolution and Genomics > Computational Analyses of RNA.
Affiliation(s)
- Marina Reixachs‐Solé
  - The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
  - EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Eduardo Eyras
  - The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
  - EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
  - Catalan Institution for Research and Advanced Studies, Barcelona, Spain
  - Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain

3
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genom Bioinform 2021; 3:lqab044. [PMID: 34046593 PMCID: PMC8140736 DOI: 10.1093/nargab/lqab044] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/27/2020] [Revised: 04/22/2021] [Accepted: 05/17/2021] [Indexed: 12/20/2022]
Abstract
Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.
Affiliation(s)
- Fernando Pozo
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Laura Martinez-Gomez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Thomas A Walsh
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- José Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Tomas Di Domenico
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Federico Abascal
  - Somatic Evolution Group, Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Jesús Vazquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

4
Lau E, Han Y, Williams DR, Thomas CT, Shrestha R, Wu JC, Lam MPY. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Rep 2020; 29:3751-3765.e5. [PMID: 31825849 PMCID: PMC6961840 DOI: 10.1016/j.celrep.2019.11.026] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 04/18/2019] [Revised: 09/24/2019] [Accepted: 11/06/2019] [Indexed: 12/18/2022]
Abstract
The protein-level translational status and function of many alternative splicing events remain poorly understood. We use an RNA sequencing (RNA-seq)-guided proteomics method to identify protein alternative splicing isoforms in the human proteome by constructing tissue-specific protein databases that prioritize transcript splice junction pairs with high translational potential. Using the custom databases to reanalyze ~80 million mass spectra in public proteomics datasets, we identify more than 1,500 noncanonical protein isoforms across 12 human tissues, including ~400 sequences undocumented on TrEMBL and RefSeq databases. We apply the method to original quantitative mass spectrometry experiments and observe widespread isoform regulation during human induced pluripotent stem cell cardiomyocyte differentiation. On a proteome scale, alternative isoform regions overlap frequently with disordered sequences and post-translational modification sites, suggesting that alternative splicing may regulate protein function through modulating intrinsically disordered regions. The described approach may help elucidate functional consequences of alternative splicing and expand the scope of proteomics investigations in various systems. The translation and function of many alternative splicing events await confirmation at the protein level. Lau et al. use an integrated proteotranscriptomics approach to identify non-canonical and undocumented isoforms from 12 organs in the human proteome. Alternative isoforms interfere with functional sequence features and are differentially regulated during iPSC cardiomyocyte differentiation.
Affiliation(s)
- Edward Lau
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Yu Han
  - Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
  - Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
- Damon R Williams
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Cody T Thomas
  - Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
- Rajani Shrestha
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Joseph C Wu
  - Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
  - Department of Radiology, School of Medicine, Stanford University, Palo Alto, CA, USA
- Maggie P Y Lam
  - Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
  - Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA

5
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022]
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Mark D'Souza
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
- Sheng Wang
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
- Sandhya Balasubramanian
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
- Prashanth Athri
  - Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
- Bingqing Xie
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- Stefan Canzar
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
  - Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gady Agam
  - Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- T Conrad Gilliam
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Natalia Maltsev
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA

6
Li X, Ma R, Li Q, Li S, Zhang H, Xie J, Bai J, Idris A, Feng R. Transmembrane Protein 39A Promotes the Replication of Encephalomyocarditis Virus via Autophagy Pathway. Front Microbiol 2019; 10:2680. [PMID: 31849860 PMCID: PMC6901969 DOI: 10.3389/fmicb.2019.02680] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/25/2019] [Accepted: 11/05/2019] [Indexed: 12/31/2022]
Abstract
Encephalomyocarditis virus (EMCV) causes encephalitis, myocarditis, neuropathy, reproductive disorders, and diabetes in animals. EMCV is known to induce cell autophagy; however, the molecular mechanisms underlying this remain unclear. Here, we show that the type III-transmembrane protein, transmembrane protein 39A (TMEM39A), plays a critical role in EMCV replication. We showed that EMCV GS01 strain infection upregulated TMEM39A expression. Importantly, EMCV induced autophagy in a range of host cells. The autophagy chemical inhibitor, 3-MA, inhibited EMCV replication and reduced TMEM39A expression. This is the first study demonstrating TMEM39A promoting the replication of EMCV via autophagy. Overall, we show that TMEM39A plays a positive regulatory role in EMCV proliferation and that TMEM39A expression is dependent on the autophagy pathway.
Affiliation(s)
- Xiangrong Li
  - Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou, China
  - Gansu Tech Innovation Center of Animal Cell, Biomedical Research Center, Lanzhou, China
- Ruixian Ma
  - Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou, China
  - Life Science and Engineering College, Northwest Minzu University, Lanzhou, China
- Qian Li
  - Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou, China
  - Life Science and Engineering College, Northwest Minzu University, Lanzhou, China
- Shengjun Li
  - Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou, China
  - Life Science and Engineering College, Northwest Minzu University, Lanzhou, China
- Haixia Zhang
  - Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou, China
  - Gansu Tech Innovation Center of Animal Cell, Biomedical Research Center, Lanzhou, China
- Jingying Xie
  - College of Veterinary Medicine, Gansu Agricultural University, Lanzhou, China
- Jialin Bai
  - Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou, China
  - Gansu Tech Innovation Center of Animal Cell, Biomedical Research Center, Lanzhou, China
- Adi Idris
  - School of Medical Science, Menzies Health Institute Queensland, Griffith University, Gold Coast, QLD, Australia
- Ruofei Feng
  - Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou, China
  - Gansu Tech Innovation Center of Animal Cell, Biomedical Research Center, Lanzhou, China

7
Bhuiyan SA, Ly S, Phan M, Huntington B, Hogan E, Liu CC, Liu J, Pavlidis P. Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genomics 2018; 19:637. [PMID: 30153812 PMCID: PMC6114036 DOI: 10.1186/s12864-018-5013-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 06/06/2018] [Accepted: 08/14/2018] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Although most genes in mammalian genomes have multiple isoforms, an ongoing debate is whether these isoforms are all functional as well as the extent to which they increase the functional repertoire of the genome. To ground this debate in data, it would be helpful to have a corpus of experimentally-verified cases of genes which have functionally distinct splice isoforms (FDSIs). RESULTS: We established a curation framework for evaluating experimental evidence of FDSIs, and analyzed over 700 human and mouse genes, strongly biased towards genes that are prominent in the alternative splicing literature. Despite this bias, we found experimental evidence meeting the classical definition for functionally distinct isoforms for ~5% of the curated genes. If we relax our criteria for inclusion to include weaker forms of evidence, the fraction of genes with evidence of FDSIs remains low (~13%). We provide evidence that this picture will not change substantially with further curation and conclude there is a large gap between the presumed impact of splicing on gene function and the experimental evidence. Furthermore, many functionally distinct isoforms were not traceable to a specific isoform in Ensembl, a database that forms the basis for much computational research. CONCLUSIONS: We conclude that the claim that alternative splicing vastly increases the functional repertoire of the genome is an extrapolation from a limited number of empirically supported cases. We also conclude that more work is needed to integrate experimental evidence and genome annotation databases. Our work should help shape research around the role of splicing on gene function from presuming large general effects to acknowledging the need for stronger experimental evidence.
Affiliation(s)
- Shamsuddin A. Bhuiyan
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, Canada
- Sophia Ly
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Minh Phan
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Brandon Huntington
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Ellie Hogan
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Chao Chun Liu
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- James Liu
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Paul Pavlidis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC V6T 1Z4 Canada

8
Ullah I, Liao Y, Wan R, Tang L, Feng J. Alternative Splicing of SMAD4 and Its Function in HaCaT Cells in Response to UVB Irradiation. J Cancer 2018; 9:3177-3186. [PMID: 30210641 PMCID: PMC6134820 DOI: 10.7150/jca.24756] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/05/2018] [Accepted: 05/09/2018] [Indexed: 12/27/2022]
Abstract
Alternative splicing is one of the most common mechanisms of human gene regulation and plays a crucial role in increasing the diversity of functional proteins. Many diseases are linked to alternative splicing, especially cancer. SMAD4 is a member of the SMAD family and plays a critical role in mediating TGF-β signal transduction and gene regulatory events. Smad4 is a tumour suppressor and acts as a shuttling protein between the nucleus and cytoplasm. Splicing variants of Smad4 have been found in many cancers. The present study performed nested PCR to detect alternative splicing of Smad4 in HaCaT cell lines in response to UVB irradiation. UVB induced a novel Smad4B isoform, which led to decreased Smad4 expression. The hnRNPA1 splicing factor is responsible for Smad4 alternative splicing in response to UVB. UVB increased the expression of the SF2 and hnRNPA1 splicing factors. hnRNPA1 overexpression induced Smad4B by regulating Smad4 alternative splicing. The Smad4B isoform supported the function of full-length Smad4 in UVB resistance, with certain limitations. Western blot analyses showed that overexpressed full-length Smad4 significantly increased N-cadherin expression, while Smad4B overexpression decreased N-cadherin expression (P<0.05). Furthermore, overexpression of the isoform in HaCaT cells decreased cell invasion compared with full-length Smad4 overexpression. These results will help clarify the importance of Smad4 alternative splicing in skin tumorigenesis.
Affiliation(s)
- Irfan Ullah
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China
- Yi Liao
  - Department of Cardiothoracic Surgery, Southwest Hospital, Third Military Medical University Chongqing, China
- Rongxue Wan
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China
- Liling Tang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China
- Jianguo Feng
  - Department of Anesthesiology, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan Province, China

9
Peng H, Lan C, Liu Y, Liu T, Blumenstein M, Li J. Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes. Oncotarget 2017; 8:78901-78916. [PMID: 29108274 PMCID: PMC5668007 DOI: 10.18632/oncotarget.20481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/23/2017] [Accepted: 07/19/2017] [Indexed: 12/15/2022]
Abstract
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
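The abstract does not give the formula behind its entropy-based sub-vector, so the following is only a generic sketch of Shannon entropy over a count distribution, the standard quantity the phrase "information entropy" usually refers to. The function name, the choice of base 2, and the toy gene counts per chromosome substructure are assumptions and are not taken from the paper.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts.

    Assumption: counts[i] is the number of disease genes falling in
    substructure i; zero counts contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy += p * math.log2(1.0 / p)
    return entropy


# Hypothetical gene counts over four substructures.
evenly_spread = [5, 5, 5, 5]     # no preference for any substructure
concentrated = [20, 0, 0, 0]     # all genes on a single substructure
print(shannon_entropy(evenly_spread), shannon_entropy(concentrated))
```

Low entropy corresponds to genes concentrated on a few substructures and high entropy to an even spread, which is one way a chromosome preference of the kind described could be quantified.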
Affiliation(s)
- Hui Peng
  - Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Chaowang Lan
  - Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Yuansheng Liu
  - Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Tao Liu
  - Centre for Childhood Cancer Research, University of New South Wales, Sydney, Kensington, NSW, Australia
- Michael Blumenstein
  - School of Software, University of Technology Sydney, Broadway, NSW, Australia
- Jinyan Li
  - Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia

10
Detecting protein variants by mass spectrometry: a comprehensive study in cancer cell-lines. Genome Med 2017; 9:62. [PMID: 28716134 PMCID: PMC5514513 DOI: 10.1186/s13073-017-0454-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 03/03/2017] [Accepted: 06/22/2017] [Indexed: 02/07/2023]
Abstract
Background: Onco-proteogenomics aims to understand how changes in a cancer’s genome influence its proteome. One challenge in integrating these molecular data is the identification of aberrant protein products from mass-spectrometry (MS) datasets, as traditional proteomic analyses only identify proteins from a reference sequence database. Methods: We established proteomic workflows to detect peptide variants within MS datasets. We used a combination of publicly available population variants (dbSNP and UniProt) and somatic variations in cancer (COSMIC) along with sample-specific genomic and transcriptomic data to examine proteome variation within and across 59 cancer cell-lines. Results: We developed a set of recommendations for the detection of variants using three search algorithms, a split target-decoy approach for FDR estimation, and multiple post-search filters. We examined 7.3 million unique variant tryptic peptides not found within any reference proteome and identified 4771 mutations corresponding to somatic and germline deviations from reference proteomes in 2200 genes among the NCI60 cell-line proteomes. Conclusions: We discuss in detail the technical and computational challenges in identifying variant peptides by MS and show that uncovering these variants allows the identification of druggable mutations within important cancer genes.
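The target-decoy idea mentioned in the Results can be sketched in a few lines. This is a simplified illustration, not the workflow from the paper: it assumes the common estimate FDR ≈ (decoy hits) / (target hits) above a score threshold, and it assumes "split" means the estimate is computed separately for reference and variant peptides. All names and the toy peptide-spectrum matches (PSMs) are hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR as decoys / targets among PSMs scoring >= threshold.

    psms: iterable of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return decoys / targets


def split_fdr(psms, threshold):
    """Estimate FDR separately for reference and variant peptides.

    psms: iterable of (score, is_decoy, is_variant) triples.
    """
    reference = [(s, d) for s, d, v in psms if not v]
    variant = [(s, d) for s, d, v in psms if v]
    return {
        "reference": estimate_fdr(reference, threshold),
        "variant": estimate_fdr(variant, threshold),
    }


# Hypothetical PSMs: (score, is_decoy, is_variant).
psms = [
    (10, False, False), (9, False, False), (8, False, False), (7, True, False),
    (9, False, True), (8, True, True), (7, False, True),
]
print(split_fdr(psms, threshold=7))
```

Estimating the two classes separately matters because variant peptides are typically far rarer than reference peptides, so a single pooled estimate can understate the error rate among the variant identifications.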

11
The contribution of intrinsically disordered regions to protein function, cellular complexity, and human disease. Biochem Soc Trans 2017; 44:1185-1200. [PMID: 27911701 PMCID: PMC5095923 DOI: 10.1042/bst20160172] [Citation(s) in RCA: 278] [Impact Index Per Article: 34.8] [Received: 06/13/2016] [Revised: 07/20/2016] [Accepted: 07/22/2016] [Indexed: 12/23/2022]
Abstract
In the 1960s, Christian Anfinsen postulated that the unique three-dimensional structure of a protein is determined by its amino acid sequence. This work laid the foundation for the sequence–structure–function paradigm, which states that the sequence of a protein determines its structure, and structure determines function. However, a class of polypeptide segments called intrinsically disordered regions does not conform to this postulate. In this review, I will first describe established and emerging ideas about how disordered regions contribute to protein function. I will then discuss molecular principles by which regulatory mechanisms, such as alternative splicing and asymmetric localization of transcripts that encode disordered regions, can increase the functional versatility of proteins. Finally, I will discuss how disordered regions contribute to human disease and the emergence of cellular complexity during organismal evolution.

12
Abstract
MDM4, an essential negative regulator of the P53 tumor suppressor, is frequently overexpressed in cancer cells that harbor a wild-type P53. By a mechanism based on alternative splicing, the MDM4 gene generates two mutually exclusive isoforms: MDM4-FL, which encodes the full-length MDM4 protein, and a shorter splice variant called MDM4-S. Previous results suggested that the MDM4-S isoform could be an important driver of tumor development. In this short review, we discuss a recent set of data indicating that MDM4-S is more likely a passenger isoform during tumorigenesis and that targeting MDM4 splicing to prevent MDM4-FL protein expression appears as a promising strategy to reactivate p53 in cancer cells. The benefits and risks associated with this strategy are also discussed.
|
13
|
Park J, Lee H, Tran Q, Mun K, Kim D, Hong Y, Kwon SH, Brazil D, Park J, Kim SH. Recognition of Transmembrane Protein 39A as a Tumor-Specific Marker in Brain Tumor. Toxicol Res 2017; 33:63-69. [PMID: 28133515 PMCID: PMC5266369 DOI: 10.5487/tr.2017.33.1.063] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/28/2016] [Revised: 11/30/2016] [Accepted: 12/02/2016] [Indexed: 12/17/2022]
Abstract
Transmembrane protein 39A (TMEM39A) belongs to the TMEM39 family. The TMEM39A gene is a susceptibility locus for multiple sclerosis. In addition, TMEM39A seems to be implicated in systemic lupus erythematosus. However, any possible involvement of TMEM39A in cancer remains largely unknown. In the present report, we provide evidence that TMEM39A may play a role in brain tumors. Western blotting using an anti-TMEM39A antibody indicated that TMEM39A was overexpressed in glioblastoma cell lines, including U87-MG and U251-MG. Deep-sequencing transcriptomic profiling of U87-MG and U251-MG cells revealed that TMEM39A transcripts were upregulated in such cells compared with those of the cerebral cortex. Confocal microscopic analysis of U251-MG cells stained with anti-TMEM39A antibody showed that TMEM39A was located in dot-like structures lying close to the nucleus. TMEM39A was probably localized to mitochondria or endosomes. Immunohistochemical analysis of glioma tissue specimens indicated that TMEM39A was markedly upregulated in such samples. Bioinformatic analysis of the Rembrandt knowledge base also supported upregulation of TMEM39A mRNA levels in glioma patients. Together, the results afford strong evidence that TMEM39A is upregulated in glioma cell lines and glioma tissue specimens. Therefore, TMEM39A may serve as a novel diagnostic marker of, and a therapeutic target for, gliomas and other cancers.
Affiliation(s)
- Jisoo Park
- Department of Pharmacology and Medical Science, Metabolic Syndrome and Cell Signaling Laboratory, Research Institute for Medical Sciences, College of Medicine, Chungnam National University, Daejeon, Korea
- Hyunji Lee
- Department of Pharmacology and Medical Science, Metabolic Syndrome and Cell Signaling Laboratory, Research Institute for Medical Sciences, College of Medicine, Chungnam National University, Daejeon, Korea
- Quangdon Tran
- Department of Pharmacology and Medical Science, Metabolic Syndrome and Cell Signaling Laboratory, Research Institute for Medical Sciences, College of Medicine, Chungnam National University, Daejeon, Korea
- Kisun Mun
- Department of Pharmacology and Medical Science, Metabolic Syndrome and Cell Signaling Laboratory, Research Institute for Medical Sciences, College of Medicine, Chungnam National University, Daejeon, Korea
- Dohoon Kim
- Department of Pharmacology and Medical Science, Metabolic Syndrome and Cell Signaling Laboratory, Research Institute for Medical Sciences, College of Medicine, Chungnam National University, Daejeon, Korea
- Youngeun Hong
- Department of Pharmacology and Medical Science, Metabolic Syndrome and Cell Signaling Laboratory, Research Institute for Medical Sciences, College of Medicine, Chungnam National University, Daejeon, Korea
- So Hee Kwon
- College of Pharmacy, Yonsei Institute of Pharmaceutical Sciences, Yonsei University, Incheon, Korea
- Derek Brazil
- Centre for Experimental Medicine, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
- Jongsun Park
- Department of Pharmacology and Medical Science, Metabolic Syndrome and Cell Signaling Laboratory, Research Institute for Medical Sciences, College of Medicine, Chungnam National University, Daejeon, Korea
- Seon-Hwan Kim
- Department of Neurosurgery, Institute for Cancer Research, College of Medicine, Chungnam National University, Daejeon, Korea
|
14
|
Dujardin G, Daguenet É, Bernard DG, Flodrops M, Durand S, Chauveau A, El Khoury F, Le Jossic-Corcos C, Corcos L. L’épissage des ARN pré-messagers : quand le splicéosome perd pied. Med Sci (Paris) 2017; 32:1103-1110. [DOI: 10.1051/medsci/20163212014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
|
15
|
Abstract
A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.
Affiliation(s)
- Jonathan M Mudge
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
- Jennifer Harrow
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
- Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK
|
16
|
Cozzetto D, Minneci F, Currant H, Jones DT. FFPred 3: feature-based function prediction for all Gene Ontology domains. Sci Rep 2016; 6:31865. [PMID: 27561554 PMCID: PMC4999993 DOI: 10.1038/srep31865] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 05/05/2016] [Accepted: 07/25/2016] [Indexed: 11/09/2022]
Abstract
Predicting protein function has been a major goal of bioinformatics for several decades, and it has gained fresh momentum thanks to recent community-wide blind tests aimed at benchmarking available tools on a genomic scale. Sequence-based predictors, especially those performing homology-based transfers, remain the most popular but increasing understanding of their limitations has stimulated the development of complementary approaches, which mostly exploit machine learning. Here we present FFPred 3, which is intended for assigning Gene Ontology terms to human protein chains, when homology with characterized proteins can provide little aid. Predictions are made by scanning the input sequences against an array of Support Vector Machines (SVMs), each examining the relationship between protein function and biophysical attributes describing secondary structure, transmembrane helices, intrinsically disordered regions, signal peptides and other motifs. This update features a larger SVM library that extends its coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation. The effectiveness of this approach is demonstrated through benchmarking experiments, and its usefulness is illustrated by analysing the potential functional consequences of alternative splicing in human and their relationship to patterns of biological features.
Affiliation(s)
- Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- Federico Minneci
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- Hannah Currant
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
|
17
|
Ezkurdia I, Calvo E, Del Pozo A, Vázquez J, Valencia A, Tress ML. The potential clinical impact of the release of two drafts of the human proteome. Expert Rev Proteomics 2015; 12:579-93. [PMID: 26496066 PMCID: PMC4732427 DOI: 10.1586/14789450.2015.1103186] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/20/2022]
Abstract
The authors have carried out an investigation of the two "draft maps of the human proteome" published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers - the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves - should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this, these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease.
Affiliation(s)
- Iakes Ezkurdia
- Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Enrique Calvo
- Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Angela Del Pozo
- Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
- Jesús Vázquez
- Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
- Alfonso Valencia
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Michael L. Tress
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
18
|
Affiliation(s)
- Boris Bardot
- Genetics of Tumor Suppression (Equipe labellisée Ligue), UMR 3244 IC/CNRS/UPMC, Institut Curie, Paris, France
- Franck Toledo
- Genetics of Tumor Suppression (Equipe labellisée Ligue), UMR 3244 IC/CNRS/UPMC, Institut Curie, Paris, France
|