1
Martinez-Gomez L, Cerdán-Vélez D, Abascal F, Tress ML. Origins and Evolution of Human Tandem Duplicated Exon Substitution Events. Genome Biol Evol 2022; 14:6809199. [PMID: 36346145 PMCID: PMC9741552 DOI: 10.1093/gbe/evac162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/06/2022] [Revised: 10/25/2022] [Accepted: 10/29/2022] [Indexed: 11/10/2022] Open
Abstract
The mutually exclusive splicing of tandem duplicated exons produces protein isoforms that are identical save for a homologous region that allows for the fine tuning of protein function. Tandem duplicated exon substitution events are rare, yet highly important alternative splicing events. Most events are ancient, their isoforms are highly expressed, and they have significantly more pathogenic mutations than other splice events. Here, we analyzed the physicochemical properties and functional roles of the homologous polypeptide regions produced by the 236 tandem duplicated exon substitutions annotated in the human gene set. We find that the most important structural and functional residues in these homologous regions are maintained, and that most changes are conservative rather than drastic. Three quarters of the isoforms produced from tandem duplicated exon substitution events are tissue-specific, particularly in nervous and cardiac tissues, and tandem duplicated exon substitution events are enriched in functional terms related to structures in the brain and skeletal muscle. We find considerable evidence for the convergent evolution of tandem duplicated exon substitution events in vertebrates, arthropods, and nematodes. Twelve human gene families have orthologues with tandem duplicated exon substitution events in both Drosophila melanogaster and Caenorhabditis elegans. Six of these gene families are ion transporters, suggesting that tandem exon duplication in genes that control the flow of ions into the cell has an adaptive benefit. The ancient origins, the strong indications of tissue-specific functions, and the evidence of convergent evolution suggest that these events may have played important roles in the evolution of animal tissues and organs.
Affiliation(s)
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
2
Pozo F, Rodriguez JM, Martínez Gómez L, Vázquez J, Tress ML. APPRIS principal isoforms and MANE Select transcripts define reference splice variants. Bioinformatics 2022; 38:ii89-ii94. [PMID: 36124785 PMCID: PMC9486585 DOI: 10.1093/bioinformatics/btac473] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants. This study compares the longest isoforms, MANE Select transcripts, APPRIS principal isoforms, and expression data, and aims to determine which method is best for selecting biologically important reference splice variants for large-scale analyses. RESULTS Proteomics analyses and human genetic variation data suggest that most coding genes have a single main protein isoform. We show that APPRIS principal isoforms and MANE Select transcripts best describe these main cellular isoforms, and find that using the longest splice variant as the representative is a poor strategy. Exons unique to the longest splice isoforms are not under selective pressure, and so are unlikely to be functionally relevant. Expression data are also a poor means of selecting the main splice variant. APPRIS principal and MANE Select exons are under purifying selection, while exons specific to alternative transcripts are not. There are MANE and APPRIS representatives for almost 95% of genes, and where they agree they are particularly effective, coinciding with the main proteomics isoform for over 98.2% of genes. AVAILABILITY AND IMPLEMENTATION APPRIS principal isoforms for human, mouse and other model species can be downloaded from the APPRIS database (https://appris.bioinfo.cnio.es), GENCODE genes (https://www.gencodegenes.org/) and the Ensembl website (https://www.ensembl.org). MANE Select transcripts for the human reference set are available from the Ensembl, GENCODE and RefSeq databases (https://www.ncbi.nlm.nih.gov/refseq/). Lists of splice variants where MANE and APPRIS coincide are available from the APPRIS database. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Jesús Vázquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain; CIBER de Investigaciones Cardiovasculares (CIBERCV), 28029 Madrid, Spain
3
Zhou WJ, Wei ZH, He SM, Chi H. pValid 2: A deep learning based validation method for peptide identification in shotgun proteomics with increased discriminating power. J Proteomics 2022; 251:104414. [PMID: 34737111 DOI: 10.1016/j.jprot.2021.104414] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/19/2021] [Revised: 10/13/2021] [Accepted: 10/25/2021] [Indexed: 11/26/2022]
Abstract
Tandem mass spectrometry has been the principal method in shotgun proteomics for peptide and protein identification. However, incorrect identifications reported by proteome search engines are still unknown, and further validation methods are needed. We previously proposed a validation method, pValid, but its scope of application is limited because two features used in pValid are related to open database search and sub-optimal peptide candidates for tandem mass spectra, and the performance on complex datasets still has room for improvement. In this study, we developed a more comprehensive validation method, pValid 2, to break these limitations by removing the two features and bringing in a new feature related to the retention time predicted by a deep learning-based method, pPredRT. pValid 2 yielded an average false positive rate of 0.03% and an average false negative rate of 1.37% on three testing datasets, better than those of pValid, and flagged 8.47% to 11.31% more incorrect identifications than pValid on two complex datasets. Moreover, pValid 2 flagged almost all decoy identifications in validating the open-search datasets. In addition, the function of validating identifications given by MaxQuant and MS-GF+ was implemented in pValid 2, and the validation results showed that pValid 2 performed dramatically better than three metabolic labeling validation methods. Further considering its cost-effectiveness as a pure computational approach, pValid 2 has the potential to be a widely used validation tool for peptide identifications from any proteome search engine in shotgun proteomics. SIGNIFICANCE: Identification results given by shotgun proteomics are vital to life science research. The correctness of identifications deeply affects the precision of the subsequent studies about protein structures and functions, protein-protein interactions, pathogenic mechanisms, and targeted drugs. Thus, validating the correctness of identifications is crucial and urgent.
In 2019, we developed an identification credibility validation method named pValid, whose false positive rate (FPR) is 0.03% and false negative rate (FNR) is 1.79%, comparable to those of the gold standard, i.e., the synthetic-peptide validation method. However, pValid can only be used for validating the results from pFind, and its validation performance on a few complex datasets still has room for improvement. So, in this submission, we proposed pValid 2, a more comprehensive computational validation method that can validate identifications from any proteome search engine with increased discriminating power.
Affiliation(s)
- Wen-Jing Zhou
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Zhuo-Hong Wei
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.
4
Martinez Gomez L, Pozo F, Walsh TA, Abascal F, Tress ML. The clinical importance of tandem exon duplication-derived substitutions. Nucleic Acids Res 2021; 49:8232-8246. [PMID: 34302486 PMCID: PMC8373072 DOI: 10.1093/nar/gkab623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/07/2021] [Accepted: 07/21/2021] [Indexed: 01/04/2023] Open
Abstract
Most coding genes in the human genome are annotated with multiple alternative transcripts. However, clear evidence for the functional relevance of the protein isoforms produced by these alternative transcripts is often hard to find. Alternative isoforms generated from tandem exon duplication-derived substitutions are an exception. These splice events are rare, but have important functional consequences. Here, we have catalogued the 236 tandem exon duplication-derived substitutions annotated in the GENCODE human reference set. We find that more than 90% of the events have a last common ancestor in teleost fish, so are at least 425 million years old, and twenty-one can be traced back to the Bilateria clade. Alternative isoforms generated from tandem exon duplication-derived substitutions also have significantly more clinical impact than other alternative isoforms. Tandem exon duplication-derived substitutions have >25 times as many pathogenic and likely pathogenic mutations as other alternative events. Tandem exon duplication-derived substitutions appear to have vital functional roles in the cell and may have played a prominent part in metazoan evolution.
Affiliation(s)
- Laura Martinez Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain; Eukaryotic Annotation Team, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
5
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genom Bioinform 2021; 3:lqab044. [PMID: 34046593 PMCID: PMC8140736 DOI: 10.1093/nargab/lqab044] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/27/2020] [Revised: 04/22/2021] [Accepted: 05/17/2021] [Indexed: 12/20/2022] Open
Abstract
Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Tomas Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
6
Zhu H, Wang G, Zhu H, Xu A. MTFR2, A Potential Biomarker for Prognosis and Immune Infiltrates, Promotes Progression of Gastric Cancer Based on Bioinformatics Analysis and Experiments. J Cancer 2021; 12:3611-3625. [PMID: 33995638 PMCID: PMC8120185 DOI: 10.7150/jca.58158] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/13/2021] [Accepted: 04/15/2021] [Indexed: 12/24/2022] Open
Abstract
Background: Mitochondrial fission regulator 2 (MTFR2), which can promote mitochondrial fission, has recently been reported to be involved in tumorigenesis. However, little is known about its expression levels and function in gastric cancer (GC). This study aims to clarify the role of MTFR2 in GC. Methods: We first determined the expression level and prognostic value of MTFR2 in GC by integrated bioinformatics (Oncomine, GEPIA, Kaplan-Meier Plotter database) and experimental approaches (RT-qPCR, western blot, immunohistochemistry). After constructing stable down-regulated GC cells, the biological functions of MTFR2 in vitro and in vivo were studied through cell clone formation, wound healing, transwell and tumor formation experiments. To understand the reason for the high expression of MTFR2 in GC, copy number alteration, promoter methylation and mutation of MTFR2 were detected by UALCAN and cBioPortal. TargetScanHuman and PROMO databases were also used to explore the miRNAs and transcription factors of MTFR2, and the regulatory network was visualized by Cytoscape. LinkedOmics was used to detect the co-expression profile, and then these co-expressed genes were used for gene ontology function and pathway enrichment analysis to deepen the understanding of the MTFR2 mechanism. The protein interaction network of MTFR2 was constructed by the GeneMANIA platform. A docking study of the binding mode was conducted with the HDOCK web server, and PyMOL was used for visualization and analysis. The TIMER database was used to explore the correlation between MTFR2 expression level and immune cell infiltration and gene markers of tumor-infiltrating immune cells. Results: We demonstrated that MTFR2 was up-regulated in GC, and its overexpression led to poorer prognosis. MTFR2 downregulation inhibited the proliferation, migration, and invasion of GC cells in vitro and in vivo. By bioinformatics analysis, we identified the possible factors in MTFR2 overexpression.
Moreover, function and pathway enrichment analyses found that MTFR2 was involved in chromosome segregation, catalytic activity, cell cycle, and ribonucleic acid transport. An MTFR2-protein interaction network revealed a potential direct protein interaction between MTFR2 and protein kinase adenosine-monophosphate-activated catalytic subunit alpha 1 (PRKAA1), and their potential binding site was predicted in a molecular docking model. In addition, we also found that MTFR2 may be correlated with immune infiltration in GC. Conclusions: Our study has effectively revealed the expression, prognostic value, potential functional networks, protein interactions and immune infiltration of MTFR2 in GC. Altogether, our data identify the possible underlying mechanisms of MTFR2 and suggest that MTFR2 may be a prognostic biomarker and therapeutic target in GC.
Affiliation(s)
- Hai Zhu
- Department of General Surgery, The First Affiliated Hospital of Anhui Medical University, Hefei 230001, People's Republic of China
- Gang Wang
- Department of General Surgery, The Fourth Affiliated Hospital of Anhui Medical University, Hefei 230001, People's Republic of China
- Haixing Zhu
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230031, People's Republic of China
- Aman Xu
- Department of General Surgery, The First Affiliated Hospital of Anhui Medical University, Hefei 230001, People's Republic of China; Department of General Surgery, The Fourth Affiliated Hospital of Anhui Medical University, Hefei 230001, People's Republic of China
7
De Sousa Mendes M, Orton AL, Humphries HE, Jones B, Gardner I, Neuhoff S, Pilla Reddy V. A Laboratory-Specific Scaling Factor to Predict the In Vivo Human Clearance of Aldehyde Oxidase Substrates. Drug Metab Dispos 2020; 48:1231-1238. [PMID: 32893186 DOI: 10.1124/dmd.120.000082] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/16/2020] [Accepted: 07/22/2020] [Indexed: 02/13/2025] Open
Abstract
Aldehyde oxidase (AO) efficiently metabolizes a range of compounds with N-containing heterocyclic aromatic rings and/or aldehydes. The limited knowledge of AO activity and abundance (in vitro and in vivo) has led to poor prediction of in vivo systemic clearance (CL) using in vitro-to-in vivo extrapolation approaches, which for drugs in development can lead to their discontinuation. We aimed to identify appropriate scaling factors to predict AO CL of future new chemical entities (NCEs). The metabolism of six AO substrates was measured in human liver cytosol (HLC) and S9 fractions. Measured blood-to-plasma ratios and free fractions (in the in vitro system and in plasma) were used to develop physiologically based pharmacokinetic models for each compound. The impact of extrahepatic metabolism was explored, and the intrinsic clearance required to recover in vivo profiles was estimated and compared with in vitro measurements. Using HLC data and assuming only hepatic metabolism, a systematic underprediction of clearance was observed (average fold underprediction was 3.8). Adding extrahepatic metabolism improved the accuracy of the results (average fold error of 1.9). A workflow for predicting metabolism of an NCE by AO is proposed, and an empirical (laboratory-specific) scaling factor of three on the predicted intravenous CL allows a reasonable prediction of the available clinical data. Alternatively, considering also extrahepatic metabolism, a scaling factor of 6.5 applied to the intrinsic clearance could be used. Future research should focus on the impact of the in vitro study designs and the contribution of extrahepatic metabolism to AO-mediated clearance to understand the mechanisms behind the systematic underprediction. SIGNIFICANCE STATEMENT: This work describes the development of scaling factors to allow in vitro-in vivo extrapolation of the clearance of compounds by aldehyde oxidase metabolism in humans.
In addition, physiologically based pharmacokinetic models were developed for each of the aldehyde oxidase substrate compounds investigated.
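Illustrative sketch (not from the paper): the abstract above reports an "average fold underprediction" and an empirical scaling factor applied to predicted clearance. The short Python example below shows one common way such quantities can be computed. The abstract does not state how the average was taken, so the geometric mean of observed/predicted ratios used here is an assumption, and the function names and clearance values are hypothetical.

```python
import math
from typing import List


def average_fold_underprediction(predicted: List[float], observed: List[float]) -> float:
    """Geometric mean of observed/predicted ratios (assumed averaging scheme).

    A value above 1 means clearance is underpredicted on average.
    """
    log_ratios = [math.log10(o / p) for p, o in zip(predicted, observed)]
    return 10 ** (sum(log_ratios) / len(log_ratios))


def apply_scaling_factor(predicted: List[float], factor: float) -> List[float]:
    """Multiply each predicted clearance by an empirical scaling factor."""
    return [cl * factor for cl in predicted]


if __name__ == "__main__":
    # Hypothetical clearance values (arbitrary units), for illustration only.
    predicted_cl = [1.0, 2.0, 5.0]
    observed_cl = [3.0, 8.0, 12.0]
    print(average_fold_underprediction(predicted_cl, observed_cl))
    scaled = apply_scaling_factor(predicted_cl, 3.0)
    print(average_fold_underprediction(scaled, observed_cl))
```

With a scaling factor that matches the systematic bias, the second printed value moves toward 1, which is the sense in which a laboratory-specific factor "recovers" the clinical data.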
Affiliation(s)
- Mailys De Sousa Mendes
- Certara UK Limited, Simcyp Division, Sheffield, United Kingdom (M.D.S.M., H.E.H., I.G., S.N.) and Oncology DMPK Research & Early Development (A.O., B.J.) and Modelling and Simulation, Research & Early Development (V.P.R.), Oncology R&D, AstraZeneca, Cambridge, United Kingdom
- Alexandra L Orton
- Certara UK Limited, Simcyp Division, Sheffield, United Kingdom (M.D.S.M., H.E.H., I.G., S.N.) and Oncology DMPK Research & Early Development (A.O., B.J.) and Modelling and Simulation, Research & Early Development (V.P.R.), Oncology R&D, AstraZeneca, Cambridge, United Kingdom
- Helen E Humphries
- Certara UK Limited, Simcyp Division, Sheffield, United Kingdom (M.D.S.M., H.E.H., I.G., S.N.) and Oncology DMPK Research & Early Development (A.O., B.J.) and Modelling and Simulation, Research & Early Development (V.P.R.), Oncology R&D, AstraZeneca, Cambridge, United Kingdom
- Barry Jones
- Certara UK Limited, Simcyp Division, Sheffield, United Kingdom (M.D.S.M., H.E.H., I.G., S.N.) and Oncology DMPK Research & Early Development (A.O., B.J.) and Modelling and Simulation, Research & Early Development (V.P.R.), Oncology R&D, AstraZeneca, Cambridge, United Kingdom
- Iain Gardner
- Certara UK Limited, Simcyp Division, Sheffield, United Kingdom (M.D.S.M., H.E.H., I.G., S.N.) and Oncology DMPK Research & Early Development (A.O., B.J.) and Modelling and Simulation, Research & Early Development (V.P.R.), Oncology R&D, AstraZeneca, Cambridge, United Kingdom
- Sibylle Neuhoff
- Certara UK Limited, Simcyp Division, Sheffield, United Kingdom (M.D.S.M., H.E.H., I.G., S.N.) and Oncology DMPK Research & Early Development (A.O., B.J.) and Modelling and Simulation, Research & Early Development (V.P.R.), Oncology R&D, AstraZeneca, Cambridge, United Kingdom
- Venkatesh Pilla Reddy
- Certara UK Limited, Simcyp Division, Sheffield, United Kingdom (M.D.S.M., H.E.H., I.G., S.N.) and Oncology DMPK Research & Early Development (A.O., B.J.) and Modelling and Simulation, Research & Early Development (V.P.R.), Oncology R&D, AstraZeneca, Cambridge, United Kingdom
8
Couté Y, Bruley C, Burger T. Beyond Target-Decoy Competition: Stable Validation of Peptide and Protein Identifications in Mass Spectrometry-Based Discovery Proteomics. Anal Chem 2020; 92:14898-14906. [PMID: 32970414 DOI: 10.1021/acs.analchem.0c00328] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022]
Abstract
In bottom-up discovery proteomics, target-decoy competition (TDC) is the most popular method for false discovery rate (FDR) control. Despite unquestionable statistical foundations, this method has drawbacks, including its hitherto unknown intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this instability have already been empirically described, they may have been misinterpreted. This article provides evidence that TDC has become less reliable as the accuracy of modern mass spectrometers improved. We therefore propose to replace TDC by a totally different method to control the FDR at the spectrum, peptide, and protein levels, while benefiting from the theoretical guarantees of the Benjamini-Hochberg framework. As this method is simpler to use, faster to compute, and more stable than TDC, we argue that it is better adapted to the standardization and throughput constraints of current proteomic platforms.
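Illustrative sketch (not the authors' implementation): the abstract above refers to the Benjamini-Hochberg framework for FDR control. The Python example below shows the standard Benjamini-Hochberg adjustment on a list of p-values. It assumes that a p-value is already available for each identification; how those p-values are derived from search-engine scores is outside this sketch, and the function names are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def accept_at_fdr(p_values: List[float], alpha: float = 0.01) -> List[bool]:
    """Flag identifications whose adjusted p-value is at most alpha."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    example = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
    print(benjamini_hochberg(example))
    print(accept_at_fdr(example, alpha=0.03))
```

Because the procedure needs only the p-values themselves, it requires no decoy database, which is consistent with the abstract's point about simplicity and speed.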
Affiliation(s)
- Yohann Couté
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
- Christophe Bruley
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
- Thomas Burger
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
9
Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020; 19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automate this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous works have shown that FDR at the protein level may become much larger than FDR at the peptide level. The development of an appropriate scoring model to identify proteins from their peptides using high-throughput shotgun proteomics is highly needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected for a true protein probability. We also present a refinement of the picked method to calculate FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods in several samples and using different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
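Illustrative sketch (not the authors' algorithm): the abstract above mentions the target-decoy approach for keeping the FDR below a threshold. The Python example below shows a simple version of that idea at the level of individual identifications. It assumes a concatenated target-decoy search, higher scores being better, and the plain estimate FDR = decoys / targets above a score threshold; other estimators exist, and the function name is hypothetical. The protein-level probability model and the refined picked method from the paper are not reproduced here.

```python
from typing import List


def target_decoy_qvalues(scores: List[float], is_decoy: List[bool]) -> List[float]:
    """Estimate a q-value for each identification, returned in input order."""
    # Rank identifications from best score to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr_at_rank = []
    targets = 0
    decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Simple estimate: decoys above the threshold divided by targets.
        fdr_at_rank.append(min(1.0, decoys / max(targets, 1)))
    # q-value: the smallest FDR at this rank or at any more permissive threshold.
    q_values = [0.0] * len(scores)
    running_min = 1.0
    for pos in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[pos])
        q_values[order[pos]] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical scores; the third identification is a decoy hit.
    scores = [10.0, 9.0, 8.0, 7.0, 6.0]
    decoy_flags = [False, False, True, False, False]
    q = target_decoy_qvalues(scores, decoy_flags)
    accepted = [qi <= 0.01 and not d for qi, d in zip(q, decoy_flags)]
    print(q, accepted)
```

Filtering at a q-value of 0.01 corresponds to the "typically 1%" threshold mentioned in the abstract.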
Affiliation(s)
- Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
- Jesús Vázquez
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28049 Madrid, Spain
10
Abstract
Shotgun proteomics is the method of choice for large-scale protein identification. However, the use of a robust statistical workflow to validate such identification is mandatory to minimize false matches, ambiguities, and amplification of error rates from spectra to proteins. In this chapter we emphasize the key concepts to take into account when processing the output of a search engine to obtain reliable peptide or protein identifications. We assume that the reader is already familiar with tandem mass spectrometry so we can focus on the use of statistical confidence methods. After introducing the key concepts we present different software tools and how to use them with an example dataset.
Affiliation(s)
- Gorka Prieto
- Department of Communications Engineering, Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU), Bilbao, Spain.
- Jesús Vázquez
- Laboratory of Cardiovascular Proteomics, Centro Nacional de Investigaciones Cardiovasculares (CNIC) and CIBER de Enfermedades Cardiovasculares (CIBERCV), Madrid, Spain
11
Martinez-Gomez L, Abascal F, Jungreis I, Pozo F, Kellis M, Mudge JM, Tress ML. Few SINEs of life: Alu elements have little evidence for biological relevance despite elevated translation. NAR Genom Bioinform 2019; 2:lqz023. [PMID: 31886458 PMCID: PMC6924539 DOI: 10.1093/nargab/lqz023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/08/2019] [Revised: 10/30/2019] [Accepted: 12/12/2019] [Indexed: 12/12/2022] Open
Abstract
Transposable elements colonize genomes and with time may end up being incorporated into functional regions. SINE Alu elements, which appeared in the primate lineage, are ubiquitous in the human genome and more than a thousand overlap annotated coding exons. Although almost all Alu-derived coding exons appear to be in alternative transcripts, they have been incorporated into the main coding transcript in at least 11 genes. The extent to which Alu regions are incorporated into functional proteins is unclear, but we detected reliable peptide evidence to support the translation to protein of 33 Alu-derived exons. All but one of the Alu elements for which we detected peptides were frame-preserving, and there was proportionally seven times more peptide evidence for Alu elements than for other primate exons. Despite this strong evidence for translation to protein, we found no evidence of selection, either from cross-species alignments or human population variation data, among these Alu-derived exons. Overall, our results confirm that SINE Alu elements have contributed to the expansion of the human proteome, and this contribution appears to be stronger than might be expected over such a relatively short evolutionary timeframe. Despite this, the biological relevance of these modifications remains open to question.
Affiliation(s)
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- To whom correspondence should be addressed. Tel: +34 91 732 8000; Fax: +34 91 224 6980.
12
Rinschen MM, Limbutara K, Knepper MA, Payne DM, Pisitkun T. From Molecules to Mechanisms: Functional Proteomics and Its Application to Renal Tubule Physiology. Physiol Rev 2019; 98:2571-2606. [PMID: 30182799 DOI: 10.1152/physrev.00057.2017] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 01/03/2023] Open
Abstract
Classical physiological studies using electrophysiological, biophysical, biochemical, and molecular techniques have created a detailed picture of molecular transport, bioenergetics, contractility and movement, and growth, as well as the regulation of these processes by external stimuli in cells and organisms. Newer systems biology approaches are beginning to provide deeper and broader understanding of these complex biological processes and their dynamic responses to a variety of environmental cues. In the past decade, advances in mass spectrometry-based proteomic technologies have provided invaluable tools to further elucidate these complex cellular processes, thereby confirming, complementing, and advancing common views of physiology. As one notable example, the application of proteomics to study the regulation of kidney function has yielded novel insights into the chemical and physical processes that tightly control body fluids, electrolytes, and metabolites to provide optimal microenvironments for various cellular and organ functions. Here, we systematically review, summarize, and discuss the most significant key findings from functional proteomic studies in renal epithelial physiology. We also identify further improvements in technological and bioinformatics methods that will be essential to advance precision medicine in nephrology.
Affiliation(s)
- Markus M Rinschen
- Department II of Internal Medicine, University Hospital Cologne , Cologne , Germany ; Center for Molecular Medicine Cologne, University of Cologne , Cologne , Germany ; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne , Cologne , Germany ; Division of Nephrology, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand ; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health , Bethesda, Maryland ; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand
- Kavee Limbutara
- Department II of Internal Medicine, University Hospital Cologne , Cologne , Germany ; Center for Molecular Medicine Cologne, University of Cologne , Cologne , Germany ; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne , Cologne , Germany ; Division of Nephrology, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand ; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health , Bethesda, Maryland ; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand
- Mark A Knepper
- Department II of Internal Medicine, University Hospital Cologne , Cologne , Germany ; Center for Molecular Medicine Cologne, University of Cologne , Cologne , Germany ; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne , Cologne , Germany ; Division of Nephrology, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand ; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health , Bethesda, Maryland ; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand
- D Michael Payne
- Department II of Internal Medicine, University Hospital Cologne , Cologne , Germany ; Center for Molecular Medicine Cologne, University of Cologne , Cologne , Germany ; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne , Cologne , Germany ; Division of Nephrology, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand ; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health , Bethesda, Maryland ; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand
- Trairak Pisitkun
- Department II of Internal Medicine, University Hospital Cologne , Cologne , Germany ; Center for Molecular Medicine Cologne, University of Cologne , Cologne , Germany ; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne , Cologne , Germany ; Division of Nephrology, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand ; Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health , Bethesda, Maryland ; and Center of Excellence in Systems Biology, Research Affairs, Faculty of Medicine, Chulalongkorn University , Bangkok , Thailand
13
Chi H, Liu C, Yang H, Zeng WF, Wu L, Zhou WJ, Wang RM, Niu XN, Ding YH, Zhang Y, Wang ZW, Chen ZL, Sun RX, Liu T, Tan GM, Dong MQ, Xu P, Zhang PH, He SM. Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nat Biotechnol 2018; 36:nbt.4236. [PMID: 30295672 DOI: 10.1038/nbt.4236] [Citation(s) in RCA: 250] [Impact Index Per Article: 35.7] [Received: 12/03/2017] [Accepted: 08/03/2018] [Indexed: 12/27/2022]
Abstract
We present a sequence-tag-based search engine, Open-pFind, to identify peptides in an ultra-large search space that includes coeluting peptides, unexpected modifications and digestions. Our method detects peptides with higher precision and speed than seven other search engines. Open-pFind identified 70-85% of the tandem mass spectra in four large-scale datasets and 14,064 proteins, each supported by at least two protein-unique peptides, in a human proteome dataset.
Affiliation(s)
- Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Chao Liu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Hao Yang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Long Wu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Wen-Jing Zhou
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Rui-Min Wang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xiu-Nan Niu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yue-He Ding
- National Institute of Biological Sciences, Beijing, Beijing, China
- Yao Zhang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, College of Ecology and Evolution, Sun Yat-Sen University, Guangzhou, China
- Zhao-Wei Wang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhen-Lin Chen
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Rui-Xiang Sun
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Tao Liu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- Guang-Ming Tan
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing, China
- Ping Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Pei-Heng Zhang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
14
Bittremieux W, Tabb DL, Impens F, Staes A, Timmerman E, Martens L, Laukens K. Quality control in mass spectrometry-based proteomics. Mass Spectrom Rev 2018; 37:697-711. [PMID: 28802010 DOI: 10.1002/mas.21544] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 03/04/2017] [Revised: 07/24/2017] [Accepted: 07/24/2017] [Indexed: 05/21/2023]
Abstract
Mass spectrometry is a highly complex analytical technique and mass spectrometry-based proteomics experiments can be subject to a large variability, which forms an obstacle to obtaining accurate and reproducible results. Therefore, a comprehensive and systematic approach to quality control is an essential requirement to inspire confidence in the generated results. A typical mass spectrometry experiment consists of multiple different phases including the sample preparation, liquid chromatography, mass spectrometry, and bioinformatics stages. We review potential sources of variability that can impact the results of a mass spectrometry experiment occurring in all of these steps, and we discuss how to monitor and remedy the negative influences on the experimental results. Furthermore, we describe how specialized quality control samples of varying sample complexity can be incorporated into the experimental workflow and how they can be used to rigorously assess detailed aspects of the instrument performance.
Affiliation(s)
- Wout Bittremieux
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (Biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- David L Tabb
- Division of Molecular Biology and Human Genetics, Stellenbosch University Faculty of Medicine and Health Sciences, Tygerberg Hospital, Cape Town, South Africa
- Francis Impens
- VIB Proteomics Core, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
- An Staes
- VIB Proteomics Core, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
- Evy Timmerman
- VIB Proteomics Core, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Zwijnaarde, Belgium
- Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (Biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
15
Uszczynska-Ratajczak B, Lagarde J, Frankish A, Guigó R, Johnson R. Towards a complete map of the human long non-coding RNA transcriptome. Nat Rev Genet 2018; 19:535-548. [PMID: 29795125 PMCID: PMC6451964 DOI: 10.1038/s41576-018-0017-y] [Citation(s) in RCA: 416] [Impact Index Per Article: 59.4] [Indexed: 02/01/2023]
Abstract
Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.
Affiliation(s)
- Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Rory Johnson
- Department of Medical Oncology, Inselspital, University Hospital and University of Bern, Bern, Switzerland
- Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
16
Yan R, Zhang J, Zellmer L, Chen L, Wu D, Liu S, Xu N, Liao JD. Probably less than one-tenth of the genes produce only the wild type protein without at least one additional protein isoform in some human cancer cell lines. Oncotarget 2017; 8:82714-82727. [PMID: 29137297 PMCID: PMC5669923 DOI: 10.18632/oncotarget.20015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/14/2017] [Accepted: 06/30/2017] [Indexed: 11/25/2022] Open
Abstract
To estimate how many genes produce multiple protein isoforms, we electrophoresed proteins from MCF7 and MDA-MB231 (MB231) human breast cancer cells in SDS-PAGE and excised narrow stripes of the gel at 48 kD, 55 kD and 72 kD. Proteins in these stripes were identified using liquid chromatography and tandem mass spectrometry. A total of 765, 750 and 679 proteins from MB231 cells, as well as 470, 390 and 490 proteins from MCF7 cells, were identified from the 48 kD, 55 kD and 72 kD stripes, respectively. We arbitrarily allowed a 10% technical variation from the proteins' theoretical molecular mass (TMM) and considered those proteins with TMMs within the 43-53 kD, 49-61 kD and 65-79 kD ranges as the wild type (WT) expected from the corresponding stripe, whereas those with a TMM above or below this range were assigned to a smaller- or larger-group, respectively. Only 263 (34.4%), 269 (35.9%) and 151 (22.2%) proteins from MB231 cells and 117 (24.9%), 135 (34.6%) and 130 (26.5%) proteins from MCF7 cells from the 48 kD, 55 kD and 72 kD stripes, respectively, belonged to the WT, while the remaining majority belonged to the smaller- or larger-groups. Only about 3-16% of the proteins, on average about 10% regardless of the stripe and cell line, appeared in only one stripe and within the WT range, while the remaining preponderance also appeared in additional stripe(s) or had a larger or smaller TMM. We conclude that few (fewer than 10%) human genes produce only the WT protein without additional isoform(s).
Affiliation(s)
- Rui Yan
- Nephrology Department, Guizhou Medical University Hospital, Guiyang, P.R. China
- Ju Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, P.R. China
- Lucas Zellmer
- Hormel Institute, University of Minnesota, Austin, Minnesota, USA
- Lichan Chen
- Hormel Institute, University of Minnesota, Austin, Minnesota, USA
- Di Wu
- Beijing Protein Innovation Co., Ltd, Beijing, P.R. China
- Siqi Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, P.R. China
- Ningzhi Xu
- Laboratory of Cell and Molecular Biology & State Key Laboratory of Molecular Oncology, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, P.R. China
- Joshua D Liao
- Department of Pathology, Guizhou Medical University Hospital, Guiyang, P.R. China
17
Kroll JE, da Silva VL, de Souza SJ, de Souza GA. A tool for integrating genetic and mass spectrometry-based peptide data: Proteogenomics Viewer. Bioessays 2017; 39. [DOI: 10.1002/bies.201700015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/29/2022]
Affiliation(s)
- José Eduardo Kroll
- Institute of Bioinformatics and Biotechnology; Natal − RN Brazil
- Brain Institute; Universidade Federal do Rio Grande do Norte; Natal − RN Brazil
- Bioinformatics Multidisciplinary Environment; Instituto Metrópole Digital; UFRN, Natal-RN Brazil
- Vandeclécio Lira da Silva
- Brain Institute; Universidade Federal do Rio Grande do Norte; Natal − RN Brazil
- Bioinformatics Multidisciplinary Environment; Instituto Metrópole Digital; UFRN, Natal-RN Brazil
- Sandro José de Souza
- Brain Institute; Universidade Federal do Rio Grande do Norte; Natal − RN Brazil
- Bioinformatics Multidisciplinary Environment; Instituto Metrópole Digital; UFRN, Natal-RN Brazil
- Gustavo Antonio de Souza
- Brain Institute; Universidade Federal do Rio Grande do Norte; Natal − RN Brazil
- Bioinformatics Multidisciplinary Environment; Instituto Metrópole Digital; UFRN, Natal-RN Brazil
- Department of Immunology and Centre for Immune Regulation, Oslo University Hospital HF Rikshospitalet; University of Oslo; Oslo Norway
18
Gilany K, Minai-Tehrani A, Amini M, Agharezaee N, Arjmand B. The Challenge of Human Spermatozoa Proteome: A Systematic Review. J Reprod Infertil 2017; 18:267-279. [PMID: 29062791 PMCID: PMC5641436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
Currently, there are 20,197 human protein-coding genes in the most expertly curated database (UniProtKB/Swiss-Prot). Big efforts have been made by the international consortium, the Chromosome-Centric Human Proteome Project (C-HPP), and by independent researchers to map the human proteome. In brief, as of 2017 the human proteome had been outlined. The male factor contributes to 50% of infertility in couples. However, there are limited human spermatozoa proteomic studies. Firstly, the development of the mapping of the human spermatozoa was analyzed. The human spermatozoa have been used as a model for missing proteins. It has been shown that human spermatozoa are excellent sources for finding missing proteins. Y chromosome proteome mapping is led by Iran. However, it seems that it is extremely challenging to map the human spermatozoa Y chromosome proteins based on current mass spectrometry-based proteomics technology. Post-translational modifications (PTMs) of the human spermatozoa proteome are the most unexplored area, and currently the exact role of PTMs in male infertility is unknown. Additionally, a clinical human spermatozoa proteomic analysis as of 2017 was performed in this study.
Affiliation(s)
- Kambiz Gilany
- Reproductive Biotechnology Research Center, Avicenna Research Institute, ACECR, Tehran, Iran; Metabolomics and Genomics Research Center, Endocrinology and Metabolism Molecular Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Corresponding Author: Kambiz Gilany, Reproductive Biotechnology Research Center, Avicenna Research Institute, ACECR, Tehran, Iran, P.O. Box: 19615-1177
- Arash Minai-Tehrani
- Nanobiotechnology Research Center, Avicenna Research Institute, ACECR, Tehran, Iran
- Mehdi Amini
- Reproductive Biotechnology Research Center, Avicenna Research Institute, ACECR, Tehran, Iran
- Niloofar Agharezaee
- Reproductive Biotechnology Research Center, Avicenna Research Institute, ACECR, Tehran, Iran, Department of Genetics, Tehran Medical Sciences Branch, Islamic Azad University, Tehran, Iran
- Babak Arjmand
- Metabolomics and Genomics Research Center, Endocrinology and Metabolism Molecular Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran, Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
19
Bittremieux W, Valkenborg D, Martens L, Laukens K. Computational quality control tools for mass spectrometry proteomics. Proteomics 2016; 17. [DOI: 10.1002/pmic.201600159] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 06/30/2016] [Revised: 07/28/2016] [Accepted: 08/19/2016] [Indexed: 12/30/2022]
Affiliation(s)
- Wout Bittremieux
- Department of Mathematics and Computer Science; University of Antwerp; Antwerp Belgium
- Biomedical Informatics Research Center Antwerp (biomina); University of Antwerp/Antwerp, University Hospital; Edegem Belgium
- Dirk Valkenborg
- Flemish Institute for Technological Research (VITO); Mol Belgium
- CFP; University of Antwerp; Antwerp Belgium
- I-BioStat; Hasselt University; Diepenbeek Belgium
- Lennart Martens
- Medical Biotechnology Center; VIB; Ghent Belgium
- Department of Biochemistry, Faculty of Medicine and Health Sciences; Ghent University; Ghent Belgium
- Bioinformatics Institute Ghent; Ghent University; Zwijnaarde Belgium
- Kris Laukens
- Department of Mathematics and Computer Science; University of Antwerp; Antwerp Belgium
- Biomedical Informatics Research Center Antwerp (biomina); University of Antwerp/Antwerp, University Hospital; Edegem Belgium
20
Tress ML, Abascal F, Valencia A. Alternative Splicing May Not Be the Key to Proteome Complexity. Trends Biochem Sci 2016; 42:98-110. [PMID: 27712956 DOI: 10.1016/j.tibs.2016.08.008] [Citation(s) in RCA: 231] [Impact Index Per Article: 25.7] [Received: 03/16/2016] [Revised: 05/19/2016] [Accepted: 08/15/2016] [Indexed: 12/21/2022]
Abstract
Alternative splicing is commonly believed to be a major source of cellular protein diversity. However, although many thousands of alternatively spliced transcripts are routinely detected in RNA-seq studies, reliable large-scale mass spectrometry-based proteomics analyses identify only a small fraction of annotated alternative isoforms. The clearest finding from proteomics experiments is that most human genes have a single main protein isoform, while those alternative isoforms that are identified tend to be the most biologically plausible: those with the most cross-species conservation and those that do not compromise functional domains. Indeed, most alternative exons do not seem to be under selective pressure, suggesting that a large majority of predicted alternative transcripts may not even be translated into proteins.
Affiliation(s)
- Michael L Tress
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain
- Federico Abascal
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain; Human Genetics Department, Sandhu Group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Alfonso Valencia
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain; National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain