1
Jeong SK, Kim CY, Paik YK. ASV-ID, a Proteogenomic Workflow To Predict Candidate Protein Isoforms on the Basis of Transcript Evidence. J Proteome Res 2018; 17:4235-4242. [PMID: 30289715 DOI: 10.1021/acs.jproteome.8b00548] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
One of the goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and characterize the functions of protein isoforms produced by alternative splicing of genes. However, identifying alternative splice variants (ASVs) via mass spectrometry remains a major challenge, because ASVs usually contain highly homologous peptide sequences. A routine protein sequence analysis suggests that more than half of the investigated proteins do not generate two or more uniquely mapping peptides that would enable their isoforms to be distinguished. Here, we develop a new proteogenomics method, named "ASV-ID" (alternative splicing variants identification), which enables identification of ASVs by using a cell type-specific protein sequence database that is supported by RNA-Seq data. Using this workflow, we identify 1935 distinct proteins under highly stringent conditions. For 841 of these proteins, transcript evidence helps us distinguish them from other isoforms, even though they are not predicted to yield two or more uniquely mapping peptides. We also demonstrate that ASV-ID enables detection of 19 differentially expressed isoforms present in several cell lines. Thus, a new workflow using ASV-ID has the potential to map yet-to-be-identified difficult protein isoforms in a simple and robust way.
2
Paik YK, Overall CM, Deutsch EW, Hancock WS, Omenn GS. Progress in the Chromosome-Centric Human Proteome Project as Highlighted in the Annual Special Issue IV. J Proteome Res 2016; 15:3945-3950. [PMID: 27809547 DOI: 10.1021/acs.jproteome.6b00803] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
Affiliation(s)
- Young-Ki Paik: Yonsei Proteome Research Center and Department of Biochemistry, Yonsei University
- Christopher M Overall: Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, University of British Columbia
- Gilbert S Omenn: Departments of Computational Medicine & Bioinformatics, Internal Medicine, and Human Genetics and School of Public Health, University of Michigan
3
Paik YK, Omenn GS, Hancock WS, Lane L, Overall CM. Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017; 14:1059-1071. [PMID: 29039980 DOI: 10.1080/14789450.2017.1394189] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 01/01/2023]
Abstract
Introduction: The mission of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and annotate the entire predicted human protein set (~20,000 proteins) encoded by each chromosome. The initial steps of the project are focused on 'missing proteins (MPs)', which lacked documented evidence for existence at protein level. In addition to the remaining 2,579 MPs, we also target those annotated proteins having unknown functions, uPE1 proteins, alternative splice isoforms and post-translational modifications. We also consider how to investigate various protein functions involved in cis-regulatory phenomena, amplicons, lncRNAs and smORFs. Areas covered: We will cover the scope, historic background, progress, challenges and future prospects of C-HPP. This review also addresses the question of how we can best improve the methodological approaches, select the optimal biological samples, and recommend stringent protocols for the identification and characterization of MPs. A new strategy for functional analysis of some of those annotated proteins having unknown function will also be discussed. Expert commentary: If the project moves well by reshaping the original goals, the current working modules and team work in the proposed extended planning period, it is anticipated that a progressively more detailed draft of an accurate chromosome-based proteome map will become available with functional information.
Affiliation(s)
- Young-Ki Paik: Yonsei Proteome Research Center and Department of Biochemistry, Yonsei University, Seoul, Korea
- Gilbert S Omenn: Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- William S Hancock: Department of Chemical Biology, Northeastern University, Boston, Massachusetts 02115, USA
- Lydie Lane: Department of Human Protein Sciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
- Christopher M Overall: Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, University of British Columbia, Vancouver, Canada
4
Na K, Shin H, Cho JY, Jung SH, Lim J, Lim JS, Kim EA, Kim HS, Kang AR, Kim JH, Shin JM, Jeong SK, Kim CY, Park JY, Chung HM, Omenn GS, Hancock WS, Paik YK. Systematic Proteogenomic Approach To Exploring a Novel Function for NHERF1 in Human Reproductive Disorder: Lessons for Exploring Missing Proteins. J Proteome Res 2017; 16:4455-4467. [PMID: 28960081 DOI: 10.1021/acs.jproteome.7b00146] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022]
Abstract
One of the major goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to fill the knowledge gaps between human genomic information and the corresponding proteomic information. These gaps are due to "missing" proteins (MPs)-predicted proteins with insufficient evidence from mass spectrometry (MS), biochemical, structural, or antibody analyses-that currently account for 2579 of the 19587 predicted human proteins (neXtProt, 2017-01). We address some of the lessons learned from the inconsistent annotations of missing proteins in databases (DB) and demonstrate a systematic proteogenomic approach designed to explore a potential new function of a known protein. To illustrate a cautious and strategic approach for characterization of novel function in vitro and in vivo, we present the case of Na(+)/H(+) exchange regulatory cofactor 1 (NHERF1/SLC9A3R1, located at chromosome 17q25.1; hereafter NHERF1), which was mistakenly labeled as an MP in one DB (Global Proteome Machine Database; GPMDB, 2011-09 release) but was well known in another public DB and in the literature. As a first step, NHERF1 was determined by MS and immunoblotting for its molecular identity. We next investigated the potential new function of NHERF1 by carrying out the quantitative MS profiling of placental trophoblasts (PXD004723) and functional study of cytotrophoblast JEG-3 cells. We found that NHERF1 was associated with trophoblast differentiation and motility. To validate this newly found cellular function of NHERF1, we used the Caenorhabditis elegans mutant of nrfl-1 (a nematode ortholog of NHERF1), which exhibits a protruding vulva (Pvl) and egg-laying-defective phenotype, and performed genetic complementation work. The nrfl-1 mutant was almost fully rescued by the transfection of the recombinant transgenic construct that contained human NHERF1. These results suggest that NHERF1 could have a previously unknown function in pregnancy and in the development of human embryos. 
Our study outlines a stepwise experimental platform to explore new functions of ambiguously denoted candidate proteins and scrutinizes the mandated DB search for the selection of MPs to study in the future.
Affiliation(s)
- Keun Na: Yonsei Proteome Research Center, Yonsei University, Seoul 03722, South Korea
- Heon Shin: Department of Integrated OMICS for Biomedical Science, Yonsei University, Seoul 03722, South Korea
- Jin-Young Cho: Yonsei Proteome Research Center, Yonsei University, Seoul 03722, South Korea
- Sang Hee Jung: Department of Obstetrics and Gynecology, CHA Bundang Medical Center, CHA University, Seongnam 13496, South Korea
- Jaeseung Lim: CHA Biotech Co., Ltd., Seongnam 13488, South Korea
- Jong-Sun Lim: Yonsei Proteome Research Center, Yonsei University, Seoul 03722, South Korea
- Eun Ah Kim: Department of Obstetrics and Gynecology, CHA Bundang Medical Center, CHA University, Seongnam 13496, South Korea
- Hye Sun Kim: CHA Biotech Co., Ltd., Seongnam 13488, South Korea
- Ah Reum Kang: CHA Biotech Co., Ltd., Seongnam 13488, South Korea
- Ji Hye Kim: CHA Biotech Co., Ltd., Seongnam 13488, South Korea
- Jeong Min Shin: Department of Biochemistry, CHA University, Seongnam 13488, South Korea
- Seul-Ki Jeong: Yonsei Proteome Research Center, Yonsei University, Seoul 03722, South Korea
- Chae-Yeon Kim: Department of Integrated OMICS for Biomedical Science, Yonsei University, Seoul 03722, South Korea
- Jun Young Park: Department of Integrated OMICS for Biomedical Science, Yonsei University, Seoul 03722, South Korea
- Hyung-Min Chung: Department of Medicine, School of Medicine, Konkuk University, Seoul 143701, South Korea
- Gilbert S Omenn: Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- William S Hancock: Department of Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Young-Ki Paik: Yonsei Proteome Research Center, Yonsei University, Seoul 03722, South Korea; Department of Integrated OMICS for Biomedical Science, Yonsei University, Seoul 03722, South Korea; Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, South Korea
5
Kim JW, Hwang H, Lim JS, Lee HJ, Jeong SK, Yoo JS, Paik YK. gFinder: A Web-Based Bioinformatics Tool for the Analysis of N-Glycopeptides. J Proteome Res 2016; 15:4116-4125. [PMID: 27573070 DOI: 10.1021/acs.jproteome.6b00772] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/16/2023]
Abstract
Glycoproteins influence numerous indispensable biological functions, and changes in protein glycosylation have been observed in various diseases. The identification and characterization of glycoprotein and glycosylation sites by mass spectrometry (MS) remain challenging tasks, and great efforts have been devoted to the development of proteome informatics tools that facilitate the MS analysis of glycans and glycopeptides. Here we report on the development of gFinder, a web-based bioinformatics tool that analyzes mixtures of native N-glycopeptides that have been profiled by tandem MS. gFinder not only enables the simultaneous integration of collision-induced dissociation (CID) and high-energy collisional dissociation (HCD) fragmentation but also merges the spectra for high-throughput analysis. These merged spectra expedite the identification of both glycans and N-glycopeptide backbones in tandem MS data using the glycan database and a proteomic search tool (e.g., Mascot). These data can be used to simultaneously characterize peptide backbone sequences and possible N-glycan structures using assigned scores. gFinder also provides many convenient functions that make it easy to perform manual calculations while viewing the spectrum on-screen. We used gFinder to detect an additional protein (Q8N9B8) that was missed from the previously published data set containing N-linked glycosylation. For N-glycan analysis, we used the GlycomeDB glycan structure database, which integrates the structural and taxonomic data from all of the major carbohydrate databases available in the public domain. Thus, gFinder is a convenient, high-throughput analytical tool for interpreting the tandem mass spectra of N-glycopeptides, which can then be used for identification of potential missing proteins having glycans. gFinder is available publicly at http://gFinder.proteomix.org/ .
Affiliation(s)
- Ju-Wan Kim: Graduate Program in Functional Genomics, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea; Yonsei Proteome Research Center, Seoul 03722, Korea
- Heeyoun Hwang: Korea Basic Science Institute, Ochang 28199, Chungbuk, Korea
- Jong-Sun Lim: Yonsei Proteome Research Center, Seoul 03722, Korea
- Seul-Ki Jeong: Yonsei Proteome Research Center, Seoul 03722, Korea
- Jong Shin Yoo: Korea Basic Science Institute, Ochang 28199, Chungbuk, Korea
- Young-Ki Paik: Graduate Program in Functional Genomics, College of Life Sciences and Biotechnology, Yonsei University, Seoul 03722, Korea; Yonsei Proteome Research Center, Seoul 03722, Korea
6
Paik YK, Omenn GS, Overall CM, Deutsch EW, Hancock WS. Recent Advances in the Chromosome-Centric Human Proteome Project: Missing Proteins in the Spot Light. J Proteome Res 2015; 14:3409-3414. [PMID: 26337862 DOI: 10.1021/acs.jproteome.5b00785] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Young-Ki Paik: Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- Gilbert S Omenn: Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- Christopher M Overall: Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- Eric W Deutsch: Institute for Systems Biology, Seattle, Washington 98109, United States; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- William S Hancock: Department of Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
7
Li HD, Omenn GS, Guan Y. A proteogenomic approach to understand splice isoform functions through sequence and expression-based computational modeling. Brief Bioinform 2016; 17:1024-1031. [PMID: 26740460 DOI: 10.1093/bib/bbv109] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/27/2015] [Revised: 11/03/2015] [Indexed: 01/23/2023]
Abstract
The products of multi-exon genes are a mixture of alternatively spliced isoforms, from which the translated proteins can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate functions for individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data of these methods range from DNA sequence, to RNA selection pressure, to expressed sequence tags, to full-length complementary DNA, to exon array, to RNA-seq expression, to proteomic data. Notably, RNA-seq technology generates quantitative profiling of transcript expression at the genome scale, with an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations and point out potential ways to improve prediction accuracies.
8
Mayne J, Ning Z, Zhang X, Starr AE, Chen R, Deeke S, Chiang CK, Xu B, Wen M, Cheng K, Seebun D, Star A, Moore JI, Figeys D. Bottom-Up Proteomics (2013-2015): Keeping up in the Era of Systems Biology. Anal Chem 2016; 88:95-121. [PMID: 26558748 DOI: 10.1021/acs.analchem.5b04230] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Indexed: 02/07/2023]
Affiliation(s)
- Janice Mayne: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Zhibin Ning: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Xu Zhang: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Amanda E Starr: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Rui Chen: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Shelley Deeke: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Cheng-Kang Chiang: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Bo Xu: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Ming Wen: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Kai Cheng: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Deeptee Seebun: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Alexandra Star: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Jasmine I Moore: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Daniel Figeys: Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5