1
Shi C, Liu F, Su X, Yang Z, Wang Y, Xie S, Xie S, Sun Q, Chen Y, Sang L, Tan M, Zhu L, Lei K, Li J, Yang J, Gao Z, Yu M, Wang X, Wang J, Chen J, Zhuo W, Fang Z, Liu J, Yan Q, Neculai D, Sun Q, Shao J, Lin W, Liu W, Chen J, Wang L, Liu Y, Li X, Zhou T, Lin A. Comprehensive discovery and functional characterization of the noncanonical proteome. Cell Res 2025; 35:186-204. [PMID: 39794466 PMCID: PMC11909191 DOI: 10.1038/s41422-024-01059-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Accepted: 11/14/2024] [Indexed: 01/13/2025]
Abstract
The systematic identification and functional characterization of noncanonical translation products, such as novel peptides, will facilitate the understanding of the human genome and provide new insights into cell biology. Here, we constructed a high-coverage peptide sequencing reference library with 11,668,944 open reading frames and employed an ultrafiltration tandem mass spectrometry assay to identify novel peptides. Through these methods, we discovered 8945 previously unannotated peptides from normal gastric tissues, gastric cancer tissues and cell lines, nearly half of which were derived from noncoding RNAs. Moreover, our CRISPR screening revealed that 1161 peptides are involved in tumor cell proliferation. The presence and physiological function of a subset of these peptides, selected based on screening scores, amino acid length, and various indicators, were verified through Flag-knockin and multiple other methods. To further characterize the potential regulatory mechanisms involved, we constructed a framework based on artificial intelligence structure prediction and peptide‒protein interaction network analysis for the top 100 candidates and revealed that these cancer-related peptides have diverse subcellular locations and participate in organelle-specific processes. Further investigation verified the interacting partners of pep1-nc-OLMALINC, pep5-nc-TRHDE-AS1, pep-nc-ZNF436-AS1 and pep2-nc-AC027045.3, and the functions of these peptides in mitochondrial complex assembly, energy metabolism, and cholesterol metabolism, respectively. We showed that pep5-nc-TRHDE-AS1 and pep2-nc-AC027045.3 had substantial impacts on tumor growth in xenograft models. Furthermore, the dysregulation of these four peptides is closely correlated with clinical prognosis. Taken together, our study provides a comprehensive characterization of the noncanonical proteome, and highlights critical roles of these previously unannotated peptides in cancer biology.
Affiliation(s)
- Chengyu Shi
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Fangzhou Liu
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Xinwan Su
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Zuozhen Yang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Ying Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Shanshan Xie
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Department of Cell Biology and Program in Molecular Cell Biology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Department of Gastroenterology, the Second Affiliated Hospital, School of Medicine and Institute of Gastroenterology, Zhejiang University, Hangzhou, Zhejiang, China
- Shaofang Xie
- Key Laboratory of Structural Biology of Zhejiang Province, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, Hangzhou, Zhejiang, China
- Qiang Sun
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Yu Chen
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Lingjie Sang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Manman Tan
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Linyu Zhu
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Kai Lei
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Junhong Li
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Jiecheng Yang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Zerui Gao
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Meng Yu
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Xinyi Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Junfeng Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Jing Chen
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Wei Zhuo
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Department of Cell Biology and Program in Molecular Cell Biology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Department of Gastroenterology, the Second Affiliated Hospital, School of Medicine and Institute of Gastroenterology, Zhejiang University, Hangzhou, Zhejiang, China
- Zhaoyuan Fang
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, Zhejiang, China
- The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Jian Liu
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, Zhejiang, China
- Hangzhou Cancer Hospital, Hangzhou, Zhejiang, China
- Qingfeng Yan
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Dante Neculai
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Qiming Sun
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Jianzhong Shao
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Weiqiang Lin
- Department of Nephrology, Center for Regeneration and Aging Medicine, The Fourth Affiliated Hospital of School of Medicine and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, Zhejiang, China
- Wei Liu
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Jian Chen
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangjing Wang
- Department of Gastroenterology, the Second Affiliated Hospital, School of Medicine and Institute of Gastroenterology, Zhejiang University, Hangzhou, Zhejiang, China
- Yang Liu
- Institute of Immunology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Xu Li
- Key Laboratory of Structural Biology of Zhejiang Province, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, Hangzhou, Zhejiang, China
- Tianhua Zhou
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China.
- Department of Cell Biology and Program in Molecular Cell Biology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Aifu Lin
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China.
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China.
- Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, Zhejiang, China.
- Key Laboratory for Cell and Gene Engineering of Zhejiang Province, Hangzhou, Zhejiang, China.
2
Kochetov AV. Evaluation of Eukaryotic mRNA Coding Potential. Methods Mol Biol 2025; 2859:319-331. [PMID: 39436610 DOI: 10.1007/978-1-0716-4152-1_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
It is widely discussed that eukaryotic mRNAs can encode several functional polypeptides. Recent progress in NGS and proteomics techniques has resulted in a huge volume of information on potential alternative translation initiation sites and open reading frames (altORFs). However, these data are still far from comprehensive, and the vast majority of eukaryotic mRNAs annotated in conventional databases (e.g., GenBank) contain a single ORF (CDS) encoding a protein larger than some arbitrary threshold (commonly 100 amino acid residues). Indeed, some gene functions may relate to the polypeptides encoded by unannotated altORFs, and insufficient information in nucleotide sequence databanks may limit the interpretation of genomics and transcriptomics data. Although special experiments are needed to predict altORFs accurately, there are some simple methods for their preliminary mapping.
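As a rough illustration of what a "simple method for preliminary mapping" could look like, the sketch below scans the three forward reading frames of an mRNA for ATG-to-stop ORFs. It is an assumption made for illustration, not the procedure described in the chapter: the function name `find_orfs`, the `min_codons` parameter, and its default of 10 are hypothetical, and the scan ignores non-AUG start codons and sequence context.

```python
# Hypothetical helper, not taken from the chapter: a naive sense-strand ORF scan.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(mrna, min_codons=10):
    """Return sorted (start, end, frame) tuples for ATG..stop ORFs.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    min_codons counts the start codon but not the stop codon; the default
    is an arbitrary illustrative threshold.
    """
    seq = mrna.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None  # first in-frame ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume the search after this stop codon
    return sorted(orfs)


# Toy sequence with one short ORF (ATG GCT GCT GCT TAA) in frame 2.
toy = "CC" + "ATG" + "GCT" * 3 + "TAA"
short_orfs = find_orfs(toy, min_codons=3)
```

Lowering `min_codons` surfaces short ORFs that a conventional length threshold, such as the 100 residues mentioned above, would discard; any hit from a scan like this is only a candidate until supported by experimental evidence.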
Affiliation(s)
- Alex V Kochetov
- Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia.
- Novosibirsk State Agrarian University, Novosibirsk, Russia.
- Novosibirsk State University, Novosibirsk, Russia.
3
Vasylieva V, Arefiev I, Bourassa F, Trifiro FA, Brunet MA. Proteomics Can Rise to the Challenge of Pseudogenes' Coding Nature. J Proteome Res 2024; 23:5233-5249. [PMID: 39486438 PMCID: PMC11629383 DOI: 10.1021/acs.jproteome.4c00116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 09/18/2024] [Accepted: 10/18/2024] [Indexed: 11/04/2024]
Abstract
Throughout the past decade, technological advances in genomics and transcriptomics have revealed pervasive translation throughout mammalian genomes. These putative proteins are usually excluded from proteomics analyses, as they are absent from common protein repositories. A sizable portion of these noncanonical proteins is translated from pseudogenes. Pseudogenes are commonly described as defective copies of coding genes that are unable to produce proteins. Here, we suggest that proteomics can help in their annotation. First, we define important terms and review specific examples underlining the caveats in pseudogene annotation and their coding potential. Then, we discuss the challenges inherent to pseudogenes that have thus far made it difficult to assess them with confidence in omics data. Finally, we identify recent developments in experimental procedures, instrumentation, and computational methods in proteomics that put the field in a unique position to solve the pseudogene annotation conundrum.
Affiliation(s)
- Valeriia Vasylieva
- Pediatrics Department, Université de Sherbrooke, Sherbrooke, Québec J1K 2R1, Canada
- Centre de Recherche du Centre hospitalier de l’université de Sherbrooke (CRCHUS), Sherbrooke, Québec J1E 4K8, Canada
- Ihor Arefiev
- Pediatrics Department, Université de Sherbrooke, Sherbrooke, Québec J1K 2R1, Canada
- Centre de Recherche du Centre hospitalier de l’université de Sherbrooke (CRCHUS), Sherbrooke, Québec J1E 4K8, Canada
- Francis Bourassa
- Pediatrics Department, Université de Sherbrooke, Sherbrooke, Québec J1K 2R1, Canada
- Centre de Recherche du Centre hospitalier de l’université de Sherbrooke (CRCHUS), Sherbrooke, Québec J1E 4K8, Canada
- Félix-Antoine Trifiro
- Pediatrics Department, Université de Sherbrooke, Sherbrooke, Québec J1K 2R1, Canada
- Centre de Recherche du Centre hospitalier de l’université de Sherbrooke (CRCHUS), Sherbrooke, Québec J1E 4K8, Canada
- Marie A. Brunet
- Pediatrics Department, Université de Sherbrooke, Sherbrooke, Québec J1K 2R1, Canada
- Centre de Recherche du Centre hospitalier de l’université de Sherbrooke (CRCHUS), Sherbrooke, Québec J1E 4K8, Canada
4
Mohsen JJ, Martel AA, Slavoff SA. Microproteins-Discovery, structure, and function. Proteomics 2023; 23:e2100211. [PMID: 37603371 PMCID: PMC10841188 DOI: 10.1002/pmic.202100211] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/04/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/22/2023]
Abstract
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Affiliation(s)
- Jessica J. Mohsen
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Alina A. Martel
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
5
Prensner JR, Abelin JG, Kok LW, Clauser KR, Mudge JM, Ruiz-Orera J, Bassani-Sternberg M, Moritz RL, Deutsch EW, van Heesch S. What Can Ribo-Seq, Immunopeptidomics, and Proteomics Tell Us About the Noncanonical Proteome? Mol Cell Proteomics 2023; 22:100631. [PMID: 37572790 PMCID: PMC10506109 DOI: 10.1016/j.mcpro.2023.100631] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 05/14/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/14/2023]
Abstract
Ribosome profiling (Ribo-Seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of noncanonical sites of ribosome translation outside the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7000 noncanonical ORFs are translated, which, at first glance, has the potential to expand the number of human protein CDSs by 30%, from ∼19,500 annotated CDSs to over 26,000 annotated CDSs. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of noncanonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome but searching for guidance on how to proceed. Here, we discuss the current state of noncanonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein coding."
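The headline numbers in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the two counts quoted above (about 19,500 annotated CDSs and at least 7000 noncanonical ORFs); the variable names are illustrative, and both inputs are rounded estimates, so the result should be read as an order-of-magnitude check rather than a precise figure.

```python
# Counts quoted in the abstract; both are approximate.
annotated_cds = 19_500      # "~19,500 annotated CDSs"
noncanonical_orfs = 7_000   # "at least 7000 noncanonical ORFs"

# Total CDS count if every noncanonical ORF were annotated as protein coding.
expanded_total = annotated_cds + noncanonical_orfs

# Relative growth of the annotated set, as a fraction of the current count.
relative_increase = noncanonical_orfs / annotated_cds

print(f"expanded total: {expanded_total:,}")
print(f"relative increase: {relative_increase:.1%}")
```

With these inputs the total comes to 26,500, matching the abstract's "over 26,000", and the relative increase is roughly a third, in line with the rounded 30% quoted above.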
Affiliation(s)
- John R Prensner
- Division of Pediatric Hematology/Oncology, Department of Pediatrics, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan, USA.
- Leron W Kok
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Karl R Clauser
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, Agora Center Bugnon 25A, University of Lausanne, Lausanne, Switzerland; Department of Oncology, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; Agora Cancer Research Centre, Lausanne, Switzerland
- Robert L Moritz
- Institute for Systems Biology (ISB), Seattle, Washington, USA
- Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington, USA
6
Bogaert A, Fijalkowska D, Staes A, Van de Steene T, Vuylsteke M, Stadler C, Eyckerman S, Spirohn K, Hao T, Calderwood MA, Gevaert K. N-terminal proteoforms may engage in different protein complexes. Life Sci Alliance 2023; 6:e202301972. [PMID: 37316325 PMCID: PMC10267514 DOI: 10.26508/lsa.202301972] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/06/2023] [Revised: 05/26/2023] [Accepted: 05/30/2023] [Indexed: 06/16/2023]
Abstract
Alternative translation initiation and alternative splicing may give rise to N-terminal proteoforms, proteins that differ at their N-terminus compared with their canonical counterparts. Such proteoforms can have altered localizations, stabilities, and functions. Although proteoforms generated from splice variants can be engaged in different protein complexes, it remained to be studied to what extent this applies to N-terminal proteoforms. To address this, we mapped the interactomes of several pairs of N-terminal proteoforms and their canonical counterparts. First, we generated a catalogue of N-terminal proteoforms found in the HEK293T cellular cytosol from which 22 pairs were selected for interactome profiling. In addition, we provide evidence for the expression of several N-terminal proteoforms, identified in our catalogue, across different human tissues, as well as tissue-specific expression, highlighting their biological relevance. Protein-protein interaction profiling revealed that the overlap of the interactomes for both proteoforms is generally high, showing their functional relation. We also showed that N-terminal proteoforms can be engaged in new interactions and/or lose several interactions compared with their canonical counterparts, thus further expanding the functional diversity of proteomes.
Affiliation(s)
- Annelies Bogaert
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Daria Fijalkowska
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- An Staes
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Tessa Van de Steene
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Charlotte Stadler
- Department of Protein Science, KTH Royal Institute of Technology and Science for Life Laboratories, Stockholm, Sweden
- Sven Eyckerman
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Kris Gevaert
- VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
7
Prensner JR, Abelin JG, Kok LW, Clauser KR, Mudge JM, Ruiz-Orera J, Bassani-Sternberg M, Deutsch EW, van Heesch S. What can Ribo-seq and proteomics tell us about the non-canonical proteome? bioRxiv 2023:2023.05.16.541049. [PMID: 37292611 PMCID: PMC10245706 DOI: 10.1101/2023.05.16.541049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Ribosome profiling (Ribo-seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of non-canonical sites of ribosome translation outside of the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7,000 non-canonical open reading frames (ORFs) are translated, which, at first glance, has the potential to expand the number of human protein-coding sequences by 30%, from ∼19,500 annotated CDSs to over 26,000. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of non-canonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome, but searching for guidance on how to proceed. Here, we discuss the current state of non-canonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein-coding".
In brief
The human genome encodes thousands of non-canonical open reading frames (ORFs) in addition to protein-coding genes. As a nascent field, many questions remain regarding non-canonical ORFs. How many exist? Do they encode proteins? What level of evidence is needed for their verification? Central to these debates has been the advent of ribosome profiling (Ribo-seq) as a method to discern genome-wide ribosome occupancy, and immunopeptidomics as a method to detect peptides that are processed and presented by MHC molecules and not observed in traditional proteomics experiments. This article provides a synthesis of the current state of non-canonical ORF research and proposes standards for their future investigation and reporting.
Highlights
- Combined use of Ribo-seq and proteomics-based methods enables optimal confidence in detecting non-canonical ORFs and their protein products.
- Ribo-seq can provide more sensitive detection of non-canonical ORFs, but data quality and analytical pipelines will impact results.
- Non-canonical ORF catalogs are diverse and span both high-stringency and low-stringency ORF nominations.
- A framework for standardized non-canonical ORF evidence will advance the research field.
Affiliation(s)
- John R. Prensner
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Leron W. Kok
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
- Karl R. Clauser
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center Bugnon 25A, 1005 Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1005 Lausanne, Switzerland
- Agora Cancer Research Centre, 1011 Lausanne, Switzerland
- Eric W. Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
8
Jin J, Meng L, Chen K, Xu Y, Lu P, Li Z, Tao J, Li Z, Wang C, Yang X, Yu S, Yang Z, Cao L, Cao P. Analysis of herbivore-responsive long noncoding ribonucleic acids reveals a subset of small peptide-coding transcripts in Nicotiana tabacum. Frontiers in Plant Science 2022; 13:971400. [PMID: 36212334 PMCID: PMC9538394 DOI: 10.3389/fpls.2022.971400] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/17/2022] [Accepted: 08/18/2022] [Indexed: 06/16/2023]
Abstract
Long non-coding RNAs (lncRNAs) regulate many biological processes in plants, including defense against pathogens and herbivores. Recently, many small ORFs embedded in lncRNAs have been identified to encode biologically functional peptides (small ORF-encoded peptides [SEPs]) in many species. However, it is unknown whether lncRNAs mediate defense against herbivore attack and whether there are novel functional SEPs for these lncRNAs. By sequencing Spodoptera litura-treated leaves at six time-points in Nicotiana tabacum, 22,436 lncRNAs were identified, of which 787 were differentially expressed. Using a comprehensive mass spectrometry (MS) pipeline, 302 novel SEPs derived from 115 tobacco lncRNAs were identified. Moreover, 61 SEPs showed differential expression after S. litura attack. Importantly, several of these peptides were characterized through 3D structure prediction, subcellular localization validation by laser confocal microscopy, and western blotting. Subsequent bioinformatic analysis revealed some specific chemical and physical properties of these novel SEPs, which probably represent the largest number of SEPs identified in plants to date. Our study not only identifies potential lncRNA regulators of plant response to herbivore attack but also serves as a valuable resource for the functional characterization of SEP-encoding lncRNAs.
Affiliation(s)
- Jingjing Jin
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
- Lijun Meng
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
- Kai Chen
  - China Tobacco Hunan Industrial Co., Ltd., Changsha, China
- Yalong Xu
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
- Peng Lu
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
- Zhaowu Li
  - China Tobacco Hunan Industrial Co., Ltd., Changsha, China
- Jiemeng Tao
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
- Zefeng Li
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
- Chen Wang
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
- Xiaonian Yang
  - China Tobacco Hunan Industrial Co., Ltd., Changsha, China
- Shizhou Yu
  - Molecular Genetics Key Laboratory of China Tobacco, Guizhou Academy of Tobacco Science, Guiyang, China
- Zhixiao Yang
  - Molecular Genetics Key Laboratory of China Tobacco, Guizhou Academy of Tobacco Science, Guiyang, China
- Linggai Cao
  - Molecular Genetics Key Laboratory of China Tobacco, Guizhou Academy of Tobacco Science, Guiyang, China
- Peijian Cao
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, China
|
9
|
Bagheri A, Astafev A, Al-Hashimy T, Jiang P. Tracing Translational Footprint by Ribo-Seq: Principle, Workflow, and Applications to Understand the Mechanism of Human Diseases. Cells 2022; 11:2966. [PMID: 36230928 PMCID: PMC9562884 DOI: 10.3390/cells11192966] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/26/2022] [Revised: 09/02/2022] [Accepted: 09/19/2022] [Indexed: 11/30/2022] Open
Abstract
RNA-seq has been widely used as a high-throughput method to characterize dynamic changes in transcripts in a broad range of contexts, such as development and disease. However, whether RNA-seq-estimated transcriptional dynamics translate into protein-level changes is largely unknown. Ribo-seq (ribosome profiling) is an emerging technology that allows investigation of the translational footprint by profiling ribosome-bound mRNA fragments. Ribo-seq coupled with RNA-seq will allow us to understand the transcriptional and translational control of fundamental biological processes and human diseases. This review discusses the principle, workflow, and applications of Ribo-seq for studying human diseases.
Affiliation(s)
- Atefeh Bagheri
  - Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
  - Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH 44115, USA
- Artem Astafev
  - Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
  - Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH 44115, USA
- Tara Al-Hashimy
  - Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
- Peng Jiang
  - Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
  - Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH 44115, USA
  - Center for Applied Data Analysis and Modeling (ADAM), Cleveland State University, Cleveland, OH 44115, USA
  - Center for RNA Science and Therapeutics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
  - Correspondence: Tel.: +1-(216)-687-3917
|
10
|
Na Z, Dai X, Zheng SJ, Bryant CJ, Loh KH, Su H, Luo Y, Buhagiar AF, Cao X, Baserga SJ, Chen S, Slavoff SA. Mapping subcellular localizations of unannotated microproteins and alternative proteins with MicroID. Mol Cell 2022; 82:2900-2911.e7. [PMID: 35905735 PMCID: PMC9662605 DOI: 10.1016/j.molcel.2022.06.035] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/07/2021] [Revised: 04/08/2022] [Accepted: 06/29/2022] [Indexed: 11/15/2022]
Abstract
Proteogenomic identification of translated small open reading frames has revealed thousands of previously unannotated, largely uncharacterized microproteins, or polypeptides of less than 100 amino acids, and alternative proteins (alt-proteins) that are co-encoded with canonical proteins and are often larger. The subcellular localizations of microproteins and alt-proteins are generally unknown but can have significant implications for their functions. Proximity biotinylation is an attractive approach to define the protein composition of subcellular compartments in cells and in animals. Here, we developed a high-throughput technology to map unannotated microproteins and alt-proteins to subcellular localizations by proximity biotinylation with TurboID (MicroID). More than 150 microproteins and alt-proteins are associated with subnuclear organelles. One alt-protein, alt-LAMA3, localizes to the nucleolus and functions in pre-rRNA transcription. We applied MicroID in a mouse model, validating expression of a conserved nuclear microprotein, and establishing MicroID for discovery of microproteins and alt-proteins in vivo.
Affiliation(s)
- Zhenkun Na
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Xiaoyun Dai
  - Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA; Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Shu-Jian Zheng
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Carson J Bryant
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
- Ken H Loh
  - Laboratory of Molecular Genetics, Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Haomiao Su
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Yang Luo
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Amber F Buhagiar
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
- Xiongwen Cao
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Susan J Baserga
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA; Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Sidi Chen
  - Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA; Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Sarah A Slavoff
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA.
|
11
|
Bogaert A, Fijalkowska D, Staes A, Van de Steene T, Demol H, Gevaert K. Limited evidence for protein products of non-coding transcripts in the HEK293T cellular cytosol. Mol Cell Proteomics 2022; 21:100264. [PMID: 35788065 PMCID: PMC9396073 DOI: 10.1016/j.mcpro.2022.100264] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/21/2022] [Revised: 06/22/2022] [Accepted: 06/30/2022] [Indexed: 10/25/2022] Open
Abstract
Ribosome profiling has revealed translation outside of canonical coding sequences (CDSs), including translation of short upstream ORFs, long non-coding RNAs, overlapping ORFs, ORFs in UTRs or ORFs in alternative reading frames. Studies combining mass spectrometry, ribosome profiling and CRISPR-based screens showed that hundreds of ORFs derived from non-coding transcripts produce (micro)proteins, while other studies failed to find evidence for such types of non-canonical translation products. Here, we attempted to discover translation products from non-coding regions by strongly reducing the complexity of the sample prior to mass spectrometric analysis. We used an extended database as the search space and applied stringent filtering of the identified peptides to find evidence for novel translation events. We show that our strategy theoretically facilitates the detection of translation events of transcripts from non-coding regions, but experimentally we find only 19 peptides that might originate from such translation events. Finally, Virotrap-based interactome analysis of two N-terminal proteoforms originating from non-coding regions showed the functional potential of these novel proteins.
Affiliation(s)
- Annelies Bogaert
  - VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Daria Fijalkowska
  - VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- An Staes
  - VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Tessa Van de Steene
  - VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Hans Demol
  - VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Kris Gevaert
  - VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium.
|
12
|
Cao X, Khitun A, Harold CM, Bryant CJ, Zheng SJ, Baserga SJ, Slavoff SA. Nascent alt-protein chemoproteomics reveals a pre-60S assembly checkpoint inhibitor. Nat Chem Biol 2022; 18:643-651. [PMID: 35393574 PMCID: PMC9423127 DOI: 10.1038/s41589-022-01003-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/03/2021] [Accepted: 02/25/2022] [Indexed: 12/29/2022]
Abstract
Many unannotated microproteins and alternative proteins (alt-proteins) are coencoded with canonical proteins, but few of their functions are known. Motivated by the hypothesis that alt-proteins undergoing regulated synthesis could play important cellular roles, we developed a chemoproteomic pipeline to identify nascent alt-proteins in human cells. We identified 22 actively translated alt-proteins or N-terminal extensions, one of which is post-transcriptionally upregulated by DNA damage stress. We further defined a nucleolar, cell-cycle-regulated alt-protein that negatively regulates assembly of the pre-60S ribosomal subunit (MINAS-60). Depletion of MINAS-60 increases the amount of cytoplasmic 60S ribosomal subunit, upregulating global protein synthesis and cell proliferation. Mechanistically, MINAS-60 represses the rate of late-stage pre-60S assembly and export to the cytoplasm. Together, these results implicate MINAS-60 as a potential checkpoint inhibitor of pre-60S assembly and demonstrate that chemoproteomics enables hypothesis generation for uncharacterized alt-proteins.
Affiliation(s)
- Xiongwen Cao
  - Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Alexandra Khitun
  - Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Cecelia M Harold
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Carson J Bryant
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Shu-Jian Zheng
  - Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Susan J Baserga
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT, USA
- Sarah A Slavoff
  - Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
|
13
|
Leong AZX, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures. J Biomed Sci 2022; 29:19. [PMID: 35300685 PMCID: PMC8928697 DOI: 10.1186/s12929-022-00802-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/31/2021] [Accepted: 03/09/2022] [Indexed: 12/17/2022] Open
Abstract
A short open reading frame (sORF) comprises ≤ 300 bases and encodes a microprotein or sORF-encoded protein (SEP) of ≤ 100 amino acids. Traditionally dismissed by genome annotation pipelines as meaningless noise, sORFs were found to possess coding potential with ribosome profiling (RIBO-Seq), which unveiled sORF-based transcripts at various genome locations. Nonetheless, the existence of corresponding stable and functional microproteins was initially poorly substantiated by experimental evidence. With recent advancements in multi-omics, the identification, validation, and functional characterisation of sORFs and microproteins have become feasible. In this review, we discuss the history and development of the emerging research field of sORFs and microproteins. In particular, we focus on an array of bioinformatics and omics approaches used for predicting, sequencing, validating, and characterizing these recently discovered entities. These strategies include RIBO-Seq, which detects sORF transcripts via ribosome footprints, and mass spectrometry (MS)-based proteomics for sequencing the resultant microproteins. Subsequently, our discussion extends to the functional characterisation of microproteins by incorporating CRISPR/Cas9 screens and protein–protein interaction (PPI) studies. Our review not only discusses detection methodologies but also highlights the challenges and potential solutions in identifying and validating sORFs and their microproteins. The novelty of this review lies in its focus on validating the functional role of microproteins, which could contribute to the future landscape of microproteomics.
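The size definition in this abstract (an ORF of at most 300 bases) can be illustrated with a minimal sketch. The function name `find_sorfs`, the restriction to ATG start codons on the forward strand, and the choice to count the stop codon toward the 300-base limit are all assumptions made for illustration; they are not taken from the review, and real sORF callers consider both strands, alternative start codons and ribosome-profiling evidence.

```python
# Illustrative sketch only: find_sorfs and its conventions are hypothetical,
# based solely on the "<= 300 bases" size definition quoted in the abstract.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SORF_BASES = 300  # assumed here to include the stop codon


def find_sorfs(seq, max_bases=MAX_SORF_BASES):
    """Return (start, end, orf) tuples for ATG..stop ORFs on the forward
    strand whose total length is at most max_bases."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if j + 3 - i <= max_bases:
                            hits.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return hits


if __name__ == "__main__":
    for start, end, orf in find_sorfs("CCATGAAATGACC"):
        print(start, end, orf, f"{len(orf) // 3 - 1} codons before the stop")
```

Under these assumptions, nested in-frame start codons are ignored once an ORF has been reported, which keeps the sketch short at the cost of missing shorter overlapping candidates.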
Affiliation(s)
- Alyssa Zi-Xin Leong
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Pey Yee Lee
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- M Aiman Mohtar
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Saiful Effendi Syafruddin
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Yuh-Fen Pung
  - Division of Biomedical Science, School of Pharmacy, University of Nottingham Malaysia, Semenyih, 43500, Selangor, Malaysia
- Teck Yew Low
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia.
|
14
|
Cope AL, Vellappan S, Favate JS, Skalenko KS, Yadavalli SS, Shah P. Exploring Ribosome-Positioning on Translating Transcripts with Ribosome Profiling. Methods Mol Biol 2022; 2404:83-110. [PMID: 34694605 DOI: 10.1007/978-1-0716-1851-6_5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/13/2023]
Abstract
The emergence of ribosome profiling as a tool for measuring the translatome has provided researchers with valuable insights into the post-transcriptional regulation of gene expression. Despite the biological insights and technical improvements made since the technique was initially described by Ingolia et al. (Science 324(5924):218-223, 2009), ribosome profiling measurements and subsequent data analysis remain challenging. Here, we describe our lab's protocol for performing ribosome profiling in bacteria, yeast, and mammalian cells. This protocol has integrated elements from three published ribosome profiling methods. In addition, we describe a tool called RiboViz (Carja et al., BMC Bioinformatics 18:461, 2017) ( https://github.com/riboviz/riboviz ) for the analysis and visualization of ribosome profiling data. Given raw sequencing reads and transcriptome information (e.g., FASTA, GFF) for a species, RiboViz performs the necessary pre-processing and mapping of the raw sequencing reads. RiboViz also provides the user with various quality control visualizations.
Affiliation(s)
- Alexander L Cope
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA
  - Human Genetics Institute of New Jersey, Piscataway, NJ, USA
- Sangeevan Vellappan
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA
  - Waksman Institute, Rutgers University, Piscataway, NJ, USA
- John S Favate
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA
  - Human Genetics Institute of New Jersey, Piscataway, NJ, USA
- Kyle S Skalenko
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA
  - Waksman Institute, Rutgers University, Piscataway, NJ, USA
- Srujana S Yadavalli
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA
  - Waksman Institute, Rutgers University, Piscataway, NJ, USA
- Premal Shah
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA.
  - Human Genetics Institute of New Jersey, Piscataway, NJ, USA.
|
15
|
Extensive Translational Regulation through the Proliferative Transition of Trypanosoma cruzi Revealed by Multi-Omics. mSphere 2021; 6:e0036621. [PMID: 34468164 PMCID: PMC8550152 DOI: 10.1128/msphere.00366-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022] Open
Abstract
Trypanosoma cruzi is the etiological agent for Chagas disease, a neglected parasitic disease in Latin America. Gene transcription control governs the eukaryotic cell replication but is absent in trypanosomatids; thus, it must be replaced by posttranscriptional regulatory events. We investigated the entrance into the T. cruzi replicative cycle using ribosome profiling and proteomics on G1/S epimastigote cultures synchronized with hydroxyurea. We identified 1,784 translationally regulated genes (change > 2, false-discovery rate [FDR] < 0.05) and 653 differentially expressed proteins (change > 1.5, FDR < 0.05), respectively. A major translational remodeling accompanied by an extensive proteome change is found, while the transcriptome remains largely unperturbed at the replicative entrance of the cell cycle. The differentially expressed genes comprise specific cell cycle processes, confirming previous findings while revealing candidate cell cycle regulators that undergo previously unnoticed translational regulation. Clusters of genes showing a coordinated regulation at translation and protein abundance share related biological functions such as cytoskeleton organization and mitochondrial metabolism; thus, they may represent posttranscriptional regulons. The translatome and proteome of the coregulated clusters change in both coupled and uncoupled directions, suggesting that complex cross talk between the two processes is required to achieve adequate protein levels of different regulons. This is the first simultaneous assessment of the transcriptome, translatome, and proteome of trypanosomatids, which represent a paradigm for the absence of transcriptional control. The findings suggest that gene expression chronology along the T. cruzi cell cycle is controlled mainly by translatome and proteome changes coordinated using different mechanisms for specific gene groups. 
IMPORTANCE Trypanosoma cruzi is an ancient eukaryotic unicellular parasite causing Chagas disease, a potentially life-threatening illness that affects 6 to 7 million people, mostly in Latin America. The antiparasitic treatments for the disease have incomplete efficacy and adverse reactions; thus, improved drugs are needed. We study the mechanisms governing the replication of the parasite, aiming to find differences with the human host, valuable for the development of parasite-specific antiproliferative drugs. Transcriptional regulation is essential for replication in most eukaryotes, but in trypanosomatids, it must be replaced by subsequent gene regulation steps since they lack transcription initiation control. We identified the genome-wide remodeling of mRNA translation and protein abundance during the entrance to the replicative phase of the cell cycle. We found that translation is strongly regulated, causing variation in protein levels of specific cell cycle processes, representing the first simultaneous study of the translatome and proteome in trypanosomatids.
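The selection thresholds quoted in this abstract (change > 2 with FDR < 0.05 for translation, change > 1.5 with FDR < 0.05 for proteins) can be expressed as a small filter. This is a sketch under stated assumptions: "change" is read here as a fold change applied symmetrically to up- and down-regulation, which the abstract does not spell out, and the function name `is_regulated` is hypothetical rather than part of the authors' pipeline.

```python
import math


def is_regulated(fold_change, fdr, min_change=2.0, max_fdr=0.05):
    """Return True if a gene passes both thresholds.

    Assumption: the fold-change cutoff is symmetric, so a gene passes when
    its fold change is above min_change or below 1 / min_change.
    """
    passes_change = abs(math.log2(fold_change)) > math.log2(min_change)
    return passes_change and fdr < max_fdr


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    examples = [("geneA", 2.5, 0.01), ("geneB", 0.3, 0.01), ("geneC", 1.8, 0.001)]
    for name, fc, fdr in examples:
        print(name, is_regulated(fc, fdr))
```

Passing `min_change=1.5` applies the looser cutoff the abstract reports for differentially expressed proteins.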
|
16
|
Prensner JR, Enache OM, Luria V, Krug K, Clauser KR, Dempster JM, Karger A, Wang L, Stumbraite K, Wang VM, Botta G, Lyons NJ, Goodale A, Kalani Z, Fritchman B, Brown A, Alan D, Green T, Yang X, Jaffe JD, Roth JA, Piccioni F, Kirschner MW, Ji Z, Root DE, Golub TR. Noncanonical open reading frames encode functional proteins essential for cancer cell survival. Nat Biotechnol 2021; 39:697-704. [PMID: 33510483 PMCID: PMC8195866 DOI: 10.1038/s41587-020-00806-2] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 02/18/2020] [Accepted: 12/16/2020] [Indexed: 01/30/2023]
Abstract
Although genomic analyses predict many noncanonical open reading frames (ORFs) in the human genome, it is unclear whether they encode biologically active proteins. Here we experimentally interrogated 553 candidates selected from noncanonical ORF datasets. Of these, 57 induced viability defects when knocked out in human cancer cell lines. Following ectopic expression, 257 showed evidence of protein expression and 401 induced gene expression changes. Clustered regularly interspaced short palindromic repeat (CRISPR) tiling and start codon mutagenesis indicated that their biological effects required translation as opposed to RNA-mediated effects. We found that one of these ORFs, G029442, renamed glycine-rich extracellular protein-1 (GREP1), encodes a secreted protein highly expressed in breast cancer, and its knockout in 263 cancer cell lines showed preferential essentiality in breast cancer-derived lines. The secretome of GREP1-expressing cells has an increased abundance of the oncogenic cytokine GDF15, and GDF15 supplementation mitigated the growth-inhibitory effect of GREP1 knockout. Our experiments suggest that noncanonical ORFs can express biologically active proteins that are potential therapeutic targets.
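The counts reported in this abstract are easier to compare as proportions of the 553 interrogated candidates. The short calculation below is only a reading aid: the dictionary labels are illustrative names, and it assumes all three counts share the 553-candidate denominator, as the phrase "Of these" suggests. The categories are not mutually exclusive, so the percentages should not be summed.

```python
# Counts quoted in the abstract; the dictionary keys are illustrative labels.
CANDIDATES = 553
outcomes = {
    "viability_defect_on_knockout": 57,
    "protein_expression_detected": 257,
    "gene_expression_change": 401,
}


def percent_of_candidates(count, total=CANDIDATES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for label, count in outcomes.items():
    print(f"{label}: {count}/{CANDIDATES} = {percent_of_candidates(count)}%")
```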
Affiliation(s)
- John R. Prensner
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215; Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115
- Oana M. Enache
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Victor Luria
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Karsten Krug
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Karl R. Clauser
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amir Karger
  - IT-Research Computing, Harvard Medical School, Boston, MA, USA, 02115
- Li Wang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Vickie M. Wang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Ginevra Botta
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amy Goodale
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Zohra Kalani
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Adam Brown
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Douglas Alan
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Thomas Green
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Xiaoping Yang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Jacob D. Jaffe
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Present address: Inzen Therapeutics, Cambridge, MA, 02139, USA
- Federica Piccioni
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Present address: Merck Research Laboratories, Boston, MA, 02115, USA
- Marc W. Kirschner
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Zhe Ji
  - Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611; Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60628
- David E. Root
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Todd R. Golub
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215; Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115; Corresponding author: Todd R. Golub, MD, Chief Scientific Officer, Broad Institute of Harvard and MIT, Room 4013, 415 Main Street, Cambridge, MA, 02142; Phone: 617-714-7050
|
17
|
Tsang O, Wong JWH. Proteogenomic interrogation of cancer cell lines: an overview of the field. Expert Rev Proteomics 2021; 18:221-232. [PMID: 33877947 DOI: 10.1080/14789450.2021.1914594] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
Introduction: Cancer cell lines (CCLs) have been a major resource for cancer research. Over the past couple of decades, they have been instrumental in omic profiling method development and as model systems to generate new knowledge in cell and cancer biology. More recently, with the increasing amount of genomic, transcriptomic and proteomic data being generated in hundreds of CCLs, there is growing potential for integrative proteogenomic data analyses to be performed. Areas covered: In this review, we first describe the most commonly used proteome profiling methods in CCLs. We then discuss how these proteomics data can be integrated with genomics data for proteogenomics analyses. Finally, we highlight some of the recent biological discoveries that have arisen from proteogenomics analyses of CCLs. Expert opinion: Proteogenomics analyses of CCLs have so far enabled the discovery of novel proteins and proteoforms. They have also improved our understanding of biological processes, including post-transcriptional regulation of protein abundance and the presentation of antigens by major histocompatibility complex alleles. With proteomics data to be generated in hundreds to thousands of CCLs in coming years, there will be further potential for large-scale proteogenomics analyses and data integration with the phenotypically well-characterized CCLs.
Affiliation(s)
- Olson Tsang
  - Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Jason W H Wong
  - Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR; School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
|
18
|
Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021; 20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021] [Indexed: 11/17/2022] Open
Abstract
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.
Highlights
- First proteogenomics with PSM rescoring using machine learning–predicted spectra
- Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases
- Rescoring leads to elevated stringency and increased identification rates
- Rescoring compensates for the search space size issues in proteogenomics
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
- Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Ralf Gabriels
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Sven Degroeve
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Lennart Martens
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium.

19
Fijalkowska D, Fijalkowski I, Willems P, Van Damme P. Bacterial riboproteogenomics: the era of N-terminal proteoform existence revealed. FEMS Microbiol Rev 2021; 44:418-431. [PMID: 32386204 DOI: 10.1093/femsre/fuaa013] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/19/2019] [Accepted: 05/07/2020] [Indexed: 12/17/2022] Open
Abstract
With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome reannotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms.
Affiliation(s)
- Daria Fijalkowska
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
- Igor Fijalkowski
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
- Patrick Willems
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
- Petra Van Damme
- Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium

20
Schlesinger D, Elsässer SJ. Revisiting sORFs: overcoming challenges to identify and characterize functional microproteins. FEBS J 2021; 289:53-74. [PMID: 33595896 DOI: 10.1111/febs.15769] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 08/21/2020] [Revised: 01/17/2021] [Accepted: 02/15/2021] [Indexed: 02/07/2023]
Abstract
Short ORFs (sORFs), that is, occurrences of a start and stop codon within 100 codons or less, can be found in organisms of all domains of life, outnumbering annotated protein-coding ORFs by orders of magnitude. Even though functional proteins smaller than 100 amino acids are known, the coding potential of sORFs has often been overlooked, as it is not trivial to predict and test for functionality within the large number of sORFs. Recent advances in ribosome profiling and mass spectrometry approaches, together with refined bioinformatic predictions, have enabled a huge leap forward in this field and identified thousands of likely coding sORFs. A relatively low number of small proteins or microproteins produced from these sORFs have been characterized so far on the molecular, structural, and/or mechanistic level. These however display versatile and, in some cases, essential cellular functions, allowing for the exciting possibility that many more, previously unknown small proteins might be encoded in the genome, waiting to be discovered. This review will give an overview of the steadily growing microprotein field, focusing on eukaryotic small proteins. We will discuss emerging themes in the molecular action of microproteins, as well as advances and challenges in microprotein identification and characterization.
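The sORF definition given in this abstract (a start and a stop codon within 100 codons or less) can be made concrete with a small scan. The sketch below is illustrative only and is not taken from the review: the function name `find_sorfs`, the restriction to ATG starts, the forward-strand three-frame scan, and the choice to count the start codon but not the stop codon toward the 100-codon limit are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_sorfs(seq, max_codons=100):
    """Return (start, end, n_codons) tuples for short ORFs on the forward strand.

    start and end are 0-based, end-exclusive, and the span includes the stop
    codon. n_codons counts sense codons (start codon included, stop excluded).
    Assumption: only ATG is treated as a start codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # first ATG seen since the last stop codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons <= max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None  # resume searching after this stop
    return sorted(hits)
```

Called on a transcript sequence, `find_sorfs` lists every ATG-to-stop span that fits the length cutoff. A real pipeline would also scan the reverse strand and might allow near-cognate start codons, which this sketch deliberately leaves out.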
Affiliation(s)
- Dörte Schlesinger
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
- Simon J Elsässer
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden

21
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE Access: Practical Innovations, Open Solutions 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic, and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most of such tools have not focused on scalability issues that are inherent in proteogenomic data analysis where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case of how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA

22
Lau E, Han Y, Williams DR, Thomas CT, Shrestha R, Wu JC, Lam MPY. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Rep 2020; 29:3751-3765.e5. [PMID: 31825849 PMCID: PMC6961840 DOI: 10.1016/j.celrep.2019.11.026] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 04/18/2019] [Revised: 09/24/2019] [Accepted: 11/06/2019] [Indexed: 12/18/2022] Open
Abstract
The protein-level translational status and function of many alternative splicing events remain poorly understood. We use an RNA sequencing (RNA-seq)-guided proteomics method to identify protein alternative splicing isoforms in the human proteome by constructing tissue-specific protein databases that prioritize transcript splice junction pairs with high translational potential. Using the custom databases to reanalyze ~80 million mass spectra in public proteomics datasets, we identify more than 1,500 noncanonical protein isoforms across 12 human tissues, including ~400 sequences undocumented on TrEMBL and RefSeq databases. We apply the method to original quantitative mass spectrometry experiments and observe widespread isoform regulation during human induced pluripotent stem cell cardiomyocyte differentiation. On a proteome scale, alternative isoform regions overlap frequently with disordered sequences and post-translational modification sites, suggesting that alternative splicing may regulate protein function through modulating intrinsically disordered regions. The described approach may help elucidate functional consequences of alternative splicing and expand the scope of proteomics investigations in various systems.
The translation and function of many alternative splicing events await confirmation at the protein level. Lau et al. use an integrated proteotranscriptomics approach to identify non-canonical and undocumented isoforms from 12 organs in the human proteome. Alternative isoforms interfere with functional sequence features and are differentially regulated during iPSC cardiomyocyte differentiation.
Affiliation(s)
- Edward Lau
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Yu Han
- Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA; Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
- Damon R Williams
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Cody T Thomas
- Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
- Rajani Shrestha
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Joseph C Wu
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA; Department of Radiology, School of Medicine, Stanford University, Palo Alto, CA, USA
- Maggie P Y Lam
- Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA; Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA.

23
Choi SW, Kim HW, Nam JW. The small peptide world in long noncoding RNAs. Brief Bioinform 2020; 20:1853-1864. [PMID: 30010717 PMCID: PMC6917221 DOI: 10.1093/bib/bby055] [Citation(s) in RCA: 200] [Impact Index Per Article: 40.0] [Received: 04/09/2018] [Revised: 05/08/2018] [Indexed: 02/07/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) are a group of transcripts that are longer than 200 nucleotides (nt) without coding potential. Over the past decade, tens of thousands of novel lncRNAs have been annotated in animal and plant genomes because of advanced high-throughput RNA sequencing technologies and with the aid of coding transcript classifiers. Further, a considerable number of reports have revealed the existence of stable, functional small peptides (also known as micropeptides), translated from lncRNAs. In this review, we discuss the methods of lncRNA classification, the investigations regarding their coding potential and the functional significance of the peptides they encode.
Affiliation(s)
- Seo-Won Choi
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
- Hyun-Woo Kim
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
- Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea

24
Translation initiation downstream from annotated start codons in human mRNAs coevolves with the Kozak context. Genome Res 2020; 30:974-984. [PMID: 32669370 PMCID: PMC7397870 DOI: 10.1101/gr.257352.119] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/18/2019] [Accepted: 06/25/2020] [Indexed: 12/13/2022]
Abstract
Eukaryotic translation initiation involves preinitiation ribosomal complex 5′-to-3′ directional probing of mRNA for codons suitable for starting protein synthesis. The recognition of codons as starts depends on the codon identity and on its immediate nucleotide context known as Kozak context. When the context is weak (i.e., nonoptimal), leaky scanning takes place during which a fraction of ribosomes continues the mRNA probing. We explored the relationship between the context of AUG codons annotated as starts of protein-coding sequences and the next AUG codon occurrence. We found that AUG codons downstream from weak starts occur in the same frame more frequently than downstream from strong starts. We suggest that evolutionary selection on in-frame AUGs downstream from weak start codons is driven by the advantage of the reduction of wasteful out-of-frame product synthesis and also by the advantage of producing multiple proteoforms from certain mRNAs. We confirmed translation initiation downstream from weak start codons using ribosome profiling data. We also tested translation of alternative start codons in 10 specific human genes using reporter constructs. In all tested cases, initiation at downstream start codons was more productive than at the annotated ones. In most cases, optimization of Kozak context did not completely abolish downstream initiation, and in the specific example of CMPK1 mRNA, the optimized start remained unproductive. Collectively, our work reveals previously uncharacterized forces shaping the evolution of protein-coding genes and points to the plurality of translation initiation and the existence of sequence features influencing start codon selection, other than Kozak context.
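The two quantities this abstract relates, the strength of the Kozak context around an AUG and whether the next downstream AUG is in the same frame, can be illustrated with a short sketch. It is not the authors' method. It assumes the commonly cited simplification that the context is judged by a purine at position -3 and a G at position +4 (with the A of AUG as +1). The intermediate "moderate" tier and the function names `kozak_strength` and `next_aug_is_in_frame` are hypothetical.

```python
def _as_rna(seq):
    """Upper-case the sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def kozak_strength(mrna, aug_index):
    """Classify the context of the AUG whose A sits at 0-based aug_index.

    Assumed rule: 'strong' if -3 is a purine (A/G) and +4 is G,
    'moderate' if only one of the two holds, 'weak' if neither does.
    Positions that fall outside the sequence count as non-matching.
    """
    seq = _as_rna(mrna)
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at aug_index")
    minus3 = seq[aug_index - 3] if aug_index >= 3 else ""
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else ""
    matches = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "moderate", "strong")[matches]


def next_aug_is_in_frame(mrna, aug_index):
    """Return True/False for the next downstream AUG, or None if there is none."""
    seq = _as_rna(mrna)
    nxt = seq.find("AUG", aug_index + 1)
    if nxt == -1:
        return None
    return (nxt - aug_index) % 3 == 0
```

Applied across annotated start codons, the pair of values per transcript would allow a tabulation of in-frame downstream AUG frequency against context strength, which is the kind of comparison the abstract describes.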

25
Brunet MA, Leblanc S, Roucou X. Reconsidering proteomic diversity with functional investigation of small ORFs and alternative ORFs. Exp Cell Res 2020; 393:112057. [PMID: 32387289 DOI: 10.1016/j.yexcr.2020.112057] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/19/2019] [Revised: 04/21/2020] [Accepted: 05/02/2020] [Indexed: 12/13/2022]
Abstract
The discovery of functional yet non-annotated open reading frames (ORFs) throughout the genome of several species presents an unprecedented challenge in current genome annotation. These novel ORFs are shorter than annotated ones and many can be found on the same RNA, in opposition to current assumptions in annotation methodologies. Whilst the literature lacks consensus, these novel ORFs are commonly referred to as small ORFs (sORFs) or alternative ORFs (alt-ORFs). Unannotated ORFs represent an overlooked layer of complexity in the coding potential of genomes and are transforming our current vision of the nature of coding genes. In this review, we outline what constitutes a sORF or an alt-ORF and emphasize differences between both nomenclatures. We then describe complementary large-scale methods to accurately discover novel ORFs as well as yield functional insights on the novel proteins they encode. While serendipitous discoveries highlighted the functional importance of some novel ORFs, omics methods facilitate and improve their characterization to better understand physiological and pathological pathways. Functional annotation of sORFs, alt-ORFs and their corresponding microproteins will likely help fundamental and clinical research.
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada.
- Sebastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada.

26
Monteuuis G, Miścicka A, Świrski M, Zenad L, Niemitalo O, Wrobel L, Alam J, Chacinska A, Kastaniotis AJ, Kufel J. Non-canonical translation initiation in yeast generates a cryptic pool of mitochondrial proteins. Nucleic Acids Res 2019; 47:5777-5791. [PMID: 31216041 PMCID: PMC6582344 DOI: 10.1093/nar/gkz301] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 09/24/2018] [Revised: 04/12/2019] [Accepted: 04/16/2019] [Indexed: 12/15/2022] Open
Abstract
Utilization of non-AUG alternative translation start sites is most common in bacteria and viruses, but it has been also reported in other organisms. This phenomenon increases proteome complexity by allowing expression of multiple protein isoforms from a single gene. In Saccharomyces cerevisiae, a few described cases concern proteins that are translated from upstream near-cognate start codons as N-terminally extended variants that localize to mitochondria. Using bioinformatics tools, we provide compelling evidence that in yeast the potential for producing alternative protein isoforms by non-AUG translation initiation is much more prevalent than previously anticipated and may apply to as many as a few thousand proteins. Several hundreds of candidates are predicted to gain a mitochondrial targeting signal (MTS), generating an unrecognized pool of mitochondrial proteins. We confirmed mitochondrial localization of a subset of proteins previously not identified as mitochondrial, whose standard forms do not carry an MTS. Our data highlight the potential of non-canonical translation initiation in expanding the capacity of the mitochondrial proteome and possibly also other cellular features.
Affiliation(s)
- Geoffray Monteuuis
- Faculty of Biochemistry and Molecular Medicine, University of Oulu, P.O. Box 5400, FIN-90014 Finland
- Anna Miścicka
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106 Warsaw, Poland
- Michał Świrski
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106 Warsaw, Poland
- Lounis Zenad
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106 Warsaw, Poland
- Olli Niemitalo
- Faculty of Biochemistry and Molecular Medicine, University of Oulu, P.O. Box 5400, FIN-90014 Finland
- Lidia Wrobel
- International Institute of Molecular and Cell Biology, 02-109 Warsaw, Poland
- Jahangir Alam
- Faculty of Biochemistry and Molecular Medicine, University of Oulu, P.O. Box 5400, FIN-90014 Finland
- Agnieszka Chacinska
- International Institute of Molecular and Cell Biology, 02-109 Warsaw, Poland; Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
- Alexander J Kastaniotis
- Faculty of Biochemistry and Molecular Medicine, University of Oulu, P.O. Box 5400, FIN-90014 Finland
- Joanna Kufel
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106 Warsaw, Poland

27
Ang MY, Low TY, Lee PY, Wan Mohamad Nazarie WF, Guryev V, Jamal R. Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. Clin Chim Acta 2019; 498:38-46. [DOI: 10.1016/j.cca.2019.08.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/22/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 12/14/2022]

28
Xu Z, Hu L, Shi B, Geng S, Xu L, Wang D, Lu ZJ. Ribosome elongating footprints denoised by wavelet transform comprehensively characterize dynamic cellular translation events. Nucleic Acids Res 2019; 46:e109. [PMID: 29945224 PMCID: PMC6182183 DOI: 10.1093/nar/gky533] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/24/2018] [Accepted: 05/31/2018] [Indexed: 02/06/2023] Open
Abstract
Translation is dynamically regulated during cell development and stress response. In order to detect actively translated open reading frames (ORFs) and dynamic cellular translation events, we have developed a computational method, RiboWave, to process ribosome profiling data. RiboWave utilizes wavelet transform to denoise the original signal by extracting 3-nt periodicity of ribosomes and precisely locate their footprint denoted as Periodic Footprint P-site (PF P-site). Such high-resolution footprint is found to capture the full track of actively elongating ribosomes, from which translational landscape can be explicitly characterized. We compare RiboWave with several published methods, like RiboTaper, ORFscore and RibORF, and found that RiboWave outperforms them in both accuracy and usage when defining actively translated ORFs. Moreover, we show that PF P-site derived by RiboWave shows superior performance in characterizing the dynamics and complexity of cellular translatome by accurately estimating the abundance of protein levels, assessing differential translation and identifying dynamic translation frameshift.
Affiliation(s)
- Zhiyu Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Long Hu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Binbin Shi
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- SiSi Geng
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Longchen Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Dong Wang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Zhi J Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China

29
Verbruggen S, Ndah E, Van Criekinge W, Gessulat S, Kuster B, Wilhelm M, Van Damme P, Menschaert G. PROTEOFORMER 2.0: Further Developments in the Ribosome Profiling-assisted Proteogenomic Hunt for New Proteoforms. Mol Cell Proteomics 2019; 18:S126-S140. [PMID: 31040227 PMCID: PMC6692777 DOI: 10.1074/mcp.ra118.001218] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 11/16/2018] [Revised: 04/30/2019] [Indexed: 12/20/2022] Open
Abstract
PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat data. The different proteoform calling strategies are used alongside one another and in the end combined together with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field. 
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium.
- Elvis Ndah
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany; SAP SE, Potsdam, Germany
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany
- Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany
- Petra Van Damme
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium.

30
Kaspar JR, Walker AR. Expanding the Vocabulary of Peptide Signals in Streptococcus mutans. Front Cell Infect Microbiol 2019; 9:194. [PMID: 31245303 PMCID: PMC6563777 DOI: 10.3389/fcimb.2019.00194] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/01/2019] [Accepted: 05/21/2019] [Indexed: 12/18/2022] Open
Abstract
Streptococci, including the dental pathogen Streptococcus mutans, undergo cell-to-cell signaling that is mediated by small peptides to control critical physiological functions such as adaptation to the environment, control of subpopulation behaviors and regulation of virulence factors. One such model pathway is the regulation of genetic competence, controlled by the ComRS signaling system and the peptide XIP. However, recent research in the characterization of this pathway has uncovered novel operons and peptides that are intertwined into its regulation. These discoveries, such as cell lysis playing a critical role in XIP release and importance of bacterial self-sensing during the signaling process, have caused us to reevaluate previous paradigms and shift our views on the true purpose of these signaling systems. The finding of new peptides such as the ComRS inhibitor XrpA and the peptides of the RcrRPQ operon also suggests there may be more peptides hidden in the genomes of streptococci that could play critical roles in the physiology of these organisms. In this review, we summarize the recent findings in S. mutans regarding the integration of other circuits into the ComRS signaling pathway, the true mode of XIP export, and how the RcrRPQ operon controls competence activation. We also look at how new technologies can be used to re-annotate the genome to find new open reading frames that encode peptide signals. Together, this summary of research will allow us to reconsider how we perceive these systems to behave and lead us to expand our vocabulary of peptide signals within the genus Streptococcus.
Affiliation(s)
- Justin R. Kaspar
- Department of Oral Biology, University of Florida, Gainesville, FL, United States

31
Ma WT, Liu ZY, Chen XZ, Lin ZL, Zheng ZB, Miao WG, Xie SQ. A protein identification algorithm for tandem mass spectrometry by incorporating the abundance of mRNA into a binomial probability scoring model. J Proteomics 2019; 197:53-59. [PMID: 30790687 DOI: 10.1016/j.jprot.2019.02.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/06/2018] [Revised: 02/15/2019] [Accepted: 02/17/2019] [Indexed: 12/17/2022]
Abstract
Peptide-spectrum match (PSM) scoring between the experimental and theoretical spectra is a key step in the identification of proteins using mass spectrometry (MS)-based proteomics analyses. Efficient protein identification using MS/MS data remains a challenge. The strategy of using RNA-seq data increases the number of proteins identified by reconstructing the custom search database and integrating mRNA abundance into the post-PSM false discovery rate. However, this process lacks an algorithm that incorporates mRNA abundance into the key PSM scoring model. Therefore, we developed a novel PSM scoring model, which incorporates mRNA abundance for improved peptide and protein identification. In the new algorithm, mRNA abundance information was transformed into the prior probability of protein identification and integrated into PSM re-scoring using the binomial probability distribution model. Compared with other algorithms on five MS/MS datasets (human, rat, zebrafish, yeast, and Arabidopsis thaliana), the minimum improvement ratios for peptides and protein groups were 3.39%-9.79% and 0.48%-8.16%, respectively. The new strategy offers an effective solution for MS-based identification of peptides and proteins. SIGNIFICANCE: The new algorithm identifies proteins by quantifying mRNA abundance (FPKM) and incorporating it into a scoring model for peptide-spectrum matches. It is important to improve peptide and protein identification from MS/MS datasets in proteomics research.
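The idea of folding an mRNA-derived prior into a binomial PSM score can be illustrated with a minimal Python sketch. This is not the published algorithm: the log-scaled mapping in `fpkm_to_prior`, the additive log weighting in `psm_score`, and every function and parameter name are assumptions made purely for illustration.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random fragment matches."""
    tail = sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))
    # Clamp to (0, 1] so the logarithm below is always defined.
    return min(max(tail, 1e-300), 1.0)


def fpkm_to_prior(fpkm: float, max_fpkm: float, floor: float = 0.01) -> float:
    """Hypothetical mapping from mRNA abundance (FPKM) to a prior in [floor, 1]."""
    scaled = math.log1p(fpkm) / math.log1p(max_fpkm)
    return min(1.0, max(floor, scaled))


def psm_score(n_theoretical: int, n_matched: int, p_random: float, prior: float = 1.0) -> float:
    """Binomial score (-10 log10 of the tail probability) shifted by an assumed prior term."""
    tail = binomial_tail(n_theoretical, n_matched, p_random)
    return -10.0 * math.log10(tail) + 10.0 * math.log10(prior)


# Same spectral evidence, different transcript abundance (illustrative numbers only).
low = psm_score(20, 8, 0.05, fpkm_to_prior(0.5, 1000.0))
high = psm_score(20, 8, 0.05, fpkm_to_prior(250.0, 1000.0))
print(f"low-abundance transcript: {low:.2f}, high-abundance transcript: {high:.2f}")
```

In this sketch, a peptide from a highly expressed transcript is penalised less than one from a barely expressed transcript when the fragment-match evidence is identical.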
Affiliation(s)
- Wen-Tai Ma
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Zhao-Yu Liu
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Xiao-Zhou Chen
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650031, China
- Zhen-Liang Lin
- Department of General Surgery, The Affiliated Cangnan Hospital of Wenzhou Medical University, Wenzhou 325800, China
- Zhong-Bing Zheng
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Wei-Guo Miao
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Shang-Qian Xie
- Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
32
Low TY, Mohtar MA, Ang MY, Jamal R. Connecting Proteomics to Next-Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2018; 19:e1800235. [DOI: 10.1002/pmic.201800235] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 06/11/2018] [Revised: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- M. Aiman Mohtar
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Mia Yang Ang
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Rahman Jamal
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
33
Carlyle BC, Kitchen RR, Zhang J, Wilson R, Lam TT, Rozowsky JS, Williams KR, Sestan N, Gerstein M, Nairn AC. Isoform-Level Interpretation of High-Throughput Proteomics Data Enabled by Deep Integration with RNA-seq. J Proteome Res 2018; 17:3431-3444. [PMID: 30125121 PMCID: PMC6392456 DOI: 10.1021/acs.jproteome.8b00310] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 12/18/2022]
Abstract
Cellular control of gene expression is a complex process that is subject to multiple levels of regulation, but ultimately it is the protein produced that determines the biosynthetic state of the cell. One way that a cell can regulate the protein output from each gene is by expressing alternate isoforms with distinct amino acid sequences. These isoforms may exhibit differences in localization and binding interactions that can have profound functional implications. High-throughput liquid chromatography tandem mass spectrometry proteomics (LC-MS/MS) relies on enzymatic digestion and has lower coverage and sensitivity than transcriptomic profiling methods such as RNA-seq. Digestion results in predictable fragmentation of a protein, which can limit the generation of peptides capable of distinguishing between isoforms. Here we exploit transcript-level expression from RNA-seq to set prior likelihoods and enable protein isoform abundances to be directly estimated from LC-MS/MS, an approach derived from the principle that most genes appear to be expressed as a single dominant isoform in a given cell type or tissue. Through this deep integration of RNA-seq and LC-MS/MS data from the same sample, we show that a principal isoform can be identified in >80% of gene products in homogeneous HEK293 cell culture and >70% of proteins detected in complex human brain tissue. We demonstrate that the incorporation of translatome data from ribosome profiling further refines this process. Defining isoforms in experiments with matched RNA-seq/translatome and proteomic data increases the functional relevance of such data sets and will further broaden our understanding of multilevel control of gene expression.
Affiliation(s)
- Becky C. Carlyle
- Department of Psychiatry, Yale School of Medicine, Connecticut Mental Health Center, 34 Park St, New Haven, CT 06519
- Robert R. Kitchen
- Department of Psychiatry, Yale School of Medicine, Connecticut Mental Health Center, 34 Park St, New Haven, CT 06519
- Department of Molecular Biophysics & Biochemistry, Yale School of Medicine, PO Box 208114, New Haven, CT, 06520
- Jing Zhang
- Department of Molecular Biophysics & Biochemistry, Yale School of Medicine, PO Box 208114, New Haven, CT, 06520
- Rashaun Wilson
- Yale/NIDA Neuroproteomics Center, Yale School of Medicine, 300 George Street, New Haven, CT 06510
- Tukiet T Lam
- Department of Molecular Biophysics & Biochemistry, Yale School of Medicine, PO Box 208114, New Haven, CT, 06520
- Yale/NIDA Neuroproteomics Center, Yale School of Medicine, 300 George Street, New Haven, CT 06510
- W.M. Keck Biotechnology Resource Laboratory, Yale School of Medicine, 300 George Street, New Haven, CT 06510
- Joel S Rozowsky
- Department of Molecular Biophysics & Biochemistry, Yale School of Medicine, PO Box 208114, New Haven, CT, 06520
- Kenneth R Williams
- Department of Molecular Biophysics & Biochemistry, Yale School of Medicine, PO Box 208114, New Haven, CT, 06520
- Yale/NIDA Neuroproteomics Center, Yale School of Medicine, 300 George Street, New Haven, CT 06510
- Nenad Sestan
- Department of Neuroscience and Kavli Institute for Neuroscience, Departments of Genetics and Psychiatry, Section of Comparative Medicine, and Yale Child Study Center, Program in Cellular Neuroscience, Neurodegeneration and Repair, Yale School of Medicine, New Haven, CT 06510
- Mark Gerstein
- Department of Molecular Biophysics & Biochemistry, Yale School of Medicine, PO Box 208114, New Haven, CT, 06520
- Angus C Nairn
- Department of Psychiatry, Yale School of Medicine, Connecticut Mental Health Center, 34 Park St, New Haven, CT 06519
34
Delcourt V, Brunelle M, Roy AV, Jacques JF, Salzet M, Fournier I, Roucou X. The Protein Coded by a Short Open Reading Frame, Not by the Annotated Coding Sequence, Is the Main Gene Product of the Dual-Coding Gene MIEF1. Mol Cell Proteomics 2018; 17:2402-2411. [PMID: 30181344 PMCID: PMC6283296 DOI: 10.1074/mcp.ra118.000593] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 01/23/2018] [Revised: 07/19/2018] [Indexed: 12/18/2022] Open
Abstract
Proteogenomics and ribosome profiling concurrently show that genes may code for both a large and one or more small proteins translated from annotated coding sequences (CDSs) and unannotated alternative open reading frames (named alternative ORFs or altORFs), respectively, but the stoichiometry between large and small proteins translated from the same gene is unknown. MIEF1, a gene recently identified as a dual-coding gene, harbors a CDS and a newly annotated and actively translated altORF located in the 5′UTR. Here, we use absolute quantification with stable isotope-labeled peptides and parallel reaction monitoring to determine levels of both proteins in two human cell lines and in human colon. We report that the main MIEF1 translational product is not the canonical 463 amino acid MiD51 protein but the small 70 amino acid alternative MiD51 protein (altMiD51). These results demonstrate the inadequacy of the single CDS concept and provide a strong argument for incorporating altORFs and small proteins in functional annotations.
Affiliation(s)
- Vivian Delcourt
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Mylène Brunelle
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Annie V Roy
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Jean-François Jacques
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Michel Salzet
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France
- Isabelle Fournier
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France
- Xavier Roucou
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada.
35
MetAP1 and MetAP2 drive cell selectivity for a potent anti-cancer agent in synergy, by controlling glutathione redox state. Oncotarget 2018; 7:63306-63323. [PMID: 27542228 PMCID: PMC5325365 DOI: 10.18632/oncotarget.11216] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 05/10/2016] [Accepted: 07/19/2016] [Indexed: 12/17/2022] Open
Abstract
Fumagillin and its derivatives are therapeutically useful because they can decrease cancer progression. The specific molecular target of fumagillin is methionine aminopeptidase 2 (MetAP2), one of the two MetAPs present in the cytosol. MetAPs catalyze N-terminal methionine excision (NME), an essential pathway of cotranslational protein maturation. To date, the respective contributions of MetAP1 and MetAP2 to the NME process in vivo remain unclear, as does the reason why MetAP2 inhibition causes cell cycle arrest only in a subset of cells. Here, we performed a global characterization of the N-terminal methionine excision pathway and the inhibition of MetAP2 by fumagillin in a number of cell lines, including cancer cell lines. Large-scale N-terminus profiling in cells responsive and unresponsive to fumagillin treatment revealed that both MetAPs were required in vivo for M[VT]X-targets and, possibly, for lower-level M[G]X-targets. Interestingly, we found that the responsiveness of the cell lines to fumagillin was correlated with the ability of the cells to modulate their glutathione homeostasis. Indeed, alterations to glutathione status were observed in fumagillin-sensitive cells but not in cells unresponsive to this agent. Proteo-transcriptomic analyses revealed that both MetAP1 and MetAP2 accumulated in a cell-specific manner and that cell sensitivity to fumagillin was related to the levels of these MetAPs, particularly MetAP1. We suggest that MetAP1 levels could be routinely checked in several types of tumor and used as a prognostic marker for predicting the response to treatments inhibiting MetAP2.
36
Delcourt V, Staskevicius A, Salzet M, Fournier I, Roucou X. Small Proteins Encoded by Unannotated ORFs are Rising Stars of the Proteome, Confirming Shortcomings in Genome Annotations and Current Vision of an mRNA. Proteomics 2017. [DOI: 10.1002/pmic.201700058] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 12/20/2022]
Affiliation(s)
- Vivian Delcourt
- Department of Biochemistry, Université de Sherbrooke, Quebec, Canada
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Lille, France
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
- Michel Salzet
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Lille, France
- Isabelle Fournier
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Lille, France
- Xavier Roucou
- Department of Biochemistry, Université de Sherbrooke, Quebec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
37
Fijalkowska D, Verbruggen S, Ndah E, Jonckheere V, Menschaert G, Van Damme P. eIF1 modulates the recognition of suboptimal translation initiation sites and steers gene expression via uORFs. Nucleic Acids Res 2017; 45:7997-8013. [PMID: 28541577 PMCID: PMC5570006 DOI: 10.1093/nar/gkx469] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 01/12/2017] [Accepted: 05/11/2017] [Indexed: 12/25/2022] Open
Abstract
Alternative translation initiation mechanisms such as leaky scanning and reinitiation potentiate the polycistronic nature of human transcripts. By allowing for reprogrammed translation, these mechanisms can mediate biological responses to stimuli. We combined proteomics with ribosome profiling and mRNA sequencing to identify the biological targets of translation control triggered by the eukaryotic translation initiation factor 1 (eIF1), a protein implicated in the stringency of start codon selection. We quantified expression changes of over 4000 proteins and 10 000 actively translated transcripts, leading to the identification of 245 transcripts undergoing translational control mediated by upstream open reading frames (uORFs) upon eIF1 deprivation. Here, the stringency of start codon selection and preference for an optimal nucleotide context were largely diminished leading to translational upregulation of uORFs with suboptimal start. Interestingly, genes affected by eIF1 deprivation were implicated in energy production and sensing of metabolic stress.
Affiliation(s)
- Daria Fijalkowska
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Steven Verbruggen
- Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
- Elvis Ndah
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium; Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
- Veronique Jonckheere
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
- Petra Van Damme
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
38
Heunis T, Dippenaar A, Warren RM, van Helden PD, van der Merwe RG, Gey van Pittius NC, Pain A, Sampson SL, Tabb DL. Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates. J Proteome Res 2017; 16:3841-3851. [PMID: 28820946 DOI: 10.1021/acs.jproteome.7b00483] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 01/12/2023]
Abstract
Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of the utmost importance to fully understand M. tuberculosis biology and pathogenicity. In this study, we integrated whole-genome sequencing and mass spectrometry (GeLC-MS/MS) to reveal strain-specific characteristics in the proteomes of two clinical M. tuberculosis Latin American-Mediterranean isolates. Using this approach, we identified 59 peptides containing single amino acid variants, which covered ∼9% of all coding nonsynonymous single nucleotide variants detected by whole-genome sequencing. Furthermore, we identified 29 distinct peptides that mapped to a hypothetical protein not present in the M. tuberculosis H37Rv reference proteome. Here, we provide evidence for the expression of this protein in the clinical M. tuberculosis SAWC3651 isolate. The strain-specific databases enabled confirmation of genomic differences (i.e., large genomic regions of difference and nonsynonymous single nucleotide variants) in these two clinical M. tuberculosis isolates and allowed strain differentiation at the proteome level. Our results contribute to the growing field of clinical microbial proteogenomics and can improve our understanding of phenotypic variation in clinical M. tuberculosis isolates.
Affiliation(s)
- Tiaan Heunis
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
- Anzaan Dippenaar
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
- Robin M Warren
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
- Paul D van Helden
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
- Ruben G van der Merwe
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
- Nicolaas C Gey van Pittius
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
- Arnab Pain
- Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Samantha L Sampson
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
- David L Tabb
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
39
Menschaert G, Fenyö D. Proteogenomics from a bioinformatics angle: A growing field. Mass Spectrom Rev 2017; 36:584-599. [PMID: 26670565 PMCID: PMC6101030 DOI: 10.1002/mas.21483] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 04/20/2015] [Accepted: 09/01/2015] [Indexed: 05/16/2023]
Abstract
Proteogenomics is a research area that combines areas such as proteomics and genomics in a multi-omics setup using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation or to unravel the proteome complexity. Mass spectrometry based identifications of matching or homologous peptides can further refine gene models. The identification of novel proteoforms is also made possible based on detection of novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation or novel (small) open reading frames in intergenic or un-translated genic regions by analyzing high-throughput sequencing data from RNAseq or ribosome profiling experiments. Other proteogenomics studies using a combination of proteomics and genomics techniques focus on antibody sequencing, the identification of immunogenic peptides or venom peptides. Over the years, a growing number of bioinformatics tools and databases have become available to help streamline these cross-omics studies. Some of these solutions only help in specific steps of the proteogenomics studies, e.g. building custom sequence databases (based on next generation sequencing output) for mass spectrometry fragmentation spectrum matching. Over the last few years a handful of integrative tools have also become available that can execute complete proteogenomics analyses. Some of these are presented as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review we aimed at sketching a comprehensive overview of all the bioinformatics solutions that are available for this growing research area. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:584-599, 2017.
Affiliation(s)
- Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- To whom correspondence should be addressed. Tel: +32 9 264 99 22; Fax: +32 9 264 6220
- David Fenyö
- Center for Health Informatics and Bioinformatics and Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York, USA
40
Wingo TS, Duong DM, Zhou M, Dammer EB, Wu H, Cutler DJ, Lah JJ, Levey AI, Seyfried NT. Integrating Next-Generation Genomic Sequencing and Mass Spectrometry To Estimate Allele-Specific Protein Abundance in Human Brain. J Proteome Res 2017; 16:3336-3347. [PMID: 28691493 DOI: 10.1021/acs.jproteome.7b00324] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 12/28/2022]
Abstract
Gene expression contributes to phenotypic traits and human disease. To date, comparatively less is known about regulators of protein abundance, which is also under genetic control and likely influences clinical phenotypes. However, identifying and quantifying allele-specific protein abundance by bottom-up proteomics is challenging since single nucleotide variants (SNVs) that alter protein sequence are not considered in standard human protein databases. To address this, we developed the GenPro software and used it to create personalized protein databases (PPDs) to identify single amino acid variants (SAAVs) at the protein level from whole exome sequencing. In silico assessment of PPDs generated by GenPro revealed only a 1% increase in tryptic search space compared to a direct translation of all human transcripts and an equivalent search space compared to the UniProtKB reference database. To identify a large unbiased number of SAAV peptides, we performed high-resolution mass spectrometry-based proteomics for two human post-mortem brain samples and searched the collected MS/MS spectra against their respective PPD. We found an average of ∼117 000 unique peptides mapping to ∼9300 protein groups for each sample, and of these, 977 were unique variant peptides. We found that over 400 reference and SAAV peptide pairs were, on average, equally abundant in human brain by label-free ion intensity measurements and confirmed the absolute levels of three reference and SAAV peptide pairs using heavy labeled peptides standards coupled with parallel reaction monitoring (PRM). Our results highlight the utility of integrating genomic and proteomic sequencing data to identify sample-specific SAAV peptides and support the hypothesis that most alleles are equally expressed in human brain.
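The notion of a personalized protein database can be sketched in a few lines of Python: apply a single amino acid variant (SAAV) to a reference sequence, digest both versions in silico, and keep the peptides that exist only in the variant. This is an illustrative sketch and not the GenPro implementation; the toy sequence, the function names, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are all assumptions.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def apply_saav(protein: str, position: int, ref: str, alt: str) -> str:
    """Substitute one residue; position is 1-based and must hold the reference residue."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {protein[position - 1]}")
    return protein[: position - 1] + alt + protein[position:]


def variant_peptides(reference: str, variant: str) -> set[str]:
    """Peptides produced by the variant sequence but not by the reference."""
    return set(tryptic_digest(variant)) - set(tryptic_digest(reference))


reference = "MKTAYIAKQR"                         # hypothetical toy protein
variant = apply_saav(reference, 4, "A", "V")     # hypothetical A4V substitution
print(variant_peptides(reference, variant))
```

Only the SAAV-bearing peptides are added to the search space in this sketch, which is one way to see why a per-sample database can stay close in size to the reference.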
Affiliation(s)
- Thomas S Wingo
- Division of Neurology, Department of Veterans Affairs Medical Center, Decatur, Georgia 30033, United States
41
Willems P, Ndah E, Jonckheere V, Stael S, Sticker A, Martens L, Van Breusegem F, Gevaert K, Van Damme P. N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana. Mol Cell Proteomics 2017; 16:1064-1080. [PMID: 28432195 PMCID: PMC5461538 DOI: 10.1074/mcp.m116.066662] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 12/22/2016] [Revised: 04/11/2017] [Indexed: 01/05/2023] Open
Abstract
Proteogenomics is an emerging research field yet lacking a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined, suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After a stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, by analyzing protein N-terminal peptides, an in silico analysis demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly-annotated genomes.
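A check for compliance with N-terminal methionine excision (NME) specificity might look like the Python sketch below. It uses the commonly cited simplified rule that the initiator methionine is removed when the second residue is small (A, C, G, P, S, T or V); real MetAP specificity is less clear-cut, and the function names and the `preceding_residue` input are hypothetical, not taken from the published pipeline.

```python
# Residues at position 2 that typically permit removal of the initiator Met (simplified rule).
NME_PERMISSIVE = frozenset("ACGPSTV")


def mature_n_terminus(protein: str) -> str:
    """Return the sequence after NME, removing the initiator Met when the rule allows it."""
    if len(protein) > 1 and protein[0] == "M" and protein[1] in NME_PERMISSIVE:
        return protein[1:]
    return protein


def is_nme_compliant(peptide: str, preceding_residue: str) -> bool:
    """Check whether an observed N-terminal peptide is consistent with translation initiation.

    preceding_residue is the residue encoded immediately upstream of the peptide
    in the translated genomic sequence (a hypothetical input for this sketch).
    """
    if not peptide:
        return False
    if peptide[0] == "M":
        # Retained initiator Met: the second residue should not permit excision.
        return len(peptide) == 1 or peptide[1] not in NME_PERMISSIVE
    # Excised initiator Met: the upstream residue must be Met and the new N terminus small.
    return preceding_residue == "M" and peptide[0] in NME_PERMISSIVE


print(mature_n_terminus("MASK"), is_nme_compliant("ASKR", "M"), is_nme_compliant("ASKR", "L"))
```

A peptide that passes such a filter is only indicative of initiation; the abstract's additional evidence (gene prediction, re-annotation, ribosome profiling) addresses what this rule alone cannot.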
Affiliation(s)
- Patrick Willems
- VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium; Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent; VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Elvis Ndah
- VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium; Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
- Veronique Jonckheere
- VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Simon Stael
- VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium; Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent; VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Adriaan Sticker
- VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium; Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
- Lennart Martens
- VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium; Ghent University, Department of Mathematical Modeling, Statistics and Bioinformatics, 9000 Ghent, Belgium
- Frank Van Breusegem
- VIB/UGent Center for Plant Systems Biology, 9052 Ghent, Belgium; Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent
- Kris Gevaert
- VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
- Petra Van Damme
- VIB/UGent Center for Medical Biotechnology, 9000 Ghent, Belgium; Ghent University, Department of Biochemistry, 9000 Ghent, Belgium
42
Adjibade P, Grenier St-Sauveur V, Droit A, Khandjian EW, Toren P, Mazroui R. Analysis of the translatome in solid tumors using polyribosome profiling/RNA-Seq. J Biol Methods 2016; 3:e59. [PMID: 31453221 PMCID: PMC6706116 DOI: 10.14440/jbm.2016.151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/15/2016] [Revised: 09/22/2016] [Accepted: 10/18/2016] [Indexed: 01/25/2023] Open
Abstract
Gene expression involves multiple steps from the transcription of a mRNA in the nucleus to the production of the encoded protein in the cytoplasm. This final step occurs through a highly regulated process of mRNA translation on ribosomes that is required to maintain cell homeostasis. Alterations in the control of mRNA translation may lead to cell transformation, a hallmark of cancer development. Indeed, recent advances indicated that increased translation of mRNAs encoding tumor-promoting proteins may be a key mechanism of tumor resistance in several cancers. Moreover, it was found that proteins whose encoding mRNAs are translated at higher efficiencies may be effective biomarkers. Evaluation of global changes in translation efficiency in human tumors thus has the potential to improve our understanding of what can be used as biomarkers and therapeutic targets. Investigating changes in translation efficiency in human cancer cells has been made possible through the development and use of polyribosome profiling combined with DNA microarray or deep RNA sequencing (RNA-Seq). While helpful, the use of cancer cell lines has many limitations and it is essential to define translational changes in human tumor samples in order to properly prioritize genes implicated in cancer phenotype. We present an optimized polyribosome RNA-Seq protocol suitable for quantitative analysis of mRNA translation that occurs in human tumor samples and murine xenografts. Applying this innovative approach to human tumors, which requires a complementary bioinformatics analysis, unlocks the potential to identify key mRNAs that are preferentially translated in tumor tissue compared to benign tissue as well as translational changes that occur following treatment. These technical advances will be of interest to those researching all solid tumors, opening possibilities for understanding what may be therapeutic 'Achilles heels' or relevant biomarkers.
Affiliation(s)
- Pauline Adjibade
  - Centre de Recherche en Cancérologie. Centre de Recherche du CHU de Québec. Département de Biologie Moléculaire, Biochimie Médicale et Pathologie, Faculté de Médecine, Université Laval, Québec, PQ, Canada
- Valérie Grenier St-Sauveur
  - Centre de Recherche en Cancérologie. Centre de Recherche du CHU de Québec. Département de Biologie Moléculaire, Biochimie Médicale et Pathologie, Faculté de Médecine, Université Laval, Québec, PQ, Canada
- Arnaud Droit
  - Centre de Recherche du CHU de Québec. Département de Médecine Moléculaire, Faculté de Médecine, Université Laval, Québec, PQ, Canada
- Edouard W. Khandjian
  - Centre de Recherche, Institut Universitaire en Santé Mentale de Québec. Département de Psychiatrie et de Neurosciences, Faculté de Médecine, Université Laval, Québec, PQ, Canada
- Paul Toren
  - Centre de Recherche du CHU de Québec. Département de Chirurgie, Faculté de Médecine, Université Laval, Québec, PQ, Canada
- Rachid Mazroui
  - Centre de Recherche en Cancérologie. Centre de Recherche du CHU de Québec. Département de Biologie Moléculaire, Biochimie Médicale et Pathologie, Faculté de Médecine, Université Laval, Québec, PQ, Canada
|
43
|
Pancsa R, Tompa P. Coding Regions of Intrinsic Disorder Accommodate Parallel Functions. Trends Biochem Sci 2016; 41:898-906. [DOI: 10.1016/j.tibs.2016.08.009] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/19/2016] [Revised: 08/16/2016] [Accepted: 08/19/2016] [Indexed: 02/01/2023]
|
44
|
Abstract
Omics approaches have become popular in biology as powerful discovery tools, and are currently gaining interest for diagnostic applications. Establishing the accurate genome sequence of any organism is easy, but the outcome of its annotation by means of automatic pipelines remains imprecise. Some protein-encoding genes may be missed when they are specific to, and poorly conserved within, a given taxon, even though they are important for explaining the specific traits of the organism. Translational starts are also poorly predicted in a relatively large number of cases, thus impacting the protein sequence database used in proteomics, comparative genomics, and systems biology. The use of high-throughput proteomics data to improve genome annotation is an attractive option to obtain a more comprehensive molecular picture of a given organism. Here, protocols for reannotating prokaryote genomes are described based on shotgun proteomics and derivatization of protein N-termini with a positively charged reagent coupled to high-resolution tandem mass spectrometry.
|
45
|
Kumar D, Bansal G, Narang A, Basak T, Abbas T, Dash D. Integrating transcriptome and proteome profiling: Strategies and applications. Proteomics 2016; 16:2533-2544. [PMID: 27343053 DOI: 10.1002/pmic.201600140] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Received: 03/16/2016] [Revised: 06/12/2016] [Accepted: 06/23/2016] [Indexed: 12/17/2022]
Abstract
Discovering the gene expression signature associated with a cellular state is one of the basic quests in the majority of biological studies. For most of the clinical and cellular manifestations, these molecular differences may be exhibited across multiple layers of gene regulation like genomic variations, gene expression, protein translation and post-translational modifications. These system-wide variations are dynamic in nature and their crosstalk is overwhelmingly complex, thus analyzing them separately may not be very informative. This necessitates the integrative analysis of such multiple layers of information to understand the interplay of the individual components of the biological system. Recent developments in high-throughput RNA sequencing and mass spectrometric (MS) technologies to probe transcripts and proteins have made these the preferred methods for understanding global gene regulation. Subsequently, improvements in "big-data" analysis techniques enable novel conclusions to be drawn from integrative transcriptomic-proteomic analysis. The unified analyses of both these data types have been rewarding for several biological objectives like improving genome annotation, predicting RNA-protein quantities, deciphering gene regulations, discovering disease markers and drug targets. There are different ways in which transcriptomics and proteomics data can be integrated, each aiming for different research objectives. Here, we review various studies, approaches and computational tools targeted for integrative analysis of these two high-throughput omics methods.
Affiliation(s)
- Dhirendra Kumar
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Gourja Bansal
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Ankita Narang
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Trayambak Basak
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
- Tahseen Abbas
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
- Debasis Dash
  - G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
|
46
|
Lycette BE, Glickman JW, Roth SJ, Cram AE, Kim TH, Krizanc D, Weir MP. N-Terminal Peptide Detection with Optimized Peptide-Spectrum Matching and Streamlined Sequence Libraries. J Proteome Res 2016; 15:2891-9. [PMID: 27498768 DOI: 10.1021/acs.jproteome.5b00996] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
We identified tryptic peptides in yeast cell lysates that map to translation initiation sites downstream of the annotated start sites using the peptide-spectrum matching algorithms OMSSA and Mascot. To increase the accuracy of peptide-spectrum matching, both algorithms were run using several standardized parameter sets, and Mascot was run utilizing a, b, and y ions from collision-induced dissociation. A large fraction (22%) of the detected N-terminal peptides mapped to translation initiation downstream of the annotated initiation sites. Expression of several truncated proteins from downstream initiation in the same reading frame as the full-length protein (frame 1) was verified by western analysis. To facilitate analysis of the larger proteome of Drosophila, we created a streamlined sequence library from which all duplicated trypsin fragments had been removed. OMSSA assessment using this "stripped" library revealed 171 peptides that map to downstream translation initiation sites, 76% of which are in the same reading frame as the full-length annotated proteins, although some are in different reading frames creating new protein sequences not in the annotated proteome. Sequences surrounding implicated downstream AUG start codons are associated with nucleotide preferences with a pronounced three-base periodicity N1^G2^A3.
Affiliation(s)
- Brynne E Lycette
  - Department of Biology, Wesleyan University, Middletown, Connecticut 06459, United States
  - Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut 06459, United States
- Jacob W Glickman
  - Department of Biology, Wesleyan University, Middletown, Connecticut 06459, United States
- Samuel J Roth
  - Department of Biology, Wesleyan University, Middletown, Connecticut 06459, United States
  - Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut 06459, United States
- Abigail E Cram
  - Department of Biology, Wesleyan University, Middletown, Connecticut 06459, United States
- Tae Hee Kim
  - Department of Biology, Wesleyan University, Middletown, Connecticut 06459, United States
- Danny Krizanc
  - Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut 06459, United States
- Michael P Weir
  - Department of Biology, Wesleyan University, Middletown, Connecticut 06459, United States
|
47
|
Sheynkman GM, Shortreed MR, Cesnik AJ, Smith LM. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. Annu Rev Anal Chem (Palo Alto Calif) 2016; 9:521-45. [PMID: 27049631 PMCID: PMC4991544 DOI: 10.1146/annurev-anchem-071015-041722] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Indexed: 05/09/2023]
Abstract
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Affiliation(s)
- Gloria M Sheynkman
  - Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Anthony J Cesnik
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
  - Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706
|
48
|
Hellens RP, Brown CM, Chisnall MAW, Waterhouse PM, Macknight RC. The Emerging World of Small ORFs. Trends Plant Sci 2016; 21:317-328. [PMID: 26684391 DOI: 10.1016/j.tplants.2015.11.005] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Received: 07/06/2015] [Revised: 10/23/2015] [Accepted: 11/05/2015] [Indexed: 05/10/2023]
Abstract
Small open reading frames (sORFs) are an often overlooked feature of plant genomes. Initially found in plant viral RNAs and considered an interesting curiosity, an increasing number of these sORFs have been shown to encode functional peptides or play a regulatory role. The recent discovery that many of these sORFs initiate with start codons other than AUG, together with the identification of functional small peptides encoded in supposedly noncoding primary miRNA transcripts (pri-miRs), has drastically increased the number of potentially functional sORFs within the genome. Here we review how advances in technology, notably ribosome profiling (RP) assays, are complementing bioinformatics and proteogenomic methods to provide powerful ways to identify these elusive features of plant genomes, and highlight the regulatory roles sORFs can play.
Affiliation(s)
- Roger P Hellens
  - Centre for Tropical Crops and Biocommodities, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia
- Chris M Brown
  - Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- Matthew A W Chisnall
  - Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- Peter M Waterhouse
  - Centre for Tropical Crops and Biocommodities, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia
- Richard C Macknight
  - Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9054, New Zealand
  - New Zealand Institute for Plant and Food Research Ltd.
|
49
|
Gawron D, Ndah E, Gevaert K, Van Damme P. Positional proteomics reveals differences in N-terminal proteoform stability. Mol Syst Biol 2016; 12:858. [PMID: 26893308 PMCID: PMC4770386 DOI: 10.15252/msb.20156662] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Indexed: 02/06/2023] Open
Abstract
To understand the impact of alternative translation initiation on a proteome, we performed a proteome‐wide study on protein turnover using positional proteomics and ribosome profiling to distinguish between N‐terminal proteoforms of individual genes. By combining pulsed SILAC with N‐terminal COFRADIC, we monitored the stability of 1,941 human N‐terminal proteoforms, including 147 N‐terminal proteoform pairs that originate from alternative translation initiation, alternative splicing or incomplete processing of the initiator methionine. N‐terminally truncated proteoforms were less abundant than canonical proteoforms and often displayed altered stabilities, likely attributed to individual protein characteristics, including intrinsic disorder, but independent of N‐terminal amino acid identity or truncation length. We discovered that the removal of initiator methionine by methionine aminopeptidases reduced the stability of processed proteoforms, while susceptibility for N‐terminal acetylation did not seem to influence protein turnover rates. Taken together, our findings reveal differences in protein stability between N‐terminal proteoforms and point to a role for alternative translation initiation and co‐translational initiator methionine removal, next to alternative splicing, in the overall regulation of proteome homeostasis.
Affiliation(s)
- Daria Gawron
  - Department of Medical Protein Research, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
- Elvis Ndah
  - Department of Medical Protein Research, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
  - Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Kris Gevaert
  - Department of Medical Protein Research, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
- Petra Van Damme
  - Department of Medical Protein Research, VIB, Ghent, Belgium
  - Department of Biochemistry, Ghent University, Ghent, Belgium
|
50
|
Locard-Paulet M, Pible O, Gonzalez de Peredo A, Alpha-Bazin B, Almunia C, Burlet-Schiltz O, Armengaud J. Clinical implications of recent advances in proteogenomics. Expert Rev Proteomics 2016; 13:185-99. [DOI: 10.1586/14789450.2016.1132169] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 01/13/2023]
|