1
Brunet MA, Lekehal AM, Roucou X. How to Illuminate the Dark Proteome Using the Multi-omic OpenProt Resource. Curr Protoc Bioinformatics 2021; 71:e103. [PMID: 32780568 DOI: 10.1002/cpbi.103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
Tens of thousands of open reading frames (ORFs) are hidden within genomes. These alternative ORFs, or small ORFs, have eluded annotation because they are either small or located in unsuspected places: in untranslated regions of messenger RNAs, overlapping a known coding sequence, or anywhere within a "non-coding" RNA. Serendipitous discoveries have highlighted the importance of these ORFs in biological functions and pathways. Their discovery created the need for deeper ORF annotation and for large-scale mining of public repositories to gather supporting experimental evidence. OpenProt, accessible at https://openprot.org/, is the first proteogenomic resource to enforce a polycistronic model of annotation across an exhaustive transcriptome for 10 species. Moreover, OpenProt reports experimental evidence accumulated from a re-analysis of 114 mass spectrometry and 87 ribosome profiling datasets. The multi-omics OpenProt resource also includes the identification of predicted functional domains and an evaluation of conservation for all predicted ORFs. The OpenProt web server provides two query interfaces and one genome browser. The query interfaces allow users to explore the coding potential of genes or transcripts of interest and to create custom downloads of all information contained in OpenProt. © 2020 The Authors. Basic Protocol 1: Using the Search interface. Basic Protocol 2: Using the Downloads interface.
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Amina M Lekehal
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
2
Guerra-Almeida D, Tschoeke DA, da-Fonseca RN. Understanding small ORF diversity through a comprehensive transcription feature classification. DNA Res 2021; 28:6317669. [PMID: 34240112 PMCID: PMC8435553 DOI: 10.1093/dnares/dsab007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 12/22/2020] [Indexed: 11/13/2022]
Abstract
Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in noncanonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into nonexpressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in noncoding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
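The hierarchy described in this abstract can be sketched as a small decision function. This is a minimal illustration, not the authors' method: the 200-nt cutoff between small and long host RNAs and the mRNA region labels (`5UTR`, `CDS_overlap`, `3UTR`) are assumptions, since the abstract names neither the cutoff nor the specific mRNA subclasses.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoffs for illustration. The review defines smORFs as < 100 codons;
# the 200-nt small/long RNA boundary is a common convention, not taken from the review.
SMORF_MAX_CODONS = 100
LONG_RNA_CUTOFF_NT = 200


@dataclass
class SmORF:
    codons: int                          # ORF length in codons
    expressed: bool                      # is the locus transcribed at all?
    rna_type: Optional[str] = None       # "ncRNA" or "mRNA" when expressed
    rna_length_nt: Optional[int] = None  # host RNA length, used for ncRNAs
    mrna_region: Optional[str] = None    # hypothetical labels, e.g. "5UTR", "CDS_overlap", "3UTR"


def classify(orf: SmORF) -> str:
    """Walk the hierarchy: intergenic vs genic, then ncRNA vs mRNA, then subclasses."""
    if orf.codons >= SMORF_MAX_CODONS:
        raise ValueError("not a smORF: 100 codons or longer")
    if not orf.expressed:
        return "intergenic smORF"
    if orf.rna_type == "ncRNA":
        if orf.rna_length_nt is None:
            raise ValueError("host ncRNA length is required")
        size = "long" if orf.rna_length_nt > LONG_RNA_CUTOFF_NT else "small"
        return f"genic smORF in {size} ncRNA"
    if orf.rna_type == "mRNA":
        return f"genic smORF in mRNA ({orf.mrna_region or 'unspecified region'})"
    raise ValueError(f"unknown RNA type: {orf.rna_type!r}")


print(classify(SmORF(codons=40, expressed=True, rna_type="ncRNA", rna_length_nt=1500)))
# genic smORF in long ncRNA
```

Encoding the tree as explicit branches keeps each level of the classification (expression, host RNA type, host RNA size or mRNA location) visible and easy to adjust if different cutoffs are preferred.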
Affiliation(s)
- Diego Guerra-Almeida
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Diogo Antonio Tschoeke
- Alberto Luiz Coimbra Institute of Graduate Studies and Engineering Research (COPPE), Biomedical Engineering Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Rodrigo Nunes-da-Fonseca
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- National Institute of Science and Technology in Molecular Entomology, Rio de Janeiro, Brazil
3
Vitorino R, Guedes S, Amado F, Santos M, Akimitsu N. The role of micropeptides in biology. Cell Mol Life Sci 2021; 78:3285-3298. [PMID: 33507325 PMCID: PMC11073438 DOI: 10.1007/s00018-020-03740-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 08/10/2020] [Revised: 12/01/2020] [Accepted: 12/11/2020] [Indexed: 12/11/2022]
Abstract
Micropeptides are small polypeptides encoded by small open reading frames. Progress in computational biology and the analysis of large-scale transcriptomes and proteomes have revealed that mammalian genomes produce a large number of transcripts encoding micropeptides, many of which were previously annotated as long noncoding RNAs. Micropeptides have been shown to contribute to the maintenance of cellular homeostasis. This review discusses different types of micropeptides, as well as methods to identify them, such as computational approaches, ribosome profiling, and mass spectrometry.
Affiliation(s)
- Rui Vitorino
- Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, UnIC, Porto, Portugal
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal
- Sofia Guedes
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Francisco Amado
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Manuel Santos
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal
4
Brunet MA, Lucier JF, Levesque M, Leblanc S, Jacques JF, Al-Saedi HRH, Guilloy N, Grenier F, Avino M, Fournier I, Salzet M, Ouangraoua A, Scott M, Boisvert FM, Roucou X. OpenProt 2021: deeper functional annotation of the coding potential of eukaryotic genomes. Nucleic Acids Res 2021; 49:D380-D388. [PMID: 33179748 PMCID: PMC7779043 DOI: 10.1093/nar/gkaa1036] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 09/14/2020] [Revised: 10/15/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022]
Abstract
OpenProt (www.openprot.org) is the first proteogenomic resource supporting a polycistronic annotation model for eukaryotic genomes. It provides a deeper annotation of open reading frames (ORFs) while mining experimental data for supporting evidence using cutting-edge algorithms. This update presents the major improvements since the initial release of OpenProt. All species support recent NCBI RefSeq and Ensembl annotations, and changes in annotations are reported in OpenProt. Using the 131 ribosome profiling datasets re-analysed by OpenProt to date, non-AUG initiation starts are reported alongside a confidence score for the initiating codon. For the 177 mass spectrometry datasets re-analysed by OpenProt to date, the uniqueness of the detected peptides is checked at each new release. Furthermore, to guide users, detectability statistics and protein relationships (isoforms) are now reported for each protein. Finally, to foster access to deeper ORF annotation independently of one's bioinformatics skills or computational resources, OpenProt now offers a data analysis platform: users can submit their datasets and receive the results of the analysis performed by OpenProt. All data on OpenProt are freely available and downloadable for each species, and the release-based format ensures continuous access to the data. Thus, OpenProt enables a more comprehensive annotation of eukaryotic genomes and fosters functional proteomic discoveries.
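The peptide uniqueness check mentioned in this abstract can be illustrated with a minimal sketch, assuming an in silico tryptic digest (cleavage after K or R, not before P, no missed cleavages) and a toy proteome given as a dictionary. It is not OpenProt's pipeline; a production check would also handle isobaric leucine/isoleucine, missed cleavages, and the full search database.

```python
import re


def tryptic_digest(seq: str, min_len: int = 7) -> set[str]:
    """In silico trypsin digest: cleave after K or R unless the next residue is P.
    No missed cleavages are considered; peptides shorter than min_len are dropped."""
    peptides = re.split(r"(?<=[KR])(?!P)", seq)
    return {p for p in peptides if len(p) >= min_len}


def unique_peptides(proteome: dict[str, str], min_len: int = 7) -> dict[str, str]:
    """Map each tryptic peptide found in exactly one protein sequence to that protein.
    Uniqueness is tested by substring search against every sequence, so a peptide
    embedded in another protein (even at a non-tryptic site) is not counted as unique."""
    candidates: set[str] = set()
    for seq in proteome.values():
        candidates |= tryptic_digest(seq, min_len)
    unique = {}
    for pep in candidates:
        owners = [pid for pid, seq in proteome.items() if pep in seq]
        if len(owners) == 1:
            unique[pep] = owners[0]
    return dict(sorted(unique.items()))


proteome = {
    "refA": "MAAAGGGLLLKGSHAEDWWPK",  # toy annotated protein
    "altB": "MTTTNNNQQQKGSHAEDWWPK",  # toy alternative protein sharing GSHAEDWWPK
}
print(unique_peptides(proteome))
# {'MAAAGGGLLLK': 'refA', 'MTTTNNNQQQK': 'altB'}
```

The shared peptide `GSHAEDWWPK` is excluded because it cannot assign a spectrum to a single protein, which is the point of controlling peptide uniqueness when annotated and alternative proteins coexist in one database.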
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V 0A6, Canada
- Jean-François Lucier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Maxime Levesque
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Sébastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V 0A6, Canada
- Jean-Francois Jacques
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V 0A6, Canada
- Hassan R H Al-Saedi
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Noé Guilloy
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V 0A6, Canada
- Frederic Grenier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Mariano Avino
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Isabelle Fournier
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Michel Salzet
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Aïda Ouangraoua
- Informatics Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Michelle S Scott
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- François-Michel Boisvert
- Department of Immunology and Cellular Biology, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V 0A6, Canada
5
Leblanc S, Brunet MA. Modelling of pathogen-host systems using deeper ORF annotations and transcriptomics to inform proteomics analyses. Comput Struct Biotechnol J 2020; 18:2836-2850. [PMID: 33133425 PMCID: PMC7585943 DOI: 10.1016/j.csbj.2020.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/29/2020] [Revised: 10/07/2020] [Accepted: 10/08/2020] [Indexed: 01/08/2023]
Abstract
The Zika virus is a flavivirus that can cause fulminant outbreaks and lead to Guillain-Barré syndrome, microcephaly and fetal demise. Like other flaviviruses, the Zika virus is transmitted by mosquitoes and provokes neurological disorders. Despite its risk to public health, neither an antiviral nor a vaccine is currently available. In recent years, several studies have set out to identify human host proteins interacting with Zika viral proteins to better understand its pathogenicity. Yet these studies used standard human protein sequence databases. Such databases rely on genome annotations, which enforce a minimal open reading frame (ORF) length criterion. An ever-increasing number of studies have demonstrated the shortcomings of such annotations, which overlook thousands of functional ORFs. Here we show that the use of a customized database including currently non-annotated proteins led to the identification of 4 alternative proteins as interactors of the viral capsid and NS4A proteins. Furthermore, 12 alternative proteins were identified in the proteome profiling of Zika-infected monocytes, one of which was significantly up-regulated. This study presents a computational framework for the re-analysis of proteomics datasets to better investigate virus-host protein interplay upon infection with the Zika virus.
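A minimal sketch of how such a customized search database might be assembled: a reference proteome is merged with non-annotated (alternative) proteins, alternative entries are tagged so that hits can be traced back, and alternative sequences identical to an existing entry are dropped. The `ALT|` prefix, the accessions and the toy sequences are hypothetical; this is not the authors' framework.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {accession: sequence}; the accession is the first header word."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def build_custom_db(reference_fasta: str, alternative_fasta: str,
                    alt_prefix: str = "ALT|") -> dict[str, str]:
    """Merge reference proteins with non-annotated (alternative) proteins.
    Alternative entries get a hypothetical prefix; sequences already present are skipped."""
    merged = parse_fasta(reference_fasta)
    seen = set(merged.values())
    for acc, seq in parse_fasta(alternative_fasta).items():
        if seq in seen:
            continue  # identical to an existing entry: keep only the first one
        merged[alt_prefix + acc] = seq
        seen.add(seq)
    return merged


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Serialize records back to FASTA with fixed-width sequence lines."""
    lines = []
    for acc, seq in records.items():
        lines.append(">" + acc)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


reference = ">P1 toy reference protein\nMKTAYIAK\nQRQISFVK\n>P2\nMSSHHHHH\n"
alternative = ">IP_1\nMWWWLLLR\n>IP_2\nMSSHHHHH\n"  # hypothetical accessions
print(to_fasta(build_custom_db(reference, alternative)), end="")
```

Tagging the alternative entries at database-build time makes it straightforward to separate alternative-protein identifications from reference ones after the search.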
Key Words
- AP-MS, affinity-purification mass spectrometry
- Alternative ORFs
- DEP, differentially expressed proteins
- FDR, false discovery rate
- FPKM, fragments per kilobase of exon model per million reads mapped
- Flavivirus
- HCIP, highly confident interacting proteins
- HCMV, human cytomegalovirus
- LFQ, label free quantification
- MS, mass spectrometry
- ORF, open reading frame
- PSM, peptide spectrum match
- Protein network
- Proteogenomics
- Proteome profiling
- ZIKV, Zika virus
- Zika
- altProt, alternative protein
- ncRNA, non-coding RNA
- sORF, small open reading frame
Affiliation(s)
- Sebastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
- Marie A. Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
6
Brunet MA, Brunelle M, Lucier JF, Delcourt V, Levesque M, Grenier F, Samandi S, Leblanc S, Aguilar JD, Dufour P, Jacques JF, Fournier I, Ouangraoua A, Scott MS, Boisvert FM, Roucou X. OpenProt: a more comprehensive guide to explore eukaryotic coding potential and proteomes. Nucleic Acids Res 2019; 47:D403-D410. [PMID: 30299502 PMCID: PMC6323990 DOI: 10.1093/nar/gky936] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 07/19/2018] [Accepted: 10/04/2018] [Indexed: 01/06/2023]
Abstract
Advances in proteomics and sequencing have highlighted many non-annotated open reading frames (ORFs) in eukaryotic genomes. Genome annotations, cornerstones of today's research, mostly rely on protein prior knowledge and on ab initio prediction algorithms. Such algorithms notably enforce an arbitrary criterion of one coding sequence (CDS) per transcript, leading to a substantial underestimation of the coding potential of eukaryotes. Here, we present OpenProt, the first database fully endorsing a polycistronic model of eukaryotic genomes to date. OpenProt contains all possible ORFs longer than 30 codons across 10 species, and accumulates supporting evidence such as protein conservation, translation and expression. OpenProt annotates all known proteins (RefProts), novel predicted isoforms (Isoforms) and novel predicted proteins from alternative ORFs (AltProts). It incorporates cutting-edge algorithms to evaluate protein orthology and re-interrogate publicly available ribosome profiling and mass spectrometry datasets, supporting the annotation of thousands of predicted ORFs. The constantly growing database currently accumulates evidence from 87 ribosome profiling and 114 mass spectrometry studies from several species, tissues and cell lines. All data is freely available and downloadable from a web platform (www.openprot.org) supporting a genome browser and advanced queries for each species. Thus, OpenProt enables a more comprehensive landscape of eukaryotic genomes’ coding potential.
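The difference between a one-CDS-per-transcript rule and the polycistronic model can be illustrated by an ORF scan that keeps every qualifying ORF on a transcript. This sketch is not OpenProt's pipeline and assumes ATG (AUG) starts only, the forward strand only, the most upstream in-frame ATG for each stop codon, and a threshold applied as strictly more than 30 codons with the stop codon excluded from the count; ORFs lacking a stop codon before the transcript end are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for every ATG-initiated ORF on the forward strand
    with more than min_codons codons (stop codon excluded from the count).
    Every qualifying ORF is kept, so one transcript can yield several ORFs."""
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # most upstream in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 > min_codons:
                    orfs.append((start, i + 3, frame))  # end includes the stop codon
                start = None
    return sorted(orfs)


cds_like = "ATG" + "GCT" * 35 + "TAA"   # toy 36-codon ORF
print(find_orfs(cds_like + cds_like))    # two ORFs on one transcript
# [(0, 111, 0), (111, 222, 0)]
```

A single-CDS annotation would report only one of the two ORFs in this toy transcript; keeping all of them is what allows alternative ORFs to be annotated alongside the reference CDS.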
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Mylène Brunelle
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Jean-François Lucier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Vivian Delcourt
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Maxime Levesque
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Frédéric Grenier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Sondos Samandi
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Sébastien Leblanc
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-David Aguilar
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Pascal Dufour
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-Francois Jacques
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Isabelle Fournier
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Aida Ouangraoua
- Informatics Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Michelle S Scott
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Xavier Roucou
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
7
Ang MY, Low TY, Lee PY, Wan Mohamad Nazarie WF, Guryev V, Jamal R. Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. Clin Chim Acta 2019; 498:38-46. [DOI: 10.1016/j.cca.2019.08.010] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 05/22/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 12/14/2022]
8
Low TY, Mohtar MA, Ang MY, Jamal R. Connecting Proteomics to Next-Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2018; 19:e1800235. [DOI: 10.1002/pmic.201800235] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 06/11/2018] [Revised: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- M. Aiman Mohtar
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Mia Yang Ang
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Rahman Jamal
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
9
Yang M, Lin X, Liu X, Zhang J, Ge F. Genome Annotation of a Model Diatom Phaeodactylum tricornutum Using an Integrated Proteogenomic Pipeline. Mol Plant 2018; 11:1292-1307. [PMID: 30176371 DOI: 10.1016/j.molp.2018.08.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 06/15/2018] [Revised: 08/26/2018] [Accepted: 08/28/2018] [Indexed: 06/08/2023]
Abstract
Diatoms comprise a diverse and ecologically important group of eukaryotic phytoplankton that significantly contributes to marine primary production and global carbon cycling. Phaeodactylum tricornutum is commonly used as a model organism for studying diatom biology. Although its genome was sequenced in 2008, a high-quality genome annotation is still not available for this diatom. Here we report the development of an integrated proteogenomic pipeline and its application to improve the annotation of the P. tricornutum genome using mass spectrometry (MS)-based proteomics data. Our proteogenomic analysis unambiguously identified approximately 8300 genes and revealed 606 novel proteins, 506 revised genes, 94 splice variants, 58 single amino acid variants, and a holistic view of post-translational modifications in P. tricornutum. We experimentally confirmed a subset of novel events and obtained MS evidence for more than 200 micropeptides in P. tricornutum. These findings expand the genomic landscape of P. tricornutum and provide a rich resource for the study of diatom biology. The proteogenomic pipeline we developed in this study is applicable to any sequenced eukaryote and thus represents a significant contribution to the toolset for eukaryotic proteogenomic analysis. The pipeline and its source code are freely available at https://sourceforge.net/projects/gapeproteogenomic.
Affiliation(s)
- Mingkun Yang
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Xiaohuang Lin
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100039, China
- Xin Liu
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100039, China
- Jia Zhang
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Feng Ge
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100039, China
10
Delcourt V, Brunelle M, Roy AV, Jacques JF, Salzet M, Fournier I, Roucou X. The Protein Coded by a Short Open Reading Frame, Not by the Annotated Coding Sequence, Is the Main Gene Product of the Dual-Coding Gene MIEF1. Mol Cell Proteomics 2018; 17:2402-2411. [PMID: 30181344 PMCID: PMC6283296 DOI: 10.1074/mcp.ra118.000593] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 01/23/2018] [Revised: 07/19/2018] [Indexed: 12/18/2022]
Abstract
Proteogenomics and ribosome profiling concurrently show that genes may code for both a large protein and one or more small proteins, translated from annotated coding sequences (CDSs) and unannotated alternative open reading frames (named alternative ORFs or altORFs), respectively, but the stoichiometry between large and small proteins translated from the same gene is unknown. MIEF1, a gene recently identified as a dual-coding gene, harbors a CDS and a newly annotated and actively translated altORF located in the 5′UTR. Here, we use absolute quantification with stable isotope-labeled peptides and parallel reaction monitoring to determine the levels of both proteins in two human cell lines and in human colon. We report that the main MIEF1 translational product is not the canonical 463-amino-acid MiD51 protein but the small 70-amino-acid alternative MiD51 protein (altMiD51). These results demonstrate the inadequacy of the single-CDS concept and provide a strong argument for incorporating altORFs and small proteins in functional annotations.
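The principle behind absolute quantification with stable isotope-labeled peptides can be sketched in a few lines: a known amount of heavy peptide is spiked into the sample, and the endogenous (light) amount follows from the light-to-heavy peak-area ratio. The peak areas and spike amounts below are hypothetical, not the study's data, and the sketch assumes one proteotypic peptide per protein and complete digestion.

```python
def absolute_amount_fmol(light_area: float, heavy_area: float, heavy_spike_fmol: float) -> float:
    """Endogenous (light) peptide amount from the light/heavy peak-area ratio
    and the known amount of spiked stable isotope-labeled (heavy) peptide."""
    if heavy_area <= 0:
        raise ValueError("heavy standard not detected")
    return light_area / heavy_area * heavy_spike_fmol


def molar_ratio(small_fmol: float, large_fmol: float) -> float:
    """Stoichiometry of the small protein relative to the large one."""
    return small_fmol / large_fmol


# Hypothetical peak areas, one proteotypic peptide per protein, 50 fmol heavy spike each.
alt_fmol = absolute_amount_fmol(light_area=4.0e6, heavy_area=2.0e6, heavy_spike_fmol=50.0)
ref_fmol = absolute_amount_fmol(light_area=1.0e6, heavy_area=2.5e6, heavy_spike_fmol=50.0)
print(f"altProt/refProt = {molar_ratio(alt_fmol, ref_fmol):.1f}")
# altProt/refProt = 5.0
```

Because each peptide is normalized to its own heavy standard, differences in ionization efficiency between the two peptides cancel out, which is what makes a cross-protein molar ratio meaningful in this toy example.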
Affiliation(s)
- Vivian Delcourt
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Mylène Brunelle
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Annie V Roy
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Jean-François Jacques
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Michel Salzet
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France
- Isabelle Fournier
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France
- Xavier Roucou
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
11
Budamgunta H, Olexiouk V, Luyten W, Schildermans K, Maes E, Boonen K, Menschaert G, Baggerman G. Comprehensive Peptide Analysis of Mouse Brain Striatum Identifies Novel sORF-Encoded Polypeptides. Proteomics 2018; 18:e1700218. [DOI: 10.1002/pmic.201700218] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 12/21/2017] [Revised: 03/30/2018] [Indexed: 11/10/2022]
Affiliation(s)
- Volodimir Olexiouk
- BioBix, Lab for Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
- Walter Luyten
- Animal Physiology and Neurobiology, KU Leuven, Leuven, Belgium
- Evelyne Maes
- Centre for Proteomics, UAntwerp, Antwerp, Belgium
- Proteins and Biomaterials, AgResearch, Christchurch, New Zealand
- Kurt Boonen
- Centre for Proteomics, UAntwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, VITO, Mol, Belgium
- Gerben Menschaert
- BioBix, Lab for Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
- Geert Baggerman
- Centre for Proteomics, UAntwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, VITO, Mol, Belgium
12
Olexiouk V, Van Criekinge W, Menschaert G. An update on sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res 2018; 46:D497-D502. [PMID: 29140531 PMCID: PMC5753181 DOI: 10.1093/nar/gkx1130] [Citation(s) in RCA: 145] [Impact Index Per Article: 20.7] [Received: 09/14/2017] [Revised: 10/25/2017] [Accepted: 10/26/2017] [Indexed: 12/13/2022]
Abstract
sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update describes the major improvements implemented since its initial release. sORFs.org now supports three additional species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a vast increase over the three processed in the initial release. Accordingly, a novel pipeline was constructed that enables sORF detection in RIBO-seq datasets comprising only elongating RIBO-seq data, whereas previously, matching initiating RIBO-seq data were necessary to delineate the sORFs. Furthermore, a novel noise-filtering algorithm was designed that distinguishes sORFs with true ribosomal activity from simulated noise, thereby reducing the false-positive identification rate. The inclusion of other species also led to the development of an internal BLAST pipeline that assesses sequence similarity between sORFs in the repository. Building on the proof-of-concept model in the initial release of sORFs.org, a full PRIDE-ReSpin pipeline has now been released; it reprocesses publicly available MS-based proteomics datasets from PRIDE and reports on true translation events. In addition to reporting the identified peptides, sORFs.org allows visual inspection of the annotated spectra within the Lorikeet MS/MS viewer, enabling detailed manual inspection and interpretation.
Affiliation(s)
- Volodimir Olexiouk
- Lab of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
- Wim Van Criekinge
- Lab of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
- Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
13
Delcourt V, Staskevicius A, Salzet M, Fournier I, Roucou X. Small Proteins Encoded by Unannotated ORFs are Rising Stars of the Proteome, Confirming Shortcomings in Genome Annotations and Current Vision of an mRNA. Proteomics 2017. [DOI: 10.1002/pmic.201700058] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 12/20/2022]
Affiliation(s)
- Vivian Delcourt
- Department of Biochemistry, Université de Sherbrooke, Quebec, Canada
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Lille, France
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
- Michel Salzet
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Lille, France
- Isabelle Fournier
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Lille, France
- Xavier Roucou
- Department of Biochemistry, Université de Sherbrooke, Quebec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
14
Abstract
Increasing evidence indicates that many, if not all, small genes encoding proteins ≤100 aa are missing from the annotations of currently available bacterial genomes. To uncover unannotated small genes in the model bacterium Salmonella enterica Typhimurium 14028s, we used ribosome profiling, a genomic technique that provides a snapshot of all mRNAs being translated (the translatome) under a given growth condition. For comprehensive identification of unannotated small genes, we obtained Salmonella translatomes from four different growth conditions: LB, MOPS rich defined medium, and two infection-relevant conditions, low Mg²⁺ (10 µM) and low pH (5.8). To facilitate the identification of small genes, ribosome profiling data were analyzed in combination with in silico predicted putative open reading frames and transcriptome profiles. As a result, we uncovered 130 unannotated ORFs. Of these, 98% were small ORFs putatively encoding peptides/proteins ≤100 aa, and some were expressed only in the infection-relevant low Mg²⁺ and/or low pH conditions. We validated the expression of 25 of these ORFs by western blot, including the smallest, which encodes a peptide of 7 aa residues. Our results suggest that many sequenced bacterial genomes are underannotated with regard to small genes and that their gene annotations need to be revised.
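Combining in silico predicted ORFs with ribosome profiling, as described in this abstract, can be sketched as a filter that keeps small candidate ORFs whose footprint density passes a threshold in at least one growth condition. The density measure (reads per sense codon), the threshold of 1.0, the ORF identifiers and the condition labels are illustrative assumptions, not the study's criteria.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateORF:
    orf_id: str
    length_aa: int
    # condition name -> ribosome footprint reads assigned to this ORF (hypothetical input)
    footprint_reads: dict[str, int] = field(default_factory=dict)


def reads_per_codon(reads: int, length_aa: int) -> float:
    """Footprint density normalized by ORF length in sense codons."""
    if length_aa <= 0:
        raise ValueError("ORF length must be positive")
    return reads / length_aa


def translated_conditions(orf: CandidateORF, min_density: float = 1.0) -> list[str]:
    """Conditions in which the ORF's footprint density reaches the threshold."""
    return sorted(
        cond
        for cond, reads in orf.footprint_reads.items()
        if reads_per_codon(reads, orf.length_aa) >= min_density
    )


def small_translated_orfs(candidates: list[CandidateORF], max_aa: int = 100,
                          min_density: float = 1.0) -> dict[str, list[str]]:
    """Keep ORFs encoding <= max_aa residues that look translated in at least one condition."""
    hits = {}
    for orf in candidates:
        if orf.length_aa > max_aa:
            continue
        conditions = translated_conditions(orf, min_density)
        if conditions:
            hits[orf.orf_id] = conditions
    return hits


candidates = [
    CandidateORF("orfA", 7, {"LB": 0, "low_Mg": 21}),
    CandidateORF("orfB", 50, {"LB": 100, "low_pH": 10}),
    CandidateORF("orfC", 150, {"LB": 1000}),  # too long to count as a small ORF
    CandidateORF("orfD", 40, {"LB": 10}),
]
print(small_translated_orfs(candidates))
# {'orfA': ['low_Mg'], 'orfB': ['LB']}
```

Reporting the conditions per ORF, rather than a single yes/no call, keeps condition-specific translation visible, such as ORFs detected only under low Mg²⁺ or low pH.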