51
Prensner JR, Enache OM, Luria V, Krug K, Clauser KR, Dempster JM, Karger A, Wang L, Stumbraite K, Wang VM, Botta G, Lyons NJ, Goodale A, Kalani Z, Fritchman B, Brown A, Alan D, Green T, Yang X, Jaffe JD, Roth JA, Piccioni F, Kirschner MW, Ji Z, Root DE, Golub TR. Noncanonical open reading frames encode functional proteins essential for cancer cell survival. Nat Biotechnol 2021; 39:697-704. [PMID: 33510483 PMCID: PMC8195866 DOI: 10.1038/s41587-020-00806-2] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 02/18/2020] [Accepted: 12/16/2020] [Indexed: 01/30/2023]
Abstract
Although genomic analyses predict many noncanonical open reading frames (ORFs) in the human genome, it is unclear whether they encode biologically active proteins. Here we experimentally interrogated 553 candidates selected from noncanonical ORF datasets. Of these, 57 induced viability defects when knocked out in human cancer cell lines. Following ectopic expression, 257 showed evidence of protein expression and 401 induced gene expression changes. Clustered regularly interspaced short palindromic repeat (CRISPR) tiling and start codon mutagenesis indicated that their biological effects required translation, as opposed to RNA-mediated effects. We found that one of these ORFs, G029442 (renamed glycine-rich extracellular protein-1, GREP1), encodes a secreted protein highly expressed in breast cancer, and its knockout in 263 cancer cell lines showed preferential essentiality in breast cancer-derived lines. The secretome of GREP1-expressing cells has an increased abundance of the oncogenic cytokine GDF15, and GDF15 supplementation mitigated the growth-inhibitory effect of GREP1 knockout. Our experiments suggest that noncanonical ORFs can express biologically active proteins that are potential therapeutic targets.
Affiliation(s)
- John R. Prensner
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215; Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115
- Oana M. Enache
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Victor Luria
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Karsten Krug
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Karl R. Clauser
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amir Karger
- IT-Research Computing, Harvard Medical School, Boston, MA, USA, 02115
- Li Wang
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Vickie M. Wang
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Ginevra Botta
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amy Goodale
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Zohra Kalani
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Adam Brown
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Douglas Alan
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Thomas Green
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Xiaoping Yang
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Jacob D. Jaffe
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Present address: Inzen Therapeutics, Cambridge, MA, 02139, USA
- Federica Piccioni
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Present address: Merck Research Laboratories, Boston, MA, 02115, USA
- Marc W. Kirschner
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Zhe Ji
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611; Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60628
- David E. Root
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Todd R. Golub
- Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215; Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115; Corresponding author: Todd R. Golub, MD, Chief Scientific Officer, Broad Institute of Harvard and MIT, Room 4013, 415 Main Street, Cambridge, MA, 02142; Phone: 617-714-7050
52
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022]
Abstract
During their long evolutionary history, viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelty, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties and, most importantly, the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with the detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and the recognition of their critical role in the evolution of pathogenicity. The origin of overlapping genes, the factors that favor their birth and retention, and the ways they manage their inherent adaptive conflict are reviewed extensively. Special attention is paid to the assembly of overlapping genes into ad hoc databases suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
53
Gholizadeh Z, Iqbal MS, Li R, Romerio F. The HIV-1 Antisense Gene ASP: The New Kid on the Block. Vaccines (Basel) 2021; 9:vaccines9050513. [PMID: 34067514 PMCID: PMC8156140 DOI: 10.3390/vaccines9050513] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/20/2021] [Revised: 05/04/2021] [Accepted: 05/13/2021] [Indexed: 01/14/2023]
Abstract
Viruses have developed incredibly creative ways of making a virtue out of necessity, including taking full advantage of their small genomes. Indeed, viruses often encode multiple proteins within the same genomic region by using two or more reading frames in both orientations through a process called overprinting. Complex retroviruses provide compelling examples of that. The human immunodeficiency virus type 1 (HIV-1) genome expresses sixteen proteins from nine genes that are encoded in the three positive-sense reading frames. In addition, the genome of some HIV-1 strains contains a tenth gene in one of the negative-sense reading frames. The so-called Antisense Protein (ASP) gene overlaps the HIV-1 Rev Response Element (RRE) and the envelope glycoprotein gene, and encodes a highly hydrophobic protein of ~190 amino acids. Despite being identified over thirty years ago, relatively few studies have investigated the role that ASP may play in the virus lifecycle, and its expression in vivo is still questioned. Here we review the current knowledge about ASP, and we discuss some of the many unanswered questions.
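To make the idea of sense and antisense reading frames concrete: any DNA stretch can be read in three frames on the sense strand and three on its reverse complement, and an antisense gene such as ASP lies in one of the latter. The Python sketch below is illustrative only and is not taken from the review. It assumes the standard genetic code, labels frames +1..+3 and -1..-3 by their offset on each strand (a convention that differs between tools), and `six_frames` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; '*' marks stops, 'X' unknown codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Hypothetical helper: translate the three sense and three antisense frames."""
    seq = seq.upper()
    antisense = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(antisense[offset:])
    return frames


print(six_frames("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```

The same nine nucleotides yield six unrelated peptide strings, which is why a mutation in an overlapping region is constrained by every frame that is actually translated.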
54
Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021; 20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021] [Indexed: 11/17/2022]
Abstract
Proteogenomics approaches often struggle to distinguish true from false peptide-to-spectrum matches (PSMs) as the database size grows. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed from ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate and the validation stringency in a proteogenomic setting.
Highlights
- First proteogenomics with PSM rescoring using machine learning–predicted spectra
- Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases
- Rescoring leads to elevated stringency and increased identification rates
- Rescoring compensates for the search space size issues in proteogenomics
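The search space size problem can be illustrated with a generic target-decoy q-value calculation, the kind of false discovery rate (FDR) control that rescored PSMs are ultimately filtered with. This is a simplified sketch under stated assumptions, not MaxQuant or Percolator code: each PSM is a hypothetical `(score, is_decoy)` pair, tied scores are not treated specially, and FDR at a threshold is estimated as decoys divided by targets at or above it.

```python
def target_decoy_qvalues(psms):
    """psms: list of hypothetical (score, is_decoy) pairs; higher score = better match.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    FDR at each threshold is estimated as (#decoys) / (#targets) at or above it.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value = smallest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the lowest score upward.
    qvalues, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


def count_accepted(psms, fdr=0.01):
    """Number of target PSMs passing the given q-value threshold."""
    return sum(1 for _, is_decoy, q in target_decoy_qvalues(psms) if not is_decoy and q <= fdr)


psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(count_accepted(psms, fdr=0.01), count_accepted(psms, fdr=0.4))  # 2 3
```

A larger database tends to place more decoys among the high scores, so fewer targets survive a fixed threshold; a score that separates true from false matches better pushes decoys down the ranking and recovers identifications, which is the intuition behind rescoring.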
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
- Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Ralf Gabriels
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Sven Degroeve
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Lennart Martens
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium.
55
Schlesinger D, Elsässer SJ. Revisiting sORFs: overcoming challenges to identify and characterize functional microproteins. FEBS J 2021; 289:53-74. [PMID: 33595896 DOI: 10.1111/febs.15769] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 08/21/2020] [Revised: 01/17/2021] [Accepted: 02/15/2021] [Indexed: 02/07/2023]
Abstract
Short ORFs (sORFs), that is, occurrences of a start and a stop codon within 100 codons or less, can be found in organisms of all domains of life and outnumber annotated protein-coding ORFs by orders of magnitude. Even though functional proteins smaller than 100 amino acids are known, the coding potential of sORFs has often been overlooked, as it is not trivial to predict and test for functionality within the large number of sORFs. Recent advances in ribosome profiling and mass spectrometry approaches, together with refined bioinformatic predictions, have enabled a huge leap forward in this field and identified thousands of likely coding sORFs. Only a relatively small number of the small proteins, or microproteins, produced from these sORFs have been characterized so far at the molecular, structural, and/or mechanistic level. These, however, display versatile and, in some cases, essential cellular functions, raising the exciting possibility that many more previously unknown small proteins might be encoded in the genome, waiting to be discovered. This review gives an overview of the steadily growing microprotein field, focusing on eukaryotic small proteins. We discuss emerging themes in the molecular action of microproteins, as well as advances and challenges in microprotein identification and characterization.
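The length-based sORF definition above translates directly into a sequence scan. This Python sketch is a hypothetical helper, not a tool from the review, and rests on several assumptions: only ATG starts, only the forward strand, and length measured in encoded amino acids with the stop codon excluded. Every in-frame ATG is reported separately, so nested ORFs that share a stop codon appear more than once.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 100):
    """Hypothetical helper: list (start, end, aa_length) for ATG-initiated ORFs on the
    forward strand that encode at most max_aa amino acids (stop codon not counted).
    `end` is exclusive and includes the stop codon; starts without an in-frame
    stop codon are skipped."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # Walk in frame from this start codon to the first stop codon.
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                aa_length = (j - i) // 3
                if aa_length <= max_aa:
                    hits.append((i, j + 3, aa_length))
                break
    return hits


print(find_sorfs("ATGATGTAA"))  # [(0, 9, 2), (3, 9, 1)]
```

Real pipelines typically also consider near-cognate starts and both strands, which is part of why sORF candidates vastly outnumber annotated ORFs.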
Affiliation(s)
- Dörte Schlesinger
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
- Simon J Elsässer
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
56
Cohen AA, Leblanc S, Roucou X. Robust Physiological Metrics From Sparsely Sampled Networks. Front Physiol 2021; 12:624097. [PMID: 33643068 PMCID: PMC7902772 DOI: 10.3389/fphys.2021.624097] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/30/2020] [Accepted: 01/12/2021] [Indexed: 12/14/2022]
Abstract
Physiological and biochemical networks are highly complex, involving thousands of nodes as well as a hierarchical structure. True network structure is also rarely known. This presents major challenges for applying classical network theory to these networks. However, complex systems generally share the property of having a diffuse or distributed signal. Accordingly, we would predict that system state can be robustly estimated with sparse sampling and with limited knowledge of true network structure. In this review, we summarize recent findings from several methodologies for estimating system state via a limited sample of biomarkers, notably Mahalanobis distance, principal components analysis, and cluster analysis. While statistically simple, these methods allow novel characterizations of system state when applied judiciously. Broadly, system state can often be estimated even from random samples of biomarkers. Furthermore, appropriate methods can detect emergent underlying physiological structure from such sparse data. We propose that approaches such as these are a powerful tool for understanding physiology, and could lead to a new understanding and mapping of the functional implications of biological variation.
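The Mahalanobis distance mentioned above measures how far one individual's biomarker profile lies from a reference population, scaled by that population's covariance, so correlated biomarkers are not double-counted. A minimal pure-Python sketch, assuming a small, well-conditioned biomarker panel (so a plain Gauss-Jordan inverse suffices) and the sample covariance; the function names are illustrative and not taken from the review.

```python
import math


def mean_vector(rows):
    """Column means of a list of equal-length biomarker rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator); needs at least two rows."""
    mu = mean_vector(rows)
    p, n = len(mu), len(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; fine for small, well-conditioned matrices."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mahalanobis(x, reference_rows):
    """Distance of profile x from the reference population, scaled by its covariance."""
    mu = mean_vector(reference_rows)
    inv_cov = invert(covariance(reference_rows))
    d = [xi - mi for xi, mi in zip(x, mu)]
    quad = sum(d[i] * inv_cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(0.0, quad))  # guard against tiny negative rounding error


reference = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(round(mahalanobis([1.0, 1.0], reference), 4))  # 1.7321
```

Because the distance only uses the columns it is given, the same function can be applied to a random subset of biomarkers, which is the sparse-sampling setting the review discusses.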
Affiliation(s)
- Alan A. Cohen
- Groupe de Recherche PRIMUS, Département de Médecine de Famille et de Médecine d’Urgence, Université de Sherbrooke, Sherbrooke, QC, Canada
- Centre de Recherche, Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC, Canada
- Research Center on Aging, CIUSSS-de-l’Estrie-CHUS, Sherbrooke, QC, Canada
- Sebastien Leblanc
- Département de Biochimie et de Génomique Fonctionnelle, Université de Sherbrooke, Sherbrooke, QC, Canada
- Xavier Roucou
- Département de Biochimie et de Génomique Fonctionnelle, Université de Sherbrooke, Sherbrooke, QC, Canada
57
Gunnarsson S, Prabakaran S. In silico identification of novel open reading frames in Plasmodium falciparum oocyte and salivary gland sporozoites using proteogenomics framework. Malar J 2021; 20:71. [PMID: 33546698 PMCID: PMC7866754 DOI: 10.1186/s12936-021-03598-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 01/16/2021] [Indexed: 11/25/2022]
Abstract
Background: Plasmodium falciparum causes the deadliest form of malaria, which remains one of the most prevalent infectious diseases. Unfortunately, the only licensed vaccine has shown limited protection, and resistance to anti-malarial drugs is increasing, largely owing to the biological complexity of the parasite’s life cycle. The progression from one developmental stage to another in P. falciparum involves drastic changes in gene expression, and its infectivity to human hosts varies greatly by stage. Approaches to identify candidate genes responsible for the development of infectivity to human hosts typically involve differential gene expression analysis between stages. However, such analyses may detect only annotated proteins and open reading frames (ORFs) predicted using restrictive criteria.
Methods: This problem is particularly relevant for P. falciparum, whose genome annotation is relatively incomplete given its clinical significance. In this work, a systems proteogenomics approach was used to address this challenge, as it allows computational detection of unannotated, novel open reading frames (nORFs) that conventional analyses neglect. Two transcriptome/proteome pairs were obtained from a previous study: one collected at the mosquito-infectious oocyst sporozoite stage and the other at the human-infectious salivary gland sporozoite stage. They were then re-analysed using the proteogenomics framework to identify nORFs in each stage.
Results: Translational products were detected for nORFs mapping to antisense, intergenic, intronic, 3′ UTR and 5′ UTR regions, as well as to alternative reading frames of canonical proteins. Some of these nORFs also showed differential expression between the two life cycle stages studied. Their regulatory roles were explored through further bioinformatics analyses, including regulation of expression of the parent reference genes, in silico structure prediction, and gene ontology term enrichment analysis.
Conclusion: The identification of nORFs in P. falciparum sporozoites highlights the biological complexity of the parasite. Although the analyses are solely computational, these results provide a starting point for further experimental validation of the existence and functional roles of these nORFs.
Affiliation(s)
- Sophie Gunnarsson
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Sudhakaran Prabakaran
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK.
58
Erady C, Boxall A, Puntambekar S, Suhas Jagannathan N, Chauhan R, Chong D, Meena N, Kulkarni A, Kasabe B, Prathivadi Bhayankaram K, Umrania Y, Andreani A, Nel J, Wayland MT, Pina C, Lilley KS, Prabakaran S. Pan-cancer analysis of transcripts encoding novel open-reading frames (nORFs) and their potential biological functions. NPJ Genom Med 2021; 6:4. [PMID: 33495453 PMCID: PMC7835362 DOI: 10.1038/s41525-020-00167-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/06/2020] [Accepted: 11/18/2020] [Indexed: 12/13/2022]
Abstract
Uncharacterized and unannotated open reading frames, which we refer to as novel open reading frames (nORFs), may sometimes encode peptides that remain unexplored as novel therapeutic opportunities. To our knowledge, no systematic identification and characterization of transcripts encoding nORFs or their translation products has been performed in cancer or in any other physiological process. We use our curated nORFs database (nORFs.org), together with RNA-Seq data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) consortia, to identify transcripts containing nORFs that are expressed frequently in cancer or matched normal tissue across 22 cancer types. We show that nORFs are subject to extensive dysregulation at the transcript level in cancer tissue and that a small subset of nORFs is associated with overall patient survival, suggesting that nORFs may have prognostic value. We also show that nORF products can form protein-like structures with post-translational modifications. Finally, we perform in silico screening for inhibitors against nORF-encoded proteins that are disrupted in stomach and esophageal cancer, showing that they can potentially be targeted by inhibitors. We hope this work will guide and motivate future studies that perform in-depth characterization of nORF functions in cancer and other diseases.
Affiliation(s)
- Chaitanya Erady
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Adam Boxall
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Shraddha Puntambekar
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- N Suhas Jagannathan
- Cancer and Stem Cell Biology Programme, and Centre for Computational Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
- Ruchi Chauhan
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- David Chong
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Narendra Meena
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Apurv Kulkarni
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- Bhagyashri Kasabe
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- Yagnesh Umrania
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Adam Andreani
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Jean Nel
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Matthew T Wayland
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Cristina Pina
- Department of Haematology, Cambridge Biomedical Campus, Cambridge, CB2 0PT, UK
- Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Sudhakaran Prabakaran
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK.
59
Gagnon M, Savard M, Jacques JF, Bkaily G, Geha S, Roucou X, Gobeil F. Potentiation of B2 receptor signaling by AltB2R, a newly identified alternative protein encoded in the human bradykinin B2 receptor gene. J Biol Chem 2021; 296:100329. [PMID: 33497625 PMCID: PMC7949122 DOI: 10.1016/j.jbc.2021.100329] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/19/2020] [Revised: 01/12/2021] [Accepted: 01/21/2021] [Indexed: 12/27/2022]
Abstract
Recent functional and proteomic studies in eukaryotes (www.openprot.org) predict the translation of alternative open reading frames (AltORFs) in mature G-protein-coupled receptor (GPCR) mRNAs, including that of the bradykinin B2 receptor (B2R). Our main objective was to determine the involvement of AltB2R, a protein encoded by a newly discovered AltORF, in the known signaling properties of B2R using complementary methodological approaches. When ectopically expressed in HeLa cells, AltB2R showed a predominantly punctate cytoplasmic/perinuclear distribution and apparent interaction with B2R at plasma and endosomal/vesicular membranes. The presence of AltB2R increased intracellular [Ca2+] and ERK1/2-MAPK activation (via phosphorylation) following B2R stimulation. Moreover, HEK293A cells expressing a mutant B2R lacking concomitant expression of AltB2R displayed significantly decreased maximal responses in agonist-stimulated Gαq-Gαi2/3-protein coupling, IP3 generation, and ERK1/2-MAPK activation compared with wild-type controls. Conversely, there was no difference in the cell-surface density or ligand-binding properties of B2R, or in the efficiencies of cognate agonists at promoting B2R internalization and β-arrestin 2 recruitment. Importantly, both AltB2R and B2R proteins were overexpressed in prostate and breast cancers compared with their normal counterparts, suggesting new associative roles of AltB2R in these diseases. Our study shows that BDKRB2 is a dual-coding gene and identifies AltB2R as a novel positive modulator of some B2R signaling pathways. More broadly, it also supports a new, unexpected alternative proteome for GPCRs, which opens new frontiers in GPCR biology, diseases, and drug discovery.
Affiliation(s)
- Maxime Gagnon
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Martin Savard
- Department of Pharmacology & Physiology, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-François Jacques
- Department of Pharmacology & Physiology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Ghassan Bkaily
- Department of Immunology & Cellular Biology, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Sameh Geha
- Department of Pathology, Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, Canada
- Xavier Roucou
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada.
- Fernand Gobeil
- Department of Pharmacology & Physiology, Université de Sherbrooke, Sherbrooke, Québec, Canada; Institute of Pharmacology, Université de Sherbrooke, Sherbrooke, Québec, Canada.
60
Witkowski JM, Bryl E, Fulop T. Proteodynamics and aging of eukaryotic cells. Mech Ageing Dev 2021; 194:111430. [PMID: 33421431 DOI: 10.1016/j.mad.2021.111430] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/29/2020] [Revised: 12/28/2020] [Accepted: 12/30/2020] [Indexed: 12/11/2022]
Abstract
All aspects of each protein's existence in eukaryotic cells, from pre-translational events, through translation and multiple different post-translational modifications, to its functional life and eventual proteostatic removal after loss of functionality and changes in physico-chemical properties, can be collectively called proteodynamics. With aging, the passage of time, together with the accumulating effects of exposures, interactions and wear, leads to problems at each of the above-mentioned stages, eventually causing general malfunction of the proteome. This work briefly reviews and summarizes current knowledge on this important topic.
Affiliation(s)
- Jacek M Witkowski
- Department of Pathophysiology, Medical University of Gdańsk, Gdańsk, Poland.
- Ewa Bryl
- Department of Pathology and Experimental Rheumatology, Medical University of Gdańsk, Gdańsk, Poland
- Tamas Fulop
- Research Center on Aging, Graduate Program in Immunology, Faculty of Medicine and Health Sciences, University of Sherbrooke, Sherbrooke, Quebec, Canada
61
Cardon T, Fournier I, Salzet M. SARS-Cov-2 Interactome with Human Ghost Proteome: A Neglected World Encompassing a Wealth of Biological Data. Microorganisms 2020; 8:E2036. [PMID: 33352703 PMCID: PMC7766365 DOI: 10.3390/microorganisms8122036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2020] [Revised: 12/16/2020] [Accepted: 12/17/2020] [Indexed: 11/17/2022]
Abstract
Conventionally, eukaryotic mRNAs were thought to be monocistronic, each leading to the translation of a single protein. However, large-scale proteomics has led to the identification of large numbers of proteins translated from alternative ORFs (AltORFs) in mRNAs, in addition to the predicted proteins derived from the reference ORF or from ncRNAs. These alternative proteins (AltProts) are not represented in conventional protein databases, and this "ghost proteome" was not considered until recently. Some of these proteins are functional, and there is growing evidence that they are involved in central functions in physiological and pathophysiological contexts. Based on our experience with AltProts, we sought to identify their interactions with the proteins of SARS-CoV-2, the virus responsible for the 2020 COVID-19 outbreak. We therefore re-examined the recently published data by Krogan and coworkers (2020) on the SARS-CoV-2 interactome with host cells, obtained by affinity purification in co-immunoprecipitation (co-IP), from the perspective of drug repurposing. The initial work revealed interactions between 332 human cellular reference proteins (RefProts) and the 27 viral proteins. Re-interrogating these data using 23 viral targets and including AltProts, followed by enrichment of the interaction networks, led to the identification of 218 RefProts (shared with the initial study), plus 56 AltProts involved in 93 interactions. This demonstrates the need to take the ghost proteome into account when discovering new therapeutic targets and establishing new therapeutic strategies. Missing the ghost proteome in the drug metabolism and pharmacokinetic (DMPK) drug development pipeline will certainly be a major limitation to the establishment of efficient therapies.
Affiliation(s)
- Tristan Cardon
- Inserm U1192, University Lille, CHU Lille, Laboratory Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France
- Isabelle Fournier
- Inserm U1192, University Lille, CHU Lille, Laboratory Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France
- Institut Universitaire de France, 75000 Paris, France
- Michel Salzet
- Inserm U1192, University Lille, CHU Lille, Laboratory Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France
- Institut Universitaire de France, 75000 Paris, France
62
Matteau D, Lachance J, Grenier F, Gauthier S, Daubenspeck JM, Dybvig K, Garneau D, Knight TF, Jacques P, Rodrigue S. Integrative characterization of the near-minimal bacterium Mesoplasma florum. Mol Syst Biol 2020; 16:e9844. [PMID: 33331123 PMCID: PMC7745072 DOI: 10.15252/msb.20209844] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/06/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 12/11/2022]
Abstract
The near-minimal bacterium Mesoplasma florum is an interesting model for synthetic genomics and systems biology due to its small genome (~ 800 kb), fast growth rate, and lack of pathogenic potential. However, fundamental aspects of its biology remain largely unexplored. Here, we report a broad yet remarkably detailed characterization of M. florum by combining a wide variety of experimental approaches. We investigated several physical and physiological parameters of this bacterium, including cell size, growth kinetics, and biomass composition of the cell. We also performed the first genome-wide analysis of its transcriptome and proteome, notably revealing a conserved promoter motif, the organization of transcription units, and the transcription and protein expression levels of all protein-coding sequences. We converted gene transcription and expression levels into absolute molecular abundances using biomass quantification results, generating an unprecedented view of the M. florum cellular composition and functions. These characterization efforts provide a strong experimental foundation for the development of a genome-scale model for M. florum and will guide future genome engineering endeavors in this simple organism.
Affiliation(s)
- Dominick Matteau
- Département de biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Frédéric Grenier
- Département de biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Samuel Gauthier
- Département de biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Kevin Dybvig
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, USA
- Daniel Garneau
- Département de biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
63. Cardon T, Fournier I, Salzet M. Shedding Light on the Ghost Proteome. Trends Biochem Sci 2020; 46:239-250. [PMID: 33246829 DOI: 10.1016/j.tibs.2020.10.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/02/2020] [Revised: 10/21/2020] [Accepted: 10/22/2020] [Indexed: 01/19/2023]
Abstract
Conventionally, eukaryotic mRNAs were thought to be monocistronic, leading to the translation of a single protein. However, large-scale proteomics has led to the identification of proteins translated from alternative open reading frames (AltORFs) in mRNAs. AltORFs are found in addition to predicted reference ORFs and noncoding RNA. Alternative proteins are not represented in the conventional protein databases, and this 'Ghost proteome' was not considered until recently. Some of these proteins are functional, and there is growing evidence that they are involved in central functions in physiological and physiopathological contexts. Here, we review how this Ghost proteome fills the gap in our understanding of signaling pathways, establishes new markers of pathologies, and highlights therapeutic targets.
Affiliation(s)
- Tristan Cardon
- Laboratoire Protéomique, Réponse Inflammatoire Spectrométrie de Masse (PRISM), Inserm U1192, University of Lille, CHU Lille, F-59000 Lille, France.
- Isabelle Fournier
- Laboratoire Protéomique, Réponse Inflammatoire Spectrométrie de Masse (PRISM), Inserm U1192, University of Lille, CHU Lille, F-59000 Lille, France; Institut Universitaire de France, Paris, France.
- Michel Salzet
- Laboratoire Protéomique, Réponse Inflammatoire Spectrométrie de Masse (PRISM), Inserm U1192, University of Lille, CHU Lille, F-59000 Lille, France; Institut Universitaire de France, Paris, France.
64. Guerra-Almeida D, Nunes-da-Fonseca R. Small Open Reading Frames: How Important Are They for Molecular Evolution? Front Genet 2020; 11:574737. [PMID: 33193682 PMCID: PMC7606980 DOI: 10.3389/fgene.2020.574737] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/21/2020] [Accepted: 08/25/2020] [Indexed: 11/13/2022]
Affiliation(s)
- Diego Guerra-Almeida
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Rodrigo Nunes-da-Fonseca
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; National Institute of Science and Technology in Molecular Entomology, Rio de Janeiro, Brazil
65. Lafranchi L, Schlesinger D, Kimler KJ, Elsässer SJ. Universal Single-Residue Terminal Labels for Fluorescent Live Cell Imaging of Microproteins. J Am Chem Soc 2020; 142:20080-20087. [PMID: 33175524 DOI: 10.1021/jacs.0c09574] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/15/2022]
Abstract
Genetically encoded fluorescent tags for visualization of proteins in living cells add six to several hundred amino acids to the protein of interest. While suitable for most proteins, common tags can easily match or exceed the size of microproteins of 60 amino acids or less. The added molecular weight and structure of such a fluorescent tag may thus significantly affect the in vivo biophysical and biochemical properties of microproteins. Here, we develop single-residue terminal labeling (STELLA) tags that introduce a single noncanonical amino acid at either the N- or C-terminus of a protein or microprotein of interest for subsequent specific fluorescent labeling. Efficient terminal noncanonical amino acid mutagenesis is achieved using a precursor tag that is tracelessly cleaved. A subsequent selective bioorthogonal reaction with a cell-permeable organic dye enables live cell imaging of microproteins with minimal perturbation of their native sequence. Using terminal residues for labeling provides a universally applicable and easily scalable strategy that avoids altering the core sequence of the microprotein.
Affiliation(s)
- Lorenzo Lafranchi
- Science for Life Laboratory, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, Karolinska Institutet, Stockholm, 17165, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, 17165, Sweden
- Dörte Schlesinger
- Science for Life Laboratory, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, Karolinska Institutet, Stockholm, 17165, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, 17165, Sweden
- Kyle J Kimler
- Science for Life Laboratory, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, Karolinska Institutet, Stockholm, 17165, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, 17165, Sweden
- Simon J Elsässer
- Science for Life Laboratory, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, Karolinska Institutet, Stockholm, 17165, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, 17165, Sweden
66. Cardon T, Franck J, Coyaud E, Laurent EMN, Damato M, Maffia M, Vergara D, Fournier I, Salzet M. Alternative proteins are functional regulators in cell reprogramming by PKA activation. Nucleic Acids Res 2020; 48:7864-7882. [PMID: 32324228 PMCID: PMC7641301 DOI: 10.1093/nar/gkaa277] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/09/2019] [Revised: 04/06/2020] [Accepted: 04/21/2020] [Indexed: 12/28/2022]
Abstract
It has recently been shown that many proteins are missing from the reference databases used in mass spectrometry analysis because they are translated from alternative open reading frames. This challenges our current understanding of gene annotation and drastically expands the theoretical complexity of the proteome. The functions of these alternative proteins (AltProts) remain largely unknown. We have developed a large-scale, unsupervised approach based on cross-linking mass spectrometry (XL-MS) followed by shotgun proteomics to gather information on the functional role of AltProts by mapping them back onto known signalling pathways through the identification of their reference protein (RefProt) interactors. We have identified and profiled AltProts in a cancer cell reprogramming system: NCH82 human glioma cells after 0, 16, 24 and 48 h of Forskolin stimulation. Forskolin is a protein kinase A activator that induces cell differentiation and epithelial–mesenchymal transition. Our data show that the interactions of AltMAP2, AltTRNAU1AP and AltEPHA5 with tropomyosin 4 are downregulated under Forskolin treatment. In a wider perspective, Gene Ontology and pathway enrichment analysis (STRING) revealed that RefProts associated with AltProts are enriched in cellular mobility and transfer RNA regulation. This study strongly suggests novel roles of AltProts in multiple essential cellular functions and supports the importance of considering them in future biological studies.
Affiliation(s)
- Tristan Cardon
- Univ. Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France
- Julien Franck
- Univ. Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France
- Etienne Coyaud
- Univ. Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France
- Estelle M N Laurent
- Univ. Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France
- Marina Damato
- Univ. Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France; Department of Biological and Environmental Sciences and Technologies, University of Salento, 73100 Lecce, Italy
- Michele Maffia
- Department of Biological and Environmental Sciences and Technologies, University of Salento, 73100 Lecce, Italy
- Daniele Vergara
- Department of Biological and Environmental Sciences and Technologies, University of Salento, 73100 Lecce, Italy
- Isabelle Fournier
- Univ. Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France; Institut Universitaire de France (IUF), 75005 Paris, France
- Michel Salzet
- Univ. Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse (PRISM), F-59000 Lille, France; Institut Universitaire de France (IUF), 75005 Paris, France
67. Khitun A, Slavoff SA. Proteomic Detection and Validation of Translated Small Open Reading Frames. Curr Protoc Chem Biol 2020; 11:e77. [PMID: 31750990 DOI: 10.1002/cpch.77] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 12/13/2022]
Abstract
Small open reading frames (smORFs) encode previously unannotated polypeptides or short proteins that regulate translation in cis (eukaryotes) and/or are independently functional (prokaryotes and eukaryotes). Ongoing efforts toward complete annotation and functional characterization of smORF-encoded proteins have yielded novel regulators and therapeutic targets. However, because they are excluded from protein databases, can initiate at non-AUG start codons, and produce few unique tryptic peptides, unannotated small proteins cannot be detected with standard proteomic methods. Here, we outline a procedure for mass spectrometry-based detection of translated smORFs in cultured human cells, from protein extraction, digestion, and LC-MS/MS to database preparation and data analysis. Following proteomic detection, translation from a unique smORF may be validated via siRNA-based silencing or via overexpression and epitope tagging. This is necessary to unambiguously assign a peptide to a smORF within a specific transcript isoform or genomic locus. Provided that sufficient starting material is available, this workflow can be applied to any cell type or organism and adjusted to study specific (patho)physiological contexts including, but not limited to, development, stress, and disease. © 2019 by John Wiley & Sons, Inc.
- Basic Protocol 1: Protein extraction, size selection, and trypsin digestion
- Alternate Protocol 1: In-solution C8 column size selection
- Support Protocol 1: Chloroform/methanol precipitation
- Support Protocol 2: Reduction, alkylation, and in-solution protease digestion
- Support Protocol 3: Peptide de-salting
- Basic Protocol 2: Two-dimensional LC-MS/MS with ERLIC fractionation
- Basic Protocol 3: Transcriptomic database construction
- Alternate Protocol 2: Transcriptomics database generation with gffread
- Basic Protocol 4: Non-annotated peptide identification from LC-MS/MS data
- Basic Protocol 5: Validation using isotopically labeled synthetic peptide standards and siRNA
- Basic Protocol 6: Transcript validation using transient overexpression
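The abstract notes that small proteins yield few unique tryptic peptides. As a rough illustration only (not part of the cited protocol), the sketch below applies the common simplified in-silico trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and keeps peptides in an assumed MS-detectable window of 7 to 30 residues. The function names, the length window and the example sequence are hypothetical.

```python
from __future__ import annotations

import re

# Simplified trypsin rule: cleave C-terminal to K or R unless the next residue is P.
# Missed cleavages, modifications and non-specific cleavage are ignored here.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str) -> list[str]:
    """Return the fully cleaved tryptic peptides of a protein sequence."""
    return [p for p in _TRYPSIN_SITE.split(protein) if p]


def detectable_peptides(protein: str, min_len: int = 7, max_len: int = 30) -> list[str]:
    """Keep peptides inside an assumed MS-detectable length window (illustrative defaults)."""
    return [p for p in tryptic_peptides(protein) if min_len <= len(p) <= max_len]


print(tryptic_peptides("AAKPGGRCCKDD"))
```

A short sequence with few K/R sites leaves only a handful of peptides, if any, inside the window. This mirrors the detection problem the protocol addresses with size selection and custom databases.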
Affiliation(s)
- Alexandra Khitun
- Department of Chemistry, Yale University, New Haven, Connecticut; Chemical Biology Institute, Yale University, West Haven, Connecticut
- Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, Connecticut; Chemical Biology Institute, Yale University, West Haven, Connecticut; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut
68. Evolution of novel genes in three-spined stickleback populations. Heredity (Edinb) 2020; 125:50-59. [PMID: 32499660 PMCID: PMC7413265 DOI: 10.1038/s41437-020-0319-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/24/2019] [Revised: 04/27/2020] [Accepted: 04/30/2020] [Indexed: 12/22/2022]
Abstract
Eukaryotic genomes frequently acquire new protein-coding genes which may significantly impact an organism’s fitness. Novel genes can be created, for example, by duplication of large genomic regions or de novo, from previously non-coding DNA. Either way, creation of a novel transcript is an essential early step during novel gene emergence. Most studies on the gain-and-loss dynamics of novel genes so far have compared genomes between species, constraining analyses to genes that have remained fixed over long time scales. However, the importance of novel genes for rapid adaptation among populations has recently been shown. Therefore, since little is known about the evolutionary dynamics of transcripts across natural populations, we here study transcriptomes from several tissues and nine geographically distinct populations of an ecological model species, the three-spined stickleback. Our findings suggest that novel genes typically start out as transcripts with low expression and high tissue specificity. Early expression regulation appears to be mediated by gene-body methylation. Although most new and narrowly expressed genes are rapidly lost, those that survive and subsequently spread through populations tend to gain broader and higher expression levels. The properties of the encoded proteins, such as disorder and aggregation propensity, hardly change. Correspondingly, young novel genes are not preferentially under positive selection but older novel genes more often overlap with FST outlier regions. Taken together, expression of the surviving novel genes is rapidly regulated, probably via epigenetic mechanisms, while structural properties of encoded proteins are non-debilitating and might only change much later.
69. Brunet MA, Leblanc S, Roucou X. Reconsidering proteomic diversity with functional investigation of small ORFs and alternative ORFs. Exp Cell Res 2020; 393:112057. [PMID: 32387289 DOI: 10.1016/j.yexcr.2020.112057] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/19/2019] [Revised: 04/21/2020] [Accepted: 05/02/2020] [Indexed: 12/13/2022]
Abstract
The discovery of functional yet non-annotated open reading frames (ORFs) throughout the genome of several species presents an unprecedented challenge in current genome annotation. These novel ORFs are shorter than annotated ones and many can be found on the same RNA, in opposition to current assumptions in annotation methodologies. Whilst the literature lacks consensus, these novel ORFs are commonly referred to as small ORFs (sORFs) or alternative ORFs (alt-ORFs). Unannotated ORFs represent an overlooked layer of complexity in the coding potential of genomes and are transforming our current vision of the nature of coding genes. In this review, we outline what constitutes a sORF or an alt-ORF and emphasize differences between both nomenclatures. We then describe complementary large-scale methods to accurately discover novel ORFs as well as yield functional insights on the novel proteins they encode. While serendipitous discoveries highlighted the functional importance of some novel ORFs, omics methods facilitate and improve their characterization to better understand physiological and pathological pathways. Functional annotation of sORFs, alt-ORFs and their corresponding microproteins will likely help fundamental and clinical research.
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada.
- Sebastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada.
70. Kiniry SJ, Michel AM, Baranov PV. Computational methods for ribosome profiling data analysis. Wiley Interdiscip Rev RNA 2020; 11:e1577. [PMID: 31760685 DOI: 10.1002/wrna.1577] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 07/30/2019] [Revised: 10/12/2019] [Accepted: 10/16/2019] [Indexed: 12/15/2022]
Abstract
Since the introduction of the ribosome profiling technique in 2009, its popularity has greatly increased. It is widely used for the comprehensive assessment of gene expression and for studying the mechanisms of regulation at the translational level. As the number of ribosome profiling datasets continues to grow, so does the need for reliable software that can answer the biological questions these data can address. This review describes the computational methods and tools that have been developed to analyze ribosome profiling data at the different stages of the process. It starts with the initial routine processing of raw data and follows with more specific tasks such as the identification of translated open reading frames, differential gene expression analysis, or evaluation of local or global codon decoding rates. The review pinpoints the challenges associated with each step and explains how they are currently addressed. In addition, it provides a comprehensive, albeit incomplete, list of publicly available software applicable to each step, which may be a useful starting point for those new to ribosome profiling analysis. The outline of current challenges in ribosome profiling data analysis may inspire computational biologists to search for novel, potentially superior solutions that will improve and expand the bioinformatician's toolbox for ribosome profiling data analysis. This article is categorized under: Translation > Ribosome Structure/Function; RNA Evolution and Genomics > Computational Analyses of RNA; Translation > Translation Mechanisms; Translation > Translation Regulation.
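One basic quality check in ribosome profiling is triplet periodicity: footprints on a translated coding sequence advance in 3-nt steps, so their P-sites concentrate in one frame. A minimal sketch, assuming reads have already been mapped and converted to P-site positions relative to the first base of the annotated start codon (the usually read-length-specific offset assignment is not shown). The function name, input format and example offsets are hypothetical, not taken from any tool the review lists.

```python
from collections import Counter
from typing import Iterable, Tuple


def frame_fractions(psite_offsets: Iterable[int]) -> Tuple[float, float, float]:
    """Fraction of P-site positions in frames 0, 1 and 2 of a coding sequence.

    Each offset is the nucleotide distance of an already assigned P-site from
    the first base of the annotated start codon. Negative offsets (5' UTR) are
    skipped in this sketch.
    """
    counts = Counter(pos % 3 for pos in psite_offsets if pos >= 0)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (counts[0] / total, counts[1] / total, counts[2] / total)


# Hypothetical P-site offsets pooled from reads on one transcript.
example = [0, 3, 6, 9, 1, 12, -4, 15, 2, 18]
print(frame_fractions(example))  # most reads fall in frame 0
```

A dataset in which one frame clearly dominates suggests good periodicity. A near-uniform split across frames is a warning sign for either the offset assignment or the library itself.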
Affiliation(s)
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS, Moscow, Russia
71. Merino-Valverde I, Greco E, Abad M. The microproteome of cancer: From invisibility to relevance. Exp Cell Res 2020; 392:111997. [PMID: 32302626 DOI: 10.1016/j.yexcr.2020.111997] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/08/2020] [Revised: 04/02/2020] [Accepted: 04/03/2020] [Indexed: 01/08/2023]
Abstract
Recent findings have revealed that many genomic regions previously annotated as non-protein-coding actually contain small open reading frames, shorter than 300 bp, that are transcribed and translated into evolutionarily conserved microproteins. To date, only a small subset of them has been functionally characterized, but these play key roles in fundamental processes such as DNA repair, RNA processing and metabolic regulation. This emerging field seems to conceal a new category of molecular regulators with clinical potential. In this review, we focus on its relevance for cancer. Following Hanahan and Weinberg's classification of the hallmarks of cancer, we provide an overview of the microproteins known to be implicated in cancer and of those that, based on their function, are likely to play a role in cancer. The resulting picture is that, although the field is still at a very early stage, it holds the promise of providing crucial information for understanding cancer biology.
Affiliation(s)
- Emanuela Greco
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, 08035, Spain
- María Abad
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, 08035, Spain.
72. Pavesi A. New insights into the evolutionary features of viral overlapping genes by discriminant analysis. Virology 2020; 546:51-66. [PMID: 32452417 PMCID: PMC7157939 DOI: 10.1016/j.virol.2020.03.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/20/2020] [Accepted: 03/29/2020] [Indexed: 12/18/2022]
Abstract
Overlapping genes originate by a mechanism of overprinting, in which nucleotide substitutions in a pre-existing frame induce the expression of a de novo protein from an alternative frame. In this study, I assembled a dataset of 319 viral overlapping genes, which included 82 overlaps whose expression is experimentally known and the respective 237 homologs. Principal component analysis revealed that overlapping genes have a common pattern of nucleotide and amino acid composition. Discriminant analysis separated overlapping from non-overlapping genes with an accuracy of 97%. When applied to overlapping genes with known genealogy, it separated ancestral from de novo frames with an accuracy close to 100%. This high discriminant power was crucial to computationally design variants of de novo viral proteins known to possess selective anticancer toxicity (apoptin) or protection against neurodegeneration (X protein), as well as to detect two new potential overlapping genes in the genome of the new coronavirus SARS-CoV-2.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy.
73. Cao X, Slavoff SA. Non-AUG start codons: Expanding and regulating the small and alternative ORFeome. Exp Cell Res 2020; 391:111973. [PMID: 32209305 DOI: 10.1016/j.yexcr.2020.111973] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 10/01/2019] [Revised: 03/10/2020] [Accepted: 03/18/2020] [Indexed: 01/17/2023]
Abstract
Recent ribosome profiling and proteomic studies have revealed the presence of thousands of novel coding sequences, referred to as small open reading frames (sORFs), in prokaryotic and eukaryotic genomes. These genes have defied discovery via traditional genomic tools not only because they tend to be shorter than standard gene annotation length cutoffs, but also because they are, as a class, enriched in sequence properties previously assumed to be unusual, including non-AUG start codons. In this review, we summarize what is currently known about the incidence, efficiency, and mechanism of non-AUG start codon usage in prokaryotes and eukaryotes, and provide examples of regulatory and functional sORFs that initiate at non-AUG codons. While only a handful of non-AUG-initiated novel genes have been characterized in detail to date, their participation in important biological processes suggests that an improved understanding of this class of genes is needed.
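The two properties highlighted here, shortness relative to annotation cutoffs and non-AUG initiation, can be illustrated with a naive sequence scan. The sketch below is not the method of the cited review. It searches the three forward frames of a DNA sequence for ORFs that start at ATG or at an optional set of near-cognate codons and end at TAA, TAG or TGA, keeping those below an assumed 100-codon cutoff. Sequence alone cannot show that an ORF is translated; that evidence comes from ribosome profiling or proteomics. The function name, defaults and example sequence are hypothetical, and the reverse strand is not scanned.

```python
from typing import Iterable, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, starts: Iterable[str] = ("ATG",),
              min_codons: int = 10, max_codons: int = 100) -> List[Tuple[int, int, str]]:
    """Scan the three forward frames of a DNA sequence for ORFs.

    Returns (start, end, start_codon) tuples with 0-based `start` and exclusive
    `end` (stop codon included). For each stop codon, the most upstream allowed
    start codon in the same frame is used. Only ORFs with at least `min_codons`
    and fewer than `max_codons` codons (stop excluded) are kept.
    """
    seq = seq.upper()
    allowed = set(starts)
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if orf_start is not None:
                    n_codons = (i - orf_start) // 3
                    if min_codons <= n_codons < max_codons:
                        orfs.append((orf_start, i + 3, seq[orf_start:orf_start + 3]))
                orf_start = None
            elif orf_start is None and codon in allowed:
                orf_start = i
    return sorted(orfs)


seq = "CTGAAACCCTAA"
print(find_orfs(seq, min_codons=1))                         # AUG only: nothing found
print(find_orfs(seq, starts=("ATG", "CTG"), min_codons=1))  # CTG-initiated ORF
```

Allowing near-cognate starts can only add ORFs or extend them upstream. This is one reason why annotation pipelines restricted to AUG can miss such genes.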
Affiliation(s)
- Xiongwen Cao
- Department of Chemistry, Yale University, New Haven, CT, 06520, United States; Chemical Biology Institute, Yale University, West Haven, CT, 06516, United States
- Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, CT, 06520, United States; Chemical Biology Institute, Yale University, West Haven, CT, 06516, United States; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06529, United States.
74. Dubois ML, Meller A, Samandi S, Brunelle M, Frion J, Brunet MA, Toupin A, Beaudoin MC, Jacques JF, Lévesque D, Scott MS, Lavigne P, Roucou X, Boisvert FM. UBB pseudogene 4 encodes functional ubiquitin variants. Nat Commun 2020; 11:1306. [PMID: 32161257 PMCID: PMC7066184 DOI: 10.1038/s41467-020-15090-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 07/30/2019] [Accepted: 02/18/2020] [Indexed: 02/06/2023]
Abstract
Pseudogenes are mutated copies of protein-coding genes that cannot be translated into proteins, but a small subset of pseudogenes has been detected at the protein level. Although ubiquitin pseudogenes represent one of the most abundant pseudogene families in many organisms, little is known about their expression and signaling potential. By re-analyzing public RNA-sequencing and proteomics datasets, we here provide evidence for the expression of several ubiquitin pseudogenes including UBB pseudogene 4 (UBBP4), which encodes UbKEKS (Q2K, K33E, Q49K, N60S). The functional consequences of UbKEKS conjugation appear to differ from canonical ubiquitylation. Quantitative proteomics shows that UbKEKS modifies specific proteins including lamins. Knockout of UBBP4 results in slower cell division, and accumulation of lamin A within the nucleolus. Our work suggests that a subset of proteins reported as ubiquitin targets may instead be modified by ubiquitin variants that are the products of wrongly annotated pseudogenes and induce different functional effects. Ubiquitin pseudogenes are present in many organisms but whether they encode functional proteins has remained unclear. Here, the authors show that human UBB pseudogene 4 produces ubiquitin variants with amino acid compositions and cellular functions that are distinct from canonical ubiquitin.
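The variant notation used for UbKEKS (Q2K means that the glutamine at position 2 is replaced by lysine) can be checked against the reference sequence. A minimal sketch, assuming positions are numbered on the 76-residue mature human ubiquitin starting at Met1. The helper `apply_substitutions` is hypothetical and handles only single-residue substitutions.

```python
import re
from typing import List

# Mature human ubiquitin, 76 residues; positions below are numbered from Met1.
UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
             "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, subs: List[str]) -> str:
    """Apply point substitutions written like 'Q2K' (wild type, 1-based position, mutant)."""
    residues = list(seq)
    for sub in subs:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognised substitution: {sub}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position outside sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{sub}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)


ub_keks = apply_substitutions(UBIQUITIN, ["Q2K", "K33E", "Q49K", "N60S"])
print(ub_keks)
```

Each substitution is checked against the expected wild-type residue, so a mis-numbered or reversed entry (for example "K2Q") raises an error instead of silently producing a wrong sequence.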
Affiliation(s)
- Anna Meller
- Department of Immunology and Cell Biology, Sherbrooke, QC, Canada
- Sondos Samandi
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada
- Mylène Brunelle
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada
- Julie Frion
- Department of Immunology and Cell Biology, Sherbrooke, QC, Canada
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada
- Amanda Toupin
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada
- Maxime C Beaudoin
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada
- Michelle S Scott
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada
- Pierre Lavigne
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Sherbrooke, QC, Canada.
75. Cardon T, Hervé F, Delcourt V, Roucou X, Salzet M, Franck J, Fournier I. Optimized Sample Preparation Workflow for Improved Identification of Ghost Proteins. Anal Chem 2019; 92:1122-1129. [PMID: 31829555 DOI: 10.1021/acs.analchem.9b04188] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/30/2022]
Abstract
Large-scale proteomic strategies rely on database interrogation, so only referenced proteins can be identified. Recently, alternative proteins (AltProts) translated from non-annotated alternative open reading frames (AltORFs) were discovered using customized databases. Because their small size confers peptide-like physicochemical properties, they are more difficult to detect using standard proteomics strategies. In this study, we tested different sample preparation workflows for improving the identification of AltProts in the NCH82 human glioma cell line. The highest number of identified AltProts was achieved with RIPA buffer or boiling water extraction followed by acetic acid precipitation.
Affiliation(s)
- Tristan Cardon
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Flore Hervé
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Vivian Delcourt
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Xavier Roucou
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Michel Salzet
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Institut Universitaire de France (IUF), Paris, France
- Julien Franck
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Isabelle Fournier
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France; Institut Universitaire de France (IUF), Paris, France
76. The multiverse nature of epithelial to mesenchymal transition. Semin Cancer Biol 2019; 58:1-10. [DOI: 10.1016/j.semcancer.2018.11.004] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 09/29/2018] [Revised: 11/09/2018] [Accepted: 11/15/2018] [Indexed: 12/13/2022]
77
Michel AM, Kiniry SJ, O'Connor PBF, Mullan JP, Baranov PV. GWIPS-viz: 2018 update. Nucleic Acids Res 2018; 46:D823-D830. [PMID: 28977460 PMCID: PMC5753223 DOI: 10.1093/nar/gkx790] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/08/2017] [Accepted: 08/29/2017] [Indexed: 12/15/2022]
Abstract
The GWIPS-viz browser (http://gwips.ucc.ie/) is an online genome browser tailored for exploring ribosome profiling (Ribo-seq) data. Since its publication in 2014, GWIPS-viz has added Ribo-seq data for 14 more genomes, bringing the current total to 23. The integration of new Ribo-seq data has been automated, increasing the number of available tracks to 1792, a 10-fold increase in the last three years. The increase is particularly substantial for data derived from human sources. Following user requests, we added the functionality to download these tracks in bigWig format. We also incorporated new types of data (e.g. TCP-seq) as well as auxiliary tracks from other sources that help with the interpretation of Ribo-seq data. The visualization of the data has been improved, particularly for bacterial genomes, where Ribo-seq data are now shown in a strand-specific manner. For higher eukaryotic datasets, we provide characteristics of individual datasets computed with the RUST program, including triplet periodicity, sequencing biases and relative inferred A-site dwell times. This information can be used to assess the quality of Ribo-seq datasets. To strengthen the signal, we aggregate Ribo-seq data from several studies into Global aggregate tracks for each genome.
Affiliation(s)
- Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- James P Mullan
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
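The GWIPS-viz abstract above cites triplet periodicity, computed with RUST, as one indicator of Ribo-seq dataset quality. The sketch below is not RUST. It is a minimal illustration of the underlying idea under stated assumptions: if footprint P-site positions (assumed to be offset-corrected already) are tallied by reading frame relative to annotated CDS starts, a well-phased dataset places most of them in frame 0. The function name `frame_fractions`, the transcript IDs and the coordinates are all hypothetical.

```python
from collections import Counter


def frame_fractions(psite_positions, cds_starts):
    """Fraction of P-site positions falling in reading frames 0, 1 and 2.

    psite_positions: list of (transcript_id, position) pairs, 0-based,
        assumed to be offset-corrected P-site positions (hypothetical input).
    cds_starts: dict mapping transcript_id to its 0-based CDS start.
    Positions upstream of the CDS start, or on unknown transcripts, are ignored.
    """
    counts = Counter()
    for tx, pos in psite_positions:
        start = cds_starts.get(tx)
        if start is None or pos < start:
            continue
        counts[(pos - start) % 3] += 1  # frame relative to the start codon
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


# Hypothetical toy data: most P-sites fall in frame 0.
reads = [("tx1", 10), ("tx1", 13), ("tx1", 16), ("tx1", 20), ("tx2", 5), ("tx2", 8)]
starts = {"tx1": 10, "tx2": 5}
print(frame_fractions(reads, starts))
```

With the toy data, five of the six P-sites fall in frame 0 and one falls in frame 1. Real pipelines usually also exclude positions past the stop codon, stratify by read length and choose the P-site offset per length. Those steps are omitted here.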
78
Fesenko I, Kirov I, Kniazev A, Khazigaleeva R, Lazarev V, Kharlampieva D, Grafskaia E, Zgoda V, Butenko I, Arapidi G, Mamaeva A, Ivanov V, Govorun V. Distinct types of short open reading frames are translated in plant cells. Genome Res 2019; 29:1464-1477. [PMID: 31387879 PMCID: PMC6724668 DOI: 10.1101/gr.253302.119] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 06/04/2019] [Accepted: 08/01/2019] [Indexed: 02/07/2023]
Abstract
Genomes contain millions of short (<100 codons) open reading frames (sORFs), which are usually dismissed during gene annotation. Nevertheless, peptides encoded by such sORFs can play important biological roles, and their impact on cellular processes has long been underestimated. Here, we analyzed approximately 70,000 transcribed sORFs in the model plant Physcomitrella patens (moss). The moss genome contains several distinct classes of sORFs that differ in their position on transcripts and their level of evolutionary conservation. Over 5000 sORFs were conserved in at least one of 10 plant species examined. Mass spectrometry analysis of proteomic and peptidomic data sets suggested that tens of sORFs located on distinct parts of mRNAs and long noncoding RNAs (lncRNAs) are translated, including conserved sORFs. Translational analysis of the sORFs and main ORFs at a single locus suggested the existence of genes that code for multiple proteins and peptides with tissue-specific expression. Functional analysis of four lncRNA-encoded peptides showed that sORF-encoded peptides are involved in regulating growth and differentiation in moss. Knocking out lncRNA-encoded peptides reduced moss growth, whereas overexpressing these peptides produced a diverse range of phenotypic effects. Our results thus open new avenues for discovering novel, biologically active peptides in the plant kingdom.
Affiliation(s)
- Igor Fesenko
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Ilya Kirov
- Laboratory of marker-assisted and genomic selection of plants, All-Russian Research Institute of Agricultural Biotechnology, 127550 Moscow, Russian Federation
- Andrey Kniazev
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Regina Khazigaleeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vassili Lazarev
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation; Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Daria Kharlampieva
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Ekaterina Grafskaia
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation; Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Viktor Zgoda
- Laboratory of System Biology, Institute of Biomedical Chemistry, 119121 Moscow, Russian Federation
- Ivan Butenko
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Georgy Arapidi
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation; Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Anna Mamaeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Ivanov
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Govorun
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
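The Fesenko et al. abstract above defines sORFs as ORFs shorter than 100 codons. The sketch below shows, in minimal form, how such candidates can be enumerated from a transcript sequence. It scans the three forward frames for ATG-initiated ORFs and keeps those below a codon cutoff. This is a simplification under stated assumptions: it covers the forward strand only and ATG starts only (near-cognate starts are ignored), and for each stop codon it reports only the most upstream in-frame ATG. The name `find_sorfs` and its parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=100, min_codons=1):
    """Return (start, end, n_codons) tuples for ATG-initiated ORFs on the
    forward strand whose length, excluding the stop codon, is at least
    min_codons and below max_codons.

    start is 0-based; end is exclusive and includes the stop codon.
    For each stop codon only the most upstream in-frame ATG is used, so
    nested ORFs sharing that stop are not reported separately.
    ORFs that run off the end without an in-frame stop are ignored.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        # Step through complete codons in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3  # start codon included, stop excluded
                if min_codons <= n_codons < max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None
    return sorted(hits)


print(find_sorfs("CCATGAAATTTTAGCC"))  # [(2, 14, 3)]
```

For prokaryotic genomes, the Khitun et al. abstract in this list cites a 50-amino-acid threshold. Here that would correspond to `max_codons=50`.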
79
Nielly-Thibault L, Landry CR. Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases. Genetics 2019; 212:1353-1366. [PMID: 31227545 PMCID: PMC6707459 DOI: 10.1534/genetics.119.302187] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/18/2019] [Accepted: 06/14/2019] [Indexed: 12/03/2022]
Abstract
Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
Affiliation(s)
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
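The Nielly-Thibault and Landry abstract mentions predicting the mean length of novel functional proteins from genomic GC content. Their model also includes an evolutionary parameter and is not reproduced here. As a simpler baseline, the sketch below computes the expected number of codons before the first stop codon in random sequence. It assumes independent nucleotides with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2. Under that assumption the per-codon stop probability is p = P(TAA) + P(TAG) + P(TGA), and the run of sense codons before a stop is geometric with mean (1 - p)/p. The function names are hypothetical.

```python
def stop_codon_probability(gc):
    """Probability that a random codon is TAA, TAG or TGA, assuming
    independent nucleotides with GC fraction `gc` (0 <= gc <= 1)."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    p_at = (1.0 - gc) / 2.0  # P(A) = P(T)
    p_gc = gc / 2.0          # P(G) = P(C)
    p_taa = p_at * p_at * p_at
    p_tag = p_at * p_at * p_gc
    p_tga = p_at * p_gc * p_at
    return p_taa + p_tag + p_tga


def mean_codons_before_stop(gc):
    """Mean number of sense codons before the first in-frame stop codon,
    modelling stops as a geometric process with per-codon probability p."""
    p = stop_codon_probability(gc)
    if p == 0.0:
        return float("inf")  # gc == 1: no A or T, so no stop codon can occur
    return (1.0 - p) / p


for gc in (0.3, 0.4, 0.5, 0.6):
    print(f"GC={gc:.1f}: ~{mean_codons_before_stop(gc):.1f} codons before a stop")
```

Because stop codons are AT-rich, higher GC content lowers the stop probability. The expected length of random ORFs therefore grows with GC content. This is the kind of neutral expectation against which properties of young proteins can be compared.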
80
Cardon T, Salzet M, Franck J, Fournier I. Nuclei of HeLa cells interactomes unravel a network of ghost proteins involved in proteins translation. Biochim Biophys Acta Gen Subj 2019; 1863:1458-1470. [PMID: 31128158 DOI: 10.1016/j.bbagen.2019.05.009] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/19/2018] [Revised: 04/18/2019] [Accepted: 05/14/2019] [Indexed: 11/29/2022]
Abstract
Ghost proteins are derived from alternative Open Reading Frames (ORFs) and lack genome annotation. Historical filters applied to detect putatively translated ORFs misclassified transcripts as non-coding, although the proteins they encode can be detected by proteomics. This Ghost (also called Alternative) proteome has been neglected, and a major issue is to identify the involvement of Ghost proteins in biological processes. In this context, we aimed to identify the protein-protein interactions (PPIs) of Ghost proteins. To do so, we re-explored a cross-linking mass spectrometry (XL-MS) study performed on nuclei of HeLa cells, using the HaltOrf database. Of the 1679 cross-link interactions identified, 292 involve Ghost proteins. Forty-four of these Ghost proteins interact with 7 Reference proteins related to a network of ribonucleoproteins, ribosome subunits and zinc finger proteins. We thus focused our attention on the heterotrimer formed by ARE/poly(U)-binding/degradation factor 1 (AUF1), ribosomal protein L10 (RPL10) and AltATAD2. Using the I-TASSER software, we built docking models suggesting that AUF1 attaches to the external part of RPL10 and that AltATAD2 interacts with the RPL10 region that binds 5S ribosomal RNA, as a mechanism of ribosome regulation. Taken together, these results reveal the importance of Ghost proteins within known protein interaction networks.
Affiliation(s)
- Tristan Cardon
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Michel Salzet
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Julien Franck
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Isabelle Fournier
- Inserm, U1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
81
Pavesi A. Asymmetric evolution in viral overlapping genes is a source of selective protein adaptation. Virology 2019; 532:39-47. [PMID: 31004987 PMCID: PMC7125799 DOI: 10.1016/j.virol.2019.03.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/03/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/29/2022]
Abstract
Overlapping genes represent an intriguing puzzle, as they encode two proteins whose ability to evolve is mutually constrained. Overlapping genes can undergo “symmetric evolution” (similar selection pressures on the two proteins) or “asymmetric evolution” (significantly different selection pressures on the two proteins). By sequence analysis of 75 pairs of homologous viral overlapping genes, I evaluated their accordance with one model or the other. Analysis of nucleotide and amino acid sequences revealed that half of the overlaps undergo asymmetric evolution, as the protein from one frame shows a significantly higher number of substitutions than the protein from the other frame. Interestingly, in all cases examined, the more variable protein (often known to interact with host proteins) appeared to be encoded by the de novo frame. These findings suggest that overlapping genes, besides increasing the coding capacity of viruses, are also a source of selective protein adaptation. A dataset of 80 pairs of homologous overlapping genes from viruses is examined. Its analysis reveals that half of overlapping genes undergo asymmetric evolution. The most variable gene product is the one encoded by the de novo overlapping gene. Overlapping genes evolving asymmetrically are a source of selective protein adaptation.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11/A, I-43124, Parma, Italy.
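The Pavesi abstract above explains that the two proteins encoded by an overlapping gene constrain each other's evolution. The minimal sketch below shows why. Both frames read the same nucleotide substitution, so a change that is synonymous in one frame can alter an amino acid in the other. The toy sequences and the helper names `translate` and `aa_differences` are hypothetical. The code uses the standard genetic code and accepts only A, C, G and T. It does not implement the substitution analysis used in the study.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate the complete codons of `seq` starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def aa_differences(ref, alt, frame):
    """Count amino acid differences between two equal-length sequences in one frame."""
    return sum(a != b for a, b in zip(translate(ref, frame), translate(alt, frame)))


ref = "ATGGCTGCAAAG"
alt = "ATGGCCGCAAAG"  # T->C at position 5: GCT->GCC (Ala->Ala) in frame 0
print(translate(ref, 0), translate(alt, 0))  # MAAK MAAK
print(translate(ref, 1), translate(alt, 1))  # WLQ WPQ
print(aa_differences(ref, alt, 0), aa_differences(ref, alt, 1))  # 0 1
```

In frame 0 the substitution is silent (Ala stays Ala). The overlapping frame +1 reads the same nucleotide in CTG -> CCG, which turns Leu into Pro.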
82
Hao Y, Zhang L, Niu Y, Cai T, Luo J, He S, Zhang B, Zhang D, Qin Y, Yang F, Chen R. SmProt: a database of small proteins encoded by annotated coding and non-coding RNA loci. Brief Bioinform 2018; 19:636-643. [PMID: 28137767 DOI: 10.1093/bib/bbx005] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 11/22/2016] [Indexed: 11/12/2022]
Abstract
Small proteins is the general term for proteins shorter than 100 amino acids. Identification and functional studies of small proteins have advanced rapidly in recent years, and several studies have shown that small proteins play important roles in diverse functions, including development, muscle contraction and DNA repair. Identifying and characterizing previously unrecognized small proteins may contribute in important ways to cell biology and human health. Current databases are generally deficient in that they either have not collected small proteins systematically or contain only predictions of small proteins in a limited number of tissues and species. Here, we present a specifically designed, web-accessible database, the small proteins database (SmProt, http://bioinfo.ibp.ac.cn/SmProt), which documents small proteins. The current release of SmProt incorporates 255,010 small proteins computationally or experimentally identified in 291 cell lines/tissues derived from eight popular species. The database provides a variety of data, including basic information (sequence, location, gene name, organism, etc.) as well as specific information (experiment, function, disease type, etc.). To facilitate data extraction, SmProt supports multiple search options, including species, genome location, gene name and aliases, cell lines/tissues, ORF type, gene type, PubMed ID and SmProt ID. SmProt also incorporates a BLAST alignment search service and provides a local UCSC Genome Browser. Additionally, SmProt defines a high-confidence set of small proteins and predicts their functions.
Affiliation(s)
- Yajing Hao
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Lili Zhang
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Yiwei Niu
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Tanxi Cai
- Key Laboratory of Protein and Peptide Pharmaceuticals and Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Jianjun Luo
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Shunmin He
- Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Bao Zhang
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Dejiu Zhang
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Yan Qin
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Fuquan Yang
- Key Laboratory of Protein and Peptide Pharmaceuticals and Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Runsheng Chen
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
83
Weaver J, Mohammad F, Buskirk AR, Storz G. Identifying Small Proteins by Ribosome Profiling with Stalled Initiation Complexes. mBio 2019; 10:e02819-18. [PMID: 30837344 PMCID: PMC6401488 DOI: 10.1128/mbio.02819-18] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Received: 12/15/2018] [Accepted: 01/24/2019] [Indexed: 11/20/2022]
Abstract
Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the total number of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins that takes advantage of recent advances in ribosome profiling, in which antibiotics are used to trap newly initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. The corresponding genes are not only intergenic but are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs), some of them impacting the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context.
IMPORTANCE Proteins composed of 50 or fewer amino acids have been shown to interact with and modulate the functions of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown, as their small size places serious limitations on their identification, purification, and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.
Affiliation(s)
- Jeremy Weaver
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
- Fuad Mohammad
- Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Allen R Buskirk
- Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
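The Weaver et al. abstract describes trapping newly initiated ribosomes at start codons, so initiation sites show up as footprint peaks. The sketch below illustrates a peak-calling step in minimal form and rests on several assumptions. Per-position footprint counts are taken to be offset-corrected already, so that a stalled initiation complex maps to the first nucleotide of its start codon. Candidate starts are ATG, GTG and TTG. A site is called when its count reaches a minimum and also exceeds the mean of a surrounding window by a fold-change cutoff. The thresholds, the window size and the function name are hypothetical choices, not the criteria used in the study.

```python
START_CODONS = {"ATG", "GTG", "TTG"}


def call_initiation_sites(seq, counts, min_reads=5, min_fold=10.0, window=25):
    """Return 0-based positions of candidate start codons with a footprint peak.

    counts[i] is the number of offset-corrected footprints at position i.
    A site is called when counts[i] >= min_reads and counts[i] is at least
    min_fold times the mean count of the other positions within +/- window.
    All thresholds are illustrative defaults, not published criteria.
    """
    if len(seq) != len(counts):
        raise ValueError("seq and counts must have the same length")
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in START_CODONS or counts[i] < min_reads:
            continue
        lo, hi = max(0, i - window), min(len(counts), i + window + 1)
        background = [counts[j] for j in range(lo, hi) if j != i]
        mean_bg = sum(background) / len(background) if background else 0.0
        # With zero background, any site passing min_reads counts as a peak.
        if mean_bg == 0.0 or counts[i] / mean_bg >= min_fold:
            sites.append(i)
    return sites


# Toy example: an ATG at position 10 with a strong peak, a GTG at 23 without one.
seq = "A" * 10 + "ATG" + "A" * 10 + "GTG" + "A" * 10
counts = [1] * len(seq)
counts[10] = 50
counts[23] = 3
print(call_initiation_sites(seq, counts))  # [10]
```

A fold-over-local-background rule is used here only because it is simple to read. Real analyses typically also normalize for sequencing depth and compare against untreated samples.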
84
Khitun A, Ness TJ, Slavoff SA. Small open reading frames and cellular stress responses. Mol Omics 2019; 15:108-116. [PMID: 30810554 DOI: 10.1039/c8mo00283e] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 12/11/2022]
Abstract
Small open reading frames (smORFs) encoding polypeptides of less than 100 amino acids in eukaryotes (50 amino acids in prokaryotes) were historically excluded from genome annotation. However, recent advances in genomics, ribosome footprinting, and proteomics have revealed thousands of translated smORFs in genomes spanning evolutionary space. These smORFs can encode functional polypeptides, or act as cis-translational regulators. Herein we review evidence that some smORF-encoded polypeptides (SEPs) participate in stress responses in both prokaryotes and eukaryotes, and that some upstream ORFs (uORFs) regulate stress-responsive translation of downstream cistrons in eukaryotic cells. These studies provide insight into a regulated subclass of smORFs and suggest that at least some SEPs may participate in maintenance of cellular homeostasis under stress.
Affiliation(s)
- Alexandra Khitun
- Chemical Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Travis J Ness
- Chemical Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Sarah A Slavoff
- Chemical Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Chemistry, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
85
Sapkota D, Lake AM, Yang W, Yang C, Wesseling H, Guise A, Uncu C, Dalal JS, Kraft AW, Lee JM, Sands MS, Steen JA, Dougherty JD. Cell-Type-Specific Profiling of Alternative Translation Identifies Regulated Protein Isoform Variation in the Mouse Brain. Cell Rep 2019; 26:594-607.e7. [PMID: 30650354 PMCID: PMC6392083 DOI: 10.1016/j.celrep.2018.12.077] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/21/2018] [Revised: 10/23/2018] [Accepted: 12/18/2018] [Indexed: 12/27/2022]
Abstract
In a few well-studied cases, alternative translation initiation and stop codon readthrough have been shown to allow the same transcript to generate multiple protein variants. Because the brain makes particularly abundant use of alternative splicing, we sought to study alternative translation in CNS cells. We show that alternative translation is widespread and regulated across brain transcripts. In neural cultures, we identify alternative initiation on hundreds of transcripts, confirm several N-terminal protein variants, and show that the phenomenon is modulated by KCl stimulation. We also detect readthrough in cultures and show differential levels of normal and readthrough versions of AQP4 in gliotic diseases. Finally, we couple translating ribosome affinity purification to ribosome footprinting (TRAP-RF) for cell-type-specific analysis of neuronal and astrocytic translational readthrough in the mouse brain. We demonstrate that this unappreciated mechanism generates numerous and diverse protein isoforms in a cell-type-specific manner in the brain.
Affiliation(s)
- Darshan Sapkota
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
- Allison M Lake
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
- Wei Yang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Chengran Yang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
- Hendrik Wesseling
- Boston Children's Hospital, F.M. Kirby Center for Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Amanda Guise
- Boston Children's Hospital, F.M. Kirby Center for Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Ceren Uncu
- Boston Children's Hospital, F.M. Kirby Center for Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Jasbir S Dalal
- Boston Children's Hospital, F.M. Kirby Center for Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Andrew W Kraft
- Departments of Neurology, Radiology, and Biomedical Engineering, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jin-Moo Lee
- Departments of Neurology, Radiology, and Biomedical Engineering, Washington University School of Medicine, St. Louis, MO 63110, USA
- Mark S Sands
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63112, USA
- Judith A Steen
- Boston Children's Hospital, F.M. Kirby Center for Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
86
Wang T, Liu Y, Liu Q, Cummins S, Zhao M. Integrative proteomic analysis reveals potential high-frequency alternative open reading frame-encoded peptides in human colorectal cancer. Life Sci 2018; 215:182-189. [PMID: 30419281 DOI: 10.1016/j.lfs.2018.11.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/25/2018] [Revised: 10/31/2018] [Accepted: 11/08/2018] [Indexed: 11/30/2022]
Abstract
Identification of alternative open reading frame-encoded peptides (AEPs) for the diagnosis of colorectal cancer at the proteome level is largely unexplored because comprehensive proteomics data have been lacking. Here, we performed a comprehensive integrative analysis of mass spectral data published by the Clinical Proteomic Tumor Analysis Consortium and characterized 93 high-confidence AEPs encoded within 75 genes. Four cancer-related genes, ABCF2, AR, RBM10 and NRG1, had AEPs identified frequently, in >20 of 95 colorectal cancer samples. Further network analysis of the identified AEPs found an enrichment of novel AEPs within the androgen receptor and a highly modularised network of 42 genes associated with patient survival. Our results not only suggest a mechanistic view of how AEPs work in cancer progression but also shed light on somatic amino acid mutations in AEPs, which might previously have been overlooked because of their low frequencies. In particular, potential high-frequency mutations in 77 samples associated with EDARADD may contribute to the discovery of new biomarkers and the development of innovative therapeutic approaches.
Affiliation(s)
- Tianfang Wang
- School of Science and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland, 4558, Australia
- Yining Liu
- The School of Public Health, Institute for Chemical Carcinogenesis, Guangzhou Medical University, 195 Dongfengxi Road, Guangzhou 510182, China
- Qi Liu
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37232, United States; Center for Quantitative Sciences, School of Medicine, Vanderbilt University, Nashville, TN 37232, United States
- Scott Cummins
- School of Science and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland, 4558, Australia
- Min Zhao
- School of Science and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland, 4558, Australia
87
Rathore A, Martinez TF, Chu Q, Saghatelian A. Small, but mighty? Searching for human microproteins and their potential for understanding health and disease. Expert Rev Proteomics 2018; 15:963-965. [PMID: 30415582 DOI: 10.1080/14789450.2018.1547194] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Affiliation(s)
- Annie Rathore
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Thomas F Martinez
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Qian Chu
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Alan Saghatelian
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, La Jolla, CA, USA
88
Pavesi A, Vianelli A, Chirico N, Bao Y, Blinkova O, Belshaw R, Firth A, Karlin D. Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes. PLoS One 2018; 13:e0202513. [PMID: 30339683 PMCID: PMC6195259 DOI: 10.1371/journal.pone.0202513] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/22/2018] [Accepted: 08/03/2018] [Indexed: 11/19/2022]
Abstract
Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, where overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and a significant waste of resources. Overlapping genes therefore need to be correctly detected, especially since they are now thought to be abundant in eukaryotes as well. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that, overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may favour the birth of overlapping genes and/or result from selection pressure acting on them.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Alberto Vianelli
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Nicola Chirico
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Yiming Bao
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Olga Blinkova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
- Robert Belshaw
- School of Biomedical & Healthcare Sciences, Plymouth University Peninsula Schools of Medicine and Dentistry (PUPSMD), Plymouth, United Kingdom
- Andrew Firth
- Department of Pathology, Division of Virology, University of Cambridge, Cambridge, United Kingdom
- David Karlin
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Division of Structural Biology, University of Oxford, Oxford, United Kingdom
89
Rathore A, Chu Q, Tan D, Martinez TF, Donaldson CJ, Diedrich JK, Yates JR, Saghatelian A. MIEF1 Microprotein Regulates Mitochondrial Translation. Biochemistry 2018; 57:5564-5575. [PMID: 30215512 DOI: 10.1021/acs.biochem.8b00726] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Indexed: 12/27/2022]
Abstract
Recent technological advances led to the discovery of hundreds to thousands of peptides and small proteins (microproteins) encoded by small open reading frames (smORFs). Characterization of new microproteins demonstrates their role in fundamental biological processes and highlights the value in discovering and characterizing more microproteins. The elucidation of microprotein-protein interactions (MPIs) is useful for determining the biochemical and cellular roles of microproteins. In this study, we characterize the protein interaction partners of mitochondrial elongation factor 1 microprotein (MIEF1-MP) using a proximity labeling strategy that relies on APEX2. MIEF1-MP localizes to the mitochondrial matrix where it interacts with the mitochondrial ribosome (mitoribosome). Functional studies demonstrate that MIEF1-MP regulates mitochondrial translation via its binding to the mitoribosome. Loss of MIEF1-MP decreases the mitochondrial translation rate, while an elevated level of MIEF1-MP increases the translation rate. The identification of MIEF1-MP reveals a new gene involved in this process.
Affiliation(s)
- Annie Rathore
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, United States; Division of Biological Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
- Qian Chu
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Dan Tan
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Thomas F Martinez
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Cynthia J Donaldson
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Jolene K Diedrich
- Mass Spectrometry Core for Proteomics and Metabolomics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, United States; Department of Molecular Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- John R Yates
- Department of Molecular Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- Alan Saghatelian
- Clayton Foundation Laboratories for Peptide Biology, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
90
Delcourt V, Brunelle M, Roy AV, Jacques JF, Salzet M, Fournier I, Roucou X. The Protein Coded by a Short Open Reading Frame, Not by the Annotated Coding Sequence, Is the Main Gene Product of the Dual-Coding Gene MIEF1. Mol Cell Proteomics 2018; 17:2402-2411. [PMID: 30181344 PMCID: PMC6283296 DOI: 10.1074/mcp.ra118.000593] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 01/23/2018] [Revised: 07/19/2018] [Indexed: 12/18/2022]
Abstract
Proteogenomics and ribosome profiling concurrently show that genes may code for both a large protein and one or more small proteins, translated from annotated coding sequences (CDSs) and from unannotated alternative open reading frames (alternative ORFs or altORFs), respectively, but the stoichiometry between the large and small proteins translated from the same gene is unknown. MIEF1, a gene recently identified as dual-coding, harbors a CDS and a newly annotated, actively translated altORF located in the 5′ UTR. Here, we use absolute quantification with stable isotope-labeled peptides and parallel reaction monitoring to determine the levels of both proteins in two human cell lines and in human colon. We report that the main MIEF1 translational product is not the canonical 463-amino-acid MiD51 protein but the small 70-amino-acid alternative MiD51 protein (altMiD51). These results demonstrate the inadequacy of the single-CDS concept and provide a strong argument for incorporating altORFs and small proteins in functional annotations.
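The absolute-quantification step rests on isotope dilution: a known amount of a heavy-labeled peptide is spiked in, and the light (endogenous) to heavy peak-area ratio scales that known amount. As a general sketch, assuming one proteotypic peptide per protein, complete and equivalent digestion, and identical behavior of the light and heavy peptides (this is not necessarily the exact computation of Delcourt et al.), the endogenous amount and the altMiD51:MiD51 stoichiometry follow as below, where A_L and A_H are the light and heavy peak areas and n_spike is the spiked amount of heavy peptide:

```latex
\begin{aligned}
n_{\text{endo}} &= \frac{A_{L}}{A_{H}}\, n_{\text{spike}} \\[4pt]
\frac{n_{\text{altMiD51}}}{n_{\text{MiD51}}}
  &= \frac{\left(A_{L}/A_{H}\right)_{\text{altMiD51}}\, n_{\text{spike,altMiD51}}}
          {\left(A_{L}/A_{H}\right)_{\text{MiD51}}\, n_{\text{spike,MiD51}}}
\end{aligned}
```

A ratio greater than 1 corresponds to the abstract's finding that altMiD51, rather than MiD51, is the main MIEF1 product.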
Affiliation(s)
- Vivian Delcourt
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Mylène Brunelle
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Annie V Roy
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Jean-François Jacques
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Michel Salzet
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France
- Isabelle Fournier
- Univ. Lille, INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire and Spectrométrie de Masse (PRISM) F-59000 Lille, France
- Xavier Roucou
- Département de Biochimie, Université de Sherbrooke, Québec, Canada; PROTEO, Québec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
91
Dermit M, Dodel M, Mardakheh FK. Methods for monitoring and measurement of protein translation in time and space. Mol Biosyst 2018; 13:2477-2488. [PMID: 29051942 PMCID: PMC5795484 DOI: 10.1039/c7mb00476a] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/16/2022]
Abstract
Regulation of protein translation constitutes a crucial step in the control of gene expression. In comparison to transcriptional regulation, however, translational control has remained a significantly under-studied layer of gene expression. This trend is now beginning to shift thanks to recent advances in next-generation sequencing, proteomics, and microscopy-based methodologies, which allow accurate monitoring of protein translation rates, from single target messenger RNA molecules to genome-wide studies. In this review, we summarize these recent advances and discuss how they are enabling researchers to study translational regulation in a wide variety of in vitro and in vivo biological systems, with unprecedented depth and spatiotemporal resolution.
Affiliation(s)
- Maria Dermit
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, John Vane Science Centre, Charterhouse Square, London EC1M 6BQ, UK.
- Martin Dodel
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, John Vane Science Centre, Charterhouse Square, London EC1M 6BQ, UK.
- Faraz K Mardakheh
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, John Vane Science Centre, Charterhouse Square, London EC1M 6BQ, UK.
92
93
Hollerer I, Higdon A, Brar GA. Strategies and Challenges in Identifying Function for Thousands of sORF-Encoded Peptides in Meiosis. Proteomics 2018; 18:e1700274. [PMID: 28929627 PMCID: PMC6135095 DOI: 10.1002/pmic.201700274] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/14/2017] [Indexed: 11/11/2022]
Abstract
Recent genomic analyses have revealed pervasive translation from formerly unrecognized short open reading frames (sORFs) during yeast meiosis. Despite their short length, which has caused these regions to be systematically overlooked by traditional gene annotation approaches, meiotic sORFs share many features with classical genes, implying the potential for similar types of cellular functions. We found that sORF expression accounts for approximately 10-20% of the cellular translation capacity in yeast during meiotic differentiation and occurs within well-defined time windows, suggesting the production of relatively abundant peptides with stage-specific meiotic roles from these regions. Here, we provide arguments supporting this hypothesis and discuss sORF similarities and differences, as a group, to traditional protein coding regions, as well as challenges in defining their specific functions.
Affiliation(s)
- Ina Hollerer
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
- Andrea Higdon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
- Gloria A Brar
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
94
Finkel Y, Stern-Ginossar N, Schwartz M. Viral Short ORFs and Their Possible Functions. Proteomics 2018; 18:e1700255. [PMID: 29150926 PMCID: PMC7167739 DOI: 10.1002/pmic.201700255] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/26/2017] [Revised: 11/06/2017] [Indexed: 12/30/2022]
Abstract
Definition of functional genomic elements is one of the greater challenges of the genomic era. Traditionally, putative short open reading frames (sORFs) coding for fewer than 100 amino acids were disregarded due to computational and experimental limitations; however, it has become clear over the past several years that translation of sORFs is pervasive and serves diverse functions. The development of ribosome profiling, which allows identification of translated sequences genome-wide, revealed widespread, previously unidentified translation events. New computational methodologies as well as improved mass spectrometry approaches have also contributed to the task of annotating translated sORFs in different organisms. Viruses are of special interest due to the selective pressure on their genome size, their rapid and confining evolution, and the potential contribution of novel peptides to the host immune response. Indeed, many functional viral sORFs have been characterized to date, and ribosome profiling analyses suggest that this may be the tip of the iceberg. Our computational analyses of sORFs identified by ribosome profiling in DNA viruses demonstrate that they may be enriched in specific features, implying that at least some of them are functional. The combination of systematic genome editing strategies with synthetic tagging will take us into the next step: elucidation of the biological relevance and function of this intriguing class of molecules.
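The length criterion above (fewer than 100 amino acids) can be illustrated with a naive sequence scan. The Python sketch below is hypothetical: the name `find_sorfs` and the 10-residue minimum are assumptions, not taken from Finkel et al. It scans only the forward strand in three frames, takes the first in-frame ATG after each stop as the start, ignores near-cognate start codons and unterminated ORFs, and uses no translation evidence, whereas the studies discussed here rely on ribosome profiling to decide which sORFs are actually translated.

```python
# Hypothetical sketch: naive forward-strand scan for short ORFs (sORFs).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 100, min_aa: int = 10) -> list[tuple[int, int, int, int]]:
    """Return (start, end, frame, aa_length) for ATG-initiated ORFs with
    min_aa <= aa_length < max_aa. Coordinates are 0-based and end-exclusive
    (the end includes the stop codon); aa_length excludes the stop codon.
    ORFs that run off the end of `seq` without a stop are not reported."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                aa_len = (i - start) // 3  # residues from Met up to the stop
                if min_aa <= aa_len < max_aa:
                    orfs.append((start, i + 3, frame, aa_len))
                start = None  # resume searching for the next ATG downstream
    return orfs
```

For a toy transcript `"CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"`, the scan reports a single 12-residue sORF in frame 2, spanning positions 2 to 41. Real pipelines also scan the reverse complement and filter candidates by conservation or translation evidence.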
Affiliation(s)
- Yaara Finkel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Michal Schwartz
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
95
Brunet MA, Levesque SA, Hunting DJ, Cohen AA, Roucou X. Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship. Genome Res 2018; 28:609-624. [PMID: 29626081 PMCID: PMC5932603 DOI: 10.1101/gr.230938.117] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 11/06/2017] [Accepted: 03/27/2018] [Indexed: 12/12/2022]
Abstract
Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes.
Affiliation(s)
- Marie A Brunet
- Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada; Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
- Sébastien A Levesque
- Pediatric Department, Centre Hospitalier de l'Université de Sherbrooke, Quebec J1H 5N4, Canada
- Darel J Hunting
- Department of Nuclear Medicine & Radiobiology, Université de Sherbrooke, Quebec J1H 5N4, Canada
- Alan A Cohen
- Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
- Xavier Roucou
- Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
96
The influence of transcript assembly on the proteogenomics discovery of microproteins. PLoS One 2018; 13:e0194518. [PMID: 29584760 PMCID: PMC5870951 DOI: 10.1371/journal.pone.0194518] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/14/2017] [Accepted: 03/05/2018] [Indexed: 11/19/2022]
Abstract
Proteogenomics methods have identified many non-annotated protein-coding genes in the human genome. Many of the newly discovered protein-coding genes encode peptides and small proteins, referred to collectively as microproteins. Microproteins are produced through ribosomal translation of small open reading frames (smORFs). The discovery of many smORFs reveals a blind spot in traditional gene-finding algorithms for these genes. Biological studies have found roles for microproteins in cell biology and physiology, and the possibility that additional bioactive microproteins exist drives interest in the detection and discovery of these molecules. A key step in any proteogenomics workflow is the assembly of RNA-Seq data into likely mRNA transcripts that are then used to create a searchable protein database. Here we demonstrate that specific features of the assembled transcriptome affect microprotein detection by shotgun proteomics. By tailoring transcript assembly for downstream mass spectrometry searching, we show that we can detect more than double the number of high-quality microprotein candidates, and we introduce a novel open-source mRNA assembler for proteogenomics (MAPS) that incorporates all of these features. By integrating our specialized assembler, MAPS, and a popular generalized assembler into our proteogenomics pipeline, we detect 45 novel human microproteins from a high-quality proteogenomics dataset of a human cell line. We then characterize the features of the novel microproteins, identifying two classes of microproteins. Our work highlights the importance of specialized transcriptome assembly upstream of proteomics validation when searching for short and potentially rare and poorly conserved proteins.
97
Discovery of coding regions in the human genome by integrated proteogenomics analysis workflow. Nat Commun 2018; 9:903. [PMID: 29500430 PMCID: PMC5834625 DOI: 10.1038/s41467-018-03311-y] [Citation(s) in RCA: 100] [Impact Index Per Article: 14.3] [Received: 10/08/2017] [Accepted: 02/02/2018] [Indexed: 01/23/2023]
Abstract
Proteogenomics enables the discovery of novel peptides (from unannotated genomic protein-coding loci) and single amino acid variant peptides (derived from single-nucleotide polymorphisms and mutations). Increasing the reliability of these identifications is crucial to ensure their usefulness for genome annotation and potential application as neoantigens in cancer immunotherapy. Here we present the integrated proteogenomics analysis workflow (IPAW), which combines peptide discovery, curation, and validation. IPAW includes the SpectrumAI tool for automated inspection of MS/MS spectra, eliminating false identifications of single-residue substitution peptides. We employ IPAW to analyze two proteomics data sets acquired from A431 cells and five normal human tissues using extended (pH range, 3–10) high-resolution isoelectric focusing (HiRIEF) pre-fractionation and TMT-based peptide quantitation. The IPAW results provide evidence for the translation of pseudogenes, lncRNAs, short ORFs, alternative ORFs, N-terminal extensions, and intronic sequences. Moreover, our quantitative analysis indicates that protein production from certain pseudogenes and lncRNAs is tissue specific. Proteogenomics enables the discovery of protein-coding regions and disease-relevant mutations, but their verification remains challenging. Here, the authors combine peptide discovery, curation and validation in an integrated proteogenomics workflow, robustly identifying unknown coding regions and mutations.
98
Khazigaleeva RA, Fesenko IA. Biologically active peptides encoded by small open reading frames. Russ J Bioorg Chem 2018. [DOI: 10.1134/s106816201706005x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
99
Laumont CM, Perreault C. Exploiting non-canonical translation to identify new targets for T cell-based cancer immunotherapy. Cell Mol Life Sci 2018; 75:607-621. [PMID: 28823056 PMCID: PMC11105255 DOI: 10.1007/s00018-017-2628-4] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 06/20/2017] [Revised: 08/03/2017] [Accepted: 08/16/2017] [Indexed: 01/11/2023]
Abstract
Cryptic MHC I-associated peptides (MAPs) are produced via two mechanisms: translation of protein-coding genes in non-canonical reading frames and translation of allegedly non-coding sequences. In general, cryptic MAPs are encoded by relatively short open reading frames whose translation can be regulated at the level of initiation, elongation or termination. In contrast to conventional MAPs, the processing of cryptic MAPs is frequently proteasome independent. The existence of cryptic MAPs derived from allegedly non-coding regions enlarges the scope of CD8 T cell immunosurveillance from a mere ~2% to as much as ~75% of the human genome. Considering that 99% of cancer-specific mutations are located in those allegedly non-coding regions, cryptic MAPs could furthermore represent a particularly rich source of tumor-specific antigens. However, extensive proteogenomic analyses will be required to determine the breadth as well as the temporal and spatial plasticity of the cryptic MAP repertoire in normal and neoplastic cells.
Affiliation(s)
- Céline M Laumont
- Institute for Research in Immunology and Cancer, Université de Montréal, Station Centre-Ville, PO Box 6128, Montreal, QC, H3C 3J7, Canada
- Department of Medicine, Faculty of Medicine, Université de Montréal, Station Centre-Ville, PO Box 6128, Montreal, QC, H3C 3J7, Canada
- Claude Perreault
- Institute for Research in Immunology and Cancer, Université de Montréal, Station Centre-Ville, PO Box 6128, Montreal, QC, H3C 3J7, Canada
- Department of Medicine, Faculty of Medicine, Université de Montréal, Station Centre-Ville, PO Box 6128, Montreal, QC, H3C 3J7, Canada
- Division of Hematology, Hôpital Maisonneuve-Rosemont, 5415 de l'Assomption Boulevard, Montreal, QC, H1T 2M4, Canada
100
Moldován N, Tombácz D, Szűcs A, Csabai Z, Snyder M, Boldogkői Z. Multi-Platform Sequencing Approach Reveals a Novel Transcriptome Profile in Pseudorabies Virus. Front Microbiol 2018; 8:2708. [PMID: 29403453 PMCID: PMC5786565 DOI: 10.3389/fmicb.2017.02708] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 11/17/2017] [Accepted: 12/29/2017] [Indexed: 12/14/2022]
Abstract
Third-generation sequencing is an emerging technology capable of solving several problems that earlier approaches could not, including the identification of transcript isoforms and overlapping transcripts. In this study, we analyzed the pseudorabies virus (PRV) transcriptome using the Oxford Nanopore Technologies MinION, PacBio RS-II, and Illumina HiScanSQ platforms. We also used data from our previous short-read and long-read sequencing studies to compare and confirm the results. Our investigations identified 19 formerly unknown putative protein-coding genes, all of which are 5' truncated forms of earlier annotated longer PRV genes. Additionally, we detected 19 non-coding RNAs, including 5' and 3' truncated transcripts without in-frame ORFs, antisense RNAs, and RNA molecules encoded by parts of the viral genome where no transcription had been detected before. This study has also led to the identification of three complex transcripts and 50 distinct length isoforms, including transcription start and end variants. We also detected 121 novel transcript overlaps and two transcripts that overlap the replication origins of PRV. Furthermore, in silico analysis revealed 145 upstream ORFs, many of which are located on the longer 5' isoforms of the transcripts.
Affiliation(s)
- Norbert Moldován
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Dóra Tombácz
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Attila Szűcs
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Zsolt Csabai
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Michael Snyder
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, United States
- Zsolt Boldogkői
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary