1
Schlesinger D, Dirks C, Navarro C, Lafranchi L, Spinner A, Raja GL, Mun-Sum Tong G, Eirich J, Martinez TF, Elsässer SJ. A large-scale sORF screen identifies putative microproteins involved in cancer cell fitness. iScience 2025; 28:111884. [PMID: 40124493 PMCID: PMC11929002 DOI: 10.1016/j.isci.2025.111884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 10/11/2024] [Accepted: 01/21/2025] [Indexed: 03/25/2025] Open
Abstract
The human genome contains thousands of potentially coding short open reading frames (sORFs). While a growing set of microproteins translated from these sORFs have been demonstrated to mediate important cellular functions, the majority remains uncharacterized. In our study, we performed a high-throughput CRISPR-Cas9 knock-out screen targeting 11,776 sORFs to identify microproteins essential for cancer cell line growth. We show that the CENPBD2P gene encodes a translated sORF and promotes cell fitness. We selected five additional candidate sORFs encoding microproteins between 11 and 63 amino acids in length for further functional assessment. Green fluorescent protein fusion constructs of these microproteins localized to distinct subcellular compartments, and the majority showed reproducible biochemical interaction partners. Studying the fitness and transcriptome of sORF knock-outs and complementation with the corresponding microprotein, we identify rescuable phenotypes while also illustrating the limitations and caveats of our pipeline for sORF functional screening and characterization.
Affiliation(s)
- Dörte Schlesinger
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
  - Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, 17165 Stockholm, Sweden
- Christopher Dirks
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
  - Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, 17165 Stockholm, Sweden
- Carmen Navarro
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
  - Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, 17165 Stockholm, Sweden
- Lorenzo Lafranchi
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
  - Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, 17165 Stockholm, Sweden
- Anna Spinner
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
- Glancis Luzeena Raja
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
- Gregory Mun-Sum Tong
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92617, USA
- Jürgen Eirich
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
  - University of Münster, Institute of Plant Biology and Biotechnology (IBBP), 48143 Münster, Germany
- Thomas Farid Martinez
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA 92617, USA
  - Department of Biological Chemistry, University of California, Irvine, Irvine, CA 92617, USA
  - Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA 92617, USA
- Simon Johannes Elsässer
  - Science for Life Laboratory, Karolinska Institutet, Department of Medical Biochemistry and Biophysics, Division of Genome Biology, 17165 Stockholm, Sweden
  - Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, 17165 Stockholm, Sweden
2
Ajala I, Vanderperre B. Non-canonical ORFs-derived protein products in mitochondria: A multifaceted exploration of their functions in health and disease. Protein Sci 2025; 34:e70053. [PMID: 39969119 PMCID: PMC11837024 DOI: 10.1002/pro.70053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 01/20/2025] [Accepted: 01/22/2025] [Indexed: 02/20/2025]
Abstract
Traditionally, eukaryotic mRNAs were perceived as inherently monocistronic. However, recent insights from ribosome profiling (Ribo-seq) and proteomics studies challenge this paradigm. These investigations reveal that, beyond the currently annotated reference proteins (RefProts), there exist additional proteins known as alternative proteins (AltProts) and small open reading frames derived microproteins encoded in regions of mRNAs previously considered untranslated or in non-coding transcripts. This experimental evidence broadens the spectrum of functional proteins within cells, tissues, and organs, potentially offering crucial insights into biological processes. Notably, a significant proportion of these newly identified AltProts and microproteins demonstrates localization in mitochondria, contributing to the functions of mitochondrial complexes. This review delves into the overlooked realm of the alternative proteome within mitochondria, exploring the role of nuclear or mitochondrial-genome-encoded AltProts and microproteins in physiological and pathological cellular processes.
Affiliation(s)
- Ikram Ajala
  - Department of Biological Sciences, Université du Québec à Montréal, CERMO‐FC Research Center, Montreal, Quebec, Canada
  - Network for Research on Protein Function, Engineering and Applications (PROTEO), Montréal, Quebec, Canada
- Benoît Vanderperre
  - Department of Biological Sciences, Université du Québec à Montréal, CERMO‐FC Research Center, Montreal, Quebec, Canada
  - Network for Research on Protein Function, Engineering and Applications (PROTEO), Montréal, Quebec, Canada
3
Baena-Angulo C, Platero AI, Couso JP. Cis to trans: small ORF functions emerging through evolution. Trends Genet 2025; 41:119-131. [PMID: 39603921 DOI: 10.1016/j.tig.2024.10.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 10/14/2024] [Accepted: 10/28/2024] [Indexed: 11/29/2024]
Abstract
Hundreds of thousands of small open reading frames (smORFs) of less than 100 codons exist in every genome, especially in long noncoding RNAs (lncRNAs) and in the 5' leaders of mRNAs. smORFs are often discarded as nonfunctional, but ribosomal profiling (RiboSeq) reveals that thousands are translated, while characterised smORF functions have risen from anecdotal to identifiable trends: smORFs can either have a cis-noncoding regulatory function (involving low translation of nonfunctional peptides) or full coding function mediated by robustly translated peptides, often having cellular and physiological roles as membrane-associated regulators of canonical proteins. The evolutionary context reveals that many smORFs represent new genes emerging de novo from noncoding sequences. We suggest a mechanism for this process, where cis-noncoding smORF functions provide niches for the subsequent evolution of full peptide functions.
Affiliation(s)
- Casimiro Baena-Angulo
  - Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera Km1, Sevilla 41013, Spain
- Ana Isabel Platero
  - Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera Km1, Sevilla 41013, Spain
- Juan Pablo Couso
  - Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera Km1, Sevilla 41013, Spain
4
Hofman DA, Prensner JR, van Heesch S. Microproteins in cancer: identification, biological functions, and clinical implications. Trends Genet 2025; 41:146-161. [PMID: 39379206 PMCID: PMC11794034 DOI: 10.1016/j.tig.2024.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Revised: 08/19/2024] [Accepted: 09/17/2024] [Indexed: 10/10/2024]
Abstract
Cancer continues to be a major global health challenge, accounting for 10 million deaths annually worldwide. Since the inception of genome-wide cancer sequencing studies 20 years ago, a core set of ~700 oncogenes and tumor suppressor genes has become the basis for cancer research. However, this research has been based largely on an understanding that the human genome encodes ~19 500 protein-coding genes. Complementing this genomic landscape, recent advances have described numerous microproteins which are now poised to redefine our understanding of oncogenic processes and open new avenues for therapeutic intervention. This review explores the emerging evidence for microprotein involvement in cancer mechanisms and discusses potential therapeutic applications, with an emphasis on highlighting recent advances in the field.
Affiliation(s)
- Damon A Hofman
  - Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584, CS, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- John R Prensner
  - Department of Pediatrics, Division of Pediatric Hematology/Oncology and Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Sebastiaan van Heesch
  - Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584, CS, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
5
Kochetov AV. Evaluation of Eukaryotic mRNA Coding Potential. Methods Mol Biol 2025; 2859:319-331. [PMID: 39436610 DOI: 10.1007/978-1-0716-4152-1_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
It is widely discussed that eukaryotic mRNAs can encode several functional polypeptides. Recent progress in NGS and proteomics techniques has resulted in a huge volume of information on potential alternative translation initiation sites and open reading frames (altORFs). However, these data are still incomprehensive, and the vast majority of eukaryotic mRNAs annotated in conventional databases (e.g., GenBank) contain a single ORF (CDS) encoding a protein larger than some arbitrary threshold (commonly 100 amino acid residues). Indeed, some gene functions may relate to the polypeptides encoded by unannotated altORFs, and insufficient information in nucleotide sequence databanks may limit the interpretation of genomics and transcriptomics data. However, despite the need for special experiments to predict altORFs accurately, there are some simple methods for their preliminary mapping.
Affiliation(s)
- Alex V Kochetov
  - Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia
  - Novosibirsk State Agrarian University, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
6
Fleck K, Luria V, Garag N, Karger A, Hunter T, Marten D, Phu W, Nam KM, Sestan N, O’Donnell-Luria AH, Erceg J. Functional associations of evolutionarily recent human genes exhibit sensitivity to the 3D genome landscape and disease. bioRxiv 2024:2024.03.17.585403. [PMID: 38559085 PMCID: PMC10980080 DOI: 10.1101/2024.03.17.585403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Genome organization is intricately tied to regulating genes and associated cell fate decisions. Here, we examine the positioning and functional significance of human genes, grouped by their lineage restriction level, within the 3D organization of the genome. We reveal that genes of different lineage restriction levels have distinct positioning relationships with both domains and loop anchors, and remarkably consistent relationships with boundaries across cell types. While the functional associations of each group of genes are primarily cell type-specific, associations of conserved genes maintain greater stability across 3D genomic features and disease than recently evolved genes. Furthermore, the expression of these genes across various tissues follows an evolutionary progression, such that RNA levels increase from young lineage restricted genes to ancient genes present in most species. Thus, the distinct relationships of gene evolutionary age, function, and positioning within 3D genomic features contribute to tissue-specific gene regulation in development and disease.
Affiliation(s)
- Katherine Fleck
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Victor Luria
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Nitanta Garag
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
- Amir Karger
  - IT-Research Computing, Harvard Medical School, Boston, MA 02115, USA
- Trevor Hunter
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
- Daniel Marten
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- William Phu
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Kee-Myoung Nam
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06510, USA
- Nenad Sestan
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Anne H. O’Donnell-Luria
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
- Jelena Erceg
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
7
Whited AM, Jungreis I, Allen J, Cleveland CL, Mudge JM, Kellis M, Rinn JL, Hough LE. Biophysical characterization of high-confidence, small human proteins. Biophys Rep 2024; 4:100167. [PMID: 38909903 PMCID: PMC11305224 DOI: 10.1016/j.bpr.2024.100167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/09/2024] [Accepted: 06/20/2024] [Indexed: 06/25/2024]
Abstract
Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have led to the high-confidence annotation of small open reading frames (smORFs) that encode for functional proteins, producing smORF-encoded proteins (SEPs). SEPs have been found to perform critical functions in several species, including humans. While significant efforts have been made to annotate SEPs, less attention has been given to the biophysical properties of these proteins. We characterized the distributions of predicted and curated biophysical properties, including sequence composition, structure, localization, function, and disease association of a conservative list of previously identified human SEPs. We found significant differences between SEPs and both larger proteins and control sets. In addition, we provide an example of how our characterization of biophysical properties can contribute to distinguishing protein-coding smORFs from noncoding ones in otherwise ambiguous cases.
Affiliation(s)
- A M Whited
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado
- Irwin Jungreis
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- Jeffre Allen
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- John L Rinn
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Loren E Hough
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Physics, University of Colorado Boulder, Boulder, Colorado
8
Daisy Precilla S, Biswas I, Anitha TS, Agieshkumar B. Microproteins unveiling new dimensions in cancer. Funct Integr Genomics 2024; 24:152. [PMID: 39223429 DOI: 10.1007/s10142-024-01426-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 08/08/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024]
Abstract
In the complex landscape of cancer biology, the discovery of microproteins has triggered a paradigm shift, thereby challenging the conventional conceptions of gene regulation. Though overlooked for years, these entities encoded by the small open reading frames (100-150 codons) have a significant impact on various cellular processes. As precision medicine pioneers delve deeper into the genome and proteome, microproteins have come into the limelight. Typically characterized by a single protein domain that directly binds to the target protein complex and regulates its assembly, these microproteins have been shown to play a key role in fundamental biological processes such as RNA processing, DNA repair, and metabolism regulation. Techniques for identification and characterization, such as ribosome profiling and proteogenomic approaches, have unraveled unique mechanisms by which these microproteins regulate cell signaling or pathological processes in most diseases including cancer. However, the functional relevance of these microproteins in cancer remains unclear. In this context, the current review aims to "rethink the essence of these genes" and explore "how these hidden players-microproteins orchestrate the signaling cascades of cancer, both as accelerators and brakes."
Affiliation(s)
- S Daisy Precilla
  - Mahatma Gandhi Medical Advanced Research Institute (MGMARI), Sri Balaji Vidyapeeth, Puducherry, 607 402, India
- Indrani Biswas
  - Mahatma Gandhi Medical Advanced Research Institute (MGMARI), Sri Balaji Vidyapeeth, Puducherry, 607 402, India
- T S Anitha
  - Department of Biochemistry and Molecular Biology, Pondicherry University, Puducherry, 605 014, India
- B Agieshkumar
  - Mahatma Gandhi Medical Advanced Research Institute (MGMARI), Sri Balaji Vidyapeeth, Puducherry, 607 402, India
9
Coorssen JR, Padula MP. Proteomics-The State of the Field: The Definition and Analysis of Proteomes Should Be Based in Reality, Not Convenience. Proteomes 2024; 12:14. [PMID: 38651373 PMCID: PMC11036260 DOI: 10.3390/proteomes12020014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 04/17/2024] [Accepted: 04/17/2024] [Indexed: 04/25/2024] Open
Abstract
With growing recognition and acknowledgement of the genuine complexity of proteomes, we are finally entering the post-proteogenomic era. Routine assessment of proteomes as inferred correlates of gene sequences (i.e., canonical 'proteins') cannot provide the necessary critical analysis of systems-level biology that is needed to understand underlying molecular mechanisms and pathways or identify the most selective biomarkers and therapeutic targets. These critical requirements demand the analysis of proteomes at the level of proteoforms/protein species, the actual active molecular players. Currently, only highly refined integrated or integrative top-down proteomics (iTDP) enables the analytical depth necessary to provide routine, comprehensive, and quantitative proteome assessments across the widest range of proteoforms inherent to native systems. Here we provide a broad perspective of the field, taking in historical and current realities, to establish a more balanced understanding of where the field has come from (in particular during the ten years since Proteomes was launched), current issues, and how things likely need to proceed if necessary deep proteome analyses are to succeed. We base this in our firm belief that the best proteomic analyses reflect, as closely as possible, the native sample at the moment of sampling. We also seek to emphasise that this and future analytical approaches are likely best based on the broad recognition and exploitation of the complementarity of currently successful approaches. This also emphasises the need to continuously evaluate and further optimize established approaches, to avoid complacency in thinking and expectations but also to promote the critical and careful development and introduction of new approaches, most notably those that address proteoforms. 
Above all, we wish to emphasise that a rigorous focus on analytical quality must override current thinking that largely values analytical speed; the latter would certainly be nice, if only proteoforms could thus be effectively, routinely, and quantitatively assessed. Alas, proteomes are composed of proteoforms, not molecular species that can be amplified or that directly mirror genes (i.e., 'canonical'). The problem is hard, and we must accept and address it as such, but the payoff in playing this longer game of rigorous deep proteome analyses is the promise of far more selective biomarkers, drug targets, and truly personalised or even individualised medicine.
Affiliation(s)
- Jens R. Coorssen
  - Department of Biological Sciences, Faculty of Mathematics and Science, Brock University, St. Catharines, ON L2S 3A1, Canada
  - Institute for Globally Distributed Open Research and Education (IGDORE), St. Catharines, ON L2N 4X2, Canada
- Matthew P. Padula
  - School of Life Sciences and Proteomics, Lipidomics and Metabolomics Core Facility, Faculty of Science, University of Technology Sydney, Sydney, NSW 2007, Australia
10
Whited AM, Jungreis I, Allen J, Cleveland CL, Mudge JM, Kellis M, Rinn JL, Hough LE. Biophysical characterization of high-confidence, small human proteins. bioRxiv 2024:2024.04.12.589296. [PMID: 38659920 PMCID: PMC11042228 DOI: 10.1101/2024.04.12.589296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have led to the high-confidence annotation of small open reading frames (smORFs) that encode for functional proteins, producing smORF-encoded proteins (SEPs). SEPs have been found to perform critical functions in several species, including humans. While significant efforts have been made to annotate SEPs, less attention has been given to the biophysical properties of these proteins. We characterized the distributions of predicted and curated biophysical properties, including sequence composition, structure, localization, function, and disease association of a conservative list of previously identified human SEPs. We found significant differences between SEPs and both larger proteins and control sets. Additionally, we provide an example of how our characterization of biophysical properties can contribute to distinguishing protein-coding smORFs from non-coding ones in otherwise ambiguous cases.
Affiliation(s)
- A M Whited
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Irwin Jungreis
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Jeffre Allen
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Biochemistry, University of Colorado Boulder, CO, USA
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- John L Rinn
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Biochemistry, University of Colorado Boulder, CO, USA
- Loren E Hough
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Physics, University of Colorado Boulder, CO, USA
11
Valdivia-Francia F, Sendoel A. No country for old methods: New tools for studying microproteins. iScience 2024; 27:108972. [PMID: 38333695 PMCID: PMC10850755 DOI: 10.1016/j.isci.2024.108972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024] Open
Abstract
Microproteins encoded by small open reading frames (sORFs) have emerged as a fascinating frontier in genomics. Traditionally overlooked due to their small size, recent technological advancements such as ribosome profiling, mass spectrometry-based strategies and advanced computational approaches have led to the annotation of more than 7000 sORFs in the human genome. Despite the vast progress, only a tiny portion of these microproteins have been characterized and an important challenge in the field lies in identifying functionally relevant microproteins and understanding their role in different cellular contexts. In this review, we explore the recent advancements in sORF research, focusing on the new methodologies and computational approaches that have facilitated their identification and functional characterization. Leveraging these new tools holds great promise for dissecting the diverse cellular roles of microproteins and will ultimately pave the way for understanding their role in the pathogenesis of diseases and identifying new therapeutic targets.
Affiliation(s)
- Fabiola Valdivia-Francia
  - University of Zurich, Institute for Regenerative Medicine (IREM), Wagistrasse 12, 8952 Schlieren-Zurich, Switzerland
  - Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich/ETH Zurich, Schlieren-Zurich, Switzerland
- Ataman Sendoel
  - University of Zurich, Institute for Regenerative Medicine (IREM), Wagistrasse 12, 8952 Schlieren-Zurich, Switzerland
12
Saitoh M. Transcriptional regulation of EMT transcription factors in cancer. Semin Cancer Biol 2023; 97:21-29. [PMID: 37802266 DOI: 10.1016/j.semcancer.2023.10.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 06/29/2022] [Revised: 12/01/2022] [Accepted: 10/02/2023] [Indexed: 10/08/2023]
Abstract
The epithelial-mesenchymal transition (EMT) is one of the processes by which epithelial cells transdifferentiate into mesenchymal cells in the developmental stage, known as "complete EMT." In epithelial cancer, EMT, also termed "partial EMT," is associated with invasion, metastasis, and resistance to therapy, and is elicited by several transcription factors, frequently referred to as EMT transcription factors. Among these transcription factors that regulate EMT, ZEB1/2 (ZEB1 and ZEB2), SNAIL, and TWIST play a prominent role in driving the EMT process (hereafter referred to as "EMT-TFs"). Among these, ZEB1/2 show positive correlation with both expression of mesenchymal marker proteins and the aggressiveness of various carcinomas. On the other hand, TWIST and SNAIL are also correlated with the aggressiveness of carcinomas, but are not highly correlated with mesenchymal marker protein expression. Interestingly, these EMT-TFs are not detected simultaneously in any studied cases of aggressive cancers, except for sarcoma. Thus, only one or some of the EMT-TFs are expressed at high levels in cells of aggressive carcinomas. Expression of EMT-TFs is regulated by transforming growth factor-β (TGF-β), a well-established inducer of EMT, in cooperation with other signaling molecules, such as active RAS signals. The focus of this review is the molecular mechanisms by which EMT-TFs are transcriptionally sustained at sufficiently high levels in cells of aggressive carcinomas and upregulated by TGF-β during cancer progression.
Affiliation(s)
- Masao Saitoh
  - Center for Medical Education and Sciences, Graduate School of Medicine, University of Yamanashi, Chuo-city, Yamanashi, Japan
13
Sahgal A, Uversky V, Davé V. Microproteins transitioning into a new Phase: Defining the undefined. Methods 2023; 220:38-54. [PMID: 37890707 DOI: 10.1016/j.ymeth.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/19/2023] [Accepted: 10/21/2023] [Indexed: 10/29/2023] Open
Abstract
Recent advancements in omics technologies have unveiled a hitherto unknown group of short polypeptides called microproteins (miPs). Despite their size, accumulating evidence has demonstrated that miPs exert varied and potent biological functions. They act in paracrine, juxtracrine, and endocrine fashion, maintaining cellular physiology and driving diseases. The present study focuses on biochemical and biophysical analysis and characterization of twenty-four human miPs using distinct computational methods, including RIDAO, AlphaFold2, D2P2, FuzDrop, STRING, and Emboss Pep wheel. miPs often lack well-defined tertiary structures and may harbor intrinsically disordered regions (IDRs) that play pivotal roles in cellular functions. Our analyses define the physicochemical properties of an essential subset of miPs, elucidating their structural characteristics and demonstrating their propensity for driving or participating in liquid-liquid phase separation (LLPS) and intracellular condensate formation. Notably, miPs such as NoBody and pTUNAR revealed a high propensity for LLPS, implicating their potential involvement in forming membrane-less organelles (MLOs) during intracellular LLPS and condensate formation. The results of our study indicate that miPs have functionally profound implications in cellular compartmentalization and signaling processes essential for regulating normal cellular functions. Taken together, our methodological approach explains and highlights the biological importance of these miPs, providing a deeper understanding of the unusual structural landscape and functionality of these newly defined small proteins. Understanding their functions and biological behavior will aid in developing targeted therapies for diseases that involve miPs.
Affiliation(s)
- Aayushi Sahgal
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States; Biotechnology Graduate Program, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States
- Vladimir Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States; USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States
- Vrushank Davé
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States; Biotechnology Graduate Program, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States; Department of Pathology and Cell Biology, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States; Department of Oncologic Sciences, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States.
|
14
|
Mohsen JJ, Martel AA, Slavoff SA. Microproteins-Discovery, structure, and function. Proteomics 2023; 23:e2100211. [PMID: 37603371 PMCID: PMC10841188 DOI: 10.1002/pmic.202100211] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/04/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/22/2023]
Abstract
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Affiliation(s)
- Jessica J. Mohsen
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Alina A. Martel
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
|
15
|
Xie L, Bowman ME, Louie GV, Zhang C, Ardejani MS, Huang X, Chu Q, Donaldson CJ, Vaughan JM, Shan H, Powers ET, Kelly JW, Lyumkis D, Noel JP, Saghatelian A. Biochemistry and Protein Interactions of the CYREN Microprotein. Biochemistry 2023; 62:3050-3060. [PMID: 37813856 PMCID: PMC12060184 DOI: 10.1021/acs.biochem.3c00397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/11/2023]
Abstract
Over the past decade, advances in genomics have identified thousands of additional protein-coding small open reading frames (smORFs) missed by traditional gene finding approaches. These smORFs encode peptides and small proteins, commonly termed micropeptides or microproteins. Several of these newly discovered microproteins have biological functions and operate through interactions with proteins and protein complexes within the cell. CYREN1 is a characterized microprotein that regulates double-strand break repair in mammalian cells through interaction with the Ku70/80 heterodimer. Ku70/80 binds to and stabilizes double-strand breaks and recruits the machinery needed for nonhomologous end joining (NHEJ) repair. In this study, we examined the biochemical properties of CYREN1 to better understand and explain its cellular protein interactions. Our findings support that CYREN1 is an intrinsically disordered microprotein and that this disordered structure allows it to enrich several proteins, including a newly discovered interaction with SF3B1 via a distinct short linear motif (SLiM) on CYREN1. Since many microproteins are predicted to be disordered, CYREN1 is an exemplar of how microproteins interact with other proteins and reveals an unknown scaffolding function of this microprotein that may link NHEJ and splicing.
Affiliation(s)
- Lina Xie
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, USA
- Marianne E Bowman
- Jack H Skirball Center for Chemical Biology and Proteomics, Salk Institute for Biological Studies, La Jolla, CA, USA
- Gordon V Louie
- Jack H Skirball Center for Chemical Biology and Proteomics, Salk Institute for Biological Studies, La Jolla, CA, USA
- Cheng Zhang
- Laboratory of Genetics, The Salk Institute for Biological Studies; Graduate School of Biological Sciences, Section of Molecular Biology, University of California San Diego; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Maziar S. Ardejani
- Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Xuemei Huang
- University of California, San Diego, Department of Chemistry and Biochemistry, 9500 Gilman Drive, La Jolla, CA, USA
- Qian Chu
- Department of Pharmacy, China Pharmaceutical University, Nanjing, Jiangsu, China
- Cynthia J Donaldson
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, USA
- Joan M Vaughan
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, USA
- Huanqi Shan
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, USA
- Evan T. Powers
- Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Jeffery W. Kelly
- Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Dimitry Lyumkis
- Laboratory of Genetics, The Salk Institute for Biological Studies; Graduate School of Biological Sciences, Section of Molecular Biology, University of California San Diego; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Joseph P. Noel
- Jack H Skirball Center for Chemical Biology and Proteomics, Salk Institute for Biological Studies, La Jolla, CA, USA
- Alan Saghatelian
- Clayton Foundation Peptide Biology Laboratories, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA, USA
|
16
|
Mohaupt P, Vialaret J, Hirtz C, Lehmann S. Readthrough isoform of aquaporin-4 (AQP4) as a therapeutic target for Alzheimer's disease and other proteinopathies. Alzheimers Res Ther 2023; 15:170. [PMID: 37821965 PMCID: PMC10566184 DOI: 10.1186/s13195-023-01318-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 09/27/2023] [Indexed: 10/13/2023]
Abstract
The glymphatic system is a crucial component in preserving brain homeostasis by facilitating waste clearance from the central nervous system (CNS). Aquaporin-4 (AQP4) water channels facilitate the continuous interchange between cerebrospinal fluid and brain interstitial fluid by convective flow movement. This flow is responsible for guiding proteins and metabolites away from the CNS. Proteinopathies are neurological conditions characterized by the accumulation of aggregated proteins or peptides in the brain. In Alzheimer's disease (AD), the deposition of amyloid-β (Aβ) peptides causes the formation of senile plaques. This accumulation has been hypothesized to be a result of the imbalance between Aβ production and clearance. Recent studies have shown that an extended form of AQP4 increases Aβ clearance from the brain. In this mini-review, we present a summary of these findings and explore the potential for future therapeutic strategies aiming to boost waste clearance in AD.
Affiliation(s)
- Pablo Mohaupt
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
- Jérôme Vialaret
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
- Christophe Hirtz
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
- Sylvain Lehmann
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
|
17
|
Capuz A, Osien S, Cardon T, Karnoub MA, Aboulouard S, Raffo-Romero A, Duhamel M, Cizkova D, Trerotola M, Devos D, Kobeissy F, Vanden Abeele F, Bonnefond A, Fournier I, Rodet F, Salzet M. Heimdall, an alternative protein issued from a ncRNA related to kappa light chain variable region of immunoglobulins from astrocytes: a new player in neural proteome. Cell Death Dis 2023; 14:526. [PMID: 37587118 PMCID: PMC10432539 DOI: 10.1038/s41419-023-06037-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 07/27/2023] [Accepted: 08/02/2023] [Indexed: 08/18/2023]
Abstract
The dogma "One gene, one protein" is clearly obsolete, since cells use alternative splicing to generate multiple transcripts that are translated into protein isoforms, and also use alternative translation initiation sites (TISs) and termination sites on a given transcript. Alternative open reading frames on individual transcripts give rise to proteins that originate from the 5'- and 3'-UTR regions of mRNAs, from frameshifts of mRNA ORFs, or from non-coding RNAs. For regions long considered non-coding, recent in silico translation prediction methods have enriched the protein databases, allowing the identification of new target structures that had not been identified previously. To gain insight into the role of these newly identified alternative proteins in the regulation of cellular functions, it is crucial to assess their dynamic modulation within a framework of altered physiological conditions such as experimental spinal cord injury (SCI). Here, we carried out a longitudinal proteomic study on rat SCI from 12 h to 10 days. Based on the alternative protein predictions, it was possible to identify a plethora of newly predicted protein hits. Among these proteins, some were of special interest due to their high homology with variable chain regions of immunoglobulins. We focused on the one related to kappa variable light chains, which is similarly highly produced by B cells in Bence Jones disease but is here expressed in astrocytes. This protein, named Heimdall, is an intrinsically disordered protein that is secreted under inflammatory conditions. Immunoprecipitation experiments showed that the Heimdall interactome contained proteins related to astrocyte fate keepers such as NOTCH1, EPHA3, and IPO13, as well as membrane receptor proteins including CHRNA9, TGFBR, EPHB6, and TRAM. However, when Heimdall protein was neutralized using a specific antibody or its gene was knocked out by CRISPR-Cas9, sprouting elongations were observed in the corresponding astrocytes. Interestingly, depolarization assays and intracellular calcium measurements in Heimdall KO cells established a depolarization effect on astrocyte membranes that more closely resembled the one found in neuroprogenitors. Proteomic analyses performed under injury conditions or under lipopolysaccharide (LPS) stimulation revealed the expression of neuronal factors, stem cell proteins, and proliferation, neurogenesis, and astrocyte conversion factors such as EPHA4, NOTCH2, SLIT3, and SEMA3F, suggesting that Heimdall could regulate astrocytic fate. Taken together, Heimdall could be a novel member of the gatekeeping astrocyte-to-neuroprogenitor conversion factors.
Affiliation(s)
- Alice Capuz
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Sylvain Osien
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Tristan Cardon
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Mélodie Anne Karnoub
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Soulaimane Aboulouard
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Antonella Raffo-Romero
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Marie Duhamel
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Dasa Cizkova
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Institute of Neuroimmunology, Slovak Academy of Sciences, Dúbravská cesta 9, 845 10, Bratislava, Slovakia
- Centre for Experimental and Clinical Regenerative Medicine, University of Veterinary Medicine and Pharmacy in Kosice, Kosice, Slovakia
- Marco Trerotola
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University 'G. d'Annunzio', Chieti, Italy
- Department of Medical, Oral and Biotechnological Sciences, University 'G. d'Annunzio', Chieti, Italy
- David Devos
- Université de Lille, INSERM, U1172, CHU-Lille, Lille Neuroscience Cognition Research Centre, 1 place de Verdun, 59000, Lille, France
- Firas Kobeissy
- Department of Biochemistry and Molecular Genetics, Faculty of Medicine, American University of Beirut, Beirut, Lebanon
- Fabien Vanden Abeele
- Université de Lille, INSERM U1003, Laboratory of Cell Physiology, 59650, Villeneuve d'Ascq, France
- Amélie Bonnefond
- Univ. Lille, Inserm UMR1283, CNRS UMR8199, European Genomic Institute for Diabetes (EGID), Institut Pasteur de Lille, CHU de Lille, 1 place de Verdun, 59000, Lille, France
- Isabelle Fournier
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France
- Institut Universitaire de France, 75005, Paris, France
- Franck Rodet
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France.
- Michel Salzet
- Univ. Lille, Inserm, U-1192 - Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse-PRISM, F-59000, Lille, France.
- Institut Universitaire de France, 75005, Paris, France.
|
18
|
Zaytsev K, Fedorov A, Korotkov E. Classification of Promoter Sequences from Human Genome. Int J Mol Sci 2023; 24:12561. [PMID: 37628742 PMCID: PMC10454140 DOI: 10.3390/ijms241612561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 07/28/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023] Open
Abstract
We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 potential promoter sequences were found. Only 1,241,206 of them were located in unannotated parts of the human genome. Every other PPS found intersected with either true promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements as well as transcript start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes.
Affiliation(s)
- Konstantin Zaytsev
- Bach Institute of Biochemistry, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
- Alexey Fedorov
- Bach Institute of Biochemistry, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
- Eugene Korotkov
- Institute of Bioengineering, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
|
19
|
Chen Y, Cao X, Loh KH, Slavoff SA. Chemical labeling and proteomics for characterization of unannotated small and alternative open reading frame-encoded polypeptides. Biochem Soc Trans 2023; 51:1071-1082. [PMID: 37171061 PMCID: PMC10317152 DOI: 10.1042/bst20221074] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/03/2023] [Revised: 03/27/2023] [Accepted: 04/13/2023] [Indexed: 05/13/2023]
Abstract
Thousands of unannotated small and alternative open reading frames (smORFs and alt-ORFs, respectively) have recently been revealed in mammalian genomes. While hundreds of mammalian smORF- and alt-ORF-encoded proteins (SEPs and alt-proteins, respectively) affect cell proliferation, the overwhelming majority of smORFs and alt-ORFs remain uncharacterized at the molecular level. Complicating the task of identifying the biological roles of smORFs and alt-ORFs, the SEPs and alt-proteins that they encode exhibit limited sequence homology to protein domains of known function. Experimental techniques for the functionalization of these gene classes are therefore required. Approaches combining chemical labeling and quantitative proteomics have greatly advanced our ability to identify and characterize functional SEPs and alt-proteins in high throughput. In this review, we briefly describe the principles of proteomic discovery of SEPs and alt-proteins, then summarize how these technologies interface with chemical labeling for identification of SEPs and alt-proteins with specific properties, as well as in defining the interactome of SEPs and alt-proteins.
Affiliation(s)
- Yanran Chen
- Department of Chemistry, Yale University, New Haven, CT, U.S.A
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Xiongwen Cao
- Department of Chemistry, Yale University, New Haven, CT, U.S.A
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT, U.S.A
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Ken H. Loh
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT, U.S.A
- Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT, U.S.A
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, U.S.A
|
20
|
Leblanc S, Brunet MA, Jacques JF, Lekehal AM, Duclos A, Tremblay A, Bruggeman-Gascon A, Samandi S, Brunelle M, Cohen AA, Scott MS, Roucou X. Newfound Coding Potential of Transcripts Unveils Missing Members of Human Protein Communities. Genomics, Proteomics & Bioinformatics 2023; 21:515-534. [PMID: 36183975 PMCID: PMC10787177 DOI: 10.1016/j.gpb.2022.09.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/01/2021] [Revised: 08/10/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Recent proteogenomic approaches have led to the discovery that regions of the transcriptome previously annotated as non-coding regions [i.e., untranslated regions (UTRs), open reading frames overlapping annotated coding sequences in a different reading frame, and non-coding RNAs] frequently encode proteins, termed alternative proteins (altProts). This suggests that previously identified protein-protein interaction (PPI) networks are partially incomplete because altProts are not present in conventional protein databases. Here, we used the proteogenomic resource OpenProt and a combined spectrum- and peptide-centric analysis for the re-analysis of a high-throughput human network proteomics dataset, thereby revealing the presence of 261 altProts in the network. We found 19 genes encoding both an annotated (reference) and an alternative protein interacting with each other. Of the 117 altProts encoded by pseudogenes, 38 are direct interactors of reference proteins encoded by their respective parental genes. Finally, we experimentally validate several interactions involving altProts. These data improve the blueprints of the human PPI network and suggest functional roles for hundreds of altProts.
Affiliation(s)
- Sébastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Jean-François Jacques
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Amina M Lekehal
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Andréa Duclos
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Alexia Tremblay
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Alexis Bruggeman-Gascon
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Sondos Samandi
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Mylène Brunelle
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Alan A Cohen
- Department of Family Medicine, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Michelle S Scott
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada.
|
21
|
Othoum G, Maher CA. CrypticProteinDB: an integrated database of proteome and immunopeptidome derived non-canonical cancer proteins. NAR Cancer 2023; 5:zcad024. [PMID: 37275273 PMCID: PMC10233886 DOI: 10.1093/narcan/zcad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 05/14/2023] [Accepted: 05/16/2023] [Indexed: 06/07/2023] Open
Abstract
Translated non-canonical proteins derived from noncoding regions or alternative open reading frames (ORFs) can contribute to critical and diverse cellular processes. In the context of cancer, they also represent an under-appreciated source of targets for cancer immunotherapy through their tumor-enriched expression or by harboring somatic mutations that produce neoantigens. Here, we introduce the largest integration and proteogenomic analysis of novel peptides to assess the prevalence of non-canonical ORFs (ncORFs) in more than 900 patient proteomes and 26 immunopeptidome datasets across 14 cancer types. The integrative proteogenomic analysis of whole-cell proteomes and immunopeptidomes revealed peptide support for a nonredundant set of 9760 upstream, downstream, and out-of-frame ncORFs in protein coding genes and 12811 in noncoding RNAs. Notably, 6486 ncORFs were derived from differentially expressed genes and 340 were ubiquitously translated across eight or more cancers. The analysis also led to the discovery of thirty-four epitopes and eight neoantigens from non-canonical proteins in two cohorts as novel cancer immunotargets. Collectively, our analysis integrated both bottom-up proteogenomic and targeted peptide validation to illustrate the prevalence of translated non-canonical proteins in cancer and to provide a resource for the prioritization of novel proteins supported by proteomic, immunopeptidomic, genomic and transcriptomic data, available at https://www.maherlab.com/crypticproteindb.
Affiliation(s)
- Ghofran Othoum
- Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63108, USA
- Christopher A Maher
- Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Biomedical Engineering, Washington University in St. Louis, MO 63108, USA
- Alvin J. Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63108, USA
|
22
|
Kawaguchi M, Chang WS, Tsuchiya H, Kinoshita N, Miyaji A, Kawahara-Miki R, Tomita K, Sogabe A, Yorifuji M, Kono T, Kaneko T, Yasumasu S. Orphan gene expressed in flame cone cells uniquely found in seahorse epithelium. Cell Tissue Res 2023:10.1007/s00441-023-03779-1. [PMID: 37227506 DOI: 10.1007/s00441-023-03779-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 04/26/2023] [Indexed: 05/26/2023]
Abstract
The seahorse is one of the most unique teleost fishes in its morphology. The body is surrounded by bony plates and spines, and the male fish possess a brooding organ, called the brood pouch, on their tail. The surfaces of the brood pouch and the spines are surrounded by characteristic so-called flame cone cells. Based on our histological observations, flame cone cells are present in the seahorse Hippocampus abdominalis, but not in the barbed pipefish Urocampus nanus or the seaweed pipefish Syngnathus schlegeli, both of which belong to the same family as the seahorse. In the flame cone cells, we observed expression of an "orphan gene" lacking homologs in other lineages. This gene, which we named the proline-glycine rich (pgrich) gene, codes for an amino acid sequence composed of repetitive units. In situ hybridization and immunohistochemical analyses detected pgrich-positive signals from the flame cone cells. Based on a survey of the genome sequences of 15 teleost species, the pgrich gene is only found from some species of Syngnathiformes (namely, the genera Syngnathus and Hippocampus). The amino acid sequence of the seahorse PGrich is somewhat similar to the sequence deduced from the antisense strand of elastin. Furthermore, there are many transposable elements around the pgrich gene. These results suggest that the pgrich gene may have originated from the elastin gene with the involvement of transposable elements and obtained its novel function in the flame cone cells during the evolution of the seahorse.
Affiliation(s)
- Mari Kawaguchi
- Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan.
- Wen-Shan Chang
- Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Hazuki Tsuchiya
- Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Nana Kinoshita
- Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Akira Miyaji
- Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
- Ryouka Kawahara-Miki
- Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Setagaya-Ku, Tokyo, Japan
- Kenji Tomita
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Atsushi Sogabe
- Department of Biology, Faculty of Agriculture and Life Science, Hirosaki University, Bunkyo, Hirosaki, Aomori, 036-8561, Japan
- Makiko Yorifuji
- Sesoko Station, Tropical Biosphere Research Center, University of the Ryukyus, Sesoko, Motobu, Okinawa, 905-0227, Japan
- Demonstration Laboratory, Marine Ecology Research Institute, Arahama, Kashiwazaki, Niigata, 945-0017, Japan
- Tomohiro Kono
- Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Setagaya-Ku, Tokyo, Japan
- Department of Bioscience, Tokyo University of Agriculture, Setagaya-Ku, Tokyo, Japan
- Toyoji Kaneko
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Shigeki Yasumasu
- Department of Materials and Life Sciences, Faculty of Science and Technology, Sophia University, Chiyoda-ku, Tokyo, Japan
|
23
|
Inchingolo MA, Diman A, Adamczewski M, Humphreys T, Jaquier-Gubler P, Curran JA. TP53BP1, a dual-coding gene, uses promoter switching and translational reinitiation to express a smORF protein. iScience 2023; 26:106757. [PMID: 37216125 PMCID: PMC10193022 DOI: 10.1016/j.isci.2023.106757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 03/07/2023] [Accepted: 04/24/2023] [Indexed: 05/24/2023] Open
Abstract
The complexity of the metazoan proteome is significantly increased by the expression of small proteins (<100 aa) derived from smORFs within lncRNAs, uORFs, 3' UTRs, and reading frames overlapping the CDS. These smORF-encoded proteins (SEPs) have diverse roles, ranging from the regulation of cellular physiology to essential developmental functions. We report the characterization of a new member of this protein family, SEP53BP1, derived from a small internal ORF that overlaps the CDS encoding 53BP1. Its expression depends on the use of an alternative, cell-type-specific promoter coupled to translational reinitiation events mediated by a uORF in the alternative 5' TL of the mRNA. This uORF-mediated reinitiation at an internal ORF is also observed in zebrafish. Interactome studies indicate that the human SEP53BP1 associates with components of the protein turnover pathway, including the proteasome and the TRiC/CCT chaperonin complex, suggesting that it may play a role in cellular proteostasis.
Affiliation(s)
- Marta A. Inchingolo
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Aurélie Diman
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Maxime Adamczewski
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Faculté de Médecine et Pharmacie, Université Grenoble Alpes, Grenoble, France
- Tom Humphreys
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Pascale Jaquier-Gubler
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Joseph A. Curran
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Institute of Genetics and Genomics of Geneva (iGE3), University of Geneva, Geneva, Switzerland
|
24
|
Muraleedharan A, Vanderperre B. The endo-lysosomal system in Parkinson's disease: expanding the horizon. J Mol Biol 2023:168140. [PMID: 37148997 DOI: 10.1016/j.jmb.2023.168140] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/26/2023] [Revised: 04/22/2023] [Accepted: 04/27/2023] [Indexed: 05/08/2023]
Abstract
Parkinson's disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease, and its prevalence is increasing with age. A wealth of genetic evidence indicates that the endo-lysosomal system is a major pathway driving PD pathogenesis with a growing number of genes encoding endo-lysosomal proteins identified as risk factors for PD, making it a promising target for therapeutic intervention. However, detailed knowledge and understanding of the molecular mechanisms linking these genes to the disease are available for only a handful of them (e.g. LRRK2, GBA1, VPS35). Taking on the challenge of studying poorly characterized genes and proteins can be daunting, due to the limited availability of tools and knowledge from previous literature. This review aims at providing a valuable source of molecular and cellular insights into the biology of lesser-studied PD-linked endo-lysosomal genes, to help and encourage researchers in filling the knowledge gap around these less popular genetic players. Specific endo-lysosomal pathways discussed range from endocytosis, sorting, and vesicular trafficking to the regulation of membrane lipids of these membrane-bound organelles and the specific enzymatic activities they contain. We also provide perspectives on future challenges that the community needs to tackle and propose approaches to move forward in our understanding of these poorly studied endo-lysosomal genes. This will help harness their potential in designing innovative and efficient treatments to ultimately re-establish neuronal homeostasis in PD but also other diseases involving endo-lysosomal dysfunction.
Affiliation(s)
- Amitha Muraleedharan
  - Centre d'Excellence en Recherche sur les Maladies Orphelines - Fondation Courtois and Biological Sciences Department, Université du Québec à Montréal
- Benoît Vanderperre
  - Centre d'Excellence en Recherche sur les Maladies Orphelines - Fondation Courtois and Biological Sciences Department, Université du Québec à Montréal
|
25
|
Cassidy L, Kaulich PT, Tholey A. Proteoforms expand the world of microproteins and short open reading frame-encoded peptides. iScience 2023; 26:106069. [PMID: 36818287 PMCID: PMC9929600 DOI: 10.1016/j.isci.2023.106069] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 01/28/2023] Open
Abstract
Microproteins and short open reading frame-encoded peptides (SEPs) can, like all proteins, carry numerous posttranslational modifications. Together with posttranscriptional processes, this leads to a high number of possible distinct protein molecules, the proteoforms, out of a limited number of genes. The identification, quantification, and molecular characterization of proteoforms pose special challenges to established, mainly bottom-up proteomics (BUP)-based analytical approaches. While BUP methods are powerful, proteins have to be inferred rather than directly identified, which hampers the detection of proteoforms. An alternative approach is top-down proteomics (TDP), which allows intact proteoforms to be identified. This perspective article provides a brief overview of modified microproteins and SEPs, introduces the proteoform terminology, and compares present BUP and TDP workflows, highlighting their major advantages and caveats. Future developments in TDP that are needed to fully realize its potential for proteoform-centric analytics of microproteins and SEPs are discussed.
Affiliation(s)
- Liam Cassidy
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
- Philipp T. Kaulich
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany (corresponding author)
|
26
|
Álvarez-Urdiola R, Borràs E, Valverde F, Matus JT, Sabidó E, Riechmann JL. Peptidomics Methods Applied to the Study of Flower Development. Methods Mol Biol 2023; 2686:509-536. [PMID: 37540375 DOI: 10.1007/978-1-0716-3299-4_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Understanding the global and dynamic nature of plant developmental processes requires not only the study of the transcriptome, but also of the proteome, including its largely uncharacterized peptidome fraction. Recent advances in proteomics and high-throughput analyses of translating RNAs (ribosome profiling) have begun to address this issue, evidencing the existence of novel, uncharacterized, and possibly functional peptides. To validate the accumulation in tissues of sORF-encoded polypeptides (SEPs), the basic setup of proteomic analyses (i.e., LC-MS/MS) can be followed. However, the detection of peptides that are small (up to ~100 aa, 6-7 kDa) and novel (i.e., not annotated in reference databases) presents specific challenges that need to be addressed both experimentally and with computational biology resources. Several methods have been developed in recent years to isolate and identify peptides from plant tissues. In this chapter, we outline two different peptide extraction protocols and the subsequent peptide identification by mass spectrometry using the database search or the de novo identification methods.
Affiliation(s)
- Raquel Álvarez-Urdiola
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Eva Borràs
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Federico Valverde
  - Institute for Plant Biochemistry and Photosynthesis CSIC - University of Seville, Seville, Spain
- José Tomás Matus
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
- Eduard Sabidó
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- José Luis Riechmann
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
27
|
Çakır U, Gabed N, Brunet M, Roucou X, Kryvoruchko I. Mosaic translation hypothesis: chimeric polypeptides produced via multiple ribosomal frameshifting as a basis for adaptability. FEBS J 2023; 290:370-378. [PMID: 34743413 DOI: 10.1111/febs.16269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2021] [Revised: 10/03/2021] [Accepted: 11/05/2021] [Indexed: 02/05/2023]
Abstract
How many different proteins can be produced from a single spliced transcript? Genome annotation projects overlook the coding potential of reading frames other than that of the reference open reading frames (refORFs). Recently, alternative open reading frames (altORFs) and their translational products, alternative proteins, have been shown to carry out important functions in various organisms. AltORFs overlapping refORFs or other altORFs in a different reading frame may be involved in one fundamental mechanism so far overlooked. A few years ago, it was proposed that altORFs may act as building blocks for chimeric (mosaic) polypeptides, which are produced via multiple ribosomal frameshifting events from a single mature transcript. We adopt terminology from that earlier discussion and call this mechanism mosaic translation. This way of extracting and combining genetic information may significantly increase proteome diversity. Thus, we hypothesize that this mechanism may have contributed to the flexibility and adaptability of organisms to a variety of environmental conditions. Specialized ribosomes acting as sensors probably played a central role in this process. Importantly, mosaic translation may be the main source of protein diversity in genomes that lack alternative splicing. The idea of mosaic translation is a testable hypothesis, although its direct demonstration is challenging. Should mosaic translation occur, we would currently be greatly underestimating the complexity of translation mechanisms and thus of the proteome.
Affiliation(s)
- Umut Çakır
  - Molecular Biology and Genetics Department, Faculty of Arts and Sciences, Boğaziçi University, Istanbul, Turkey
- Noujoud Gabed
  - Cellular and Molecular Biology Department, Oran High School of Biological Sciences (ESSBO), Oran, Algeria
- Marie Brunet
  - Department of Pediatrics, Medical Genetics Service, Université de Sherbrooke, QC, Canada
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), QC, Canada
- Xavier Roucou
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CRCHUS), QC, Canada
  - Department of Biochemistry and Functional Genomics, Université de Sherbrooke, QC, Canada
- Igor Kryvoruchko
  - Molecular Biology and Genetics Department, Faculty of Arts and Sciences, Boğaziçi University, Istanbul, Turkey
|
28
|
Haploinsufficiency of EXT1 and Heparan Sulphate Deficiency Associated with Hereditary Multiple Exostoses in a Pakistani Family. Medicina (Kaunas) 2022; 59:medicina59010100. [PMID: 36676722 PMCID: PMC9863873 DOI: 10.3390/medicina59010100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 12/26/2022] [Accepted: 12/28/2022] [Indexed: 01/04/2023] Open
Abstract
Background and Objectives: Hereditary multiple exostoses (HME) is a disease characterized by cartilage-capped bony protuberances at the site of growth plates of long bones. Functional mutations in the exostosin genes (EXT1 and EXT2) are reported to affect the hedgehog signalling pathways leading to multiple enchondromatosis. However, the exact role of each EXT protein in the regulation of heparan sulphate (HS) chain elongation is still an enigma. In this study, a Pakistani family with HME is investigated to determine the genetic basis of the disease. Materials and Methods: Eight members of the family were genotyped by amplifying microsatellite markers tightly linked to the EXT1 and EXT2 genes. Results: The study revealed linkage of the HME family to the EXT1 locus 8q24.1. Sanger sequencing identified a heterozygous deletion (c.247Cdel) in exon 1 of EXT1, segregating with the disease phenotype in the family. In silico analysis predicted a frameshift causing an early stop codon (p.R83GfsX52). The predicted dwarf protein constituting 134 amino acids was functionally aberrant with a complete loss of the catalytic domain at the C-terminus. Interestingly, an alternative open reading frame 3 (ORF3) caused by the frameshift is predicted to encode a protein sequence identical to the wild type and containing the catalytic domain, but lacking the first 100 amino acids of the wild-type EXT1 protein. Conclusion: Consequently, haploinsufficiency could be the cause of HME in the investigated family, as the mutated copy of EXT1 is ineffective for EXT-1/2 complex formation. The predicted ORF3 protein could be of great significance in understanding several aspects of HME pathogenesis.
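As a quick consistency check on the variant notation in the abstract above, the following minimal Python sketch maps a coding-DNA position to its codon and locates the new stop codon of a frameshift description. It assumes the HGVS convention that the termination count starts at the first changed residue. The function names are illustrative and do not come from the cited paper.

```python
def codon_of_cdna_position(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding-DNA position to (codon number, base within codon).

    Both returned values are 1-based; position 1 is the A of the start codon.
    """
    if c_pos < 1:
        raise ValueError("coding positions are 1-based and must be positive")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def frameshift_stop_codon(first_changed: int, ter_count: int) -> int:
    """Codon number of the new stop for a frameshift written as fs + Ter count.

    Assumption: the count starts at 1 on the first changed residue.
    """
    return first_changed + ter_count - 1


codon, base = codon_of_cdna_position(247)   # c.247 from the abstract
stop = frameshift_stop_codon(83, 52)        # p.R83GfsX52 from the abstract
print(codon, base, stop)
```

With the values quoted in the abstract, c.247 is the first base of codon 83, which agrees with the R83 in the protein-level description, and the new stop falls at codon 134 under the stated counting convention.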
|
29
|
Mohaupt P, Roucou X, Delaby C, Vialaret J, Lehmann S, Hirtz C. The alternative proteome in neurobiology. Front Cell Neurosci 2022; 16:1019680. [PMID: 36467612 PMCID: PMC9712206 DOI: 10.3389/fncel.2022.1019680] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/15/2022] [Accepted: 11/02/2022] [Indexed: 10/13/2023] Open
Abstract
Translation involves the biosynthesis of a protein sequence following the decoding of the genetic information embedded in a messenger RNA (mRNA). Traditionally, eukaryotic mRNA was considered to be inherently monocistronic, but this paradigm does not agree with the translational landscape of cells, tissues, and organs. Recent ribosome sequencing (Ribo-seq) and proteomics studies show that, in addition to currently annotated reference proteins (RefProt), other proteins, termed alternative proteins (AltProts) and microproteins, are encoded in regions of mRNAs thought to be untranslated or in transcripts annotated as non-coding. This experimental evidence expands the repertoire of functional proteins within a cell and potentially provides important information on biological processes. This review explores the hitherto overlooked alternative proteome in neurobiology and considers the role of AltProts in pathological and healthy neuromolecular processes.
Affiliation(s)
- Pablo Mohaupt
  - LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
- Xavier Roucou
  - Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC, Canada
- Constance Delaby
  - LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
- Jérôme Vialaret
  - LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
- Sylvain Lehmann
  - LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
- Christophe Hirtz
  - LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
|
30
|
Duhamel M, Drelich L, Wisztorski M, Aboulouard S, Gimeno JP, Ogrinc N, Devos P, Cardon T, Weller M, Escande F, Zairi F, Maurage CA, Le Rhun É, Fournier I, Salzet M. Spatial analysis of the glioblastoma proteome reveals specific molecular signatures and markers of survival. Nat Commun 2022; 13:6665. [PMID: 36333286 PMCID: PMC9636229 DOI: 10.1038/s41467-022-34208-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/19/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022] Open
Abstract
Molecular heterogeneity is a key feature of glioblastoma that impedes patient stratification and leads to large discrepancies in mean patient survival. Here, we analyze a cohort of 96 glioblastoma patients with survival ranging from a few months to over 4 years. 46 tumors are analyzed by mass spectrometry-based spatially-resolved proteomics guided by mass spectrometry imaging. Integration of protein expression and clinical information highlights three molecular groups associated with immune, neurogenesis, and tumorigenesis signatures with high intra-tumoral heterogeneity. Furthermore, a set of proteins originating from reference and alternative ORFs is found to be statistically significant based on patient survival times. Among these proteins, a 5-protein signature is associated with survival. The expression of these 5 proteins is validated by immunofluorescence on an additional cohort of 50 patients. Overall, our work characterizes distinct molecular regions within glioblastoma tissues based on protein expression, which may help guide glioblastoma prognosis and improve current glioblastoma classification.
Affiliation(s)
- Marie Duhamel
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
- Lauranne Drelich
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
- Maxence Wisztorski
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
- Soulaimane Aboulouard
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
- Jean-Pascal Gimeno
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
- Nina Ogrinc
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
- Patrick Devos
  - Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, F-59000, Lille, France
- Tristan Cardon
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
- Michael Weller
  - Department of Neurology & Clinical Neuroscience Center, University Hospital and University of Zurich, Zurich, Switzerland
- Fabienne Escande
  - CHU Lille, Service de biochimie et biologie moléculaire, CHU Lille, F-59000, Lille, France
- Fahed Zairi
  - CHU Lille, Service de neurochirurgie, F-59000, Lille, France
- Claude-Alain Maurage
  - CHU Lille, Service de biochimie et biologie moléculaire, CHU Lille, F-59000, Lille, France
- Émilie Le Rhun
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
  - Department of Neurology & Clinical Neuroscience Center, University Hospital and University of Zurich, Zurich, Switzerland
  - CHU Lille, Service de biochimie et biologie moléculaire, CHU Lille, F-59000, Lille, France
- Isabelle Fournier
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
  - Institut Universitaire de France (IUF), 75000, Paris, France
- Michel Salzet
  - Univ.Lille, Inserm, CHU Lille, U1192, Laboratoire Protéomique, Réponse Inflammatoire et Spectrométrie de Masse (PRISM), F-59000, Lille, France
  - Institut Universitaire de France (IUF), 75000, Paris, France
|
31
|
Zheng X, Xiang M. Mitochondrion-located peptides and their pleiotropic physiological functions. FEBS J 2022; 289:6919-6935. [PMID: 35599630 DOI: 10.1111/febs.16532] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/22/2021] [Revised: 05/12/2022] [Accepted: 05/20/2022] [Indexed: 01/13/2023]
Abstract
With the development of advanced technologies, many small open reading frames (sORFs) have been found to be translated into micropeptides. Interestingly, a considerable proportion of micropeptides are located in mitochondria, which are designated here as mitochondrion-located peptides (MLPs). These MLPs often contain a transmembrane domain and show a high degree of conservation across species. They usually act as co-factors of large proteins and play regulatory roles in mitochondria such as electron transport in the respiratory chain, reactive oxygen species (ROS) production, metabolic homeostasis, and so on. Deficiency of MLPs disturbs diverse physiological processes including immunity, differentiation, and metabolism both in vivo and in vitro. These findings reveal crucial functions for MLPs and provide fresh insights into diverse mitochondrion-associated biological processes and diseases.
Affiliation(s)
- Xintong Zheng
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Sun Yat-sen University, Guangzhou, China
- Mengqing Xiang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
|
32
|
Manske F, Ogoniak L, Jürgens L, Grundmann N, Makałowski W, Wethmar K. The new uORFdb: integrating literature, sequence, and variation data in a central hub for uORF research. Nucleic Acids Res 2022; 51:D328-D336. [PMID: 36305828 PMCID: PMC9825577 DOI: 10.1093/nar/gkac899] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/14/2022] [Revised: 09/28/2022] [Accepted: 10/03/2022] [Indexed: 02/07/2023] Open
Abstract
Upstream open reading frames (uORFs) are initiated by AUG or near-cognate start codons and have been identified in the transcript leader sequences of the majority of eukaryotic transcripts. Functionally, uORFs are implicated in downstream translational regulation of the main protein coding sequence and may serve as a source of non-canonical peptides. Genetic defects in uORF sequences have been linked to the development of various diseases, including cancer. To simplify uORF-related research, the initial release of uORFdb in 2014 provided a comprehensive and manually curated collection of uORF-related literature. Here, we present an updated sequence-based version of uORFdb, accessible at https://www.bioinformatics.uni-muenster.de/tools/uorfdb. The new uORFdb enables users to directly access sequence information, graphical displays, and genetic variation data for over 2.4 million human uORFs. It also includes sequence data of >4.2 million uORFs in 12 additional species. Multiple uORFs can be displayed in transcript- and reading-frame-specific models to visualize the translational context. A variety of filters, sequence-related information, and links to external resources (UCSC Genome Browser, dbSNP, ClinVar) facilitate immediate in-depth analysis of individual uORFs. The database also contains uORF-related somatic variation data obtained from whole-genome sequencing (WGS) analyses of 677 cancer samples collected by the TCGA consortium.
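To make the definition in the abstract above concrete, here is a minimal Python sketch that scans a transcript leader for start codons and extends each one in frame to the first stop codon. It is an illustration only and is not the uORFdb pipeline; the function name `find_uorfs`, its parameters, and the toy sequence are all hypothetical. The sequence is written in the DNA alphabet, so AUG appears as ATG, and near-cognate start codons can be supplied through the `starts` argument.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript: str, cds_start: int, starts=("ATG",)):
    """Return (start, end, frame) for ORFs whose start codon lies in the leader.

    transcript: cDNA sequence in the DNA alphabet.
    cds_start:  0-based index of the first base of the main start codon.
    end:        index just past the stop codon.
    frame:      (start - cds_start) % 3; 0 means in frame with the main CDS.
    Candidates without an in-frame stop codon in the transcript are skipped.
    """
    seq = transcript.upper()
    uorfs = []
    # The start codon must lie entirely upstream of the main start codon.
    for i in range(cds_start - 2):
        if seq[i:i + 3] not in starts:
            continue
        # Walk codon by codon until the first in-frame stop.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3, (i - cds_start) % 3))
                break
    return uorfs


# Toy transcript: a 13-nt leader containing ATG GCT TAA, then a short main CDS.
toy = "CCATGGCTTAAGG" + "ATGAAACCCTGA"
print(find_uorfs(toy, cds_start=13))
```

In the toy transcript the only leader start codon is at index 2, and its reading frame terminates at the TAA ending at index 11, so a single out-of-frame uORF is reported.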
Affiliation(s)
- Felix Manske
  - Institute of Bioinformatics, University of Münster, Münster 48149, Germany
- Lynn Ogoniak
  - Institute of Bioinformatics, University of Münster, Münster 48149, Germany
- Lara Jürgens
  - Department of Medicine A, Hematology, Oncology, Hemostaseology and Pneumology, University Hospital Münster, Münster 48149, Germany
- Norbert Grundmann
  - Institute of Bioinformatics, University of Münster, Münster 48149, Germany
- Wojciech Makałowski
  - Correspondence may also be addressed to Wojciech Makałowski. Tel: +49 2518353006
- Klaus Wethmar
  - To whom correspondence should be addressed. Tel: +49 2518347587; Fax: +49 2518347588
|
33
|
Na Z, Dai X, Zheng SJ, Bryant CJ, Loh KH, Su H, Luo Y, Buhagiar AF, Cao X, Baserga SJ, Chen S, Slavoff SA. Mapping subcellular localizations of unannotated microproteins and alternative proteins with MicroID. Mol Cell 2022; 82:2900-2911.e7. [PMID: 35905735 PMCID: PMC9662605 DOI: 10.1016/j.molcel.2022.06.035] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/07/2021] [Revised: 04/08/2022] [Accepted: 06/29/2022] [Indexed: 11/15/2022]
Abstract
Proteogenomic identification of translated small open reading frames has revealed thousands of previously unannotated, largely uncharacterized microproteins, or polypeptides of less than 100 amino acids, and alternative proteins (alt-proteins) that are co-encoded with canonical proteins and are often larger. The subcellular localizations of microproteins and alt-proteins are generally unknown but can have significant implications for their functions. Proximity biotinylation is an attractive approach to define the protein composition of subcellular compartments in cells and in animals. Here, we developed a high-throughput technology to map unannotated microproteins and alt-proteins to subcellular localizations by proximity biotinylation with TurboID (MicroID). More than 150 microproteins and alt-proteins are associated with subnuclear organelles. One alt-protein, alt-LAMA3, localizes to the nucleolus and functions in pre-rRNA transcription. We applied MicroID in a mouse model, validating expression of a conserved nuclear microprotein, and establishing MicroID for discovery of microproteins and alt-proteins in vivo.
Affiliation(s)
- Zhenkun Na
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Xiaoyun Dai
  - Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Shu-Jian Zheng
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Carson J Bryant
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
- Ken H Loh
  - Laboratory of Molecular Genetics, Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Haomiao Su
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Yang Luo
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Amber F Buhagiar
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
- Xiongwen Cao
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
- Susan J Baserga
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
  - Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
  - Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Sidi Chen
  - Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Sarah A Slavoff
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT 06516, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06529, USA
|
34
|
Sampadi B, Mullenders LHF, Vrieling H. Low and high doses of ionizing radiation evoke discrete global (phospho)proteome responses. DNA Repair (Amst) 2022; 113:103305. [PMID: 35255311 DOI: 10.1016/j.dnarep.2022.103305] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/28/2021] [Revised: 02/02/2022] [Accepted: 02/22/2022] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Although cancer risk is assumed to be linear with ionizing radiation (IR) dose, it is unclear to what extent low doses (LD) of IR from medical and occupational exposures pose a cancer risk for humans. Improved mechanistic understanding of the signaling responses to LD may help to clarify this uncertainty. Here, we performed quantitative mass spectrometry-based proteomics and phosphoproteomics experiments, using mouse embryonic stem cells, at 0.5 h and 4 h after exposure to LD (0.1 Gy) and high doses (HD; 1 Gy) of IR. RESULTS: The proteome remained relatively stable (29; 0.5% proteins responded), whereas the phosphoproteome changed dynamically (819; 7% phosphosites changed) upon irradiation. Dose-dependent alterations of 25 IR-responsive proteins were identified, with only four in common between LD and HD. Mitochondrial metabolic proteins and pathways responded to LD, whereas transporter proteins and mitochondrial uncoupling pathways responded to HD. Congruently, mitochondrial respiration increased after LD exposure but decreased after HD exposure. While the bulk of the phosphoproteome response to LD (76%) occurred already at 0.5 h, an equivalent proportion of the phosphosites responded to HD at both time points. Motif, kinome/phosphatome, kinase-substrate, and pathway analyses revealed a robust DNA damage response (DDR) activation after HD exposure but not after LD exposure. Instead, LD-irradiation induced (de)phosphorylation of kinases, kinase-substrates and phosphatases that predominantly respond to reactive oxygen species (ROS) production. CONCLUSION: Our analyses identify discrete global proteome and phosphoproteome responses after LD and HD, uncovering novel proteins and protein (de)phosphorylation events involved in the dose-dependent ionizing radiation responses.
Affiliation(s)
- Bharath Sampadi
- Department of Human Genetics, Leiden University Medical Center, Einthovenweg 20, 2333ZC Leiden, The Netherlands
- Leon H F Mullenders
- Department of Human Genetics, Leiden University Medical Center, Einthovenweg 20, 2333ZC Leiden, The Netherlands; Department of Genetics, Research Institute of Environmental Medicine (RIeM), Nagoya University, Nagoya, Japan
- Harry Vrieling
- Department of Human Genetics, Leiden University Medical Center, Einthovenweg 20, 2333ZC Leiden, The Netherlands
|
35
|
Cao X, Khitun A, Harold CM, Bryant CJ, Zheng SJ, Baserga SJ, Slavoff SA. Nascent alt-protein chemoproteomics reveals a pre-60S assembly checkpoint inhibitor. Nat Chem Biol 2022; 18:643-651. [PMID: 35393574 PMCID: PMC9423127 DOI: 10.1038/s41589-022-01003-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/03/2021] [Accepted: 02/25/2022] [Indexed: 12/29/2022]
Abstract
Many unannotated microproteins and alternative proteins (alt-proteins) are coencoded with canonical proteins, but few of their functions are known. Motivated by the hypothesis that alt-proteins undergoing regulated synthesis could play important cellular roles, we developed a chemoproteomic pipeline to identify nascent alt-proteins in human cells. We identified 22 actively translated alt-proteins or N-terminal extensions, one of which is post-transcriptionally upregulated by DNA damage stress. We further defined a nucleolar, cell-cycle-regulated alt-protein that negatively regulates assembly of the pre-60S ribosomal subunit (MINAS-60). Depletion of MINAS-60 increases the amount of cytoplasmic 60S ribosomal subunit, upregulating global protein synthesis and cell proliferation. Mechanistically, MINAS-60 represses the rate of late-stage pre-60S assembly and export to the cytoplasm. Together, these results implicate MINAS-60 as a potential checkpoint inhibitor of pre-60S assembly and demonstrate that chemoproteomics enables hypothesis generation for uncharacterized alt-proteins.
Affiliation(s)
- Xiongwen Cao
- Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Alexandra Khitun
- Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Cecelia M Harold
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Carson J Bryant
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Shu-Jian Zheng
- Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Susan J Baserga
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT, USA
- Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, CT, USA; Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
|
36
|
Zhang Z, Li Y, Yuan W, Wang Z, Wan C. Proteomic-driven identification of short open reading frame-encoded peptides. Proteomics 2022; 22:e2100312. [PMID: 35384297 DOI: 10.1002/pmic.202100312] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2022] [Revised: 03/29/2022] [Accepted: 03/30/2022] [Indexed: 11/10/2022]
Abstract
Accumulating evidence has shown that a large number of short open reading frames (sORFs) also have the ability to encode proteins. The discovery of sORFs opens up a new research area, leading to the identification and functional study of sORF-encoded peptides (SEPs) at the omics level. Besides bioinformatics prediction and ribosomal profiling, mass spectrometry (MS) has become a significant tool as it directly detects the sequence of SEPs. Though MS-based proteomics methods have proved to be effective for qualitative and quantitative analysis of SEPs, the detection of SEPs is still a great challenge due to their low abundance and short sequence. To illustrate the progress in method development, we describe and discuss the main steps of large-scale proteomics identification of SEPs, including SEP extraction and enrichment, MS detection, data processing and quality control, quantification, and function prediction and validation methods.
Affiliation(s)
- Zheng Zhang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
- Yujie Li
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
- Wenqian Yuan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
- Zhiwei Wang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
- Cuihong Wan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
|
37
|
Leong AZX, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures. J Biomed Sci 2022; 29:19. [PMID: 35300685 PMCID: PMC8928697 DOI: 10.1186/s12929-022-00802-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/31/2021] [Accepted: 03/09/2022] [Indexed: 12/17/2022] Open
Abstract
A short open reading frame (sORF) comprises ≤ 300 bases and encodes a microprotein or sORF-encoded protein (SEP) of ≤ 100 amino acids. Traditionally dismissed by genome annotation pipelines as meaningless noise, sORFs were found to possess coding potential with ribosome profiling (RIBO-Seq), which unveiled sORF-based transcripts at various genome locations. Nonetheless, the existence of corresponding microproteins that are stable and functional was initially little substantiated by experimental evidence. With recent advancements in multi-omics, the identification, validation, and functional characterisation of sORFs and microproteins have become feasible. In this review, we discuss the history and development of the emerging research field of sORFs and microproteins. In particular, we focus on an array of bioinformatics and OMICS approaches used for predicting, sequencing, validating, and characterizing these recently discovered entities. These strategies include RIBO-Seq, which detects sORF transcripts via ribosome footprints, and mass spectrometry (MS)-based proteomics for sequencing the resultant microproteins. Subsequently, our discussion extends to the functional characterisation of microproteins by incorporating CRISPR/Cas9 screens and protein–protein interaction (PPI) studies. Our review not only discusses detection methodologies but also highlights the challenges and potential solutions in identifying and validating sORFs and their microproteins. The novelty of this review lies in its treatment of validating the functional role of microproteins, which could contribute towards the future landscape of microproteomics.
Affiliation(s)
- Alyssa Zi-Xin Leong
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Pey Yee Lee
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- M Aiman Mohtar
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Saiful Effendi Syafruddin
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Yuh-Fen Pung
- Division of Biomedical Science, School of Pharmacy, University of Nottingham Malaysia, Semenyih, 43500, Selangor, Malaysia
- Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
|
38
|
Small open reading frames in plant research: from prediction to functional characterization. 3 Biotech 2022; 12:76. [PMID: 35251879 PMCID: PMC8873315 DOI: 10.1007/s13205-022-03147-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/28/2021] [Accepted: 02/11/2022] [Indexed: 11/01/2022] Open
Abstract
Gene prediction is a laborious and time-consuming task. The advancement of sequencing technologies and bioinformatics tools, coupled with the accelerated development of ribosome profiling and mass spectrometry, has made the identification of small open reading frames (sORFs; < 100 codons) in various plant genomes possible. Over the past 50 years, sORFs have been isolated from many organisms. However, a comprehensive sORF annotation pipeline is not yet available, a gap that our review addresses. Here, we also provide current information on the classification and functions of plant sORFs and their potential applications in crop improvement programs.
|
39
|
Bonilauri B, Dallagiovanna B. Microproteins in skeletal muscle: hidden keys in muscle physiology. J Cachexia Sarcopenia Muscle 2022; 13:100-113. [PMID: 34850602 PMCID: PMC8818594 DOI: 10.1002/jcsm.12866] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/06/2021] [Revised: 10/01/2021] [Accepted: 10/12/2021] [Indexed: 11/10/2022] Open
Abstract
Recent advances in the transcriptomics, translatomics, and proteomics have led us to the exciting new world of functional endogenous microproteins. These microproteins have a small size and are derived from small open reading frames (smORFs) of RNAs previously annotated as non-coding (e.g. lncRNAs and circRNAs) as well as from untranslated regions and canonical mRNAs. The presence of these microproteins reveals a much larger translatable portion of the genome, shifting previously defined dogmas and paradigms. These findings affect our view of organisms as a whole, including skeletal muscle tissue. Emerging evidence demonstrates that several smORF-derived microproteins play crucial roles during muscle development (myogenesis), maintenance, and regeneration, as well as lipid and glucose metabolism and skeletal muscle bioenergetics. These microproteins are also involved in processes including physical activity capacity, cellular stress, and muscular-related diseases (i.e. myopathy, cachexia, atrophy, and muscle wasting). Given the role of these small proteins as important key regulators of several skeletal muscle processes, there are rich prospects for the discovery of new microproteins and possible therapies using synthetic microproteins.
Affiliation(s)
- Bernardo Bonilauri
- Laboratory of Basic Biology of Stem Cells (LABCET), Carlos Chagas Institute - Fiocruz-PR, Curitiba, Paraná, Brazil
- Bruno Dallagiovanna
- Laboratory of Basic Biology of Stem Cells (LABCET), Carlos Chagas Institute - Fiocruz-PR, Curitiba, Paraná, Brazil
|
40
|
Abstract
Modern genome-scale methods that identify new genes, such as proteogenomics and ribosome profiling, have revealed, to the surprise of many, that overlap in genes, open reading frames and even coding sequences is widespread and functionally integrated into prokaryotic, eukaryotic and viral genomes. In parallel, the constraints that overlapping regions place on genome sequences and their evolution can be harnessed in bioengineering to build more robust synthetic strains and constructs. With a focus on overlapping protein-coding and RNA-coding genes, this Review examines their discovery, topology and biogenesis in the context of their genome biology. We highlight exciting new uses for sequence overlap to control translation, compress synthetic genetic constructs, and protect against mutation.
|
41
|
Phosphoproteomics Sample Preparation Impacts Biological Interpretation of Phosphorylation Signaling Outcomes. Cells 2021; 10:cells10123407. [PMID: 34943915 PMCID: PMC8699897 DOI: 10.3390/cells10123407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2021] [Revised: 11/28/2021] [Accepted: 12/01/2021] [Indexed: 01/02/2023] Open
Abstract
The influence of phosphoproteomics sample preparation methods on the biological interpretation of signaling outcome is unclear. Here, we demonstrate a strong bias in phosphorylation signaling targets uncovered by comparing the phosphoproteomes generated by two commonly used methods: strong cation exchange chromatography-based phosphoproteomics (SCXPhos) and single-run high-throughput phosphoproteomics (HighPhos). Phosphoproteomes of embryonic stem cells exposed to ionizing radiation (IR) profiled by both methods achieved equivalent coverage (around 20,000 phosphosites), whereas a combined dataset significantly increased the depth (>30,000 phosphosites). While both methods reproducibly quantified a subset of shared IR-responsive phosphosites that represent DNA damage and cell-cycle-related signaling events, most IR-responsive phosphoproteins (>82%) and phosphosites (>96%) were method-specific. Both methods uncovered unique insights into phospho-signaling mediated by single (SCXPhos) versus double/multi-site (HighPhos) phosphorylation events; particularly, each method identified a distinct set of previously unreported IR-responsive kinome/phosphatome (95% disparate) directly impacting the uncovered biology.
|
42
|
In-Depth Annotation of the Drosophila Bithorax-Complex Reveals the Presence of Several Alternative ORFs That Could Encode for Motif-Rich Peptides. Cells 2021; 10:cells10112983. [PMID: 34831206 PMCID: PMC8616405 DOI: 10.3390/cells10112983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Revised: 10/17/2021] [Accepted: 10/26/2021] [Indexed: 11/19/2022] Open
Abstract
It is recognized that a large proportion of eukaryotic RNAs and proteins is not produced from conventional genes but from short and alternative (alt) open reading frames (ORFs) that are not captured by gene prediction programs. Here we present an in silico prediction of altORFs by applying several selection filters based on evolutionary conservation and annotations of previously characterized altORF peptides. Our work was performed in the Bithorax-complex (BX-C), which was one of the first genomic regions described to contain long non-coding RNAs in Drosophila. We showed that several altORFs could be predicted from coding and non-coding sequences of BX-C. In addition, the selected altORFs encode proteins that contain several interesting molecular features, such as the presence of transmembrane helices or a general propensity to be rich in short interaction motifs. Of particular interest, one altORF encodes a protein that contains a peptide sequence found in specific isoforms of two Drosophila Hox proteins. Our work thus suggests that several altORF proteins could be produced from a particular genomic region known for its critical role during Drosophila embryonic development. The molecular signatures of these altORF proteins further suggest that several of them could make numerous protein–protein interactions and be of functional importance in vivo.
|
43
|
Sergiev PV, Rubtsova MP. Little but Loud. The Diversity of Functions of Small Proteins and Peptides - Translational Products of Short Reading Frames. Biochemistry (Moscow) 2021; 86:1139-1150. [PMID: 34565317 DOI: 10.1134/s0006297921090091] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]
Abstract
Cell functioning is a tightly regulated process. For many years, research in the fields of proteomics and functional genomics has been focused on the role of proteins in cell functioning. Advances in science have revealed that short open reading frames, previously considered non-functional, serve a variety of functions. Short reading frames in polycistronic mRNAs often regulate the stability of these mRNAs and the translational efficiency of the main reading frame. The improvement of proteomic analysis methods has made it possible to identify the products of translation of short open reading frames in quantities that suggest a functional role for those peptides and short proteins. Studies demonstrating their role reveal a new level of the regulation of cell functioning and its adaptation to changing conditions. This review is devoted to the analysis of functions of recently discovered peptides and short proteins.
Affiliation(s)
- Petr V Sergiev
- Faculty of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia; Skoltech Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143025, Russia; Institute of Functional Genomics, Lomonosov Moscow State University, Moscow, 119991, Russia
- Maria P Rubtsova
- Faculty of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
|
44
|
Nomura Y, Dohmae N. Discovery of a small protein-encoding cis-regulatory overlapping gene of the tumor suppressor gene Scribble in humans. Commun Biol 2021; 4:1098. [PMID: 34535749 PMCID: PMC8448870 DOI: 10.1038/s42003-021-02619-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/23/2020] [Accepted: 08/30/2021] [Indexed: 12/26/2022] Open
Abstract
Intensive gene annotation has revealed many functional and regulatory elements in the human genome. Although eukaryotic protein-coding genes are generally transcribed into monocistronic mRNAs, recent studies have discovered additional short open reading frames (sORFs) in mRNAs. Here, we performed proteogenomic data mining for hidden proteins categorized into sORF-encoded polypeptides (SEPs) in human cancers. We identified a new SEP-encoding overlapping sORF (oORF) on the cell polarity determinant Scribble (SCRIB) that is considered a proto-oncogene with tumor suppressor function in Hippo-YAP/TAZ, MAPK/ERK, and PI3K/Akt/mTOR signaling. Reanalysis of clinical human proteomic data revealed translational dysregulation of both SCRIB and its oORF, oSCRIB, during carcinogenesis. Biochemical analyses suggested that the translatable oSCRIB constitutively limits the capacity of eukaryotic ribosomes to translate the downstream SCRIB. These findings provide a new example of cis-regulatory oORFs that function as a ribosomal roadblock and potentially serve as a fail-safe mechanism to normal cells for non-excessive downstream gene expression, which is hijacked in cancer. Yuhta Nomura and Naoshi Dohmae report the discovery of a small protein-coding gene that overlaps the tumor suppressor gene Scribble. Their data suggest that the overlapping gene, oSCRIB, limits the translation of downstream Scribble and may have important implications in cancer.
Affiliation(s)
- Yuhta Nomura
- Biomolecular Characterization Unit, RIKEN Center for Sustainable Resource Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Naoshi Dohmae
- Biomolecular Characterization Unit, RIKEN Center for Sustainable Resource Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
|
45
|
Brunet MA, Lekehal AM, Roucou X. How to Illuminate the Dark Proteome Using the Multi-omic OpenProt Resource. Curr Protoc Bioinformatics 2021; 71:e103. [PMID: 32780568 DOI: 10.1002/cpbi.103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
Tens of thousands of open reading frames (ORFs) are hidden within genomes. These alternative ORFs, or small ORFs, have eluded annotations because they are either small or within unsuspected locations. They are found in untranslated regions or overlap a known coding sequence in messenger RNA and anywhere in a "non-coding" RNA. Serendipitous discoveries have highlighted these ORFs' importance in biological functions and pathways. With their discovery came the need for deeper ORF annotation and large-scale mining of public repositories to gather supporting experimental evidence. OpenProt, accessible at https://openprot.org/, is the first proteogenomic resource enforcing a polycistronic model of annotation across an exhaustive transcriptome for 10 species. Moreover, OpenProt reports experimental evidence cumulated across a re-analysis of 114 mass spectrometry and 87 ribosome profiling datasets. The multi-omics OpenProt resource also includes the identification of predicted functional domains and evaluation of conservation for all predicted ORFs. The OpenProt web server provides two query interfaces and one genome browser. The query interfaces allow for exploration of the coding potential of genes or transcripts of interest as well as custom downloads of all information contained in OpenProt. © 2020 The Authors. Basic Protocol 1: Using the Search interface. Basic Protocol 2: Using the Downloads interface.
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Amina M Lekehal
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Québec, Canada
|
46
|
Carbonara K, Andonovski M, Coorssen JR. Proteomes Are of Proteoforms: Embracing the Complexity. Proteomes 2021; 9:38. [PMID: 34564541 PMCID: PMC8482110 DOI: 10.3390/proteomes9030038] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 08/01/2021] [Revised: 08/24/2021] [Accepted: 08/29/2021] [Indexed: 12/17/2022] Open
Abstract
Proteomes are complex, much more so than genomes or transcriptomes. Thus, simplifying their analysis does not simplify the issue. Proteomes are of proteoforms, not canonical proteins. While having a catalogue of amino acid sequences provides invaluable information, this is the Proteome-lite. To dissect biological mechanisms and identify critical biomarkers/drug targets, we must assess the myriad of proteoforms that arise at any point before, after, and between translation and transcription (e.g., isoforms, splice variants, and post-translational modifications [PTM]), as well as newly defined species. There are numerous analytical methods currently used to address proteome depth and here we critically evaluate these in terms of the current 'state-of-the-field'. We thus discuss both pros and cons of available approaches and where improvements or refinements are needed to quantitatively characterize proteomes. To enable a next-generation approach, we suggest that advances lie in transdisciplinarity via integration of current proteomic methods to yield a unified discipline that capitalizes on the strongest qualities of each. Such a necessary (if not revolutionary) shift cannot be accomplished by a continued primary focus on proteo-genomics/-transcriptomics. We must embrace the complexity. Yes, these are the hard questions, and this will not be easy…but where is the fun in easy?
Affiliation(s)
- Jens R. Coorssen
- Faculties of Applied Health Sciences and Mathematics & Science, Departments of Health Sciences and Biological Sciences, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON L2S 3A1, Canada
|
47
|
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] Open
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
|
48
|
Silva J, Nina P, Romão L. Translation of ABCE1 Is Tightly Regulated by Upstream Open Reading Frames in Human Colorectal Cells. Biomedicines 2021; 9:biomedicines9080911. [PMID: 34440115 PMCID: PMC8389594 DOI: 10.3390/biomedicines9080911] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/16/2021] [Accepted: 07/26/2021] [Indexed: 11/29/2022] Open
Abstract
ATP-binding cassette subfamily E member 1 (ABCE1) belongs to the ABC protein family of transporters; however, it does not behave as a drug transporter. Instead, ABCE1 actively participates in different stages of translation and is also associated with oncogenic functions. Ribosome profiling analysis in colorectal cancer cells has revealed a high ribosome occupancy in the human ABCE1 mRNA 5′-leader sequence, indicating the presence of translatable upstream open reading frames (uORFs). These cis-acting translational regulatory elements usually act as repressors of translation of the main coding sequence. In the present study, we dissect the regulatory function of the five AUG and five non-AUG uORFs identified in the human ABCE1 mRNA 5′-leader sequence. We show that the expression of the main coding sequence is tightly regulated by the ABCE1 AUG uORFs in colorectal cells. Our results are consistent with a model wherein uORF1 is efficiently translated, behaving as a barrier to downstream uORF translation. The few ribosomes that can bypass uORF1 (and/or uORF2) must probably initiate at the inhibitory uORF3 or uORF5 that efficiently repress translation of the main ORF. This inhibitory property is slightly overcome in conditions of endoplasmic reticulum stress. In addition, we observed that these potent translation-inhibitory AUG uORFs function equally in cancer and in non-tumorigenic colorectal cells, which is consistent with a lack of oncogenic function. In conclusion, we establish human ABCE1 as an additional example of uORF-mediated translational regulation and that this tight regulation contributes to control ABCE1 protein levels in different cell environments.
Affiliation(s)
- Joana Silva
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal
- Instituto de Biossistemas e Ciências Integrativas (BioISI), Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Pedro Nina
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal
- Luísa Romão
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal
- Instituto de Biossistemas e Ciências Integrativas (BioISI), Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Correspondence: Tel.: +351-21-750-8155
|
49
|
Guerra-Almeida D, Tschoeke DA, da-Fonseca RN. Understanding small ORF diversity through a comprehensive transcription feature classification. DNA Res 2021; 28:6317669. [PMID: 34240112 PMCID: PMC8435553 DOI: 10.1093/dnares/dsab007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 12/22/2020] [Indexed: 11/13/2022] Open
Abstract
Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in noncanonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into nonexpressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in noncoding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
Affiliation(s)
- Diego Guerra-Almeida
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Diogo Antonio Tschoeke
- Alberto Luiz Coimbra Institute of Graduate Studies and Engineering Research (COPPE), Biomedical Engineering Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Rodrigo Nunes-da-Fonseca
- Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; National Institute of Science and Technology in Molecular Entomology, Rio de Janeiro, Brazil
|
50
|
In-depth proteomics analysis of sentinel lymph nodes from individuals with endometrial cancer. CELL REPORTS MEDICINE 2021; 2:100318. [PMID: 34195683 PMCID: PMC8233695 DOI: 10.1016/j.xcrm.2021.100318] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/22/2020] [Revised: 12/17/2020] [Accepted: 05/20/2021] [Indexed: 12/18/2022]
Abstract
Endometrial cancer (EC) is one of the most common gynecological cancers worldwide. Sentinel lymph node (SLN) status could be a major prognostic factor in evaluation of EC, but several prospective studies need to be performed. Here we report an in-depth proteomics analysis showing significant variations in the SLN protein landscape in EC. We show that SLNs are correlated to each tumor grade, which strengthens evidence of SLN involvement in EC. A few proteins are overexpressed specifically at each EC tumor grade and in the corresponding SLN. These proteins, which are significantly variable in both locations, should be considered potential markers of overall survival. Five major proteins for EC and SLN (PRSS3, PTX3, ASS1, ALDH2, and ANXA1) were identified in large-scale proteomics and validated by immunohistochemistry. This study improves stratification and diagnosis of individuals with EC as a result of proteomics profiling of SLNs.
|