1
de Azevedo ALK, Gomig THB, Batista M, de Oliveira JC, Cavalli IJ, Gradia DF, Ribeiro EMDSF. Peptidomics and Machine Learning-based Evaluation of Noncoding RNA-Derived Micropeptides in Breast Cancer: Expression Patterns and Functional/Therapeutic Insights. Lab Invest 2024; 104:102150. [PMID: 39393531 DOI: 10.1016/j.labinv.2024.102150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 09/20/2024] [Accepted: 10/03/2024] [Indexed: 10/13/2024]
Abstract
Breast cancer is a highly heterogeneous disease characterized by different subtypes arising from molecular alterations that give the disease different phenotypes, clinical behaviors, and prognoses. The noncoding RNA (ncRNA)-derived micropeptides (MPs) represent a novel layer of complexity in cancer research, since they can be biologically active and show potential as biomarkers and therapeutics. However, few large-scale studies address the expression of these peptides at the peptidomics level or evaluate their functions and potential in peptide-based therapeutics for breast cancer. In this study, we aimed to map the landscape of ncRNA-derived MPs across breast cancer subtypes in more depth and to advance understanding of the relevance of these molecules to the disease. First, we constructed a data set of 16,349 unique putative MP sequences by integrating 2 previously published lists of predicted ncRNA-derived MPs. We evaluated their expression in high-throughput mass spectrometry data of breast tumor samples from different subtypes. Next, we applied several machine and deep learning tools, such as AntiCP 2.0, MULocDeep, PEPstrMOD, Peptipedia, and PreAIP, to predict their functions, cellular localization, tertiary structure, physicochemical features, and other properties related to therapeutics. We identified 58 peptides expressed in breast tissue, including 27 differentially expressed MPs in tumor compared with nontumor samples and MPs exhibiting tumor or subtype specificity. These peptides presented physicochemical features compatible with the canonical proteome and were predicted to influence the tumor immune environment and participate in cell communication, metabolism, and signaling processes. In addition, some MPs presented potential as anticancer, antiinflammatory, and antiangiogenic molecules.
Our data demonstrate that MPs derived from ncRNAs have expression patterns associated with specific breast cancer subtypes and tumor specificity, thus highlighting their potential as biomarkers for molecular classification. We also reinforce the relevance of MPs as biologically active molecules that play a role in breast tumorigenesis, in addition to their potential in peptide-based therapeutics.
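As a rough illustration of two steps in a pipeline like the one described in this abstract, the Python sketch below merges two hypothetical lists of predicted MP sequences into one deduplicated set and scores each sequence with one standard physicochemical descriptor, the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale. The abstract does not say which descriptors or merging rules the authors used; `merge_unique` and `gravy` are illustrative names, and GRAVY is chosen only because it is a common, easily checked feature.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def merge_unique(*sequence_lists):
    """Hypothetical helper: union of several predicted-MP sequence lists.

    Sequences are upper-cased and deduplicated; first-seen order is kept.
    """
    seen = {}
    for sequences in sequence_lists:
        for seq in sequences:
            seen.setdefault(seq.upper(), None)
    return list(seen)


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = peptide.upper()
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if not residues or unknown:
        raise ValueError(f"empty peptide or non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    for seq in merge_unique(["MKL", "PEP"], ["pep", "AAA"]):
        print(seq, round(gravy(seq), 3))
```

For example, merge_unique(["MKL", "PEP"], ["pep", "AAA"]) keeps three sequences because matching is case-insensitive, and gravy("AR") averages 1.8 and -4.5 to -1.35. Positive GRAVY values indicate an overall hydrophobic peptide.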
Affiliation(s)
- Michel Batista
- Laboratory of Applied Sciences and Technologies in Health, Carlos Chagas Institute, Fiocruz, Curitiba, Brazil; Mass Spectrometry Facility-RPT02H, Carlos Chagas Institute, Fiocruz, Curitiba, Brazil
- Iglenir João Cavalli
- Genetics Post-Graduation Program, Genetics Department, Federal University of Paraná, Curitiba, Brazil
- Daniela Fiori Gradia
- Genetics Post-Graduation Program, Genetics Department, Federal University of Paraná, Curitiba, Brazil

2
Deutsch EW, Kok LW, Mudge JM, Ruiz-Orera J, Fierro-Monti I, Sun Z, Abelin JG, Alba MM, Aspden JL, Bazzini AA, Bruford EA, Brunet MA, Calviello L, Carr SA, Carvunis AR, Chothani S, Clauwaert J, Dean K, Faridi P, Frankish A, Hubner N, Ingolia NT, Magrane M, Martin MJ, Martinez TF, Menschaert G, Ohler U, Orchard S, Rackham O, Roucou X, Slavoff SA, Valen E, Wacholder A, Weissman JS, Wu W, Xie Z, Choudhary J, Bassani-Sternberg M, Vizcaíno JA, Ternette N, Moritz RL, Prensner JR, van Heesch S. High-quality peptide evidence for annotating non-canonical open reading frames as human proteins. bioRxiv 2024:2024.09.09.612016. [PMID: 39314370 PMCID: PMC11419116 DOI: 10.1101/2024.09.09.612016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
A major scientific drive is to characterize the protein-coding genome as it provides the primary basis for the study of human health. But the fundamental question remains: what has been missed in prior genomic analyses? Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for proteomics, genomics, and clinical science. However, the impact of ncORFs has been limited by the absence of a large-scale understanding of their contribution to the human proteome. Here, we report the collaborative efforts of stakeholders in proteomics, immunopeptidomics, Ribo-seq ORF discovery, and gene annotation, to produce a consensus landscape of protein-level evidence for ncORFs. We show that at least 25% of a set of 7,264 ncORFs give rise to translated gene products, yielding over 3,000 peptides in a pan-proteome analysis encompassing 3.8 billion mass spectra from 95,520 experiments. With these data, we developed an annotation framework for ncORFs and created public tools for researchers through GENCODE and PeptideAtlas. This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, diverse animals and plants where ncORFs are similarly observed.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, WA, 98109, USA
- Leron W Kok
- Princess Máxima Center for Pediatric Oncology, Utrecht, 3584 CS, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
- Ivo Fierro-Monti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Zhi Sun
- Institute for Systems Biology (ISB), Seattle, WA, 98109, USA
- M Mar Alba
- Hospital del Mar Research Institute, Barcelona, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
- Julie L Aspden
- School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Ariel A Bazzini
- Stowers Institute for Medical Research, Kansas City, MO, 64110, USA
- Department of Molecular and Integrative Physiology, University of Kansas Medical Center, Kansas City, KS, 66160, USA
- Elspeth A Bruford
- HUGO Gene Nomenclature Committee (HGNC), Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Marie A Brunet
- Pediatrics Department, University of Sherbrooke, Sherbrooke, Québec, Canada
- Centre de Recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, Québec, Canada
- Steven A Carr
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Sonia Chothani
- Centre for Computational Biology and Program in Cardiovascular and Metabolic Disorders, Duke-NUS (National University of Singapore) Medical School, Singapore
- Jim Clauwaert
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Kellie Dean
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Pouya Faridi
- Centre for Cancer Research, Hudson Institute of Medical Research, Clayton, VIC, Australia
- Monash Proteomics & Metabolomics Platform, Department of Medicine, School of Clinical Sciences, Monash University, Clayton, VIC, Australia
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Norbert Hubner
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany
- Charité-Universitätsmedizin Berlin, Berlin, 10117, Germany
- Helmholtz-Institute for Translational AngioCardioScience (HI-TAC) of the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) at Heidelberg University, Heidelberg, 69117, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, 13347, Germany
- Nicholas T Ingolia
- Department of Molecular and Cell Biology, Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720-3202, USA
- Michele Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Maria Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Thomas F Martinez
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, 92617, USA
- Department of Biological Chemistry, University of California, Irvine, Irvine, CA, 92617, USA
- Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, 92617, USA
- Gerben Menschaert
- Biobix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium
- Uwe Ohler
- Department of Biology, Humboldt University Berlin, Berlin, 10117, Germany
- Berlin Institute of Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, 10115, Germany
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, 06516, USA
- Eivind Valen
- Department of Biosciences, University of Oslo, Oslo, Norway
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Jonathan S Weissman
- Whitehead Institute for Biomedical Research, Cambridge, MA, 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02138, USA
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Wei Wu
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore
- Department of Pharmacy & Pharmaceutical Sciences, National University of Singapore (NUS), Singapore
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jyoti Choudhary
- Functional Proteomics Group, Institute of Cancer Research, Chester Beatty Labs, London, SW3 6JB, UK
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, 1005, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Lausanne, 1005, Switzerland
- Agora Cancer Research Centre, Lausanne, 1011, Switzerland
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Nicola Ternette
- School of Life Sciences, Division Cell Signalling and Immunology, University of Dundee, Dundee, DD1 5EH, UK
- Centre for Immuno-Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Robert L Moritz
- Institute for Systems Biology (ISB), Seattle, WA, 98109, USA
- John R Prensner
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Utrecht, 3584 CS, The Netherlands
- Oncode Institute, Utrecht, The Netherlands

3
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. PLoS Comput Biol 2023; 19:e1011526. [PMID: 37824580 PMCID: PMC10597526 DOI: 10.1371/journal.pcbi.1011526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 10/24/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023]
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
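The "three-nucleotide periodicity" mentioned in this abstract is the signal that classical spectral gene-finding methods measure directly. The sketch below is not LFNet, which is a learned network; it is a minimal version of the older signal-processing idea, assuming the common indicator-sequence formulation: one 0/1 track per base, with the power of each track's Fourier coefficient at frequency 1/3 summed over the four bases. Coding regions tend to show a peak at this frequency, while sequences without codon structure do not. The function name `period3_power` is illustrative.

```python
import cmath


def period3_power(seq):
    """Summed spectral power at frequency 1/3 cycle per nucleotide.

    One 0/1 indicator track is built per base (A, C, G, T); for each track the
    discrete Fourier coefficient at f = 1/3 is the sum of exp(-2*pi*i*n/3) over
    the positions n that hold that base. The squared magnitudes are summed.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            (cmath.exp(-2j * cmath.pi * n / 3) for n, nt in enumerate(seq) if nt == base),
            0j,
        )
        total += abs(coeff) ** 2
    return total


if __name__ == "__main__":
    print(round(period3_power("ATG" * 30), 6))  # strongly periodic repeat
    print(round(period3_power("A" * 90), 6))    # no codon structure
```

A perfectly periodic repeat such as "ATG" * 30 gives 3 × 30² = 2700, whereas a homopolymer whose length is a multiple of 3 gives essentially 0 because its three phase contributions cancel. The raw value grows with sequence length, so windows of equal size (or a length-normalized score) are needed when comparing transcripts.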
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, United States of America

4
Prensner JR, Abelin JG, Kok LW, Clauser KR, Mudge JM, Ruiz-Orera J, Bassani-Sternberg M, Moritz RL, Deutsch EW, van Heesch S. What Can Ribo-Seq, Immunopeptidomics, and Proteomics Tell Us About the Noncanonical Proteome? Mol Cell Proteomics 2023; 22:100631. [PMID: 37572790 PMCID: PMC10506109 DOI: 10.1016/j.mcpro.2023.100631] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 05/14/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/14/2023]
Abstract
Ribosome profiling (Ribo-Seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of noncanonical sites of ribosome translation outside the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7000 noncanonical ORFs are translated, which, at first glance, has the potential to expand the number of human protein CDSs by 30%, from ∼19,500 annotated CDSs to over 26,000 annotated CDSs. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of noncanonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome but searching for guidance on how to proceed. Here, we discuss the current state of noncanonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein coding."
Affiliation(s)
- John R Prensner
- Division of Pediatric Hematology/Oncology, Department of Pediatrics, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Leron W Kok
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Karl R Clauser
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, Agora Center Bugnon 25A, University of Lausanne, Lausanne, Switzerland; Department of Oncology, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; Agora Cancer Research Centre, Lausanne, Switzerland
- Robert L Moritz
- Institute for Systems Biology (ISB), Seattle, Washington, USA
- Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington, USA

5
Prensner JR, Abelin JG, Kok LW, Clauser KR, Mudge JM, Ruiz-Orera J, Bassani-Sternberg M, Deutsch EW, van Heesch S. What can Ribo-seq and proteomics tell us about the non-canonical proteome? bioRxiv 2023:2023.05.16.541049. [PMID: 37292611 PMCID: PMC10245706 DOI: 10.1101/2023.05.16.541049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Ribosome profiling (Ribo-seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of non-canonical sites of ribosome translation outside of the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7,000 non-canonical open reading frames (ORFs) are translated, which, at first glance, has the potential to expand the number of human protein-coding sequences by 30%, from ∼19,500 annotated CDSs to over 26,000. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of non-canonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome, but searching for guidance on how to proceed. Here, we discuss the current state of non-canonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein-coding".
In brief: The human genome encodes thousands of non-canonical open reading frames (ORFs) in addition to protein-coding genes. Because the field is nascent, many questions remain regarding non-canonical ORFs. How many exist? Do they encode proteins? What level of evidence is needed for their verification? Central to these debates has been the advent of ribosome profiling (Ribo-seq) as a method to discern genome-wide ribosome occupancy, and immunopeptidomics as a method to detect peptides that are processed and presented by MHC molecules and not observed in traditional proteomics experiments. This article provides a synthesis of the current state of non-canonical ORF research and proposes standards for their future investigation and reporting.
Highlights:
- Combined use of Ribo-seq and proteomics-based methods enables optimal confidence in detecting non-canonical ORFs and their protein products.
- Ribo-seq can provide more sensitive detection of non-canonical ORFs, but data quality and analytical pipelines will impact results.
- Non-canonical ORF catalogs are diverse and span both high-stringency and low-stringency ORF nominations.
- A framework for standardized non-canonical ORF evidence will advance the research field.
Affiliation(s)
- John R. Prensner
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Leron W. Kok
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
- Karl R. Clauser
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center Bugnon 25A, 1005 Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1005 Lausanne, Switzerland
- Agora Cancer Research Centre, 1011 Lausanne, Switzerland
- Eric W. Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands

6
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. bioRxiv 2023:2023.04.03.535488. [PMID: 37066250 PMCID: PMC10104019 DOI: 10.1101/2023.04.03.535488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA

7
Claeys T, Menu M, Bouwmeester R, Gevaert K, Martens L. Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins. J Proteome Res 2023; 22:1181-1192. [PMID: 36963412 PMCID: PMC10088018 DOI: 10.1021/acs.jproteome.2c00644] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/26/2023]
Abstract
Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot and tissue/cell type annotation was manually added. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, a one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model was able to predict tissues with 98% accuracy, and cell types with 99% accuracy. The F-scores give a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identifying the tissue of origin of proteins in complex samples such as liquid biopsies to studying the proteome of tissue-like samples such as organoids and cell lines.
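The abstract evaluates a Random Forest with per-class (one-vs-all) F-scores and inspects which tissues get confused with each other. The sketch below does not train a model; it only shows, on toy labels, how accuracy, one-vs-all F1, and the most frequent confusion pairs can be computed from lists of true and predicted classes. The function names and tissue labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_all_f1(y_true, y_pred):
    """Per-class F1, treating each class in turn as positive and all others as negative."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def confusions(y_true, y_pred):
    """Off-diagonal (true, predicted) pairs, most frequent first."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p).most_common()


if __name__ == "__main__":
    y_true = ["liver", "liver", "heart", "heart", "muscle"]
    y_pred = ["liver", "liver", "muscle", "heart", "muscle"]
    print(accuracy(y_true, y_pred))
    print(one_vs_all_f1(y_true, y_pred))
    print(confusions(y_true, y_pred))
```

With the toy labels in the demo, one heart sample is predicted as muscle: accuracy is 0.8, liver scores F1 = 1.0, heart and muscle each score 2/3, and the only confusion pair is (heart, muscle), the kind of error between physiologically similar tissues the abstract describes.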
Affiliation(s)
- Tine Claeys
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Maxime Menu
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Kris Gevaert
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium

8
Functional Relationships between Long Non-Coding RNAs and Estrogen Receptor Alpha: A New Frontier in Hormone-Responsive Breast Cancer Management. Int J Mol Sci 2023; 24:1145. [PMID: 36674656 PMCID: PMC9863308 DOI: 10.3390/ijms24021145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 01/03/2023] [Accepted: 01/05/2023] [Indexed: 01/11/2023]
Abstract
In the complex and intricate machinery of the human genome, less than 2% encodes proteins, while at least 75% is actively transcribed into non-coding RNAs (ncRNAs). Among the non-coding transcripts, those ≥200 nucleotides long (lncRNAs) are receiving growing attention for their involvement in human diseases, particularly cancer. Genomic studies have revealed the multiplicity of processes, including neoplastic transformation and tumor progression, in which lncRNAs are involved, regulating gene expression at epigenetic, transcriptional, and post-transcriptional levels by mechanisms that still need to be clarified. In breast cancer, several lncRNAs have been identified and shown to have either oncogenic or tumor-suppressive roles. Understanding the mechanisms of lncRNA action in this disease could open the way to translational applications, as these molecules may serve as novel biomarkers of clinical use and potential therapeutic targets. This review highlights the relationship between lncRNAs and the principal hallmark of the luminal breast cancer phenotype, estrogen receptor α (ERα), providing an overview of new potential ways to inhibit estrogenic signaling via this nuclear receptor in order to overcome resistance to endocrine therapy.

9
Bogaert A, Fijalkowska D, Staes A, Van de Steene T, Demol H, Gevaert K. Limited evidence for protein products of non-coding transcripts in the HEK293T cellular cytosol. Mol Cell Proteomics 2022; 21:100264. [PMID: 35788065 PMCID: PMC9396073 DOI: 10.1016/j.mcpro.2022.100264] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/21/2022] [Revised: 06/22/2022] [Accepted: 06/30/2022] [Indexed: 10/25/2022]
Abstract
Ribosome profiling has revealed translation outside of canonical coding sequences (CDSs) including translation of short upstream ORFs, long non-coding RNAs, overlapping ORFs, ORFs in UTRs or ORFs in alternative reading frames. Studies combining mass spectrometry, ribosome profiling and CRISPR-based screens showed that hundreds of ORFs derived from non-coding transcripts produce (micro)proteins, while other studies failed to find evidence for such types of non-canonical translation products. Here, we attempted to discover translation products from non-coding regions by strongly reducing the complexity of the sample prior to mass spectrometric analysis. We used an extended database as the search space and applied stringent filtering of the identified peptides to find evidence for novel translation events. We show that, theoretically, our strategy facilitates the detection of translation events of transcripts from non-coding regions, but experimentally we find only 19 peptides that might originate from such translation events. Finally, Virotrap-based interactome analysis of two N-terminal proteoforms originating from non-coding regions showed the functional potential of these novel proteins.
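A minimal sketch of one kind of stringent peptide filter described in this abstract, assuming two rules that are common in non-canonical peptide searches but are not confirmed by the abstract itself: discard peptides shorter than a minimum length, and discard any peptide that also occurs in a canonical protein once isoleucine and leucine are treated as identical (they have the same mass and are not distinguished by standard MS/MS). The function name `novel_peptides` and the default minimum length of 8 are illustrative choices.

```python
def novel_peptides(peptides, canonical_proteins, min_length=8):
    """Hypothetical filter: keep peptides that cannot be explained by canonical proteins.

    A peptide is dropped if it is shorter than min_length or if it occurs as a
    substring of any canonical protein after I and L are collapsed to one symbol.
    The original peptide strings are returned unchanged.
    """
    def collapse_il(s):
        return s.upper().replace("I", "L")

    canonical = [collapse_il(p) for p in canonical_proteins]
    kept = []
    for pep in peptides:
        query = collapse_il(pep)
        if len(query) >= min_length and not any(query in prot for prot in canonical):
            kept.append(pep)
    return kept


if __name__ == "__main__":
    canonical = ["MKTAYIAKQRQISFVKSHFSRQ"]
    print(novel_peptides(["AYLAKQRQ", "PEPTLDEK", "PEPK"], canonical))
```

With the canonical sequence in the demo, "AYLAKQRQ" is rejected because it matches "AYIAKQRQ" under I/L equivalence, "PEPK" is rejected for length, and "PEPTLDEK" is kept as a candidate novel peptide. Substring scanning is fine for a sketch; a proteome-scale search would use an index instead.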
Affiliation(s)
- Annelies Bogaert
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Daria Fijalkowska
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- An Staes
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Tessa Van de Steene
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Hans Demol
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
- Kris Gevaert
- VIB Center for Medical Biotechnology, VIB, Ghent, 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, 9052, Belgium
10
Luo X, Huang Y, Li H, Luo Y, Zuo Z, Ren J, Xie Y. SPENCER: a comprehensive database for small peptides encoded by noncoding RNAs in cancer patients. Nucleic Acids Res 2022; 50:D1373-D1381. [PMID: 34570216 PMCID: PMC8728293 DOI: 10.1093/nar/gkab822] [Received: 08/14/2021] [Revised: 09/03/2021] [Accepted: 09/08/2021] [Indexed: 01/07/2023]
Abstract
As an increasing number of noncoding RNAs (ncRNAs) have been suggested to encode short bioactive peptides in cancer, the exploration of ncRNA-encoded small peptides (ncPEPs) is emerging as a fascinating field in cancer research. To assist in studies on the regulatory mechanisms of ncPEPs, we describe here a database called SPENCER (http://spencer.renlab.org). Currently, SPENCER has collected a total of 2806 mass spectrometry (MS) data points from 55 studies, covering 1007 tumor samples and 719 normal samples. Using an MS-based proteomics analysis pipeline, SPENCER identified 29 526 ncPEPs across 15 different cancer types. Specifically, 22 060 of these ncPEPs were experimentally validated in other studies. By comparing tumor and normal samples, the identified ncPEPs were divided into four expression groups: tumor-specific, upregulated in cancer, downregulated in cancer, and others. Additionally, since ncPEPs are potential targets for neoantigen-based cancer immunotherapy, SPENCER also predicted the immunogenicity of all the identified ncPEPs by assessing their MHC-I binding affinity, stability, and TCR recognition probability. As a result, 4497 ncPEPs curated in SPENCER were predicted to be immunogenic. Overall, SPENCER will be a useful resource for investigating cancer-associated ncPEPs and may boost further research in cancer.
Affiliation(s)
- Xiaotong Luo
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Yuantai Huang
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Huiqin Li
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- Yihai Luo
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- Zhixiang Zuo
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Jian Ren
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Yubin Xie
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
11
Parmar BS, Peeters MKR, Boonen K, Clark EC, Baggerman G, Menschaert G, Temmerman L. Identification of Non-Canonical Translation Products in C. elegans Using Tandem Mass Spectrometry. Front Genet 2021; 12:728900. [PMID: 34759956 PMCID: PMC8575065 DOI: 10.3389/fgene.2021.728900] [Received: 06/22/2021] [Accepted: 09/16/2021] [Indexed: 11/22/2022]
Abstract
Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly containing splice variants, ncRNA, sORFs and altORFs. However, identification and characterization of products that may be translated out of these remains a challenge. Addressing this, we here report on 552 non-canonical proteins and splice variants in the model organism C. elegans using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of C. elegans. Using this database, we mined available mass spectrometric resources of C. elegans, from which 51 novel, non-canonical proteins could be identified. Furthermore, we utilized diverse proteomic and peptidomic strategies to detect 40 novel non-canonical proteins in C. elegans by LC-TIMS-MS/MS, of which 6 were common with our meta-analysis of existing resources. Together, this permits us to provide a resource with detailed annotation of 467 splice variants and 85 novel proteins mapped onto UTRs, non-coding regions and alternative open reading frames of the C. elegans genome.
Affiliation(s)
- Bhavesh S. Parmar
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
- Marlies K. R. Peeters
- Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
- Kurt Boonen
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Ellie C. Clark
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
- Geert Baggerman
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Gerben Menschaert
- Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
- Liesbet Temmerman
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
12
Andjus S, Morillon A, Wery M. From Yeast to Mammals, the Nonsense-Mediated mRNA Decay as a Master Regulator of Long Non-Coding RNAs Functional Trajectory. Noncoding RNA 2021; 7:ncrna7030044. [PMID: 34449682 PMCID: PMC8395947 DOI: 10.3390/ncrna7030044] [Received: 06/15/2021] [Revised: 07/22/2021] [Accepted: 07/25/2021] [Indexed: 12/22/2022]
Abstract
The Nonsense-Mediated mRNA Decay (NMD) has been classically viewed as a translation-dependent RNA surveillance pathway degrading aberrant mRNAs containing premature stop codons. However, it is now clear that mRNA quality control represents only one face of the multiple functions of NMD. Indeed, NMD also regulates the physiological expression of normal mRNAs, and more surprisingly, of long non-coding (lnc)RNAs. Here, we review the different mechanisms of NMD activation in yeast and mammals, and we discuss the molecular bases of the NMD sensitivity of lncRNAs, considering the functional roles of NMD and of translation in the metabolism of these transcripts. In this regard, we describe several examples of functional micropeptides produced from lncRNAs. We propose that translation and NMD provide potent means to regulate the expression of lncRNAs, which might be critical for the cell to respond to environmental changes.
Affiliation(s)
- Sara Andjus
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL University, Sorbonne Université, CNRS UMR3244, 26 Rue d’Ulm, CEDEX 05, F-75248 Paris, France
- Antonin Morillon
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, 26 Rue d’Ulm, CEDEX 05, F-75248 Paris, France
- Correspondence: (A.M.); (M.W.)
- Maxime Wery
- ncRNA, Epigenetic and Genome Fluidity, Institut Curie, Sorbonne Université, CNRS UMR3244, 26 Rue d’Ulm, CEDEX 05, F-75248 Paris, France
- Correspondence: (A.M.); (M.W.)
13
Zhang Q, Wu E, Tang Y, Cai T, Zhang L, Wang J, Hao Y, Zhang B, Zhou Y, Guo X, Luo J, Chen R, Yang F. Deeply Mining a Universe of Peptides Encoded by Long Noncoding RNAs. Mol Cell Proteomics 2021; 20:100109. [PMID: 34129944 PMCID: PMC8335655 DOI: 10.1016/j.mcpro.2021.100109] [Received: 03/03/2021] [Revised: 05/16/2021] [Accepted: 06/02/2021] [Indexed: 11/22/2022]
Abstract
Many small ORFs embedded in long noncoding RNA (lncRNA) transcripts have been shown to encode biologically functional polypeptides (small ORF-encoded polypeptides [SEPs]) in different organisms. Although some novel SEPs have been found, their identification is still hampered by their poor predictability, diminutive size, and low relative abundance. Here, we take advantage of NONCODE, a repository containing the most complete collection and annotation of lncRNA transcripts from different species, to build a novel database that attempts to maximize a collection of SEPs from human and mouse lncRNA transcripts. In order to further improve SEP discovery, we implemented two effective and complementary polypeptide enrichment strategies using a 30-kDa molecular weight cutoff filter and a C8 solid-phase extraction column. These combined strategies enabled us to discover 353 SEPs from eight human cell lines and 409 SEPs from three mouse cell lines and eight mouse tissues. Importantly, 19 of them were then verified through in vitro expression, immunoblotting, parallel reaction monitoring, and synthetic peptides. Subsequent bioinformatics analysis revealed that some of the physical and chemical properties of these novel SEPs, including amino acid composition and codon usage, differ from those commonly found in canonical proteins. Intriguingly, nearly 65% of the identified SEPs were found to be initiated with non-AUG start codons. The 762 novel SEPs probably represent the largest number of SEPs detected by MS reported to date. These novel SEPs might not only provide new clues for the annotation of noncoding elements in the genome but also serve as a valuable resource for functional study.
Affiliation(s)
- Qing Zhang
- Laboratory of Protein and Peptide Pharmaceuticals & Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Erzhong Wu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Yiheng Tang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Tanxi Cai
- Laboratory of Protein and Peptide Pharmaceuticals & Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Lili Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Jifeng Wang
- Laboratory of Protein and Peptide Pharmaceuticals & Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yajing Hao
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Bao Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Yue Zhou
- Laboratory of Protein and Peptide Pharmaceuticals & Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Thermofisher Scientific, Shanghai, China
- Xiaojing Guo
- Laboratory of Protein and Peptide Pharmaceuticals & Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jianjun Luo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China.
- Runsheng Chen
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Guangdong Geneway Decoding Bio-Tech Co Ltd, Foshan, China.
- Fuquan Yang
- Laboratory of Protein and Peptide Pharmaceuticals & Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
14
Vitorino R, Guedes S, Amado F, Santos M, Akimitsu N. The role of micropeptides in biology. Cell Mol Life Sci 2021; 78:3285-3298. [PMID: 33507325 PMCID: PMC11073438 DOI: 10.1007/s00018-020-03740-3] [Received: 08/10/2020] [Revised: 12/01/2020] [Accepted: 12/11/2020] [Indexed: 12/11/2022]
Abstract
Micropeptides are small polypeptides coded by small open-reading frames. Progress in computational biology and the analyses of large-scale transcriptomes and proteomes have revealed that mammalian genomes produce a large number of transcripts encoding micropeptides. Many of these have been previously annotated as long noncoding RNAs. The role of micropeptides in cellular homeostasis maintenance has been demonstrated. This review discusses different types of micropeptides as well as methods to identify them, such as computational approaches, ribosome profiling, and mass spectrometry.
Affiliation(s)
- Rui Vitorino
- Departamento de Cirurgia E Fisiologia, Faculdade de Medicina da Universidade Do Porto, UnIC, Porto, Portugal.
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal.
- Sofia Guedes
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Francisco Amado
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Manuel Santos
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal
15
Ramasamy P, Turan D, Tichshenko N, Hulstaert N, Vandermarliere E, Vranken W, Martens L. Scop3P: A Comprehensive Resource of Human Phosphosites within Their Full Context. J Proteome Res 2020; 19:3478-3486. [PMID: 32508104 DOI: 10.1021/acs.jproteome.0c00306] [Indexed: 02/08/2023]
Affiliation(s)
- Pathmanaban Ramasamy
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Centre for Structural Biology, VIB, 1050 Brussels, Belgium
- Demet Turan
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Natalia Tichshenko
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Niels Hulstaert
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Elien Vandermarliere
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Wim Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Centre for Structural Biology, VIB, 1050 Brussels, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
16
Maciel LF, Morales-Vicente DA, Silveira GO, Ribeiro RO, Olberg GGO, Pires DS, Amaral MS, Verjovski-Almeida S. Weighted Gene Co-Expression Analyses Point to Long Non-Coding RNA Hub Genes at Different Schistosoma mansoni Life-Cycle Stages. Front Genet 2019; 10:823. [PMID: 31572441 PMCID: PMC6752179 DOI: 10.3389/fgene.2019.00823] [Received: 04/04/2019] [Accepted: 08/09/2019] [Indexed: 01/21/2023]
Abstract
Long non-coding RNAs (lncRNAs) (>200 nt) are expressed at levels lower than those of the protein-coding mRNAs, and in all eukaryotic model species where they have been characterized, they are transcribed from thousands of different genomic loci. In humans, some four dozen lncRNAs have been studied in detail, and they have been shown to play important roles in transcriptional regulation, acting in conjunction with transcription factors and epigenetic marks to modulate the tissue-type specific programs of transcriptional gene activation and repression. In Schistosoma mansoni, around 10,000 lncRNAs have been identified in previous works. However, the limited number of RNA-sequencing (RNA-seq) libraries that had been previously assessed, together with the use of old and incomplete versions of the S. mansoni genome and protein-coding transcriptome annotations, have hampered the identification of all lncRNAs expressed in the parasite. Here we have used 633 publicly available S. mansoni RNA-seq libraries from whole worms at different stages (n = 121), from isolated tissues (n = 24), from cell-populations (n = 81), and from single-cells (n = 407). We have assembled a set of 16,583 lncRNA transcripts originated from 10,024 genes, of which 11,022 are novel S. mansoni lncRNA transcripts, whereas the remaining 5,561 transcripts comprise 120 lncRNAs that are identical to and 5,441 lncRNAs that have gene overlap with S. mansoni lncRNAs already reported in previous works. Most importantly, our more stringent assembly and filtering pipeline has identified and removed a set of 4,293 lncRNA transcripts from previous publications that were in fact derived from partially processed mRNAs with intron retention. We have used weighted gene co-expression network analyses and identified 15 different gene co-expression modules. 
Each parasite life-cycle stage has at least one highly correlated gene co-expression module, and each module comprises hundreds to thousands of lncRNAs and mRNAs with correlated co-expression patterns at different stages. Inspection of the most highly connected genes within the modules’ networks has shown that different lncRNAs are hub genes at different life-cycle stages; these are among the most promising candidate lncRNAs for further functional characterization.
Affiliation(s)
- Lucas F Maciel
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil; Programa Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
- David A Morales-Vicente
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil; Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
- Gilbert O Silveira
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil; Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
- Raphael O Ribeiro
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil; Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
- Giovanna G O Olberg
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil
- David S Pires
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil
- Murilo S Amaral
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil
- Sergio Verjovski-Almeida
- Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil; Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
17
Harnessing the tissue and plasma lncRNA-peptidome to discover peptide-based cancer biomarkers. Sci Rep 2019; 9:12322. [PMID: 31444383 PMCID: PMC6707329 DOI: 10.1038/s41598-019-48774-1] [Received: 03/27/2019] [Accepted: 08/12/2019] [Indexed: 01/22/2023]
Abstract
Proteome-centric studies, although they have identified numerous lncRNA-encoded polypeptides, lack differential expression analysis of the lncRNA-peptidome across primary tissues, cell lines, and cancer states. We established a computational-proteogenomic workflow involving re-processing of publicly available LC-MS/MS data, which facilitated the identification of tissue-specific and universally expressed (UExp) lncRNA-polypeptides across 14 primary human tissues and 11 cell lines. The utility of the lncRNA-peptidome as a source of cancer biomarkers was investigated by re-processing LC-MS/MS data from 92 colon adenocarcinoma (COAD) and 30 normal colon epithelium tissues. Intriguingly, a significant upregulation of five lncRNA UExp-polypeptides in COAD tissues was observed. Furthermore, clustering of the UExp-polypeptides led to a classification of COAD patients that coincided with the clinical stratification, underlining the prognostic potential of the UExp-polypeptides. Lastly, we identified differential abundance of the UExp-polypeptides in the plasma of prostate cancer patients, highlighting their potential as plasma biomarkers. The analysis of the lncRNA-peptidome may pave the way to identify effective tissue/plasma biomarkers for different cancer types.
18
Fesenko I, Kirov I, Kniazev A, Khazigaleeva R, Lazarev V, Kharlampieva D, Grafskaia E, Zgoda V, Butenko I, Arapidi G, Mamaeva A, Ivanov V, Govorun V. Distinct types of short open reading frames are translated in plant cells. Genome Res 2019; 29:1464-1477. [PMID: 31387879 PMCID: PMC6724668 DOI: 10.1101/gr.253302.119] [Received: 06/04/2019] [Accepted: 08/01/2019] [Indexed: 02/07/2023]
Abstract
Genomes contain millions of short (<100 codons) open reading frames (sORFs), which are usually dismissed during gene annotation. Nevertheless, peptides encoded by such sORFs can play important biological roles, and their impact on cellular processes has long been underestimated. Here, we analyzed approximately 70,000 transcribed sORFs in the model plant Physcomitrella patens (moss). Several distinct classes of sORFs that differ in terms of their position on transcripts and the level of evolutionary conservation are present in the moss genome. Over 5000 sORFs were conserved in at least one of 10 plant species examined. Mass spectrometry analysis of proteomic and peptidomic data sets suggested that tens of sORFs located on distinct parts of mRNAs and long noncoding RNAs (lncRNAs) are translated, including conserved sORFs. Translational analysis of the sORFs and main ORFs at a single locus suggested the existence of genes that code for multiple proteins and peptides with tissue-specific expression. Functional analysis of four lncRNA-encoded peptides showed that sORF-encoded peptides are involved in the regulation of growth and differentiation in moss. Knocking out lncRNA-encoded peptides reduced moss growth. In contrast, the overexpression of these peptides resulted in a diverse range of phenotypic effects. Our results thus open new avenues for discovering novel, biologically active peptides in the plant kingdom.
Affiliation(s)
- Igor Fesenko
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Ilya Kirov
- Laboratory of marker-assisted and genomic selection of plants, All-Russian Research Institute of Agricultural Biotechnology, 127550 Moscow, Russian Federation
- Andrey Kniazev
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Regina Khazigaleeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vassili Lazarev
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation; Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Daria Kharlampieva
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Ekaterina Grafskaia
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation; Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Viktor Zgoda
- Laboratory of System Biology, Institute of Biomedical Chemistry, 119121 Moscow, Russian Federation
- Ivan Butenko
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Georgy Arapidi
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation; Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Anna Mamaeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Ivanov
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Govorun
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
19
Ruiz-Orera J, Albà MM. Conserved regions in long non-coding RNAs contain abundant translation and protein-RNA interaction signatures. NAR Genom Bioinform 2019; 1:e2. [PMID: 33575549 PMCID: PMC7671363 DOI: 10.1093/nargab/lqz002] [Received: 05/24/2019] [Revised: 06/14/2019] [Accepted: 07/04/2019] [Indexed: 02/06/2023]
Abstract
The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes and that are known as long non-coding RNAs (lncRNAs). A handful of lncRNAs have well-characterized regulatory functions but the biological significance of the majority of them is not well understood. LncRNAs that are conserved between mice and humans are likely to be enriched in functional sequences. Here, we investigate the presence of different types of ribosome profiling signatures in lncRNAs and how they relate to sequence conservation. We find that lncRNA-conserved regions contain three times more ORFs with translation evidence than non-conserved ones, and identify nine cases that display significant sequence constraints at the amino acid sequence level. The study also reveals that conserved regions in intergenic lncRNAs are significantly enriched in protein–RNA interaction signatures when compared to non-conserved ones; this includes sites in well-characterized lncRNAs, such as Cyrano, Malat1, Neat1 and Meg3, as well as in tens of lncRNAs of unknown function. This work illustrates how the analysis of ribosome profiling data coupled with evolutionary analysis provides new opportunities to explore the lncRNA functional landscape.
Affiliation(s)
- Jorge Ruiz-Orera
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain
- M Mar Albà
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain; Catalan Institution for Research and Advanced Studies, Passeig Lluís Companys 23, Barcelona 08010, Spain
20
Varon M, Levy T, Mazor G, Ben David H, Marciano R, Krelin Y, Prasad M, Elkabets M, Pauck D, Ahmadov U, Picard D, Qin N, Borkhardt A, Reifenberger G, Leprivier G, Remke M, Rotblat B. The long noncoding RNA TP73-AS1 promotes tumorigenicity of medulloblastoma cells. Int J Cancer 2019; 145:3402-3413. [PMID: 31081944 DOI: 10.1002/ijc.32400] [Received: 11/28/2018] [Revised: 04/11/2019] [Accepted: 04/25/2019] [Indexed: 12/30/2022]
Abstract
Medulloblastoma is the most common malignant brain cancer in children. Since previous studies have mainly focused on alterations in the coding genome, our understanding of the contribution of long noncoding RNAs (lncRNAs) to medulloblastoma biology is just emerging. Using patient-derived data, we show that the promoter of lncRNA TP73-AS1 is hypomethylated and that the transcript is highly expressed in the SHH subgroup. Furthermore, high expression of TP73-AS1 is correlated with poor outcome in patients with TP53 wild-type SHH tumors. Silencing TP73-AS1 in medulloblastoma tumor cells induced apoptosis, while proliferation and migration were inhibited in culture. In vivo, silencing TP73-AS1 in medulloblastoma tumor cells resulted in reduced tumor growth, reduced proliferation of tumor cells, increased apoptosis and led to prolonged survival of tumor-bearing mice. Together, our study suggests that the lncRNA TP73-AS1 is a prognostic marker and therapeutic target in medulloblastoma tumors and serves as a proof of concept that lncRNAs are important factors in the disease.
Affiliation(s)
- Mor Varon
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Tal Levy
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Gal Mazor
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Hila Ben David
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Ran Marciano
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel; The National Institute for Biotechnology in the Negev, Beer Sheva, Israel
- Yakov Krelin
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Manu Prasad
- The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Moshe Elkabets
- The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- David Pauck
- Department of Pediatric Neuro-Oncogenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Partner Site Essen/Düsseldorf, Essen, Germany; Department of Pediatric Oncology, Hematology, and Clinical Immunology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany; Institute of Neuropathology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Ulvi Ahmadov
- Department of Pediatric Neuro-Oncogenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Partner Site Essen/Düsseldorf, Essen, Germany; Department of Pediatric Oncology, Hematology, and Clinical Immunology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany; Institute of Neuropathology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Daniel Picard
- Department of Pediatric Neuro-Oncogenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Partner Site Essen/Düsseldorf, Essen, Germany; Department of Pediatric Oncology, Hematology, and Clinical Immunology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany; Institute of Neuropathology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Nan Qin
- Department of Pediatric Neuro-Oncogenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Partner Site Essen/Düsseldorf, Essen, Germany; Department of Pediatric Oncology, Hematology, and Clinical Immunology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany; Institute of Neuropathology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Arndt Borkhardt
- German Cancer Consortium (DKTK), Partner Site Essen/Düsseldorf, Essen, Germany; Department of Pediatric Oncology, Hematology, and Clinical Immunology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Guido Reifenberger
- German Cancer Consortium (DKTK), Partner Site Essen/Düsseldorf, Essen, Germany; Institute of Neuropathology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Gabriel Leprivier
- Department of Pediatric Oncology, Hematology, and Clinical Immunology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Marc Remke
- Department of Pediatric Neuro-Oncogenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Partner Site Essen/Düsseldorf, Essen, Germany; Department of Pediatric Oncology, Hematology, and Clinical Immunology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany; Institute of Neuropathology, Medical Faculty, University Hospital Düsseldorf, Düsseldorf, Germany
- Barak Rotblat
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel; The National Institute for Biotechnology in the Negev, Beer Sheva, Israel

21
Li J, Liu C. Coding or Noncoding, the Converging Concepts of RNAs. Front Genet 2019; 10:496. [PMID: 31178900 PMCID: PMC6538810 DOI: 10.3389/fgene.2019.00496] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Received: 02/24/2019] [Accepted: 05/06/2019] [Indexed: 12/18/2022]
Abstract
Technological advances over the past decade have unraveled the remarkable complexity of RNA. The identification of small peptides encoded by long non-coding RNAs (lncRNAs) as well as regulatory functions mediated by non-coding regions of mRNAs have further complicated our understanding of the multifaceted functions of RNA. In this review, we summarize current evidence pointing to dual roles of RNA molecules defined by their coding and non-coding potentials. We also discuss how the emerging roles of RNA transform our understanding of gene expression and evolution.
Affiliation(s)
- Jing Li
- CAS Key Laboratory of Tropical Plant Resource and Sustainable Use, Xishuangbanna Tropical Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Kunming, China
- Changning Liu
- CAS Key Laboratory of Tropical Plant Resource and Sustainable Use, Xishuangbanna Tropical Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Kunming, China

22
Abstract
INTRODUCTION Small open reading frames (sORFs) with potential protein-coding capacity have been uncovered in various transcripts, including long noncoding RNAs (lncRNAs), mRNAs (5'-upstream, coding domain, and 3'-downstream), circular RNAs, pri-miRNAs, and ribosomal RNAs (rRNAs). Recent characterization of several sORF-encoded peptides (SEPs or micropeptides) revealed their important roles in many fundamental biological processes in a broad range of species from yeast to human. The success in mining micropeptides is attributable to advances in bioinformatics and high-throughput sequencing techniques. Areas covered: sORFs and SEPs were overlooked because of their tiny size and the difficulty of identifying them by bioinformatics analyses. As more and more sORFs and SEPs have been identified, this field has attracted more attention. This review covers recent advances in the strategies for the detection and identification of sORFs and SEPs. Expert commentary: The advantages and drawbacks of the strategies for detection and identification of sORFs and SEPs are discussed, and the techniques used to decipher the roles of micropeptides in organisms are described.
Affiliation(s)
- Xinqiang Yin
- The Engineering Research Center of Synthetic Polypeptide Drug Discovery and Evaluation of Jiangsu Province, China Pharmaceutical University, Nanjing, China; The Basic Medical School, North Sichuan Medical College, Nanchong, China
- Yuanyuan Jing
- Department of Preventive Medicine, North Sichuan Medical College, Nanchong, China
- Hanmei Xu
- The Engineering Research Center of Synthetic Polypeptide Drug Discovery and Evaluation of Jiangsu Province, China Pharmaceutical University, Nanjing, China; State Key Laboratory of Natural Medicines, Ministry of Education, China Pharmaceutical University, Nanjing, China

23
Lorenzi L, Avila Cobos F, Decock A, Everaert C, Helsmoortel H, Lefever S, Verboom K, Volders PJ, Speleman F, Vandesompele J, Mestdagh P. Long noncoding RNA expression profiling in cancer: Challenges and opportunities. Genes Chromosomes Cancer 2019; 58:191-199. [PMID: 30461116 DOI: 10.1002/gcc.22709] [Citation(s) in RCA: 104] [Impact Index Per Article: 17.3] [Received: 07/03/2018] [Revised: 11/06/2018] [Accepted: 11/18/2018] [Indexed: 12/11/2022]
Abstract
In recent years, technological advances in transcriptome profiling revealed that the repertoire of human RNA molecules is more diverse and extended than originally thought. This diversity and complexity mainly derive from a large ensemble of noncoding RNAs. Because of their key roles in cellular processes important for normal development and physiology, disruption of noncoding RNA expression is intrinsically linked to human disease, including cancer. Therefore, studying the noncoding portion of the transcriptome offers the prospect of identifying novel therapeutic and diagnostic targets. Although evidence of the relevance of noncoding RNAs in cancer is accumulating, we still face many challenges when it comes to accurately profiling their expression levels. Some of these challenges are inherent to the technologies employed, whereas others are associated with characteristics of the noncoding RNAs themselves. In this review, we discuss the challenges related to long noncoding RNA expression profiling, highlight how cancer long noncoding RNAs provide new opportunities for cancer diagnosis and treatment, and reflect on future developments.
Affiliation(s)
- Lucía Lorenzi
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Francisco Avila Cobos
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Anneleen Decock
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Celine Everaert
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Hetty Helsmoortel
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Steve Lefever
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Karen Verboom
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Pieter-Jan Volders
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Frank Speleman
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Jo Vandesompele
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium
- Pieter Mestdagh
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium; Cancer Research Institute Ghent (CRIG), Ghent University, Ghent, Belgium

24
Paik YK, Lane L, Kawamura T, Chen YJ, Cho JY, LaBaer J, Yoo JS, Domont G, Corrales F, Omenn GS, Archakov A, Encarnación-Guevara S, Lui S, Salekdeh GH, Cho JY, Kim CY, Overall CM. Launching the C-HPP neXt-CP50 Pilot Project for Functional Characterization of Identified Proteins with No Known Function. J Proteome Res 2018; 17:4042-4050. [PMID: 30269496 PMCID: PMC6693327 DOI: 10.1021/acs.jproteome.8b00383] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Indexed: 12/18/2022]
Abstract
An important goal of the Human Proteome Organization (HUPO) Chromosome-centric Human Proteome Project (C-HPP) is to correctly define the number of canonical proteins encoded by their cognate open reading frames on each chromosome in the human genome. When identified with high confidence of protein evidence (PE), such proteins are termed PE1 proteins in the online database resource, neXtProt. However, proteins that have not been identified unequivocally at the protein level but that have other evidence suggestive of their existence (PE2-4) are termed missing proteins (MPs). The number of MPs has been reduced from 5511 in 2012 to 2186 in 2018 (neXtProt 2018-01-17 release). Although the annotation of the human proteome has made significant progress, the "parts list" alone does not inform function. Indeed, 1937 proteins representing ∼10% of the human proteome have no function either annotated from experimental characterization or predicted by homology to other proteins. Specifically, these 1937 "dark proteins" of the so-called dark proteome are composed of 1260 functionally uncharacterized but identified PE1 proteins, designated as uPE1, plus 677 MPs from categories PE2-PE4, which also have no known or predicted function and are termed uMPs. At the HUPO-2017 Annual Meeting, the C-HPP officially adopted the uPE1 pilot initiative, with 14 participating international teams later committing to demonstrate the feasibility of the functional characterization of large numbers of dark proteins (CP), starting first with 50 uPE1 proteins, in a stepwise chromosome-centric organizational manner. The second aim of the feasibility phase to characterize protein (CP) functions of 50 uPE1 proteins, termed the neXt-CP50 initiative, is to utilize a variety of approaches and workflows according to individual team expertise, interest, and resources so as to enable the C-HPP to recommend experimentally proven workflows to the proteome community within 3 years. 
The results from this pilot will not only be the cornerstone of a larger characterization initiative but also enhance understanding of the human proteome and integrated cellular networks for the discovery of new mechanisms of pathology, mechanistically informative biomarkers, and rational drug targets.
Affiliation(s)
- Young-Ki Paik
- Yonsei Proteome Research Center and Department of Integrative Omics, Yonsei University, Sudaemoon-ku, Seoul, Korea
- Lydie Lane
- CALIPHO group, Swiss Institute of Bioinformatics & Department of Microbiology and Molecular medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Takeshi Kawamura
- Proteomics Laboratory, Isotope Science Center, The University of Tokyo, Bunkyo-Ku, Tokyo 113-0032 Japan
- Yu-Ju Chen
- Institute of Chemistry Academia Sinica, 128 Academia Road Sec. 2, Nankang Taipei 115 Taiwan
- Je-Yoel Cho
- Research Institute for Veterinary Science, College of Veterinary Medicine, Seoul University, 1 Gwanak-, Gwanak-gu, 151-742 Seoul, South Korea
- Joshua LaBaer
- McAllister Ave. Arizona State University, Tempe, Arizona, 85287-5001, USA
- Jong Shin Yoo
- Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Korea
- Gilberto Domont
- Federal University of Rio de Janeiro Institute of Chemistry, Rio de Janeiro, RJ Brazil
- Fernando Corrales
- Functional Proteomics Laboratory National Center of Biotechnology, CSIC 28049 Madrid, Spain
- Gilbert S. Omenn
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
- Siqi Lui
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Ghasem Hosseini Salekdeh
- Department of Molecular Systems Biology, Royan Institute for Stem Cell Biology and Technology, 1665659911, Tehran, Iran
- Department of Molecular Sciences, Macquarie University, Sydney, Australia
- Jin-Young Cho
- Yonsei Proteome Research Center and Department of Integrative Omics, Yonsei University, Sudaemoon-ku, Seoul, Korea
- Chae-Yeon Kim
- Yonsei Proteome Research Center and Department of Integrative Omics, Yonsei University, Sudaemoon-ku, Seoul, Korea
- Christopher M. Overall
- Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, University of British Columbia, Vancouver, Canada

25
Giambruno R, Mihailovich M, Bonaldi T. Mass Spectrometry-Based Proteomics to Unveil the Non-coding RNA World. Front Mol Biosci 2018; 5:90. [PMID: 30467545 PMCID: PMC6236024 DOI: 10.3389/fmolb.2018.00090] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/26/2018] [Accepted: 10/15/2018] [Indexed: 01/03/2023]
Abstract
The interaction between non-coding RNAs (ncRNAs) and proteins is crucial for the stability, localization and function of the different classes of ncRNAs. Although ncRNAs, when embedded in various ribonucleoprotein (RNP) complexes, control the fundamental processes of gene expression, their biological functions and mechanisms of action are still largely unexplored. Mass Spectrometry (MS)-based proteomics has emerged as a powerful tool to study the ncRNA world: on the one hand, by identifying the proteins interacting with distinct ncRNAs; on the other hand, by measuring the impact of ncRNAs on global protein levels. Here, we will first provide a concise overview of the basic principles of MS-based proteomics for systematic protein identification and quantification; then, we will recapitulate the main approaches that have been implemented for the screening of ncRNA interactors and the dissection of ncRNA-protein complex composition. Finally, we will describe examples of various proteomics strategies developed to characterize the effect of ncRNAs on gene expression, with a focus on the systematic identification of microRNA (miRNA) targets.
Affiliation(s)
- Tiziana Bonaldi
- Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy

26
Yang M, Lin X, Liu X, Zhang J, Ge F. Genome Annotation of a Model Diatom Phaeodactylum tricornutum Using an Integrated Proteogenomic Pipeline. Mol Plant 2018; 11:1292-1307. [PMID: 30176371 DOI: 10.1016/j.molp.2018.08.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 06/15/2018] [Revised: 08/26/2018] [Accepted: 08/28/2018] [Indexed: 06/08/2023]
Abstract
Diatoms comprise a diverse and ecologically important group of eukaryotic phytoplankton that significantly contributes to marine primary production and global carbon cycling. Phaeodactylum tricornutum is commonly used as a model organism for studying diatom biology. Although its genome was sequenced in 2008, a high-quality genome annotation is still not available for this diatom. Here we report the development of an integrated proteogenomic pipeline and its application for improved annotation of P. tricornutum genome using mass spectrometry (MS)-based proteomics data. Our proteogenomic analysis unambiguously identified approximately 8300 genes and revealed 606 novel proteins, 506 revised genes, 94 splice variants, 58 single amino acid variants, and a holistic view of post-translational modifications in P. tricornutum. We experimentally confirmed a subset of novel events and obtained MS evidence for more than 200 micropeptides in P. tricornutum. These findings expand the genomic landscape of P. tricornutum and provide a rich resource for the study of diatom biology. The proteogenomic pipeline we developed in this study is applicable to any sequenced eukaryote and thus represents a significant contribution to the toolset for eukaryotic proteogenomic analysis. The pipeline and its source code are freely available at https://sourceforge.net/projects/gapeproteogenomic.
Affiliation(s)
- Mingkun Yang
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Xiaohuang Lin
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100039, China
- Xin Liu
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100039, China
- Jia Zhang
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Feng Ge
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100039, China

27
Uszczynska-Ratajczak B, Lagarde J, Frankish A, Guigó R, Johnson R. Towards a complete map of the human long non-coding RNA transcriptome. Nat Rev Genet 2018; 19:535-548. [PMID: 29795125 PMCID: PMC6451964 DOI: 10.1038/s41576-018-0017-y] [Citation(s) in RCA: 416] [Impact Index Per Article: 59.4] [Indexed: 02/01/2023]
Abstract
Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.
Affiliation(s)
- Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Rory Johnson
- Department of Medical Oncology, Inselspital, University Hospital and University of Bern, Bern, Switzerland
- Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland

28
Fesenko IA, Kirov IV, Filippova AA. Impact of Noncoding Part of the Genome on the Proteome Plasticity of the Eukaryotic Cell. Russ J Bioorg Chem 2018. [DOI: 10.1134/s1068162018040076] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]

29
Paik YK, Omenn GS, Hancock WS, Lane L, Overall CM. Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017; 14:1059-1071. [PMID: 29039980 DOI: 10.1080/14789450.2017.1394189] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 01/01/2023]
Abstract
INTRODUCTION The mission of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and annotate the entire predicted human protein set (~20,000 proteins) encoded by each chromosome. The initial steps of the project are focused on 'missing proteins (MPs)', which lacked documented evidence for existence at the protein level. In addition to the remaining 2,579 MPs, we also target annotated proteins with unknown functions (uPE1 proteins), alternative splice isoforms and post-translational modifications. We also consider how to investigate various protein functions involved in cis-regulatory phenomena, amplicons, lncRNAs and smORFs. Areas covered: We will cover the scope, historic background, progress, challenges and future prospects of C-HPP. This review also addresses the question of how we can best improve the methodological approaches, select the optimal biological samples, and recommend stringent protocols for the identification and characterization of MPs. A new strategy for functional analysis of some of those annotated proteins having unknown function will also be discussed. Expert commentary: If the project progresses well by reshaping the original goals, the current working modules and team work in the proposed extended planning period, it is anticipated that a progressively more detailed draft of an accurate chromosome-based proteome map will become available with functional information.
Affiliation(s)
- Young-Ki Paik
- Yonsei Proteome Research Center and Department of Biochemistry, Yonsei University, Seoul, Korea
- Gilbert S Omenn
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- William S Hancock
- Department of Chemical Biology, Northeastern University, Boston, Massachusetts 02115, USA
- Lydie Lane
- Department of Human Protein Sciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
- Christopher M Overall
- Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, University of British Columbia, Vancouver, Canada

30
Omenn GS, Lane L, Lundberg EK, Overall CM, Deutsch EW. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project. J Proteome Res 2017; 16:4281-4287. [PMID: 28853897 DOI: 10.1021/acs.jproteome.7b00375] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Indexed: 01/09/2023]
Abstract
The Human Proteome Organization (HUPO) Human Proteome Project (HPP) continues to make progress on its two overall goals: (1) completing the protein parts list, with an annual update of the HUPO draft human proteome, and (2) making proteomics an integrated complement to genomics and transcriptomics throughout biomedical and life sciences research. neXtProt version 2017-01-23 has 17 008 confident protein identifications (Protein Existence [PE] level 1) that are compliant with the HPP Guidelines v2.1 ( https://hupo.org/Guidelines ), up from 13 664 in 2012-12 and 16 518 in 2016-04. Remaining to be found by mass spectrometry and other methods are 2579 "missing proteins" (PE2+3+4), down from 2949 in 2016. PeptideAtlas 2017-01 has 15 173 canonical proteins, accounting for nearly all of the 15 290 PE1 proteins based on MS data. These resources have extensive data on PTMs, single amino acid variants, and splice isoforms. The Human Protein Atlas v16 has 10 492 highly curated protein entries with tissue and subcellular spatial localization of proteins and transcript expression. Organ-specific popular protein lists have been generated for broad use in quantitative targeted proteomics using SRM-MS or DIA-SWATH-MS studies of biology and disease.
Affiliation(s)
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States; Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States
- Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and Department of Human Protein Science, University of Geneva, CMU, Michel-Servet 1, 1211 Geneva 4, Switzerland
- Emma K Lundberg
- SciLifeLab Stockholm and School of Biotechnology, KTH, Karolinska Institutet Science Park, Tomtebodavägen 23, SE-171 65 Solna, Sweden
- Christopher M Overall
- Life Sciences Institute, Faculty of Dentistry, University of British Columbia, 2350 Health Sciences Mall, Room 4.401, Vancouver, British Columbia V6T 1Z3, Canada
- Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109-5263, United States