1
Liu X, Chen H, Li Z, Yang X, Jin W, Wang Y, Zheng J, Li L, Xuan C, Yuan J, Yang Y. InPACT: a computational method for accurate characterization of intronic polyadenylation from RNA sequencing data. Nat Commun 2024; 15:2583. [PMID: 38519498 PMCID: PMC10960005 DOI: 10.1038/s41467-024-46875-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 03/12/2024] [Indexed: 03/25/2024] Open
Abstract
Alternative polyadenylation that occurs in introns, termed intronic polyadenylation (IPA), has been implicated in diverse biological processes and diseases, as it can produce noncoding transcripts or transcripts with truncated coding regions. However, a reliable method is required to accurately characterize IPA. Here, we propose a computational method called InPACT, which allows for the precise characterization of IPA from conventional RNA-seq data. InPACT successfully identifies numerous previously unannotated IPA transcripts in human cells, many of which are translated, as evidenced by ribosome profiling data. We have demonstrated that InPACT outperforms other methods in terms of IPA identification and quantification. Moreover, InPACT applied to monocyte activation reveals temporally coordinated IPA events. Further application to single-cell RNA-seq data from human fetal bone marrow reveals the expression of several IPA isoforms in a context-specific manner. Therefore, InPACT represents a powerful tool for the accurate characterization of IPA from RNA-seq data.
Affiliation(s)
- Xiaochuan Liu
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Hao Chen
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Zekun Li
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Xiaoxiao Yang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Wen Jin
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Yuting Wang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Jian Zheng
- Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Long Li
- Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Chenghao Xuan
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
- Jiapei Yuan
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China.
- Tianjin Institutes of Health Science, Tianjin, 301600, China.
- Yang Yang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
2
Valdivia-Francia F, Sendoel A. No country for old methods: New tools for studying microproteins. iScience 2024; 27:108972. [PMID: 38333695 PMCID: PMC10850755 DOI: 10.1016/j.isci.2024.108972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024] Open
Abstract
Microproteins encoded by small open reading frames (sORFs) have emerged as a fascinating frontier in genomics. Although sORFs were traditionally overlooked due to their small size, recent technological advancements such as ribosome profiling, mass spectrometry-based strategies and advanced computational approaches have led to the annotation of more than 7000 sORFs in the human genome. Despite the vast progress, only a tiny portion of these microproteins have been characterized, and an important challenge in the field lies in identifying functionally relevant microproteins and understanding their role in different cellular contexts. In this review, we explore the recent advancements in sORF research, focusing on the new methodologies and computational approaches that have facilitated their identification and functional characterization. Leveraging these new tools holds great promise for dissecting the diverse cellular roles of microproteins and will ultimately pave the way for understanding their role in the pathogenesis of diseases and identifying new therapeutic targets.
Affiliation(s)
- Fabiola Valdivia-Francia
- University of Zurich, Institute for Regenerative Medicine (IREM), Wagistrasse 12, 8952 Schlieren-Zurich, Switzerland
- Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich/ ETH Zurich, Schlieren-Zurich, Switzerland
- Ataman Sendoel
- University of Zurich, Institute for Regenerative Medicine (IREM), Wagistrasse 12, 8952 Schlieren-Zurich, Switzerland
3
Richardson MO, Eddy SR. ORFeus: a computational method to detect programmed ribosomal frameshifts and other non-canonical translation events. BMC Bioinformatics 2023; 24:471. [PMID: 38093195 PMCID: PMC10720069 DOI: 10.1186/s12859-023-05602-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 12/05/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND: In canonical protein translation, ribosomes initiate translation at a specific start codon, maintain a single reading frame throughout elongation, and terminate at the first in-frame stop codon. However, ribosomal behavior can deviate at each of these steps, sometimes in a programmed manner. Certain mRNAs contain sequence and structural elements that cause ribosomes to begin translation at alternative start codons, shift reading frame, read through stop codons, or reinitiate on the same mRNA. These processes represent important translational control mechanisms that can allow an mRNA to encode multiple functional protein products or regulate protein expression. The prevalence of these events remains uncertain, due to the difficulty of systematic detection. RESULTS: We have developed a computational model to infer non-canonical translation events from ribosome profiling data. CONCLUSION: ORFeus identifies known examples of alternative open reading frames and recoding events across different organisms and enables transcriptome-wide searches for novel events.
Affiliation(s)
- Mary O Richardson
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Sean R Eddy
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA.
4
Wang J, Wang W, Ma F, Qian H. A hidden translatome in tumors-the coding lncRNAs. Sci China Life Sci 2023; 66:2755-2772. [PMID: 37154857 DOI: 10.1007/s11427-022-2289-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 12/29/2022] [Indexed: 05/10/2023]
Abstract
Long noncoding RNAs (lncRNAs) have been extensively identified in eukaryotic genomes and have been shown to play critical roles in the development of multiple cancers. Through the application and development of ribosome analysis and sequencing technologies, recent studies have discovered the translation of lncRNAs. Although lncRNAs were originally defined as noncoding RNAs, many lncRNAs actually contain small open reading frames that are translated into peptides. This opens a broad area for the functional investigation of lncRNAs. Here, we introduce prospective methods and databases for screening lncRNAs with functional polypeptides. We also summarize the specific lncRNA-encoded proteins and the molecular mechanisms by which they promote or inhibit cancer. Importantly, the role of lncRNA-encoded peptides/proteins holds promise in cancer research, but some potential challenges remain unresolved. This review includes reports on lncRNA-encoded peptides or proteins in cancer, aiming to provide a theoretical basis and related references to facilitate the discovery of more functional peptides encoded by lncRNAs, and to further develop new anti-cancer therapeutic targets as well as clinical biomarkers for diagnosis and prognosis.
Affiliation(s)
- Jinsong Wang
- State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Wenna Wang
- State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Fei Ma
- State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
- Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
- Haili Qian
- State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
5
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. PLoS Comput Biol 2023; 19:e1011526. [PMID: 37824580 PMCID: PMC10597526 DOI: 10.1371/journal.pcbi.1011526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 10/24/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023] Open
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, United States of America
6
Zhang T, Zheng H, Lu D, Guan G, Li D, Zhang J, Liu S, Zhao J, Guo JT, Lu F, Chen X. RNA binding protein TIAR modulates HBV replication by tipping the balance of pgRNA translation. Signal Transduct Target Ther 2023; 8:346. [PMID: 37699883 PMCID: PMC10497612 DOI: 10.1038/s41392-023-01573-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 06/20/2023] [Accepted: 07/23/2023] [Indexed: 09/14/2023] Open
Abstract
The pregenomic RNA (pgRNA) of hepatitis B virus (HBV) serves not only as a bicistronic messenger RNA to translate core protein (Cp) and DNA polymerase (Pol), but also as the template for reverse transcriptional replication of viral DNA upon packaging into nucleocapsid. Although it is well known that pgRNA translates much more Cp than Pol, the molecular mechanism underlying the regulation of Cp and Pol translation efficiency from pgRNA remains elusive. In this study, we systematically profiled HBV nucleocapsid- and pgRNA-associated cellular proteins by proteomic analysis and identified TIA-1-related protein (TIAR) as a novel cellular protein that binds pgRNA and promotes HBV DNA replication. Interestingly, loss- and gain-of-function genetic analyses showed that manipulation of TIAR expression altered neither the levels of HBV transcripts nor the secretion of HBsAg and HBeAg in human hepatoma cells supporting HBV replication. However, Ribo-seq and PRM-based mass spectrometry analyses demonstrated that TIAR increased the translation of Pol but decreased the translation of Cp from pgRNA. RNA immunoprecipitation (RIP) and pulldown assays further revealed that TIAR directly binds pgRNA at the 5' stem-loop (ε). Moreover, HBV replication or Cp expression induced increased expression of TIAR and its redistribution from the nucleus to the cytoplasm of hepatocytes. Our results thus imply that TIAR is a novel cellular factor that regulates HBV replication by binding to the 5' ε structure of pgRNA to tip the balance of Cp and Pol translation. Through induction of TIAR translocation from the nucleus to the cytoplasm, Cp indirectly regulates Pol translation and balances Cp and Pol expression levels in infected hepatocytes to ensure efficient viral replication.
Affiliation(s)
- Ting Zhang
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Huiling Zheng
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Danjuan Lu
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Guiwen Guan
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Deyao Li
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Jing Zhang
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Shuhong Liu
- Department of Pathology and Hepatology, The Fifth Medical Center of Chinese PLA General Hospital, Beijing, 100039, China
- Jingmin Zhao
- Department of Pathology and Hepatology, The Fifth Medical Center of Chinese PLA General Hospital, Beijing, 100039, China
- Ju-Tao Guo
- Department of Experimental Therapeutics, Baruch S. Blumberg Institute, Doylestown, PA, 18902, USA.
- Fengmin Lu
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China.
- Beijing Key Laboratory of Hepatitis C and Immunotherapy for Liver Diseases, Peking University Hepatology Institute, Peking University People's Hospital, Beijing, 100044, China.
- Xiangmei Chen
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China.
7
Feng YZ, Zhu QF, Xue J, Chen P, Yu Y. Shining in the dark: the big world of small peptides in plants. aBIOTECH 2023; 4:238-256. [PMID: 37970469 PMCID: PMC10638237 DOI: 10.1007/s42994-023-00100-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/01/2022] [Accepted: 02/24/2023] [Indexed: 11/17/2023]
Abstract
Small peptides represent a subset of dark matter in plant proteomes. Through differential expression patterns and modes of action, small peptides act as important regulators of plant growth and development. Over the past 20 years, many small peptides have been identified due to technical advances in genome sequencing, bioinformatics, and chemical biology. In this article, we summarize the classification of plant small peptides and experimental strategies used to identify them as well as their potential use in agronomic breeding. We review the biological functions and molecular mechanisms of small peptides in plants, discuss current problems in small peptide research and highlight future research directions in this field. Our review provides crucial insight into small peptides in plants and will contribute to a better understanding of their potential roles in biotechnology and agriculture.
Affiliation(s)
- Yan-Zhao Feng
- Guangdong Key Laboratory of Crop Germplasm Resources Preservation and Utilization, Key Laboratory of South China Modern Biological Seed Industry, Ministry of Agriculture and Rural Affairs, Agro-Biological Gene Research Center, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China
- Qing-Feng Zhu
- Guangdong Key Laboratory of Crop Germplasm Resources Preservation and Utilization, Key Laboratory of South China Modern Biological Seed Industry, Ministry of Agriculture and Rural Affairs, Agro-Biological Gene Research Center, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China
- Jiao Xue
- Guangdong Key Laboratory of Crop Germplasm Resources Preservation and Utilization, Key Laboratory of South China Modern Biological Seed Industry, Ministry of Agriculture and Rural Affairs, Agro-Biological Gene Research Center, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China
- Pei Chen
- Guangdong Key Laboratory of Crop Germplasm Resources Preservation and Utilization, Key Laboratory of South China Modern Biological Seed Industry, Ministry of Agriculture and Rural Affairs, Agro-Biological Gene Research Center, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China
- Yang Yu
- Guangdong Key Laboratory of Crop Germplasm Resources Preservation and Utilization, Key Laboratory of South China Modern Biological Seed Industry, Ministry of Agriculture and Rural Affairs, Agro-Biological Gene Research Center, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 China
8
Dong X, Zhang K, Xun C, Chu T, Liang S, Zeng Y, Liu Z. Small Open Reading Frame-Encoded Micro-Peptides: An Emerging Protein World. Int J Mol Sci 2023; 24:10562. [PMID: 37445739 DOI: 10.3390/ijms241310562] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/13/2023] [Revised: 06/20/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023] Open
Abstract
Small open reading frames (sORFs) are often overlooked features in genomes. In the past, they were labeled as noncoding or "transcriptional noise". However, accumulating evidence from recent years suggests that sORFs may be transcribed and translated to produce sORF-encoded polypeptides (SEPs) of fewer than 100 amino acids. The vigorous development of computational algorithms, ribosome profiling, and peptidomics has facilitated the prediction and identification of many new SEPs. These SEPs were revealed to be involved in a wide range of basic biological processes, such as gene expression regulation, embryonic development, cellular metabolism, inflammation, and even carcinogenesis. To effectively understand the potential biological functions of SEPs, we discuss the history and development of the newly emerging research on sORFs and SEPs. In particular, we review a range of recently developed bioinformatics tools for identifying, predicting, and validating SEPs as well as a variety of biochemical experiments for characterizing SEP functions. Lastly, this review underlines the challenges and future directions in identifying and validating sORFs and their encoded micropeptides, providing a significant reference for upcoming research on sORF-encoded peptides.
Affiliation(s)
- Xiaoping Dong
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Kun Zhang
- The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
- Chengfeng Xun
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Tianqi Chu
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Songping Liang
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Yong Zeng
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
- Zhonghua Liu
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
9
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. bioRxiv 2023:2023.04.03.535488. [PMID: 37066250 PMCID: PMC10104019 DOI: 10.1101/2023.04.03.535488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA
10
Song B, Li H, Jiang M, Gao Z, Wang S, Gao L, Chen Y, Li W. slORFfinder: a tool to detect open reading frames resulting from trans-splicing of spliced leader sequences. Brief Bioinform 2023; 24:6972299. [PMID: 36611257 PMCID: PMC9851317 DOI: 10.1093/bib/bbac610] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Revised: 11/16/2022] [Accepted: 12/11/2022] [Indexed: 01/09/2023] Open
Abstract
Trans-splicing of a spliced leader (SL) to the 5' ends of mRNAs is used to produce mature mRNAs in several phyla of great importance to human health and the marine ecosystem. One of the consequences of the addition of SL sequences is the change or disruption of the open reading frames (ORFs) in the recipient transcripts. Given that most SL sequences contain one or more NUG trinucleotides, including AUG in flatworms, trans-splicing of SL sequences can potentially supply a start codon to create new ORFs, which we refer to as slORFs, in the recipient mRNAs. Due to the lack of a tool to precisely detect them, slORFs were usually neglected in previous studies. In this work, we present the tool slORFfinder, which automatically links the SL sequences to the recipient mRNAs at the trans-splicing sites identified from SL-containing RNA-Seq reads and predicts slORFs according to the distribution of ribosome-protected footprints (RPFs) on the trans-spliced transcripts. By applying this tool to nematodes, ascidians and euglena, for which RPF data are publicly available, we find that slORFs are widespread in these taxa. Furthermore, we find that slORFs are generally translated at higher levels than the annotated ORFs in the genomes, suggesting they might have important functions. Overall, this study provides a tool, slORFfinder (https://github.com/songbo446/slORFfinder), to identify slORFs, which can enhance our understanding of ORFs in taxa with SL machinery.
Affiliation(s)
- Mengyun Jiang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Zhongtian Gao
- Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518060, China
- Suikang Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Lei Gao
- Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518060, China
- Yunsheng Chen
- Department of Laboratory Medicine, Shenzhen Children's Hospital, Shenzhen 518038, China (corresponding author)
- Wujiao Li
- Department of Laboratory Medicine, Shenzhen Children's Hospital, Shenzhen 518038, China (corresponding author)
11
Chothani S, Ho L, Schafer S, Rackham O. Discovering microproteins: making the most of ribosome profiling data. RNA Biol 2023; 20:943-954. [PMID: 38013207 PMCID: PMC10730196 DOI: 10.1080/15476286.2023.2279845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/30/2023] [Indexed: 11/29/2023] Open
Abstract
Building a reference set of protein-coding open reading frames (ORFs) has revolutionized biological process discovery and understanding. Traditionally, gene models have been confirmed using cDNA sequencing and encoded translated regions inferred using sequence-based detection of start and stop combinations longer than 100 amino-acids to prevent false positives. This has led to small ORFs (smORFs) and their encoded proteins left un-annotated. Ribo-seq allows deciphering translated regions from untranslated irrespective of the length. In this review, we describe the power of Ribo-seq data in detection of smORFs while discussing the major challenge posed by data-quality, -depth and -sparseness in identifying the start and end of smORF translation. In particular, we outline smORF cataloguing efforts in humans and the large differences that have arisen due to variation in data, methods and assumptions. Although current versions of smORF reference sets can already be used as a powerful tool for hypothesis generation, we recommend that future editions should consider these data limitations and adopt unified processing for the community to establish a canonical catalogue of translated smORFs.
Affiliation(s)
- Sonia Chothani
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- Lena Ho
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- Sebastian Schafer
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- Owen Rackham
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- School of Biological Sciences, University of Southampton, Southampton, UK
- The Alan Turing Institute, The British Library, London, UK
12
Yang Y, Wang H, Zhang Y, Chen L, Chen G, Bao Z, Yang Y, Xie Z, Zhao Q. An Optimized Proteomics Approach Reveals Novel Alternative Proteins in Mouse Liver Development. Mol Cell Proteomics 2022; 22:100480. [PMID: 36494044 PMCID: PMC9823216 DOI: 10.1016/j.mcpro.2022.100480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 11/15/2022] [Accepted: 12/04/2022] [Indexed: 12/12/2022] Open
Abstract
Alternative ORFs (AltORFs) are unannotated sequences in genome that encode novel peptides or proteins named alternative proteins (AltProts). Although ribosome profiling and bioinformatics predict a large number of AltProts, mass spectrometry as the only direct way of identification is hampered by the short lengths and relative low abundance of AltProts. There is an urgent need for improvement of mass spectrometry methodologies for AltProt identification. Here, we report an approach based on size-exclusion chromatography for simultaneous enrichment and fractionation of AltProts from complex proteome. This method greatly simplifies the variance of AltProts discovery by enriching small proteins smaller than 40 kDa. In a systematic comparison between 10 methods, the approach we reported enabled the discovery of more AltProts with overall higher intensities, with less cost of time and effort compared to other workflows. We applied this approach to identify 89 novel AltProts from mouse liver, 39 of which were differentially expressed between embryonic and adult mice. During embryonic development, the upregulated AltProts were mainly involved in biological pathways on RNA splicing and processing, whereas the AltProts involved in metabolisms were more active in adult livers. Our study not only provides an effective approach for identifying AltProts but also novel AltProts that are potentially important in developmental biology.
Affiliation(s)
- Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
- Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
- Gennong Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhaoshi Bao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical School, Beijing, China
- Yang Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China. For correspondence: Qian Zhao
13
Bagheri A, Astafev A, Al-Hashimy T, Jiang P. Tracing Translational Footprint by Ribo-Seq: Principle, Workflow, and Applications to Understand the Mechanism of Human Diseases. Cells 2022; 11:2966. [PMID: 36230928 PMCID: PMC9562884 DOI: 10.3390/cells11192966] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/26/2022] [Revised: 09/02/2022] [Accepted: 09/19/2022] [Indexed: 11/30/2022] Open
Abstract
RNA-seq has been widely used as a high-throughput method to characterize transcript dynamic changes in a broad context, such as development and diseases. However, whether RNA-seq-estimated transcriptional dynamics can be translated into protein level changes is largely unknown. Ribo-seq (Ribosome profiling) is an emerging technology that allows for the investigation of the translational footprint via profiling ribosome-bounded mRNA fragments. Ribo-seq coupled with RNA-seq will allow us to understand the transcriptional and translational control of the fundamental biological process and human diseases. This review focuses on discussing the principle, workflow, and applications of Ribo-seq to study human diseases.
Affiliation(s)
- Atefeh Bagheri
- Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
- Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH 44115, USA
- Artem Astafev
- Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
- Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH 44115, USA
- Tara Al-Hashimy
- Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
- Peng Jiang
- Department of Biological, Geological and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
- Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH 44115, USA
- Center for Applied Data Analysis and Modeling (ADAM), Cleveland State University, Cleveland, OH 44115, USA
- Center for RNA Science and Therapeutics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Correspondence: Tel.: +1-(216)-687-3917
14
Jiang M, Ning W, Wu S, Wang X, Zhu K, Li A, Li Y, Cheng S, Song B. Three-nucleotide periodicity of nucleotide diversity in a population enables the identification of open reading frames. Brief Bioinform 2022; 23:6607611. [PMID: 35698834 PMCID: PMC9294425 DOI: 10.1093/bib/bbac210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 04/25/2022] [Accepted: 05/06/2022] [Indexed: 11/14/2022] Open
Abstract
Accurate prediction of open reading frames (ORFs) is important for studying and using genome sequences. Ribosomes move along mRNA strands with a step of three nucleotides and datasets carrying this information can be used to predict ORFs. The ribosome-protected footprints (RPFs) feature a significant 3-nt periodicity on mRNAs and are powerful in predicting translating ORFs, including small ORFs (sORFs), but the application of RPFs is limited because they are too short to be accurately mapped in complex genomes. In this study, we found a significant 3-nt periodicity in the datasets of populational genomic variants in coding sequences, in which the nucleotide diversity increases every three nucleotides. We suggest that this feature can be used to predict ORFs and develop the Python package ‘OrfPP’, which recovers ~83% of the annotated ORFs in the tested genomes on average, independent of the population sizes and the complexity of the genomes. The novel ORFs, including sORFs, identified from single-nucleotide polymorphisms are supported by protein mass spectrometry evidence comparable to that of the annotated ORFs. The application of OrfPP to tetraploid cotton and hexaploid wheat genomes successfully identified 76.17% and 87.43% of the annotated ORFs in the genomes, respectively, as well as 4704 sORFs, including 1182 upstream and 2110 downstream ORFs in cotton and 5025 sORFs, including 232 upstream and 234 downstream ORFs in wheat. Overall, we propose an alternative and supplementary approach for ORF prediction that can extend the studies of sORFs to more complex genomes.
Affiliation(s)
- Mengyun Jiang
- Chinese Academy of Agricultural Sciences and Henan University, China
- Weidong Ning
- Chinese Academy of Agricultural Sciences and Huazhong Agricultural University, China
- Shishi Wu
- Chinese Academy of Agricultural Sciences and Henan University, China
- Xingwei Wang
- Chinese Academy of Agricultural Sciences and Henan University, China
- Kun Zhu
- Chinese Academy of Agricultural Sciences and Henan University, China
- Aomei Li
- Chinese Academy of Agricultural Sciences, China
- Yongyao Li
- Chinese Academy of Agricultural Sciences, China
- Bo Song
- Chinese Academy of Agricultural Sciences, China
15
Small open reading frames in plant research: from prediction to functional characterization. 3 Biotech 2022; 12:76. [PMID: 35251879 PMCID: PMC8873315 DOI: 10.1007/s13205-022-03147-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/28/2021] [Accepted: 02/11/2022] [Indexed: 11/01/2022] Open
Abstract
Gene prediction is a laborious and time-consuming task. The advancement of sequencing technologies and bioinformatics tools, coupled with the accelerated development of ribosome profiling and mass spectrometry, has made the identification of small open reading frames (sORFs) (< 100 codons) in various plant genomes possible. The past 50 years have seen sORFs being isolated from many organisms. However, a comprehensive sORF annotation pipeline is not yet available, a gap that we address in this review. Here, we also provide current information on the classification and functions of plant sORFs and their potential applications in crop improvement programs.
16
Wu HYL, Hsu PY. RiboPlotR: a visualization tool for periodic Ribo-seq reads. Plant Methods 2021; 17:124. [PMID: 34876166 PMCID: PMC8650366 DOI: 10.1186/s13007-021-00824-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Accepted: 11/26/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Ribo-seq has revolutionized the study of genome-wide mRNA translation. High-quality Ribo-seq data display strong 3-nucleotide (nt) periodicity, which corresponds to translating ribosomes deciphering three nts at a time. While 3-nt periodicity has been widely used to study novel translation events such as upstream ORFs in 5' untranslated regions and small ORFs in presumed non-coding RNAs, tools that allow the visualization of these events remain underdeveloped. RESULTS RiboPlotR is a visualization package written in R that presents both RNA-seq coverage and Ribo-seq reads in genomic coordinates for all annotated transcript isoforms of a gene. Specifically, for individual isoform models, RiboPlotR plots Ribo-seq data in the context of gene structures, including 5' and 3' untranslated regions and introns, and it presents the reads for all three reading frames in three different colors. The inclusion of gene structures and color-coding the reading frames facilitate observing new translation events and identifying potential regulatory mechanisms. CONCLUSIONS RiboPlotR is freely available ( https://github.com/hsinyenwu/RiboPlotR and https://sourceforge.net/projects/riboplotr/ ) and allows the visualization of translated features identified in Ribo-seq data.
Affiliation(s)
- Hsin-Yen Larry Wu
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Polly Yingshan Hsu
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA.
17
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China; Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
- Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Kecheng Li
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
- Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
- Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
18
Tang Z, Fan W, Li Q, Wang D, Wen M, Wang J, Li X, Zhou Y. MVIP: multi-omics portal of viral infection. Nucleic Acids Res 2021; 50:D817-D827. [PMID: 34718748 PMCID: PMC8689837 DOI: 10.1093/nar/gkab958] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/15/2021] [Revised: 09/30/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open
Abstract
Virus infections are huge threats to living organisms and cause many diseases, such as COVID-19 caused by SARS-CoV-2, which has led to millions of deaths. To develop effective strategies to control viral infection, we need to understand its molecular events in host cells. Virus related functional genomic datasets are growing rapidly, however, an integrative platform for systematically investigating host responses to viruses is missing. Here, we developed a user-friendly multi-omics portal of viral infection named as MVIP (https://mvip.whu.edu.cn/). We manually collected available high-throughput sequencing data under viral infection, and unified their detailed metadata including virus, host species, infection time, assay, and target, etc. We processed multi-layered omics data of more than 4900 viral infected samples from 77 viruses and 33 host species with standard pipelines, including RNA-seq, ChIP-seq, and CLIP-seq, etc. In addition, we integrated these genome-wide signals into customized genome browsers, and developed multiple dynamic charts to exhibit the information, such as time-course dynamic and differential gene expression profiles, alternative splicing changes and enriched GO/KEGG terms. Furthermore, we implemented several tools for efficiently mining the virus-host interactions by virus, host and genes. MVIP would help users to retrieve large-scale functional information and promote the understanding of virus-host interactions.
Affiliation(s)
- Zhidong Tang
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Weiliang Fan
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Qiming Li
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Dehe Wang
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Miaomiao Wen
- Institute for Advanced Studies, Wuhan University, Wuhan 430072, China
- Junhao Wang
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Xingqiao Li
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Yu Zhou
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China; Institute for Advanced Studies, Wuhan University, Wuhan 430072, China; RNA Institute, Wuhan University, Wuhan 430072, China; Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan 430072, China
19
Song B, Jiang M, Gao L. RiboNT: A Noise-Tolerant Predictor of Open Reading Frames from Ribosome-Protected Footprints. Life (Basel) 2021; 11:701. [PMID: 34357073 PMCID: PMC8307163 DOI: 10.3390/life11070701] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/26/2021] [Revised: 07/13/2021] [Accepted: 07/14/2021] [Indexed: 01/27/2023] Open
Abstract
Ribo-seq, also known as ribosome profiling, refers to the sequencing of ribosome-protected mRNA fragments (RPFs). This technique has greatly advanced our understanding of translation and facilitated the identification of novel open reading frames (ORFs) within untranslated regions or non-coding sequences as well as the identification of non-canonical start codons. However, the widespread application of Ribo-seq has been hindered because obtaining periodic RPFs requires a highly optimized protocol, which may be difficult to achieve, particularly in non-model organisms. Furthermore, the periodic RPFs are too short (28 nt) for accurate mapping to polyploid genomes, but longer RPFs are usually produced with a compromise in periodicity. Here we present RiboNT, a noise-tolerant ORF predictor that can utilize RPFs with poor periodicity. It evaluates RPF periodicity and automatically weighs the support from RPFs and codon usage before combining their contributions to identify translated ORFs. The results demonstrate the utility of RiboNT for identifying both long and small ORFs using RPFs with either good or poor periodicity. We implemented the pipeline on a dataset of RPFs with poor periodicity derived from membrane-bound polysomes of Arabidopsis thaliana seedlings and identified several small ORFs (sORFs) evolutionarily conserved in diverse plant species. RiboNT should greatly broaden the application of Ribo-seq by minimizing the requirement of RPF quality and allowing the use of longer RPFs, which is critical for organisms with complex genomes because these RPFs can be more accurately mapped to the position from which they were derived.
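As a rough illustration of what "evaluating RPF periodicity" can mean, the sketch below computes the fraction of footprints falling in each of the three reading frames. It is not RiboNT's scoring scheme, which the abstract does not spell out; the function name `frame_fractions` and the input (P-site positions given relative to the first nucleotide of an ORF) are assumptions made for the example.

```python
from collections import Counter


def frame_fractions(psite_offsets):
    """Return the fraction of P-site positions in frames 0, 1 and 2.

    psite_offsets: iterable of integer P-site positions relative to the
    first nucleotide of the ORF (hypothetical input format).
    """
    counts = Counter(offset % 3 for offset in psite_offsets)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts.get(frame, 0) / total for frame in range(3))


# Toy footprints: six of eight P-sites sit in frame 0.
print(frame_fractions([0, 3, 6, 9, 1, 2, 12, 15]))  # (0.75, 0.125, 0.125)
```

A strongly periodic library would show most of its reads in one frame, while values close to one third in every frame would indicate poor periodicity of the kind the abstract says RiboNT is designed to tolerate.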
Affiliation(s)
- Bo Song
- Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518060, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Correspondence: (B.S.); (L.G.)
- Mengyun Jiang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China
- Shenzhen Research Institute of Henan University, Shenzhen 518000, China
- Lei Gao
- Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518060, China
- Correspondence: (B.S.); (L.G.)
20
Poidevin L, Forment J, Unal D, Ferrando A. Transcriptome and translatome changes in germinated pollen under heat stress uncover roles of transporter genes involved in pollen tube growth. Plant, Cell & Environment 2021. [PMID: 33289138 DOI: 10.1101/2020.05.29.122937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Plant reproduction is one key biological process that is very sensitive to heat stress and, as a result, enhanced global warming becomes a serious threat to agriculture. In this work, we have studied the effects of heat on germinated pollen of Arabidopsis thaliana both at the transcriptional and translational level. We have used a high-resolution ribosome profiling technology to provide a comprehensive study of the transcriptome and the translatome of germinated pollen at permissive and restrictive temperatures. We have found significant down-regulation of key membrane transporters required for pollen tube growth by heat, thus uncovering heat-sensitive targets. A subset of the heat-repressed transporters showed coordinated up-regulation with canonical heat-shock genes at permissive conditions. We also found specific regulations at the translational level and we have uncovered the presence of ribosomes on sequences annotated as non-coding. Our results demonstrate that heat impacts mostly on membrane transporters thus explaining the deleterious effects of heat stress on pollen growth. The specific regulations at the translational level and the presence of ribosomes on non-coding RNAs highlights novel regulatory aspects on plant fertilization.
Affiliation(s)
- Laetitia Poidevin
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universitat Politècnica de València, Valencia, Spain
- Javier Forment
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universitat Politècnica de València, Valencia, Spain
- Dilek Unal
- Biotechnology Application and Research Center, and Department of Molecular Biology, Faculty of Science and Letter, Bilecik Seyh Edebali University, Bilecik, Turkey
- Alejandro Ferrando
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universitat Politècnica de València, Valencia, Spain
21
Poidevin L, Forment J, Unal D, Ferrando A. Transcriptome and translatome changes in germinated pollen under heat stress uncover roles of transporter genes involved in pollen tube growth. Plant, Cell & Environment 2021; 44:2167-2184. [PMID: 33289138 DOI: 10.1111/pce.13972] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/30/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 05/12/2023]
22
Tjeldnes H, Labun K, Torres Cleuren Y, Chyżyńska K, Świrski M, Valen E. ORFik: a comprehensive R toolkit for the analysis of translation. BMC Bioinformatics 2021; 22:336. [PMID: 34147079 PMCID: PMC8214792 DOI: 10.1186/s12859-021-04254-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/18/2021] [Accepted: 06/09/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND With the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays. RESULTS Here, we introduce ORFik, a user-friendly R/Bioconductor API and toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the steps to process, analyze, and visualize the different steps of translation with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation or RCP-seq/TCP-seq to also quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5'UTRs and RNA-seq for determining translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames (uORFs). As a use-case, we demonstrate using ORFik to rapidly annotate the dynamics of 5' UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions. CONCLUSION In summary, ORFik introduces hundreds of tested, documented and optimized methods. ORFik is designed to be easily customizable, enabling users to create complete workflows from raw data to publication-ready figures for several types of sequencing data. Finally, by improving speed and scope of many core Bioconductor functions, ORFik offers enhancement benefiting the entire Bioconductor environment. AVAILABILITY http://bioconductor.org/packages/ORFik .
Affiliation(s)
- Håkon Tjeldnes
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Kornel Labun
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Yamila Torres Cleuren
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway; Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
| | - Katarzyna Chyżyńska
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Michał Świrski
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Warsaw, Poland
| | - Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway. .,Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway.
| |
Collapse
|
23
|
Rutley N, Poidevin L, Doniger T, Tillett RL, Rath A, Forment J, Luria G, Schlauch KA, Ferrando A, Harper JF, Miller G. Characterization of novel pollen-expressed transcripts reveals their potential roles in pollen heat stress response in Arabidopsis thaliana. Plant Reproduction 2021; 34:61-78. [PMID: 33459869 PMCID: PMC7902599 DOI: 10.1007/s00497-020-00400-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/16/2020] [Accepted: 11/17/2020] [Indexed: 05/27/2023]
Abstract
Arabidopsis pollen transcriptome analysis revealed new intergenic transcripts of unknown function, many of which are long non-coding RNAs, that may function in pollen-specific processes, including the heat stress response. The male gametophyte is the most heat sensitive of all plant tissues. In recent years, long noncoding RNAs (lncRNAs) have emerged as important components of cellular regulatory networks involved in most biological processes, including response to stress. While examining RNAseq datasets of developing and germinating Arabidopsis thaliana pollen exposed to heat stress (HS), we identified 66 novel and 246 recently annotated intergenic expressed loci (XLOCs) of unknown function, with the majority encoding lncRNAs. Comparison with HS in cauline leaves and other RNAseq experiments indicated that 74% of the 312 XLOCs are pollen-specific, and at least 42% are HS-responsive. Phylogenetic analysis revealed that 96% of the genes evolved recently in Brassicaceae. We found that 50 genes are putative targets of microRNAs and that 30% of the XLOCs contain small open reading frames (ORFs) with homology to protein sequences. Finally, RNAseq of ribosome-protected RNA fragments together with predictions of periodic footprint of the ribosome P-sites indicated that 23 of these ORFs are likely to be translated. Our findings indicate that many of the 312 unknown genes might be functional and play a significant role in pollen biology, including the HS response.
Affiliation(s)
- Nicholas Rutley
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, 5290002, Ramat-Gan, Israel
- Laetitia Poidevin
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universitat Politècnica de València, Valencia, Spain
- Tirza Doniger
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, 5290002, Ramat-Gan, Israel
- Richard L Tillett
- Department of Biochemistry and Molecular Biology, University of Nevada at Reno, Reno, NV, 89557, USA; Nevada INBRE Bioinformatics Core, University of Nevada at Reno, Reno, NV, 89557, USA
- Abhishek Rath
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, 5290002, Ramat-Gan, Israel
- Javier Forment
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universitat Politècnica de València, Valencia, Spain
- Gilad Luria
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, 5290002, Ramat-Gan, Israel
- Karen A Schlauch
- Institute of Health Innovation, Desert Research Institute, Department of Pharmacology, University of Nevada at Reno, Reno, NV, 89557, USA
- Alejandro Ferrando
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universitat Politècnica de València, Valencia, Spain
- Jeffery F Harper
- Department of Biochemistry and Molecular Biology, University of Nevada at Reno, Reno, NV, 89557, USA
- Gad Miller
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, 5290002, Ramat-Gan, Israel
|
24
|
Schlesinger D, Elsässer SJ. Revisiting sORFs: overcoming challenges to identify and characterize functional microproteins. FEBS J 2021; 289:53-74. [PMID: 33595896 DOI: 10.1111/febs.15769] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 08/21/2020] [Revised: 01/17/2021] [Accepted: 02/15/2021] [Indexed: 02/07/2023]
Abstract
Short ORFs (sORFs), that is, occurrences of a start and stop codon within 100 codons or less, can be found in organisms of all domains of life, outnumbering annotated protein-coding ORFs by orders of magnitude. Even though functional proteins smaller than 100 amino acids are known, the coding potential of sORFs has often been overlooked, as it is not trivial to predict and test for functionality within the large number of sORFs. Recent advances in ribosome profiling and mass spectrometry approaches, together with refined bioinformatic predictions, have enabled a huge leap forward in this field and identified thousands of likely coding sORFs. A relatively low number of small proteins or microproteins produced from these sORFs have been characterized so far on the molecular, structural, and/or mechanistic level. These however display versatile and, in some cases, essential cellular functions, allowing for the exciting possibility that many more, previously unknown small proteins might be encoded in the genome, waiting to be discovered. This review will give an overview of the steadily growing microprotein field, focusing on eukaryotic small proteins. We will discuss emerging themes in the molecular action of microproteins, as well as advances and challenges in microprotein identification and characterization.
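The sORF definition in the abstract above (a start and a stop codon within 100 codons or fewer) can be illustrated with a minimal Python sketch. The function name `find_sorfs`, the restriction to ATG starts on the forward strand, and the choice to count the stop codon toward the 100-codon limit are assumptions made for this illustration, not details taken from the review.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_codons: int = 100) -> List[Tuple[int, int, str]]:
    """Return (start, end, orf_sequence) for every ATG-initiated ORF whose
    length, start and stop codons included, is at most max_codons codons.

    Only the forward strand is scanned; coordinates are 0-based, end-exclusive.
    Nested ORFs (a downstream in-frame ATG sharing the same stop) are reported
    separately.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk downstream codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos + 3 - start) // 3
                if n_codons <= max_codons:
                    hits.append((start, pos + 3, seq[start:pos + 3]))
                break  # the first in-frame stop terminates this ORF
    return hits


if __name__ == "__main__":
    print(find_sorfs("CCATGAAATAGGG"))
```

A scan like this only enumerates candidates. As the abstract stresses, deciding which of them are actually coding requires additional evidence such as ribosome profiling or mass spectrometry.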
Affiliation(s)
- Dörte Schlesinger
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
- Simon J Elsässer
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
|
25
|
Xing J, Liu H, Jiang W, Wang L. LncRNA-Encoded Peptide: Functions and Predicting Methods. Front Oncol 2021; 10:622294. [PMID: 33520729 PMCID: PMC7842084 DOI: 10.3389/fonc.2020.622294] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 10/28/2020] [Accepted: 11/30/2020] [Indexed: 12/16/2022]
Abstract
Long non-coding RNA (lncRNA) was originally defined as a class of non-coding RNA that cannot encode proteins. However, recent reports suggest that some lncRNAs actually contain open reading frames that encode peptides. These coding products play important roles in the pathogenesis of many diseases. Here, we summarize the regulatory pathways through which mammalian lncRNA-encoded peptides influence muscle function, mRNA stability, gene expression, and other processes. We also address the promoting and inhibiting functions of the peptides in different cancers and other diseases. We then introduce computational methods and data resources for predicting the coding ability of lncRNAs. This review is intended to provide references for further research on lncRNA coding and to help reveal potential prospects for targeted tumor therapy.
Affiliation(s)
- Jiani Xing
- Department of Pathophysiology, Medical College of Southeast University, Nanjing, China
- Haizhou Liu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Wei Jiang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Lihong Wang
- Department of Pathophysiology, Medical College of Southeast University, Nanjing, China; Jiangsu Provincial Key Laboratory of Critical Care Medicine, Nanjing, China
|
26
|
Zhao Y, Zhou Y, Liu Y, Hao Y, Li M, Pu X, Li C, Wen Z. Uncovering the prognostic gene signatures for the improvement of risk stratification in cancers by using deep learning algorithm coupled with wavelet transform. BMC Bioinformatics 2020; 21:195. [PMID: 32429941 PMCID: PMC7236453 DOI: 10.1186/s12859-020-03544-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/25/2019] [Accepted: 05/11/2020] [Indexed: 01/08/2023]
Abstract
Background The aim of gene expression-based clinical modelling in tumorigenesis is not only to accurately predict the clinical endpoints, but also to reveal the genome characteristics for downstream analysis for the purpose of understanding the mechanisms of cancers. Most conventional machine learning methods involve a gene filtering step, in which tens of thousands of genes are first filtered based on their expression levels by a statistical method with an arbitrary cutoff. Although the gene filtering procedure helps to reduce the feature dimension and avoid overfitting, there is a risk that some pathogenic genes important to the disease will be ignored. Results In this study, we proposed a novel deep learning approach by combining a convolutional neural network with stationary wavelet transform (SWT-CNN) for stratifying cancer patients and predicting their clinical outcomes without gene filtering based on tumor genomic profiles. The proposed SWT-CNN outperformed state-of-the-art algorithms, including support vector machine (SVM) and logistic regression (LR), and produced comparable prediction performance to random forest (RF). Furthermore, for all the cancer types, we first proposed a method to weight the genes with scores, which took advantage of the representative features in the hidden layer of the convolutional neural network, and then selected the prognostic genes for the Cox proportional-hazards regression. The results showed that risk stratification can be effectively improved by using the identified prognostic genes as features, indicating that the representative features generated by SWT-CNN correlate the genes well with prognostic risk in cancers and are helpful for selecting the prognostic gene signatures. Conclusions Our results indicated that the gene expression-based SWT-CNN model can be an excellent tool for stratifying the prognostic risk for cancer patients. In addition, the representative features of SWT-CNN were validated to be useful for evaluating the importance of the genes in the risk stratification and can be further used to identify the prognostic gene signatures.
|
27
|
Liu Q, Shvarts T, Sliz P, Gregory RI. RiboToolkit: an integrated platform for analysis and annotation of ribosome profiling data to decode mRNA translation at codon resolution. Nucleic Acids Res 2020; 48:W218-W229. [PMID: 32427338 PMCID: PMC7319539 DOI: 10.1093/nar/gkaa395] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 03/06/2020] [Revised: 04/23/2020] [Accepted: 05/15/2020] [Indexed: 12/31/2022]
Abstract
Ribosome profiling (Ribo-seq) is a powerful technology for globally monitoring RNA translation; ranging from codon occupancy profiling, identification of actively translated open reading frames (ORFs), to the quantification of translational efficiency under various physiological or experimental conditions. However, analyzing and decoding translation information from Ribo-seq data is not trivial. Although there are many existing tools to analyze Ribo-seq data, most of these tools are designed for specific or limited functionalities and an easy-to-use integrated tool to analyze Ribo-seq data is lacking. Fortunately, the small size (26–34 nt) of ribosome protected fragments (RPFs) in Ribo-seq and the relatively small amount of sequencing data greatly facilitates the development of such a web platform, which is easy to manipulate for users with or without bioinformatic expertise. Thus, we developed RiboToolkit (http://rnabioinfor.tch.harvard.edu/RiboToolkit), a convenient, freely available, web-based service to centralize Ribo-seq data analyses, including data cleaning and quality evaluation, expression analysis based on RPFs, codon occupancy, translation efficiency analysis, differential translation analysis, functional annotation, translation metagene analysis, and identification of actively translated ORFs. Besides, easy-to-use web interfaces were developed to facilitate data analysis and intuitively visualize results. Thus, RiboToolkit will greatly facilitate the study of mRNA translation based on ribosome profiling.
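Translation efficiency, one of the analyses listed in the abstract above, is commonly summarized as ribosome footprint abundance relative to mRNA abundance for the same gene. The Python sketch below shows that ratio on a log2 scale. It is a generic illustration and not RiboToolkit's implementation; the function name, the counts-per-million normalization and the pseudocount are all assumptions.

```python
import math


def log2_translation_efficiency(
    rpf_count: float,
    rna_count: float,
    rpf_library_size: float,
    rna_library_size: float,
    pseudocount: float = 1.0,
) -> float:
    """log2 of (RPF counts per million) / (RNA-seq counts per million).

    Both counts are assumed to come from the same region of the same gene,
    so feature length cancels out of the ratio. The pseudocount (an
    assumption of this sketch) avoids division by zero for silent genes.
    """
    rpf_cpm = (rpf_count + pseudocount) / rpf_library_size * 1e6
    rna_cpm = (rna_count + pseudocount) / rna_library_size * 1e6
    return math.log2(rpf_cpm / rna_cpm)


if __name__ == "__main__":
    # A gene with about twice as many footprints per million as mRNA reads.
    print(log2_translation_efficiency(199, 99, 1_000_000, 1_000_000))
```

Differential translation analysis usually goes beyond comparing such ratios and models count dispersion across replicates, which this sketch does not attempt.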
Affiliation(s)
- Qi Liu
- Stem Cell Program, Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Tanya Shvarts
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA
- Piotr Sliz
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA
- Richard I Gregory
- Stem Cell Program, Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA; Harvard Initiative for RNA Medicine, Boston, MA 02115, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA
|
28
|
Choudhary S, Li W, Smith AD. Accurate detection of short and long active ORFs using Ribo-seq data. Bioinformatics 2020; 36:2053-2059. [PMID: 31750902 DOI: 10.1093/bioinformatics/btz878] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/18/2019] [Revised: 11/04/2019] [Accepted: 11/20/2019] [Indexed: 12/27/2022]
Abstract
MOTIVATION Ribo-seq, a technique for deep-sequencing ribosome-protected mRNA fragments, has enabled transcriptome-wide monitoring of translation in vivo. It has opened avenues for re-evaluating the coding potential of open reading frames (ORFs), including many short ORFs that were previously presumed to be non-translating. However, the detection of translating ORFs, specifically short ORFs, from Ribo-seq data, remains challenging due to its high heterogeneity and noise. RESULTS We present ribotricer, a method for detecting actively translating ORFs by directly leveraging the three-nucleotide periodicity of Ribo-seq data. Ribotricer demonstrates higher accuracy and robustness compared with other methods at detecting actively translating ORFs including short ORFs on multiple published datasets across species inclusive of Arabidopsis, Caenorhabditis elegans, Drosophila, human, mouse, rat, yeast and zebrafish. AVAILABILITY AND IMPLEMENTATION Ribotricer is available at https://github.com/smithlabcode/ribotricer. All analysis scripts and results are available at https://github.com/smithlabcode/ribotricer-results. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
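The three-nucleotide periodicity mentioned in the abstract above can be made concrete with a deliberately simple Python sketch: given per-nucleotide P-site counts along a candidate ORF, it measures how strongly the counts concentrate in the first codon position. This is a simplified stand-in, not ribotricer's actual scoring method; the function names and the 0.6 cutoff are hypothetical choices for illustration.

```python
from typing import Sequence, Tuple


def frame_fractions(psite_counts: Sequence[int]) -> Tuple[float, float, float]:
    """Fraction of P-site counts in frames 0, 1 and 2 of an ORF.

    psite_counts[i] is the number of reads whose P-site maps to nucleotide i,
    where i = 0 is the first base of the start codon.
    """
    totals = [0, 0, 0]
    for i, count in enumerate(psite_counts):
        totals[i % 3] += count
    total = sum(totals)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (totals[0] / total, totals[1] / total, totals[2] / total)


def looks_periodic(psite_counts: Sequence[int], min_fraction: float = 0.6) -> bool:
    """Crude call: is frame 0 enriched above a hypothetical cutoff?"""
    return frame_fractions(psite_counts)[0] >= min_fraction


if __name__ == "__main__":
    translated = [9, 1, 0, 8, 1, 1, 10, 0, 0]  # counts pile up on frame 0
    background = [5] * 9                        # no frame preference
    print(frame_fractions(translated), looks_periodic(translated))
    print(frame_fractions(background), looks_periodic(background))
```

A pooled frame fraction like this can be dominated by a few highly covered codons, which is one reason dedicated tools use more robust per-codon statistics.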
Affiliation(s)
- Saket Choudhary
- Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- Wenzheng Li
- Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- Andrew D Smith
- Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
|
29
|
Li F, Xing X, Xiao Z, Xu G, Yang X. RiboMiner: a toolset for mining multi-dimensional features of the translatome with ribosome profiling data. BMC Bioinformatics 2020; 21:340. [PMID: 32738892 PMCID: PMC7430821 DOI: 10.1186/s12859-020-03670-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/22/2020] [Accepted: 07/20/2020] [Indexed: 02/08/2023]
Abstract
Background Ribosome profiling has been widely used for studies of translation under a large variety of cellular and physiological contexts. Many of these studies have greatly benefitted from a series of data-mining tools designed for dissection of the translatome from different aspects. However, as the studies of translation advance quickly, the current toolbox still falls short, and more specialized tools are urgently needed for deeper and more efficient mining of the important and new features of the translation landscapes. Results Here, we present RiboMiner, a bioinformatics toolset for mining of multi-dimensional features of the translatome with ribosome profiling data. RiboMiner performs extensive quality assessment of the data and integrates a spectrum of tools for various metagene analyses of the ribosome footprints and for detailed analyses of multiple features related to translation regulation. Visualizations of all the results are available. Many of these analyses have not been provided by previous methods. RiboMiner is highly flexible, as the pipeline can be easily adapted and customized for different scopes and targets of the studies. Conclusions Applications of RiboMiner to two published datasets not only reproduced the main results reported before, but also generated novel insights into the translation regulation processes. Therefore, being complementary to the current tools, RiboMiner could be a valuable resource for dissecting translation landscapes and translation regulation by mining the ribosome profiling data more comprehensively and with higher resolution. RiboMiner is freely available at https://github.com/xryanglab/RiboMiner and https://pypi.org/project/RiboMiner.
Affiliation(s)
- Fajin Li
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Medical Science Building D231, Beijing, 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing, 100084, China; Joint Graduate Program of Peking-Tsinghua-National Institute of Biological Science, Tsinghua University, Beijing, 100084, China
- Xudong Xing
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Medical Science Building D231, Beijing, 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing, 100084, China; Joint Graduate Program of Peking-Tsinghua-National Institute of Biological Science, Tsinghua University, Beijing, 100084, China
- Zhengtao Xiao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Medical Science Building D231, Beijing, 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing, 100084, China
- Gang Xu
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Medical Science Building D231, Beijing, 100084, China
- Xuerui Yang
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Medical Science Building D231, Beijing, 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing, 100084, China
|
30
|
Zhu Y, Xu G, Yang YT, Xu Z, Chen X, Shi B, Xie D, Lu ZJ, Wang P. POSTAR2: deciphering the post-transcriptional regulatory logics. Nucleic Acids Res 2019; 47:D203-D211. [PMID: 30239819 PMCID: PMC6323971 DOI: 10.1093/nar/gky830] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Received: 07/26/2018] [Accepted: 09/08/2018] [Indexed: 01/20/2023]
Abstract
Post-transcriptional regulation of RNAs is critical to a diverse range of cellular processes. The volume of functional genomic data focusing on post-transcriptional regulation logics has continued to grow in recent years. In the current database version, POSTAR2 (http://lulab.life.tsinghua.edu.cn/postar), we included the following new features and data: updated ∼500 CLIP-seq datasets (∼1200 CLIP-seq datasets in total) from six species, including human, mouse, fly, worm, Arabidopsis and yeast; added a new module ‘Translatome’, which is derived from Ribo-seq datasets and contains ∼36 million open reading frames (ORFs) in the genomes from the six species; updated and unified post-transcriptional regulation and variation data. Finally, we improved web interfaces for searching and visualizing protein–RNA interactions with multi-layer information. Meanwhile, we also merged our CLIPdb database into POSTAR2. POSTAR2 will help researchers investigate the post-transcriptional regulatory logics coordinated by RNA-binding proteins and the translational landscape of cellular RNAs.
Affiliation(s)
- Yumin Zhu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; Division of General Surgery, Peking University First Hospital, Beijing 100034, China
- Gang Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yucheng T Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Zhiyu Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xinduo Chen
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Binbin Shi
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Daoxin Xie
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Pengyuan Wang
- Division of General Surgery, Peking University First Hospital, Beijing 100034, China
|
31
|
Kiniry SJ, Michel AM, Baranov PV. Computational methods for ribosome profiling data analysis. Wiley Interdisciplinary Reviews. RNA 2020; 11:e1577. [PMID: 31760685 DOI: 10.1002/wrna.1577] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/30/2019] [Revised: 10/12/2019] [Accepted: 10/16/2019] [Indexed: 12/15/2022]
Abstract
Since the introduction of the ribosome profiling technique in 2009, its popularity has greatly increased. It is widely used for the comprehensive assessment of gene expression and for studying the mechanisms of regulation at the translational level. As the number of ribosome profiling datasets being produced continues to grow, so too does the need for reliable software that can provide answers to the biological questions it can address. This review describes the computational methods and tools that have been developed to analyze ribosome profiling data at the different stages of the process. It starts with initial routine processing of raw data and follows with more specific tasks such as the identification of translated open reading frames, differential gene expression analysis, or evaluation of local or global codon decoding rates. The review pinpoints challenges associated with each step and explains the ways in which they are currently addressed. In addition, it provides a comprehensive, albeit incomplete, list of publicly available software applicable to each step, which may be a beneficial starting point to those unexposed to ribosome profiling analysis. The outline of current challenges in ribosome profiling data analysis may inspire computational biologists to search for novel, potentially superior, solutions that will improve and expand the bioinformatician's toolbox for ribosome profiling data analysis. This article is categorized under: Translation > Ribosome Structure/Function; RNA Evolution and Genomics > Computational Analyses of RNA; Translation > Translation Mechanisms; Translation > Translation Regulation.
Affiliation(s)
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS, Moscow, Russia
|
32
|
Recent advances in ribosome profiling for deciphering translational regulation. Methods 2020; 176:46-54. [DOI: 10.1016/j.ymeth.2019.05.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/04/2018] [Revised: 05/02/2019] [Accepted: 05/15/2019] [Indexed: 12/16/2022]
|
33
|
uORF-Tools-Workflow for the determination of translation-regulatory upstream open reading frames. PLoS One 2019; 14:e0222459. [PMID: 31513641 PMCID: PMC6742470 DOI: 10.1371/journal.pone.0222459] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/26/2019] [Accepted: 08/29/2019] [Indexed: 12/17/2022]
Abstract
Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS), also known as the 5’ untranslated region, preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools uses Ribo-TISH to identify uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs that have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing time and resources.
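The quantity described in the abstract above, a stimulus-induced change in the ratio of main-ORF RPFs to uORF RPFs, can be sketched in a few lines of Python. This is an illustrative calculation under stated assumptions, not the uORF-Tools implementation: the function names, the pseudocount and the log2 scale are hypothetical choices, and library-size normalization and replicate handling are omitted.

```python
import math
from typing import Tuple


def main_to_uorf_ratio(main_rpf: float, uorf_rpf: float, pseudocount: float = 1.0) -> float:
    """Ratio of RPF counts in the main ORF to RPF counts in its uORF.

    The pseudocount (an assumption of this sketch) keeps the ratio finite
    when the uORF has no reads.
    """
    return (main_rpf + pseudocount) / (uorf_rpf + pseudocount)


def log2_ratio_change(
    control: Tuple[float, float],
    stimulated: Tuple[float, float],
    pseudocount: float = 1.0,
) -> float:
    """log2 change of the main-ORF/uORF ratio between two conditions.

    Each argument is a (main_orf_rpf, uorf_rpf) pair. Positive values mean
    the main ORF gained footprints relative to its uORF after the stimulus.
    """
    before = main_to_uorf_ratio(control[0], control[1], pseudocount)
    after = main_to_uorf_ratio(stimulated[0], stimulated[1], pseudocount)
    return math.log2(after / before)


if __name__ == "__main__":
    # Main-ORF footprints rise fourfold while uORF footprints stay constant.
    print(log2_ratio_change(control=(99, 9), stimulated=(399, 9)))
```

Because the same library-size factor scales both the main-ORF and uORF counts within a sample, it cancels inside each ratio, apart from the small distortion introduced by the pseudocount.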
|