1. Zhang Y, Yang Y, Li K, Chen L, Yang Y, Yang C, Xie Z, Wang H, Zhao Q. Enhanced Discovery of Alternative Proteins (AltProts) in Mouse Cardiac Development Using Data-Independent Acquisition (DIA) Proteomics. Anal Chem 2025; 97:1517-1527. [PMID: 39813267 PMCID: PMC11781309 DOI: 10.1021/acs.analchem.4c02924] [Received: 06/07/2024] [Revised: 11/27/2024] [Accepted: 11/27/2024] [Indexed: 01/18/2025]
Abstract
Alternative proteins (AltProts) are a class of proteins encoded by DNA sequences previously classified as noncoding. Although historically overlooked, they have recently been shown to be widespread and to have distinctive biological roles. So far, direct detection of AltProts has relied on data-dependent acquisition (DDA) mass spectrometry (MS). However, data-independent acquisition (DIA) MS, a method that is rapidly gaining popularity for the analysis of canonical proteins, has seen limited application in AltProt research, largely because of the complexity of constructing DIA libraries. In this study, we present a novel DIA workflow that uses a fragmentation spectra predictor to construct DIA libraries efficiently, significantly enhancing the detection of AltProts. Compared with DDA, our method doubled the number of identified AltProts and reduced missing values by 50%. We conducted a comprehensive comparison of four AltProt databases, four DIA-library construction strategies, and three analytical software tools to establish an optimal workflow for AltProt analysis. Using this workflow, we investigated mouse heart development and identified over 50 AltProts differentially expressed between embryonic and adult heart tissues. Over 30 unannotated mouse AltProts were validated, including ASDURF, which plays a crucial role in cardiac development. Our findings not only provide a practical workflow for MS-based AltProt analysis but also reveal novel AltProts with potential biological significance.
Affiliation(s)
- Yuanliang Zhang
- Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Ying Yang
- Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Kecheng Li
- Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Lei Chen
- Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Yang Yang
- Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Chenxi Yang
- Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China
- Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China
- Qian Zhao
- Department of Applied Biology and Chemical Technology, State Key Laboratory of Chemical Biology and Drug Discovery, Hong Kong Polytechnic University, Hong Kong 999077, China
2. Chanut-Delalande H, Zanet J. Small ORFs, Big Insights: Drosophila as a Model to Unraveling Microprotein Functions. Cells 2024; 13:1645. [PMID: 39404408 PMCID: PMC11475943 DOI: 10.3390/cells13191645] [Received: 09/13/2024] [Revised: 09/27/2024] [Accepted: 10/02/2024] [Indexed: 10/19/2024]
Abstract
Recently developed experimental and computational approaches to identify putative coding small ORFs (smORFs) in genomes have revealed thousands of smORFs localized within coding and non-coding RNAs. They can be translated into smORF peptides or microproteins, which are defined as less than 100 amino acids in length. The identification of such a large number of potential biological regulators represents a major challenge, notably for elucidating the in vivo functions of these microproteins. Since the emergence of this field, Drosophila has proved to be a valuable model for studying the biological functions of microproteins in vivo. In this review, we outline how the smORF field emerged and the nomenclature used in this domain. We summarize the technical challenges associated with identifying putative coding smORFs in the genome and the relevant translated microproteins. Finally, recent findings on one of the best studied smORF peptides, Pri, and other microproteins studied so far in Drosophila are described. These studies highlight the diverse roles that microproteins can fulfil in the regulation of various molecular targets involved in distinct cellular processes during animal development and physiology. Given the recent emergence of the microprotein field and the associated discoveries, the microproteome represents an exquisite source of potentially bioactive molecules, whose in vivo biological functions can be explored in the Drosophila model.
Affiliation(s)
- Jennifer Zanet
- Unité de Biologie Moléculaire, Cellulaire et du Développement (MCD), UMR 5077, Centre de Biologie Intégrative (CBI), CNRS, UPS, Université de Toulouse, 31062 Toulouse, France
3. Su X, Shi C, Liu F, Tan M, Wang Y, Zhu L, Chen Y, Yu M, Wang X, Liu J, Liu Y, Lin W, Fang Z, Sun Q, Zhou T, Lin A. HMPA: a pioneering framework for the noncanonical peptidome from discovery to functional insights. Brief Bioinform 2024; 25:bbae510. [PMID: 39413795 PMCID: PMC11483136 DOI: 10.1093/bib/bbae510] [Received: 07/04/2024] [Revised: 09/01/2024] [Accepted: 09/30/2024] [Indexed: 10/18/2024]
Abstract
Advances in peptidomics have uncovered numerous small open reading frames with coding potential and shown that some of the encoded micropeptides are closely related to human cancer. However, systematic analysis and integration of these micropeptides, from sequence to structure and function, remain largely undeveloped. Here, we built a workflow for the collection and analysis of proteomic data, transcriptomic data, and clinical outcomes for cancer-associated micropeptides using publicly available datasets from large cohorts. We initially identified 19 586 novel micropeptides by reanalyzing proteomic profile data from 3753 samples across 8 cancer types. Further quantitative analysis of these micropeptides, together with the associated clinical data, identified 3065 that were dysregulated in cancer, 370 of which were strongly associated with prognosis. Moreover, we employed a deep learning framework to construct a micropeptide-protein interaction network for further bioinformatics analysis, revealing that micropeptides are involved in multiple biological processes as bioactive molecules. Taken together, our atlas provides a benchmark for high-throughput prediction and functional exploration of micropeptides, offering new insights into their biological mechanisms in cancer. The HMPA is freely available at http://hmpa.zju.edu.cn.
Affiliation(s)
- Xinwan Su
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Chengyu Shi
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Fangzhou Liu
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Manman Tan
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Ying Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Linyu Zhu
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Yu Chen
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Meng Yu
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Xinyi Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- Jian Liu
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, 718 East Haizhou Rd., Haining, Zhejiang 314400, China
- Yang Liu
- Institute of Immunology, Zhejiang University School of Medicine, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310009, China
- Weiqiang Lin
- International School of Medicine, International Institutes of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, No. N1, Shangcheng Avenue, Yiwu, Zhejiang 322000, China
- Zhaoyuan Fang
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, 718 East Haizhou Rd., Haining, Zhejiang 314400, China
- The Second Affiliated Hospital, Zhejiang University School of Medicine, 88 Jiefang Road, Shangcheng District, Hangzhou, Zhejiang 310000, China
- Qiang Sun
- International School of Medicine, International Institutes of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, No. N1, Shangcheng Avenue, Yiwu, Zhejiang 322000, China
- Tianhua Zhou
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Department of Cell Biology and Program in Molecular Cell Biology, Zhejiang University School of Medicine, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada
- Aifu Lin
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Cancer Center, Zhejiang University, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310000, China
- International School of Medicine, International Institutes of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, No. N1, Shangcheng Avenue, Yiwu, Zhejiang 322000, China
- Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, 828 Zhongxing Road, Xitang District, Jiashan, Zhejiang, 314100, China
- Key Laboratory for Cell and Gene Engineering of Zhejiang Province, 866 Yuhangtang Road, West Lake District, Hangzhou, Zhejiang 310058, China
4. Ding S, Liao H, Huang F, Chen L, Guo W, Feng K, Huang T, Cai YD. Analyzing domain features of small proteins using a machine-learning method. Proteomics 2024; 24:e2300302. [PMID: 38258387 DOI: 10.1002/pmic.202300302] [Received: 08/06/2023] [Revised: 01/14/2024] [Accepted: 01/15/2024] [Indexed: 01/24/2024]
Abstract
Small proteins (SPs) are a unique group of proteins that play crucial roles in many important biological processes. Exploring the biological function of SPs is necessary. In this study, the InterPro tool and the maximum correlation method were utilized to analyze functional domains of SPs. The purpose was to identify important functional domains that can indicate the essential differences between small and large protein sequences. First, the small and large proteins were represented by their functional domains via a one-hot scheme. Then, the MaxRel method was adopted to evaluate the relationships between each domain and the target variable, indicating small or large protein. The top 36 domain features were selected for further investigation. Among them, 14 were deemed to be highly related to SPs because they were annotated to SPs more frequently than large proteins. We found the involvement of functional domains, such as ubiquitin-conjugating enzyme/RWD-like, nuclear transport factor 2 domain, and alpha subunit of guanine nucleotide-binding protein (G-protein) in regulating the biological function of SPs. The involvement of these domains has been confirmed by other recent studies. Our findings indicate that protein functional domains may regulate small protein-related functions and predict their biological activity.
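The domain-feature analysis described above (one-hot encoding of each protein's functional domains, then ranking each domain by its relevance to the small/large label) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that MaxRel scores relevance as the mutual information between a binary domain feature and the binary label, as in the max-relevance step of mRMR, and the domain IDs D1 to D3, the toy labels, and all function names are hypothetical.

```python
import math
from collections import Counter


def one_hot_domains(proteins, vocab):
    """Encode each protein's set of domain IDs as a 0/1 vector over vocab."""
    return [[1 if domain in domains else 0 for domain in vocab] for domains in proteins]


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[xi] / n) * (py[yi] / n)))
    return mi


def max_relevance_ranking(X, y, vocab):
    """Rank domain features by MI with the label, highest first (ties keep vocab order)."""
    scores = []
    for j, name in enumerate(vocab):
        column = [row[j] for row in X]
        scores.append((name, mutual_information(column, y)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: domain IDs D1-D3; label 1 = small protein, 0 = large protein.
vocab = ["D1", "D2", "D3"]
proteins = [{"D1"}, {"D1", "D2"}, {"D3"}, {"D2", "D3"}]
labels = [1, 1, 0, 0]

X = one_hot_domains(proteins, vocab)
ranking = max_relevance_ranking(X, labels, vocab)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

On this toy input, D1 and D3 both score about 0.693 (ln 2) and D2 scores 0.000. Because mutual information is symmetric, a domain found only in small proteins (D1) and one found only in large proteins (D3) rank equally; a separate frequency comparison, like the one the abstract uses to pick the 14 SP-associated domains, is needed to tell which class a top-ranked domain marks.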
Affiliation(s)
- ShiJian Ding
- School of Life Sciences, Shanghai University, Shanghai, China
- FeiMing Huang
- School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
5. Santos-Júnior CD, Der Torossian Torres M, Duan Y, del Río ÁR, Schmidt TS, Chong H, Fullam A, Kuhn M, Zhu C, Houseman A, Somborski J, Vines A, Zhao XM, Bork P, Huerta-Cepas J, de la Fuente-Nunez C, Coelho LP. Computational exploration of the global microbiome for antibiotic discovery. bioRxiv: the preprint server for biology 2023:2023.08.31.555663. [PMID: 37693522 PMCID: PMC10491242 DOI: 10.1101/2023.08.31.555663] [Indexed: 09/12/2023]
Abstract
Novel antibiotics are urgently needed to combat the antibiotic-resistance crisis. We present a machine learning-based approach to predict prokaryotic antimicrobial peptides (AMPs) by leveraging a vast dataset of 63,410 metagenomes and 87,920 microbial genomes. This led to the creation of AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides, the majority of which were previously unknown. We observed that AMP production varies by habitat, with animal-associated samples displaying the highest proportion of AMPs compared to other habitats. Furthermore, within different human-associated microbiota, strain-level differences were evident. To validate our predictions, we synthesized and experimentally tested 50 AMPs, demonstrating their efficacy against clinically relevant drug-resistant pathogens both in vitro and in vivo. These AMPs exhibited antibacterial activity by targeting the bacterial membrane. Additionally, AMPSphere provides valuable insights into the evolutionary origins of peptides. In conclusion, our approach identified AMP sequences within prokaryotic microbiomes, opening up new avenues for the discovery of antibiotics.
Affiliation(s)
- Célio Dias Santos-Júnior
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Marcelo Der Torossian Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Yiqian Duan
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Álvaro Rodríguez del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
- Thomas S.B. Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Hui Chong
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Anthony Fullam
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Chengkai Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Amy Houseman
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Jelena Somborski
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Anna Vines
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- International Human Phenome Institute, Shanghai, China
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
6. Hassel KR, Brito-Estrada O, Makarewich CA. Microproteins: Overlooked regulators of physiology and disease. iScience 2023; 26:106781. [PMID: 37213226 PMCID: PMC10199267 DOI: 10.1016/j.isci.2023.106781] [Indexed: 05/23/2023]
Abstract
Ongoing efforts to generate a complete and accurate annotation of the genome have revealed a significant blind spot for small proteins (<100 amino acids) originating from short open reading frames (sORFs). The recent discovery of numerous sORF-encoded proteins, termed microproteins, that play diverse roles in critical cellular processes has ignited the field of microprotein biology. Large-scale efforts are currently underway to identify sORF-encoded microproteins in diverse cell types and tissues, and specialized methods and tools have been developed to aid in their discovery, validation, and functional characterization. Microproteins identified thus far play important roles in fundamental processes including ion transport, oxidative phosphorylation, and stress signaling. In this review, we discuss the optimized tools available for microprotein discovery and validation, summarize the biological functions of numerous microproteins, outline the promise of developing microproteins as therapeutic targets, and look forward to the future of the field of microprotein biology.
Affiliation(s)
- Keira R. Hassel
- The Heart Institute, Division of Molecular Cardiovascular Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Omar Brito-Estrada
- The Heart Institute, Division of Molecular Cardiovascular Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Catherine A. Makarewich
- The Heart Institute, Division of Molecular Cardiovascular Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
7. Wan L, Xiao W, Huang Z, Zhou A, Jiang Y, Zou B, Liu B, Deng C, Zhang Y. Systematic identification of smORFs in domestic silkworm (Bombyx mori). PeerJ 2023; 11:e14682. [PMID: 36655040 PMCID: PMC9841908 DOI: 10.7717/peerj.14682] [Received: 10/12/2022] [Accepted: 12/13/2022] [Indexed: 01/15/2023]
Abstract
The silkworm (Bombyx mori) is not only an excellent model species but also an economically important agricultural insect. As a research subject, it offers low maintenance costs and poses no biohazard risk. Small open reading frames (smORFs) are an important class of genomic elements that can produce bioactive peptides. However, smORFs in the silkworm have been poorly identified and studied, and systematic genome-wide identification is essential for studying them further. Here, we identified and analyzed smORFs in the silkworm using comprehensive methods. Our results showed that at least 738 highly reliable smORFs were found in B. mori and that 34,401 possible smORFs were partially supported. We also identified some differentially expressed and tissue-specific smORFs, which may be closely related to the characteristics and functions of the tissues. This article provides a basis for subsequent research on smORFs in the silkworm and may serve as a methodological reference for smORF research in other species.
Affiliation(s)
- Linrong Wan
- Sericultural Research Institute, Sichuan Academy of Agricultural Sciences, Nanchong, Sichuan, China
- College of Agronomy, Sichuan Agricultural University, Chengdu, Sichuan, China
- Wenfu Xiao
- Sericultural Research Institute, Sichuan Academy of Agricultural Sciences, Nanchong, Sichuan, China
- Ziyan Huang
- Research and Development Center, LyuKang, Chengdu, Sichuan, China
- Departments of Bioinformatics, DNA Stories Bioinformatics Center, Chengdu, Sichuan, China
- Anlian Zhou
- Sericultural Research Institute, Sichuan Academy of Agricultural Sciences, Nanchong, Sichuan, China
- Yaming Jiang
- Sericultural Research Institute, Sichuan Academy of Agricultural Sciences, Nanchong, Sichuan, China
- Bangxing Zou
- Sericultural Research Institute, Sichuan Academy of Agricultural Sciences, Nanchong, Sichuan, China
- Binbin Liu
- Sericultural Research Institute, Sichuan Academy of Agricultural Sciences, Nanchong, Sichuan, China
- Cao Deng
- Research and Development Center, LyuKang, Chengdu, Sichuan, China
- Departments of Bioinformatics, DNA Stories Bioinformatics Center, Chengdu, Sichuan, China
- Youhong Zhang
- Sericultural Research Institute, Sichuan Academy of Agricultural Sciences, Nanchong, Sichuan, China
8. Ceron-Noriega A, Almeida MV, Levin M, Butter F. Nematode gene annotation by machine-learning-assisted proteotranscriptomics enables proteome-wide evolutionary analysis. Genome Res 2023; 33:112-128. [PMID: 36653121 PMCID: PMC9977148 DOI: 10.1101/gr.277070.122] [Received: 06/28/2022] [Accepted: 11/18/2022] [Indexed: 01/19/2023]
Abstract
Nematodes encompass more than 24,000 described species, which have been found in almost every ecological habitat, and make up >80% of metazoan taxonomic diversity in soils. The last common ancestor of nematodes is believed to date back ∼650–750 million years, giving rise to a large and phylogenetically diverse group to be explored. However, for most species, high-quality gene annotations are incomplete or missing. Combining short-read RNA sequencing with mass spectrometry-based proteomics and machine-learning quality control in an approach called proteotranscriptomics, we improve gene annotations for nine genome-sequenced nematode species and provide new gene annotations for three additional species without genome assemblies. Emphasizing the sensitivity of our methodology, we provide evidence for two hitherto undescribed genes in the model organism Caenorhabditis elegans. Extensive phylogenetic systems analysis using this comprehensive proteome annotation provides new insights into evolutionary processes of this metazoan group.
Affiliation(s)
- Michal Levin
- Institute of Molecular Biology (IMB), 55128 Mainz, Germany
- Falk Butter
- Institute of Molecular Biology (IMB), 55128 Mainz, Germany
9. Chothani S, Ho L, Schafer S, Rackham O. Discovering microproteins: making the most of ribosome profiling data. RNA Biol 2023; 20:943-954. [PMID: 38013207 PMCID: PMC10730196 DOI: 10.1080/15476286.2023.2279845] [Accepted: 10/30/2023] [Indexed: 11/29/2023]
Abstract
Building a reference set of protein-coding open reading frames (ORFs) has revolutionized the discovery and understanding of biological processes. Traditionally, gene models have been confirmed using cDNA sequencing, and translated regions have been inferred by sequence-based detection of start-stop combinations longer than 100 amino acids to prevent false positives. This has left small ORFs (smORFs) and their encoded proteins unannotated. Ribo-seq can distinguish translated from untranslated regions irrespective of length. In this review, we describe the power of Ribo-seq data for detecting smORFs while discussing the major challenges that data quality, depth and sparseness pose for identifying the start and end of smORF translation. In particular, we outline smORF cataloguing efforts in humans and the large differences that have arisen from variation in data, methods and assumptions. Although current smORF reference sets can already serve as a powerful tool for hypothesis generation, we recommend that future editions account for these data limitations and adopt unified processing so that the community can establish a canonical catalogue of translated smORFs.
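The length-threshold practice described above (calling ORFs from start/stop codon pairs and keeping only those of roughly 100 amino acids or more) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not any published annotation pipeline: it scans only the forward strand, uses ATG as the sole start codon and TAA/TAG/TGA as stops, and treats exactly 100 aa as the boundary. Ribo-seq evidence, which the review argues is needed to recover smORFs, is not modelled, and the helper names `find_orfs` and `split_by_length` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """Return (start, end, length_aa) for ATG...stop ORFs on the forward strand.

    start/end are 0-based and end is exclusive (it includes the stop codon);
    length_aa counts codons from the ATG up to, but not including, the stop.
    Only the first ATG before each stop opens an ORF, so nested ORFs are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                orfs.append((start, i + 3, (i - start) // 3))
                start = None
    return orfs


def split_by_length(orfs, cutoff_aa=100):
    """Split ORFs into those kept by a length cutoff and the shorter smORFs."""
    kept = [orf for orf in orfs if orf[2] >= cutoff_aa]
    small = [orf for orf in orfs if orf[2] < cutoff_aa]
    return kept, small


short_orfs = find_orfs("ATG" + "GCT" * 5 + "TAA")    # 6-aa ORF
long_orfs = find_orfs("ATG" + "GCT" * 120 + "TGA")  # 121-aa ORF
kept, small = split_by_length(short_orfs + long_orfs)
print("kept:", kept)
print("smORFs:", small)
```

Here the 121-aa ORF passes the cutoff while the 6-aa ORF falls into the smORF bin; under a conventional 100-aa filter it would simply be discarded, which is the annotation blind spot the review describes.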
Affiliation(s)
- Sonia Chothani
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- Lena Ho
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- Sebastian Schafer
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- Owen Rackham
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- School of Biological Sciences, University of Southampton, Southampton, UK
- The Alan Turing Institute, The British Library, London, UK
10
Turchetti B, Buzzini P, Baeza M. A genomic approach to analyze the cold adaptation of yeasts isolated from Italian Alps. Front Microbiol 2022; 13:1026102. [DOI: 10.3389/fmicb.2022.1026102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 10/07/2022] [Indexed: 11/11/2022]
Abstract
Microorganisms, including yeasts, are responsible for the mineralization of organic matter in cold regions, and characterizing them is critical to elucidating the ecology of such environments on Earth. Strategies developed by yeasts to survive in cold environments have been increasingly studied in recent years and applied in various biotechnological contexts, but knowledge of them is still limited. Microbial adaptations to cold include the synthesis of cryoprotective compounds, as well as a high number of genes encoding proteins/enzymes characterized by reduced proline content and highly flexible, large catalytic active sites. This comparative genomic study examines the adaptations of yeasts isolated from the Italian Alps, considering their growth kinetics. The optimal temperature for growth (OTG), growth rate (Gr), and draft genome sizes varied considerably (OTG, 10°C–20°C; Gr, 0.071–0.0726; genomes, 20.7–21.5 Mbp; %GC, 50.9–61.5). A direct relationship was observed between calculated protein flexibilities and OTG, but not Gr. Putative genes encoding cold stress responses were found, as well as high numbers of genes encoding responses to general, oxidative, and osmotic stresses. The cold response genes found in the studied yeasts play roles in cell membrane adaptation, compatible solute accumulation, RNA structure changes, and protein folding, i.e., dihydrolipoamide dehydrogenase, glycogen synthase, omega-6 fatty acid, stearoyl-CoA desaturase, ATP-dependent RNA helicase, and elongation of very-long-chain fatty acids. Redundancy was found for several putative genes and was highest for P-loop containing nucleoside triphosphate hydrolase, alpha/beta hydrolase, armadillo repeat-containing proteins, and the major facilitator superfamily protein. Hundreds of thousands of small open reading frames (smORFs) were found in all studied yeasts, especially in Phenoliferia glacialis. Gene clusters encoding the synthesis of secondary metabolites such as terpene, non-ribosomal peptide, and type III polyketide were predicted in four, three, and two studied yeasts, respectively.
11
Sruthi KB, Menon A, P A, Vasudevan Soniya E. Pervasive translation of small open reading frames in plant long non-coding RNAs. Front Plant Sci 2022; 13:975938. [PMID: 36352887 PMCID: PMC9638090 DOI: 10.3389/fpls.2022.975938] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/22/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
Long non-coding RNAs (lncRNAs) are primarily recognized as non-coding transcripts longer than 200 nucleotides with low coding potential and are present in both eukaryotes and prokaryotes. Recent findings reveal that lncRNAs can code for micropeptides in various species. Micropeptides are generated from small open reading frames (smORFs) and have been discovered frequently in short mRNAs and non-coding RNAs, such as lncRNAs, circular RNAs, and pri-miRNAs. The most accepted definition of a smORF is an ORF containing fewer than 100 codons, and ribosome profiling and mass spectrometry are the most prevalent experimental techniques used to identify them. Although micropeptides are thought to play critical roles throughout plant development and under stress conditions, the functions of only a handful have been verified to date. Even though more research is being directed toward identifying micropeptides, there is still a dearth of information regarding these peptides in plants. This review outlines lncRNA-encoded peptides, the evolutionary roles of such peptides in plants, and the techniques used to identify them. It also describes the functions of the pri-miRNA- and circRNA-encoded peptides that have been identified in plants.
12
Montini N, Doughty TW, Domenzain I, Fenton DA, Baranov PV, Harrington R, Nielsen J, Siewers V, Morrissey JP. Identification of a novel gene required for competitive growth at high temperature in the thermotolerant yeast Kluyveromyces marxianus. Microbiology (Reading) 2022; 168. [PMID: 35333706 PMCID: PMC9558357 DOI: 10.1099/mic.0.001148] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]
Abstract
It is important to understand the basis of thermotolerance in yeasts to broaden their application in industrial biotechnology. The capacity to run bioprocesses at temperatures above 40 °C is of great interest, but this is beyond the growth range of most commonly used yeast species. In contrast, some industrial yeasts, such as Kluyveromyces marxianus, can grow at temperatures of 45 °C or higher. Such species are valuable for direct use in industrial biotechnology and as a vehicle to study the genetic and physiological basis of yeast thermotolerance. In previous work, we reported that evolutionarily young genes disproportionately changed expression when yeast were grown under stressful conditions and postulated that such genes could be important for long-term adaptation to stress. Here, we tested this hypothesis in K. marxianus by identifying and studying species-specific genes that showed increased expression during high-temperature growth. Twelve such genes were identified, and 11 were successfully inactivated using CRISPR-mediated mutagenesis. One gene, KLMX_70384, is required for competitive growth at high temperature, supporting the hypothesis that evolutionarily young genes could play roles in adaptation to harsh environments. KLMX_70384 is predicted to encode an 83 aa peptide, and RNA sequencing and ribo-sequencing were used to confirm transcription and translation of the gene. The precise function of KLMX_70384 remains unknown, but some features are suggestive of RNA-binding activity. The gene is located in what was previously considered an intergenic region of the genome and lacks homologues in other yeasts or in databases. Overall, the data support the hypothesis that genes that arose de novo in K. marxianus after the speciation event that separated K. marxianus and K. lactis contribute to some of its unique traits.
Affiliation(s)
- Noemi Montini
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland
- Tyler W Doughty
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Iván Domenzain
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Darren A Fenton
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland
- School of Biochemistry and Cell Biology, University College Cork, Cork T12 K8AF, Ireland
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork T12 K8AF, Ireland
- Ronan Harrington
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- Verena Siewers
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
- John P Morrissey
- School of Microbiology, APC Microbiome Ireland, Environmental Research Institute and SUSFERM Centre, University College Cork, Cork T12 K8AF, Ireland
13
Leong AZX, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures. J Biomed Sci 2022; 29:19. [PMID: 35300685 PMCID: PMC8928697 DOI: 10.1186/s12929-022-00802-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/31/2021] [Accepted: 03/09/2022] [Indexed: 12/17/2022]
Abstract
A short open reading frame (sORF) comprises ≤300 bases and encodes a microprotein, or sORF-encoded protein (SEP), of ≤100 amino acids. Traditionally dismissed by genome annotation pipelines as meaningless noise, sORFs were shown to possess coding potential by ribosome profiling (RIBO-Seq), which unveiled sORF-based transcripts at various genome locations. Nonetheless, there was initially little experimental evidence that the corresponding microproteins are stable and functional. With recent advancements in multi-omics, the identification, validation, and functional characterisation of sORFs and microproteins have become feasible. In this review, we discuss the history and development of the emerging research field of sORFs and microproteins. In particular, we focus on an array of bioinformatics and OMICS approaches used for predicting, sequencing, validating, and characterizing these recently discovered entities. These strategies include RIBO-Seq, which detects sORF transcripts via ribosome footprints, and mass spectrometry (MS)-based proteomics for sequencing the resultant microproteins. Our discussion then extends to the functional characterisation of microproteins using CRISPR/Cas9 screens and protein–protein interaction (PPI) studies. Beyond detection methodologies, we also highlight the challenges and potential solutions in identifying and validating sORFs and their microproteins. The novelty of this review lies in its focus on validating the functional roles of microproteins, which could contribute to the future landscape of microproteomics.
Affiliation(s)
- Alyssa Zi-Xin Leong
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Pey Yee Lee
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- M Aiman Mohtar
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Saiful Effendi Syafruddin
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
- Yuh-Fen Pung
- Division of Biomedical Science, School of Pharmacy, University of Nottingham Malaysia, Semenyih, 43500, Selangor, Malaysia
- Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
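The sORF definition in entry 13 (≤300 bases, ≤100 amino acids) is a length filter applied to reading frames. The Python sketch below shows roughly what such a purely sequence-based filter looks like. It is an illustration under stated assumptions, not the method of any reviewed work or tool. The assumptions are ATG-only starts, the three standard stop codons, forward-strand scanning only, and an arbitrary 10-codon lower bound. `find_small_orfs` is a hypothetical helper. As the abstracts above stress, candidates found this way still need Ribo-seq or MS evidence before they can be called translated.

```python
from typing import List, Tuple

# Standard-code stop codons; alternative genetic codes are ignored (assumption).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq: str, min_aa: int = 10, max_aa: int = 100) -> List[Tuple[int, int, int]]:
    """Scan the three forward frames for ATG-initiated ORFs of min_aa..max_aa codons.

    Hypothetical helper for illustration. Returns (start, end, n_aa) tuples:
    0-based index of the ATG, exclusive end just past the stop codon, and the
    number of sense codons (start codon included, stop codon excluded).
    """
    seq = seq.upper()
    orfs: List[Tuple[int, int, int]] = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until the first in-frame stop codon.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop downstream in this frame, so no complete ORF
            n_aa = (j - i) // 3
            if min_aa <= n_aa <= max_aa:
                orfs.append((i, j + 3, n_aa))
            # Resume after the stop: nested ATGs sharing this stop are skipped,
            # so each stop codon yields only its longest (most upstream) ORF.
            i = j + 3
    return sorted(orfs)


if __name__ == "__main__":
    demo = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    print(find_small_orfs(demo))  # → [(2, 41, 12)]
```

Resuming the scan after each stop codon keeps only the most upstream ATG per stop. A pipeline that cares about alternative in-frame or near-cognate starts would need to report those as well.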
14
Koch P, Schmitt S, Cardner M, Beerenwinkel N, Panke S, Held M. Discovery of antimicrobials by massively parallelized growth assays (Mex). Sci Rep 2022; 12:4097. [PMID: 35260685 PMCID: PMC8904554 DOI: 10.1038/s41598-022-07755-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2021] [Accepted: 02/17/2022] [Indexed: 01/12/2023]
Abstract
The number of newly approved antimicrobial compounds has been steadily decreasing over the past 50 years, emphasizing the need for novel antimicrobial substances. Here we present Mex, a method for the high-throughput discovery of novel antimicrobials that relies on E. coli self-screening to determine the bioactivity of more than ten thousand naturally occurring peptides. Analysis of thousands of E. coli growth curves using next-generation sequencing enables the identification of more than 1000 previously unknown antimicrobial peptides. Additionally, incorporating the kinetics of growth inhibition gives a first indication of the mode of action, which has implications for the ultimate usefulness of the peptides in question. The most promising peptides from the screen are chemically synthesized, and their activity is determined in standardized susceptibility assays. Ten of the 15 investigated peptides efficiently eradicate bacteria at minimal inhibitory concentrations in the lower µM or upper nM range. This work represents a step-change in the high-throughput discovery of functionally diverse antimicrobials.
Affiliation(s)
- Philipp Koch
- Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Steven Schmitt
- Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Mathias Cardner
- Computational Biology, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
- Niko Beerenwinkel
- Computational Biology, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
- Sven Panke
- Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Martin Held
- Bioprocess Laboratory, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
15
Micropeptides translated from putative long non-coding RNAs. Acta Biochim Biophys Sin (Shanghai) 2022; 54:292-300. [PMID: 35538037 PMCID: PMC9827906 DOI: 10.3724/abbs.2022010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Long non-coding RNAs (lncRNAs) transcribed in mammals and other eukaryotes were thought to have no protein-coding capability. However, recent studies have suggested that many lncRNAs are misannotated and actually contain coding sequences that are translated by the ribosomal machinery into functional peptides, called micropeptides or small peptides. Here we review the rapidly advancing field of micropeptides translated from putative lncRNAs, describe the strategies for their identification, and elucidate their critical roles in many fundamental biological processes. We also discuss the prospects of research on micropeptides and their potential applications.
16
Noncoding-RNA-Based Therapeutics with an Emphasis on Prostatic Carcinoma—Progress and Challenges. Vaccines (Basel) 2022; 10:vaccines10020276. [PMID: 35214734 PMCID: PMC8877701 DOI: 10.3390/vaccines10020276] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/28/2021] [Revised: 01/26/2022] [Accepted: 02/03/2022] [Indexed: 12/19/2022]
Abstract
Noncoding RNAs (ncRNAs) defy the central dogma: they are a family of RNA molecules that are not translated into protein but can convey information encoded in DNA. Elucidating the exact functions of ncRNAs has been a focus of discovery in the last decade and remains challenging. Nevertheless, the importance of understanding ncRNAs is apparent, since these molecules regulate gene expression at the transcriptional and post-transcriptional levels, exerting pleiotropic effects critical in development, oncogenesis, and immunity. These molecules have been referred to as “the dark matter of the nucleus”, and unraveling their roles in physiologic and pathologic processes will provide vast opportunities for basic and translational research, with the potential for significant therapeutic progress. Consequently, strong efforts are underway to exploit the therapeutic utility of ncRNAs, and some ncRNA therapeutics have been approved by the US Food and Drug Administration and the European Medicines Agency. The use of ncRNA therapeutics (or “vaccines” if defined as anti-disease agents) may result in improved curative strategies when used alone or in combination with existing treatments. This review will focus on the role of ncRNA therapeutics in prostatic carcinoma while exploring basic biological aspects of these molecules, which represent about 97% of the transcriptome in humans.
17
Kute PM, Soukarieh O, Tjeldnes H, Trégouët DA, Valen E. Small Open Reading Frames, How to Find Them and Determine Their Function. Front Genet 2022; 12:796060. [PMID: 35154250 PMCID: PMC8831751 DOI: 10.3389/fgene.2021.796060] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/15/2021] [Accepted: 12/30/2021] [Indexed: 12/12/2022]
Abstract
Advances in genomics and molecular biology have revealed an abundance of small open reading frames (sORFs) across all types of transcripts. While these sORFs are often assumed to be non-functional, many have been implicated in physiological functions, and a significant number of sORFs have been described in human diseases. Thus, sORFs may represent a hidden repository of functional elements that could serve as therapeutic targets. In contrast to protein-coding genes, the function of an sORF is not necessarily enacted by its encoded peptide; sometimes the act of translating the sORF alone might have a regulatory role. Indeed, the most studied sORFs are located in the 5′UTRs of coding transcripts and can have a regulatory impact on the translation of the downstream protein-coding sequence. However, sORFs have also been abundantly identified in non-coding RNAs, including lncRNAs, circular RNAs and ribosomal RNAs, suggesting that sORFs may be diverse in function. Of the many experimental methods used to discover sORFs, the most commonly used are ribosome profiling and mass spectrometry. These can confirm interactions between transcripts and ribosomes and the production of a peptide, respectively. Extensions to ribosome profiling that also capture scanning ribosomes have further made it possible to see how sORFs impact the translation initiation of mRNAs. While high-throughput techniques have made the identification of sORFs less difficult, defining their function, if any, is typically more challenging. Together, the abundance and potential function of many of these sORFs argue for including sORFs in gene annotations and systematically characterizing them to understand their potential functional roles. In this review, we will focus on the high-throughput methods used in the detection and characterization of sORFs and discuss techniques for validation and functional characterization.
Affiliation(s)
- Preeti Madhav Kute
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Omar Soukarieh
- Department of Molecular Epidemiology of Vascular and Brain Disorders, INSERM, BPH, U1219, University of Bordeaux, Bordeaux, France
- Håkon Tjeldnes
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- David-Alexandre Trégouët
- Department of Molecular Epidemiology of Vascular and Brain Disorders, INSERM, BPH, U1219, University of Bordeaux, Bordeaux, France
- Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- *Correspondence: Eivind Valen,
18
Liu T, Wu J, Wu Y, Hu W, Fang Z, Wang Z, Jiang C, Li S. LncPep: A Resource of Translational Evidences for lncRNAs. Front Cell Dev Biol 2022; 10:795084. [PMID: 35141219 PMCID: PMC8819059 DOI: 10.3389/fcell.2022.795084] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/14/2021] [Accepted: 01/05/2022] [Indexed: 12/13/2022]
Abstract
Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that have traditionally been regarded as having no protein-coding capacity. A growing number of studies suggest that lncRNAs contain open reading frames (ORFs) that encode peptides. Although several databases of noncoding RNA-encoded peptides have been developed, most display only a small number of experimentally validated peptides, and resources focused on lncRNA-encoded peptides are still lacking. We used six types of evidence to evaluate the coding potential of 883,804 lncRNAs across 39 species: the Coding Potential Assessment Tool (CPAT), Coding Potential Calculator v2.0 (CPC2), N6-methyladenosine (m6A) RNA modification sites, Pfam, ribosome profiling (Ribo-seq), and translation initiation sites (TISs). We constructed a comprehensive database of lncRNA-encoded peptides, LncPep (http://www.shenglilabs.com/LncPep/). LncPep provides three major functional modules: 1) a user-friendly searching/browsing interface, 2) prediction and BLAST modules for exploring novel lncRNAs and peptides, and 3) annotations for lncRNAs, peptides and supporting evidence. Taken together, LncPep is a user-friendly and convenient platform for discovering and investigating peptides encoded by lncRNAs.
Affiliation(s)
- Teng Liu
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jingni Wu
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yangjun Wu
- Department of Gynecological Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
- Wei Hu
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhixiao Fang
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zishan Wang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Chunjie Jiang
- Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, United States
- Shengli Li
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- *Correspondence: Shengli Li,
19
Abstract
In recent years, there has been increased appreciation that a whole category of proteins, small proteins of around 50 amino acids or fewer in length, has been missed by annotation as well as by genetic and biochemical assays. With the increased recognition that small proteins are stable within cells and have regulatory functions, there has been intensified study of these proteins. As a result, important questions about small proteins in bacteria and archaea are coming to the fore. Here, we give an overview of these questions, the initial answers, and the approaches needed to address these questions more fully. More detailed discussions of how small proteins can be identified by ribosome profiling and mass spectrometry approaches are provided by two accompanying reviews (N. Vazquez-Laslop, C. M. Sharma, A. S. Mankin, and A. R. Buskirk, J Bacteriol 204:e00294-21, 2022, https://doi.org/10.1128/JB.00294-21; C. H. Ahrens, J. T. Wade, M. M. Champion, and J. D. Langer, J Bacteriol 204:e00353-21, 2022, https://doi.org/10.1128/JB.00353-21). We are excited by the prospects of new insights and possible therapeutic approaches coming from this emerging field.
Affiliation(s)
- Todd Gray
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Department of Biomedical Sciences, University at Albany, Albany, New York, USA
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
- Kai Papenfort
- Institute of Microbiology, Friedrich Schiller University, Jena, Germany
- Microverse Cluster, Friedrich Schiller University, Jena, Germany
20
Cardon T, Fournier I, Salzet M. Unveiling a Ghost Proteome in the Glioblastoma Non-Coding RNAs. Front Cell Dev Biol 2022; 9:703583. [PMID: 35004666 PMCID: PMC8733697 DOI: 10.3389/fcell.2021.703583] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/30/2021] [Accepted: 12/03/2021] [Indexed: 12/13/2022]
Abstract
Glioblastoma is the most common brain cancer in adults. Even when treated with at least near-total resection followed by radiotherapy combined with temozolomide, its median survival time is 15 months. In glioblastoma (GBM), variations in non-coding ribonucleic acid (ncRNA) expression have been demonstrated in tumor processes, especially in the regulation of major signaling pathways. Moreover, many ncRNAs contain an open reading frame (ORF) in their sequences that allows their translation into proteins, the so-called alternative proteins (AltProts), which constitute the “ghost proteome.” In GBM, this neglected proteome has been shown to engage in protein–protein interactions (PPIs) with reference proteins (RefProts), reflecting involvement in signaling pathways linked to cellular mobility and transfer RNA regulation. More recently, clinical studies have revealed that AltProts are also associated with patient survival and poor prognosis. We thus propose to review the ncRNAs involved in GBM and highlight their function in the disease.
Affiliation(s)
- Tristan Cardon
- University of Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse-PRISM, Lille, France
- Isabelle Fournier
- University of Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse-PRISM, Lille, France
- Institut Universitaire de France, Paris, France
- Michel Salzet
- University of Lille, Inserm, CHU Lille, U1192-Protéomique Réponse Inflammatoire Spectrométrie de Masse-PRISM, Lille, France
- Institut Universitaire de France, Paris, France
21
Parmar BS, Peeters MKR, Boonen K, Clark EC, Baggerman G, Menschaert G, Temmerman L. Identification of Non-Canonical Translation Products in C. elegans Using Tandem Mass Spectrometry. Front Genet 2021; 12:728900. [PMID: 34759956 PMCID: PMC8575065 DOI: 10.3389/fgene.2021.728900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Accepted: 09/16/2021] [Indexed: 11/22/2022]
Abstract
Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly splice variants, ncRNAs, sORFs and altORFs. However, identifying and characterizing the products that may be translated from them remains a challenge. Addressing this, we report here on 552 non-canonical proteins and splice variants in the model organism C. elegans using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of C. elegans. Using this database, we mined available mass spectrometric resources of C. elegans, from which 51 novel, non-canonical proteins could be identified. Furthermore, we used diverse proteomic and peptidomic strategies to detect 40 novel non-canonical proteins in C. elegans by LC-TIMS-MS/MS, of which 6 were shared with our meta-analysis of existing resources. Together, this allows us to provide a resource with detailed annotation of 467 splice variants and 85 novel proteins mapped onto UTRs, non-coding regions and alternative open reading frames of the C. elegans genome.
Affiliation(s)
- Bhavesh S. Parmar
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
- Marlies K. R. Peeters
- Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
- Kurt Boonen
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Ellie C. Clark
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
- Geert Baggerman
- Centre for Proteomics (CFP), University of Antwerp, Antwerp, Belgium
- Gerben Menschaert
- Laboratory of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Ghent University, Ghent, Belgium
- Liesbet Temmerman
- Animal Physiology and Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
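The counts in entry 21 fit together as shown below. This is our reading of the abstract and is not spelled out there. It assumes the 6 proteins found by both the meta-analysis and the new LC-TIMS-MS/MS experiments are counted once.

```latex
\begin{aligned}
N_{\text{novel}} &= 51 + 40 - 6 = 85, \\
N_{\text{total}} &= N_{\text{splice}} + N_{\text{novel}} = 467 + 85 = 552.
\end{aligned}
```

On this reading, the 552 non-canonical products are the 467 splice variants plus the 85 novel proteins.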
22
Fesenko I, Shabalina SA, Mamaeva A, Knyazev A, Glushkevich A, Lyapina I, Ziganshin R, Kovalchuk S, Kharlampieva D, Lazarev V, Taliansky M, Koonin EV. A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants. Nucleic Acids Res 2021; 49:10328-10346. [PMID: 34570232 DOI: 10.1093/nar/gkab816] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 06/03/2021] [Revised: 08/17/2021] [Accepted: 09/17/2021] [Indexed: 12/17/2022]
Abstract
Pervasive transcription of eukaryotic genomes results in the expression of long non-coding RNAs (lncRNAs), most of which are poorly conserved in evolution and appear to be non-functional. However, some lncRNAs have been shown to perform specific functions, in particular transcription regulation. Thousands of small open reading frames (smORFs, <100 codons) located on lncRNAs could potentially be translated into peptides or microproteins. We report a comprehensive analysis of the conservation and evolutionary trajectories of lncRNA smORFs from the moss Physcomitrium patens across transcriptomes of 479 plant species. Although thousands of smORFs are subject to substantial purifying selection, the majority appear to be evolutionarily young and could represent a major pool for functional innovation. Using nanopore RNA sequencing, we show that, on average, the transcription level of conserved smORFs is higher than that of non-conserved smORFs. Proteomic analysis confirmed the translation of 82 novel species-specific smORFs. Numerous conserved smORFs containing low-complexity regions (LCRs) or transmembrane domains were identified, and the biological functions of a selected LCR-smORF were demonstrated experimentally. Thus, microproteins encoded by smORFs are a major, functionally diverse component of the plant proteome.
Collapse
Affiliation(s)
- Igor Fesenko
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Anna Mamaeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Andrey Knyazev
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Anna Glushkevich
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Irina Lyapina
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Rustam Ziganshin
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Sergey Kovalchuk
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
- Daria Kharlampieva
- Department of Cell Biology, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow 119435, Russian Federation
- Vassili Lazarev
- Department of Cell Biology, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow 119435, Russian Federation; Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow region, 141701, Russian Federation
- Michael Taliansky
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation; The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
23
Kiniry SJ, Judge CE, Michel AM, Baranov PV. Trips-Viz: an environment for the analysis of public and user-generated ribosome profiling data. Nucleic Acids Res 2021; 49:W662-W670. [PMID: 33950201 PMCID: PMC8262740 DOI: 10.1093/nar/gkab323] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/19/2021] [Revised: 04/11/2021] [Accepted: 04/20/2021] [Indexed: 02/07/2023]
Abstract
Trips-Viz (https://trips.ucc.ie/) is an interactive platform for the analysis and visualization of ribosome profiling (Ribo-Seq) and shotgun RNA sequencing (RNA-seq) data. This includes publicly available and user generated data, hence Trips-Viz can be classified as a database and as a server. As a database it provides access to many processed Ribo-Seq and RNA-seq data aligned to reference transcriptomes which has been expanded considerably since its inception. Here, we focus on the server functionality of Trips-viz which also has been greatly improved. Trips-viz now enables visualisation of proteomics data from a large number of processed mass spectrometry datasets. It can be used to support translation inferred from Ribo-Seq data. Users are now able to upload a custom reference transcriptome as well as data types other than Ribo-Seq/RNA-Seq. Incorporating custom data has been streamlined with RiboGalaxy (https://ribogalaxy.ucc.ie/) integration. The other new functionality is the rapid detection of translated open reading frames (ORFs) through a simple easy to use interface. The analysis of differential expression has been also improved via integration of DESeq2 and Anota2seq in addition to a number of other improvements of existing Trips-viz features.
Affiliation(s)
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Ciara E Judge
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Ribomaps Ltd, Western Gateway Bld, Western Rd, Cork, Ireland
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, RAS, Moscow, Russia
24
Choteau SA, Wagner A, Pierre P, Spinelli L, Brun C. MetamORF: a repository of unique short open reading frames identified by both experimental and computational approaches for gene and metagene analyses. Database (Oxford) 2021; 2021:6307706. [PMID: 34156446 PMCID: PMC8218702 DOI: 10.1093/database/baab032] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/23/2020] [Revised: 04/08/2021] [Accepted: 05/17/2021] [Indexed: 11/12/2022]
Abstract
The development of high-throughput technologies revealed the existence of non-canonical short open reading frames (sORFs) on most eukaryotic ribonucleic acids. They are ubiquitous genetic elements conserved across species and suspected to be involved in numerous cellular processes. MetamORF (https://metamorf.hb.univ-amu.fr/) aims to provide a repository of unique sORFs identified in the human and mouse genomes with both experimental and computational approaches. By gathering publicly available sORF data, normalizing them and summarizing redundant information, we were able to identify a total of 1 162 675 unique sORFs. Despite the usual characterization of ORFs as short, upstream or downstream, there is currently no clear consensus regarding the definition of these categories. Thus, the data have been reprocessed using a normalized nomenclature. MetamORF enables new analyses at locus, gene, transcript and ORF levels, which should offer the possibility to address new questions regarding sORF functions in the future. The repository is available through a user-friendly web interface, allowing easy browsing, visualization, filtering over multiple criteria and export possibilities. sORFs can be searched starting from a gene, a transcript or an ORF ID, looking in a genome area or browsing the whole repository for a species. The database content has also been made available through track hubs at UCSC Genome Browser. Finally, we demonstrated an enrichment of genes harboring upstream ORFs among genes expressed in response to reticular stress. Database URL https://metamorf.hb.univ-amu.fr/.
Affiliation(s)
- Sebastien A Choteau
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France; Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
- Audrey Wagner
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
- Philippe Pierre
- Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France; Department of Medical Sciences, Institute for Research in Biomedicine (iBiMED) and Ilidio Pinho Foundation, University of Aveiro, Aveiro 3810-193, Portugal; Shanghai Institute of Immunology, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lionel Spinelli
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France; Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
- Christine Brun
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France; CNRS, 31 Chemin Joseph Aiguier, Marseille 13009, France
25
Multi-omics annotation of human long non-coding RNAs. Biochem Soc Trans 2021; 48:1545-1556. [PMID: 32756901 DOI: 10.1042/bst20191063] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/14/2020] [Revised: 07/05/2020] [Accepted: 07/07/2020] [Indexed: 12/12/2022]
Abstract
LncRNAs (long non-coding RNAs) are pervasively transcribed in the human genome and also extensively involved in a variety of essential biological processes and human diseases. The comprehensive annotation of human lncRNAs is of great significance in navigating the functional landscape of the human genome and deepening the understanding of the multi-featured RNA world. However, the unique characteristics of lncRNAs as well as their enormous quantity have complicated and challenged the annotation of lncRNAs. Advances in high-throughput sequencing technologies give rise to a large volume of omics data that are generated at an unprecedented rate and scale, providing possibilities in the identification, characterization and functional annotation of lncRNAs. Here, we review the recent important discoveries of human lncRNAs through analysis of various omics data and summarize specialized lncRNA database resources. Moreover, we highlight the multi-omics integrative analysis as a powerful strategy to efficiently discover and characterize the functional lncRNAs and elucidate their potential molecular mechanisms.
26
Wang B, Wang Z, Pan N, Huang J, Wan C. Improved Identification of Small Open Reading Frames Encoded Peptides by Top-Down Proteomic Approaches and De Novo Sequencing. Int J Mol Sci 2021; 22:ijms22115476. [PMID: 34067398 PMCID: PMC8197016 DOI: 10.3390/ijms22115476] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/13/2021] [Revised: 05/14/2021] [Accepted: 05/18/2021] [Indexed: 12/20/2022]
Abstract
Small open reading frames (sORFs) have translational potential to produce peptides that play essential roles in various biological processes. Nevertheless, many sORF-encoded peptides (SEPs) are still on the prediction level. Here, we construct a strategy to analyze SEPs by combining top-down and de novo sequencing to improve SEP identification and sequence coverage. With de novo sequencing, we identified 1682 peptides mapping to 2544 human sORFs, which were all first characterized in this work. Two-thirds of these new sORFs have reading frame shifts and use a non-ATG start codon. The top-down approach identified 241 human SEPs, with high sequence coverage. The average length of the peptides from the bottom-up database search was 19 amino acids (AA); from de novo sequencing, it was 9 AA; and from the top-down approach, it was 25 AA. The longer peptide positively boosts the sequence coverage, more efficiently distinguishing SEPs from the known gene coding sequence. Top-down has the advantage of identifying peptides with sequential K/R or high K/R content, which is unfavorable in the bottom-up approach. Our method can explore new coding sORFs and obtain highly accurate sequences of their SEPs, which can also benefit future function research.
27
Chen J, Zhang J, Gao Y, Li Y, Feng C, Song C, Ning Z, Zhou X, Zhao J, Feng M, Zhang Y, Wei L, Pan Q, Jiang Y, Qian F, Han J, Yang Y, Wang Q, Li C. LncSEA: a platform for long non-coding RNA related sets and enrichment analysis. Nucleic Acids Res 2021; 49:D969-D980. [PMID: 33045741 PMCID: PMC7778898 DOI: 10.1093/nar/gkaa806] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 07/27/2020] [Revised: 09/03/2020] [Accepted: 09/30/2020] [Indexed: 02/01/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been proven to play important roles in transcriptional processes and various biological functions. Establishing a comprehensive collection of human lncRNA sets is urgent work at present. Using reference lncRNA sets, enrichment analyses will be useful for analyzing lncRNA lists of interest submitted by users. Therefore, we developed a human lncRNA sets database, called LncSEA, which aimed to document a large number of available resources for human lncRNA sets and provide annotation and enrichment analyses for lncRNAs. LncSEA supports >40 000 lncRNA reference sets across 18 categories and 66 sub-categories, and covers over 50 000 lncRNAs. We not only collected lncRNA sets based on downstream regulatory data sources, but also identified a large number of lncRNA sets regulated by upstream transcription factors (TFs) and DNA regulatory elements by integrating TF ChIP-seq, DNase-seq, ATAC-seq and H3K27ac ChIP-seq data. Importantly, LncSEA provides annotation and enrichment analyses of lncRNA sets associated with upstream regulators and downstream targets. In summary, LncSEA is a powerful platform that provides a variety of types of lncRNA sets for users, and supports lncRNA annotations and enrichment analyses. The LncSEA database is freely accessible at http://bio.liclab.net/LncSEA/index.php.
Affiliation(s)
- Jiaxin Chen
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jian Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yu Gao
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yanyu Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chenchen Feng
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chao Song
- Department of Pharmacology, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Ziyu Ning
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xinyuan Zhou
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jianmei Zhao
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Minghong Feng
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yuexin Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Ling Wei
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Qi Pan
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yong Jiang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Fengcui Qian
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongsan Yang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Qiuyu Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chunquan Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
28
Neville MDC, Kohze R, Erady C, Meena N, Hayden M, Cooper DN, Mort M, Prabakaran S. A platform for curated products from novel open reading frames prompts reinterpretation of disease variants. Genome Res 2021; 31:327-336. [PMID: 33468550 PMCID: PMC7849405 DOI: 10.1101/gr.263202.120] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/05/2020] [Accepted: 08/26/2020] [Indexed: 11/29/2022]
Abstract
Recent evidence from proteomics and deep massively parallel sequencing studies has revealed that eukaryotic genomes contain substantial numbers of as-yet-uncharacterized open reading frames (ORFs). We define these uncharacterized ORFs as novel ORFs (nORFs). nORFs in humans are mostly under 100 codons and are found in diverse regions of the genome, including in long noncoding RNAs, pseudogenes, 3' UTRs, 5' UTRs, and alternative reading frames of canonical protein coding exons. There is therefore a pressing need to evaluate the potential functional importance of these unannotated transcripts and proteins in biological pathways and human disease on a larger scale, rather than one at a time. In this study, we outline the creation of a valuable nORFs data set with experimental evidence of translation for the community, use measures of heritability and selection that reveal signals for functional importance, and show the potential implications for functional interpretation of genetic variants in nORFs. Our results indicate that some variants that were previously classified as being benign or of uncertain significance may have to be reinterpreted.
Affiliation(s)
- Matthew D C Neville
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Robin Kohze
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Chaitanya Erady
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Narendra Meena
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra 411008, India
- Matthew Hayden
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom
- Sudhakaran Prabakaran
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra 411008, India
- St Edmund's College, University of Cambridge, Cambridge CB3 0BN, United Kingdom
29
Erady C, Boxall A, Puntambekar S, Suhas Jagannathan N, Chauhan R, Chong D, Meena N, Kulkarni A, Kasabe B, Prathivadi Bhayankaram K, Umrania Y, Andreani A, Nel J, Wayland MT, Pina C, Lilley KS, Prabakaran S. Pan-cancer analysis of transcripts encoding novel open-reading frames (nORFs) and their potential biological functions. NPJ Genom Med 2021; 6:4. [PMID: 33495453 PMCID: PMC7835362 DOI: 10.1038/s41525-020-00167-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/06/2020] [Accepted: 11/18/2020] [Indexed: 12/13/2022]
Abstract
Uncharacterized and unannotated open-reading frames, which we refer to as novel open reading frames (nORFs), may sometimes encode peptides that remain unexplored for novel therapeutic opportunities. To our knowledge, no systematic identification and characterization of transcripts encoding nORFs or their translation products in cancer, or in any other physiological process has been performed. We use our curated nORFs database (nORFs.org), together with RNA-Seq data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) consortia, to identify transcripts containing nORFs that are expressed frequently in cancer or matched normal tissue across 22 cancer types. We show nORFs are subject to extensive dysregulation at the transcript level in cancer tissue and that a small subset of nORFs are associated with overall patient survival, suggesting that nORFs may have prognostic value. We also show that nORF products can form protein-like structures with post-translational modifications. Finally, we perform in silico screening for inhibitors against nORF-encoded proteins that are disrupted in stomach and esophageal cancer, showing that they can potentially be targeted by inhibitors. We hope this work will guide and motivate future studies that perform in-depth characterization of nORF functions in cancer and other diseases.
Affiliation(s)
- Chaitanya Erady
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Adam Boxall
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Shraddha Puntambekar
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- N Suhas Jagannathan
- Cancer and Stem Cell Biology Programme, and Centre for Computational Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
- Ruchi Chauhan
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- David Chong
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Narendra Meena
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Apurv Kulkarni
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- Bhagyashri Kasabe
- Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- Yagnesh Umrania
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Adam Andreani
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Jean Nel
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Matthew T Wayland
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Cristina Pina
- Department of Haematology, Cambridge Biomedical Campus, Cambridge, CB2 0PT, UK
- Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Sudhakaran Prabakaran
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
30
Xing J, Liu H, Jiang W, Wang L. LncRNA-Encoded Peptide: Functions and Predicting Methods. Front Oncol 2021; 10:622294. [PMID: 33520729 PMCID: PMC7842084 DOI: 10.3389/fonc.2020.622294] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 10/28/2020] [Accepted: 11/30/2020] [Indexed: 12/16/2022]
Abstract
Long non-coding RNA (lncRNA) was originally defined as the representative of the non-coding RNAs and unable to encode. However, recent reports suggest that some lncRNAs actually contain open reading frames that encode peptides. These coding products play important roles in the pathogenesis of many diseases. Here, we summarize the regulatory pathways of mammalian lncRNA-encoded peptides in influencing muscle function, mRNA stability, gene expression, and so on. We also address the promoting and inhibiting functions of the peptides in different cancers and other diseases. Then we introduce the computational predicting methods and data resources to predict the coding ability of lncRNA. The intention of this review is to provide references for further coding research and contribute to reveal the potential prospects for targeted tumor therapy.
Affiliation(s)
- Jiani Xing
- Department of Pathophysiology, Medical College of Southeast University, Nanjing, China
- Haizhou Liu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Wei Jiang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Lihong Wang
- Department of Pathophysiology, Medical College of Southeast University, Nanjing, China; Jiangsu Provincial Key Laboratory of Critical Care Medicine, Nanjing, China
31
Brunet MA, Lucier JF, Levesque M, Leblanc S, Jacques JF, Al-Saedi HRH, Guilloy N, Grenier F, Avino M, Fournier I, Salzet M, Ouangraoua A, Scott M, Boisvert FM, Roucou X. OpenProt 2021: deeper functional annotation of the coding potential of eukaryotic genomes. Nucleic Acids Res 2021; 49:D380-D388. [PMID: 33179748 PMCID: PMC7779043 DOI: 10.1093/nar/gkaa1036] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 09/14/2020] [Revised: 10/15/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022]
Abstract
OpenProt (www.openprot.org) is the first proteogenomic resource supporting a polycistronic annotation model for eukaryotic genomes. It provides a deeper annotation of open reading frames (ORFs) while mining experimental data for supporting evidence using cutting-edge algorithms. This update presents the major improvements since the initial release of OpenProt. All species support recent NCBI RefSeq and Ensembl annotations, with changes in annotations being reported in OpenProt. Using the 131 ribosome profiling datasets re-analysed by OpenProt to date, non-AUG initiation starts are reported alongside a confidence score of the initiating codon. From the 177 mass spectrometry datasets re-analysed by OpenProt to date, the unicity of the detected peptides is controlled at each implementation. Furthermore, to guide the users, detectability statistics and protein relationships (isoforms) are now reported for each protein. Finally, to foster access to deeper ORF annotation independently of one's bioinformatics skills or computational resources, OpenProt now offers a data analysis platform. Users can submit their dataset for analysis and receive the results from the analysis by OpenProt. All data on OpenProt are freely available and downloadable for each species, the release-based format ensuring a continuous access to the data. Thus, OpenProt enables a more comprehensive annotation of eukaryotic genomes and fosters functional proteomic discoveries.
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V0A6, Canada
- Jean-François Lucier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Maxime Levesque
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Sébastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V0A6, Canada
- Jean-Francois Jacques
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V0A6, Canada
- Hassan R H Al-Saedi
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Noé Guilloy
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V0A6, Canada
- Frederic Grenier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Mariano Avino
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- Isabelle Fournier
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Michel Salzet
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Aïda Ouangraoua
- Informatics Department, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Michelle S Scott
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- François-Michel Boisvert
- Department of Immunology and Cellular Biology, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, 3201 Jean Mignault, Sherbrooke, QC J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec City, QC G1V0A6, Canada
32
CNCB-NGDC Members and Partners, Xue Y, Bao Y, Zhang Z, Zhao W, Xiao J, He S, Zhang G, Li Y, Zhao G, Chen R, Song S, Ma L, Zou D, Tian D, Li C, Zhu J, Gong Z, Chen M, Wang A, Ma Y, Li M, Teng X, Cui Y, Duan G, Zhang M, Jin T, Shi C, Du Z, Zhang Y, Liu C, Li R, Zeng J, Hao L, Jiang S, Chen H, Han D, Xiao J, Zhang Z, Zhao W, Xue Y, Bao Y, Zhang T, Kang W, Yang F, Qu J, Zhang W, Bao Y, Liu GH, Liu L, Zhang Y, Niu G, Zhu T, Feng C, Liu X, Zhang Y, Li Z, Chen R, Li Q, Teng X, Ma L, Hua Z, Tian D, Jiang C, Chen Z, He F, Zhao Y, Jin Y, Zhang Z, Huang L, Song S, Yuan Y, Zhou C, Xu Q, He S, Ye W, Cao R, Wang P, Ling Y, Yan X, Wang Q, Zhang G, Li Z, Liu L, Jiang S, Li Q, Feng C, Du Q, Ma L, Zong W, Kang H, Zhang M, Xiong Z, Li R, Huan W, Ling Y, Zhang S, Xia Q, Cao R, Fan X, Wang Z, Zhang G, Chen X, Chen T, Zhang S, Tang B, Zhu J, Dong L, Zhang Z, Wang Z, Kang H, Wang Y, Ma Y, Wu S, Kang H, Chen M, Li C, Tian D, Tang B, Liu X, Teng X, Song S, Tian D, Liu X, Li C, Teng X, Song S, Zhang Y, Zou D, Zhu T, Chen M, Niu G, Liu C, Xiong Y, Hao L, Niu G, Zou D, Zhu T, Shao X, Hao L, Li Y, Zhou H, Chen X, Zheng Y, Kang Q, Hao D, Zhang L, Luo H, Hao Y, Chen R, Zhang P, He S, Zou D, Zhang M, Xiong Z, Nie Z, Yu S, Li R, Li M, Li R, Bao Y, Xiong Z, Li M, Yang F, Ma Y, Sang J, Li Z, Li R, Tang B, Zhang X, Dong L, Zhou Q, Cui Y, Zhai S, Zhang Y, Wang G, Zhao W, Wang Z, Zhu Q, Li X, Zhu J, Tian D, Kang H, Li C, Zhang S, Song S, Li M, Zhao W, Yan J, Sang J, Zou D, Li C, Wang Z, Zhang Y, Zhu T, Song S, Wang X, Hao L, Liu Y, Wang Z, Luo H, Zhu J, Wu X, Tian D, Li C, Zhao W, Jing HC, Chen M, Zou D, Hao L, Zhao L, Wang J, Li Y, Song T, Zheng Y, Chen R, Zhao Y, He S, Zou D, Mehmood F, Ali S, Ali A, Saleem S, Hussain I, Abbasi AA, Ma L, Zou D, Zou D, Jiang S, Zhang Z, Jiang S, Zhao W, Xiao J, Bao Y, Zhang Z, Zuo Z, Ren J, Zhang X, Xiao Y, Li X, Zhang X, Xiao Y, Li X, Tu Y, Xue Y, Wu W, Ji P, Zhao F, Meng X, Chen M, Peng D, Xue Y, Luo H, Gao F, Zhang X, Xiao Y, Li X, Ning W, Xue Y, Lin S, Xue Y, Liu T, Guo AY, Yuan H, Zhang YE, Tan X, Xue Y, Zhang W, Xue Y, Xie Y, Ren J, Wang C, Xue Y, Liu CJ, Guo AY, Yang DC, Tian F, Gao G, Tang D, Xue Y, Yao L, Xue Y, Cui Q, An NA, Li CY, Luo X, Ren J, Zhang X, Xiao Y, Li X. Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021. Nucleic Acids Res 2021; 49:D18-D28. [PMID: 33175170 PMCID: PMC7779035 DOI: 10.1093/nar/gkaa1022] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Received: 09/14/2020] [Revised: 10/13/2020] [Accepted: 10/16/2020] [Indexed: 12/20/2022]
Abstract
The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a suite of database resources to support worldwide research activities in both academia and industry. With the explosive growth of multi-omics data, CNCB-NGDC is continually expanding, updating and enriching its core database resources through big data deposition, integration and translation. In the past year, considerable efforts have been devoted to 2019nCoVR, a newly established resource providing a global landscape of SARS-CoV-2 genomic sequences, variants, and haplotypes, as well as Aging Atlas, BrainBase, GTDB (Glycosyltransferases Database), LncExpDB, and TransCirc (Translation potential for circular RNAs). Meanwhile, a series of resources have been updated and improved, including BioProject, BioSample, GWH (Genome Warehouse), GVM (Genome Variation Map), GEN (Gene Expression Nebulas) as well as several biodiversity and plant resources. Particularly, BIG Search, a scalable, one-stop, cross-database search engine, has been significantly updated by providing easy access to a large number of internal and external biological resources from CNCB-NGDC, our partners, EBI and NCBI. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.
33
Huang Y, Wang J, Zhao Y, Wang H, Liu T, Li Y, Cui T, Li W, Feng Y, Luo J, Gong J, Ning L, Zhang Y, Wang D, Zhang Y. cncRNAdb: a manually curated resource of experimentally supported RNAs with both protein-coding and noncoding function. Nucleic Acids Res 2021; 49:D65-D70. [PMID: 33010163 PMCID: PMC7778915 DOI: 10.1093/nar/gkaa791] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 07/29/2020] [Revised: 08/30/2020] [Accepted: 09/11/2020] [Indexed: 12/14/2022]
Abstract
RNA endowed with both protein-coding and noncoding functions is referred to as 'dual-function RNA', 'binary functional RNA (bifunctional RNA)' or 'cncRNA (coding and noncoding RNA)'. Recently, an increasing number of cncRNAs have been identified, including both translated ncRNAs (ncRNAs with coding functions) and untranslated mRNAs (mRNAs with noncoding functions). However, an appropriate database for storing and organizing cncRNAs is still lacking. Here, we developed cncRNAdb, a manually curated database of experimentally supported cncRNAs, which aims to provide a resource for efficient manipulation, browsing and analysis of cncRNAs. The current version of cncRNAdb documents about 2,600 manually curated entries of cncRNA functions with experimental evidence, involving more than 2,000 RNAs (including over 1,300 translated ncRNAs and over 600 untranslated mRNAs) across over 20 species. In summary, we believe that cncRNAdb will help elucidate the functions and mechanisms of cncRNAs and support the development of new prediction methods. The database is available at http://www.rna-society.org/cncrnadb/.
MESH Headings
- 3' Untranslated Regions
- 5' Untranslated Regions
- Animals
- Databases, Nucleic Acid/organization & administration
- Drosophila melanogaster/genetics
- Humans
- Mice
- MicroRNAs/classification
- MicroRNAs/genetics
- Pan troglodytes/genetics
- RNA, Circular/classification
- RNA, Circular/genetics
- RNA, Long Noncoding/classification
- RNA, Long Noncoding/genetics
- RNA, Messenger/classification
- RNA, Messenger/genetics
- RNA, Ribosomal/classification
- RNA, Ribosomal/genetics
- RNA, Small Interfering/classification
- RNA, Small Interfering/genetics
- RNA, Transfer/classification
- RNA, Transfer/genetics
- Software
- Zebrafish/genetics
Affiliation(s)
- Yan Huang
- Shunde Hospital, Southern Medical University (The First People's Hospital of Shunde Foshan), Foshan 528308, China
- Jing Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yue Zhao
- School of Basic Medical Sciences & Forensic Medicine, Hangzhou Medical College, Hangzhou 310053, China
- Huafeng Wang
- Shunde Hospital, Southern Medical University (The First People's Hospital of Shunde Foshan), Foshan 528308, China
- Tianyuan Liu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yuhe Li
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Tianyu Cui
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Weiyi Li
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yige Feng
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Jiaxin Luo
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Jiaqi Gong
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Lin Ning
- Dermatology Hospital, Southern Medical University, Guangzhou 510091, China
- Yong Zhang
- Shunde Hospital, Southern Medical University (The First People's Hospital of Shunde Foshan), Foshan 528308, China
- Dong Wang
- Shunde Hospital, Southern Medical University (The First People's Hospital of Shunde Foshan), Foshan 528308, China
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Dermatology Hospital, Southern Medical University, Guangzhou 510091, China
- Yang Zhang
- Shunde Hospital, Southern Medical University (The First People's Hospital of Shunde Foshan), Foshan 528308, China
34
Dowling D, Schmitz JF, Bornberg-Bauer E. Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage. Genome Biol Evol 2020; 12:2183-2195. [PMID: 33210146 PMCID: PMC7674706 DOI: 10.1093/gbe/evaa194] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Accepted: 09/12/2020] [Indexed: 12/12/2022]
Abstract
In addition to known genes, much of the human genome is transcribed into RNA. Chance formation of novel open reading frames (ORFs) can lead to the translation of myriad new proteins. Some of these ORFs may yield advantageous adaptive de novo proteins. However, widespread translation of noncoding DNA can also produce hazardous protein molecules, which can misfold and/or form toxic aggregates. The dynamics of how de novo proteins emerge from potentially toxic raw materials and what influences their long-term survival are unknown. Here, using transcriptomic data from human and five other primates, we generate a set of transcribed human ORFs at six conservation levels to investigate which properties influence the early emergence and long-term retention of these expressed ORFs. As these taxa diverged from each other relatively recently, we present a fine-scale view of the evolution of novel sequences over recent evolutionary time. We find that novel human-restricted ORFs are preferentially located on GC-rich, gene-dense chromosomes, suggesting their retention is linked to pre-existing genes. Sequence properties such as intrinsic structural disorder and aggregation propensity, which have been proposed to play a role in the survival of de novo genes, remain unchanged over time. Even very young sequences code for proteins with low aggregation propensities, suggesting that genomic regions with many novel transcribed ORFs are concomitantly less likely to produce ORFs which code for harmful toxic proteins. Our data indicate that the survival of these novel ORFs is largely stochastic rather than shaped by selection.
Affiliation(s)
- Daniel Dowling
- Institute for Evolution and Biodiversity, University of Münster, Germany
- Jonathan F Schmitz
- Institute for Evolution and Biodiversity, University of Münster, Germany
35
Chen Y, Li D, Fan W, Zheng X, Zhou Y, Ye H, Liang X, Du W, Zhou Y, Wang K. PsORF: a database of small ORFs in plants. Plant Biotechnol J 2020; 18:2158-2160. [PMID: 32333496 PMCID: PMC7589237 DOI: 10.1111/pbi.13389] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/13/2019] [Revised: 04/14/2020] [Accepted: 04/18/2020] [Indexed: 05/15/2023]
Affiliation(s)
- Yanjun Chen
- College of Life Sciences, Wuhan University, Wuhan, China
- Danyang Li
- College of Life Sciences, Wuhan University, Wuhan, China
- Weiliang Fan
- College of Life Sciences, Wuhan University, Wuhan, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, China
- Yifan Zhou
- College of Life Sciences, Wuhan University, Wuhan, China
- Hanzhe Ye
- College of Life Sciences, Wuhan University, Wuhan, China
- Wei Du
- College of Life Sciences, Wuhan University, Wuhan, China
- Yu Zhou
- College of Life Sciences, Wuhan University, Wuhan, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, China
- Kun Wang
- College of Life Sciences, Wuhan University, Wuhan, China
36
smORFunction: a tool for predicting functions of small open reading frames and microproteins. BMC Bioinformatics 2020; 21:455. [PMID: 33054771 PMCID: PMC7559452 DOI: 10.1186/s12859-020-03805-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/22/2020] [Accepted: 10/08/2020] [Indexed: 12/14/2022]
Abstract
Background: A small open reading frame (smORF) is an open reading frame shorter than 100 codons. Microproteins translated from smORFs have been found to participate in a variety of biological processes, such as muscle formation and contraction, cell proliferation, and immune activation. Although previous studies have collected and annotated a large number of smORFs, the functions of the vast majority of smORFs are still unknown. It is thus increasingly important to develop computational methods to annotate the functions of these smORFs. Results: In this study, we collected 617,462 unique smORFs from three studies. The expression of smORF RNAs was estimated using reannotated microarray probes. Using a speed-optimized correlation algorithm, the functions of smORFs were predicted from their correlated genes with known functional annotations. When applied to 5 microproteins with known functions from the literature, our method successfully predicted their functions. Further validation against the UniProt database showed that at least one function was predicted for 202 of 270 microproteins. Conclusions: We developed a method, smORFunction, to provide function predictions of smORFs/microproteins in at most 265 models generated from 173 datasets, including 48 tissues/cells and 82 diseases (and normal). The tool is available at https://www.cuilab.cn/smorfunction.
37
Leblanc S, Brunet MA. Modelling of pathogen-host systems using deeper ORF annotations and transcriptomics to inform proteomics analyses. Comput Struct Biotechnol J 2020; 18:2836-2850. [PMID: 33133425 PMCID: PMC7585943 DOI: 10.1016/j.csbj.2020.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/29/2020] [Revised: 10/07/2020] [Accepted: 10/08/2020] [Indexed: 01/08/2023]
Abstract
The Zika virus is a flavivirus that can cause fulminant outbreaks and lead to Guillain-Barré syndrome, microcephaly and fetal demise. Like other flaviviruses, the Zika virus is transmitted by mosquitoes and provokes neurological disorders. Despite its risk to public health, neither an antiviral nor a vaccine is currently available. In recent years, several studies have set out to identify human host proteins interacting with Zika viral proteins to better understand its pathogenicity. Yet these studies used standard human protein sequence databases. Such databases rely on genome annotations, which enforce a minimal open reading frame (ORF) length criterion. An ever-increasing number of studies have demonstrated the shortcomings of such annotation, which overlooks thousands of functional ORFs. Here we show that the use of a customized database including currently non-annotated proteins led to the identification of 4 alternative proteins as interactors of the viral capsid and NS4A proteins. Furthermore, 12 alternative proteins were identified in the proteome profiling of Zika-infected monocytes, one of which was significantly up-regulated. This study presents a computational framework for the re-analysis of proteomics datasets to better investigate virus-host protein interplay upon infection with the Zika virus.
Key Words
- AP-MS, affinity-purification mass spectrometry
- Alternative ORFs
- DEP, differentially expressed proteins
- FDR, false discovery rate
- FPKM, fragments per kilobase of exon model per million reads mapped
- Flavivirus
- HCIP, highly confident interacting proteins
- HCMV, human cytomegalovirus
- LFQ, label free quantification
- MS, mass spectrometry
- ORF, open reading frame
- PSM, peptide spectrum match
- Protein network
- Proteogenomics
- Proteome profiling
- ZIKV, Zika virus
- Zika
- altProt, alternative protein
- ncRNA, non-coding RNA
- sORF, small open reading frame
Affiliation(s)
- Sebastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
- Marie A. Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
38
Sendino M, Omaetxebarria MJ, Prieto G, Rodriguez JA. Using a Simple Cellular Assay to Map NES Motifs in Cancer-Related Proteins, Gain Insight into CRM1-Mediated NES Export, and Search for NES-Harboring Micropeptides. Int J Mol Sci 2020; 21:E6341. [PMID: 32882917 PMCID: PMC7503480 DOI: 10.3390/ijms21176341] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/10/2020] [Revised: 08/24/2020] [Accepted: 08/26/2020] [Indexed: 12/26/2022]
Abstract
The nuclear export receptor CRM1 (XPO1) recognizes and binds specific sequence motifs termed nuclear export signals (NESs) in cargo proteins. About 200 NES motifs have been identified, but over a thousand human proteins are potential CRM1 cargos, and most of their NESs remain to be identified. On the other hand, the interaction of NES peptides with the "NES-binding groove" of CRM1 was studied in detail using structural and biochemical analyses, but a better understanding of CRM1 function requires further investigation of how the results from these in vitro studies translate into actual NES export in a cellular context. Here we show that a simple cellular assay, based on a recently described reporter (SRVB/A), can be applied to identify novel potential NES motifs, and to obtain relevant information on different aspects of CRM1-mediated NES export. Using cellular assays, we first map 19 new sequence motifs with nuclear export activity in 14 cancer-related proteins that are potential CRM1 cargos. Next, we investigate the effect of mutations in individual NES-binding groove residues, providing further insight into CRM1-mediated NES export. Finally, we extend the search for CRM1-dependent NESs to a recently uncovered, but potentially vast, set of small proteins called micropeptides. By doing so, we report the first NES-harboring human micropeptides.
Affiliation(s)
- Maria Sendino
- Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain
- Miren Josu Omaetxebarria
- Department of Biochemistry and Molecular Biology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain
- Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
- Jose Antonio Rodriguez
- Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country (UPV/EHU), 48940 Leioa, Spain
39
Choi SW, Kim HW, Nam JW. The small peptide world in long noncoding RNAs. Brief Bioinform 2020; 20:1853-1864. [PMID: 30010717 PMCID: PMC6917221 DOI: 10.1093/bib/bby055] [Citation(s) in RCA: 200] [Impact Index Per Article: 40.0] [Received: 04/09/2018] [Revised: 05/08/2018] [Indexed: 02/07/2023]
Abstract
Long noncoding RNAs (lncRNAs) are a group of transcripts that are longer than 200 nucleotides (nt) without coding potential. Over the past decade, tens of thousands of novel lncRNAs have been annotated in animal and plant genomes because of advanced high-throughput RNA sequencing technologies and with the aid of coding transcript classifiers. Further, a considerable number of reports have revealed the existence of stable, functional small peptides (also known as micropeptides), translated from lncRNAs. In this review, we discuss the methods of lncRNA classification, the investigations regarding their coding potential and the functional significance of the peptides they encode.
Affiliation(s)
- Seo-Won Choi
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
- Hyun-Woo Kim
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
- Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
40
Brunet MA, Brunelle M, Lucier JF, Delcourt V, Levesque M, Grenier F, Samandi S, Leblanc S, Aguilar JD, Dufour P, Jacques JF, Fournier I, Ouangraoua A, Scott MS, Boisvert FM, Roucou X. OpenProt: a more comprehensive guide to explore eukaryotic coding potential and proteomes. Nucleic Acids Res 2020; 47:D403-D410. [PMID: 30299502 PMCID: PMC6323990 DOI: 10.1093/nar/gky936] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 07/19/2018] [Accepted: 10/04/2018] [Indexed: 01/06/2023]
Abstract
Advances in proteomics and sequencing have highlighted many non-annotated open reading frames (ORFs) in eukaryotic genomes. Genome annotations, cornerstones of today's research, mostly rely on protein prior knowledge and on ab initio prediction algorithms. Such algorithms notably enforce an arbitrary criterion of one coding sequence (CDS) per transcript, leading to a substantial underestimation of the coding potential of eukaryotes. Here, we present OpenProt, the first database fully endorsing a polycistronic model of eukaryotic genomes to date. OpenProt contains all possible ORFs longer than 30 codons across 10 species, and cumulates supporting evidence such as protein conservation, translation and expression. OpenProt annotates all known proteins (RefProts), novel predicted isoforms (Isoforms) and novel predicted proteins from alternative ORFs (AltProts). It incorporates cutting-edge algorithms to evaluate protein orthology and re-interrogate publicly available ribosome profiling and mass spectrometry datasets, supporting the annotation of thousands of predicted ORFs. The constantly growing database currently cumulates evidence from 87 ribosome profiling and 114 mass spectrometry studies from several species, tissues and cell lines. All data is freely available and downloadable from a web platform (www.openprot.org) supporting a genome browser and advanced queries for each species. Thus, OpenProt enables a more comprehensive landscape of eukaryotic genomes’ coding potential.
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Mylène Brunelle
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Jean-François Lucier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Vivian Delcourt
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Maxime Levesque
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Frédéric Grenier
- Center for Computational Science, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Biology Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Sondos Samandi
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Sébastien Leblanc
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-David Aguilar
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Pascal Dufour
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-Francois Jacques
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
- Isabelle Fournier
- INSERM U1192, Laboratoire Protéomique, Réponse Inflammatoire & Spectrométrie de Masse (PRISM), Université de Lille, F-59000 Lille, France
- Aida Ouangraoua
- Informatics Department, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Michelle S Scott
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Xavier Roucou
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France
41
Kiniry SJ, O'Connor PBF, Michel AM, Baranov PV. Trips-Viz: a transcriptome browser for exploring Ribo-Seq data. Nucleic Acids Res 2020; 47:D847-D852. [PMID: 30239879 PMCID: PMC6324076 DOI: 10.1093/nar/gky842] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 07/16/2018] [Accepted: 09/12/2018] [Indexed: 01/05/2023]
Abstract
Ribosome profiling (Ribo-Seq) is a technique that allows for the isolation and sequencing of mRNA fragments protected from nuclease digestion by actively translating ribosomes. Mapping these ribosome footprints to a genome or transcriptome generates quantitative information on translated regions. To provide access to publicly available ribosome profiling data in the context of transcriptomes we developed Trips-Viz (transcriptome-wide information on protein synthesis-visualized). Trips-Viz provides a large range of graphical tools for exploring global properties of translatomes and of individual transcripts. It enables analysis of aligned footprints to evaluate dataset quality, differential gene expression detection, visual identification of upstream ORFs and alternative proteoforms. Trips-Viz is available at https://trips.ucc.ie.
Affiliation(s)
- Stephen J Kiniry
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Audrey M Michel
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
42
Brunet MA, Leblanc S, Roucou X. Reconsidering proteomic diversity with functional investigation of small ORFs and alternative ORFs. Exp Cell Res 2020; 393:112057. [PMID: 32387289 DOI: 10.1016/j.yexcr.2020.112057] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/19/2019] [Revised: 04/21/2020] [Accepted: 05/02/2020] [Indexed: 12/13/2022]
Abstract
The discovery of functional yet non-annotated open reading frames (ORFs) throughout the genome of several species presents an unprecedented challenge in current genome annotation. These novel ORFs are shorter than annotated ones and many can be found on the same RNA, in opposition to current assumptions in annotation methodologies. Whilst the literature lacks consensus, these novel ORFs are commonly referred to as small ORFs (sORFs) or alternative ORFs (alt-ORFs). Unannotated ORFs represent an overlooked layer of complexity in the coding potential of genomes and are transforming our current vision of the nature of coding genes. In this review, we outline what constitutes a sORF or an alt-ORF and emphasize differences between both nomenclatures. We then describe complementary large-scale methods to accurately discover novel ORFs as well as yield functional insights on the novel proteins they encode. While serendipitous discoveries highlighted the functional importance of some novel ORFs, omics methods facilitate and improve their characterization to better understand physiological and pathological pathways. Functional annotation of sORFs, alt-ORFs and their corresponding microproteins will likely help fundamental and clinical research.
Affiliation(s)
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
- Sebastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Canada
43
Salamini-Montemurri M, Lamas-Maceiras M, Barreiro-Alonso A, Vizoso-Vázquez Á, Rodríguez-Belmonte E, Quindós-Varela M, Cerdán ME. The Challenges and Opportunities of LncRNAs in Ovarian Cancer Research and Clinical Use. Cancers (Basel) 2020; 12:E1020. [PMID: 32326249 PMCID: PMC7225988 DOI: 10.3390/cancers12041020] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 03/26/2020] [Revised: 04/15/2020] [Accepted: 04/17/2020] [Indexed: 12/24/2022]
Abstract
Ovarian cancer is one of the most lethal gynecological malignancies worldwide because it tends to be detected late, when the disease has already spread, and prognosis is poor. In this review we aim to highlight the importance of long non-coding RNAs (lncRNAs) in diagnosis, prognosis and treatment choice, to make progress towards increasingly personalized medicine in this malignancy. We review the effects of lncRNAs associated with ovarian cancer in the context of cancer hallmarks. We also discuss the molecular mechanisms by which lncRNAs become involved in cellular physiology; the onset, development and progression of ovarian cancer; and lncRNAs' regulatory mechanisms at the transcriptional, post-transcriptional and post-translational stages of gene expression. Finally, we compile a series of online resources useful for the study of lncRNAs, especially in the context of ovarian cancer. Future work required in the field is also discussed along with some concluding remarks.
Affiliation(s)
- Martín Salamini-Montemurri
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain
- Mónica Lamas-Maceiras
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain
- Aida Barreiro-Alonso
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain
- Ángel Vizoso-Vázquez
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain
- Esther Rodríguez-Belmonte
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain
- María Quindós-Varela
- Translational Cancer Research Group, Instituto de Investigación Biomédica de A Coruña (INIBIC), Carretera del Pasaje s/n, 15006 A Coruña, Spain
- María Esperanza Cerdán
- EXPRELA Group, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía, Facultade de Ciencias, INIBIC-Universidade da Coruña, Campus de A Coruña, 15071 A Coruña, Spain
44
Heames B, Schmitz J, Bornberg-Bauer E. A Continuum of Evolving De Novo Genes Drives Protein-Coding Novelty in Drosophila. J Mol Evol 2020; 88:382-398. [PMID: 32253450 PMCID: PMC7162840 DOI: 10.1007/s00239-020-09939-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 08/14/2019] [Accepted: 03/13/2020] [Indexed: 12/13/2022]
Abstract
Orphan genes, lacking detectable homologs in outgroup species, typically represent 10-30% of eukaryotic genomes. Efforts to find the source of these young genes indicate that de novo emergence from non-coding DNA may in part explain their prevalence. Here, we investigate the roots of orphan gene emergence in the Drosophila genus. Across the annotated proteomes of twelve species, we find 6297 orphan genes within 4953 taxon-specific clusters of orthologs. By inferring the ancestral DNA as non-coding for between 550 and 2467 (8.7-39.2%) of these genes, we describe for the first time how de novo emergence contributes to the abundance of clade-specific Drosophila genes. In support of them having functional roles, we show that de novo genes have robust expression and translational support. However, the distinct nucleotide sequences of de novo genes, which have characteristics intermediate between intergenic regions and conserved genes, reflect their recent birth from non-coding DNA. We find that de novo genes encode more disordered proteins than both older genes and intergenic regions. Together, our results suggest that gene emergence from non-coding DNA provides an abundant source of material for the evolution of new proteins. Following gene birth, gradual evolution over large evolutionary timescales moulds sequence properties towards those of conserved genes, resulting in a continuum of properties whose starting points depend on the nucleotide sequences of an initial pool of novel genes.
Affiliation(s)
- Brennen Heames
- Institute for Evolution and Biodiversity, 48149, Münster, Germany
- Jonathan Schmitz
- Institute for Evolution and Biodiversity, 48149, Münster, Germany
45
Mitochondrial peptide BRAWNIN is essential for vertebrate respiratory complex III assembly. Nat Commun 2020; 11:1312. [PMID: 32161263 PMCID: PMC7066179 DOI: 10.1038/s41467-020-14999-2] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 07/19/2019] [Accepted: 02/14/2020] [Indexed: 11/08/2022]
Abstract
The emergence of small open reading frame (sORF)-encoded peptides (SEPs) is rapidly expanding the known proteome at the lower end of the size distribution. Here, we show that the mitochondrial proteome, particularly the respiratory chain, is enriched for small proteins. Using a prediction and validation pipeline for SEPs, we report the discovery of 16 endogenous nuclear encoded, mitochondrial-localized SEPs (mito-SEPs). Through functional prediction, proteomics, metabolomics and metabolic flux modeling, we demonstrate that BRAWNIN, a 71 a.a. peptide encoded by C12orf73, is essential for respiratory chain complex III (CIII) assembly. In human cells, BRAWNIN is induced by the energy-sensing AMPK pathway, and its depletion impairs mitochondrial ATP production. In zebrafish, Brawnin deletion causes complete CIII loss, resulting in severe growth retardation, lactic acidosis and early death. Our findings demonstrate that BRAWNIN is essential for vertebrate oxidative phosphorylation. We propose that mito-SEPs are an untapped resource for essential regulators of oxidative metabolism.
46
Fang E, Wang X, Wang J, Hu A, Song H, Yang F, Li D, Xiao W, Chen Y, Guo Y, Liu Y, Li H, Huang K, Zheng L, Tong Q. Therapeutic targeting of YY1/MZF1 axis by MZF1-uPEP inhibits aerobic glycolysis and neuroblastoma progression. Theranostics 2020; 10:1555-1571. [PMID: 32042322 PMCID: PMC6993229 DOI: 10.7150/thno.37383] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/07/2019] [Accepted: 10/09/2019] [Indexed: 12/13/2022]
Abstract
As a hallmark of metabolic reprogramming, aerobic glycolysis contributes to tumorigenesis and aggressiveness. However, the mechanisms and therapeutic strategies regulating aerobic glycolysis in neuroblastoma (NB), one of the leading causes of cancer-related death in childhood, remain elusive. Methods: Transcriptional regulators and their downstream glycolytic genes were identified by a comprehensive screening of publicly available datasets. Dual-luciferase, chromatin immunoprecipitation, real-time quantitative RT-PCR, western blot, gene over-expression or silencing, co-immunoprecipitation, mass spectrometry, peptide pull-down assay, sucrose gradient sedimentation, seahorse extracellular flux, MTT colorimetric, soft agar, matrigel invasion, and nude mouse assays were undertaken to explore the biological effects and underlying mechanisms of transcriptional regulators in NB cells. Survival analysis was performed using the log-rank test and Cox regression. Results: Transcription factor myeloid zinc finger 1 (MZF1) was identified as an independent prognostic factor (hazard ratio = 2.330, 95% confidence interval = 1.021 to 3.317) and facilitated glycolysis by increasing the expression of hexokinase 2 (HK2) and phosphoglycerate kinase 1 (PGK1). Meanwhile, a 21-amino-acid peptide encoded by an upstream open reading frame of MZF1, termed MZF1-uPEP, bound to the zinc finger domain of Yin Yang 1 (YY1), resulting in repressed YY1 transactivation and decreased transcription of MZF1 and its downstream genes HK2 and PGK1. Administration of a cell-penetrating MZF1-uPEP or of lentivirus over-expressing MZF1-uPEP inhibited aerobic glycolysis, tumorigenesis, and aggressiveness of NB cells. In clinical NB cases, low expression of MZF1-uPEP or high expression of MZF1, YY1, HK2, or PGK1 was associated with poor patient survival. Conclusions: These results indicate that therapeutic targeting of the YY1/MZF1 axis by MZF1-uPEP inhibits aerobic glycolysis and NB progression.
47
Martinez TF, Chu Q, Donaldson C, Tan D, Shokhirev MN, Saghatelian A. Accurate annotation of human protein-coding small open reading frames. Nat Chem Biol 2019; 16:458-468. [PMID: 31819274 PMCID: PMC7085969 DOI: 10.1038/s41589-019-0425-0] [Citation(s) in RCA: 141] [Impact Index Per Article: 23.5] [Received: 02/12/2019] [Accepted: 11/01/2019] [Indexed: 12/13/2022]
Abstract
Functional protein-coding small open reading frames (smORFs) are emerging as an important class of genes. However, the number of translated smORFs in the human genome is unclear because proteogenomic methods are not sensitive enough, and, as we show, Ribo-Seq strategies require additional measures to ensure comprehensive and accurate smORF annotation. Here, we integrate de novo transcriptome assembly and Ribo-Seq into an improved workflow that overcomes obstacles with previous methods to more confidently annotate thousands of smORFs. Evolutionary conservation analyses suggest that hundreds of smORF-encoded microproteins are likely functional. Additionally, many smORFs are regulated during fundamental biological processes, such as cell stress. Peptides derived from smORFs are also detectable on human leukocyte antigen complexes, revealing smORFs as a source of antigens. Thus, by including additional validation into our smORF annotation workflow, we accurately identify thousands of unannotated translated smORFs that will provide a rich pool of unexplored, functional human genes.
Affiliation(s)
- Thomas F Martinez
- Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Qian Chu
- Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Cynthia Donaldson
- Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Dan Tan
- Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
- Maxim N Shokhirev
- Razavi Newman Integrative Genomics Bioinformatics Core, Salk Institute for Biological Studies, La Jolla, CA, USA
- Alan Saghatelian
- Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA
48
Zhu M, Gribskov M. MiPepid: MicroPeptide identification tool using machine learning. BMC Bioinformatics 2019; 20:559. [PMID: 31703551 PMCID: PMC6842143 DOI: 10.1186/s12859-019-3033-9] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 05/10/2019] [Accepted: 08/16/2019] [Indexed: 12/13/2022]
Abstract
Background: Micropeptides are small proteins with a length of ≤ 100 amino acids. Short open reading frames that could produce micropeptides were traditionally ignored due to technical difficulties, as few small peptides had been experimentally confirmed. In the past decade, a growing number of micropeptides have been shown to play significant roles in vital biological activities. Despite the increased amount of data, we still lack bioinformatics tools for specifically identifying micropeptides from DNA sequences. Indeed, most existing tools for classifying coding and noncoding ORFs were built on datasets in which "normal-sized" proteins were considered to be positives and short ORFs were generally considered to be noncoding. Since the functional and biophysical constraints on small peptides are likely to be different from those on "normal" proteins, methods for predicting short translated ORFs must be trained independently from those for longer proteins. Results: In this study, we have developed MiPepid, a machine-learning tool specifically for the identification of micropeptides. We trained MiPepid using carefully cleaned data from existing databases and used logistic regression with 4-mer features. With only the sequence information of an ORF, MiPepid is able to predict whether it encodes a micropeptide with 96% accuracy on a blind dataset of high-confidence micropeptides, and to correctly classify newly discovered micropeptides not included in either the training or the blind test data. Compared with state-of-the-art coding potential prediction methods, MiPepid performs exceptionally well, as other methods incorrectly classify most bona fide micropeptides as noncoding. MiPepid is alignment-free and runs sufficiently fast for genome-scale analyses. It is easy to use and is available at https://github.com/MindAI/MiPepid. Conclusions: MiPepid was developed to specifically predict micropeptides, a category of proteins with increasing significance, from DNA sequences. It shows evident advantages over existing coding potential prediction methods on micropeptide identification. It is ready to use and runs fast.
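The abstract summarizes MiPepid's classifier as logistic regression over 4-mer features of an ORF sequence. The following is a minimal sketch of that general idea, not MiPepid's actual code (which is distributed at the GitHub URL above). It assumes overlapping 4-mers over A/C/G/T normalized to relative frequencies, and fits the weights by plain batch gradient descent. The function names, learning rate, epoch count, and toy sequences are hypothetical choices made for illustration only.

```python
import math
from itertools import product

K = 4  # k-mer length named in the MiPepid abstract
# All 4**4 = 256 DNA 4-mers in a fixed lexicographic order (hypothetical feature layout).
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_features(orf: str) -> list[float]:
    """Relative frequency of each overlapping 4-mer in an ORF sequence.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = orf.upper()
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def coding_probability(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic-regression score: sigmoid(w . x + b)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = coding_probability(x, w, b) - label  # dLoss/dz for the log-loss
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy example: one hypothetical "coding" ORF and one hypothetical "noncoding" ORF.
pos = kmer_features("ATG" + "A" * 30 + "TAA")
neg = kmer_features("ATG" + "C" * 30 + "TAG")
w, b = train_logistic([pos, neg], [1, 0])
print(coding_probability(pos, w, b) > coding_probability(neg, w, b))  # prints True
```

A real classifier of this kind would be trained on many curated positive and negative ORFs; the hand-written loop here only keeps the sketch dependency-free.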
Affiliation(s)
- Mengmeng Zhu
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA; Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Michael Gribskov
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
49
Wang J, Zhu S, Meng N, He Y, Lu R, Yan GR. ncRNA-Encoded Peptides or Proteins and Cancer. Mol Ther 2019; 27:1718-1725. [PMID: 31526596 DOI: 10.1016/j.ymthe.2019.09.001] [Citation(s) in RCA: 237] [Impact Index Per Article: 39.5] [Received: 03/19/2019] [Revised: 08/26/2019] [Accepted: 09/01/2019] [Indexed: 12/31/2022]
Abstract
Non-coding RNAs (ncRNAs) are unique RNA transcripts that have been widely identified in the eukaryotic genome and have been shown to play key roles in the development of many cancers. However, the rapid development of genome-wide translation profiling and ribosome profiling has revealed that a small number of small open reading frames (sORFs) within ncRNAs actually have peptide- or protein-coding potential. The peptides or proteins encoded by ncRNA (HOXB-AS3, encoded by long ncRNA [lncRNA]; FBXW7-185aa, PINT-87aa, and SHPRH-146aa, encoded by circular RNA [circRNA]; and miPEP-200a and miPEP-200b, encoded by primary miRNAs) have been shown to be critical players in cancer development and progression, through effects upon the regulation of glucose metabolism, the epithelial-to-mesenchymal transition, and the ubiquitination pathway. In this review, we summarize the reported peptides or proteins encoded by ncRNAs in cancer and explore the application of these peptides or proteins in the development of anti-tumor drugs and the identification of relevant therapeutic targets and tumor biomarkers.
Affiliation(s)
- Jizhong Wang
- Biomedicine Research Center, State Key Laboratory of Respiratory Disease, Key Laboratory for Major Obstetric Diseases of Guangdong Province, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou 510150, China
- Song Zhu
- Biomedicine Research Center, State Key Laboratory of Respiratory Disease, Key Laboratory for Major Obstetric Diseases of Guangdong Province, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou 510150, China
- Nan Meng
- Biomedicine Research Center, State Key Laboratory of Respiratory Disease, Key Laboratory for Major Obstetric Diseases of Guangdong Province, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou 510150, China
- Yutian He
- Biomedicine Research Center, State Key Laboratory of Respiratory Disease, Key Laboratory for Major Obstetric Diseases of Guangdong Province, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou 510150, China
- Ruixun Lu
- Key Laboratory of Protein Modification and Degradation, Guangzhou Medical University, Guangzhou 511436, China
- Guang-Rong Yan
- Biomedicine Research Center, State Key Laboratory of Respiratory Disease, Key Laboratory for Major Obstetric Diseases of Guangdong Province, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou 510150, China; Key Laboratory of Protein Modification and Degradation, Guangzhou Medical University, Guangzhou 511436, China
50
Xiao Z, Huang R, Xing X, Chen Y, Deng H, Yang X. De novo annotation and characterization of the translatome with ribosome profiling data. Nucleic Acids Res 2019. [PMID: 29538776 PMCID: PMC6007384 DOI: 10.1093/nar/gky179] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Indexed: 12/11/2022]
Abstract
By capturing and sequencing the RNA fragments protected by translating ribosomes, ribosome profiling provides snapshots of translation at subcodon resolution. The growing need for comprehensive annotation and characterization of context-dependent translatomes calls for an efficient and unbiased method to accurately recover the signal of active translation from ribosome profiling data. Here we present a new method, RiboCode, for this purpose. Tested with simulated and real ribosome profiling data, and validated with cell type-specific QTI-seq and mass spectrometry data, RiboCode exhibits superior efficiency, sensitivity, and accuracy for de novo annotation of the translatome, covering various types of ORFs in previously annotated coding and non-coding regions. As an example, RiboCode was applied to assemble the context-specific translatomes of yeast under normal and stress conditions. Comparisons among these translatomes revealed stress-activated novel upstream and downstream ORFs, some of which are associated with translational dysregulation of the annotated main ORFs under stress conditions.
Affiliation(s)
- Zhengtao Xiao
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Rongyao Huang
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xudong Xing
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China; School of Life Sciences, Tsinghua University, Beijing 100084, China; Joint Graduate Program of Peking-Tsinghua-National Institute of Biological Science, Tsinghua University, Beijing 100084, China
- Yuling Chen
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Haiteng Deng
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xuerui Yang
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China; Center for Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China; School of Life Sciences, Tsinghua University, Beijing 100084, China