1. Froschauer K, Svensson SL, Gelhausen R, Fiore E, Kible P, Klaude A, Kucklick M, Fuchs S, Eggenhofer F, Yang C, Falush D, Engelmann S, Backofen R, Sharma CM. Complementary Ribo-seq approaches map the translatome and provide a small protein census in the foodborne pathogen Campylobacter jejuni. Nat Commun 2025; 16:3078. [PMID: 40159498 PMCID: PMC11955535 DOI: 10.1038/s41467-025-58329-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 03/18/2025] [Indexed: 04/02/2025]
Abstract
In contrast to transcriptome maps, bacterial small protein (≤50-100 aa) coding landscapes, including overlapping genes, are poorly characterized. However, an emerging number of small proteins have crucial roles in bacterial physiology and virulence. Here, we present a Ribo-seq-based high-resolution translatome map for the major foodborne pathogen Campylobacter jejuni. Besides conventional Ribo-seq, we employed translation initiation site (TIS) profiling to map start codons and also developed a translation termination site (TTS) profiling approach, which revealed stop codons not apparent from the reference genome in virulence loci. Our integrated approach combined with independent validation expanded the small proteome by two-fold, including CioY, a new 34 aa component of the CioAB oxidase. Overall, our study generates a high-resolution annotation of the C. jejuni coding landscape, provided in an interactive browser, and showcases a strategy for applying integrated Ribo-seq to other species to enrich our understanding of small proteomes.
Affiliation(s)
- Kathrin Froschauer
  - University of Würzburg, Institute of Molecular Infection Biology, Department of Molecular Infection Biology II, Würzburg, Germany
- Sarah L Svensson
  - University of Würzburg, Institute of Molecular Infection Biology, Department of Molecular Infection Biology II, Würzburg, Germany
  - The Center for Microbes, Development and Health, CAS Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
- Rick Gelhausen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Elisabetta Fiore
  - University of Würzburg, Institute of Molecular Infection Biology, Department of Molecular Infection Biology II, Würzburg, Germany
- Philipp Kible
  - University of Würzburg, Institute of Molecular Infection Biology, Department of Molecular Infection Biology II, Würzburg, Germany
- Alicia Klaude
  - Technische Universität Braunschweig, Institute for Microbiology, Braunschweig, Germany
  - Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Martin Kucklick
  - Technische Universität Braunschweig, Institute for Microbiology, Braunschweig, Germany
  - Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Stephan Fuchs
  - Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Chao Yang
  - The Center for Microbes, Development and Health, CAS Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
- Daniel Falush
  - The Center for Microbes, Development and Health, CAS Key Laboratory of Molecular Virology and Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
- Susanne Engelmann
  - Technische Universität Braunschweig, Institute for Microbiology, Braunschweig, Germany
  - Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Signalling Research Centre CIBSS, University of Freiburg, Freiburg, Germany
- Cynthia M Sharma
  - University of Würzburg, Institute of Molecular Infection Biology, Department of Molecular Infection Biology II, Würzburg, Germany

2. Lim CS, Gibbon AK, Tran Nguyen AT, Chieng GSW, Brown CM. RIBOSS detects novel translational events by combining long- and short-read transcriptome and translatome profiling. Brief Bioinform 2025; 26:bbaf164. [PMID: 40221960 PMCID: PMC11994033 DOI: 10.1093/bib/bbaf164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 03/18/2025] [Accepted: 03/23/2025] [Indexed: 04/15/2025]
Abstract
Ribosome profiling is a high-throughput sequencing technique that captures the positions of translating ribosomes on RNAs. Recent advancements in ribosome profiling include achieving highly phased ribosome footprints for plant translatomes and more recently for bacterial translatomes. This substantially increases the specificity of detecting open reading frames (ORFs) that can be translated, such as small ORFs located upstream and downstream of the annotated ORFs. However, most genomes (e.g. bacterial genomes) lack the annotations for the transcription start and termination sites. This hinders the systematic discovery of novel ORFs in the 'untranslated' regions in ribosome profiling data. Here, we develop a new computational pipeline called RIBOSS to discover noncanonical ORFs and assess their translational potential against annotated ORFs. The RIBOSS Python modules are versatile, and we use them to analyse both prokaryotic and eukaryotic data. We present a resulting list of noncanonical ORFs with high translational potential in Homo sapiens, Arabidopsis thaliana, and Salmonella enterica. We further illustrate RIBOSS utility when studying organisms with incomplete transcriptome annotations. We leverage long-read and short-read data for reference-guided transcriptome assembly and highly phased ribosome profiling data for detecting novel translational events in the assembled transcriptome for S. enterica. In sum, RIBOSS is the first integrated computational pipeline for noncanonical ORF detection and translational potential assessment that incorporates long- and short-read sequencing technologies to investigate translation. RIBOSS is freely available at https://github.com/lcscs12345/riboss.
Affiliation(s)
- Chun Shen Lim
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
  - Genetics Otago, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
- Alexandra K Gibbon
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
  - Genetics Otago, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
- Anh Thu Tran Nguyen
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
  - Genetics Otago, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
- Gabrielle S W Chieng
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
  - Genetics Otago, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
- Chris M Brown
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand
  - Genetics Otago, University of Otago, 710 Cumberland Street, Dunedin North, Dunedin 9016, New Zealand

3. Pereira AB, Marano M, Bathala R, Zaragoza RA, Neira A, Samano A, Owoyemi A, Casola C. Orphan genes are not a distinct biological entity. Bioessays 2025; 47:e2400146. [PMID: 39491810 DOI: 10.1002/bies.202400146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 10/06/2024] [Accepted: 10/11/2024] [Indexed: 11/05/2024]
Abstract
The genome sequencing revolution has revealed that all species possess a large number of unique genes critical for trait variation, adaptation, and evolutionary innovation. One widely used approach to identify such genes consists of detecting protein-coding sequences with no homology in other genomes, termed orphan genes. These genes have been extensively studied, under the assumption that they represent valid proxies for species-specific genes. Here, we critically evaluate taxonomic, phylogenetic, and sequence evolution evidence showing that orphan genes belong to a range of evolutionary ages and thus cannot be assigned to a single lineage. Furthermore, we show that the processes generating orphan genes are substantially more diverse than generally thought and include horizontal gene transfer, transposable element domestication, and overprinting. Thus, orphan genes represent a heterogeneous collection of genes rather than a single biological entity, making them unsuitable as a subject for meaningful investigation of gene evolution and phenotypic innovation.
Affiliation(s)
- Andres Barboza Pereira
  - Interdisciplinary Graduate Program in Genetics & Genomics, Texas A&M University, College Station, Texas, USA
  - Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Matthew Marano
  - Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Ramya Bathala
  - Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas, USA
- Andres Neira
  - School of Pharmacy, Texas A&M University, College Station, Texas, USA
- Alex Samano
  - Department of Biology, Texas A&M University, College Station, Texas, USA
- Adekola Owoyemi
  - Department of Ecology and Conservation Biology, Texas A&M University, College Station, Texas, USA
- Claudio Casola
  - Interdisciplinary Graduate Program in Genetics & Genomics, Texas A&M University, College Station, Texas, USA
  - Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
  - Department of Ecology and Conservation Biology, Texas A&M University, College Station, Texas, USA

4. Vakirlis N, Kupczok A. Large-scale investigation of species-specific orphan genes in the human gut microbiome elucidates their evolutionary origins. Genome Res 2024; 34:888-903. [PMID: 38977308 PMCID: PMC11293555 DOI: 10.1101/gr.278977.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 06/12/2024] [Indexed: 07/10/2024]
Abstract
Species-specific genes, also known as orphans, are ubiquitous across life's domains. In prokaryotes, species-specific orphan genes (SSOGs) are mostly thought to originate in external elements such as viruses followed by horizontal gene transfer, whereas the scenario of native origination, through rapid divergence or de novo, is mostly dismissed. However, quantitative evidence supporting either scenario is lacking. Here, we systematically analyzed genomes from 4644 human gut microbiome species and identified more than 600,000 unique SSOGs, representing an average of 2.6% of a given species' pangenome. These sequences are mostly rare within each species yet show signs of purifying selection. Overall, SSOGs use optimal codons less frequently, and their proteins are more disordered than those of conserved genes (i.e., non-SSOGs). Importantly, across species, the GC content of SSOGs closely matches that of conserved ones. In contrast, the ∼5% of SSOGs that share similarity to known viral sequences have distinct characteristics, including lower GC content. Thus, SSOGs with similarity to viruses differ from the remaining SSOGs, contrasting an external origination scenario for most of them. By examining the orthologous genomic region in closely related species, we show that a small subset of SSOGs likely evolved natively de novo and find that these genes also differ in their properties from the remaining SSOGs. Our results challenge the notion that external elements are the dominant source of prokaryotic genetic novelty and will enable future studies into the biological role and relevance of species-specific genes in the human gut.
Affiliation(s)
- Nikolaos Vakirlis
  - Institute For Fundamental Biomedical Research, B.S.R.C. "Alexander Fleming," Vari 166 72, Greece
  - Institute for General Microbiology, Kiel University, 24118 Kiel, Germany
- Anne Kupczok
  - Bioinformatics Group, Wageningen University, 6700 PB Wageningen, The Netherlands

5. Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
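The frame shifts described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the frame labels (+1 to +3 for the given strand, -1 to -3 for the reverse complement) and the example sequence are assumptions chosen for illustration, and the standard genetic code is assumed throughout.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate a DNA sequence in the three frames of each strand.

    Hypothetical labelling: '+1' is the frame starting at the first base,
    '+2'/'+3' are shifted by one/two bases on the same strand, and
    '-1'..'-3' are the corresponding frames of the reverse complement.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence: ATG GCC TAA reads as Met-Ala-stop in frame +1.
    for label, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(label, protein)
```

The same nine nucleotides give a different amino acid string in each of the six frames, which is the sense in which an alternative frame of an existing gene offers new sequence for evolution to act on.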

6. Thomas KE, Gagniuc PA, Gagniuc E. Moonlighting genes harbor antisense ORFs that encode potential membrane proteins. Sci Rep 2023; 13:12591. [PMID: 37537268 PMCID: PMC10400600 DOI: 10.1038/s41598-023-39869-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 08/01/2023] [Indexed: 08/05/2023]
Abstract
Moonlighting genes encode single polypeptide molecules that perform multiple and often unrelated functions. These genes occur across all domains of life. Their ubiquity and functional diversity raise many questions as to their origins, evolution, and role in the cell cycle. In this study, we present a simple bioinformatics probe that allows us to rank genes by antisense translation potential, and we show that this probe reliably enriches for moonlighting genes across a variety of organisms. We find that moonlighting genes harbor putative antisense open reading frames (ORFs) rich in codons for non-polar amino acids. We also find that moonlighting genes tend to co-locate with genes involved in cell wall, cell membrane, or cell envelope production. On the basis of this and other findings, we offer a model in which we propose that moonlighting gene products are likely to escape the cell through gaps in the cell wall and membrane, at wall/membrane construction sites; and we propose that antisense ORFs produce "membrane-sticky" protein products, effectively binding moonlighting-gene DNA to the cell membrane in porous areas where intensive cell-wall/cell-membrane construction is underway. This leads to high potential for escape of moonlighting proteins to the cell surface. Evolutionary and other implications of these findings are discussed.
Affiliation(s)
- Paul A Gagniuc
  - Faculty of Engineering in Foreign Languages, University Politehnica of Bucharest, Bucharest, Romania
- Elvira Gagniuc
  - Synevovet Laboratory, Bucharest, Romania
  - Faculty of Veterinary Medicine, University of Agronomic Sciences and Veterinary Medicine, Bucharest, Romania

7. Kienzle L, Bettinazzi S, Choquette T, Brunet M, Khorami HH, Jacques JF, Moreau M, Roucou X, Landry CR, Angers A, Breton S. A small protein coded within the mitochondrial canonical gene nd4 regulates mitochondrial bioenergetics. BMC Biol 2023; 21:111. [PMID: 37198654 DOI: 10.1186/s12915-023-01609-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/17/2022] [Accepted: 05/03/2023] [Indexed: 05/19/2023]
Abstract
BACKGROUND: Mitochondria have a central role in cellular functions, aging, and in certain diseases. They possess their own genome, a vestige of their bacterial ancestor. Over the course of evolution, most of the genes of the ancestor have been lost or transferred to the nucleus. In humans, the mtDNA is a very small circular molecule with a functional repertoire limited to only 37 genes. Its extremely compact nature with genes arranged one after the other and separated by short non-coding regions suggests that there is little room for evolutionary novelties. This is radically different from bacterial genomes, which are also circular but much larger, and in which we can find genes inside other genes. These sequences, different from the reference coding sequences, are called alternative open reading frames or altORFs, and they are involved in key biological functions. However, whether altORFs exist in mitochondrial protein-coding genes or elsewhere in the human mitogenome has not been fully addressed.
RESULTS: We found a downstream alternative ATG initiation codon in the +3 reading frame of the human mitochondrial nd4 gene. This newly characterized altORF encodes a 99-amino-acid-long polypeptide, MTALTND4, which is conserved in primates. Our custom antibody, but not the pre-immune serum, was able to immunoprecipitate MTALTND4 from HeLa cell lysates, confirming the existence of an endogenous MTALTND4 peptide. The protein is localized in mitochondria and cytoplasm and is also found in the plasma, and it impacts cell and mitochondrial physiology.
CONCLUSIONS: Many human mitochondrial translated ORFs might have so far gone unnoticed. By ignoring mtaltORFs, we have underestimated the coding potential of the mitogenome. Alternative mitochondrial peptides such as MTALTND4 may offer a new framework for the investigation of mitochondrial functions and diseases.
Affiliation(s)
- Laura Kienzle
  - Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Stefano Bettinazzi
  - Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Thierry Choquette
  - Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Marie Brunet
  - Service de génétique médicale, Département de pédiatrie, Université de Sherbrooke, Sherbrooke, Canada
  - Centre de recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, Canada
- Jean-François Jacques
  - Département de biochimie et génomique fonctionnelle, Université de Sherbrooke, Sherbrooke, Canada
- Mathilde Moreau
  - Département de biochimie et génomique fonctionnelle, Université de Sherbrooke, Sherbrooke, Canada
- Xavier Roucou
  - Centre de recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, Canada
  - Département de biochimie et génomique fonctionnelle, Université de Sherbrooke, Sherbrooke, Canada
- Christian R Landry
  - Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Canada
  - Institut de biologie intégrative et des systèmes, Université Laval, Québec, Canada
  - PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada
  - Centre de recherche sur les données massives, Université Laval, Québec, Canada
  - Département de biologie, Faculté des sciences et de génie, Université Laval, Québec, Canada
- Annie Angers
  - Département de sciences biologiques, Université de Montréal, Montréal, Canada
- Sophie Breton
  - Département de sciences biologiques, Université de Montréal, Montréal, Canada

8. Zhao L, Tabari E, Rong H, Dong X, Xue D, Su Z. Antisense transcription and its roles in adaption to environmental stress in E. coli. bioRxiv 2023:2023.03.23.533988. [PMID: 36993172 PMCID: PMC10055363 DOI: 10.1101/2023.03.23.533988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
It has been reported that a highly varying proportion (1% to 93%) of genes in various prokaryotes have antisense RNA (asRNA) transcription. However, the extent of the pervasiveness of asRNA transcription in the well-studied E. coli K12 strain has thus far been an issue of debate. Furthermore, very little is known about the expression patterns and functions of asRNAs under various conditions. To fill these gaps, we determined the transcriptomes and proteomes of E. coli K12 at multiple time points in five culture conditions using strand-specific RNA-seq, differential RNA-seq, and quantitative mass spectrometry methods. To reduce artifacts of possible transcriptional noise, we identified asRNA using stringent criteria with biological replicate verification and transcription start site (TSS) information included. We identified a total of 660 asRNAs, which were generally short and largely condition-dependently transcribed. We found that the proportions of the genes which had asRNA transcription highly depended on the culture conditions and time points. We classified the transcriptional activities of the genes into six transcriptional modes according to their relative levels of asRNA to mRNA. Many genes changed their transcriptional modes at different time points of the culture conditions, and such transitions can be described in a well-defined manner. Intriguingly, the protein levels and mRNA levels of genes in the sense-only/sense-dominant mode were moderately correlated, but the same was not true for genes in the balanced/antisense-dominant mode, in which asRNAs were at a comparable or higher level to mRNAs. These observations were further validated by western blot on candidate genes, where an increase in asRNA transcription diminished gene expression in one case and enhanced it in another. These results suggest that asRNAs may directly or indirectly regulate translation by forming duplexes with cognate mRNAs. Thus, asRNAs may play an important role in the bacterium's responses to environmental changes during growth and adaption to different environments.
IMPORTANCE: The cis-antisense RNA (asRNA) is an understudied type of RNA molecule in prokaryotes, which is believed to be important in regulating gene expression. Our current understanding of asRNA is constrained by inconsistent reports about its identification and properties. These discrepancies are partially caused by a lack of sufficient samples, biological replicates, and culture conditions. This study aimed to overcome these disadvantages and identified 660 putative asRNAs using integrated information from strand-specific RNA-seq, differential RNA-seq, and mass spectrometry methods. In addition, we explored the relative expression between asRNAs and sense RNAs and investigated asRNA-regulated transcriptional activity changes over different culture conditions and time points. Our work strongly suggests that asRNAs may play a crucial role in the bacterium's responses to environmental changes during growth and adaption to different environments.

9. Graf F, Zehentner B, Fellner L, Scherer S, Neuhaus K. Three Novel Antisense Overlapping Genes in E. coli O157:H7 EDL933. Microbiol Spectr 2023; 11:e0235122. [PMID: 36533921 PMCID: PMC9927249 DOI: 10.1128/spectrum.02351-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 12/03/2022] [Indexed: 12/23/2022]
Abstract
The abundance of long overlapping genes in prokaryotic genomes is likely to be significantly underestimated. To date, only a few examples of such genes are fully established. Using RNA sequencing and ribosome profiling, we found expression of novel overlapping open reading frames in Escherichia coli O157:H7 EDL933 (EHEC). Indeed, the overlapping candidate genes are equipped with typical structural elements required for transcription and translation, i.e., promoters, transcription start sites, as well as terminators, all of which were experimentally verified. Translationally arrested mutants, unable to produce the overlapping encoded protein, were found to have a growth disadvantage when grown competitively against the wild type. Thus, the phenotypes found imply biological functionality of the genes at the level of proteins produced. The addition of 3 more examples of prokaryotic overlapping genes to the currently limited, yet constantly growing pool of such genes emphasizes the underestimated coding capacity of bacterial genomes.
IMPORTANCE: The abundance of long overlapping genes in prokaryotic genomes is likely to be significantly underestimated, since such genes are not allowed in genome annotations. However, ribosome profiling captures mRNA at the moment it serves as a template for protein production. Using this technique and subsequent experiments, we verified 3 novel overlapping genes encoded in antisense of known genes. This adds more examples of prokaryotic overlapping genes to the currently limited, yet constantly growing pool of such genes.
Affiliation(s)
- Franziska Graf
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Barbara Zehentner
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Lea Fellner
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Siegfried Scherer
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Klaus Neuhaus
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany

10. Gelhausen R, Müller T, Svensson SL, Alkhnbashi OS, Sharma CM, Eggenhofer F, Backofen R. RiboReport - benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria. Brief Bioinform 2022; 23:bbab549. [PMID: 35037022 PMCID: PMC8921622 DOI: 10.1093/bib/bbab549] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/25/2021] [Revised: 11/22/2021] [Accepted: 11/29/2021] [Indexed: 11/19/2022]
Abstract
Small proteins encoded by short open reading frames (ORFs) with 50 codons or fewer are emerging as an important class of cellular macromolecules in diverse organisms. However, they often evade detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation in genomic regions previously thought to be non-coding, driving the development of ORF detection tools using Ribo-seq data. However, only a handful of tools have been designed for bacteria, and these have not yet been systematically compared. Here, we aimed to identify tools that use Ribo-seq data to correctly determine the translational status of annotated bacterial ORFs and also discover novel translated regions with high sensitivity. To this end, we generated a large set of annotated ORFs from four diverse bacterial organisms, manually labeled for their translation status based on Ribo-seq data, which are available for future benchmarking studies. This set was used to investigate the predictive performance of seven Ribo-seq-based ORF detection tools (REPARATION_blast, DeepRibo, Ribo-TISH, PRICE, smORFer, ribotricer and SPECtre), as well as IRSOM, which uses coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for ORFs in close proximity to other genes versus stand-alone genes. However, no tool predicted a set of novel, experimentally verified sORFs with high sensitivity. Start codon predictions with smORFer show the value of initiation site profiling data to further improve the sensitivity of ORF prediction tools in bacteria. Overall, we find that bacterial tools perform well for sORF detection, although there is potential for improving their performance, applicability, usability and reproducibility.
Affiliation(s)
- Rick Gelhausen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Teresa Müller
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Sarah L Svensson
  - Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Omer S Alkhnbashi
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Saudi Arabia
  - SDAIA-KFUPM Joint Research Center for Artificial Intelligence (JRC-AI), King Fahd University of Petroleum and Minerals, Saudi Arabia
- Cynthia M Sharma
  - Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, 79104, Freiburg, Germany

11. Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K. Spotlight on alternative frame coding: Two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection. iScience 2022; 25:103844. [PMID: 35198897 PMCID: PMC8850804 DOI: 10.1016/j.isci.2022.103844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 10/14/2021] [Accepted: 01/27/2022] [Indexed: 12/13/2022]
Abstract
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.
Affiliation(s)
- Michaela Kreitmeier
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Zachary Ardern
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Miriam Abele
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Christina Ludwig
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Klaus Neuhaus
- Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
|
12
|
Wichmann S, Scherer S, Ardern Z. Biological factors in the synthetic construction of overlapping genes. BMC Genomics 2021; 22:888. [PMID: 34895142 PMCID: PMC8665328 DOI: 10.1186/s12864-021-08181-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2020] [Accepted: 11/17/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently however they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from other cellular organisms - in this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps. RESULTS: After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids. CONCLUSIONS: Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work however will need to assess important factors not considered such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and opens up exciting possibilities for synthetic biology.
Affiliation(s)
- Stefan Wichmann
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
| | - Siegfried Scherer
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
| | - Zachary Ardern
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
|
13
|
Ardern Z. Small proteins: overcoming size restrictions. Nat Rev Microbiol 2021; 20:65. [PMID: 34848872 DOI: 10.1038/s41579-021-00672-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Zachary Ardern
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
|
14
|
Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021; 39:6430988. [PMID: 34792602 PMCID: PMC8788219 DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023] Open
Abstract
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
Affiliation(s)
- Andrew K Watson
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Philippe Lopez
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Eric Bapteste
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
|
15
|
Mehravar M, Ghaemimanesh F, Poursani EM. Exon and intron sharing in opposite direction-an undocumented phenomenon in human genome-between Pou5f1 and Tcf19 genes. BMC Genomics 2021; 22:718. [PMID: 34610795 PMCID: PMC8493703 DOI: 10.1186/s12864-021-08039-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 09/24/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Overlapping genes share same genomic regions in parallel (sense) or anti-parallel (anti-sense) orientations. These gene pairs seem to occur in all domains of life and are best known from viruses. However, the advantage and biological significance of overlapping genes is still unclear. Expressed sequence tags (ESTs) analysis enabled us to uncover an overlapping gene pair in the human genome. RESULTS: By using in silico analysis of previous experimental documentations, we reveal a new form of overlapping genes in the human genome, in which two genes found on opposite strands (Pou5f1 and Tcf19), share two exons and one intron enclosed, at the same positions, between OCT4B3 and TCF19-D splice variants. CONCLUSIONS: This new form of overlapping gene expands our previous perception of splicing events and may shed more light on the complexity of gene regulation in higher organisms. Additional such genes might be detected by ESTs analysis also of other organisms.
Affiliation(s)
- Majid Mehravar
- Department of Anatomy and Developmental Biology, Development and Stem Cells Program, Biomedicine Discovery Institute, Monash University, Melbourne, Australia
| | - Fatemeh Ghaemimanesh
- Monoclonal Antibody Research Center, Avicenna Research Institute, ACECR, Tehran, Iran
| | - Ensieh M Poursani
- Hematology, Oncology and Stem Cell Transplantation Research Center, Tehran University of Medical Sciences, Tehran, Iran
|
16
|
Yates TB, Feng K, Zhang J, Singan V, Jawdy SS, Ranjan P, Abraham PE, Barry K, Lipzen A, Pan C, Schmutz J, Chen JG, Tuskan GA, Muchero W. The Ancient Salicoid Genome Duplication Event: A Platform for Reconstruction of De Novo Gene Evolution in Populus trichocarpa. Genome Biol Evol 2021; 13:evab198. [PMID: 34469536 PMCID: PMC8445398 DOI: 10.1093/gbe/evab198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 08/22/2021] [Indexed: 12/13/2022] Open
Abstract
Orphan genes are characteristic genomic features that have no detectable homology to genes in any other species and represent an important attribute of genome evolution as sources of novel genetic functions. Here, we identified 445 genes specific to Populus trichocarpa. Of these, we performed deeper reconstruction of 13 orphan genes to provide evidence of de novo gene evolution. Populus and its sister genera Salix are particularly well suited for the study of orphan gene evolution because of the Salicoid whole-genome duplication event which resulted in highly syntenic sister chromosomal segments across the Salicaceae. We leveraged this genomic feature to reconstruct de novo gene evolution from intergenera, interspecies, and intragenomic perspectives by comparing the syntenic regions within the P. trichocarpa reference, then P. deltoides, and finally Salix purpurea. Furthermore, we demonstrated that 86.5% of the putative orphan genes had evidence of transcription. Additionally, we also utilized the Populus genome-wide association mapping panel, a collection of 1,084 undomesticated P. trichocarpa genotypes to further determine putative regulatory networks of orphan genes using expression quantitative trait loci (eQTL) mapping. Functional enrichment of these eQTL subnetworks identified common biological themes associated with orphan genes such as response to stress and defense response. We also identify a putative cis-element for a de novo gene and leverage conserved synteny to describe evolution of a putative transcription factor binding site. Overall, 45% of orphan genes were captured in trans-eQTL networks.
Affiliation(s)
- Timothy B Yates
- Bredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, Tennessee, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Kai Feng
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Jin Zhang
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Vasanth Singan
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Sara S Jawdy
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Priya Ranjan
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Paul E Abraham
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Kerrie Barry
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Anna Lipzen
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Chongle Pan
- School of Computer Science and Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, USA
| | - Jeremy Schmutz
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
| | - Jin-Gui Chen
- Bredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, Tennessee, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Gerald A Tuskan
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
| | - Wellington Muchero
- Bredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, Tennessee, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Center for Bioenergy Innovation, Oak Ridge, Tennessee, USA
|
17
|
Chia JY, Khoo KS, Ling TC, Croft L, Manickam S, Yap YJ, Show PL. Description and detection of excludons as transcriptional regulators in gram-positive, gram-negative and archaeal strains of prokaryotes. Biocatalysis and Agricultural Biotechnology 2021. [DOI: 10.1016/j.bcab.2021.101933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|