1
Pereira AB, Marano M, Bathala R, Zaragoza RA, Neira A, Samano A, Owoyemi A, Casola C. Orphan genes are not a distinct biological entity. Bioessays 2025; 47:e2400146. [PMID: 39491810 DOI: 10.1002/bies.202400146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 10/06/2024] [Accepted: 10/11/2024] [Indexed: 11/05/2024]
Abstract
The genome sequencing revolution has revealed that all species possess a large number of unique genes critical for trait variation, adaptation, and evolutionary innovation. One widely used approach to identify such genes consists of detecting protein-coding sequences with no homology in other genomes, termed orphan genes. These genes have been extensively studied, under the assumption that they represent valid proxies for species-specific genes. Here, we critically evaluate taxonomic, phylogenetic, and sequence evolution evidence showing that orphan genes belong to a range of evolutionary ages and thus cannot be assigned to a single lineage. Furthermore, we show that the processes generating orphan genes are substantially more diverse than generally thought and include horizontal gene transfer, transposable element domestication, and overprinting. Thus, orphan genes represent a heterogeneous collection of genes rather than a single biological entity, making them unsuitable as a subject for meaningful investigation of gene evolution and phenotypic innovation.
Affiliation(s)
- Andres Barboza Pereira
  - Interdisciplinary Graduate Program in Genetics & Genomics, Texas A&M University, College Station, Texas, USA
  - Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Matthew Marano
  - Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
- Ramya Bathala
  - Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas, USA
- Andres Neira
  - School of Pharmacy, Texas A&M University, College Station, Texas, USA
- Alex Samano
  - Department of Biology, Texas A&M University, College Station, Texas, USA
- Adekola Owoyemi
  - Department of Ecology and Conservation Biology, Texas A&M University, College Station, Texas, USA
- Claudio Casola
  - Interdisciplinary Graduate Program in Genetics & Genomics, Texas A&M University, College Station, Texas, USA
  - Interdisciplinary Doctoral Program in Ecology and Evolutionary Biology, Texas A&M University, College Station, Texas, USA
  - Department of Ecology and Conservation Biology, Texas A&M University, College Station, Texas, USA
2
Wehbi S, Wheeler A, Morel B, Manepalli N, Minh BQ, Lauretta DS, Masel J. Order of amino acid recruitment into the genetic code resolved by last universal common ancestor's protein domains. Proc Natl Acad Sci U S A 2024; 121:e2410311121. [PMID: 39665745 DOI: 10.1073/pnas.2410311121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 11/13/2024] [Indexed: 12/13/2024] Open
Abstract
The current "consensus" order in which amino acids were added to the genetic code is based on potentially biased criteria, such as the absence of sulfur-containing amino acids from the Urey-Miller experiment, which lacked sulfur. More broadly, abiotic abundance might not reflect biotic abundance in the organisms in which the genetic code evolved. Here, we instead identify which protein domains date to the last universal common ancestor (LUCA) and then infer the order of recruitment from deviations of their ancestrally reconstructed amino acid frequencies from the still-ancient post-LUCA controls. We find that smaller amino acids were added to the code earlier, with no additional predictive power in the previous consensus order. Metal-binding (cysteine and histidine) and sulfur-containing (cysteine and methionine) amino acids were added to the genetic code much earlier than previously thought. Methionine and histidine were added to the code earlier than expected from their molecular weights and glutamine later. Early methionine availability is compatible with inferred early use of S-adenosylmethionine and early histidine with its purine-like structure and the demand for metal binding. Even more ancient protein sequences, those that had already diversified into multiple distinct copies prior to LUCA, have significantly higher frequencies of aromatic amino acids (tryptophan, tyrosine, phenylalanine, and histidine) and lower frequencies of valine and glutamic acid than single-copy LUCA sequences. If at least some of these sequences predate the current code, then their distinct enrichment patterns provide hints about earlier, alternative genetic codes.
Affiliation(s)
- Sawsan Wehbi
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721
- Andrew Wheeler
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Nandini Manepalli
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721
- Bui Quang Minh
  - School of Computing, Australian National University, Canberra, ACT, Australia
- Dante S Lauretta
  - Lunar and Planetary Laboratory, University of Arizona, Tucson, AZ 85721
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721
3
Shakir S, Boissinot S, Michon T, Lafarge S, Zaidi SS. Beyond movement: expanding functional landscape of luteovirus movement proteins. Trends Plant Sci 2024; 29:1331-1341. [PMID: 39306539 DOI: 10.1016/j.tplants.2024.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 08/27/2024] [Accepted: 09/02/2024] [Indexed: 12/07/2024]
Abstract
Viruses explore the potential multifunctional capacity of the proteins encoded in their compact genome to establish infection. P4 of luteoviruses has emerged as one such multifunctional protein. Expressed from an open reading frame (ORF) nested within coat protein ORF, it displays diverse subcellular localizations and interactions, reflecting its complex role in virus infection. In this review we explore how P4, constrained by overlapping ORFs, has evolved multiple functional motifs. We analyze these motifs' conservation across different barley yellow dwarf virus (BYDV) species and related poleroviruses. We also discuss how viral proteins cooperate to facilitate movement and localization of the virus throughout infection. We provide insights into potential future research directions and suggest strategies for developing potential antiviral-resistant approaches.
Affiliation(s)
- Sara Shakir
  - UMR Biologie du Fruit et Pathologie, INRAE, Université de Bordeaux, 33882, Villenave d'Ornon, France
- Sylvaine Boissinot
  - UMR Biologie du Fruit et Pathologie, INRAE, Université de Bordeaux, 33882, Villenave d'Ornon, France
- Thierry Michon
  - UMR Biologie du Fruit et Pathologie, INRAE, Université de Bordeaux, 33882, Villenave d'Ornon, France
- Stéphane Lafarge
  - Centre de Recherche de Chappes, Route d'Ennezat CS90216, 63720, Chappes, France
- Syed S Zaidi
  - UMR Biologie du Fruit et Pathologie, INRAE, Université de Bordeaux, 33882, Villenave d'Ornon, France
4
Wehbi S, Wheeler A, Morel B, Manepalli N, Minh BQ, Lauretta DS, Masel J. Order of amino acid recruitment into the genetic code resolved by Last Universal Common Ancestor's protein domains. bioRxiv 2024:2024.04.13.589375. [PMID: 38659899 PMCID: PMC11042313 DOI: 10.1101/2024.04.13.589375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
The current "consensus" order in which amino acids were added to the genetic code is based on potentially biased criteria, such as absence of sulfur-containing amino acids from the Urey-Miller experiment which lacked sulfur. More broadly, abiotic abundance might not reflect biotic abundance in the organisms in which the genetic code evolved. Here, we instead identify which protein domains date to the last universal common ancestor (LUCA), then infer the order of recruitment from deviations of their ancestrally reconstructed amino acid frequencies from the still-ancient post-LUCA controls. We find that smaller amino acids were added to the code earlier, with no additional predictive power in the previous "consensus" order. Metal-binding (cysteine and histidine) and sulfur-containing (cysteine and methionine) amino acids were added to the genetic code much earlier than previously thought. Methionine and histidine were added to the code earlier than expected from their molecular weights, and glutamine later. Early methionine availability is compatible with inferred early use of S-adenosylmethionine, and early histidine with its purine-like structure and the demand for metal-binding. Even more ancient protein sequences - those that had already diversified into multiple distinct copies prior to LUCA - have significantly higher frequencies of aromatic amino acids (tryptophan, tyrosine, phenylalanine and histidine), and lower frequencies of valine and glutamic acid than single copy LUCA sequences. If at least some of these sequences predate the current code, then their distinct enrichment patterns provide hints about earlier, alternative genetic codes.
Affiliation(s)
- Sawsan Wehbi
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, Arizona, 85721, USA
- Andrew Wheeler
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, Arizona, 85721, USA
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Nandini Manepalli
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA
- Bui Quang Minh
  - School of Computing, Australian National University, Canberra, ACT, Australia
- Dante S. Lauretta
  - Lunar and Planetary Laboratory, University of Arizona, Tucson, AZ 85721, USA
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
5
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter recruitment drives the emergence of proto-genes in a long-term evolution experiment with Escherichia coli. PLoS Biol 2024; 22:e3002418. [PMID: 38713714 PMCID: PMC11101190 DOI: 10.1371/journal.pbio.3002418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 05/17/2024] [Accepted: 04/18/2024] [Indexed: 05/09/2024] Open
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli long-term evolution experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, with levels of transcription across low-expressed regions increasing in later generations of the experiment. Proto-genes formed downstream of new mutations result either from insertion element activity or chromosomal translocations that fused preexisting regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter, although such cases were rare compared to those caused by recruitment of preexisting promoters. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, can persist stably, and can serve as potential substrates for new gene formation.
Affiliation(s)
- Md. Hassan uz-Zaman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Simon D’Alton
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Howard Ochman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
6
Legarda EG, Elena SF, Mushegian AR. Emergence of two distinct spatial folds in a pair of plant virus proteins encoded by nested genes. J Biol Chem 2024; 300:107218. [PMID: 38522515 PMCID: PMC11044054 DOI: 10.1016/j.jbc.2024.107218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 03/26/2024] Open
Abstract
Virus genomes may encode overlapping or nested open reading frames that increase their coding capacity. It is not known whether the constraints on spatial structures of the two encoded proteins limit the evolvability of nested genes. We examine the evolution of a pair of proteins, p22 and p19, encoded by nested genes in plant viruses from the genus Tombusvirus. The known structure of p19, a suppressor of RNA silencing, belongs to the RAGNYA fold from the alpha+beta class. The structure of p22, the cell-to-cell movement protein from the 30K family widespread in plant viruses, is predicted with the AlphaFold approach, suggesting a single jelly-roll fold core from the all-beta class, structurally similar to capsid proteins from plant and animal viruses. The nucleotide and codon preferences impose modest constraints on the types of secondary structures encoded in the alternative reading frames, nonetheless allowing for compact, well-ordered folds from different structural classes in two similarly-sized nested proteins. Tombusvirus p22 emerged through radiation of the widespread 30K family, which evolved by duplication of a virus capsid protein early in the evolution of plant viruses, whereas lineage-specific p19 may have emerged by a stepwise increase in the length of the overprinted gene and incremental acquisition of functionally active secondary structure elements by the protein product. This evolution of p19 toward the RAGNYA fold represents one of the first documented examples of protein structure convergence in naturally occurring proteins.
Affiliation(s)
- Esmeralda G Legarda
  - Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, Paterna, València, Spain
- Santiago F Elena
  - Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, Paterna, València, Spain; The Santa Fe Institute, Santa Fe, New Mexico, USA
- Arcady R Mushegian
  - Division of Molecular and Cellular Biosciences, National Science Foundation, Arlington, Virginia, USA
7
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". Wiley Interdiscip Rev RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunfu Xiao
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Nicholas Delihas
  - Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Li Zhang
  - Chinese Institute for Brain Research, Beijing, China
- Ni A An
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chuan-Yun Li
  - State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China
  - Southwest United Graduate School, Kunming, China
8
Pavesi A, Romerio F. Creation of the HIV-1 antisense gene asp coincided with the emergence of the pandemic group M and is associated with faster disease progression. Microbiol Spectr 2024; 12:e0380223. [PMID: 38230940 PMCID: PMC10846101 DOI: 10.1128/spectrum.03802-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/19/2023] [Indexed: 01/18/2024] Open
Abstract
Despite being first identified more than three decades ago, the antisense gene asp of HIV-1 remains an enigma. asp is present uniquely in pandemic (group M) HIV-1 strains, and it is absent in all non-pandemic (out-of-M) HIV-1 strains and virtually all non-human primate lentiviruses. This suggests that the creation of asp may have contributed to HIV-1 fitness or worldwide spread. It also raises the question of which evolutionary processes were at play in the creation of asp. Here, we show that HIV-1 genomes containing an intact asp gene are associated with faster HIV-1 disease progression. Furthermore, we demonstrate that the creation of a full-length asp gene occurred via the evolution of codon usage in env overlapping asp on the opposite strand. This involved differential use of synonymous codons or conservative amino acid substitution in env that eliminated internal stop codons in asp, and redistribution of synonymous codons in env that minimized the likelihood of new premature stops arising in asp. Nevertheless, the creation of a full-length asp gene reduced the genetic diversity of env. The Luria-Delbruck fluctuation test suggests that the interrupted asp open reading frame (ORF) is the progenitor of the intact ORF, rather than a descendant under random genetic drift. Therefore, the existence of group-M isolates with a truncated asp ORF indicates an incomplete transition process. For the first time, our study links the presence of a full-length asp ORF to faster disease progression, thus warranting further investigation into the cellular processes and molecular mechanisms through which the ASP protein impacts HIV-1 replication, transmission, and pathogenesis.
IMPORTANCE: Overlapping genes engage in a tug-of-war, constraining each other's evolution. The creation of a new gene overlapping an existing one comes at an evolutionary cost. Thus, its conservation must be advantageous, or it will be lost, especially if the pre-existing gene is essential for the viability of the virus or cell. We found that the creation and conservation of the HIV-1 antisense gene asp occurred through differential use of synonymous codons or conservative amino acid substitutions within the overlapping gene, env. This process did not involve amino acid changes in ENV that benefited its function, but rather it constrained the evolution of ENV. Nonetheless, the creation of asp brought a net selective advantage to HIV-1 because asp is conserved especially among high-prevalence strains. The association between the presence of an intact asp gene and faster HIV-1 disease progression supports that conclusion and warrants further investigation.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Fabio Romerio
  - Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
9
Takeuchi N, Fullmer MS, Maddock DJ, Poole AM. The Constructive Black Queen hypothesis: new functions can evolve under conditions favouring gene loss. ISME J 2024; 18:wrae011. [PMID: 38366199 PMCID: PMC10942775 DOI: 10.1093/ismejo/wrae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 01/17/2024] [Accepted: 01/19/2024] [Indexed: 02/18/2024]
Abstract
Duplication is a major route for the emergence of new gene functions. However, the emergence of new gene functions via this route may be reduced in prokaryotes, as redundant genes are often rapidly purged. In lineages with compact, streamlined genomes, it thus appears challenging for novel function to emerge via duplication and divergence. A further pressure contributing to gene loss occurs under Black Queen dynamics, as cheaters that lose the capacity to produce a public good can instead acquire it from neighbouring producers. We propose that Black Queen dynamics can favour the emergence of new function because, under an emerging Black Queen dynamic, there is high gene redundancy spread across a community of interacting cells. Using computational modelling, we demonstrate that new gene functions can emerge under Black Queen dynamics. This result holds even if there is deletion bias due to low duplication rates and selection against redundant gene copies resulting from the high cost associated with carrying a locus. However, when the public good production costs are high, Black Queen dynamics impede the fixation of new functions. Our results expand the mechanisms by which new gene functions can emerge in prokaryotic systems.
Affiliation(s)
- Nobuto Takeuchi
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
  - Universal Biology Institute, University of Tokyo, Tokyo 113-0033, Japan
  - Department of Biology, Faculty of Sciences, Kyushu University, Fukuoka 819-0395, Japan
- Matthew S Fullmer
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Danielle J Maddock
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Anthony M Poole
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
10
Bukhnikashvili L. Overlaps Between CDS Regions of Protein-Coding Genes in the Human Genome: A Case Study on the NR1D1-THRA Gene Pair. J Mol Evol 2023; 91:963-975. [PMID: 38006429 DOI: 10.1007/s00239-023-10147-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 11/12/2023] [Indexed: 11/27/2023]
Abstract
For several decades, it has been known that a substantial number of genes within human DNA exhibit overlap; however, the biological and evolutionary significance of these overlaps remain poorly understood. This study focused on investigating specific instances of overlap where the overlapping DNA region encompasses the coding DNA sequences (CDSs) of protein-coding genes. The results revealed that proteins encoded by overlapping CDSs exhibit greater disorder than those from nonoverlapping CDSs. Additionally, these DNA regions were identified as GC-rich. This could be partially attributed to the absence of stop codons from two distinct reading frames rather than one. Furthermore, these regions were found to harbour fewer single-nucleotide polymorphism (SNP) sites, possibly due to constraints arising from the overlapping state where mutations could affect two genes simultaneously. While elucidating these properties, the NR1D1-THRA gene pair emerged as an exceptional case with highly structured proteins and a distinctly conserved sequence across eutherian mammals. Both NR1D1 and THRA are nuclear receptors lacking a ligand-binding domain at their C-terminus, which is the region where these gene pairs overlap. The NR1D1 gene is involved in the regulation of circadian rhythm, while the THRA gene encodes a thyroid hormone receptor, and both play crucial roles in various physiological processes. This study suggests that, in addition to their well-established functions, the specifically overlapping CDS regions of these genes may encode protein segments with additional, yet undiscovered, biological roles.
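The link drawn in this abstract between GC-richness and the absence of stop codons in two reading frames can be illustrated with a small sketch. It is not the study's method: the function names are hypothetical, and the second frame is modelled as an offset on the same strand for simplicity, whereas real overlaps may also lie on opposite strands. The three standard stop codons (TAA, TAG, TGA) contain only 2 G/C bases out of 9, so a region that must stay free of them in two frames at once is plausibly pushed toward higher GC content.

```python
# Illustrative sketch only; names and the same-strand simplification are assumptions.
STOP_CODONS = ("TAA", "TAG", "TGA")  # standard genetic code


def gc_fraction(seq):
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def has_stop_in_frame(seq, offset):
    """True if any complete codon read from `offset` is a stop codon."""
    seq = seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS
               for i in range(offset, len(seq) - 2, 3))


def open_in_both_frames(seq, offset_a, offset_b):
    """True if the sequence has no stop codon in either of two frames."""
    return (not has_stop_in_frame(seq, offset_a)
            and not has_stop_in_frame(seq, offset_b))


if __name__ == "__main__":
    print(gc_fraction("".join(STOP_CODONS)))      # GC fraction of the stop codons
    print(open_in_both_frames("GCCGGCGCC", 0, 1))  # GC-rich toy sequence
    print(open_in_both_frames("ATGACC", 0, 1))     # TGA appears in the shifted frame
```

A sequence that is open in one frame can still carry a stop in the shifted frame, as the toy sequence `ATGACC` shows; that double constraint is what the abstract invokes.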
11
Uz-Zaman MH, D'Alton S, Barrick JE, Ochman H. Promoter capture drives the emergence of proto-genes in Escherichia coli. bioRxiv 2023:2023.11.15.567300. [PMID: 38013999 PMCID: PMC10680751 DOI: 10.1101/2023.11.15.567300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli Long-Term Evolution Experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time-span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, thereby serving as raw material for new gene emergence. Most proto-genes result either from insertion element activity or chromosomal translocations that fused pre-existing regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, persist stably, and can serve as potential substrates for new gene formation.
12
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
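The opening sentence of this abstract describes a mechanism that is easy to show concretely: shifting the reading frame by one or two nucleotides, or reading the opposite strand, yields a completely different amino acid sequence from the same DNA. The following is a minimal sketch assuming the standard genetic code; the helper names and the toy sequence are hypothetical and are not taken from the cited paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset` (0, 1 or 2)."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(offset, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate all three frames on each strand of an uppercase ACGT string."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence; frame +1 is the annotated reading frame.
    for name, protein in sorted(six_frame_translations("ATGGCCAAGTAA").items()):
        print(name, protein)
```

For the toy input, frame +1 reads `MAK*` while frame +2 reads `WPS` and the first antisense frame reads `LLGH`, so a single stretch of DNA carries several unrelated candidate peptides.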
13
N’Guessan A, Kailasam S, Mostefai F, Poujol R, Grenier JC, Ismailova N, Contini P, De Palma R, Haber C, Stadler V, Bourque G, Hussin JG, Shapiro BJ, Fritz JH, Piccirillo CA. Selection for immune evasion in SARS-CoV-2 revealed by high-resolution epitope mapping and sequence analysis. iScience 2023; 26:107394. [PMID: 37599818 PMCID: PMC10433132 DOI: 10.1016/j.isci.2023.107394] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/05/2022] [Revised: 02/10/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023] Open
Abstract
Here, we exploit a deep serological profiling strategy coupled with an integrated computational framework for the analysis of SARS-CoV-2 humoral immune responses. Applying a high-density peptide array (HDPA) spanning the entire proteomes of SARS-CoV-2 and endemic human coronaviruses allowed us to identify B cell epitopes and relate them to their evolutionary and structural properties. We identify hotspots of pre-existing immunity and cross-reactive epitopes that contribute to increasing the overall humoral immune response to SARS-CoV-2. Using a public dataset of over 38,000 viral genomes from the early phase of the pandemic, capturing both inter- and within-host genetic viral diversity, we determined the evolutionary profile of epitopes and the differences across proteins, waves, and SARS-CoV-2 variants. Lastly, we show that mutations in spike and nucleocapsid epitopes are under stronger selection between than within patients, suggesting that most of the selective pressure for immune evasion occurs upon transmission between hosts.
Affiliation(s)
- Arnaud N’Guessan
  - Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
  - McGill Genome Centre, McGill University, Montréal, QC, Canada
- Senthilkumar Kailasam
  - Canadian Center for Computational Genomics, Montréal, QC, Canada
  - Department of Human Genetics, McGill University, Montréal, QC, Canada
  - Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
- Fatima Mostefai
  - Research Centre, Montreal Heart Institute, Montreal, QC, Canada
  - Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, QC, Canada
- Raphaël Poujol
  - Research Centre, Montreal Heart Institute, Montreal, QC, Canada
- Nailya Ismailova
  - Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
  - McGill University Research Center on Complex Traits (MRCCT), McGill University, Montréal, QC, Canada
  - Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
- Paola Contini
  - Department of Internal Medicine, University of Genoa and IRCCS IST-Ospedale San Martino, Genoa, Italy
- Raffaele De Palma
  - Department of Internal Medicine, University of Genoa and IRCCS IST-Ospedale San Martino, Genoa, Italy
- Guillaume Bourque
  - Canadian Center for Computational Genomics, Montréal, QC, Canada
  - Department of Human Genetics, McGill University, Montréal, QC, Canada
  - Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
- Julie G. Hussin
  - Research Centre, Montreal Heart Institute, Montreal, QC, Canada
  - Département de Médecine, Université de Montréal, Montréal, QC, Canada
- B. Jesse Shapiro
  - Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
  - McGill Genome Centre, McGill University, Montréal, QC, Canada
  - Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
- Jörg H. Fritz
  - Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
  - McGill University Research Center on Complex Traits (MRCCT), McGill University, Montréal, QC, Canada
  - Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada
- Ciriaco A. Piccirillo
  - Department of Microbiology and Immunology, McGill University, Montréal, QC, Canada
  - McGill University Research Center on Complex Traits (MRCCT), McGill University, Montréal, QC, Canada
  - Infectious Diseases and Immunity in Global Health Program of the Research Institute of McGill Health Center, Montréal, QC, Canada
  - Dahdaleh Institute of Genomic Medicine (DIgM), McGill University, Montréal, QC, Canada

14
Olendraite I, Brown K, Firth AE. Identification of RNA Virus-Derived RdRp Sequences in Publicly Available Transcriptomic Data Sets. Mol Biol Evol 2023; 40:msad060. [PMID: 37014783 PMCID: PMC10101049 DOI: 10.1093/molbev/msad060] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/04/2022] [Revised: 01/15/2023] [Accepted: 03/08/2023] [Indexed: 04/05/2023]
Abstract
RNA viruses are abundant and highly diverse and infect all or most eukaryotic organisms. However, only a tiny fraction of the number and diversity of RNA virus species have been catalogued. To cost-effectively expand the diversity of known RNA virus sequences, we mined publicly available transcriptomic data sets. We developed 77 family-level Hidden Markov Model profiles for the viral RNA-dependent RNA polymerase (RdRp), the only universal "hallmark" gene of RNA viruses. By using these to search the National Center for Biotechnology Information Transcriptome Shotgun Assembly database, we identified 5,867 contigs encoding RNA virus RdRps or fragments thereof and analyzed their diversity, taxonomic classification, phylogeny, and host associations. Our study expands the known diversity of RNA viruses, and the 77 curated RdRp Profile Hidden Markov Models provide a useful resource for the virus discovery community.
Affiliation(s)
- Ingrida Olendraite
  - Division of Virology, Department of Pathology, Addenbrookes Hospital, University of Cambridge, Cambridge, United Kingdom
- Katherine Brown
  - Division of Virology, Department of Pathology, Addenbrookes Hospital, University of Cambridge, Cambridge, United Kingdom
- Andrew E Firth
  - Division of Virology, Department of Pathology, Addenbrookes Hospital, University of Cambridge, Cambridge, United Kingdom

15
Lal M, Bhardwaj E, Chahar N, Yadav S, Das S. Comprehensive analysis of 1R- and 2R-MYBs reveals novel genic and protein features, complex organisation, selective expansion and insights into evolutionary tendencies. Funct Integr Genomics 2022; 22:371-405. [PMID: 35260976 DOI: 10.1007/s10142-022-00836-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/05/2021] [Revised: 02/10/2022] [Accepted: 02/23/2022] [Indexed: 11/28/2022]
Abstract
Myeloblastosis (MYB) family, the largest plant transcription factor family, has been subcategorised based on the number and type of repeats in the MYB domain. In spite of several reports, the evolution of MYB genes and repeats remains enigmatic. Brassicaceae members are endowed with complex genomes, including dysploidy, because of their unique history with multiple rounds of polyploidisation, genomic fractionations and rearrangements. The present study is an attempt to gain insights into the complexities of MYB family diversity, understand the impacts of genome evolution on gene families and develop an evolutionary framework to understand the origin of various subcategories of the MYB gene family. We identified and analysed 1129 MYBs that included 1R-, 2R-, 3R- and atypical-MYBs across sixteen species representing protists, fungi, animals and plants, excluding MYBs identified from Brassicaceae other than Arabidopsis thaliana; in addition, a total of 1137 2R-MYB genes from six Brassicaceae species were also analysed. Comparative analysis revealed predominance of 1R-MYBs in protists, fungi, animals and lower plants. Phylogenetic reconstruction and analysis of selection pressure suggested ancestral nature of R1-type repeat containing 1R-MYBs that might have undergone intragenic duplication to form multi-repeat MYBs. Distinct differences in gene structure between 1R-MYB and 2R-MYBs were observed regarding intron number, the ratio of gene length to coding DNA sequence (CDS) length and the length of exons encoding the MYB domain. Conserved as well as novel and lineage-specific intron phases were identified. Analyses of physicochemical properties revealed drastic differences indicating functional diversification in MYBs. Phylogenetic reconstruction of 1R- and 2R-MYB genes revealed a shared structure-function relationship in clades which was supported when transcriptome data was analysed in silico.
Comparative genomics to study distribution pattern and mapping of 2R-MYBs revealed congruency and greater degree of synteny and collinearity among closely related species. Micro-synteny analysis of genomic segments revealed high conservation of genes that are immediately flanking the surrounding tandemly organised 2R-MYBs along with instances of local duplication, reorganisations and genome fractionation. In summary, polyploidy, dysploidy, reshuffling and genome fractionation were found to cause loss or gain of 2R-MYB genes. The findings need to be supported with functional validation to understand gene structure-function relationship along the evolutionary lineage and adaptive strategies based on comparative functional genomics in plants.
Affiliation(s)
- Mukund Lal
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Ekta Bhardwaj
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Nishu Chahar
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Shobha Yadav
  - Department of Botany, University of Delhi, Delhi, 110007, India
- Sandip Das
  - Department of Botany, University of Delhi, Delhi, 110007, India

16
Li WX, Ding SW. Mammalian viral suppressors of RNA interference. Trends Biochem Sci 2022; 47:978-988. [PMID: 35618579 DOI: 10.1016/j.tibs.2022.05.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/05/2022] [Revised: 04/14/2022] [Accepted: 05/02/2022] [Indexed: 12/18/2022]
Abstract
The antiviral defense directed by the RNAi pathway employs distinct specificity and effector mechanisms compared with other immune responses. The specificity of antiviral RNAi is programmed by siRNAs processed from virus-derived double-stranded RNA by Dicer endonuclease. Argonaute-containing RNA-induced silencing complex loaded with the viral siRNAs acts as the effector to mediate specific virus clearance by RNAi. Recent studies have provided evidence for the production and antiviral function of virus-derived siRNAs in both undifferentiated and differentiated mammalian cells infected with a range of RNA viruses when the cognate virus-encoded suppressor of RNAi (VSR) is rendered nonfunctional. In this review, we discuss the function, mechanism, and evolutionary origin of the validated mammalian VSRs and cell culture assays for their identification.
Affiliation(s)
- Wan-Xiang Li
  - Department of Microbiology and Plant Pathology, University of California, Riverside, Riverside, CA, USA
- Shou-Wei Ding
  - Department of Microbiology and Plant Pathology, University of California, Riverside, Riverside, CA, USA

17
Pley C, Lourenço J, McNaughton AL, Matthews PC. Spacer Domain in Hepatitis B Virus Polymerase: Plugging a Hole or Performing a Role? J Virol 2022; 96:e0005122. [PMID: 35412348 PMCID: PMC9093120 DOI: 10.1128/jvi.00051-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 03/14/2022] [Indexed: 11/25/2022]
Abstract
Hepatitis B virus (HBV) polymerase is divided into terminal protein, spacer, reverse transcriptase, and RNase domains. Spacer has previously been considered dispensable, merely acting as a tether between other domains or providing plasticity to accommodate deletions and mutations. We explore evidence for the role of spacer sequence, structure, and function in HBV evolution and lineage, consider its associations with escape from drugs, vaccines, and immune responses, and review its potential impacts on disease outcomes.
Affiliation(s)
- Caitlin Pley
  - School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
  - Guy’s and St Thomas’ NHS Foundation Trust, London, United Kingdom
- José Lourenço
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
  - Biosystems and Integrative Sciences Institute, University of Lisbon, Lisbon, Portugal
- Anna L. McNaughton
  - Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
  - Nuffield Department of Medicine, University of Oxford Medawar Building, Oxford, United Kingdom
- Philippa C. Matthews
  - Nuffield Department of Medicine, University of Oxford Medawar Building, Oxford, United Kingdom
  - The Francis Crick Institute, London, United Kingdom
  - Division of Infection and Immunity, University College London, London, United Kingdom

18
Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K. Spotlight on alternative frame coding: Two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection. iScience 2022; 25:103844. [PMID: 35198897 PMCID: PMC8850804 DOI: 10.1016/j.isci.2022.103844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 10/14/2021] [Accepted: 01/27/2022] [Indexed: 12/13/2022]
Abstract
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.
Affiliation(s)
- Michaela Kreitmeier
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Zachary Ardern
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Miriam Abele
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
- Christina Ludwig
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Klaus Neuhaus
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany

19
Biba D, Klink G, Bazykin G. Pairs of mutually compensatory frameshifting mutations contribute to protein evolution. Mol Biol Evol 2022; 39:6524633. [PMID: 35137193 PMCID: PMC8935012 DOI: 10.1093/molbev/msac031] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Insertions and deletions of lengths not divisible by 3 in protein-coding sequences cause frameshifts that usually induce premature stop codons and may carry a high fitness cost. However, this cost can be partially offset by a second compensatory indel restoring the reading frame. The role of such pairs of compensatory frameshifting mutations (pCFMs) in evolution has not been studied systematically. Here, we use whole-genome alignments of protein-coding genes of 100 vertebrate species, and of 122 insect species, studying the prevalence of pCFMs in their divergence. We detect a total of 624 candidate pCFM genes; six of them pass stringent quality filtering, including three human genes: RAB36, ARHGAP6, and NCR3LG1. In some instances, amino acid substitutions closely predating or following pCFMs restored the biochemical similarity of the frameshifted segment to the ancestral amino acid sequence, possibly reducing or negating the fitness cost of the pCFM. Typically, however, the biochemical similarity of the frameshifted sequence to the ancestral one was not higher than the similarity of a random sequence of a protein-coding gene to its frameshifted version, indicating that pCFMs can uncover radically novel regions of protein space. In total, pCFMs represent an appreciable and previously overlooked source of novel variation in amino acid sequences.
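To illustrate the mechanism this abstract describes, here is a minimal Python sketch of a pair of compensatory frameshifting mutations: a one-base deletion followed downstream by a one-base insertion. It is not the authors' detection pipeline. The sequence, positions, and helper names (`apply_compensatory_pair`, `changed_residues`) are hypothetical and chosen only so that the frameshifted segment contains no premature stop codon.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def apply_compensatory_pair(cds, del_pos, ins_pos, ins_base):
    """Delete the base at del_pos and insert ins_base before original index ins_pos.

    The deletion shifts the frame by -1; the downstream insertion restores it,
    so only the codons between the two events are read out of frame.
    """
    if not 0 <= del_pos < ins_pos <= len(cds):
        raise ValueError("expected 0 <= del_pos < ins_pos <= len(cds)")
    return cds[:del_pos] + cds[del_pos + 1:ins_pos] + ins_base + cds[ins_pos:]


def changed_residues(before, after):
    """Return 0-based protein positions where the two translations differ."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


ancestral = "ATGGCTGAACGTAAACTGGGTTAA"  # toy CDS, not a real gene
derived = apply_compensatory_pair(ancestral, del_pos=4, ins_pos=15, ins_base="C")

print(translate(ancestral))  # MAERKLG*
print(translate(derived))    # MVNVNLG*
print(changed_residues(translate(ancestral), translate(derived)))  # [1, 2, 3, 4]
```

In this toy case the four residues between the two indels are replaced by an unrelated sequence while the flanking residues are unchanged, which is the sense in which such pairs can introduce novel protein segments.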
Affiliation(s)
- Dmitry Biba
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205, Russia - Moscow, Oblast
- Galya Klink
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, 127051, Russia
- Georgii Bazykin
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205, Russia - Moscow, Oblast
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, 127051, Russia

20
Gammuto L, Chiellini C, Iozzo M, Fani R, Petroni G. The Azurin Coding Gene: Origin and Phylogenetic Distribution. Microorganisms 2021; 10:microorganisms10010009. [PMID: 35056457 PMCID: PMC8779525 DOI: 10.3390/microorganisms10010009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/29/2021] [Revised: 12/16/2021] [Accepted: 12/18/2021] [Indexed: 12/31/2022]
Abstract
Azurin is a bacterial-derived cupredoxin, which is mainly involved in electron transport reactions. Interest in azurin protein has risen in recent years due to its anticancer activity and its possible applications in anticancer therapies. Nevertheless, the attention of the scientific community has focused only on the azurin protein found in Pseudomonas aeruginosa (Proteobacteria, Gammaproteobacteria). In this work, we performed the first comprehensive screening of all the bacterial genomes available in online repositories to assess azurin distribution in the three domains of life. The azurin coding gene was not detected in the domains Archaea and Eucarya, whereas it was detected in phyla other than Proteobacteria, such as Bacteroidetes, Verrucomicrobia and Chloroflexi, and a phylogenetic analysis of the retrieved sequences was performed. The observed patchy distribution and phylogenetic data suggest that once it appeared in the bacterial domain, the azurin coding gene was lost in several bacterial phyla and/or anciently horizontally transferred between different phyla, even though vertical inheritance appeared to be the major force driving the transmission of this gene. Interestingly, a shared conserved domain has been found among azurin members of all the investigated phyla. This domain is already known in P. aeruginosa as the p28 domain and its importance for azurin anticancer activity has been widely explored. These findings may open a new and intriguing perspective in deciphering the azurin anticancer mechanisms and in developing new tools for treating cancer diseases.
Affiliation(s)
- Leandro Gammuto
  - Department of Biology, University of Pisa, 56126 Pisa, Italy
- Carolina Chiellini
  - National Research Council, Institute of Agricultural Biology and Biotechnology, Via Moruzzi 1, 56124 Pisa, Italy
- Marta Iozzo
  - Department of Experimental and Clinical Biomedical Sciences, University of Florence, Viale Morgagni 50, 50134 Florence, Italy
- Renato Fani
  - Laboratory of Microbial and Molecular Evolution, Department of Biology, University of Florence, Via Madonna del Piano 6, 50019 Sesto Fiorentino, Italy
  - Correspondence: (R.F.); (G.P.)
- Giulio Petroni
  - Department of Biology, University of Pisa, 56126 Pisa, Italy
  - Correspondence: (R.F.); (G.P.)

21
Wichmann S, Scherer S, Ardern Z. Biological factors in the synthetic construction of overlapping genes. BMC Genomics 2021; 22:888. [PMID: 34895142 PMCID: PMC8665328 DOI: 10.1186/s12864-021-08181-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2020] [Accepted: 11/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently however they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from other cellular organisms - in this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps.

RESULTS: After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids.

CONCLUSIONS: Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work however will need to assess important factors not considered such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and opens up exciting possibilities for synthetic biology.
Affiliation(s)
- Stefan Wichmann
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Zachary Ardern
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK

22
Computational methods for inferring location and genealogy of overlapping genes in virus genomes: approaches and applications. Curr Opin Virol 2021; 52:1-8. [PMID: 34798370 PMCID: PMC8594276 DOI: 10.1016/j.coviro.2021.10.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/04/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 12/02/2022]
Abstract
Viruses may evolve to increase the amount of encoded genetic information by means of overlapping genes, which utilize several reading frames. Such overlapping genes may be especially impactful for genomes of small size, often serving as a source of novel accessory proteins, some of which play a crucial role in viral pathogenicity or in promoting the systemic spread of the virus. Diverse genome-based metrics were proposed to facilitate recognition of overlapping genes that otherwise may be overlooked during genome annotation. They can detect the atypical codon bias associated with the overlap (e.g. a statistically significant reduction in variability at synonymous sites) or other sequence-composition features peculiar to overlapping genes. In this review, I compare nine computational methods, discuss their strengths and limitations, and survey how they were applied to detect candidate overlapping genes in the genome of SARS-CoV-2, the etiological agent of the COVID-19 pandemic.
23
Pavesi A. Prediction of two novel overlapping ORFs in the genome of SARS-CoV-2. Virology 2021; 562:149-157. [PMID: 34339929 PMCID: PMC8317007 DOI: 10.1016/j.virol.2021.07.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/15/2021] [Revised: 07/21/2021] [Accepted: 07/21/2021] [Indexed: 10/25/2022]
Abstract
Six candidate overlapping genes have been detected in SARS-CoV-2, yet current methods struggle to detect overlapping genes that recently originated. However, such genes might encode proteins beneficial to the virus, and provide a model system to understand gene birth. To complement existing detection methods, I first demonstrated that selection pressure to avoid stop codons in alternative reading frames is a driving force in the origin and retention of overlapping genes. I then built a detection method, CodScr, based on this selection pressure. Finally, I combined CodScr with methods that detect other properties of overlapping genes, such as a biased nucleotide and amino acid composition. I detected two novel ORFs (ORF-Sh and ORF-Mh), overlapping the spike and membrane genes respectively, which are under selection pressure and may be beneficial to SARS-CoV-2. ORF-Sh and ORF-Mh are present, as ORFs uninterrupted by stop codons, in 100% and 95% of the SARS-CoV-2 genomes, respectively.
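The idea of stop-codon avoidance in alternative reading frames can be illustrated with a simple scan. The Python sketch below is not CodScr and makes no claim about how that method works; it only finds, for each forward frame of a sequence, the longest run of codons uninterrupted by a stop codon. The function name and toy sequence are assumptions for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_stop_free_run(seq, frame):
    """Return (start_index, length_in_codons) of the longest stop-free run.

    `frame` is the 0-based offset (0, 1, or 2) on the forward strand.
    """
    best_start, best_len = frame, 0
    run_start, run_len = frame, 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            # A stop codon ends the current run; the next run starts after it.
            run_start, run_len = i + 3, 0
        else:
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    return best_start, best_len


# Hypothetical toy sequence, not taken from any viral genome.
toy = "ATGCTGACCGGTAAA"
for frame in range(3):
    print(frame, longest_stop_free_run(toy, frame))
```

A frame that stays stop-free over a much longer stretch than expected from base composition alone is the kind of signal such methods look for, although a real analysis would need a statistical null model and the reverse strand as well.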
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy

24
Guerra-Almeida D, Tschoeke DA, da-Fonseca RN. Understanding small ORF diversity through a comprehensive transcription feature classification. DNA Res 2021; 28:6317669. [PMID: 34240112 PMCID: PMC8435553 DOI: 10.1093/dnares/dsab007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 12/22/2020] [Indexed: 11/13/2022]
Abstract
Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in noncanonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into nonexpressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in noncoding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
Affiliation(s)
- Diego Guerra-Almeida
  - Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Diogo Antonio Tschoeke
  - Alberto Luiz Coimbra Institute of Graduate Studies and Engineering Research (COPPE), Biomedical Engineering Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Rodrigo Nunes-da-Fonseca
  - Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  - National Institute of Science and Technology in Molecular Entomology, Rio de Janeiro, Brazil

25
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022]
Abstract
During their long evolutionary history viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. Origin of overlapping genes, what factors favor their birth and retention, and how they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
26
James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. Universal and taxon-specific trends in protein sequences as a function of age. eLife 2021; 10:e57347. [PMID: 33416492 PMCID: PMC7819706 DOI: 10.7554/elife.57347] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/28/2020] [Accepted: 01/05/2021] [Indexed: 01/12/2023]
Abstract
Extant protein-coding sequences span a huge range of ages, from those that emerged only recently to those present in the last universal common ancestor. Because evolution has had less time to act on young sequences, there might be 'phylostratigraphy' trends in any properties that evolve slowly with age. A long-term reduction in hydrophobicity and hydrophobic clustering was found in previous, taxonomically restricted studies. Here we perform integrated phylostratigraphy across 435 fully sequenced species, using sensitive HMM methods to detect protein domain homology. We find that the reduction in hydrophobic clustering is universal across lineages. However, only young animal domains have a tendency to have higher structural disorder. Among ancient domains, trends in amino acid composition reflect the order of recruitment into the genetic code, suggesting that the composition of the contemporary descendants of ancient sequences reflects amino acid availability during the earliest stages of life, when these sequences first emerged.
Affiliation(s)
- Jennifer E James
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Sara M Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Paul G Nelson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Catherine Weibel
  - Department of Physics, University of Arizona, Tucson, United States
  - Department of Mathematics, University of Arizona, Tucson, United States
- Luke J Kosinski
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
27
Agranovsky A. Enhancing Capsid Proteins Capacity in Plant Virus-Vector Interactions and Virus Transmission. Cells 2021; 10:cells10010090. [PMID: 33430410 PMCID: PMC7827187 DOI: 10.3390/cells10010090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/01/2020] [Revised: 01/02/2021] [Accepted: 01/04/2021] [Indexed: 12/02/2022]
Abstract
Vector transmission of plant viruses is basically of two types that depend on the virus helper component proteins or the capsid proteins. A number of plant viruses belonging to disparate groups have developed unusual capsid proteins providing for interactions with the vector. Thus, cauliflower mosaic virus, a plant pararetrovirus, employs a virion associated p3 protein, the major capsid protein, and a helper component for the semi-persistent transmission by aphids. Benyviruses encode a capsid protein readthrough domain (CP-RTD) located at one end of the rod-like helical particle, which serves for the virus transmission by soil fungal zoospores. Likewise, the CP-RTD, being a minor component of the luteovirus icosahedral virions, provides for persistent, circulative aphid transmission. Closteroviruses encode several CPs and virion-associated proteins that form the filamentous helical particles and mediate transmission by aphid, whitefly, or mealybug vectors. The variable strategies of transmission and evolutionary ‘inventions’ of the unusual capsid proteins of plant RNA viruses are discussed.
28
Douglas J, Drummond AJ, Kingston RL. Evolutionary history of cotranscriptional editing in the paramyxoviral phosphoprotein gene. Virus Evol 2021; 7:veab028. [PMID: 34141448 PMCID: PMC8204654 DOI: 10.1093/ve/veab028] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/31/2022]
Abstract
The phosphoprotein gene of the paramyxoviruses encodes multiple protein products. The P, V, and W proteins are generated by transcriptional slippage. This process results in the insertion of non-templated guanosine nucleosides into the mRNA at a conserved edit site. The P protein is an essential component of the viral RNA polymerase and is encoded by a faithful copy of the gene in the majority of paramyxoviruses. However, in some cases, the non-essential V protein is encoded by default and guanosines must be inserted into the mRNA in order to encode P. The number of guanosines inserted into the P gene can be described by a probability distribution, which varies between viruses. In this article, we review the nature of these distributions, which can be inferred from mRNA sequencing data, and reconstruct the evolutionary history of cotranscriptional editing in the paramyxovirus family. Our model suggests that, throughout known history of the family, the system has switched from a P default to a V default mode four times; complete loss of the editing system has occurred twice, the canonical zinc finger domain of the V protein has been deleted or heavily mutated a further two times, and the W protein has independently evolved a novel function three times. Finally, we review the physical mechanisms of cotranscriptional editing via slippage of the viral RNA polymerase.
Affiliation(s)
- Jordan Douglas
  - Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand
  - School of Computer Science, University of Auckland, Auckland 1010, New Zealand
- Alexei J Drummond
  - Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Richard L Kingston
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
29
Wright BW, Ruan J, Molloy MP, Jaschke PR. Genome Modularization Reveals Overlapped Gene Topology Is Necessary for Efficient Viral Reproduction. ACS Synth Biol 2020; 9:3079-3090. [PMID: 33044064 DOI: 10.1021/acssynbio.0c00323] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]
Abstract
Sequence overlap between two genes is common across all genomes, with viruses having high proportions of these gene overlaps. Genome modularization and refactoring is the process of disrupting natural gene overlaps to separate coding sequences to enable their individual manipulation. The biological function and fitness effects of gene overlaps are not fully understood, and their effects on gene cluster and genome-level refactoring are unknown. The bacteriophage φX174 genome has ∼26% of nucleotides involved in encoding more than one gene. In this study we use an engineered φX174 phage containing a genome with all gene overlaps removed to show that gene overlap is critical to maintaining optimal viral fecundity. Through detailed phenotypic measurements we reveal that genome modularization in φX174 causes virion replication, stability, and attachment deficiencies. Quantitation of the complete phage proteome across an infection cycle reveals 30% of proteins display abnormal expression patterns. Taken together, we have for the first time comprehensively demonstrated that gene modularization severely perturbs the coordinated functioning of a bacteriophage replication cycle. This work highlights the biological importance of gene overlap in natural genomes and that reducing gene overlap disruption should be an integral part of future genome engineering projects.
Affiliation(s)
- Bradley W. Wright
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Juanfang Ruan
  - Electron Microscope Unit, Mark Wainwright Analytical Centre, The University of New South Wales, Sydney, NSW 2052, Australia
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
- Mark P. Molloy
  - Kolling Institute, Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia
- Paul R. Jaschke
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
30
Nelson CW, Ardern Z, Goldberg TL, Meng C, Kuo CH, Ludwig C, Kolokotronis SO, Wei X. Dynamically evolving novel overlapping gene as a factor in the SARS-CoV-2 pandemic. eLife 2020; 9:e59633. [PMID: 33001029 PMCID: PMC7655111 DOI: 10.7554/elife.59633] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 06/03/2020] [Accepted: 09/30/2020] [Indexed: 12/11/2022]
Abstract
Understanding the emergence of novel viruses requires an accurate and comprehensive annotation of their genomes. Overlapping genes (OLGs) are common in viruses and have been associated with pandemics but are still widely overlooked. We identify and characterize ORF3d, a novel OLG in SARS-CoV-2 that is also present in Guangxi pangolin-CoVs but not other closely related pangolin-CoVs or bat-CoVs. We then document evidence of ORF3d translation, characterize its protein sequence, and conduct an evolutionary analysis at three levels: between taxa (21 members of Severe acute respiratory syndrome-related coronavirus), between human hosts (3978 SARS-CoV-2 consensus sequences), and within human hosts (401 deeply sequenced SARS-CoV-2 samples). ORF3d has been independently identified and shown to elicit a strong antibody response in COVID-19 patients. However, it has been misclassified as the unrelated gene ORF3b, leading to confusion. Our results liken ORF3d to other accessory genes in emerging viruses and highlight the importance of OLGs.
MESH Headings
- Amino Acid Sequence
- Animals
- Antibodies, Viral/immunology
- Antibody Specificity
- Antigens, Viral/biosynthesis
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- Betacoronavirus/genetics
- Betacoronavirus/pathogenicity
- Betacoronavirus/physiology
- COVID-19
- China/epidemiology
- Chiroptera/virology
- Coronavirus/genetics
- Coronavirus Infections/epidemiology
- Coronavirus Infections/virology
- Epitopes/genetics
- Epitopes/immunology
- Europe/epidemiology
- Eutheria/virology
- Evolution, Molecular
- Gene Expression Regulation, Viral
- Genes, Overlapping
- Genes, Viral
- Genetic Variation
- Haplotypes/genetics
- Host Specificity/genetics
- Humans
- Models, Molecular
- Mutation
- Open Reading Frames/genetics
- Pandemics
- Phylogeny
- Pneumonia, Viral/epidemiology
- Pneumonia, Viral/virology
- Protein Biosynthesis
- Protein Conformation
- RNA, Viral/genetics
- SARS-CoV-2
- Sequence Alignment
- Sequence Homology, Nucleic Acid
- Viral Proteins/genetics
- Viral Proteins/immunology
Affiliation(s)
- Chase W Nelson
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
  - Institute for Comparative Genomics, American Museum of Natural History, New York, United States
- Zachary Ardern
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Tony L Goldberg
  - Department of Pathobiological Sciences, University of Wisconsin-Madison, Madison, United States
  - Global Health Institute, University of Wisconsin-Madison, Madison, United States
- Chen Meng
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, Freising, Germany
- Chen-Hao Kuo
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Christina Ludwig
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, Freising, Germany
- Sergios-Orestis Kolokotronis
  - Institute for Comparative Genomics, American Museum of Natural History, New York, United States
  - Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, United States
  - Institute for Genomic Health, SUNY Downstate Health Sciences University, Brooklyn, United States
  - Division of Infectious Diseases, Department of Medicine, SUNY Downstate Health Sciences University, Brooklyn, United States
- Xinzhu Wei
  - Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, United States
  - Departments of Computer Science, Human Genetics, and Computational Medicine, University of California, Los Angeles, Los Angeles, United States
31
Mo D, Li X, Raabe CA, Rozhdestvensky TS, Skryabin BV, Brosius J. Circular RNA Encoded Amyloid Beta peptides-A Novel Putative Player in Alzheimer's Disease. Cells 2020; 9:E2196. [PMID: 33003364 PMCID: PMC7650678 DOI: 10.3390/cells9102196] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/30/2020] [Revised: 09/15/2020] [Accepted: 09/24/2020] [Indexed: 02/05/2023]
Abstract
Alzheimer's disease (AD) is an age-related detrimental dementia. Amyloid beta peptides (Aβ) play a crucial role in the pathology of AD. In familial AD, Aβ are generated from the full-length amyloid beta precursor protein (APP) via dysregulated proteolytic processing; however, in the case of sporadic AD, the mechanism of Aβ biogenesis remains elusive. circRNAs are a class of transcripts preferentially expressed in brain. We identified a circRNA harboring the Aβ-coding region of the APP gene termed circAβ-a. This circular RNA was detected in the brains of AD patients and non-dementia controls. With the aid of our recently established approach for analysis of circRNA functions, we demonstrated that circAβ-a is efficiently translated into a novel Aβ-containing Aβ175 polypeptide (19.2 KDa) in both cultured cells and human brain. Furthermore, Aβ175 was shown to be processed into Aβ peptides-a hallmark of AD. In summary, our analysis revealed an alternative pathway of Aβ biogenesis. Consequently, circAβ-a and its corresponding translation product could potentially represent novel therapeutic targets for AD treatment. Importantly, our data point to yet another evolutionary route for potentially increasing proteome complexity by generating additional polypeptide variants using back-splicing of primary transcripts that yield circular RNA templates.
Affiliation(s)
- Dingding Mo
  - Max Planck Institute for Biology of Ageing, Joseph-Stelzmann-Strasse 9b, 50931 Cologne, Germany
  - VIB-KU Leuven Center for Brain & Disease Research, KU Leuven, O&N IV Herestraat 49—box 602, 3000 Leuven, Belgium
  - Medical Faculty, Core Facility Transgenic Animal and Genetic Engineering Models (TRAM), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
- Xinping Li
  - Max Planck Institute for Biology of Ageing, Joseph-Stelzmann-Strasse 9b, 50931 Cologne, Germany
- Carsten A. Raabe
  - Institute of Experimental Pathology, Centre for Molecular Biology of Inflammation (ZMBE), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
  - Institute of Medical Biochemistry, Centre for Molecular Biology of Inflammation (ZMBE), University of Münster, Von-Esmarch-Strasse 56, D-48149 Münster, Germany
- Timofey S. Rozhdestvensky
  - Medical Faculty, Core Facility Transgenic Animal and Genetic Engineering Models (TRAM), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
- Boris V. Skryabin
  - Medical Faculty, Core Facility Transgenic Animal and Genetic Engineering Models (TRAM), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
- Juergen Brosius
  - Institute of Experimental Pathology, Centre for Molecular Biology of Inflammation (ZMBE), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610212, China
32
Abstract
The genomes of bacteria contain fewer genes and substantially less noncoding DNA than those of eukaryotes, and as a result, they have much less raw material to invent new traits. Yet, bacteria are vastly more taxonomically diverse, numerically abundant, and globally successful in colonizing new habitats compared to eukaryotes. Although bacterial genomes are generally considered to be optimized for efficient growth and rapid adaptation, nonadaptive processes have played a major role in shaping the size, contents, and compact organization of bacterial genomes and have allowed the establishment of deleterious traits that serve as the raw materials for genetic innovation.
Affiliation(s)
- Paul C Kirchberger
  - Department of Integrative Biology, University of Texas at Austin, Texas 78712, USA
- Marian L Schmidt
  - Department of Integrative Biology, University of Texas at Austin, Texas 78712, USA
- Howard Ochman
  - Department of Integrative Biology, University of Texas at Austin, Texas 78712, USA
33
Ho JSY, Angel M, Ma Y, Sloan E, Wang G, Martinez-Romero C, Alenquer M, Roudko V, Chung L, Zheng S, Chang M, Fstkchyan Y, Clohisey S, Dinan AM, Gibbs J, Gifford R, Shen R, Gu Q, Irigoyen N, Campisi L, Huang C, Zhao N, Jones JD, van Knippenberg I, Zhu Z, Moshkina N, Meyer L, Noel J, Peralta Z, Rezelj V, Kaake R, Rosenberg B, Wang B, Wei J, Paessler S, Wise HM, Johnson J, Vannini A, Amorim MJ, Baillie JK, Miraldi ER, Benner C, Brierley I, Digard P, Łuksza M, Firth AE, Krogan N, Greenbaum BD, MacLeod MK, van Bakel H, García-Sastre A, Yewdell JW, Hutchinson E, Marazzi I. Hybrid Gene Origination Creates Human-Virus Chimeric Proteins during Infection. Cell 2020; 181:1502-1517.e23. [PMID: 32559462 PMCID: PMC7323901 DOI: 10.1016/j.cell.2020.05.035] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 03/29/2019] [Revised: 02/26/2020] [Accepted: 05/18/2020] [Indexed: 01/12/2023]
Abstract
RNA viruses are a major human health threat. The life cycles of many highly pathogenic RNA viruses like influenza A virus (IAV) and Lassa virus depend on host mRNA, because viral polymerases cleave 5'-m7G-capped host transcripts to prime viral mRNA synthesis ("cap-snatching"). We hypothesized that start codons within cap-snatched host transcripts could generate chimeric human-viral mRNAs with coding potential. We report the existence of this mechanism of gene origination, which we named "start-snatching." Depending on the reading frame, start-snatching allows the translation of host and viral "untranslated regions" (UTRs) to create N-terminally extended viral proteins or entirely novel polypeptides by genetic overprinting. We show that both types of chimeric proteins are made in IAV-infected cells, generate T cell responses, and contribute to virulence. Our results indicate that during infection with IAV, and likely a multitude of other human, animal and plant viruses, a host-dependent mechanism allows the genesis of hybrid genes.
Affiliation(s)
- Jessica Sook Yuin Ho
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Matthew Angel
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Yixuan Ma
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Elizabeth Sloan
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Guojun Wang
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Carles Martinez-Romero
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Division of Infectious Diseases, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Marta Alenquer
  - Instituto Gulbenkian de Ciência, 2780-156 Oeiras, Portugal
- Vladimir Roudko
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Medicine, Hematology and Medical Oncology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Liliane Chung
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Simin Zheng
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Max Chang
  - Department of Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
- Yesai Fstkchyan
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sara Clohisey
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Adam M Dinan
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- James Gibbs
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Robert Gifford
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Rong Shen
  - Division of Structural Biology, The Institute of Cancer Research, London SW7 3RP, UK
- Quan Gu
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Nerea Irigoyen
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Laura Campisi
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Cheng Huang
  - Department of Pathology, the University of Texas Medical Branch, Galveston, TX 77555, USA
- Nan Zhao
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Joshua D Jones
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Zeyu Zhu
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Natasha Moshkina
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Léa Meyer
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Justine Noel
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zuleyma Peralta
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Veronica Rezelj
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Robyn Kaake
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Brad Rosenberg
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Bo Wang
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Jiajie Wei
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Slobodan Paessler
  - Department of Pathology, the University of Texas Medical Branch, Galveston, TX 77555, USA
- Helen M Wise
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Jeffrey Johnson
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Alessandro Vannini
  - Division of Structural Biology, The Institute of Cancer Research, London SW7 3RP, UK
  - Fondazione Human Technopole, Structural Biology Research Centre, 20157 Milan, Italy
- J Kenneth Baillie
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Emily R Miraldi
  - Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45257, USA
- Christopher Benner
  - Department of Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
- Ian Brierley
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Paul Digard
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Marta Łuksza
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Andrew E Firth
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Nevan Krogan
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Benjamin D Greenbaum
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Medicine, Hematology and Medical Oncology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Megan K MacLeod
  - Centre for Immunobiology, Institute of Infection, Immunity and Inflammation, University of Glasgow, Glasgow G12 8QQ, UK
- Harm van Bakel
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Adolfo García-Sastre
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Division of Infectious Diseases, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jonathan W Yewdell
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Edward Hutchinson
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Ivan Marazzi
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
34
Pavesi A. New insights into the evolutionary features of viral overlapping genes by discriminant analysis. Virology 2020; 546:51-66. [PMID: 32452417 PMCID: PMC7157939 DOI: 10.1016/j.virol.2020.03.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/20/2020] [Accepted: 03/29/2020] [Indexed: 12/18/2022]
Abstract
Overlapping genes originate by a mechanism of overprinting, in which nucleotide substitutions in a pre-existing frame induce the expression of a de novo protein from an alternative frame. In this study, I assembled a dataset of 319 viral overlapping genes, which included 82 overlaps whose expression is experimentally known and the respective 237 homologs. Principal component analysis revealed that overlapping genes have a common pattern of nucleotide and amino acid composition. Discriminant analysis separated overlapping from non-overlapping genes with an accuracy of 97%. When applied to overlapping genes with known genealogy, it separated ancestral from de novo frames with an accuracy close to 100%. This high discriminant power was crucial to computationally design variants of de novo viral proteins known to possess selective anticancer toxicity (apoptin) or protection against neurodegeneration (X protein), as well as to detect two new potential overlapping genes in the genome of the new coronavirus SARS-CoV-2.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy
35
Abstract
Overlapping genes are commonplace in viruses and play an important role in their function and evolution. However, aside from studies on specific groups of viruses, relatively little is known about the extent and nature of gene overlap and its determinants in viruses as a whole. Here, we present an extensive characterisation of gene overlap in viruses through an analysis of reference genomes present in the NCBI virus genome database. We find that over half the instances of gene overlap are very small, covering <10 nt, and 84 per cent are <50 nt in length. Despite this, 53 per cent of all viruses still contained a gene overlap of 50 nt or larger. We also investigate several predictors of gene overlap such as genome structure (single- and double-stranded RNA and DNA), virus family, genome length, and genome segmentation. This revealed that gene overlap occurs more frequently in DNA viruses than in RNA viruses, and more frequently in single-stranded viruses than in double-stranded viruses. Genome segmentation is also associated with gene overlap, particularly in single-stranded DNA viruses. Notably, we observed a large range of overlap frequencies across families of all genome types, suggesting that it is a common evolutionary trait that provides flexible genome structures in all virus families.
Affiliation(s)
- Timothy E Schlub
  - Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, NSW, 2006, Australia
- Edward C Holmes
  - School of Life and Environmental Sciences and School of Medical Sciences, Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, NSW 2006, Australia
36
Dinan AM, Lukhovitskaya NI, Olendraite I, Firth AE. A case for a negative-strand coding sequence in a group of positive-sense RNA viruses. Virus Evol 2020; 6:veaa007. [PMID: 32064120 PMCID: PMC7010960 DOI: 10.1093/ve/veaa007] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/13/2022]
Abstract
Positive-sense single-stranded RNA viruses form the largest and most diverse group of eukaryote-infecting viruses. Their genomes comprise one or more segments of coding-sense RNA that function directly as messenger RNAs upon release into the cytoplasm of infected cells. Positive-sense RNA viruses are generally accepted to encode proteins solely on the positive strand. However, we previously identified a surprisingly long (∼1,000-codon) open reading frame (ORF) on the negative strand of some members of the family Narnaviridae which, together with RNA bacteriophages of the family Leviviridae, form a sister group to all other positive-sense RNA viruses. Here, we completed the genomes of three mosquito-associated narnaviruses, all of which have the long reverse-frame ORF. We systematically identified narnaviral sequences in public data sets from a wide range of sources, including arthropod, fungal, and plant transcriptomic data sets. Long reverse-frame ORFs are widespread in one clade of narnaviruses, where they frequently occupy >95 per cent of the genome. The reverse-frame ORFs correspond to a specific avoidance of CUA, UUA, and UCA codons (i.e. stop codon reverse complements) in the forward-frame RNA-dependent RNA polymerase ORF. However, absence of these codons cannot be explained by other factors such as inability to decode these codons or GC3 bias. Together with other analyses, we provide the strongest evidence yet of coding capacity on the negative strand of a positive-sense RNA virus. As these ORFs comprise some of the longest known overlapping genes, their study may be of broad relevance to understanding overlapping gene evolution and de novo origin of genes.
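The codon relationship mentioned in this abstract (CUA, UUA, and UCA being the reverse complements of the stop codons) can be checked with a short sketch. The following Python is illustrative only and is not taken from the cited study; the function names and the toy sequences are hypothetical. It assumes the three standard stop codons (UAG, UAA, UGA) and Watson-Crick pairing.

```python
# Illustrative sketch only; names and sequences are hypothetical, not from the cited paper.

# Watson-Crick complement table for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# The three stop codons of the standard genetic code (assumed here).
STOP_CODONS = ("UAG", "UAA", "UGA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def forbidden_forward_codons():
    """Forward-strand codons that read as a stop codon on the opposite strand."""
    return {reverse_complement(stop) for stop in STOP_CODONS}


def reverse_frame_is_open(forward_orf):
    """True if no in-frame forward codon is the reverse complement of a stop.

    Only the reverse frame whose codons align exactly with the forward
    codons is considered; trailing bases that do not fill a codon are ignored.
    """
    forbidden = forbidden_forward_codons()
    usable = len(forward_orf) - len(forward_orf) % 3
    codons = (forward_orf[i:i + 3] for i in range(0, usable, 3))
    return not any(codon in forbidden for codon in codons)


print(sorted(forbidden_forward_codons()))
print(reverse_frame_is_open("AUGGCUGCA"))
print(reverse_frame_is_open("AUGCUAGCA"))
```

Under these assumptions, a forward ORF that avoids all three codons leaves the codon-aligned reverse frame free of stop codons, which is the pattern the abstract describes.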
Affiliation(s)
- Adam M Dinan
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
| | - Nina I Lukhovitskaya
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
| | - Ingrida Olendraite
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
| | - Andrew E Firth
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
| |
|
37
|
Gibbs AJ, Hajizadeh M, Ohshima K, Jones RA. The Potyviruses: An Evolutionary Synthesis Is Emerging. Viruses 2020; 12:E132. [PMID: 31979056 PMCID: PMC7077269 DOI: 10.3390/v12020132] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 12/31/2019] [Revised: 01/16/2020] [Accepted: 01/20/2020] [Indexed: 12/28/2022]
Abstract
In this review, encouraged by the dictum of Theodosius Dobzhansky that "Nothing in biology makes sense except in the light of evolution", we outline the likely evolutionary pathways that have resulted in the observed similarities and differences of the extant molecules, biology, distribution, etc. of the potyvirids and, especially, its largest genus, the potyviruses. The potyvirids are a family of plant-infecting RNA-genome viruses. They had a single polyphyletic origin, and all share at least three of their genes (i.e., the helicase region of their CI protein, the RdRp region of their NIb protein and their coat protein) with other viruses which are otherwise unrelated. Potyvirids fall into 11 genera of which the potyviruses, the largest, include more than 150 distinct viruses found worldwide. The first potyvirus probably originated 15,000-30,000 years ago, in a Eurasian grass host, by acquiring crucial changes to its coat protein and HC-Pro protein, which enabled it to be transmitted by migrating host-seeking aphids. All potyviruses are aphid-borne and, in nature, infect discrete sets of monocotyledonous or eudicotyledonous angiosperms. All potyvirus genomes are under negative selection; the HC-Pro, CP, NIa, and NIb genes are most strongly selected, and the PIPO gene least, but there are overriding virus-specific differences; for example, all turnip mosaic virus genes are more strongly conserved than those of potato virus Y. Estimates of dN/dS (ω) indicate whether potyvirus populations have been evolving as one or more subpopulations and could be used to help define species boundaries. Recombinants are common in many potyvirus populations (20%-64% in five examined), but recombination seems to be an uncommon speciation mechanism as, of 149 distinct potyviruses, only two were clear recombinants.
Human activities, especially trade and farming, have fostered and spread both potyviruses and their aphid vectors throughout the world, especially over the past five centuries. The world distribution of potyviruses, especially those found on islands, indicates that potyviruses may be more frequently or effectively transmitted by seed than experimental tests suggest. Only two meta-genomic potyviruses have been recorded from animal samples, and both are probably contaminants.
Affiliation(s)
- Adrian J. Gibbs
- Emeritus Faculty, Australian National University, Canberra, ACT 2601, Australia
| | - Mohammad Hajizadeh
- Department of Plant Protection, Faculty of Agriculture, University of Kurdistan, P.O. Box 416, Sanandaj, Iran
| | - Kazusato Ohshima
- Laboratory of Plant Virology, Department of Applied Biological Sciences, Faculty of Agriculture, Saga University, 1-banchi, Honjo-machi, Saga 840-8502, Japan
- The United Graduate School of Agricultural Sciences, Kagoshima University, 1-21-2410 Korimoto, Kagoshima 890-0065, Japan
| | - Roger A.C. Jones
- Institute of Agriculture, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
| |
|
38
|
Li X, Zhang G, Zhu Y, Bi J, Hao H, Hou H. Effect of the luxI/R gene on AHL-signaling molecules and QS regulatory mechanism in Hafnia alvei H4. AMB Express 2019; 9:197. [PMID: 31807954 PMCID: PMC6895348 DOI: 10.1186/s13568-019-0917-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/18/2019] [Accepted: 11/22/2019] [Indexed: 01/03/2023]
Abstract
Hafnia alvei H4 is a bacterium subject to regulation by an N-acyl-l-homoserine lactone (AHL)-mediated quorum sensing system and is closely related to the spoilage of instant sea cucumber. Studying the effect of Hafnia alvei H4 quorum sensing regulatory genes on AHLs is necessary for the quality and preservation of instant sea cucumber. In this study, the draft genome of H. alvei H4, which comprises a single chromosome of 4,687,151 bp, was sequenced and analyzed and the types of AHLs were analyzed employing thin-layer chromatography (TLC) and high resolution triple quadrupole liquid chromatography/mass spectrometry (LC/MS). Then the wild-type strain of H. alvei H4 and the luxI/R double mutant (ΔluxIR) were compared by transcriptome sequencing (RNA-seq). The results indicate that the incomplete genome sequence revealed the presence of one quorum-sensing (QS) gene set, designated as lasI/expR. Three major AHLs, N-hexanoyl-l-homoserine lactone (C6-HSL), N-butyryl-l-homoserine lactone (C4-HSL), and N-(3-oxo-octanoyl)-l-homoserine lactone (3-oxo-C8-HSL) were found, with C6-HSL being the most abundant. C6-HSL was not detected in the culture of the luxI mutant (ΔluxI) and higher levels of C4-HSL were found in the culture of the luxR mutant (ΔluxR), which suggested that the luxR gene may have a positive effect on C4-HSL production. It was also found that AHL and QS genes are closely related in the absence of luxIR double deletion. The results of this study can further elucidate at the genetic level that luxI and luxR genes are involved in the regulation of AHL.
|
39
|
Affram Y, Zapata JC, Gholizadeh Z, Tolbert WD, Zhou W, Iglesias-Ussel MD, Pazgier M, Ray K, Latinovic OS, Romerio F. The HIV-1 Antisense Protein ASP Is a Transmembrane Protein of the Cell Surface and an Integral Protein of the Viral Envelope. J Virol 2019; 93:e00574-19. [PMID: 31434734 PMCID: PMC6803264 DOI: 10.1128/jvi.00574-19] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 04/05/2019] [Accepted: 08/14/2019] [Indexed: 12/13/2022]
Abstract
The negative strand of HIV-1 encodes a highly hydrophobic antisense protein (ASP) with no known homologs. The presence of humoral and cellular immune responses to ASP in HIV-1 patients indicates that ASP is expressed in vivo, but its role in HIV-1 replication remains unknown. We investigated ASP expression in multiple chronically infected myeloid and lymphoid cell lines using an anti-ASP monoclonal antibody (324.6) in combination with flow cytometry and microscopy approaches. At baseline and in the absence of stimuli, ASP shows polarized subnuclear distribution, preferentially in areas with low content of suppressive epigenetic marks. However, following treatment with phorbol 12-myristate 13-acetate (PMA), ASP translocates to the cytoplasm and is detectable on the cell surface, even in the absence of membrane permeabilization, indicating that 324.6 recognizes an ASP epitope that is exposed extracellularly. Further, surface staining with 324.6 and anti-gp120 antibodies showed that ASP and gp120 colocalize, suggesting that ASP might become incorporated in the membranes of budding virions. Indeed, fluorescence correlation spectroscopy studies showed binding of 324.6 to cell-free HIV-1 particles. Moreover, 324.6 was able to capture and retain HIV-1 virions with efficiency similar to that of the anti-gp120 antibody VRC01. Our studies indicate that ASP is an integral protein of the plasma membranes of chronically infected cells stimulated with PMA, and upon viral budding, ASP becomes a structural protein of the HIV-1 envelope. These results may provide leads to investigate the possible role of ASP in the virus replication cycle and suggest that ASP may represent a new therapeutic or vaccine target. IMPORTANCE The HIV-1 genome contains a gene expressed in the opposite, or antisense, direction to all other genes. The protein product of this antisense gene, called ASP, is poorly characterized, and its role in viral replication remains unknown.
We provide evidence that the antisense protein, ASP, of HIV-1 is found within the cell nucleus in unstimulated cells. In addition, we show that after PMA treatment, ASP exits the nucleus and localizes on the cell membrane. Moreover, we demonstrate that ASP is present on the surfaces of viral particles. Altogether, our studies identify ASP as a new structural component of HIV-1 and show that ASP is an accessory protein that promotes viral replication. The presence of ASP on the surfaces of both infected cells and viral particles might be exploited therapeutically.
Affiliation(s)
- Yvonne Affram
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Juan C Zapata
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Zahra Gholizadeh
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - William D Tolbert
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Wei Zhou
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Maria D Iglesias-Ussel
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Marzena Pazgier
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Krishanu Ray
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Olga S Latinovic
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Fabio Romerio
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
| |
|
40
|
Chen CH, Pan CY, Lin WC. Overlapping protein-coding genes in human genome and their coincidental expression in tissues. Sci Rep 2019; 9:13377. [PMID: 31527706 PMCID: PMC6746723 DOI: 10.1038/s41598-019-49802-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/22/2019] [Accepted: 08/29/2019] [Indexed: 01/23/2023]
Abstract
The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5'-tandem overlapping and 3'-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions about human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
Affiliation(s)
- Chao-Hsin Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.
| | - Chao-Yu Pan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.; Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan, R.O.C.
| | - Wen-Chang Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.; Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan, R.O.C.
| |
|
41
|
Minarovits J, Niller HH. Truncated oncoproteins of retroviruses and hepatitis B virus: A lesson in contrasts. Infection, Genetics and Evolution 2019; 73:342-357. [DOI: 10.1016/j.meegid.2019.05.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/18/2019] [Revised: 05/14/2019] [Accepted: 05/27/2019] [Indexed: 02/07/2023]
|
42
|
Nielly-Thibault L, Landry CR. Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases. Genetics 2019; 212:1353-1366. [PMID: 31227545 PMCID: PMC6707459 DOI: 10.1534/genetics.119.302187] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/18/2019] [Accepted: 06/14/2019] [Indexed: 12/03/2022]
Abstract
Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
Affiliation(s)
- Lou Nielly-Thibault
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
| | - Christian R Landry
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biologie, Université Laval, Quebec, Quebec G1V 0A6, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Quebec, Quebec G1V 0A6, Canada
- PROTEO, Quebec, Quebec G1V 0A6, Canada
| |
|
43
|
Prabh N, Rödelsperger C. De Novo, Divergence, and Mixed Origin Contribute to the Emergence of Orphan Genes in Pristionchus Nematodes. G3 (Bethesda, Md.) 2019; 9:2277-2286. [PMID: 31088903 PMCID: PMC6643871 DOI: 10.1534/g3.119.400326] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 02/04/2019] [Accepted: 05/11/2019] [Indexed: 12/30/2022]
Abstract
Homology is a fundamental concept in comparative biology. It is extensively used at the sequence level to make phylogenetic hypotheses and functional inferences. Nonetheless, the majority of eukaryotic genomes contain large numbers of orphan genes lacking homologs in other taxa. Generally, the fraction of orphan genes is higher in genomically undersampled clades, and in the absence of closely related genomes any hypothesis about their origin and evolution remains untestable. Previously, we sequenced ten genomes with an underlying ladder-like phylogeny to establish a phylogenomic framework for studying genome evolution in diplogastrid nematodes. Here, we use this deeply sampled data set to understand the processes that generate orphan genes in our focal species Pristionchus pacificus. Based on phylostratigraphic analysis and additional bioinformatic filters, we obtained 29 high-confidence candidate genes for which mechanisms of orphan origin were proposed based on manual inspection. This revealed diverse mechanisms including annotation artifacts, chimeric origin, alternative reading frame usage, and gene splitting with subsequent gain of de novo exons. In addition, we present two cases of complete de novo origination from non-coding regions, which represents one of the first reports of de novo genes in nematodes. Thus, we conclude that de novo emergence, divergence, and mixed mechanisms contribute to novel gene formation in Pristionchus nematodes.
Affiliation(s)
- Neel Prabh
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, August Thienemann Str. 2, 24306 Plön, Germany
| | - Christian Rödelsperger
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
| |
|
44
|
Schlub TE, Buchmann JP, Holmes EC. A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 2019; 35:2572-2581. [PMID: 30099499 PMCID: PMC6188560 DOI: 10.1093/molbev/msy155] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and find that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction suggesting there are limitations in our current understanding of RNA virus replication.
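The randomization idea in this abstract can be sketched in a few lines. The abstract does not say how the sequences are randomized, so the code below makes an assumption: it shuffles the codons of the annotated frame (frame 0) and asks how often the longest stop-free run in an alternative frame is at least as long as the observed one. All function names and parameters are hypothetical, and this is not the authors' implementation.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def longest_stop_free_run(seq, frame):
    """Length, in codons, of the longest run without a stop codon in `frame`."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def shuffle_codons(seq, rng):
    """Permute the frame-0 codons, keeping the codon composition unchanged."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    rng.shuffle(codons)
    return "".join(codons)


def orf_length_p_value(seq, frame, n_shuffles=1000, seed=0):
    """Empirical p-value for the observed stop-free run length in `frame`.

    Assumed null model: codon order in frame 0 is random.
    """
    rng = random.Random(seed)
    observed = longest_stop_free_run(seq, frame)
    hits = sum(
        longest_stop_free_run(shuffle_codons(seq, rng), frame) >= observed
        for _ in range(n_shuffles)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)
```

In use, `orf_length_p_value(gene_sequence, frame=1)` would flag a +1-frame ORF as a candidate overlapping gene when the p-value is small. A real analysis would also need to handle the reverse strand and multiple testing across genomes.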
Affiliation(s)
- Timothy E Schlub
- Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
| | - Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
| | - Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
| |
|
45
|
Affiliation(s)
- Stephen Branden Van Oss
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
| |
|
46
|
Pavesi A. Asymmetric evolution in viral overlapping genes is a source of selective protein adaptation. Virology 2019; 532:39-47. [PMID: 31004987 PMCID: PMC7125799 DOI: 10.1016/j.virol.2019.03.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/03/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/29/2022]
Abstract
Overlapping genes represent an intriguing puzzle, as they encode two proteins whose ability to evolve is constrained by each other. Overlapping genes can undergo “symmetric evolution” (similar selection pressures on the two proteins) or “asymmetric evolution” (significantly different selection pressures on the two proteins). By sequence analysis of 75 pairs of homologous viral overlapping genes, I evaluated their accordance with one or the other model. Analysis of nucleotide and amino acid sequences revealed that half of overlaps undergo asymmetric evolution, as the protein from one frame shows a number of substitutions significantly higher than that of the protein from the other frame. Interestingly, the most variable protein (often known to interact with the host proteins) appeared to be encoded by the de novo frame in all cases examined. These findings suggest that overlapping genes, besides increasing the coding ability of viruses, are also a source of selective protein adaptation. A dataset of 80 pairs of homologous overlapping genes from viruses is examined. Its analysis reveals that half of overlapping genes undergo asymmetric evolution. The most variable gene product is that encoded by the de novo overlapping gene. Overlapping genes evolving asymmetrically are a source of selective protein adaptation.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11/A, I-43124, Parma, Italy.
| |
|
47
|
Pavesi A, Vianelli A, Chirico N, Bao Y, Blinkova O, Belshaw R, Firth A, Karlin D. Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes. PLoS One 2018; 13:e0202513. [PMID: 30339683 PMCID: PMC6195259 DOI: 10.1371/journal.pone.0202513] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/22/2018] [Accepted: 08/03/2018] [Indexed: 11/19/2022]
Abstract
Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Therefore, overlapping genes need to be correctly detected, especially since they are now thought to be abundant also in eukaryotes. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may either favour the birth of overlapping genes, or/and result from selection pressure acting on them.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
| | - Alberto Vianelli
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Nicola Chirico
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
| | - Yiming Bao
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Olga Blinkova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
| | - Robert Belshaw
- School of Biomedical & Healthcare Sciences, Plymouth University Peninsula Schools of Medicine and Dentistry (PUPSMD), Plymouth, United Kingdom
| | - Andrew Firth
- Department of Pathology, Division of Virology, University of Cambridge, Cambridge, United Kingdom
| | - David Karlin
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Division of Structural Biology, University of Oxford, Oxford, United Kingdom
| |
|
48
|
Stewart H, Brown K, Dinan AM, Irigoyen N, Snijder EJ, Firth AE. Transcriptional and Translational Landscape of Equine Torovirus. J Virol 2018; 92:e00589-18. [PMID: 29950409 PMCID: PMC6096809 DOI: 10.1128/jvi.00589-18] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/06/2018] [Accepted: 06/13/2018] [Indexed: 12/15/2022]
Abstract
The genus Torovirus (subfamily Torovirinae, family Coronaviridae, order Nidovirales) encompasses a range of species that infect domestic ungulates, including cattle, sheep, goats, pigs, and horses, causing an acute self-limiting gastroenteritis. Using the prototype species equine torovirus (EToV), we performed parallel RNA sequencing (RNA-seq) and ribosome profiling (Ribo-seq) to analyze the relative expression levels of the known torovirus proteins and transcripts, chimeric sequences produced via discontinuous RNA synthesis (a characteristic of the nidovirus replication cycle), and changes in host transcription and translation as a result of EToV infection. RNA sequencing confirmed that EToV utilizes a unique combination of discontinuous and nondiscontinuous RNA synthesis to produce its subgenomic RNAs (sgRNAs); indeed, we identified transcripts arising from both mechanisms that would result in sgRNAs encoding the nucleocapsid. Our ribosome profiling analysis revealed that ribosomes efficiently translate two novel CUG-initiated open reading frames (ORFs), located within the so-called 5' untranslated region. We have termed the resulting proteins U1 and U2. Comparative genomic analysis confirmed that these ORFs are conserved across all available torovirus sequences, and the inferred amino acid sequences are subject to purifying selection, indicating that U1 and U2 are functionally relevant. This study provides the first high-resolution analysis of transcription and translation in this neglected group of livestock pathogens. IMPORTANCE Toroviruses infect cattle, goats, pigs, and horses worldwide and can cause gastrointestinal disease. There is no treatment or vaccine, and their ability to spill over into humans has not been assessed. These viruses are related to important human pathogens, including severe acute respiratory syndrome (SARS) coronavirus, and they share some common features; however, the mechanism that they use to produce sgRNA molecules differs.
Here, we performed deep sequencing to determine how equine torovirus produces sgRNAs. In doing so, we also identified two previously unknown open reading frames "hidden" within the genome. Together these results highlight the similarities and differences between this domestic animal virus and related pathogens of humans and livestock.
Affiliation(s)
- Hazel Stewart
- Division of Virology, Department of Pathology, University of Cambridge, Cambridge, United Kingdom
| | - Katherine Brown
- Division of Virology, Department of Pathology, University of Cambridge, Cambridge, United Kingdom
| | - Adam M Dinan
- Division of Virology, Department of Pathology, University of Cambridge, Cambridge, United Kingdom
| | - Nerea Irigoyen
- Division of Virology, Department of Pathology, University of Cambridge, Cambridge, United Kingdom
| | - Eric J Snijder
- Molecular Virology Laboratory, Department of Medical Microbiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Andrew E Firth
- Division of Virology, Department of Pathology, University of Cambridge, Cambridge, United Kingdom
| |
|
49
|
Willis S, Masel J. Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes. Genetics 2018; 210:303-313. [PMID: 30026186 PMCID: PMC6116962 DOI: 10.1534/genetics.118.301249] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 06/11/2018] [Accepted: 07/18/2018] [Indexed: 11/18/2022]
Abstract
The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual-coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains [Formula: see text] or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
Affiliation(s)
- Sara Willis
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
|
50
|
Abstract
De novo genes are very important for evolutionary innovation. However, how these genes originate and spread remains largely unknown. To better understand this, we rigorously searched for de novo genes in Saccharomyces cerevisiae S288C and examined their spread and fixation in the population. Here, we identified 84 de novo genes in S. cerevisiae S288C since the divergence with their sister groups. Transcriptome and ribosome profiling data revealed at least 8 (10%) and 28 (33%) de novo genes being expressed and translated only under specific conditions, respectively. DNA microarray data, based on 2-fold change, showed that 87% of the de novo genes are regulated during various biological processes, such as nutrient utilization and sporulation. Our comparative and evolutionary analyses further revealed that some factors, including single nucleotide polymorphism (SNP)/indel mutation, high GC content, and DNA shuffling, contribute to the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we also provide evidence suggesting the possible parallel origin of a de novo gene between S. cerevisiae and Saccharomyces paradoxus. Together, our study provides several new insights into the origin and spread of de novo genes.
Emergence of de novo genes has occurred in many lineages during evolution, but the birth, spread, and function of these genes remain unresolved. Here we have searched for de novo genes from Saccharomyces cerevisiae S288C using rigorous methods, which reduced the effects of bad annotation and genomic gaps on the identification of de novo genes. Through this analysis, we have found 84 new genes originating de novo from previously noncoding regions, 87% of which are very likely involved in various biological processes. We noticed that 10% and 33% of de novo genes were only expressed and translated under specific conditions; therefore, verification of de novo genes through transcriptome and ribosome profiling, especially from limited expression data, may underestimate the number of bona fide new genes. We further show that SNP/indel mutation, high GC content, and DNA shuffling could be involved in the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we provide evidence suggesting the possible parallel origin of a new gene.
|