1
Karlin DG. Parvovirus B19 and Human Parvovirus 4 Encode Similar Proteins in a Reading Frame Overlapping the VP1 Capsid Gene. Viruses 2024; 16:191. [PMID: 38399966 PMCID: PMC10891878 DOI: 10.3390/v16020191] [Received: 09/12/2023] [Revised: 01/12/2024] [Accepted: 01/24/2024]
Abstract
Viruses frequently contain overlapping genes, which encode functionally unrelated proteins from the same DNA or RNA region but in different reading frames. Yet, overlapping genes are often overlooked during genome annotation, in particular in DNA viruses. Here we looked for the presence of overlapping genes likely to encode a functional protein in human parvovirus B19 (genus Erythroparvovirus), using Synplot2, an experimentally validated software tool. Synplot2 detected an open reading frame, X, conserved in all erythroparvoviruses, which overlaps the VP1 capsid gene and is under highly significant selection pressure. In a related virus, human parvovirus 4 (genus Tetraparvovirus), Synplot2 also detected an open reading frame under highly significant selection pressure, ARF1, which overlaps the VP1 gene and is conserved in all tetraparvoviruses. These findings provide compelling evidence that the X and ARF1 proteins must be expressed and functional. X and ARF1 have the exact same location (they overlap the region of the VP1 gene encoding the phospholipase A2 domain), are both in the same frame (+1) with respect to the VP1 frame, and encode proteins with similar predicted properties, including a central transmembrane region. Further studies will be needed to determine whether they have a common origin and similar function. X and ARF1 are probably translated either from a polycistronic mRNA by a non-canonical mechanism, or from an unmapped monocistronic mRNA. Finally, we also discovered proteins predicted to be expressed from a frame overlapping VP1 in other species related to parvovirus B19: porcine parvovirus 2 (Z protein) and bovine parvovirus 3 (X-like protein).
Affiliation(s)
- David G. Karlin
- Division Phytomedicine, Thaer-Institute of Agricultural and Horticultural Sciences, Humboldt-Universität zu Berlin, Lentzeallee 55/57, D-14195 Berlin, Germany;
- Independent Researcher, 13000 Marseille, France
2
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021]
Abstract
During their long evolutionary history viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. Origin of overlapping genes, what factors favor their birth and retention, and how they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
3
Unconventional viral gene expression mechanisms as therapeutic targets. Nature 2021; 593:362-371. [PMID: 34012080 DOI: 10.1038/s41586-021-03511-5] [Received: 06/08/2020] [Accepted: 03/22/2021]
Abstract
Unlike the human genome that comprises mostly noncoding and regulatory sequences, viruses have evolved under the constraints of maintaining a small genome size while expanding the efficiency of their coding and regulatory sequences. As a result, viruses use strategies of transcription and translation in which one or more of the steps in the conventional gene-protein production line are altered. These alternative strategies of viral gene expression (also known as gene recoding) can be uniquely brought about by dedicated viral enzymes or by co-opting host factors (known as host dependencies). Targeting these unique enzymatic activities and host factors exposes vulnerabilities of a virus and provides a paradigm for the design of novel antiviral therapies. In this Review, we describe the types and mechanisms of unconventional gene and protein expression in viruses, and provide a perspective on how future basic mechanistic work could inform translational efforts that are aimed at viral eradication.
4
Du Y, Ji C, Liu T, Zhang W, Fang Q, Dong Q, Li M, Wang H, Chen Y, Ouyang K, Wei Z, Huang W. Identification of a novel protein in porcine astrovirus that is important for virus replication. Vet Microbiol 2021; 255:108984. [PMID: 33684827 DOI: 10.1016/j.vetmic.2021.108984] [Received: 10/19/2020] [Accepted: 01/07/2021]
Abstract
Overlapping genes are common in some RNA viruses. It has been proposed that ORFX, here termed ORF2b, which overlaps the ORF2 coding sequence in astroviruses, is a potential overlapping gene. The aim of this study was to determine whether ORF2b is an overlapping gene that encodes a functional protein needed for viral replication. Sequence alignment showed that the porcine astrovirus type 1 (PAstV1) strain PAstV1-GX1 contains an ORF2b embedded within the larger ORF2. The ORF2b AUG codon is located 19 nucleotides downstream of the ORF2 initiation site; ORF2b comprises 369 nucleotides and encodes a predicted 122-amino-acid protein. A specific polyclonal antibody against the ORF2b protein was raised and used to demonstrate expression of the newly identified gene in virus-infected and pCAGGS-ORF2b-transfected cells. Analysis of purified virions revealed that the ORF2b protein was not incorporated into virus particles. Reverse genetics based on a PAstV type 1 infectious cDNA clone showed that the ORF2b protein was not essential but was important for optimal virus infectivity. Knockout of the putative ORF2b stop codon demonstrated that the C-terminus of the ORF2b protein can be extended by 170 amino acids, suggesting that the C-terminus of the newly identified ORF2b protein may be variable.
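A quick consistency check on the figures above, written as a sketch with hypothetical helper names: a start codon 19 nt downstream of the ORF2 start is offset by 19 ≡ 1 (mod 3), so it is read in the +1 frame relative to ORF2 (the result is the same if the 19 nt are counted from the end of the ORF2 AUG, since 22 ≡ 1 (mod 3)). The 369 nt correspond to 123 codons, i.e. 122 amino acids, assuming the stated length includes the stop codon.

```python
def relative_frame(ref_start: int, alt_start: int) -> int:
    """Frame (0, 1 or 2) of an alternative ORF relative to a reference ORF,
    given start coordinates on the same strand (hypothetical helper)."""
    return (alt_start - ref_start) % 3


def encoded_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_nt` nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    # A stop codon terminates translation without adding an amino acid.
    return codons - 1 if includes_stop else codons


# Figures from the abstract; the ORF2 start coordinate is arbitrary (hypothetical).
orf2_start = 1
orf2b_start = orf2_start + 19  # ORF2b AUG, 19 nt downstream of the ORF2 start
print(relative_frame(orf2_start, orf2b_start))  # 1, i.e. the +1 frame
print(encoded_length(369))                      # 122
```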
Affiliation(s)
- Yanjie Du
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Chengyuan Ji
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Teng Liu
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Wenchao Zhang
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Qingli Fang
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Qinting Dong
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Mingyang Li
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Hao Wang
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Ying Chen
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Kang Ouyang
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Zuzhang Wei
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
- Weijian Huang
- College of Animal Science and Technology, Guangxi University, No.100 Daxue Road, Nanning 530005, China
5
Orientation-dependent toxic effect of human papillomavirus type 33 long control region DNA in Escherichia coli cells. Virus Genes 2020; 56:298-305. [PMID: 32246353 PMCID: PMC7220894 DOI: 10.1007/s11262-020-01754-4] [Received: 01/21/2020] [Accepted: 03/20/2020]
Abstract
The functional analysis of human papillomavirus (HPV) sequence variation requires the molecular cloning of different genomic regions of virus variants. In this study, we report an unexpected difficulty experienced when trying to clone HPV33 long control region (LCR) variants in Escherichia coli. Standard cloning strategies proved to be inappropriate to clone HPV33 LCR variants in the forward orientation into a eukaryotic reporter vector (pGL2-Basic). However, by slight modification of culture conditions (incubation at 25 °C instead of 37 °C), constructs containing the HPV33 LCR variants in the forward orientation were obtained. Transformation experiments performed with different HPV33 LCR constructs indicated that there is a sequence element in the 5′ LCR of HPV33 causing a temperature-dependent toxic effect in E. coli. Sequence analysis revealed the presence of an open reading frame (ORF) in the 5′ part of HPV33 LCR potentially encoding a 116-amino acid polypeptide. Protein structure prediction suggested that this putative protein might have a structural similarity to transmembrane proteins. Even low-level expression of this protein may cause significant toxicity in the host bacteria. In silico analysis of the LCR of HPV33 and some other HPV types belonging to the species Alphapapillomavirus 9 (HPV31, 35 and 58) seemed to support the assumption that the ORFs found in the 5′ LCR of these HPVs are protein-coding sequences. Further studies should be performed to prove that these putative proteins are really expressed in the infected host cells and to identify their function.
6
Schlub TE, Buchmann JP, Holmes EC. A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 2019; 35:2572-2581. [PMID: 30099499 PMCID: PMC6188560 DOI: 10.1093/molbev/msy155]
Abstract
Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and found that it has both high sensitivity and a low false discovery rate for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are encoded in the antisense direction, suggesting that there are limitations in our current understanding of RNA virus replication.
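The randomization idea can be sketched in a few lines. This is a simplified illustration, not the authors' implementation. It assumes the annotated gene is given as an uppercase ACGT coding sequence, measures "ORF length" as the longest stop-free run of codons in the alternative frame (start codons are ignored), and builds the null distribution by shuffling synonymous codons, which keeps the encoded protein and codon usage unchanged. All function names are hypothetical.

```python
import random
from collections import defaultdict

# Standard genetic code, codons ordered TCAG x TCAG x TCAG ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def longest_orf_in_frame(seq: str, frame: int) -> int:
    """Longest run of consecutive non-stop codons in the given frame.

    A simple proxy for ORF length: start codons are ignored.
    """
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def synonymous_shuffle(cds: str, rng: random.Random) -> str:
    """Permute codons among positions encoding the same amino acid.

    The encoded protein and codon usage of `cds` are preserved, while the
    sequence read in the other frames is randomized.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def orf_length_pvalue(cds: str, frame: int = 1, n_iter: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: fraction of shuffled genes whose ORF in `frame`
    is at least as long as the observed one (with +1 smoothing)."""
    rng = random.Random(seed)
    observed = longest_orf_in_frame(cds, frame)
    hits = sum(
        longest_orf_in_frame(synonymous_shuffle(cds, rng), frame) >= observed
        for _ in range(n_iter)
    )
    return (hits + 1) / (n_iter + 1)
```

A small p-value flags an overlapping frame whose ORF is longer than expected by chance. The published method may randomize sequences and define ORFs differently; this sketch only illustrates the logic of comparing an observed ORF length against a randomized expectation.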
Affiliation(s)
- Timothy E Schlub
- Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
7
Kongari R, Rajaure M, Cahill J, Rasche E, Mijalis E, Berry J, Young R. Phage spanins: diversity, topological dynamics and gene convergence. BMC Bioinformatics 2018; 19:326. [PMID: 30219026 PMCID: PMC6139136 DOI: 10.1186/s12859-018-2342-8] [Received: 01/19/2018] [Accepted: 08/28/2018]
Abstract
BACKGROUND Spanins are phage lysis proteins required to disrupt the outer membrane. Phages employ either two-component spanins or unimolecular spanins in this final step of Gram-negative host lysis. Two-component spanins like Rz-Rz1 from phage lambda consist of an integral inner membrane protein, the i-spanin, and an outer membrane lipoprotein, the o-spanin, that form a complex spanning the periplasm. Two-component spanins exist in three different genetic architectures: embedded, overlapped and separated. In contrast, the unimolecular spanins, like gp11 from phage T1, have an N-terminal lipoylation signal sequence and a C-terminal transmembrane domain to account for the topology requirements. Our proposed model for spanin function, for both spanin types, follows a common theme of the outer membrane getting fused with the inner membrane, effecting the release of progeny virions. RESULTS Here we present a SpaninDataBase which consists of 528 two-component spanins and 58 unimolecular spanins identified in this analysis. Primary analysis revealed significant differences in the secondary structure predictions for the periplasmic domains of the two-component and unimolecular spanin types, as well as within the three different genetic architectures of the two-component spanins. Using a threshold of 40% sequence identity over 40% sequence length, we were able to group the spanins into 143 i-spanin, 125 o-spanin and 13 u-spanin families. More than 40% of these families from each type were singletons, underlining the extreme diversity of this class of lysis proteins. Multiple sequence alignments of periplasmic domains demonstrated conserved secondary structure patterns and domain organization within family members. Furthermore, analysis of families with members from different architectures allowed us to interpret the evolutionary dynamics of spanin gene arrangement.
Also, the potential universal role of intermolecular disulfide bonds in two-component spanin function was substantiated through bioinformatic and genetic approaches. Additionally, a novel lipobox motif, AWAC, was identified and experimentally verified. CONCLUSIONS The findings from this bioinformatic approach gave us instructive insights into spanin function, evolution and domain organization, and provide a platform for future spanin annotation, as well as for biochemical and genetic experiments. They also establish that spanins, like viral membrane fusion proteins, adopt different strategies to achieve fusion of the inner and outer membranes.
Affiliation(s)
- Rohit Kongari
- Center for Phage Technology, Department of Biochemistry and Biophysics, Texas A&M University, 2128 TAMU, College Station, TX, 77843-2128, USA
- Jesse Cahill
- Center for Phage Technology, Department of Biochemistry and Biophysics, Texas A&M University, 2128 TAMU, College Station, TX, 77843-2128, USA
- Eric Rasche
- Center for Phage Technology, Department of Biochemistry and Biophysics, Texas A&M University, 2128 TAMU, College Station, TX, 77843-2128, USA
- Eleni Mijalis
- Center for Phage Technology, Department of Biochemistry and Biophysics, Texas A&M University, 2128 TAMU, College Station, TX, 77843-2128, USA
- Joel Berry
- University of California, San Francisco, CA, USA
- Ry Young
- Center for Phage Technology, Department of Biochemistry and Biophysics, Texas A&M University, 2128 TAMU, College Station, TX, 77843-2128, USA
8
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Wecko R, Simon S, Scherer S, Neuhaus K. A novel short L-arginine responsive protein-coding gene (laoB) antiparallel overlapping to a CadC-like transcriptional regulator in Escherichia coli O157:H7 Sakai originated by overprinting. BMC Evol Biol 2018; 18:21. [PMID: 29433444 PMCID: PMC5810103 DOI: 10.1186/s12862-018-1134-0] [Received: 06/22/2017] [Accepted: 01/31/2018]
Abstract
Background Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveal many signals originating from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail. Results A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame −2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115.
Conclusions Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame −2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB. Electronic supplementary material The online version of this article (10.1186/s12862-018-1134-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sarah M Hücker
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Fraunhofer ITEM-R, Am Biopark 9, 93053, Regensburg, Germany
- Sonja Vanderhaeghen
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Isabel Abellan-Schneyder
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Romy Wecko
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Svenja Simon
- Department of Computer and Information Science, University of Konstanz, Box 78, 78457, Konstanz, Germany
- Siegfried Scherer
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Klaus Neuhaus
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
9
Kirby LE, Koslowsky D. Mitochondrial dual-coding genes in Trypanosoma brucei. PLoS Negl Trop Dis 2017; 11:e0005989. [PMID: 28991908 PMCID: PMC5650466 DOI: 10.1371/journal.pntd.0005989] [Received: 05/15/2017] [Revised: 10/20/2017] [Accepted: 09/23/2017]
Abstract
Trypanosoma brucei is transmitted between mammalian hosts by the tsetse fly. In the mammal, the parasites are exclusively extracellular, continuously replicating within the bloodstream. During this stage, the mitochondrion lacks a functional electron transport chain (ETC). Successful transition to the fly requires activation of the ETC and ATP synthesis via oxidative phosphorylation. This life cycle leads to a major problem: in the bloodstream, the mitochondrial genes are not under selection and are subject to genetic drift that endangers their integrity. Exacerbating this, T. brucei undergoes repeated population bottlenecks as it evades the host immune system, which would create additional forces of genetic drift. These parasites possess several unique genetic features, including RNA editing of mitochondrial transcripts. RNA editing creates open reading frames by the guided insertion and deletion of U-residues within the mRNA. A major question in the field has been why this metabolically expensive system of RNA editing would evolve and persist. Here, we show that many of the edited mRNAs can alter the choice of start codon and the open reading frame by alternative editing of the 5’ end. Analyses of mutational bias indicate that six of the mitochondrial genes may be dual-coding and that RNA editing allows access to both reading frames. We hypothesize that dual-coding genes can protect genetic information by essentially hiding a non-selected gene within one that remains under selection. Thus, the complex RNA editing system found in the mitochondria of trypanosomes provides a unique molecular strategy to combat genetic drift in non-selective conditions. In African trypanosomes, many of the mitochondrial mRNAs require extensive RNA editing before they can be translated. During this process, each edited transcript can undergo hundreds of cleavage/ligation events as U-residues are inserted or deleted to generate a translatable open reading frame.
A major paradox has been why this incredibly metabolically expensive process would evolve and persist. In this work, we show that many of the mitochondrial genes in trypanosomes are dual-coding, utilizing different reading frames to potentially produce two very different proteins. Access to both reading frames is made possible by alternative editing of the 5’ end of the transcript. We hypothesize that dual-coding genes may work to protect the mitochondrial genes from mutations during growth in the mammalian host, when many of the mitochondrial genes are not being used. Thus, the complex RNA editing system may be maintained because it provides a unique molecular strategy to combat genetic drift.
Affiliation(s)
- Laura E. Kirby
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
- Donna Koslowsky
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
10
Lim CS, Brown CM. Hepatitis B virus nuclear export elements: RNA stem-loop α and β, key parts of the HBV post-transcriptional regulatory element. RNA Biol 2016; 13:743-7. [PMID: 27031749 DOI: 10.1080/15476286.2016.1166330]
Abstract
Many viruses contain RNA elements that modulate splicing and/or promote nuclear export of their RNAs. The RNAs of the major human pathogen hepatitis B virus (HBV) contain a large (~600 bases) composite cis-acting 'post-transcriptional regulatory element' (PRE). This element promotes expression from these naturally intronless transcripts. Indeed, the related woodchuck hepadnavirus PRE (WPRE) is used to enhance expression in gene therapy and other expression vectors. These PREs are likely to act through a combination of mechanisms, including promotion of RNA nuclear export. Functional components of both the HBV PRE and WPRE are 2 conserved RNA cis-acting stem-loop (SL) structures, SLα and SLβ. They lie within the coding region of the polymerase (P) gene and within both the P and X genes, respectively. Based on previous studies using mutagenesis and/or nuclear magnetic resonance (NMR), here we propose 2 covariance models for SLα and SLβ. The model for the 30-nucleotide SLα contains a G-bulge and a CNGG(U) apical loop of which the first and the fourth loop residues form a CG pair and the fifth loop residue is bulged out, as observed in the NMR structure. The model for the 23-nucleotide SLβ contains a 7-base-pair stem and a 9-nucleotide loop. Comparison of the models with other RNA structural elements, as well as similarity searches of the human transcriptome and viral genomes, demonstrate that SLα and SLβ are specific to HBV transcripts. However, they are well conserved among the hepadnaviruses of non-human primates, the woodchuck and the ground squirrel.
Affiliation(s)
- Chun Shen Lim
- Biochemistry and Genetics Otago, University of Otago, Dunedin, New Zealand
- Chris M Brown
- Biochemistry and Genetics Otago, University of Otago, Dunedin, New Zealand
11
Woo PCY, Lau SKP, Teng JLL, Tsang AKL, Joseph S, Xie J, Jose S, Fan RYY, Wernery U, Yuen KY. A novel astrovirus from dromedaries in the Middle East. J Gen Virol 2015; 96:2697-2707. [PMID: 26296576 DOI: 10.1099/jgv.0.000233]
Abstract
The recent emergence of Middle East respiratory syndrome coronavirus from the Middle East and its discovery from dromedary camels has boosted interest in the search for novel viruses in dromedaries. The existence of astroviruses (AstVs) in dromedaries was previously unknown. We describe the discovery of a novel dromedary camel AstV (DcAstV) from dromedaries in Dubai. Among 215 dromedaries, DcAstV was detected in faecal samples of four [three (1.5 %) adult dromedaries and one (8.3 %) dromedary calf] by reverse transcription-PCR. Sequencing of the four DcAstV genomes and phylogenetic analysis showed that the DcAstVs formed a distinct cluster. Although DcAstV was most closely related to a recently characterized porcine AstV 2, their capsid proteins only shared 60-66 % amino acid identity, with a mean amino acid genetic distance of 0.372. Notably, the N-terminal halves of the capsid proteins of DcAstV shared ≤ 85 % amino acid identity, but the C-terminal halves only shared ≤ 49 % amino acid identity compared with the corresponding proteins in other AstVs. A high variation of the genome sequences of DcAstV was also observed, with a mean amino acid genetic distance of 0.214 for ORF2 of the four strains. Recombination analysis revealed a possible recombination event in ORF2 of strain DcAstV-274. The low Ka/Ks ratios (number of non-synonymous substitutions per non-synonymous site to number of synonymous substitutions per synonymous site) of the four ORFs in the DcAstV genomes supported the suggestion that dromedaries are the natural reservoir where AstV is stably evolving. These results suggest that AstV is a novel species of the genus Mamastrovirus in the family Astroviridae. Further studies are important to understand the pathogenic potential of DcAstV.
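The Ka/Ks ratio defined in parentheses above can be written as a ratio of per-site rates. In its simplest, uncorrected form, with $N_d$ and $S_d$ the observed non-synonymous and synonymous differences and $N$ and $S$ the numbers of non-synonymous and synonymous sites (symbols introduced here for illustration only), it reads as below. Published estimators usually also correct these proportions for multiple substitutions at the same site, so this is an approximation rather than the exact estimator used in the study.

```latex
\frac{K_a}{K_s} \approx \frac{p_N}{p_S} = \frac{N_d / N}{S_d / S}
```

A ratio below 1 is conventionally read as purifying selection, which is the sense in which low Ka/Ks values support stable evolution of the virus in its host.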
Affiliation(s)
- Patrick C Y Woo
- State Key Laboratory of Emerging Infectious Diseases, The University of Hong Kong, Hong Kong, PR China; Carol Yu Centre for Infection, The University of Hong Kong, Hong Kong, PR China; Department of Microbiology, The University of Hong Kong, Hong Kong, PR China; Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong, PR China
- Susanna K P Lau
- Carol Yu Centre for Infection, The University of Hong Kong, Hong Kong, PR China; Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong, PR China; State Key Laboratory of Emerging Infectious Diseases, The University of Hong Kong, Hong Kong, PR China; Department of Microbiology, The University of Hong Kong, Hong Kong, PR China
- Jade L L Teng
- Department of Microbiology, The University of Hong Kong, Hong Kong, PR China
- Alan K L Tsang
- Department of Microbiology, The University of Hong Kong, Hong Kong, PR China
- Sunitha Joseph
- Central Veterinary Research Laboratory, Dubai, United Arab Emirates
- Jun Xie
- Department of Microbiology, The University of Hong Kong, Hong Kong, PR China
- Shanty Jose
- Central Veterinary Research Laboratory, Dubai, United Arab Emirates
- Rachel Y Y Fan
- Department of Microbiology, The University of Hong Kong, Hong Kong, PR China
- Ulrich Wernery
- Central Veterinary Research Laboratory, Dubai, United Arab Emirates
- Kwok-Yung Yuen
- Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong, PR China; Carol Yu Centre for Infection, The University of Hong Kong, Hong Kong, PR China; Department of Microbiology, The University of Hong Kong, Hong Kong, PR China; State Key Laboratory of Emerging Infectious Diseases, The University of Hong Kong, Hong Kong, PR China
12
Batista ARS, Nicolini C, Rodrigues KB, Melo FL, Vasques RM, de Macêdo MA, Inoue-Nagata AK, Nagata T. Unique RNA 2 sequences of two Brazilian isolates of Pepper ringspot virus, a tobravirus. Virus Genes 2014; 49:169-73. [DOI: 10.1007/s11262-014-1066-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/28/2014] [Accepted: 04/01/2014] [Indexed: 10/25/2022]
13
Viral proteins originated de novo by overprinting can be identified by codon usage: application to the "gene nursery" of Deltaretroviruses. PLoS Comput Biol 2013; 9:e1003162. [PMID: 23966842 PMCID: PMC3744397 DOI: 10.1371/journal.pcbi.1003162] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 12/26/2012] [Accepted: 06/13/2013] [Indexed: 12/24/2022]
Abstract
A well-known mechanism through which new protein-coding genes originate is by modification of pre-existing genes, e.g. by duplication or horizontal transfer. In contrast, many viruses generate protein-coding genes de novo, via the overprinting of a new reading frame onto an existing (“ancestral”) frame. This mechanism is thought to play an important role in viral pathogenicity, but has been poorly explored, perhaps because identifying the de novo frames is very challenging. Therefore, a new approach to detect them was needed. We assembled a reference set of overlapping genes for which we could reliably determine the ancestral frames, and found that their codon usage was significantly closer to that of the rest of the viral genome than the codon usage of de novo frames. Based on this observation, we designed a method that allowed the identification of de novo frames based on their codon usage with a very good specificity, but intermediate sensitivity. Using our method, we predicted that the Rex gene of deltaretroviruses has originated de novo by overprinting the Tax gene. Intriguingly, several genes in the same genomic region have also originated de novo and encode proteins that regulate the functions of Tax. Such “gene nurseries” may be common in viral genomes. Finally, our results confirm that the genomic GC content is not the only determinant of codon usage in viruses and suggest that a constraint linked to translation must influence codon usage. How does novelty originate in nature? It is commonly thought that new genes are generated mainly by modifications of existing genes (the “tinkering” model). In contrast, we have shown recently that in viruses, numerous genes are generated entirely de novo (“from scratch”). The role of these genes remains underexplored, however, because they are difficult to identify. 
We have therefore developed a new method to detect genes originated de novo in viral genomes, based on the observation that each viral genome has a unique “signature” (its codon usage), which genes originated de novo do not share. We applied this method to analyze the genes of Human T-Lymphotropic Virus 1 (HTLV1), a relative of HIV and itself a major human pathogen that infects about twenty million people worldwide. The life cycle of HTLV1 is finely regulated: it can stay dormant for long periods and can provoke blood cancers (leukemias) after a very long incubation. We discovered that several of the genes of HTLV1 have originated de novo. These novel genes play a key role in regulating the life cycle of HTLV1, and presumably its pathogenicity. Our investigations suggest that such “gene nurseries” may be common in viruses.
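The central observation above is that an ancestral frame's codon usage sits closer to the rest of the genome than a de novo frame's does. The sketch below shows that comparison under loudly stated assumptions. Codon usage is reduced to relative codon frequencies, and the two profiles are compared by plain Euclidean distance. The published method uses its own statistic, which this does not reproduce. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def codon_frequencies(seq, offset=0):
    """Relative frequency of each codon read from `offset` in one frame."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons in this frame")
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_distance(freq_a, freq_b):
    """Euclidean distance between two codon-frequency profiles (assumed metric)."""
    codons = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) ** 2
                         for c in codons))


def more_atypical_frame(overlap, offset_a, offset_b, background):
    """Return the offset whose codon usage is farther from the genome background.

    Under the observation summarized above, that frame is the better
    candidate for having originated de novo by overprinting.
    """
    bg = codon_frequencies(background)
    d_a = codon_usage_distance(codon_frequencies(overlap, offset_a), bg)
    d_b = codon_usage_distance(codon_frequencies(overlap, offset_b), bg)
    return offset_a if d_a > d_b else offset_b
```

In practice `overlap` would be the shared nucleotide region read in its two frames, and `background` would be the remaining non-overlapping genes of the same genome. On toy data such as `more_atypical_frame("GCT" * 4, 0, 1, "GCT" * 4)`, frame 0 matches the background exactly, so offset 1 is returned.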
14
Kawano Y, Neeley S, Adachi K, Nakai H. An experimental and computational evolution-based method to study a mode of co-evolution of overlapping open reading frames in the AAV2 viral genome. PLoS One 2013; 8:e66211. [PMID: 23826091 PMCID: PMC3691236 DOI: 10.1371/journal.pone.0066211] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 02/15/2013] [Accepted: 05/07/2013] [Indexed: 02/07/2023]
Abstract
Overlapping open reading frames (ORFs) in viral genomes undergo co-evolution; however, how individual amino acids coded by overlapping ORFs are structurally, functionally, and co-evolutionarily constrained remains difficult to address by conventional homologous sequence alignment approaches. We report here a new experimental and computational evolution-based methodology to address this question and report its preliminary application to elucidating a mode of co-evolution of the frame-shifted overlapping ORFs in the adeno-associated virus (AAV) serotype 2 viral genome. These ORFs encode both capsid VP protein and non-structural assembly-activating protein (AAP). To show proof of principle of the new method, we focused on the evolutionarily conserved QVKEVTQ and KSKRSRR motifs, a pair of overlapping heptapeptides in VP and AAP, respectively. In the new method, we first identified a large number of capsid-forming VP3 mutants and functionally competent AAP mutants of these motifs from mutant libraries by experimental directed evolution under no co-evolutionary constraints. We used Illumina sequencing to obtain a large dataset and then statistically assessed the viability of VP and AAP heptapeptide mutants. The obtained heptapeptide information was then integrated into an evolutionary algorithm, with which VP and AAP were co-evolved from random or native nucleotide sequences in silico. As a result, we demonstrate that these two heptapeptide motifs could exhibit high degeneracy if coded by separate nucleotide sequences, and elucidate how overlap-evoked co-evolutionary constraints play a role in making the VP and AAP heptapeptide sequences into the present shape. 
Specifically, we demonstrate that two valine (V) residues and β-strand propensity in QVKEVTQ are structurally important, the strongly negative and hydrophilic nature of KSKRSRR is functionally important, and overlap-evoked co-evolution imposes strong constraints on serine (S) residues in KSKRSRR, despite high degeneracy of the motifs in the absence of co-evolutionary constraints.
Affiliation(s)
- Yasuhiro Kawano
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
- Takara Bio Inc., Otsu, Shiga, Japan
- Shane Neeley
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
- Kei Adachi
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
- Hiroyuki Nakai
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
15
Sabath N, Wagner A, Karlin D. Evolution of viral proteins originated de novo by overprinting. Mol Biol Evol 2012; 29:3767-80. [PMID: 22821011 PMCID: PMC3494269 DOI: 10.1093/molbev/mss179] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Indexed: 01/02/2023]
Abstract
New protein-coding genes can originate either through modification of existing genes or de novo. Recently, the importance of de novo origination has been recognized in eukaryotes, although eukaryotic genes originated de novo are relatively rare and difficult to identify. In contrast, viruses contain many de novo genes, namely those in which an existing gene has been “overprinted” by a new open reading frame, a process that generates a new protein-coding gene overlapping the ancestral gene. We analyzed the evolution of 12 experimentally validated viral genes that originated de novo and estimated their relative ages. We found that young de novo genes have a different codon usage from the rest of the genome. They evolve rapidly and are under positive or weak purifying selection. Thus, young de novo genes might have strain-specific functions, or no function, and would be difficult to detect using current genome annotation methods that rely on the sequence signature of purifying selection. In contrast to young de novo genes, older de novo genes have a codon usage that is similar to the rest of the genome. They evolve slowly and are under stronger purifying selection. Some of the oldest de novo genes evolve under stronger selection pressure than the ancestral gene they overlap, suggesting an evolutionary tug of war between the ancestral and the de novo gene.
Affiliation(s)
- Niv Sabath
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
16
Sabath N, Morris JS, Graur D. Is there a twelfth protein-coding gene in the genome of influenza A? A selection-based approach to the detection of overlapping genes in closely related sequences. J Mol Evol 2011; 73:305-15. [PMID: 22187135 DOI: 10.1007/s00239-011-9477-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/30/2011] [Accepted: 12/02/2011] [Indexed: 02/06/2023]
Abstract
Protein-coding genes often contain long overlapping open reading frames (ORFs), which may or may not be functional. Current methods that use the signature of purifying selection to detect functional overlapping genes are limited to the analysis of sequences from divergent species, which makes them inapplicable to genes found only in closely related sequences. Here, we present a method for detecting selection signatures on overlapping reading frames using closely related sequences, and apply it to several known overlapping genes and to an overlapping ORF on the negative strand of segment 8 of influenza A virus (NEG8), which has been suggested to be functional. We find no evidence that NEG8 is under selection, suggesting that the intact reading frame might be non-functional, although we cannot fully exclude the possibility that the method is not sensitive enough to detect the signature of selection acting on this gene. We describe the limitations of the method using known overlapping genes and suggest several approaches to improve it in future studies. Finally, we examine alternative explanations for the sequence conservation of NEG8 in the absence of selection. We show that overlap type and genomic context affect the conservation of intact overlapping ORFs and should therefore be considered in any attempt to estimate the signature of selection in overlapping genes.
Affiliation(s)
- Niv Sabath
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland.
17
Norovirus regulation of the innate immune response and apoptosis occurs via the product of the alternative open reading frame 4. PLoS Pathog 2011; 7:e1002413. [PMID: 22174679 PMCID: PMC3234229 DOI: 10.1371/journal.ppat.1002413] [Citation(s) in RCA: 174] [Impact Index Per Article: 12.4] [Received: 05/02/2011] [Accepted: 10/12/2011] [Indexed: 12/25/2022]
Abstract
Small RNA viruses have evolved many mechanisms to increase the capacity of their short genomes. Here we describe the identification and characterization of a novel open reading frame (ORF4) encoded by the murine norovirus (MNV) subgenomic RNA, in an alternative reading frame overlapping the VP1 coding region. ORF4 is translated during virus infection and the resultant protein localizes predominantly to the mitochondria. Using reverse genetics, we demonstrated that expression of ORF4 is not required for virus replication in tissue culture, but its loss results in a fitness cost, since viruses lacking the ability to express ORF4 restore expression upon repeated passage in tissue culture. Functional analysis indicated that the protein produced from ORF4 antagonizes the innate immune response to infection by delaying the upregulation of a number of cellular genes activated by the innate pathway, including IFN-β. Apoptosis in the RAW264.7 macrophage cell line was also increased during virus infection in the absence of ORF4 expression. In vivo analysis of the WT virus and of a mutant virus lacking the ability to express ORF4 demonstrated an important role for ORF4 expression in infection and virulence. STAT1-/- mice infected with a virus lacking the ability to express ORF4 showed a delay in the onset of clinical signs when compared to mice infected with WT virus. Quantitative PCR and histopathological analysis of samples from these infected mice demonstrated that infection with a virus not expressing ORF4 results in a delayed infection in this system. In light of these findings we propose the name virulence factor 1, VF1, for this protein. The identification of VF1 represents the first characterization of an alternative open reading frame protein for the calicivirus family. The immune regulatory function of the MNV VF1 protein provides important perspectives for future research into norovirus biology and pathogenesis.
This report describes the identification and characterization of a novel protein of unknown function encoded by a mouse virus genetically similar to human noroviruses. This gene is unique to the mouse virus and occupies the same part of the genome that codes for the major capsid protein. The protein that we have described as virulence factor 1 (VF1) is found in all murine norovirus isolates, is absent from all human strains, and is expressed during infection. Its expression enables MNV-1 to establish efficient infection of its natural host through interference with interferon-mediated response pathways and apoptosis. Our data indicate that the VF1 protein is multi-functional, with an ability to modulate the host's response to infection. Murine noroviruses are frequently used, first, as a model of human norovirus replication and pathogenesis, whose study is hampered by the inability of human noroviruses to replicate in cell culture. Second, persistent infection of laboratory animals with murine norovirus may affect other models of disease using experimental mice. The role of VF1 in infection, in pathology and in the differential outcome of infection is the subject of continued research in our laboratory.
18
Baranov PV, Wills NM, Barriscale KA, Firth AE, Jud MC, Letsou A, Manning G, Atkins JF. Programmed ribosomal frameshifting in the expression of the regulator of intestinal stem cell proliferation, adenomatous polyposis coli (APC). RNA Biol 2011; 8:637-47. [PMID: 21593603 PMCID: PMC3225980 DOI: 10.4161/rna.8.4.15395] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 01/16/2011] [Revised: 03/03/2011] [Accepted: 03/04/2011] [Indexed: 12/14/2022]
Abstract
A programmed ribosomal frameshift (PRF) in the decoding of APC (adenomatous polyposis coli) mRNA has been identified and characterized in Caenorhabditis worms, Drosophila and mosquitoes. The frameshift product lacks approximately the C-terminal third of the product of standard decoding and instead has a short sequence encoded by the -1 frame, which is just 13 residues in C. elegans but 125 in D. melanogaster. The frameshift site is A_AA.A_AA.C in Caenorhabditids, fruit flies and the mosquitoes studied, while a variant, A_AA.A_AA.A, is found in some other nematodes. The predicted secondary RNA structure of the downstream stimulators varies considerably in the species studied. In the twelve sequenced Drosophila genomes, it is a long stem with a four-way junction in its loop. In the five sequenced Caenorhabditis species, it is a short RNA pseudoknot with an additional stem in loop 1. The efficiency of frameshifting varies significantly, depending on the particular stimulator within the frameshift cassette, when tested with reporter constructs in rabbit reticulocyte lysates. Phylogenetic analysis of the distribution of APC programmed ribosomal frameshifting cassettes suggests that frameshifting has an ancient origin, and raises the question of whether alternative protein products are also synthesized during expression of APC in other organisms, such as humans. APC first emerged as a PRF candidate from a prior study of evolutionary signatures derived from comparative analysis of the 12 fly genomes. Three other proposed PRF candidates (Xbp1, CG32736, CG14047) with switches in conservation of reading frames are likely explained by mechanisms other than PRF.
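Both frameshift sites named above read as the heptamers AAAAAAC and AAAAAAA once the codon-boundary marks are dropped. A minimal sketch of scanning an mRNA for such heptamers follows. It assumes one commonly quoted simplified rule for eukaryotic -1 slippery sites, X XXY YYZ, with XXX three identical nucleotides, YYY either AAA or UUU, and Z any base except G. Real sites also need a downstream stimulator (the stem-loop or pseudoknot described above), and the rule does not capture every known site. The function name is hypothetical.

```python
import re

# Assumed simplified heptamer rule X XXY YYZ for -1 frameshift sites:
# group 2 is X (repeated three times), YYY is AAA or UUU, Z is A, C or U.
# The lookahead makes matches zero-width, so overlapping candidates are reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[ACU]))")


def find_slippery_sites(seq):
    """Return (0-based position, heptamer) pairs matching the simplified rule."""
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]


print(find_slippery_sites("GGAAAAAACGG"))  # [(2, 'AAAAAAC')]
```

The nematode variant also passes, because Z may be A: `find_slippery_sites("AAAAAAA")` returns `[(0, 'AAAAAAA')]`. A heptamer ending in G, such as UUUAAAG, is rejected.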
Affiliation(s)
- Pavel V Baranov
- Biochemistry Department, University College Cork, Cork, Ireland.
19
Sabath N, Graur D. Detection of functional overlapping genes: simulation and case studies. J Mol Evol 2010; 71:308-16. [PMID: 20820768 DOI: 10.1007/s00239-010-9386-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/18/2009] [Accepted: 07/26/2010] [Indexed: 12/16/2022]
Abstract
As far as protein-coding genes are concerned, there is a non-zero probability that at least one of the five possible overlapping sequences of any gene will contain an open reading frame (ORF) of a length that may be suitable for coding a functional protein. It is, however, very difficult to determine whether or not such an ORF is functional. Recently, we proposed a method that predicts the functionality of an overlapping ORF if it can be shown that it has been subject to purifying selection during its evolution. Here, we use simulation to test this method under several conditions and compare it with the method of Firth and Brown. We found that under most conditions, our method detects functional overlapping genes with higher sensitivity than Firth and Brown's method, while maintaining high specificity. Further, we tested the hypothesis that the two aminoacyl tRNA synthetase classes originated from a pair of overlapping genes. A central piece of evidence ostensibly supporting this hypothesis is the assertion that an overlapping ORF of a heat-shock protein-70 gene, which exhibits some similarity to class 2 aminoacyl tRNA synthetases, is functional. We found a signature of purifying selection only in highly divergent sequences, suggesting that the method yields false positives at high sequence divergence and that the overlapping ORF is not a functional gene. Finally, we examined three cases of overlap in the human genome. We find varying signatures of purifying selection acting on these overlaps, raising the possibility that two of the overlapping genes may not be functional.
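The "five possible overlapping sequences" of a gene are the two shifted frames on its coding strand plus the three frames of its reverse complement. A minimal sketch that enumerates them and reports the longest open stretch in each follows. It assumes a stop-to-stop definition of an ORF with no start-codon requirement. The frame labels and function names are hypothetical. The sketch only lists candidate ORFs; deciding whether one is functional is the job of the selection-based tests discussed above.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq, offset):
    """Longest run of consecutive non-stop codons in one frame (stop-to-stop ORF)."""
    best = run = 0
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def overlapping_frames(gene):
    """Longest open stretch, in codons, in each of the five frames overlapping a gene.

    Hypothetical labels: "+1" and "+2" are the shifted frames on the coding
    strand; "rc0", "rc1" and "rc2" are the frames of the reverse complement.
    """
    gene = gene.upper().replace("U", "T")
    rc = reverse_complement(gene)
    result = {"+1": longest_open_stretch(gene, 1),
              "+2": longest_open_stretch(gene, 2)}
    for offset in range(3):
        result[f"rc{offset}"] = longest_open_stretch(rc, offset)
    return result
```

Applied to a real gene, any frame whose longest stretch is long enough to encode a protein is a candidate overlapping ORF. The method described in the abstract would then test that candidate for purifying selection.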
Affiliation(s)
- Niv Sabath
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA.
20
Firth AE, Atkins JF. Candidates in Astroviruses, Seadornaviruses, Cytorhabdoviruses and Coronaviruses for +1 frame overlapping genes accessed by leaky scanning. Virol J 2010; 7:17. [PMID: 20100346 PMCID: PMC2832772 DOI: 10.1186/1743-422x-7-17] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 11/25/2009] [Accepted: 01/25/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Overlapping genes are common in RNA viruses where they serve as a mechanism to optimize the coding potential of compact genomes. However, annotation of overlapping genes can be difficult using conventional gene-finding software. Recently we have been using a number of complementary approaches to systematically identify previously undetected overlapping genes in RNA virus genomes. In this article we gather together a number of promising candidate new overlapping genes that may be of interest to the community. RESULTS Overlapping gene predictions are presented for the astroviruses, seadornaviruses, cytorhabdoviruses and coronaviruses (families Astroviridae, Reoviridae, Rhabdoviridae and Coronaviridae, respectively).
Affiliation(s)
- Andrew E Firth
- BioSciences Institute, University College Cork, Cork, Ireland.
21
Firth AE, Atkins JF. Evidence for a novel coding sequence overlapping the 5'-terminal approximately 90 codons of the gill-associated and yellow head okavirus envelope glycoprotein gene. Virol J 2009; 6:222. [PMID: 20017924 PMCID: PMC2805633 DOI: 10.1186/1743-422x-6-222] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/01/2009] [Accepted: 12/17/2009] [Indexed: 11/23/2022]
Abstract
The genus Okavirus (order Nidovirales) includes a number of viruses that infect crustaceans, causing major losses in the shrimp industry. These viruses have a linear positive-sense ssRNA genome of ~26-27 kb, encoding a large replicase polyprotein that is expressed from the genomic RNA, and several additional proteins that are expressed from a nested set of 3'-coterminal subgenomic RNAs. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF that overlaps the 5' end of the envelope glycoprotein encoding sequence, ORF3, in the +2 reading frame. The new ORF has a strong coding signature and, in fact, is more conserved at the amino acid level than the overlapping region of ORF3. We propose that translation of the new ORF initiates at a conserved AUG codon separated by just 2 nt from the ORF3 AUG initiation codon, resulting in a novel 86 amino acid protein.
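The frame assignment above can be checked with simple modular arithmetic. This sketch assumes that "separated by just 2 nt" means the layout AUG-N-N-AUG, with the new AUG downstream of the ORF3 AUG. The coordinate 100 is made up for illustration, and the helper name is hypothetical. A start codon 3 + 2 = 5 nt downstream lies in the +2 frame, since 5 mod 3 = 2. The same AUG placed 5 nt upstream would lie in the +1 frame instead, since (-5) mod 3 = 1. So only the downstream reading agrees with the stated +2 frame.

```python
def relative_frame(ref_start, alt_start):
    """Frame of a codon starting at alt_start relative to one starting at ref_start.

    0 = same frame, 1 = +1 frame, 2 = +2 frame (positions in nucleotides).
    Python's % always returns a non-negative result here, so upstream
    positions are handled correctly too.
    """
    return (alt_start - ref_start) % 3


orf3_aug = 100                # hypothetical coordinate of the ORF3 AUG
new_aug = orf3_aug + 3 + 2    # the ORF3 AUG, then 2 nt, then the new AUG
print(relative_frame(orf3_aug, new_aug))  # 2, i.e. the +2 frame
```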
Affiliation(s)
- Andrew E Firth
- BioSciences Institute, University College Cork, Cork, Ireland.
22
Clifford M, Twigg J, Upton C. Evidence for a novel gene associated with human influenza A viruses. Virol J 2009; 6:198. [PMID: 19917120 PMCID: PMC2780412 DOI: 10.1186/1743-422x-6-198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 10/15/2009] [Accepted: 11/16/2009] [Indexed: 02/06/2023]
Abstract
BACKGROUND Influenza A virus genomes comprise 8 negative-strand, single-stranded RNA segments and are thought to encode 11 proteins, which are all translated from mRNAs complementary to the genomic strands. Although human, swine and avian influenza A viruses are very similar, cross-species infections are usually limited. However, antigenic differences are considerable, and when viruses become established in a different host, or when novel viruses are created by reassortment, devastating pandemics may arise. RESULTS Examination of influenza A virus genomes from the early 20th century revealed the association of a 167-codon ORF encoded by the genomic strand of segment 8 with human isolates. Close to the timing of the 1948 pseudopandemic, a mutation occurred that resulted in the extension of this ORF to 216 codons. Since 1948, this ORF has been almost totally maintained in human influenza A viruses, suggesting a selectable biological function. The discovery of cytotoxic T cells responding to an epitope encoded by this ORF suggests that it is translated into protein. Evidence of several other non-traditionally translated polypeptides in influenza A virus supports the translation of this genomic strand ORF. The gene product is predicted to have a signal sequence and two transmembrane domains. CONCLUSION We hypothesize that the genomic strand of segment 8 encodes a novel influenza A virus protein. The persistence and conservation of this genomic strand ORF for almost a century in human influenza A viruses provides strong evidence that it is translated into a polypeptide that enhances viral fitness in the human host. This has important consequences for the interpretation of experiments that use mutations in the NS1 and NEP genes of segment 8, and also for the consideration of events that may alter the spread and/or pathogenesis of swine and avian influenza A viruses in the human population.
Affiliation(s)
- Monica Clifford
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, V8W 3P6, Canada
- James Twigg
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, V8W 3P6, Canada
- Chris Upton
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, V8W 3P6, Canada
23
Firth AE, Wang QS, Jan E, Atkins JF. Bioinformatic evidence for a stem-loop structure 5'-adjacent to the IGR-IRES and for an overlapping gene in the bee paralysis dicistroviruses. Virol J 2009; 6:193. [PMID: 19895695 PMCID: PMC2777877 DOI: 10.1186/1743-422x-6-193] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 09/23/2009] [Accepted: 11/06/2009] [Indexed: 02/09/2023]
Abstract
The family Dicistroviridae (order Picornavirales) includes species that infect insects and other arthropods. These viruses have a linear positive-sense ssRNA genome of ~8-10 kb, which contains two long ORFs. The 5' ORF encodes the nonstructural polyprotein, while the 3' ORF encodes the structural polyprotein. The dicistroviruses are noteworthy for the intergenic Internal Ribosome Entry Site (IGR-IRES) that mediates efficient translation initiation on the 3' ORF without the requirement for initiator Met-tRNA. Acute bee paralysis virus, Israel acute paralysis virus of bees and Kashmir bee virus form a distinct subgroup within the Dicistroviridae family. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF in these viruses. The ORF overlaps the 5' end of the structural polyprotein coding sequence in the +1 reading frame. We also identify a potential 14-18 bp RNA stem-loop structure 5'-adjacent to the IGR-IRES. We discuss potential translation initiation mechanisms for the novel ORF in the context of the IGR-IRES and 5'-adjacent stem-loop.
Affiliation(s)
- Andrew E Firth
- BioSciences Institute, University College Cork, Cork, Ireland.
24
Functional viral metagenomics and the next generation of molecular tools. Trends Microbiol 2009; 18:20-9. [PMID: 19896852 DOI: 10.1016/j.tim.2009.10.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 02/18/2009] [Revised: 10/14/2009] [Accepted: 10/19/2009] [Indexed: 12/13/2022]
Abstract
The enzymes of bacteriophages and other viruses have been essential research tools since the first days of molecular biology. However, the current repertoire of viral enzymes only hints at their overall potential. The most commonly used enzymes are derived from a surprisingly small number of cultivated viruses, which is remarkable considering the extreme abundance and diversity of viruses revealed over the past decade by metagenomic analysis. To access the treasure trove of enzymes hidden in the global virosphere and develop them for research, therapeutic and diagnostic uses, improvements are needed in our ability to rapidly and efficiently discover, express and characterize viral genes to produce useful proteins. In this paper, we discuss improvements to sampling and cloning methods, functional and genomics-based screens, and expression systems, which should accelerate discovery of new enzymes and other viral proteins for use in research and medicine.
25
26
Firth AE, Atkins JF. A case for a CUG-initiated coding sequence overlapping torovirus ORF1a and encoding a novel 30 kDa product. Virol J 2009; 6:136. [PMID: 19737402 PMCID: PMC2749830 DOI: 10.1186/1743-422x-6-136] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 08/09/2009] [Accepted: 09/08/2009] [Indexed: 11/23/2022]
Abstract
The genus Torovirus (order Nidovirales) includes a number of species that infect livestock. These viruses have a linear positive-sense ssRNA genome of approximately 25-30 kb, encoding a large polyprotein that is expressed from the genomic RNA, and several additional proteins expressed from a nested set of 3'-coterminal subgenomic RNAs. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF that overlaps the 5' end of the polyprotein coding sequence, ORF1a, in the +2 reading frame. The new ORF has a strong coding signature and, in fact, is more conserved at the amino acid level than the overlapping region of ORF1a. We propose that the new ORF utilizes a non-AUG initiation codon--namely a conserved CUG codon in a strong Kozak context--upstream of the ORF1a AUG initiation codon, resulting in a novel 258 amino acid protein, dubbed '30K'.
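The phrase "strong Kozak context" above refers to the nucleotides flanking a start codon. A minimal sketch of the usual simplified classification follows. Counting the first base of the codon as +1, a purine (A or G) at -3 together with a G at +4 is "strong", either one alone is "adequate", and neither is "weak". This three-way rule is an assumption of the sketch. Real initiation efficiency, especially at non-AUG codons such as CUG, depends on more than these two positions. The example sequence is invented and is not the torovirus sequence.

```python
def kozak_context(seq, start):
    """Classify the context of the codon whose first base is at index `start`.

    Position +1 is seq[start], so -3 is seq[start - 3] and +4 is seq[start + 3].
    """
    rna = seq.upper().replace("T", "U")
    if start < 3 or start + 3 >= len(rna):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the codon")
    purine_at_minus3 = rna[start - 3] in "AG"
    g_at_plus4 = rna[start + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


print(kozak_context("GCCACCCUGG", 6))  # strong (A at -3, G at +4 around CUG)
```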
Affiliation(s)
- Andrew E Firth
- BioSciences Institute, University College Cork, Cork, Ireland
- John F Atkins
- BioSciences Institute, University College Cork, Cork, Ireland
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112-5330, USA
27
Firth AE, Atkins JF. Analysis of the coding potential of the partially overlapping 3' ORF in segment 5 of the plant fijiviruses. Virol J 2009; 6:32. [PMID: 19292925 PMCID: PMC2666654 DOI: 10.1186/1743-422x-6-32] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 02/26/2009] [Accepted: 03/17/2009] [Indexed: 01/10/2023]
Abstract
The plant-infecting members of the genus Fijivirus (family Reoviridae) have linear dsRNA genomes divided into 10 segments, two of which contain two substantial and non-overlapping ORFs, while the remaining eight are apparently monocistronic. However, one of these - namely segment 5 - contains a second long ORF (approximately 200+ codons) that overlaps the 3' end of the major ORF (approximately 920-940 codons) in the +1 reading frame. In this report, we use bioinformatic techniques to analyze the pattern of base variations across an alignment of fijivirus segment 5 sequences, and show that this 3' ORF has a strong coding signature. Possible translation mechanisms for this unusually positioned ORF are discussed.
Affiliation(s)
- Andrew E Firth
- BioSciences Institute, University College Cork, Cork, Ireland.
28
Firth AE, Atkins JF. Bioinformatic analysis suggests that a conserved ORF in the waikaviruses encodes an overlapping gene. Arch Virol 2008; 153:1379-83. [PMID: 18535758 DOI: 10.1007/s00705-008-0119-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 03/26/2008] [Accepted: 04/16/2008] [Indexed: 11/29/2022]
Abstract
The genus Waikavirus belongs to the order Picornavirales, whose members all use a polyprotein expression strategy. With the exception of Theiler's virus, overlapping genes are essentially unknown in the order. Recently, we reported experimental verification for a new short overlapping coding sequence (CDS) in the Potyviridae, a family in which overlapping genes were previously unknown. Using the same bioinformatics software (MLOGD), we have identified an approximately 89-codon conserved open reading frame (ORF) with a strong coding signature in members of the genus Waikavirus. The ORF overlaps the polyprotein ORF but is in the +1 reading frame. Here, we describe the bioinformatic analysis.
Affiliation(s)
- Andrew E Firth
- BioSciences Institute, University College Cork, Cork, Ireland.
29
Firth AE, Atkins JF. Bioinformatic analysis suggests that the Cypovirus 1 major core protein cistron harbours an overlapping gene. Virol J 2008; 5:62. [PMID: 18492230 PMCID: PMC2409309 DOI: 10.1186/1743-422x-5-62] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/16/2008] [Accepted: 05/20/2008] [Indexed: 11/10/2022]
Abstract
Members of the genus Cypovirus (family Reoviridae) are common pathogens of insects. These viruses have linear dsRNA genomes divided into 10–11 segments, which have generally been assumed to be monocistronic. Here, bioinformatic evidence is presented for a short overlapping coding sequence (CDS) in the cypovirus genome segment encoding the major core capsid protein VP1, overlapping the 5'-terminal region of the VP1 ORF in the +1 reading frame. In Cypovirus type 1 (CPV-1), a 62-codon AUG-initiated open reading frame (hereafter ORFX) is present in all four available segment 1 sequences. The pattern of base variations across the sequence alignment indicates that ORFX is subject to functional constraints at the amino acid level (even when the constraints due to coding in the overlapping VP1 reading frame are taken into account; MLOGD software). In fact, the translated ORFX shows greater amino acid conservation than the overlapping region of VP1. The genomic location of ORFX is consistent with translation via leaky scanning. A 62–64-codon AUG-initiated ORF is present in a corresponding location and reading frame in other available cypovirus sequences (2 CPV-14, 1 CPV-15), and an 87-codon ORFX homologue may also be present in Aedes pseudoscutellaris reovirus. The ORFX amino acid sequences are hydrophilic and basic, with between 12 and 16 Arg/Lys residues in each; however, at 7.5–10.2 kDa, the putative ORFX product is too small to appear on typical published protein gels.
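The first step in these analyses, locating an AUG-initiated ORF in the +1 frame relative to a known ORF, can be illustrated with a short Python sketch. It assumes a plain nucleotide string, the standard stop codons, and that "+1" means codons starting one nucleotide 3' of the reference frame's codon positions. The function names are hypothetical, and this only enumerates candidate ORFs; it is not MLOGD, which additionally tests whether an ORF carries a statistical coding signature.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq, frame_offset, min_codons=1):
    """Return half-open (start, end) coordinates of ATG-initiated ORFs whose
    codons start at positions congruent to frame_offset mod 3.
    The end coordinate includes the stop codon; ORFs lacking a stop are skipped."""
    seq = seq.upper().replace("U", "T")  # accept RNA (AUG) or DNA (ATG)
    orfs = []
    start = None
    for i in range(frame_offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first ATG after a stop opens a new ORF
        elif start is not None and codon in STOPS:
            if (i + 3 - start) // 3 >= min_codons:
                orfs.append((start, i + 3))
            start = None
    return orfs


def overlapping_plus1_orfs(seq, ref_start, ref_end, min_codons=1):
    """ORFs in the +1 frame relative to a reference ORF [ref_start, ref_end)
    that overlap the reference ORF's coordinates."""
    frame = (ref_start + 1) % 3
    return [(s, e) for s, e in orfs_in_frame(seq, frame, min_codons)
            if s < ref_end and e > ref_start]


if __name__ == "__main__":
    toy = "ATGCATGGCCTAAAATAG"  # toy reference ORF occupies [0, 18)
    print(overlapping_plus1_orfs(toy, 0, 18))
```

On the toy sequence the reference ORF is ATG CAT GGC CTA AAA TAG, and the +1 frame contains the embedded ORF ATG GCC TAA at coordinates (4, 13). A realistic minimum length (for example, tens of codons) would be passed via `min_codons`.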
Affiliation(s)
- Andrew E Firth
- BioSciences Institute, University College Cork, Cork, Ireland.
30
Firth AE. Bioinformatic analysis suggests that the Orbivirus VP6 cistron encodes an overlapping gene. Virol J 2008; 5:48. [PMID: 18489030 PMCID: PMC2373779 DOI: 10.1186/1743-422x-5-48] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 03/25/2008] [Accepted: 04/14/2008] [Indexed: 11/25/2022]
Abstract
Background The genus Orbivirus includes several species that infect livestock – including Bluetongue virus (BTV) and African horse sickness virus (AHSV). These viruses have linear dsRNA genomes divided into ten segments, all of which have previously been assumed to be monocistronic. Results Bioinformatic evidence is presented for a short overlapping coding sequence (CDS) in the Orbivirus genome segment 9, overlapping the VP6 cistron in the +1 reading frame. In BTV, a 77–79 codon AUG-initiated open reading frame (hereafter ORFX) is present in all 48 segment 9 sequences analysed. The pattern of base variations across the 48-sequence alignment indicates that ORFX is subject to functional constraints at the amino acid level (even when the constraints due to coding in the overlapping VP6 reading frame are taken into account; MLOGD software). In fact the translated ORFX shows greater amino acid conservation than the overlapping region of VP6. The ORFX AUG codon has a strong Kozak context in all 48 sequences. Each has only one or two upstream AUG codons, always in the VP6 reading frame, and (with a single exception) always with weak or medium Kozak context. Thus, in BTV, ORFX may be translated via leaky scanning. A long (83–169 codon) ORF is present in a corresponding location and reading frame in all other Orbivirus species analysed except Saint Croix River virus (SCRV; the most divergent). Again, the pattern of base variations across sequence alignments indicates multiple coding in the VP6 and ORFX reading frames. Conclusion At ~9.5 kDa, the putative ORFX product in BTV is too small to appear on most published protein gels. Nonetheless, a review of past literature reveals a number of possible detections. We hope that presentation of this bioinformatic analysis will stimulate an attempt to experimentally verify the expression and functional role of ORFX, and hence lead to a greater understanding of the molecular biology of these important pathogens.
Affiliation(s)
- Andrew E Firth
- Department of Biochemistry, BioSciences Institute, University College Cork, Cork, Ireland.
31
Abstract
The family Potyviridae includes >30% of known plant virus species, many of which are of great agricultural significance. These viruses have a positive sense RNA genome that is approximately 10 kb long and contains a single long ORF. The ORF is translated into a large polyprotein, which is cleaved into approximately 10 mature proteins. We report the discovery of a short ORF embedded within the P3 cistron of the polyprotein but translated in the +2 reading-frame. The ORF, termed pipo, is conserved and has a strong bioinformatic coding signature throughout the large and diverse Potyviridae family. Mutations that knock out expression of the PIPO protein in Turnip mosaic potyvirus but leave the polyprotein amino acid sequence unaltered are lethal to the virus. Immunoblotting with antisera raised against two nonoverlapping 14-aa antigens, derived from the PIPO amino acid sequence, reveals the expression of an approximately 25-kDa PIPO fusion product in planta. This is consistent with expression of PIPO as a P3-PIPO fusion product via ribosomal frameshifting or transcriptional slippage at a highly conserved G(1-2)A(6-7) motif at the 5' end of pipo. This discovery suggests that other short overlapping genes may remain hidden even in well studied virus genomes (as well as cellular organisms) and demonstrates the utility of the software package MLOGD as a tool for identifying such genes.
32
Panjaworayan N, Roessner SK, Firth AE, Brown CM. HBVRegDB: annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences. Virol J 2007; 4:136. [PMID: 18086305 PMCID: PMC2235840 DOI: 10.1186/1743-422x-4-136] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 10/17/2007] [Accepted: 12/17/2007] [Indexed: 12/28/2022]
Abstract
BACKGROUND The many Hepadnaviridae sequences available have widely varied functional annotation. The genomes are very compact (approximately 3.2 kb) but contain multiple layers of functional regulatory elements in addition to coding regions. Key regions are subject to purifying selection, as mutations in these regions will produce non-functional viruses. RESULTS These genomic sequences have been organized into a structured database to facilitate research at the molecular level. HBVRegDB is a comparative genomic analysis tool with an integrated underlying sequence database. The database contains genomic sequence data from representative viruses. In addition to INSDC and RefSeq annotation, HBVRegDB also contains expert and systematically calculated annotations (e.g. promoters) and comparative genome analysis results (e.g. blastn, tblastx). It also contains analyses based on curated HBV alignments. Information about conserved regions - including primary conservation (e.g. CDS-Plotcon) and RNA secondary structure predictions (e.g. Alidot) - is integrated into the database. A large amount of data is graphically presented using the GBrowse (Generic Genome Browser) adapted for analysis of viral genomes. Flexible query access is provided based on any annotated genomic feature. Novel regulatory motifs can be found by analysing the annotated sequences. CONCLUSION HBVRegDB serves as a knowledge database and as a comparative genomic analysis tool for molecular biologists investigating HBV. It is publicly available and complementary to other viral and HBV focused datasets and tools http://hbvregdb.otago.ac.nz. The availability of multiple and highly annotated sequences of viral genomes in one database combined with comparative analysis tools facilitates detection of novel genomic elements.
33
Abstract
MOTIVATION Viral genomes tend to code in overlapping reading frames to maximize informational content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of the K(a)/K(s) ratio, and thus calls for an alternative approach when investigating selection strengths. Building on the paper by McCauley and Hein, we develop a method for annotating a viral genome coding in overlapping reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single-sequence HMM methodology, extending it now to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we may thus achieve an annotation both of coding regions and of selection strengths, allowing us to investigate different selection patterns and hypotheses. RESULTS We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as of three Hepatitis B sequences. We obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. We are encouraged that, in HIV2, the genes gag and pol, which are known to be conserved, are indeed annotated as such; we also discover several sites of less stringent negative selection within the env gene. To the best of our knowledge, we are the first to provide a full selection annotation of the Hepatitis B genome by explicitly modelling the evolution within overlapping reading frames, rather than relying on simple K(a)/K(s) ratios.
Affiliation(s)
- Stephen McCauley
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, UK
34
Belshaw R, Pybus OG, Rambaut A. The evolution of genome compression and genomic novelty in RNA viruses. Genome Res 2007; 17:1496-504. [PMID: 17785537 PMCID: PMC1987338 DOI: 10.1101/gr.6305707] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Indexed: 11/25/2022]
Abstract
The genomes of RNA viruses are characterized by their extremely small size and extremely high mutation rates (typically 10 kb and 10⁻⁴ per base per replication cycle, respectively), traits that are thought to be causally linked. One aspect of their small size is the genome compression caused by the use of overlapping genes (where some nucleotides code for two genes). Using a comparative analysis of all known RNA viral species, we show that viruses with larger genomes tend to have less gene overlap. We provide a numerical model to show how a high mutation rate could lead to gene overlap, and we discuss the factors that might explain the observed relationship between gene overlap and genome size. We also propose a model for the evolution of gene overlap based on the co-opting of previously unused ORFs, which gives rise to two types of overlap: (1) the creation of novel genes inside older genes, predominantly via +1 frameshifts, and (2) the incremental increase in overlap between originally contiguous genes, with no frameshift preference. Both types of overlap are viewed as the creation of genomic novelty under pressure for genome compression. Simulations based on our model generate the empirical size distributions of overlaps and explain the observed frameshift preferences. We suggest that RNA viruses are a good model system for investigating the general evolutionary relationship between genome attributes such as mutational robustness, mutation rate, and size.
Affiliation(s)
- Robert Belshaw
- Department of Zoology, University of Oxford, Oxford OX1 3PS, United Kingdom.
35
de Groot S, Mailund T, Hein J. Comparative annotation of viral genomes with non-conserved gene structure. Bioinformatics 2007; 23:1080-9. [PMID: 17341494 DOI: 10.1093/bioinformatics/btm078] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Detecting genes in viral genomes is a complex task. Because their genomes are constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleotides, up to three genes may be coded for simultaneously in one direction. Conventional hidden Markov model (HMM)-based gene-finding algorithms typically find it difficult to identify multiple coding regions, since in general their topologies do not allow for the presence of overlapping or nested genes. Comparative methods have therefore been restricted to likelihood ratio tests of whether candidate regions are single or double coding, using the fact that the constraints imposed on multiple-coding nucleotides result in atypical sequence evolution. Exploiting these same constraints, we present an HMM-based gene-finding program, which allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. RESULTS We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of approximately 84-89% and specificity of approximately 97-99.9%. We additionally annotate three pairwise alignments of the more distantly related HIV1 and HIV2, as well as of two different hepatitis viruses, attaining results of approximately 87% sensitivity and approximately 98.5% specificity. We subsequently incorporate prior knowledge by 'knowing' the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy to close to perfect, we demonstrate that conservation of gene structure on top of nucleotide sequence is a valuable source of information, especially in distantly related genomes. AVAILABILITY The Java code is available from the authors.
36
McCauley S, Hein J. Using hidden Markov models and observed evolution to annotate viral genomes. Bioinformatics 2006; 22:1308-16. [PMID: 16613911 DOI: 10.1093/bioinformatics/btl092] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION ssRNA (single-stranded) viral genomes are generally constrained in length and utilize overlapping reading frames to maximally exploit the coding potential within the genome length restrictions. This overlapping coding phenomenon leads to complex evolutionary constraints operating on the genome. In regions which code for more than one protein, silent mutations in one reading frame generally have a protein coding effect in another. To maximize coding flexibility in all reading frames, overlapping regions are often compositionally biased towards amino acids which are 6-fold degenerate with respect to the 64-codon alphabet. Previous methodologies have used this fact in an ad hoc manner to look for overlapping genes by motif matching. In this paper, differentiated nucleotide compositional patterns in overlapping regions are incorporated into a probabilistic hidden Markov model (HMM) framework which is used to annotate ssRNA viral genomes. This work focuses on single-sequence annotation and applies an HMM framework to ssRNA viral annotation. A description is given of how the HMM is parameterized while annotating within a missing-data framework. A Phylogenetic HMM (Phylo-HMM) extension, as applied to 14 aligned HIV2 sequences, is also presented. This evolutionary extension serves as an illustration of the potential of the Phylo-HMM framework for ssRNA viral genomic annotation. RESULTS The single sequence annotation procedure (SSA) is applied to 14 different strains of the HIV2 virus. Further results on alternative ssRNA viral genomes are presented to illustrate more generally the performance of the method. The results of the SSA method are encouraging; however, there is still room for improvement, and since there is overwhelming evidence to indicate that comparative methods can improve coding sequence (CDS) annotation, the SSA method is extended to a Phylo-HMM to incorporate evolutionary information.
The Phylo-HMM extension is applied to the same set of 14 HIV2 sequences which are pre-aligned. The performance improvement that results from including the evolutionary information in the analysis is illustrated.
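The abstract's observation that overlapping regions are biased towards 6-fold degenerate amino acids can be made concrete by counting synonymous codons in the standard genetic code. The Python sketch below assumes NCBI translation table 1; the helper names are hypothetical, and the simple fraction it computes is only an illustration of the compositional signal, not the authors' HMM.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); amino acids listed in
# TCAG x TCAG x TCAG order, with "*" marking stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def degeneracy(table):
    """Number of codons encoding each amino acid (stop codons excluded)."""
    return Counter(aa for aa in table.values() if aa != "*")


def sixfold(table):
    """Amino acids encoded by exactly six codons, sorted alphabetically."""
    return sorted(aa for aa, n in degeneracy(table).items() if n == 6)


def sixfold_fraction(cds):
    """Fraction of sense codons in an in-frame sequence that encode a
    6-fold degenerate amino acid (stop and unrecognized codons are ignored)."""
    cds = cds.upper().replace("U", "T")
    six = set(sixfold(CODON_TABLE))
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    sense = [c for c in codons if CODON_TABLE.get(c, "*") != "*"]
    if not sense:
        return 0.0
    return sum(CODON_TABLE[c] in six for c in sense) / len(sense)


if __name__ == "__main__":
    print(sixfold(CODON_TABLE))
    print(sixfold_fraction("CTGTCTAAA"))
```

Under the standard code, the 6-fold degenerate amino acids are leucine, arginine and serine. In the toy sequence CTG TCT AAA (Leu, Ser, Lys), two of three codons fall in that class. Comparing this fraction between candidate overlapping and non-overlapping regions is one crude way to look for the bias the abstract describes.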
37
Allen MJ, Schroeder DC, Donkin A, Crawfurd KJ, Wilson WH. Genome comparison of two Coccolithoviruses. Virol J 2006; 3:15. [PMID: 16553948 PMCID: PMC1440845 DOI: 10.1186/1743-422x-3-15] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 10/25/2005] [Accepted: 03/22/2006] [Indexed: 11/17/2022]
Abstract
Background The Coccolithoviridae is a recently discovered family of viruses that infect the marine coccolithophorid Emiliania huxleyi. Following on from the sequencing of the type strain EhV-86, we have sequenced a second strain, EhV-163. Results We have sequenced approximately 80% of the EhV-163 genome, equating to more than 200 full-length CDSs. Conserved and variable CDSs and a gene replacement have been identified in the EhV-86 and EhV-163 genomes. Conclusion The sequencing of EhV-163 has provided a wealth of information which will aid the re-annotation of the EhV-86 genome, and has identified a gene insertion in EhV-163.
Affiliation(s)
- Michael J Allen
- Plymouth Marine Laboratory, Prospect Place, The Hoe, Plymouth, PL1 3DH, UK
- Andrew Donkin
- Plymouth Marine Laboratory, Prospect Place, The Hoe, Plymouth, PL1 3DH, UK
- William H Wilson
- Plymouth Marine Laboratory, Prospect Place, The Hoe, Plymouth, PL1 3DH, UK
38
Firth AE, Brown CM. Detecting overlapping coding sequences in virus genomes. BMC Bioinformatics 2006; 7:75. [PMID: 16483358 PMCID: PMC1395342 DOI: 10.1186/1471-2105-7-75] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.5] [Received: 09/09/2005] [Accepted: 02/16/2006] [Indexed: 11/10/2022]
Abstract
Background Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs). Results In a previous paper we introduced a new statistic – MLOGD (Maximum Likelihood Overlapping Gene Detector) – for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web-interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics, useful for analysing an input multiple sequence alignment. Conclusion MLOGD is an easy-to-use tool for virus genome annotation, detecting new CDSs – in particular overlapping or short CDSs – and for analysing overlapping CDSs following frameshift sites. The software, web-server, database and supplementary material are available at .
Affiliation(s)
- Andrew E Firth
- Department of Biochemistry, University of Otago, PO Box 56, Dunedin, New Zealand
- Chris M Brown
- Department of Biochemistry, University of Otago, PO Box 56, Dunedin, New Zealand
39
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447509 DOI: 10.1002/cfg.490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]