Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Firth AE, Brown CM. Detecting overlapping coding sequences with pairwise alignments. Bioinformatics 2004;21:282-92. [PMID: 15347574 DOI: 10.1093/bioinformatics/bti007] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]

Cited by Other Article(s)

Karlin DG. Parvovirus B19 and Human Parvovirus 4 Encode Similar Proteins in a Reading Frame Overlapping the VP1 Capsid Gene. Viruses 2024;16:191. [PMID: 38399966 PMCID: PMC10891878 DOI: 10.3390/v16020191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 01/12/2024] [Accepted: 01/24/2024] [Indexed: 02/25/2024]

Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021;12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022]

Unconventional viral gene expression mechanisms as therapeutic targets. Nature 2021;593:362-371. [PMID: 34012080 DOI: 10.1038/s41586-021-03511-5] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 06/08/2020] [Accepted: 03/22/2021] [Indexed: 12/14/2022]

Du Y, Ji C, Liu T, Zhang W, Fang Q, Dong Q, Li M, Wang H, Chen Y, Ouyang K, Wei Z, Huang W. Identification of a novel protein in porcine astrovirus that is important for virus replication. Vet Microbiol 2021;255:108984. [PMID: 33684827 DOI: 10.1016/j.vetmic.2021.108984] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/19/2020] [Accepted: 01/07/2021] [Indexed: 10/22/2022]

Orientation-dependent toxic effect of human papillomavirus type 33 long control region DNA in Escherichia coli cells. Virus Genes 2020;56:298-305. [PMID: 32246353 PMCID: PMC7220894 DOI: 10.1007/s11262-020-01754-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2020] [Accepted: 03/20/2020] [Indexed: 11/15/2022]

Schlub TE, Buchmann JP, Holmes EC. A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 2019;35:2572-2581. [PMID: 30099499 PMCID: PMC6188560 DOI: 10.1093/molbev/msy155] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]

Kongari R, Rajaure M, Cahill J, Rasche E, Mijalis E, Berry J, Young R. Phage spanins: diversity, topological dynamics and gene convergence. BMC Bioinformatics 2018;19:326. [PMID: 30219026 PMCID: PMC6139136 DOI: 10.1186/s12859-018-2342-8] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 01/19/2018] [Accepted: 08/28/2018] [Indexed: 01/21/2023]

Abstract

BACKGROUND

Spanins are phage lysis proteins required to disrupt the outer membrane. Phages employ either two-component spanins or unimolecular spanins in this final step of Gram-negative host lysis. Two-component spanins like Rz-Rz1 from phage lambda consist of an integral inner membrane protein (i-spanin) and an outer membrane lipoprotein (o-spanin) that form a complex spanning the periplasm. Two-component spanins exist in three different genetic architectures: embedded, overlapped and separated. In contrast, the unimolecular spanins, like gp11 from phage T1, have an N-terminal lipoylation signal sequence and a C-terminal transmembrane domain to account for the topology requirements. Our proposed model for spanin function, for both spanin types, follows a common theme of the outer membrane getting fused with the inner membrane, effecting the release of progeny virions.

RESULTS

Here we present a SpaninDataBase which consists of 528 two-component spanins and 58 unimolecular spanins identified in this analysis. Primary analysis revealed significant differences in the secondary structure predictions for the periplasmic domains of the two-component and unimolecular spanin types, as well as within the three different genetic architectures of the two-component spanins. Using a threshold of 40% sequence identity over 40% sequence length, we were able to group the spanins into 143 i-spanin, 125 o-spanin and 13 u-spanin families. More than 40% of these families from each type were singletons, underlining the extreme diversity of this class of lysis proteins. Multiple sequence alignments of periplasmic domains demonstrated conserved secondary structure patterns and domain organization within family members. Furthermore, analysis of families with members from different architecture allowed us to interpret the evolutionary dynamics of spanin gene arrangement. Also, the potential universal role of intermolecular disulfide bonds in two-component spanin function was substantiated through bioinformatic and genetic approaches. Additionally, a novel lipobox motif, AWAC, was identified and experimentally verified.

CONCLUSIONS

The findings from this bioinformatic approach gave us instructive insights into spanin function, evolution and domain organization, and provide a platform for future spanin annotation, as well as biochemical and genetic experiments. They also establish that spanins, like viral membrane fusion proteins, adopt different strategies to achieve fusion of the inner and outer membranes.
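The family-grouping step described in the Results (40% sequence identity over 40% of sequence length) can be illustrated with a minimal sketch. It assumes that pairwise identity and coverage values have already been computed by an alignment tool. The function name `group_families`, the input tuple layout, the toy data and the choice of single-linkage clustering are illustrative assumptions, not the authors' published procedure.

```python
from collections import defaultdict

def group_families(names, pairs, min_identity=0.40, min_coverage=0.40):
    """Single-linkage grouping: link two sequences when a pairwise hit
    meets both thresholds, then return the connected components."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    families = defaultdict(list)
    for n in names:
        families[find(n)].append(n)
    return sorted(sorted(members) for members in families.values())

# Hypothetical toy data: (seq_a, seq_b, fraction identity, fraction of length aligned)
names = ["s1", "s2", "s3", "s4"]
pairs = [
    ("s1", "s2", 0.55, 0.80),  # passes both thresholds
    ("s2", "s3", 0.45, 0.30),  # fails the coverage threshold
    ("s3", "s4", 0.35, 0.90),  # fails the identity threshold
]
families = group_families(names, pairs)
singletons = [f for f in families if len(f) == 1]
```

In this toy input only `s1` and `s2` are linked, so two of the three resulting families are singletons, which is the kind of tally behind the abstract's statement about singleton families.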

Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Wecko R, Simon S, Scherer S, Neuhaus K. A novel short L-arginine responsive protein-coding gene (laoB) antiparallel overlapping to a CadC-like transcriptional regulator in Escherichia coli O157:H7 Sakai originated by overprinting. BMC Evol Biol 2018;18:21. [PMID: 29433444 PMCID: PMC5810103 DOI: 10.1186/s12862-018-1134-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/22/2017] [Accepted: 01/31/2018] [Indexed: 11/10/2022]

Abstract

Background

Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveals many signals originated from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail.

Results

A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame −2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115.

Conclusions

Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame −2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB.

Electronic supplementary material

The online version of this article (10.1186/s12862-018-1134-0) contains supplementary material, which is available to authorized users.
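As a rough illustration of what an ORF embedded in the antisense strand of an annotated gene means computationally, the sketch below scans the three reading frames of the reverse complement for ATG-to-stop open reading frames. The function names, the `min_codons` parameter and the toy sequence are hypothetical. The sketch does not attempt to reproduce the frame-numbering convention (such as "−2") used in the abstract, and it says nothing about expression, which the authors assessed experimentally.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def antisense_orfs(seq, min_codons=2):
    """Return (offset, orf) pairs for ATG...stop ORFs in the three
    antisense frames. `offset` is the 0-based start position within
    the reverse-complemented sequence; `orf` includes the stop codon."""
    rc = reverse_complement(seq)
    found = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(rc):
            if rc[i:i + 3] == "ATG":
                j = i
                # Extend codon by codon until a stop or the sequence end.
                while j + 3 <= len(rc) and rc[j:j + 3] not in STOPS:
                    j += 3
                # Keep only ORFs that end in a stop and are long enough.
                if j + 3 <= len(rc) and (j - i) // 3 >= min_codons:
                    found.append((i, rc[i:j + 3]))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return found

# Hypothetical sense-strand fragment whose reverse complement carries a short ORF.
sense = "CCTTAGGGTTTCATGG"
orfs = antisense_orfs(sense)
```

For real genomes a longer `min_codons` cutoff would be needed, since short ATG-to-stop stretches arise by chance in any frame.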

Kirby LE, Koslowsky D. Mitochondrial dual-coding genes in Trypanosoma brucei. PLoS Negl Trop Dis 2017;11:e0005989. [PMID: 28991908 PMCID: PMC5650466 DOI: 10.1371/journal.pntd.0005989] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 05/15/2017] [Revised: 10/20/2017] [Accepted: 09/23/2017] [Indexed: 12/31/2022]

Abstract

Trypanosoma brucei is transmitted between mammalian hosts by the tsetse fly. In the mammal, they are exclusively extracellular, continuously replicating within the bloodstream. During this stage, the mitochondrion lacks a functional electron transport chain (ETC). Successful transition to the fly requires activation of the ETC and ATP synthesis via oxidative phosphorylation. This life cycle leads to a major problem: in the bloodstream, the mitochondrial genes are not under selection and are subject to genetic drift that endangers their integrity. Exacerbating this, T. brucei undergoes repeated population bottlenecks as they evade the host immune system that would create additional forces of genetic drift. These parasites possess several unique genetic features, including RNA editing of mitochondrial transcripts. RNA editing creates open reading frames by the guided insertion and deletion of U-residues within the mRNA. A major question in the field has been why this metabolically expensive system of RNA editing would evolve and persist. Here, we show that many of the edited mRNAs can alter the choice of start codon and the open reading frame by alternative editing of the 5’ end. Analyses of mutational bias indicate that six of the mitochondrial genes may be dual-coding and that RNA editing allows access to both reading frames. We hypothesize that dual-coding genes can protect genetic information by essentially hiding a non-selected gene within one that remains under selection. Thus, the complex RNA editing system found in the mitochondria of trypanosomes provides a unique molecular strategy to combat genetic drift in non-selective conditions.

In African trypanosomes, many of the mitochondrial mRNAs require extensive RNA editing before they can be translated. During this process, each edited transcript can undergo hundreds of cleavage/ligation events as U-residues are inserted or deleted to generate a translatable open reading frame. A major paradox has been why this incredibly metabolically expensive process would evolve and persist. In this work, we show that many of the mitochondrial genes in trypanosomes are dual-coding, utilizing different reading frames to potentially produce two very different proteins. Access to both reading frames is made possible by alternative editing of the 5’ end of the transcript. We hypothesize that dual-coding genes may work to protect the mitochondrial genes from mutations during growth in the mammalian host, when many of the mitochondrial genes are not being used. Thus, the complex RNA editing system may be maintained because it provides a unique molecular strategy to combat genetic drift.

Lim CS, Brown CM. Hepatitis B virus nuclear export elements: RNA stem-loop α and β, key parts of the HBV post-transcriptional regulatory element. RNA Biol 2016;13:743-7. [PMID: 27031749 DOI: 10.1080/15476286.2016.1166330] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 02/06/2023]

Woo PCY, Lau SKP, Teng JLL, Tsang AKL, Joseph S, Xie J, Jose S, Fan RYY, Wernery U, Yuen KY. A novel astrovirus from dromedaries in the Middle East. J Gen Virol 2015;96:2697-2707. [PMID: 26296576 DOI: 10.1099/jgv.0.000233] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]

Batista ARS, Nicolini C, Rodrigues KB, Melo FL, Vasques RM, de Macêdo MA, Inoue-Nagata AK, Nagata T. Unique RNA 2 sequences of two Brazilian isolates of Pepper ringspot virus, a tobravirus. Virus Genes 2014;49:169-73. [DOI: 10.1007/s11262-014-1066-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/28/2014] [Accepted: 04/01/2014] [Indexed: 10/25/2022]

Viral proteins originated de novo by overprinting can be identified by codon usage: application to the "gene nursery" of Deltaretroviruses. PLoS Comput Biol 2013;9:e1003162. [PMID: 23966842 PMCID: PMC3744397 DOI: 10.1371/journal.pcbi.1003162] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 12/26/2012] [Accepted: 06/13/2013] [Indexed: 12/24/2022]

Abstract

A well-known mechanism through which new protein-coding genes originate is by modification of pre-existing genes, e.g. by duplication or horizontal transfer. In contrast, many viruses generate protein-coding genes de novo, via the overprinting of a new reading frame onto an existing (“ancestral”) frame. This mechanism is thought to play an important role in viral pathogenicity, but has been poorly explored, perhaps because identifying the de novo frames is very challenging. Therefore, a new approach to detect them was needed. We assembled a reference set of overlapping genes for which we could reliably determine the ancestral frames, and found that their codon usage was significantly closer to that of the rest of the viral genome than the codon usage of de novo frames. Based on this observation, we designed a method that allowed the identification of de novo frames based on their codon usage with a very good specificity, but intermediate sensitivity. Using our method, we predicted that the Rex gene of deltaretroviruses has originated de novo by overprinting the Tax gene. Intriguingly, several genes in the same genomic region have also originated de novo and encode proteins that regulate the functions of Tax. Such “gene nurseries” may be common in viral genomes. Finally, our results confirm that the genomic GC content is not the only determinant of codon usage in viruses and suggest that a constraint linked to translation must influence codon usage.

How does novelty originate in nature? It is commonly thought that new genes are generated mainly by modifications of existing genes (the “tinkering” model). In contrast, we have shown recently that in viruses, numerous genes are generated entirely de novo (“from scratch”). The role of these genes remains underexplored, however, because they are difficult to identify. We have therefore developed a new method to detect genes originated de novo in viral genomes, based on the observation that each viral genome has a unique “signature”, which genes originated de novo do not share. We applied this method to analyze the genes of Human T-Lymphotropic Virus 1 (HTLV1), a relative of the HIV virus and also a major human pathogen that infects about twenty million people worldwide. The life cycle of HTLV1 is finely regulated – it can stay dormant for long periods and can provoke blood cancers (leukemias) after a very long incubation. We discovered that several of the genes of HTLV1 have originated de novo. These novel genes play a key role in regulating the life cycle of HTLV1, and presumably its pathogenicity. Our investigations suggest that such “gene nurseries” may be common in viruses.
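The core observation above, that an ancestral frame's codon usage sits closer to the rest of the viral genome than a de novo frame's does, can be sketched as a simple profile comparison. The functions `codon_frequencies` and `usage_distance`, the use of Euclidean distance and the toy sequences are assumptions made for illustration. The published method relies on a proper statistical test, which this sketch does not reproduce.

```python
from collections import Counter
from math import sqrt

def codon_frequencies(seq, frame=0):
    """Relative frequency of each codon read from `seq` starting at `frame`."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def usage_distance(freq_a, freq_b):
    """Euclidean distance between two codon-frequency profiles."""
    keys = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) ** 2 for k in keys))

# Hypothetical toy sequences: non-overlapping genome background and an overlap region.
background = "GCAGCAGCAGAAGAAGCA"
overlap = "AGCAGAAGCAGCAGAA"

genome_usage = codon_frequencies(background)
dist_frame0 = usage_distance(codon_frequencies(overlap, frame=0), genome_usage)
dist_frame1 = usage_distance(codon_frequencies(overlap, frame=1), genome_usage)

# Under the abstract's observation, the frame with the smaller distance
# would be the candidate ancestral frame and the other the candidate de novo frame.
candidate_ancestral = 0 if dist_frame0 < dist_frame1 else 1
```

Real overlaps are far longer than this toy example, and short regions give noisy frequency estimates, so a raw distance like this is only a starting point.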

Kawano Y, Neeley S, Adachi K, Nakai H. An experimental and computational evolution-based method to study a mode of co-evolution of overlapping open reading frames in the AAV2 viral genome. PLoS One 2013;8:e66211. [PMID: 23826091 PMCID: PMC3691236 DOI: 10.1371/journal.pone.0066211] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 02/15/2013] [Accepted: 05/07/2013] [Indexed: 02/07/2023]

Abstract

Overlapping open reading frames (ORFs) in viral genomes undergo co-evolution; however, how individual amino acids coded by overlapping ORFs are structurally, functionally, and co-evolutionarily constrained remains difficult to address by conventional homologous sequence alignment approaches. We report here a new experimental and computational evolution-based methodology to address this question and report its preliminary application to elucidating a mode of co-evolution of the frame-shifted overlapping ORFs in the adeno-associated virus (AAV) serotype 2 viral genome. These ORFs encode both capsid VP protein and non-structural assembly-activating protein (AAP). To show proof of principle of the new method, we focused on the evolutionarily conserved QVKEVTQ and KSKRSRR motifs, a pair of overlapping heptapeptides in VP and AAP, respectively. In the new method, we first identified a large number of capsid-forming VP3 mutants and functionally competent AAP mutants of these motifs from mutant libraries by experimental directed evolution under no co-evolutionary constraints. We used Illumina sequencing to obtain a large dataset and then statistically assessed the viability of VP and AAP heptapeptide mutants. The obtained heptapeptide information was then integrated into an evolutionary algorithm, with which VP and AAP were co-evolved from random or native nucleotide sequences in silico. As a result, we demonstrate that these two heptapeptide motifs could exhibit high degeneracy if coded by separate nucleotide sequences, and elucidate how overlap-evoked co-evolutionary constraints play a role in making the VP and AAP heptapeptide sequences into the present shape. 
Specifically, we demonstrate that two valine (V) residues and β-strand propensity in QVKEVTQ are structurally important, the strongly negative and hydrophilic nature of KSKRSRR is functionally important, and overlap-evoked co-evolution imposes strong constraints on serine (S) residues in KSKRSRR, despite high degeneracy of the motifs in the absence of co-evolutionary constraints.

Sabath N, Wagner A, Karlin D. Evolution of viral proteins originated de novo by overprinting. Mol Biol Evol 2012;29:3767-80. [PMID: 22821011 PMCID: PMC3494269 DOI: 10.1093/molbev/mss179] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Indexed: 01/02/2023]

Sabath N, Morris JS, Graur D. Is there a twelfth protein-coding gene in the genome of influenza A? A selection-based approach to the detection of overlapping genes in closely related sequences. J Mol Evol 2011;73:305-15. [PMID: 22187135 DOI: 10.1007/s00239-011-9477-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/30/2011] [Accepted: 12/02/2011] [Indexed: 02/06/2023]

Norovirus regulation of the innate immune response and apoptosis occurs via the product of the alternative open reading frame 4. PLoS Pathog 2011;7:e1002413. [PMID: 22174679 PMCID: PMC3234229 DOI: 10.1371/journal.ppat.1002413] [Citation(s) in RCA: 174] [Impact Index Per Article: 12.4] [Received: 05/02/2011] [Accepted: 10/12/2011] [Indexed: 12/25/2022]

Abstract

Small RNA viruses have evolved many mechanisms to increase the capacity of their short genomes. Here we describe the identification and characterization of a novel open reading frame (ORF4) encoded by the murine norovirus (MNV) subgenomic RNA, in an alternative reading frame overlapping the VP1 coding region. ORF4 is translated during virus infection and the resultant protein localizes predominantly to the mitochondria. Using reverse genetics we demonstrated that expression of ORF4 is not required for virus replication in tissue culture but its loss results in a fitness cost since viruses lacking the ability to express ORF4 restore expression upon repeated passage in tissue culture. Functional analysis indicated that the protein produced from ORF4 antagonizes the innate immune response to infection by delaying the upregulation of a number of cellular genes activated by the innate pathway, including IFN-Beta. Apoptosis in the RAW264.7 macrophage cell line was also increased during virus infection in the absence of ORF4 expression. In vivo analysis of the WT and mutant virus lacking the ability to express ORF4 demonstrated an important role for ORF4 expression in infection and virulence. STAT1-/- mice infected with a virus lacking the ability to express ORF4 showed a delay in the onset of clinical signs when compared to mice infected with WT virus. Quantitative PCR and histopathological analysis of samples from these infected mice demonstrated that infection with a virus not expressing ORF4 results in a delayed infection in this system. In light of these findings we propose the name virulence factor 1, VF1 for this protein. The identification of VF1 represents the first characterization of an alternative open reading frame protein for the calicivirus family. The immune regulatory function of the MNV VF1 protein provides important perspectives for future research into norovirus biology and pathogenesis.

This report describes the identification and characterization of a novel protein of unknown function encoded by a mouse virus genetically similar to human noroviruses. This gene is unique to the mouse virus and occupies the same part of the genome that codes for the major capsid protein. The protein that we have described as virulence factor 1 (VF1) is found in all murine norovirus isolates, absent in all human strains but is indeed expressed during infection. Its expression enables MNV-1 to establish efficient infection of its natural host through interference with interferon-mediated response pathways and apoptosis. Our data would indicate that the VF1 protein is multi-functional with an ability to modulate the host's response to infection. Murine noroviruses are frequently used firstly as a model to study human norovirus replication and pathogenesis, studies hampered by their inability to replicate in cell culture. Secondly, persistent infection of laboratory animals with murine norovirus may affect other models of disease using experimental mice. The role of VF1 in infection and pathology in the differential outcome of infection is the source of continued research in our laboratory.

Baranov PV, Wills NM, Barriscale KA, Firth AE, Jud MC, Letsou A, Manning G, Atkins JF. Programmed ribosomal frameshifting in the expression of the regulator of intestinal stem cell proliferation, adenomatous polyposis coli (APC). RNA Biol 2011;8:637-47. [PMID: 21593603 PMCID: PMC3225980 DOI: 10.4161/rna.8.4.15395] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 01/16/2011] [Revised: 03/03/2011] [Accepted: 03/04/2011] [Indexed: 12/14/2022]

Sabath N, Graur D. Detection of functional overlapping genes: simulation and case studies. J Mol Evol 2010;71:308-16. [PMID: 20820768 DOI: 10.1007/s00239-010-9386-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/18/2009] [Accepted: 07/26/2010] [Indexed: 12/16/2022]

Firth AE, Atkins JF. Candidates in Astroviruses, Seadornaviruses, Cytorhabdoviruses and Coronaviruses for +1 frame overlapping genes accessed by leaky scanning. Virol J 2010;7:17. [PMID: 20100346 PMCID: PMC2832772 DOI: 10.1186/1743-422x-7-17] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 11/25/2009] [Accepted: 01/25/2010] [Indexed: 11/10/2022]

Firth AE, Atkins JF. Evidence for a novel coding sequence overlapping the 5'-terminal approximately 90 codons of the gill-associated and yellow head okavirus envelope glycoprotein gene. Virol J 2009;6:222. [PMID: 20017924 PMCID: PMC2805633 DOI: 10.1186/1743-422x-6-222] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/01/2009] [Accepted: 12/17/2009] [Indexed: 11/23/2022]

Clifford M, Twigg J, Upton C. Evidence for a novel gene associated with human influenza A viruses. Virol J 2009;6:198. [PMID: 19917120 PMCID: PMC2780412 DOI: 10.1186/1743-422x-6-198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 10/15/2009] [Accepted: 11/16/2009] [Indexed: 02/06/2023]

Abstract

BACKGROUND

Influenza A virus genomes are comprised of 8 negative strand single-stranded RNA segments and are thought to encode 11 proteins, which are all translated from mRNAs complementary to the genomic strands. Although human, swine and avian influenza A viruses are very similar, cross-species infections are usually limited. However, antigenic differences are considerable and when viruses become established in a different host or if novel viruses are created by re-assortment devastating pandemics may arise.

RESULTS

Examination of influenza A virus genomes from the early 20th Century revealed the association of a 167 codon ORF encoded by the genomic strand of segment 8 with human isolates. Close to the timing of the 1948 pseudopandemic, a mutation occurred that resulted in the extension of this ORF to 216 codons. Since 1948, this ORF has been almost totally maintained in human influenza A viruses suggesting a selectable biological function. The discovery of cytotoxic T cells responding to an epitope encoded by this ORF suggests that it is translated into protein. Evidence of several other non-traditionally translated polypeptides in influenza A virus support the translation of this genomic strand ORF. The gene product is predicted to have a signal sequence and two transmembrane domains.

CONCLUSION

We hypothesize that the genomic strand of segment 8 encodes a novel influenza A virus protein. The persistence and conservation of this genomic strand ORF for almost a century in human influenza A viruses provides strong evidence that it is translated into a polypeptide that enhances viral fitness in the human host. This has important consequences for the interpretation of experiments that utilize mutations in the NS1 and NEP genes of segment 8 and also for the consideration of events that may alter the spread and/or pathogenesis of swine and avian influenza A viruses in the human population.

Firth AE, Wang QS, Jan E, Atkins JF. Bioinformatic evidence for a stem-loop structure 5'-adjacent to the IGR-IRES and for an overlapping gene in the bee paralysis dicistroviruses. Virol J 2009;6:193. [PMID: 19895695 PMCID: PMC2777877 DOI: 10.1186/1743-422x-6-193] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 09/23/2009] [Accepted: 11/06/2009] [Indexed: 02/09/2023]

Functional viral metagenomics and the next generation of molecular tools. Trends Microbiol 2009;18:20-9. [PMID: 19896852 DOI: 10.1016/j.tim.2009.10.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 02/18/2009] [Revised: 10/14/2009] [Accepted: 10/19/2009] [Indexed: 12/13/2022]

Firth AE, Bekaert M, Baranov PV. Computational Resources for Studying Recoding. 2009. [DOI: 10.1007/978-0-387-89382-2_20] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 03/27/2023]

Firth AE, Atkins JF. A case for a CUG-initiated coding sequence overlapping torovirus ORF1a and encoding a novel 30 kDa product. Virol J 2009;6:136. [PMID: 19737402 PMCID: PMC2749830 DOI: 10.1186/1743-422x-6-136] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 08/09/2009] [Accepted: 09/08/2009] [Indexed: 11/23/2022]

Firth AE, Atkins JF. Analysis of the coding potential of the partially overlapping 3' ORF in segment 5 of the plant fijiviruses. Virol J 2009;6:32. [PMID: 19292925 PMCID: PMC2666654 DOI: 10.1186/1743-422x-6-32] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 02/26/2009] [Accepted: 03/17/2009] [Indexed: 01/10/2023]

Firth AE, Atkins JF. Bioinformatic analysis suggests that a conserved ORF in the waikaviruses encodes an overlapping gene. Arch Virol 2008;153:1379-83. [PMID: 18535758 DOI: 10.1007/s00705-008-0119-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 03/26/2008] [Accepted: 04/16/2008] [Indexed: 11/29/2022]

Firth AE, Atkins JF. Bioinformatic analysis suggests that the Cypovirus 1 major core protein cistron harbours an overlapping gene. Virol J 2008;5:62. [PMID: 18492230 PMCID: PMC2409309 DOI: 10.1186/1743-422x-5-62] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/16/2008] [Accepted: 05/20/2008] [Indexed: 11/10/2022] Open

Firth AE. Bioinformatic analysis suggests that the Orbivirus VP6 cistron encodes an overlapping gene. Virol J 2008;5:48. [PMID: 18489030 PMCID: PMC2373779 DOI: 10.1186/1743-422x-5-48] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 03/25/2008] [Accepted: 04/14/2008] [Indexed: 11/25/2022] Open

Abstract

Background

The genus Orbivirus includes several species that infect livestock – including Bluetongue virus (BTV) and African horse sickness virus (AHSV). These viruses have linear dsRNA genomes divided into ten segments, all of which have previously been assumed to be monocistronic.

Results

Bioinformatic evidence is presented for a short overlapping coding sequence (CDS) in the Orbivirus genome segment 9, overlapping the VP6 cistron in the +1 reading frame. In BTV, a 77–79 codon AUG-initiated open reading frame (hereafter ORFX) is present in all 48 segment 9 sequences analysed. The pattern of base variations across the 48-sequence alignment indicates that ORFX is subject to functional constraints at the amino acid level (even when the constraints due to coding in the overlapping VP6 reading frame are taken into account; MLOGD software). In fact the translated ORFX shows greater amino acid conservation than the overlapping region of VP6. The ORFX AUG codon has a strong Kozak context in all 48 sequences. Each has only one or two upstream AUG codons, always in the VP6 reading frame, and (with a single exception) always with weak or medium Kozak context. Thus, in BTV, ORFX may be translated via leaky scanning. A long (83–169 codon) ORF is present in a corresponding location and reading frame in all other Orbivirus species analysed except Saint Croix River virus (SCRV; the most divergent). Again, the pattern of base variations across sequence alignments indicates multiple coding in the VP6 and ORFX reading frames.
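The leaky-scanning argument above depends on how each AUG is scored for Kozak context. The following is a minimal sketch, not the procedure used in the paper: it assumes the common rule of thumb that a purine at position -3 together with a G at position +4 gives a strong context, either one alone gives a medium context, and neither gives a weak context. The function names and the toy sequence are hypothetical.

```python
def kozak_strength(mrna: str, aug_pos: int) -> str:
    """Classify the context of the AUG that starts at aug_pos (0-based).

    Assumed rule of thumb: purine at -3 and G at +4 is 'strong',
    exactly one of the two is 'medium', neither is 'weak'.
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_pos:aug_pos + 3] != "AUG":
        raise ValueError("no AUG at aug_pos")
    # Position -3 is three bases upstream of the A; +4 is the base after the G.
    minus3 = seq[aug_pos - 3] if aug_pos >= 3 else ""
    plus4 = seq[aug_pos + 3] if aug_pos + 3 < len(seq) else ""
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "medium"
    return "weak"


def first_strong_aug(mrna: str):
    """Return the index of the first AUG in strong context, or None."""
    seq = mrna.upper().replace("T", "U")
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "AUG" and kozak_strength(seq, i) == "strong":
            return i
    return None


# Toy sequence (hypothetical): a weak upstream AUG followed by a strong AUG
# that sits in the +1 frame relative to the first one.
toy = "CCUUUAUGCCCCACCAUGGCC"
upstream, downstream = 5, 15
frame_offset = (downstream - upstream) % 3
```

In this toy case the upstream AUG is weak and the downstream AUG is strong, which is the arrangement under which scanning ribosomes would be expected to bypass the first start codon some of the time.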

Conclusion

At ~9.5 kDa, the putative ORFX product in BTV is too small to appear on most published protein gels. Nonetheless, a review of past literature reveals a number of possible detections. We hope that presentation of this bioinformatic analysis will stimulate an attempt to experimentally verify the expression and functional role of ORFX, and hence lead to a greater understanding of the molecular biology of these important pathogens.


An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci U S A 2008;105:5897-902. [PMID: 18408156 DOI: 10.1073/pnas.0800468105] [Citation(s) in RCA: 579] [Impact Index Per Article: 34.1] [Indexed: 11/18/2022] Open

Panjaworayan N, Roessner SK, Firth AE, Brown CM. HBVRegDB: annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences. Virol J 2007;4:136. [PMID: 18086305 PMCID: PMC2235840 DOI: 10.1186/1743-422x-4-136] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 10/17/2007] [Accepted: 12/17/2007] [Indexed: 12/28/2022] Open

McCauley S, de Groot S, Mailund T, Hein J. Annotation of selection strengths in viral genomes. Bioinformatics 2007;23:2978-86. [PMID: 17921171 DOI: 10.1093/bioinformatics/btm472] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Viral genomes tend to code in overlapping reading frames to maximize informational content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of the K(a)/K(s) ratio, and thus calls for an alternative approach when investigating selection strengths. Building on the paper by McCauley and Hein, we develop a method for annotating a viral genome coding in overlapping reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single sequence HMM methodology, extending it now to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we may thus achieve an annotation of both coding regions and selection strengths, allowing us to investigate different selection patterns and hypotheses.

RESULTS

We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as of three Hepatitis B sequences. We obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. We are encouraged to see that, in HIV2, the genes gag and pol, which are known to be conserved, are indeed annotated as such; we also discover several sites of less stringent negative selection within the env gene. To the best of our knowledge, we are the first to provide a full selection annotation of the Hepatitis B genome by explicitly modelling the evolution within overlapping reading frames, rather than relying on simple K(a)/K(s) ratios.


Belshaw R, Pybus OG, Rambaut A. The evolution of genome compression and genomic novelty in RNA viruses. Genome Res 2007;17:1496-504. [PMID: 17785537 PMCID: PMC1987338 DOI: 10.1101/gr.6305707] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Indexed: 11/25/2022]

de Groot S, Mailund T, Hein J. Comparative annotation of viral genomes with non-conserved gene structure. Bioinformatics 2007;23:1080-9. [PMID: 17341494 DOI: 10.1093/bioinformatics/btm078] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open

Abstract

MOTIVATION

Detecting genes in viral genomes is a complex task. Because their genomes are necessarily constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleotides, up to three genes may be coded for simultaneously in one direction. Conventional hidden Markov model (HMM)-based gene-finding algorithms typically find it difficult to identify multiple coding regions, since in general their topologies do not allow for the presence of overlapping or nested genes. Comparative methods have therefore been restricted to likelihood ratio tests of whether potential regions are double or single coding, using the fact that the constraints imposed on multiple-coding nucleotides result in atypical sequence evolution. Exploiting these same constraints, we present an HMM-based gene-finding program, which allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences.
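The statement that up to three genes may be encoded in one direction follows from the three ways of partitioning a sequence into triplets. A minimal sketch of that partitioning is given here; the function name and the toy sequence are illustrative only and are not taken from the program described in the abstract.

```python
def codons_by_frame(seq: str) -> dict:
    """Split a nucleotide sequence into complete codons for forward frames 0, 1 and 2."""
    seq = seq.upper()
    return {
        frame: [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for frame in range(3)
    }


# Toy sequence (hypothetical): every nucleotide belongs to one codon in each frame,
# so a single substitution can change up to three different codons at once.
frames = codons_by_frame("ATGGCCTAA")
```

For this toy input, frame 0 yields ATG, GCC, TAA, frame 1 yields TGG, CCT, and frame 2 yields GGC, CTA.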

RESULTS

We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of approximately 84-89% and specificity of approximately 97-99.9%. We additionally annotate three pairwise alignments of the more distantly related HIV1 and HIV2, as well as of two different hepatitis viruses, attaining results of approximately 87% sensitivity and approximately 98.5% specificity. We subsequently incorporate prior knowledge by 'knowing' the gene structure of one sequence and annotating the other conditional on it. With accuracy boosted close to perfect, we demonstrate that conservation of gene structure on top of nucleotide sequence is a valuable source of information, especially in distantly related genomes.
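As an illustration of how figures of this kind can be computed, the sketch below scores a per-nucleotide coding/non-coding prediction against a reference annotation. It assumes the standard statistical definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)); the abstract does not state which definition the authors used, and some gene-finding papers report TP/(TP+FP) as specificity instead. The labels are made-up toy data. The last lines simply confirm that six genomes give 15 unordered pairs.

```python
from itertools import combinations


def sensitivity_specificity(truth, predicted):
    """Per-site sensitivity and specificity for binary coding (1) / non-coding (0) labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical): 10 sites, 4 truly coding.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)

# Six genomes compared pairwise give C(6, 2) = 15 alignments.
n_pairs = len(list(combinations(range(6), 2)))
```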

AVAILABILITY

The Java code is available from the authors.


McCauley S, Hein J. Using hidden Markov models and observed evolution to annotate viral genomes. Bioinformatics 2006;22:1308-16. [PMID: 16613911 DOI: 10.1093/bioinformatics/btl092] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Single-stranded RNA (ssRNA) viral genomes are generally constrained in length and utilize overlapping reading frames to maximally exploit the coding potential within the genome length restrictions. This overlapping coding phenomenon leads to complex evolutionary constraints operating on the genome. In regions which code for more than one protein, silent mutations in one reading frame generally have a protein coding effect in another. To maximize coding flexibility in all reading frames, overlapping regions are often compositionally biased towards amino acids which are 6-fold degenerate with respect to the 64 codon alphabet. Previous methodologies have used this fact in an ad hoc manner to look for overlapping genes by motif matching. In this paper, differentiated nucleotide compositional patterns in overlapping regions are incorporated into a probabilistic hidden Markov model (HMM) framework which is used to annotate ssRNA viral genomes. This work focuses on single sequence annotation and applies an HMM framework to ssRNA viral annotation. A description is given of how the HMM is parameterized whilst annotating within a missing data framework. A Phylogenetic HMM (Phylo-HMM) extension, as applied to 14 aligned HIV2 sequences, is also presented. This evolutionary extension serves as an illustration of the potential of the Phylo-HMM framework for ssRNA viral genomic annotation.
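The 6-fold degenerate amino acids mentioned above can be read directly off the standard genetic code. The sketch below builds the standard codon table in the usual TCAG ordering and counts codons per amino acid; the variable names are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid ('*' marks stop codons).
degeneracy = Counter(CODON_TABLE.values())
six_fold = sorted(aa for aa, n in degeneracy.items() if n == 6 and aa != "*")
```

Under the standard code this yields leucine (L), arginine (R) and serine (S), the three amino acids with six synonymous codons.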

RESULTS

The single sequence annotation procedure (SSA) is applied to 14 different strains of the HIV2 virus. Further results on alternative ssRNA viral genomes are presented to illustrate more generally the performance of the method. The results of the SSA method are encouraging; however, there is still room for improvement, and since there is overwhelming evidence that comparative methods can improve coding sequence (CDS) annotation, the SSA method is extended to a Phylo-HMM to incorporate evolutionary information. The Phylo-HMM extension is applied to the same set of 14 HIV2 sequences, which are pre-aligned. The performance improvement that results from including the evolutionary information in the analysis is illustrated.


Allen MJ, Schroeder DC, Donkin A, Crawfurd KJ, Wilson WH. Genome comparison of two Coccolithoviruses. Virol J 2006;3:15. [PMID: 16553948 PMCID: PMC1440845 DOI: 10.1186/1743-422x-3-15] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 10/25/2005] [Accepted: 03/22/2006] [Indexed: 11/17/2022] Open

Firth AE, Brown CM. Detecting overlapping coding sequences in virus genomes. BMC Bioinformatics 2006;7:75. [PMID: 16483358 PMCID: PMC1395342 DOI: 10.1186/1471-2105-7-75] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.5] [Received: 09/09/2005] [Accepted: 02/16/2006] [Indexed: 11/10/2022] Open

Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447509 DOI: 10.1002/cfg.490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open