1
Iqbal S, Begum F. Identification and characterization of integrated prophages and CRISPR-Cas system in Bacillus subtilis RS10 genome. Braz J Microbiol 2024; 55:537-542. [PMID: 38216797 PMCID: PMC10920515 DOI: 10.1007/s42770-024-01249-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 01/04/2024] [Indexed: 01/14/2024] Open
Abstract
Bacteriophages have been extensively investigated because of their prominent role in the virulence and resistance of pathogenic bacteria. However, little attention has been given to non-pathogenic Bacillus phages, and their role in the genomes of ecological bacteria is overlooked. In the present study, we characterized two Bacillus phages with linear DNA genomes: one of 33.6 kb with a GC content of 44.83% and one of 129.3 kb with a GC content of 34.70%. A total of 46 and 175 putative coding DNA sequences (CDS) were identified in prophage 1 (P1) and prophage 2 (P2), respectively, with no tRNA genes. Comparative genome sequence analysis revealed that P1 shares eight CDS with phage Jimmer 2 (NC-041976) and phage Osiris (NC-028969), and six with phage phi CT9441A (NC-029022). P2, on the other hand, showed high similarity with Bacill_SPbeta_NC_001884 and Bacillus phage phi 105. Further, genome analysis indicates several horizontal gene transfer events in both phages during their evolution. In addition, we detected two CRISPR-Cas systems for the first time in B. subtilis. The identified CRISPR systems consist of 24 and 25 direct repeats and integrase-coding genes, while the cas gene, which encodes the Cas protein involved in cleavage of a target sequence, is missing. These findings will expand current knowledge of soil phages and help to develop a new perspective for investigating more ecological phages to understand their role in bacterial communities and diversity.
Affiliation(s)
- Sajid Iqbal
  - Department of Industrial Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan.
  - Oujiang Laboratory (Zhejiang Laboratory for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325000, China.
- Farida Begum
  - Department of Biochemistry, Abdul Wali Khan University Mardan (AWKUM), Mardan, Pakistan
2
Abstract
Two decades of metagenomic analyses have revealed that in many environments, small (∼5 kb), single-stranded DNA phages of the family Microviridae dominate the virome. Although the emblematic microvirus phiX174 is ubiquitous in the laboratory, most other microviruses, particularly those of the gokushovirus and amoyvirus lineages, have proven to be much more elusive. This puzzling lack of representative isolates has hindered insights into microviral biology. Furthermore, the idiosyncratic size and nature of their genomes have resulted in considerable misjudgments of their actual abundance in nature. Fortunately, recent successes in microvirus isolation and improved metagenomic methodologies can now provide us with more accurate appraisals of their abundance, their hosts, and their interactions. The emerging picture is that phiX174 and its relatives are rather rare and atypical microviruses, and that a tremendous diversity of other microviruses is ready for exploration.
Affiliation(s)
- Paul C Kirchberger
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA
  - Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, Oklahoma, USA
- Howard Ochman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA
3
Miller WB Jr, Reber AS, Marshall P, Baluška F. Cellular and Natural Viral Engineering in Cognition-Based Evolution. Commun Integr Biol 2023; 16:2196145. [PMID: 37153718 PMCID: PMC10155641 DOI: 10.1080/19420889.2023.2196145] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/28/2022] [Accepted: 03/23/2023] [Indexed: 05/10/2023] Open
Abstract
Neo-Darwinism conceptualizes evolution as the continuous succession of predominately random genetic variations disciplined by natural selection. In that frame, the primary interaction between cells and the virome is relegated to host-parasite dynamics governed by selective influences. Cognition-Based Evolution regards biological and evolutionary development as a reciprocating cognition-based informational interactome for the protection of self-referential cells. To sustain cellular homeorhesis, cognitive cells collaborate to assess the validity of ambiguous biological information. That collective interaction involves coordinate measurement, communication, and active deployment of resources as Natural Cellular Engineering. These coordinated activities drive multicellularity, biological development, and evolutionary change. The virome participates as the vital intercessory among the cellular domains to ensure their shared permanent perpetuation. The interactions between the virome and the cellular domains represent active virocellular cross-communications for the continual exchange of resources. Modular genetic transfers between viruses and cells carry bioactive potentials. Those exchanges are deployed as nonrandom flexible tools among the domains in their continuous confrontation with environmental stresses. This alternative framework fundamentally shifts our perspective on viral-cellular interactions, strengthening established principles of viral symbiogenesis. Pathogenesis can now be properly appraised as one expression of a range of outcomes between cells and viruses within a larger conceptual framework of Natural Viral Engineering as a co-engineering participant with cells. It is proposed that Natural Viral Engineering should be viewed as a co-existent facet of Natural Cellular Engineering within Cognition-Based Evolution.
Affiliation(s)
- Miller W B Jr
  - Banner Health Systems - Medicine, Paradise Valley, Arizona, AZ, USA
- Reber A S
  - Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Marshall P
  - Department of Engineering, Evolution 2.0, Oak Park, IL, USA
- Baluška F
  - Institute of Cellular and Molecular Botany, University of Bonn, Bonn, Germany
4
Pley C, Lourenço J, McNaughton AL, Matthews PC. Spacer Domain in Hepatitis B Virus Polymerase: Plugging a Hole or Performing a Role? J Virol 2022; 96:e0005122. [PMID: 35412348 PMCID: PMC9093120 DOI: 10.1128/jvi.00051-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 03/14/2022] [Indexed: 11/25/2022] Open
Abstract
Hepatitis B virus (HBV) polymerase is divided into terminal protein, spacer, reverse transcriptase, and RNase domains. Spacer has previously been considered dispensable, merely acting as a tether between other domains or providing plasticity to accommodate deletions and mutations. We explore evidence for the role of spacer sequence, structure, and function in HBV evolution and lineage, consider its associations with escape from drugs, vaccines, and immune responses, and review its potential impacts on disease outcomes.
Affiliation(s)
- Caitlin Pley
  - School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
  - Guy’s and St Thomas’ NHS Foundation Trust, London, United Kingdom
- José Lourenço
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
  - Biosystems and Integrative Sciences Institute, University of Lisbon, Lisbon, Portugal
- Anna L. McNaughton
  - Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
  - Nuffield Department of Medicine, University of Oxford Medawar Building, Oxford, United Kingdom
- Philippa C. Matthews
  - Nuffield Department of Medicine, University of Oxford Medawar Building, Oxford, United Kingdom
  - The Francis Crick Institute, London, United Kingdom
  - Division of Infection and Immunity, University College London, London, United Kingdom
5
Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021; 39:6430988. [PMID: 34792602 PMCID: PMC8788219 DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023] Open
Abstract
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
Affiliation(s)
- Andrew K Watson
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
- Philippe Lopez
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
- Eric Bapteste
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
6
Computational methods for inferring location and genealogy of overlapping genes in virus genomes: approaches and applications. Curr Opin Virol 2021; 52:1-8. [PMID: 34798370 PMCID: PMC8594276 DOI: 10.1016/j.coviro.2021.10.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/04/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 12/02/2022]
Abstract
Viruses may evolve to increase the amount of encoded genetic information by means of overlapping genes, which utilize several reading frames. Such overlapping genes may be especially impactful for genomes of small size, often serving as a source of novel accessory proteins, some of which play a crucial role in viral pathogenicity or in promoting the systemic spread of the virus. Diverse genome-based metrics have been proposed to facilitate recognition of overlapping genes that may otherwise be overlooked during genome annotation. They can detect the atypical codon bias associated with the overlap (e.g. a statistically significant reduction in variability at synonymous sites) or other sequence-composition features peculiar to overlapping genes. In this review, I compare nine computational methods, discuss their strengths and limitations, and survey how they were applied to detect candidate overlapping genes in the genome of SARS-CoV-2, the etiological agent of the COVID-19 pandemic.
7
Khan MSI, Gao X, Liang K, Mei S, Zhan J. Virulent Drexlervirial Bacteriophage MSK, Morphological and Genome Resemblance With Rtp Bacteriophage Inhibits the Multidrug-Resistant Bacteria. Front Microbiol 2021; 12:706700. [PMID: 34504479 PMCID: PMC8421802 DOI: 10.3389/fmicb.2021.706700] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/07/2021] [Accepted: 06/14/2021] [Indexed: 11/13/2022] Open
Abstract
Phage-host interactions are likely the most critical aspect of phage biology. Phages are the most abundant and ubiquitous infectious acellular entities in the biosphere, where their presence remains elusive. Here, a novel lytic Escherichia coli bacteriophage, named MSK, was isolated from a lysed culture of E. coli C (the phiX174 host). The genome of phage MSK was sequenced and comprises 45,053 bp with a G + C content of 44.8%. In total, 73 open reading frames (ORFs) were predicted, of which 24 showed close homology with known functional proteins, including one tRNA-arg; the other 49 proteins, which have no proven function in the genome database, were called hypothetical. Electron microscopy and genome characterization revealed that MSK phage has a rosette-like tail tip. In total, 46 ORFs were homologous to the Rtp genome. Among these ORFs, the tail fiber protein with locus tag MSK_000019 was homologous to the Rtp 43 protein, which determines host specificity. Another protein, MSK_000046, is a lipoprotein (cor gene) that resembles Rtp 45, which is responsible for preventing adsorption during cell lysis. Thirteen MSK structural proteins were identified by SDS-PAGE analysis. Of these, 12 were vital structural proteins, and one was a hypothetical protein. Among these, the large terminase subunit (MSK_000072) may be involved in DNA packaging, and the proposed packaging strategy of the MSK bacteriophage genome is headful packaging using pac sites. A biosafety assessment based on genome analysis of the highly stable phage MSK revealed that the phage does not possess virulence genes, which indicates its suitability for phage therapy. MSK phage could potentially be used to inhibit multidrug-resistant bacteria, including those resistant to AMP, TCN, and colistin. Further, a comparative genome and lifestyle study of MSK phage confirmed the highest similarity level (87.18% ANI). These findings suggest that it is a new lytic phage species. Finally, BLAST and phylogenetic analysis of the large terminase subunit and tail fiber protein placed it in the genus of Rtp viruses in the family Drexlerviridae.
Affiliation(s)
- Muhammad Saleem Iqbal Khan
  - Department of Biochemistry, Cancer Institute of the Second Affiliated Hospital (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), School of Medicine, Zhejiang University, Hangzhou, China
- Xiangzheng Gao
  - Department of Biochemistry, Cancer Institute of the Second Affiliated Hospital (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), School of Medicine, Zhejiang University, Hangzhou, China
- Keying Liang
  - Department of Biochemistry, Cancer Institute of the Second Affiliated Hospital (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), School of Medicine, Zhejiang University, Hangzhou, China
- Shengsheng Mei
  - Department of Biochemistry, Cancer Institute of the Second Affiliated Hospital (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), School of Medicine, Zhejiang University, Hangzhou, China
- Jinbiao Zhan
  - Department of Biochemistry, Cancer Institute of the Second Affiliated Hospital (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education), School of Medicine, Zhejiang University, Hangzhou, China
8
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022] Open
Abstract
During their long evolutionary history viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. Origin of overlapping genes, what factors favor their birth and retention, and how they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
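The idea of two genes sharing nucleotides in different reading frames can be made concrete with a small sketch. The Python below is a toy illustration only, not a method from the review: the function names `find_orfs` and `overlapping_pairs` are hypothetical, only the forward strand is scanned, and an ORF is assumed to run from the first ATG to the next in-frame stop codon.

```python
# Toy illustration (hypothetical helper names), not a gene-prediction tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def overlapping_pairs(orfs):
    """Return pairs of ORFs in different frames that share at least one nucleotide."""
    pairs = []
    for a in range(len(orfs)):
        for b in range(a + 1, len(orfs)):
            s1, e1, f1 = orfs[a]
            s2, e2, f2 = orfs[b]
            if f1 != f2 and s1 < e2 and s2 < e1:
                pairs.append((orfs[a], orfs[b]))
    return pairs


# A made-up 21-nt sequence with an ORF in frame 0 and a nested ORF in frame +1.
example = "ATGCATGGGCCCCTGAAATAA"
orfs = find_orfs(example)
pairs = overlapping_pairs(orfs)
```

In this made-up sequence the frame +1 ORF lies entirely inside the frame 0 ORF, so every substitution in the shared region is read by two codon tables at once, which is the adaptive conflict the abstract describes.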
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
9
Li M, Lin H, Wang L, Wang J. Complete genome sequence of the extreme-pH-resistant Salmonella bacteriophage αα of the family Microviridae. Arch Virol 2020; 166:325-329. [PMID: 33221988 DOI: 10.1007/s00705-020-04880-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/16/2020] [Accepted: 09/29/2020] [Indexed: 12/01/2022]
Abstract
A novel Salmonella bacteriophage (phage), named αα, was the first reported member of the family Microviridae to exhibit tolerance to both extreme acidic and alkaline conditions (pH 2-12 for 1 h). Phage αα has a circular single-stranded DNA genome of 5,387 nt with a G+C content of 44.66%. A total of 11 putative gene products and no tRNA genes are encoded in the phage αα genome. Whole-genome sequence comparisons revealed that phage αα shares 95% identity with coliphage phiX174 and had a close evolutionary relationship to the phages NC1 and NC7. Phylogenetic analysis of the structural proteins of phage αα and 18 other phiX174-like phages showed that a phylogenetic tree based on protein B sequences had a topology similar to that obtained using whole genome sequences. In addition, variable sites in proteins F and G distributed on the surface of the mature capsid and the conserved protein J were probably involved in maintaining the structural integrity of the phage under extreme pH conditions. Our findings could open up new perspectives for identifying more extreme-pH-resistant phages and their structural proteins and understanding the mechanism of phage adaptation and evolution under extreme environmental stress.
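As a rough check on the reported composition, G+C content is the share of G and C among the unambiguous bases; for a 5,387 nt genome, 44.66% corresponds to about 2,406 G or C bases (0.4466 × 5,387 ≈ 2,405.8). A minimal Python sketch follows; the function name `gc_content` is an illustrative assumption and is not taken from the paper.

```python
def gc_content(seq):
    """Return the G+C percentage of a nucleotide sequence.

    Only A, C, G and T are counted; ambiguous symbols such as N are ignored.
    Returns 0.0 when the sequence contains no countable bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0
```

Applied to a full genome sequence, the same function would yield the kind of percentage quoted in the abstract; ignoring ambiguous bases is a design choice here, and other tools may handle them differently.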
Affiliation(s)
- Mengzhe Li
  - Food Safety Laboratory, Department of Food Science and Engineering, Ocean University of China, Qingdao, 266003, People's Republic of China
- Hong Lin
  - Food Safety Laboratory, Department of Food Science and Engineering, Ocean University of China, Qingdao, 266003, People's Republic of China
- Luokai Wang
  - Food Safety Laboratory, Department of Food Science and Engineering, Ocean University of China, Qingdao, 266003, People's Republic of China
- Jingxue Wang
  - Food Safety Laboratory, Department of Food Science and Engineering, Ocean University of China, Qingdao, 266003, People's Republic of China
10
Circular Single-Stranded DNA Virus (Microviridae: Gokushovirinae: Jodiemicrovirus) Associated with the Pathobiome of the Flat-Back Mud Crab, Eurypanopeus depressus. Microbiol Resour Announc 2019; 8:8/47/e01026-19. [PMID: 31753941 PMCID: PMC6872883 DOI: 10.1128/mra.01026-19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/02/2023] Open
Abstract
A single-stranded DNA (ssDNA) virus is presented from a metagenomic data set derived from Alphaproteobacteria-infected hepatopancreatic tissues of the crab Eurypanopeus depressus. The circular virus genome (4,768 bp) encodes 14 hypothetical proteins, some similar to other bacteriophages (Microviridae). Based on its relatedness to other Microviridae, this virus represents a member of a novel genus.
11
Affram Y, Zapata JC, Gholizadeh Z, Tolbert WD, Zhou W, Iglesias-Ussel MD, Pazgier M, Ray K, Latinovic OS, Romerio F. The HIV-1 Antisense Protein ASP Is a Transmembrane Protein of the Cell Surface and an Integral Protein of the Viral Envelope. J Virol 2019; 93:e00574-19. [PMID: 31434734 PMCID: PMC6803264 DOI: 10.1128/jvi.00574-19] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 04/05/2019] [Accepted: 08/14/2019] [Indexed: 12/13/2022] Open
Abstract
The negative strand of HIV-1 encodes a highly hydrophobic antisense protein (ASP) with no known homologs. The presence of humoral and cellular immune responses to ASP in HIV-1 patients indicates that ASP is expressed in vivo, but its role in HIV-1 replication remains unknown. We investigated ASP expression in multiple chronically infected myeloid and lymphoid cell lines using an anti-ASP monoclonal antibody (324.6) in combination with flow cytometry and microscopy approaches. At baseline and in the absence of stimuli, ASP shows polarized subnuclear distribution, preferentially in areas with low content of suppressive epigenetic marks. However, following treatment with phorbol 12-myristate 13-acetate (PMA), ASP translocates to the cytoplasm and is detectable on the cell surface, even in the absence of membrane permeabilization, indicating that 324.6 recognizes an ASP epitope that is exposed extracellularly. Further, surface staining with 324.6 and anti-gp120 antibodies showed that ASP and gp120 colocalize, suggesting that ASP might become incorporated in the membranes of budding virions. Indeed, fluorescence correlation spectroscopy studies showed binding of 324.6 to cell-free HIV-1 particles. Moreover, 324.6 was able to capture and retain HIV-1 virions with efficiency similar to that of the anti-gp120 antibody VRC01. Our studies indicate that ASP is an integral protein of the plasma membranes of chronically infected cells stimulated with PMA, and upon viral budding, ASP becomes a structural protein of the HIV-1 envelope. These results may provide leads to investigate the possible role of ASP in the virus replication cycle and suggest that ASP may represent a new therapeutic or vaccine target.
IMPORTANCE The HIV-1 genome contains a gene expressed in the opposite, or antisense, direction to all other genes. The protein product of this antisense gene, called ASP, is poorly characterized, and its role in viral replication remains unknown. We provide evidence that the antisense protein, ASP, of HIV-1 is found within the cell nucleus in unstimulated cells. In addition, we show that after PMA treatment, ASP exits the nucleus and localizes on the cell membrane. Moreover, we demonstrate that ASP is present on the surfaces of viral particles. Altogether, our studies identify ASP as a new structural component of HIV-1 and show that ASP is an accessory protein that promotes viral replication. The presence of ASP on the surfaces of both infected cells and viral particles might be exploited therapeutically.
Affiliation(s)
- Yvonne Affram
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Juan C Zapata
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Zahra Gholizadeh
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- William D Tolbert
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Wei Zhou
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Maria D Iglesias-Ussel
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Marzena Pazgier
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Krishanu Ray
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Olga S Latinovic
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Fabio Romerio
  - Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
12
The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs. Sci Rep 2017; 7:42775. [PMID: 28344339 PMCID: PMC5366806 DOI: 10.1038/srep42775] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 11/16/2016] [Accepted: 01/13/2017] [Indexed: 12/27/2022] Open
Abstract
A long non-coding RNA overlapping with a protein-coding gene (lncRNA-coding pair) is a special type of overlapping gene. Protein-coding overlapping genes have been well studied, and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in the human genome, we showed that lncRNA-coding pairs were more likely to be generated by overprinting and that retaining genes in lncRNA-coding pairs was given higher priority than retaining non-overlapping genes. In addition, the preference for overlapping configurations preserved during evolution was based on the origin of the lncRNA-coding pairs. Further investigations showed that the promotion of splicing of embedded protein-coding partners by lncRNAs was a unilateral interaction, whereas the improvement in gene expression associated with the existence of overlapping partners was bidirectional, and the effect decreased with increasing evolutionary age of the genes. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation, and the expression correlation was associated with their overlapping configurations, local genomic environment, and the evolutionary age of the genes. Comparison of the expression correlation of lncRNA-coding pairs between normal and cancer samples found that lineage-specific pairs that include old protein-coding genes may play an important role in tumorigenesis. This work presents a systematic and comprehensive understanding of the evolution and expression pattern of human lncRNA-coding pairs.
13
Saha D, Podder S, Ghosh TC. Overlapping Regions in HIV-1 Genome Act as Potential Sites for Host-Virus Interaction. Front Microbiol 2016; 7:1735. [PMID: 27867372 PMCID: PMC5095123 DOI: 10.3389/fmicb.2016.01735] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/05/2016] [Accepted: 10/17/2016] [Indexed: 01/05/2023] Open
Abstract
For more than a decade, overlapping genes in RNA viruses have been a subject of research that has explored various effects of gene overlap on the evolution and function of viral genomes, such as genome size compaction. Additionally, overlapping regions (OVRs) are reported to encode an elevated degree of protein intrinsic disorder (PID) in unspliced RNA viruses. With the aim of exploring the roles of OVRs in HIV-1 pathogenesis, we carried out an in-depth analysis of the association of gene overlap with PID in 35 HIV-1 M subtypes. Our study reveals an overrepresentation of PID in the OVRs of HIV-1 genomes. These disordered residues harbor several vital structural features, such as short linear motifs (SLiMs) and protein phosphorylation (PP) sites, which were previously shown to be involved in extensive host–virus interaction. Moreover, SLiMs in OVRs appear to have greater functional potential than those in non-overlapping regions. Although the density of experimentally verified SLiMs involved in host–virus interaction, which reside in 9 HIV-1 genes, does not show any bias toward clustering in OVRs, two important proteins, tat and rev, mediate host–pathogen interaction through experimentally verified SLiMs that are mostly localized in OVRs. Finally, our analysis suggests that the acquisition of SLiMs in OVRs is mutually exclusive of the occurrence of disordered residues, while the enrichment of PPs in OVRs depends solely on PID and not on overlapping coding frames. Thus, OVRs of HIV-1 genomes could be demarcated as potential molecular recognition sites during host–virus interaction.
Affiliation(s)
- Deeya Saha
  - Bioinformatics Centre, Bose Institute, Kolkata, India
- Soumita Podder
  - Department of Microbiology, Raiganj University, Raiganj, India
14
Amarillas L, Chaidez C, González-Robles A, Lugo-Melchor Y, León-Félix J. Characterization of novel bacteriophage phiC119 capable of lysing multidrug-resistant Shiga toxin-producing Escherichia coli O157:H7. PeerJ 2016; 4:e2423. [PMID: 27672499 PMCID: PMC5028729 DOI: 10.7717/peerj.2423] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/15/2016] [Accepted: 08/09/2016] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Shiga toxin-producing Escherichia coli (STEC) is one of the most common and widely distributed foodborne pathogens and has frequently been implicated in gastrointestinal and urinary tract infections. Moreover, high rates of multiple-antibiotic-resistant E. coli strains have been reported worldwide. Due to the emergence of antibiotic-resistant strains, bacteriophages are considered an attractive alternative for the biocontrol of pathogenic bacteria. Characterization is a preliminary step towards designing a phage for biocontrol.
METHODS In this study, we describe the characterization of a bacteriophage designated phiC119, which can infect and lyse several multidrug-resistant STEC strains and some Salmonella strains. The phage genome was screened for stx genes using PCR; morphological analysis, host range determination, and genome sequencing were carried out, as well as an analysis of the cohesive ends and identification of the type of genetic material through enzymatic digestion of the genome.
RESULTS Analysis of the bacteriophage particles by transmission electron microscopy showed that the phage has an icosahedral head and a long tail, characteristic of the family Siphoviridae. The phage exhibits a broad host range against multidrug-resistant and highly virulent E. coli isolates. One-step growth experiments revealed that phage phiC119 has a large burst size (210 PFU/cell) and a latent period of 20 min. Based on genomic analysis, the phage contains a linear double-stranded DNA genome of 47,319 bp. The phage encodes 75 putative proteins, but lysogeny and virulence genes were not found in the phiC119 genome.
CONCLUSION These results suggest that phage phiC119 may be a good biological control agent. However, further studies are required to ensure its control of STEC and to confirm the safety of phage use.
Affiliation(s)
- Luis Amarillas
- Laboratorio de Biología Molecular y Genómica Funcional, Centro de Investigación en Alimentación y Desarrollo, A. C., Culiacán, Sinaloa, México; Laboratorio de Genética, Instituto de Investigación Lightbourn, A. C., Cd. Jiménez, Chihuahua, México
- Cristóbal Chaidez
- Inocuidad Alimentaria, Centro de Investigación en Alimentación y Desarrollo, A. C., Culiacán, Sinaloa, México
- Arturo González-Robles
- Departamento de Infectómica y Patogénesis Molecular, Centro de Investigación y de Estudios Avanzados, Instituto Politécnico Nacional, Ciudad de México, México
- Yadira Lugo-Melchor
- Laboratorio de Biología Molecular de la Unidad de Servicios Analíticos y Metrológicos, Centro de Investigación y Asistencia en Tecnología y Diseño del Estado de Jalisco A. C., Guadalajara, Jalisco, México
- Josefina León-Félix
- Laboratorio de Biología Molecular y Genómica Funcional, Centro de Investigación en Alimentación y Desarrollo, A. C., Culiacán, Sinaloa, México
|
15
|
Different patterns of codon usage in the overlapping polymerase and surface genes of hepatitis B virus suggest a de novo origin by modular evolution. J Gen Virol 2015; 96:3577-3586. [DOI: 10.1099/jgv.0.000307] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
The polymerase (P) and surface (S) genes of hepatitis B virus (HBV) show the longest gene overlap in animal viruses. Gene overlaps originate by the overprinting of a novel frame onto an ancestral pre-existing frame. Identifying which frame is ancestral and which frame is de novo (the genealogy of the overlap) is an appealing topic. However, the P/S overlap of HBV is an intriguing paradox, because both genes are indispensable for virus survival. Thus, the hypothesis of a primordial virus without the surface protein or without the polymerase makes no biological sense. With the aim to determine the genealogy of the overlap, the codon usage of the overlapping frames P and S was compared to that of the non-overlapping region. It was found that the overlap of human HBV had two patterns of codon usage. One was localized in the 5′ one-third of the overlap and the other in the 3′ two-thirds. By extending the analysis to non-human HBVs, it was found that this feature occurred in all hepadnaviruses. Under the assumption that the ancestral frame has a codon usage significantly closer to that of the non-overlapping region than the de novo frame, the ancestral frames in the 5′ and 3′ region of the overlap could be predicted. They were, respectively, frame S and frame P. These results suggest that the spacer domain of the polymerase and the S domain of the surface protein originated de novo by overprinting. They support a modular evolution hypothesis for the origin of the overlap.
|
16
|
Wei X, Zhang J. A simple method for estimating the strength of natural selection on overlapping genes. Genome Biol Evol 2014; 7:381-90. [PMID: 25552532 PMCID: PMC4316641 DOI: 10.1093/gbe/evu294] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022] Open
Abstract
Overlapping genes, where one DNA sequence codes for two proteins with different reading frames, are not uncommon in viruses and cellular organisms. Estimating the direction and strength of natural selection acting on overlapping genes is important for understanding their functionality, origin, evolution, maintenance, and potential interaction. However, the standard methods for estimating synonymous (dS) and nonsynonymous (dN) nucleotide substitution rates are inapplicable here because a nucleotide change can be simultaneously synonymous and nonsynonymous when both reading frames involved are considered. We have developed a simple method that can estimate dN/dS and test for the action of natural selection in each relevant reading frame of the overlapping genes. Our method is an extension of the modified Nei-Gojobori method previously developed for nonoverlapping genes. We confirmed the reliability of our method using extensive computer simulation. Applying this method, we studied the longest human sense–antisense overlapping gene pair, LRRC8E and ENSG00000214248. Although LRRC8E (leucine-rich repeat containing eight family, member E) is known to regulate cell size, the function of ENSG00000214248 is unknown. Our analysis revealed purifying selection on ENSG00000214248 and suggested that it originated in the common ancestor of bony vertebrates.
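The point that one nucleotide change can be synonymous in one reading frame and nonsynonymous in the other can be illustrated with a small sketch. This is not the method of Wei and Zhang; it is only a toy illustration using the standard genetic code, and the sequence, the function names `translate` and `classify`, and the chosen substitution are all hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate seq from offset `frame` (0, 1 or 2); a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def classify(seq, pos, new_base, frame):
    """Classify a single substitution at 0-based `pos` as seen from one reading frame.

    Assumes `pos` lies inside a complete codon of that frame.
    """
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    same = translate(seq, frame) == translate(mutated, frame)
    return "synonymous" if same else "nonsynonymous"


# Hypothetical 9-nt stretch read in two overlapping frames.
seq = "ATGCTTGAA"
print(classify(seq, 5, "C", frame=0))  # CTT -> CTC, Leu stays Leu
print(classify(seq, 5, "C", frame=1))  # TTG -> TCG, Leu becomes Ser
```

Because the same site falls in the third codon position of one frame and the second position of the other, counting it once as "synonymous" or once as "nonsynonymous" would misrepresent at least one of the two genes.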
Affiliation(s)
- Xinzhu Wei
- Department of Ecology and Evolutionary Biology, University of Michigan
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan
|
17
|
Overlapping genes: a new strategy of thermophilic stress tolerance in prokaryotes. Extremophiles 2014; 19:345-53. [PMID: 25503326 DOI: 10.1007/s00792-014-0720-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/09/2014] [Accepted: 12/01/2014] [Indexed: 12/29/2022]
Abstract
Overlapping genes (OGs) have drawn the focus of recent research. However, the significance of OGs in prokaryotic genomes has remained unexplored. As an adaptation to high temperature, thermophiles were shown to eliminate their intergenic regions. Therefore, it is possible that prokaryotes increase their OG content to adapt to high temperature. To test this hypothesis, we carried out a comparative study on the OG frequency of 256 prokaryotic genomes comprising both thermophiles and non-thermophiles. It was found that thermophiles exhibit a higher frequency of overlapping genes than non-thermophiles. Moreover, overlap frequency was found to correlate with optimal growth temperature (OGT) in prokaryotes. Long overlap frequency was found to hold a positive correlation with OGT, resulting in an abundance of long overlaps in thermophiles compared to non-thermophiles. On the other hand, short overlap (1-4 nucleotides) frequency (SOF) did not yield any direct correlation with OGT. However, the correlations of SOF with CAIavg (extent of variation of codon usage bias, measured as the mean of the codon adaptation index of all genes in a given genome) and IG% (proportion of intergenic regions) indicate that short overlaps might upregulate these factors, which are already known to be vital forces for thermophilic adaptation. From this evidence, we propose that the OG content bears a strong link to thermophily. Long overlaps are important for genome compaction and short overlaps are important to uphold high CAIavg. Our findings should help in better understanding the significance of overlapping gene content in prokaryotic genomes.
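The distinction between short (1-4 nucleotide) and long overlaps can be sketched from gene coordinates alone. The abstract does not say how overlap frequency was normalized, so the per-gene normalization below is an assumption, as are the function names, the example coordinates, and the restriction to neighbouring genes after sorting by start position.

```python
def overlap_lengths(genes):
    """Return overlap lengths (in nt) between neighbouring genes.

    `genes` is a list of (start, end) pairs, 1-based and inclusive.
    Only genes adjacent after sorting by start are compared (an assumption).
    """
    ordered = sorted(genes)
    lengths = []
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        shared = min(e1, e2) - s2 + 1  # number of positions covered by both genes
        if shared > 0:
            lengths.append(shared)
    return lengths


def overlap_frequencies(genes, short_max=4):
    """Short and long overlap counts divided by gene count (assumed normalization)."""
    lengths = overlap_lengths(genes)
    short = sum(1 for n in lengths if n <= short_max)
    return {"short": short / len(genes), "long": (len(lengths) - short) / len(genes)}


# Hypothetical coordinates: one 4-nt overlap, one 51-nt overlap, one gap.
genes = [(1, 100), (97, 200), (150, 400), (500, 600)]
print(overlap_lengths(genes))      # [4, 51]
print(overlap_frequencies(genes))  # {'short': 0.25, 'long': 0.25}
```

Strand and reading frame are ignored here; a real analysis would likely need both.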
|
18
|
Shukla A, Hilgenfeld R. Acquisition of new protein domains by coronaviruses: analysis of overlapping genes coding for proteins N and 9b in SARS coronavirus. Virus Genes 2014; 50:29-38. [PMID: 25410051 PMCID: PMC7089080 DOI: 10.1007/s11262-014-1139-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 07/17/2014] [Accepted: 10/25/2014] [Indexed: 12/02/2022]
Abstract
Acquisition of new proteins by viruses usually occurs through horizontal gene transfer or through gene duplication, but another, less common mechanism is the usage of completely or partially overlapping reading frames. A case of acquisition of a completely new protein through introduction of a start codon in an alternative reading frame is the protein encoded by open reading frame (orf) 9b of SARS coronavirus. This gene completely overlaps with the nucleocapsid (N) gene (orf9a). Our findings indicate that the orf9b gene features a discordant codon-usage pattern. We analyzed the evolution of orf9b in concert with orf9a using sequence data of betacoronavirus-lineage b and found that orf9b, which encodes the overprinting protein, evolved largely independently of the overprinted orf9a. We also examined the protein products of these genomic sequences for their structural flexibility and found that it is not necessary for a newly acquired, overlapping protein product to be intrinsically disordered, in contrast to earlier suggestions. Our findings contribute to characterizing sequence properties of newly acquired genes making use of overlapping reading frames.
Affiliation(s)
- Aditi Shukla
- Institute of Biochemistry, Center for Structural and Cell Biology in Medicine, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- Graduate School for Computing in Medicine & Life Sciences, University of Lübeck, Lübeck, Germany
- Rolf Hilgenfeld
- Institute of Biochemistry, Center for Structural and Cell Biology in Medicine, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- German Center for Infection Research (DZIF), University of Lübeck, Lübeck, Germany
|
19
|
Doore SM, Baird CD, Roznowski AP, Fane BA. The Evolution of Genes within Genes and the Control of DNA Replication in Microviruses. Mol Biol Evol 2014; 31:1421-31. [DOI: 10.1093/molbev/msu089] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
|
20
|
Lo MK, Søgaard TM, Karlin DG. Evolution and structural organization of the C proteins of paramyxovirinae. PLoS One 2014; 9:e90003. [PMID: 24587180 PMCID: PMC3934983 DOI: 10.1371/journal.pone.0090003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 12/19/2013] [Accepted: 01/24/2014] [Indexed: 12/21/2022] Open
Abstract
The phosphoprotein (P) gene of most Paramyxovirinae encodes several proteins in overlapping frames: P and V, which share a common N-terminus (PNT), and C, which overlaps PNT. Overlapping genes are of particular interest because they encode proteins originated de novo, some of which have unknown structural folds, challenging the notion that nature utilizes only a limited, well-mapped area of fold space. The C proteins cluster in three groups, comprising measles, Nipah, and Sendai virus. We predicted that all C proteins have a similar organization: a variable, disordered N-terminus and a conserved, α-helical C-terminus. We confirmed this predicted organization by biophysically characterizing recombinant C proteins from Tupaia paramyxovirus (measles group) and human parainfluenza virus 1 (Sendai group). We also found that the C of the measles and Nipah groups have statistically significant sequence similarity, indicating a common origin. Although the C of the Sendai group lack sequence similarity with them, we speculate that they also have a common origin, given their similar genomic location and structural organization. Since C is dispensable for viral replication, unlike PNT, we hypothesize that C may have originated de novo by overprinting PNT in the ancestor of Paramyxovirinae. Intriguingly, in measles virus and Nipah virus, PNT encodes STAT1-binding sites that overlap different regions of the C-terminus of C, indicating they have probably originated independently. This arrangement, in which the same genetic region encodes simultaneously a crucial functional motif (a STAT1-binding site) and a highly constrained region (the C-terminus of C), seems paradoxical, since it should severely reduce the ability of the virus to adapt. The fact that it originated twice suggests that it must be balanced by an evolutionary advantage, perhaps from reducing the size of the genetic region vulnerable to mutations.
Affiliation(s)
- Michael K. Lo
- Centers for Disease Control and Prevention, Viral Special Pathogens Branch, Atlanta, Georgia, United States of America
- Teit Max Søgaard
- Division of Structural Biology, Oxford University, Oxford, United Kingdom
- David G. Karlin
- Division of Structural Biology, Oxford University, Oxford, United Kingdom
- Department of Zoology, University of Oxford, Oxford, United Kingdom
|
21
|
Labonté JM, Suttle CA. Metagenomic and whole-genome analysis reveals new lineages of gokushoviruses and biogeographic separation in the sea. Front Microbiol 2013; 4:404. [PMID: 24399999 PMCID: PMC3871881 DOI: 10.3389/fmicb.2013.00404] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 10/12/2013] [Accepted: 12/06/2013] [Indexed: 01/20/2023] Open
Abstract
Much remains to be learned about single-stranded (ss) DNA viruses in natural systems, and the evolutionary relationships among them. One of the eight recognized families of ssDNA viruses is the Microviridae, a group of viruses infecting bacteria. In this study we used metagenomic analysis, genome assembly, and amplicon sequencing of purified ssDNA to show that bacteriophages belonging to the subfamily Gokushovirinae within the Microviridae are genetically diverse and widespread members of marine microbial communities. Metagenomic analysis of coastal samples from the Gulf of Mexico (GOM) and British Columbia, Canada, revealed numerous sequences belonging to gokushoviruses and allowed the assembly of five putative genomes with an organization similar to chlamydiamicroviruses. Fragment recruitment to these genomes from different metagenomic data sets is consistent with gokushovirus genotypes being restricted to specific oceanic regions. Conservation among the assembled genomes allowed the design of degenerate primers that target an 800 bp fragment from the gene encoding the major capsid protein. Sequences could be amplified from coastal temperate and subtropical waters, but not from samples collected from the Arctic Ocean, or freshwater lakes. Phylogenetic analysis revealed that most sequences were distantly related to those from cultured representatives. Moreover, the sequences fell into at least seven distinct evolutionary groups, most of which were represented by one of the assembled metagenomes. Our results greatly expand the known sequence space for gokushoviruses, and reveal biogeographic separation and new evolutionary lineages of gokushoviruses in the oceans.
Affiliation(s)
- Jessica M Labonté
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada
- Curtis A Suttle
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada; Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, Vancouver, BC, Canada; Department of Botany, University of British Columbia, Vancouver, BC, Canada; Canadian Institute for Advanced Research, University of British Columbia, Vancouver, BC, Canada
|
22
|
Rodriguez-Frias F, Buti M, Tabernero D, Homs M. Quasispecies structure, cornerstone of hepatitis B virus infection: mass sequencing approach. World J Gastroenterol 2013; 19:6995-7023. [PMID: 24222943 PMCID: PMC3819535 DOI: 10.3748/wjg.v19.i41.6995] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 06/28/2013] [Revised: 08/23/2013] [Accepted: 09/15/2013] [Indexed: 02/06/2023] Open
Abstract
Hepatitis B virus (HBV) is a DNA virus with complex replication, and high replication and mutation rates, leading to a heterogeneous viral population. The population is comprised of genomes that are closely related, but not identical; hence, HBV is considered a viral quasispecies. Quasispecies variability may be somewhat limited by the high degree of overlapping between the HBV coding regions, which is especially important in the P and S gene overlapping regions, but is less significant in the X and preCore/Core genes. Despite this restriction, several clinically and pathologically relevant variants have been characterized along the viral genome. Next-generation sequencing (NGS) approaches enable high-throughput analysis of thousands of clonally amplified regions and are powerful tools for characterizing genetic diversity in viral strains. In the present review, we update the information regarding HBV variability and present a summary of the various NGS approaches available for research in this virus. In addition, we provide an analysis of the clinical implications of HBV variants and their study by NGS.
|
23
|
Viral proteins originated de novo by overprinting can be identified by codon usage: application to the "gene nursery" of Deltaretroviruses. PLoS Comput Biol 2013; 9:e1003162. [PMID: 23966842 PMCID: PMC3744397 DOI: 10.1371/journal.pcbi.1003162] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 12/26/2012] [Accepted: 06/13/2013] [Indexed: 12/24/2022] Open
Abstract
A well-known mechanism through which new protein-coding genes originate is by modification of pre-existing genes, e.g. by duplication or horizontal transfer. In contrast, many viruses generate protein-coding genes de novo, via the overprinting of a new reading frame onto an existing (“ancestral”) frame. This mechanism is thought to play an important role in viral pathogenicity, but has been poorly explored, perhaps because identifying the de novo frames is very challenging. Therefore, a new approach to detect them was needed. We assembled a reference set of overlapping genes for which we could reliably determine the ancestral frames, and found that their codon usage was significantly closer to that of the rest of the viral genome than the codon usage of de novo frames. Based on this observation, we designed a method that allowed the identification of de novo frames based on their codon usage with a very good specificity, but intermediate sensitivity. Using our method, we predicted that the Rex gene of deltaretroviruses has originated de novo by overprinting the Tax gene. Intriguingly, several genes in the same genomic region have also originated de novo and encode proteins that regulate the functions of Tax. Such “gene nurseries” may be common in viral genomes. Finally, our results confirm that the genomic GC content is not the only determinant of codon usage in viruses and suggest that a constraint linked to translation must influence codon usage. How does novelty originate in nature? It is commonly thought that new genes are generated mainly by modifications of existing genes (the “tinkering” model). In contrast, we have shown recently that in viruses, numerous genes are generated entirely de novo (“from scratch”). The role of these genes remains underexplored, however, because they are difficult to identify. 
We have therefore developed a new method to detect genes originated de novo in viral genomes, based on the observation that each viral genome has a unique “signature”, which genes originated de novo do not share. We applied this method to analyze the genes of Human T-Lymphotropic Virus 1 (HTLV1), a relative of the HIV virus and also a major human pathogen that infects about twenty million people worldwide. The life cycle of HTLV1 is finely regulated – it can stay dormant for long periods and can provoke blood cancers (leukemias) after a very long incubation. We discovered that several of the genes of HTLV1 have originated de novo. These novel genes play a key role in regulating the life cycle of HTLV1, and presumably its pathogenicity. Our investigations suggest that such “gene nurseries” may be common in viruses.
|
24
|
Kawano Y, Neeley S, Adachi K, Nakai H. An experimental and computational evolution-based method to study a mode of co-evolution of overlapping open reading frames in the AAV2 viral genome. PLoS One 2013; 8:e66211. [PMID: 23826091 PMCID: PMC3691236 DOI: 10.1371/journal.pone.0066211] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 02/15/2013] [Accepted: 05/07/2013] [Indexed: 02/07/2023] Open
Abstract
Overlapping open reading frames (ORFs) in viral genomes undergo co-evolution; however, how individual amino acids coded by overlapping ORFs are structurally, functionally, and co-evolutionarily constrained remains difficult to address by conventional homologous sequence alignment approaches. We report here a new experimental and computational evolution-based methodology to address this question and report its preliminary application to elucidating a mode of co-evolution of the frame-shifted overlapping ORFs in the adeno-associated virus (AAV) serotype 2 viral genome. These ORFs encode both capsid VP protein and non-structural assembly-activating protein (AAP). To show proof of principle of the new method, we focused on the evolutionarily conserved QVKEVTQ and KSKRSRR motifs, a pair of overlapping heptapeptides in VP and AAP, respectively. In the new method, we first identified a large number of capsid-forming VP3 mutants and functionally competent AAP mutants of these motifs from mutant libraries by experimental directed evolution under no co-evolutionary constraints. We used Illumina sequencing to obtain a large dataset and then statistically assessed the viability of VP and AAP heptapeptide mutants. The obtained heptapeptide information was then integrated into an evolutionary algorithm, with which VP and AAP were co-evolved from random or native nucleotide sequences in silico. As a result, we demonstrate that these two heptapeptide motifs could exhibit high degeneracy if coded by separate nucleotide sequences, and elucidate how overlap-evoked co-evolutionary constraints play a role in making the VP and AAP heptapeptide sequences into the present shape. 
Specifically, we demonstrate that two valine (V) residues and β-strand propensity in QVKEVTQ are structurally important, the strongly negative and hydrophilic nature of KSKRSRR is functionally important, and overlap-evoked co-evolution imposes strong constraints on serine (S) residues in KSKRSRR, despite high degeneracy of the motifs in the absence of co-evolutionary constraints.
Affiliation(s)
- Yasuhiro Kawano
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
- Takara Bio Inc., Otsu Shiga, Japan
- Shane Neeley
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
- Kei Adachi
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
- Hiroyuki Nakai
- Department of Molecular and Medical Genetics, Oregon Health and Science University School of Medicine, Portland, Oregon, United States of America
|
25
|
Simon-Loriere E, Holmes EC, Pagán I. The effect of gene overlapping on the rate of RNA virus evolution. Mol Biol Evol 2013; 30:1916-28. [PMID: 23686658 DOI: 10.1093/molbev/mst094] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 01/19/2023] Open
Abstract
Gene overlapping is widely employed by RNA viruses to generate genetic novelty while retaining a small genome size. However, gene overlapping also increases the deleterious effect of mutations as they affect more than one gene, thereby reducing the evolutionary rate of RNA viruses and hence their adaptive capacity. Although there is general agreement on the benefits of gene overlapping as a mechanism of genomic compression for rapidly evolving organisms, its effect on the pace of RNA virus evolution remains a source of debate. To address this issue, we collected sequence data from 117 instances of gene overlapping across 19 families, 30 genera, and 55 species of RNA viruses. On these data, we analyzed how genetic distances, selective pressures, and the distribution of RNA secondary structures and conserved protein functional domains vary between overlapping (OV) and nonoverlapping (NOV) regions. We show that gene overlapping generally results in a decrease in the rate of RNA virus evolution through a reduction in the frequency of synonymous mutations. However, this effect is less pronounced in genes with a terminal rather than an internal gene overlap, which might result from a greater proportion of protein functional conserved domains in NOV than in OV regions, in turn reducing the number of nonsynonymous mutations in the former. Overall, our analyses clarify the role of gene overlapping as a modulator of the evolutionary rates exhibited by RNA viruses and shed light on the factors that shape the genetic diversity of this important group of pathogens.
Affiliation(s)
- Etienne Simon-Loriere
- Institut Pasteur, Unité de Génétique Fonctionnelle des Maladies Infectieuses, Paris, France
|
26
|
Torres C, Fernández MDB, Flichman DM, Campos RH, Mbayed VA. Influence of overlapping genes on the evolution of human hepatitis B virus. Virology 2013; 441:40-8. [PMID: 23541083 DOI: 10.1016/j.virol.2013.02.027] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 01/08/2013] [Revised: 02/05/2013] [Accepted: 02/28/2013] [Indexed: 12/23/2022]
Abstract
The aim of this work was to analyse the influence of overlapping genes on the evolution of hepatitis B virus (HBV). A differential evolutionary behaviour among genetic regions and clinical status was found. Dissimilar levels of conservation of the different protein regions could derive from alternative mechanisms to maintain functionality. We propose that, in overlapping regions, selective constraints on one of the genes could drive the substitution process. This would allow protein conservation in one gene by synonymous substitutions while mechanisms of tolerance to the change operate in the overlapping gene (e.g. usage of amino acids with high-degeneracy codons, differential codon usage and replacement by physicochemically similar amino acids). In addition, differential selection pressure according to the HBeAg status was found in all genes, suggesting that the immune response could be one of the factors that would constrain viral replication by interacting with different HBV proteins during the HBeAg(-) stage.
Affiliation(s)
- Carolina Torres
- Cátedra de Virología, Facultad de Farmacia y Bioquímica, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina; CONICET, Argentina
|
27
|
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast. Virology 2012; 434:278-84. [DOI: 10.1016/j.virol.2012.09.020] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Received: 08/23/2012] [Revised: 09/15/2012] [Accepted: 09/21/2012] [Indexed: 11/20/2022]
|
28
|
Seligmann H. Overlapping genetic codes for overlapping frameshifted genes in Testudines, and Lepidochelys olivacea as special case. Comput Biol Chem 2012; 41:18-34. [DOI: 10.1016/j.compbiolchem.2012.08.002] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 05/06/2011] [Revised: 03/14/2012] [Accepted: 08/05/2012] [Indexed: 11/29/2022]
|
29
|
Zarghani SN, Shams-Bakhsh M, Zand N, Sokhandan-Bashir N, Pazhouhandeh M. Genetic analysis of Iranian population of Potato leafroll virus based on ORF0. Virus Genes 2012; 45:567-74. [PMID: 22903753 DOI: 10.1007/s11262-012-0804-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/15/2012] [Accepted: 08/06/2012] [Indexed: 11/25/2022]
Abstract
Potato leafroll virus (PLRV) is a destructive virus of potatoes and is responsible for high yield losses wherever potatoes are grown. In this study, DNA fragments containing ORF0 from each of nine PLRV isolates were sequenced. Sequence analysis using 36 isolates from 12 different countries, including 14 Iranian isolates, showed that the identities of ORF0 at both nucleotide and amino acid levels between the Iranian isolates were 96-100% and that these isolates were more similar to the European PLRV isolates than to the other isolates. Furthermore, phylogenetic and population genetic analyses were carried out on the basis of full-length ORF0 and the overlapping and non-overlapping regions of ORF0 and ORF1 (ORF0/1), which revealed that PLRV isolates were not geographically resolved. We also identified negative selection with different ratios for each of the mentioned genomic regions, suggesting effects of the F-box motif and the -1 frameshift on the selection pressure acting on the ORF0 non-overlapping region and ORF0/1, respectively. Five recombination events were detected in the Iranian, Australian, and European isolates, suggesting an important role for this phenomenon in influencing genetic diversity within this virus population.
|
30
|
Sabath N, Wagner A, Karlin D. Evolution of viral proteins originated de novo by overprinting. Mol Biol Evol 2012; 29:3767-80. [PMID: 22821011 PMCID: PMC3494269 DOI: 10.1093/molbev/mss179] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Indexed: 01/02/2023] Open
Abstract
New protein-coding genes can originate either through modification of existing genes or de novo. Recently, the importance of de novo origination has been recognized in eukaryotes, although eukaryotic genes originated de novo are relatively rare and difficult to identify. In contrast, viruses contain many de novo genes, namely those in which an existing gene has been “overprinted” by a new open reading frame, a process that generates a new protein-coding gene overlapping the ancestral gene. We analyzed the evolution of 12 experimentally validated viral genes that originated de novo and estimated their relative ages. We found that young de novo genes have a different codon usage from the rest of the genome. They evolve rapidly and are under positive or weak purifying selection. Thus, young de novo genes might have strain-specific functions, or no function, and would be difficult to detect using current genome annotation methods that rely on the sequence signature of purifying selection. In contrast to young de novo genes, older de novo genes have a codon usage that is similar to the rest of the genome. They evolve slowly and are under stronger purifying selection. Some of the oldest de novo genes evolve under stronger selection pressure than the ancestral gene they overlap, suggesting an evolutionary tug of war between the ancestral and the de novo gene.
Affiliation(s)
- Niv Sabath
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
|
31
|
Ma MR, Ha XQ, Ling H, Wang ML, Zhang FX, Zhang SD, Li G, Yan W. The characteristics of the synonymous codon usage in hepatitis B virus and the effects of host on the virus in codon usage pattern. Virol J 2011; 8:544. [PMID: 22171933 PMCID: PMC3287100 DOI: 10.1186/1743-422x-8-544] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 09/06/2011] [Accepted: 12/15/2011] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Hepatitis B virus (HBV) infection is one of the main human health problems and causes chronic infection in a large number of patients worldwide. As the replication of HBV depends on its host cell system, the codon usage pattern of the viral genes might be subject to two main selective forces, namely mutation pressure and translation selection. A deeper investigation of the relationship between HBV evolution and the host adaptive response might therefore assist in controlling this disease. RESULT Relative synonymous codon usage (RSCU) values for the whole HBV coding sequence were studied by principal component analysis (PCA). The characteristics of the synonymous codon usage patterns, the nucleotide contents, and the comparison between ENC values of the whole HBV coding sequence indicated that an interaction between virus mutation pressure and host translation selection exists in the process of HBV evolution. The synonymous codon usage pattern of HBV is a mixture of coincidence with and antagonism to that of the host cell. However, no difference in the genetic characteristics of HBV was observed among its epidemic areas or subtypes, suggesting that geographic factors have limited influence on the evolution of this virus, whereas the genetic characteristics based on HBV genotypes could be divided into three groups, namely (i) genotypes A and E, (ii) genotype B, and (iii) genotypes C, D and G. CONCLUSION Codon usage patterns from PCA for the identification of evolutionary trends in HBV provide an alternative approach to understanding the evolution of HBV. Furthermore, the combined action of mutation pressure and translation selection on codon usage might shed light on the evolutionary trends of HBV genotypes.
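The RSCU statistic named in this abstract can be illustrated with a short sketch. It uses the conventional definition (observed count of a codon divided by the mean count across its synonymous codons) and the standard genetic code; the function name `rscu` and the toy sequence are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (stop codons = "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))

    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        for c in codons:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(codons) / total
    return values


example = rscu("GCTGCTGCTGCC")  # four alanine codons: 3x GCT, 1x GCC
print(example["GCT"], example["GCC"], example["GCA"])
```

A value above 1 means the codon is used more often than expected under uniform synonymous usage; a matrix of such values across genomes is the kind of input a PCA would take.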
Affiliation(s)
- Ming-ren Ma
- Experimental Center of Medicine, Lanzhou General Hospital, Lanzhou Military Area Command, Lanzhou 730000, China.
|
32
|
Krupovic M, Prangishvili D, Hendrix RW, Bamford DH. Genomics of bacterial and archaeal viruses: dynamics within the prokaryotic virosphere. Microbiol Mol Biol Rev 2011; 75:610-35. [PMID: 22126996 PMCID: PMC3232739 DOI: 10.1128/mmbr.00011-11] [Citation(s) in RCA: 158] [Impact Index Per Article: 11.3] [Indexed: 12/11/2022] Open
Abstract
Prokaryotes, bacteria and archaea, are the most abundant cellular organisms among those sharing the planet Earth with human beings (among others). However, numerous ecological studies have revealed that it is actually prokaryotic viruses that predominate on our planet and outnumber their hosts by at least an order of magnitude. An understanding of how this viral domain is organized and what are the mechanisms governing its evolution is therefore of great interest and importance. The vast majority of characterized prokaryotic viruses belong to the order Caudovirales, double-stranded DNA (dsDNA) bacteriophages with tails. Consequently, these viruses have been studied (and reviewed) extensively from both genomic and functional perspectives. However, albeit numerous, tailed phages represent only a minor fraction of the prokaryotic virus diversity. Therefore, the knowledge which has been generated for this viral system does not offer a comprehensive view of the prokaryotic virosphere. In this review, we discuss all families of bacterial and archaeal viruses that contain more than one characterized member and for which evolutionary conclusions can be attempted by use of comparative genomic analysis. We focus on the molecular mechanisms of their genome evolution as well as on the relationships between different viral groups and plasmids. It becomes clear that evolutionary mechanisms shaping the genomes of prokaryotic viruses vary between different families and depend on the type of the nucleic acid, characteristics of the virion structure, as well as the mode of the life cycle. We also point out that horizontal gene transfer is not equally prevalent in different virus families and is not uniformly unrestricted for diverse viral functions.
Affiliation(s)
- Mart Krupovic
- Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Département de Microbiologie, 25 rue du Dr. Roux, 75015 Paris, France.
|
33
|
Successful COG8 and PDF overlap is mediated by alterations in splicing and polyadenylation signals. Hum Genet 2011; 131:265-74. [PMID: 21805148 DOI: 10.1007/s00439-011-1075-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/23/2011] [Accepted: 07/19/2011] [Indexed: 01/21/2023]
Abstract
Although gene-free areas compose the great majority of eukaryotic genomes, a significant fraction of genes overlaps, i.e., unique nucleotide sequences are part of more than one transcription unit. In this work, the evolutionary history and origin of a same-strand gene overlap is dissected through the analysis of COG8 (component of oligomeric Golgi complex 8) and PDF (peptide deformylase). Comparative genomic surveys reveal that the relative locations of these two genes have been changing over the last 445 million years from distinct chromosomal locations in fish to overlapping in rodents and primates, indicating that the overlap between these genes precedes their divergence. The overlap between the two genes was initiated by the gain of a novel splice donor site between the COG8 stop codon and PDF initiation codon. Splicing is accomplished by the use of the PDF acceptor, leading COG8 to share the 3'end with PDF. In primates, loss of the ancestral polyadenylation signal for COG8 makes the overlap between COG8 and PDF mandatory, while in mouse and rat concurrent overlapping and non-overlapping Cog8 transcripts exist. Altogether, we demonstrate that the origin, evolution and preservation of the COG8/PDF same-strand overlap follow similar mechanistic steps as those documented for antisense overlaps where gain and/or loss of splice sites and polyadenylation signals seems to drive the process.
|
34
|
Immune-induced evolutionary selection focused on a single reading frame in overlapping hepatitis B virus proteins. J Virol 2011; 85:4558-66. [PMID: 21307195 DOI: 10.1128/jvi.02142-10] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open
Abstract
Viruses employ various means to evade immune detection. Reduction of CD8(+) T cell epitopes is one of the common strategies used for this purpose. Hepatitis B virus (HBV), a member of the Hepadnaviridae family, has four open reading frames, with about 50% overlap between the genes they encode. We computed the CD8(+) T cell epitope density within HBV proteins and the mutations within the epitopes. Our results suggest that HBV accumulates escape mutations that reduce the number of epitopes. These mutations are not equally distributed among genes and reading frames. While the highly expressed core and X proteins are selected to have low epitope density, polymerase, which is expressed at low levels, does not undergo the same selection. In overlapping regions, mutations in one protein-coding sequence also affect the other protein-coding sequence. We show that mutations lead to the removal of epitopes in X and surface proteins even at the expense of the addition of epitopes in polymerase. The total escape mutation rate for overlapping regions is lower than that for nonoverlapping regions. The lower epitope replacement rate for overlapping regions slows the evolutionary escape rate of these regions but leads to the accumulation of mutations more robust in the transfer between hosts, such as mutations preventing proteasomal cleavage into epitopes.
|
35
|
Pagán I, Holmes EC. Long-term evolution of the Luteoviridae: time scale and mode of virus speciation. J Virol 2010; 84:6177-87. [PMID: 20375155 PMCID: PMC2876656 DOI: 10.1128/jvi.02160-09] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Received: 10/12/2009] [Accepted: 03/31/2010] [Indexed: 12/20/2022] Open
Abstract
Despite their importance as agents of emerging disease, the time scale and evolutionary processes that shape the appearance of new viral species are largely unknown. To address these issues, we analyzed intra- and interspecific evolutionary processes in the Luteoviridae family of plant RNA viruses. Using the coat protein gene of 12 members of the family, we determined their phylogenetic relationships, rates of nucleotide substitution, times to common ancestry, and patterns of speciation. An associated multigene analysis enabled us to infer the nature of selection pressures and the genomic distribution of recombination events. Although rates of evolutionary change and selection pressures varied among genes and species and were lower in some overlapping gene regions, all fell within the range of those seen in animal RNA viruses. Recombination breakpoints were commonly observed at gene boundaries but less so within genes. Our molecular clock analysis suggested that the origin of the currently circulating Luteoviridae species occurred within the last 4 millennia, with intraspecific genetic diversity arising within the last few hundred years. Speciation within the Luteoviridae may therefore be associated with the expansion of agricultural systems. Finally, our phylogenetic analysis suggested that viral speciation events tended to occur within the same plant host species and country of origin, as expected if speciation is largely sympatric, rather than allopatric, in nature.
Affiliation(s)
- Israel Pagán
- Center for Infectious Disease Dynamics, Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA.
|
36
|
Sequence variability and evolution of the terminal overlapping VP5 gene of the infectious bursal disease virus. Virus Genes 2010; 41:59-66. [DOI: 10.1007/s11262-010-0485-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 11/01/2009] [Accepted: 04/15/2010] [Indexed: 10/19/2022]
|
37
|
Zhang D, Chen J, Deng L, Mao Q, Zheng J, Wu J, Zeng C, Li Y. Evolutionary selection associated with the multi-function of overlapping genes in the hepatitis B virus. Infect Genet Evol 2010; 10:84-8. [DOI: 10.1016/j.meegid.2009.10.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 07/01/2009] [Revised: 10/10/2009] [Accepted: 10/20/2009] [Indexed: 11/16/2022]
|
38
|
Liang JW, Tian FL, Lan ZR, Huang B, Zhuang WZ. Selection characterization on overlapping reading frame of multiple-protein-encoding P gene in Newcastle disease virus. Vet Microbiol 2009; 144:257-63. [PMID: 20079581 DOI: 10.1016/j.vetmic.2009.12.029] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 09/15/2009] [Accepted: 12/21/2009] [Indexed: 01/08/2023]
Abstract
The aim of this study was to characterize the molecular evolution of P and V protein genes of the Newcastle disease virus (NDV). The P gene sequences of 55 NDV isolates, representing different chronological and geographic origins, were obtained from GenBank. In this paper, the evolution of the specific regions of the NDV P gene, encoding the P and V proteins, was analyzed. The nucleotides from the shared P/V region encoded the co-amino terminus of the two proteins, while the P-V/V-P region was respectively encoded by the nucleotides within the P ORF or the V ORF in the common sequence (after the mRNA editing site). As well, the P-cut region exclusively encoded the P protein. Finally, the P-V and V-P regions were further broken down into P1 and P2 fragments with the corresponding V1 and V2 fragments. In the P gene, the P-cut portion corresponding to the C-terminal of the P protein was the most highly conserved, while the P-V region was the most variable. This was interpreted as a lower constraint for function in the common sequence than in the unique P sequence that is known to contain an important function. Interestingly, in the common P-V/V-P function, variability of V1 was compensated by a higher conservation of the corresponding P1, and conversely for the P2/V2, which suggested that the flexibility of one ORF with less function served the purpose of allowing positive selection in the other overlapping ORF that exhibited more function.
Affiliation(s)
- Jun-Wen Liang
- College of Life Science, Shandong Normal University, Wenhua East Road, Shandong Province, Jinan 250014, China
|
39
|
Dickins B, Nekrutenko A. High-resolution mapping of evolutionary trajectories in a phage. Genome Biol Evol 2009; 1:294-307. [PMID: 20333199 PMCID: PMC2817424 DOI: 10.1093/gbe/evp029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Accepted: 07/29/2009] [Indexed: 12/11/2022] Open
Abstract
Experimental evolution in rapidly reproducing viruses offers a robust means to infer substitution trajectories during evolution. But with conventional approaches, this inference is limited by how many individual genotypes can be sampled from the population at a time. Low-frequency changes are difficult to detect, potentially rendering early stages of adaptation unobservable. Here we circumvent this using short-read sequencing technology in a fine-grained analysis of polymorphism dynamics in the sentinel organism: a single-stranded DNA phage PhiX174. Nucleotide differences were educed from noise with binomial filtering methods that harnessed quality scores and separate data from brief phage amplifications. Remarkably, a significant degree of variation was observed in all samples including those grown in brief 2-h cultures. Sites previously reported as subject to high-frequency polymorphisms over a course of weeks exhibited monotonic increases in polymorphism frequency within hours in this study. Additionally, even with limitations imposed by the short length of sequencing reads, we were able to observe statistically significant linkage among polymorphic sites in evolved lineages. Additional parallels between replicate lineages were apparent in the sharing of polymorphic sites and in correlated polymorphism frequencies. Missense mutations were more likely to occur than silent mutations. This study offers the first glimpse into "real-time" substitution dynamics and offers a robust conceptual framework for future viral resequencing studies.
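As a rough illustration of what a binomial filter informed by quality scores could look like, the sketch below asks how likely it is to see at least `variant_reads` mismatching reads at a site if every mismatch were a sequencing error. The abstract does not specify the filter actually used, so the function names, the single per-site error rate, and the `alpha` threshold are assumptions made here for illustration only.

```python
from math import comb


def phred_to_error(q):
    """Convert a Phred quality score to an error probability (Q = -10 log10 p)."""
    return 10 ** (-q / 10)


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_candidate_polymorphism(variant_reads, depth, error_rate=0.01, alpha=1e-3):
    """Flag a site when its variant count is unlikely under sequencing error alone.

    error_rate and alpha are hypothetical defaults, not values from the study.
    """
    return binomial_tail(variant_reads, depth, error_rate) < alpha


# Ten variant reads out of 100 at Q20 are hard to explain by error alone;
# a single variant read is not.
rate = phred_to_error(20)
print(is_candidate_polymorphism(10, 100, rate))
print(is_candidate_polymorphism(1, 100, rate))
```

A real pipeline would also need to correct for testing many sites at once, which this sketch leaves out.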
Affiliation(s)
- Benjamin Dickins
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, USA.
|
40
|
Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation. J Virol 2009; 83:10719-36. [PMID: 19640978 DOI: 10.1128/jvi.00595-09] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.9] [Indexed: 11/20/2022] Open
Abstract
It is widely assumed that new proteins are created by duplication, fusion, or fission of existing coding sequences. Another mechanism of protein birth is provided by overlapping genes. They are created de novo by mutations within a coding sequence that lead to the expression of a novel protein in another reading frame, a process called "overprinting." To investigate this mechanism, we have analyzed the sequences of the protein products of manually curated overlapping genes from 43 genera of unspliced RNA viruses infecting eukaryotes. Overlapping proteins have a sequence composition globally biased toward disorder-promoting amino acids and are predicted to contain significantly more structural disorder than nonoverlapping proteins. By analyzing the phylogenetic distribution of overlapping proteins, we were able to confirm that 17 of these had been created de novo and to study them individually. Most proteins created de novo are orphans (i.e., restricted to one species or genus). Almost all are accessory proteins that play a role in viral pathogenicity or spread, rather than proteins central to viral replication or structure. Most proteins created de novo are predicted to be fully disordered and have a highly unusual sequence composition. This suggests that some viral overlapping reading frames encoding hypothetical proteins with highly biased composition, often discarded as noncoding, might in fact encode proteins. Some proteins created de novo are predicted to be ordered, however, and whenever a three-dimensional structure of such a protein has been solved, it corresponds to a fold previously unobserved, suggesting that the study of these proteins could enhance our knowledge of protein space.
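To make the idea of a second reading frame inside an existing coding sequence concrete, the following minimal sketch lists ATG-to-stop open reading frames in a chosen frame of a sequence. It only scans the forward strand with the standard stop codons; the function name, the `min_codons` cutoff, and the toy sequence are illustrative assumptions, and the curated overlapping genes in the study were not identified this way.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq, frame, min_codons=2):
    """Return (start, end) pairs of ATG..stop ORFs in the given frame (0, 1 or 2).

    `end` is exclusive and includes the stop codon; min_codons counts
    codons before the stop. Hypothetical helper for illustration.
    """
    seq = seq.upper()
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                orfs.append((start, i + 3))
            start = None
    return orfs


# Toy sequence: frame 0 reads ATG GCA TGC CAT GAG TAA,
# while frame 2 contains a shorter ORF, ATG CCA TGA, nested inside it.
toy = "ATGGCATGCCATGAGTAA"
print(orfs_in_frame(toy, 0))
print(orfs_in_frame(toy, 2))
```

Running the scan over all three frames and comparing coordinates shows where one ORF sits entirely inside another, which is the arrangement overprinting produces.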
|
41
|
Sabath N, Landan G, Graur D. A method for the simultaneous estimation of selection intensities in overlapping genes. PLoS One 2008; 3:e3996. [PMID: 19098983 PMCID: PMC2601044 DOI: 10.1371/journal.pone.0003996] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 09/22/2008] [Accepted: 11/21/2008] [Indexed: 11/18/2022] Open
Abstract
Inferring the intensity of positive selection in protein-coding genes is important since it is used to shed light on the process of adaptation. Recently, it has been reported that overlapping genes, which are ubiquitous in all domains of life, seem to exhibit inordinate degrees of positive selection. Here, we present a new method for the simultaneous estimation of selection intensities in overlapping genes. We show that the appearance of positive selection is caused by assuming that selection operates independently on each gene in an overlapping pair, thereby ignoring the unique evolutionary constraints on overlapping coding regions. Our method uses an exact evolutionary model, thereby voiding the need for approximation or intensive computation. We test the method by simulating the evolution of overlapping genes of different types as well as under diverse evolutionary scenarios. Our results indicate that the independent estimation approach leads to the false appearance of positive selection even though the gene is in reality subject to negative selection. Finally, we use our method to estimate selection in two influenza A genes for which positive selection was previously inferred. We find no evidence for positive selection in both cases.
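The constraint described here, that one nucleotide change is seen by two genes at once, can be illustrated by classifying a single substitution in each of two reading frames. This is only a toy classification under the standard genetic code, not the authors' estimation method; the function name and the example sequence are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (stop codons = "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(seq, pos, new_base, frame):
    """Classify a point substitution as seen from one reading frame.

    Returns "synonymous", "nonsynonymous", or None when `pos` does not
    fall in a complete codon of that frame. Hypothetical helper.
    """
    start = pos - ((pos - frame) % 3)  # first base of the codon covering pos
    if start < 0 or start + 3 > len(seq):
        return None
    old = seq[start:start + 3]
    offset = pos - start
    new = old[:offset] + new_base + old[offset + 1:]
    same = CODON_TABLE[old] == CODON_TABLE[new]
    return "synonymous" if same else "nonsynonymous"


# Position 5 is the third base of GCA (Ala) in frame 0
# but the first base of ATG (Met) in frame 2.
toy = "ATGGCATGCCATGAGTAA"
print(classify_substitution(toy, 5, "T", frame=0))
print(classify_substitution(toy, 5, "T", frame=2))
```

A change that looks silent when one gene is analysed alone can alter the protein encoded in the other frame, which is why treating the two genes independently can distort estimates of selection.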
Affiliation(s)
- Niv Sabath
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America.
|
42
|
de Groot S, Mailund T, Lunter G, Hein J. Investigating selection on viruses: a statistical alignment approach. BMC Bioinformatics 2008; 9:304. [PMID: 18616801 PMCID: PMC2478691 DOI: 10.1186/1471-2105-9-304] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 12/10/2007] [Accepted: 07/10/2008] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Two problems complicate the study of selection in viral genomes: Firstly, the presence of genes in overlapping reading frames implies that selection in one reading frame can bias our estimates of neutral mutation rates in another reading frame. Secondly, the high mutation rates we are likely to encounter complicate the inference of a reliable alignment of genomes. To address these issues, we develop a model that explicitly models selection in overlapping reading frames. We then integrate this model into a statistical alignment framework, enabling us to estimate selection while explicitly dealing with the uncertainty of individual alignments. We show that in this way we obtain un-biased selection parameters for different genomic regions of interest, and can improve in accuracy compared to using a fixed alignment. RESULTS We run a series of simulation studies to gauge how well we do in selection estimation, especially in comparison to the use of a fixed alignment. We show that the standard practice of using a ClustalW alignment can lead to considerable biases and that estimation accuracy increases substantially when explicitly integrating over the uncertainty in inferred alignments. We even manage to compete favourably for general evolutionary distances with an alignment produced by GenAl. We subsequently run our method on HIV2 and Hepatitis B sequences. CONCLUSION We propose that marginalizing over all alignments, as opposed to using a fixed one, should be considered in any parametric inference from divergent sequence data for which the alignments are not known with certainty. Moreover, we discover in HIV2 that double coding regions appear to be under less stringent selection than single coding ones. Additionally, there appears to be evidence for differential selection, where one overlapping reading frame is under positive and the other under negative selection.
Affiliation(s)
- Saskia de Groot
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, UK
- Thomas Mailund
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, UK
- BiRC affiliation: Bioinformatics Research Center, University of Aarhus, Hoeg-Guldbergsgade 90, 8000 Aarhus, Denmark
- Gerton Lunter
- MRC Functional Genetics Unit, Department of Physiology, Anatomy & Genetics, University of Oxford, 1 South Parks Road, Oxford OX1 3QX, UK
- Jotun Hein
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, UK
|
43
|
Delaye L, Deluna A, Lazcano A, Becerra A. The origin of a novel gene through overprinting in Escherichia coli. BMC Evol Biol 2008; 8:31. [PMID: 18226237 PMCID: PMC2268670 DOI: 10.1186/1471-2148-8-31] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 03/22/2007] [Accepted: 01/28/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Overlapped genes originate by a) loss of a stop codon among contiguous genes coded in different frames; b) shift to an upstream initiation codon of one of the contiguous genes; or c) by overprinting, whereby a novel open reading frame originates through point mutation inside an existing gene. Although overlapped genes are common in viruses, it is not clear whether overprinting has led to new genes in prokaryotes. RESULTS Here we report the origin of a new gene through overprinting in Escherichia coli K12. The htgA gene coding for a positive regulator of the sigma 32 heat shock promoter arose by point mutation in a 123/213 phase within an open reading frame (yaaW) of unknown function, most likely in the lineage leading to E. coli and Shigella sp. Further, we show that yaaW sequences coding for htgA genes have a slower evolutionary rate than those lacking an overlapped htgA gene. CONCLUSION While overprinting has been shown to be rather frequent in the evolution of new genes in viruses, our results suggest that this mechanism has also contributed to the origin of a novel gene in a prokaryote. We propose the term janolog (from Jano, the two-faced Roman god) to describe the homology relationship that holds between two genes when one originated through overprinting of the other. One cannot dismiss the possibility that at least a small fraction of the large number of novel ORPhan genes detected in pan-genome and metagenomic studies arose by overprinting.
Affiliation(s)
- Luis Delaye
- Facultad de Ciencias, Universidad Nacional Autónoma de México, Apdo. Postal 70-407, Cd. Universitaria, 04510 México DF, México.
|
44
|
Abstract
MOTIVATION Viral genomes tend to code in overlapping reading frames to maximize informational content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of K(a)/K(s) ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley and Hein, we develop a method for annotating a viral genome coding in overlapping reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single sequence HMM methodology, extending it now to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. RESULTS We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as of three Hepatitis B sequences. We obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. Whilst we are encouraged to see in HIV2, that the known to be conserved genes gag and pol are indeed annotated as such, we also discover several sites of less stringent negative selection within the env gene. To the best of our knowledge, we are the first to subsequently provide a full selection annotation of the Hepatitis B genome by explicitly modelling the evolution within overlapping reading frames, and not relying on simple K(a)/K(s) ratios.
Affiliation(s)
- Stephen McCauley
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, UK
|
45
|
van Hemert FJ, Zaaijer HL, Berkhout B, Lukashov VV. Mosaic amino acid conservation in 3D-structures of surface protein and polymerase of hepatitis B virus. Virology 2007; 370:362-72. [PMID: 17935747 DOI: 10.1016/j.virol.2007.08.036] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 07/10/2007] [Revised: 07/31/2007] [Accepted: 08/25/2007] [Indexed: 12/17/2022]
Abstract
Surface protein and polymerase of hepatitis B virus provide a striking example of gene overlap. Inclusion of more coding constraints in the phylogenetic analysis forces the tree toward accepted topology. Three-dimensional protein modeling demonstrates that participation in local protein function underlies the observed mosaic patterns of amino acid conservation and variability. Conserved amino acid residues of polymerase were typically clustered at the catalytic core marked by the YMDD motif. The proposed tertiary structure of surface protein displayed the expected transmembrane helices in a 2-domain constellation. Conserved amino acids like, for instance, cysteine residues are involved in the spatial orientation of the two domains, the exposed location of the a-determinant and the dimer formation of surface protein. By means of computational alanine replacement scanning, we demonstrated that the interfaces between domains in monomeric surface protein, between the monomers in dimeric surface protein and in a capsid-surface protein complex mainly consist of relatively well-conserved amino acid residues.
Affiliation(s)
- Formijn J van Hemert
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Meibergdreef 15, 1105 AZ Amsterdam, The Netherlands.
|
46
|
Cai Y, Hartnett B, Gustafsson C, Peccoud J. A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts. Bioinformatics 2007; 23:2760-7. [PMID: 17804435 DOI: 10.1093/bioinformatics/btm446] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION The sequence of artificial genetic constructs is composed of multiple functional fragments, or genetic parts, involved in different molecular steps of gene expression mechanisms. Biologists have deciphered structural rules that the design of genetic constructs needs to follow in order to ensure a successful completion of the gene expression process, but these rules have not been formalized, making it challenging for non-specialists to benefit from the recent progress in gene synthesis. RESULTS We show that context-free grammars (CFG) can formalize these design principles. This approach provides a path to organizing libraries of genetic parts according to their biological functions, which correspond to the syntactic categories of the CFG. It also provides a framework for the systematic design of new genetic constructs consistent with the design principles expressed in the CFG. Using parsing algorithms, this syntactic model enables the verification of existing constructs. We illustrate these possibilities by describing a CFG that generates the most common architectures of genetic constructs in Escherichia coli. AVAILABILITY A web site allows readers to experiment with the algorithms presented in this article: www.genocad.org. SUPPLEMENTARY INFORMATION Sequences and models are available at Bioinformatics online.
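A minimal sketch of verifying a construct against a context-free grammar is given below. The grammar is a made-up toy (one or more cassettes, each a promoter, one or more RBS/CDS pairs, and a terminator) and is not the grammar published by the authors or used by GenoCAD; the part category names and function names are likewise assumptions.

```python
# Toy grammar: keys are non-terminals, anything else is a part category (terminal).
GRAMMAR = {
    "Construct": [["Cassette", "Construct"], ["Cassette"]],
    "Cassette": [["promoter", "Cistrons", "terminator"]],
    "Cistrons": [["Cistron", "Cistrons"], ["Cistron"]],
    "Cistron": [["rbs", "cds"]],
}


def end_positions(symbol, parts, i):
    """Return every index at which `symbol` can finish when parsing starts at i."""
    if symbol not in GRAMMAR:  # terminal: must match the next part category
        return {i + 1} if i < len(parts) and parts[i] == symbol else set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {i}
        for item in production:
            positions = {e for p in positions for e in end_positions(item, parts, p)}
        ends |= positions
    return ends


def is_valid_construct(parts):
    """True when the whole list of part categories derives from Construct."""
    return len(parts) in end_positions("Construct", parts, 0)


print(is_valid_construct(["promoter", "rbs", "cds", "terminator"]))
print(is_valid_construct(["rbs", "promoter", "cds", "terminator"]))
```

Because every rule here is right-recursive, this naive top-down recogniser terminates; a larger grammar would normally be handled by a standard parsing algorithm instead.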
Affiliation(s)
- Yizhi Cai
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Washington Street, MC 0477, Blacksburg VA 24061, USA
|
47
|
Kingsford C, Delcher AL, Salzberg SL. A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes. Mol Biol Evol 2007; 24:2091-8. [PMID: 17642473 PMCID: PMC2429982 DOI: 10.1093/molbev/msm145] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation and repair of overlaps among adjacent genes where the 3' ends either overlap or nearly overlap. Our model, derived from a comprehensive analysis of complete prokaryotic genomes in GenBank, explains the nonuniform distribution of the lengths of such overlap regions far more simply than previously proposed models. Specifically, we explain the distribution of overlap lengths based on random extensions of genes to the next occurring downstream stop codon. Our model also provides an explanation for a newly observed (here) pattern in the distribution of the separation distances of closely spaced nonoverlapping genes. We provide evidence that the newly described biased distribution of separation distances is driven by the same phenomenon that creates the uneven distribution of overlap lengths. This suggests a dynamic picture of continual overlap creation and elimination.
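The central step of this model, extending a gene to the next downstream in-frame stop codon, can be sketched as follows. This is a simplified same-coordinate illustration and not the authors' model: the function names and the toy sequence are assumptions, and the neighbouring gene is represented only by the coordinate where its interval begins.

```python
STOPS = {"TAA", "TAG", "TGA"}


def next_inframe_stop(seq, pos):
    """Index of the first stop codon at or after `pos`, stepping codon by codon."""
    for i in range(pos, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i
    return None


def overlap_after_stop_loss(seq, old_stop, neighbour_start):
    """Length of overlap with a neighbouring gene after the stop at old_stop is lost.

    The gene is assumed to read through to the next in-frame stop; returns
    None if no such stop exists. Hypothetical helper for illustration.
    """
    new_stop = next_inframe_stop(seq, old_stop + 3)
    if new_stop is None:
        return None
    new_end = new_stop + 3  # exclusive end, stop codon included
    return max(0, new_end - neighbour_start)


# Toy case: the original stop at index 6 has mutated (TAA -> CAA), so the
# gene now ends at the TGA at index 15 and runs into a neighbour starting at 12.
mutated = "ATGAAACAACCCGGGTGATTT"
print(overlap_after_stop_loss(mutated, old_stop=6, neighbour_start=12))
```

Because the new end is fixed by wherever the next stop codon happens to fall, the resulting overlap lengths depend on the spacing of stop codons in that frame, which is the intuition behind the nonuniform length distribution the abstract describes.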
Affiliation(s)
- Carl Kingsford
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, USA.
|
48
|
Mayrose I, Doron-Faigenboim A, Bacharach E, Pupko T. Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates. Bioinformatics 2007; 23:i319-27. [PMID: 17646313 DOI: 10.1093/bioinformatics/btm176] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022]
Abstract
Codon evolutionary models are widely used to infer the selection forces acting on a protein. The non-synonymous to synonymous rate ratio (denoted by Ka/Ks) is used to infer specific positions that are under purifying or positive selection. Current evolutionary models usually assume that only the non-synonymous rates vary among sites while the synonymous substitution rates are constant. This assumption ignores the possibility of selection forces acting at the DNA or mRNA levels. Towards a more realistic description of sequence evolution, we present a model that accounts for among-site variation of both synonymous and non-synonymous substitution rates. Furthermore, we relax the widespread assumption that positions evolve independently of each other. Thus, possible sources of bias caused by random fluctuations in either the synonymous or non-synonymous rate estimates at a single site are removed. Our model is based on two hidden Markov models that operate on the spatial dimension: one describes the dependency between adjacent non-synonymous rates while the other describes the dependency between adjacent synonymous rates. The presented model is applied to study the selection pressure across the HIV-1 genome. The new model better describes the evolution of all HIV-1 genes, as compared to current codon models. Using both simulations and real data analyses, we illustrate that accounting for synonymous rate variability and dependency greatly increases the accuracy of Ka/Ks estimation and in particular of the inference of positively selected sites. Finally, we discuss the applicability of the developed model to infer the selection forces in regulatory and overlapping regions of the HIV-1 genome.
Affiliation(s)
- Itay Mayrose
- The Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
|
49
|
Pavesi A. Pattern of nucleotide substitution in the overlapping nonstructural genes of influenza A virus and implication for the genetic diversity of the H5N1 subtype. Gene 2007; 402:28-34. [PMID: 17825505 DOI: 10.1016/j.gene.2007.07.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/13/2007] [Revised: 07/12/2007] [Accepted: 07/12/2007] [Indexed: 11/24/2022]
Abstract
In viruses under strong pressure to minimize genome size, overlapping genes represent a fine strategy to condense a maximum amount of information into short nucleotide sequences. Here, we investigated the evolution of the genes encoding the nonstructural proteins NS1 and NS2 of influenza A virus (IAV), which constitute one of the best-characterized cases of gene overlap. By a detailed analysis of about four hundred sequences grouped into 11 IAV subtypes, we found that the overlapping coding region of the NS1 gene shows a significant increase in the rate of nonsynonymous change with respect to its nonoverlapping counterpart. The same feature was observed in the overlapping coding region of the NS2 gene. Such a variation pattern, which implies the occurrence of several amino acid substitutions in the protein regions encoded by overlapping frames, is different from the pattern of constrained evolution typical of other viral overlapping-gene systems. Amino acid sequence analysis of the NS1 and NS2 proteins revealed that some nonsynonymous substitutions, located in the region of gene overlap, play a critical role in shaping the genetic diversity of the highly pathogenic subtype H5N1. Since both proteins contribute to disease pathogenesis by affecting many virus and host-cell processes, information provided by this study should be useful to highlight the impact of nonstructural gene variation on the pathogenicity of H5N1 viruses.
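To make the notion of substitutions in overlapping frames concrete, the sketch below classifies a single-nucleotide change as synonymous or nonsynonymous in each reading frame that covers it, using the standard genetic code. The function name, the frame convention (0, 1, 2 as offsets from the start of the sequence) and the example sequence are assumptions for illustration; this is not the analysis pipeline used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def effect_in_frame(seq, pos, new_base, frame):
    """Classify replacing seq[pos] with new_base in the given reading frame.

    Returns 'synonymous' or 'nonsynonymous', or None when the codon covering
    pos in that frame is not fully contained in seq.
    """
    start = pos - ((pos - frame) % 3)  # first base of the codon covering pos
    if start < 0 or start + 3 > len(seq):
        return None
    old = seq[start:start + 3]
    k = pos - start  # position of the changed base within the codon
    new = old[:k] + new_base + old[k + 1:]
    same = CODON_TABLE[old] == CODON_TABLE[new]
    return "synonymous" if same else "nonsynonymous"
```

With the toy sequence `"AGCTGA"`, changing the C at index 2 to T is silent in frame 0 (AGC to AGT, both serine) but changes alanine to valine in frame 1 (GCT to GTT), which shows how one mutation can be neutral for one protein and not for the other.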
Affiliation(s)
- Angelo Pavesi
- Department of Genetics, Biology of Microorganisms, Anthropology, Evolution, University of Parma, V. le G. P. Usberti 11/A, I-43100 Parma, Italy.
|
50
|
Szklarczyk R, Heringa J, Pond SK, Nekrutenko A. Rapid asymmetric evolution of a dual-coding tumor suppressor INK4a/ARF locus contradicts its function. Proc Natl Acad Sci U S A 2007; 104:12807-12. [PMID: 17652172 PMCID: PMC1937548 DOI: 10.1073/pnas.0703238104] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 12/17/2022] Open
Abstract
The INK4a/ARF tumor suppressor locus encodes two protein products, INK4a and ARF, essential for controlling tumorigenesis and mutated in more than half of human cancers. There is no resemblance between the two proteins: their coding regions are assembled by alternative splicing of two mutually exclusive 5' exons into a constitutive one containing overlapping out-of-phase reading frames. We show that the dual-coding arrangement conflicts with the high cost of mutations within INK4a/ARF. Unexpectedly, the locus evolves rapidly and asymmetrically, with ARF accumulating the majority of amino acid replacements. Rapid evolution drives both INK4a and ARF proteins out of sync with other members of the RB and p53 tumor suppressor pathways, both of which are controlled by the locus. Yet, the asymmetric behavior may be an intrinsic property of dual-coding exons: INK4a/ARF closely mimics the evolution of 90 newly identified genes with similar dual-coding structure. Thus, the strong link between mutations in INK4a/ARF and cancer may be a direct consequence of the architecture of the locus.
Affiliation(s)
- Radek Szklarczyk
- Centre for Integrative Bioinformatics, Vrije University, De Boelelaan 1081a, 1081HV, Amsterdam, The Netherlands
- Jaap Heringa
- Centre for Integrative Bioinformatics, Vrije University, De Boelelaan 1081a, 1081HV, Amsterdam, The Netherlands
- Anton Nekrutenko
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16803
- To whom correspondence should be addressed at: 505 Wartik Laboratory, Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802. E-mail:
|