1
|
Garcia-Mazcorro JF, Barcenas-Walls JR. Thinking beside the box: Should we care about the non-coding strand of the 16S rRNA gene? FEMS Microbiol Lett 2016; 363:fnw171. [DOI: 10.1093/femsle/fnw171] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Accepted: 07/08/2016] [Indexed: 12/22/2022] Open
|
2
|
Guo FB. The distribution patterns of bases of protein-coding genes, non-coding ORFs, and intergenic sequences in Pseudomonas aeruginosa PA01 genome and its implications. J Biomol Struct Dyn 2008; 25:127-33. [PMID: 17718591 DOI: 10.1080/07391102.2007.10507161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
Abstract
The distribution patterns of bases of DNA fragments in different regions in P. aeruginosa genome are analyzed in this paper. It's shown that 5565 protein-coding genes, 17315 non-coding ORFs, and 1104 intergenic sequences are located into seven clusters based on their base frequencies. Almost all the protein-coding genes are contained in one of the seven clusters. The significant difference of base frequencies among three codon positions in high GC genome, which arouse the division between the distribution patterns of bases of six reading frames of protein-coding genes, is responsible for the appearance of the clustering phenomenon. In the light of the clustering phenomenon, the author supposes that the antisense strand ORFs, particularly those corresponding to Frame 2' and Frame 3', may not code for proteins in P. aeruginosa genome.
Affiliation(s)
- F-B Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
|
3
|
McGeoch DJ, Rixon FJ, Davison AJ. Topics in herpesvirus genomics and evolution. Virus Res 2006; 117:90-104. [PMID: 16490275 DOI: 10.1016/j.virusres.2006.01.002] [Citation(s) in RCA: 367] [Impact Index Per Article: 19.3] [Received: 10/06/2005] [Revised: 01/04/2006] [Accepted: 01/06/2006] [Indexed: 12/19/2022]
Abstract
Herpesviruses comprise an abundant, widely distributed group of large DNA viruses of humans and other vertebrates, and overall are among the most extensively studied large DNA viruses. Many herpesvirus genome sequences have been determined, and interpreted in terms of gene contents to give detailed views of both ubiquitous and lineage-specific functions. Availability of gene sequences has also enabled evaluations of evolutionary relationships. For herpesviruses of mammals, a robust phylogenetic tree has been constructed, which shows many features characteristic of synchronous development of virus and host lineages over large evolutionary timespans. It has also emerged that three distinct groupings of herpesviruses exist: the first containing viruses with mammals, birds and reptiles as natural hosts; the second containing viruses of amphibians and fish; and the third consisting of a single invertebrate herpesvirus. Within each of the first two groups, the genomes show clear evidence of descent from a common ancestor, but relationships between the three groups are extremely remote. Detailed analyses of capsid structures provide the best evidence for a common origin of the three groups. At a finer level, the structure of the capsid shell protein further suggests an element of common origin between herpesviruses and tailed DNA bacteriophages.
Affiliation(s)
- Duncan J McGeoch
- Medical Research Council Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK.
|
4
|
Boldogkõi Z, Barta E. Specific amino acid content and codon usage account for the existence of overlapping ORFs. Biosystems 1999; 51:95-100. [PMID: 10482421 DOI: 10.1016/s0303-2647(99)00018-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
Abstract
Here we present a novel hypothesis for the origin of overlapping open reading frames (O-ORFs) observed in the 'non-coding frames' of several genes of yeast chromosome II. By computer analysis it was found that the specific amino acid content and base distribution pattern at certain genomic locations and the presence of O-ORFs were related. This observation prompted us to conclude that these O-ORFs are mere statistical curiosities without any biological function, which is in contrast to the hypotheses proposed by other authors.
Affiliation(s)
- Z Boldogkõi
- Laboratory of Neuromorphology, Semmelweis University of Medicine, Budapest, Hungary.
|
5
|
Silke J. The majority of long non-stop reading frames on the antisense strand can be explained by biased codon usage. Gene 1997; 194:143-55. [PMID: 9266684 DOI: 10.1016/s0378-1119(97)00199-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]
Abstract
In recent studies it has been suggested that long reading frames on the antisense strand of open reading frames (ORFs) are more frequent than expected. The vertebrate DNA database was searched for long (greater than 900 bp) antisense non-stop reading frames (aNRFs) that overlap known coding regions. The sequences obtained were predominantly positioned in DNA with a high usage of G or C in the third codon position of the sense ORF. The major class of sequences revealed by the search was that of the heat-shock protein 70 kDa (Hsp70) family. A long Hsp70 aNRF was found in many Hsp70 sequences and occurred in species as diverse as fish, flies, fungi and bacteria. The role of codon usage bias was analysed both in the specific case of the Hsp70 genes and in a general species-wide context. The data obtained showed that even the very long aNRFs present in the Hsp70 family could be explained by codon usage bias on the sense strand. Codon usage bias is determined by GC content at the third codon position of the sense ORF and, in some species, by a high expression level of the gene in question. Such an explanation for the occurrence of long aNRFs cannot exclude that some aNRFs are transcribed and translated.
Affiliation(s)
- J Silke
- Institut für Molekularbiologie II, Universität Zürich, Switzerland
|
6
|
Andersson SGE, Sharp PM. Codon usage in the Mycobacterium tuberculosis complex. Microbiology (Reading, England) 1996; 142 (Pt 4):915-925. [PMID: 8936318 DOI: 10.1099/00221287-142-4-915] [Citation(s) in RCA: 80] [Impact Index Per Article: 2.8] [Indexed: 02/03/2023]
Abstract
The usage of alternative synonymous codons in Mycobacterium tuberculosis (and M. bovis) genes has been investigated. This species is a member of the high-G+C Gram-positive bacteria, with a genomic G+C content around 65 mol%. This G+C-richness is reflected in a strong bias towards C- and G-ending codons for every amino acid: overall, the G+C content at the third positions of codons is 83%. However, there is significant variation in codon usage patterns among genes, which appears to be associated with gene expression level. From the variation among genes, putative optimal codons were identified for 15 amino acids. The degree of bias towards optimal codons in an M. tuberculosis gene is correlated with that in homologues from Escherichia coli and Bacillus subtilis. The set of selectively favoured codons seems to be quite highly conserved between M. tuberculosis and another high-G+C Gram-positive bacterium, Corynebacterium glutamicum, even though the genome and overall codon usage of the latter are much less G+C-rich.
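The third-position G+C figure quoted in this abstract is a simple per-gene statistic. As a minimal sketch only (the function name `gc3` and the toy sequence are assumptions for illustration and do not come from the paper), it could be computed for an in-frame coding sequence like this:

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    # Indices 2, 5, 8, ... are the third positions of complete codons.
    third = cds[2::3]
    if not third:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


# Toy example (hypothetical sequence): codons ATG GCC AAG TAA
example_gc3 = gc3("ATGGCCAAGTAA")
```

A genome-wide value such as the one reported above would be obtained by pooling third positions over all annotated genes rather than averaging per-gene fractions; which of the two the authors used is not stated in the abstract.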
Affiliation(s)
- Siv G E Andersson
- Department of Molecular Biology, Biomedical Center, Uppsala University, Uppsala, S-75124, Sweden
- Paul M Sharp
- Department of Genetics, University of Nottingham, Queen's Medical Centre, Nottingham NG7 2UH, UK
|
7
|
Borodovsky M, Koonin EV, Rudd KE. New genes in old sequence: a strategy for finding genes in the bacterial genome. Trends Biochem Sci 1994; 19:309-13. [PMID: 7940673 DOI: 10.1016/0968-0004(94)90067-1] [Citation(s) in RCA: 54] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Affiliation(s)
- M Borodovsky
- School of Biology, Georgia Institute of Technology, Atlanta 30332-0230
|
8
|
Merino E, Balbás P, Puente JL, Bolívar F. Antisense overlapping open reading frames in genes from bacteria to humans. Nucleic Acids Res 1994; 22:1903-8. [PMID: 8208617 PMCID: PMC308092 DOI: 10.1093/nar/22.10.1903] [Citation(s) in RCA: 60] [Impact Index Per Article: 1.9] [Indexed: 01/29/2023] Open
Abstract
Long Open Reading Frames (ORFs) in antisense DNA strands have been reported in the literature as being rare events. However, an extensive analysis of the GenBank database revealed that a substantial number of genes from several species contain an in-phase ORF in the antisense strand, that overlaps entirely the coding sequence of the sense strand, or even extends beyond. The findings described in this paper show that this is a frequent, non-random phenomenon, which is primarily dependent on codon usage, and to a lesser extent on gene size and GC content. Examination of the sequence database for several prokaryotic and eukaryotic organisms, demonstrates that coding sequences with in-phase, 100% overlapping antisense ORFs are present in every genome studied so far.
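The kind of scan this abstract describes can be illustrated with a short sketch. It reverse-complements a coding sequence and reports, for each of the three antisense frames, the longest run of codons without a stop. The function names and the use of the standard stop codons are assumptions for illustration, and the sketch does not try to reproduce the paper's exact definition of an "in-phase" ORF.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_stop_free_run(seq, frame):
    """Longest run of consecutive non-stop codons in the given frame (0, 1 or 2)."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0  # a stop codon ends the current run
        else:
            run += 1
            best = max(best, run)
    return best


def antisense_runs(cds):
    """Longest stop-free run, in codons, for each of the three antisense frames."""
    anti = reverse_complement(cds)
    return {frame: longest_stop_free_run(anti, frame) for frame in range(3)}
```

Comparing the longest antisense run with the length of the sense coding sequence gives a rough indication of whether an antisense reading frame overlaps the gene entirely.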
Affiliation(s)
- E Merino
- Departamento de Biología Molecular, Universidad Nacional Autónoma de Mexico, Cuernavaca
|
9
|
Abstract
The nature and variation of synonymous codon usage in 47 open reading frames from Kluyveromyces lactis have been investigated. Using multivariate statistical analysis, a single major trend among K. lactis genes was identified that differentiates among genes by expression level: highly expressed genes have high codon usage bias, while genes of low expression level have low bias. A relatively minor secondary trend differentiates among genes according to G+C content at silent sites. In these respects, K. lactis is similar to both Saccharomyces cerevisiae and Candida albicans, and the same 'optimal' codons appear to be selected in highly expressed genes in all three species. In addition, silent sites in K. lactis and S. cerevisiae have similar G+C contents, but in C. albicans genes they are more A+T-rich. Thus, in all essential features, codon usage in K. lactis is very similar to that in S. cerevisiae, even though silent sites in genes compared between these two species have undergone sufficient mutation to be saturated with changes. We conclude that the factors influencing overall codon usage, namely mutational biases and the abundances of particular tRNAs, have not diverged between the two species. Nevertheless, in a few cases, codon usage differs between homologous genes from K. lactis and S. cerevisiae. The strength of codon usage bias in cytochrome c genes differs considerably, presumably because of different expression patterns in the two species. Two other, linked, genes have very different G+C content at silent sites in the two species, which may be a reflection of their chromosomal locations. Correspondence analysis was used to identify two open reading frames with highly atypical codon usage that are probably not genes.
Affiliation(s)
- A T Lloyd
- Department of Genetics, Trinity College, Dublin, Ireland
|
10
|
Abstract
Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
Affiliation(s)
- P K Keese
- Commonwealth Scientific and Industrial Research Organisation, Division of Plant Industry, Australian National University, Canberra
|
11
|
|
12
|
Wolfe SA, Smith JM. Nucleotide sequence and analysis of the purA gene encoding adenylosuccinate synthetase of Escherichia coli K12. J Biol Chem 1988. [DOI: 10.1016/s0021-9258(18)37402-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 0.9] [Indexed: 10/22/2022] Open
|
13
|
Abstract
The genome of the human immunodeficiency virus (HIV) is known to contain eight open reading frames (ORFs) on the minus strand of the double-stranded DNA replicative intermediate. Data presented here indicate that the DNA plus strand of HIV contains a previously unidentified ORF in a region complementary to the envelope gene sequence. This ORF could encode a protein of approximately 190 amino acid residues with a relative molecular mass of 20 kilodaltons if translation began from the first initiation codon. The predicted protein is highly hydrophobic and thus could be membrane associated. It is possible, therefore, that the HIV genome encodes a protein on antisense messenger RNA.
Affiliation(s)
- R H Miller
- Hepatitis Viruses Section, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892
|
14
|
Sharp PM, Li WH. The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 1987; 15:1281-95. [PMID: 3547335 PMCID: PMC340524 DOI: 10.1093/nar/15.3.1281] [Citation(s) in RCA: 2683] [Impact Index Per Article: 70.6] [Indexed: 01/06/2023] Open
Abstract
A simple, effective measure of synonymous codon usage bias, the Codon Adaptation Index, is detailed. The index uses a reference set of highly expressed genes from a species to assess the relative merits of each codon, and a score for a gene is calculated from the frequency of use of all codons in that gene. The index assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
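To make the index described in this abstract concrete, here is a minimal sketch of a CAI-style calculation. Each codon's relative adaptiveness w is its count in the reference set divided by the count of the most used synonymous codon for the same amino acid, and the score of a gene is the geometric mean of w over its codons. The function names, the two-amino-acid codon table, the toy reference sequence, and the 0.5 pseudocount for unused codons are all assumptions for illustration.

```python
import math
from collections import Counter

# Toy synonymous-codon table (assumption: only Lys and Glu, for brevity).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
}


def codons(seq):
    """Split an in-frame sequence into codons, dropping a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """w = codon count / count of the most used synonymous codon in the reference set."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    w = {}
    for family in SYNONYMS.values():
        best = max(counts[c] for c in family)
        if best == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Hypothetical pseudocount so an unused codon does not zero the product.
            w[c] = max(counts[c], 0.5) / best
    return w


def cai(gene, w):
    """Geometric mean of w over the codons of the gene that appear in the table."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy "highly expressed" reference set (hypothetical sequence).
reference = ["AAGAAGAAGAAAGAAGAAGAAGAG"]
weights = relative_adaptiveness(reference)
```

With this toy reference, a gene built only from the preferred codons scores 1.0, and a gene built only from the rarer synonyms scores lower. A real application would use the full genetic code and a curated set of highly expressed genes for the species in question.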
|
15
|
Hershey HV, Taylor MW. Sequence of the E. coli APRT gene. Advances in Experimental Medicine and Biology 1986; 195 Pt A:239-46. [PMID: 3524135 DOI: 10.1007/978-1-4684-5104-7_38] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
|
16
|
Hershey HV, Taylor MW. Nucleotide sequence and deduced amino acid sequence of Escherichia coli adenine phosphoribosyltransferase and comparison with other analogous enzymes. Gene 1986; 43:287-93. [PMID: 3527873 DOI: 10.1016/0378-1119(86)90218-0] [Citation(s) in RCA: 108] [Impact Index Per Article: 2.8] [Indexed: 01/06/2023]
Abstract
The Escherichia coli apt gene has been analyzed and its nucleotide (nt) sequence and the deduced amino acid (aa) sequence compared to those of other phosphoribosyltransferases (PRTs). The apt mRNA has a 102-nt leader sequence which may form alternate secondary structures. The RNA transcript may also form several 3' hairpin structures, which, however, do not appear to act as Rho-independent terminators. All PRTs, including E. coli adenine PRT (APRT), have a strongly conserved 13-aa sequence, as well as other regions of aa sequence or structural similarity. E. coli APRT is remarkably similar to the mouse enzyme.
|