1
Majeed A, Sharma V, Ul Rehman W, Kaur A, Das S, Joseph J, Singh A, Bhardwaj P. Comprehensive Codon Usage Analysis Across Diverse Plant Lineages. Biochem Genet 2025:10.1007/s10528-025-11053-y. [PMID: 39966258 DOI: 10.1007/s10528-025-11053-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/06/2025] [Indexed: 02/20/2025]
Abstract
How codon usage patterns vary in response to the evolution of organisms is an intriguing question. This study investigated the relevance of the evolutionary events of vascularization and seed production to the codon usage patterns in different plant lineages. We found that the optimal codons of non-vascular lineages generally end with GC, whereas those of the vascular lineages end with AU. Correspondence analysis and model-based clustering showed that the evolution of the codon usage pattern follows the evolutionary event of vascularization more precisely than that of seed production. The dinucleotides CpG and TpA were under-represented in all the lineages, whereas the dinucleotide TpG was over-represented in all the lineages except algae. Evolutionarily related lineages showed similar codon pair bias. The dinucleotide CpA showed a representation similar to that of its parent codon pairs. Although natural selection predominates over mutational pressure in determining the codon usage bias, the relative influence of mutational pressure is higher in the non-vascular lineages than in the vascular lineages.
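Dinucleotide under- or over-representation of the kind reported above is commonly quantified as an observed/expected odds ratio, f(XY) / (f(X) f(Y)). The abstract does not state which measure the authors used, so the following is only a minimal sketch under that assumption; the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: observed/expected ratio} for a DNA string.

    Expected frequency of XY is f(X) * f(Y) under independence, so a ratio
    well below 1 suggests under-representation and well above 1 suggests
    over-representation. Dinucleotides that never occur are absent from
    the result.
    """
    seq = seq.upper()
    n = len(seq)
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }


# Hypothetical toy sequence, for illustration only.
toy = "ATGGCTTGTGATGCATGGTTACTGTGA"
for pair, ratio in sorted(dinucleotide_odds_ratios(toy).items()):
    print(pair, round(ratio, 2))
```

On real coding sequences the ratio would normally be computed over many genes per lineage; a single short sequence gives noisy values.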
Affiliation(s)
- Aasim Majeed
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India
- Vikas Sharma
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India
- Wahid Ul Rehman
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India
- Amitozdeep Kaur
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India
- Sreemoyee Das
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India
- Josepheena Joseph
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India
- Amandeep Singh
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India
- Pankaj Bhardwaj
- Molecular Genetics Laboratory, Department of Botany, Central University of Punjab, VPO Ghudda, Distt. Bathinda, Punjab, 151401, India.
2
O'Connor PBF. The Evolutionary Transition of the RNA World to Obcells to Cellular-Based Life. J Mol Evol 2024; 92:278-285. [PMID: 38683368 DOI: 10.1007/s00239-024-10171-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 04/08/2024] [Indexed: 05/01/2024]
Abstract
The obcell hypothesis is a proposed route for the RNA world to develop into a primitive cellular one. It posits that this transition began with the emergence of the proto-ribosome which enabled RNA to colonise the external surface of lipids by the synthesis of amphipathic peptidyl-RNAs. The obcell hypothesis also posits that the emergence of a predation-based ecosystem provided a selection mechanism for continued sophistication amongst early life forms. Here, I argue for this hypothesis owing to its significant explanatory power; it offers a rationale why a ribosome which initially was capable only of producing short non-coded peptides was advantageous and it forgoes issues related to maintaining a replicating RNA inside a lipid enclosure. I develop this model by proposing that the evolutionary selection for improved membrane anchors resulted in the emergence of primitive membrane pores which enabled obcells to gradually evolve into a cellular morphology. Moreover, I introduce a model of obcell production which advances that tRNAs developed from primers of the RNA world.
3
Di Giulio M. Theories of the origin of the genetic code: Strong corroboration for the coevolution theory. Biosystems 2024; 239:105217. [PMID: 38663520 DOI: 10.1016/j.biosystems.2024.105217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 04/29/2024]
Abstract
I analyzed all the theories and models of the origin of the genetic code, and over the years, I have considered the main suggestions that could explain this origin. The conclusion of this analysis is that the coevolution theory of the origin of the genetic code is the theory that best captures the majority of observations concerning the organization of the genetic code. In other words, the biosynthetic relationships between amino acids would have heavily influenced the origin of the organization of the genetic code, as supported by the coevolution theory. Instead, the presence in the genetic code of physicochemical properties of amino acids, which have also been linked to the physicochemical properties of anticodons or codons or bases by stereochemical and physicochemical theories, would simply be the result of natural selection. More explicitly, I maintain that these correlations between codons, anticodons or bases and amino acids are in fact not the result of a real correlation between amino acids and codons, for example, but only the effect of the intervention of natural selection. Specifically, in the genetic code table we expect, for example, that the most similar codons - that is, those that differ by only one base - will have more similar physicochemical properties. Therefore, the 64 codons of the genetic code table ordered in a certain way would also represent an ordering of some of their physicochemical properties. Now, a study aimed at clarifying which physicochemical property of amino acids has influenced the allocation of amino acids in the genetic code has established that the partition energy of amino acids has played a decisive role in this. Indeed, under some conditions, the genetic code was found to be approximately 98% optimized on its columns. In this same work, it was shown that this was most likely the result of the action of natural selection.
If natural selection had truly allocated the amino acids in the genetic code in such a way that similar amino acids also have similar codons - and not through a mechanism of physicochemical interaction between, for example, codons and amino acids - then it might turn out that even different physicochemical properties of codons (or anticodons or bases) show some correlation with the physicochemical properties of amino acids, simply because the partition energy of amino acids is correlated with other physicochemical properties of amino acids. It is very likely that this would inevitably lead to a correlation between codons (or anticodons or bases) and amino acids. In other words, since the codons (anticodons or bases) are ordered in the genetic code, that is to say, some of their physicochemical properties should also be ordered in a similar way, and given that the amino acids would also appear to have been ordered in the genetic code by natural selection, then it should inevitably turn out that there is a correlation between, for example, the hydrophobicity of anticodons and that of amino acids. Instead, the intervention of natural selection in organizing the genetic code would appear to be highly compatible with the main mechanism of structuring the genetic code as supported by the coevolution theory. This would make the coevolution theory the only plausible explanation for the origin of the genetic code.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Early Evolution of Life Department, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy.
4
Štambuk N, Konjevoda P, Štambuk A. How ambiguity codes specify molecular descriptors and information flow in Code Biology. Biosystems 2023; 233:105034. [PMID: 37739308 DOI: 10.1016/j.biosystems.2023.105034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023]
Abstract
The article presents IUPAC ambiguity codes for incomplete nucleic acid specification, and their use in Code Biology. It is shown how to use this nomenclature in order to extract accurate information on different properties of biological systems. We investigated the use of ambiguity codes, as mathematical and logical operators and truth table elements, for the encoding of amino acids by means of the Standard Genetic Code. It is explained how to use ambiguity codes and truth functions in order to obtain accurate information on different properties of biological systems. Nucleotide ambiguity codes could be applied to: 1. encoding descriptive information of nucleotides, amino acids and proteins (e.g., of polarity, relative solvent accessibility, atom depth, etc.), and 2. system modelling ranging from standard bioinformatics tools to classic evolutionary models (i.e. from Miyazawa-Jernigan statistical potential to Kimura three-substitution-type model, respectively). It is shown that algorithms based on IUPAC ambiguity codes, Boolean functions and truth tables, the Probabilistic Square of Opposition/Semiotic Square and Klein 4-groups could be used for bioinformatics analyses and relational data modelling in natural science. Underlying mathematical, logical and semiotic concepts of interest are presented and addressed.
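As a concrete illustration of what an ambiguity code specifies, the sketch below expands a codon written with IUPAC symbols into the concrete RNA codons it stands for. The symbol table is the standard IUPAC nucleotide nomenclature; the function name `expand` is hypothetical, and the sketch does not attempt to reproduce the truth-table or Boolean machinery described in the abstract.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def expand(ambiguous_codon):
    """List every concrete codon matched by an IUPAC-coded codon."""
    choices = (IUPAC[symbol] for symbol in ambiguous_codon.upper())
    return ["".join(bases) for bases in product(*choices)]


# GAY covers both aspartate codons of the Standard Genetic Code.
print(expand("GAY"))  # ['GAC', 'GAU']
```

A single ambiguity symbol at the third position is often enough to describe a whole synonymous codon family, which is what makes the notation usable as a compact descriptor.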
Affiliation(s)
- Nikola Štambuk
- Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Albert Štambuk
- Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia
5
Štambuk N, Konjevoda P, Brčić-Kostić K, Baković J, Štambuk A. New algorithm for the analysis of nucleotide and amino acid evolutionary relationships based on Klein four-group. Biosystems 2023; 233:105030. [PMID: 37717902 DOI: 10.1016/j.biosystems.2023.105030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/10/2023] [Indexed: 09/19/2023]
Abstract
Phylogenetics is the study of ancestral relationships among biological species. Such sequence analyses are often represented as phylogenetic trees. The branching pattern of each tree and its topology reflect the evolutionary relatedness between analyzed sequences. We present a Klein four-group algorithm (K4A) for the evolutionary analysis of nucleotide and amino acid sequences. The Klein four-group set of operators consists of the identity e (U) and three elements: a = transition (C), b = transversion (G) and c = transition-transversion or complementarity (A). We generated Klein four-group based distance matrices of: 1. Cayley table (CK4), 2. Table rows (K4R), 3. Table columns (K4C), and 4. Euclidean 2D distance (K4E). The performance of the matrices was tested on a dataset of RecA proteins in bacteria, eukaryotes (Rad51 homolog) and archaea (RadA homolog). RecA and its functional homologs are found in all species, and are essential for the repair and maintenance of DNA. Consequently, they represent a good model for the study of the evolutionary relationship of protein and nucleotide sequences. The ancestral relationship between the sequences was correctly classified by all K4A matrices with respect to general topology. All distance matrices exhibited small variations among species, and overall results of tree classification were in agreement with the general patterns obtained by standard BLOSUM and PAM substitution matrices. During the evolution of a code there is a phase of optimization of system rules, the ambiguity of a code is eliminated, and the system starts producing specific components. The Klein four-group algorithm is consistent with the concept of ambiguity reduction. It also enables the use of different genetic code table variants optimized for particular transitions in evolution based on biological specificity.
Affiliation(s)
- Nikola Štambuk
- Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Krunoslav Brčić-Kostić
- Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
- Josip Baković
- University Hospital Dubrava, Department of Surgery, Avenija Gojka Šuška 6, HR-10000, Zagreb, Croatia
- Albert Štambuk
- Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia
6
Caldararo F, Di Giulio M. The genetic code is very close to a global optimum in a model of its origin taking into account both the partition energy of amino acids and their biosynthetic relationships. Biosystems 2022; 214:104613. [DOI: 10.1016/j.biosystems.2022.104613] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/19/2021] [Revised: 01/16/2022] [Accepted: 01/17/2022] [Indexed: 01/23/2023]
7
Schwersensky M, Rooman M, Pucci F. Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC Biol 2020; 18:146. [PMID: 33081759 PMCID: PMC7576759 DOI: 10.1186/s12915-020-00870-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/14/2020] [Accepted: 09/16/2020] [Indexed: 12/31/2022]
Abstract
BACKGROUND: How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. RESULTS: At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed the smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. CONCLUSION: Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
Affiliation(s)
- Martin Schwersensky
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium.
- Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium.
- Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
8
Codon usage bias in the H gene of canine distemper virus. Microb Pathog 2020; 149:104511. [PMID: 32961282 DOI: 10.1016/j.micpath.2020.104511] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/29/2020] [Revised: 08/30/2020] [Accepted: 09/16/2020] [Indexed: 12/25/2022]
Abstract
Canine distemper virus (CDV), a non-segmented, negative-sense single-stranded RNA (ssRNA) virus, is the etiological agent of canine distemper. Canine distemper is a highly contagious and lethal viral disease in domestic dogs and wild carnivores. Study of the evolution of CDV presents an essential key to improving vaccine efficacy. In this study, a total of 328 full-length CDV hemagglutinin (H) gene sequences were subjected to phylogenetic, amino acid mutation, and codon usage analysis. In accordance with a previous study, CDV genotypes consisted of fifteen lineages. The unique amino acid substitution sites in each CDV lineage have been identified for the first time, including America-1 (Q330H), America-2 (I585S), Asia-1 (A359V), Asia-2 (H61R), Asia-3 (P108Q), Asia-4 (K213T), India-1/Asia-5 (S497P), Arctic (S20L), Africa-1 (N489S), Colombian (V41I), EWL (I44V), Europe (D560E), Europe-1/South America-1 (K161Q), South America-2 (R580Q), and East African (S214A). Codon usage analysis indicated that the H gene exhibited low codon usage bias, and further neutrality plot analysis demonstrated that natural selection played a dominant role in driving CDV evolution. The effective number of codons (ENC) plots show that all the different sequences are below the standard curve, indicating that mutational pressure is not the only factor affecting codon usage bias (CUB); other forces, including natural selection, also contribute. The neutrality analysis showed that the slope of the regression line was 0.1501, indicating that natural selection dominates directional mutation pressure in driving the codon usage pattern. In addition, nucleotide composition, relative synonymous codon usage value, dinucleotide content, and geographical distribution were shown to influence the codon usage bias of the CDV H gene. The novel findings enhance the understanding of CDV evolution.
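A neutrality plot of the kind cited above regresses the GC content at the first two codon positions (GC12) on the GC content at the third position (GC3), one point per sequence. A slope near 0 is usually read as selection dominating, a slope near 1 as mutation pressure dominating; under that common reading, a slope of 0.1501 would attribute roughly 15% of the pattern to mutation pressure. The sketch below makes simplifying assumptions (all codons are counted, including Met, Trp and stop codons, which many analyses exclude); the function names and toy sequences are hypothetical and the output is not the study's result.

```python
def gc12_gc3(cds):
    """Return (GC12, GC3) for a coding sequence read in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n = len(codons)
    gc12 = sum(ch in "GC" for codon in codons for ch in codon[:2]) / (2 * n)
    gc3 = sum(codon[2] in "GC" for codon in codons) / n
    return gc12, gc3


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def neutrality_slope(sequences):
    """Slope of the GC12-versus-GC3 regression across sequences."""
    points = [gc12_gc3(seq) for seq in sequences]
    return ols_slope([gc3 for _, gc3 in points], [gc12 for gc12, _ in points])


# Hypothetical toy coding sequences, for illustration only.
toy = ["ATGGCAGCGTTTAAA", "ATGGCCGCGTTCAAG", "ATGGCTGCATTTAAA"]
print(neutrality_slope(toy))
```

With real data each point would be a full-length gene, and the regression would normally be reported together with its correlation coefficient and significance.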
9
Wang B. The Pattern of Occurrence of Cytosine in the Genetic Code Minimizes Deleterious Mutations and Favors Proper Function of the Translational Machinery. Open Journal of Genetics 2020; 10:8-15. [DOI: 10.4236/ojgen.2020.101002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 09/01/2023]
10
Wang B. The Eight Trigrams of the I Ching Provide a New Avenue for Characterizing the Association between mRNA Codons and the Hydrophobicity of the Encoded Amino Acids. Open Journal of Philosophy 2020; 10:1-8. [DOI: 10.4236/ojpp.2020.101001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
11
Ismail SNFB, Baharum SN, Fazry S, Low CF. Comparative genome analysis reveals a distinct influence of nucleotide composition on virus-host species-specific interaction of prawn-infecting nodavirus. Journal of Fish Diseases 2019; 42:1761-1772. [PMID: 31637743 DOI: 10.1111/jfd.13093] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/26/2019] [Revised: 08/21/2019] [Accepted: 08/26/2019] [Indexed: 06/10/2023]
Abstract
Discovery of species-specific interaction between the host and virus has drawn the interest of many researchers to study the evolution of the newly emerged virus. Comparative genome analysis provides insights into the evolution of the virus functional genome and the underlying mechanisms of virus-host interactions. The analysis of nucleotide composition signified the evolution of nodavirus towards host specialization in a host-specific mutation manner. The GC-rich genome of betanodavirus was significantly deficient in UpA and UpU dinucleotides, whilst the AU-rich genome of gammanodavirus was deficient in the CpG dinucleotide. The capsids of MrNV and PvNV of gammanodavirus retain the highest abundance of adenine and uracil at the second codon position, respectively, which was found to be very distinctive from the other genera. The ENC-GC3 plot inferred the influence of natural selection and mutational pressure in shaping the evolution of MrNV RdRp and capsid, respectively. Furthermore, CAI/eCAI analysis predicts that the adaptability of MrNV in the squid Sepia officinalis is comparable to that in its natural host, Macrobrachium rosenbergii. Thus, further study is warranted to investigate the capacity of MrNV replication in S. officinalis owing to its high codon adaptation index.
Affiliation(s)
- Shazrul Fazry
- Tasik Chini Research Center, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Bangi, Malaysia
- Chen Fei Low
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
12
Zu H, Zhang H, Yao M, Zhang J, Di H, Zhang L, Dong L, Wang Z, Zhou Y. Molecular characteristics of segment 5, a unique fragment encoding two partially overlapping ORFs in the genome of rice black-streaked dwarf virus. PLoS One 2019; 14:e0224569. [PMID: 31697693 PMCID: PMC6837423 DOI: 10.1371/journal.pone.0224569] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/14/2019] [Accepted: 10/16/2019] [Indexed: 02/04/2023]
Abstract
Rice black-streaked dwarf virus (RBSDV), a dsRNA virus in the genus Fijivirus of the family Reoviridae that is transmitted by the small brown planthopper, causes maize rough dwarf disease (MRDD) and rice black-streaked dwarf disease (RBSDD). To understand the variation and evolution of S5, a unique fragment in the genome of RBSDV which encodes two partially overlapping ORFs (ORF5-1 and ORF5-2), we analyzed 127 sequences from maize and rice exhibiting symptoms of dwarfism. The nucleotide diversity of both ORF5-1 (π = 0.039) and ORF5-2 (π = 0.027) was higher than that of the overlapping region (π = 0.011) (P < 0.05). ORF5-2 was under the greatest selection pressure based on codon bias analysis, and its activation was possibly influenced by the overlapping region. The recombinant fragments of three recombination events (14NM23, 14BM20, and 14NM17) cross the overlapping region. Based on neighbor-joining tree analysis, the overlapping region could represent the evolutionary basis of the full-length S5, which was classified into three main groups. RBSDV populations were expanding and haplotype diversity resulted mainly from the overlapping region. The genetic differentiation of combinations (T127-B35, T127-J34, A58-B35, A58-J34, and B35-J34) reached significant or extremely significant levels. Gene flow was most frequent between subpopulations A58 and B35, with the smallest |Fst| (0.02930). We investigated interactions between 13 RBSDV proteins by two-hybrid screening assays and identified interactions between P5-1/P6, P6/P9-1, and P3/P6. We also observed self-interactive effects of P3, P6, P7-1, and P10. In short, we have shown that RBSDV populations were expanding and that the overlapping region plays an important role in the genetic variation and evolution of RBSDV S5. Our results enable ongoing research into the evolutionary history of RBSDV S5 with its two partly overlapping ORFs.
Affiliation(s)
- Hongyue Zu
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Hong Zhang
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Minhao Yao
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Jiayue Zhang
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Hong Di
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Lin Zhang
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Ling Dong
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Zhenhua Wang
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
- Yu Zhou
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Northeast Agricultural University, Changjiang Road, Xiangfang District, Harbin, Heilongjiang Province, China
13
Fontrodona N, Aubé F, Claude JB, Polvèche H, Lemaire S, Tranchevent LC, Modolo L, Mortreux F, Bourgeois CF, Auboeuf D. Interplay between coding and exonic splicing regulatory sequences. Genome Res 2019; 29:711-722. [PMID: 30962178 PMCID: PMC6499313 DOI: 10.1101/gr.241315.118] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/02/2018] [Accepted: 03/28/2019] [Indexed: 01/24/2023]
Abstract
The inclusion of exons during the splicing process depends on the binding of splicing factors to short low-complexity regulatory sequences. The relationship between exonic splicing regulatory sequences and coding sequences is still poorly understood. We demonstrate that exons that are coregulated by any given splicing factor share a similar nucleotide composition bias and preferentially code for amino acids with similar physicochemical properties because of the nonrandomness of the genetic code. Indeed, amino acids sharing similar physicochemical properties correspond to codons that have the same nucleotide composition bias. In particular, we uncover that the TRA2A and TRA2B splicing factors that bind to adenine-rich motifs promote the inclusion of adenine-rich exons coding preferentially for hydrophilic amino acids that correspond to adenine-rich codons. SRSF2 that binds guanine/cytosine-rich motifs promotes the inclusion of GC-rich exons coding preferentially for small amino acids, whereas SRSF3 that binds cytosine-rich motifs promotes the inclusion of exons coding preferentially for uncharged amino acids, like serine and threonine that can be phosphorylated. Finally, coregulated exons encoding amino acids with similar physicochemical properties correspond to specific protein features. In conclusion, the regulation of an exon by a splicing factor that relies on the affinity of this factor for specific nucleotide(s) is tightly interconnected with the exon-encoded physicochemical properties. We therefore uncover an unanticipated bidirectional interplay between the splicing regulatory process and its biological functional outcome.
Affiliation(s)
- Nicolas Fontrodona
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Fabien Aubé
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Jean-Baptiste Claude
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Hélène Polvèche
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Sébastien Lemaire
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Léon-Charles Tranchevent
- Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Laurent Modolo
- LBMC Biocomputing Center, CNRS UMR 5239, INSERM U1210, F-69007, Lyon, France
- Franck Mortreux
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Cyril F Bourgeois
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Didier Auboeuf
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
14
Compositional dynamics and codon usage pattern of BRCA1 gene across nine mammalian species. Genomics 2019; 111:167-176. [DOI: 10.1016/j.ygeno.2018.01.013] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/01/2017] [Revised: 12/22/2017] [Accepted: 01/22/2018] [Indexed: 11/19/2022]
15
Chakraborty S, Uddin A, Mazumder TH, Choudhury MN, Malakar AK, Paul P, Halder B, Deka H, Mazumder GA, Barbhuiya RA, Barbhuiya MA, Devi WJ. Codon usage and expression level of human mitochondrial 13 protein coding genes across six continents. Mitochondrion 2018; 42:64-76. [DOI: 10.1016/j.mito.2017.11.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/14/2016] [Revised: 10/09/2017] [Accepted: 11/27/2017] [Indexed: 02/03/2023]
16
|
Li G, Ji S, Zhai X, Zhang Y, Liu J, Zhu M, Zhou J, Su S. Evolutionary and genetic analysis of the VP2 gene of canine parvovirus. BMC Genomics 2017; 18:534. [PMID: 28716118 PMCID: PMC5512735 DOI: 10.1186/s12864-017-3935-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 04/03/2017] [Accepted: 07/09/2017] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Canine parvovirus (CPV) type 2 emerged in 1978 in the USA and quickly spread among dog populations all over the world with high morbidity. Although CPV is a DNA virus, its genomic substitution rate is similar to some RNA viruses. Therefore, it is important to trace the evolution of CPV to monitor the appearance of mutations that might affect vaccine effectiveness. RESULTS Our analysis shows that the VP2 genes of CPV isolated from 1979 to 2016 are divided into six groups: GI, GII, GIII, GIV, GV, and GVI. Amino acid mutation analysis revealed several undiscovered important mutation sites: F267Y, Y324I, and T440A. Of note, the evolutionary rate of the CPV VP2 gene from Asia and Europe decreased. Codon usage analysis showed that the VP2 gene of CPV exhibits high bias with an ENC ranging from 34.93 to 36.7. Furthermore, we demonstrate that natural selection plays a major role compared to mutation pressure driving CPV evolution. CONCLUSIONS There are few studies on the codon usage of CPV. Here, we comprehensively studied the genetic evolution, codon usage pattern, and evolutionary characterization of the VP2 gene of CPV. The novel findings revealing the evolutionary process of CPV will greatly serve future CPV research.
Affiliation(s)
- Gairu Li
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Senlin Ji
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Xiaofeng Zhai
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Yuxiang Zhang
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Jie Liu
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Mengyan Zhu
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Jiyong Zhou
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Shuo Su
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China.
|
17
|
Chakraborty S, Nag D, Mazumder TH, Uddin A. Codon usage pattern and prediction of gene expression level in Bungarus species. Gene 2016; 604:48-60. [PMID: 27845207 DOI: 10.1016/j.gene.2016.11.023] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/25/2016] [Revised: 10/18/2016] [Accepted: 11/10/2016] [Indexed: 10/20/2022]
Abstract
Codon bias study in an organism gains significance in understanding the molecular mechanism as well as the functional conservation of gene expression during the course of evolution. The prime focus in this study is to compare the codon usage patterns among the four species belonging to the genus Bungarus (B. multicinctus, B. fasciatus, B. candidus and B. flaviceps) using several codon bias parameters. Our results suggested that relatively low codon bias exists in the coding sequences of the selected species. The compositional constraints together with gene expression level might influence the patterns of codon usage among the genes of Bungarus species. Both natural selection and mutation pressure affect the codon usage pattern in Bungarus species as evident from correspondence analysis. Neutrality plot indicates that natural selection played a major role while mutation pressure played a minor role in codon usage pattern of the genes in Bungarus species.
Affiliation(s)
- Supriyo Chakraborty
- Department of Biotechnology, Assam University, Silchar, Assam 788011, India.
| | - Debojyoti Nag
- Department of Biotechnology, Assam University, Silchar, Assam 788011, India
| | | | - Arif Uddin
- Department of Biotechnology, Assam University, Silchar, Assam 788011, India; Moinul Hoque Choudhury Memorial Science College, Algapur, HailaKandi, Assam 788150, India
|
18
|
Uddin A, Choudhury MN, Chakraborty S. Codon usage bias and phylogenetic analysis of mitochondrial ND1 gene in pisces, aves, and mammals. Mitochondrial DNA A DNA Mapp Seq Anal 2016; 29:36-48. [PMID: 27776434 DOI: 10.1080/24701394.2016.1233534] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/21/2022]
Abstract
The mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 (MT-ND1) gene is a subunit of the respiratory chain complex I and involved in the first step of the electron transport chain of oxidative phosphorylation (OXPHOS). To understand the pattern of compositional properties, codon usage and expression level of mitochondrial ND1 genes in pisces, aves, and mammals, we used bioinformatic approaches as no work was reported earlier. In this study, a perl script was used for calculating nucleotide contents and different codon usage bias parameters. The codon usage bias of MT-ND1 was low but the expression level was high as revealed from high ENC and CAI value. Correspondence analysis (COA) suggests that the pattern of codon usage for MT-ND1 gene is not same across species and that compositional constraint played an important role in codon usage pattern of this gene among pisces, aves, and mammals. From the regression equation of GC12 on GC3, it can be inferred that the natural selection might have played a dominant role while mutation pressure played a minor role in influencing the codon usage patterns. Further, ND1 gene has a discrepancy with cytochrome B (CYB) gene in preference of codons as evident from COA. The codon usage bias was low. It is influenced by nucleotide composition, natural selection, mutation pressure, length (number) of amino acids, and relative dinucleotide composition. This study helps in understanding the molecular biology, genetics, evolution of MT-ND1 gene, and also for designing a synthetic gene.
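The abstract above infers the balance of selection and mutation from a regression of GC12 on GC3 (a neutrality plot). As a minimal sketch of that calculation, not the authors' Perl script, the Python below computes GC12 and GC3 for each coding sequence and fits an ordinary least-squares slope. The function names `gc12_gc3` and `neutrality_slope` are hypothetical, and the sketch assumes in-frame sequences and uses plain third-position GC content rather than GC at synonymous third positions only.

```python
def gc12_gc3(cds: str) -> tuple[float, float]:
    """Return (GC12, GC3) for one in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    n = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    if n == 0:
        raise ValueError("need at least one complete codon")

    def gc_at(pos: int) -> float:
        bases = seq[pos:n:3]             # every base at this codon position
        return sum(b in "GC" for b in bases) / len(bases)

    return (gc_at(0) + gc_at(1)) / 2, gc_at(2)


def neutrality_slope(cds_list: list[str]) -> float:
    """Least-squares slope of GC12 (y) regressed on GC3 (x)."""
    points = [gc12_gc3(s) for s in cds_list]
    xs = [gc3 for _, gc3 in points]
    ys = [gc12 for gc12, _ in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 does not vary; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

In the usual reading of a neutrality plot, a slope close to 1 suggests that mutation pressure shapes all three codon positions alike, while a slope close to 0 points to selective constraint on the first two positions.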
Affiliation(s)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Algapur, India
|
19
|
Nasrullah I, Butt AM, Tahir S, Idrees M, Tong Y. Genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on Marburg virus evolution. BMC Evol Biol 2015; 15:174. [PMID: 26306510 PMCID: PMC4550055 DOI: 10.1186/s12862-015-0456-4] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Received: 01/10/2015] [Accepted: 08/17/2015] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The Marburg virus (MARV) has a negative-sense single-stranded RNA genome, belongs to the family Filoviridae, and is responsible for several outbreaks of highly fatal hemorrhagic fever. Codon usage patterns of viruses reflect a series of evolutionary changes that enable viruses to shape their survival rates and fitness toward the external environment and, most importantly, their hosts. To understand the evolution of MARV at the codon level, we report a comprehensive analysis of synonymous codon usage patterns in MARV genomes. Multiple codon analysis approaches and statistical methods were performed to determine overall codon usage patterns, biases in codon usage, and influence of various factors, including mutation pressure, natural selection, and its two hosts, Homo sapiens and Rousettus aegyptiacus. RESULTS Nucleotide composition and relative synonymous codon usage (RSCU) analysis revealed that MARV shows mutation bias and prefers U- and A-ended codons to code amino acids. Effective number of codons analysis indicated that overall codon usage among MARV genomes is slightly biased. The Parity Rule 2 plot analysis showed that GC and AU nucleotides were not used proportionally which accounts for the presence of natural selection. Codon usage patterns of MARV were also found to be influenced by its hosts. This indicates that MARV have evolved codon usage patterns that are specific to both of its hosts. Moreover, selection pressure from R. aegyptiacus on the MARV RSCU patterns was found to be dominant compared with that from H. sapiens. Overall, mutation pressure was found to be the most important and dominant force that shapes codon usage patterns in MARV. CONCLUSIONS To our knowledge, this is the first detailed codon usage analysis of MARV and extends our understanding of the mechanisms that contribute to codon usage and evolution of MARV.
Affiliation(s)
- Izza Nasrullah
- Department of Biochemistry, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan.
| | - Azeem M Butt
- Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore, 53700, Pakistan.
| | - Shifa Tahir
- INRA, UMR85 Physiologie de la Reproduction et des Comportements, Nouzilly, F-37380, France; CNRS, UMR7247, F-37380, Nouzilly, France; Université François Rabelais de Tours, Tours, F-37380, France.
| | - Muhammad Idrees
- Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore, 53700, Pakistan.
| | - Yigang Tong
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, People's Republic of China.
|
20
|
Ponce de Leon M, de Miranda AB, Alvarez-Valin F, Carels N. The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins. Bioinform Biol Insights 2014; 8:93-108. [PMID: 24899802 PMCID: PMC4039185 DOI: 10.4137/bbi.s13161] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/08/2013] [Revised: 11/24/2013] [Accepted: 11/24/2013] [Indexed: 01/02/2023] Open
Abstract
For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. 
Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
Affiliation(s)
- Miguel Ponce de Leon
- Sección Biomatemática, Facultad de Ciencias, Universidad de la República, Iguá, Montevideo, Uruguay
| | - Antonio Basilio de Miranda
- Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil
| | - Fernando Alvarez-Valin
- Sección Biomatemática, Facultad de Ciencias, Universidad de la República, Iguá, Montevideo, Uruguay
| | - Nicolas Carels
- Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil
|
21
|
Butt AM, Nasrullah I, Tong Y. Genome-wide analysis of codon usage and influencing factors in chikungunya viruses. PLoS One 2014; 9:e90905. [PMID: 24595095 PMCID: PMC3942501 DOI: 10.1371/journal.pone.0090905] [Citation(s) in RCA: 151] [Impact Index Per Article: 13.7] [Received: 11/29/2013] [Accepted: 02/06/2014] [Indexed: 02/03/2023] Open
Abstract
Chikungunya virus (CHIKV) is an arthropod-borne virus of the family Togaviridae that is transmitted to humans by Aedes spp. mosquitoes. Its genome comprises a 12 kb single-strand positive-sense RNA. In the present study, we report the patterns of synonymous codon usage in 141 CHIKV genomes by calculating several codon usage indices and applying multivariate statistical methods. Relative synonymous codon usage (RSCU) analysis showed that the preferred synonymous codons were G/C- and A-ended. A comparative analysis of RSCU between CHIKV and its hosts showed that codon usage patterns of CHIKV are a mixture of coincidence and antagonism. Similarity index analysis showed that the overall codon usage patterns of CHIKV have been strongly influenced by Pan troglodytes and Aedes albopictus during evolution. The overall codon usage bias was low in CHIKV genomes, as inferred from the analysis of effective number of codons (ENC) and codon adaptation index (CAI). Our data suggested that although mutation pressure dominates codon usage in CHIKV, patterns of codon usage in CHIKV are also under the influence of natural selection from its hosts and geography. To the best of our knowledge, this is the first report describing codon usage analysis in CHIKV genomes. The findings from this study are expected to increase our understanding of factors involved in viral evolution, and fitness towards hosts and the environment.
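Relative synonymous codon usage (RSCU), used in the abstract above and in several other entries in this list, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python below is a minimal sketch of that standard definition, not the pipeline of any cited study. The function name `rscu` is hypothetical, and the sketch assumes the standard genetic code and an in-frame coding sequence.

```python
from collections import Counter, defaultdict

# Standard genetic code, laid out in the canonical T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU = codon count / mean count across its synonymous family."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous choice (Met, Trp)
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values
```

For example, `rscu("GCTGCTGCCGCA")` covers four alanine codons: GCT is used twice as often as expected under equal usage (RSCU 2.0), GCC and GCA match expectation (1.0), and GCG is unused (0.0). Values above 1 mark codons used more often than expected.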
Affiliation(s)
- Azeem Mehmood Butt
- Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore, Pakistan
| | - Izza Nasrullah
- Department of Biochemistry, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
| | - Yigang Tong
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, People’s Republic of China
|
22
|
Synonymous codon usage in TTSuV2: analysis and comparison with TTSuV1. PLoS One 2013; 8:e81469. [PMID: 24303050 PMCID: PMC3841265 DOI: 10.1371/journal.pone.0081469] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 03/12/2013] [Accepted: 10/12/2013] [Indexed: 11/19/2022] Open
Abstract
Two species of the DNA virus Torque teno sus virus (TTSuV), TTSuV1 and TTSuV2, have become widely distributed in pig-farming countries in recent years. In this study, we performed a comprehensive analysis of synonymous codon usage bias in 41 available TTSuV2 coding sequences (CDS), and compared the codon usage patterns of TTSuV2 and TTSuV1. TTSuV codon usage patterns were found to be phylogenetically conserved. Values for the effective number of codons (ENC) indicated that the overall extent of codon usage bias in both TTSuV2 and TTSuV1 was not significant; the most frequently occurring codons had an A or C at the third codon position. Correspondence analysis (COA) was performed and TTSuV2 and TTSuV1 sequences were located in different quadrants of the first two major axes. A plot of the ENC revealed that compositional constraint was the major factor determining the codon usage bias for TTSuV2. In addition, hierarchical cluster analysis of 41 TTSuV2 isolates based on relative synonymous codon usage (RSCU) values suggested that there was no association between geographic distribution and codon bias of TTSuV2 sequences. Finally, the comparison of RSCU for TTSuV2, TTSuV1 and the corresponding host sequence indicated that the codon usage pattern of TTSuV2 was similar to that of TTSuV1. However, the similarity was low for each virus and its host. These conclusions provide important insight into the synonymous codon usage pattern of TTSuV2, as well as a better understanding of the molecular evolution of TTSuV2 genomes.
|
23
|
The complete mitochondrial genome of the Antarctic sea spider Ammothea carolinensis (Chelicerata; Pycnogonida). Polar Biol 2013. [DOI: 10.1007/s00300-013-1288-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/27/2022]
|
24
|
Biro JC. Coding nucleic acids are chaperons for protein folding: a novel theory of protein folding. Gene 2012; 515:249-57. [PMID: 23266645 DOI: 10.1016/j.gene.2012.12.048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/02/2012] [Revised: 12/04/2012] [Accepted: 12/06/2012] [Indexed: 11/29/2022]
Abstract
The arguments for nucleic acid chaperons are reviewed and three new lines of evidence are added. (1) It was found that amino acids encoded by codons in short nucleic acid loops frequently form turns and helices in the corresponding protein structures. (2) The amino acids encoded by partially complementary (1st and 3rd nucleotides) codons are more frequently co-located in the encoded proteins than expected by chance. (3) There are significant correlations between thermodynamic changes (ddG) caused by codon mutations in nucleic acids and the thermodynamic changes caused by the corresponding amino acid mutations in the encoded proteins. We conclude that the concept of the Proteomic Code and nucleic acid chaperons seems correct from the bioinformatics point of view, and we expect to see direct biochemical experiments and evidence in the near future.
Affiliation(s)
- Jan C Biro
- Karolinska Institute, Stockholm, Sweden.
|
25
|
Wang Q, Lei Y, Xu X, Wang G, Chen LL. Theoretical prediction and experimental verification of protein-coding genes in plant pathogen genome Agrobacterium tumefaciens strain C58. PLoS One 2012; 7:e43176. [PMID: 22984411 PMCID: PMC3439454 DOI: 10.1371/journal.pone.0043176] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 02/12/2012] [Accepted: 07/18/2012] [Indexed: 11/19/2022] Open
Abstract
Agrobacterium tumefaciens strain C58 is a Gram-negative soil bacterium capable of inducing tumors (crown galls) on many dicotyledonous plants. The genome of A. tumefaciens strain C58 was re-annotated based on the Z-curve method. First, all the ‘hypothetical genes’ were re-identified, and 29 originally annotated ‘hypothetical genes’ were recognized to be non-coding open reading frames (ORFs). Theoretical evidence obtained from principal component analysis, clusters of orthologous groups of proteins occupation, and average length distribution showed that these non-coding ORFs were highly unlikely to encode proteins. Results from the reverse transcription-polymerase chain reaction (RT-PCR) experiments on three different growth stages of A. tumefaciens C58 confirmed that 23 (79%) of the identified non-coding ORFs have no transcripts in these growth stages. In addition, using theoretical prediction, 19 potential protein-coding genes were predicted to be new protein-coding genes. Fifteen (79%) of these genes were verified with RT-PCR experiments. The RT-PCR experimental results confirmed the reliability of our theoretical prediction, indicating that false-positive prediction and missing genes always exist in the annotation of A. tumefaciens C58 genome. The improved annotation will serve as a valuable resource for the research of the lifestyle, metabolism, and pathogenicity of A. tumefaciens C58. The re-annotation of A. tumefaciens C58 can be obtained from http://211.69.128.148/Atum/.
Affiliation(s)
- Qian Wang
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
| | - Yang Lei
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- Center for Bioinformatics, Huazhong Agricultural University, Wuhan, People's Republic of China
| | - Xiwen Xu
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- Center for Bioinformatics, Huazhong Agricultural University, Wuhan, People's Republic of China
| | - Gejiao Wang
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- * E-mail: (WG); (LLC)
| | - Ling-Ling Chen
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, People's Republic of China
- Center for Bioinformatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- * E-mail: (WG); (LLC)
|
26
|
Lei Y, Kang SK, Gao J, Jia XS, Chen LL. Improved annotation of a plant pathogen genome Xanthomonas oryzae pv. oryzae PXO99A. J Biomol Struct Dyn 2012; 31:342-50. [PMID: 22849520 DOI: 10.1080/07391102.2012.698218] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
Many bacterial genomes have now been sequenced and stored in public databases, of which Reference Sequence (RefSeq) is the most widely used. However, the annotation in RefSeq is still unsatisfactory. The present analysis is focused on the re-annotation of an important plant pathogen genome, Xanthomonas oryzae pv. oryzae PXO99A (Xoo PXO99A), which is the causal agent of bacterial blight on rice. Based on the parameters of 28 nucleotide frequencies and a support vector machine algorithm, 41 originally annotated hypothetical genes were recognized as noncoding sequences, which was further supported by principal component analysis and other evidence. Ten of them were tested with reverse transcription-polymerase chain reaction (RT-PCR) experiments, and all of them were confirmed to be noncoding sequences. Furthermore, 197 potential new genes not annotated in RefSeq were recognized by both of two ab initio gene finding programs. Most of them only have sequence similarities with part of the known genes in other species, so they are unlikely to be protein-coding genes. Twelve potential new genes have high full-length sequence similarities with function-known genes and are very likely to be true protein-coding genes. All 12 potential genes were tested with RT-PCR, and 11 of them (92%) were successfully amplified from cDNA template. The RT-PCR experiments confirm that our theoretical prediction has high accuracy. The improvement of the Xoo PXO99A annotation is helpful for research on the lifestyle, metabolism, and pathogenicity of this important plant pathogen. The improved annotation can be obtained from http://211.69.128.148/Xoo.
Affiliation(s)
- Yang Lei
- National Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Center for Bioinformatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan, 430070, P.R. China
|
27
|
Huang JT, Xing DJ, Huang W. Choice of synonymous codons associated with protein folding. Proteins 2012; 80:2056-62. [DOI: 10.1002/prot.24096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/14/2012] [Revised: 03/29/2012] [Accepted: 04/05/2012] [Indexed: 11/11/2022]
|
28
|
Dass JFP, Sudandiradoss C. Insight into pattern of codon biasness and nucleotide base usage in serotonin receptor gene family from different mammalian species. Gene 2012; 503:92-100. [PMID: 22480817 DOI: 10.1016/j.gene.2012.03.057] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 06/08/2011] [Revised: 03/14/2012] [Accepted: 03/17/2012] [Indexed: 11/16/2022]
Abstract
5-HT (5-hydroxytryptamine) or serotonin receptors are found both in the central and peripheral nervous systems as well as in non-neuronal tissues. In the animal and human nervous system, serotonin produces various functional effects through a variety of membrane-bound receptors. In this study, we focus on the 5-HT receptor family from different mammals and examine the factors that account for codon and nucleotide usage variation. A total of 110 homologous coding sequences from 11 different mammalian species were analyzed using relative synonymous codon usage (RSCU), correspondence analysis (COA) and hierarchical cluster analysis together with nucleotide base usage frequency of chemically similar amino acid codons. The mean effective number of codons (ENc) value of 37.06 for 5-HT(6) shows very high codon bias within the family and may be due to high selective translational efficiency. The COA and Spearman's rank correlation reveal that nucleotide compositional mutation bias is the major factor influencing the codon usage in serotonin receptor genes. The hierarchical cluster analysis suggests that gene function is another dominant factor that affects the codon usage bias, while species is a minor factor. Nucleotide base usage, reported using the Goldman, Engelman, Steitz (GES) scale, reveals the presence of high uracil (>45%) content at functionally important hydrophobic regions. Our in silico approach will help further investigations of the evolution, structure, function and gene expression of the 5-HT receptor family, members of which are potential antipsychotic drug targets.
Affiliation(s)
- J Febin Prabhu Dass
- School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu State, India
|
29
|
Liu XS, Zhang YG, Fang YZ, Wang YL. Patterns and influencing factor of synonymous codon usage in porcine circovirus. Virol J 2012; 9:68. [PMID: 22416942 PMCID: PMC3341187 DOI: 10.1186/1743-422x-9-68] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 09/13/2011] [Accepted: 03/15/2012] [Indexed: 11/11/2022] Open
Abstract
Background Analysis of codon usage can reveal much about the molecular evolution of viruses. Nevertheless, little information about the synonymous codon usage pattern of the porcine circovirus (PCV) genome in the process of its evolution is available. In this study, to give a new understanding of the evolutionary characteristics of PCV and the effects of natural selection from its host on the codon usage pattern of the virus, patterns and the key determinants of codon usage in PCV were examined. Methods We carried out a comprehensive analysis of the codon usage pattern in the PCV genome by calculating relative synonymous codon usage (RSCU), effective number of codons (ENC), dinucleotides and nucleic acid content of the PCV genome. Results PCV genomes have a relatively low GC content and codon preference; this result shows that nucleotide constraints have a major impact on its synonymous codon usage. The results of the correspondence analysis indicate that codon usage patterns of PCV of various genotypes and subgenotypes changed greatly, with significant differences in codon usage patterns among the viruses of Circoviridae. There is much comparability between PCV and its host in their synonymous codon usage, suggesting that the natural selection pressure from the host also affects the codon usage patterns of PCV. In particular, PCV genotype II is, in synonymous codon usage, more similar to pig than to PCV genotype I, which may be one of the most important molecular mechanisms by which PCV genotype II causes disease. The calculated relative abundance of dinucleotides indicates that the composition of dinucleotides also plays a key role in the variation found in synonymous codon usage in PCV. Furthermore, geographic factors, the general average hydrophobicity and the aromaticity may be related to the formation of codon usage patterns of PCV.
Conclusion The results of these studies suggest that the synonymous codon usage pattern of the PCV genome is the result of interaction between mutation pressure and natural selection from its host. The information from this study may not only have theoretical value in understanding the characteristics of synonymous codon usage in PCV genomes, but also have significant value for the molecular evolution of PCV.
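The relative abundance of dinucleotides mentioned in the abstract above is commonly computed as an odds ratio: the observed dinucleotide frequency divided by the product of the frequencies of its two bases. The Python below is a minimal single-strand sketch of that ratio, not the method of the cited study. The function name `dinucleotide_odds` is hypothetical, and the sketch assumes the input contains only A, C, G and T (or U).

```python
from collections import Counter
from itertools import product


def dinucleotide_odds(seq: str) -> dict[str, float]:
    """Odds ratio f(xy) / (f(x) * f(y)) for each dinucleotide xy."""
    seq = seq.upper().replace("U", "T")
    if len(seq) < 2:
        raise ValueError("need at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))  # overlapping pairs
    n_mono, n_di = len(seq), len(seq) - 1

    ratios = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        if fx == 0 or fy == 0:
            continue  # ratio undefined when a base never occurs
        ratios[x + y] = (di[x + y] / n_di) / (fx * fy)
    return ratios
```

A ratio near 1 means the dinucleotide occurs about as often as its base composition predicts; values well below 1 (as reported for CpG in many genomes) indicate under-representation.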
Affiliation(s)
- Xin-sheng Liu
- State Key Laboratory of Veterinary Etiological Biology, National Foot and Mouth Disease Reference Laboratory, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou 730046, People's Republic of China
|
30
|
Yu JF, Xiao K, Jiang DK, Guo J, Wang JH, Sun X. An integrative method for identifying the over-annotated protein-coding genes in microbial genomes. DNA Res 2011; 18:435-49. [PMID: 21903723 PMCID: PMC3223076 DOI: 10.1093/dnares/dsr030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022] Open
Abstract
The falsely annotated protein-coding genes have been deemed one of the major causes accounting for the annotating errors in public databases. Although many filtering approaches have been designed for the over-annotated protein-coding genes, some are questionable due to the resultant increase in false negative. Furthermore, there is no webserver or software specifically devised for the problem of over-annotation. In this study, we propose an integrative algorithm for detecting the over-annotated protein-coding genes in microorganisms. Overall, an average accuracy of 99.94% is achieved over 61 microbial genomes. The extremely high accuracy indicates that the presented algorithm is efficient to differentiate the protein-coding genes from the non-coding open reading frames. Abundant analyses show that the predicting results are reliable and the integrative algorithm is robust and convenient. Our analysis also indicates that the over-annotated protein-coding genes can cause the false positive of horizontal gene transfers detection. The webserver of the proposed algorithm can be freely accessible from www.cbi.seu.edu.cn/RPGM.
Affiliation(s)
- Jia-Feng Yu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
|
31
|
Zhang Z, Yu J. On the organizational dynamics of the genetic code. Genomics Proteomics Bioinformatics 2011; 9:21-9. [PMID: 21641559 PMCID: PMC5054158 DOI: 10.1016/s1672-0229(11)60004-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 08/30/2010] [Accepted: 10/26/2010] [Indexed: 11/23/2022]
Abstract
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides—adenine, thymine, guanine and cytosine—according to their emergence in evolution, and apply the organizational rules to devising an algebraic representation for the canonical genetic code. Under a framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with its intrinsic structure and classification schemes as well as amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes were correlated closely with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is of great usefulness in deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.
Affiliation(s)
- Zhang Zhang
- Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
32
|
Pett W, Ryan JF, Pang K, Mullikin JC, Martindale MQ, Baxevanis AD, Lavrov DV. Extreme mitochondrial evolution in the ctenophore Mnemiopsis leidyi: Insight from mtDNA and the nuclear genome. Mitochondrial DNA 2011; 22:130-42. [PMID: 21985407 PMCID: PMC3313829 DOI: 10.3109/19401736.2011.624611] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022]
Abstract
Recent advances in sequencing technology have led to a rapid accumulation of mitochondrial DNA (mtDNA) sequences, which now represent the wide spectrum of animal diversity. However, one animal phylum, Ctenophora, has, to date, remained completely unsampled. Ctenophores, a small group of marine animals, are of interest due to their unusual biology, controversial phylogenetic position, and devastating impact as invasive species. Using data from the Mnemiopsis leidyi genome sequencing project, we amplified its complete mitochondrial (mt-) genome by polymerase chain reaction (PCR) and analyzed it. At just over 10 kb, the mt-genome of M. leidyi is the smallest animal mtDNA ever reported and is among the most derived. It has lost at least 25 genes, including atp6 and all tRNA genes. We show that atp6 has been relocated to the nuclear genome and has acquired introns and a mitochondrial targeting presequence, while tRNA genes have been genuinely lost, along with nuclear-encoded mt-aminoacyl tRNA synthetases. The mt-genome of M. leidyi also displays extremely high rates of sequence evolution, which likely led to the degeneration of both protein and rRNA genes. In particular, encoded rRNA molecules possess little similarity with their homologs in other organisms and have highly reduced secondary structures. At the same time, nuclear-encoded mt-ribosomal proteins have undergone expansions, likely to compensate for the reductions in mt-rRNA. The unusual features identified in M. leidyi mtDNA make this organism an interesting system for the study of various aspects of mitochondrial biology, particularly protein and tRNA import and mt-ribosome structures, and add to its value as an emerging model species. Furthermore, the fast-evolving M. leidyi mtDNA should be a convenient molecular marker for species- and population-level studies.
Affiliation(s)
- Walker Pett
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50010, USA
- Joseph F. Ryan
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kevin Pang
- Kewalo Marine Laboratory, Pacific Bioscience Research Center, University of Hawaii, Honolulu, HI 96813, USA
- James C. Mullikin
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Mark Q. Martindale
- Kewalo Marine Laboratory, Pacific Bioscience Research Center, University of Hawaii, Honolulu, HI 96813, USA
- Andreas D. Baxevanis
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Dennis V. Lavrov
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50010, USA
|
33
|
Tang SL, Chang BC, Halgamuge SK. Gene functionality's influence on the second codon: A large-scale survey of second codon composition in three domains. Genomics 2010; 96:92-101. [DOI: 10.1016/j.ygeno.2010.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/09/2009] [Revised: 02/03/2010] [Accepted: 04/07/2010] [Indexed: 10/19/2022]
|
34
|
Agutter PS. Editorial: hypotheses about protein folding--the proteomic code and wonderfolds. Theor Biol Med Model 2009; 6:31. [PMID: 20034380 PMCID: PMC2803780 DOI: 10.1186/1742-4682-6-31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2009] [Accepted: 12/24/2009] [Indexed: 11/22/2022]
Abstract
Theoretical biology journals can contribute in many ways to the progress of knowledge. They are particularly well-placed to encourage dialogue and debate about hypotheses addressing problematical areas of research. An online journal provides an especially useful forum for such debate because of the option of posting comments within days of the publication of a contentious article.
|
35
|
Jiang P, Sun X, Lu Z. Analysis of synonymous codon usage in Aeropyrum pernix K1 and other Crenarchaeota microorganisms. J Genet Genomics 2009; 34:275-84. [PMID: 17498625 PMCID: PMC7129909 DOI: 10.1016/s1673-8527(07)60029-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Received: 06/02/2006] [Accepted: 08/22/2006] [Indexed: 11/18/2022]
Abstract
In this study, a comparative analysis of the codon usage bias was performed in Aeropyrum pernix K1 and two other phylogenetically related Crenarchaeota microorganisms (i.e., Pyrobaculum aerophilum str. IM2 and Sulfolobus acidocaldarius DSM 639). The results indicated that the synonymous codon usage in A. pernix K1 was less biased, which was highly correlated with the GC3s value. The codon usage patterns were phylogenetically conserved among these Crenarchaeota microorganisms. Comparatively, it is the species function rather than the gene function that determines their gene codon usage patterns. A. pernix K1, P. aerophilum str. IM2, and S. acidocaldarius DSM 639 live in different extreme conditions. It is presumed that the living environment played an important role in determining the codon usage pattern of these microorganisms. In addition, there was no strain-specific codon usage among these microorganisms. The extent of codon bias in A. pernix K1 and S. acidocaldarius DSM 639 was highly correlated with the gene expression level, but no such association was detected in the P. aerophilum str. IM2 genome.
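The GC3s statistic mentioned in this abstract is usually defined as the fraction of G or C at the third position of synonymous codons. The following is a minimal sketch under that assumption, using the standard genetic code; the function name `gc3s` and the exclusion set are illustrative and are not taken from the paper.

```python
# Codons left out of GC3s (assumption: standard genetic code).
# Met (ATG) and Trp (TGG) have no synonymous alternatives; stop codons are not counted.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds: str) -> float:
    """Return the fraction of synonymous codons whose third base is G or C."""
    seq = cds.upper().replace("U", "T")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Keep only unambiguous codons that belong to a synonymous family.
    usable = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not usable:
        raise ValueError("no synonymous codons found")
    return sum(c[2] in "GC" for c in usable) / len(usable)


if __name__ == "__main__":
    print(gc3s("ATGGCCGCATAA"))
```

In the toy sequence only GCC and GCA are counted, and one of the two ends in C.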
Affiliation(s)
- Peng Jiang
- State Key Laboratory of Bioelectronics, Department of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
|
36
|
Biro JC. Discovery of proteomic code with mRNA assisted protein folding. Int J Mol Sci 2008; 9:2424-2446. [PMID: 19330085 PMCID: PMC2635648 DOI: 10.3390/ijms9122424] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/18/2008] [Revised: 11/24/2008] [Accepted: 12/02/2008] [Indexed: 01/18/2023]
Abstract
The 3x redundancy of the Genetic Code is usually explained as a necessity to increase the mutation-resistance of the genetic information. However, recent bioinformatic observations indicate that the redundant Genetic Code contains more biological information than previously known, in addition to the 64/20 definition of amino acids. It might define the physico-chemical and structural properties of amino acids, the codon boundaries, the amino acid co-locations (interactions) in the coded proteins and the free folding energy of mRNAs. This additional information, which seems to be necessary to determine the 3D structure of coding nucleic acids as well as the coded proteins, is known as the Proteomic Code and mRNA Assisted Protein Folding.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 612 S Flower St, Los Angeles, CA 90017, USA. Tel. +1-213-627-6134
|
37
|
Komar AA. A pause for thought along the co-translational folding pathway. Trends Biochem Sci 2008; 34:16-24. [PMID: 18996013 DOI: 10.1016/j.tibs.2008.10.002] [Citation(s) in RCA: 256] [Impact Index Per Article: 15.1] [Received: 07/22/2008] [Revised: 10/09/2008] [Accepted: 10/13/2008] [Indexed: 11/26/2022]
Abstract
A unifying concept that combines the basic features governing self-organization of proteins into complex three-dimensional structures in vitro and in vivo is still lacking. Recent experimental results and theoretical in silico modeling studies provide evidence showing that mRNA might contain an additional layer of information, beyond the amino acid sequence, that fine-tunes in vivo protein folding, which is largely believed to start as a co-translational process. These findings indicate that translation kinetics might direct the co-translational folding pathway and that translational pausing at rare codons might provide a time delay to enable independent and sequential folding of the defined portions of the nascent polypeptide emerging from the ribosome.
Affiliation(s)
- Anton A Komar
- Department of Biological, Center for Gene Regulation in Health and Disease, Cleveland State University, Cleveland, OH 44115, USA.
|
38
|
Guo FB. The distribution patterns of bases of protein-coding genes, non-coding ORFs, and intergenic sequences in Pseudomonas aeruginosa PA01 genome and its implications. J Biomol Struct Dyn 2008; 25:127-33. [PMID: 17718591 DOI: 10.1080/07391102.2007.10507161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
Abstract
The distribution patterns of bases of DNA fragments in different regions of the P. aeruginosa genome are analyzed in this paper. It is shown that 5565 protein-coding genes, 17315 non-coding ORFs, and 1104 intergenic sequences fall into seven clusters based on their base frequencies. Almost all the protein-coding genes are contained in one of the seven clusters. The significant difference in base frequencies among the three codon positions in a high-GC genome, which gives rise to the division between the base distribution patterns of the six reading frames of protein-coding genes, is responsible for the clustering phenomenon. In light of this clustering, the author supposes that antisense-strand ORFs, particularly those corresponding to Frame 2' and Frame 3', may not code for proteins in the P. aeruginosa genome.
Affiliation(s)
- F-B Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
|
39
|
Chen LL, Ma BG, Gao N. Reannotation of hypothetical ORFs in plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043. FEBS J 2007; 275:198-206. [PMID: 18067578 DOI: 10.1111/j.1742-4658.2007.06190.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
Over-annotation of hypothetical ORFs is a common phenomenon in bacterial genomes, which necessitates confirming the coding reliability of hypothetical ORFs and then predicting their functions. The important plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043 (Eca1043) is a typical case because more than a quarter of its annotated ORFs are hypothetical. Our analysis focuses on annotation of Eca1043 hypothetical ORFs, and comprises two efforts: (a) based on the Z-curve method, 49 originally annotated hypothetical ORFs are recognized as noncoding; this is further supported by principal components analysis and other evidence; and (b) using sequence-alignment tools and some functional resources, more than half of the hypothetical genes were assigned functions. The potential functions of 427 hypothetical genes are summarized according to the cluster of orthologous groups functional category. Moreover, 114 and 86 hypothetical genes are recognized as putative 'membrane proteins' and 'exported proteins', respectively. Reannotation of Eca1043 hypothetical ORFs will benefit research into the lifestyle, metabolism and pathogenicity of this important plant pathogen. Our study also offers a model for the reannotation of hypothetical ORFs in microbial genomes.
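For readers unfamiliar with the Z-curve method named in this abstract, the sketch below computes its simplest nine-component form: three Z-curve coordinates from the base frequencies at each of the three codon positions. This is an illustration only; the study may have used a richer variable set, and the function name `zcurve_features` is hypothetical.

```python
from typing import List


def zcurve_features(orf: str) -> List[float]:
    """Return 9 Z-curve components: (x, y, z) for codon positions 1, 2 and 3."""
    seq = orf.upper()
    features: List[float] = []
    for pos in range(3):
        bases = seq[pos::3]  # every base that sits at this codon position
        n = len(bases)
        if n == 0:
            raise ValueError("sequence is too short")
        f = {b: bases.count(b) / n for b in "ACGT"}
        features += [
            (f["A"] + f["G"]) - (f["C"] + f["T"]),  # x: purine vs pyrimidine
            (f["A"] + f["C"]) - (f["G"] + f["T"]),  # y: amino vs keto
            (f["A"] + f["T"]) - (f["G"] + f["C"]),  # z: weak vs strong H-bond
        ]
    return features


if __name__ == "__main__":
    print(zcurve_features("ATGATG"))
```

A classifier (for example Fisher discriminant analysis, a common choice with Z-curve variables) would then be trained on such vectors from known coding and non-coding ORFs; that step is outside this sketch.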
Affiliation(s)
- Ling-Ling Chen
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Shandong University of Technology, Zibo, China.
|
40
|
Moura GR, Lousado JP, Pinheiro M, Carreto L, Silva RM, Oliveira JL, Santos MAS. Codon-triplet context unveils unique features of the Candida albicans protein coding genome. BMC Genomics 2007; 8:444. [PMID: 18047667 PMCID: PMC2244636 DOI: 10.1186/1471-2164-8-444] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/15/2007] [Accepted: 11/29/2007] [Indexed: 11/29/2022]
Abstract
Background: The evolutionary forces that determine the arrangement of synonymous codons within open reading frames and fine-tune mRNA translation efficiency are not yet understood. In order to tackle this question we have carried out a large-scale study of codon-triplet contexts in 11 fungal species to unravel associations or relationships between codons present at the ribosome A-, P- and E-sites during each decoding cycle. Results: Our analysis unveiled high bias within the context of codon-triplets, in particular strong preference for triplets of identical codons. We have also identified a surprisingly large number of codon-triplet combinations that vanished from fungal ORFeomes. Candida albicans exacerbated these features, showed an unbalanced tRNA population for decoding its pool of codons and used near-cognate decoding for a large set of codons, suggesting that unique evolutionary forces shaped the evolution of its ORFeome. Conclusion: We have developed bioinformatics tools for large-scale analysis of codon-triplet contexts. These algorithms identified codon-triplet context biases, allowed for large-scale comparative codon-triplet analysis, and identified rules governing codon-triplet context. They could also detect alterations to the standard genetic code.
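The basic counting step behind a codon-triplet context analysis can be sketched as a sliding window of three consecutive codons over each ORF. The code below is a minimal illustration, not the authors' tool; the function name `codon_triplet_counts` is hypothetical, and the statistical test for over- or under-representation is omitted.

```python
from collections import Counter
from typing import Tuple


def codon_triplet_counts(orf: str) -> "Counter[Tuple[str, str, str]]":
    """Count every run of three consecutive in-frame codons in an ORF."""
    seq = orf.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # zip over three staggered views yields each window of three codons.
    return Counter(zip(codons, codons[1:], codons[2:]))


if __name__ == "__main__":
    counts = codon_triplet_counts("ATG" + "AAA" * 3 + "TAA")
    for triplet, n in counts.items():
        print(triplet, n)
```

Summing such counters over all ORFs of a genome gives the observed triplet frequencies, which can then be compared with the frequencies expected from single-codon usage.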
Affiliation(s)
- Gabriela R Moura
- Department of Biology and CESAM, University of Aveiro, 3810-193 Aveiro, Portugal.
|
41
|
Biro JC. The Proteomic Code: a molecular recognition code for proteins. Theor Biol Med Model 2007; 4:45. [PMID: 17999762 PMCID: PMC2206014 DOI: 10.1186/1742-4682-4-45] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 09/02/2007] [Accepted: 11/13/2007] [Indexed: 11/30/2022]
Abstract
Background: The Proteomic Code is a set of rules by which information in genetic material is transferred into the physico-chemical properties of amino acids. It determines how individual amino acids interact with each other during folding and in specific protein-protein interactions. The Proteomic Code is part of the redundant Genetic Code. Review: The 25-year-old history of this concept is reviewed from the first independent suggestions by Biro and Mekler, through the works of Blalock, Root-Bernstein, Siemion, Miller and others, followed by the discovery of a Common Periodic Table of Codons and Nucleic Acids in 2003 and culminating in the recent conceptualization of partial complementary coding of interacting amino acids as well as the theory of the nucleic acid-assisted protein folding. Methods and conclusions: A novel cloning method for the design and production of specific, high-affinity-reacting proteins (SHARP) is presented. This method is based on the concept of proteomic codes and is suitable for large-scale, industrial production of specifically interacting peptides.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard, #1205, San Francisco, CA 94105, USA.
|
42
|
Comparative analysis of essential genes and nonessential genes in Escherichia coli K12. Mol Genet Genomics 2007; 279:87-94. [PMID: 17943314 DOI: 10.1007/s00438-007-0298-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 05/08/2007] [Accepted: 09/26/2007] [Indexed: 10/22/2022]
Abstract
Genes can be classified as essential or nonessential based on their indispensability for a living organism. Previous research has suggested that essential genes evolve more slowly than nonessential genes and that the impact of gene dispensability on a gene's evolutionary rate is not as strong as expected. However, findings have not been consistent and evidence is controversial regarding the relationship between gene indispensability and the rate of gene evolution. Understanding how different classes of genes evolve is essential for a full understanding of evolutionary biology, and may have medical relevance in the design of new antibacterial agents. We therefore performed an investigation into the properties of essential and nonessential genes. Analysis of evolutionary conservation, protein length distribution and amino acid usage between essential and nonessential genes in Escherichia coli K12 demonstrated that essential genes are relatively preserved throughout the bacterial kingdom when compared to nonessential genes. Furthermore, results show that essential genes, compared to nonessential genes, have a significantly higher proportion of large (>534 amino acids) and small proteins (<139 amino acids) relative to medium-sized proteins. The pattern of amino acid usage shows a similar trend for essential and nonessential genes, although some notable exceptions are observed. These findings help to clarify our understanding of the evolutionary mechanisms of essential and nonessential genes, relevant to the study of mutagenesis and possibly allowing prediction of gene properties in other poorly understood organisms.
|
43
|
Biro JC. Protein folding information in nucleic acids which is not present in the genetic code. Ann N Y Acad Sci 2007; 1091:399-411. [PMID: 17341631 DOI: 10.1196/annals.1378.083] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (P < 0.0001, n = 81). This periodic FFE difference is not present in introns. The FFE in the 1st and 3rd residues is additive, which suggests that these residues contain a significant number of complementary bases and contribute to selection for local mRNA secondary structures. This periodic, codon-related structure forming of mRNAs indicates a connection between the structure of exons and the corresponding (translated) proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures.
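The comparison described in this abstract starts from subsequences built from a single codon position. The sketch below shows only that extraction step, under the assumption that the input is an in-frame coding sequence; the function name `codon_position_subsequences` is hypothetical, and the free folding energy itself would have to come from a separate RNA folding program, which is not reproduced here.

```python
from typing import Dict


def codon_position_subsequences(cds: str) -> Dict[str, str]:
    """Split an in-frame coding sequence into per-codon-position subsequences."""
    seq = cds.upper()
    seq = seq[:len(seq) - len(seq) % 3]  # drop any trailing partial codon
    return {
        "1st": seq[0::3],
        "2nd": seq[1::3],
        "3rd": seq[2::3],
        # 1st and 3rd residues of each codon, kept in their original order
        "1st+3rd": "".join(seq[i] + seq[i + 2] for i in range(0, len(seq), 3)),
    }


if __name__ == "__main__":
    for name, sub in codon_position_subsequences("AUGGCCUAA").items():
        print(name, sub)
```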
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard #1205, San Francisco, CA 94195, USA.
|
44
|
Yang J, Dong XC, Leng Y. Application of FTTP to alpha-helix or beta-strand motifs. J Theor Biol 2006; 242:199-219. [PMID: 16616204 DOI: 10.1016/j.jtbi.2006.02.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/31/2005] [Revised: 02/17/2006] [Accepted: 02/21/2006] [Indexed: 11/28/2022]
Abstract
Information concerning protein structure is widely dispersed and cannot easily and rapidly be processed by the biological community. We present a database of tendentious factors of three states of tripeptide units from the PDB database, called a bank of tendentious factors of three states of three-peptide units (FTTP). The FTTP database was constructed based on a conformational dihedral angle (φ, ψ) library of 20³ peptide triplets by exhaustively searching through PDB databases. We introduce the FTTP database for the analysis of characteristics common to relative conformational biases of all peptide triplets, especially for finding motifs prone to forming alpha-helix and beta-strand. Our results show that this will provide a platform for studies of short peptide motifs, folding codons, secondary structure and three-dimensional (3D) structure of proteins. Moreover, FTTP is a unique resource that will allow a comprehensive characterization of peptide triplets and thus improve our understanding of sequence-structure relationships, refined domains, 3D structures, and their associated function. We believe the FTTP database will help biologists find useful and relevant information regarding the structure-function relationship of proteins more efficiently. Therefore, this approach will play an important role in protein folding, protein engineering, molecular design, and proteomics.
Affiliation(s)
- Jie Yang
- State Key Laboratory of Pharmaceutical Biotechnology, College of Life Sciences, Nanjing University, Nanjing 210093, PR China.
|
45
|
Biro JC. Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases. Theor Biol Med Model 2006; 3:28. [PMID: 16893453 PMCID: PMC1560374 DOI: 10.1186/1742-4682-3-28] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 12/16/2005] [Accepted: 08/07/2006] [Indexed: 12/02/2022]
Abstract
Background: All the information necessary for protein folding is supposed to be present in the amino acid sequence. It is still not possible to provide specific ab initio structure predictions by bioinformatical methods. It is suspected that additional folding information is present in protein coding nucleic acid sequences, but this is not represented by the known genetic code. Results: Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (p < 0.0001, n = 81). This periodic FFE difference is not present in introns. It is therefore a specific physico-chemical characteristic of coding sequences and might contribute to unambiguous definition of codon boundaries during translation. The FFEs of the 1st and 3rd residues are additive, which suggests that these residues contain a significant number of complementary bases and that may contribute to selection for local RNA secondary structures in coding regions. This periodic, codon-related structure-formation of mRNAs indicates a connection between the structures of exons and the corresponding (translated) proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick (WC) base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures. Conclusion: Exons are distinguished from introns, and codon boundaries are physico-chemically defined, by periodically distributed FFE differences between codon positions. There is a selection for local RNA secondary structures in coding regions and this nucleic acid structure resembles the folding profiles of the coded proteins. The preferentially (specifically) interacting amino acids are coded by partially complementary codons, which strongly supports the connection between mRNA and the corresponding protein structures and indicates that there is protein folding information in nucleic acids that is not present in the genetic code. This might suggest an additional explanation of codon redundancy.
|
46
|
Yang J, Dong XC, Leng Y. Conformation biases of amino acids based on tripeptide microenvironment from PDB database. J Theor Biol 2005; 240:374-84. [PMID: 16290902 DOI: 10.1016/j.jtbi.2005.09.025] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/13/2005] [Revised: 09/28/2005] [Accepted: 09/29/2005] [Indexed: 11/30/2022]
Abstract
We have constructed a bank (FTTP) of tendentious factors of three states of three-peptide units from PDB database based on conformational dihedral angle library and demonstrated that amino acid biases toward protein secondary structure are present in natural protein sequences. Our research results reveal that 20 standard amino acids fall into three groups: nine residues inclined to alpha-helix with a common character (e.g. direct side chain aliphatic residues or positive/negative charged residues) arrange in three grades, viz EA, QKRLD, and MN, in turn; seven residues are apt to beta-strand with 2'-branched side chain aliphatic residues or benzyl-included residues, namely PV, IYTC, and F, in three ranks; and four residues SHWG show a double tendency to both alpha and beta. Noticeably, proline has the strongest ability to form extended conformation, especially the Re value up to 9.5298 at position 3 (Table 3). Thus, biases of codons show an evident tendency in protein folding, where GC-rich codons are mainly in charge of forming contracted conformation, especially the codon's first letter plays a dominant role in translating the genomic GC signature into protein sequences and structures. So, biases of amino acids will play an important role in protein folding, folding codons, refining domain, structure prediction, and structural genomics/proteomics.
Affiliation(s)
- Jie Yang
- Life Science College, State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing 210093, PR China.
|
47
|
Sau K, Gupta SK, Sau S, Ghosh TC. Synonymous codon usage bias in 16 Staphylococcus aureus phages: implication in phage therapy. Virus Res 2005; 113:123-31. [PMID: 15970346 DOI: 10.1016/j.virusres.2005.05.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Received: 01/26/2005] [Revised: 05/06/2005] [Accepted: 05/10/2005] [Indexed: 11/22/2022]
Abstract
To reveal the factors influencing the architecture of protein-coding genes in staphylococcal phages, relative synonymous codon usage variation has been investigated in 920 protein-coding genes of 16 staphylococcal phages. As expected for AT-rich genomes, A- and T-ending codons predominate in all 16 phages. Both the Nc plot and correspondence analysis on relative synonymous codon usage indicate that mutation bias influences codon usage variation in the 16 phages. Correspondence analysis also suggests that translational selection and gene length influence the codon usage variation in the phages to some extent, and that codon usage in staphylococcal phages is phage-specific but not S. aureus-specific. Further analysis indicates that among the 16 staphylococcal phages, 44AHJD, P68 and K may be extremely virulent in nature, as most of their genes have high translation efficiency. If this is true, then these three phages may be useful for curing staphylococcal infections.
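Relative synonymous codon usage (RSCU), the quantity analyzed in this abstract, is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below follows that conventional definition and assumes the standard genetic code; the names `CODON_TABLE` and `rscu` are illustrative and not from the paper.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict

# Standard genetic code in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> Dict[str, float]:
    """Return RSCU values for every codon of each amino acid seen in the sequence."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values: Dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so skip it
        expected = total / len(codons)  # equal-usage expectation per codon
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    print(rscu("GCTGCTGCCGCA"))
```

A value above 1 means a codon is used more often than expected under equal usage, which is how A- and T-ending preference in AT-rich genomes would show up.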
Affiliation(s)
- K Sau
- Bioinformatics Centre, Bose Institute, P1/12, CIT Scheme VII M, Calcutta 700 054, India.
|
48
|
Bradshaw PC, Rathi A, Samuels DC. Mitochondrial-encoded membrane protein transcripts are pyrimidine-rich while soluble protein transcripts and ribosomal RNA are purine-rich. BMC Genomics 2005; 6:136. [PMID: 16185363 PMCID: PMC1262711 DOI: 10.1186/1471-2164-6-136] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/12/2005] [Accepted: 09/26/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Eukaryotic organisms contain mitochondria, organelles capable of producing large amounts of ATP by oxidative phosphorylation. Each cell contains many mitochondria with many copies of mitochondrial DNA in each organelle. The mitochondrial DNA encodes a small but functionally critical portion of the oxidative phosphorylation machinery, a few other species-specific proteins, and the rRNA and tRNA used for the translation of these transcripts. Because the microenvironment of the mitochondrion is unique, mitochondrial genes may be subject to different selectional pressures than those affecting nuclear genes.
RESULTS: From an analysis of the mitochondrial genomes of a wide range of eukaryotic species we show that there are three simple rules for the pyrimidine and purine abundances in mitochondrial DNA transcripts. Mitochondrial membrane protein transcripts are pyrimidine-rich, rRNA transcripts are purine-rich and the soluble protein transcripts are purine-rich. The transitions between pyrimidine and purine-rich regions of the genomes are rapid and are easily visible on a pyrimidine-purine walk graph. These rules are followed, with few exceptions, independent of which strand encodes the gene. Despite the robustness of these rules across a diverse set of species, the magnitude of the differences between the pyrimidine and purine content is fairly small. Typically, the mitochondrial membrane protein transcripts have a pyrimidine richness of 56%, the rRNA transcripts are 55% purine, and the soluble protein transcripts are only 53% purine.
CONCLUSION: The pyrimidine richness of mitochondrial-encoded membrane protein transcripts is partly driven by U nucleotides in the second codon position in all species, which yields hydrophobic amino acids. The purine-richness of soluble protein transcripts is mainly driven by A nucleotides in the first codon position. The purine-richness of rRNA is also due to an abundance of A nucleotides. Possible mechanisms as to how these trends are maintained in mtDNA genomes of such diverse ancestry, size and variability of A-T richness are discussed.
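The pyrimidine richness and the pyrimidine-purine walk mentioned in this abstract can be illustrated with a minimal Python sketch. The function names and the sign convention of the walk (+1 for a pyrimidine, -1 for a purine) are assumptions made for illustration; the abstract does not specify how the authors computed either quantity.

```python
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CTU")  # T for DNA, U for RNA transcripts


def pyrimidine_fraction(seq):
    """Fraction of unambiguous bases in seq that are pyrimidines."""
    counts = Counter(seq.upper())
    pyr = sum(counts[b] for b in PYRIMIDINES)
    pur = sum(counts[b] for b in PURINES)
    if pyr + pur == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    return pyr / (pyr + pur)


def pyrimidine_purine_walk(seq):
    """Cumulative walk: step +1 per pyrimidine, -1 per purine (assumed convention)."""
    position = 0
    walk = []
    for base in seq.upper():
        if base in PYRIMIDINES:
            position += 1
        elif base in PURINES:
            position -= 1
        # ambiguous symbols such as N leave the position unchanged
        walk.append(position)
    return walk
```

Under this convention a pyrimidine-rich region appears as a rising stretch of the walk and a purine-rich region as a falling one, so a transition between the two shows up as a change in slope.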
Affiliation(s)
- Patrick C Bradshaw
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Anand Rathi
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- David C Samuels
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|
49
|
Biro JC. Nucleic acid chaperons: a theory of an RNA-assisted protein folding. Theor Biol Med Model 2005; 2:35. [PMID: 16137324 PMCID: PMC1232867 DOI: 10.1186/1742-4682-2-35] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 07/10/2005] [Accepted: 09/01/2005] [Indexed: 12/04/2022] Open
Abstract
Background: Proteins are assumed to contain all the information necessary for unambiguous folding (Anfinsen's principle). However, ab initio structure prediction is often not successful because the amino acid sequence itself is not sufficient to guide between endless folding possibilities. It seems logical to try to find the "missing" information in nucleic acids, in the redundant codon base.
Results: mRNA energy dot plots and protein residue contact maps were found to be rather similar. The structure of mRNA is also conserved if the protein structure is conserved, even if the sequence similarity is low. These observations led me to suppose that some similarity might exist between nucleic acid and protein folding. I found that amino acid pairs, which are co-located in the protein structure, are preferentially coded by complementary codons. This codon complementarity is not perfect; it is suboptimal where the 1st and 3rd codon residues are complementary to each other in reverse orientation, while the 2nd codon letters may be, but are not necessarily, complementary.
Conclusion: Partial complementary coding of co-locating amino acids in protein structures suggests that mRNA assists in protein folding and functions not only as a template but even as a chaperon during translation. This function explains the role of wobble bases and answers the mystery of why we have a redundant codon base.
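The partial complementarity rule stated in this abstract can be written down as a short Python check. This is one reading of the abstract's wording, not the author's code: "reverse orientation" is taken to mean that the 1st base of one codon pairs with the 3rd base of the other and vice versa, and only Watson-Crick pairs are counted. The function names are hypothetical.

```python
# Watson-Crick partners for RNA bases (wobble pairs such as G-U are not counted here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _as_rna(codon):
    """Normalise a codon to upper-case RNA letters and check its length."""
    codon = codon.upper().replace("T", "U")
    if len(codon) != 3 or any(base not in COMPLEMENT for base in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    return codon


def partially_complementary(codon_a, codon_b):
    """True if the 1st and 3rd bases pair in reverse orientation; the 2nd is ignored."""
    a, b = _as_rna(codon_a), _as_rna(codon_b)
    return COMPLEMENT[a[0]] == b[2] and COMPLEMENT[a[2]] == b[0]


def perfectly_complementary(codon_a, codon_b):
    """True if codon_b is the full reverse complement of codon_a."""
    a, b = _as_rna(codon_a), _as_rna(codon_b)
    return partially_complementary(a, b) and COMPLEMENT[a[1]] == b[1]
```

For example, GCU and AGC are full reverse complements, whereas GAU and AGC satisfy only the partial rule because their middle bases (A and G) do not pair.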
Affiliation(s)
- Jan C Biro
- Homulus Foundation, San Francisco, CA 94105, USA.
|
50
|
Sau K, Sau S, Mandal SC, Ghosh TC. Factors influencing the synonymous codon and amino acid usage bias in AT-rich Pseudomonas aeruginosa phage PhiKZ. Acta Biochim Biophys Sin (Shanghai) 2005; 37:625-33. [PMID: 16143818 PMCID: PMC7109957 DOI: 10.1111/j.1745-7270.2005.00089.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022] Open
Abstract
To reveal how the AT-rich genome of bacteriophage PhiKZ has been shaped in order to carry out its growth in the GC-rich host Pseudomonas aeruginosa, synonymous codon and amino acid usage bias of PhiKZ was investigated and the data were compared with those of P. aeruginosa. It was found that synonymous codon and amino acid usage of PhiKZ was distinct from that of P. aeruginosa. In contrast to P. aeruginosa, the third codon position of the synonymous codons of PhiKZ carries mostly an A or T base; codon usage bias in PhiKZ is dictated mainly by mutational bias and, to a lesser extent, by translational selection. A cluster analysis of the relative synonymous codon usage values of 16 myoviruses including PhiKZ shows that PhiKZ is evolutionarily much closer to Escherichia coli phage T4. Further analysis reveals that the three factors of mean molecular weight, aromaticity and cysteine content are mostly responsible for the variation of amino acid usage in PhiKZ proteins, whereas amino acid usage of P. aeruginosa proteins is mainly governed by grand average of hydropathicity, aromaticity and cysteine content. Based on these observations, we suggest that codons of the phage-like PhiKZ have evolved to preferentially incorporate the smaller amino acid residues into their proteins during translation, thereby economizing the cost of its development in GC-rich P. aeruginosa.
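Relative synonymous codon usage (RSCU), the quantity clustered in this abstract, is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below assumes the standard genetic code and a single in-frame coding sequence; the function name `rscu` is hypothetical and the code is not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in an in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous choice (Met, Trp)
        for codon in family:
            # observed count / (total / family size)
            values[codon] = counts[codon] * len(family) / total
    return values
```

An RSCU of 1.0 means a codon is used as often as expected under equal usage, values above 1.0 indicate preferred codons, and the values within one synonymous family sum to the family size.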
Affiliation(s)
- K. Sau
- Department of Mathematics, Jadavpur University, Calcutta 700 032, India
- S. Sau
- Department of Biochemistry, Bose Institute, P1/12-CIT Scheme VII M, Calcutta 700 054, India
- S. C. Mandal
- Department of Mathematics, Jadavpur University, Calcutta 700 032, India
- Corresponding authors: S. C. MANDAL: E-mail,
- T. C. Ghosh
- Bioinformatics Centre, Bose Institute, P1/12-CIT Scheme VII M, Calcutta 700 054, India
- T. C. GHOSH: Tel, +91-33-2334 6626; Fax, +91-33-2334 3886; E-mail,
|