1
|
The Canonical Table of the Genetic Code as a periodic system of triplets. Biosystems 2022; 214:104636. [PMID: 35181371 DOI: 10.1016/j.biosystems.2022.104636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 01/29/2022] [Accepted: 01/30/2022] [Indexed: 11/22/2022]
Abstract
The Canonical Table of the Genetic Code (CTGC) is constructed theoretically on the basis of the similarity of PFs (PF) of proteins with the conformation of 4-arc chain graphs (Karasev, 2019). Of the 64 conformations of the graph, specified by the position of the connectivity edges, and the matrices of 6 variables (x1 … x6), xi = (0, 1), 4 blocks of 16 elements each were formed. Then they were coded in the form of triplets based on the correspondence of pairs of variables to four letters of the code: 00 = C, 01 = U, 10 = G, 11 = A, and supplemented based on the known triplet-amino acid assignment. The resulting table is compared with the Periodic Table of Chemical Elements (PTCE). As in the PTCE, this CTGC has an initial element - a triplet that encodes graphs with zero number of connected edges. Within each block, vacancies are filled with connectivity edges in two alternative ways, both in rows and in the columns. As we move from the initial block 00 to the final block 11, there is a sequential filling of vacancies for variables x3x4: 00, 01, 10, 11. In general, the CTGC can be considered as a periodic system of triplets. Comparison with the previously described variety of tables of the genetic code made it possible to conclude that the CTGC more adequately reflects the properties of the genetic code. Prospects for the possible application of this table are being discussed.
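The bit-pair encoding described in this abstract can be sketched in a few lines. This is a minimal illustration, not the author's construction: it assumes that the pairs x1x2, x3x4 and x5x6 map, in that order, to the first, second and third letters of a triplet, which the abstract does not state. The names `PAIR_TO_BASE`, `bits_to_triplet` and `blocks_by_x3x4` are hypothetical.

```python
from itertools import product

# Pair-to-letter correspondence quoted in the abstract.
PAIR_TO_BASE = {(0, 0): "C", (0, 1): "U", (1, 0): "G", (1, 1): "A"}


def bits_to_triplet(bits):
    """Map six binary variables (x1..x6) to a three-letter triplet.

    ASSUMPTION: (x1,x2), (x3,x4), (x5,x6) give the 1st, 2nd and 3rd
    letters in that order; the abstract does not specify the ordering.
    """
    if len(bits) != 6 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected six values, each 0 or 1")
    return "".join(PAIR_TO_BASE[(bits[i], bits[i + 1])] for i in (0, 2, 4))


def blocks_by_x3x4():
    """Group all 64 bit vectors into four blocks keyed by (x3, x4)."""
    blocks = {}
    for bits in product((0, 1), repeat=6):
        blocks.setdefault((bits[2], bits[3]), []).append(bits_to_triplet(bits))
    return blocks
```

Under this assumed ordering, the all-zero vector gives `CCC`, `(1, 1, 0, 1, 1, 0)` gives `AUG`, and grouping by x3x4 yields four blocks of 16 triplets each, matching the block sizes the abstract reports.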
|
2
|
Fontrodona N, Aubé F, Claude JB, Polvèche H, Lemaire S, Tranchevent LC, Modolo L, Mortreux F, Bourgeois CF, Auboeuf D. Interplay between coding and exonic splicing regulatory sequences. Genome Res 2019; 29:711-722. [PMID: 30962178 PMCID: PMC6499313 DOI: 10.1101/gr.241315.118] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/02/2018] [Accepted: 03/28/2019] [Indexed: 01/24/2023]
Abstract
The inclusion of exons during the splicing process depends on the binding of splicing factors to short low-complexity regulatory sequences. The relationship between exonic splicing regulatory sequences and coding sequences is still poorly understood. We demonstrate that exons that are coregulated by any given splicing factor share a similar nucleotide composition bias and preferentially code for amino acids with similar physicochemical properties because of the nonrandomness of the genetic code. Indeed, amino acids sharing similar physicochemical properties correspond to codons that have the same nucleotide composition bias. In particular, we uncover that the TRA2A and TRA2B splicing factors that bind to adenine-rich motifs promote the inclusion of adenine-rich exons coding preferentially for hydrophilic amino acids that correspond to adenine-rich codons. SRSF2 that binds guanine/cytosine-rich motifs promotes the inclusion of GC-rich exons coding preferentially for small amino acids, whereas SRSF3 that binds cytosine-rich motifs promotes the inclusion of exons coding preferentially for uncharged amino acids, like serine and threonine that can be phosphorylated. Finally, coregulated exons encoding amino acids with similar physicochemical properties correspond to specific protein features. In conclusion, the regulation of an exon by a splicing factor that relies on the affinity of this factor for specific nucleotide(s) is tightly interconnected with the exon-encoded physicochemical properties. We therefore uncover an unanticipated bidirectional interplay between the splicing regulatory process and its biological functional outcome.
Affiliation(s)
- Nicolas Fontrodona
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Fabien Aubé
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Jean-Baptiste Claude
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Hélène Polvèche
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Sébastien Lemaire
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Léon-Charles Tranchevent
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Laurent Modolo
  - LBMC Biocomputing Center, CNRS UMR 5239, INSERM U1210, F-69007, Lyon, France
- Franck Mortreux
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Cyril F Bourgeois
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Didier Auboeuf
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
|
3
|
Banwell EF, Piette BMAG, Taormina A, Heddle JG. Reciprocal Nucleopeptides as the Ancestral Darwinian Self-Replicator. Mol Biol Evol 2019; 35:404-416. [PMID: 29126321 PMCID: PMC5850689 DOI: 10.1093/molbev/msx292] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023]
Abstract
Even the simplest organisms are too complex to have spontaneously arisen fully formed, yet precursors to first life must have emerged ab initio from their environment. A watershed event was the appearance of the first entity capable of evolution: the Initial Darwinian Ancestor. Here, we suggest that nucleopeptide reciprocal replicators could have carried out this important role and contend that this is the simplest way to explain extant replication systems in a mathematically consistent way. We propose short nucleic acid templates on which amino-acylated adapters assembled. Spatial localization drives peptide ligation from activated precursors to generate phosphodiester-bond-catalytic peptides. Comprising autocatalytic protein and nucleic acid sequences, this dynamical system links and unifies several previous hypotheses and provides a plausible model for the emergence of DNA and the operational code.
Affiliation(s)
- Eleanor F Banwell
  - Heddle Initiative Research Unit, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Anne Taormina
  - Department for Mathematical Sciences, Durham University, Durham, United Kingdom
- Jonathan G Heddle
  - Heddle Initiative Research Unit, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
  - Bionanoscience and Biochemistry Laboratory, Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
|
4
|
Nemzer LR. Shannon information entropy in the canonical genetic code. J Theor Biol 2017; 415:158-170. [DOI: 10.1016/j.jtbi.2016.12.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 04/25/2016] [Revised: 11/30/2016] [Accepted: 12/12/2016] [Indexed: 11/15/2022]
|
5
|
Elengoe A, Hamdan S. In Silico Molecular Modeling and Docking Studies on Novel Mutants (E229V, H225P and D230C) of the Nucleotide-Binding Domain of Homo sapiens Hsp70. Interdiscip Sci 2016; 9:478-498. [PMID: 27517798 DOI: 10.1007/s12539-016-0181-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/27/2015] [Revised: 07/22/2016] [Accepted: 08/01/2016] [Indexed: 12/25/2022]
Abstract
In this study, we explored the possibility of determining the synergistic interactions between nucleotide-binding domain (NBD) of Homo sapiens heat-shock 70 kDa protein (Hsp70) and E1A 32 kDa of adenovirus serotype 5 motif (PNLVP) in the efficiency of killing of tumor cells in cancer treatment. At present, the protein interaction between NBD and PNLVP motif is still unknown, but believed to enhance the rate of virus replication in tumor cells. Three mutant models (E229V, H225P and D230C) were built and simulated, and their interactions with PNLVP motif were studied. The PNLVP motif showed the binding energy and intermolecular energy values with the novel E229V mutant at -7.32 and -11.2 kcal/mol. The E229V mutant had the highest number of hydrogen bonds (7). Based on the root mean square deviation, root mean square fluctuation, hydrogen bonds, salt bridge, secondary structure, surface-accessible solvent area, potential energy and distance matrices analyses, it was proved that the E229V had the strongest and most stable interaction with the PNLVP motif among all the four protein-ligand complex structures. The knowledge of this protein-ligand complex model would help in designing Hsp70 structure-based drug for cancer therapy.
Affiliation(s)
- Asita Elengoe
  - Department of Biosciences and Health Sciences, Faculty of Biosciences and Medical Engineering, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
- Salehhuddin Hamdan
  - Department of Biosciences and Health Sciences, Faculty of Biosciences and Medical Engineering, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
|
6
|
Elengoe A, Naser MA, Hamdan S. Molecular dynamics simulation and docking studies on novel mutants (T11V, T12P and D364S) of the nucleotide-binding domain of human heat shock 70 kDa protein. Biologia (Bratisl) 2015. [DOI: 10.1515/biolog-2015-0194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
|
7
|
Cross Talk between KGF and KITLG Proteins Implicated with Ovarian Folliculogenesis in Buffalo Bubalus bubalis. PLoS One 2015; 10:e0127993. [PMID: 26083339 PMCID: PMC4470682 DOI: 10.1371/journal.pone.0127993] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/25/2015] [Accepted: 04/21/2015] [Indexed: 11/19/2022]
Abstract
Molecular interactions between mesenchymal-derived Keratinocyte growth factor (KGF) and Kit ligand (KITLG) are essential for follicular development. These factors are expressed by theca and granulosa cells. We determined full length coding sequence of buffalo KGF and KITLG proteins having 194 and 274 amino acids, respectively. The recombinant KGF and KITLG proteins were solubilized in 10 mM Tris, pH 7.5 and 50 mM Tris, pH 7.4 and purified using Ni-NTA column and GST affinity chromatography, respectively. The purity and molecular weight of His-KGF (~23 kDa) and GST-KITLG (~57 kDa) proteins were confirmed by SDS-PAGE and western blotting. The co-immunoprecipitation assay accompanied with computational analysis demonstrated the interaction between KGF and KITLG proteins. We deduced 3D structures of the candidate proteins and assessed their binding based on protein docking. In the process, KGF specific residues, Lys123, Glu135, Lys140, Lys155 and Trp156 and KITLG specific ones, Ser226, Phe233, Gly234, Ala235, Phe236, Trp238 and Lys239 involved in the formation of KGF-KITLG complex were detected. The hydrophobic interactions surrounding KGF-KITLG complex affirmed their binding affinity and stability to the interacting interface. Additionally, in-silico site directed mutagenesis enabled the assessment of changes that occurred in the binding energies of mutated KGF-KITLG protein complex. Our results demonstrate that in the presence of KITLG, KGF mimics its native binding mode suggesting all the KGF residues are specific to their binding complex. This study provides an insight on the critical amino acid residues participating in buffalo ovarian folliculogenesis.
|
8
|
Elengoe A, Naser MA, Hamdan S. Modeling and docking studies on novel mutants (K71L and T204V) of the ATPase domain of human heat shock 70 kDa protein 1. Int J Mol Sci 2014; 15:6797-814. [PMID: 24758925 PMCID: PMC4013662 DOI: 10.3390/ijms15046797] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 01/25/2014] [Revised: 04/03/2014] [Accepted: 04/04/2014] [Indexed: 11/16/2022]
Abstract
The purpose of exploring protein interactions between human adenovirus and heat shock protein 70 is to exploit a potentially synergistic interaction to enhance anti-tumoral efficacy and decrease toxicity in cancer treatment. However, the protein interaction of Hsp70 with E1A 32 kDa of human adenovirus serotype 5 remains to be elucidated. In this study, two residues of the ATPase domain of human heat shock 70 kDa protein 1 (PDB: 1HJO) were mutated. 3D mutant models (K71L and T204V) were then constructed using PyMOL software. The structures were evaluated by the PROCHECK, ProQ, ERRAT, Verify 3D and ProSA modules. All evidence suggests that all protein models are acceptable and of good quality. The E1A 32 kDa motif was retrieved from UniProt (P03255) and subjected to docking with NBD, K71L and T204V using the AutoDock 4.2 program. The lowest binding energy value, −9.09 kcal/mol, was obtained for the novel T204V mutant. Moreover, the protein-ligand complex structures were validated by RMSD, RMSF, hydrogen bond and salt bridge analysis. This revealed that the T204V-E1A 32 kDa motif complex was the most stable among all three complex structures. This study provides information about the interaction between Hsp70 and the E1A 32 kDa motif, which emphasizes future perspectives to design rational drugs and vaccines in cancer therapy.
Affiliation(s)
- Asita Elengoe
  - Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, Skudai, Johor 81310, Malaysia
- Mohammed Abu Naser
  - Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, Skudai, Johor 81310, Malaysia
- Salehhuddin Hamdan
  - Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, Skudai, Johor 81310, Malaysia
|
9
|
Zanzoni A, Marchese D, Agostini F, Bolognesi B, Cirillo D, Botta-Orfila M, Livi CM, Rodriguez-Mulero S, Tartaglia GG. Principles of self-organization in biological pathways: a hypothesis on the autogenous association of alpha-synuclein. Nucleic Acids Res 2013; 41:9987-98. [PMID: 24003031 PMCID: PMC3905859 DOI: 10.1093/nar/gkt794] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 12/17/2022]
Abstract
Previous evidence indicates that a number of proteins are able to interact with cognate mRNAs. These autogenous associations represent important regulatory mechanisms that control gene expression at the translational level. Using the catRAPID approach to predict the propensity of proteins to bind to RNA, we investigated the occurrence of autogenous associations in the human proteome. Our algorithm correctly identified binding sites in well-known cases such as thymidylate synthase, tumor suppressor P53, synaptotagmin-1, serine/arginine-rich splicing factor 2, heat shock 70 kDa, ribonucleic particle-specific U1A and ribosomal protein S13. In addition, we found that several other proteins are able to bind to their own mRNAs. A large-scale analysis of biological pathways revealed that aggregation-prone and structurally disordered proteins have the highest propensity to interact with cognate RNAs. These findings are substantiated by experimental evidence on amyloidogenic proteins such as TAR DNA-binding protein 43 and fragile X mental retardation protein. Among the amyloidogenic proteins, we predicted that Parkinson’s disease-related α-synuclein is highly prone to interact with cognate transcripts, which suggests the existence of RNA-dependent factors in its function and dysfunction. Indeed, as aggregation is intrinsically concentration dependent, it is possible that autogenous interactions play a crucial role in controlling protein homeostasis.
Affiliation(s)
- Andreas Zanzoni
  - Gene Function and Evolution, Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
|
10
|
Biro JC, Biro JM. The concept of RNA-assisted protein folding: Representation of amino acid kinetics at the tRNA level. J Theor Biol 2013; 317:168-74. [DOI: 10.1016/j.jtbi.2012.09.032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/17/2012] [Revised: 09/24/2012] [Accepted: 09/25/2012] [Indexed: 10/27/2022]
|
11
|
Biro JC. Coding nucleic acids are chaperons for protein folding: a novel theory of protein folding. Gene 2012; 515:249-57. [PMID: 23266645 DOI: 10.1016/j.gene.2012.12.048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/02/2012] [Revised: 12/04/2012] [Accepted: 12/06/2012] [Indexed: 11/29/2022]
Abstract
The arguments for nucleic acid chaperons are reviewed and three new lines of evidence are added. (1) It was found that amino acids encoded by codons in short nucleic acid loops frequently form turns and helices in the corresponding protein structures. (2) The amino acids encoded by partially complementary (1st and 3rd nucleotides) codons are more frequently co-located in the encoded proteins than expected by chance. (3) There are significant correlations between thermodynamic changes (ddG) caused by codon mutations in nucleic acids and the thermodynamic changes caused by the corresponding amino acid mutations in the encoded proteins. We conclude that the concept of the Proteomic Code and nucleic acid chaperons seems correct from the bioinformatics point of view, and we expect to see direct biochemical experiments and evidence in the near future.
Affiliation(s)
- Jan C Biro
  - Karolinska Institute, Stockholm, Sweden
|
12
|
|
13
|
Biro JC. The concept of RNA-assisted protein folding: the role of tRNA. Theor Biol Med Model 2012; 9:10. [PMID: 22462735 PMCID: PMC3359187 DOI: 10.1186/1742-4682-9-10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/11/2012] [Accepted: 04/02/2012] [Indexed: 02/07/2023]
Abstract
We suggest that tRNA actively participates in the transfer of 3D information from mRNA to peptides, in addition to its well-known "classical" role of translating the three-letter RNA codes into the one-letter protein code. The tRNA molecule displays a series of thermodynamically favored configurations during translation, a movement which places the codon and the coded amino acid in proximity to each other and makes physical contact between some amino acids and their codons possible. This specific codon-amino acid interaction of some selected amino acids is necessary for the transfer of spatial information from mRNA to coded proteins, and is known as RNA-assisted protein folding.
Affiliation(s)
- Jan C Biro
  - Karolinska Institute, Stockholm, Sweden
|
14
|
Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J. Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 2012; 13:43. [PMID: 22435713 PMCID: PMC3368730 DOI: 10.1186/1471-2105-13-43] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 11/29/2011] [Accepted: 03/22/2012] [Indexed: 02/07/2023]
Abstract
Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results: Here we propose a novel measure, the Codon Deviation Coefficient (CDC), that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
Affiliation(s)
- Zhang Zhang
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
15
|
Zhang Z, Yu J. On the organizational dynamics of the genetic code. Genomics Proteomics & Bioinformatics 2011; 9:21-9. [PMID: 21641559 PMCID: PMC5054158 DOI: 10.1016/s1672-0229(11)60004-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 08/30/2010] [Accepted: 10/26/2010] [Indexed: 11/23/2022]
Abstract
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides—adenine, thymine, guanine and cytosine—according to their emergence in evolution, and apply the organizational rules to devising an algebraic representation for the canonical genetic code. Under a framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with its intrinsic structure and classification schemes as well as amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes were correlated closely with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is of great usefulness in deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.
Affiliation(s)
- Zhang Zhang
  - Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
16
|
Zhang Z, Yu J. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biol Direct 2010; 5:63. [PMID: 21059261 PMCID: PMC2989939 DOI: 10.1186/1745-6150-5-63] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 07/16/2010] [Accepted: 11/08/2010] [Indexed: 12/03/2022]
Abstract
Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge. Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft.
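The two compositional parameters named in this abstract, GC content and purine content at each of the three codon positions, can be measured on an observed sequence as sketched here. This only computes the observed quantities; it does not implement the authors' two models, whose equations the abstract does not give. The function name `positional_content` is hypothetical, and the sketch assumes an in-frame coding sequence with unambiguous bases.

```python
def positional_content(cds):
    """Return [(gc, purine), ...] for codon positions 1, 2 and 3.

    ASSUMPTIONS: `cds` is in frame (position 0 is the first base of a
    codon) and contains only A, C, G, T/U. Trailing bases that do not
    complete a codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    result = []
    for pos in range(3):
        bases = seq[pos:usable:3]          # every base at this codon position
        gc = sum(b in "GC" for b in bases) / len(bases)
        purine = sum(b in "AG" for b in bases) / len(bases)
        result.append((gc, purine))
    return result
```

For the two-codon sequence `ATGGCC`, this gives `(0.5, 1.0)` at position 1, `(0.5, 0.0)` at position 2 and `(1.0, 0.5)` at position 3.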
Affiliation(s)
- Zhang Zhang
  - Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
|
17
|
Castro-Chavez F. The rules of variation: amino acid exchange according to the rotating circular genetic code. J Theor Biol 2010; 264:711-21. [PMID: 20371250 PMCID: PMC3130497 DOI: 10.1016/j.jtbi.2010.03.046] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 12/01/2009] [Revised: 03/06/2010] [Accepted: 03/30/2010] [Indexed: 12/11/2022]
Abstract
General guidelines for the molecular basis of functional variation are presented while focused on the rotating circular genetic code and allowable exchanges that make it resistant to genetic diseases under normal conditions. The rules of variation, bioinformatics aids for preventative medicine, are: (1) same position in the four quadrants for hydrophobic codons, (2) same or contiguous position in two quadrants for synonymous or related codons, and (3) same quadrant for equivalent codons. To preserve protein function, amino acid exchange according to the first rule takes into account the positional homology of essential hydrophobic amino acids with every codon with a central uracil in the four quadrants, the second rule includes codons for identical, acidic, or their amidic amino acids present in two quadrants, and the third rule, the smaller, aromatic, stop codons, and basic amino acids, each in proximity within a 90 degree angle. I also define codifying genes and palindromati, CTCGTGCCGAATTCGGCACGAG.
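The sequence quoted at the end of this abstract reads the same as its own reverse complement, which is the usual sense of a DNA palindrome; whether that is exactly what the author means by "palindromati" is an assumption here. A minimal check, with hypothetical helper names `reverse_complement` and `is_rc_palindrome`:

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_rc_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)
```

Calling `is_rc_palindrome("CTCGTGCCGAATTCGGCACGAG")` returns `True`, as it does for the shorter `GAATTC` at the center of that sequence.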
|
18
|
Traphagen SJ, Dimarco MJ, Silliker ME. RNA editing of 10 Didymium iridis mitochondrial genes and comparison with the homologous genes in Physarum polycephalum. RNA (New York, N.Y.) 2010; 16:828-838. [PMID: 20159952 PMCID: PMC2844629 DOI: 10.1261/rna.1989310] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 11/06/2009] [Accepted: 12/22/2009] [Indexed: 05/28/2023]
Abstract
Regions of the Didymium iridis mitochondrial genome were identified with similarity to typical mitochondrial genes; however, these regions contained numerous stop codons. We used RT-PCR and DNA sequencing to determine whether, through RNA editing, these regions were transcribed into mRNAs that could encode functional proteins. Ten putative gene regions were examined: atp1, atp6, atp8, atp9, cox1, cox2, cytb, nad4L, nad6, and nad7. The cDNA sequences of each gene could encode a functional mitochondrial protein that was highly conserved compared with homologous genes. The type of editing events and editing sequence features were very similar to those observed in the homologous genes of Physarum polycephalum, though the actual editing locations showed a variable degree of conservation. Edited sites were compared with encoded sites in D. iridis and P. polycephalum for all 10 genes. Edited sequence for a portion of the cox1 gene was available for six myxomycetes, which, when compared, showed a high degree of conservation at the protein level. Different types of editing events showed varying degrees of site conservation with C-to-U base changes being the least conserved. Several aspects of single C insertion editing events led to the preferential creation of hydrophobic amino acid codons that may help to minimize adverse effects on the resulting protein structure.
Affiliation(s)
- Stephen J Traphagen
  - The English High School, Boston Public Schools, Boston, Massachusetts 02130, USA
|
19
|
Liu X, Zhang J, Ni F, Dong X, Han B, Han D, Ji Z, Zhao Y. Genome wide exploration of the origin and evolution of amino acids. BMC Evol Biol 2010; 10:77. [PMID: 20230639 PMCID: PMC2853539 DOI: 10.1186/1471-2148-10-77] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 02/15/2009] [Accepted: 03/15/2010] [Indexed: 11/10/2022]
Abstract
Background: Even after years of exploration, the terrestrial origin of bio-molecules remains unsolved and controversial. Today, observation of amino acid composition in proteins has become an alternative way of gaining a global understanding of the mystery encoded in whole genomes and of seeking clues to the origin of amino acids. Results: In this study, we statistically monitored the frequencies of 20 alpha-amino acids in 549 taxa from three kingdoms of life: archaebacteria, eubacteria, and eukaryotes. We found that the amino acids evolved independently in these three kingdoms, but conserved linkages were observed in two groups of amino acids, (A, G, H, L, P, Q, R, and W) and (F, I, K, N, S, and Y). Moreover, the amino acids encoded by GC-poor codons (F, Y, N, K, I, and M) were found to "lose" their usage in the development from single-cell eukaryotic organisms like S. cerevisiae to H. sapiens, while the amino acids encoded by GC-rich codons (P, A, G, and W) were found to gain usage. These findings further support the co-evolution hypothesis of amino acids and genetic codes. Conclusion: We proposed a new chronological order of the appearance of amino acids (L, A, V/E/G, S, I, K, T, R/D, P, N, F, Q, Y, M, H, W, C). Two conserved evolutionary paths of amino acids were also suggested: A→G→R→P and K→Y.
Affiliation(s)
- Xiaoxia Liu
- The Key Laboratory for Chemical Biology of Fujian Province, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, Fujian, PR China
|
20
|
Hendrickson PG, Silliker ME. RNA editing in six mitochondrial ribosomal protein genes of Didymium iridis. Curr Genet 2010; 56:203-13. [PMID: 20169440 DOI: 10.1007/s00294-010-0292-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2009] [Revised: 01/29/2010] [Accepted: 02/01/2010] [Indexed: 11/30/2022]
Abstract
Similarity searches with Didymium iridis mitochondrial genomic DNA identified six possible ribosomal protein-coding regions; however, each region contained stop codons that would need to be removed by RNA editing to produce functional transcripts. RT-PCR was used to amplify these regions from total RNA for cloning and sequencing. Six functional transcripts were verified for the following ribosomal protein genes: rpS12, rpS7, rpL2, rpS19, rpS3, and rpL16. The editing events observed, such as single C and U nucleotide insertions and a dinucleotide insertion, were consistent with previously observed editing patterns seen in D. iridis. Additionally, a new form of insertional editing, a single A insertion, was observed in a conserved region of the rpL16 gene. While the majority of codons created by editing specify hydrophobic amino acids, a greater proportion of the codons created in these hydrophilic ribosomal proteins called for positively charged amino acids in comparison to the previously characterized hydrophobic respiratory protein genes. This first report of edited soluble mitochondrial ribosomal proteins in myxomycetes expands upon the RNA editing patterns previously seen: there was a greater proportion of created codons specifying positively charged amino acids, a shift in the codon position edited, and the insertion of single A nucleotides.
Affiliation(s)
- Peter G Hendrickson
- Immunology Department, Children's Memorial Research Center, Chicago, IL 60614, USA
|
21
|
Agutter PS. Editorial: hypotheses about protein folding--the proteomic code and wonderfolds. THEORETICAL BIOLOGY & MEDICAL MODELLING 2009; 6:31. [PMID: 20034380 PMCID: PMC2803780 DOI: 10.1186/1742-4682-6-31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 12/16/2009] [Accepted: 12/24/2009] [Indexed: 11/22/2022]
Abstract
Theoretical biology journals can contribute in many ways to the progress of knowledge. They are particularly well-placed to encourage dialogue and debate about hypotheses addressing problematical areas of research. An online journal provides an especially useful forum for such debate because of the option of posting comments within days of the publication of a contentious article.
|
22
|
Berleant D, White M, Pierce E, Tudoreanu E, Boeszoermenyi A, Shtridelman Y, Macosko JC. The Genetic Code—More Than Just a Table. Cell Biochem Biophys 2009; 55:107-16. [DOI: 10.1007/s12013-009-9060-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2009] [Accepted: 07/02/2009] [Indexed: 10/20/2022]
|
23
|
Biro JC. Discovery of proteomic code with mRNA assisted protein folding. Int J Mol Sci 2008; 9:2424-2446. [PMID: 19330085 PMCID: PMC2635648 DOI: 10.3390/ijms9122424] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2008] [Revised: 11/24/2008] [Accepted: 12/02/2008] [Indexed: 01/18/2023] Open
Abstract
The 3x redundancy of the Genetic Code is usually explained as a necessity to increase the mutation-resistance of the genetic information. However, recent bioinformatical observations indicate that the redundant Genetic Code contains more biological information than previously known, information that is additional to the 64/20 definition of amino acids. It might define the physico-chemical and structural properties of amino acids, the codon boundaries, the amino acid co-locations (interactions) in the coded proteins and the free folding energy of mRNAs. This additional information, which seems to be necessary to determine the 3D structure of coding nucleic acids as well as the coded proteins, is known as the Proteomic Code and mRNA Assisted Protein Folding.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 612 S Flower St, Los Angeles, CA 90017, USA. Tel. +1-213-627-6134
|
24
|
Biro JC. Does codon bias have an evolutionary origin? Theor Biol Med Model 2008; 5:16. [PMID: 18667081 PMCID: PMC2519059 DOI: 10.1186/1742-4682-5-16] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2008] [Accepted: 07/30/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND There is a 3-fold redundancy in the Genetic Code; most amino acids are encoded by more than one codon. These synonymous codons are not used equally; there is a Codon Usage Bias (CUB). This article will provide novel information about the origin and evolution of this bias. RESULTS Codon Usage Bias (CUB, defined here as deviation from equal usage of synonymous codons) was studied in 113 species. The average CUB was 29.3 +/- 1.1% (S.E.M, n = 113) of the theoretical maximum and declined progressively with evolution and increasing genome complexity. A Pan-Genomic Codon Usage Frequency (CUF) Table was constructed to describe genome-wide relationships among codons. Significant correlations were found between the number of synonymous codons and (i) the frequency of the respective amino acids (ii) the size of CUB. Numerous, statistically highly significant, internal correlations were found among codons and the nucleic acids they comprise. These strong correlations made it possible to predict missing synonymous codons (wobble bases) reliably from the remaining codons or codon residues. CONCLUSION The results put the concept of "codon bias" into a novel perspective. The internal connectivity of codons indicates that all synonymous codons might be integrated parts of the Genetic Code with equal importance in maintaining its functional integrity.
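As a rough illustration of the definition used in this abstract (CUB as deviation from equal usage of synonymous codons, expressed relative to its theoretical maximum), the following Python sketch scores each amino acid from 0 (all synonymous codons used equally) to 1 (a single codon used exclusively). The normalization, the function name `codon_usage_bias`, and the choice of the standard genetic code are assumptions made for illustration; the abstract does not state the authors' exact formula.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_usage_bias(seq):
    """Per-amino-acid deviation from equal synonymous codon usage (0..1).

    Assumed measure (not taken from the paper): the summed absolute
    difference between observed and uniform codon fractions, divided by
    its maximum value 2 * (1 - 1/k) for a family of k synonymous codons.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    scores = {}
    for aa, family in families.items():
        k = len(family)
        total = sum(counts[c] for c in family)
        # Skip stop codons, single-codon amino acids (Met, Trp), unseen ones.
        if aa == "*" or k == 1 or total == 0:
            continue
        deviation = sum(abs(counts[c] / total - 1 / k) for c in family)
        scores[aa] = deviation / (2 * (1 - 1 / k))
    return scores
```

Averaging the returned values over amino acids, or over genes, would give a single percentage-style figure, but how the authors aggregated their values is not stated in the abstract.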
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 612 S Flower St, Los Angeles, CA 90017, USA.
|
25
|
Biro JC. The Proteomic Code: a molecular recognition code for proteins. Theor Biol Med Model 2007; 4:45. [PMID: 17999762 PMCID: PMC2206014 DOI: 10.1186/1742-4682-4-45] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2007] [Accepted: 11/13/2007] [Indexed: 11/30/2022] Open
Abstract
Background The Proteomic Code is a set of rules by which information in genetic material is transferred into the physico-chemical properties of amino acids. It determines how individual amino acids interact with each other during folding and in specific protein-protein interactions. The Proteomic Code is part of the redundant Genetic Code. Review The 25-year-old history of this concept is reviewed from the first independent suggestions by Biro and Mekler, through the works of Blalock, Root-Bernstein, Siemion, Miller and others, followed by the discovery of a Common Periodic Table of Codons and Nucleic Acids in 2003 and culminating in the recent conceptualization of partial complementary coding of interacting amino acids as well as the theory of the nucleic acid-assisted protein folding. Methods and conclusions A novel cloning method for the design and production of specific, high-affinity-reacting proteins (SHARP) is presented. This method is based on the concept of proteomic codes and is suitable for large-scale, industrial production of specifically interacting peptides.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard, #1205, San Francisco, CA 94105, USA.
|
26
|
Benyo B, Biro JC, Benyo Z. Codes in the codons: construction of a codon/amino acid periodic table and a study of the nature of specific nucleic acid-protein interactions. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2004:2860-3. [PMID: 17270874 DOI: 10.1109/iembs.2004.1403815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The theory of "codon-amino acid coevolution" was first proposed by Woese in 1967. It suggests that there is a stereochemical matching - that is, affinity - between amino acids and certain of the base triplet sequences that code for those amino acids. We have constructed a common periodic table of codons and amino acids, where the nucleic acid table showed perfect axial symmetry for codons and the corresponding amino acid table also displayed periodicity regarding the biochemical properties (charge and hydrophobicity) of the 20 amino acids and the position of the stop signals. The table indicates that the middle (2nd) amino acid in the codon has a prominent role in determining some of the structural features of the amino acids. The possibility that physical contact between codons and amino acids might exist was tested on restriction enzymes. Many recognition site-like sequences were found in the coding sequences of these enzymes and as many as 73 examples of codon-amino acid co-location were observed in the 7 known 3D structures (December 2003) of endonuclease-nucleic acid complexes. These results indicate that the smallest possible units of specific nucleic acid-protein interaction are indeed the stereochemically compatible codons and amino acids.
Affiliation(s)
- B Benyo
- Dept. of Informatics, Szechenyi Istvan Univ., Gyor, Hungary
|
27
|
Biro JC. Protein folding information in nucleic acids which is not present in the genetic code. Ann N Y Acad Sci 2007; 1091:399-411. [PMID: 17341631 DOI: 10.1196/annals.1378.083] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (P < 0.0001, n = 81). This periodic FFE difference is not present in introns. The FFE in the 1st and 3rd residues is additive, which suggests that these residues contain a significant number of complementary bases and contribute to selection for local mRNA secondary structures. This periodic, codon-related structure forming of mRNAs indicates a connection between the structure of exons and the corresponding (translated) proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures.
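The comparison described in this abstract starts from subsequences built out of particular codon positions. A minimal Python sketch of that first step is shown below; the function name `codon_position_subsequences` and the dictionary keys are hypothetical, and the folding-energy calculation itself is not reproduced here because the abstract does not say which folding tool or parameters were used.

```python
def codon_position_subsequences(mrna):
    """Split an in-frame coding sequence into codon-position subsequences.

    Returns the bases found at the 1st, 2nd and 3rd codon positions, plus
    the interleaved 1st+3rd subsequence (each codon with its middle base
    removed). A trailing partial codon is ignored.
    """
    seq = mrna.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return {
        "1st": "".join(c[0] for c in codons),
        "2nd": "".join(c[1] for c in codons),
        "3rd": "".join(c[2] for c in codons),
        "1st+3rd": "".join(c[0] + c[2] for c in codons),
    }
```

Each returned string could then be passed to an RNA folding program of the reader's choice to compare folding energies between positions.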
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard #1205, San Francisco, CA 94105, USA.
|
28
|
Roosterman D, Goerge T, Schneider SW, Bunnett NW, Steinhoff M. Neuronal Control of Skin Function: The Skin as a Neuroimmunoendocrine Organ. Physiol Rev 2006; 86:1309-79. [PMID: 17015491 DOI: 10.1152/physrev.00026.2005] [Citation(s) in RCA: 431] [Impact Index Per Article: 22.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
This review focuses on the role of the peripheral nervous system in cutaneous biology and disease. During the last few years, a modern concept of an interactive network between cutaneous nerves, the neuroendocrine axis, and the immune system has been established. We learned that neurocutaneous interactions influence a variety of physiological and pathophysiological functions, including cell growth, immunity, inflammation, pruritus, and wound healing. This interaction is mediated by primary afferent as well as autonomic nerves, which release neuromediators and activate specific receptors on many target cells in the skin. A dense network of sensory nerves releases neuropeptides, thereby modulating inflammation, cell growth, and the immune responses in the skin. Neurotrophic factors, in addition to regulating nerve growth, participate in many properties of skin function. The skin expresses a variety of neurohormone receptors coupled to heterotrimeric G proteins that are tightly involved in skin homeostasis and inflammation. This neurohormone-receptor interaction is modulated by endopeptidases, which are able to terminate neuropeptide-induced inflammatory or immune responses. Neuronal proteinase-activated receptors or transient receptor potential ion channels are recently described receptors that may have been important in regulating neurogenic inflammation, pain, and pruritus. Together, a close multidirectional interaction between neuromediators, high-affinity receptors, and regulatory proteases is critically involved to maintain tissue integrity and regulate inflammatory responses in the skin. A deeper understanding of cutaneous neuroimmunoendocrinology may help to develop new strategies for the treatment of several skin diseases.
|
29
|
Biro JC. Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases. Theor Biol Med Model 2006; 3:28. [PMID: 16893453 PMCID: PMC1560374 DOI: 10.1186/1742-4682-3-28] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2005] [Accepted: 08/07/2006] [Indexed: 12/02/2022] Open
Abstract
Background All the information necessary for protein folding is supposed to be present in the amino acid sequence. It is still not possible to provide specific ab initio structure predictions by bioinformatical methods. It is suspected that additional folding information is present in protein coding nucleic acid sequences, but this is not represented by the known genetic code. Results Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (p < 0.0001, n = 81). This periodic FFE difference is not present in introns. It is therefore a specific physico-chemical characteristic of coding sequences and might contribute to unambiguous definition of codon boundaries during translation. The FFEs of the 1st and 3rd residues are additive, which suggests that these residues contain a significant number of complementary bases and that may contribute to selection for local RNA secondary structures in coding regions. This periodic, codon-related structure-formation of mRNAs indicates a connection between the structures of exons and the corresponding (translated) proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick (WC) base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures. Conclusion Exons are distinguished from introns, and codon boundaries are physico-chemically defined, by periodically distributed FFE differences between codon positions. There is a selection for local RNA secondary structures in coding regions and this nucleic acid structure resembles the folding profiles of the coded proteins. 
The preferentially (specifically) interacting amino acids are coded by partially complementary codons, which strongly supports the connection between mRNA and the corresponding protein structures and indicates that there is protein folding information in nucleic acids that is not present in the genetic code. This might suggest an additional explanation of codon redundancy.
|
30
|
Biro JC. Amino acid size, charge, hydropathy indices and matrices for protein structure analysis. Theor Biol Med Model 2006; 3:15. [PMID: 16551371 PMCID: PMC1450267 DOI: 10.1186/1742-4682-3-15] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2005] [Accepted: 03/22/2006] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND Prediction of protein folding and specific interactions from only the sequence (ab initio) is a major challenge in bioinformatics. It is believed that such prediction will prove possible if Anfinsen's thermodynamic principle is correct for all kinds of proteins, and all the information necessary to form a concrete 3D structure is indeed present in the sequence. RESULTS We indexed the 200 possible amino acid pairs for their compatibility regarding the three major physicochemical properties (size, charge and hydrophobicity) and constructed Size, Charge and Hydropathy Compatibility Indices and Matrices (SCI & SCM, CCI & CCM, and HCI & HCM). Each index characterized the expected strength of interaction (compatibility) of two amino acids by numbers from 1 (not compatible) to 20 (highly compatible). We found statistically significant positive correlations between these indices and the propensity for amino acid co-locations in real protein structures (a sample containing a total of 34630 co-locations in 80 different protein structures): for HCI: p < 0.01, n = 400 in 10 subgroups; for SCI: p < 1.3E-08, n = 400 in 10 subgroups; for CCI: p < 0.01, n = 175. Size compatibility between residues (well known to exist in nucleic acids) is a novel observation for proteins. Regression analyses indicated at least 7 well-distinguished clusters regarding size compatibility and 5 clusters of charge compatibility. We tried to predict or reconstruct simple 2D representations of 3D structures from the sequence using these matrices by applying a dot plot-like method. The location and pattern of the most compatible subsequences was very similar or identical when the three fundamentally different matrices were used, which indicates the consistency of physicochemical compatibility. However, it was not sufficient to choose one preferred configuration among the many possible predicted options.
CONCLUSION Indexing of amino acids for major physico-chemical properties is a powerful approach to understanding and assisting protein design. However, it is probably insufficient itself for complete ab initio structure prediction.
Affiliation(s)
- J C Biro
- Homulus Foundation, San Francisco, CA, USA.
|
31
|
Biro JC. A novel intra-molecular protein–protein interaction code based on partial complementary coding of co-locating amino acids. Med Hypotheses 2006; 66:137-42. [PMID: 16168570 DOI: 10.1016/j.mehy.2005.07.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2005] [Revised: 07/10/2005] [Accepted: 07/13/2005] [Indexed: 10/25/2022]
Abstract
Proteins are assumed to contain all the information necessary for unambiguous folding and specific interaction with each other. However, ab initio structure prediction is often not successful because the amino acid sequence itself is simply not sufficient to guide between endless folding possibilities. It seems logical to try to find the "missing" information in nucleic acids, in the redundant codon. Statistical analyses of approximately 35K amino acid co-locations in 80 different protein structures indicate the existence of a weak intra-molecular protein-protein interaction code. Co-locating amino acids are preferentially coded by codons which are complementary in reverse orientation to each other at the 1st and 3rd codon positions, but not necessarily at the 2nd. This code, called D-1 X 3/RC-3 X 1, limits the number of preferred amino acid pairs from 20 to 10.3+/-0.8 (SEM, n=20) and emphasizes the importance of "strictly" defined amino acids (those having fewer synonymous codons). The existence of this code does not by any means violate the known physicochemical rules of protein folding or interaction. It is suggested that the biological source of preferential (specific) amino acid co-locations is the partial complementarity of their codons. This special coding of co-locating amino acids is important for a better understanding of some fundamental biochemical processes and observations such as: (a) protein folding; (b) specific and high affinity protein-protein interactions; (c) the role of the wobble bases; (d) the significance of the redundant genetic code; (e) the origin of specific protein-protein interactions. Furthermore, it might be useful even in protein design.
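The pairing rule stated in this abstract (complementarity in reverse orientation at the 1st and 3rd codon positions, with the 2nd position unconstrained) can be expressed as a short check. The Python sketch below is one reading of that verbal description, assuming strict Watson-Crick pairs only; the function name `is_partial_reverse_complement` is hypothetical and the code is not taken from the paper.

```python
# Watson-Crick pairs in the RNA alphabet.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_partial_reverse_complement(codon_a, codon_b):
    """True if two codons pair at positions 1 and 3 in reverse orientation.

    Position 1 of codon_a must pair with position 3 of codon_b, and
    position 3 of codon_a with position 1 of codon_b. The middle bases
    are not compared.
    """
    a = codon_a.upper().replace("T", "U")
    b = codon_b.upper().replace("T", "U")
    if len(a) != 3 or len(b) != 3:
        raise ValueError("codons must be exactly three bases long")
    if not (set(a) <= set(COMPLEMENT) and set(b) <= set(COMPLEMENT)):
        raise ValueError("codons may contain only A, C, G, U (or T)")
    return COMPLEMENT[a[0]] == b[2] and COMPLEMENT[a[2]] == b[0]
```

For example, `is_partial_reverse_complement("GCU", "AGC")` holds for a fully reverse-complementary pair, and it still holds for `("GCU", "AAC")`, where only the middle base differs.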
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard, #1205, San Francisco, CA 94105, USA.
|
32
|
Bradshaw PC, Rathi A, Samuels DC. Mitochondrial-encoded membrane protein transcripts are pyrimidine-rich while soluble protein transcripts and ribosomal RNA are purine-rich. BMC Genomics 2005; 6:136. [PMID: 16185363 PMCID: PMC1262711 DOI: 10.1186/1471-2164-6-136] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2005] [Accepted: 09/26/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Eukaryotic organisms contain mitochondria, organelles capable of producing large amounts of ATP by oxidative phosphorylation. Each cell contains many mitochondria with many copies of mitochondrial DNA in each organelle. The mitochondrial DNA encodes a small but functionally critical portion of the oxidative phosphorylation machinery, a few other species-specific proteins, and the rRNA and tRNA used for the translation of these transcripts. Because the microenvironment of the mitochondrion is unique, mitochondrial genes may be subject to different selectional pressures than those affecting nuclear genes. RESULTS From an analysis of the mitochondrial genomes of a wide range of eukaryotic species we show that there are three simple rules for the pyrimidine and purine abundances in mitochondrial DNA transcripts. Mitochondrial membrane protein transcripts are pyrimidine rich, rRNA transcripts are purine-rich and the soluble protein transcripts are purine-rich. The transitions between pyrimidine and purine-rich regions of the genomes are rapid and are easily visible on a pyrimidine-purine walk graph. These rules are followed, with few exceptions, independent of which strand encodes the gene. Despite the robustness of these rules across a diverse set of species, the magnitude of the differences between the pyrimidine and purine content is fairly small. Typically, the mitochondrial membrane protein transcripts have a pyrimidine richness of 56%, the rRNA transcripts are 55% purine, and the soluble protein transcripts are only 53% purine. CONCLUSION The pyrimidine richness of mitochondrial-encoded membrane protein transcripts is partly driven by U nucleotides in the second codon position in all species, which yields hydrophobic amino acids. The purine-richness of soluble protein transcripts is mainly driven by A nucleotides in the first codon position. The purine-richness of rRNA is also due to an abundance of A nucleotides. 
Possible mechanisms as to how these trends are maintained in mtDNA genomes of such diverse ancestry, size and variability of A-T richness are discussed.
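The quantities reported in this abstract (purine or pyrimidine richness of a transcript, and the share of a given base at one codon position) are simple counts. The Python sketch below shows one way such figures could be computed for a single sequence; the function names are hypothetical and the authors' actual pipeline is not described in the abstract.

```python
PURINES = frozenset("AG")
VALID_BASES = frozenset("ACGU")


def purine_fraction(seq):
    """Fraction of A and G among the unambiguous bases of a transcript."""
    bases = [b for b in seq.upper().replace("T", "U") if b in VALID_BASES]
    if not bases:
        raise ValueError("sequence contains no A/C/G/U bases")
    return sum(b in PURINES for b in bases) / len(bases)


def base_fraction_at_codon_position(seq, position, base):
    """Fraction of codons carrying `base` at codon position 1, 2 or 3.

    The sequence is assumed to be in frame; a trailing partial codon is
    ignored.
    """
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    column = rna[position - 1:usable:3]   # every third base from that offset
    if not column:
        raise ValueError("sequence is shorter than one codon")
    return column.count(base.upper().replace("T", "U")) / len(column)
```

A pyrimidine fraction is simply `1 - purine_fraction(seq)`, so a transcript described as 56% pyrimidine would return about 0.44 here.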
Affiliation(s)
- Patrick C Bradshaw
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Anand Rathi
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- David C Samuels
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|
33
|
Biro JC. Nucleic acid chaperons: a theory of an RNA-assisted protein folding. Theor Biol Med Model 2005; 2:35. [PMID: 16137324 PMCID: PMC1232867 DOI: 10.1186/1742-4682-2-35] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2005] [Accepted: 09/01/2005] [Indexed: 12/04/2022] Open
Abstract
Background Proteins are assumed to contain all the information necessary for unambiguous folding (Anfinsen's principle). However, ab initio structure prediction is often not successful because the amino acid sequence itself is not sufficient to guide between endless folding possibilities. It seems logical to try to find the "missing" information in nucleic acids, in the redundant codon base. Results mRNA energy dot plots and protein residue contact maps were found to be rather similar. The structure of mRNA is also conserved if the protein structure is conserved, even if the sequence similarity is low. These observations led me to suppose that some similarity might exist between nucleic acid and protein folding. I found that amino acid pairs, which are co-located in the protein structure, are preferentially coded by complementary codons. This codon complementarity is not perfect; it is suboptimal where the 1st and 3rd codon residues are complementary to each other in reverse orientation, while the 2nd codon letters may be, but are not necessarily, complementary. Conclusion Partial complementary coding of co-locating amino acids in protein structures suggests that mRNA assists in protein folding and functions not only as a template but even as a chaperon during translation. This function explains the role of wobble bases and answers the mystery of why we have a redundant codon base.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, San Francisco, CA 94105, USA.
|
34
|
Biro JC, Benyó Z, Sansom C, Benyó B. In search of the nature of specific nucleic acid-protein interactions. Acta Physiol Hung 2005; 92:1-10. [PMID: 16003939 DOI: 10.1556/aphysiol.92.2005.1.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
The theory of "codon-amino acid coevolution" was first proposed by Woese in 1967. It suggests that there is a stereochemical matching - that is, affinity - between amino acids and certain of the base triplet sequences that code for those amino acids. We have constructed a Common Periodic Table of Codons and Amino Acids, where the Nucleic Acid Table showed perfect axial symmetry for codons and the corresponding Amino Acid Table also displayed periodicity regarding the biochemical properties (charge and hydrophobicity) of the 20 amino acids and the position of the stop signals. The Table indicates that the middle (2nd) amino acid in the codon has a prominent role in determining some of the structural features of the amino acids. The possibility that physical contact between codons and amino acids might exist was tested on restriction enzymes. Many recognition site-like sequences were found in the coding sequences of these enzymes and as many as 73 examples of codon-amino acid co-location were observed in the 7 known 3D structures (December 2003) of endonuclease-nucleic acid complexes. These results indicate that the smallest possible units of specific nucleic acid-protein interaction are indeed the stereochemically compatible codons and amino acids.
Affiliation(s)
- J C Biro
- Karolinska Institute, Stockholm, Sweden
|
35
|
Biro JC. Seven fundamental, unsolved questions in molecular biology. Cooperative storage and bi-directional transfer of biological information by nucleic acids and proteins: an alternative to "central dogma". Med Hypotheses 2005; 63:951-62. [PMID: 15504561 DOI: 10.1016/j.mehy.2004.06.024] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2004] [Accepted: 06/14/2004] [Indexed: 11/24/2022]
Abstract
The Human Genome Mapping Project provided us with a large amount of sequence data. However, our understanding of these data did not grow proportionally, because old dogmas still set the limits of our thinking. The gene-centric, reductionist side of molecular biology is reviewed and seven problems are formulated, each indicating the insufficiency of the "central dogma". The following is concluded and suggested: 1. Genes are located and expressed on both DNA strands; 2. Introns are the source of important biological regulation and diversity; 3. Repeats are the frame of the chromatin structure and participate in the chromatin regulation; 4. The molecular accessibility of the canonical dsDNA structure is poor; 5. The genetic code co-evolved with the amino acids and there is a stereochemical matching between the codes and amino acids; 6. The flow of information between nucleic acids and proteins is bi-directional and reverse translation might exist; 7. Complex genetic information is always carried and stored by nucleic acids and proteins together.
Affiliation(s)
- J C Biro
- Karolinska Institute, Stockholm, Sweden.
|
36
|
Biro JC, Biro JMK. Frequent occurrence of recognition site-like sequences in the restriction endonucleases. BMC Bioinformatics 2004; 5:30. [PMID: 15113406 PMCID: PMC394317 DOI: 10.1186/1471-2105-5-30] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2003] [Accepted: 03/16/2004] [Indexed: 11/19/2022] Open
Abstract
Background There are two different theories about the development of the genetic code. Woese suggested that it was developed in connection with the amino acid repertoire, while Crick argued that any connection between codons and amino acids is only the result of an "accident". This question is fundamental to understanding the nature of specific protein-nucleic acid interactions. Results The nature of specific protein-nucleic acid interaction between restriction endonucleases (RE) and their recognition sequences (RS) was studied by bioinformatics methods. It was found that the frequency of 5–6 residue long RS-like oligonucleotides is unexpectedly high in the nucleic acid sequence of the corresponding RE (p < 0.05 and p < 0.001 respectively, n = 7). There is an extensive conservation of these RS-like sequences in RE isoschizomers. A review of the seven available crystallographic studies showed that the amino acids coded by codons that are subsets of recognition sequences were often closely located to the RS itself and they were in many cases directly adjacent to the codon-like triplets in the RS. Fifty-five examples of this codon-amino acid co-localization were found and analyzed, which represents 41.5% of the total 132 amino acids that are localized within 8 Å of the C1' atoms in the DNA. The average distance between the closest atoms in the codons and amino acids is 5.5 +/- 0.2 Å (mean +/- S.E.M, n = 55), while the distance between the nitrogen and oxygen atoms of the co-localized molecules is significantly shorter (3.4 +/- 0.2 Å, p < 0.001, n = 15) when positively charged amino acids are involved. This indicates that an interaction between the nucleic acids and amino acids might occur. Conclusion We interpret these results in favor of Woese and suggest that the genetic code is "rational" and there is a stereospecific relationship between the codes and the amino acids.
Affiliation(s)
- Jan C Biro
- Karolinska Institute, Stockholm, Sweden
- Homulus Informatics, 88 Howard, #1205, San Francisco, CA 94105, USA
- Josephine MK Biro
- Homulus Informatics, 88 Howard, #1205, San Francisco, CA 94105, USA
|