1
Baeza M, Sepulveda D, Cifuentes V, Alcaíno J. Codon usage bias in yeasts and its correlation with gene expression, growth temperature, and protein structure. Front Microbiol 2024; 15:1414422. [PMID: 39040903 PMCID: PMC11260810 DOI: 10.3389/fmicb.2024.1414422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 06/25/2024] [Indexed: 07/24/2024] Open
Abstract
Codon usage bias (CUB) has been described in viruses, prokaryotes, and eukaryotes and has been linked to several cellular and environmental factors, such as the organism's growth temperature, gene expression levels, and regulation of protein synthesis and folding. Most of the studies in this area have been conducted in bacteria and higher eukaryotes, in some cases with different results. In this study, a comparative analysis of CUB in yeasts isolated from cold and temperate environments was performed in order to evaluate the correlation of CUB with yeast optimal temperature of growth (OTG), gene expression levels, cellular function, and structure of encoded proteins. Among the main findings, highly expressed ORFs tend to have a more similar CUB within and between yeasts, and a direct correlation between codons ending in C and expression level was generally found. A low correspondence between CUB and OTG was observed, with an inverse correlation for some codons ending in C. The clustering of yeasts based on their CUB partially aligns with their OTG, being more consistent for yeasts with lower OTG. In most yeasts, the abundance of preferred codons was generally lower at the 5' end of ORFs, higher in segments encoding beta strands, lower in segments encoding extracellular and transmembrane regions, and higher in "translation" and "energy metabolism" pathways, especially in highly expressed ORFs. Based on our findings, it is suggested that the abundance and distribution of preferred and non-preferred codons along mRNAs contribute to proper protein folding and functionality by regulating protein synthesis rates, becoming a more important factor under conditions that require faster protein synthesis in yeasts.
Affiliation(s)
- Marcelo Baeza
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Víctor Cifuentes
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Jennifer Alcaíno
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
2
Andargie M, Congyi Z. Genome-wide analysis of codon usage in sesame (Sesamum indicum L.). Heliyon 2022; 8:e08687. [PMID: 35106386 PMCID: PMC8789531 DOI: 10.1016/j.heliyon.2021.e08687] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/15/2021] [Revised: 11/20/2021] [Accepted: 12/24/2021] [Indexed: 10/28/2022] Open
Abstract
Sesamum indicum is an ancient oil crop grown in tropical and subtropical areas of the world. We have analyzed 23,538 coding sequences (CDS) of S. indicum to understand the factors shaping codon usage in this important oil crop plant. We identified eleven highly preferred codons in S. indicum that have AT-endings. The slope of the neutrality plot was less than one, while the effective number of codons (ENC) plot showed a distribution above and below the standard curve. There is a significant relationship between protein length and relative synonymous codon usage (RSCU) at the primary axis, while there is a weak correlation between protein length and Nc values. Correspondence analysis conducted on RSCU values differentiated CDS based on their GC content and their characteristic features and showed a discrete distribution. Moreover, by determining codon usage, we found that the majority of the lignan biosynthesis-related genes showed a weaker codon usage bias. These results provide insights into understanding codon evolution in sesame.
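The RSCU statistic named in this abstract can be illustrated with a minimal sketch. It assumes the usual definition (observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally) and the standard genetic code. The function name `rscu` and the toy input are hypothetical and are not taken from the cited paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group sense codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            result[c] = counts[c] * len(codons) / total
    return result


if __name__ == "__main__":
    values = rscu(["GCTGCTGCTGCC"])  # four alanine codons: 3x GCT, 1x GCC
    print(values["GCT"], values["GCC"])
```

Values above 1 mark codons used more often than expected under equal usage, and values below 1 mark codons used less often. Single-codon amino acids (Met, Trp) always give 1 and are usually excluded from downstream analysis.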
Affiliation(s)
- Mebeaselassie Andargie
- University of Goettingen, Molecular Phytopathology and Mycotoxin Research, Grisebachstrasse 6, 37077 Goettingen, Germany
- Zhu Congyi
- Key Laboratory of South Subtropical Fruit Biology and Genetic Resource Utilization (MOA), Guangdong Province Key Laboratory of Tropical and Subtropical Fruit Tree Research, Institute of Fruit Tree Research, Guangdong Academy of Agricultural Sciences, Guangzhou, China
3
Mazumder TH, Alqahtani AM, Alqahtani T, Emran TB, Aldahish AA, Uddin A. Analysis of Codon Usage of Speech Gene FoxP2 among Animals. Biology 2021; 10:1078. [PMID: 34827071 PMCID: PMC8614651 DOI: 10.3390/biology10111078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 10/12/2021] [Accepted: 10/16/2021] [Indexed: 12/03/2022]
Abstract
The protein-coding gene FoxP2 (forkhead box protein P2) plays a major role in communication and evolutionary changes. The present study carried out a comprehensive codon usage bias analysis of the FoxP2 gene among a diverse group of animals including fishes, birds, reptiles, and mammals. We observed that in fishes, FoxP2 codons ending with C or G were most frequently used, while in birds, reptiles, and mammals, codons ending with T or A were most frequently used. A higher ENC value was observed for the FoxP2 gene, indicating a lower CUB. Parity rule 2 (PR2) bias plots suggested that, apart from mutation pressure, other factors such as natural selection might have influenced the CUB. The frequency distribution of the ratio of observed to expected ENC revealed that mutation pressure plays a key role in the patterns of codon usage of FoxP2. In addition, correspondence analysis revealed that nucleobase composition under mutation bias affects the codon usage of the FoxP2 gene. However, neutrality plots revealed the major role of natural selection over mutation pressure in the CUB of FoxP2. The codon usage patterns for FoxP2 among the selected genomes also suggested that nature has favored nearly all the synonymous codons for encoding the corresponding amino acid. Uniform usage of 12 synonymous codons for FoxP2 was observed among the species of birds. The amino acid usage frequency for FoxP2 revealed that the amino acids leucine, glutamine, and serine were predominant over other amino acids among all the species of fishes, birds, reptiles, and mammals.
Affiliation(s)
- Ali M. Alqahtani
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia; (A.M.A.); (T.A.); (A.A.A.)
- Taha Alqahtani
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia; (A.M.A.); (T.A.); (A.A.A.)
- Talha Bin Emran
- Department of Pharmacy, BGC Trust University Bangladesh, Chittagong 4381, Bangladesh
- Afaf A. Aldahish
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia; (A.M.A.); (T.A.); (A.A.A.)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial College, Hailakandi 788150, Assam, India
4
Barbhuiya PA, Uddin A, Chakraborty S. Codon usage pattern and evolutionary forces of mitochondrial ND genes among orders of class Amphibia. J Cell Physiol 2020; 236:2850-2868. [PMID: 32960450 DOI: 10.1002/jcp.30050] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/18/2020] [Revised: 08/07/2020] [Accepted: 08/31/2020] [Indexed: 12/18/2022]
Abstract
In this study, we used a bioinformatics approach to analyze the nucleotide composition and pattern of synonymous codon usage in mitochondrial ND genes in three amphibian groups, that is, the orders Anura, Caudata, and Gymnophiona, to identify the commonalities and differences in codon usage, as no such work has been reported to date. The high value of the effective number of codons revealed that the codon usage bias (CUB) was low in mitochondrial ND genes among the orders. Nucleotide composition analysis suggested that, for each gene, the compositional features differed among Anura, Caudata, and Gymnophiona and the GC content was lower than the AT content. Furthermore, a highly significant difference (p < .05) in GC content was found in each gene among the orders. The heat map showed contrasting patterns of codon usage among different ND genes. The regression of GC12 on GC3 suggested a narrow range of GC3 distribution, and some points were located on the diagonal, indicating that both mutation pressure and natural selection might influence the CUB. Moreover, the slope of the regression line was less than 0.5 in all ND genes among orders, indicating that natural selection might have played the dominant role, whereas mutation pressure played a minor role, in shaping the CUB of ND genes across orders.
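The regression of GC12 on GC3 described in this abstract (a neutrality plot) can be sketched as follows. This is a simplified illustration: it counts G and C at all codon positions, whereas published analyses often restrict the third position to synonymous sites. The helper names `gc_positions` and `neutrality_slope` are hypothetical and the input sequences are toy data, not sequences from the study.

```python
def gc_positions(cds):
    """Return (GC12, GC3) for one in-frame coding sequence.

    GC12 is the mean GC fraction at codon positions 1 and 2;
    GC3 is the GC fraction at codon position 3.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    gc = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            gc[i % 3] += 1
    n_codons = usable // 3
    return (gc[0] + gc[1]) / (2 * n_codons), gc[2] / n_codons


def neutrality_slope(cds_list):
    """Ordinary least-squares slope of GC12 (y) regressed on GC3 (x)."""
    points = [gc_positions(cds) for cds in cds_list]
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 does not vary across the input genes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    genes = ["ATGGCCAAATTT", "ATGGCGGCGGGC", "ATGAAATTTTTA"]  # toy sequences
    print(neutrality_slope(genes))
```

Under the interpretation used in the abstract, a slope near 1 would point to mutation pressure acting similarly on all codon positions, while a slope well below 0.5 is read as a sign that selection dominates.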
Affiliation(s)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Hailakandi, Assam, India
5
Begum Y, Mondal SK. Comprehensive study of the genes involved in chlorophyll synthesis and degradation pathways in some monocot and dicot plant species. J Biomol Struct Dyn 2020; 39:2387-2414. [PMID: 32292132 DOI: 10.1080/07391102.2020.1748717] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Abstract
Chlorophyll (Chl) biosynthesis is one of the most important cellular processes essential for plant photosynthesis. The Chl degradation pathway is also an important catabolic process that occurs during leaf senescence, fruit ripening, and under biotic or abiotic stress conditions. Here we have systematically investigated the molecular evolution, gene structure, and composition, along with ENc plots, correspondence analysis, and codon usage bias, of the proteins and encoding genes involved in Chl metabolism from monocots and dicots. The gene- and species-specific phylogenetic trees based on amino acid sequences showed clear clustering of the selected species into monocots and dicots, but this was not supported by 18S rRNA. Nucleotide composition of the encoding genes showed that average GC%, GC1%, GC2%, and GC3% were higher in monocots. RSCU analysis shows that genes from monocots for both pathways, and genes for the synthesis pathway from dicots, are biased only to G/C-ending synonymous codons, whereas in the degradation pathway most optimal codons in dicots (except UUG) are biased to A/U-ending synonymous codons. We found strong evidence of episodic diversifying selection at several amino acid sites in all genes investigated. Conserved domains and gene structures were observed for the genes involved in Chl metabolism, with varying lengths of introns and exons, along with some intronless genes within the synthesis pathway. ENc and correspondence analyses suggested that mutational or selection constraints on the genes shape the codon usage. These comprehensive studies may be helpful in further research in molecular phylogenetics and genomics and to better understand the evolutionary dynamics of the Chl metabolic pathway. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Yasmin Begum
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India
- Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta, Kolkata, West Bengal, India
- Sunil Kanti Mondal
- Department of Biotechnology, The University of Burdwan, Burdwan, West Bengal, India
6
Barbhuiya PA, Uddin A, Chakraborty S. Genome-wide comparison of codon usage dynamics in mitochondrial genes across different species of amphibian genus Bombina. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 2019; 332:99-112. [DOI: 10.1002/jez.b.22852] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/10/2018] [Revised: 03/10/2019] [Accepted: 03/20/2019] [Indexed: 01/16/2023]
Affiliation(s)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Hailakandi, Assam, India
7
Hussain S, Rasool ST, Asif AH. A detailed analysis of synonymous codon usage in human bocavirus. Arch Virol 2018; 164:335-347. [PMID: 30327886 DOI: 10.1007/s00705-018-4063-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/06/2018] [Accepted: 09/16/2018] [Indexed: 01/16/2023]
Abstract
Human bocavirus (HBoV) is a recently discovered parvovirus associated with respiratory and gastroenteric infections in children. To date, four distinct subtypes have been identified worldwide. HBoV1 is the most frequently detected bocavirus in clinical samples derived from the respiratory tract. HBoV has a single-stranded DNA genome, which encodes two nonstructural proteins, NS1 and NP1, and two structural proteins, VP1 and VP2. Despite a large number of available HBoV sequences, the molecular evolution of this virus remains enigmatic. Here, we applied bioinformatic methods to measure the codon usage bias in 156 HBoV genomes and analyzed the factors responsible for preferential use of various synonymous codons. The effective number of codons (ENC) indicates a highly conserved, gene-specific codon usage bias in the HBoV genome. The structural genes exhibit a higher degree of codon usage bias than the nonstructural genes. Natural selection emerged as the dominant factor influencing the codon usage bias in the HBoV genome. Other factors that influence the codon usage include mutational pressure, gene length, protein properties, and the relative abundance of dinucleotides. The results presented in this study provide important insight into the molecular evolution of HBoV and may serve as a primer for HBoV gene expression studies and the development of safe and effective vaccines to prevent infection.
Affiliation(s)
- Snawar Hussain
- Department of Biomedical Science, College of Clinical Pharmacy, King Faisal University, P.O Box 400, Al-Ahsa, 31982, Kingdom of Saudi Arabia
- Sahibzada Tasleem Rasool
- Department of Biomedical Science, College of Clinical Pharmacy, King Faisal University, P.O Box 400, Al-Ahsa, 31982, Kingdom of Saudi Arabia
- Afzal Haq Asif
- Department of Biomedical Science, College of Clinical Pharmacy, King Faisal University, P.O Box 400, Al-Ahsa, 31982, Kingdom of Saudi Arabia
8
Paul P, Malakar AK, Chakraborty S. Compositional bias coupled with selection and mutation pressure drives codon usage in Brassica campestris genes. Food Sci Biotechnol 2017; 27:725-733. [PMID: 30263798 DOI: 10.1007/s10068-017-0285-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/29/2017] [Revised: 11/28/2017] [Accepted: 12/03/2017] [Indexed: 11/25/2022] Open
Abstract
The plant Brassica campestris includes the vegetables turnip and Chinese cabbage, plants of economic importance. Here, we have analysed the codon usage bias of B. campestris for 116 protein coding genes. Neutrality analysis showed that B. campestris had a wide range of GC3s, and a significant correlation was observed between GC12 and GC3. The Nc versus GC3s plot showed a few genes on or close to the expected curve, but the majority of points were scattered far from the expected curve. Correspondence analysis on codon usage revealed that the position of codons in multidimensional space depends entirely on the presence of A and T at the synonymous third codon position. Taken together, these results suggest that composition bias, along with selection (major) and mutation pressure (minor), affects the codon usage pattern of the protein coding genes in Brassica campestris.
Affiliation(s)
- Prosenjit Paul
- Department of Biotechnology, Assam University, Silchar, Assam 788011 India
- Arup Kumar Malakar
- Department of Biotechnology, Assam University, Silchar, Assam 788011 India
9
Codon usage and amino acid usage influence genes expression level. Genetica 2017; 146:53-63. [DOI: 10.1007/s10709-017-9996-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/08/2017] [Accepted: 10/09/2017] [Indexed: 11/30/2022]
10
Abstract
Based on Shannon's information communication theory, the information amount of the entire length of a polymeric macromolecule can be calculated in bits by adding the entropies of each building block. Proteins, DNA, and RNA are such macromolecules. When only the variation of the building blocks is considered as the source of entropy, the information appears to be lower for the protein if this approach is applied directly to a protein of specific size and to the coding sequence of the mRNA corresponding to that length of protein. This decrease in the information amount seems contradictory, but the apparent conflict is resolved by considering the conformational variations in proteins as a new variable in the calculation and balancing the approximated entropy of the coding part of the mRNA and the protein. Probabilities can change; therefore, we also assigned hypothetical probabilities to the conformational states, which represent the uneven distribution as the time spent in one conformation, providing the probability of presence in one or another of the possible conformations. Results obtained by using hypothetical probabilities are in line with the experimental values of variations in the conformational state of protein populations. This equalization approach has further biological relevance in that it compensates for the degeneracy in codon usage during protein translation, and it leads to the conclusion that the alphabet size for the protein is rather optimal for proper protein functioning within the thermodynamic milieu of the cell. The findings are also discussed in relation to codon bias and have implications for the codon evolution concept. Ultimately, this work brings together the fields of protein structural studies and molecular protein translation processes with a novel approach.
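The per-block entropy sum described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that every building block is equally probable (4 nucleotides, 20 amino acids) and that positions are independent; the abstract itself works with other, hypothetical probabilities for conformational states, which are not reproduced here. The function names are hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2 p) of one building block, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def total_information_bits(length, probabilities):
    """Information of a polymer as the sum of identical per-position entropies."""
    return length * entropy_bits(probabilities)


if __name__ == "__main__":
    n_residues = 100  # illustrative protein length
    uniform_nt = [1 / 4] * 4    # assumption: equiprobable nucleotides
    uniform_aa = [1 / 20] * 20  # assumption: equiprobable amino acids

    mrna_bits = total_information_bits(3 * n_residues, uniform_nt)
    protein_bits = total_information_bits(n_residues, uniform_aa)
    print(f"coding mRNA: {mrna_bits:.1f} bits, protein: {protein_bits:.1f} bits")
```

Under these assumptions each codon carries 6 bits while each residue carries log2(20), about 4.32 bits, which is the apparent information deficit of the protein that the abstract sets out to resolve.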
Affiliation(s)
- Y Adiguzel
- Biophysics Department, School of Medicine, Istanbul Kemerburgaz University, Istanbul, Turkey
11
Hussain S, Rasool ST. Analysis of synonymous codon usage in Zika virus. Acta Trop 2017; 173:136-146. [PMID: 28606821 DOI: 10.1016/j.actatropica.2017.06.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/20/2017] [Revised: 06/04/2017] [Accepted: 06/07/2017] [Indexed: 01/11/2023]
Abstract
Zika virus (ZIKV) is a zoonotic pathogen that has made frequent incursions into the human population in Africa and South East Asia over the course of several decades but never reached pandemic proportions until the most recent outbreak. Viruses are solely dependent on host synthetic machinery for their replication cycle; therefore, replication and persistence in a host species of different genetic background requires a certain degree of adaptation. These adaptations are necessary to avoid detection by host immune surveillance and to maximize the utilization of available resources for efficient viral replication. The study of genomic composition and codon usage patterns not only offers insight into the adaptation of viruses to their new host but may also provide some information about the pathogenesis and spread of the virus. To elucidate the genetic features and synonymous codon usage bias in the ZIKV genome, a comprehensive analysis was performed on 80 full-length ZIKV sequences. Our analyses show that the overall extent of codon usage bias in the ZIKV genome is low and is affected by nucleotide composition, protein properties, natural selection, and gene expression level.
12
Genome-wide comparative analysis of codon usage bias and codon context patterns among cyanobacterial genomes. Mar Genomics 2016; 32:31-39. [PMID: 27733306 DOI: 10.1016/j.margen.2016.10.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 06/14/2016] [Revised: 09/11/2016] [Accepted: 10/03/2016] [Indexed: 11/20/2022]
Abstract
With the increasing accumulation of genomic sequence information of prokaryotes, the study of codon usage bias has gained renewed attention. The purpose of this study was to examine codon selection patterns within and across cyanobacterial species belonging to diverse taxonomic orders and habitats. We performed a detailed comparative analysis of cyanobacterial genomes with respect to codon bias. Our analysis shows that in cyanobacterial genomes, A- and/or T-ending codons were used predominantly in the genes, whereas G- and/or C-ending codons were largely avoided. Variation in the codon context usage of cyanobacterial genes corresponded to the clustering of cyanobacteria according to their GC content. Analysis of the codon adaptation index (CAI) and synonymous codon usage order (SCUO) revealed that the majority of genes are associated with low codon bias. The codon selection pattern in cyanobacterial genomes reflected compositional constraints as the major influencing factor. Although mutational constraint may play some role in affecting codon usage bias in cyanobacteria, compositional constraint in terms of genomic GC composition, coupled with environmental factors, affected the codon selection pattern in cyanobacterial genomes.
13
Bernardi G. Genome Organization and Chromosome Architecture. Cold Spring Harbor Symposia on Quantitative Biology 2016; 80:83-91. [PMID: 26801160 DOI: 10.1101/sqb.2015.80.027318] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
How the same DNA sequences can function in the three-dimensional architecture of interphase nucleus, fold in the very compact structure of metaphase chromosomes, and go precisely back to the original interphase architecture in the following cell cycle remains an unresolved question to this day. The solution to this question presented here rests on the correlations that were found to hold between the isochore organization of the genome and the architecture of chromosomes from interphase to metaphase. The key points are the following: (1) The transition from the looped domains and subdomains of interphase chromatin to the 30-nm fiber loops of early prophase chromosomes goes through their unfolding into an extended chromatin structure (probably a 10-nm "beads-on-a-string" structure); (2) the architectural proteins of interphase chromatin, such as CTCF and cohesin subunits, are retained in mitosis and are part of the discontinuous protein scaffold of mitotic chromosomes; and (3) the conservation of the link between architectural proteins and their binding sites on DNA through the cell cycle explains the reversibility of the interphase to mitosis process and the "mitotic memory" of interphase architecture.
Affiliation(s)
- Giorgio Bernardi
- Science Department, Roma Tre University, 00146 Rome, Italy
- Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
14
Abstract
How the same DNA sequences can function in the three-dimensional architecture of the interphase nucleus, fold into the very compact structure of metaphase chromosomes, and go precisely back to the original interphase architecture in the following cell cycle remains an unresolved question to this day. The strategy used to address this issue was to analyze the correlations between chromosome architecture and the compositional patterns of DNA sequences spanning a size range from a few hundred to a few thousand kilobases. This is a critical range that encompasses isochores, interphase chromatin domains and boundaries, and chromosomal bands. The solution rests on the following key points: 1) the transition from the looped domains and sub-domains of interphase chromatin to the 30-nm fiber loops of early prophase chromosomes goes through their unfolding into an extended chromatin structure (probably a 10-nm "beads-on-a-string" structure); 2) the architectural proteins of interphase chromatin, such as CTCF and cohesin sub-units, are retained in mitosis and are part of the discontinuous protein scaffold of mitotic chromosomes; 3) the conservation of the link between architectural proteins and their binding sites on DNA through the cell cycle explains the "mitotic memory" of interphase architecture and the reversibility of the interphase to mitosis process. The results presented here also lead to a general conclusion concerning the existence of correlations between the isochore organization of the genome and the architecture of chromosomes from interphase to metaphase.
Affiliation(s)
- Giorgio Bernardi
- Science Department, Roma Tre University, Marconi, Rome, Italy
- Stazione Zoologica Anton Dohrn, Villa Comunale, Naples, Italy
15
Baeza M, Alcaíno J, Barahona S, Sepúlveda D, Cifuentes V. Codon usage and codon context bias in Xanthophyllomyces dendrorhous. BMC Genomics 2015; 16:293. [PMID: 25887493 PMCID: PMC4404019 DOI: 10.1186/s12864-015-1493-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 12/30/2014] [Accepted: 03/27/2015] [Indexed: 01/11/2023] Open
Abstract
Background: Synonymous codons are used differentially in organisms from the three domains of life, a phenomenon referred to as codon usage bias. In addition, codon pair bias, particularly in the 3′ codon context, has also been described in several organisms and is associated with the accuracy and rate of translation. An improved understanding of both types of bias is important for the optimization of heterologous protein expression, particularly in biotechnologically important organisms, such as the yeast Xanthophyllomyces dendrorhous, a promising bioresource for the carotenoid astaxanthin. Using genomic and transcriptomic data, the codon usage and codon context biases of X. dendrorhous open reading frames (ORFs) were analyzed to determine their expression levels, GC% and sequence lengths. X. dendrorhous totiviral ORFs were also included in these analyses. Results: A total of 1,695 X. dendrorhous ORFs were identified through comparison with sequences in multiple databases, and the intron-exon structures of these sequences were determined. Although there were important expression variations among the ORFs under the studied conditions (different phases of growth and available carbon sources), most of these sequences were highly expressed under at least one of the analyzed conditions. Independent of the culture conditions, the highly expressed genes showed a strong bias in both codon usage and the 3′ context, with a minor association with the GC% and no relationship to the sequence length. The codon usage and codon-pair bias of the totiviral ORFs were highly variable with no similarities to the host ORFs. Conclusions: There is a direct relation between the level of gene expression and codon usage and 3′ context bias in X. dendrorhous, which is more evident for ORFs that are expressed at the highest levels under the studied conditions. However, there is no direct relation between the totiviral ORF biases and the host ORFs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1493-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Marcelo Baeza
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Las Palmeras 3425, Casilla 653, Santiago, Chile
- Jennifer Alcaíno
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Las Palmeras 3425, Casilla 653, Santiago, Chile
- Salvador Barahona
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Las Palmeras 3425, Casilla 653, Santiago, Chile
- Dionisia Sepúlveda
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Las Palmeras 3425, Casilla 653, Santiago, Chile
- Víctor Cifuentes
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Las Palmeras 3425, Casilla 653, Santiago, Chile
16
Rao Y, Wang Z, Chai X, Nie Q, Zhang X. Hydrophobicity and aromaticity are primary factors shaping variation in amino acid usage of chicken proteome. PLoS One 2014; 9:e110381. [PMID: 25329059 PMCID: PMC4199684 DOI: 10.1371/journal.pone.0110381] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/02/2013] [Accepted: 09/22/2014] [Indexed: 11/18/2022] Open
Abstract
Amino acids are utilized with different frequencies both among species and among genes within the same genome. To date, no study of the amino acid usage pattern of chicken has been performed. In the present study, we carried out a systematic examination of amino acid usage in the chicken proteome. Our data indicated that relative amino acid usage is positively correlated with tRNA gene copy number. GC contents, including GC1, GC2, GC3, the GC content of the CDS, and the GC content of the introns, were correlated with the usage of most amino acids, especially GC-rich and GC-poor amino acids. However, multiple linear regression analyses indicated that only approximately 10–40% of the variation in amino acid usage can be explained by GC content for GC-rich and GC-poor amino acids. For the other amino acids, of intermediate GC content, only approximately 10% of the variation can be explained. Correspondence analyses demonstrated that the main factors responsible for the variation in amino acid usage in chicken are hydrophobicity, aromaticity, and genomic GC content. Gene expression level also influenced amino acid usage significantly. We argue that the amino acid usage of the chicken proteome likely reflects a balance or near balance between the action of selection, mutation, and genetic drift.
Affiliation(s)
- Yousheng Rao
- Department of Biological Technology, Nanchang Normal University, Nanchang, Jiangxi, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, Guangdong, China
- Zhangfeng Wang
- Department of Biological Technology, Nanchang Normal University, Nanchang, Jiangxi, China
- Xuewen Chai
- Department of Biological Technology, Nanchang Normal University, Nanchang, Jiangxi, China
- Qinghua Nie
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, Guangdong, China
- Xiquan Zhang
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, Guangdong, China
17
|
Li S, Yang J. System analysis of synonymous codon usage biases in archaeal virus genomes. J Theor Biol 2014; 355:128-39. [PMID: 24685889 PMCID: PMC7094158 DOI: 10.1016/j.jtbi.2014.03.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/29/2013] [Revised: 03/11/2014] [Accepted: 03/12/2014] [Indexed: 12/30/2022]
Abstract
Recent studies of geothermally heated aquatic ecosystems have found widely divergent viruses with unusual morphotypes. Archaeal viruses isolated from these hot habitats usually have double-stranded DNA genomes, linear or circular, and can infect members of the Archaea domain. In this study, the synonymous codon usage bias (SCUB) and dinucleotide composition in the available complete archaeal virus genome sequences have been investigated. It was found that there is a significant variation in SCUB among different archaeal virus species, which is mainly determined by the base composition. The outcome of correspondence analysis (COA) and Spearman's rank correlation analysis shows that codon usage of selected archaeal virus genes depends mainly on GC richness of the genome, and the gene's function, albeit with smaller effects, also contributes to codon usage in these viruses. Furthermore, this investigation reveals that the aromaticity of each protein is also critical in affecting SCUB of these viral genes, although it was less important than the mutational bias. In particular, mutational pressure may influence SCUB in SIRV1, SIRV2, ARV1, AFV1, and PhiCh1 viruses, whereas translational selection could play a leading role in HRPV1's SCUB. These conclusions not only offer an insight into the codon usage biases of archaeal viruses and the possible relationship between archaeal viruses and their hosts, but may also help in understanding the evolution of archaeal viruses and their gene classification, and in exploring the origin of life and the evolution of biology. The SCUB of archaeal virus genes depends mainly on GC richness of the genome. The mutational pressure is the main factor that influences SCUB. The aromaticity of each protein is also critical in affecting SCUB. The translational selection could play a leading role in HRPV1's SCUB. The mode is helpful to explore the origin of life and the evolution of biology.
Affiliation(s)
- Sen Li
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China
- Jie Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China
|
18
|
A comparative analysis of synonymous codon usage bias pattern in human albumin superfamily. ScientificWorldJournal 2014; 2014:639682. [PMID: 24707212 PMCID: PMC3951064 DOI: 10.1155/2014/639682] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 11/25/2013] [Accepted: 01/11/2014] [Indexed: 11/29/2022]
Abstract
Synonymous codon usage bias is an inevitable phenomenon in organismic taxa across the three domains of life. Though the frequency of codon usage is not equal across species and within genome in the same species, the phenomenon is non random and is tissue-specific. Several factors such as GC content, nucleotide distribution, protein hydropathy, protein secondary structure, and translational selection are reported to contribute to codon usage preference. The synonymous codon usage patterns can be helpful in revealing the expression pattern of genes as well as the evolutionary relationship between the sequences. In this study, synonymous codon usage bias patterns were determined for the evolutionarily close proteins of albumin superfamily, namely, albumin, α-fetoprotein, afamin, and vitamin D-binding protein. Our study demonstrated that the genes of the four albumin superfamily members have low GC content and high values of effective number of codons (ENC) suggesting high expressivity of these genes and less bias in codon usage preferences. This study also provided evidence that the albumin superfamily members are not subjected to mutational selection pressure.
|
19
|
Pal A, Mukhopadhyay S, Bothra AK. Statistical analysis of pentose phosphate pathway genes from eubacteria and eukarya reveals translational selection as a major force in shaping codon usage pattern. Bioinformation 2013; 9:349-56. [PMID: 23750079 PMCID: PMC3669787 DOI: 10.6026/97320630009349] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/26/2013] [Accepted: 03/27/2013] [Indexed: 11/23/2022]
Abstract
Comparative analysis of metabolic pathways among widely diverse species provides an excellent opportunity to extract information about the functional relation of organisms and pentose phosphate pathway exemplifies one such pathway. A comparative codon usage analysis of the pentose phosphate pathway genes of a diverse group of organisms representing different niches and the related factors affecting codon usage with special reference to the major forces influencing codon usage patterns was carried out. It was observed that organism specific codon usage bias percolates into vital metabolic pathway genes irrespective of their near universality. A clear distinction in the codon usage pattern of gram positive and gram negative bacteria, which is a major classification criterion for bacteria, in terms of pentose phosphate pathway was an important observation of this study. The codon utilization scheme in all the organisms indicates the presence of translation selection as a major force in shaping codon usage. Another key observation was the segregation of the H. sapiens genes as a separate cluster by correspondence analysis, which is primarily attributed to the different codon usage pattern in this genus along with its longer gene lengths. We have also analyzed the amino acid distribution comparison of transketolase protein primary structures among all the organisms and found that there is a certain degree of predictability in the composition profile except in A. fumigatus and H. sapiens, where few exceptions are prominent. In A. fumigatus, a human pathogen responsible for invasive aspergillosis, a significantly different codon usage pattern, which finally translated into its amino acid composition model portraying a unique profile in a key pentose phosphate pathway enzyme transketolase was observed.
Affiliation(s)
- Ayon Pal
- Department of Botany, Raiganj College (University College) P.O.- Raiganj, Dist.- Uttar Dinajpur, PIN-733134, West Bengal, India
- Subhasis Mukhopadhyay
- Bioinformatics Centre, Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, 92 APC Road, Kolkata-700009, West Bengal, India
- Asim Kumar Bothra
- Cheminformatics Bioinformatics Lab, Department of Chemistry, Raiganj College (University College) P.O.- Raiganj, Dist.- Uttar Dinajpur, PIN-733134, West Bengal, India
|
20
|
Yang J, Li S, Liu YX. Systematic analysis of diabetes- and glucose metabolism-related proteins and its application to Alzheimer’s disease. 2013. [DOI: 10.4236/jbise.2013.66078] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
|
21
|
Li M, Zhao Z, Chen J, Wang B, Li Z, Li J, Cai M. Characterization of synonymous codon usage bias in the pseudorabies virus US1 gene. Virol Sin 2012; 27:303-15. [PMID: 23055006 DOI: 10.1007/s12250-012-3270-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/10/2012] [Accepted: 09/12/2012] [Indexed: 12/11/2022]
Abstract
In the present study, we examined the codon usage bias between the pseudorabies virus (PRV) US1 gene and the US1-like genes of 20 reference alphaherpesviruses. Comparative analysis showed noticeable disparities in synonymous codon usage bias among the 21 alphaherpesviruses, as indicated by the codon adaptation index, effective number of codons (ENc) and GC3s value. The codon usage pattern of the PRV US1 gene was phylogenetically conserved and similar to that of the US1-like genes of the genus Varicellovirus of alphaherpesvirus, with a strong bias towards codons with C and G at the third codon position. Cluster analysis of the codon usage pattern of the PRV US1 gene with its reference alphaherpesviruses demonstrated that the codon usage bias of the US1-like genes of the 21 alphaherpesviruses had a very close relation with their gene functions. The ENc-plot revealed that the genetic heterogeneity in the PRV US1 gene and the 20 reference alphaherpesviruses was constrained by G+C content, as well as by gene length. In addition, comparison of codon preferences in the US1 gene of PRV with those of E. coli, yeast and human revealed that there were 50 codons showing distinct usage differences between PRV and yeast, 49 between PRV and human, but 48 between PRV and E. coli. Although there were slightly fewer differences in codon usage between E. coli and PRV, the difference is unlikely to be statistically significant, and experimental studies are necessary to establish the most suitable expression system for PRV US1. In conclusion, these results may improve our understanding of the evolution, pathogenesis and functional studies of PRV, as well as contributing to the area of herpesvirus research or even studies with other viruses.
Affiliation(s)
- Meili Li
- Department of Pathogenic Biology and Immunology, Guangzhou Medical University, Guangzhou 510182, China
|
22
|
Liu XS, Zhang YG, Fang YZ, Wang YL. Patterns and influencing factor of synonymous codon usage in porcine circovirus. Virol J 2012; 9:68. [PMID: 22416942 PMCID: PMC3341187 DOI: 10.1186/1743-422x-9-68] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 09/13/2011] [Accepted: 03/15/2012] [Indexed: 11/11/2022]
Abstract
Background Analysis of codon usage can reveal much about the molecular evolution of viruses. Nevertheless, little information is available about the synonymous codon usage pattern of the porcine circovirus (PCV) genome during its evolution. In this study, to better understand the evolutionary characteristics of PCV and the effects of natural selection from its host on the codon usage pattern of the virus, the patterns and key determinants of codon usage in PCV were examined. Methods We carried out a comprehensive analysis of the codon usage pattern in the PCV genome by calculating relative synonymous codon usage (RSCU), effective number of codons (ENC), dinucleotides and nucleic acid content of the PCV genome. Results PCV genomes have relatively low GC content and codon preference, which shows that nucleotide constraints have a major impact on synonymous codon usage. The results of the correspondence analysis indicate that the codon usage patterns of PCV differ greatly among genotypes and subgenotypes, and that codon usage patterns differ significantly among the viruses of Circoviridae. There is much similarity between PCV and its host in synonymous codon usage, suggesting that natural selection pressure from the host also affects the codon usage patterns of PCV. In particular, PCV genotype II is more similar in synonymous codon usage to pig than to PCV genotype I, which may be one of the most important molecular mechanisms by which PCV genotype II causes disease. The calculated relative abundance of dinucleotides indicates that dinucleotide composition also plays a key role in the variation found in synonymous codon usage in PCV. Furthermore, geographic factors, the general average hydrophobicity and the aromaticity may be related to the formation of codon usage patterns of PCV.
Conclusion The results of these studies suggest that the synonymous codon usage pattern of the PCV genome is the result of interaction between mutation pressure and natural selection from its host. The information from this study may not only have theoretical value in understanding the characteristics of synonymous codon usage in PCV genomes, but also have significant value for understanding the molecular evolution of PCV.
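The RSCU measure named in the Methods above can be illustrated with a minimal sketch. It uses the standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu`, the two-family codon table and the toy sequence are assumptions made for illustration only; they are not taken from the study, and a real analysis would cover all degenerate amino acids.

```python
from collections import Counter

# Illustrative subset of synonymous codon families (assumption: only two
# amino acids are included to keep the sketch short).
SYNONYMOUS_FAMILIES = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
}

def rscu(cds, families=SYNONYMOUS_FAMILIES):
    """Return RSCU values for the codons listed in `families`.

    RSCU = observed codon count / (family total / number of synonyms).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                          # amino acid absent: RSCU undefined
        expected = total / len(family)        # count under equal usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values

# Hypothetical toy sequence: TTT TTT TTC AAA AAG
example = rscu("TTTTTTTTCAAAAAG")
```

A value above 1 marks a codon used more often than expected under equal usage, a value below 1 marks one used less often; in the toy sequence TTT is over-represented relative to TTC, while AAA and AAG are used equally.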
Affiliation(s)
- Xin-sheng Liu
- State Key Laboratory of Veterinary Etiological Biology, National Foot and Mouth Disease Reference Laboratory, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou 730046, People's Republic of China
|
23
|
Zhu E, Sambath S. Analysis of Codon Usage Bias in Interferon Alpha Gene of the Giant Panda (Ailuropoda Melanoleuca). ADVANCES IN INTELLIGENT AND SOFT COMPUTING 2012. [PMCID: PMC7123504 DOI: 10.1007/978-3-642-27537-1_37] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
The analysis of codon usage bias of the IFN-a gene of the giant panda (Ailuropoda melanoleuca) may provide a basis for understanding the evolutionary relationships of the giant panda and for selecting appropriate host expression systems to improve the expression of target genes. In this paper, the codon usage bias in the mature IFN-a sequence of giant panda and 15 reference species was analyzed. The results showed that the synonymous codons with G and C at the third codon position were widely used, and the ENC-GC3S plot revealed that the genetic heterogeneity in the IFN-a gene was mainly constrained by mutational bias. Contrastive analysis revealed that there were 40 codons showing distinct usage differences between GpIFN-a and Escherichia coli, 38 codons between GpIFN-a and yeast, and only 30 between GpIFN-a and Homo sapiens. Therefore the Homo expression system may be more suitable for the expression of GpIFN-a genes.
Affiliation(s)
- Egui Zhu
- South China Normal University, Guangzhou 510631, People's Republic of China
- Sabo Sambath
- South China Normal University, Guangzhou 510631, People's Republic of China
|
24
|
Analysis of synonymous codon usage in the UL24 gene of duck enteritis virus. Virus Genes 2008; 38:96-103. [PMID: 18958612 DOI: 10.1007/s11262-008-0295-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 05/26/2008] [Accepted: 10/09/2008] [Indexed: 10/21/2022]
Abstract
The analysis of codon usage bias of the UL24 gene of duck enteritis virus (DEV) may improve our understanding of the evolution and pathogenesis of DEV and provide a basis for understanding the relevant mechanism for biased usage of synonymous codons and for selecting appropriate expression systems to improve the expression of target genes. The codon usage bias of UL24 genes of DEV and 27 reference herpesviruses was analyzed. The results showed that codon usage of the UL24 gene of DEV was strongly biased toward the synonymous codons with A and T at the third codon position. A high level of diversity in codon usage bias existed, and the effective number of codons used in a gene plot revealed that the genetic heterogeneity in the UL24 gene of herpesviruses was constrained by the G + C content. The phylogenetic analysis suggested that DEV was evolutionarily closer to Alphaherpesvirinae and that there was no significant deviation in codon usage in different virus strains. There were 20 codons showing distinct usage differences between DEV and Escherichia coli, 23 between DEV and Homo sapiens, but only 16 codons between DEV and yeast. Therefore the yeast expression system may be more suitable for the expression of DEV genes.
|
25
|
Biro JC. Does codon bias have an evolutionary origin? Theor Biol Med Model 2008; 5:16. [PMID: 18667081 PMCID: PMC2519059 DOI: 10.1186/1742-4682-5-16] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 07/07/2008] [Accepted: 07/30/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND There is a 3-fold redundancy in the Genetic Code; most amino acids are encoded by more than one codon. These synonymous codons are not used equally; there is a Codon Usage Bias (CUB). This article will provide novel information about the origin and evolution of this bias. RESULTS Codon Usage Bias (CUB, defined here as deviation from equal usage of synonymous codons) was studied in 113 species. The average CUB was 29.3 +/- 1.1% (S.E.M, n = 113) of the theoretical maximum and declined progressively with evolution and increasing genome complexity. A Pan-Genomic Codon Usage Frequency (CUF) Table was constructed to describe genome-wide relationships among codons. Significant correlations were found between the number of synonymous codons and (i) the frequency of the respective amino acids (ii) the size of CUB. Numerous, statistically highly significant, internal correlations were found among codons and the nucleic acids they comprise. These strong correlations made it possible to predict missing synonymous codons (wobble bases) reliably from the remaining codons or codon residues. CONCLUSION The results put the concept of "codon bias" into a novel perspective. The internal connectivity of codons indicates that all synonymous codons might be integrated parts of the Genetic Code with equal importance in maintaining its functional integrity.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 612 S Flower St, Los Angeles, CA 90017, USA.
|
26
|
Mukhopadhyay P, Basak S, Ghosh TC. Synonymous codon usage in different protein secondary structural classes of human genes: implication for increased non-randomness of GC3 rich genes towards protein stability. J Biosci 2007; 32:947-63. [PMID: 17914237 DOI: 10.1007/s12038-007-0095-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
The relationship between synonymous codon usage and different protein secondary structural classes was investigated using 401 Homo sapiens proteins extracted from the Protein Data Bank (PDB). A simple Chi-square test was used to assess the significance of deviation between the observed and expected frequencies of 59 codons at the level of individual synonymous families in the four different protein secondary structural classes. It was observed that synonymous codon families show non-randomness in codon usage in the four different secondary structural classes. However, when the genes were classified according to their GC3 levels, there was an increase in non-randomness in the high GC3 group of genes. The non-randomness in codon usage was further tested among the same protein secondary structures belonging to four different protein folding classes of the high GC3 group of genes. The results show that in each protein secondary structural unit there exists some synonymous family that shows a class-specific codon usage pattern. Moreover, there is an increased non-random behaviour of synonymous codons in the sheet structure of all secondary structural classes in the high GC3 group of genes. The biological implications of these results are discussed.
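The GC3 level used above to group genes can be sketched as the fraction of codons whose third position is G or C. This is a minimal illustration under stated assumptions: the function name `gc3` and the toy sequence are hypothetical, every complete codon is counted (the related GC3s statistic, which restricts the count to synonymous codons, is not implemented here), and the abstract does not state the cut-offs used to define the GC3 groups, so none are assumed.

```python
def gc3(cds):
    """Fraction of complete codons with G or C at the third position."""
    third_bases = cds.upper()[2::3]   # indices 2, 5, 8, ... exist only for complete codons
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    gc_count = sum(base in "GC" for base in third_bases)
    return gc_count / len(third_bases)

# Hypothetical toy sequence: ATG GCC AAG TTT -> third bases G, C, G, T
example_gc3 = gc3("ATGGCCAAGTTT")
```

Genes could then be ranked or binned by this value before testing codon usage within each group, which is the kind of stratification the abstract describes.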
Affiliation(s)
- Pamela Mukhopadhyay
- Bioinformatics Centre, Bose Institute, P 1/12, CIT Scheme VII M, Kolkata 700 054, India
|
27
|
Biro JC. The Proteomic Code: a molecular recognition code for proteins. Theor Biol Med Model 2007; 4:45. [PMID: 17999762 PMCID: PMC2206014 DOI: 10.1186/1742-4682-4-45] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 09/02/2007] [Accepted: 11/13/2007] [Indexed: 11/30/2022]
Abstract
Background The Proteomic Code is a set of rules by which information in genetic material is transferred into the physico-chemical properties of amino acids. It determines how individual amino acids interact with each other during folding and in specific protein-protein interactions. The Proteomic Code is part of the redundant Genetic Code. Review The 25-year-old history of this concept is reviewed from the first independent suggestions by Biro and Mekler, through the works of Blalock, Root-Bernstein, Siemion, Miller and others, followed by the discovery of a Common Periodic Table of Codons and Nucleic Acids in 2003 and culminating in the recent conceptualization of partial complementary coding of interacting amino acids as well as the theory of the nucleic acid-assisted protein folding. Methods and conclusions A novel cloning method for the design and production of specific, high-affinity-reacting proteins (SHARP) is presented. This method is based on the concept of proteomic codes and is suitable for large-scale, industrial production of specifically interacting peptides.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard, #1205, San Francisco, CA 94105, USA.
|
28
|
Angellotti MC, Bhuiyan SB, Chen G, Wan XF. CodonO: codon usage bias analysis within and across genomes. Nucleic Acids Res 2007; 35:W132-6. [PMID: 17537810 PMCID: PMC1933134 DOI: 10.1093/nar/gkm392] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022]
Abstract
Synonymous codon usage biases are associated with various biological factors, such as gene expression level, gene length, gene translation initiation signal, protein amino acid composition, protein structure, tRNA abundance, mutation frequency and patterns, and GC compositions. Quantification of codon usage bias helps in understanding the evolution of living organisms. A codon usage bias pipeline is in demand for codon usage bias analyses within and across genomes. Here we present the CodonO webserver as a user-friendly tool for codon usage bias analyses across and within genomes in real time. The webserver is available at http://www.sysbiology.org/CodonO. Contact: wanhenry@yahoo.com.
Affiliation(s)
- Xiu-Feng Wan
- *To whom correspondence should be addressed. +1-513-529-0426; +1-513-529-2431
|
29
|
Abstract
The vertebrate genome is a mosaic of GC-poor and GC-rich isochores, megabase-sized DNA regions of fairly homogeneous base composition that differ in relative amount, gene density, gene expression, replication timing, and recombination frequency. At the emergence of warm-blooded vertebrates, the gene-rich, moderately GC-rich isochores of the cold-blooded ancestors underwent a GC increase. This increase was similar in mammals and birds and was maintained during the evolution of mammalian and avian orders. Neither the GC increase nor its conservation can be accounted for by the random fixation of neutral or nearly neutral single-nucleotide changes (i.e., the vast majority of nucleotide substitutions) or by a biased gene conversion process occurring at random genome locations. Both phenomena can be explained, however, by the neoselectionist theory of genome evolution that is presented here. This theory fully accepts Ohta's nearly neutral view of point mutations but proposes in addition (i) that the AT-biased mutational input present in vertebrates pushes some DNA regions below a certain GC threshold; (ii) that these lower GC levels cause regional changes in chromatin structure that lead to deleterious effects on replication and transcription; and (iii) that the carriers of these changes undergo negative (purifying) selection, the final result being a compositional conservation of the original isochore pattern in the surviving population. Negative selection may also largely explain the GC increase accompanying the emergence of warm-blooded vertebrates. In conclusion, the neoselectionist theory not only provides a solution to the neutralist/selectionist debate but also introduces an epigenomic component in genome evolution.
Affiliation(s)
- Giorgio Bernardi
- Molecular Evolution Laboratory, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy.
|
30
|
Biro JC. Protein folding information in nucleic acids which is not present in the genetic code. Ann N Y Acad Sci 2007; 1091:399-411. [PMID: 17341631 DOI: 10.1196/annals.1378.083] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (P < 0.0001, n = 81). This periodic FFE difference is not present in introns. The FFE in the 1st and 3rd residues is additive, which suggests that these residues contain a significant number of complementary bases and contribute to selection for local mRNA secondary structures. This periodic, codon-related structure forming of mRNAs indicates a connection between the structure of exons and the corresponding (translated) proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures.
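The subsequences compared above (1st, 2nd and 3rd codon residues, and the combined 1st plus 3rd residues) can be extracted as in the following minimal sketch. The function names and the toy mRNA are assumptions for illustration; the free folding energy calculation itself requires an RNA folding program and is not attempted here.

```python
def codon_position_subsequences(mrna):
    """Split a coding sequence into its 1st, 2nd and 3rd codon-position residues."""
    usable = len(mrna) - len(mrna) % 3      # drop an incomplete trailing codon
    coding = mrna[:usable]
    return coding[0::3], coding[1::3], coding[2::3]

def first_and_third(mrna):
    """Interleave the 1st and 3rd residues of each codon, skipping the 2nd."""
    first, _, third = codon_position_subsequences(mrna)
    return "".join(a + c for a, c in zip(first, third))

# Hypothetical toy mRNA: AUG GCC AAG
first, second, third = codon_position_subsequences("AUGGCCAAG")
combined = first_and_third("AUGGCCAAG")
```

Each resulting string could then be passed to a folding tool so that the energies of the position-specific subsequences can be compared, which is the comparison the abstract reports.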
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard #1205, San Francisco, CA 94105, USA.
|
31
|
Kahali B, Basak S, Ghosh TC. Reinvestigating the codon and amino acid usage of S. cerevisiae genome: a new insight from protein secondary structure analysis. Biochem Biophys Res Commun 2007; 354:693-9. [PMID: 17258174 DOI: 10.1016/j.bbrc.2007.01.038] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 12/28/2006] [Accepted: 01/05/2007] [Indexed: 11/29/2022]
Abstract
Biased usage of synonymous codons has been elucidated under the perspective of cellular tRNA abundance for quite a long time now. Taking advantage of publicly available gene expression data for Saccharomyces cerevisiae, a systematic analysis of the codon and amino acid usages in two different coding regions corresponding to the regular (helix and strand) as well as the irregular (coil) protein secondary structures, have been performed. Our analyses suggest that apart from tRNA abundance, mRNA folding stability is another major evolutionary force in shaping the codon and amino acid usage differences between the highly and lowly expressed genes in S. cerevisiae genome and surprisingly it depends on the coding regions corresponding to the secondary structures of the encoded proteins. This is obviously a new paradigm in understanding the codon usage in S. cerevisiae. Differential amino acid usage between highly and lowly expressed genes in the regions coding for the irregular protein secondary structure in S. cerevisiae is expounded by the stability of the mRNA folded structure. Irrespective of the protein secondary structural type, the highly expressed genes always tend to encode cheaper amino acids in order to reduce the overall biosynthetic cost of production of the corresponding protein. This study supports the hypothesis that the tRNA abundance is a consequence of and not a reason for the biased usage of amino acid between highly and lowly expressed genes.
Affiliation(s)
- Bratati Kahali
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
|
32
|
Biro JC. Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases. Theor Biol Med Model 2006; 3:28. [PMID: 16893453 PMCID: PMC1560374 DOI: 10.1186/1742-4682-3-28] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 12/16/2005] [Accepted: 08/07/2006] [Indexed: 12/02/2022]
Abstract
Background All the information necessary for protein folding is supposed to be present in the amino acid sequence. It is still not possible to provide specific ab initio structure predictions by bioinformatical methods. It is suspected that additional folding information is present in protein coding nucleic acid sequences, but this is not represented by the known genetic code. Results Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (p < 0.0001, n = 81). This periodic FFE difference is not present in introns. It is therefore a specific physico-chemical characteristic of coding sequences and might contribute to unambiguous definition of codon boundaries during translation. The FFEs of the 1st and 3rd residues are additive, which suggests that these residues contain a significant number of complementary bases and that may contribute to selection for local RNA secondary structures in coding regions. This periodic, codon-related structure-formation of mRNAs indicates a connection between the structures of exons and the corresponding (translated) proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick (WC) base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures. Conclusion Exons are distinguished from introns, and codon boundaries are physico-chemically defined, by periodically distributed FFE differences between codon positions. There is a selection for local RNA secondary structures in coding regions and this nucleic acid structure resembles the folding profiles of the coded proteins. 
The preferentially (specifically) interacting amino acids are coded by partially complementary codons, which strongly supports the connection between mRNA and the corresponding protein structures and indicates that there is protein folding information in nucleic acids that is not present in the genetic code. This might suggest an additional explanation of codon redundancy.
|
33
|
Sau K, Sau S, Mandal SC, Ghosh TC. Factors influencing the synonymous codon and amino acid usage bias in AT-rich Pseudomonas aeruginosa phage PhiKZ. Acta Biochim Biophys Sin (Shanghai) 2005; 37:625-33. [PMID: 16143818 PMCID: PMC7109957 DOI: 10.1111/j.1745-7270.2005.00089.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
To reveal how the AT-rich genome of bacteriophage PhiKZ has been shaped to support its growth in the GC-rich host Pseudomonas aeruginosa, the synonymous codon and amino acid usage bias of PhiKZ was investigated and compared with that of P. aeruginosa. The synonymous codon and amino acid usage of PhiKZ was found to be distinct from that of P. aeruginosa. In contrast to P. aeruginosa, the third codon position of the synonymous codons of PhiKZ carries mostly an A or T base; codon usage bias in PhiKZ is dictated mainly by mutational bias and, to a lesser extent, by translational selection. A cluster analysis of the relative synonymous codon usage values of 16 myoviruses including PhiKZ shows that PhiKZ is evolutionarily much closer to Escherichia coli phage T4. Further analysis reveals that three factors (mean molecular weight, aromaticity and cysteine content) are mostly responsible for the variation of amino acid usage in PhiKZ proteins, whereas amino acid usage of P. aeruginosa proteins is mainly governed by grand average of hydropathicity, aromaticity and cysteine content. Based on these observations, we suggest that codons of the phage-like PhiKZ have evolved to preferentially incorporate the smaller amino acid residues into its proteins during translation, thereby economizing the cost of its development in GC-rich P. aeruginosa.
Affiliation(s)
- K. Sau: Department of Mathematics, Jadavpur University, Calcutta 700 032, India
- S. Sau: Department of Biochemistry, Bose Institute, P1/12-CIT Scheme VII M, Calcutta 700 054, India
- S. C. Mandal: Department of Mathematics, Jadavpur University, Calcutta 700 032, India (corresponding author)
- T. C. Ghosh: Bioinformatics Centre, Bose Institute, P1/12-CIT Scheme VII M, Calcutta 700 054, India (corresponding author; Tel, +91-33-2334 6626; Fax, +91-33-2334 3886)
|
34
|
Sahu K, Gupta SK, Sau S, Ghosh TC. Comparative Analysis of the Base Composition and Codon Usages in Fourteen Mycobacteriophage Genomes. J Biomol Struct Dyn 2005; 23:63-71. [PMID: 15918677 DOI: 10.1080/07391102.2005.10507047] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 10/28/2022]
Abstract
To study possible variation in codon usage and base composition among bacteriophages, fourteen mycobacteriophages were used as a model system, and both parameters were determined and compared in all these phages and their plating bacterium, M. smegmatis. As all the organisms are GC-rich, the GC contents at third codon positions were found to be higher than those at the second codon positions as well as the first + second codon positions in all the organisms, indicating that directional mutational pressure is strongly operative at the synonymous third codon positions. The Nc plot indicates that codon usage variation in all these organisms is governed by forces other than compositional constraints. Correspondence analysis suggests that: (i) there is codon usage variation among the genes and genomes of the fourteen mycobacteriophages and M. smegmatis, i.e., codon usage patterns in the mycobacteriophages are phage-specific rather than M. smegmatis-specific; (ii) synonymous codon usage patterns of Barnyard, Che8, Che9d, and Omega are more similar to each other than to those of the remaining mycobacteriophages and M. smegmatis; (iii) codon usage bias in the mycobacteriophages is mainly determined by mutational pressure; and (iv) the genes of comparatively GC-rich genomes are more biased than those of the GC-poor genomes. A role for translational selection in determining the codon usage variation in highly expressed genes can be inferred from the predominant occurrence of C-ending codons in the highly expressed genes. Cluster analysis based on codon usage data also shows that there are two distinct branches for the fourteen mycobacteriophages and that there is codon usage variation even among the phages of each branch.
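The comparison of GC content at the first, second and third codon positions mentioned in this abstract can be illustrated with a minimal Python sketch. The function name `gc_by_codon_position` and the toy sequence are assumptions made for illustration; they are not taken from the cited study, which does not describe its software here.

```python
def gc_by_codon_position(cds: str) -> dict:
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS.

    Hypothetical helper for illustration; assumes the sequence starts in frame.
    Trailing bases that do not complete a codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # length covered by complete codons
    if usable == 0:
        raise ValueError("sequence must contain at least one complete codon")
    result = {}
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        result[f"GC{pos + 1}"] = sum(b in "GC" for b in bases) / len(bases)
    return result


# Toy example: codons ATG GCC GCG TAA
print(gc_by_codon_position("ATGGCCGCGTAA"))
# → {'GC1': 0.5, 'GC2': 0.5, 'GC3': 0.75}
```

A GC3 value well above GC1 and GC2 across a genome is the pattern the abstract interprets as directional mutational pressure at synonymous third positions.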
Affiliation(s)
- K Sahu: Bioinformatics Centre, Bose Institute, P1/12 - CIT Scheme VII M, Calcutta 700 054, India
|
35
|
Banerjee T, Gupta SK, Ghosh TC. Role of mutational bias and natural selection on genome-wide nucleotide bias in prokaryotic organisms. Biosystems 2005; 81:11-8. [PMID: 15917123 DOI: 10.1016/j.biosystems.2005.01.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 10/05/2004] [Revised: 01/08/2005] [Accepted: 01/12/2005] [Indexed: 11/24/2022]
Abstract
Correlations between genomic GC contents and amino acid frequencies were studied in the homologous sequences of 12 eubacterial genomes. Results show that the frequency of amino acids encoded by GC-rich codons increases significantly with genomic GC content, whereas the opposite trend was observed for amino acids encoded by GC-poor codons. Further studies show that not all amino acids change in the direction predicted by genomic GC pressure, suggesting that protein evolution is not entirely dictated by nucleotide frequencies. An amino acid substitution matrix calculated among hydrophobic, amphipathic and hydrophilic amino acid groups shows that amphipathic and hydrophilic amino acids are more frequently substituted by hydrophobic amino acids than hydrophobic amino acids are by hydrophilic or amphipathic ones. This indicates that nucleotide bias induces directional changes in proteome composition that are accompanied by strong changes in hydropathy values. In fact, significant increases in hydrophobicity values have also been observed with increasing genomic GC content. Correlations between GC contents and amino acid compositions in three different predicted protein secondary structures show that hydropathy values increase significantly with GC content in aperiodic and helix structures, whereas strand structure remains insensitive to genomic GC levels. The relative importance of mutation and selection in the evolution of proteins is discussed on the basis of these results.
Affiliation(s)
- T Banerjee: Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
|
36
|
D'Onofrio G, Ghosh TC. The compositional transition of vertebrate genomes: an analysis of the secondary structure of the proteins encoded by human genes. Gene 2005; 345:27-33. [PMID: 15716110 DOI: 10.1016/j.gene.2004.11.037] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 10/21/2004] [Revised: 11/12/2004] [Accepted: 11/23/2004] [Indexed: 11/25/2022]
Abstract
Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated by comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences in GC(3) levels; the second shows considerable increments of the GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed different increments of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) values of several amino acids were also affected during the compositional transition, showing that there is a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of the isochore organization of the human genome is discussed on the basis of these results.
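Relative synonymous codon usage (RSCU), referred to in this abstract, is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below shows that calculation under the standard genetic code; the function name `rscu` and the toy sequence are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """RSCU per codon: observed count x family size / total count of the family.

    Hypothetical helper; assumes an in-frame coding sequence. Stop codons,
    single-codon amino acids (Met, Trp) and unused amino acids are skipped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy example: four alanine codons (GCT, GCT, GCC, GCA)
print(rscu("GCTGCTGCCGCA")["GCT"])  # → 2.0
```

Values above 1 mark codons used more often than expected under equal usage, and values below 1 mark those used less often.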
Affiliation(s)
- Giuseppe D'Onofrio: Laboratorio di Evoluzione Molecolare, Stazione Zoologica A. Dohrn, 80121 Napoli, Italy
|
37
|
Wan XF, Xu D, Kleinhofs A, Zhou J. Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes. BMC Evol Biol 2004; 4:19. [PMID: 15222899 PMCID: PMC476735 DOI: 10.1186/1471-2148-4-19] [Citation(s) in RCA: 96] [Impact Index Per Article: 4.6] [Received: 01/28/2004] [Accepted: 06/28/2004] [Indexed: 11/25/2022] Open
Abstract
Background: Codon usage bias has been widely reported to correlate with GC composition. However, the quantitative relationship between codon usage bias and GC composition across species has not been reported. Results: Based on an informatics method (SCUO) we developed previously using Shannon informational theory and maximum entropy theory, we investigated the quantitative relationship between codon usage bias and GC composition. The regression based on 70 bacterial and 16 archaeal genomes showed that in bacteria, $\mathrm{SCUO} = -2.06\,\mathrm{GC3} + 2.05\,(\mathrm{GC3})^2 + 0.65$, $r = 0.91$, and that in archaea, $\mathrm{SCUO} = -1.79\,\mathrm{GC3} + 1.85\,(\mathrm{GC3})^2 + 0.56$, $r = 0.89$. We developed an analytical model to quantify synonymous codon usage bias by GC compositions based on SCUO. The parameters within this model were inferred by inspecting the relationship between codon usage bias and GC composition across 70 bacterial and 16 archaeal genomes. We further simplified this relationship using only GC3. This simple model was supported by computational simulation. Conclusions: The synonymous codon usage bias could be simply expressed as $1 + \frac{p}{2}\log_2\frac{p}{2} + \frac{1-p}{2}\log_2\frac{1-p}{2}$, where $p = \mathrm{GC3}$. The software we developed for measuring SCUO (codonO) is available at .
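The two relationships quoted in this abstract can be evaluated numerically with the short Python sketch below. It assumes that "(GC3)2" in the regressions denotes GC3 squared, that "log2" is the base-2 logarithm, and that "(l-p)" in the simplified model is a misprint of "(1-p)". The function names are hypothetical and unrelated to the authors' codonO software.

```python
import math


def scuo_model(gc3: float) -> float:
    """Simplified model: 1 + (p/2)log2(p/2) + ((1-p)/2)log2((1-p)/2), p = GC3."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("GC3 must be a fraction between 0 and 1")

    def xlog2x(x: float) -> float:
        # Use the limit x*log2(x) -> 0 as x -> 0 to avoid log2(0).
        return 0.0 if x == 0.0 else x * math.log2(x)

    return 1.0 + xlog2x(gc3 / 2) + xlog2x((1 - gc3) / 2)


def scuo_regression(gc3: float, domain: str = "bacteria") -> float:
    """Quadratic regressions quoted in the abstract (coefficients as printed)."""
    coefficients = {
        "bacteria": (-2.06, 2.05, 0.65),
        "archaea": (-1.79, 1.85, 0.56),
    }
    linear, quadratic, intercept = coefficients[domain]
    return linear * gc3 + quadratic * gc3 ** 2 + intercept


for p in (0.0, 0.5, 1.0):
    print(p, round(scuo_model(p), 4), round(scuo_regression(p), 4))
```

Under these assumptions the simplified model is symmetric around GC3 = 0.5, where it equals 0, and rises to 0.5 at GC3 = 0 or 1. The fitted bacterial regression follows a similar U shape, with its own intercept.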
Affiliation(s)
- Xiu-Feng Wan: Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA; Digital Biology Laboratory, Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Dong Xu: Digital Biology Laboratory, Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Andris Kleinhofs: Department of Genetics and Cell Biology, Washington State University, Pullman, WA 99164, USA
- Jizhong Zhou: Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
|