1
Alonso AM, Diambra L. Dicodon-based measures for modeling gene expression. Bioinformatics 2023; 39:btad380. [PMID: 37307098 PMCID: PMC10287933 DOI: 10.1093/bioinformatics/btad380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 05/20/2023] [Accepted: 06/09/2023] [Indexed: 06/14/2023] [Open access]
Abstract
MOTIVATION: Codon usage preference patterns have been associated with modulation of translation efficiency, protein folding, and mRNA decay. However, new studies suggest that codon pair usage also has a remarkable effect on gene expression levels. Here, we extend the concept of the codon adaptation index (CAI) to ask whether codon pair usage patterns can be understood in terms of codon usage bias, or whether they offer new information regarding translation efficiency.
RESULTS: Through the implementation of a weighting strategy that accounts for dicodon contributions, we observe that the dicodon-based measure correlates more strongly with gene expression level than CAI does. Interestingly, dicodons with low adaptiveness values are related to dicodons that mediate strong translational inhibition in yeast. We have also noticed that some codon pairs have a smaller dicodon contribution than estimated from the product of the respective codon contributions.
AVAILABILITY AND IMPLEMENTATION: Scripts, implemented in Python, are freely available for download at https://zenodo.org/record/7738276#.ZBIDBtLMIdU.
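The abstract extends the codon adaptation index (CAI) to codon pairs. As a point of reference, here is a minimal Python sketch of the classic CAI: the relative adaptiveness w of a codon is its count in a reference gene set divided by the count of its most-used synonym, and a gene's score is the geometric mean of w over its codons. The `dicodon_adaptiveness` function applies the same normalisation to in-frame codon pairs grouped by the amino-acid pair they encode; it is our illustrative assumption, not the weighting strategy of Alonso and Diambra, whose scripts are at the Zenodo link above. The pseudocount of 0.5 for unseen codons and all function names are likewise assumptions.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Met and Trp have a single codon and stops are not translated, so they carry
# no synonymous-choice information and are skipped.
SKIP = {"M", "W", "*"}


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def _normalise(counts, families, pseudocount):
    """Divide each count by the largest count in its synonymous family."""
    w = {}
    for family in families:
        values = {k: counts[k] or pseudocount for k in family}  # zero -> pseudocount
        best = max(values.values())
        w.update({k: v / best for k, v in values.items()})
    return w


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Classic CAI weights from a reference set of (e.g. highly expressed) genes."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    families = {}
    for c, aa in CODE.items():
        if aa not in SKIP:
            families.setdefault(aa, []).append(c)
    return _normalise(counts, families.values(), pseudocount)


def dicodon_adaptiveness(reference_seqs, pseudocount=0.5):
    """Hypothetical dicodon analogue: each in-frame codon pair is compared with
    the most frequent pair encoding the same amino-acid pair."""
    counts = Counter()
    for s in reference_seqs:
        cs = codons(s)
        counts.update(zip(cs, cs[1:]))
    informative = [c for c, aa in CODE.items() if aa not in SKIP]
    families = {}
    for c1 in informative:
        for c2 in informative:
            families.setdefault((CODE[c1], CODE[c2]), []).append((c1, c2))
    return _normalise(counts, families.values(), pseudocount)


def geometric_mean_index(items, w):
    """Geometric mean of w over the items that have a weight (CAI-style score)."""
    logs = [math.log(w[k]) for k in items if k in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def cai(seq, w):
    return geometric_mean_index(codons(seq), w)


def dicodon_index(seq, w2):
    cs = codons(seq)
    return geometric_mean_index(zip(cs, cs[1:]), w2)
```

For example, with the reference set `["GCTGCTGCC"]`, GCT gets w = 1.0 and GCC gets w = 0.5, so `cai("GCTGCC", w)` is the geometric mean of 1.0 and 0.5, about 0.707.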
Affiliation(s)
- Andres M Alonso
- Instituto Tecnológico Chascomús (INTECH), CONICET-UNSAM, Intendente Marino km 8.2, Chascomús, 7130 Provincia de Buenos Aires, Argentina
- CCT-La Plata, CONICET, Calle 8 Nº 1467, La Plata, B1904CMC Provincia de Buenos Aires, Argentina
- Luis Diambra
- CCT-La Plata, CONICET, Calle 8 Nº 1467, La Plata, B1904CMC Provincia de Buenos Aires, Argentina
- Centro Regional de Estudios Genómicos, FCE-UNLP, Blvd 120 N∘ 1461, La Plata, 1900 Provincia de Buenos Aires, Argentina
2
Khrustalev VV, Khrustaleva TA, Popinako AV. Germline mutations directions are different between introns of the same gene: case study of the gene coding for amyloid-beta precursor protein. Genetica 2023; 151:61-73. [PMID: 36129589 DOI: 10.1007/s10709-022-00166-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 09/08/2022] [Indexed: 02/01/2023]
Abstract
Amyloid-beta precursor protein (APP) is highly conserved in mammals. This feature allowed us to compare nucleotide usage biases in fourfold degenerate sites along the length of its coding region for 146 species of mammals and birds in search of fragments with significant deviations. Although cytosine is the most frequently used nucleotide in fourfold degenerate sites of the APP coding region in all tested placental mammals (in contrast to marsupials, which are biased toward thymine), the most frequent germline and somatic mutations in the human APP coding region are C-to-T and G-to-A transitions. The same mutational AT-pressure is characteristic of germline mutations in introns of the human APP gene. Surprisingly, however, several exceptional introns show deviating germline mutation rates. Most of those introns flank exons with exceptional nucleotide usage biases in fourfold degenerate sites. The existence of such fragments in exons 4 and 5, as well as in exon 14, can be connected with the presence of lncRNA genes on the complementary DNA strand. The exceptional nucleotide usage bias in exons 16 and 17, which contain the sequence encoding amyloid-beta peptides, can be explained either by the presence of yet unmapped lncRNA(s) or by autonomous expression of a short mRNA that encodes only the C-terminal part of APP, providing an alternative source of amyloid-beta peptides. This hypothesis is supported by the increased rate of T-to-C transitions in introns 16-17 and 17-18 of the human APP gene relative to other introns.
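Fourfold degenerate sites are third codon positions where all four bases encode the same amino acid (for instance GCN for alanine), so base usage there is largely free of amino-acid constraints. Below is a minimal Python sketch of the per-gene tally under the standard genetic code; the function name is hypothetical, and the authors' procedure for scanning along the coding region in search of deviating fragments is not reproduced.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Two-base prefixes whose four third-position variants all encode one amino acid
# (derived from the table rather than hard-coded).
FOURFOLD_PREFIXES = {
    p
    for p in ("".join(x) for x in product(BASES, repeat=2))
    if len({CODE[p + b] for b in BASES}) == 1
}


def fourfold_usage(cds):
    """Fraction of A, C, G, T at fourfold degenerate third positions of an in-frame CDS."""
    cds = cds.upper()
    counts = Counter(
        cds[i + 2]
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 2] in FOURFOLD_PREFIXES and cds[i + 2] in BASES
    )
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}
```

For example, `fourfold_usage("GCTGCCCTGAAA")` counts the third bases of GCT, GCC and CTG (AAA, lysine, is only twofold degenerate), giving one third each for C, G and T and zero for A.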
Affiliation(s)
- Anna Vladimirovna Popinako
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russian Federation
3
Abstract
Bacterial genomes often reflect a bias in codon usage. These biases are often most notable within highly expressed genes. While deviations in codon usage can be attributed to selection or mutational biases, they can also be functional, for example by controlling gene expression or guiding protein structure. Several metrics have been developed to identify biases in codon usage. We previously released a database, CBDB: The Codon Bias Database, from which users could retrieve precalculated codon bias data for bacterial RefSeq genomes. With the growth of bacterial genome sequence data since its release, a new tool was needed. Here we present the Dynamic Codon Biaser (DCB) tool, a web application that dynamically calculates the codon usage bias statistics of prokaryotic genomes. DCB bases these calculations on 40 different highly expressed genes (HEGs) that are highly conserved across prokaryotic species. A user can either specify an NCBI accession number or upload their own sequence. DCB returns both the bias statistics and the genome’s HEG sequences. These calculations have several downstream applications, such as evolutionary studies and phage–host predictions. The source code is freely available, and the website is hosted at www.cbdb.info.
Affiliation(s)
- Brian Dehlinger
- Bioinformatics Program, Loyola University Chicago, Chicago, IL 60660, USA
- Jared Jurss
- Bioinformatics Program, Loyola University Chicago, Chicago, IL 60660, USA
- Karson Lychuk
- Bioinformatics Program, Loyola University Chicago, Chicago, IL 60660, USA
- Catherine Putonti
- Bioinformatics Program, Loyola University Chicago, Chicago, IL 60660, USA
- Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
- Department of Computer Science, Loyola University Chicago, Chicago, IL 60660, USA
- Department of Microbiology and Immunology, Loyola University Chicago, Stritch School of Medicine, Maywood, IL 60153, USA
- *Correspondence: Catherine Putonti,
4
Thompson JD, Ripp R, Mayer C, Poch O, Michel CJ. Potential role of the X circular code in the regulation of gene expression. Biosystems 2021; 203:104368. [PMID: 33567309 DOI: 10.1016/j.biosystems.2021.104368] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 02/06/2023]
Abstract
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. series of codons belonging to X, called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code that are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examine recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density d_s(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability.
Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
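The density d_s(X) defined in the abstract is simple to compute once X motifs are located. Here is a minimal Python sketch, assuming (i) the 20-codon set commonly listed as the X circular code, which should be checked against the original definition, and (ii) that an X motif is a maximal in-frame run of at least four consecutive X codons; the minimum run length is a parameter precisely because it is an assumption. Function names are hypothetical.

```python
# The 20 trinucleotides commonly listed as the X circular code
# (assumption: verify against the original definition before use).
X_CODE = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def x_motifs(cds, min_codons=4):
    """Return (start, length_in_codons) for each maximal in-frame run of X codons.

    Runs shorter than min_codons are ignored; min_codons=4 is an assumption.
    """
    cds = cds.upper()
    motifs, start, run = [], 0, 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in X_CODE:
            if run == 0:
                start = i
            run += 1
        else:
            if run >= min_codons:
                motifs.append((start, run))
            run = 0
    if run >= min_codons:  # a run that reaches the end of the sequence
        motifs.append((start, run))
    return motifs


def x_motif_density(cds, min_codons=4):
    """d_s(X): number of X motifs per kilobase of the sequence s."""
    if not cds:
        return 0.0
    return len(x_motifs(cds, min_codons)) / (len(cds) / 1000)
```

With a 33-nt sequence made of four AAC codons, two AAA codons and five GAA codons, `x_motifs` finds two motifs, so the density is 2 / 0.033 kb, about 60.6 motifs per kb.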
Affiliation(s)
- Julie D Thompson
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Raymond Ripp
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Claudine Mayer
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France; Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724, Paris Cedex 15, France; Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France.
- Olivier Poch
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Christian J Michel
- Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
5
Samatova E, Daberger J, Liutkute M, Rodnina MV. Translational Control by Ribosome Pausing in Bacteria: How a Non-uniform Pace of Translation Affects Protein Production and Folding. Front Microbiol 2021; 11:619430. [PMID: 33505387 PMCID: PMC7829197 DOI: 10.3389/fmicb.2020.619430] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 10/20/2020] [Accepted: 12/11/2020] [Indexed: 11/23/2022] [Open access]
Abstract
Protein homeostasis of bacterial cells is maintained by coordinated processes of protein production, folding, and degradation. Translational efficiency of a given mRNA depends on how often the ribosomes initiate synthesis of a new polypeptide and how quickly they read the coding sequence to produce a full-length protein. The pace of ribosomes along the mRNA is not uniform: periods of rapid synthesis are separated by pauses. Here, we summarize recent evidence on how ribosome pausing affects translational efficiency and protein folding. We discuss the factors that slow down translation elongation and affect the quality of the newly synthesized protein. Ribosome pausing emerges as an important factor contributing to the regulatory programs that ensure the quality of the proteome and integrate cellular and environmental cues into the regulatory circuits of the cell.
Affiliation(s)
- Ekaterina Samatova
- Department of Physical Biochemistry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Jan Daberger
- Department of Physical Biochemistry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Marija Liutkute
- Department of Physical Biochemistry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Marina V Rodnina
- Department of Physical Biochemistry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
6
Oldfield CJ, Peng Z, Uversky VN, Kurgan L. Codon selection reduces GC content bias in nucleic acids encoding for intrinsically disordered proteins. Cell Mol Life Sci 2020; 77:149-160. [PMID: 31175370 PMCID: PMC11104855 DOI: 10.1007/s00018-019-03166-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/03/2019] [Revised: 05/14/2019] [Accepted: 05/28/2019] [Indexed: 02/06/2023]
Abstract
Protein-coding nucleic acids exhibit composition and codon biases between sequences coding for intrinsically disordered regions (IDRs) and those coding for structured regions. IDRs are protein regions that cannot fold on their own and that function without the prerequisite of a folded structure. Several authors have investigated composition bias or codon selection in regions encoding IDRs, primarily in Eukaryota, and concluded that elevated GC content is the result of the biased amino acid composition of IDRs. We substantively extend previous work by examining GC content in regions encoding IDRs from 44 species in Eukaryota, Archaea, and Bacteria, spanning a wide range of GC content. We confirm that regions coding for IDRs show a significantly elevated GC content across all domains of life. Although this is largely attributable to the amino acid composition bias of IDRs, we show that this bias is independent of the overall GC content and, most importantly, we are the first to observe that the GC content bias in IDRs differs significantly from that expected from IDR amino acid composition alone. We empirically find compensatory codon selection that reduces the observed GC content bias in IDRs. This selection depends on the overall GC content of the organism. The codon selection bias manifests as the use of infrequent, AT-rich codons in encoding IDRs. Further, we find these relationships to be independent of the intrinsic disorder prediction method used and of estimated translation efficiency. These observations are consistent with previous work, and we speculate on whether the observed biases are causal or symptomatic of other driving forces.
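The central comparison in the abstract is between the observed GC content of IDR-encoding regions and the GC content expected from their amino acid composition alone. Here is a minimal Python sketch of that contrast, assuming a deliberately simple null model in which each amino acid is encoded by its synonymous codons with equal probability; the authors' expectation, which depends on the organism's overall GC content, is not reproduced, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, stop codons excluded.
SYNONYMS = {}
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def gc_fraction(s):
    return sum(base in "GC" for base in s) / len(s)


def observed_vs_expected_gc(cds):
    """Return (observed GC, expected GC under uniform synonymous codon choice)."""
    cds = cds.upper()
    cs = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    cs = [c for c in cs if CODE.get(c, "*") != "*"]  # drop stops and ambiguous codons
    if not cs:
        raise ValueError("no sense codons in sequence")
    observed = sum(gc_fraction(c) for c in cs) / len(cs)
    # For each position, average the GC content of all codons for that amino acid.
    expected = sum(
        sum(gc_fraction(s) for s in SYNONYMS[CODE[c]]) / len(SYNONYMS[CODE[c]])
        for c in cs
    ) / len(cs)
    return observed, expected
```

For `GCCGCC` (two alanines), the observed GC is 1.0, while the uniform-choice expectation over GCT, GCC, GCA and GCG is 5/6; a positive difference means the actual codons are GC-richer than the amino acids alone would imply.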
Affiliation(s)
- Christopher J Oldfield
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA.
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA.
7
Khrustalev VV, Khrustaleva TA, Stojarov AN, Sharma N, Bhaskar B, Giri R. The history of mutational pressure changes during the evolution of adeno-associated viruses: A message to gene therapy and DNA-vaccine vectors designers. Infect Genet Evol 2019; 77:104100. [PMID: 31678645 DOI: 10.1016/j.meegid.2019.104100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/11/2019] [Revised: 08/25/2019] [Accepted: 10/29/2019] [Indexed: 10/25/2022]
Abstract
Virus-associated vectors have emerged as safe and effective delivery systems for gene therapy and vaccination. Like all genetic material, these vehicles are prone to spontaneous mutations. To understand what types of nucleotide mutations are expected in the vector, one needs to know the distinct characteristics of the mutational process in the corresponding virus. In this study we analyzed mutational pressure directions along the length of the genomes of all types of primate adeno-associated viruses (AAV) that are frequently used in gene therapy or DNA vaccines. We observed clear evidence of transcription-associated mutational pressure in AAV: nucleotide usage biases change drastically after each of the three promoters, and the higher the rate of transcription, the stronger the bias toward GC-to-AT mutations. Moreover, G usage decreased at a lower transcription rate (after the P19 promoter) than C usage did (after the P40 promoter). Since nucleotide usage biases are retrospective indices, we created a scenario of changes in the transcriptional map during AAV evolution. Current mutational pressure directions differ among AAV types, while all of them demonstrate high rates of T-to-C transitions in the second long ORF. Since transcription rate and cell tropism are the main factors determining the preferred direction of nucleotide mutations in AAV, mutational pressure should be checked experimentally in DNA vectors before their final design, with the aim of making the transferred gene more stable against those mutations.
Affiliation(s)
- Tatyana Aleksandrovna Khrustaleva
- Biochemical Group of Multidisciplinary Diagnostic Laboratory, Institute of Physiology of the National Academy of Sciences of Belarus, Minsk, Belarus
- Nitin Sharma
- School of Basic Sciences, Indian Institute of Technology Mandi, Himachal Pradesh 175005, India
- Bhaskar Bhaskar
- School of Basic Sciences, Indian Institute of Technology Mandi, Himachal Pradesh 175005, India
- Rajanish Giri
- School of Basic Sciences, Indian Institute of Technology Mandi, Himachal Pradesh 175005, India; BioX Centre, Indian Institute of Technology Mandi, VPO Kamand, 175005, India
8
Zeng Z, Bromberg Y. Predicting Functional Effects of Synonymous Variants: A Systematic Review and Perspectives. Front Genet 2019; 10:914. [PMID: 31649718 PMCID: PMC6791167 DOI: 10.3389/fgene.2019.00914] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 06/28/2019] [Accepted: 08/29/2019] [Indexed: 12/13/2022] [Open access]
Abstract
Recent advances in high-throughput experimentation have put the exploration of genome sequences at the forefront of precision medicine. In an effort to interpret sequencing data, numerous computational methods have been developed for evaluating the effects of genome variants. Interestingly, although every person carries as many synonymous single nucleotide variants (sSNVs) as non-synonymous ones, our ability to predict their effects is limited. The paucity of experimentally tested sSNV effects appears to be the limiting factor in the development of such methods. Here, we summarize the details and evaluate the performance of nine existing computational methods capable of predicting sSNV effects. We used a set of observed and artificially generated variants to approximate large-scale performance expectations of these tools. We note that the distribution of these variants across amino acid and codon types suggests that purifying selection keeps generated variants out of the observed set; i.e., we expect the generated set to be enriched for deleterious variants. Closer inspection of the relationship between the observed variant frequencies and the associated prediction scores identifies predictor-specific scoring thresholds for reliable effect predictions. Notably, across all predictors, the variants scoring above these thresholds were significantly more often generated than observed, which confirms our assumption that the generated set is enriched for deleterious variants. Finally, we find that while the methods differ in their ability to identify severe sSNV effects, no predictor appears capable of definitively recognizing subtle effects of such variants on a large scale.
Affiliation(s)
- Zishuo Zeng
- Institute for Quantitative Biomedicine, Rutgers University, Piscataway, NJ, United States
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Department of Genetics, Rutgers University, Human Genetics Institute, Piscataway, NJ, United States
9
Comprehensive profiling of codon usage signatures and codon context variations in the genus Ustilago. World J Microbiol Biotechnol 2019; 35:118. [DOI: 10.1007/s11274-019-2693-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/03/2018] [Accepted: 07/07/2019] [Indexed: 02/02/2023]
10
Diambra LA. Differential bicodon usage in lowly and highly abundant proteins. PeerJ 2017; 5:e3081. [PMID: 28289571 PMCID: PMC5346287 DOI: 10.7717/peerj.3081] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 11/01/2016] [Accepted: 02/10/2017] [Indexed: 01/23/2023] [Open access]
Abstract
Degeneracy in the genetic code implies that different codons can encode the same amino acid. Usage preference of synonymous codons has been observed in all domains of life. There is much evidence suggesting that this bias plays a major role in protein elongation rate, contributing to differential expression and to co-translational folding. In addition to codon usage bias, other preference variations have been observed, such as codon pair preferences. In this paper, I report that codon pairs have significantly different usage frequencies in sequences coding for lowly versus highly abundant proteins. These usage preferences cannot be explained by the usage frequencies of the single codons. The statistical analysis of coding sequences of nine organisms reveals that in many cases bicodon preferences are shared between related organisms. Furthermore, misfolding of the drug-transport protein encoded by the MDR1 gene is better explained by a large change in pause propensity due to the synonymous bicodon variant than by a relatively small change in codon usage. These findings suggest that codon pair usage can be a more powerful framework for understanding translation elongation rate and protein folding efficiency, and for improving protocols to optimize heterologous gene expression.
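One standard way to test whether codon pair usage is explained by single-codon usage is the codon pair score (CPS) of Coleman et al. (2008): the natural log of observed over expected pair counts, where the expectation assumes the two codons are chosen independently given the encoded amino-acid pair. This Python sketch uses CPS for illustration only; it is not necessarily the statistic used in the paper above, and the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_pair_scores(cds_list):
    """CPS(A, B) = ln(N_AB / (N_A * N_B / (N_X * N_Y) * N_XY)) for every observed pair.

    A and B are codons encoding amino acids X and Y; all counts are pooled over
    the in-frame codons of cds_list. Unobserved pairs are omitted (ln 0 is undefined).
    """
    codon_n, aa_n = Counter(), Counter()
    pair_n, aa_pair_n = Counter(), Counter()
    for cds in cds_list:
        cds = cds.upper()
        cs = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
        for c in cs:
            if c in CODE:
                codon_n[c] += 1
                aa_n[CODE[c]] += 1
        for a, b in zip(cs, cs[1:]):
            if a in CODE and b in CODE:  # skip pairs touching ambiguous codons
                pair_n[a, b] += 1
                aa_pair_n[CODE[a], CODE[b]] += 1
    scores = {}
    for (a, b), observed in pair_n.items():
        x, y = CODE[a], CODE[b]
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[x, y]
        scores[a, b] = math.log(observed / expected)
    return scores
```

Scores far from zero mark pairs whose usage is not explained by the frequencies of their constituent codons; comparing scores between gene sets (e.g. lowly versus highly abundant proteins) is one way to look for the kind of differences described above.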
Affiliation(s)
- Luis A. Diambra
- Centro Regional de Estudios Genómicos, Universidad Nacional de La Plata, CONICET, La Plata, Argentina
11
Khrustalev VV, Khrustaleva TA, Sharma N, Giri R. Mutational Pressure in Zika Virus: Local ADAR-Editing Areas Associated with Pauses in Translation and Replication. Front Cell Infect Microbiol 2017; 7:44. [PMID: 28275585 PMCID: PMC5319961 DOI: 10.3389/fcimb.2017.00044] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 11/15/2016] [Accepted: 02/07/2017] [Indexed: 12/21/2022] [Open access]
Abstract
The spread of Zika virus (ZIKV) led to the recent public health emergency of international concern. Understanding variation in the virus is of utmost importance. Using available complete sequences of ZIKV, we estimated directions of mutational pressure along the length of consensus sequences of three lineages of the virus. The results showed that guanine usage is growing in the ZIKV RNA plus strand due to adenine-to-guanine transitions, while adenine usage is growing due to cytosine-to-adenine transversions. Especially high levels of guanine have been found in two-fold degenerate sites of certain areas of the RNA plus strand with a high amount of secondary structure. The usage of cytosine in two-fold degenerate sites shows direct dependence on the amount of secondary structure along 32% (consensus sequence of epidemic strains) to 52% (consensus sequence of the East African ZIKV lineage) of the length of the RNA minus strand. These facts are evidence of ADAR editing of both strands of the ZIKV genome during pauses in replication. The RNA plus strand can also be edited by ADAR during pauses in translation caused by the appearance of groups of rare codons. According to our results, the RNA minus strand of the epidemic ZIKV strain has fewer points at which the polymerase can be stalled (allowing ADAR editing) than other strains. These data on the preferred directions of mutational pressure in the epidemic ZIKV strain are useful for future vaccine development and for understanding the evolution of new strains.
Affiliation(s)
- Tatyana A Khrustaleva
- Laboratory of Cellular Technologies, Institute of Physiology of the National Academy of Sciences of Belarus, Minsk, Belarus
- Nitin Sharma
- School of Basic Sciences, Indian Institute of Technology Mandi, Mandi, India
- Rajanish Giri
- School of Basic Sciences, Indian Institute of Technology Mandi, Mandi, India
12
Roy A, Mukhopadhyay S, Sarkar I, Sen A. Comparative investigation of the various determinants that influence the codon and amino acid usage patterns in the genus Bifidobacterium. World J Microbiol Biotechnol 2015; 31:959-81. [DOI: 10.1007/s11274-015-1850-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 01/31/2015] [Accepted: 03/31/2015] [Indexed: 12/31/2022]
13
Abstract
Owing to the degeneracy of the genetic code, a protein sequence can be encoded by many different synonymous mRNA coding sequences. Synonymous codon usage was once thought to be functionally neutral, but evidence now indicates it is shaped by evolutionary selection and affects other aspects of protein biogenesis beyond specifying the amino acid sequence of the protein. Synonymous rare codons, once thought to have only negative impacts on the speed and accuracy of translation, are now known to play an important role in diverse functions, including regulation of cotranslational folding, covalent modifications, secretion, and expression level. Mutations altering synonymous codon usage are linked to human diseases. However, much remains unknown about the molecular mechanisms connecting synonymous codon usage to efficient protein biogenesis and proper cell physiology. Here we review recent literature on the functional effects of codon usage, including bioinformatics approaches aimed at identifying general roles for synonymous codon usage.
14
O'Neill PK, Or M, Erill I. scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes. PLoS One 2013; 8:e76177. [PMID: 24116094 PMCID: PMC3792112 DOI: 10.1371/journal.pone.0076177] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/21/2013] [Accepted: 08/23/2013] [Indexed: 12/04/2022] [Open access]
Abstract
Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate- and fast-growing bacteria and provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate- and fast-growing bacteria, hinting at selection for a universal pattern of gene expression that is detectable in conserved patterns of codon usage bias.
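The self-consistent procedure behind scnRCA is not reproduced here. As a simpler point of reference, here is a minimal Python sketch of the earlier relative codon adaptation (RCA) index from the same group (Fox and Erill, 2010), in which a codon's frequency in a reference gene set is divided by the product of the nucleotide frequencies at each codon position, so that codons common merely because of base composition are not rewarded. Our reading of the formula, the pseudocount scheme and the function names are assumptions.

```python
import math
from collections import Counter
from itertools import product

ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def rca_weights(reference_seqs, pseudocount=1.0):
    """rca(xyz) = f(xyz) / (f1(x) * f2(y) * f3(z)) from a reference gene set.

    f is the codon frequency and f1, f2, f3 are nucleotide frequencies at codon
    positions 1-3. Pseudocounts (an assumption here) keep weights positive.
    """
    cs = [c for s in reference_seqs for c in codons(s) if set(c) <= set("ACGT")]
    if not cs:
        raise ValueError("reference set contains no complete codons")
    n = len(cs)
    codon_n = Counter(cs)
    pos_n = [Counter(c[k] for c in cs) for k in range(3)]
    weights = {}
    for c in ALL_CODONS:
        f_codon = (codon_n[c] + pseudocount) / (n + 64 * pseudocount)
        background = 1.0
        for k in range(3):
            background *= (pos_n[k][c[k]] + pseudocount) / (n + 4 * pseudocount)
        # With pseudocount 0, a codon whose bases never occur at some position
        # has zero background (and zero frequency); give it weight 0.
        weights[c] = f_codon / background if background > 0 else 0.0
    return weights


def rca(seq, weights):
    """Geometric mean of rca weights over the codons of seq.

    Every codon used must have a positive weight (use pseudocount > 0 or a
    reference set that covers those codons).
    """
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With the reference set `["GCTGCA"]` and no pseudocount, GCT and GCA both get weight 1.0: their usage is fully explained by the base frequencies at each codon position, which is the compositional correction that RCA builds in.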
Affiliation(s)
- Patrick K. O'Neill
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
- Mindy Or
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
- * E-mail: