1
|
Mou QH, Hu Z, Zhang J, Daroch M, Tang J. Comparative genomics of thermosynechococcaceae and thermostichaceae: insights into codon usage bias. Acta Biochim Pol 2025; 71:13825. [PMID: 39845100 PMCID: PMC11750575 DOI: 10.3389/abp.2024.13825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Accepted: 12/20/2024] [Indexed: 01/24/2025]
Abstract
Members of the families Thermosynechococcaceae and Thermostichaceae are well-known unicellular thermophilic cyanobacteria and a non-thermophilic genus Pseudocalidococcus was newly classified into the former. Analysis of the codon usage bias (CUB) of cyanobacterial species inhabiting different thermal and non-thermal niches will benefit the understanding of their genetic and evolutionary characteristics. Herein, the CUB and codon context patterns of protein-coding genes were systematically analyzed and compared between members of the two families. Overall, the nucleotide composition and CUB indices were found to differ between thermophiles and non-thermophiles. The thermophiles showed a higher G/C content in the codon base composition and tended to end with G/C compared to the non-thermophiles. Correlation analysis indicated significant associations between codon base composition and CUB indices. The results of the effective number of codons, parity-rule 2, neutral and correspondence analyses indicated that mutational pressure and natural selection primarily account for CUB in these cyanobacterial species, but the primary driving forces exhibit variation among genera. Moreover, the optimal codons identified based on relative synonymous codon usage values were found to differ among genera and even within genera. In addition, codon context pattern analysis revealed the specificity of the sequence context of start and stop codons among genera. Intriguingly, the clustering of codon context patterns appeared to be more related to thermotolerance than to phylogenomic relationships. In conclusion, this study facilitates the understanding of the characteristics and sources of variation of CUB and the evolution of the surveyed cyanobacterial clades with different thermotolerance and provides insights into their adaptation to different environments.
Affiliation(s)
- Qiao-Hui Mou
- School of Food and Bioengineering, Chengdu University, Chengdu, China
| | - Zhe Hu
- School of Food and Bioengineering, Chengdu University, Chengdu, China
| | - Jing Zhang
- Food Safety Detection Key Laboratory of Sichuan, Technical Center of Chengdu Customs, Chengdu, China
| | - Maurycy Daroch
- School of Environment and Energy, Peking University Shenzhen Graduate School, Shenzhen, China
| | - Jie Tang
- School of Food and Bioengineering, Chengdu University, Chengdu, China
|
2
|
Fang J, Hu Y, Hu Z. Comparative analysis of codon usage patterns in 16 chloroplast genomes of suborder Halimedineae. BMC Genomics 2024; 25:945. [PMID: 39379800 PMCID: PMC11459826 DOI: 10.1186/s12864-024-10825-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 09/23/2024] [Indexed: 10/10/2024]
Abstract
The Halimedineae are marine green macroalgae that play crucial roles as primary producers in various habitats, including coral reefs, rocky shores, embayments, lagoons, and seagrass beds. Several tropical species have calcified thalli, which contribute significantly to the formation of coral reefs. In this study, we investigated the codon usage patterns and the main factors influencing codon usage bias in 16 chloroplast genomes of the suborder Halimedineae. Nucleotide composition analysis revealed that the codons of these species were enriched in A/U bases and preferred to end in A/U bases, and the distribution of GC content followed a trend of GC1 > GC2 > GC3. Thirty optimal codons encoding 17 amino acids were identified, and most of the optimal codons and all of the over-expressed codons preferentially ended with A/U. The neutrality plot, effective number of codons (ENc) plot, and parity rule 2 (PR2) plot analyses indicated that natural selection played a major role in shaping the codon usage bias of most Halimedineae species. The genetic relationships based on their RSCU values and chloroplast protein-coding genes showed that closely related species have similar codon usage patterns. This study describes, for the first time, the codon usage patterns and characterization of Halimedineae chloroplast genomes, and provides new insights into the evolution of this suborder.
Affiliation(s)
- Jiao Fang
- Wuhan Institute of Biomedical Sciences, School of Medicine, Jianghan University, Wuhan, Hubei, China.
| | - Yuquan Hu
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, College of Life Science, Jianghan University, Wuhan, Hubei, China
| | - Zhangfeng Hu
- Wuhan Institute of Biomedical Sciences, School of Medicine, Jianghan University, Wuhan, Hubei, China.
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, College of Life Science, Jianghan University, Wuhan, Hubei, China.
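The GC1 > GC2 > GC3 trend reported in the abstract above rests on position-specific GC fractions. The following is a minimal sketch of that calculation, not the authors' pipeline. It assumes an in-frame coding sequence supplied as a plain string; the function name `gc_by_position` is hypothetical.

```python
def gc_by_position(cds: str) -> tuple[float, float, float]:
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1; any trailing
    partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence contains no complete codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the 0-based codon position
    return tuple(c / n_codons for c in counts)


if __name__ == "__main__":
    gc1, gc2, gc3 = gc_by_position("ATGGCCAAA")
    print(f"GC1={gc1:.3f} GC2={gc2:.3f} GC3={gc3:.3f}")
```

Averaging these fractions over all protein-coding genes of a genome would give genome-level values comparable to those discussed in the abstract.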
|
3
|
Panda A, Tuller T. Determinants of associations between codon and amino acid usage patterns of microbial communities and the environment inferred based on a cross-biome metagenomic analysis. NPJ Biofilms Microbiomes 2023; 9:5. [PMID: 36693851 PMCID: PMC9873608 DOI: 10.1038/s41522-023-00372-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 01/11/2023] [Indexed: 01/25/2023]
Abstract
Codon and amino acid usage were associated with almost every aspect of microbial life. However, how the environment may impact the codon and amino acid choice of microbial communities at the habitat level is not clearly understood. Therefore, in this study, we analyzed codon and amino acid usage patterns of a large number of environmental samples collected from diverse ecological niches. Our results suggested that samples derived from similar environmental niches, in general, show overall similar codon and amino acid distribution as compared to samples from other habitats. To substantiate the relative impact of the environment, we considered several factors, such as their similarity in GC content, or in functional or taxonomic abundance. Our analysis demonstrated that none of these factors can fully explain the trends that we observed at the codon or amino acid level, implying a direct environmental influence on them. Further, our analysis demonstrated different levels of selection on codon bias in different microbial communities, with the highest bias in host-associated environments such as the digestive system or oral samples and the lowest level of selection in soil and water samples. Considering a large number of metagenomic samples, here we showed that microorganisms collected from similar environmental backgrounds exhibit similar patterns of codon and amino acid usage irrespective of the location or time from where the samples were collected. Thus our study suggested a direct impact of the environment on codon and amino acid usage of microorganisms that cannot be explained considering the influence of other factors.
Affiliation(s)
- Arup Panda
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Tamir Tuller
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, 69978, Israel.
|
4
|
Sophiarani Y, Chakraborty S. Comparison of compositional constraints: Nuclear genome vs plasmid genome of Pseudomonas syringae pv. tomato DC3000. J Biosci 2022. [DOI: 10.1007/s12038-022-00296-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
5
|
Masłowska-Górnicz A, van den Bosch MRM, Saccenti E, Suarez-Diez M. A large-scale analysis of codon usage bias in 4868 bacterial genomes shows association of codon adaptation index with GC content, protein functional domains and bacterial phenotypes. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2022; 1865:194826. [PMID: 35605953 DOI: 10.1016/j.bbagrm.2022.194826] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Revised: 05/05/2022] [Accepted: 05/12/2022] [Indexed: 06/15/2023]
Abstract
Multiple synonymous codons code for the same amino acid, resulting in the degeneracy of the genetic code and in the preferred use of some codons, called codon bias usage (CBU). We performed a large-scale analysis of codon usage bias, analysing the distribution of the codon adaptation index (CAI) and the codon relative adaptiveness index (RA) in 4868 bacterial genomes. We found that CAI values differ significantly between protein functional domains and parts of the protein outside domains, and show how CAI, GC content and preferred usage of polymerase III alpha subunits are related. Additionally, we give evidence of the association between CAI and bacterial phenotypes.
Affiliation(s)
- Anna Masłowska-Górnicz
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands
| | - Melanie R M van den Bosch
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands
| | - Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands.
| | - Maria Suarez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, the Netherlands.
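The codon adaptation index (CAI) named in the abstract above is conventionally defined as the geometric mean of per-codon relative adaptiveness weights. The sketch below illustrates that definition only; it is not the cited study's implementation. It assumes the weights have already been derived from a reference gene set, and the function name `cai` and the example weights are hypothetical.

```python
import math


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of relative adaptiveness weights over a coding sequence.

    Assumptions: `weights` maps codons (DNA alphabet) to values in (0, 1];
    codons missing from `weights` or with a non-positive weight are skipped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0.0) > 0.0]
    if not logs:
        raise ValueError("no codon in the sequence has a usable weight")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical weights for two amino acids (Ala and Lys), for illustration.
    example_weights = {"GCT": 1.0, "GCC": 0.5, "AAA": 1.0, "AAG": 0.25}
    print(round(cai("GCTGCCAAG", example_weights), 3))
```

Working in log space avoids numerical underflow when the product runs over the thousands of codons in a long gene.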
|
6
|
Tikhomirova TS, Matyunin MA, Lobanov MY, Galzitskaya OV. In-depth analysis of amino acid and nucleotide sequences of Hsp60: how conserved is this protein? Proteins 2021; 90:1119-1141. [PMID: 34964171 DOI: 10.1002/prot.26294] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/10/2021] [Revised: 12/21/2021] [Accepted: 12/23/2021] [Indexed: 11/07/2022]
Abstract
Chaperonin Hsp60, as a protein found in all organisms, is of great interest in medicine, since it is present in many tissues and can be used both as a drug and as an object of targeted therapy. Hence, Hsp60 deserves a fundamental comparative analysis to assess its evolutionary characteristics. It was found that the percent identity of Hsp60 amino acid sequences both within and between phyla was not high enough to identify Hsp60s as highly conserved proteins. However, their ATP binding sites are largely conserved. The amino acid composition of Hsp60s remained relatively constant. At the same time, the analysis of the nucleotide sequences showed that GC content in the Hsp60 genes was comparable to or greater than the genomic values, which may indicate a high resistance to mutations due to tight control of the nucleotide composition by DNA repair systems. Natural selection plays a dominant role in the evolution of Hsp60 genes. The degree of mutational pressure affecting the Hsp60 genes is quite low, and its direction does not depend on taxonomy. Interestingly, for the Hsp60 genes from Chordata, Arthropoda, and Proteobacteria the exact direction of mutational pressure could not be determined. However, upon further division into classes, it was found that the direction of the mutational pressure for Hsp60 genes from Fish differs from that for other chordates. The direction of the mutational pressure affects the synonymous codon usage bias. The number of high and low represented codons increases with increasing GC content, which can improve codon usage. A special server has been created for bioinformatics analysis of Hsp60: http://oka.protres.ru:4202/.
Affiliation(s)
- Tatyana S Tikhomirova
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Moscow Region, Russia
| | - Maxim A Matyunin
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
| | - Michail Yu Lobanov
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
| | - Oxana V Galzitskaya
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
|
7
|
Arella D, Dilucca M, Giansanti A. Codon usage bias and environmental adaptation in microbial organisms. Mol Genet Genomics 2021; 296:751-762. [PMID: 33818631 PMCID: PMC8144148 DOI: 10.1007/s00438-021-01771-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 05/14/2020] [Accepted: 02/22/2021] [Indexed: 01/01/2023]
Abstract
In each genome, synonymous codons are used with different frequencies; this general phenomenon is known as codon usage bias. It has been previously recognised that codon usage bias could affect the cellular fitness and might be associated with the ecology of microbial organisms. In this exploratory study, we investigated the relationship between codon usage bias, lifestyles (thermophiles vs. mesophiles; pathogenic vs. non-pathogenic; halophilic vs. non-halophilic; aerobic vs. anaerobic and facultative) and habitats (aquatic, terrestrial, host-associated, specialised, multiple) of 615 microbial organisms (544 bacteria and 71 archaea). Principal component analysis revealed that species with given phenotypic traits and living in similar environmental conditions have similar codon preferences, as represented by the relative synonymous codon usage (RSCU) index, and similar spectra of tRNA availability, as gauged by the tRNA gene copy number (tGCN). Moreover, by measuring the average tRNA adaptation index (tAI) for each genome, an index that can be associated with translational efficiency, we observed that organisms able to live in multiple habitats, including facultative organisms, mesophiles and pathogenic bacteria, are characterised by a reduced translational efficiency, consistently with their need to adapt to different environments. Our results show that synonymous codon choices might be under strong translational selection, which modulates the choice of the codons to differently match tRNA availability, depending on the organism's lifestyle needs. To our knowledge, this is the first large-scale study that examines the role of codon bias and translational efficiency in the adaptation of microbial organisms to the environment in which they live.
Affiliation(s)
- Davide Arella
- Department of Physics, Sapienza University of Rome, 00185, Rome, Italy.
| | - Maddalena Dilucca
- Department of Physics, Sapienza University of Rome, 00185, Rome, Italy
| | - Andrea Giansanti
- Department of Physics, Sapienza University of Rome, 00185, Rome, Italy
- INFN, Roma1 Unit, 00185, Rome, Italy
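The relative synonymous codon usage (RSCU) index mentioned in the abstract above compares each codon's observed count with the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of the standard definition, not the authors' code. It assumes the standard genetic code and an in-frame coding sequence; the function name `rscu` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table is needed for the organism of interest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU per sense codon for amino acids present in the sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by encoded amino acid.
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids that never occur
        for codon in family:
            # observed count divided by the count expected under equal usage
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCTGCCAAA").items()):
        print(codon, round(value, 3))
```

A value above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often.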
|
8
|
Bahiri-Elitzur S, Tuller T. Codon-based indices for modeling gene expression and transcript evolution. Comput Struct Biotechnol J 2021; 19:2646-2663. [PMID: 34025951 PMCID: PMC8122159 DOI: 10.1016/j.csbj.2021.04.042] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 02/14/2021] [Revised: 04/17/2021] [Accepted: 04/18/2021] [Indexed: 11/21/2022]
Abstract
Codon usage bias (CUB) refers to the phenomenon that synonymous codons are used at different frequencies in most genes and organisms. The general assumption is that codon biases reflect a balance between mutational biases and natural selection. Today we understand that the codon content is related to, and can affect, all gene expression steps. Starting from the 1980s, codon-based indices have been used for answering different questions in all biomedical fields, including systems biology, agriculture, medicine, and biotechnology. In general, codon usage bias indices weigh each codon or a small set of codons to estimate the fitting of a certain coding sequence to a certain phenomenon (e.g., bias in codons, adaptation to the tRNA pool, frequencies of certain codons, transcription elongation speed, etc.) and are usually easy to implement. Today there are dozens of such indices; thus, this paper aims to review and compare the different codon usage bias indices, their applications, and advantages. In addition, we perform an analysis that demonstrates that most indices tend to correlate even though they aim to capture different aspects. Due to the centrality of codon usage bias in different gene expression steps, it is important to keep developing new indices that can capture additional aspects that are not modeled with the current indices.
Affiliation(s)
| | - Tamir Tuller
- Department of Biomedical Engineering, Tel-Aviv University, Tel Aviv, Israel
- The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv, Israel
|
9
|
Barbhuiya PA, Uddin A, Chakraborty S. Understanding the codon usage patterns of mitochondrial CO genes among Amphibians. Gene 2021; 777:145462. [PMID: 33515725 DOI: 10.1016/j.gene.2021.145462] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/15/2020] [Revised: 12/18/2020] [Accepted: 01/20/2021] [Indexed: 11/17/2022]
Abstract
The unequal use of synonymous codons in coding sequences, known as codon usage bias (CUB), is a universal phenomenon observed in all forms of life. Mutation and natural selection drive CUB in many species, but the relative role of evolutionary forces varies across species, genes and genomes. We studied the CUB in mitochondrial (mt) CO genes from three orders of Amphibia using a bioinformatics approach, as no such work had been reported. We observed that CUB of mt CO genes of Amphibians was weak across different orders. Order Caudata had higher CUB, followed by Gymnophiona and Anura, for all genes, and CUB also varied across genes. Nucleotide composition analysis showed that CO genes were AT-rich. The AT content in Caudata was higher than that in Gymnophiona, while Anura showed the least content. Multiple investigations, namely nucleotide composition, correspondence analysis and parity plot analysis, showed that the interplay of mutation pressure and natural selection caused CUB in these genes. The neutrality plot suggested that the involvement of natural selection was greater than that of mutation pressure. The contribution of natural selection was higher in Anura than in Gymnophiona and the lowest in Caudata. The codons CGA, TGA, AAA were found to be highly favoured by nature across all genes and orders.
Affiliation(s)
- Parvin A Barbhuiya
- Department of Biotechnology, Assam University, Silchar 788150, Assam, India
| | - Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Algapur, Hailakandi 788150, Assam, India
| | - Supriyo Chakraborty
- Department of Biotechnology, Assam University, Silchar 788150, Assam, India.
|
10
|
Zwickl NF, Stralis-Pavese N, Schäffer C, Dohm JC, Himmelbauer H. Comparative genome characterization of the periodontal pathogen Tannerella forsythia. BMC Genomics 2020; 21:150. [PMID: 32046654 PMCID: PMC7014623 DOI: 10.1186/s12864-020-6535-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/18/2019] [Accepted: 01/23/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND Tannerella forsythia is a bacterial pathogen implicated in periodontal disease. Numerous virulence-associated T. forsythia genes have been described, however, it is necessary to expand the knowledge on T. forsythia's genome structure and genetic repertoire to further elucidate its role within pathogenesis. Tannerella sp. BU063, a putative periodontal health-associated sister taxon and closest known relative to T. forsythia is available for comparative analyses. In the past, strain confusion involving the T. forsythia reference type strain ATCC 43037 led to discrepancies between results obtained from in silico analyses and wet-lab experimentation. RESULTS We generated a substantially improved genome assembly of T. forsythia ATCC 43037 covering 99% of the genome in three sequences. Using annotated genomes of ten Tannerella strains we established a soft core genome encompassing 2108 genes, based on orthologs present in ≥ 80% of the strains analysed. We used a set of known and hypothetical virulence factors for comparisons in pathogenic strains and the putative periodontal health-associated isolate Tannerella sp. BU063 to identify candidate genes promoting T. forsythia's pathogenesis. Searching for pathogenicity islands we detected 38 candidate regions in the T. forsythia genome. Only four of these regions corresponded to previously described pathogenicity islands. While the general protein O-glycosylation gene cluster of T. forsythia ATCC 43037 has been described previously, genes required for the initiation of glycan synthesis are yet to be discovered. We found six putative glycosylation loci which were only partially conserved in other bacteria. Lastly, we performed a comparative analysis of translational bias in T. forsythia and Tannerella sp. BU063 and detected highly biased genes. CONCLUSIONS We provide resources and important information on the genomes of Tannerella strains. Comparative analyses enabled us to assess the suitability of T. forsythia virulence factors as therapeutic targets and to suggest novel putative virulence factors. Further, we report on gene loci that should be addressed in the context of elucidating T. forsythia's protein O-glycosylation pathway. In summary, our work paves the way for further molecular dissection of T. forsythia biology in general and virulence of this species in particular.
Affiliation(s)
- Nikolaus F. Zwickl
- Department of Biotechnology, Institute of Computational Biology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
| | - Nancy Stralis-Pavese
- Department of Biotechnology, Institute of Computational Biology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
| | - Christina Schäffer
- Department of NanoBiotechnology, NanoGlycobiology unit, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
| | - Juliane C. Dohm
- Department of Biotechnology, Institute of Computational Biology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
| | - Heinz Himmelbauer
- Department of Biotechnology, Institute of Computational Biology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
|
11
|
Pal A, Saha BK, Saha J. Comparative in silico analysis of ftsZ gene from different bacteria reveals the preference for core set of codons in coding sequence structuring and secondary structural elements determination. PLoS One 2019; 14:e0219231. [PMID: 31841523 PMCID: PMC6913975 DOI: 10.1371/journal.pone.0219231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/17/2019] [Accepted: 11/28/2019] [Indexed: 11/19/2022]
Abstract
The deluge of sequence information in recent times provides us with an excellent opportunity to compare organisms on a large genomic scale. In this study we have tried to decipher the variation in the gene organization and structuring of a vital bacterial gene called ftsZ, which codes for an integral component of bacterial cell division, the FtsZ protein. FtsZ is homologous to the tubulin protein and has been found to be ubiquitous in eubacteria. FtsZ is showing increasing promise as a target for antibacterial drug discovery. Our study of ftsZ from 143 different bacterial species spanning a wide range of morphological and physiological types demonstrates that the ftsZ gene of about ninety-three percent of the organisms shows a relatively biased codon usage profile and significant GC deviation from their genomic GC content. Comparative codon usage analysis of ftsZ and a core housekeeping gene, rpoB, demonstrated that the codon usage pattern of the ftsZ CDS is shaped by natural selection to a large extent and mimics that of a housekeeping gene. We have also detected a tendency among the different organisms to utilize a core set of codons in structuring the ftsZ coding sequence. We observed that the compositional frequency of the amino acid serine in the FtsZ protein appears to be an indicator of the bacterial lifestyle. Our meticulous analysis of the ftsZ gene linked with the corresponding FtsZ protein shows that there is a bias towards the use of specific synonymous codons, particularly in the helix and strand regions of the multi-domain FtsZ protein. Overall, our findings suggest that in an indispensable and vital protein such as FtsZ, there is an inherent tendency to maintain form for optimized performance in spite of the extrinsic variability in coding features.
Affiliation(s)
- Ayon Pal
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, India
| | - Barnan Kumar Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, India
| | - Jayanti Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj, West Bengal, India
|
12
|
Hart A, Cortés MP, Latorre M, Martinez S. Codon usage bias reveals genomic adaptations to environmental conditions in an acidophilic consortium. PLoS One 2018; 13:e0195869. [PMID: 29742107 PMCID: PMC5942774 DOI: 10.1371/journal.pone.0195869] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/21/2017] [Accepted: 03/30/2018] [Indexed: 11/20/2022]
Abstract
The analysis of codon usage bias has been widely used to characterize different communities of microorganisms. In this context, the aim of this work was to study the codon usage bias in a natural consortium of five acidophilic bacteria used for biomining. The codon usage bias of the consortium was contrasted with genes from an alternative collection of acidophilic reference strains and metagenome samples. Results indicate that acidophilic bacteria preferentially have low codon usage bias, consistent with both their capacity to live in a wide range of habitats and their slow growth rate, a characteristic probably acquired independently from their phylogenetic relationships. In addition, the analysis showed significant differences in the unique sets of genes from the autotrophic species of the consortium in relation to other acidophilic organisms, principally in genes which code for proteins involved in metal and oxidative stress resistance. The lower values of codon usage bias obtained in this unique set of genes suggest higher transcriptional adaptation to living in extreme conditions, which was probably acquired as a measure for resisting the elevated metal conditions present in the mine.
Affiliation(s)
- Andrew Hart
- UMI 2071 CNRS-UCHILE, Facultad de Ciencias Físicas y Matemáticas, Centro de Modelamiento Matemático, Universidad de Chile, Casilla 170, Correo 3, Santiago, Chile
| | - María Paz Cortés
- Mathomics, Centro de Modelamiento Matemático, Universidad de Chile, Santiago, Chile
- Fondap-Center of Genome Regulation, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
| | - Mauricio Latorre
- Mathomics, Centro de Modelamiento Matemático, Universidad de Chile, Santiago, Chile
- Fondap-Center of Genome Regulation, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Laboratorio de Bioinformática y Expresión Génica, INTA, Universidad de Chile, Macul, Santiago, Chile
- Universidad de O'Higgins, Instituto de Ciencias de la Ingeniería, Rancagua, Chile
| | - Servet Martinez
- Departamento de Ingeniería Matemática, UMI 2071 CNRS-UCHILE, Facultad de Ciencias Físicas y Matemáticas, Centro de Modelamiento Matemático, Universidad de Chile, Casilla 170, Correo 3, Santiago, Chile
|
13
|
de Freitas Nascimento J, Kelly S, Sunter J, Carrington M. Codon choice directs constitutive mRNA levels in trypanosomes. eLife 2018; 7:e32467. [PMID: 29543152 PMCID: PMC5896880 DOI: 10.7554/elife.32467] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/03/2017] [Accepted: 02/27/2018] [Indexed: 11/13/2022]
Abstract
Selective transcription of individual protein coding genes does not occur in trypanosomes and the cellular copy number of each mRNA must be determined post-transcriptionally. Here, we provide evidence that codon choice directs the levels of constitutively expressed mRNAs. First, a novel codon usage metric, the gene expression codon adaptation index (geCAI), was developed that maximised the relationship between codon choice and the measured abundance for a transcriptome. Second, geCAI predictions of mRNA levels were tested using differently coded GFP transgenes and were successful over a 25-fold range, similar to the variation in endogenous mRNAs. Third, translation was necessary for the accelerated mRNA turnover resulting from codon choice. Thus, in trypanosomes, the information determining the levels of most mRNAs resides in the open reading frame and translation is required to access this information.
Affiliation(s)
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
| | - Jack Sunter
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Mark Carrington
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
|
14
|
Sun Y, Tamarit D, Andersson SGE. Switches in Genomic GC Content Drive Shifts of Optimal Codons under Sustained Selection on Synonymous Sites. Genome Biol Evol 2018; 9:2560-2579. [PMID: 27540085 PMCID: PMC5629928 DOI: 10.1093/gbe/evw201] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Accepted: 08/12/2016] [Indexed: 12/16/2022]
Abstract
The major codon preference model suggests that codons read by tRNAs in high concentrations are preferentially utilized in highly expressed genes. However, the identity of the optimal codons differs between species although the forces driving such changes are poorly understood. We suggest that these questions can be tackled by placing codon usage studies in a phylogenetic framework and that bacterial genomes with extreme nucleotide composition biases provide informative model systems. Switches in the background substitution biases from GC to AT have occurred in Gardnerella vaginalis (GC = 32%), and from AT to GC in Lactobacillus delbrueckii (GC = 62%) and Lactobacillus fermentum (GC = 63%). We show that despite the large effects on codon usage patterns by these switches, all three species evolve under selection on synonymous sites. In G. vaginalis, the dramatic codon frequency changes coincide with shifts of optimal codons. In contrast, the optimal codons have not shifted in the two Lactobacillus genomes despite an increased fraction of GC-ending codons. We suggest that all three species are in different phases of an on-going shift of optimal codons, and attribute the difference to a stronger background substitution bias and/or longer time since the switch in G. vaginalis. We show that comparative and correlative methods for optimal codon identification yield conflicting results for genomes in flux and discuss possible reasons for the mispredictions. We conclude that switches in the direction of the background substitution biases can drive major shifts in codon preference patterns even under sustained selection on synonymous codon sites.
Affiliation(s)
- Yu Sun
- Department of Molecular Evolution, Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Daniel Tamarit
- Department of Molecular Evolution, Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Siv G E Andersson
- Department of Molecular Evolution, Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
|
15
|
Das S, Chottopadhyay B, Sahoo S. Comparative Analysis of Predicted Gene Expression among Crenarchaeal Genomes. Genomics Inform 2017; 15:38-47. [PMID: 28416948 PMCID: PMC5389947 DOI: 10.5808/gi.2017.15.1.38] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/16/2016] [Revised: 11/28/2016] [Accepted: 01/26/2017] [Indexed: 12/13/2022]
Abstract
Research into new methods for identifying highly expressed genes in anonymous genome sequences has been going on for more than 15 years. Here we present an alternative approach, based on a modified score of relative codon usage bias, to identify highly expressed genes in crenarchaeal genomes. The proposed algorithm relies exclusively on sequence features to identify the highly expressed genes. In this study, a comparative analysis of predicted highly expressed genes in five crenarchaeal genomes was performed using the Modified Relative Codon Bias Strength (MRCBS) score as a numerical estimator of gene expression level. We found a systematic strong correlation between the Codon Adaptation Index and MRCBS. Additionally, MRCBS correlated well with other expression measures. Our study indicates that MRCBS can consistently capture the highly expressed genes.
Affiliation(s)
- Shibsankar Das
- Department of Mathematics, Uluberia College, Uluberia 711315, India
|
16
|
Lal D, Verma M, Behura SK, Lal R. Codon usage bias in phylum Actinobacteria: relevance to environmental adaptation and host pathogenicity. Res Microbiol 2016; 167:669-677. [DOI: 10.1016/j.resmic.2016.06.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/02/2015] [Revised: 06/08/2016] [Accepted: 06/10/2016] [Indexed: 10/21/2022]
|
17
|
Genome-Wide Analysis of the Synonymous Codon Usage Patterns in Riemerella anatipestifer. Int J Mol Sci 2016; 17:ijms17081304. [PMID: 27517915 PMCID: PMC5000701 DOI: 10.3390/ijms17081304] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 06/25/2016] [Revised: 07/31/2016] [Accepted: 08/02/2016] [Indexed: 11/17/2022]
Abstract
Riemerella anatipestifer (RA) belongs to the Flavobacteriaceae family and can cause septicemic disease in poultry. The synonymous codon usage patterns of bacteria reflect a series of evolutionary changes that enable them to improve their tolerance of various environments. We detailed the codon usage patterns of RA isolates from the 12 available sequenced genomes using multiple codon usage and statistical analyses. Nucleotide composition and relative synonymous codon usage (RSCU) analyses revealed that A- or U-ending codons are predominant in RA. Neutrality analysis found no significant correlation between GC12 and GC3 (p > 0.05). Correspondence analysis and ENc-plot results showed that natural selection dominated over mutation in shaping the codon usage bias. The cluster analysis tree based on RSCU was concordant with the dendrogram based on genomic BLAST by the neighbor-joining method. By comparative analysis, about 50 highly expressed genes that were orthologous across all 12 strains were found among the top 5% of genes ranked by CAI value. Based on these CAI values, we infer that RA contains a number of predicted highly expressed coding sequences, involved in transcriptional regulation and metabolism, reflecting their requirement for dealing with diverse environmental conditions. These results provide some useful information on the mechanisms that contribute to codon usage bias and evolution of RA.
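The neutrality analysis mentioned in this abstract compares G+C content at the first two codon positions (GC12) with that at the third position (GC3). A minimal Python sketch of the per-gene calculation is given here for orientation only; the function name `gc12_gc3` and the decision to skip codons containing ambiguous bases are assumptions made for illustration, not the authors' pipeline.

```python
def gc12_gc3(cds: str) -> tuple[float, float]:
    """Return (GC12, GC3) for an in-frame coding sequence.

    GC12 is the mean G+C fraction of codon positions 1 and 2;
    GC3 is the G+C fraction of codon position 3.
    """
    seq = cds.upper()
    # Split into complete codons; an incomplete trailing codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Assumption: codons with non-ACGT symbols are skipped.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    if not codons:
        raise ValueError("no complete unambiguous codons")

    def gc_at(pos: int) -> float:
        # Fraction of codons with G or C at the given position (0-based).
        return sum(c[pos] in "GC" for c in codons) / len(codons)

    return (gc_at(0) + gc_at(1)) / 2, gc_at(2)


print(gc12_gc3("ATGATC"))  # codons ATG, ATC
```

In a neutrality plot these two values are computed for every gene and GC12 is then regressed on GC3; the sketch stops at the per-gene step and makes no attempt to exclude start or stop codons.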
|
18
|
Pal A, Banerjee R, Mondal UK, Mukhopadhyay S, Bothra AK. Deconstruction of archaeal genome depict strategic consensus in core pathways coding sequence assembly. PLoS One 2015; 10:e0118245. [PMID: 25674789 PMCID: PMC4326414 DOI: 10.1371/journal.pone.0118245] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/30/2014] [Accepted: 01/06/2015] [Indexed: 11/18/2022]
Abstract
A comprehensive in silico analysis of 71 species representing the different taxonomic classes and physiological groups of the domain Archaea was performed. These organisms differed in their physiological attributes, particularly oxygen tolerance and energy metabolism. We explored the diversity and similarity in the codon usage pattern in the genes and genomes of these organisms, with emphasis on their core cellular pathways. Our aim was to determine whether there is any underlying similarity in the design of core pathways within these organisms. Analyses of codon utilization pattern, construction of hierarchical linear models of codon usage, expression pattern and codon pair preference indicated that in the archaea there is a trend towards biased use of synonymous codons in the core cellular pathways, and the Nc-plots appeared to display the physiological variations present within the different species. Our analyses revealed that aerobic species of archaea possessed a larger degree of freedom in regulating expression levels than could be accounted for by codon usage bias alone. This feature might be a consequence of their enhanced metabolic activities as a result of their adaptation to the relatively O2-rich environment. Species of archaea that are taxonomically related were found to have striking similarities in their ORF structuring pattern. In the anaerobic species of archaea, codon bias was found to be a major determinant of gene expression. We also detected a significant difference in the codon pair usage pattern between the whole genome and the genes related to vital cellular pathways, and it was not only species-specific but also pathway-specific. This hints at the structuring of ORFs with better decoding accuracy during translation. Finally, a codon-pathway interaction in shaping the codon design of pathways was observed, where the transcription pathway exhibited a significantly different coding frequency signature.
Affiliation(s)
- Ayon Pal
- Department of Botany, Raiganj College (University College), Raiganj, Uttar Dinajpur, West Bengal, India
- Rachana Banerjee
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India
- Uttam K Mondal
- Cheminformatics Bioinformatics Laboratory, Department of Chemistry, Raiganj College (University College), Raiganj, Uttar Dinajpur, West Bengal, India
- Subhasis Mukhopadhyay
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India
- Asim K Bothra
- Cheminformatics Bioinformatics Laboratory, Department of Chemistry, Raiganj College (University College), Raiganj, Uttar Dinajpur, West Bengal, India
|
19
|
A genome-wide identification of genes undergoing recombination and positive selection in Neisseria. Biomed Res Int 2014; 2014:815672. [PMID: 25180194 PMCID: PMC4142384 DOI: 10.1155/2014/815672] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/02/2014] [Revised: 07/18/2014] [Accepted: 07/18/2014] [Indexed: 01/01/2023]
Abstract
Currently, there is particular interest in the molecular mechanisms of adaptive evolution in bacteria. Neisseria is a genus of Gram-negative bacteria, and there has recently been considerable focus on its two human pathogenic species, N. meningitidis and N. gonorrhoeae. Until now, no genome-wide studies have attempted to scan for the genes related to adaptive evolution. For this reason, we selected 18 Neisseria genomes (14 N. meningitidis, 3 N. gonorrhoeae and 1 commensal N. lactamica) to conduct a comparative genome analysis to obtain a comprehensive understanding of the roles of natural selection and homologous recombination throughout the history of adaptive evolution. Among the 1012 core orthologous genes, we identified 635 genes with recombination signals and 10 genes that showed significant evidence of positive selection. Further functional analyses revealed no functional bias in the recombined genes. Positively selected genes are mostly involved in DNA processing and iron uptake, which are essential for the fundamental life cycle. Overall, the results indicate that both recombination and positive selection play crucial roles in the adaptive evolution of Neisseria genomes. The positively selected genes and the corresponding amino acid sites provide us with valuable targets for further research into the detailed mechanisms of adaptive evolution in Neisseria.
|
20
|
Krisko A, Copic T, Gabaldón T, Lehner B, Supek F. Inferring gene function from evolutionary change in signatures of translation efficiency. Genome Biol 2014; 15:R44. [PMID: 24580753 PMCID: PMC4054840 DOI: 10.1186/gb-2014-15-3-r44] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 05/24/2013] [Accepted: 03/03/2014] [Indexed: 11/13/2022]
Abstract
Background: The genetic code is redundant, meaning that most amino acids can be encoded by more than one codon. Highly expressed genes tend to use optimal codons to increase the accuracy and speed of translation. Thus, codon usage biases provide a signature of the relative expression levels of genes, which can, uniquely, be quantified across the domains of life. Results: Here we describe a general statistical framework to exploit this phenomenon and to systematically associate genes with environments and phenotypic traits through changes in codon adaptation. By inferring evolutionary signatures of translation efficiency in 911 bacterial and archaeal genomes while controlling for confounding effects of phylogeny and inter-correlated phenotypes, we linked 187 gene families to 24 diverse phenotypic traits. A series of experiments in Escherichia coli revealed that 13 of 15, 19 of 23, and 3 of 6 gene families with changes in codon adaptation in aerotolerant, thermophilic, or halophilic microbes confer specific resistance to, respectively, hydrogen peroxide, heat, and high salinity. Further, we demonstrate experimentally that changes in codon optimality alone are sufficient to enhance stress resistance. Finally, we present evidence that multiple genes with altered codon optimality in aerobes confer oxidative stress resistance by controlling the levels of iron and NAD(P)H. Conclusions: Taken together, these results provide experimental evidence for a widespread connection between changes in translation efficiency and phenotypic adaptation. As the number of sequenced genomes increases, this novel genomic context method for linking genes to phenotypes based on sequence alone will become increasingly useful.
|
21
|
O'Neill PK, Or M, Erill I. scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes. PLoS One 2013; 8:e76177. [PMID: 24116094 PMCID: PMC3792112 DOI: 10.1371/journal.pone.0076177] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/21/2013] [Accepted: 08/23/2013] [Indexed: 12/04/2022]
Abstract
Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe into the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and we apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate and fast growing bacteria and we provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate and fast growing bacteria, hinting at selection for a universal pattern of gene expression that is conserved and detectable in conserved patterns of codon usage bias.
Affiliation(s)
- Patrick K. O'Neill
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
- Mindy Or
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
|
22
|
Tello M, Vergara F, Spencer E. Genomic adaptation of the ISA virus to Salmo salar codon usage. Virol J 2013; 10:223. [PMID: 23829271 PMCID: PMC3706250 DOI: 10.1186/1743-422x-10-223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/28/2013] [Accepted: 07/01/2013] [Indexed: 01/09/2023]
Abstract
Background: The ISA virus (ISAV) is an Orthomyxovirus whose genome encodes at least 10 proteins. Low protein identity and lack of genetic tools have hampered the study of the molecular mechanism behind its virulence. It has been shown that viral codon usage controls several processes such as translational efficiency, folding, tuning of protein expression, antigenicity and virulence. Despite this, the possible role that adaptation to host codon usage plays in virulence and viral evolution has not been studied in ISAV. Methods: Intergenomic adaptation between viral and host genomes was calculated using the codon adaptation index score with EMBOSS software and the Kazusa database. Classification of host genes according to Gene Ontology was performed using Blast2GO. A non-parametric test was applied to determine the presence of significant correlations among CAI, mortality and time. Results: Using the codon adaptation index (CAI) score, we found that the genes encoding nucleoprotein, matrix protein M1 and the antagonist of interferon I signaling (NS1) are the ISAV genes most adapted to host codon usage, in agreement with their requirement for production of viral particles and inactivation of antiviral responses. Comparison to host genes showed that ISAV shares CAI values with less than 0.45% of Salmo salar genes. Gene Ontology classification of host genes showed that ISAV genes share CAI values with genes from less than 3% of the host biological processes, far from the 14% shown by Influenza A viruses and closer to the 5% shown by Influenza B and C. We also identified a positive correlation (p<0.05) between the CAI values of a virus and the duration of the disease outbreak in given salmon farms, as well as a weak relationship between codon adaptation values of PB1 and the mortality rates of a set of ISA viruses. Conclusions: Our analysis shows that ISAV is the viral pathogen of Salmo salar, and the Orthomyxovirus family member, least adapted to host codon usage, avoiding the general behavior of host genes. This is probably due to its recent emergence among farmed salmon populations.
Affiliation(s)
- Mario Tello
- Centro de Biotecnología Acuícola, Laboratorio de Virología, Facultad de Química y Biología, Universidad de Santiago de Chile, Avenida Libertador Bernardo O'Higgins 3363, Santiago, Chile.
|
23
|
Poon SK, Peacock L, Gibson W, Gull K, Kelly S. A modular and optimized single marker system for generating Trypanosoma brucei cell lines expressing T7 RNA polymerase and the tetracycline repressor. Open Biol 2013; 2:110037. [PMID: 22645659 PMCID: PMC3352093 DOI: 10.1098/rsob.110037] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Received: 12/05/2011] [Accepted: 01/24/2012] [Indexed: 11/24/2022]
Abstract
Here, we present a simple modular extendable vector system for introducing the T7 RNA polymerase and tetracycline repressor genes into Trypanosoma brucei. This novel system exploits developments in our understanding of gene expression and genome organization to produce a streamlined plasmid optimized for high levels of expression of the introduced transgenes. We demonstrate the utility of this novel system in bloodstream and procyclic forms of Trypanosoma brucei, including the genome strain TREU927/4. We validate these cell lines using a variety of inducible experiments that recapture previously published lethal and non-lethal phenotypes. We further demonstrate the utility of the single marker (SmOx) TREU927/4 cell line for in vivo experiments in the tsetse fly and provide a set of plasmids that enable both whole-fly and salivary gland-specific inducible expression of transgenes.
Affiliation(s)
- S K Poon
- Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK
|
24
|
Nikolic N, Smole Z, Krisko A. Proteomic properties reveal phyloecological clusters of Archaea. PLoS One 2012; 7:e48231. [PMID: 23133575 PMCID: PMC3485053 DOI: 10.1371/journal.pone.0048231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/26/2011] [Accepted: 09/28/2012] [Indexed: 11/18/2022]
Abstract
In this study, we propose a novel way to describe the variety of environmental adaptations of Archaea. We clustered 57 Archaea using a non-redundant set of proteomic features, and verified that the clusters correspond to environmental adaptations to the archaeal habitats. The first cluster consists predominantly of hyperthermophiles and hyperthermoacidophilic aerobes. The second cluster joins together halophilic and extremely halophilic Archaea, while the third cluster contains mesophilic (mostly methanogenic) Archaea together with thermoacidophiles. The non-redundant subset of proteomic features was found to consist of five features: the ratio of charged to uncharged residues, average protein size, normalized frequency of beta-sheet, normalized frequency of extended structure and number of hydrogen bond donors. We propose that this clustering be termed phyloecological clustering. This approach could give additional insights into relationships among archaeal species that may be hidden by phylogenetic analysis alone.
Affiliation(s)
- Nela Nikolic
- Mediterranean Institute for Life Sciences, Split, Croatia
- Institute of Biogeochemistry and Pollutant Dynamics, ETH Zurich, Zurich, Switzerland
- Department of Environmental Microbiology, Eawag, Duebendorf, Switzerland
- Zlatko Smole
- Mediterranean Institute for Life Sciences, Split, Croatia
- Institute of Cell Biology, ETH Zurich, Zurich, Switzerland
- Anita Krisko
- Mediterranean Institute for Life Sciences, Split, Croatia
|
25
|
Phan TH, Nguyen DL. Species-specificity of DNA trimer densities in chromosomes and their use in the classification of closely related organisms. J Microbiol Methods 2012; 91:30-7. [PMID: 22820348 DOI: 10.1016/j.mimet.2012.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2012] [Revised: 07/09/2012] [Accepted: 07/10/2012] [Indexed: 11/27/2022]
Abstract
16S rDNA sequences are conventionally used for classification of organisms. However, the use of these sequences is sometimes not successful, especially for closely related species. For better classification of these organisms, several genome sequence-based methods have been developed. Sequence alignment-based methods are tedious and time-consuming, as they need conserved coding sequences to be identified and deduced prior to sequence alignment. Likewise, methods that rely on gene function need genes to be assessed for functional similarity. Other alignment-free methods, which are based on particular genome sequence properties, have so far been complex and not species-specific enough for classification of organisms below genus level. The present study found that the ratios of DNA trimer frequencies to chromosomal length were species-specific. Density of a trimer in a chromosomal sequence was defined as the average frequency of the trimer per 1 kbp. The species-specificity of trimer densities in chromosomes of many closely related bacteria was compared in parallel with 16S rDNA sequences in these same bacteria. The results of these comparisons indicate that trimer densities in chromosomes can be used to simply and efficiently classify organisms below genus level.
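The trimer density defined in this abstract (occurrences of a trimer per 1 kbp of chromosomal sequence) can be sketched in a few lines of Python. This is an illustrative reading of the definition, not the authors' code: the function name `trimer_densities`, the use of overlapping windows on a single strand, and the skipping of windows with ambiguous bases are all assumptions.

```python
from collections import Counter


def trimer_densities(sequence: str) -> dict[str, float]:
    """Return occurrences of each DNA trimer per 1 kbp of sequence."""
    seq = sequence.upper()
    # Count overlapping 3-base windows; skip windows with non-ACGT symbols (assumption).
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= set("ACGT")
    )
    kbp = len(seq) / 1000  # sequence length in kilobase pairs
    return {trimer: n / kbp for trimer, n in counts.items()}


densities = trimer_densities("ACG" * 1000)
print(sorted(densities))  # trimers observed in the toy sequence
```

Comparing two genomes would then amount to comparing their 64-element density vectors; how the original study measured that distance is not stated in the abstract and is left out here.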
Affiliation(s)
- Thi Huyen Phan
- Department of Biotechnology, Ho Chi Minh City University of Technology, VNU-HCM, Ward 14, District 10, Ho Chi Minh City, Vietnam.
|
26
|
Raiford DW, Heizer EM, Miller RV, Doom TE, Raymer ML, Krane DE. Metabolic and translational efficiency in microbial organisms. J Mol Evol 2012; 74:206-16. [PMID: 22538926 DOI: 10.1007/s00239-012-9500-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 07/29/2011] [Accepted: 04/05/2012] [Indexed: 11/25/2022]
Abstract
Metabolic efficiency, as a selective force shaping proteomes, has been shown to exist in Escherichia coli and Bacillus subtilis and in a small number of organisms with photoautotrophic and thermophilic lifestyles. Earlier attempts at larger-scale analyses have utilized proxies (such as molecular weight) for biosynthetic cost, and did not consider lifestyle or auxotrophy. This study extends the analysis to all currently sequenced microbial organisms that are amenable to these analyses while utilizing lifestyle specific amino acid biosynthesis pathways (where possible) to determine protein production costs and compensating for auxotrophy. The tendency for highly expressed proteins (with adherence to codon usage bias as a proxy for expressivity) to utilize less biosynthetically expensive amino acids is taken as evidence of cost selection. A comprehensive analysis of sequenced genomes to identify those that exhibit strong translational efficiency bias (389 out of 1,700 sequenced organisms) is also presented.
Affiliation(s)
- Douglas W Raiford
- Department of Computer Science, University of Montana, Missoula, MT, USA.
|
27
|
The layout of a bacterial genome. FEBS Lett 2012; 586:2043-8. [DOI: 10.1016/j.febslet.2012.03.051] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 03/02/2012] [Revised: 03/25/2012] [Accepted: 03/26/2012] [Indexed: 12/25/2022]
|
28
|
Radomski JP, Slonimski PP. Alignment free characterization of the influenza-A hemagglutinin genes by the ISSCOR method. C R Biol 2012; 335:180-93. [PMID: 22464426 DOI: 10.1016/j.crvi.2012.01.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/07/2010] [Revised: 10/26/2011] [Accepted: 01/11/2012] [Indexed: 12/23/2022]
Abstract
Analyses and visualizations by the ISSCOR method of the influenza virus hemagglutinin genes of three different A-subtypes revealed some rather striking temporal (for A/H3N3) and spatial (for A/H5N1) relationships between groups of individual gene subsets. The application to the A/H1N1 set also revealed relationships between the seasonal H1 and the swine-like novel 2009 H1v variants in a quick and unambiguous manner. Based on these examples, we consider the application of the ISSCOR method for analysis of large sets of homologous genes a worthwhile addition to the genomics toolbox: it allows rapid diagnosis of trends and may even aid early warning of newly emerging epidemiological threats.
Affiliation(s)
- Jan P Radomski
- Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University, Warsaw, Poland.
|
29
|
Retchless AC, Lawrence JG. Quantification of codon selection for comparative bacterial genomics. BMC Genomics 2011; 12:374. [PMID: 21787402 PMCID: PMC3162537 DOI: 10.1186/1471-2164-12-374] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 03/23/2011] [Accepted: 07/25/2011] [Indexed: 11/16/2022]
Abstract
Background: Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE). Results: This statistic represents codon usage bias in terms of a probabilistic distribution, quantifying the extent that preferred codons are over-represented in the gene of interest relative to the mean and variance that would result from stochastic sampling of codons. Expected codon frequencies are derived from the observed codon usage frequencies of a broad set of genes, such that they are likely to reflect nonselective, genome wide influences on codon usage (e.g. mutational biases). The relative adaptiveness of synonymous codons is deduced from the frequency of codon usage in a pre-selected set of genes relative to the expected frequency. The ACE can predict both transcript abundance during rapid growth and the rate of synonymous substitutions, with accuracy comparable to or greater than existing metrics. We further examine how the composition of reference gene sets affects the accuracy of the statistic, and suggest methods for selecting appropriate reference sets for any genome, including bacteriophages. Finally, we demonstrate that the ACE may naturally be extended to quantify the genome-wide influence of codon selection in a manner that is sensitive to a large fraction of codons in the genome. This reveals substantial variation among genomes, correlated with the tRNA gene number, even among groups of bacteria where previously proposed whole-genome measures show little variation. Conclusions: The statistical framework of the ACE allows rigorous comparison of the level of codon selection acting on genes, both within a genome and between genomes.
Affiliation(s)
- Adam C Retchless
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
|
30
|
Raiford DW, Krane DE, Doom TEW, Raymer ML. A genetic optimization approach for isolating translational efficiency bias. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:342-352. [PMID: 21233519 DOI: 10.1109/tcbb.2009.24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
Abstract
The study of codon usage bias is an important research area that contributes to our understanding of molecular evolution, phylogenetic relationships, respiratory lifestyle, and other characteristics. Translational efficiency bias is perhaps the most well-studied codon usage bias, as it is frequently utilized to predict relative protein expression levels. We present a novel approach to isolating translational efficiency bias in microbial genomes. There are several existent methods for isolating translational efficiency bias. Previous approaches are susceptible to the confounding influences of other potentially dominant biases. Additionally, existing approaches to identifying translational efficiency bias generally require both genomic sequence information and prior knowledge of a set of highly expressed genes. This novel approach provides more accurate results from sequence information alone by resisting the confounding effects of other biases. We validate this increase in accuracy in isolating translational efficiency bias on 10 microbial genomes, five of which have proven particularly difficult for existing approaches due to the presence of strong confounding biases.
Affiliation(s)
- Douglas W Raiford
- Department of Computer Science, University of Montana, 32 Campus Dr., Missoula, MT 59812, USA
|
31
|
von Mandach C, Merkl R. Genes optimized by evolution for accurate and fast translation encode in Archaea and Bacteria a broad and characteristic spectrum of protein functions. BMC Genomics 2010; 11:617. [PMID: 21050470 PMCID: PMC3091758 DOI: 10.1186/1471-2164-11-617] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/10/2010] [Accepted: 11/04/2010] [Indexed: 11/13/2022]
Abstract
Background: In many microbial genomes, a strong preference for a small number of codons can be observed in genes whose products are needed by the cell in large quantities. This codon usage bias (CUB) improves translational accuracy and speed and is one of several factors optimizing cell growth. Whereas CUB and the overrepresentation of individual proteins have been studied in detail, it is still unclear which high-level metabolic categories are subject to translational optimization in different habitats. Results: In a systematic study of 388 microbial species, we have identified for each genome a specific subset of genes characterized by a marked CUB, which we named the effectome. As expected, gene products related to protein synthesis are abundant in both archaeal and bacterial effectomes. In addition, enzymes contributing to energy production and gene products involved in protein folding and stabilization are overrepresented. The comparison of genomes from eleven habitats shows that the environment has only a minor effect on the composition of the effectomes. As a paradigmatic example, we detailed the effectome content of 37 bacterial genomes that are most likely exposed to strongest selective pressure towards translational optimization. These effectomes accommodate a broad range of protein functions like enzymes related to glycolysis/gluconeogenesis and the TCA cycle, ATP synthases, aminoacyl-tRNA synthetases, chaperones, proteases that degrade misfolded proteins, protectants against oxidative damage, as well as cold shock and outer membrane proteins. Conclusions: We made clear that effectomes consist of specific subsets of the proteome being involved in several cellular functions. As expected, some functions are related to cell growth and affect speed and quality of protein synthesis. Additionally, the effectomes contain enzymes of central metabolic pathways and cellular functions sustaining microbial life under stress situations. These findings indicate that cell growth is an important but not the only factor modulating translational accuracy and speed by means of CUB.
|
32
|
Variation in the correlation of G + C composition with synonymous codon usage bias among bacteria. EURASIP Journal on Bioinformatics & Systems Biology 2010:61374. [PMID: 18350114 DOI: 10.1155/2007/61374] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/31/2007] [Accepted: 06/04/2007] [Indexed: 11/17/2022]
Abstract
G + C composition at the third codon position (GC3) is widely reported to be correlated with synonymous codon usage bias. However, no quantitative attempt has been made to compare the extent of this correlation among different genomes. Here, we applied Shannon entropy from information theory to measure the degree of GC3 bias and that of synonymous codon usage bias of each gene. The strength of the correlation of GC3 with synonymous codon usage bias, quantified by a correlation coefficient, varied widely among bacterial genomes, ranging from -0.07 to 0.95. Previous analyses suggesting that the relationship between GC3 and synonymous codon usage bias is independent of species are thus inconsistent with the more detailed analyses obtained here for individual species.
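The entropy-based measurement described in this abstract can be illustrated with a minimal Python sketch. It assumes a simple binary formulation that the abstract does not spell out: third-position bases of a coding sequence are reduced to G/C versus A/T, and the Shannon entropy of that two-state distribution is reported in bits. The function name `gc3_entropy` is hypothetical, and the published method may define its bias measure differently.

```python
import math


def gc3_entropy(cds: str) -> float:
    """Binary Shannon entropy (bits) of G/C vs A/T at third codon positions.

    Assumption for illustration: the sequence starts in frame; trailing
    bases that do not complete a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    third = [seq[i + 2] for i in range(0, usable, 3)]
    if not third:
        raise ValueError("sequence must contain at least one full codon")
    p = sum(base in "GC" for base in third) / len(third)
    if p in (0.0, 1.0):
        return 0.0  # fully biased: no uncertainty left
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

Under this formulation a value of 1 bit means G/C and A/T are equally frequent at the third position, and values near 0 indicate strong GC3 (or AT3) bias.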
|
33
|
Supek F, Škunca N, Repar J, Vlahoviček K, Šmuc T. Translational selection is ubiquitous in prokaryotes. PLoS Genet 2010; 6:e1001004. [PMID: 20585573 PMCID: PMC2891978 DOI: 10.1371/journal.pgen.1001004] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 12/11/2009] [Accepted: 05/26/2010] [Indexed: 11/29/2022] Open
Abstract
Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome—between 5% and 33%, depending on genome size—while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl–tRNA synthetases. 
Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea. Synonymous codons are not equally common in genomes. The main causes of unequal codon usage are varying nucleotide substitution patterns, as manifested in the wide range of genomic nucleotide compositions. However, since the first E. coli and yeast genes were sequenced, it became evident that there was also a bias towards codons that can be translated to protein faster and more accurately. This bias was stronger in highly expressed genes, and its driving force was termed translational selection. Researchers sought for effects of translational selection in microbial genomes as they became available, employing a flurry of mathematical approaches which sometimes led to contradictory conclusions. We introduce a sensitive and accurate machine learning-based methodology and find that highly expressed genes have a recognizable codon usage pattern in almost every bacterial and archaeal genome analyzed, even after accounting for large differences in background nucleotide composition. We also show that the gene functional category has a great bearing on whether that gene is subject to translational selection. Since presence of codon optimizations can be used as a purely sequence-derived proxy for expression levels, we can delineate “adaptomes” by relating predicted gene activity to organisms' phenotypes, which we demonstrate on genomes of temperature-resistant Bacteria and Archaea.
Affiliation(s)
- Fran Supek
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
- Nives Škunca
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
- Jelena Repar
- Division of Molecular Biology, Rudjer Boskovic Institute, Zagreb, Croatia
- Kristian Vlahoviček
- Division of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
- Department of Informatics, University of Oslo, Oslo, Norway
- Tomislav Šmuc
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
|
34
|
Fox JM, Erill I. Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res 2010; 17:185-96. [PMID: 20453079 PMCID: PMC2885275 DOI: 10.1093/dnares/dsq012] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Indexed: 11/13/2022] Open
Abstract
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we reveal how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
Affiliation(s)
- Jesse M Fox
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Road, Baltimore, MD 21228, USA
|
35
|
Raiford DW, Krane DE, Doom TE, Raymer ML. Automated isolation of translational efficiency bias that resists the confounding effect of GC(AT)-content. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010; 7:238-250. [PMID: 20431144 DOI: 10.1109/tcbb.2008.65] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/29/2023]
Abstract
Genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale; however, much of the information present may yet be hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of these synonymous codons are preferred. Detection of this bias is an important tool in the analysis of genomic data, particularly as a predictor of gene expressivity. Methods for identifying codon usage bias in genomic data that rely solely on genomic sequence data are susceptible to being confounded by the presence of several factors simultaneously influencing codon selection. Presented here is a new technique for removing the effects of one of the more common confounding factors, GC(AT)-content, and of visualizing the search-space for codon usage bias through the use of a solution landscape. This technique successfully isolates expressivity-related codon usage trends, using only genomic sequence information, where other techniques fail due to the presence of GC(AT)-content confounding influences.
Affiliation(s)
- Douglas W Raiford
- Department of Computer Science, University of Montana, Missoula, MT 59812, USA.
|
36
|
Perry SC, Beiko RG. Distinguishing microbial genome fragments based on their composition: evolutionary and comparative genomic perspectives. Genome Biol Evol 2010; 2:117-31. [PMID: 20333228 PMCID: PMC2839357 DOI: 10.1093/gbe/evq004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Accepted: 01/19/2010] [Indexed: 01/23/2023] Open
Abstract
It is well known that patterns of nucleotide composition vary within and among genomes, although the reasons why these variations exist are not completely understood. Between-genome compositional variation has been exploited to assign environmental shotgun sequences to their most likely originating genomes, whereas within-genome variation has been used to identify recently acquired genetic material such as pathogenicity islands. Recent sequence assignment techniques have achieved high levels of accuracy on artificial data sets, but the relative difficulty of distinguishing lineages with varying degrees of relatedness, and different types of genomic sequence, has not been examined in depth. We investigated the compositional differences in a set of 774 sequenced microbial genomes, finding rapid divergence among closely related genomes, but also convergence of compositional patterns among genomes with similar habitats. Support vector machines were then used to distinguish all pairs of genomes based on genome fragments 500 nucleotides in length. The nearly 300,000 accuracy scores obtained from these trials were used to construct general models of distinguishability versus taxonomic and compositional indices of genomic divergence. Unusual genome pairs were evident from their large residuals relative to the fitted model, and we identified several factors including genome reduction, putative lateral genetic transfer, and habitat convergence that influence the distinguishability of genomes. The positional, compositional, and functional context of a fragment within a genome has a strong influence on its likelihood of correct classification, but in a way that depends on the taxonomic and ecological similarity of the comparator genome.
Affiliation(s)
- Scott C Perry
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
|
37
|
Lichtenberg J, Jacox E, Welch JD, Kurz K, Liang X, Yang MQ, Drews F, Ecker K, Lee SS, Elnitski L, Welch LR. Word-based characterization of promoters involved in human DNA repair pathways. BMC Genomics 2009; 10 Suppl 1:S18. [PMID: 19594877 PMCID: PMC2709261 DOI: 10.1186/1471-2164-10-s1-s18] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Background DNA repair genes provide an important contribution towards the surveillance and repair of DNA damage. These genes produce a large network of interacting proteins whose mRNA expression is likely to be regulated by similar regulatory factors. Full characterization of promoters of DNA repair genes and the similarities among them will more fully elucidate the regulatory networks that activate or inhibit their expression. To address this goal, the authors introduce a technique to find regulatory genomic signatures, which represents a specific application of the genomic signature methodology to classify DNA sequences as putative functional elements within a single organism. Results The effectiveness of the regulatory genomic signatures is demonstrated via analysis of promoter sequences for genes in DNA repair pathways of humans. The promoters are divided into two classes, the bidirectional promoters and the unidirectional promoters, and distinct genomic signatures are calculated for each class. The genomic signatures include statistically overrepresented words, word clusters, and co-occurring words. The robustness of this method is confirmed by the ability to identify sequences that exist as motifs in TRANSFAC and JASPAR databases, and in overlap with verified binding sites in this set of promoter regions. Conclusion The word-based signatures are shown to be effective by finding occurrences of known regulatory sites. Moreover, the signatures of the bidirectional and unidirectional promoters of human DNA repair pathways are clearly distinct, exhibiting virtually no overlap. In addition to providing an effective characterization method for related DNA sequences, the signatures elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.
Affiliation(s)
- Jens Lichtenberg
- Bioinformatics Laboratory, School of Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, USA.
|
38
|
Tzahor S, Man-Aharonovich D, Kirkup BC, Yogev T, Berman-Frank I, Polz MF, Béjà O, Mandel-Gutfreund Y. A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment. BMC Genomics 2009; 10:229. [PMID: 19445709 PMCID: PMC2696472 DOI: 10.1186/1471-2164-10-229] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 09/18/2008] [Accepted: 05/16/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cyanobacteria of the genera Synechococcus and Prochlorococcus play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Recently, genes encoding the photosystem II reaction center (psbA and psbD) were found in cyanophage genomes. This phenomenon suggested that the horizontal transfer of these genes may be involved in increasing phage fitness. To date, a very small percentage of marine bacteria and phages has been cultured. Thus, mapping genomic data extracted directly from the environment to its taxonomic origin is necessary for a better understanding of phage-host relationships and dynamics. RESULTS To achieve an accurate and rapid taxonomic classification, we employed a computational approach combining a multi-class Support Vector Machine (SVM) with a codon usage position specific scoring matrix (cuPSSM). Our method has been applied successfully to classify core-photosystem-II gene fragments, including partial sequences coming directly from the ocean, to seven different taxonomic classes. Applying the method on a large set of DNA and RNA psbA clones from the Mediterranean Sea, we studied the distribution of cyanobacterial psbA genes and transcripts in their natural environment. Using our approach, we were able to simultaneously examine taxonomic and ecological distributions in the marine environment. CONCLUSION The ability to accurately classify the origin of individual genes and transcripts coming directly from the environment is of great importance in studying marine ecology. The classification method presented in this paper could be applied further to classify other genes amplified from the environment, for which training data is available.
Affiliation(s)
- Shani Tzahor
- Faculty of Biology, Technion – Israel Institute of Technology, Haifa 32000, Israel
- Inter-Departmental Program for Biotechnology, Technion – Israel Institute of Technology, Haifa 32000, Israel
- Benjamin C Kirkup
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Tali Yogev
- Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
- Martin F Polz
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Oded Béjà
- Faculty of Biology, Technion – Israel Institute of Technology, Haifa 32000, Israel
|
40
|
Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria. Archaea-An International Microbiological Journal 2009; 2:159-67. [PMID: 19054742 DOI: 10.1155/2008/829730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results.
|
41
|
Radomski JP, Slonimski PP. ISSCOR: Intragenic, Stochastic Synonymous Codon Occurrence Replacement--a new method for an alignment-free genome sequence analysis. C R Biol 2009; 332:336-50. [PMID: 19304264 DOI: 10.1016/j.crvi.2008.11.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/11/2008] [Revised: 11/06/2008] [Accepted: 12/02/2008] [Indexed: 11/17/2022]
Abstract
Synonymous codons do not occur at equal frequencies. Codon usage and codon bias have been extensively studied. However, the sequential order in which synonymous codons appear within a gene has not been studied until now. Here we describe an in silico method, which is the first attempt to tackle this problem: to what extent this sequential order is unique, and to what extent the succession of synonymous codons is important. This method, which we called Intragenic, Stochastic Synonymous Codon Occurrence Replacement (ISSCOR), generates, by a Monte Carlo approach, a set of genes which code for the same amino acid sequence, and display the same codon usage, but have random permutations of the synonymous codons, and therefore different sequential codon orders from the original gene. We analyze the complete genome of the bacterium Helicobacter pylori (containing 1574 protein coding genes), and show by various, alignment-free computational methods (e.g., frequency distribution of codon-pairs, as well as that of nucleotide bigrams in codon-pairs), that: (i) not only the succession of adjacent synonymous codons is far from random, but also, which is totally unexpected, the occurrences of non-adjacent synonymous codon-pairs are highly constrained, at strikingly long distances of dozens of nucleotides; (ii) the statistical deviations from the random synonymous codon order are overwhelming; and (iii) the pattern of nucleotide bigrams in codon-pairs can be used in a novel way for characterizing and comparing genes and genomes. Our results demonstrate that the sequential order of synonymous codons within a gene must be under a strong selective pressure, which is superimposed on the classical codon usage. This new dimension can be measured by the ISSCOR method, which is simple, robust, and should be useful for comparative and functional genomics.
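The replacement step described in this abstract (same protein, same codon usage, randomized order of synonymous codons) can be sketched in Python as follows. This is an illustrative reading of the abstract, not the authors' implementation: the names `shuffle_synonymous` and `translate` are hypothetical, the standard genetic code is assumed, and all stop codons are treated as one synonymous family.

```python
import random
from collections import defaultdict

# Standard genetic code, built in TCAG order (assumed; not taken from the paper).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds: str) -> list:
    """Split an in-frame coding sequence into codons."""
    seq = cds.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(cds: str) -> str:
    """Translate with the standard code; unknown codons raise KeyError."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def shuffle_synonymous(cds: str, rng: random.Random) -> str:
    """Permute codons among positions that encode the same amino acid."""
    codons = split_codons(cds)
    positions = defaultdict(list)
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)  # random order within one synonymous family
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)
```

Calling `shuffle_synonymous` repeatedly with a seeded `random.Random` yields a Monte Carlo set of variants; each one preserves both the encoded protein and the codon counts of the original gene, so any statistic that differs between the original and the variants reflects codon order alone.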
Affiliation(s)
- Jan P Radomski
- Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University, Pawińskiego 5A, Bldg. D, 02106 Warsaw, Poland.
|
42
|
Suzuki H, Brown CJ, Forney LJ, Top EM. Comparison of correspondence analysis methods for synonymous codon usage in bacteria. DNA Res 2008; 15:357-65. [PMID: 18940873 PMCID: PMC2608848 DOI: 10.1093/dnares/dsn028] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 06/21/2008] [Accepted: 09/24/2008] [Indexed: 12/02/2022] Open
Abstract
Synonymous codon usage varies both between organisms and among genes within a genome, and arises due to differences in G + C content, replication strand skew, or gene expression levels. Correspondence analysis (CA) is widely used to identify major sources of variation in synonymous codon usage among genes and provides a way to identify horizontally transferred or highly expressed genes. Four methods of CA have been developed based on three kinds of input data: absolute codon frequency, relative codon frequency, and relative synonymous codon usage (RSCU) as well as within-group CA (WCA). Although different CA methods have been used in the past, no comprehensive comparative study has been performed to evaluate their effectiveness. Here, the four CA methods were evaluated by applying them to 241 bacterial genome sequences. The results indicate that WCA is more effective than the other three methods in generating axes that reflect variations in synonymous codon usage. Furthermore, WCA reveals sources that were previously unnoticed in some genomes; e.g. synonymous codon usage related to replication strand skew was detected in Rickettsia prowazekii. Though CA based on RSCU is widely used, our evaluation indicates that this method does not perform as well as WCA.
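Of the input data types named in this abstract, relative synonymous codon usage (RSCU) is the easiest to show concretely: for each codon it is the observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below follows that standard definition. The function name `rscu` is hypothetical, the standard genetic code is assumed, and skipping stop codons and the single-codon amino acids (Met, Trp) is a design choice made here, not something stated in the abstract.

```python
from collections import Counter

# Standard genetic code, built in TCAG order (assumed for illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, amino in CODON_TABLE.items():
        families.setdefault(amino, []).append(codon)

    values = {}
    for amino, family in families.items():
        if amino == "*" or len(family) == 1:
            continue  # stops and Met/Trp carry no synonymous choice
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        for codon in family:
            # observed count / (family total / family size)
            values[codon] = counts[codon] * len(family) / total
    return values
```

An RSCU of 1 means a codon is used as often as expected under equal synonymous usage; values above 1 mark preferred codons. Within each family the values sum to the family size.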
Affiliation(s)
- Haruo Suzuki
- Department of Biological Sciences and Initiative for Bioinformatics and Evolutionary Studies, University of Idaho, PO Box 443051, Moscow, Idaho 83844-3051, USA.
|
43
|
Orsi RH, Sun Q, Wiedmann M. Genome-wide analyses reveal lineage specific contributions of positive selection and recombination to the evolution of Listeria monocytogenes. BMC Evol Biol 2008; 8:233. [PMID: 18700032 PMCID: PMC2532693 DOI: 10.1186/1471-2148-8-233] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 12/26/2007] [Accepted: 08/12/2008] [Indexed: 12/30/2022] Open
Abstract
Background The genus Listeria includes two closely related pathogenic and non-pathogenic species, L. monocytogenes and L. innocua. L. monocytogenes is an opportunistic human foodborne and animal pathogen that includes two common lineages. While lineage I is more commonly found among human listeriosis cases, lineage II appears to be overrepresented among isolates from foods and environmental sources. This study used the genome sequences for one L. innocua strain and four L. monocytogenes strains representing lineages I and II, to characterize the contributions of positive selection and recombination to the evolution of the L. innocua/L. monocytogenes core genome. Results Among the 2267 genes in the L. monocytogenes/L. innocua core genome, 1097 genes showed evidence for recombination and 36 genes showed evidence for positive selection. Positive selection was strongly associated with recombination. Specifically, 29 of the 36 genes under positive selection also showed evidence for recombination. Recombination was more common among isolates in lineage II than lineage I; this trend was confirmed by sequencing five genes in a larger isolate set. Positive selection was more abundant in the ancestral branch of lineage II (20 genes) as compared to the ancestral branch of lineage I (9 genes). Additional genes under positive selection were identified in the branch separating the two species; for this branch, genes in the role category "Cell wall and membrane biogenesis" were significantly more likely to have evidence for positive selection. Positive selection of three genes was confirmed in a larger isolate set, which also revealed occurrence of multiple premature stop codons in one positively selected gene involved in flagellar motility (flaR). Conclusion While recombination and positive selection both contribute to evolution of L. monocytogenes, the relative contributions of these evolutionary forces seem to differ by L. monocytogenes lineages and appear to be more important in the evolution of lineage II, which seems to be found in a broader range of environments, as compared to the apparently more host adapted lineage I. Diversification of cell wall and membrane biogenesis and motility-related genes may play a particularly important role in the evolution of L. monocytogenes.
Affiliation(s)
- Renato H Orsi
- Department of Food Science, Cornell University, Ithaca, NY, USA.
|
44
|
Paape D, Lippuner C, Schmid M, Ackermann R, Barrios-Llerena ME, Zimny-Arndt U, Brinkmann V, Arndt B, Pleissner KP, Jungblut PR, Aebischer T. Transgenic, fluorescent Leishmania mexicana allow direct analysis of the proteome of intracellular amastigotes. Mol Cell Proteomics 2008; 7:1688-701. [PMID: 18474515 DOI: 10.1074/mcp.m700343-mcp200] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022] Open
Abstract
Investigating the proteome of intracellular pathogens is often hampered by inadequate methodologies to purify the pathogen free of host cell material. This has also precluded direct proteome analysis of the intracellular, amastigote form of Leishmania spp., protozoan parasites that cause a spectrum of diseases that affect some 12 million patients worldwide. Here a method is presented that combines classic, isopycnic density centrifugation with fluorescent particle sorting for purification by exploiting transgenic, fluorescent parasites to allow direct proteome analysis of the purified organisms. By this approach the proteome of intracellular Leishmania mexicana amastigotes was compared with that of extracellular promastigotes that are transmitted by insect vectors. In total, 509 different proteins were identified by mass spectrometry and database search. This number corresponds to approximately 6% of gene products predicted from the reference genome of Leishmania major. Intracellular amastigotes synthesized significantly more proteins with basic pI and showed a greater abundance of enzymes of fatty acid catabolism, which may reflect their living in acidic habitats and metabolic adaptation to nutrient availability, respectively. Bioinformatics analyses of the genes corresponding to the protein data sets produced clear evidence for skewed codon usage and translational bias in these organisms. Moreover analysis of the subset of genes whose products were more abundant in amastigotes revealed characteristic sequence motifs in 3'-untranslated regions that have been linked to translational control elements. This suggests that proteome data sets may be used to identify regulatory elements in mRNAs. Last but not least, at 6% coverage the proteome identified all vaccine antigens tested to date. Thus, the present data set provides a valuable resource for selection of candidate vaccine antigens.
Affiliation(s)
- Daniel Paape
- Institute of Immunology and Infection Research, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, United Kingdom
|
45
|
Carbone A. Codon bias is a major factor explaining phage evolution in translationally biased hosts. J Mol Evol 2008; 66:210-23. [PMID: 18286220 DOI: 10.1007/s00239-008-9068-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 07/27/2007] [Revised: 11/20/2007] [Accepted: 12/07/2007] [Indexed: 11/28/2022]
Abstract
The size and diversity of bacteriophage populations require methodologies to quantitatively study the landscape of phage differences. Statistical approaches are confronted with small genome sizes forbidding significant single-phage analysis, and comparative methods analyzing full phage genomes represent an alternative but they are of difficult interpretation due to lateral gene transfer, which creates a mosaic spectrum of related phage species. Based on a large-scale codon bias analysis of 116 DNA phages hosted by 11 translationally biased bacteria belonging to different phylogenetic families, we observe that phage genomes are almost always under codon selective pressure imposed by translationally biased hosts, and we propose a classification of phages with translationally biased hosts which is based on adaptation patterns. We introduce a computational method for comparing phages sharing homologous proteins, possibly accepted by different hosts. We observe that throughout phages, independently from the host, capsid genes appear to be the most affected by host translational bias. For coliphages, genes involved in virion morphogenesis, host interaction and ssDNA binding are also affected by adaptive pressure. Adaptation affects long and small phages in a significant way. We analyze in more detail the Microviridae phage space to illustrate the potentiality of the approach. The small number of directions in adaptation observed in phages grouped around phi X174 is discussed in the light of functional bias. The adaptation analysis of the set of Microviridae phages defined around phi MH2K shows that phage classification based on adaptation does not reflect bacterial phylogeny.
Affiliation(s)
- Alessandra Carbone
- Génomique Analytique, Université Pierre et Marie Curie-Paris 6, UMR S511, 91 Bd de l'Hôpital, 75013, Paris, France.
|
46
|
Puigbò P, Romeu A, Garcia-Vallvé S. HEG-DB: a database of predicted highly expressed genes in prokaryotic complete genomes under translational selection. Nucleic Acids Res 2007; 36:D524-7. [PMID: 17933767 PMCID: PMC2238906 DOI: 10.1093/nar/gkm831] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022] Open
Abstract
The highly expressed genes database (HEG-DB) is a genomic database that includes the prediction of which genes are highly expressed in prokaryotic complete genomes under strong translational selection. The current version of the database contains general features for almost 200 genomes under translational selection, including the correspondence analysis of the relative synonymous codon usage for all genes, and the analysis of their highly expressed genes. For each genome, the database contains functional and positional information about the predicted group of highly expressed genes. This information can also be accessed using a search engine. Among other statistical parameters, the database also provides the Codon Adaptation Index (CAI) for all of the genes using the codon usage of the highly expressed genes as a reference set. The 'Pathway Tools Omics Viewer' from the BioCyc database enables the metabolic capabilities of each genome to be explored, particularly those related to the group of highly expressed genes. The HEG-DB is freely available at http://genomes.urv.cat/HEG-DB.
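The CAI calculation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the widely used Sharp and Li formulation (relative adaptiveness of each codon within its synonymous family, followed by a geometric mean over the gene), which the abstract does not spell out, so HEG-DB's own implementation may differ. The function names, the toy sequences, and the 0.5 pseudocount for unseen codons are illustrative assumptions.

```python
import math
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight each codon by its count relative to the most used synonym.

    Counts come from the reference set (here: presumed highly expressed
    genes). Met, Trp and stop codons are skipped. Codons never seen in the
    reference set receive `pseudocount` (an assumption of this sketch).
    """
    counts = Counter(
        codon
        for gene in reference_genes
        for codon in split_codons(gene)
        if codon in CODON_TABLE
    )
    weights = {}
    for aa in set(AMINO) - {"*", "M", "W"}:
        family = [c for c in CODONS if CODON_TABLE[c] == aa]
        family_counts = {c: counts[c] or pseudocount for c in family}
        top = max(family_counts.values())
        for codon in family:
            weights[codon] = family_counts[codon] / top
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over all scorable codons of a gene."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Toy reference set: alanine is encoded mostly by GCT.
    reference = ["GCTGCTGCTGCA"]
    w = relative_adaptiveness(reference)
    print(cai("ATGGCTGCT", w))  # uses only the preferred alanine codon
    print(cai("ATGGCTGCA", w))  # mixes in a less preferred synonym
```

A gene that uses only the preferred codons of the reference set scores at the top of the scale, and each less preferred synonym pulls the geometric mean down.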
Affiliation(s)
- Pere Puigbò
- Evolutionary Genomics Group, Biochemistry and Biotechnology Department, Faculty of Chemistry, Rovira i Virgili University (URV), c/Marcel-li Domingo, s/n. Campus Sescelades, 43007 Tarragona, Spain.
|
47
|
Haverkamp T, Acinas SG, Doeleman M, Stomp M, Huisman J, Stal LJ. Diversity and phylogeny of Baltic Sea picocyanobacteria inferred from their ITS and phycobiliprotein operons. Environ Microbiol 2007; 10:174-88. [PMID: 17903216 DOI: 10.1111/j.1462-2920.2007.01442.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022]
Abstract
Picocyanobacteria of the genus Synechococcus span a range of different colours, from red strains rich in phycoerythrin (PE) to green strains rich in phycocyanin (PC). Here, we show that coexistence of red and green picocyanobacteria in the Baltic Sea is widespread. The diversity and phylogeny of red and green picocyanobacteria was analysed using three different genes: 16S rRNA-ITS, the cpeBA operon of the red PE pigment and the cpcBA operon of the green PC pigment. Sequencing of 209 clones showed that Baltic Sea picocyanobacteria exhibit high levels of microdiversity. The partial nucleotide sequences of the cpcBA and cpeBA operons from the clone libraries of the Baltic Sea revealed two distinct phylogenetic clades: one clade containing mainly sequences from cultured PC-rich picocyanobacteria, while the other contains only sequences from cultivated PE-rich strains. A third clade of phycourobilin (PUB) containing strains of PE-rich Synechococcus spp. did not contain sequences from the Baltic Sea clone libraries. These findings differ from previously published phylogenies based on 16S rRNA gene analysis. Our data suggest that, in terms of their pigmentation, Synechococcus spp. represent three different lineages occupying different ecological niches in the underwater light spectrum. Strains from different lineages can coexist in light environments that overlap with their light absorption spectra.
Affiliation(s)
- Thomas Haverkamp
- Department of Marine Microbiology, Netherlands Institute of Ecology, NIOO-KNAW, P.O. Box 140, 4400 AC Yerseke, The Netherlands
|
48
|
Gorban AN, Zinovyev AY. The mystery of two straight lines in bacterial genome statistics. Bull Math Biol 2007; 69:2429-42. [PMID: 17577600 DOI: 10.1007/s11538-007-9229-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/09/2006] [Accepted: 05/04/2007] [Indexed: 10/23/2022]
Abstract
In special coordinates (codon position-specific nucleotide frequencies), bacterial genomes form two straight lines in 9-dimensional space: one line for eubacterial genomes, another for archaeal genomes. All the 348 distinct bacterial genomes available in Genbank in April 2007, belong to these lines with high accuracy. The main challenge now is to explain the observed high accuracy. The new phenomenon of complementary symmetry for codon position-specific nucleotide frequencies is observed. The results of analysis of several codon usage models are presented. We demonstrate that the mean-field approximation, which is also known as context-free, or complete independence model, or Segre variety, can serve as a reasonable approximation to the real codon usage. The first two principal components of codon usage correlate strongly with genomic G+C content and the optimal growth temperature, respectively. The variation of codon usage along the third component is related to the curvature of the mean-field approximation. First three eigenvalues in codon usage PCA explain 59.1%, 7.8% and 4.7% of variation. The eubacterial and archaeal genomes codon usage is clearly distributed along two third order curves with genomic G+C content as a parameter.
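The "special coordinates" named in this abstract can be made concrete with a short Python sketch. It computes the frequency of each nucleotide at each of the three codon positions (12 numbers) and then drops one nucleotide per position, since the four frequencies at a position sum to one, leaving 9 independent coordinates. This reading of the 9-dimensional space, the choice of which nucleotide to drop, and the function names are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def position_specific_frequencies(genes):
    """Return three dicts giving nucleotide frequencies at codon positions 1-3.

    `genes` is an iterable of in-frame coding sequences. A trailing partial
    codon and any non-ACGT symbols are ignored.
    """
    counts = [Counter(), Counter(), Counter()]
    for gene in genes:
        seq = gene.upper()
        usable = len(seq) - len(seq) % 3
        for i in range(usable):
            base = seq[i]
            if base in NUCLEOTIDES:
                counts[i % 3][base] += 1
    freqs = []
    for position_counts in counts:
        total = sum(position_counts.values())
        if total == 0:
            raise ValueError("no usable nucleotides at some codon position")
        freqs.append({n: position_counts[n] / total for n in NUCLEOTIDES})
    return freqs


def nine_coordinates(freqs, dropped="T"):
    """Flatten to 9 numbers by omitting one nucleotide per codon position.

    The omitted frequency is redundant because each position sums to 1.
    Dropping "T" is an arbitrary choice made for this sketch.
    """
    return [f[n] for f in freqs for n in NUCLEOTIDES if n != dropped]


if __name__ == "__main__":
    toy_genes = ["ATGGCC", "ATGAAATAA"]
    freqs = position_specific_frequencies(toy_genes)
    print(nine_coordinates(freqs))
```

Applied to all protein-coding genes of a genome, each genome becomes one point in this space, which is the kind of representation the abstract describes.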
|
49
|
Willenbrock H, Friis C, Friis AS, Ussery DW. An environmental signature for 323 microbial genomes based on codon adaptation indices. Genome Biol 2007; 7:R114. [PMID: 17156429 PMCID: PMC1794427 DOI: 10.1186/gb-2006-7-12-r114] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 07/28/2006] [Revised: 09/20/2006] [Accepted: 12/07/2006] [Indexed: 11/23/2022] Open
Abstract
The correlation of two methods for estimating codon adaptation indices applied to more than 300 bacterial species shows that codon usage preference provides an environmental signature by which it is possible to group bacteria according to their lifestyle. BACKGROUND: Codon adaptation indices (CAIs) represent an evolutionary strategy to modulate gene expression and have widely been used to predict potentially highly expressed genes within microbial genomes. Here, we evaluate and compare two very different methods for estimating CAI values, one corresponding to translational codon usage bias and the second obtained mathematically by searching for the most dominant codon bias. RESULTS: The level of correlation between these two CAI methods is a simple and intuitive measure of the degree of translational bias in an organism, and from this we confirm that fast replicating bacteria are more likely to have a dominant translational codon usage bias than are slow replicating bacteria, and that this translational codon usage bias may be used for prediction of highly expressed genes. By analyzing more than 300 bacterial genomes, as well as five fungal genomes, we show that codon usage preference provides an environmental signature by which it is possible to group bacteria according to their lifestyle, for instance soil bacteria and soil symbionts, spore formers, enteric bacteria, aquatic bacteria, and intercellular and extracellular pathogens. CONCLUSION: The results and the approach described here may be used to acquire new knowledge regarding species lifestyle and to elucidate relationships between organisms that are far apart evolutionarily.
Affiliation(s)
- Hanni Willenbrock
- Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Lyngby, Denmark
- Carsten Friis
- Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Lyngby, Denmark
- Agnieszka S Friis
- Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Lyngby, Denmark
- David W Ussery
- Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Lyngby, Denmark
|
50
|
Prediction of highly expressed genes in microbes based on chromatin accessibility. BMC Mol Biol 2007; 8:11. [PMID: 17295928 PMCID: PMC1805505 DOI: 10.1186/1471-2199-8-11] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 10/25/2006] [Accepted: 02/13/2007] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes. We compare these predictions with those based on codon adaptation index (CAI) values, and also with experimental data for 6 different microbial genomes, with a particular interest in experimental data from Escherichia coli. Moreover, position preference is examined further in 328 sequenced microbial genomes. RESULTS: We find that absolute gene expression levels are correlated with the position preference in many microbial genomes. It is postulated that in these regions, the DNA may be more accessible to the transcriptional machinery. Moreover, ribosomal proteins and ribosomal RNA are encoded by DNA having significantly lower position preference values than other genes in fast-replicating microbes. CONCLUSION: This insight into DNA structure-dependent gene expression in microbes may be exploited for predicting the expression of non-translated genes such as non-coding RNAs that may not be predicted by any of the conventional codon usage bias approaches.
|