1
Mazumder TH, Uddin A. Understanding the nucleotide composition and patterns of codon usage in the expression of human oral cancer genes. Mutat Res 2024; 829:111880. [PMID: 39197334 DOI: 10.1016/j.mrfmmm.2024.111880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 07/06/2024] [Accepted: 08/09/2024] [Indexed: 09/01/2024]
Abstract
Oral squamous cell carcinoma (OSCC), commonly known as oral cancer (OC), mostly occurs in the mouth, lips and tongue. Mutations in some genes cause OC, and some genes are risk factors for its progression. In this study, we analyzed the compositional features and codon usage patterns of genes involved in OC using computational methods, as no such work has been reported so far. Compositional features suggested that the overall GC content was higher, i.e. the genes were GC rich. Effective number of codons (ENC) values ranged from 34.6 to 55.9 with a mean of 49.03±4.22, representing low codon usage bias (CUB). Correspondence analysis (COA) suggested that the codon usage pattern differed among genes. In genes associated with OC, a highly significant correlation was observed between GC12 and GC3 (r=0.454, p<0.01), suggesting that directional mutation affected all three codon positions. This is the first report on the codon usage pattern of genes involved in OC, which not only offers a new perspective for elucidating the mechanisms of biased usage of synonymous codons but also provides valuable clues for molecular genetic engineering.
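The GC12 versus GC3 comparison mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, the toy sequences are invented for illustration, and GC3 is computed here over all codons, whereas published analyses often exclude Met, Trp and stop codons.

```python
from math import sqrt


def gc_by_position(cds):
    """Return (GC1, GC2, GC3): GC fraction at each codon position."""
    cds = cds.upper().replace("U", "T")
    n_codons = len(cds) // 3  # ignore any trailing partial codon
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(n_codons * 3):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return tuple(c / n_codons for c in counts)


def gc12_and_gc3(cds):
    """Return (GC12, GC3), where GC12 is the mean of GC1 and GC2."""
    gc1, gc2, gc3 = gc_by_position(cds)
    return (gc1 + gc2) / 2, gc3


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy coding sequences, invented for illustration only.
    genes = ["ATGGCCGCGTAA", "ATGAAATTTTAA", "ATGGGCCCGTGA"]
    pairs = [gc12_and_gc3(g) for g in genes]
    gc12_values = [p[0] for p in pairs]
    gc3_values = [p[1] for p in pairs]
    print(pearson_r(gc3_values, gc12_values))
```

In a neutrality-style reading, a strong positive correlation between GC12 and GC3 across genes is usually taken as a sign that mutational bias acts on all codon positions; a weak one points toward selection.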
Affiliation(s)
- Arif Uddin
- Departments of Zoology, Moinul Hoque Choudhury Memorial Science College, Algapur, Hailakandi, Assam 788150, India.
2
Codon Usage for Genetic Diversity, and Evolutionary Dynamics of Novel Porcine Parvoviruses 2 through 7 (PPV2–PPV7). Viruses 2022; 14:v14020170. [PMID: 35215764 PMCID: PMC8876854 DOI: 10.3390/v14020170] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/16/2021] [Revised: 01/13/2022] [Accepted: 01/14/2022] [Indexed: 02/06/2023] [Open Access]
Abstract
Porcine parvovirus (PPV) is the main pathogen of reproductive disorders. In recent years, new types of porcine parvovirus have been discovered and named porcine parvovirus 2 to 7 (PPV2–PPV7), and they are associated with porcine circovirus type 2 in pigs. Codon usage patterns and their effects on the evolution and host adaptation of different PPV sub-types are still largely unknown. Here, we define six main sub-types based on a Bayesian analysis of the structural proteins of each PPV sub-type, namely PPV2, PPV3, PPV4, PPV5, PPV6, and PPV7, which show different degrees of codon usage preference. The effective number of codons (ENC) indicates that all PPV sub-types have low codon bias. According to the codon adaptation index (CAI), PPV3 and PPV7 have the highest similarity with the host, which is related to the main popular tendency of the host in the field; according to the frequency of optimal codons (FOP), PPV7 has the highest frequency of optimal codons, indicating the most frequently used codons in its genes; and according to the relative codon deoptimization index (RCDI), PPV3 has a higher degree. Therefore, it is determined that mutational stress has a certain impact on the codon usage preference of PPV genes, and natural selection plays a decisive and dominant role in the codon usage pattern. Our research provides a new perspective on the evolution of PPV and may help provide a new method for future research on the origin, evolutionary model, and host adaptation of PPV.
3
Jiang S, Du Q, Feng C, Ma L, Zhang Z. CompoDynamics: a comprehensive database for characterizing sequence composition dynamics. Nucleic Acids Res 2022; 50:D962-D969. [PMID: 34718745 PMCID: PMC8728180 DOI: 10.1093/nar/gkab979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/15/2021] [Revised: 10/02/2021] [Accepted: 10/06/2021] [Indexed: 11/15/2022] [Open Access]
Abstract
Sequence compositions of nucleic acids and proteins have a significant impact on gene expression, RNA stability, translation efficiency, RNA/protein structure and molecular function, and are associated with genome evolution and adaptation across all kingdoms of life. Therefore, a dedicated resource of sequence compositions and associated features is crucial for a wide range of biological research. Here, we present CompoDynamics (https://ngdc.cncb.ac.cn/compodynamics/), a comprehensive database of sequence compositions of coding sequences (CDSs) and genomes for all kinds of species. Taking advantage of the exponential growth of RefSeq data, CompoDynamics presents a wealth of sequence compositions (nucleotide content, codon usage, amino acid usage) and derived features (coding potential, physicochemical property and phase separation) for 118 689 747 high-quality CDSs and 34 562 genomes across 24 995 species. Additionally, interactive analytical tools are provided to enable comparative analyses of sequence compositions and molecular features across different species and gene groups. Collectively, CompoDynamics has great potential to improve understanding of the underlying roles of sequence composition dynamics across genes and genomes, providing a fundamental resource in support of a broad spectrum of biological studies.
Affiliation(s)
- Shuai Jiang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- Qiang Du
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Changrui Feng
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Lina Ma
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
4
Nair RR, Mohan M, Rudramurthy GR, Vivekanandam R, Satheshkumar PS. Strategies and Patterns of Codon Bias in Molluscum Contagiosum Virus. Pathogens 2021; 10:1649. [PMID: 34959603 PMCID: PMC8703355 DOI: 10.3390/pathogens10121649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 12/14/2021] [Accepted: 12/16/2021] [Indexed: 11/22/2022] [Open Access]
Abstract
Trends associated with codon usage in molluscum contagiosum virus (MCV) and factors governing the evolution of codon usage have not been investigated so far. In this study, attempts were made to decipher the codon usage trends and discover the major evolutionary forces that influence the patterns of codon usage in MCV with special reference to sub-types 1 and 2, MCV-1 and MCV-2, respectively. Three hypotheses were tested: (1) codon usage patterns of MCV-1 and MCV-2 are identical; (2) SCUB (synonymous codon usage bias) patterns of MCV-1 and MCV-2 slightly deviate from that of human host to avoid affecting the fitness of host; and (3) translational selection predominantly shapes the SCUB of MCV-1 and MCV-2. Various codon usage indices viz. relative codon usage value, effective number of codons and codon adaptation index were calculated to infer the nature of codon usage. Correspondence analysis and correlation analysis were performed to assess the relative contribution of silent base contents and significance of codon usage indices in defining bias in codon usage. Among the tested hypotheses, only the second and third hypotheses were accepted.
Affiliation(s)
- Rahul Raveendran Nair
- Centre for Evolutionary Ecology, Aushmath Biosciences, Vadavalli Post, Coimbatore 641041, India
- Manikandan Mohan
- College of Pharmacy, University of Georgia, Athens, GA 30605, USA
- Reethu Vivekanandam
- Department of Biotechnology, Bharathiyar University, Coimbatore 641046, India
5
Abstract
Atypical porcine pestivirus (APPV) has been identified as the main causative agent for congenital tremor (CT) type A-II in piglets, which is threatening the health of the global swine herd. However, the evolution of APPV remains largely unknown. In this study, phylogenetic analysis showed that APPV could be divided into three phylogroups (I, II, and III). Phylogroups I and II included viral strains from China, while phylogroup III contained strains from Europe, North America, and Asia. Phylogroups I and II are tentatively thought to be of Chinese origin. Next, compositional property analysis revealed that nucleotide A and A-ending codons were used at high frequency in the APPV genome. Intriguingly, the analysis of preferred codons revealed that AGA[Arg] and AGG[Arg] were overrepresented. Dinucleotide CC was found to be overrepresented, and dinucleotide CG was underrepresented. Furthermore, it was found that the weak codon usage bias of APPV was mainly dominated by selection pressures versus mutational forces. The codon adaptation index (CAI), relative codon deoptimization index (RCDI), and similarity index (SiD) analyses showed that the codon usage patterns of phylogroups II and III were more similar to that of the pig than phylogroup I, suggesting that phylogroups II and III may be more adaptive to pigs. Overall, this study provides insights into APPV evolution through phylogeny and codon usage pattern analysis.
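Dinucleotide over- and underrepresentation of the kind reported here is commonly assessed with an observed/expected odds ratio, ρ(XY) = f(XY) / (f(X)·f(Y)). The sketch below illustrates that general statistic under stated assumptions and is not taken from the paper: the function name is hypothetical, and the cut-offs mentioned afterwards are conventional values that a given study may or may not use.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: observed/expected frequency ratio}.

    Expected frequency is the product of the two mononucleotide
    frequencies. Dinucleotides containing a base that is absent
    from the sequence are skipped.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two nucleotides")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx, fy = mono[x] / n, mono[y] / n
            if fx == 0 or fy == 0:
                continue
            fxy = di[x + y] / (n - 1)
            ratios[x + y] = fxy / (fx * fy)
    return ratios


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    for pair, rho in sorted(dinucleotide_odds_ratios("ACGTTGCAACCA").items()):
        print(pair, round(rho, 2))
```

Ratios well above 1 (a frequently used cut-off is about 1.23) are read as overrepresentation, and ratios well below 1 (about 0.78) as underrepresentation.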
Affiliation(s)
- Shuonan Pan
- College of Veterinary Medicine, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China
- Chunxiao Mou
- College of Veterinary Medicine, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Institute of Comparative Medicine, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Jiangsu Co-Innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China
- Huiguang Wu
- College of Veterinary Medicine, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Institute of Comparative Medicine, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Jiangsu Co-Innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Joint International Research Laboratory of Agriculture and Agri-Product Safety, the Ministry of Education of China, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China
- Zhenhai Chen
- College of Veterinary Medicine, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Institute of Comparative Medicine, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Jiangsu Co-Innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China; Joint International Research Laboratory of Agriculture and Agri-Product Safety, the Ministry of Education of China, Yangzhou University, Yangzhou, Jiangsu, People's Republic of China
6
Evolutionary Patterns of Codon Usage in Major Lineages of Porcine Reproductive and Respiratory Syndrome Virus in China. Viruses 2021; 13:v13061044. [PMID: 34072978 PMCID: PMC8228872 DOI: 10.3390/v13061044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Revised: 05/16/2021] [Accepted: 05/25/2021] [Indexed: 11/17/2022] [Open Access]
Abstract
Porcine reproductive and respiratory syndrome virus (PRRSV) is economically important and characterized by its extensive variation. The codon usage patterns and their influence on viral evolution and host adaptation among different PRRSV strains remain largely unknown. Here, the codon usage of ORF5 genes from lineages 1, 3, 5, and 8, and MLV strains of type 2 PRRSV in China was analyzed. A compositional property analysis of ORF5 genes revealed that nucleotide C is most frequently used at the third position of codons, accompanied by rich GC3s. The effective number of codons (ENC) and codon pair bias (CPB) values indicate that all ORF5 genes have low codon bias and that the differences in CPB scores among the four lineages are mostly not significant. When compared with host codon usage patterns, lineage 1 strains show higher CAI and SiD values, with a high similarity to pig, which might relate to its predominant epidemic propensity in the field. The CAI, RCDI, and SiD values of ORF5 genes from different passages of MLV JXA1R indicate no relation between attenuation and CPB or codon adaptation decrease during serial passage on non-host cells. These findings provide a novel way of understanding PRRSV evolution as related to viral survival, host adaptation, and virulence.
7
Yu J. From Mutation Signature to Molecular Mechanism in the RNA World: A Case of SARS-CoV-2. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:627-639. [PMID: 32739507 PMCID: PMC7391168 DOI: 10.1016/j.gpb.2020.07.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/31/2020] [Revised: 07/10/2020] [Accepted: 07/23/2020] [Indexed: 02/07/2023]
Affiliation(s)
- Jun Yu
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100190, China.
8
Yang RH, Su JH, Shang JJ, Wu YY, Li Y, Bao DP, Yao YJ. Evaluation of the ribosomal DNA internal transcribed spacer (ITS), specifically ITS1 and ITS2, for the analysis of fungal diversity by deep sequencing. PLoS One 2018; 13:e0206428. [PMID: 30359454 PMCID: PMC6201957 DOI: 10.1371/journal.pone.0206428] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 07/25/2018] [Accepted: 10/12/2018] [Indexed: 12/17/2022] [Open Access]
Abstract
The nuclear ribosomal DNA internal transcribed spacer (ITS) has been widely used to assess the fungal composition in different environments by deep sequencing. To evaluate the ITS in the analysis of fungal diversity, comparisons of the clustering and taxonomy generated by sequencing with different portions of the whole fragment were conducted in this study. For a total of 83,120 full-length ITS sequences obtained from the UNITE database, it was found that, on average, ITS1 varied more than ITS2 within the kingdom Fungi; this variation included length and GC content variations and polymorphisms, with some polymorphisms specific to particular fungal groups. The taxonomic accuracy for ITS was higher than that for ITS1 or ITS2. The commonly used operational taxonomic unit (OTU) for evaluating fungal diversity and richness assigned several species to a single OTU even with clustering at 99.00% sequence similarity. The clustering and taxonomic capacities did not differ between ITS1 and ITS2. However, the OTU commonality between ITS1 and ITS2 was very low. To test this observation further, 219,741 pyrosequencing reads, including 39,840 full-length ITS sequences, were obtained from 10 soil samples and were clustered into OTUs. The pyrosequencing results agreed with the results of the in silico analysis. ITS1 might overestimate the fungal diversity and richness. Analyses using ITS, ITS1 and ITS2 yielded several different taxa, and the taxonomic preferences for ITS and ITS2 were similar. The results demonstrated that ITS2 alone might be a more suitable marker for revealing the operational taxonomic richness and taxonomy specifics of fungal communities when the full-length ITS is not available.
Affiliation(s)
- Rui-Heng Yang
- Key Laboratory of Edible Fungal Resources and Utilization (South), National Engineering Research Center of Edible Fungi, Key Laboratory of Agricultural Genetics and Breeding of Shanghai, Institute of Edible Fungi, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Jin-He Su
- Computer Engineering College, Jimei University, Xiamen, China
- Jun-Jun Shang
- Key Laboratory of Edible Fungal Resources and Utilization (South), National Engineering Research Center of Edible Fungi, Key Laboratory of Agricultural Genetics and Breeding of Shanghai, Institute of Edible Fungi, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Ying-Ying Wu
- Key Laboratory of Edible Fungal Resources and Utilization (South), National Engineering Research Center of Edible Fungi, Key Laboratory of Agricultural Genetics and Breeding of Shanghai, Institute of Edible Fungi, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Yan Li
- Key Laboratory of Edible Fungal Resources and Utilization (South), National Engineering Research Center of Edible Fungi, Key Laboratory of Agricultural Genetics and Breeding of Shanghai, Institute of Edible Fungi, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Da-Peng Bao
- Key Laboratory of Edible Fungal Resources and Utilization (South), National Engineering Research Center of Edible Fungi, Key Laboratory of Agricultural Genetics and Breeding of Shanghai, Institute of Edible Fungi, Shanghai Academy of Agricultural Sciences, Shanghai, China
- * E-mail: (YJY); (DPB)
- Yi-Jian Yao
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- * E-mail: (YJY); (DPB)
9
Paul P, Malakar AK, Chakraborty S. Codon usage vis-a-vis start and stop codon context analysis of three dicot species. J Genet 2018. [DOI: 10.1007/s12041-018-0892-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022]
10
Zhao Y, Zheng H, Xu A, Yan D, Jiang Z, Qi Q, Sun J. Analysis of codon usage bias of envelope glycoprotein genes in nuclear polyhedrosis virus (NPV) and its relation to evolution. BMC Genomics 2016; 17:677. [PMID: 27558469 PMCID: PMC4997668 DOI: 10.1186/s12864-016-3021-7] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 06/02/2016] [Accepted: 08/16/2016] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Analysis of codon usage bias is an extremely versatile method used in furthering the understanding of the genetic and evolutionary paths of species. Codon usage bias of envelope glycoprotein genes in nuclear polyhedrosis virus (NPV) has remained largely unexplored. Hence, the codon usage bias of NPV envelope glycoprotein was analyzed here to reveal the genetic and evolutionary relationships between different viral species in the baculovirus genus.
RESULTS: A total of 9236 codons from 18 different species of NPV of the baculovirus genus were used to perform this analysis. Glycoprotein of NPV exhibits weak codon usage bias. Neutrality plot analysis and correlation analysis of effective number of codons (ENC) values indicate that natural selection is the main factor influencing codon usage bias, and that the impact of mutation pressure is relatively smaller. Cluster analysis further shows that the evolutionary relationships of these viral species can be divided into two broad categories, although all 18 species are from the same baculovirus genus.
CONCLUSIONS: Many elements can affect codon bias, such as the composition of amino acids, mutation pressure, natural selection, gene expression level, etc. In addition, cluster analysis illustrates that codon usage bias of virus envelope glycoprotein can serve as an effective means of evolutionary classification in the baculovirus genus.
Affiliation(s)
- Yongchao Zhao
- Subtropical Sericulture and Mulberry Resources Protection and Safety Engineering Research Center, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China
- Hao Zheng
- Subtropical Sericulture and Mulberry Resources Protection and Safety Engineering Research Center, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China
- Anying Xu
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang Jiangsu, 212018, People's Republic of China
- Donghua Yan
- Subtropical Sericulture and Mulberry Resources Protection and Safety Engineering Research Center, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China
- Zijian Jiang
- Subtropical Sericulture and Mulberry Resources Protection and Safety Engineering Research Center, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China
- Qi Qi
- Subtropical Sericulture and Mulberry Resources Protection and Safety Engineering Research Center, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China
- Jingchen Sun
- Subtropical Sericulture and Mulberry Resources Protection and Safety Engineering Research Center, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China.
11
Sun S, Xiao J, Zhang H, Zhang Z. Pangenome Evidence for Higher Codon Usage Bias and Stronger Translational Selection in Core Genes of Escherichia coli. Front Microbiol 2016; 7:1180. [PMID: 27536275 PMCID: PMC4971109 DOI: 10.3389/fmicb.2016.01180] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/22/2016] [Accepted: 07/18/2016] [Indexed: 11/25/2022] [Open Access]
Abstract
Codon usage bias, as a combined interplay of mutation and selection, has been intensively studied in Escherichia coli. However, codon usage in the E. coli pangenome remains unexplored, and the relative importance of mutation and selection acting on core genes and strain-specific genes is unknown. Here we perform comprehensive codon usage analyses based on a collection of multiple complete genome sequences of E. coli. Our results show that core genes that are present in all strains have higher codon usage bias than strain-specific genes that are unique to single strains. We further explore the forces influencing codon usage and investigate how the major force differs between core and strain-specific genes. Our results demonstrate that although mutation may exert genome-wide influences on codon usage, acting similarly in different gene sets, selection dominates as an important force shaping biased codon usage as genes are present in an increasing number of strains. Together, our results provide important insights for better understanding genome plasticity and complexity as well as the evolutionary mechanisms behind codon usage bias.
Affiliation(s)
- Shixiang Sun
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jingfa Xiao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Huiyong Zhang
- College of Life Sciences, Henan Agricultural University, Zhengzhou, China
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
12
Apostolou-Karampelis K, Nikolaou C, Almirantis Y. A novel skew analysis reveals substitution asymmetries linked to genetic code GC-biases and PolIII α-subunit isoforms. DNA Res 2016; 23:353-63. [PMID: 27345720 PMCID: PMC4991834 DOI: 10.1093/dnares/dsw021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/11/2016] [Accepted: 05/09/2016] [Indexed: 11/30/2022] [Open Access]
Abstract
Strand biases reflect deviations from a null expectation of DNA evolution that assumes strand-symmetric substitution rates. Here, we present strong evidence that nearest-neighbour preferences are a strand-biased feature of bacterial genomes, indicating neighbour-dependent substitution asymmetries. To detect such asymmetries we introduce an alignment free index (relative abundance skews). The profiles of relative abundance skews along coding sequences can trace the phylogenetic relations of bacteria, suggesting that the patterns of neighbour-dependent substitution strand-biases are not common among different lineages, but are rather species-specific. Analysis of neighbour-dependent and codon-site skews sheds light on the origins of substitution asymmetries. Via a simple model we argue that the structure of the genetic code imposes position-dependent substitution strand-biases along coding sequences, as a response to GC mutation pressure. Thus, the organization of the genetic code per se can lead to an uneven distribution of nucleotides among different codon sites, even when requirements for specific codons and amino-acids are not accounted for. Moreover, our results suggest that strand-biases in replication fidelity of PolIII α-subunit induce substitution asymmetries, both neighbour-dependent and independent, on a genome scale. The role of DNA repair systems, such as transcription-coupled repair, is also considered.
Affiliation(s)
- Christoforos Nikolaou
- Computational Genomics Group, Department of Biology, University of Crete, 71409 Heraklion, Greece
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece
13
Perez-Rueda E, Ibarra JA. Distribution of putative xenogeneic silencers in prokaryote genomes. Comput Biol Chem 2015; 58:167-72. [PMID: 26247404 DOI: 10.1016/j.compbiolchem.2015.06.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/05/2015] [Revised: 06/05/2015] [Accepted: 06/27/2015] [Indexed: 12/30/2022]
Abstract
Gene silencing is an important function as it keeps newly acquired foreign DNA repressed, thereby avoiding possible deleterious effects in the host organism. Known transcriptional regulators associated with this process are called xenogeneic silencers (XS) and belong to the H-NS, Lsr2, MvaT or Rok families. In the work described here, we looked for XS-like regulators and evaluated their distribution in prokaryotic organisms. Our analysis showed that putative XS regulators similar to H-NS, Lsr2, MvaT or Rok are present only in bacteria (31.7%). This does not exclude the existence of alternative XS in the rest of the organisms analyzed. Additionally, of the four XS groups evaluated in this work, those from the H-NS family have diversified more than the other groups. In order to compare the distribution of these putative XS regulators, we also searched for other nucleoid-associated proteins (NAPs) not included in this group, such as Fis, EbfC/YbaB, HU/IHF and Alba. Results showed that NAPs from the Fis, EbfC/YbaB, HU/IHF and Alba families are widely (94%) distributed among prokaryotes. These NAPs were found in multiple combinations with or without XS-like proteins. With regard to XS regulators, results showed that only XS proteins from one family were found in those organisms containing them. This suggests specificity for this type of regulator and their corresponding genomes.
Affiliation(s)
- Ernesto Perez-Rueda
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología UNAM, Av. Universidad 2001, Cuernavaca, Morelos CP 62210, Mexico; Unidad Multidisciplinaria de Docencia e Investigación, Sisal Facultad de Ciencias, Sisal, Yucatán, UNAM, Mexico
- J Antonio Ibarra
- Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Prol. de Carpio y Plan de Ayala. Col. Sto. Tomás, Distrito Federal, CP 11340, Mexico.
14
Wang XC, Liu C, Huang L, Bengtsson-Palme J, Chen H, Zhang JH, Cai D, Li JQ. ITS1: a DNA barcode better than ITS2 in eukaryotes? Mol Ecol Resour 2014; 15:573-86. [PMID: 25187125 DOI: 10.1111/1755-0998.12325] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 08/25/2014] [Accepted: 08/27/2014] [Indexed: 11/30/2022]
Abstract
A DNA barcode is a short piece of DNA sequence used for species determination and discovery. The internal transcribed spacer (ITS/ITS2) region has been proposed as the standard DNA barcode for fungi and seed plants and has been widely used in DNA barcoding analyses for other biological groups, for example algae, protists and animals. The ITS region consists of both ITS1 and ITS2 regions. Here, a large-scale meta-analysis was carried out to compare ITS1 and ITS2 from three aspects: PCR amplification, DNA sequencing and species discrimination, in terms of the presence of DNA barcoding gaps, species discrimination efficiency, sequence length distribution, GC content distribution and primer universality. In total, 85 345 sequence pairs in 10 major groups of eukaryotes, including ascomycetes, basidiomycetes, liverworts, mosses, ferns, gymnosperms, monocotyledons, eudicotyledons, insects and fishes, covering 611 families, 3694 genera, and 19 060 species, were analysed. Using similarity-based methods, we calculated species discrimination efficiencies for ITS1 and ITS2 in all major groups, families and genera. Using Fisher's exact test, we found that ITS1 has significantly higher efficiencies than ITS2 in 17 of the 47 sample-rich families and 20 of the 49 sample-rich genera. By in silico PCR amplification evaluation, primer universality of the extensively applied ITS1 primers was found to be superior to that of ITS2 primers. Additionally, shorter amplification product length and lower GC content were found to be two further advantages of ITS1 for sequencing. In summary, ITS1 represents a better DNA barcode than ITS2 for eukaryotic species.
Affiliation(s)
- Xin-Cun Wang
- Institute of Medicinal Plant Development, Chinese Academy of Medical Science, 151 MaLianWa North Road, Beijing, 100193, China
|
15
|
Zhang Z, Yu J. Does the genetic code have a eukaryotic origin? GENOMICS PROTEOMICS & BIOINFORMATICS 2013; 11:41-55. [PMID: 23402863 PMCID: PMC4357656 DOI: 10.1016/j.gpb.2013.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/15/2012] [Revised: 01/09/2013] [Accepted: 01/11/2013] [Indexed: 11/29/2022]
Abstract
In the RNA world, RNA is assumed to be the dominant macromolecule performing most, if not all, core "house-keeping" functions. The ribo-cell hypothesis suggests that the genetic code and the translation machinery may both be born of the RNA world, and that DNA, once introduced to ribo-cells, may have gradually taken over the informational role of RNA, providing a mature set of genetic code and a mechanism enabling stable inheritance of sequence and its variation. In this context, we modeled the genetic code in two content variables, GC and purine contents, of protein-coding sequences and measured the purine content sensitivities for each codon when the sensitivity (% usage) is plotted as a function of GC content variation. The analysis leads to a new pattern, the symmetric pattern, where the sensitivity of purine content variation shows diagonal symmetry in the codon table, more significantly in the two GC content invariable quarters, in addition to the two existing patterns where the table is divided into either four GC content sensitivity quarters or two amino acid diversity halves. The most insensitive codon sets are GUN (valine) and CAN (CAR for asparagine and CAY for aspartic acid) and the most biased amino acid is valine (always over-estimated) followed by alanine (always under-estimated). The unique position of valine and its codons suggests its key roles in the final recruitment of the complete codon set of the canonical table. The distinct choice may only be attributable to sequence signatures or signals of splice sites for spliceosomal introns shared by all extant eukaryotes.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
|
16
|
Zhang Z, Yu J. The pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids. GENOMICS PROTEOMICS & BIOINFORMATICS 2012; 10:175-80. [PMID: 23084772 PMCID: PMC5054704 DOI: 10.1016/j.gpb.2012.08.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/31/2012] [Accepted: 08/02/2012] [Indexed: 12/29/2022]
Abstract
The genetic code serves as one of the natural links for life’s two conceptual frameworks—the informational and operational tracks—bridging the nucleotide sequence of DNA and RNA to the amino acid sequence of protein and thus its structure and function. On the informational track, DNA and its four building blocks have four basic variables: order, length, GC and purine contents; the latter two exhibit unique characteristics in prokaryotic genomes where protein-coding sequences dominate. Bridging the two tracks, tRNAs and their aminoacyl tRNA synthases that interpret each codon—nucleotide triplet, together with ribosomes, form a complex machinery that translates genetic information encoded on the messenger RNAs into proteins. On the operational track, proteins are selected in a context of cellular and organismal functions constantly. The principle of such a functional selection is to minimize the damage caused by sequence alteration in a seemingly random fashion at the nucleotide level and its function-altering consequence at the protein level; the principle also suggests that there must be complex yet sophisticated mechanisms to protect molecular interactions and cellular processes for cells and organisms from the damage in addition to both immediate or short-term eliminations and long-term selections. The two-century study of selection at species and population levels has been leading a way to understand rules of inheritance and evolution at molecular levels along the informational track, while ribogenomics, epigenomics and other operationally-defined omics (such as the metabolite-centric metabolomics) have been ushering biologists into the new millennium along the operational track.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
|
17
|
Wu H, Qu H, Wan N, Zhang Z, Hu S, Yu J. Strand-biased gene distribution in bacteria is related to both horizontal gene transfer and strand-biased nucleotide composition. GENOMICS PROTEOMICS & BIOINFORMATICS 2012; 10:186-96. [PMID: 23084774 PMCID: PMC5054707 DOI: 10.1016/j.gpb.2012.08.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/19/2012] [Accepted: 07/29/2012] [Indexed: 11/18/2022]
Abstract
Although strand-biased gene distribution (SGD) was described some two decades ago, the underlying molecular mechanisms and their relationship remain elusive. Its facets include, but are not limited to, the degree of biases, the strand-preference of genes, and the influence of background nucleotide composition variations. Using a dataset composed of 364 non-redundant bacterial genomes, we sought to illustrate our current understanding of SGD. First, when we divided the collection of bacterial genomes into non-polC and polC groups according to their possession of DnaE isoforms that correlate closely with taxonomy, the SGD of the polC group stood out more significantly than that of the non-polC group. Second, when examining horizontal gene transfer, coupled with gene functional conservation (essentiality) and expressivity (level of expression), we realized that they all contributed to SGD. Third, we further demonstrated a weaker G-dominance on the leading strand of the non-polC group but strong purine dominance (both G and A) on the leading strand of the polC group. We propose that strand-biased nucleotide composition plays a decisive role for SGD since the polC-bearing genomes are not only AT-rich but also have pronounced purine-rich leading strands, and we believe that a special mutation spectrum that leads to a strong purine asymmetry and a strong strand-biased nucleotide composition coupled with functional selections for genes and their functions are both at work.
|
18
|
Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J. Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 2012; 13:43. [PMID: 22435713 PMCID: PMC3368730 DOI: 10.1186/1471-2105-13-43] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 11/29/2011] [Accepted: 03/22/2012] [Indexed: 02/07/2023] Open
Abstract
Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results: Here we propose a novel measure, the Codon Deviation Coefficient (CDC), that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
Affiliation(s)
- Zhang Zhang
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
19
|
Wu H, Zhang Z, Hu S, Yu J. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct 2012; 7:2. [PMID: 22230424 PMCID: PMC3274465 DOI: 10.1186/1745-6150-7-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 08/12/2011] [Accepted: 01/10/2012] [Indexed: 12/02/2022] Open
Abstract
Background: As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Results: Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Conclusion: Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2) play dominant roles in determining GC variability. Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years. Reviewers: This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin.
Affiliation(s)
- Hao Wu
- James D Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310007, China
|
20
|
Fang Y, Li Z, Liu J, Shu C, Wang X, Zhang X, Yu X, Zhao D, Liu G, Hu S, Zhang J, Al-Mssallem I, Yu J. A pangenomic study of Bacillus thuringiensis. J Genet Genomics 2011; 38:567-76. [PMID: 22196399 DOI: 10.1016/j.jgg.2011.11.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/24/2011] [Revised: 10/25/2011] [Accepted: 11/09/2011] [Indexed: 11/28/2022]
Abstract
Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates in both high coverage and base-quality using the next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared to the pangenomes of its closely related species of the same genus, B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5% and the hotspots for the genome variations were discovered in genomic regions of 2.3-2.8Mb and 5.0-5.6Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.
Affiliation(s)
- Yongjun Fang
- James D. Watson Institute of Genome Sciences, College of Life Science, Zhejiang University, Hangzhou 310058, China
|
21
|
Zhang Z, Yu J. On the organizational dynamics of the genetic code. GENOMICS PROTEOMICS & BIOINFORMATICS 2011; 9:21-9. [PMID: 21641559 PMCID: PMC5054158 DOI: 10.1016/s1672-0229(11)60004-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 08/30/2010] [Accepted: 10/26/2010] [Indexed: 11/23/2022]
Abstract
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides—adenine, thymine, guanine and cytosine—according to their emergence in evolution, and apply the organizational rules to devising an algebraic representation for the canonical genetic code. Under a framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with its intrinsic structure and classification schemes as well as amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes were correlated closely with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is of great usefulness in deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.
Affiliation(s)
- Zhang Zhang
- Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
22
|
Zhang Z, Yu J. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biol Direct 2010; 5:63. [PMID: 21059261 PMCID: PMC2989939 DOI: 10.1186/1745-6150-5-63] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/16/2010] [Accepted: 11/08/2010] [Indexed: 12/03/2022] Open
Abstract
Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge. Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft.
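The two compositional parameters named in this abstract, GC content and purine content at each of the three codon positions, can be measured from a coding sequence with a few lines of Python. The sketch below only computes the observed values; it is not the authors' theoretical models, and the function name `positional_content` is a hypothetical helper introduced here for illustration.

```python
from collections import Counter


def positional_content(cds: str) -> dict:
    """Return observed GC and purine (A+G) fractions at codon positions 1, 2, 3.

    Assumes `cds` is an in-frame coding sequence; a trailing partial codon
    is ignored and ambiguous bases (e.g. N) are skipped.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    usable = len(seq) - len(seq) % 3     # drop a trailing partial codon
    result = {}
    for pos in range(3):
        # Every third base starting at this codon position.
        counts = Counter(b for b in seq[pos:usable:3] if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            result[pos + 1] = {"gc": 0.0, "purine": 0.0}
            continue
        result[pos + 1] = {
            "gc": (counts["G"] + counts["C"]) / total,
            "purine": (counts["A"] + counts["G"]) / total,
        }
    return result


if __name__ == "__main__":
    # Codons: ATG GCC AAA
    for position, values in positional_content("ATGGCCAAA").items():
        print(position, values)
```

Keeping the three positions separate matters because, as the abstract notes, mutation and selection act differently at each codon position, so a single whole-sequence GC value would hide that signal.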
Affiliation(s)
- Zhang Zhang
- Plant Stress Genomics Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
|
23
|
Qu H, Wu H, Zhang T, Zhang Z, Hu S, Yu J. Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes. Res Microbiol 2010; 161:838-46. [PMID: 20868744 DOI: 10.1016/j.resmic.2010.09.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 06/27/2010] [Accepted: 08/03/2010] [Indexed: 11/15/2022]
Abstract
Nucleotide compositional asymmetry (NCA) between leading and lagging strands (LeS and LaS) is dynamic and diverse among eubacterial genomes due to different mutation and selection forces. A thorough investigation is needed in order to study the relationship between nucleotide composition dynamics and gene distribution biases. Based on a collection of 364 eubacterial genomes that were grouped according to a DnaE-based scheme (DnaE1-DnaE1, DnaE2-DnaE1, and DnaE3-PolC), we investigated NCA and nucleotide composition gradients at three codon positions and found that there was universal G-enrichment on LeS among all groups. This was due to a strong selection for G-heading (codon position 1, or cp1) codons and mutation pressure that led to more G-ending (cp3) codons. Moreover, a slight T-enrichment of LeS due to the mutation of cytosine deamination at cp3 was universal among DnaE1-DnaE1 and DnaE2-DnaE1 genomes, but was not clearly seen among DnaE3-PolC genomes, in which A-enrichment of LeS was proposed to be the effect of selections unique to polC and a mutation bias toward A-richness at cp1 that may be a result of transcription-coupled DNA repair mechanisms. Furthermore, strand-biased gene distribution enhances the purine-richness of LeS for DnaE3-PolC genomes and T-richness of LeS for DnaE1-DnaE1 and DnaE2-DnaE1 genomes.
Affiliation(s)
- Hongzhu Qu
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
|
24
|
Comparative analysis of acidobacterial genomic fragments from terrestrial and aquatic metagenomic libraries, with emphasis on acidobacteria subdivision 6. Appl Environ Microbiol 2010; 76:6769-77. [PMID: 20729323 DOI: 10.1128/aem.00343-10] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022] Open
Abstract
The bacterial phylum Acidobacteria has a widespread distribution and is one of the most common and diverse phyla in soil habitats. However, members of this phylum have often been recalcitrant to cultivation methods, hampering the study of this presumably important bacterial group. In this study, we used a cultivation-independent metagenomic approach to recover genomic information from soilborne members of this phylum. A soil metagenomic fosmid library was screened by PCR targeting acidobacterial 16S rRNA genes, facilitating the recovery of 17 positive clones. Recovered inserts appeared to originate from a range of Acidobacteria subdivisions, with dominance of subdivision 6 (10 clones). Upon full-length insert sequencing, gene annotation identified a total of 350 open reading frames (ORFs), representing a broad range of functions. Remarkably, six inserts from subdivision 6 contained a region of gene synteny, containing genes involved in purine de novo biosynthesis and encoding tRNA synthetase and conserved hypothetical proteins. Similar genomic regions had previously been observed in several environmental clones recovered from soil and marine sediments, facilitating comparisons with respect to gene organization and evolution. Comparative analyses revealed a general dichotomy between marine and terrestrial genes in both phylogeny and G+C content. Although the significance of this homologous gene cluster across subdivision 6 members is not known, it appears to be a common feature within a large percentage of all acidobacterial genomic fragments recovered from both of these environments.
|
25
|
Khrustalev VV, Barkovsky EV. The level of cytosine is usually much higher than the level of guanine in two-fold degenerated sites from third codon positions of genes from Simplex- and Varicelloviruses with G+C higher than 50%. J Theor Biol 2010; 266:88-98. [PMID: 20600145 DOI: 10.1016/j.jtbi.2010.06.023] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/04/2010] [Revised: 05/05/2010] [Accepted: 06/15/2010] [Indexed: 11/26/2022]
Abstract
We studied usage of cytosine and guanine in 914 genes from completely sequenced genomes of five Simplex- and seven Varicelloviruses. In genes with total GC-content higher than 50% usage of cytosine is usually higher than usage of guanine (an average difference for genes with G+C higher than 70% reaches 4.0%). This difference is caused mostly by the elevated usage of cytosine in two-fold degenerated sites situated in third codon positions relatively to the usage of guanine in two-fold degenerated sites situated in third codon positions (an average difference for genes with G+C higher than 70% is equal to 28.2%). The usage of amino acids that are encoded by codons containing cytosine in two-fold degenerated sites situated in third codon positions (AA2TC) is much higher than the usage of amino acids encoded by codons containing guanine in two-fold degenerated sites situated in third codon positions (AA2AG). The usage of AA2AG declines much more steeply with the growth of GC-content than the usage of AA2TC. This effect is the consequence of the nature of genetic code and of the negative selection. In GC-rich genes the usage of cytosine in four-fold degenerated sites is only a little (but significantly) higher than the usage of guanine (in genes with G+C higher than 70% an average difference is equal to 4.3%). This difference may be caused by transcription-associated mutational pressure.
Affiliation(s)
- Vladislav Victorovich Khrustalev
- Department of General Chemistry, Belarussian State Medical University, Communisticheskaya 7-24, Dzerzinskogo 83, Minsk 220029, Belarus.
|
26
|
Variable correlation of genome GC% with transfer RNA number as well as with transfer RNA diversity among bacterial groups: α-Proteobacteria and Tenericutes exhibit strong positive correlation. Microbiol Res 2010; 165:232-42. [DOI: 10.1016/j.micres.2009.05.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 03/20/2009] [Revised: 05/04/2009] [Accepted: 05/15/2009] [Indexed: 11/21/2022]
|
27
|
Xiao JF, Yu J. A scenario on the stepwise evolution of the genetic code. GENOMICS PROTEOMICS & BIOINFORMATICS 2008; 5:143-51. [PMID: 18267295 PMCID: PMC5054201 DOI: 10.1016/s1672-0229(08)60001-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
It is believed that in the RNA world the operational (ribozymes) and the informational (riboscripts) RNA molecules were created with only three (adenosine, uridine, and guanosine) and two (adenosine and uridine) nucleosides, respectively, so that the genetic code started uncomplicated. Ribozymes subsequently evolved to be able to cut and paste themselves and riboscripts were acceptive to rigorous editing (adenosine to inosine); the intensive diversification of RNA molecules shaped novel cellular machineries that are capable of polymerizing amino acids—a new type of cellular building materials for life. Initially, the genetic code, encoding seven amino acids, was created only to distinguish purine and pyrimidine; it was later expanded in a stepwise way to encode 12, 15, and 20 amino acids through the relief of guanine from its roles as operational signals and through the recruitment of cytosine. Therefore, the maturation of the genetic code also coincided with (1) the departure of aminoacyl-tRNA synthetases (AARSs) from the primordial translation machinery, (2) the replacement of informational RNA by DNA, and (3) the co-evolution of AARSs and their cognate tRNAs. This model predicts gradual replacements of RNA-made molecular mechanisms, cellular processes by proteins, and informational exploitation by DNA.
Affiliation(s)
- Jing-Fa Xiao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
|
28
|
Abstract
The codon table for the canonical genetic code can be rearranged in such a way that the code is divided into four quarters and two halves according to the variability of their GC and purine contents, respectively. For prokaryotic genomes, when the genomic GC content increases, their amino acid contents tend to be restricted to the GC-rich quarter and the purine-content insensitive half, where all codons are fourfold degenerate and relatively mutation-tolerant. Conversely, when the genomic GC content decreases, most of the codons retract to the AU-rich quarter and the purine-content sensitive half; most of these codons not only continue to encode physicochemically diversified amino acids but also change their encoded amino acid when a transversion (between purine and pyrimidine) happens. Amino acids with sixfold-degenerate codons are distributed into all four quarters and across the two halves; their fourfold-degenerate codons are all partitioned into the purine-insensitive half in favor of robustness against mutations. The features manifested in the rearranged codon table explain most of the intrinsic relationship between protein coding sequences (the informational content) and amino acid compositions (the functional content). The renovated codon table is useful in predicting abundant amino acids and positioning the amino acids with related or distinct physicochemical properties.
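The claim that every codon in the GC-rich quarter is fourfold degenerate can be checked directly against the standard genetic code. The sketch below assumes the quarters are defined by whether the first two codon positions carry G/C or A/T (written with T in place of U); that partition, the quarter labels, and the helper names are assumptions made here for illustration, not definitions taken from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def quarter(prefix: str) -> str:
    """Assign a two-base codon prefix to a quarter (assumed definition)."""
    first, second = (base in "GC" for base in prefix)
    if first and second:
        return "GC-rich"
    if not first and not second:
        return "AT-rich"
    return "GC at position 1" if first else "GC at position 2"


def fourfold_boxes_by_quarter() -> dict:
    """Map each quarter to (fourfold-degenerate boxes, total boxes)."""
    summary = {}
    for pair in product(BASES, repeat=2):
        prefix = "".join(pair)
        # A box is fourfold degenerate when all four third bases give one amino acid.
        encoded = {CODON_TABLE[prefix + third] for third in BASES}
        fourfold, total = summary.get(quarter(prefix), (0, 0))
        summary[quarter(prefix)] = (fourfold + (len(encoded) == 1), total + 1)
    return summary


if __name__ == "__main__":
    for name, (fourfold, total) in fourfold_boxes_by_quarter().items():
        print(f"{name}: {fourfold} of {total} boxes are fourfold degenerate")
```

Under this partition the GC-rich quarter holds the CCN, CGN, GCN and GGN boxes (proline, arginine, alanine, glycine), all fourfold degenerate, while none of the four AT-rich boxes is.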
Affiliation(s)
- Jun Yu
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China.
|