1
Qin J, Ma Y, Liu Y, Wang Y. Phylogenomic analysis and dynamic evolution of chloroplast genomes of Clematis nannophylla. Sci Rep 2024; 14:15109. [PMID: 38956388 PMCID: PMC11220099 DOI: 10.1038/s41598-024-65154-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 06/17/2024] [Indexed: 07/04/2024]
Abstract
Clematis nannophylla is a perennial shrub of Clematis with ecological, ornamental, and medicinal value, distributed in the arid and semi-arid areas of northwest China. This study successfully determined the chloroplast (cp) genome of C. nannophylla, reconstructing a phylogenetic tree of Clematis. This cp genome is 159,801 bp in length and has a typical tetrad structure, including a large single-copy, a small single-copy, and a pair of reverse repeats (IRa and IRb). It contains 133 unique genes, including 89 protein-coding, 36 tRNA, and 8 rRNA genes. Additionally, 66 simple repeat sequences, 50 dispersed repeats, and 24 tandem repeats were found; many of the dispersed and tandem repeats were between 20-30 bp and 10-20 bp, respectively, and the abundant repeats were located in the large single copy region. The cp genome was relatively conserved, especially in the IR region, where no inversion or rearrangement was observed, further revealing that the coding regions were more conserved than the noncoding regions. Phylogenetic analysis showed that C. nannophylla is more closely related to C. fruticosa and C. songorica. Our analysis provides reference data for molecular marker development, phylogenetic analysis, population studies, and cp genome processes to better utilise C. nannophylla.
Affiliation(s)
- Jinping Qin
  - College of Animal Husbandry and Veterinary Science, Qinghai University, Xining, 810016, Qinghai, China
- Yushou Ma
  - College of Animal Husbandry and Veterinary Science, Qinghai University, Xining, 810016, Qinghai, China
- Ying Liu
  - College of Animal Husbandry and Veterinary Science, Qinghai University, Xining, 810016, Qinghai, China
- Yanlong Wang
  - College of Animal Husbandry and Veterinary Science, Qinghai University, Xining, 810016, Qinghai, China
2
Moeckel C, Zaravinos A, Georgakopoulos-Soares I. Strand asymmetries across genomic processes. Comput Struct Biotechnol J 2023; 21:2036-2047. [PMID: 36968020 PMCID: PMC10030826 DOI: 10.1016/j.csbj.2023.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 03/08/2023] [Accepted: 03/08/2023] [Indexed: 03/12/2023]
Abstract
Across biological systems, a number of genomic processes, including transcription, replication, DNA repair, and transcription factor binding, display intrinsic directionalities. These directionalities are reflected in the asymmetric distribution of nucleotides, motifs, genes, transposon integration sites, and other functional elements across the two complementary strands. Strand asymmetries, including GC skews and mutational biases, have shaped the nucleotide composition of diverse organisms. The investigation of strand asymmetries often serves as a method to understand underlying biological mechanisms, including protein binding preferences, transcription factor interactions, retrotransposition, DNA damage and repair preferences, transcription-replication collisions, and mutagenesis mechanisms. Research into this subject also enables the identification of functional genomic sites, such as replication origins and transcription start sites. Improvements in our ability to detect and quantify DNA strand asymmetries will provide insights into diverse functionalities of the genome, the contribution of different mutational mechanisms in germline and somatic mutagenesis, and our knowledge of genome instability and evolution, which all have significant clinical implications in human disease, including cancer. In this review, we describe key developments that have been made across the field of genomic strand asymmetries, as well as the discovery of associated mechanisms.
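One of the strand asymmetries named in this abstract, GC skew, is commonly quantified as (G − C)/(G + C) over sliding windows. The following is a minimal sketch of that standard definition, not code from the review; the function names and the toy sequence are assumptions made for illustration.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> list:
    """GC skew of consecutive windows of length `window`, advancing by `step`."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def cumulative(values: list) -> list:
    """Running sum; extrema of cumulative skew are often read as switch points."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "GGGGCGGGAT" * 5 + "CCCCGCCCAT" * 5
skews = windowed_gc_skew(toy, window=10, step=10)
cum = cumulative(skews)
switch_window = cum.index(max(cum))  # window where the skew changes sign
```

In the toy input the cumulative skew peaks at the last G-rich window, which is the kind of sign change that is typically used to suggest a replication origin or terminus in bacterial genomes.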
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Apostolos Zaravinos
  - Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus
  - Cancer Genetics, Genomics and Systems Biology laboratory, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
3
Murat P, Perez C, Crisp A, van Eijk P, Reed SH, Guilbaud G, Sale JE. DNA replication initiation shapes the mutational landscape and expression of the human genome. Science Advances 2022; 8:eadd3686. [PMID: 36351018 PMCID: PMC9645720 DOI: 10.1126/sciadv.add3686] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/07/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
The interplay between active biological processes and DNA repair is central to mutagenesis. Here, we show that the ubiquitous process of replication initiation is mutagenic, leaving a specific mutational footprint at thousands of early and efficient replication origins. The observed mutational pattern is consistent with two distinct mechanisms, reflecting the two-step process of origin activation, triggering the formation of DNA breaks at the center of origins and local error-prone DNA synthesis in their immediate vicinity. We demonstrate that these replication initiation-dependent mutational processes exert an influence on phenotypic diversity in humans that is disproportionate to the origins' genomic size: By increasing mutational loads at gene promoters and splice junctions, the presence of an origin significantly influences both gene expression and mRNA isoform usage. Last, we show that mutagenesis at origins not only drives the evolution of origin sequences but also contributes to sculpting regulatory domains of the human genome.
Affiliation(s)
- Pierre Murat
  - Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Consuelo Perez
  - Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Alastair Crisp
  - Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Patrick van Eijk
  - Broken String Biosciences Ltd., BioData Innovation Centre, Unit AB3-03, Level 3, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
  - Division of Cancer & Genetics School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Simon H. Reed
  - Broken String Biosciences Ltd., BioData Innovation Centre, Unit AB3-03, Level 3, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
  - Division of Cancer & Genetics School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Guillaume Guilbaud
  - Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Julian E. Sale
  - Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
4
Guilbaud G, Murat P, Wilkes HS, Lerner LK, Sale JE, Krude T. Determination of human DNA replication origin position and efficiency reveals principles of initiation zone organisation. Nucleic Acids Res 2022; 50:7436-7450. [PMID: 35801867 PMCID: PMC9303276 DOI: 10.1093/nar/gkac555] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 11/30/2021] [Revised: 06/14/2022] [Accepted: 06/20/2022] [Indexed: 12/16/2022]
Abstract
Replication of the human genome initiates within broad zones of ∼150 kb. The extent to which firing of individual DNA replication origins within initiation zones is spatially stochastic or localised at defined sites remains a matter of debate. A thorough characterisation of the dynamic activation of origins within initiation zones is hampered by the lack of a high-resolution map of both their position and efficiency. To address this shortcoming, we describe a modification of initiation site sequencing (ini-seq), based on density substitution. Newly replicated DNA is rendered 'heavy-light' (HL) by incorporation of BrdUTP while unreplicated DNA remains 'light-light' (LL). Replicated HL-DNA is separated from unreplicated LL-DNA by equilibrium density gradient centrifugation, then both fractions are subjected to massive parallel sequencing. This allows precise mapping of 23,905 replication origins simultaneously with an assignment of a replication initiation efficiency score to each. We show that origin firing within early initiation zones is not randomly distributed. Rather, origins are arranged hierarchically with a set of very highly efficient origins marking zone boundaries. We propose that these origins explain much of the early firing activity arising within initiation zones, helping to unify the concept of replication initiation zones with the identification of discrete replication origin sites.
Affiliation(s)
- Guillaume Guilbaud
  - Division of Protein and Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Pierre Murat
  - Division of Protein and Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Helen S Wilkes
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Leticia Koch Lerner
  - Division of Protein and Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Julian E Sale
  - Division of Protein and Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Torsten Krude
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
5
Jin YT, Pu DK, Guo HX, Deng Z, Chen LL, Guo FB. T-G-A Deficiency Pattern in Protein-Coding Genes and Its Potential Reason. Front Microbiol 2022; 13:847325. [PMID: 35602045 PMCID: PMC9116502 DOI: 10.3389/fmicb.2022.847325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2022] [Accepted: 03/30/2022] [Indexed: 11/20/2022]
Abstract
If a stop codon appears within one gene, then its translation will be terminated earlier than expected. False folding of premature protein will be adverse to the host; hence, all functional genes would tend to avoid the intragenic stop codons. Therefore, we hypothesize that there will be less frequency of nucleotides corresponding to stop codons at each codon position of genes. Here, we validate this inference by investigating the nucleotide frequency at a large scale and results from 19,911 prokaryote genomes revealed that nucleotides coinciding with stop codons indeed have the lowest frequency in most genomes. Interestingly, genes with three types of stop codons all tend to follow a T-G-A deficiency pattern, suggesting that the property of avoiding intragenic termination pressure is the same and the major stop codon TGA plays a dominant role in this effect. Finally, a positive correlation between the TGA deficiency extent and the base length was observed in start-experimentally verified genes of Escherichia coli (E. coli). This strengthens the proof of our hypothesis. The T-G-A deficiency pattern observed would help to understand the evolution of codon usage tactics in extant organisms.
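The per-position nucleotide frequencies discussed in this abstract can be tabulated with a few lines of code. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name and the toy coding sequence are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def codon_position_frequencies(cds: str) -> list:
    """Return three dicts (codon positions 1, 2, 3) of A/C/G/T frequencies.

    Assumes `cds` is in frame; trailing bases that do not fill a codon
    are ignored, as are characters other than A, C, G and T.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    freqs = []
    for pos in range(3):
        counts = Counter(cds[pos:usable:3])
        total = sum(counts[b] for b in "ACGT")
        freqs.append(
            {b: (counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return freqs


# Toy in-frame sequence (hypothetical): ATG GCT AAA GCC TAA
freqs = codon_position_frequencies("ATGGCTAAAGCCTAA")
t_at_pos1 = freqs[0]["T"]
g_at_pos2 = freqs[1]["G"]
a_at_pos3 = freqs[2]["A"]
```

A T-G-A deficiency in the sense of the abstract would show up as T being rare at position 1, G at position 2, and A at position 3 when the frequencies are aggregated over many genes; a single short toy sequence such as this one is only meant to show the bookkeeping.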
Affiliation(s)
- Yan-Ting Jin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
- Dong-Kai Pu
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hai-Xia Guo
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Zixin Deng
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
- Ling-Ling Chen
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Feng-Biao Guo
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
6
Cui G, Wang C, Wei X, Wang H, Wang X, Zhu X, Li J, Yang H, Duan H. Complete chloroplast genome of Hordeum brevisubulatum: Genome organization, synonymous codon usage, phylogenetic relationships, and comparative structure analysis. PLoS One 2021; 16:e0261196. [PMID: 34898618 PMCID: PMC8668134 DOI: 10.1371/journal.pone.0261196] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/13/2021] [Accepted: 11/28/2021] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Hordeum brevisubulatum, known as fine perennial forage, is used for soil salinity improvement in northern China. Chloroplast (cp) genome is an ideal model for assessing its genome evolution and the phylogenetic relationships. We de novo sequenced and analyzed the cp genome of H. brevisubulatum, providing a fundamental reference for further studies in genetics and molecular breeding.
RESULTS: The cp genome of H. brevisubulatum was 137,155 bp in length with a typical quadripartite structure. A total of 130 functional genes were annotated and the gene of accD was lost in the process of evolution. Among all the annotated genes, 16 different genes harbored introns and the genes of ycf3 and rps12 contained two introns. Parity rule 2 (PR2) plot analysis showed that majority of genes had a bias toward T over A in the coding strand in all five Hordeum species, and a slight G over C in the other four Hordeum species except for H. bogdanii. Additionally, 52 dispersed repeat sequences and 182 simple sequence repeats were identified. Moreover, some unique SSRs of each species could be used as molecular markers for further study. Compared to the other four Hordeum species, H. brevisubulatum was most closely related to H. bogdanii and its cp genome was relatively conserved. Moreover, inverted repeat regions (IRa and IRb) were less divergent than other parts and coding regions were relatively conserved compared to non-coding regions. Main divergence was presented at the SSC/IR border.
CONCLUSIONS: This research comprehensively describes the architecture of the H. brevisubulatum cp genome and improves our understanding of its cp biology and genetic diversity, which will facilitate biological discoveries and cp genome engineering.
Affiliation(s)
- Guangxin Cui
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Chunmei Wang
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Xiaoxing Wei
  - Academy of Animal and Veterinary Sciences, Qinghai University, Xining, Qinghai, China
- Hongbo Wang
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
  - Laboratory of Quality & Safety Risk Assessment for Livestock Products, Ministry of Agriculture and Rural Affairs, Lanzhou, Gansu, China
- Xiaoli Wang
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Xinqiang Zhu
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- JinHua Li
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Hongshan Yang
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Huirong Duan
  - Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
7
Georgakopoulos-Soares I, Mouratidis I, Parada GE, Matharu N, Hemberg M, Ahituv N. Asymmetron: a toolkit for the identification of strand asymmetry patterns in biological sequences. Nucleic Acids Res 2021; 49:e4. [PMID: 33211865 PMCID: PMC7797064 DOI: 10.1093/nar/gkaa1052] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/19/2020] [Revised: 10/15/2020] [Accepted: 10/20/2020] [Indexed: 11/23/2022]
Abstract
DNA strand asymmetries can have a major effect on several biological functions, including replication, transcription and transcription factor binding. As such, DNA strand asymmetries and mutational strand bias can provide information about biological function. However, a versatile tool to explore this does not exist. Here, we present Asymmetron, a user-friendly computational tool that performs statistical analysis and visualizations for the evaluation of strand asymmetries. Asymmetron takes as input DNA features provided with strand annotation and outputs strand asymmetries for consecutive occurrences of a single DNA feature or between pairs of features. We illustrate the use of Asymmetron by identifying transcriptional and replicative strand asymmetries of germline structural variant breakpoints. We also show that the orientation of the binding sites of 45% of human transcription factors analyzed have a significant DNA strand bias in transcribed regions, that is also corroborated in ChIP-seq analyses, and is likely associated with transcription. In summary, we provide a novel tool to assess DNA strand asymmetries and show how it can be used to derive new insights across a variety of biological disciplines.
Affiliation(s)
- Ilias Georgakopoulos-Soares
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Aristotle University of Thessaloniki, Department of Mathematics, Thessaloniki, GR, Greece
- Guillermo E Parada
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
- Navneet Matharu
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
  - Innovative Genomics Institute, University of California San Francisco, San Francisco, CA, USA
- Martin Hemberg
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
8
Xu Q, Chen H, Sun W, Zhu D, Zhang Y, Chen JL, Chen Y. Genome-wide analysis of the synonymous codon usage pattern of Streptococcus suis. Microb Pathog 2021; 150:104732. [PMID: 33429052 DOI: 10.1016/j.micpath.2021.104732] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/06/2020] [Revised: 12/30/2020] [Accepted: 01/03/2021] [Indexed: 01/21/2023]
Abstract
Streptococcus suis (S. suis) is a gram-positive coccus that causes disease in humans and animals. The codon usage pattern of bacteria reveals a range of evolutionary changes that assist them to enhance tolerance to environments. To better understand the genetic features during the evolution of S. suis, we performed codon usage analysis. Nine pathogenic strains of different serotypes and different geographical distribution were analyzed to better understand the differences in their evolutionary process. Nucleotide compositions and relative synonymous codon usage (RSCU) analysis revealed that A/T-ending codons are dominant in S. suis. Neutrality analysis, correspondence analysis and ENC-plot results revealed that natural selection is the predominant element prompting codon usage. Cluster analysis based on RSCU was roughly consistent with the dendrogram rooted genomic BLAST analysis. Comparison of synonymous codon usage pattern between S. suis and susceptible hosts (H. sapiens and S. scrofa) revealed that the codon usage of S. suis is separated from the synonymous codon usage of susceptible hosts. The CAI values implied that S. suis includes a series of predicted highly expressed coding sequences contained in metabolism and transcriptional regulation, revealing the necessity of this pathogen to deal with various environmental conditions. The study of codon usage in S. suis may provide evidence involving the molecular evolution of bacteria and a better understanding of evolutionary relationships between S. suis and its corresponding hosts.
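Relative synonymous codon usage (RSCU), used in this abstract, is conventionally defined as a codon's observed count divided by the mean count of all synonymous codons for the same amino acid. The sketch below implements that textbook definition; it is not the authors' code. It assumes the standard genetic code (whose codon assignments the bacterial table shares), and the function name and toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds: str) -> dict:
    """RSCU per codon: count * (family size) / (family total count).

    Stop codons and single-codon families (Met, Trp) are skipped, as are
    amino acids that never occur in `cds`.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0 or len(codons) == 1:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence (hypothetical): GCT GCT GCT GCA AAA AAG
values = rscu("GCTGCTGCTGCAAAAAAG")
```

An RSCU above 1 marks a codon used more often than expected under equal usage within its family, so a finding that A/T-ending codons dominate corresponds to those codons carrying the highest RSCU values.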
Affiliation(s)
- Quanming Xu
  - Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Hong Chen
  - Fujian Agriculture and Forestry University, Fuzhou, 350002, China
  - Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Wen Sun
  - Fujian Agriculture and Forestry University, Fuzhou, 350002, China
  - Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Dewen Zhu
  - Fujian Agriculture and Forestry University, Fuzhou, 350002, China
  - Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Yongyi Zhang
  - Fujian Agriculture and Forestry University, Fuzhou, 350002, China
  - Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Ji-Long Chen
  - Fujian Agriculture and Forestry University, Fuzhou, 350002, China
  - Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Ye Chen
  - Fujian Agriculture and Forestry University, Fuzhou, 350002, China
  - Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, China
9
Barbhuiya PA, Uddin A, Chakraborty S. Codon usage pattern and evolutionary forces of mitochondrial ND genes among orders of class Amphibia. J Cell Physiol 2020; 236:2850-2868. [PMID: 32960450 DOI: 10.1002/jcp.30050] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/18/2020] [Revised: 08/07/2020] [Accepted: 08/31/2020] [Indexed: 12/18/2022]
Abstract
In this study, we used a bioinformatics approach to analyze the nucleotide composition and pattern of synonymous codon usage in mitochondrial ND genes in three amphibian groups, that is, orders Anura, Caudata, and Gymnophiona to identify the commonality and the differences of codon usage as no research work was reported yet. The high value of the effective number of codons revealed that the codon usage bias (CUB) was low in mitochondrial ND genes among the orders. Nucleotide composition analysis suggested that for each gene, the compositional features differed among Anura, Caudata, and Gymnophiona and the GC content was lower than AT content. Furthermore, a highly significant difference (p < .05) for GC content was found in each gene among the orders. The heat map showed contrasting patterns of codon usage among different ND genes. The regression of GC12 on GC3 suggested a narrow range of GC3 distribution and some points were located in the diagonal, indicating both mutation pressure and natural selection might influence the CUB. Moreover, the slope of the regression line was less than 0.5 in all ND genes among orders, indicating natural selection might have played the dominant role whereas mutation pressure had played a minor role in shaping CUB of ND genes across orders.
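The regression of GC12 on GC3 mentioned in this abstract (often called a neutrality plot) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' analysis: the helper names and the three toy genes are hypothetical, sequences are assumed to be in frame, and an ordinary least-squares slope is used.

```python
def gc12_gc3(cds: str) -> tuple:
    """Return (GC12, GC3): mean GC content of codon positions 1 and 2,
    and GC content of codon position 3, for an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3

    def gc(bases: str) -> float:
        return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0

    p1 = cds[0:usable:3]
    p2 = cds[1:usable:3]
    p3 = cds[2:usable:3]
    return (gc(p1) + gc(p2)) / 2, gc(p3)


def ols_slope(xs: list, ys: list) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy genes (hypothetical), one (GC12, GC3) point per gene.
toy_genes = ["ATGAAATTT", "ATGGCCGGA", "ATGGCGCCC"]
points = [gc12_gc3(g) for g in toy_genes]
gc12_values = [p[0] for p in points]
gc3_values = [p[1] for p in points]
slope = ols_slope(gc3_values, gc12_values)  # GC12 regressed on GC3
```

Under the usual reading, a slope close to 1 suggests that mutation pressure shapes all three codon positions alike, whereas a slope well below 1, as the abstract reports for the ND genes, is taken as a sign that selection constrains positions 1 and 2.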
Affiliation(s)
- Arif Uddin
  - Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Hailakandi, Assam, India
10
Zhou Y, Zhang W, Wu H, Huang K, Jin J. A high-resolution genomic composition-based method with the ability to distinguish similar bacterial organisms. BMC Genomics 2019; 20:754. [PMID: 31638897 PMCID: PMC6805505 DOI: 10.1186/s12864-019-6119-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/26/2019] [Accepted: 09/20/2019] [Indexed: 12/03/2022]
Abstract
Background: Genomic composition has been found to be species specific and is used to differentiate bacterial species. To date, almost no published composition-based approaches are able to distinguish between most closely related organisms, including intra-genus species and intra-species strains. Thus, it is necessary to develop a novel approach to address this problem.
Results: Here, we initially determine that the “tetranucleotide-derived z-value Pearson correlation coefficient” (TETRA) approach is representative of other published statistical methods. Then, we devise a novel method called “Tetranucleotide-derived Z-value Manhattan Distance” (TZMD) and compare it with the TETRA approach. Our results show that TZMD reflects the maximal genome difference, while TETRA does not in most conditions, demonstrating in theory that TZMD provides improved resolution. Additionally, our analysis of real data shows that TZMD improves species differentiation and clearly differentiates similar organisms, including similar species belonging to the same genospecies, subspecies and intraspecific strains, most of which cannot be distinguished by TETRA. Furthermore, TZMD is able to determine clonal strains with the TZMD = 0 criterion, which intrinsically encompasses identical composition, high average nucleotide identity and high percentage of shared genomes.
Conclusions: Our extensive assessment demonstrates that TZMD has high resolution. This study is the first to propose a composition-based method for differentiating bacteria at the strain level and to demonstrate that composition is also strain specific. TZMD is a powerful tool and the first easy-to-use approach for differentiating clonal and non-clonal strains. Therefore, as the first composition-based algorithm for strain typing, TZMD will facilitate bacterial studies in the future.
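The contrast this abstract draws between a correlation-based score and a distance-based score can be illustrated on two z-value vectors. The sketch below is a toy comparison of a Pearson correlation with a Manhattan distance and is not the published TZMD implementation: the four-element vectors are hypothetical stand-ins for the 256 tetranucleotide z-values, the z-value computation itself is omitted, and any normalisation the authors may apply is not reproduced.

```python
import math


def pearson(a: list, b: list) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def manhattan(a: list, b: list) -> float:
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical z-value vectors: z_b is z_a scaled by 2.
z_a = [1.0, -2.0, 0.5, 3.0]
z_b = [2.0, -4.0, 1.0, 6.0]

r = pearson(z_a, z_b)        # correlation sees the two profiles as identical
d = manhattan(z_a, z_b)      # distance still registers the difference
d_self = manhattan(z_a, z_a) # identical profiles give distance 0
```

Because a correlation is insensitive to scaling, two profiles can correlate perfectly while still differing in magnitude; a distance of exactly 0 only arises for identical vectors, which is consistent with the TZMD = 0 criterion described for clonal strains.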
Affiliation(s)
- Yizhuang Zhou
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, People's Republic of China
- Wenting Zhang
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Huixian Wu
  - China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Kai Huang
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Junfei Jin
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
11
Mrázek J, Karls AC. In silico simulations of occurrence of transcription factor binding sites in bacterial genomes. BMC Evol Biol 2019; 19:67. [PMID: 30823869 PMCID: PMC6397444 DOI: 10.1186/s12862-019-1381-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/12/2018] [Accepted: 02/01/2019] [Indexed: 11/16/2022]
Abstract
Background: Interactions between transcription factors and their specific binding sites are a key component of regulation of gene expression. Until recently, it was generally assumed that most bacterial transcription factor binding sites are located at or near promoters. However, several recent works utilizing high-throughput technology to detect transcription factor binding sites in bacterial genomes found a large number of binding sites in unexpected locations, particularly inside genes, as opposed to known or expected promoter regions. While some of these intragenic binding sites likely have regulatory functions, an alternative scenario is that many of these binding sites arise by chance in the absence of selective constraints. The latter possibility was supported by in silico simulations for σ54 binding sites in Salmonella.
Results: In this work, we extend these simulations to more than forty transcription factors from E. coli and other bacteria. The results suggest that binding sites for all analyzed transcription factors are likely to arise throughout the genome by random genetic drift and many transcription factor binding sites found in genomes may not have specific regulatory functions. In addition, when comparing observed and expected patterns of occurrence of binding sites in genomes, we observed distinct differences among different transcription factors.
Conclusions: We speculate that transcription factor binding sites randomly occurring throughout the genome could be beneficial in promoting emergence of new regulatory interactions and thus facilitating evolution of gene regulatory networks.
Affiliation(s)
- Jan Mrázek
- Department of Microbiology, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
| | - Anna C Karls
- Department of Microbiology, University of Georgia, Athens, GA, USA
|
12
|
Bose D, Mukhopadhyay S. Comparative genomics of a few members of the family Aquificaceae on the basis of their codon usage profile. GENE REPORTS 2019. [DOI: 10.1016/j.genrep.2018.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023]
|
13
|
Synonymous Codon Usages as an Evolutionary Dynamic for Chlamydiaceae. Int J Mol Sci 2018; 19:ijms19124010. [PMID: 30545112 PMCID: PMC6321445 DOI: 10.3390/ijms19124010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/04/2018] [Revised: 12/06/2018] [Accepted: 12/10/2018] [Indexed: 01/08/2023] Open
Abstract
The family of Chlamydiaceae contains a group of obligate intracellular bacteria that can infect a wide range of hosts. The evolutionary trend of members in this family is a hot topic, which benefits our understanding of the cross-infection of these pathogens. In this study, 14 whole genomes of 12 Chlamydia species were used to investigate the nucleotide, codon, and amino acid usage bias by synonymous codon usage value and information entropy method. The results showed that all the studied Chlamydia spp. had A/T rich genes with over-represented A or T at the third positions and G or C under-represented at these positions, suggesting that nucleotide usages influenced synonymous codon usages. The overall codon usage trend from synonymous codon usage variations divides the Chlamydia spp. into four separate clusters, while amino acid usage divides the Chlamydia spp. into two clusters with some exceptions, which reflected the genetic diversity of the Chlamydiaceae family members. The overall codon usage pattern represented by the effective number of codons (ENC) was significantly positively correlated to gene GC3 content. A negative correlation exists between ENC and the codon adaptation index for some Chlamydia species. These results suggested that mutation pressure caused by nucleotide composition constraint played an important role in shaping synonymous codon usage patterns. Furthermore, codon usage of T3ss and Pmps gene families adapted to that of the corresponding genome. Taken together, analyses help our understanding of evolutionary interactions between nucleotide, synonymous codon, and amino acid usages in genes of Chlamydiaceae family members.
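The ENC versus GC3 relationship mentioned in this abstract can be illustrated with a short sketch. The abstract does not say how the quantities were computed, so the code below is an assumption: it measures GC3 as the G+C fraction at third codon positions and uses Wright's (1990) standard expected-ENC curve for a gene whose codon bias is shaped by GC3 composition alone. The function names `gc3` and `expected_enc` are hypothetical.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    thirds = seq[2:usable:3]              # every third base, starting at codon position 3
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_enc(s: float) -> float:
    """Wright's expected ENC for a given GC3 value s (assumed null model,
    not necessarily the one used in the cited study)."""
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


if __name__ == "__main__":
    example = "ATGAAAGCC"                 # toy sequence, not real data
    s = gc3(example)
    print(f"GC3 = {s:.3f}, expected ENC = {expected_enc(s):.2f}")
```

Genes whose observed ENC falls well below this curve are usually read as candidates for selection on codon usage rather than composition alone.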
|
14
|
Amgarten D, Braga LPP, da Silva AM, Setubal JC. MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins. Front Genet 2018; 9:304. [PMID: 30131825 PMCID: PMC6090037 DOI: 10.3389/fgene.2018.00304] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Received: 05/17/2018] [Accepted: 07/18/2018] [Indexed: 01/21/2023] Open
Abstract
Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset with 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset with 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve a good performance in separating bacterial from phage sequences: gene density, strand shifts, and fraction of significant hits to a viral protein database. We compared the performance of MARVEL to that of VirSorter and VirFinder, two popular programs for predicting viral sequences. Our results show that all three programs have comparable specificity, but MARVEL achieves much better performance on the recall (sensitivity) measure. This means that MARVEL should be able to identify many more phage sequences in metagenomic bins than heretofore has been possible. In a simple test with real data, containing mostly bacterial sequences, MARVEL classified 58 out of 209 bins as phage genomes; other evidence suggests that 57 of these 58 bins are novel phage sequences. MARVEL is freely available at https://github.com/LaboratorioBioinformatica/MARVEL.
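Two of the three features named in this abstract, gene density and strand shifts, can be sketched from predicted gene coordinates alone. The definitions below are assumptions made for illustration (genes per kilobase, and the number of strand changes between neighbouring genes); they are not taken from the MARVEL source code, and the names `gene_density` and `strand_shifts` are hypothetical. The third feature, the fraction of significant hits to a viral protein database, needs an external search and is left out.

```python
from typing import List, Tuple

# (start, end, strand), where strand is "+" or "-"
Gene = Tuple[int, int, str]


def gene_density(genes: List[Gene], contig_length: int) -> float:
    """Number of predicted genes per kilobase of contig (assumed definition)."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return len(genes) / (contig_length / 1000.0)


def strand_shifts(genes: List[Gene]) -> int:
    """Count strand changes between consecutive genes ordered by start position
    (assumed definition)."""
    ordered = sorted(genes, key=lambda g: g[0])
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev[2] != cur[2])


if __name__ == "__main__":
    # Toy annotation for a 2 kb contig, not real data.
    toy = [(1, 300, "+"), (350, 900, "+"), (950, 1500, "-"), (1600, 2000, "+")]
    print(gene_density(toy, 2000), strand_shifts(toy))
```

Feature vectors of this kind could then be passed to any random forest implementation; the classifier itself is outside the scope of this sketch.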
Affiliation(s)
- Deyvid Amgarten
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
| | - Lucas P P Braga
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
- INRA, UMR 1347, Agroécologie, Dijon, France
| | - Aline M da Silva
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
| | - João C Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
- Biocomplexity Institute of Virginia Tech, Blacksburg, VA, United States
|
15
|
diCenzo GC, Benedict AB, Fondi M, Walker GC, Finan TM, Mengoni A, Griffitts JS. Robustness encoded across essential and accessory replicons of the ecologically versatile bacterium Sinorhizobium meliloti. PLoS Genet 2018; 14:e1007357. [PMID: 29672509 PMCID: PMC5929573 DOI: 10.1371/journal.pgen.1007357] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 02/16/2018] [Revised: 05/01/2018] [Accepted: 04/10/2018] [Indexed: 11/19/2022] Open
Abstract
Bacterial genome evolution is characterized by gains, losses, and rearrangements of functional genetic segments. The extent to which large-scale genomic alterations influence genotype-phenotype relationships has not been investigated in a high-throughput manner. In the symbiotic soil bacterium Sinorhizobium meliloti, the genome is composed of a chromosome and two large extrachromosomal replicons (pSymA and pSymB, which together constitute 45% of the genome). Massively parallel transposon insertion sequencing (Tn-seq) was employed to evaluate the contributions of chromosomal genes to growth fitness in both the presence and absence of these extrachromosomal replicons. Ten percent of chromosomal genes from diverse functional categories are shown to genetically interact with pSymA and pSymB. These results demonstrate the pervasive robustness provided by the extrachromosomal replicons, which is further supported by constraint-based metabolic modeling. A comprehensive picture of core S. meliloti metabolism was generated through a Tn-seq-guided in silico metabolic network reconstruction, producing a core network encompassing 726 genes. This integrated approach facilitated functional assignments for previously uncharacterized genes, while also revealing that Tn-seq alone missed over a quarter of wild-type metabolism. This work highlights the many functional dependencies and epistatic relationships that may arise between bacterial replicons and across a genome, while also demonstrating how Tn-seq and metabolic modeling can be used together to yield insights not obtainable by either method alone. S. meliloti, which has traditionally facilitated ground-breaking insights into symbiotic communication, is also emerging as an excellent model for studying the evolution of functional relationships between bacterial chromosomes and anciently acquired accessory replicons.
Multi-replicon genome architecture is present in ~10% of presently sequenced bacterial genomes. The S. meliloti genome is composed of three circular replicons, two of which are dispensable even though they encompass nearly half of the protein-coding genes in this organism. The construction of strains lacking these replicons has enabled a straightforward, genome-wide analysis of interactions between the chromosome and the non-essential replicons, revealing extensive functional cooperation between these genomic components. This analysis enabled a substantial refinement of a metabolic network model for S. meliloti. The integration of massively parallel genotype-phenotype screening with in silico metabolic reconstruction has enhanced our understanding of metabolic network structure as it relates to genome evolution in S. meliloti, and exemplifies an approach that may be productively applied to other taxa. The combined experimental and computational approach employed here further provides unique insights into the pervasive genetic interactions that may exist within large bacterial genomes.
Affiliation(s)
- George C. diCenzo
- Department of Biology, University of Florence, Sesto Fiorentino, FI, Italy
| | - Alex B. Benedict
- Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States of America
| | - Marco Fondi
- Department of Biology, University of Florence, Sesto Fiorentino, FI, Italy
| | - Graham C. Walker
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States of America
| | | | - Alessio Mengoni
- Department of Biology, University of Florence, Sesto Fiorentino, FI, Italy
| | - Joel S. Griffitts
- Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT, United States of America
|
16
|
Bergman J, Betancourt AJ, Vogl C. Transcription-Associated Compositional Skews in Drosophila Genes. Genome Biol Evol 2018; 10:269-275. [PMID: 29036491 PMCID: PMC5786239 DOI: 10.1093/gbe/evx200] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Accepted: 09/25/2017] [Indexed: 12/23/2022] Open
Abstract
In many organisms, local deviations from Chargaff's second parity rule are observed around replication and transcription start sites and within intron sequences. Here, we use expression data as well as a whole-genome data set of nearly 200 haplotypes to investigate such compositional skews in Drosophila melanogaster genes. We find a positive correlation between compositional skew and gene expression, comparable in strength to similar correlations between expression levels and genome-wide sequence features. This correlation is relatively stronger for germline, compared with somatic expression, consistent with the process of transcription-associated mutation bias. We also inferred mutation rates from alleles segregating at low frequencies in short introns, and show that, whereas the overall GC content of short introns does not conform to the equilibrium expectation, the level of the observed deviation from the second parity rule is generally consistent with the inferred rates.
Affiliation(s)
- Juraj Bergman
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Wien, Austria
| | - Andrea J Betancourt
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Present address: Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Claus Vogl
- Institut für Tierzucht und Genetik, Vetmeduni Vienna, Wien, Austria
|
17
|
Amgarten D, Braga LPP, da Silva AM, Setubal JC. MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins. Front Genet 2018. [PMID: 30131825 DOI: 10.3389/fgene.2018.00304/full] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 04/26/2023] Open
|
18
|
Kono N, Tomita M, Arakawa K. eRP arrangement: a strategy for assembled genomic contig rearrangement based on replication profiling in bacteria. BMC Genomics 2017; 18:784. [PMID: 29029602 PMCID: PMC5640929 DOI: 10.1186/s12864-017-4162-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/21/2017] [Accepted: 10/05/2017] [Indexed: 12/15/2022] Open
Abstract
Background: The reduced cost of sequencing has made de novo sequencing and the assembly of draft microbial genomes feasible in any ordinary biology lab. However, the process of finishing and completing the genome remains labor-intensive and computationally challenging in some cases, such as in the study of complete genome sequences, genomic rearrangements, long-range syntenic relationships, and structural variations. Methods: Here, we show a contig reordering strategy based on experimental replication profiling (eRP) to recapitulate the bacterial genome structure within draft genomes. During the exponential growth phase, the majority of bacteria show a global genomic copy number gradient that is enriched near the replication origin and gradually declines toward the terminus. Therefore, if genome sequencing is performed with appropriate timing, the short-read coverage reflects this copy number gradient, providing information about the contig positions relative to the replication origin and terminus. Results: We therefore investigated the appropriate timing for genomic DNA sampling and developed an algorithm for the reordering of the contigs based on eRP. As a result, this strategy successfully recapitulates the genomic structure of various structural mutants with draft genome sequencing. Conclusions: Our strategy was successful for contig rearrangement with intracellular DNA replication behavior mechanisms and can be applied to almost all bacteria because the DNA replication system is highly conserved. Therefore, eRP makes it possible to understand genomic structural information and long-range syntenic relationships using a draft genome that is based on short reads. Electronic supplementary material: The online version of this article (10.1186/s12864-017-4162-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nobuaki Kono
- Institute for Advanced Biosciences, Keio University, Mizukami 246-2, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan.
| | - Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Mizukami 246-2, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
| | - Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Mizukami 246-2, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan
|
19
|
Agarwal M, Bhowmick K, Shah K, Krishnamachari A, Dhar SK. Identification and characterization of ARS-like sequences as putative origin(s) of replication in human malaria parasite Plasmodium falciparum. FEBS J 2017. [PMID: 28644560 DOI: 10.1111/febs.14150] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/23/2023]
Abstract
DNA replication is a fundamental process in genome maintenance, and initiates from several genomic sites (origins) in eukaryotes. In Saccharomyces cerevisiae, conserved sequences known as autonomously replicating sequences (ARSs) provide a landing pad for the origin recognition complex (ORC), leading to replication initiation. Although origins from higher eukaryotes share some common sequence features, the definitive genomic organization of these sites remains elusive. The human malaria parasite Plasmodium falciparum undergoes multiple rounds of DNA replication; therefore, control of initiation events is crucial to ensure proper replication. However, the sites of DNA replication initiation and the mechanism by which replication is initiated are poorly understood. Here, we have identified and characterized putative origins in P. falciparum by bioinformatics analyses and experimental approaches. An autocorrelation measure method was initially used to search for regions with marked fluctuation (dips) in the chromosome, which we hypothesized might contain potential origins. Indeed, S. cerevisiae ARS consensus sequences were found in dip regions. Several of these P. falciparum sequences were validated with chromatin immunoprecipitation-quantitative PCR, nascent strand abundance and a plasmid stability assay. Subsequently, the same sequences were used in yeast to confirm their potential as origins in vivo. Our results identify the presence of functional ARSs in P. falciparum and provide meaningful insights into replication origins in these deadly parasites. These data could be useful in designing transgenic vectors with improved stability for transfection in P. falciparum.
Affiliation(s)
- Meetu Agarwal
- Special Centre for Molecular Medicine, Jawaharlal Nehru University, New Delhi, India
| | - Krishanu Bhowmick
- Special Centre for Molecular Medicine, Jawaharlal Nehru University, New Delhi, India
| | - Kushal Shah
- Department of Electrical Engineering, Indian Institute of Technology, New Delhi, India
| | | | - Suman Kumar Dhar
- Special Centre for Molecular Medicine, Jawaharlal Nehru University, New Delhi, India
|
20
|
Błażej P, Mackiewicz D, Grabińska M, Wnętrzak M, Mackiewicz P. Optimization of amino acid replacement costs by mutational pressure in bacterial genomes. Sci Rep 2017; 7:1061. [PMID: 28432324 PMCID: PMC5430830 DOI: 10.1038/s41598-017-01130-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/13/2016] [Accepted: 03/27/2017] [Indexed: 12/17/2022] Open
Abstract
Mutations are considered a spontaneous and random process, which is an important component of evolution because it generates genetic variation. On the other hand, mutations are deleterious, leading to non-functional genes and energetically costly repairs. Therefore, one can expect that the mutational pressure is optimized to simultaneously generate genetic diversity and preserve genetic information. To check if empirical mutational pressures are optimized in these ways, we compared matrices of nucleotide mutation rates derived from bacterial genomes with their best possible alternatives that minimized or maximized costs of amino acid replacements associated with differences in their physicochemical properties (e.g. hydropathy and polarity). It should be noted that the studied empirical nucleotide substitution matrices and the costs of amino acid replacements are independent because these matrices were derived from sites free of selection on amino acid properties and the amino acid costs assumed only amino acid physicochemical properties without any information about mutation at the nucleotide level. The obtained results indicate that the empirical mutational matrices show a tendency to minimize costs of amino acid replacements. This implies that bacterial mutational pressures can evolve to decrease consequences of amino acid substitutions. However, the optimization is not full, which enables generation of some genetic variability.
Affiliation(s)
- Paweł Błażej
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Dorota Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Małgorzata Grabińska
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Małgorzata Wnętrzak
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Paweł Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland.
|
21
|
The Impact of Selection at the Amino Acid Level on the Usage of Synonymous Codons. G3-GENES GENOMES GENETICS 2017; 7:967-981. [PMID: 28122952 PMCID: PMC5345726 DOI: 10.1534/g3.116.038125] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023]
Abstract
There are two main forces that affect usage of synonymous codons: directional mutational pressure and selection. The effectiveness of protein translation is usually considered as the main selectional factor. However, biased codon usage can also be a byproduct of a general selection at the amino acid level interacting with nucleotide replacements. To evaluate the validity and strength of such an effect, we superimposed >3.5 billion unrestricted mutational processes on the selection of nonsynonymous substitutions based on the differences in physicochemical properties of the coded amino acids. Using a modified evolutionary optimization algorithm, we determined the conditions in which the effect on the relative codon usage is maximized. We found that the effect is enhanced by mutational processes generating more adenine and thymine than guanine and cytosine, as well as more purines than pyrimidines. Interestingly, this effect is observed only under an unrestricted model of nucleotide substitution, and disappears when the mutational process is time-reversible. Comparison of the simulation results with data for real protein coding sequences indicates that the impact of selection at the amino acid level on synonymous codon usage cannot be neglected. Furthermore, it can considerably interfere, especially in AT-rich genomes, with other selections on codon usage, e.g., translational efficiency. It may also lead to difficulties in the recognition of other effects influencing codon bias, and an overestimation of protein coding sequences whose codon usage is subjected to adaptational selection.
|
22
|
Singh VK, Krishnamachari A. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome. GENOMICS DATA 2016; 9:130-6. [PMID: 27508123 PMCID: PMC4971157 DOI: 10.1016/j.gdata.2016.07.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/26/2016] [Revised: 06/27/2016] [Accepted: 07/06/2016] [Indexed: 01/08/2023]
Abstract
Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC) denoted as ORC-ACS and non-replicating ACS sequences (nrACS), that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.
|
23
|
Apostolou-Karampelis K, Nikolaou C, Almirantis Y. A novel skew analysis reveals substitution asymmetries linked to genetic code GC-biases and PolIII α-subunit isoforms. DNA Res 2016; 23:353-63. [PMID: 27345720 PMCID: PMC4991834 DOI: 10.1093/dnares/dsw021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/11/2016] [Accepted: 05/09/2016] [Indexed: 11/30/2022] Open
Abstract
Strand biases reflect deviations from a null expectation of DNA evolution that assumes strand-symmetric substitution rates. Here, we present strong evidence that nearest-neighbour preferences are a strand-biased feature of bacterial genomes, indicating neighbour-dependent substitution asymmetries. To detect such asymmetries we introduce an alignment free index (relative abundance skews). The profiles of relative abundance skews along coding sequences can trace the phylogenetic relations of bacteria, suggesting that the patterns of neighbour-dependent substitution strand-biases are not common among different lineages, but are rather species-specific. Analysis of neighbour-dependent and codon-site skews sheds light on the origins of substitution asymmetries. Via a simple model we argue that the structure of the genetic code imposes position-dependent substitution strand-biases along coding sequences, as a response to GC mutation pressure. Thus, the organization of the genetic code per se can lead to an uneven distribution of nucleotides among different codon sites, even when requirements for specific codons and amino-acids are not accounted for. Moreover, our results suggest that strand-biases in replication fidelity of PolIII α-subunit induce substitution asymmetries, both neighbour-dependent and independent, on a genome scale. The role of DNA repair systems, such as transcription-coupled repair, is also considered.
Affiliation(s)
| | - Christoforos Nikolaou
- Computational Genomics Group, Department of Biology, University of Crete, 71409 Heraklion, Greece
| | - Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece
|
24
|
Nucleotide composition bias and codon usage trends of gene populations in Mycoplasma capricolum subsp. capricolum and M. agalactiae. J Genet 2016; 94:251-60. [PMID: 26174672 DOI: 10.1007/s12041-015-0512-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 10/23/2022]
Abstract
Because of the low GC content of the gene population, amino acids of the two mycoplasmas tend to be encoded by synonymous codons with an A or T end. Compared with the codon usage of ovine, Mycoplasma capricolum and M. agalactiae tend to select optimal codons, which are rare codons in ovine. Due to codon usage pattern caused by genes with key biological functions, the overall codon usage trends represent a certain evolutionary direction in the life cycle of the two mycoplasmas. The overall codon usage trends of a gene population of M. capricolum subsp. capricolum can be obviously separated from other mycoplasmas, and the overall codon usage trends of M. agalactiae are highly similar to those of M. bovis. These results partly indicate the independent evolution of the two mycoplasmas without the limits of the host cell's environment. The GC and AT skews estimate nucleotide composition bias at different positions of nucleotide triplets and the protein consideration caused by the nucleotide composition bias at codon positions 1 and 2 largely take part in synonymous codon usage patterns of the two mycoplasmas. The correlation between the codon adaptation index and codon usage variation indicates that the effect of codon usage on gene expression in M. capricolum subsp. capricolum is opposite to that of M. agalactiae, further suggesting independence of the evolutionary process influencing the overall codon usage trends of gene populations of mycoplasmas.
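The GC and AT skews at each codon position referred to in this abstract can be sketched as follows. The abstract does not give the formulas, so this assumes the conventional definitions, GC skew = (G - C)/(G + C) and AT skew = (A - T)/(A + T), applied separately to codon positions 1, 2, and 3 of an in-frame coding sequence. The function names `skew` and `codon_position_skews` are hypothetical.

```python
from typing import Dict


def skew(x: int, y: int) -> float:
    """Generic skew (x - y) / (x + y); returns 0.0 when both counts are zero."""
    total = x + y
    return (x - y) / total if total else 0.0


def codon_position_skews(cds: str) -> Dict[int, Dict[str, float]]:
    """GC and AT skew at codon positions 1, 2, and 3 (conventional definitions,
    assumed rather than taken from the cited study)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop a trailing partial codon
    result = {}
    for pos in range(3):
        bases = seq[pos:usable:3]         # all bases at this codon position
        result[pos + 1] = {
            "gc_skew": skew(bases.count("G"), bases.count("C")),
            "at_skew": skew(bases.count("A"), bases.count("T")),
        }
    return result


if __name__ == "__main__":
    # Toy sequence, not real data.
    for position, values in codon_position_skews("ATGGCAAAATTT").items():
        print(position, values)
```

For a whole genome, the per-gene counts would normally be pooled or averaged before comparing positions; that aggregation step is left to the reader.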
|
25
|
Lis M, Walther D. The orientation of transcription factor binding site motifs in gene promoter regions: does it matter? BMC Genomics 2016; 17:185. [PMID: 26939991 PMCID: PMC4778318 DOI: 10.1186/s12864-016-2549-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/19/2015] [Accepted: 02/27/2016] [Indexed: 12/23/2022] Open
Abstract
Background: Gene expression is to a large degree regulated by the specific binding of protein transcription factors to cis-regulatory transcription factor binding sites in gene promoter regions. Despite the identification of hundreds of binding site sequence motifs, the question as to whether motif orientation matters with regard to the gene expression regulation of the respective downstream genes appears surprisingly underinvestigated. Results: We pursued a statistical approach by probing 293 reported non-palindromic transcription factor binding site and ten core promoter motifs in Arabidopsis thaliana for evidence of any relevance of motif orientation based on mapping statistics and effects on the co-regulation of gene expression of the respective downstream genes. Although positional intervals closer to the transcription start site (TSS) were found with increased frequencies of motifs exhibiting orientation preference, a corresponding effect with regard to gene expression regulation as evidenced by increased co-expression of genes harboring the favored orientation in their upstream sequence could not be established. Furthermore, we identified an intrinsic orientational asymmetry of sequence regions close to the TSS as the likely source of the identified motif orientation preferences. By contrast, motif presence irrespective of orientation was found associated with pronounced effects on gene expression co-regulation, validating the pursued approach. Inspecting motif pairs revealed statistically preferred orientational arrangements, but no consistent effect with regard to arrangement-dependent gene expression regulation was evident. Conclusions: Our results suggest that for the motifs considered here, either no specific orientation rendering them functional across all their instances exists, with orientational requirements instead depending on gene-locus specific additional factors, or that the binding orientation of transcription factors may generally not be relevant, but rather the event of binding itself. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2549-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Monika Lis
- Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany.
- Dirk Walther
- Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany.
|
26
|
Błażej P, Miasojedow B, Grabińska M, Mackiewicz P. Optimization of Mutation Pressure in Relation to Properties of Protein-Coding Sequences in Bacterial Genomes. PLoS One 2015; 10:e0130411. [PMID: 26121655 PMCID: PMC4488281 DOI: 10.1371/journal.pone.0130411] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 10/21/2014] [Accepted: 05/19/2015] [Indexed: 12/22/2022] Open
Abstract
Most mutations are deleterious and require energetically costly repairs. Therefore, it seems that any minimization of mutation rate is beneficial. On the other hand, mutations generate genetic diversity indispensable for evolution and adaptation of organisms to changing environmental conditions. Thus, it is expected that a spontaneous mutational pressure should be an optimal compromise between these two extremes. In order to study the optimization of the pressure, we compared mutational transition probability matrices from bacterial genomes with artificial matrices fulfilling the same general features as the real ones, e.g., the stationary distribution and the speed of convergence to the stationarity. The artificial matrices were optimized on real protein-coding sequences based on Evolutionary Strategies approach to minimize or maximize the probability of non-synonymous substitutions and costs of amino acid replacements depending on their physicochemical properties. The results show that the empirical matrices have a tendency to minimize the effects of mutations rather than maximize their costs on the amino acid level. They were also similar to the optimized artificial matrices in the nucleotide substitution pattern, especially the high transitions/transversions ratio. We observed no substantial differences between the effects of mutational matrices on protein-coding sequences in genomes under study in respect of differently replicated DNA strands, mutational cost types and properties of the referenced artificial matrices. The findings indicate that the empirical mutational matrices are rather adapted to minimize mutational costs in the studied organisms in comparison to other matrices with similar mathematical constraints.
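As an illustration of two quantities the abstract refers to (the stationary distribution of a mutational transition probability matrix and the transitions/transversions ratio), the following is a minimal sketch. It is not the authors' Evolutionary Strategies procedure. The function names, the nucleotide order A, C, G, T, and the toy matrix are assumptions made for this example only.

```python
# Minimal sketch: stationary distribution and ts/tv ratio of a 4x4
# nucleotide substitution matrix. Row i gives the probabilities of
# nucleotide i changing to each nucleotide j (rows sum to 1).
# Assumed nucleotide order: A, C, G, T.

def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(pi)
        pi = [x / total for x in pi]  # renormalise to limit rounding drift
    return pi

def ts_tv_ratio(P, pi):
    """Ratio of transition (A<->G, C<->T) flux to transversion flux."""
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    ts = sum(pi[i] * P[i][j] for (i, j) in transitions)
    tv = sum(
        pi[i] * P[i][j]
        for i in range(4)
        for j in range(4)
        if i != j and (i, j) not in transitions
    )
    return ts / tv  # raises ZeroDivisionError if there are no transversions

# Hypothetical toy matrix: transitions 0.04, each transversion 0.01.
toy_matrix = [
    [0.94, 0.01, 0.04, 0.01],  # A
    [0.01, 0.94, 0.01, 0.04],  # C
    [0.04, 0.01, 0.94, 0.01],  # G
    [0.01, 0.04, 0.01, 0.94],  # T
]
toy_pi = stationary_distribution(toy_matrix)
toy_ratio = ts_tv_ratio(toy_matrix, toy_pi)
```

Because the toy matrix is symmetric, its stationary distribution is uniform, so the ratio reduces to the transition probability mass divided by the transversion probability mass.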
Affiliation(s)
- Paweł Błażej
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
- Błażej Miasojedow
- Section of Mathematical Statistics, The Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Małgorzata Grabińska
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
- Paweł Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
|
27
|
Parikh H, Singh A, Krishnamachari A, Shah K. Computational prediction of origin of replication in bacterial genomes using correlated entropy measure (CEM). Biosystems 2015; 128:19-25. [DOI: 10.1016/j.biosystems.2015.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/29/2014] [Revised: 12/31/2014] [Accepted: 01/01/2015] [Indexed: 01/28/2023]
|
28
|
Goswami A, Roy Chowdhury A, Sarkar M, Saha SK, Paul S, Dutta C. Strand-biased gene distribution, purine assymetry and environmental factors influence protein evolution in Bacillus. FEBS Lett 2015; 589:629-38. [PMID: 25639611 DOI: 10.1016/j.febslet.2015.01.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/03/2014] [Revised: 01/16/2015] [Accepted: 01/18/2015] [Indexed: 12/23/2022]
Abstract
A strong purine asymmetry, along with strand-biased gene distribution and the presence of PolC, prevails in Bacillus and some other members of Firmicutes, Fusobacteria and Tenericutes. The analysis of protein features in 21 Bacillus species of diverse metabolic, virulence and ecological traits revealed that purine asymmetry in conjunction with lineage/niche specific constraints significantly influences protein evolution in Bacillus. All Bacillus species, except for Se-respiring Bacillus selenitireducens, display distinct strand-specific biases in amino acid usage, which may affect the isoelectric point or surface charge distribution of proteins with prevalence of acidic and basic residues in the leading and lagging strand proteins, respectively.
Affiliation(s)
- Aranyak Goswami
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Anindya Roy Chowdhury
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Munmun Sarkar
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Sanjoy Kumar Saha
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Sandip Paul
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
- Chitra Dutta
- Structural Biology & Bioinformatics Division, CSIR - Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India.
|
29
|
Fonseca MM, Harris DJ, Posada D. The inversion of the Control Region in three mitogenomes provides further evidence for an asymmetric model of vertebrate mtDNA replication. PLoS One 2014; 9:e106654. [PMID: 25268704 PMCID: PMC4182315 DOI: 10.1371/journal.pone.0106654] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 03/11/2014] [Accepted: 08/04/2014] [Indexed: 11/29/2022] Open
Abstract
Mitochondrial genomes are known to have a strong strand-specific compositional bias that is more pronounced at fourfold redundant sites of mtDNA protein-coding genes. This observation suggests that strand asymmetries, to a large extent, are caused by mutational asymmetric mechanisms. In vertebrate mitogenomes, replication and not transcription seems to play a major role in shaping compositional bias. Hence, one can better understand how mtDNA is replicated – a debated issue – through a detailed picture of mitochondrial genome evolution. Here, we analyzed the compositional bias (AT and GC skews) in protein-coding genes of almost 2,500 complete vertebrate mitogenomes. We were able to identify three fish mitogenomes with inverted AT/GC skew coupled with an inversion of the Control Region. These findings suggest that the vertebrate mitochondrial replication mechanism is asymmetric and may invert its polarity, with the leading-strand becoming the lagging-strand and vice-versa, without compromising mtDNA maintenance and expression. The inversion of the strand-specific compositional bias through the inversion of the Control Region is in agreement with the strand-displacement model but it is also compatible with the RITOLS model of mtDNA replication.
Affiliation(s)
- Miguel M. Fonseca
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- CIBIO/InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- D. James Harris
- CIBIO/InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
|
30
|
Xing YQ, Liu GQ, Zhao XJ, Zhao HY, Cai L. Genome-wide characterization and prediction of Arabidopsis thaliana replication origins. Biosystems 2014; 124:1-6. [PMID: 25050475 DOI: 10.1016/j.biosystems.2014.07.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/11/2013] [Revised: 03/25/2014] [Accepted: 07/15/2014] [Indexed: 01/25/2023]
Abstract
Identification of replication origins is crucial for the faithful duplication of genomic DNA. The frequencies of single nucleotides and dinucleotides, GC/AT bias and GC/AT profile in the vicinity of Arabidopsis thaliana replication origins were analyzed in the present work. The guanine content or cytosine content is higher in origin of replication (Ori) than in non-Ori. The SS (S=G or C) dinucleotides are favoured in Ori whereas WW (W=A or T) dinucleotides are favoured in non-Ori. GC/AT bias and GC/AT profile in Ori are significantly different from that in non-Ori. Furthermore, by inputting DNA sequence features into support vector machine, we distinguished between the Ori and non-Ori regions in A. thaliana. The total prediction accuracy is about 69.5% as evaluated by the 10-fold cross-validation. This result suggested that apart from DNA sequence, deciphering the selection of replication origin must integrate many other factors including nucleosome positioning, DNA methylation, histone modification, etc. In addition, by comparing predictive performance we found that the predictive accuracy of SVM using sequence features on the context of WS language is significantly better than that of RY language. Furthermore, the same conclusion was also obtained in S. cerevisiae and D. melanogaster.
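The SS and WW dinucleotide features mentioned in the abstract can be illustrated with a short sketch that recodes a sequence into the WS alphabet (S = G or C, W = A or T) and reports dinucleotide fractions. This is only an illustration of the feature type, not the authors' feature extraction or SVM pipeline. The function name and the decision to skip dinucleotides containing other symbols (such as N) are assumptions.

```python
from collections import Counter

STRONG = set("GC")  # S bases
WEAK = set("AT")    # W bases

def ws_dinucleotide_fractions(seq):
    """Return the fractions of SS, SW, WS and WW overlapping dinucleotides."""
    seq = seq.upper()
    counts = Counter()
    total = 0
    for a, b in zip(seq, seq[1:]):
        # Skip dinucleotides containing ambiguous symbols such as N.
        if a not in STRONG | WEAK or b not in STRONG | WEAK:
            continue
        key = ("S" if a in STRONG else "W") + ("S" if b in STRONG else "W")
        counts[key] += 1
        total += 1
    return {
        k: (counts[k] / total if total else 0.0)
        for k in ("SS", "SW", "WS", "WW")
    }
```

For example, `ws_dinucleotide_fractions("GATC")` sees the dinucleotides GA, AT and TC, which recode to SW, WW and WS, so each of those three classes gets one third and SS gets zero.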
Affiliation(s)
- Yong-Qiang Xing
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Guo-Qing Liu
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Xiu-Juan Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Hong-Yu Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China
- Lu Cai
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China.
|
31
|
Li S, Yang J. System analysis of synonymous codon usage biases in archaeal virus genomes. J Theor Biol 2014; 355:128-39. [PMID: 24685889 PMCID: PMC7094158 DOI: 10.1016/j.jtbi.2014.03.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/29/2013] [Revised: 03/11/2014] [Accepted: 03/12/2014] [Indexed: 12/30/2022]
Abstract
Recent studies of geothermally heated aquatic ecosystems have found widely divergent viruses with unusual morphotypes. Archaeal viruses isolated from these hot habitats usually have double-stranded DNA genomes, linear or circular, and can infect members of the Archaea domain. In this study, the synonymous codon usage bias (SCUB) and dinucleotide composition in the available complete archaeal virus genome sequences have been investigated. It was found that there is a significant variation in SCUB among different Archaeal virus species, which is mainly determined by the base composition. The outcome of correspondence analysis (COA) and Spearman's rank correlation analysis shows that codon usage of selected archaeal virus genes depends mainly on GC richness of genome, and the gene's function, albeit with smaller effects, also contributes to codon usage in this virus. Furthermore, this investigation reveals that aromaticity of each protein is also critical in affecting SCUB of these viral genes although it was less important than that of the mutational bias. Especially, mutational pressure may influence SCUB in SIRV1, SIRV2, ARV1, AFV1, and PhiCh1 viruses, whereas translational selection could play a leading role in HRPV1's SCUB. These conclusions not only can offer an insight into the codon usage biases of archaeal virus and subsequently the possible relationship between archaeal viruses and their host, but also may help in understanding the evolution of archaeal viruses and their gene classification, and more helpful to explore the origin of life and the evolution of biology. The SCUB of archaeal virus genes depends mainly on GC richness of genome. The mutational pressure is the main factor that influences SCUB. The aromaticity of each protein is also critical in affecting SCUB. The translational selection could play a leading role in HRPV1's SCUB. The mode is helpful to explore the origin of life and the evolution of biology.
Affiliation(s)
- Sen Li
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China
- Jie Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China.
|
32
|
Rapoport AE, Trifonov EN. Compensatory nature of Chargaff’s second parity rule. J Biomol Struct Dyn 2013; 31:1324-36. [DOI: 10.1080/07391102.2012.736757] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 10/27/2022]
|
33
|
Evertts AG, Coller HA. Back to the origin: reconsidering replication, transcription, epigenetics, and cell cycle control. Genes Cancer 2013; 3:678-96. [PMID: 23634256 DOI: 10.1177/1947601912474891] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022] Open
Abstract
In bacteria, replication is a carefully orchestrated event that unfolds the same way for each bacterium and each cell division. The process of DNA replication in bacteria optimizes cell growth and coordinates high levels of simultaneous replication and transcription. In metazoans, the organization of replication is more enigmatic. The lack of a specific sequence that defines origins of replication has, until recently, severely limited our ability to define the organizing principles of DNA replication. This question is of particular importance as emerging data suggest that replication stress is an important contributor to inherited genetic damage and the genomic instability in tumors. We consider here the replication program in several different organisms including recent genome-wide analyses of replication origins in humans. We review recent studies on the role of cytosine methylation in replication origins, the role of transcriptional looping and gene gating in DNA replication, and the role of chromatin's 3-dimensional structure in DNA replication. We use these new findings to consider several questions surrounding DNA replication in metazoans: How are origins selected? What is the relationship between replication and transcription? How do checkpoints inhibit origin firing? Why are there early and late firing origins? We then discuss whether oncogenes promote cancer through a role in DNA replication and whether errors in DNA replication are important contributors to the genomic alterations and gene fusion events observed in cancer. We conclude with some important areas for future experimentation.
|
34
|
Nayak KC. Comparative genome sequence analysis of Sulfolobus acidocaldarius and 9 other isolates of its genus for factors influencing codon and amino acid usage. Gene 2013; 513:163-73. [DOI: 10.1016/j.gene.2012.10.024] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/02/2012] [Revised: 10/08/2012] [Accepted: 10/21/2012] [Indexed: 11/17/2022]
|
35
|
Audit B, Zaghloul L, Baker A, Arneodo A, Chen CL, d'Aubenton-Carafa Y, Thermes C. Megabase replication domains along the human genome: relation to chromatin structure and genome organisation. Subcell Biochem 2013; 61:57-80. [PMID: 23150246 DOI: 10.1007/978-94-007-4525-4_3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]
Abstract
In higher eukaryotes, the absence of specific sequence motifs, marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origins activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabased-size replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.
|
36
|
Marsolier-Kergoat MC. Asymmetry indices for analysis and prediction of replication origins in eukaryotic genomes. PLoS One 2012; 7:e45050. [PMID: 23028755 PMCID: PMC3459929 DOI: 10.1371/journal.pone.0045050] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/15/2012] [Accepted: 08/15/2012] [Indexed: 01/15/2023] Open
Abstract
DNA replication was recently shown to induce the formation of compositional skews in the genomes of the yeasts Saccharomyces cerevisiae and Kluyveromyces lactis. In this work, I have characterized further GC and TA skew variations in the vicinity of S. cerevisiae replication origins and termination sites, and defined asymmetry indices for origin analysis and prediction. The presence of skew jumps at some termination sites in the S. cerevisiae genome was established. The majority of S. cerevisiae replication origins are marked by an oriented consensus sequence called ACS, but no evidence could be found for asymmetric origin firing that would be linked to ACS orientation. Asymmetry indices related to GC and TA skews were defined, and a global asymmetry index IGC,TA was described. IGC,TA was found to strongly correlate with origin efficiency in S. cerevisiae and to allow the determination of sets of intergenes significantly enriched in origin loci. The generalized use of asymmetry indices for origin prediction in naive genomes implies the determination of the direction of the skews, i.e. the identification of which strand, leading or lagging, is enriched in G and which one is enriched in T. Recent work indicates that in Candida albicans and in several related species, centromeres contain early and efficient replication origins. It has been proposed that the skew jumps observed at these positions would reflect the activity of these origins, thus allowing to determine the direction of the skews in these genomes. However, I show here that the skew jumps at C. albicans centromeres are not related to replication and that replication-associated GC and TA skews in C. albicans have in fact the opposite directions of what was proposed.
|
37
|
Coupling of σG activation to completion of engulfment during sporulation of Bacillus subtilis survives large perturbations to DNA translocation and replication. J Bacteriol 2012; 194:6264-71. [PMID: 22984259 DOI: 10.1128/jb.01470-12] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Spore formation in Bacillus subtilis is characterized by activation of RNA polymerase sigma factors, including the late-expressed σ(G). During spore formation an asymmetric division occurs, yielding the smaller prespore and the larger mother cell. At division, only 30% of the chromosome is in the prespore, and the rest is then translocated into the prespore. Following completion of engulfment of the prespore by the mother cell, σ(G) is activated in the prespore. Here we tested the link between engulfment and σ(G) activation by perturbing DNA translocation and replication, which are completed before engulfment. One approach was to have large DNA insertions in the chromosome; the second was to have an impaired DNA translocase; the third was to use a strain in which the site of termination of chromosome replication was relocated. Insertion of 2.3 Mb of Synechocystis DNA into the B. subtilis genome had the largest effect, delaying engulfment by at least 90 min. Chromosome translocation was also delayed and was completed shortly before the completion of engulfment. Despite the delay, σ(G) became active only after the completion of engulfment. All results are consistent with a strong link between completion of engulfment and σ(G) activation. They support a link between completion of chromosome translocation and completion of engulfment.
|
38
|
Brockman SA, McFadden CS. The mitochondrial genome of Paraminabea aldersladei (Cnidaria: Anthozoa: Octocorallia) supports intramolecular recombination as the primary mechanism of gene rearrangement in octocoral mitochondrial genomes. Genome Biol Evol 2012; 4:994-1006. [PMID: 22975720 PMCID: PMC3468961 DOI: 10.1093/gbe/evs074] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/21/2022] Open
Abstract
Sequencing of the complete mitochondrial genome of the soft coral Paraminabea aldersladei (Alcyoniidae) revealed a unique gene order, the fifth mt gene arrangement now known within the cnidarian subclass Octocorallia. At 19,886 bp, the mt genome of P. aldersladei is the second largest known for octocorals; its gene content and nucleotide composition are, however, identical to most other octocorals, and the additional length is due to the presence of two large, noncoding intergenic regions. Relative to the presumed ancestral octocoral gene order, in P. aldersladei a block of three protein-coding genes (nad6–nad3–nad4l) has been translocated and inverted. Mapping the distribution of mt gene arrangements onto a taxonomically comprehensive phylogeny of Octocorallia suggests that all of the known octocoral gene orders have evolved by successive inversions of one or more evolutionarily conserved blocks of protein-coding genes. This mode of genome evolution is unique among Metazoa, and contrasts strongly with that observed in Hexacorallia, in which extreme gene shuffling has occurred among taxonomic orders. Two of the four conserved gene blocks found in Octocorallia are, however, also conserved in the linear mt genomes of Medusozoa and in one group of Demospongiae. We speculate that the rate and mechanism of gene rearrangement in octocorals may be influenced by the presence in their mt genomes of mtMutS, a putatively active DNA mismatch repair protein that may also play a role in mediating intramolecular recombination.
|
39
|
Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012; 13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022] Open
Abstract
The compositional asymmetry of complementary bases in nucleotide sequences implies the existence of a mutational or selectional bias in the two strands of the DNA duplex, which is commonly shaped by strand-specific mechanisms in transcription or replication. Such strand bias in genomes, frequently visualized by GC skew graphs, is used for the computational prediction of transcription start sites and replication origins, as well as for comparative evolutionary genomics studies. The use of measures of compositional strand bias in order to quantify the degree of strand asymmetry is crucial, as it is the basis for determining the applicability of compositional analysis and comparing the strength of the mutational bias in different biological machineries in various species. Here, we review the measures of strand bias that have been proposed to date, including the ∆GC skew, the B1 index, the predictability score of linear discriminant analysis for gene orientation, the signal-to-noise ratio of the oligonucleotide bias, and the GC skew index. These measures have been predominantly designed for and applied to the analysis of replication-related mutational processes in prokaryotes, but we also give research examples in eukaryotes.
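The GC skew graphs mentioned in the abstract are built from the windowed quantity (G − C) / (G + C). The sketch below computes that basic skew and its cumulative sum on a made-up toy sequence. It does not implement any of the specific indices reviewed in the paper (∆GC skew, B1 index, GC skew index and so on). The function names, the non-overlapping window scheme, and the toy sequence are assumptions for illustration.

```python
from itertools import accumulate

def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window without G or C has an undefined skew; report 0.0.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def cumulative_skew(skews):
    """Running sum of the windowed skew values."""
    return list(accumulate(skews))

# Hypothetical toy sequence: C-rich first half, G-rich second half.
toy = "CCCA" * 5 + "GGGA" * 5
toy_skews = gc_skew(toy, window=4)
toy_cumulative = cumulative_skew(toy_skews)
# Index of the window where the cumulative skew is lowest, i.e. where
# the skew switches sign from negative to positive.
min_window = toy_cumulative.index(min(toy_cumulative))
```

In many bacterial genomes the minimum of the cumulative skew is commonly taken as a candidate origin position, but that interpretation depends on the organism and should be treated as a heuristic rather than a result of this sketch.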
Affiliation(s)
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa 252-8520, Japan
|
40
|
Baker A, Julienne H, Chen CL, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Linking the DNA strand asymmetry to the spatio-temporal replication program. I. About the role of the replication fork polarity in genome evolution. Eur Phys J E Soft Matter 2012; 35:92. [PMID: 23001787 DOI: 10.1140/epje/i2012-12092-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2012] [Revised: 08/08/2012] [Accepted: 08/21/2012] [Indexed: 06/01/2023]
Abstract
Two key cellular processes, namely transcription and replication, require the opening of the DNA double helix and act differently on the two DNA strands, generating different mutational patterns (mutational asymmetry) that may result, after long evolutionary time, in different nucleotide compositions on the two DNA strands (compositional asymmetry). We elaborate on the simplest model of neutral substitution rates that takes into account the strand asymmetries generated by the transcription and replication processes. Using perturbation theory, we then solve the time evolution of the DNA composition under strand-asymmetric substitution rates. In our minimal model, the compositional and substitutional asymmetries are predicted to decompose into a transcription- and a replication-associated components. The transcription-associated asymmetry increases in magnitude with transcription rate and changes sign with gene orientation while the replication-associated asymmetry is proportional to the replication fork polarity. These results are confirmed experimentally in the human genome, using substitution rates obtained by aligning the human and chimpanzee genomes using macaca and orangutan as outgroups, and replication fork polarity determined in the HeLa cell line as estimated from the derivative of the mean replication timing. When further investigating the dynamics of compositional skew evolution, we show that it is not at equilibrium yet and that its evolution is an extremely slow process with characteristic time scales of several hundred Myrs.
Affiliation(s)
- A Baker
- Université de Lyon, Lyon, France
|
41
|
[Current status of theoretical studies on essential genes in microbes]. Yi Chuan = Hereditas 2012; 34:420-30. [PMID: 22522159 DOI: 10.3724/sp.j.1005.2012.00420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
Abstract
Essential genes are indispensable for the survival of an organism under optimal conditions. The study of essential genes has recently become a hot topic in microbiology, genomics, and bioinformatics. This paper describes experiments that determined essential genes in some microbes and reviews theoretical research on essential genes. The main topics are the comparison of essential and non-essential genes based on evolutionary conservation and sequence composition, the in silico prediction of essential genes, and the analysis of the chromosomal distribution of essential genes. Finally, related progress is summarized and open problems are pointed out.
|
42
|
Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J. Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 2012; 13:43. [PMID: 22435713 PMCID: PMC3368730 DOI: 10.1186/1471-2105-13-43] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 11/29/2011] [Accepted: 03/22/2012] [Indexed: 02/07/2023] Open
Abstract
Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results: Here we propose a novel measure, the Codon Deviation Coefficient (CDC), that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
Affiliation(s)
- Zhang Zhang
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
43
|
Shah K, Krishnamachari A. Nucleotide correlation based measure for identifying origin of replication in genomic sequences. Biosystems 2012; 107:52-5. [PMID: 21945744 DOI: 10.1016/j.biosystems.2011.09.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/13/2011] [Revised: 08/30/2011] [Accepted: 09/10/2011] [Indexed: 12/18/2022]
Abstract
Computational prediction of the origin of replication is a challenging problem and of immense interest to biologists. Several methods have been proposed for identifying the replicon site for various classes of organisms. However, these methods have limited applicability since the replication mechanism is different in different organisms. We propose a correlation measure and show that it is correctly able to predict the origin of replication in most of the bacterial genomes. When applied to Methanocaldococcus jannaschii, Plasmodium falciparum apicoplast and Nicotiana tabacum plastid, this correlation based method is able to correctly predict the origin of replication whereas the generally used GC skew measure fails. Thus, this correlation based measure is a novel and promising tool for predicting the origin of replication in a wide class of organisms. This could have important implications in not only gaining a deeper understanding of the replication machinery in higher organisms, but also for drug discovery.
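For orientation, the GC skew baseline that this abstract compares against can be sketched as below. This is a minimal illustration of the conventional cumulative GC skew heuristic, not the correlation measure proposed by the authors (which the abstract does not specify). The function names, the window size, and the toy sequence are assumptions made for the example, and the sequence is treated as linear rather than circular.

```python
def gc_skew_windows(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window without G or C carries no skew information.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def predicted_origin_window(seq, window):
    """Return the index of the window where the cumulative GC skew is lowest.

    Assumption: the cumulative skew minimum is used as the origin estimate,
    which is the usual heuristic for many bacterial chromosomes.
    """
    cumulative, total = [], 0.0
    for skew in gc_skew_windows(seq, window):
        total += skew
        cumulative.append(total)
    return cumulative.index(min(cumulative))


# Toy sequence (hypothetical): C-rich up to position 40, G-rich afterwards.
toy = "C" * 40 + "G" * 60
print(predicted_origin_window(toy, 10))  # 3, the last C-rich window
```

The abstract's point is that this heuristic fails for some genomes, such as the Nicotiana tabacum plastid, so the sketch is best read as the baseline being improved upon.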
Affiliation(s)
- Kushal Shah
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
|
44
|
Guo FB. [Strong strand specific composition bias-a genomic character of some obligate parasites or symbionts]. YI CHUAN = HEREDITAS 2011; 33:1039-1047. [PMID: 21993278 DOI: 10.3724/sp.j.1005.2011.01039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
DNA replication involves a set of asymmetric mechanisms, most notably the division into leading and lagging strands. The former is synthesized continuously whereas synthesis of the latter is discontinuous. Such an asymmetric mechanism leads to distinct nucleotide compositions of the two strands. Strand-specific nucleotide composition bias was originally found in the genomes of echinoderm and vertebrate mitochondria and then in several bacterial genomes. With the rapid growth in the number of sequenced genomes, many bacteria and even eukaryotes have been found to show consistent strand composition bias. In some bacteria, the strand-specific composition bias is so strong that genes on the two replicating strands can be separated according to their codon usage. To date, 11 obligate intracellular bacteria have been found to have separate codon usages according to whether genes are located on the leading or lagging strand. However, there is still no well-accepted theory that explains why separate codon usages occur in some bacterial genomes and not in others. This paper reviews the related work and points out the open problems.
Affiliation(s)
- Feng-Biao Guo
- University of Electronic Science and Technology of China, Chengdu, China.
|
45
|
Marsolier-Kergoat MC, Goldar A. DNA replication induces compositional biases in yeast. Mol Biol Evol 2011; 29:893-904. [PMID: 21948086 DOI: 10.1093/molbev/msr240] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 02/06/2023] Open
Abstract
Asymmetries intrinsic to the process of DNA replication are expected to cause differences in the substitution patterns of the leading and the lagging strands and to induce compositional biases. These biases have been detected in the majority of eubacterial genomes but rarely in eukaryotes. Only in the human genome, the activity of a minority of replication origins seems to generate compositional biases. In this work, we provide evidence for replication-associated GC and TA skews in the genomes of two yeast species, Saccharomyces cerevisiae and Kluyveromyces lactis, whereas the data for the Schizosaccharomyces pombe genome are less conclusive. In contrast with the genomes of Homo sapiens and of the majority of eubacteria, the leading strand is enriched in cytosine and adenine in both S. cerevisiae and K. lactis. We observed significant variations across the interorigin intervals of several substitution rates in the S. cerevisiae lineage since its divergence from S. paradoxus. We also found that the S. cerevisiae genome is far from compositional equilibrium and that its present compositional biases are due to substitution rates operating before its divergence from S. paradoxus. Finally, we observed that replication and transcription tend to be cooriented in the S. cerevisiae genome, especially for genes encoding subunits of protein complexes. Taken together, our results suggest that replication-related compositional biases may be a feature of many eukaryotic genomes despite the stochastic nature of the firing of replication origins in these genomes.
|
46
|
Charneski CA, Honti F, Bryant JM, Hurst LD, Feil EJ. Atypical AT skew in Firmicute genomes results from selection and not from mutation. PLoS Genet 2011; 7:e1002283. [PMID: 21935355 PMCID: PMC3174206 DOI: 10.1371/journal.pgen.1002283] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 02/25/2011] [Accepted: 07/12/2011] [Indexed: 11/18/2022] Open
Abstract
The second parity rule states that, if there is no bias in mutation or selection, then within each strand of DNA complementary bases are present at approximately equal frequencies. In bacteria, however, there is commonly an excess of G (over C) and, to a lesser extent, T (over A) in the replicatory leading strand. The low G+C Firmicutes, such as Staphylococcus aureus, are unusual in displaying an excess of A over T on the leading strand. As mutation has been established as a major force in the generation of such skews across various bacterial taxa, this anomaly has been assumed to reflect unusual mutation biases in Firmicute genomes. Here we show that this is not the case and that mutation bias does not explain the atypical AT skew seen in S. aureus. First, recently arisen intergenic SNPs predict the classical replication-derived equilibrium enrichment of T relative to A, contrary to what is observed. Second, sites predicted to be under weak purifying selection display only weak AT skew. Third, AT skew is primarily associated with largely non-synonymous first and second codon sites and is seen with respect to their sense direction, not which replicating strand they lie on. The atypical AT skew we show to be a consequence of the strong bias for genes to be co-oriented with the replicating fork, coupled with the selective avoidance of both stop codons and costly amino acids, which tend to have T-rich codons. That intergenic sequence has more A than T, while at mutational equilibrium a preponderance of T is expected, points to a possible further unresolved selective source of skew.
When considering a single strand of DNA, it is not necessarily the case that the frequency of each base should equal its complementary partner, such that A = T and G = C. For the leading strand, it is typically the case that Gs are more common than Cs, and Ts more common than As. This bias is widely thought to arise due to different mutational biases during replication. The Firmicutes exhibit an atypical preference for A over T on the leading strand, and here we show that selection, rather than mutation, can explain this exception. For those bases within coding regions, selection acts to inflate the frequency of A over T in order to avoid stop codons and to use metabolically cheap amino acids. Because genes are not orientated randomly, this manifests as an overall enrichment of A on the leading strand. Furthermore, a direct examination of mutational patterns is inconsistent with the observed enrichment of As. Curiously, our data also point to an unresolved source of selection on synonymous and intergenic sites, which are widely assumed to be neutral.
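The observation that AT skew depends on codon position can be illustrated with a short sketch. This is a hedged example rather than the authors' pipeline: it assumes AT skew is defined as (A - T) / (A + T), that the coding sequence is supplied in its sense direction, and the function name and toy sequence are hypothetical.

```python
def at_skew_by_codon_position(cds):
    """Return the AT skew, (A - T) / (A + T), at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence given in sense direction.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    skews = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        a, t = bases.count("A"), bases.count("T")
        skews.append((a - t) / (a + t) if a + t else 0.0)
    return skews


# Toy coding sequence (hypothetical): codons ATG AAA TTT GCA.
first, second, third = at_skew_by_codon_position("ATGAAATTTGCA")
print(round(first, 3), round(second, 3), round(third, 3))  # 0.333 -0.333 0.333
```

Under the second parity rule each value would be close to zero, so a positive value at a given position indicates an excess of A over T at that codon site.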
|
47
|
Rajewska M, Wegrzyn K, Konieczny I. AT-rich region and repeated sequences - the essential elements of replication origins of bacterial replicons. FEMS Microbiol Rev 2011; 36:408-34. [PMID: 22092310 DOI: 10.1111/j.1574-6976.2011.00300.x] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.3] [Received: 03/15/2011] [Accepted: 07/07/2011] [Indexed: 11/27/2022] Open
Abstract
Repeated sequences are commonly present in the sites for DNA replication initiation in bacterial, archaeal, and eukaryotic replicons. Those motifs are usually the binding places for replication initiation proteins or replication regulatory factors. In prokaryotic replication origins, the most abundant repeated sequences are DnaA boxes, which are the binding sites for the chromosomal replication initiation protein DnaA; iterons, which bind plasmid or phage DNA replication initiators; defined motifs for site-specific DNA methylation; and 13-nucleotide-long motifs of a not too well-characterized function, which are present within a specific region of the replication origin containing a higher than average content of adenine and thymine residues. In this review, we specify methods allowing identification of a replication origin based on the localization of an AT-rich region and the arrangement of the origin's structural elements. We describe the regularity of the position and structure of the AT-rich regions in bacterial chromosomes and plasmids. We point out that the 13-nucleotide-long repeats present at the AT-rich region, as well as other motifs overlapping them, are essential for DNA replication initiation, including origin opening, helicase loading and replication complex assembly. We also summarize the role of AT-rich region repeated sequences in DNA replication regulation.
Affiliation(s)
- Magdalena Rajewska
- Department of Molecular and Cellular Biology, Intercollegiate Faculty of Biotechnology, University of Gdansk, Gdansk, Poland
|
48
|
A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes. Mol Phylogenet Evol 2011; 61:650-8. [PMID: 21864693 DOI: 10.1016/j.ympev.2011.08.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 09/29/2010] [Revised: 05/31/2011] [Accepted: 08/05/2011] [Indexed: 11/21/2022]
Abstract
Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large number of good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view of the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links between genome structure (i.e., the sequential distribution of nucleotides itself by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacteria and archaea species. The DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong significant correlation was found between the scaling exponent α and COGs distribution, and we consistently observed that the larger α was, the more heterogeneous was the gene distribution within each functional category, suggesting a close relationship between primary nucleotides sequence structure and functional genes composition.
|
49
|
Håfström T, Jansson DS, Segerman B. Complete genome sequence of Brachyspira intermedia reveals unique genomic features in Brachyspira species and phage-mediated horizontal gene transfer. BMC Genomics 2011; 12:395. [PMID: 21816042 PMCID: PMC3163572 DOI: 10.1186/1471-2164-12-395] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 04/20/2011] [Accepted: 08/04/2011] [Indexed: 11/10/2022] Open
Abstract
Background Brachyspira spp. colonize the intestines of some mammalian and avian species and show different degrees of enteropathogenicity. Brachyspira intermedia can cause production losses in chickens and strain PWS/AT now becomes the fourth genome to be completed in the genus Brachyspira. Results 15 classes of unique and shared genes were analyzed in B. intermedia, B. murdochii, B. hyodysenteriae and B. pilosicoli. The largest number of unique genes was found in B. intermedia and B. murdochii. This indicates the presence of larger pan-genomes. In general, hypothetical protein annotations are overrepresented among the unique genes. A 3.2 kb plasmid was found in B. intermedia strain PWS/AT. The plasmid was also present in the B. murdochii strain but not in nine other Brachyspira isolates. Within the Brachyspira genomes, genes had been translocated and also frequently switched between leading and lagging strands, a process that can be followed by different AT-skews in the third positions of synonymous codons. We also found evidence that bacteriophages were being remodeled and genes incorporated into them. Conclusions The accessory gene pool shapes species-specific traits. It is also influenced by reductive genome evolution and horizontal gene transfer. Gene-transfer events can cross both species and genus boundaries and bacteriophages appear to play an important role in this process. A mechanism for horizontal gene transfer appears to be gene translocations leading to remodeling of bacteriophages in combination with broad tropism.
Affiliation(s)
- Therese Håfström
- Department of Bacteriology, National Veterinary Institute (SVA), SE 751 89 Uppsala, Sweden
|
50
|
Chen CL, Duquenne L, Audit B, Guilbaud G, Rappailles A, Baker A, Huvet M, d'Aubenton-Carafa Y, Hyrien O, Arneodo A, Thermes C. Replication-associated mutational asymmetry in the human genome. Mol Biol Evol 2011; 28:2327-37. [PMID: 21368316 DOI: 10.1093/molbev/msr056] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Indexed: 01/12/2023] Open
Abstract
During evolution, mutations occur at rates that can differ between the two DNA strands. In the human genome, nucleotide substitutions occur at different rates on the transcribed and non-transcribed strands that may result from transcription-coupled repair. These mutational asymmetries generate transcription-associated compositional skews. To date, the existence of such asymmetries associated with replication has not yet been established. Here, we compute the nucleotide substitution matrices around replication initiation zones identified as sharp peaks in replication timing profiles and associated with abrupt jumps in the compositional skew profile. We show that the substitution matrices computed in these regions fully explain the jumps in the compositional skew profile when crossing initiation zones. In intergenic regions, we observe mutational asymmetries measured as differences between complementary substitution rates; their sign changes when crossing initiation zones. These mutational asymmetries are unlikely to result from cryptic transcription but can be explained by a model based on replication errors and strand-biased repair. In transcribed regions, mutational asymmetries associated with replication superimpose on the previously described mutational asymmetries associated with transcription. We separate the substitution asymmetries associated with both mechanisms, which allows us to determine for the first time in eukaryotes, the mutational asymmetries associated with replication and to reevaluate those associated with transcription. Replication-associated mutational asymmetry may result from unequal rates of complementary base misincorporation by the DNA polymerases coupled with DNA mismatch repair (MMR) acting with different efficiencies on the leading and lagging strands. Replication, acting in germ line cells during long evolutionary times, contributed equally with transcription to produce the present abrupt jumps in the compositional skew. These results demonstrate that DNA replication is one of the major processes that shape human genome composition.
Affiliation(s)
- Chun-Long Chen
- Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique (CNRS), Gif-sur-Yvette, France
|