1
|
Li W, Almirantis Y, Provata A. Revisiting the neutral dynamics derived limiting guanine-cytosine content using human de novo point mutation data. Meta Gene 2022. [DOI: 10.1016/j.mgene.2021.100994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
|
2
|
Palazzo AF, Kang YM. GC-content biases in protein-coding genes act as an "mRNA identity" feature for nuclear export. Bioessays 2020; 43:e2000197. [PMID: 33165929 DOI: 10.1002/bies.202000197] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/22/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 01/11/2023]
Abstract
It has long been observed that human protein-coding genes have a particular distribution of GC-content: the 5' end of these genes has high GC-content while the 3' end has low GC-content. In 2012, it was proposed that this pattern of GC-content could act as an mRNA identity feature that would lead to it being better recognized by the cellular machinery to promote its nuclear export. In contrast, junk RNA, which largely lacks this feature, would be retained in the nucleus and targeted for decay. Now two recent papers have provided evidence that GC-content does promote the nuclear export of many mRNAs in human cells.
Affiliation(s)
- Alexander F Palazzo: Department of Biochemistry, University of Toronto, Toronto, ON, M5G 1M1, Canada
- Yoon Mo Kang: Department of Biochemistry, University of Toronto, Toronto, ON, M5G 1M1, Canada
|
3
|
Evolutionary Forces and Codon Bias in Different Flavors of Intrinsic Disorder in the Human Proteome. J Mol Evol 2019; 88:164-178. [DOI: 10.1007/s00239-019-09921-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/10/2019] [Accepted: 11/26/2019] [Indexed: 12/22/2022]
|
4
|
Sievers A, Bosiek K, Bisch M, Dreessen C, Riedel J, Froß P, Hausmann M, Hildenbrand G. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features. Genes (Basel) 2017; 8:E122. [PMID: 28422050 PMCID: PMC5406869 DOI: 10.3390/genes8040122] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/10/2017] [Revised: 03/24/2017] [Accepted: 04/04/2017] [Indexed: 12/26/2022] Open
Abstract
In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers) was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4) on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. 
Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the high pace of standard k-mer analysis.
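The local k-mer spectra described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sliding-window parameters (`window`, `step`) and the choice of Pearson correlation as the "basic vector correlation" are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
import math


def kmer_spectrum(seq, k):
    """Relative frequency of every k-mer over ACGT, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T contribute to the total (skips N etc.).
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def local_spectra(seq, k, window, step):
    """Spectra of successive windows, returned as (start, spectrum) pairs."""
    return [(start, kmer_spectrum(seq[start:start + window], k))
            for start in range(0, len(seq) - window + 1, step)]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if degenerate)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0
```

Correlating the spectrum of each window against a reference window (or against the whole-sequence spectrum) gives a position-resolved profile that can be plotted along the genome.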
Affiliation(s)
- Aaron Sievers: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Katharina Bosiek: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Marc Bisch: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Chris Dreessen: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Jascha Riedel: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Patrick Froß: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Michael Hausmann: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany.
- Georg Hildenbrand: Kirchhoff-Institute for Physics, Heidelberg University, INF 227, 69117 Heidelberg, Germany; Department of Radiation Oncology, Universitätsmedizin Mannheim, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany.
|
5
|
Fuertes MA, Rodrigo JR, Alonso C. Do Intron and Coding Sequences of Some Human-Mouse Orthologs Evolve as a Single Unit? J Mol Evol 2016; 82:247-50. [PMID: 27220874 DOI: 10.1007/s00239-016-9746-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/04/2016] [Accepted: 05/11/2016] [Indexed: 11/25/2022]
Abstract
It has previously been suggested that both the coding and the associated non-coding sequences of some human-mouse orthologs could evolve as a single unit. This letter deals with the observation that some orthologs change their compositional features significantly between mouse and human, an indication that molecular evolution is a local process. Moreover, the data shown indicate that the coding and intron sequences of these orthologs do not evolve independently but instead undergo concerted evolution, evolving as a single unit from a compositional cluster in mouse to a different compositional cluster in human.
Affiliation(s)
- Miguel Angel Fuertes: Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain.
- Carlos Alonso: Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain.
|
6
|
Whittle CA, Extavour CG. Codon and Amino Acid Usage Are Shaped by Selection Across Divergent Model Organisms of the Pancrustacea. G3 (Bethesda) 2015; 5:2307-21. [PMID: 26384771 PMCID: PMC4632051 DOI: 10.1534/g3.115.021402] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/15/2015] [Accepted: 08/28/2015] [Indexed: 01/24/2023]
Abstract
In protein-coding genes, synonymous codon usage and amino acid composition correlate to expression in some eukaryotes, and may result from translational selection. Here, we studied large-scale RNA-seq data from three divergent arthropod models, including cricket (Gryllus bimaculatus), milkweed bug (Oncopeltus fasciatus), and the amphipod crustacean Parhyale hawaiensis, and tested for optimization of codon and amino acid usage relative to expression level. We report strong signals of AT3 optimal codons (those favored in highly expressed genes) in G. bimaculatus and O. fasciatus, whereas weaker signs of GC3 optimal codons were found in P. hawaiensis, suggesting selection on codon usage in all three organisms. Further, in G. bimaculatus and O. fasciatus, high expression was associated with lowered frequency of amino acids with large size/complexity (S/C) scores in favor of those with intermediate S/C values; thus, selection may favor smaller amino acids while retaining those of moderate size for protein stability or conformation. In P. hawaiensis, highly transcribed genes had elevated frequency of amino acids with large and small S/C scores, suggesting a complex dynamic in this crustacean. In all species, the highly transcribed genes appeared to favor short proteins, high optimal codon usage, specific amino acids, and were preferentially involved in cell-cycling and protein synthesis. Together, based on examination of 1,680,067, 1,667,783, and 1,326,896 codon sites in G. bimaculatus, O. fasciatus, and P. hawaiensis, respectively, we conclude that translational selection shapes codon and amino acid usage in these three Pancrustacean arthropods.
Affiliation(s)
- Carrie A Whittle: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
- Cassandra G Extavour: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138; Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138
|
7
|
De Maio N, Schlötterer C, Kosiol C. Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models. Mol Biol Evol 2013; 30:2249-62. [PMID: 23906727 PMCID: PMC3773373 DOI: 10.1093/molbev/mst131] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 12/17/2022] Open
Abstract
The genomes of related species contain valuable information on the history of the considered taxa. Great apes in particular exhibit variation of evolutionary patterns along their genomes. However, the great ape data also bring new challenges, such as the presence of incomplete lineage sorting and ancestral shared polymorphisms. Previous methods for genome-scale analysis are restricted to very few individuals or cannot disentangle the contribution of mutation rates and fixation biases. This represents a limitation both for the understanding of these forces as well as for the detection of regions affected by selection. Here, we present a new model designed to estimate mutation rates and fixation biases from genetic variation within and between species. We relax the assumption of instantaneous substitutions, modeling substitutions as mutational events followed by a gradual fixation. Hence, we straightforwardly account for shared ancestral polymorphisms and incomplete lineage sorting. We analyze genome-wide synonymous site alignments of human, chimpanzee, and two orangutan species. From each taxon, we include data from several individuals. We estimate mutation rates and GC-biased gene conversion intensity. We find that both mutation rates and biased gene conversion vary with GC content. We also find lineage-specific differences, with weaker fixation biases in orangutan species, suggesting a reduced historical effective population size. Finally, our results are consistent with directional selection acting on coding sequences in relation to exonic splicing enhancers.
Affiliation(s)
- Nicola De Maio: Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
|
8
|
Nuclear export as a key arbiter of "mRNA identity" in eukaryotes. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2012; 1819:566-77. [PMID: 22248619 DOI: 10.1016/j.bbagrm.2011.12.012] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 10/04/2011] [Revised: 12/23/2011] [Accepted: 12/29/2011] [Indexed: 01/15/2023]
Abstract
Over the past decade, various studies have indicated that most of the eukaryotic genome is transcribed at some level. The pervasiveness of transcription might seem surprising when one considers that only a quarter of the human genome comprises genes (including exons and introns) and less than 2% codes for protein. This conundrum is partially explained by the unique evolutionary pressures that are imposed on species with small population sizes, such as eukaryotes. These conditions promote the expansion of introns and non-functional intergenic DNA, and the accumulation of cryptic transcriptional start sites. As a result, the eukaryotic gene expression machinery must effectively evaluate whether or not a transcript has all the hallmarks of a protein-coding mRNA. If a transcript contains these features, then positive feedback loops are activated to further stimulate its transcription, processing, nuclear export and ultimately, translation. However if a transcript lacks features associated with "mRNA identity", then the RNA is degraded and/or used to inhibit further transcription and translation of the gene. Here we discuss how mRNA identity is assessed by the nuclear export machinery in order to extract meaningful information from the eukaryotic genome. In the process, we provide an explanation of why certain sequences that are enriched in protein-coding genes, such as the signal sequence coding region, promote mRNA nuclear export in vertebrates. This article is part of a Special Issue entitled: Nuclear Transport and RNA Processing.
|
9
|
Ma F, Zhuang Y, Chen L, Lin L, Li Y, Xu X, Chen X. Comparing synonymous codon usage of alternatively spliced genes with non-alternatively spliced genes in human genome. J Biol Syst 2011. [DOI: 10.1142/s021833900400104x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
It is becoming clear that alternative splicing plays an important role in expanding protein diversity. However, previous studies of codon usage did not distinguish alternatively spliced genes from non-alternatively spliced genes. Do their codon usage patterns differ? To answer this, we systematically compared synonymous codon usage patterns between alternatively and non-alternatively spliced genes by analyzing large datasets from the human genome. The results indicated: (1) There are highly significant differences in the average Nc values between non-alternatively spliced genes and both the longer and the shorter isoform genes, and the level of codon usage bias of non-alternatively spliced genes is to some extent higher than that of alternatively spliced genes. (2) Very extensive heterogeneity of G+C content at silent third codon positions (GC3s) was evident among these genes, and there are highly significant differences in the average GC3s values between non-alternatively spliced genes and both the longer and the shorter isoform genes. (3) The Nc-plots and correspondence analysis reveal that codon usage bias is mainly dominated by mutation bias, and no correlation between gene expression level and synonymous codon usage bias was found in human genes. (4) Overall codon usage analysis indicated that C-ending codon usage differs highly significantly between the longer isoform genes and both non-alternatively spliced genes and the shorter isoform genes, whereas there is no significant difference in C-ending codon usage between the shorter isoform genes and non-alternatively spliced genes. Finally, our results seem to imply that alternatively spliced genes may originate from non-alternatively spliced genes, may be created by DNA mutation or gene fusion, and may be retained through natural selection and adaptive evolution.
Affiliation(s)
- Fei Ma: School of Life Science, Xiamen University, Xiamen 361005, China; Institute of Bioinformatics, Tsinghua University, Beijing 100084, China
- Yonglong Zhuang: Institute of Bioinformatics, Tsinghua University, Beijing 100084, China
- Liming Chen: School of Life Science, Xiamen University, Xiamen 361005, China
- Luping Lin: School of Life Science, Xiamen University, Xiamen 361005, China
- Yanda Li: Institute of Bioinformatics, Tsinghua University, Beijing 100084, China
- Xiaofeng Xu: Life Science College, Nanjing Normal University, Nanjing 210097, China
- Xueping Chen: College of Economics and Technology, University of Science and Technology of China, Hefei 230052, China
|
10
|
Porceddu A, Camiolo S. Spatial analyses of mono, di and trinucleotide trends in plant genes. PLoS One 2011; 6:e22855. [PMID: 21829660 PMCID: PMC3148226 DOI: 10.1371/journal.pone.0022855] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/13/2011] [Accepted: 06/30/2011] [Indexed: 11/24/2022] Open
Abstract
Genomic DNA sequences display compositional heterogeneity on many scales. In this paper we analyzed tendencies and anomalies in the occurrence of mono-, di- and trinucleotides in structural regions of plant genes. Representation of these trends as a function of position along genic sequences highlighted compositional features peculiar to either monocots or eudicots that were remarkably uniform within these two evolutionary clades. The most evident of these features appeared in the form of a gradient of base content along the direction of transcription. The robustness of such a representation was validated in sequence sub-datasets generated by considering structural and compositional features such as total CDS length, overall GC content and genic orientation in the genome. Piecewise regression analyses indicated that the gradients could be conveniently approximated by a two-segment model in which a first region featuring a steep slope is followed by a second segment fitting a milder variation. In general, monocot species showed steeper segments than eudicots. The guanine gradient was the most distinctive feature between the two evolutionary clades, being moderately increasing in eudicots and firmly decreasing in monocots. Single-gene investigation revealed that a high proportion of genes show compositional trends compatible with a segmented model, suggesting that these features are essential attributes of gene organization. Dinucleotide and trinucleotide biases were referred to the expectation based on a random union of the component elements. The average bias at the dinucleotide level identified a significant underrepresentation of some dinucleotides and the overrepresentation of others. The bias at the trinucleotide level was on average low. Finally, the analysis of bryophyte coding sequences showed mononucleotide, dinucleotide and trinucleotide compositional trends resembling those of higher plants. This finding suggested that the emergence of compositional bias is an ancient event in evolution which was already present at the time of land conquest by green plants.
Affiliation(s)
- Andrea Porceddu: Dipartimento di Scienze Agronomiche e Genetica Vegetale Agraria, Università degli Studi di Sassari, Sassari, Italy.
|
11
|
Qiu H, Hildebrand F, Kuraku S, Meyer A. Unresolved orthology and peculiar coding sequence properties of lamprey genes: the KCNA gene family as test case. BMC Genomics 2011; 12:325. [PMID: 21699680 PMCID: PMC3141671 DOI: 10.1186/1471-2164-12-325] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 10/28/2010] [Accepted: 06/23/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In understanding the evolutionary process of vertebrates, cyclostomes (hagfishes and lamprey) occupy crucial positions. Resolving molecular phylogenetic relationships of cyclostome genes with gnathostomes (jawed vertebrates) genes is indispensable in deciphering both the species tree and gene trees. However, molecular phylogenetic analyses, especially those including lamprey genes, have produced highly discordant results between gene families. To efficiently scrutinize this problem using partial genome assemblies of early vertebrates, we focused on the potassium voltage-gated channel, shaker-related (KCNA) family, whose members are mostly single-exon. RESULTS Seven sea lamprey KCNA genes as well as six elephant shark genes were identified, and their orthologies to bony vertebrate subgroups were assessed. In contrast to robustly supported orthology of the elephant shark genes to gnathostome subgroups, clear orthology of any sea lamprey gene could not be established. Notably, sea lamprey KCNA sequences displayed unique codon usage pattern and amino acid composition, probably associated with exceptionally high GC-content in their coding regions. This lamprey-specific property of coding sequences was also observed generally for genes outside this gene family. CONCLUSIONS Our results suggest that secondary modifications of sequence properties unique to the lamprey lineage may be one of the factors preventing robust orthology assessments of lamprey genes, which deserves further genome-wide validation. The lamprey lineage-specific alteration of protein-coding sequence properties needs to be taken into consideration in tackling the key questions about early vertebrate evolution.
Affiliation(s)
- Huan Qiu: Department of Biology, University of Konstanz, Konstanz, Germany
|
12
|
Fuertes MA, Pérez JM, Zuckerkandl E, Alonso C. Introns form compositional clusters in parallel with the compositional clusters of the coding sequences to which they pertain. J Mol Evol 2010; 72:1-13. [PMID: 21132282 DOI: 10.1007/s00239-010-9411-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/22/2009] [Accepted: 11/10/2010] [Indexed: 11/29/2022]
Abstract
This report deals with the study of compositional properties of human gene sequences evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor base dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters here termed triplet composons. The triplets containing the same set of distinct bases in whatever order and number form a triplet composon. The reading of the DNA sequence is performed starting at any letter of the initial triplet and then moving, triplet-to-triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content in terms of the composon usage frequencies of the gene sequences shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even as their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans do not have their counterpart in mouse whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
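The compression from 64 triplets to 14 composons follows from counting the non-empty sets of at most three distinct bases drawn from A, C, G, T (4 + 6 + 4 = 14). The sketch below illustrates this under stated assumptions: the names `composon`, `ALL_COMPOSONS` and `composon_usage` are hypothetical, and reading overlapping triplets with a step of one base is one interpretation of the abstract's "overlapping" reading, not a confirmed detail of the authors' method.

```python
from collections import Counter
from itertools import product


def composon(triplet):
    """Map a triplet to its composon: the sorted set of its distinct bases."""
    return "".join(sorted(set(triplet)))


# Enumerating all 64 triplets yields exactly 14 distinct composons.
ALL_COMPOSONS = sorted({composon("".join(t)) for t in product("ACGT", repeat=3)})


def composon_usage(seq):
    """Relative composon frequencies over overlapping triplets (step of 1, assumed)."""
    seq = seq.upper()
    counts = Counter(
        composon(seq[i:i + 3])
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= set("ACGT")  # skip triplets with ambiguous bases
    )
    total = sum(counts.values())
    return {c: counts[c] / total for c in ALL_COMPOSONS} if total else {}
```

For example, `GAG`, `AGG` and `GGA` all map to the composon `AG`, so usage vectors of length 14 can be compared between coding and intron sequences regardless of reading frame.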
Affiliation(s)
- Miguel A Fuertes: Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain.
|
13
|
Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2010; 4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022] Open
Affiliation(s)
- Ian Dunham: The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
|
14
|
Mojsin M, Kovacevic-Grujicic N, Krstic A, Popovic J, Milivojevic M, Stevanovic M. Comparative analysis of SOX3 protein orthologs: Expansion of homopolymeric amino acid tracts during vertebrate evolution. Biochem Genet 2010; 48:612-23. [PMID: 20495863 DOI: 10.1007/s10528-010-9343-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/01/2009] [Accepted: 01/25/2010] [Indexed: 10/19/2022]
Abstract
To understand more fully the structure and evolution of the SOX3 protein, we comparatively analyzed its orthologs in vertebrates. Since complex disorders are associated with human SOX3 polyalanine expansions, our investigation focused on both compositional and evolutionary analysis of various homopolymeric amino acid tracts observed in SOX3 orthologs. Our analysis revealed that the observed homopolymeric alanine, glycine, and proline tracts are mammal-specific, except for one polyglycine tract present in birds. Since it is likely that the SOX3 protein acquired additional roles in brain development in Eutheria, we might speculate that development of novel brain functions during the course of evolution was affected, at least in part, by such structural-functional changes in the SOX3 protein.
Affiliation(s)
- Marija Mojsin: Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
|
15
|
Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010; 11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates. RESULTS Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses. CONCLUSIONS Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage and gene methylation.
Affiliation(s)
- Tatiana V Tatarinova: Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.
|
16
|
Abstract
SOX proteins constitute a large family of diverse, well-conserved transcription factors present in vertebrates and invertebrates, and are implicated in the control of many developmental processes. Our objectives were to identify the Sox14 gene of goat (Capra hircus), cow (Bos taurus) and rat (Rattus norvegicus), and to perform comparative analyses and mapping of Sox14 orthologues from numerous vertebrate species. A PCR-based approach was used to identify Sox14 of goat, cow and rat, while nucleotide and amino acid sequence alignments and mapping were performed using information currently available in public databases. Comparative sequence analysis revealed remarkable identity among Sox14 orthologues and helped us to identify highly conserved motifs that represent molecular signatures of the SOX14 protein and might have structural or functional significance. Further, we determined the chromosomal locations of numerous predicted group B Sox genes and their neighbouring genes using currently available genome databases. In conclusion, our study has not only supported the proposed model of group B Sox gene evolution in chicken and mammals, but has also revealed that additional evolutionary events split Sox B genes onto different chromosomes in some mammals. The mapping data presented in this study could help refine the understanding of the evolution of group B Sox genes in vertebrates.
|
17
|
Carels N, Vidal R, Frías D. Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinform Biol Insights 2009; 3:37-49. [PMID: 20140069 PMCID: PMC2808180 DOI: 10.4137/bbi.s2236] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022] Open
Abstract
In this report, we revisited simple features that allow the classification of coding sequences (CDS) from non-coding DNA. The spectrum of codon usage of our sequence sample is large and suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine, Guanine, Adenine probabilities in 1st, 2nd, 3rd position of triplets, respectively, (iv) the product of G and C probabilities in 1st and 2nd position of triplets. These features are a natural consequence of the physico-chemical properties of proteins and their combination is successful in classifying CDS and non-coding DNA (introns) with a success rate >95% above 350 bp. The coding strand and coding frame are implicitly deduced when the sequences are classified as coding.
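As a rough illustration of the four feature types listed in this abstract, the following sketch computes them for one reading frame of a sequence. It is a sketch under stated assumptions, not the authors' classifier: the function name `frame_features`, the dictionary keys, and in particular the reading of feature (iv) as P(G at position 1) × P(C at position 2) are assumptions, since the abstract does not spell out the exact formulas or decision thresholds.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def frame_features(seq, frame=0):
    """Compute simple coding-potential features for one reading frame."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence too short for the requested frame")

    def p(bases, pos):
        # Probability that codon position `pos` (0-based) holds one of `bases`.
        return sum(c[pos] in bases for c in codons) / n

    return {
        "stop_codons": sum(c in STOPS for c in codons),            # feature (i)
        "purine_product": p("AG", 0) * p("AG", 1) * p("AG", 2),    # feature (ii)
        "cga_product": p("C", 0) * p("G", 1) * p("A", 2),          # feature (iii)
        "gc12_product": p("G", 0) * p("C", 1),                     # feature (iv), assumed reading
    }
```

Evaluating the three frames of both strands and comparing the resulting feature values is one way such features could point to the coding strand and frame, as the abstract indicates.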
Affiliation(s)
- Nicolas Carels: Fundação Oswaldo Cruz (FIOCRUZ), Instituto Oswaldo Cruz (IOC), Laboratório de Genômica Funcional e Bioinformática, Rio de Janeiro, RJ, Brazil
|
18
|
Elhaik E, Landan G, Graur D. Can GC content at third-codon positions be used as a proxy for isochore composition? Mol Biol Evol 2009; 26:1829-33. [PMID: 19443854 DOI: 10.1093/molbev/msp100] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
Abstract
The isochore theory depicts the genomes of warm-blooded vertebrates as a mosaic of long genomic regions that are characterized by relatively homogeneous GC content. In the absence of genomic data, the GC content at third-codon positions of protein-coding genes (GC3) was commonly used as a proxy for the GC content of isochores. Oddly, in the postgenomic era, GC3 is still sometimes used as a proxy for the GC composition of isochores. Here, we use genic and genomic sequences from human, chimpanzee, cow, mouse, rat, chicken, and zebrafish to show that GC3 only explains a very small proportion of the variation in GC content of long genomic sequences flanking the genes (GCf), and what little correlation there is between GC3 and GCf was found to decay rapidly with distance from the gene. The coefficient of variation of GC3 was found to be much larger than that of GCf and, therefore, GC3 and GCf values are not comparable with each other. Comparisons of orthologous gene pairs from 1) human and chimpanzee and 2) mouse and rat show strong correlations between their GC3 values, but very weak correlations between their GCf values. We conclude that the GC content of third-codon position cannot be used as stand-in for isochoric composition.
Affiliation(s)
- Eran Elhaik
- Department of Biology and Biochemistry, University of Houston, TX, USA
|
19
|
Chen XG, Hu J, Yang X. [Analysis of correlation of local GC level in human protein coding genes]. YI CHUAN = HEREDITAS 2008; 30:1169-1174. [PMID: 18779175 DOI: 10.3724/sp.j.1005.2008.01169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
GC level is an important feature of genomic composition, and its analysis significantly improves our understanding of the structure, function and evolution of genes. In this paper, the nonredundant DNA sequences of 7,992 human protein-coding genes were retrieved from a public database, and the local GC levels of different sequence regions and the correlations between them were analyzed. The results showed that the GC levels of different sequence regions were strikingly nonuniform. 5' untranslated regions were the GC-richest, with an average GC content of 62.5%, and 3' untranslated regions were the GC-poorest, with an average GC content of 43.97%. The GC contents of 3' flanking sequences closely matched the GC levels of the large DNA fragments in which the genes were located. Although the GC contents of open reading frames (ORFs) were higher than those of introns, 3' untranslated regions and 3' flanking sequences, high correlation existed among the GC contents of the four regions. The average GC content of the third codon position (GC3) was 58.9%, higher than that of the first and second positions, and showed high correlation with the GC content of ORFs (correlation coefficient 0.91), in addition to its significant association with the GC contents of introns, 3' untranslated regions and 3' flanking sequences. Moreover, the linear regression of GC3 against the GC content of 3' flanking sequences yielded a slope of 1.25. Thus, GC3 was a sensitive indicator of GC change in the local genome. The GC levels of 5' flanking sequences, 5' untranslated regions, and first and second codon positions, however, exhibited weaker correlation with those of other regions. These results suggest that third codon positions, introns, 3' untranslated regions and 3' flanking sequences may evolve similarly, while first and second codon positions, 5' flanking sequences and 5' untranslated regions are expected to be under stronger selective pressure to maintain their functions.
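The position-specific GC levels discussed in this abstract can be computed as in the following minimal sketch. The function name `gc_by_codon_position` is hypothetical, and the input is assumed to be an in-frame coding sequence starting at the first codon position; the abstract does not describe the authors' actual procedure.

```python
def gc_by_codon_position(orf: str) -> tuple[float, float, float]:
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence.

    Illustrative helper only: assumes the sequence starts at codon
    position 1 and ignores a trailing partial codon.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3  # length covered by complete codons
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return fractions[0], fractions[1], fractions[2]
```

Applied gene by gene, the third value gives the GC3 figure that the abstract correlates with ORF, intron and flanking-sequence GC content.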
Affiliation(s)
- Xiang-Gui Chen
- School of Bioengineering, Xihua University, Chengdu 610039, China.
|
20
|
Schmidt T, Frishman D. Assignment of isochores for all completely sequenced vertebrate genomes using a consensus. Genome Biol 2008; 9:R104. [PMID: 18590563 PMCID: PMC2481423 DOI: 10.1186/gb-2008-9-6-r104] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/13/2008] [Revised: 05/22/2008] [Accepted: 06/30/2008] [Indexed: 11/16/2022] Open
Abstract
A new consensus isochore assignment method and a database of isochore maps for all completely sequenced vertebrate genomes are presented. We show that although the currently available isochore mapping methods agree on the isochore classification of about two-thirds of the human DNA, they produce significantly different results with regard to the location of isochore boundaries and isochore length distribution. We present a new consensus isochore assignment method based on majority voting and provide IsoBase, a comprehensive on-line database of isochore maps for all completely sequenced vertebrate genomes.
Affiliation(s)
- Thorsten Schmidt
- Department of Genome-Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, D-85350 Freising, Germany
|
21
|
Correlations between coding and contiguous non-coding sequences in isochore families from vertebrate genomes. Gene 2008; 410:241-8. [DOI: 10.1016/j.gene.2007.12.016] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 07/12/2007] [Revised: 11/13/2007] [Accepted: 12/05/2007] [Indexed: 11/22/2022]
|
22
|
Warnecke T, Parmley JL, Hurst LD. Finding exonic islands in a sea of non-coding sequence: splicing related constraints on protein composition and evolution are common in intron-rich genomes. Genome Biol 2008; 9:R29. [PMID: 18257921 PMCID: PMC2374712 DOI: 10.1186/gb-2008-9-2-r29] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 09/05/2007] [Revised: 11/23/2007] [Accepted: 02/07/2008] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND In mammals, splice-regulatory domains impose marked trends on the relative abundance of certain amino acids near exon-intron boundaries. Is this a mammalian particularity or symptomatic of exonic splicing regulation across taxa? Are such trends more common in species that a priori have a harder time identifying exon ends, that is, those with pre-mRNA rich in intronic sequence? We address these questions surveying exon composition in a sample of phylogenetically diverse genomes. RESULTS Biased amino acid usage near exon-intron boundaries is common throughout the metazoa but not restricted to the metazoa. There is extensive cross-species concordance as to which amino acids are affected, and reduced/elevated abundances are well predicted by knowledge of splice enhancers. Species expected to rely on exon definition for splicing, that is, those with a higher ratio of intronic to coding sequence, more introns per gene and longer introns, exhibit more amino acid skews. Notably, this includes the intron-rich basidiomycete Cryptococcus neoformans, which, unlike intron-poor ascomycetes (Schizosaccharomyces pombe, Saccharomyces cerevisiae), exhibits compositional biases reminiscent of the metazoa. Strikingly, 5 prime ends of nematode exons deviate radically from normality: amino acids strongly preferred near boundaries are strongly avoided in other species, and vice versa. This we suggest is a measure to avoid attracting trans-splicing machinery. CONCLUSION Constraints on amino acid composition near exon-intron boundaries are phylogenetically widespread and characteristic of species where exon localization should be problematic. That compositional biases accord with sequence preferences of splice-regulatory proteins and are absent in ascomycetes is consistent with selection on exonic splicing regulation.
Affiliation(s)
- Tobias Warnecke
- Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK.
|
23
|
Different functional classes of genes are characterized by different compositional properties. FEBS Lett 2007; 581:5819-24. [DOI: 10.1016/j.febslet.2007.11.052] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 08/13/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 11/19/2022]
|
24
|
Kuraku S, Kuratani S. Time scale for cyclostome evolution inferred with a phylogenetic diagnosis of hagfish and lamprey cDNA sequences. Zoolog Sci 2007; 23:1053-64. [PMID: 17261918 DOI: 10.2108/zsj.23.1053] [Citation(s) in RCA: 138] [Impact Index Per Article: 8.1] [Indexed: 11/17/2022]
Abstract
The Cyclostomata consists of the two orders Myxiniformes (hagfishes) and Petromyzoniformes (lampreys), and its monophyly has been unequivocally supported by recent molecular phylogenetic studies. Under this updated vertebrate phylogeny, we performed in silico evolutionary analyses using currently available cDNA sequences of cyclostomes. We first calculated the GC-content at four-fold degenerate sites (GC(4)), which revealed that an extremely high GC-content is shared by all the lamprey species we surveyed, whereas no striking pattern in GC-content was observed in any of the hagfish species surveyed. We then estimated the timing of diversification in cyclostome evolution using nucleotide and amino acid sequences. We obtained divergence times of 470-390 million years ago (Mya) in the Ordovician-Silurian-Devonian Periods for the interordinal split between Myxiniformes and Petromyzoniformes; 90-60 Mya in the Cretaceous-Tertiary Periods for the split between the two hagfish subfamilies, Myxininae and Eptatretinae; 280-220 Mya in the Permian-Triassic Periods for the split between the two lamprey subfamilies, Geotriinae and Petromyzoninae; and 30-10 Mya in the Tertiary Period for the split between the two lamprey genera, Petromyzon and Lethenteron. This evolutionary configuration indicates that Myxiniformes and Petromyzoniformes diverged shortly after the common ancestor of cyclostomes split from the future gnathostome lineage. Our results also suggest that intra-subfamilial diversification in hagfish and lamprey lineages (especially those distributed in the northern hemisphere) occurred in the Cretaceous or Tertiary Periods.
Affiliation(s)
- Shigehiro Kuraku
- Laboratory for Evolutionary Morphology, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan.
|
25
|
Melodelima C, Gautier C, Piau D. A markovian approach for the prediction of mouse isochores. J Math Biol 2007; 55:353-64. [PMID: 17486342 DOI: 10.1007/s00285-007-0087-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/27/2006] [Revised: 03/01/2007] [Indexed: 10/23/2022]
Abstract
Hidden Markov models (HMMs) are effective tools to detect series of statistically homogeneous structures, but they are not well suited to analyse complex structures. For example, the duration of stay in a state of an HMM must follow a geometric law. Numerous other methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties, and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions by using convolutions of geometric distributions. Thus, we have introduced macro-states to model the distributions of the lengths of the regions. Our study shows that a simple HMM could be used to model the isochore organisation of the mouse genome. This potential use of markovian models to help in data exploration has been underestimated until now.
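The point about geometric durations can be made concrete with a small numerical sketch. A single HMM state with exit probability p yields a length distribution that is largest at length 1 and decreases from there, whereas chaining several such states (a macro-state) yields the convolution of their geometric distributions, which has an interior peak. The function names and the use of identical sub-states are assumptions for illustration, not the construction used in the paper.

```python
def geometric_pmf(p: float, max_len: int) -> list[float]:
    """P(length = k) for one HMM state with exit probability p; index = length."""
    return [0.0] + [(1 - p) ** (k - 1) * p for k in range(1, max_len + 1)]

def convolve(a: list[float], b: list[float]) -> list[float]:
    """Convolution of two length distributions, truncated to len(a) entries."""
    out = [0.0] * len(a)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out

def macro_state_pmf(p: float, n_substates: int, max_len: int) -> list[float]:
    """Length distribution of a chain of identical geometric sub-states."""
    single = geometric_pmf(p, max_len)
    pmf = single
    for _ in range(n_substates - 1):
        pmf = convolve(pmf, single)
    return pmf
```

For instance, with p = 0.3 a single state is most likely to last one position, while a chain of three sub-states is most likely to last seven, which is the bell-shaped behaviour the abstract refers to.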
Affiliation(s)
- Christelle Melodelima
- UMR 5558 CNRS Biométrie et Biologie Evolutive, Université Claude Bernard Lyon 1, 43 boulevard du 11 Novembre 1818, 69622 Villeurbanne Cedex, France.
|
26
|
Press WH, Robins H. Isochores exhibit evidence of genes interacting with the large-scale genomic environment. Genetics 2006; 174:1029-40. [PMID: 16951086 PMCID: PMC1602094 DOI: 10.1534/genetics.105.054445] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
The genomes of mammals and birds can be partitioned into megabase-long regions, termed isochores, with consistently high, or low, average C + G content. Isochores with high CG contain a mixture of CG-rich and AT-rich genes, while high-AT isochores contain predominantly AT-rich genes. The two gene populations in the high-CG isochores are functionally distinguishable by statistical analysis of their gene ontology categories. However, the aggregate of the two populations in CG isochores is not statistically distinct from AT-rich genes in AT isochores. Genes tend to be located at local extrema of composition within the isochores, indicating that the CG-enriching mechanism acted differently when near to genes. On the other hand, maximum-likelihood reconstruction of molecular phylogenetic trees shows that branch lengths (evolutionary distances) for third codon positions in CG-rich genes are not substantially larger than those for AT-rich genes. In the context of neutral mutation theory this argues against any strong positive selection. Disparate features of isochores might be explained by a model in which about half of all genes functionally require AT richness, while, in warm-blooded organisms, about half the genome (in large coherent blocks) acquired a strong bias for mutations to CG. Using mutations in CG-rich genes as convenient indicators, we show that approximately 20% of amino acids in proteins are broadly substitutable, without regard to chemical similarity.
|
27
|
A computational prediction of isochores based on hidden Markov models. Gene 2006; 385:41-9. [PMID: 17020791 DOI: 10.1016/j.gene.2006.04.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/30/2005] [Revised: 03/17/2006] [Accepted: 04/03/2006] [Indexed: 11/30/2022]
Abstract
Mammalian genomes are organised into a mosaic of regions (in general more than 300 kb in length), with differing, relatively homogeneous G+C contents. The G+C content is the basic characteristic of isochores, but they have also been associated with many other biological properties. For instance, the genes are more compact and their density is highest in G+C rich isochores. Various ways of locating isochores in the human genome have been developed, but such methods use only the base composition of the DNA sequences. The present paper proposes a new method, based on a hidden Markov model, which takes into account several of the biological properties associated with the isochore structure of a genome. This method leads to good segmentation of the human genome into isochores, and also permits a new analysis of the known heterogeneity of G+C rich isochores: most (60%) of the G+C poor genes embedded in G+C rich isochores have UTR sequences characteristic of G+C rich genes. This genomic feature is discussed in the context of both evolution and genome function.
|
28
|
Marques AT, Antunes A, Fernandes PA, Ramos MJ. Comparative evolutionary genomics of the HADH2 gene encoding Abeta-binding alcohol dehydrogenase/17beta-hydroxysteroid dehydrogenase type 10 (ABAD/HSD10). BMC Genomics 2006; 7:202. [PMID: 16899120 PMCID: PMC1559703 DOI: 10.1186/1471-2164-7-202] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 04/02/2006] [Accepted: 08/09/2006] [Indexed: 11/17/2022] Open
Abstract
Background: The Aβ-binding alcohol dehydrogenase/17β-hydroxysteroid dehydrogenase type 10 (ABAD/HSD10) is an enzyme involved in pivotal metabolic processes and in the mitochondrial dysfunction seen in Alzheimer's disease. Here we use comparative genomic analyses to study the evolution of the HADH2 gene encoding ABAD/HSD10 across several eukaryotic species. Results: Both vertebrate and nematode HADH2 genes showed a six-exon/five-intron organization, while those of the insects had a reduced and varied number of exons (two to three). Eutherian mammal HADH2 genes revealed some highly conserved noncoding regions, which may indicate the presence of functional elements, namely in the upstream region about 1 kb from the transcription start site and in the first part of intron 1. These regions were also conserved between the Tetraodon and Fugu pufferfishes. We identified an alternative splicing event conserved between human and dog, which results in a nine amino acid deletion, causing the removal of strand βF. This strand is one of the seven strands that compose the core β-sheet of the Rossmann fold dinucleotide-binding motif characteristic of the short chain dehydrogenase/reductase (SDR) family members. However, the fact that the substrate binding cleft residues are retained, and the existence of a shared variant between human and dog, suggest that it might be functional. Molecular adaptation analyses across eutherian mammal orthologues revealed the existence of sites under positive selection, some of which are localized in the substrate-binding cleft and in the insertion 1 region on loop D (an important region for Aβ binding to the enzyme). Interestingly, a higher than expected number of nonsynonymous substitutions was observed between human/chimpanzee and orangutan, with six out of the seven amino acid replacements being under molecular adaptation (including three in loop D and one in the substrate binding loop).
Conclusion: Our study revealed that HADH2 genes maintained a reasonably conserved organization across a large evolutionary distance. The conserved noncoding regions identified among mammals and between pufferfishes, the evidence of an alternative splicing variant conserved between human and dog, and the detection of positive selection across eutherian mammals may be of importance for further research on ABAD/HSD10 function and its implication in Alzheimer's disease.
Affiliation(s)
- Alexandra T Marques
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
- Agostinho Antunes
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
- Pedro A Fernandes
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
- Maria J Ramos
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
|
29
|
Fortes GG, Bouza C, Martínez P, Sánchez L. Diversity in isochore structure among cold-blooded vertebrates based on GC content of coding and non-coding sequences. Genetica 2006; 129:281-9. [PMID: 16897446 DOI: 10.1007/s10709-006-0009-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 07/01/2005] [Accepted: 04/19/2006] [Indexed: 11/29/2022]
Abstract
To review the general view of the differing compositional structure of the genomes of warm- and cold-blooded vertebrates, we made use of the increasing number of genetic sequences, including coding (exons) and non-coding (introns) regions, that have been deposited in databases in recent years. The nucleotide distributions of the third codon positions (GC3) were analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. The relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) was also considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilothermic vertebrates. However, very high diversity in compositional patterns among different orders of fish, amphibians and reptiles was found. Significant positive correlations between GC3 and GCI were also confirmed for the genes analyzed. Nevertheless, introns proved to be poorer in GC than their corresponding CDS, this difference being larger than in the human genome. Because of the limited number of available sequences including both exons and introns, we must be cautious about the results derived from them. However, the indication that coding sequences are GC-richer than their corresponding introns could help explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.
Affiliation(s)
- Gloria G Fortes
- Departamento de Genética, Facultad de Veterinaria, Universidad de Santiago de Compostela, Lugo, Spain
|
30
|
Joy F, Basak S, Gupta SK, Das PJ, Ghosh SK, Ghosh TC. Compositional correlations in canine genome reflects similarity with human genes. BMB Rep 2006; 39:240-6. [PMID: 16756751 DOI: 10.5483/bmbrep.2006.39.3.240] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022] Open
Abstract
The base compositional correlations that hold among various coding and noncoding regions of the canine genome have been analysed. The distribution pattern of genes, on the basis of GC(3) composition, shows a wide range similar to that observed in human. However, the largest number of genes was observed in the range of 65-75% GC(3). The correlation between the canine coding DNA sequences and the different noncoding regions (introns and flanking regions) is found to be significant, and in many cases the degree of correlation is similar to that in the human genome. We found that these correlations are not limited to GC content alone, but hold at the level of the frequency of individual bases as well. The present study suggests that canines belong to the predicted 'general mammalian pattern' of genome composition, along with human beings.
Affiliation(s)
- Faustin Joy
- Bioinformatics Centre, Bose Institute, Kolkata, India
|
31
|
Scaiewicz V, Sabbía V, Piovani R, Musto H. CpG islands are the second main factor shaping codon usage in human genes. Biochem Biophys Res Commun 2006; 343:1257-61. [PMID: 16581018 DOI: 10.1016/j.bbrc.2006.03.108] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/03/2006] [Accepted: 03/15/2006] [Indexed: 01/22/2023]
Abstract
A correspondence analysis of codon usage in human genes revealed, as expected, that the first axis is strongly correlated with the base composition at synonymous third codon positions. At one extreme of the second axis were localized genes with a high frequency of NCG and CGN codons. The great majority of these sequences were embedded in CpG islands, while the opposite is true for the genes placed at the other extreme. The two main conclusions of this paper are: (1) CpG islands influence codon usage, and (2) since the second axis is orthogonal to (and therefore independent of) the first, GC3-rich genes are not necessarily associated with CpG islands.
Affiliation(s)
- Viviana Scaiewicz
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Iguá 4225, Montevideo 11400, Uruguay
|
32
|
Kuraku S, Ishijima J, Nishida-Umehara C, Agata K, Kuratani S, Matsuda Y. cDNA-based gene mapping and GC3 profiling in the soft-shelled turtle suggest a chromosomal size-dependent GC bias shared by sauropsids. Chromosome Res 2006; 14:187-202. [PMID: 16544192 DOI: 10.1007/s10577-006-1035-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 11/03/2005] [Accepted: 01/10/2006] [Indexed: 10/24/2022]
Abstract
Mammalian and avian genomes comprise several classes of chromosomal segments that vary dramatically in GC-content. Especially in chicken, microchromosomes exhibit a higher GC-content and a higher gene density than macrochromosomes. To understand the evolutionary history of the intra-genome GC heterogeneity in amniotes, it is necessary to examine the equivalence of this GC heterogeneity at the nucleotide level between these animals including reptiles, from which birds diverged. We isolated cDNAs for 39 protein-coding genes from the Chinese soft-shelled turtle, Pelodiscus sinensis, and performed chromosome mapping of 31 genes. The GC-content of exonic third positions (GC3) of P. sinensis genes showed a heterogeneous distribution, and exhibited a significant positive correlation with that of chicken and human orthologs, indicating that the last common ancestor of extant amniotes had already established a GC-compartmentalized genomic structure. Furthermore, chromosome mapping in P. sinensis revealed that microchromosomes tend to contain more GC-rich genes than GC-poor genes, as in chicken. These results illustrate two modes of genome evolution in amniotes: mammals elaborated the genomic configuration in which GC-rich and GC-poor regions coexist in individual chromosomes, whereas sauropsids (reptiles and birds) refined the chromosomal size-dependent GC compartmentalization in which GC-rich genomic fractions tend to be confined to microchromosomes.
Affiliation(s)
- Shigehiro Kuraku
- Laboratory for Evolutionary Morphology, RIKEN Center for Developmental Biology, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, 650-0047, Japan.
|
33
|
Kliman RM, Bernal CA. Unusual usage of AGG and TTG codons in humans and their viruses. Gene 2005; 352:92-9. [PMID: 15922516 DOI: 10.1016/j.gene.2005.04.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 08/24/2004] [Revised: 12/28/2004] [Accepted: 04/01/2005] [Indexed: 11/22/2022]
Abstract
Prior analysis on human protein-coding DNA sequences has identified local base composition as the primary predictor of synonymous codon usage. However, in many organisms, codon usage is influenced by natural selection, particularly for efficient expression of functional gene products. Because viruses are expected to evolve codon usage in the context of their host's molecular machinery, their genomes provide another window into the forces that guide their host's molecular evolution. Factor analysis was performed on codon usage of 16,654 genes annotated in Build 34 of the human genome, and the primary factor was correlated strongly with local base composition. However, two codons, AGG and TTG, rose in frequency as all other C- and G-ending codons decreased in frequency. These two codons were the only C- or G-ending codons with usages that negatively correlated with gene expression. Variation among viruses in codon usage also strongly reflects variation in base composition and, again, AGG and TTG decrease in frequency as all other C- and G-ending codons increase in frequency. It appears that usages of these two codons can not be explained by local compositional biases, implying a more direct role of natural selection on codon usage in humans.
Affiliation(s)
- Richard M Kliman
- Department of Biological Sciences, Cedar Crest College, 100 College Drive, Allentown, PA, USA.
|
34
|
Carels N. The maize gene space is compositionally compartimentalized. FEBS Lett 2005; 579:3867-71. [PMID: 15996663 DOI: 10.1016/j.febslet.2005.05.063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/08/2005] [Accepted: 05/13/2005] [Indexed: 11/18/2022]
Abstract
Previous investigations by Southern hybridization of cDNA with compositional DNA fractions showed that the majority of maize genes are located in a narrow GC range of DNA fragments and that the corresponding gene space was GC-richer than the region of the genome where zein genes are found. Here, we revisited the maize gene space using new data from the maize genome sequencing initiative. We found that the maize gene space itself is formed of two compositional compartments, one GC-poor and one GC-rich, characterized by different distributions of Opie and Huck retrotransposons. The GC-rich compartment tends to be richer in GC-rich genes than the GC-poor compartment. However, the gene space compartmentalization of maize is much simpler than that of human.
Affiliation(s)
- Nicolas Carels
- Laboratório de Bioinformática, Universidade Estadual de Santa Cruz, Rod. Ilhéus/Itabuna km. 16, 45650-000 Ilhéus Bahia, Brazil.
|
35
|
Jabbari K, Bernardi G. Comparative genomics of Anopheles gambiae and Drosophila melanogaster. Gene 2004; 333:183-6. [PMID: 15177694 DOI: 10.1016/j.gene.2004.02.038] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 12/17/2003] [Accepted: 02/10/2004] [Indexed: 10/26/2022]
Abstract
A sequence analysis of the genomes of Anopheles gambiae and Drosophila melanogaster reveals that Anopheles DNA is more heterogeneous and GC-richer than Drosophila DNA. The gene concentration across the Anopheles genome is characterized by low levels in the GC-poor part of the genome and a 3-fold increase in the GC-richest part; this gene density gradient is approximately half that of Drosophila. GC levels of introns and flanking sequences are correlated with GC(3) values (GC levels of third codon positions) of the corresponding genes with slopes much lower than unity; in other words, most introns and intergenic sequences are less GC-rich than the corresponding GC(3) values. These findings, which describe a compositional shift within Diptera, are of interest because of their parallels with the well-studied major shift in vertebrates.
Affiliation(s)
- Kamel Jabbari
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, F-75005 Paris, France
|
36
|
Marín A, Wang M, Gutiérrez G. Short-range compositional correlation in the yeast genome depends on transcriptional orientation. Gene 2004; 333:151-5. [PMID: 15177690 DOI: 10.1016/j.gene.2004.02.016] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 11/20/2003] [Revised: 01/21/2004] [Accepted: 02/10/2004] [Indexed: 11/29/2022]
Abstract
This article reports an analysis of the composition of about 5000 intergenic regions and neighboring ORFs in the nuclear genome of Saccharomyces cerevisiae, and of their correlation. Intergenic regions flanked by divergently transcribed ORFs are GC-richer (36%) than those separating convergent ORFs (29%). This difference in GC content cannot be fully attributed to their location upstream or downstream of the ORFs, since no such strong compositional bias is found within the 3' and 5' segments of intergenic regions between ORFs transcribed in the same direction. We have also found that the GC content of intergenic regions is positively correlated with that of their flanking ORFs in tandem and divergent orientations, but not in convergent orientations, and that the correlation coefficient between the GC contents of nearby ORFs is higher for divergent pairs. Our observations are discussed in the light of recent work stressing the relationships between base composition, chromatin structure and meiotic recombination.
Affiliation(s)
- Antonio Marín
- Departamento de Genética, Universidad de Sevilla, Apartado 1095, E-41080 Sevilla, Spain.
|
37
|
Abstract
The existence of a well conserved linear relationship between GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well curated coding sequences now cover at least 15%-30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well documented linear crest, but also the presence of several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, that is, mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
Affiliation(s)
- Stéphane Cruveiller
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, 80121 Napoli, Italy
|
38
|
Zhang R, Zhang CT. Isochore Structures in the Genome of the Plant Arabidopsis thaliana. J Mol Evol 2004; 59:227-38. [PMID: 15486696 DOI: 10.1007/s00239-004-2617-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2003] [Accepted: 02/10/2004] [Indexed: 10/26/2022]
Abstract
Arabidopsis thaliana is an important model system for the study of plant biology. We have analyzed the complete genome sequences of Arabidopsis by using a newly developed windowless method for the GC content computation, the cumulative GC profile. It is shown that the Arabidopsis genome is organized into a mosaic structure of isochores. All the centromeric regions are located in GC-rich isochores, called centromere-isochores, which are characterized by a high GC content but low gene and T-DNA insertion densities. This characteristic distinguishes centromere-isochores from the other class of GC-rich isochores, called GC-isochores, which have high gene and T-DNA insertion densities. Consequently, 15 isochores have been identified, i.e., 7 AT-isochores, 3 GC-isochores, and 5 centromere-isochores. The genes in centromere-isochores, which have the highest GC content, have much shorter intron lengths and lower intron numbers, compared to those of the other two types. There is also considerable difference in the numbers and lengths of transposable elements (TEs) between AT and GC-isochores, i.e., the TE number (length) of AT-isochores is 6.3 (7.3) times that of GC-isochores. It is generally believed that TEs are accumulated in the regions surrounding the centromeres. However, within these TE-rich regions, there are regions of extremely low TE numbers (TE deserts), which correspond to the positions of centromere-isochores. In addition, a heterochromatic knob is located at the boundary of an AT-isochore. Furthermore, we show that the differences in GC content among isochores are mainly due to the GC content variation of introns, the third codon positions and intergenic regions.
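To give a feel for what a windowless, cumulative view of GC content looks like, here is a simplified Python sketch. It is a stand-in under stated assumptions, not the authors' cumulative GC profile: it simply accumulates, base by base, the deviation of a GC indicator from the sequence-wide mean. The function name `cumulative_gc_profile` and the sign convention (the curve rises through GC-rich stretches) are assumptions made for this illustration.

```python
from itertools import accumulate


def cumulative_gc_profile(seq: str) -> list[float]:
    """Running sum of (1 if base is G or C else 0) minus the mean GC fraction.

    The curve climbs through regions richer in GC than the sequence average
    and descends through AT-rich regions; turning points mark candidate
    compositional boundaries. No sliding window is involved.
    """
    seq = seq.upper()
    if not seq:
        return []
    is_gc = [1.0 if base in "GC" else 0.0 for base in seq]
    mean_gc = sum(is_gc) / len(is_gc)
    return list(accumulate(x - mean_gc for x in is_gc))
```

Because the mean is subtracted, the profile returns to approximately zero at the end of the sequence, so only its local slopes carry information.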
Affiliation(s)
- Ren Zhang
- Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, 300060 Tianjin, China
|
39
|
Abstract
A positive correlation holds between the GC level of third codon positions of human genes (GC(3)) and hydropathy of the encoded proteins. This correlation may appear counterintuitive, since it links a physical property of proteins to the base composition of 'synonymous' sites. We here establish the nontriviality of the correlation, which has recently been contested. In particular, the correlation cannot simply be a consequence of an analogous correlation for first and second codon positions, since no such correlation exists. More generally, for any explanation via two chained correlations, the intermediate property would need to be strongly correlated with hydrophobicity and/or GC(3).
Affiliation(s)
- Kamel Jabbari
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, 75005 Paris, France
|
40
|
Lund G, Lauria M, Guldberg P, Zaina S. Duplication-Dependent CG Suppression of the Seed Storage Protein Genes of Maize. Genetics 2003; 165:835-48. [PMID: 14573492 PMCID: PMC1462805 DOI: 10.1093/genetics/165.2.835] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
This study investigates the prevalence of CG and CNG suppression in single- vs. multicopy DNA regions of the maize genome. The analysis includes the single- and multicopy seed storage proteins (zeins), the miniature inverted-repeat transposable elements (MITEs), and long terminal repeat (LTR) retrotransposons. Zein genes are clustered on specific chromosomal regions, whereas MITEs and LTRs are dispersed in the genome. The multicopy zein genes are CG suppressed and exhibit large variations in CG suppression. The variation observed correlates with the extent of duplication each zein gene has undergone, indicating that gene duplication results in an increased turnover of cytosine residues. Alignment of individual zein genes confirms this observation and demonstrates that CG depletion results primarily from polarized C:T and G:A transition mutations from a less to a more extensively duplicated gene. In addition, transition mutations occur primarily in a CG or CNG context suggesting that CG suppression may result from deamination of methylated cytosine residues. Duplication-dependent CG depletion is likely to occur at other loci as duplicated MITEs and LTR elements, or elements inserted into duplicated gene regions, also exhibit CG depletion.
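CG suppression is commonly quantified as the ratio of observed to expected CG dinucleotides, where the expectation comes from the C and G frequencies of the same sequence. The Python sketch below shows that common ratio; whether the cited study used exactly this formula is an assumption, and the function name `cg_obs_exp` is hypothetical.

```python
def cg_obs_exp(seq: str) -> float:
    """Observed/expected CG dinucleotide ratio: (#CG * N) / (#C * #G).

    Values well below 1 indicate CG depletion (suppression).
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        raise ValueError("sequence needs at least one C and one G")
    cg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cg_count * n / (c_count * g_count)
```

Comparing this ratio between single-copy and multicopy genes would be one way to reproduce the kind of contrast the abstract reports.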
Affiliation(s)
- Gertrud Lund
- Plant Biochemistry Laboratory, Department of Plant Biology, The Royal Veterinary and Agricultural University, DK-1871 Frederiksberg C, Denmark.
|
41
|
Hamada K, Horiike T, Ota H, Mizuno K, Shinozawa T. Presence of isochore structures in reptile genomes suggested by the relationship between GC contents of intron regions and those of coding regions. Genes Genet Syst 2003; 78:195-8. [PMID: 12773820 DOI: 10.1266/ggs.78.195] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
Vertebrate genomes are mosaics of isochores. On the assumption that marked differences exist in the isochore structure between warm-blooded and cold-blooded animals, variations among vertebrates were previously attributed to adaptation to homeothermy. However, based on the data of coding regions from representatives of extant vertebrates, including a turtle, a crocodile (Archosauromorpha) and a few kinds of snakes (Lepidosauromorpha), it was recently hypothesized that the common ancestors of mammals, birds and extant reptiles already had the "warm-blooded" isochore structure. To test this hypothesis, the nucleotide sequences of alpha-globin genes including non-coding regions (introns) from two snakes, N. kaouthia and E. climacophora, were determined (accession numbers: AB104824, AB104825). The correlation between the GC contents in the introns and exons of alpha-globin genes from snakes and those from other vertebrates supports the above hypothesis. A similar analysis using data for exons and introns of other genes obtained from GenBank (Release 131) also supports the above hypothesis.
Affiliation(s)
- Kazuo Hamada
- Department of Biological and Chemical Engineering, Faculty of Engineering, Gunma University, Kiryu, Japan
|
42
|
Lercher MJ, Smith NGC, Eyre-Walker A, Hurst LD. The evolution of isochores: evidence from SNP frequency distributions. Genetics 2002; 162:1805-10. [PMID: 12524350 PMCID: PMC1462390 DOI: 10.1093/genetics/162.4.1805] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, which do not assume compositional equilibrium. In the first test we assume a standard population genetics model, but in the second we make no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism data from noncoding regions of the human genome. Both models of neutral mutation bias fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome adequately, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.
Affiliation(s)
- Martin J Lercher
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom.
|
43
|
Duret L, Semon M, Piganeau G, Mouchiroud D, Galtier N. Vanishing GC-rich isochores in mammalian genomes. Genetics 2002; 162:1837-47. [PMID: 12524353 PMCID: PMC1462357 DOI: 10.1093/genetics/162.4.1837] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
To understand the origin and evolution of isochores (the peculiar spatial distribution of GC content within mammalian genomes), we analyzed the synonymous substitution pattern in coding sequences from closely related species in different mammalian orders. In primates and cetartiodactyls, GC-rich genes are undergoing a large excess of GC→AT substitutions over AT→GC substitutions: GC-rich isochores are slowly disappearing from the genome of these two mammalian orders. In rodents, our analyses suggest both a decrease in GC content of GC-rich isochores and an increase in GC-poor isochores, but more data will be necessary to assess the significance of this pattern. These observations question the conclusions of previous works that assumed that base composition was at equilibrium. Analysis of allele frequency in human polymorphism data, however, confirmed that in the GC-rich parts of the genome, GC alleles have a higher probability of fixation than AT alleles. This fixation bias appears not strong enough to overcome the large excess of GC→AT mutations. Thus, whatever the evolutionary force (neutral or selective) at the origin of GC-rich isochores, this force is no longer effective in mammals. We propose a model based on the biased gene conversion hypothesis that accounts for the origin of GC-rich isochores in the ancestral amniote genome and for their decline in present-day mammals.
Affiliation(s)
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558 Université Claude Bernard Lyon 1, 69622 Villeurbanne Cedex, France.
|
44
|
D'Onofrio G, Ghosh TC, Bernardi G. The base composition of the genes is correlated with the secondary structures of the encoded proteins. Gene 2002; 300:179-87. [PMID: 12468099 DOI: 10.1016/s0378-1119(02)01045-4] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
The analysis of a non-redundant set of human proteins, for which both the crystallographic structures and the corresponding gene sequences are available, shows that bases at third codon positions are non-uniformly distributed along the coding sequences. Significant compositional differences are found by comparing the gene regions corresponding to the different secondary structures of the proteins. Inter- and intra-structure differences were most pronounced in the GC-richest genes. These results are not compatible with any proposed hypotheses based on a neutral process of formation/maintenance of the high GC(3) levels of the genes localized in the GC-richest isochores of the human genome.
Affiliation(s)
- Giuseppe D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica A. Dohrn, Naples, Italy.
|
45
|
Abstract
Sauropsids form a complex group of vertebrates including squamates (lizards and snakes), turtles, crocodiles, sphenodon and birds (which are often considered as a separate class). Although avian genomes have been relatively well studied, the genomes of the other groups have remained only sparsely characterized. Moreover, the nuclear sequences available in databanks are still very limited. In the present study, we have analysed the compositional patterns, i.e. the GC (molar fraction of guanine and cytosine in DNA) distributions, of 31 reptilian (particularly snake) genomes by analytical ultracentrifugation of DNAs in CsCl gradients. The profiles were characterized by their modal buoyant density ρ₀, mean buoyant density ⟨ρ⟩, asymmetry ⟨ρ⟩ − ρ₀, and heterogeneity H. The modal buoyant density distribution of reptilian DNAs clearly distinguishes two groups. The snakes fall in the same range of modal densities as most mammals, whereas crocodiles, turtles and lizards show higher values (>1.700 g/cm³). As far as the more important compositional properties of asymmetry and heterogeneity are concerned, previous studies showed that amphibians and fishes share relatively low values, whereas birds and mammals are characterized by highly heterogeneous and asymmetric patterns (with the exception of Muridae, which have a lower heterogeneity). The present results show that the snake genomes cover a broad range of asymmetry and heterogeneity values, whereas the genomes of crocodiles and turtles cover a narrow range that is intermediate between those of fishes/amphibians and those of mammals/birds.
Affiliation(s)
- Sandrine Hughes
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
|
46
|
Birdsell JA. Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol Biol Evol 2002; 19:1181-97. [PMID: 12082137 DOI: 10.1093/oxfordjournals.molbev.a004176] [Citation(s) in RCA: 180] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae. Neither selection nor mutation can explain this relationship. A highly significant GC-biased mismatch repair system is documented for the first time in any member of the Kingdom Fungi. Much of the variation in the GC3s within yeast appears to result from GC-biased gene conversion. Evidence suggests that GC-biased mismatch repair exists in numerous organisms spanning six kingdoms. This transkingdom GC mismatch repair bias may have evolved in response to a ubiquitous AT mutational bias. A significant positive correlation between recombination and GC content is found in many of these same organisms, suggesting that the processes influencing the evolution of the yeast genome may be a general phenomenon. Nonrecombining regions of the genome and nonrecombining genomes would not be subject to this type of molecular drive. It is suggested that the low GC content characteristic of many nonrecombining genomes may be the result of three processes: (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation, and (3) the absence of the GC-biased gene conversion which, in recombining organisms, permits the reversal of the most common types of mutation. A model is proposed to explain the observation that introns, intergenic regions, and pseudogenes typically have lower GC content than the silent sites of corresponding open reading frames. This model is based on the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them.
According to this "Constraint" hypothesis, the formation and propagation of heteroduplex DNA is expected to occur, on average, more frequently within conserved coding and regulatory regions of the genome. In organisms possessing GC-biased mismatch repair, this would enhance the GC content of these regions through biased gene conversion. These findings have a number of important implications for the way we view genome evolution and suggest a new model for the evolution of sex.
Affiliation(s)
- John A Birdsell
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85121, USA.
|
47
|
|
48
|
Abstract
Within-intron difference of correlation with base composition of the adjacent exons was studied in the genomes of 34 species. For this purpose, GC-percent was determined for segments of 50 bp in length taken at both intron margins and in the internal part of the intron. It was found that in certain genomes the coefficient of correlation with GC-percent of the adjacent exon was significantly higher for the intron margin than for the internal part of the intron (homeotherms, cereals). Only part of this difference can be explained by unequal probability of insertion of transposable elements. Those multicellular organisms which have a low or no within-intron difference in correlation with the adjacent exons (anamniotes, invertebrates, dicots) show a higher local compositional heterogeneity (a greater exon/intron contrast in the GC-content). These results are evidence against the mutational bias being a possible explanation for the compositional genome heterogeneity. Thus, in the genomes with a high global heterogeneity there seems to be a selective force for compliance of intron base composition with the adjacent exons. This force is stronger in those parts of the intron that are closer to exons. In addition, the previously found positive general correlation between the genome size and average intron length was confirmed with a much larger dataset. However, within separate phylogenetic groups this rule can be broken, as it occurs in the cereals (family Poaceae), where a negative correlation was found.
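The segment-based measurement described in this abstract can be sketched as follows in Python. The sketch takes one intron and reports GC-percent for a 50 bp segment at each margin and for one internal segment. Placing the internal segment at the centre of the intron, requiring three non-overlapping segments, and the function name `intron_segment_gc` are all assumptions made for illustration; the abstract does not specify these details.

```python
def intron_segment_gc(intron: str, size: int = 50) -> dict[str, float]:
    """GC-percent of the left margin, a central internal segment, and the right margin."""
    if size <= 0:
        raise ValueError("segment size must be positive")
    intron = intron.upper()
    if len(intron) < 3 * size:
        raise ValueError("intron too short for three non-overlapping segments")
    mid = (len(intron) - size) // 2  # start of the centred internal segment (assumed placement)
    segments = {
        "left_margin": intron[:size],
        "internal": intron[mid:mid + size],
        "right_margin": intron[-size:],
    }
    return {
        name: 100.0 * sum(base in "GC" for base in segment) / size
        for name, segment in segments.items()
    }
```

Correlating each of these three values with the GC-percent of the adjacent exon across many introns would yield the margin-versus-internal comparison the abstract refers to.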
Affiliation(s)
- A E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, 194064, St. Petersburg, Russia.
|
49
|
Clay O, Carels N, Douady C, Macaya G, Bernardi G. Compositional heterogeneity within and among isochores in mammalian genomes. I. CsCl and sequence analyses. Gene 2001; 276:15-24. [PMID: 11591467 DOI: 10.1016/s0378-1119(01)00667-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
GC level distributions of a species' nuclear genome, or of its compositional fractions, encode key information on structural and functional properties of the genome and on its evolution. They can be calculated either from absorbance profiles of the DNA in CsCl density gradients at sedimentation equilibrium, or by scanning long contigs of largely sequenced genomes. In the present study, we address the quantitative characterization of the compositional heterogeneity of genomes, as measured by the GC distributions of fixed-length fragments. Special attention is given to mammalian genomes, since their compartmentalization into isochores implies two levels of heterogeneity, intra-isochore (local) and inter-isochore (global). This partitioning is a natural one, since large-scale compositional properties vary much more among isochores than within them. Intra-isochore GC distributions become roughly Gaussian for long fragments, and their standard deviations decrease only slowly with increasing fragment length, unlike random sequences. This effect can be explained by 'long-range' correlations, often overlooked, that are present along isochores.
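The heterogeneity measure discussed in this abstract, the spread of GC levels across fixed-length fragments, can be illustrated with a short Python sketch. It cuts a sequence into non-overlapping fragments of a given length and returns the population standard deviation of their GC fractions. The function name `fragment_gc_sd` and the choice of non-overlapping fragments are assumptions for this illustration, not the authors' procedure.

```python
from statistics import pstdev


def fragment_gc_sd(seq: str, length: int) -> float:
    """Standard deviation of GC fraction over non-overlapping fragments of `length` bases."""
    if length <= 0:
        raise ValueError("fragment length must be positive")
    seq = seq.upper()
    # Any trailing remainder shorter than one fragment is dropped.
    fragments = [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]
    if len(fragments) < 2:
        raise ValueError("need at least two fragments to measure spread")
    gc_levels = [sum(base in "GC" for base in frag) / length for frag in fragments]
    return pstdev(gc_levels)
```

For a sequence of independent, identically distributed bases with GC probability p, this standard deviation is expected to fall as sqrt(p(1 - p) / length). Evaluating `fragment_gc_sd` over a range of lengths and comparing against that curve is one way to see the slower decrease the abstract attributes to long-range correlations.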
Affiliation(s)
- O Clay
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
|
50
|
Abstract
A few months ago the International Human Genome Sequencing Consortium (IHGSC) published a 61-page paper on the human genome (IHGSC, Nature 409 (2001) 860). Here comments will be presented on some points of the paper that were previously investigated in our laboratory, and some misunderstandings and misconceptions about the organization and the evolutionary history of the human genome will be discussed. A very recent article on the same subject (Eyre-Walker and Hurst, Nat. Rev. Genet. 2 (2001) 549) will also be addressed. The present paper is a complement to two review articles which were published last year (Bernardi, Gene 241 (2000) 3; Gene 259(1) (2000) 31).
Affiliation(s)
- G Bernardi
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy.
|