1
Capanna E, Redi C. Giving bodies to ghosts: locating molecules in the very place where they exert their biological roles. Eur J Histochem 2024; 68:3950. [PMID: 38285084 PMCID: PMC11059455 DOI: 10.4081/ejh.2024.3950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 01/22/2024] [Indexed: 01/30/2024] Open
Abstract
This paper reviews some of the goals of our investigations published over the years in Rivista di Istochimica Normale e Patologica, Basic and Applied Histochemistry, and the European Journal of Histochemistry (EJH). In a series of papers, we published some of the basic cytochemical features of the sperm cytodifferentiation process for the first time. This was a conceptual and practical prerequisite to the in situ quantitative evaluation of sperm DNA content. We showed that the deviation from the expected 1:2 ratio when comparing sperm versus somatic cell DNA content (sperm DNA content is always far lower than the theoretical value) is due to DNA losses caused by the hydrochloric acid treatment entailed by the Feulgen reaction. The knowledge of the specific losses that occur during the various steps of the Feulgen reaction has allowed us to use it critically in genome size studies to highlight (i) sperm aneuploidy in chromosomally derived subfertility; (ii) the broad variability range of mammalian genome sizes; and (iii) that termites are roaches (after decades of discussion on this topic). In addition, in a seminal paper on human oocytes, we showed (by transmission electron microscopy) a specific chromatin and cytoplasmic organization (both essential for further embryo development) linked to oocyte maturation arrest, a datum quite relevant to treating unmet therapeutic needs in human and veterinary reproduction.
2
Yi SV, Goodisman MAD. The impact of epigenetic information on genome evolution. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200114. [PMID: 33866804 DOI: 10.1098/rstb.2020.0114] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 02/06/2023] Open
Abstract
Epigenetic information affects gene function by interacting with chromatin, while not changing the DNA sequence itself. However, it has become apparent that the interactions between epigenetic information and chromatin can, in fact, indirectly lead to DNA mutations and ultimately influence genome evolution. This review evaluates the ways in which epigenetic information affects genome sequence and evolution. We discuss how DNA methylation has strong and pervasive effects on DNA sequence evolution in eukaryotic organisms. We also review how the physical interactions arising from the connections between histone proteins and DNA affect DNA mutation and repair. We then discuss how a variety of epigenetic mechanisms exert substantial effects on genome evolution by suppressing the movement of transposable elements. Finally, we examine how genome expansion through gene duplication is also partially controlled by epigenetic information. Overall, we conclude that epigenetic information has widespread indirect effects on DNA sequences in eukaryotes and represents a potent cause and constraint of genome evolution. This article is part of the theme issue 'How does epigenetics influence the course of evolution?'
Affiliation(s)
- Soojin V Yi
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Michael A D Goodisman
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
3
Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019; 19:144. [PMID: 31311498 PMCID: PMC6636035 DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. RESULTS We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic "tree of life". Landscapes of GC content also strongly correlated with landscapes of amino acid usage: positive correlation with glycine, alanine, arginine and proline and negative correlation with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. CONCLUSIONS Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
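As a rough illustration of the sliding-window idea mentioned in this abstract, the sketch below computes per-gene GC content and then averages it over runs of neighbouring genes. The function names, the window size, and the choice to count only unambiguous bases are assumptions made for this example; the abstract does not specify the authors' actual procedure.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_landscape(gene_gc: list[float], window: int = 3) -> list[float]:
    """Mean GC content over each run of `window` consecutive genes.

    `gene_gc` is assumed to be ordered by chromosomal position.
    """
    if window < 1 or window > len(gene_gc):
        return []
    return [
        sum(gene_gc[i:i + window]) / window
        for i in range(len(gene_gc) - window + 1)
    ]
```

Given mRNA sequences in chromosomal order, `gc_landscape([gc_fraction(s) for s in mrnas], window=3)` yields one smoothed value per window position, which can then be plotted along the chromosome.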
Affiliation(s)
- R Huttener
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Thorrez
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium; Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- T In't Veld
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- M Granvik
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Snoeck
  - Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- L Van Lommel
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- F Schuit
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
4
Abstract
Amino acids typically are encoded by multiple synonymous codons that are not used with the same frequency. Codon usage bias has drawn considerable attention, and several explanations have been offered, including variation in GC-content between species. Focusing on a simple parameter, the combined GC proportion of all the synonymous codons for a particular amino acid (termed GCsyn), we try to deepen our understanding of the relationship between GC-content and amino acid/codon usage in more detail. We analyzed 65 widely distributed representative species and found a close association between GCsyn, GC-content, and amino acid usage. The overall usages of the four amino acids with the greatest GCsyn and the five amino acids with the lowest GCsyn both vary with the regional GC-content, whereas the usage of the remaining 11 amino acids with intermediate GCsyn is less variable. More interestingly, we discovered that codon usage frequencies are nearly constant in regions with similar GC-content. We further quantified the effects of regional GC-content variation (low to high) on amino acid usage and found that GC-content determines the usage variation of amino acids, especially those with extremely high GCsyn, which accounts for 76.7% of the changed GC-content for those regions. Our results suggest that GCsyn correlates with GC-content and has an impact on codon/amino acid usage. These findings suggest a novel approach to understanding the role of codon and amino acid usage in shaping genomic architecture and evolutionary patterns of organisms.
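One plausible reading of GCsyn, sketched below, is the unweighted share of G and C bases across all synonymous codons of an amino acid in the standard genetic code. The unweighted pooling and the function name are assumptions for illustration; the abstract does not give the exact formula.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with "*" marking stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def gcsyn_by_amino_acid() -> dict[str, float]:
    """GC proportion pooled over all synonymous codons of each amino acid."""
    codons_for = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            codons_for[aa].append(codon)
    return {
        aa: sum(c.count("G") + c.count("C") for c in codons) / (3 * len(codons))
        for aa, codons in codons_for.items()
    }
```

Sorting the result, for example with `sorted(gcsyn_by_amino_acid().items(), key=lambda kv: kv[1])`, ranks the 20 amino acids from lowest to highest GCsyn under this definition.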
5
Taher L, Narlikar L, Ovcharenko I. Identification and computational analysis of gene regulatory elements. Cold Spring Harb Protoc 2015; 2015:pdb.top083642. [PMID: 25561628 PMCID: PMC5885252 DOI: 10.1101/pdb.top083642] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 06/04/2023]
Abstract
Over the last two decades, advances in experimental and computational technologies have greatly facilitated genomic research. Next-generation sequencing technologies have made de novo sequencing of large genomes affordable, and powerful computational approaches have enabled accurate annotations of genomic DNA sequences. Charting functional regions in genomes must account for not only the coding sequences, but also noncoding RNAs, repetitive elements, chromatin states, epigenetic modifications, and gene regulatory elements. A mix of comparative genomics, high-throughput biological experiments, and machine learning approaches has played a major role in this truly global effort. Here we describe some of these approaches and provide an account of our current understanding of the complex landscape of the human genome. We also present overviews of different publicly available, large-scale experimental data sets and computational tools, which we hope will prove beneficial for researchers working with large and complex genomes.
Affiliation(s)
- Leila Taher
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, 18051 Rostock, Germany
- Leelavati Narlikar
  - Chemical Engineering and Process Development Division, National Chemical Laboratory, CSIR, Pune 411008, India
- Ivan Ovcharenko
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
6
Zhang SH, Wang L. Two common profiles exist for genomic oligonucleotide frequencies. BMC Res Notes 2012; 5:639. [PMID: 23158698 PMCID: PMC3532236 DOI: 10.1186/1756-0500-5-639] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/24/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022] Open
Abstract
Background It was reported that there is a majority profile for trinucleotide frequencies among genomes. Further study has revealed that two common profiles, rather than one majority profile, exist for genomic trinucleotide frequencies. However, the origins of the common/majority profile remain elusive. Moreover, it is not clear whether the features of the common profiles may be extended to oligonucleotides other than trinucleotides. Findings We analyzed 571 prokaryotic genomes (chromosomes) and some selected eukaryotic nuclear genomes as well as other genetic systems to study their compositional features. We found that there are also two common profiles for genomic oligonucleotide frequencies: one is from low-GC content genomes, and the other is from high-GC content genomes. Furthermore, each common profile is highly correlated with the average profile of random sequences with corresponding GC content and generated according to first-order symmetry. Conclusions The causes for the existence of two common profiles would mainly be GC content variations and strand symmetry of genomic sequences. Therefore, both GC content and strand symmetry would play important roles in genome evolution.
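To make the notions of an oligonucleotide frequency profile and strand symmetry concrete, the sketch below counts overlapping k-mers on one strand and measures how far each word's frequency departs from that of its reverse complement. The helper names and the max-difference summary are illustrative assumptions, not the statistic used in the paper.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Reverse complement of an uppercase DNA word."""
    return word.translate(_COMPLEMENT)[::-1]


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def strand_asymmetry(freqs: dict[str, float]) -> float:
    """Largest gap between a word's frequency and its reverse complement's."""
    return max(
        (abs(f - freqs.get(reverse_complement(w), 0.0)) for w, f in freqs.items()),
        default=0.0,
    )
```

A value of `strand_asymmetry(kmer_frequencies(genome, 3))` close to zero would indicate that trinucleotides and their reverse complements occur at similar frequencies on the analysed strand.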
Affiliation(s)
- Shang-Hong Zhang
  - Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou, 510275, China
7
Windisch HS, Lucassen M, Frickenhaus S. Evolutionary force in confamiliar marine vertebrates of different temperature realms: adaptive trends in zoarcid fish transcriptomes. BMC Genomics 2012; 13:549. [PMID: 23051706 PMCID: PMC3557217 DOI: 10.1186/1471-2164-13-549] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 03/08/2012] [Accepted: 10/08/2012] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Studies of temperature-induced adaptation on the basis of genomic sequence data were mainly done in extremophiles. Although the general hypothesis of an increased molecular flexibility in the cold is widely accepted, the results of thermal adaptation are still difficult to detect from the proteomic down to the genomic sequence level. Approaches towards a more detailed picture emerge with the advent of new sequencing technologies. Even small changes in primary protein structure have been shown to modify kinetic and thermal properties of enzymes, but for interspecies comparisons a high genetic identity is still essential to specify common principles. The present study uses comprehensive transcriptomic sequence information to uncover general patterns of thermal adaptation on the RNA as well as protein primary structure. RESULTS By comparing orthologous sequences of two closely related zoarcid fish inhabiting different latitudinal zones (Antarctica: Pachycara brachycephalum, temperate zone: Zoarces viviparus) we were able to detect significant differences in the codon usage. In the cold-adapted species a lower GC content in the wobble position prevailed for preserved amino acids. We were able to estimate 40-60% coverage of the functions represented within the two compared zoarcid cDNA-libraries on the basis of a reference genome of the phylogenetically closely related fish Gasterosteus aculeatus. A distinct pattern of amino acid substitutions could be identified for the non-synonymous codon exchanges, with a remarkable surplus of serine and reduction of glutamic acid and asparagine for the Antarctic species. CONCLUSION Based on the differences between orthologous sequences from confamiliar species, distinguished mainly by the temperature regimes of their habitats, we hypothesize that temperature leaves a signature on the composition of biological macromolecules (RNA, proteins) with implications for the transcription and translation level. As the observed pattern of amino acid substitutions only partly supports the flexibility hypothesis, further evolutionary forces may be effective at the global transcriptome level.
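The wobble-position GC content referred to in this abstract is commonly summarised as GC3, the fraction of G or C at the third base of each codon. The sketch below assumes an in-frame coding sequence that starts at codon position 1 and ignores any trailing partial codon; the function name is hypothetical and the authors' exact pipeline is not described here.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    third_bases = cds[2:usable:3]
    if not third_bases:
        return 0.0
    return sum(base in "GC" for base in third_bases) / len(third_bases)
```

Comparing `gc3(ortholog_a)` with `gc3(ortholog_b)` for aligned orthologous coding sequences gives a simple per-gene view of the kind of wobble-position difference the study reports.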
Affiliation(s)
- Heidrun Sigrid Windisch
  - Alfred Wegener Institute for Polar and Marine Research, Am Handelshafen 12, Bremerhaven, Germany
- Magnus Lucassen
  - Alfred Wegener Institute for Polar and Marine Research, Am Handelshafen 12, Bremerhaven, Germany
- Stephan Frickenhaus
  - Alfred Wegener Institute for Polar and Marine Research, Am Handelshafen 12, Bremerhaven, Germany
8
Faux N. Single amino acid and trinucleotide repeats: function and evolution. Adv Exp Med Biol 2012; 769:26-40. [PMID: 23560303 DOI: 10.1007/978-1-4614-5434-2_3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
The best-known effect of single amino acid repeat expansion, beyond a certain threshold, is the development of a specific disease, depending on the protein in which the expansion has occurred. For example, the expansion of the glutamine repeat in huntingtin leads to the debilitating neurodegenerative disease, Huntington's disease. Similarly, there is a range of other disorders caused by trinucleotide repeat expansions encoding polyglutamine or polyalanine tracts. The age of onset of the polyglutamine-induced neurodegenerative diseases is usually negatively correlated with the length of the expanded CAG/glutamine repeat. However, recent studies have given evidence that single amino acid repeats may also play critical roles in normal protein function and that changes in the length of single amino acid repeats are likely to play a beneficial role in evolution. This chapter will look at the prevalence, function and possible role of single amino acid repeats in evolution and other biological processes.
Affiliation(s)
- Noel Faux
  - Mental Health Research Institute, The University of Melbourne, Parkville, Victoria, Australia
9
Aoi MC, Rourke BC. Interspecific and intragenic differences in codon usage bias among vertebrate myosin heavy-chain genes. J Mol Evol 2011; 73:74-93. [PMID: 21915654 DOI: 10.1007/s00239-011-9457-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/24/2010] [Accepted: 08/19/2011] [Indexed: 01/13/2023]
Abstract
Synonymous codon usage bias is a broadly observed phenomenon in bacteria, plants, and invertebrates and may result from selection. However, the role of selective pressures in shaping codon bias is still controversial in vertebrates, particularly for mammals. The myosin heavy-chain (MyHC) gene family comprises multiple isoforms of the major force-producing contractile protein in cardiac and skeletal muscles. Slow and fast genes are tandemly arrayed on separate chromosomes, and have distinct patterns of functionality and expression in muscle. We analyze both full-length MyHC genes (~5400 bp) and a larger collection of partial sequences at the 3' end (~500 bp). The MyHC isoforms are an interesting system in which to study codon usage bias because of their length, expression, and critical importance to organismal mobility. Codon bias and GC content differ among MyHC genes with regard to functional type, isoform, and position within the gene. Codon bias even varies by isoform within a species. We find evidence in favor of both chromosomal influences on nucleotide composition and selection against nonsense errors (SANE) acting on codon usage in MyHC genes. Intragenic variation in codon bias and elongation rate is significant, with a strong trend for increasing codon bias and elongation rate towards the 3' end of the gene, although the trend is dependent upon the degeneracy class of the codons. Therefore, patterns of codon usage in MyHC genes are consistent with models supporting SANE as a major force shaping codon usage.
Affiliation(s)
- Mikio C Aoi
  - Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA
10
Wang Y, Leung FCC. GC content increased at CpG flanking positions of fish genes compared with sea squirt orthologs as a mechanism for reducing impact of DNA methylation. PLoS One 2008; 3:e3612. [PMID: 19005573 PMCID: PMC2580031 DOI: 10.1371/journal.pone.0003612] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/11/2008] [Accepted: 10/13/2008] [Indexed: 01/18/2023] Open
Abstract
Background Fractional DNA methylation in sea squirts evolved to global DNA methylation in fish. The impact of global DNA methylation is reflected by more CpG depletions and/or more A/T to G/C changes at CpG flanking positions due to context-dependent mutations of methylated CpG sites. Methods and Findings In this report, we demonstrate that the sea squirt genes have undergone more CpG to TpG/CpA substitutions than the fish orthologs using homologous fragments from orthologous genes among Ciona intestinalis, Ciona savignyi, fugufish and zebrafish. To avoid premature transcription, the TGA sites derived from CGA were largely converted to TGG in sea squirt genes. By contrast, a significant increment of GC content at CpG flanking positions was shown in fish genes. The positively selected A/T to G/C substitutions, in combination with the CpG to TpG/CpA substitutions, are the sources of the extremely low CpG observed/expected ratios in vertebrates. The nonsynonymous substitutions caused by the GC content increase have resulted in frequent amino acid replacements in the directions that were not noticed previously. Conclusion The increased GC content at CpG flanking positions can reduce CpG loss in fish genes and attenuate the impact of DNA methylation on CpG-containing codons, probably accounting for evolution towards vertebrates.
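The CpG observed/expected ratio mentioned in this abstract is conventionally computed as the number of CpG dinucleotides times the sequence length, divided by the product of the C and G counts. The sketch below implements that widely used formula; whether the authors normalised in exactly this way is an assumption, and the function name is illustrative.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio undefined without both bases; report 0 by convention here
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * n / (c_count * g_count)
```

Values well below 1 indicate CpG depletion relative to base composition, which is the pattern the abstract attributes to methylation-dependent CpG to TpG/CpA substitutions.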
Affiliation(s)
- Yong Wang
  - Department of Zoology and Genome Research Centre, The University of Hong Kong, Pokfulam, Hong Kong
- Frederick C. C. Leung
  - Department of Zoology and Genome Research Centre, The University of Hong Kong, Pokfulam, Hong Kong
11
Antezana MA, Jordan IK. Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes. PLoS One 2008; 3:e2145. [PMID: 18478116 PMCID: PMC2366069 DOI: 10.1371/journal.pone.0002145] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/20/2007] [Accepted: 03/17/2008] [Indexed: 01/01/2023] Open
Abstract
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation whose central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones. 
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that either mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
Affiliation(s)
- Marcos A Antezana
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
12
Oliver JL, Bernaola-Galván P, Hackenberg M, Carpena P. Phylogenetic distribution of large-scale genome patchiness. BMC Evol Biol 2008; 8:107. [PMID: 18405379 PMCID: PMC2397391 DOI: 10.1186/1471-2148-8-107] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/15/2007] [Accepted: 04/11/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. RESULTS The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. CONCLUSION Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
Affiliation(s)
- José L Oliver
  - Dpto de Genética, Facultad de Ciencias, Universidad de Granada, Spain
13
Elango N, Kim SH, Vigoda E, Yi SV. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLoS Comput Biol 2008; 4:e1000015. [PMID: 18463707 PMCID: PMC2265638 DOI: 10.1371/journal.pcbi.1000015] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.5] [Received: 09/12/2007] [Accepted: 01/30/2008] [Indexed: 11/19/2022] Open
Abstract
Transitions at CpG dinucleotides, referred to as “CpG substitutions”, are a major mutational input into vertebrate genomes and a leading cause of human genetic disease. The prevalence of CpG substitutions is due to their mutational origin, which is dependent on DNA methylation. In comparison, other single nucleotide substitutions (for example those occurring at GpC dinucleotides) mainly arise from errors during DNA replication. Here we analyzed high quality BAC-based data from human, chimpanzee, and baboon to investigate regional variation of CpG substitution rates. We show that CpG substitutions occur approximately 15 times more frequently than other single nucleotide substitutions in primate genomes, and that they exhibit substantial regional variation. Patterns of CpG rate variation are consistent with differences in methylation level and susceptibility to subsequent deamination. In particular, we propose a “distance-decaying” hypothesis, positing that due to the molecular mechanism of a CpG substitution, rates are correlated with the stability of double-stranded DNA surrounding each CpG dinucleotide, and the effect of local DNA stability may decrease with distance from the CpG dinucleotide. Consistent with our “distance-decaying” hypothesis, rates of CpG substitution are strongly (negatively) correlated with regional G+C content. The influence of G+C content decays as the distance from the target CpG site increases. We estimate that the influence of local G+C content extends up to 1,500–2,000 bp centered on each CpG site. We also show that the distance-decaying relationship persisted when we controlled for the effect of long-range homogeneity of nucleotide composition. GpC sites, in contrast, do not exhibit such a “distance-decaying” relationship. Our results highlight an example of the distinctive properties of methylation-dependent substitutions versus substitutions mostly arising from errors during DNA replication. Furthermore, the negative relationship between G+C content and CpG rates may provide an explanation for the observation that GC-rich SINEs show lower CpG rates than other repetitive elements.
Mutations are raw materials of evolution. Earlier studies have shown that mutations occur at different frequencies in different genomic regions. By investigating the patterns and causes of such “regional” variation of mutations, we can better understand the mechanisms underlying mutagenesis. In the human and other mammalian genomes, the most common type of mutation is caused by DNA methylation, which targets cytosines followed by guanine (CpG dinucleotides). Methylated cytosines are then subject to spontaneous deamination, which will cause a C to T (or G to A) transition (CpG substitution). Because this mutational process is unique to CpG substitutions, we reasoned that they might show different patterns of variability from other substitutions. Using high quality genomic sequences from primates and by separately analyzing variability of CpG substitutions and other substitutions, we demonstrate that CpG substitutions occur approximately 15 times more frequently than other substitutions, and show a distinctive pattern of regional variability. Particularly, we propose and provide evidence that because the deamination step requires temporary strand separation, G+C composition within 1,500–2,000 bp in each direction from a target CpG affects the probability of a CpG substitution. Incorporating the difference in CpG and other substitutions discovered in this study will help build more realistic evolutionary models.
Affiliation(s)
- Navin Elango
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Seong-Ho Kim
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- NISC Comparative Sequencing Program
  - Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Eric Vigoda
  - College of Computing, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Soojin V. Yi
  - School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
14
|
Smarda P, Bures P, Horová L, Foggi B, Rossi G. Genome size and GC content evolution of Festuca: ancestral expansion and subsequent reduction. ANNALS OF BOTANY 2008; 101:421-33. [PMID: 18158307 PMCID: PMC2701825 DOI: 10.1093/aob/mcm307] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2007] [Revised: 09/10/2007] [Accepted: 11/06/2007] [Indexed: 05/20/2023]
Abstract
BACKGROUND AND AIMS Plant evolution is well known to be frequently associated with remarkable changes in genome size and composition; however, knowledge of the long-term evolutionary dynamics of these processes remains very limited. Here a study is made of the fine dynamics of quantitative genome evolution in Festuca (fescue), the largest genus in Poaceae (grasses).
METHODS Using flow cytometry (PI, DAPI), measurements were made of DNA content (2C-value), monoploid genome size (Cx-value), average chromosome size (C/n-value) and cytosine + guanine (GC) content of 101 Festuca taxa and 14 of their close relatives. The results were compared with the existing phylogeny based on ITS and trnL-F sequences.
KEY RESULTS The divergence of the fescue lineage from related Poeae was preceded by about a 2-fold enlargement of monoploid genome and chromosome size, and by an apparent GC content enrichment. The subsequent reduction of these parameters, running in parallel in both main evolutionary lineages of fine-leaved and broad-leaved fescues, appears to diverge among the existing species groups. The most dramatic reductions are associated with the most recently and rapidly evolving groups which, in combination with recent intraspecific genome size variability, indicates that the reduction process is probably ongoing and evolutionarily young. These dynamics may be a consequence of GC-rich retrotransposon proliferation and removal. Polyploids derived from parents with a large genome size and high GC content (mostly allopolyploids) had smaller Cx- and C/n-values and deviated only slightly from parental GC content, whereas polyploids derived from parents with a small genome and low GC content (mostly autopolyploids) generally had a markedly increased GC content and slightly higher Cx- and C/n-values.
CONCLUSIONS The present study indicates the high potential of general quantitative characters of the genome for understanding the long-term processes of genome evolution, testing evolutionary hypotheses and their usefulness for large-scale genomic projects. Taken together, the results suggest that there is an evolutionary advantage for small genomes in Festuca.
Affiliation(s)
- Petr Smarda
- Masaryk University, Faculty of Science, Institute of Botany and Zoology, Kotlárská 2, CZ-611 37 Brno, Czech Republic.
|
15
|
Salim HMW, Ring KL, Cavalcanti ARO. Patterns of codon usage in two ciliates that reassign the genetic code: Tetrahymena thermophila and Paramecium tetraurelia. Protist 2008; 159:283-98. [PMID: 18207458 DOI: 10.1016/j.protis.2007.11.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2007] [Accepted: 11/17/2007] [Indexed: 10/22/2022]
Abstract
We used the recently sequenced genomes of the ciliates Tetrahymena thermophila and Paramecium tetraurelia to analyze codon usage patterns in both organisms, examining codon usage bias, Gln codon usage, GC content and the nucleotide contexts of initiation and termination codons. We also studied how these trends change along the length of the genes and in a subset of highly expressed genes. Our results corroborate some of the trends previously described in Tetrahymena, but also contradict some specific observations. In both genomes we found a strong bias toward codons with low GC content; however, in highly expressed genes this bias is smaller and codons ending in G or C tend to be more frequent. We also found that codon bias increases along gene segments and in highly expressed genes, and that the contexts surrounding initiation and termination codons are always AT rich. Our results also suggest differences in the efficiency of translation of the reassigned stop codons between the two species and between the reassigned codons. Finally, we discuss some of the possible causes of these differences in translational efficiency.
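Two of the quantities named in this abstract, third-position GC content and Gln codon usage, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' method: the function names are hypothetical, and the Gln codon set assumes the ciliate nuclear code used by Tetrahymena and Paramecium, in which TAA and TAG encode glutamine alongside CAA and CAG.

```python
from collections import Counter

# Assumption: ciliate nuclear code, where TAA and TAG are read as Gln.
GLN_CODONS = ("CAA", "CAG", "TAA", "TAG")


def codons(cds: str) -> list[str]:
    """Split a coding sequence into complete in-frame codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc3(cds: str) -> float:
    """Fraction of codons whose third position is G or C."""
    triplets = codons(cds)
    if not triplets:
        return 0.0
    return sum(c[2] in "GC" for c in triplets) / len(triplets)


def gln_codon_usage(cds: str) -> dict[str, float]:
    """Relative usage of each Gln codon among all Gln codons in the CDS."""
    counts = Counter(c for c in codons(cds) if c in GLN_CODONS)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in GLN_CODONS}
```

Comparing `gc3` between all genes and a highly expressed subset is one simple way to look for the shift toward G- or C-ending codons that the abstract reports.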
Affiliation(s)
- Hannah M W Salim
- Biology Department, Pomona College, 175 w 6th street, Claremont, CA 91711, USA
|
16
|
Sellis D, Provata A, Almirantis Y. Alu and LINE1 distributions in the human chromosomes: evidence of global genomic organization expressed in the form of power laws. Mol Biol Evol 2007; 24:2385-99. [PMID: 17728280 DOI: 10.1093/molbev/msm181] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open
Abstract
The spatial distribution and clustering of repetitive elements have been extensively studied in recent years, as has their colocalization with other genomic components. Here we investigate the large-scale features of Alu and LINE1 spatial arrangement in the human genome by studying the size distribution of interrepeat distances. In most cases, we have found power-law size distributions extending over several orders of magnitude. We have also studied the correlations of the extent of the power law (linear region in double-logarithmic scale) and of the corresponding exponent (slope) with other genomic properties. A model has been formulated to explain the formation of the observed power laws. According to the model, 2 kinds of events occur repeatedly in evolutionary time: random insertion of several types of intruding sequences and occasional loss of repeats belonging to the initial population due to "elimination" events. This simple mechanism is shown to reproduce the observed power-law size distributions and is compatible with our present knowledge of the dynamics of repeat proliferation in the genome.
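The measurement described here (interrepeat distances, then a slope on a double-logarithmic plot) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published procedure: repeats are given as hypothetical half-open `(start, end)` intervals on one chromosome, overlapping repeats are simply skipped, and the slope is an ordinary least-squares fit to the cumulative size distribution.

```python
import math


def inter_repeat_distances(repeats: list[tuple[int, int]]) -> list[int]:
    """Gaps between consecutive repeats, given (start, end) half-open intervals."""
    ordered = sorted(repeats)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if gap > 0:  # adjacent or overlapping repeats contribute no gap
            gaps.append(gap)
    return gaps


def cumulative_counts(gaps: list[int]) -> tuple[list[int], list[int]]:
    """For each distinct gap size d, the number of gaps of size >= d."""
    ordered = sorted(gaps)
    n = len(ordered)
    sizes, counts = [], []
    for i, d in enumerate(ordered):
        if i == 0 or d != ordered[i - 1]:
            sizes.append(d)
            counts.append(n - i)
    return sizes, counts


def loglog_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    return sxy / sxx
```

In practice the fit would be restricted to the linear region of the plot, since the abstract treats the extent of that region as a quantity of interest in its own right.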
Affiliation(s)
- Diamantis Sellis
- National Center for Scientific Research Demokritos, Institute of Biology, Athens, Greece
|
17
|
Abstract
The vertebrate genome is a mosaic of GC-poor and GC-rich isochores, megabase-sized DNA regions of fairly homogeneous base composition that differ in relative amount, gene density, gene expression, replication timing, and recombination frequency. At the emergence of warm-blooded vertebrates, the gene-rich, moderately GC-rich isochores of the cold-blooded ancestors underwent a GC increase. This increase was similar in mammals and birds and was maintained during the evolution of mammalian and avian orders. Neither the GC increase nor its conservation can be accounted for by the random fixation of neutral or nearly neutral single-nucleotide changes (i.e., the vast majority of nucleotide substitutions) or by a biased gene conversion process occurring at random genome locations. Both phenomena can be explained, however, by the neoselectionist theory of genome evolution that is presented here. This theory fully accepts Ohta's nearly neutral view of point mutations but proposes in addition (i) that the AT-biased mutational input present in vertebrates pushes some DNA regions below a certain GC threshold; (ii) that these lower GC levels cause regional changes in chromatin structure that lead to deleterious effects on replication and transcription; and (iii) that the carriers of these changes undergo negative (purifying) selection, the final result being a compositional conservation of the original isochore pattern in the surviving population. Negative selection may also largely explain the GC increase accompanying the emergence of warm-blooded vertebrates. In conclusion, the neoselectionist theory not only provides a solution to the neutralist/selectionist debate but also introduces an epigenomic component in genome evolution.
Affiliation(s)
- Giorgio Bernardi
- Molecular Evolution Laboratory, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy.
|
18
|
Kuhl JC, Havey MJ, Martin WJ, Cheung F, Yuan Q, Landherr L, Hu Y, Leebens-Mack J, Town CD, Sink KC. Comparative genomic analyses in Asparagus. Genome 2005; 48:1052-60. [PMID: 16391674 DOI: 10.1139/g05-073] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Garden asparagus (Asparagus officinalis L.) belongs to the monocot family Asparagaceae in the order Asparagales. Onion (Allium cepa L.) and Asparagus officinalis are 2 of the most economically important plants of the core Asparagales, a well-supported monophyletic group within the Asparagales. Coding regions in onion have lower GC contents than those of the grasses. We compared the GC content of 3374 unique expressed sequence tags (ESTs) from A. officinalis with that of Lycoris longituba and onion (both members of the core Asparagales), Acorus americanus (sister to all other monocots), the grasses, and Arabidopsis. Although ESTs in A. officinalis and Acorus had a higher average GC content than those of Arabidopsis, Lycoris, and onion, all were clearly lower than those of the grasses. The Asparagaceae have the smallest nuclear genomes among all plants in the core Asparagales, which typically have huge genomes. Within the Asparagaceae, European Asparagus species have approximately twice the nuclear DNA of southern African Asparagus species. We cloned and sequenced 20 genomic amplicons from European A. officinalis and the southern African species Asparagus plumosus and observed no clear evidence for a recent genome doubling in A. officinalis relative to A. plumosus. These results indicate that members of the genus Asparagus with smaller genomes may be useful genomic models for plants in the core Asparagales.
Affiliation(s)
- Joseph C Kuhl
- Department of Horticulture, Michigan State University, East Lansing 48824, USA
|
19
|
Yue GH, Lo LC, Zhu ZY, Lin G, Feng F. The complete nucleotide sequence of the mitochondrial genome of Tetraodon nigroviridis. DNA Seq 2006; 17:115-21. [PMID: 17076253 DOI: 10.1080/10425170600700378] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
The freshwater pufferfish Tetraodon nigroviridis is a model organism for studying the evolution of genomes and gene functions, but its mitochondrial genome (mtDNA) sequence is still not available. We determined the complete nucleotide sequence of its mtDNA using shotgun sequencing. The T. nigroviridis mtDNA was 16,462 bp long and contained 13 protein-coding genes, 22 tRNAs, 2 rRNAs and a major non-coding region. The gene order was identical to the common type of vertebrate mtDNA, whereas the G + C content in the sense strand was 46.9%, much higher than in most other fish species. One hundred and three SNPs were detected in the control region of the mtDNA of 35 individuals; the majority of these SNPs were located in the 5' end of the control region. A phylogenetic study including 21 fish species was performed on concatenated amino acid sequences of 12 protein-coding genes, and revealed that T. nigroviridis clustered with Fugu rubripes. The complete mtDNA sequence and the SNPs in its control region will be useful in studying fish evolution, in differentiating Tetraodon species and in analyzing genetic diversity within T. nigroviridis.
Affiliation(s)
- Gen Hua Yue
- Molecular Population Genetics Group, Temasek Life Sciences Lab, 1 Research Link, National University of Singapore, Singapore.
|
20
|
Pollard KS, Salama SR, King B, Kern AD, Dreszer T, Katzman S, Siepel A, Pedersen JS, Bejerano G, Baertsch R, Rosenbloom KR, Kent J, Haussler D. Forces shaping the fastest evolving regions in the human genome. PLoS Genet 2006; 2:e168. [PMID: 17040131 PMCID: PMC1599772 DOI: 10.1371/journal.pgen.0020168] [Citation(s) in RCA: 351] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2005] [Accepted: 08/23/2006] [Indexed: 01/19/2023] Open
Abstract
Comparative genomics allow us to search the human genome for segments that were extensively changed in the last approximately 5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.
Affiliation(s)
- Katherine S Pollard
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America.
|
21
|
Ko WY, Piao S, Akashi H. Strong regional heterogeneity in base composition evolution on the Drosophila X chromosome. Genetics 2006; 174:349-62. [PMID: 16547109 PMCID: PMC1569809 DOI: 10.1534/genetics.105.054346] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2005] [Accepted: 05/08/2006] [Indexed: 11/18/2022] Open
Abstract
Fluctuations in base composition appear to be prevalent in Drosophila and mammal genome evolution, but their timescale, genomic breadth, and causes remain obscure. Here, we study base composition evolution within the X chromosomes of Drosophila melanogaster and five of its close relatives. Substitutions were inferred on six extant and two ancestral lineages for 14 near-telomeric and 9 nontelomeric genes. GC content evolution is highly variable both within the genome and within the phylogenetic tree. In the lineages leading to D. yakuba and D. orena, GC content at silent sites has increased rapidly near telomeres, but has decreased in more proximal (nontelomeric) regions. D. orena shows a 17-fold excess of GC-increasing vs. AT-increasing synonymous changes within a small (approximately 130-kb) region close to the telomeric end. Base composition changes within introns are consistent with changes in mutation patterns, but stronger GC elevation at synonymous sites suggests contributions of natural selection or biased gene conversion. The Drosophila yakuba lineage shows a less extreme elevation of GC content distributed over a wider genetic region (approximately 1.2 Mb). A lack of change in GC content for most introns within this region suggests a role of natural selection in localized base composition fluctuations.
Affiliation(s)
- Wen-Ya Ko
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
22
|
Bultrini E, Pizzi E. A new parameter to study compositional properties of non-coding regions in eukaryotic genomes. Gene 2006; 385:75-82. [PMID: 16978802 DOI: 10.1016/j.gene.2006.05.030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2006] [Revised: 05/04/2006] [Accepted: 05/19/2006] [Indexed: 10/24/2022]
Abstract
Genomes are characterized by global and local compositional properties that are interesting in an evolutionary perspective but also provide useful information for the identification of some functional elements. Following previous studies, in this work we investigated compositional properties of non-coding sequences in four eukaryotic genomes (C. elegans, D. melanogaster, M. musculus, H. sapiens). We developed a procedure based on Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to identify pentamers that are over-represented in introns (intron vocabulary) and to define a new parameter (LD) that reflects oligonucleotide composition of a given sequence. We analyzed genomic sequences and we found that all non-coding parts of a genome are characterized by similar LD values. Furthermore, we used the new parameter to analyze potentially regulatory regions. We extracted non-redundant sets of promoter sequences for D. melanogaster and H. sapiens and we studied their compositional (G+C content and LD parameter) and conformational (bendability propensity) properties. We found that regions immediately surrounding transcription start sites are distinguishable because of their %G+C, LD and bendability values.
Affiliation(s)
- Emanuele Bultrini
- Dipartimento di Malattie Infettive, Parassitarie ed Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena, 299, 00161 Roma, Italy
|
23
|
Wang HC, Xia X, Hickey D. Thermal Adaptation of the Small Subunit Ribosomal RNA Gene: A Comparative Study. J Mol Evol 2006; 63:120-6. [PMID: 16786438 DOI: 10.1007/s00239-005-0255-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2005] [Accepted: 03/01/2006] [Indexed: 11/27/2022]
Abstract
We carried out a comprehensive survey of small subunit ribosomal RNA sequences from archaeal, bacterial, and eukaryotic lineages in order to understand the general patterns of thermal adaptation in the rRNA genes. Within each lineage, we compared sequences from mesophilic, moderately thermophilic, and hyperthermophilic species. We carried out a more detailed study of the archaea, because of the wide range of growth temperatures within this group. Our results confirmed that there is a clear correlation between the GC content of the paired stem regions of the 16S rRNA genes and the optimal growth temperature, and we show that this correlation cannot be explained simply by phylogenetic relatedness among the thermophilic archaeal species. In addition, we found a significant, positive relationship between rRNA stem length and growth temperature. These correlations are found in both bacterial and archaeal rRNA genes. Finally, we compared rRNA sequences from warm-blooded and cold-blooded vertebrates. We found that, while rRNA sequences from the warm-blooded vertebrates have a higher overall GC content than those from the cold-blooded vertebrates, this difference is not concentrated in the paired regions of the molecule, suggesting that thermal adaptation is not the cause of the nucleotide differences between the vertebrate lineages.
Affiliation(s)
- Huai-Chun Wang
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J5, Canada
|
24
|
Khelifi A, Meunier J, Duret L, Mouchiroud D. GC content evolution of the human and mouse genomes: insights from the study of processed pseudogenes in regions of different recombination rates. J Mol Evol 2006; 62:745-52. [PMID: 16752212 DOI: 10.1007/s00239-005-0186-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2005] [Accepted: 02/02/2006] [Indexed: 01/27/2023]
Abstract
Processed pseudogenes are generated by reverse transcription of the mRNA of a functional gene. They are generally nonfunctional after their insertion and, as a consequence, are no longer subjected to the selective constraints associated with functional genes. Because of this property they can be used as neutral markers in molecular evolution. In this work, we investigated the relationship between the evolution of GC content in recently inserted processed pseudogenes and the local recombination pattern in two mammalian genomes (human and mouse). We confirmed, using original markers, that recombination drives GC content in the human genome, and we demonstrated that this is also true for the mouse genome despite its lower recombination rates. Finally, we discuss the consequences for isochore evolution and the contrast between the human and mouse patterns.
Affiliation(s)
- Adel Khelifi
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard-Lyon 1, 16 rue Raphael Dubois, 69622 Villeurbanne Cedex, France.
|
25
|
van Rheede T, Bastiaans T, Boone DN, Hedges SB, de Jong WW, Madsen O. The Platypus Is in Its Place: Nuclear Genes and Indels Confirm the Sister Group Relation of Monotremes and Therians. Mol Biol Evol 2005; 23:587-97. [PMID: 16291999 DOI: 10.1093/molbev/msj064] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Morphological data supports monotremes as the sister group of Theria (extant marsupials + eutherians), but phylogenetic analyses of 12 mitochondrial protein-coding genes have strongly supported the grouping of monotremes with marsupials: the Marsupionta hypothesis. Various nuclear genes tend to support Theria, but a comprehensive study of long concatenated sequences and broad taxon sampling is lacking. We therefore determined sequences from six nuclear genes and obtained additional sequences from the databases to create two large and independent nuclear data sets. One (data set I) emphasized taxon sampling and comprised five genes, with a concatenated length of 2,793 bp, from 21 species (two monotremes, six marsupials, nine placentals, and four outgroups). The other (data set II) emphasized gene sampling and comprised eight genes and three proteins, with a concatenated length of 10,773 bp or 3,669 amino acids, from five taxa (a monotreme, a marsupial, a rodent, human, and chicken). Both data sets were analyzed by parsimony, minimum evolution, maximum likelihood, and Bayesian methods using various models and data partitions. Data set I gave bootstrap support values for Theria between 55% and 100%, while support for Marsupionta was at most 12.3%. Taking base compositional bias into account generally increased the support for Theria. Data set II exclusively supported Theria, with the highest possible values and significantly rejected Marsupionta. Independent phylogenetic evidence in support of Theria was obtained from two single amino acid deletions and one insertion, while no supporting insertions and deletions were found for Marsupionta. On the basis of our data sets, the time of divergence between Monotremata and Theria was estimated at 231-217 MYA and between Marsupialia and Eutheria at 193-186 MYA. The morphological evidence for a basal position of Monotremata, well separated from Theria, is thus fully supported by the available molecular data from nuclear genes.
Affiliation(s)
- Teun van Rheede
- Department of Biochemistry, Radboud University Nijmegen, Nijmegen, The Netherlands
|
26
|
Vinogradov AE. Dualism of gene GC content and CpG pattern in regard to expression in the human genome: magnitude versus breadth. Trends Genet 2005; 21:639-43. [PMID: 16202472 DOI: 10.1016/j.tig.2005.09.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2005] [Revised: 08/10/2005] [Accepted: 09/09/2005] [Indexed: 11/26/2022]
Abstract
In this article, I show that, in the human genome, the GC content in genes (but not the CpG island in the promoter) is related to the maximum level of gene expression among tissues, whereas the promoter CpG island and gene CpG level are more strongly related to the breadth of expression among tissues. The relevance of gene GC content to expression cannot be a consequence (i.e. a byproduct) of transcription because it does not correlate with expression in the germline. The variation of GC content and CpG level can determine the characteristics of gene expression in a synergistic interplay with transcription-factor-binding sites (mediated by chromatin condensation).
|
27
|
Pucciarelli S, Marziale F, Di Giuseppe G, Barchetta S, Miceli C. Ribosomal cold-adaptation: characterization of the genes encoding the acidic ribosomal P0 and P2 proteins from the Antarctic ciliate Euplotes focardii. Gene 2005; 360:103-10. [PMID: 16143466 DOI: 10.1016/j.gene.2005.06.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2004] [Revised: 04/14/2005] [Accepted: 06/02/2005] [Indexed: 10/25/2022]
Abstract
Molecular adaptation to low temperature requires specific features, represented mainly by modifications in the gene sequence and consequently in the protein primary structure. To characterize the molecular mechanisms responsible for ribosome cold-adaptation, we compared the ribosomal P0 and P2 genes from the Antarctic ciliate Euplotes focardii with homologous genes from mesophilic organisms, including the ciliate Tetrahymena thermophila and Euplotes species that are not cold-adapted. This analysis revealed the presence of nonsynonymous mutations unique to E. focardii. In the P0 protein, the mutations produced amino acid substitutions that increased the molecular flexibility, which may facilitate a conformational adjustment associated with the interaction with the GTPase center of the large subunit rRNA, and increased the hydrophobicity of the region involved in the interaction with the P1/P2 heterodimer, probably to keep the ribosomal stalk associated in the cold. In the P2 protein, the mutations produced amino acid substitutions that increased the flexibility of the N-terminus, which may facilitate interactions with the P1 protein in the formation of the heterodimer, and reduced the mobility of the C-terminus, to stabilize the stalk during ribosomal activity. Finally, P proteins appeared to be valid markers for investigating the phylogenetic origin of early eukaryotes.
Affiliation(s)
- Sandra Pucciarelli
- Dipartimento di Biologia Molecolare, Cellulare e Animale, University of Camerino, Via F Camerini 2, 62032 Camerino (MC), Italy
|
28
|
Bolzer A, Kreth G, Solovei I, Koehler D, Saracoglu K, Fauth C, Müller S, Eils R, Cremer C, Speicher MR, Cremer T. Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS Biol 2005; 3:e157. [PMID: 15839726 PMCID: PMC1084335 DOI: 10.1371/journal.pbio.0030157] [Citation(s) in RCA: 590] [Impact Index Per Article: 29.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2004] [Accepted: 03/02/2005] [Indexed: 12/19/2022] Open
Abstract
Studies of higher-order chromatin arrangements are an essential part of ongoing attempts to explore changes in epigenome structure and their functional implications during development and cell differentiation. However, the extent and cell-type specificity of three-dimensional (3D) chromosome arrangements have remained controversial. In order to overcome technical limitations of previous studies, we have developed tools that allow the quantitative 3D positional mapping of all chromosomes simultaneously. We present unequivocal evidence for a probabilistic 3D order of prometaphase chromosomes, as well as of chromosome territories (CTs) in nuclei of quiescent (G0) and cycling (early S-phase) human diploid fibroblasts (46, XY). Radial distance measurements showed a probabilistic, highly nonrandom correlation with chromosome size: small chromosomes, independently of their gene density, were distributed significantly closer to the center of the nucleus or prometaphase rosette, while large chromosomes were located closer to the nuclear or rosette rim. This arrangement was independently confirmed in both human fibroblast and amniotic fluid cell nuclei. Notably, these cell types exhibit flat-ellipsoidal cell nuclei, in contrast to the spherical nuclei of lymphocytes and several other human cell types, for which we and others previously demonstrated gene-density-correlated radial 3D CT arrangements. Modeling of 3D CT arrangements suggests that cell-type-specific differences in radial CT arrangements are not solely due to geometrical constraints that result from nuclear shape differences. We also found gene-density-correlated arrangements of higher-order chromatin shared by all human cell types studied so far. Gene-poor chromatin domains form a layer beneath the nuclear envelope, while gene-dense chromatin is enriched in the nuclear interior. We discuss the possible functional implications of this finding.
Affiliation(s)
- Andreas Bolzer
- 1 Department of Biology II, Anthropology and Human Genetics, Ludwig Maximilians University, Munich, Germany
- Gregor Kreth
- 2 Kirchhoff Institute of Physics, University of Heidelberg, Heidelberg, Germany
- Irina Solovei
- 1 Department of Biology II, Anthropology and Human Genetics, Ludwig Maximilians University, Munich, Germany
- Daniela Koehler
- 1 Department of Biology II, Anthropology and Human Genetics, Ludwig Maximilians University, Munich, Germany
- Kaan Saracoglu
- 3 Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christine Fauth
- 4 Institute of Human Genetics, Technical University Munich, Germany
- 5 Institute of Human Genetics, GSF National Research Center for Environment and Health, Neuherberg, Germany
- Stefan Müller
- 1 Department of Biology II, Anthropology and Human Genetics, Ludwig Maximilians University, Munich, Germany
- Roland Eils
- 3 Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christoph Cremer
- 2 Kirchhoff Institute of Physics, University of Heidelberg, Heidelberg, Germany
- Michael R Speicher
- 4 Institute of Human Genetics, Technical University Munich, Germany
- 5 Institute of Human Genetics, GSF National Research Center for Environment and Health, Neuherberg, Germany
- Thomas Cremer
- 1 Department of Biology II, Anthropology and Human Genetics, Ludwig Maximilians University, Munich, Germany
|
29
|
Adams DJ, Dermitzakis ET, Cox T, Smith J, Davies R, Banerjee R, Bonfield J, Mullikin JC, Chung YJ, Rogers J, Bradley A. Complex haplotypes, copy number polymorphisms and coding variation in two recently divergent mouse strains. Nat Genet 2005; 37:532-6. [PMID: 15852006 DOI: 10.1038/ng1551] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.7] [Received: 10/19/2004] [Accepted: 03/18/2005] [Indexed: 11/09/2022]
Abstract
Inbred mouse strains provide the foundation for mouse genetics. By selecting for phenotypic features of interest, inbreeding drives genomic evolution and eliminates individual variation, while fixing certain sets of alleles that are responsible for the trait characteristics of the strain. Mouse strains 129Sv (129S5) and C57BL/6J, two of the most widely used inbred lines, diverged from common ancestors within the last century, yet very little is known about the genomic differences between them. By comparative genomic hybridization and sequence analysis of 129S5 short insert libraries, we identified substantial structural variation, a complex fine-scale haplotype pattern with a continuous distribution of diversity blocks, and extensive nucleotide variation, including nonsynonymous coding SNPs and stop codons. Collectively, these genomic changes denote the level and direction of allele fixation that has occurred during inbreeding and provide a basis for defining what makes these mouse strains unique.
Affiliation(s)
- David J Adams
- The Wellcome Trust Sanger Institute, Hinxton, Cambs, CB10 1SA, UK
|
30
|
Levin AM, Ghosh D, Cho KR, Kardia SLR. A model-based scan statistic for identifying extreme chromosomal regions of gene expression in human tumors. Bioinformatics 2005; 21:2867-74. [PMID: 15814559 DOI: 10.1093/bioinformatics/bti417] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The analysis of gene expression data in its chromosomal context has been a recent development in cancer research. However, currently available methods fail to account for variation in the distance between genes, gene density and genomic features (e.g. GC content) in identifying increased or decreased chromosomal regions of gene expression. RESULTS: We have developed a model-based scan statistic that accounts for these aspects of the complex landscape of the human genome in the identification of extreme chromosomal regions of gene expression. This method may be applied to gene expression data regardless of the microarray platform used to generate it. To demonstrate the accuracy and utility of this method, we applied it to a breast cancer gene expression dataset and tested its ability to predict regions containing medium-to-high level DNA amplification (DNA ratio values >2). A classifier was developed from the scan statistic results that had a 10-fold cross-validated classification rate of 93% and a positive predictive value of 88%. This result strongly suggests that the model-based scan statistic and the expression characteristics of an increased chromosomal region of gene expression can be used to accurately predict chromosomal regions containing amplified genes. AVAILABILITY: Functions in the R-language are available from the author upon request. CONTACT: fcouples@umich.edu.
Affiliation(s)
- Albert M Levin
- Department of Epidemiology, University of Michigan, Ann Arbor, MI 48104-3028, USA.
|
31
|
Yamashita R, Suzuki Y, Sugano S, Nakai K. Genome-wide analysis reveals strong correlation between CpG islands with nearby transcription start sites of genes and their tissue specificity. Gene 2005; 350:129-36. [PMID: 15784181 DOI: 10.1016/j.gene.2005.01.012] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.5] [Received: 11/11/2004] [Revised: 12/28/2004] [Accepted: 01/24/2005] [Indexed: 10/25/2022]
Abstract
It has been envisaged that CpG islands are often observed near the transcriptional start sites (TSS) of housekeeping genes. However, neither the precise positions of CpG islands relative to TSS of genes nor the correlation between the presence of the CpG islands and the expression specificity of these genes is well-understood. Using thousands of sequences with known TSS in human and mouse, we found that there is a clear peak in the distribution of CpG islands around TSS in the genes of these two species. Thus, we classified human (mouse) genes into 6600 (2948) CpG+ genes and 2619 (1830) CpG- ones, based on the presence of a CpG island within the -100: +100 region. We estimated the degree of each gene being a housekeeper by the number of cDNA libraries where its ESTs were detected. Then, the tendency that a gene lacking CpG islands around its TSS is expressed with a higher degree of tissue specificity turned out to be evolutionarily conserved. We also confirmed this tendency by analyzing the gene ontology annotation of classified genes. Since no such clear correlation was found in the control data (mRNAs, pre-mRNAs, and chromosome banding pattern), we concluded that the effect of a CpG island near the TSS should be more important than the global GC content of the region where the gene resides.
Affiliation(s)
- Riu Yamashita
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1, Shirokane-dai Minato-ku, Tokyo 108-8639, Japan
|
32
|
Castresana J, Guigó R, Albà MM. Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. J Mol Evol 2005; 59:72-9. [PMID: 15383909 DOI: 10.1007/s00239-004-2605-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/14/2003] [Accepted: 02/03/2004] [Indexed: 10/26/2022]
Abstract
Comparison of the human and mouse genomes has revealed that significant variations in evolutionary rates exist among genomic regions and that a large part of this variation is interchromosomal. We confirm in this work, using a large collection of introns, that human chromosome 19 is the one that shows the highest divergence with respect to mouse. To search for other differences among chromosomes, we examine the distribution of gene functions in human and mouse chromosomes using the Gene Ontology definitions. We found by correspondence analysis that among the strongest clusterings of gene functions in human chromosomes is a group of genes coding for DNA binding proteins in chromosome 19. Interestingly, chromosome 19 also has a very high GC content, a feature that has been proposed to promote an opening of the chromatin, thereby facilitating binding of proteins to the DNA helix. In the mouse genome, however, a similar aggregation of genes coding for DNA binding proteins and high GC content cannot be found. This suggests that the distribution of genes coding for DNA binding proteins and the variations of the chromatin accessibility to these proteins are different in the human and mouse genomes. It is likely that the overall high synonymous and intron rates in chromosome 19 are a by-product of the high GC content of this chromosome.
Affiliation(s)
- Jose Castresana
- Centre de Regulació Genòmica (CRG), Programme of Bioinformatics and Genomics, Passeig Marítim 37-49, 08003, Barcelona, Spain.
|
33
|
Vinogradov AE. Noncoding DNA, isochores and gene expression: nucleosome formation potential. Nucleic Acids Res 2005; 33:559-63. [PMID: 15673716 PMCID: PMC548339 DOI: 10.1093/nar/gki184] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 10/05/2004] [Revised: 12/21/2004] [Accepted: 12/21/2004] [Indexed: 12/04/2022] Open
Abstract
The nucleosome formation potential of introns, intergenic spacers and exons of human genes is shown here to negatively correlate with among-tissues breadth of gene expression. The nucleosome formation potential is also found to negatively correlate with the GC content of genomic sequences; the slope of the regression line is steeper in exons compared with noncoding DNA (introns and intergenic spacers). The correlation with GC content is independent of sequence length; in turn, the nucleosome formation potential of introns and intergenic spacers positively (albeit weakly) correlates with sequence length independently of GC content. These findings help explain the functional significance of the isochores (regions differing in GC content) in the human genome as a result of optimization of genomic structure for epigenetic complexity and support the notion that noncoding DNA is important for orderly chromatin condensation and chromatin-mediated suppression of tissue-specific genes.
|
34
|
Paces J, Zíka R, Paces V, Pavlícek A, Clay O, Bernardi G. Representing GC variation along eukaryotic chromosomes. Gene 2004; 333:135-41. [PMID: 15177688 DOI: 10.1016/j.gene.2004.02.041] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Received: 08/28/2003] [Accepted: 02/10/2004] [Indexed: 02/03/2023]
Abstract
Genome sequencing now permits direct visual representation, at any scale, of GC heterogeneity along the chromosomes of several higher eukaryotes. Plots can be easily obtained from the chromosomal sequences, yet sequence releases of mammalian or plant chromosomes still tend to use small scales or window sizes that obscure important large-scale compositional features. To faithfully reveal, at one glance, the compositional variation at a given scale, we have devised a simple scheme that combines line plots with color-coded shading of the regions underneath the plots. The scheme can be applied to different eukaryotic genomes to facilitate their comparison, as illustrated here for a sample of chromosomes chosen from seven selected species. As a complement to a previously published compact view of isochores in the human genome sequence, we include here an analogous map for the recently sequenced mouse genome, and discuss the contribution of repetitive DNA to the GC variation along the plots. Supplementary information, including a database of color-coded GC profiles for all recently sequenced eukaryotes and the program draw_chromosomes_gc.pl used to obtain them, are available at.
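As a rough illustration of why the choice of window size matters for such plots, the following minimal Python sketch computes a windowed GC profile of a sequence. It is not the authors' draw_chromosomes_gc.pl program; the function name `gc_profile` and its parameters are hypothetical, and ambiguous bases (e.g. N) are simply ignored here as an assumption.

```python
def gc_profile(seq, window, step=None):
    """Return a list of (start, gc_fraction) pairs, one per window.

    Hypothetical helper for illustration only. Bases other than
    A, C, G, T (e.g. N) are ignored; windows without any A/C/G/T
    are skipped. Windows do not overlap unless `step` < `window`.
    """
    step = step or window
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at == 0:
            continue  # window consists only of ambiguous bases
        profile.append((start, gc / (gc + at)))
    return profile


# Small windows resolve local contrasts; one large window averages them away.
fine = gc_profile("GGCCAATT", window=4)
coarse = gc_profile("GGCCAATT", window=8)
```

On a real chromosome the same effect appears at much larger scales: the appropriate window depends on the size of the compositional features one wants to see.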
Affiliation(s)
- Jan Paces
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Flemingovo 2, Prague CZ-16637, Czech Republic
|
35
|
Abstract
The existence of a well conserved linear relationship between GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well curated coding sequences now cover at least 15%-30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well documented linear crest, but also the presence of several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, that is, mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
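For readers unfamiliar with the two variables, GC2 and GC3 can be computed per coding sequence roughly as follows. This is a minimal sketch under stated assumptions: the function name `gc_by_codon_position` is hypothetical, the input is assumed to be an in-frame coding sequence, and trailing bases that do not complete a codon are dropped.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the GC fraction at each codon position.

    Hypothetical helper for illustration only. Assumes `cds` starts
    in frame; an incomplete trailing codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


gc1, gc2, gc3 = gc_by_codon_position("ATGGCCTAA")
```

Plotting GC2 against GC3 for every gene in a set gives the kind of joint distribution the abstract refers to as a landscape.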
Affiliation(s)
- Stéphane Cruveiller
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, 80121 Napoli, Italy
|
36
|
Rooney AP. Selection for highly biased amino acid frequency in the TolA cell envelope protein of Proteobacteria. J Mol Evol 2004; 57:731-6. [PMID: 14745542 DOI: 10.1007/s00239-003-2530-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 03/19/2002] [Accepted: 07/21/2003] [Indexed: 11/30/2022]
Abstract
The bacterial cell envelope protein TolA functions to maintain the integrity of the cell membrane. This protein contains high levels of alanine and lysine that are used in the formation of alpha helices, which are required for normal protein function. The neutral model of molecular evolution predicts that amino acid composition and nucleotide composition are driven by the underlying GC content, as a result of mutation bias. However, this study shows that selection has acted to maintain high levels of alanine and lysine in the TolA protein of Proteobacteria, which in turn has biased nucleotide composition in the corresponding tolA gene.
Affiliation(s)
- Alejandro P Rooney
- Microbial Genomics and Bioprocessing Research Unit, National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, 1815 North University Street, Peoria, IL 61604, USA.
|
37
|
Marais G, Charlesworth B, Wright SI. Recombination and base composition: the case of the highly self-fertilizing plant Arabidopsis thaliana. Genome Biol 2004; 5:R45. [PMID: 15239830 PMCID: PMC463295 DOI: 10.1186/gb-2004-5-7-r45] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Received: 03/26/2004] [Revised: 04/26/2004] [Accepted: 04/30/2004] [Indexed: 11/24/2022] Open
Abstract
The effects of recombination and self-fertilization on base composition were investigated both theoretically and experimentally in the Arabidopsis genome. Levels of inbreeding modulate the effect of recombination on base composition. Background: Rates of recombination can vary among genomic regions in eukaryotes, and this is believed to have major effects on their genome organization in terms of base composition, DNA repeat density, intron size, evolutionary rates and gene order. In highly self-fertilizing species such as Arabidopsis thaliana, however, heterozygosity is expected to be strongly reduced and recombination will be much less effective, so that its influence on genome organization should be greatly reduced. Results: Here we investigated theoretically the joint effects of recombination and self-fertilization on base composition, and tested the predictions with genomic data from the complete A. thaliana genome. We show that, in this species, both codon-usage bias and GC content do not correlate with the local rates of crossing over, in agreement with our theoretical results. Conclusions: We conclude that levels of inbreeding modulate the effect of recombination on base composition, and possibly other genomic features (for example, transposable element dynamics). We argue that inbreeding should be considered when interpreting patterns of molecular evolution.
Affiliation(s)
- G Marais
- Institute of Cell, Animal and Population Biology, University of Edinburgh, EH9 3JT Edinburgh, UK
- B Charlesworth
- Institute of Cell, Animal and Population Biology, University of Edinburgh, EH9 3JT Edinburgh, UK
- S I Wright
- Institute of Cell, Animal and Population Biology, University of Edinburgh, EH9 3JT Edinburgh, UK
- Current address: Department of Biology, York University, 4700 Keele St, Toronto, Ontario M3J 1P3, Canada
|
38
|
Friedman KA, Heller A. Guanosine Distribution and Oxidation Resistance in Eight Eukaryotic Genomes. J Am Chem Soc 2004; 126:2368-71. [PMID: 14982441 DOI: 10.1021/ja038217r] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Indexed: 01/15/2023]
Abstract
Reactive oxygen species that attack DNA are continuously generated in living cells. Both the guanosine (G) mole fraction and its distribution should affect the stability of genomes and their parts to oxidation. At a lesser G content, genomes should be more oxidation resistant or "ennobled". Oxidant scavenging by G's in nonessential parts of introns and intergenic domains should decrease G oxidation in the essential exons. To determine whether genomes are indeed ennobled and whether oxidant-scavenging domains exist in genomes, the relative rates of guanosine oxidation in average exons, introns, and intergenic domains were estimated. Comparison among genomes indicated that average exons are ennobled in the genomes of Caenorhabditis (worm), Arabidopsis (plant), Saccharomyces (yeast), Schizosaccharomyces (yeast), and Plasmodium (malaria parasite), and that average introns and intergenic domains are ennobled in these genomes and in the genome of Drosophila (fly). The exon oxidation rates estimated for these genomes were less than the rate for the hypothetical "standard" genome, with a 0.25 mole fraction of uniformly distributed G. For Plasmodium the rate was half of that estimated for the standard genome. Average exons were not ennobled in the human or fly genomes; their G distributions were comparable to that in the standard genome. Instead, their exons were situated between introns and intergenic domains that could protect them by oxidant scavenging, the G's of their introns and intergenic domains outnumbering those of their exons 50-fold in humans and 4-fold in flies. The G distribution in the Encephalitozoon (parasite) genome was not protective relative to that of the standard genome.
Affiliation(s)
- Keith A Friedman
- Department of Chemical Engineering and the Texas Materials Institute, University of Texas at Austin, Austin, Texas 78712-0231, USA.
|
39
|
Kuhl JC, Cheung F, Yuan Q, Martin W, Zewdie Y, McCallum J, Catanach A, Rutherford P, Sink KC, Jenderek M, Prince JP, Town CD, Havey MJ. A unique set of 11,008 onion expressed sequence tags reveals expressed sequence and genomic differences between the monocot orders Asparagales and Poales. Plant Cell 2004; 16:114-25. [PMID: 14671025 PMCID: PMC301399 DOI: 10.1105/tpc.017202] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.0] [Received: 09/10/2003] [Accepted: 11/05/2003] [Indexed: 05/18/2023]
Abstract
Enormous genomic resources have been developed for plants in the monocot order Poales; however, it is not clear how representative the Poales are for the monocots as a whole. The Asparagales are a monophyletic order sister to the lineage carrying the Poales and possess economically important plants such as asparagus, garlic, and onion. To assess the genomic differences between the Asparagales and Poales, we generated 11,008 unique ESTs from a normalized cDNA library of onion. Sequence analyses of these ESTs revealed microsatellite markers, single nucleotide polymorphisms, and homologs of transposable elements. Mean nucleotide similarity between rice and the Asparagales was 78% across coding regions. Expressed sequence and genomic comparisons revealed strong differences between the Asparagales and Poales for codon usage and mean GC content, GC distribution, and relative GC content at each codon position, indicating that genomic characteristics are not uniform across the monocots. The Asparagales were more similar to eudicots than to the Poales for these genomic characteristics.
Affiliation(s)
- Joseph C Kuhl
- Department of Horticulture, Michigan State University, East Lansing, Michigan 48824, USA
|
40
|
Abstract
Since many gene duplications in the human genome are ancient duplications going back to the origin of vertebrates, the question may be asked about the fate of such duplicated genes at the compositional genome transitions that occurred between cold- and warm-blooded vertebrates. Indeed, at that transition, about half of the (GC-poor) genes of cold-blooded vertebrates (the genes of the gene-dense "ancestral genome core") underwent a GC enrichment to become the genes of the "genome core" of warm-blooded vertebrates. Since the compositional distribution of the human duplicated genes investigated (1111 pairs) mimics the general distribution of human genes (about 50% GC(3)-poor and 50% GC(3)-rich genes, the border being at 60% GC(3)), we considered two possibilities, namely that the compositional transition affected either (i) about half of the copies on a random basis, or (ii) preferentially only one copy of the duplicated genes. The two possibilities could be distinguished if each copy is put into one of two subsets according to its GC(3) level. Indeed, in the first case, the two distributions would be similar, whereas in the second case, the two distributions would be different, one copy having maintained the ancestral GC-poor composition, and one copy having undergone the compositional change. Using this approach, we could show that, by and large, one copy of the duplicated genes preferentially underwent the GC enrichment. This result implies that this copy, which had possibly acquired a different function and/or regulation, was preferentially translocated into the gene-dense compartment of the genome, the "ancestral genome core", namely the "gene space" which underwent the compositional transition at the emergence of warm-blooded vertebrates.
Affiliation(s)
- Kamel Jabbari
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, F-75005 Paris, France
|
41
|
Clay O, Arhondakis S, D'Onofrio G, Bernardi G. LDH-A and α-actin as tools to assess the effects of temperature on the vertebrate genome: some problems. Gene 2003; 317:157-60. [PMID: 14604804 DOI: 10.1016/s0378-1119(03)00698-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
In a recent paper written with the purpose of shedding light on the question of whether genomic GC levels are related to temperature in vertebrates, Ream et al. [Mol. Biol. Evol. 20 (2003) 105] offered an analysis of two sets of homologous genes: those coding for alpha-actin and lactate dehydrogenase-A (LDH-A). The conclusion was that "there is no consistent relationship between adaptation temperature and the percentage of thermal stability-enhancing G+C base pairs in protein-coding genes". We argue here that the data presented neither prove nor suggest such a conclusion because of conceptual and methodological errors.
Affiliation(s)
- Oliver Clay
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
|
42
|
Vinogradov AE. Isochores and tissue-specificity. Nucleic Acids Res 2003; 31:5212-20. [PMID: 12930973 PMCID: PMC212799 DOI: 10.1093/nar/gkg699] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.4] [Received: 03/21/2003] [Revised: 05/11/2003] [Accepted: 07/03/2003] [Indexed: 11/13/2022] Open
Abstract
The housekeeping (ubiquitously expressed) genes in the mammal genome were shown here to be on average slightly GC-richer than tissue-specific genes. Both housekeeping and tissue-specific genes occupy similar ranges of GC content, but the former tend to concentrate in the upper part of the range. In the human genome, tissue-specific genes show two maxima, GC-poor and GC-rich. The strictly tissue-specific human genes tend to concentrate in the GC-poor region; their distribution is left-skewed and thus reciprocal to the distribution of housekeeping genes. The intermediately tissue-specific genes show an intermediate GC content and the right-skewed distribution. Both in the human and mouse, genes specific for some tissues (e.g., parts of the central nervous system) have a higher average GC content than housekeeping genes. Since they are not transcribed in the germ line (in contrast to housekeeping genes), and therefore have a lower probability of inheritable gene conversion, this finding contradicts the biased gene conversion (BGC) explanation for elevated GC content in the heavy isochores of mammal genome. Genes specific for germ-line tissues (ovary, testes) show a low average GC content, which is also in contradiction to the BGC explanation. Both for the total data set and for the most part of tissues taken separately, a weak positive correlation was found between gene GC content and expression level. The fraction of ubiquitously expressed genes is nearly 1.5-fold higher in the mouse than in the human. This suggests that mouse tissues are comparatively less differentiated (on the molecular level), which can be related to a less pronounced isochoric structure of the mouse genome. 
In each separate tissue (in both species), tissue-specific genes do not form a clear-cut frequency peak (in contrast to housekeeping genes), but constitute a continuum with a gradually increasing degree of tissue-specificity, which probably reflects the path of cell differentiation and/or an independent use of the same protein in several unrelated tissues.
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St Petersburg 194064, Russia.
|
43
|
Yuhki N, Beck T, Stephens RM, Nishigaki Y, Newmann K, O'Brien SJ. Comparative genome organization of human, murine, and feline MHC class II region. Genome Res 2003; 13:1169-79. [PMID: 12743023 PMCID: PMC403645 DOI: 10.1101/gr.976103] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022]
Abstract
To study comparative molecular dynamics in the genesis of the major histocompatibility complex (MHC), we determined a complete nucleotide sequence spanning 758,291 bp of the domestic cat (Felis catus) extended and classical class II region. The feline class II MHC includes 44 genes (31 predicted to be expressed) which display DNA sequence homology and ordered gene synteny with human HLA and mouse H2, in extended class II and centromere proximal regions (DM to DO) of the classical class II region. However, remarkable genomic alterations including gene gain and loss plus size differentials of 250 kb are evident in comparisons of the cat class II with those of human and mouse. The cat MHC lacks the entire DQ region and retains only relict pseudogene homologs of DP genes, compensated by expansion and reorganization of seven modern DR genes. Repetitive gene families within the feline MHC comprise 35% of the feline MHC with very different density and abundance of GC levels, SINES, LINES, STRs, and retro-elements from the same repeats in human and mouse MHC. Comparison of the feline MHC with the murine and human MHC offers a detailed view of the consequences of genome organization in three mammalian lineages.
Affiliation(s)
- Naoya Yuhki
- Laboratory of Genomic Diversity, National Cancer Institute-Frederick, Frederick, Maryland 21702, USA.
|
44
|
Vinogradov AE. DNA helix: the importance of being GC-rich. Nucleic Acids Res 2003; 31:1838-44. [PMID: 12654999 PMCID: PMC152811 DOI: 10.1093/nar/gkg296] [Citation(s) in RCA: 179] [Impact Index Per Article: 8.1] [Received: 01/21/2003] [Revised: 02/12/2003] [Accepted: 02/12/2003] [Indexed: 11/12/2022] Open
Abstract
A new explanation for the emergence of heavy (GC-rich) isochores is proposed, based on the study of thermostability, bendability, ability to B-Z transition and curvature of the DNA helix. The absolute values of thermostability, bendability and ability to B-Z transition correlated positively with GC content, whereas curvature correlated negatively. The relative values of these parameters were determined as compared to randomized sequences. In genes and intergenic spacers of warm-blooded animals, both the relative bendability and ability to B-Z transition increased with elevation of GC content, whereas the relative thermostability and curvature decreased. The usage of synonymous codons in GC-rich genes was also found to augment bendability and ability to B-Z transition and to reduce thermostability of DNA (as compared to synonymous codons with the same GC content). The analysis of transposable elements (Alu and B2 repeats in the human and mouse) showed that the level of their divergence from the consensus sequence positively correlated with relative bendability and ability to B-Z transition and negatively with relative thermostability. The bendability and ability to B-Z transition are known to relate to open chromatin and active transcription, whereas curvature facilitates chromatin condensation. Because heavy isochores are known to be gene-rich and show a high level of transcription, it is suggested here that isochores arose not as an adaptation to elevated temperature but because of a certain grade of general organization and correspondingly advanced level of genomic organization, reflected in genome structuring, with physical properties of DNA in the gene-rich regions being optimized for active transcription and in the gene-poor regions for chromatin condensation ('transcription/grade' concept).
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Avenue 4, St Petersburg 194064, Russia.
|
45
|
Som A, Sahoo S, Mukhopadhyay I, Chakrabarti J, Chaudhury R. Scaling violations in coding DNA. Europhysics Letters (EPL) 2003; 62:271-277. [DOI: 10.1209/epl/i2003-00341-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 07/19/2023]
|
46
|
Ream RA, Johns GC, Somero GN. Base compositions of genes encoding alpha-actin and lactate dehydrogenase-A from differently adapted vertebrates show no temperature-adaptive variation in G + C content. Mol Biol Evol 2003; 20:105-10. [PMID: 12519912 DOI: 10.1093/molbev/msg008] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
There is a long-standing debate in molecular evolution concerning the putative importance of GC content in adapting the thermal stabilities of DNA and RNA. Most studies of this relationship have examined broad-scale compositional patterns, for example, total GC percentages in genomes and occurrence of GC-rich isochores. Few studies have systematically examined the GC contents of individual orthologous genes from differently thermally adapted species. When this has been done, the emphasis has been on comparing large numbers of genes in only a few species. We have approached the GC-adaptation temperature hypothesis in a different manner by examining patterns of base composition of genes encoding lactate dehydrogenase-A (ldh-a) and alpha-actin (alpha-actin) from 51 species of vertebrates whose adaptation temperatures ranged from -1.86 degrees C (Antarctic fishes) to approximately 45 degrees C (desert reptile). No significant positive correlation was found between any index of GC content (GC content of the entire sequence, GC content of the third codon position [GC(3)], and GC content at fourfold degenerate sites [GC(4)]) and any index of adaptation temperature (maximal, mean, or minimal body temperature). For alpha-actin, slopes of regression lines for all comparisons did not differ significantly from zero. For ldh-a, negative correlations between adaptation temperature and total GC content, GC(3), and GC(4) were observed but were shown to be due entirely to phylogenetic influences (as revealed by independent contrast analyses). This comparison of GC content across a wide range of ectothermic ("cold-blooded") and endothermic ("warm-blooded") vertebrates revealed that frogs of the genus Xenopus, which have commonly been used as a representative cold-blooded species, in fact are outliers among ectotherms for the alpha-actin analyses, raising concern about the appropriateness of choosing these amphibians as representative of ectothermic vertebrates in general. 
Our study indicates that, whereas GC contents of isochores may show variation among different classes of vertebrates, there is no consistent relationship between adaptation temperature and the percentage of thermal stability-enhancing G + C base pairs in protein-coding genes.
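The three GC indices named in this abstract (total GC, GC(3), and GC(4)) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the input is assumed to be an in-frame coding sequence, and the fourfold degenerate sites are assumed to follow the standard genetic code (eight codon families whose third position is fully degenerate).

```python
# Codon prefixes (first two bases) whose third position is fourfold
# degenerate under the standard genetic code (assumption of this sketch):
# Leu4, Val, Ser4, Pro, Thr, Ala, Arg4, Gly.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases; NaN if there are none."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return float("nan")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return gc_fraction(cds[2:usable:3])


def gc4(cds):
    """GC fraction at third positions of fourfold degenerate codons only."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = [
        cds[i + 2]
        for i in range(0, usable, 3)
        if cds[i:i + 2] in FOURFOLD_PREFIXES
    ]
    return gc_fraction("".join(thirds))


if __name__ == "__main__":
    example = "ATGGCCCTGAAA"  # codons: ATG GCC CTG AAA (made-up sequence)
    print(gc_fraction(example), gc3(example), gc4(example))
```

Each index would then be regressed against an adaptation temperature across species. The abstract notes that such correlations need phylogenetic correction (independent contrasts), which this sketch does not attempt.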
Affiliation(s)
- Rachael A Ream
- Hopkins Marine Station of Stanford University, Pacific Grove, California, USA
|
47
|
Sueoka N. Wide intra-genomic G+C heterogeneity in human and chicken is mainly due to strand-symmetric directional mutation pressures: dGTP-oxidation and symmetric cytosine-deamination hypotheses. Gene 2002; 300:141-54. [PMID: 12468095 DOI: 10.1016/s0378-1119(02)01046-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
Abstract
The intra-strand Parity Rule 2 of DNA (PR2) states that A=T and G=C within each strand. Useful corollaries of PR2 are G/(G+C)=A/(A+T)=0.5, G/(G+A)=C/(C+T)=G+C, and G/(G+T)=C/(C+A)=G+C. Here, A, T, G, and C represent the relative contents of the four nucleotide residues in a specific strand of DNA, so that A+T+G+C=1. Thus, a deviation from PR2 is a sign of strand-specific (or asymmetric) mutation and/or selection pressures. The present study delineates the symmetric and asymmetric effects of mutations on the intra-genomic heterogeneity of the G+C content in the human genome. The results of this study on the human genome are: (1) When both two- and four-codon amino acids were combined, only slight departures from PR2 were observed over the total range of G+C content of the third codon position. Thus, the G+C heterogeneity is likely to be caused by symmetric mutagenesis between the two strands. (2) The above result makes the deamination of cytosine due to double-strand breathing of DNA [Mol. Biol. Evol. 17 (2000) 1371] and/or incorporation of the oxidized guanine (8-oxo-guanine) opposite adenine during DNA replication (dGTP-oxidation hypothesis) the most likely candidates for the major cause of the diversity of the G+C content. (3) Patterns of amino acid-specific PR2 biases detected by plotting PR2 corollaries against the G+C content of the third codon position revealed that the eight four-codon amino acids can be divided into three types by the second codon letter: (a) C(2)-type (Ala, Pro, Ser4, and Thr), (b) G(2)-type (Arg4 and Gly), and (c) T(2)-type (Leu4 and Val). (4) Most of the asymmetric plot patterns of the above three classes in PR2 biases can be explained by C(2)→T(2) deamination of C(2)pG(3) of C(2)-type to T(2)pG(3) (T(2)-type) in both human and chicken. This explains the existence of some preferred codons in human and chicken. However, these asymmetric biases hardly contribute to the overall G+C content diversity of the third codon position.
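The PR2 corollaries above can be checked numerically with a short Python sketch. The function name and the toy sequences are hypothetical and are not taken from the paper. The sketch computes the two bias ratios G/(G+C) and A/(A+T) for one strand, both of which equal 0.5 when PR2 holds, and then confirms the corollary G/(G+A)=G+C on a PR2-compliant example.

```python
import math
from collections import Counter


def pr2_biases(strand):
    """Return (G/(G+C), A/(A+T)) for one DNA strand.

    Under PR2 both ratios equal 0.5; departures from 0.5 indicate
    strand-asymmetric mutation and/or selection pressure.
    """
    counts = Counter(b for b in strand.upper() if b in "ACGT")
    g, c, a, t = counts["G"], counts["C"], counts["A"], counts["T"]
    gc_bias = g / (g + c) if (g + c) else float("nan")
    at_bias = a / (a + t) if (a + t) else float("nan")
    return gc_bias, at_bias


if __name__ == "__main__":
    # Toy strand with A=T=2 and G=C=3, so PR2 holds exactly.
    strand = "AATTGGGCCC"
    print(pr2_biases(strand))

    # Corollary check: with relative contents summing to 1,
    # G/(G+A) should equal G+C when PR2 holds.
    n = len(strand)
    g, c, a = strand.count("G") / n, strand.count("C") / n, strand.count("A") / n
    print(math.isclose(g / (g + a), g + c))
```

In the study these ratios are evaluated at third codon positions and plotted against third-position G+C content; restricting the input to those positions is left out of this sketch.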
Affiliation(s)
- Noboru Sueoka
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309-0347, USA.
|
48
|
Abstract
Genes are non-uniformly distributed in the human genome, reaching the highest concentration in GC-rich isochores. This is one of the fundamental aspects of the human genome organization (Gene 241/259 (2000a,b) 3/31, for a review). In the present paper the gene distribution was analyzed in relationship to the gene expression pattern and levels. In this study evidence is produced showing: (i) that a biased gene distribution towards GC-rich isochores applies to both tissue-specific and housekeeping genes; and (ii) that genes localized in GC-rich isochores have high transcriptional levels. Since gene density and transcriptional levels are correlated with each other and both are correlated with the GC content of the isochores, the biased gene distribution in the human genome presumably is the result of selection at the gene expression levels.
Affiliation(s)
- Giuseppe D'Onofrio
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Naples, Italy.
|
49
|
Nikolaou C, Almirantis Y. A study of the middle-scale nucleotide clustering in DNA sequences of various origin and functionality, by means of a method based on a modified standard deviation. J Theor Biol 2002; 217:479-92. [PMID: 12234754 DOI: 10.1006/jtbi.2002.3045] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
The deviation from randomness in the distribution of nucleotides in genomic sequences is quantified and studied, using a modified standard deviation (MSD). This method implies a "per block" computation of the standard deviation of the nucleotide frequencies of occurrence, using local means (means taken in a neighborhood of each block). This quantity may serve as a scale-dependent measure of the nucleotide clustering. In the present work, the meso-scale of tens of nucleotides is principally explored, by means of suitably adjusted filter parameters. This length scale is of an order of magnitude not directly affected by the grammar and syntax rules of the protein-coding procedure, remaining shorter than the scale of appearance of large-scale characteristics of the genome. MSD has been found to distinguish systematically between sequences of different origin and functionality. The most nearly random are found to be coding sequences of prokaryotes, while in intronic and intergenic regions of eukaryotic genomes, extended clustering of similar nucleotides is observed. The distributions of MSD values of large collections of sequences are found to be in most cases characteristic of their biological role and origin. Protein-coding and non-coding, prokaryotic and eukaryotic DNA as well as promoter, rRNA, viral and organelle sequences have been examined. The presented results corroborate a recently proposed model for genome evolution. The method is also applied for an assessment of the annotation of ORFs taken from the complete genome of Saccharomyces cerevisiae.
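The abstract does not give the exact MSD formula, so the following Python sketch only illustrates the general idea of a per-block standard deviation measured against local means. Everything specific here is an assumption: the function names, the non-overlapping blocks, the symmetric neighborhood of `half_window` blocks, and the root-mean-square form of the result. It should not be read as the authors' definition.

```python
import math


def block_frequencies(seq, base, block_size):
    """Frequency of `base` in consecutive non-overlapping blocks."""
    seq = seq.upper()
    n_blocks = len(seq) // block_size  # a trailing partial block is dropped
    return [
        seq[i * block_size:(i + 1) * block_size].count(base) / block_size
        for i in range(n_blocks)
    ]


def local_mean_sd(freqs, half_window):
    """Root-mean-square deviation of each block from its local mean.

    The local mean of block i is taken over blocks i-half_window to
    i+half_window, truncated at the sequence ends (assumed scheme).
    """
    if not freqs:
        raise ValueError("need at least one block")
    total = 0.0
    for i, f in enumerate(freqs):
        lo = max(0, i - half_window)
        hi = min(len(freqs), i + half_window + 1)
        local_mean = sum(freqs[lo:hi]) / (hi - lo)
        total += (f - local_mean) ** 2
    return math.sqrt(total / len(freqs))


if __name__ == "__main__":
    even = "ACGT" * 10           # A is spread evenly across blocks
    clustered = "AAAACCCC" * 5   # A is concentrated in alternating blocks
    for name, s in (("even", even), ("clustered", clustered)):
        print(name, local_mean_sd(block_frequencies(s, "A", 4), half_window=1))
```

Using a local rather than a global mean is what makes the measure scale-dependent: slow compositional drift over long distances is absorbed into the local means, so mainly clustering near the block scale contributes to the value.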
Affiliation(s)
- Christoforos Nikolaou
- Institute of Biology, National Research Center for Physical Sciences, "Demokritos" 15310, Athens, Greece
|
50
|
Abstract
An analysis by CsCl density gradient centrifugation has shown that, at a fragment size of about 100 kb, the DNA of a urochordate, Ciona intestinalis, is remarkably homogeneous in base composition. Localization of 16 coding sequences from C. intestinalis, chosen so as to cover the distribution range of all available coding sequences for this organism, showed a nearly symmetrical distribution almost coinciding with the DNA distribution. Both distributions are remarkably different from those found in vertebrates, which are skewed towards high GC levels (to a greater extent in warm-blooded vertebrates). In order to account for this change in genome organization, we propose a working hypothesis that can be tested. Basically, we suggest that the genome duplication that occurred between urochordates and fishes was accompanied by a preferential integration of transposons in one compartment of the genome, which was made gene-poor (by lowering gene density) compared to the rest. Since the gene-poor compartment (the 'empty quarter') is characterized by a lower level of gene expression compared to the gene-rich compartment (the 'genome core') in the vertebrate genome, we further suggest, as a working hypothesis, that a compartmentalization according to gene expression already existed in urochordates.
Affiliation(s)
- Giuliana de Luca di Roseto
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
|